Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add KEP for cAdvisor-less, CRI-full Container and Pod Stats #2364

Merged
merged 1 commit into from May 12, 2021

Conversation

haircommander
Copy link
Contributor

Signed-off-by: Peter Hunt pehunt@redhat.com

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 29, 2021
@k8s-ci-robot
Copy link
Contributor

Welcome @haircommander!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @haircommander. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 29, 2021
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 29, 2021
@haircommander
Copy link
Contributor Author

First time doing this, not sure if I did everything right :)

/cc @bobbypage (esteemed coauthor)
/cc @mrunalp @umohnani8 @saschagrunert (cri-o folks)
/cc @derekwaynecarr @dchen1107 @sjenning (sig-node folks)
/cc @dims @mikebrow @crosbymichael (containerd folks)

@bobbypage
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 29, 2021
Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the general idea and would support the efforts :)

Summary API has two interfaces:
* [cAdvisor stats provider](https://github.com/kubernetes/kubernetes/blob/release-1.20/pkg/kubelet/stats/cadvisor_stats_provider.go)
* Calls cAdvisor directly to obtain node, pod, and container stats
* [CRI stats provider](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cri_stats_provider.go#L54)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: link to a commit or release branch to not implicitly invalidate the link in the future.

| | InodesFree | cAdvisor or N/A | cAdvisor or N/A |
| | Inodes | cAdvisor or N/A | cAdvisor or N/A |
| | InodesUsed | CRI | CRI |
| UserDefinedMetrics | All Fields | cAdvisor | N/A |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we proposing to remove them completely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that depends on how we introduce these changes. The proposition as defined is that we don't implement them in CRI API. I believe they're deemed less useful because users can register custom metrics with prometheus in a more idiomatic way--and that UserDefinedMetrics were for a prometheus-less world.

If these changes are absorbed into the Kubelet by a new stats provider (leaving the old cri stats provider in tact), then the new one would just not support user defined metrics
if these changes were absorbed into the Kubelet by adapting the old cri stats provider, we likely would have to scrape these metrics from cAdvisor.

I'm generally more in favor of the former option. I believe there's a TODO below that notes that we should decide how to handle it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the field to be cAdvisor or N/A to better reflect that intention

@haircommander haircommander changed the title sig-node: Add KEP for cAdvisor-less, CRI-full Container and Pod Stats Add KEP for cAdvisor-less, CRI-full Container and Pod Stats Jan 29, 2021
* There are some fields that are defined in SummaryAPI (e.g time in a lot). Do we add those?
* How do we tell cAdvisor to not collect container metrics?
* - Give it a bogus root?
* - Flag?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a configuration file as a third option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find evidence there is a config file for cAdvisor. @bobbypage is this an option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added "cAdvisor configuration option", which could mean config file if possible, or the config struct used in cAdvisor startup

Copy link
Member

@bobbypage bobbypage Jan 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cAdvisor doesn't support file based config currently. Adding something like that may make sense, or maybe just adding a flag if needed similar to how kubelet configures cAdvisor options currently: https://github.com/kubernetes/kubernetes/blob/5d6dc8d/pkg/kubelet/cadvisor/cadvisor_linux.go#L70-L76

@crosbymichael
Copy link

Thank you for the ping, taking a look at this.

@fuweid
Copy link

fuweid commented Feb 1, 2021

/cc

@k8s-ci-robot
Copy link
Contributor

@fuweid: GitHub didn't allow me to request PR reviews from the following users: fuweid.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, ehashman, haircommander

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2021
@ehashman
Copy link
Member

I think all of Dawn's comments have been addressed so...

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2021
@k8s-ci-robot k8s-ci-robot merged commit 5bd5fa1 into kubernetes:master May 12, 2021
SIG-Windows automation moved this from Backlog (v1.22) to Done (v1.21) May 12, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 12, 2021
@dchen1107
Copy link
Member

A note for SIG Windows @kubernetes/sig-windows-leads

Had offline discussion with @bobbypage. There are still open questions for Windows specific fields. Since we haven't heard the feedbacks from SIG Windows, we decided to move forward, but treat them as part of our alpha investigation. Please raise your concerns and suggestion to us.

@haircommander
Copy link
Contributor Author

xref initial kubelet implementation kubernetes/kubernetes#103095

Copy link
Member

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super excited to see movement on this!

I don't really see any consideration for cgroups v2 in this KEP. The reason I say this is because several of the metrics required in this KEP were deemed optional (because unavailable in cgroups v2) in @giuseppe's cgroups v2 proposal, and I don't see the ramifications of that reflected in this KEP. That shouldn't block this work from going alpha, but I feel it's probably something that will need to be figured out for beta.

Other large metrics work like this also took the kubernetes-mixin (by far the most popular collection of recording & alerting rules and dashboards using these metrics) into consideration, in case any metrics do end up changing (I realize that the goal is that that doesn't happen, but I find it hard to believe that 100% will stay unchanged). I think it would be good to keep this in mind for this work as well.

3. Add support for the new CRI additions in supported container runtimes (CRI-O and containerd).
4. Switch Kubelet's CRI stats provider from querying container and pod level stats from cAdvisor to newly added CRI pod and container level stats
5. cAdvisor should stop collecting container and pod level stats. If any other components need container or pod level stats from cAdvisor, the CRI implementation should be queried instead.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to "Graduation criteria", this should state "cAdvisor should be updated to support no longer collecting stats that are duplicated with CRI implementation" as cAdvisor supports much larger set of container statistics than CRI is planned to.

### /metrics/cadvisor

1. Expose the metric fields provided in `/metrics/cadvisor` in an analogous Prometheus endpoint directly from the CRI implementation.
2. cAdvisor should stop collecting container and pod level stats, as well as stop broadcasting from `/metrics/cadvisor`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to "Graduation criteria", this should state "cAdvisor should be updated to support no longer collecting stats that are duplicated with CRI implementation" as cAdvisor supports much larger set of container statistics than CRI is planned to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, I've opened #3003 PTAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet