Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retriable and non-retriable Pod failures for Jobs #3329

Open
8 tasks done
alculquicondor opened this issue Jun 1, 2022 · 86 comments
Open
8 tasks done

Retriable and non-retriable Pod failures for Jobs #3329

alculquicondor opened this issue Jun 1, 2022 · 86 comments
Assignees
Labels
sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. stage/beta Denotes an issue tracking an enhancement targeted for Beta status wg/batch Categorizes an issue or PR as relevant to WG Batch.

Comments

@alculquicondor
Copy link
Member

alculquicondor commented Jun 1, 2022

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 1, 2022
@alculquicondor
Copy link
Member Author

/sig apps
/wg batch

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. wg/batch Categorizes an issue or PR as relevant to WG Batch. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 1, 2022
@alculquicondor
Copy link
Member Author

/assign

@mimowo
Copy link
Contributor

mimowo commented Jun 2, 2022

/assign

@alculquicondor alculquicondor changed the title Retriable and non-retriable Job pod failures Retriable and non-retriable Pod failures for Jobs Jun 6, 2022
@alculquicondor
Copy link
Member Author

/sig scheduling
/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Jun 9, 2022
@Priyankasaggu11929
Copy link
Member

Hello @alculquicondor 👋, 1.25 Enhancements team here.

Just checking in as we approach enhancements freeze on 18:00 PT on Thursday June 23, 2022, which is just over 2 days from now.

For note, This enhancement is targeting for stage alpha for 1.25 (correct me, if otherwise)

Here's where this enhancement currently stands:

  • KEP file using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable
  • KEP has a updated detailed test plan section filled out
  • KEP has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

The open PR #3374 is addressing all the listed criteria above. We would just require getting it merged by the Enhancements Freeze.

For note, the status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@Priyankasaggu11929 Priyankasaggu11929 added the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label Jun 22, 2022
@Priyankasaggu11929 Priyankasaggu11929 added this to the v1.25 milestone Jun 22, 2022
@Priyankasaggu11929
Copy link
Member

With KEP PR #3374 merged, the enhancement is ready for the 1.25 Enhancements Freeze.

For note, the status is now marked as tracked. Thank you so much!

@kcmartin
Copy link

Hello @alculquicondor 👋, 1.25 Release Docs Lead here.
This enhancement is marked as ‘Needs Docs’ for 1.25 release.

Please follow the steps detailed in the documentation to open a PR against dev-1.25 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by August 4.
 Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thank you!

@Atharva-Shinde
Copy link
Contributor

Atharva-Shinde commented Jul 25, 2022

Hi @alculquicondor @mimowo, Enhancements team here again 👋

Checking in as we approach Code Freeze at 01:00 UTC on Wednesday, 3rd August 2022.

Please ensure that the following items are completed before the code-freeze:

Let me know if there are any additional k/k PRs besides the ones listed above

Currently, the status of the enhancement is marked as at-risk

Thanks :)

@mimowo
Copy link
Contributor

mimowo commented Jul 29, 2022

@Atharva-Shinde @alculquicondor there is one more PR that should be included before the code freeze: kubernetes/kubernetes#111475

@Atharva-Shinde
Copy link
Contributor

thank you @mimowo,
I have updated my comment with the PR and have also tagged you for future reference :)

@johnbelamaric
Copy link
Member

Is this planning to go to beta in 1.28? It's on the board for PRR but I don't see an open KEP PR linked here?

@mimowo
Copy link
Contributor

mimowo commented Jun 14, 2023

@johnbelamaric the KEP is already in Beta.

There are two updates for the third iteration of Beta:

  1. https://github.com/kubernetes/enhancements/pull/3965/files (just reflecting a bugfix that is already merged, and status update)
  2. https://github.com/kubernetes/enhancements/pull/3940/files/8eb45d267c0885159d53ed9355d5b7995eefdd39#diff-01c599fe2c09bcb8104563c0fd5162bd711cc04814d049357c6dda0ac7bf2a2a (and update currently coupled with the KEP-3939) to delay pod recreation until we have analyzed with pod failure policy the failed pod.

Do you think (2.) should be decoupled as a dedicated PR for this KEP -3329 or can stay as is?

@mimowo
Copy link
Contributor

mimowo commented Jun 14, 2023

Do you think (2.) should be decoupled as a dedicated PR for this KEP -3329 or can stay as is?

Also asking @wojtek-t who is PRR reviewer of the other KEP (3939).

@mimowo
Copy link
Contributor

mimowo commented Jun 14, 2023

The other KEP is close to merge so seems safe to keep the update coupled there.

@alculquicondor
Copy link
Member Author

I discussed this with @deads2k and he was ok having a single PR for changes in both KEPs, because changes in 3329 are minimal and highly related to the other KEP

@johnbelamaric
Copy link
Member

Sure that's fine

@wojtek-t
Copy link
Member

Sure that's fine

I reviewed that integration part from the PRR POV carefully too.

@npolshakova
Copy link

Hi @mimowo and @alculquicondor, just checking in again before the enhancement freeze coming up on 01:00 UTC Friday, 16th June 2023. This enhancement is marked as at risk, but it looks like the only thing left is for #3940 to go in. Let me know if I've missed anything. Thanks!

@alculquicondor
Copy link
Member Author

#3940 just merged 😄

@Atharva-Shinde
Copy link
Contributor

With all the requirements fulfilled this enhancement is now marked as tracked for the upcoming enhancements freeze🚀

@AdminTurnedDevOps
Copy link

Hi @alculquicondor

1.28 Docs Shadow here.

Does this enhancement work planned for 1.28 require any new docs or modification to existing docs?

If so, please follows the steps here to open a PR against dev-1.28 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday 20th July 2023.

Also, take a look at Documenting for a release to get yourself familiarize with the docs requirement for the release.

Thank you!

@npolshakova
Copy link

npolshakova commented Jul 11, 2023

Hey again @alculquicondor 👋

Just checking in as we approach Code freeze at 01:00 UTC Friday, 19th July 2023 .

Here’s the enhancement’s state for the upcoming code freeze:

For this enhancement, it looks like the following code related PR/s are open and they need to be merged or should be in merge-ready state before the code freeze commences :

Also please let me know if there are other PRs in k/k we should be tracking for this KEP.
As always, we are here to help if any questions come up. Thanks!

@mimowo
Copy link
Contributor

mimowo commented Jul 12, 2023

We will also have an improvement to this KEP as part of: kubernetes/kubernetes#117015 in 1.28.

@Rishit-dagli
Copy link
Member

Hey @alculquicondor , could you please create a docs PR even if it is a draft PR with no content yet against dev-1.28 branch in the k/website repo. The deadline to create this draft PR is Thursday 20th July 2023.

@alculquicondor
Copy link
Member Author

Thanks @Rishit-dagli, that's gonna be kubernetes/website#41745 (yes, the enhancements are tightly coupled).

I added the necessary PRs to the issue description above.

@AdminTurnedDevOps
Copy link

AdminTurnedDevOps commented Jul 25, 2023

removed.

@npolshakova
Copy link

/remove-label lead-opted-in

@k8s-ci-robot k8s-ci-robot removed the lead-opted-in Denotes that an issue has been opted in to a release label Aug 27, 2023
@mimowo
Copy link
Contributor

mimowo commented Oct 20, 2023

@alculquicondor please update the list of code changes for Beta with this PR: kubernetes/kubernetes#121103.

@salehsedghpour
Copy link
Contributor

Hello 👋 1.30 Enhancements Lead here,

I'm closing milestone 1.28 now,
If you wish to progress this enhancement in v1.30, please follow the instructions here to opt in the enhancement and make sure the lead-opted-in label is set so it can get added to the tracking board and finally add /milestone v1.30. Thanks!

/milestone clear

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2024
@mimowo
Copy link
Contributor

mimowo commented Apr 25, 2024

/remove-lifecycle stale
The plan is to explore (and potentially execute) graduation to stable in 1.31.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. stage/beta Denotes an issue tracking an enhancement targeted for Beta status wg/batch Categorizes an issue or PR as relevant to WG Batch.
Projects
Status: Graduating
Status: Tracked
Status: Tracked
Status: Needs Triage
Status: In Progress
Development

No branches or pull requests