Backoff Limit Per Index For Indexed Jobs #3850

Open
8 tasks done
jensentanlo opened this issue Feb 7, 2023 · 31 comments
Assignees
Labels
sig/apps Categorizes an issue or PR as relevant to SIG Apps. stage/beta Denotes an issue tracking an enhancement targeted for Beta status wg/batch Categorizes an issue or PR as relevant to WG Batch.

Comments

@jensentanlo
Contributor

jensentanlo commented Feb 7, 2023

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Feb 7, 2023
@jensentanlo
Contributor Author

/sig apps
/wg batch

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. wg/batch Categorizes an issue or PR as relevant to WG Batch. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 7, 2023
@jensentanlo jensentanlo changed the title Backoff Limit Per Job Backoff Limit Per Index For Indexed Jobs Feb 7, 2023
@alculquicondor
Member

/assign @mimowo

@alculquicondor
Member

In addition to configuring the backoff per index, we should probably have FailIndex as one of the actions for pod failure policies.
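
To make the suggestion concrete, here is a minimal sketch of what such a Job could look like, built as a plain Python dict and printed as JSON. It assumes the field names that later landed in the `batch/v1` API (`backoffLimitPerIndex` and the `FailIndex` pod failure policy action); the helper name, the job name, the image, the exit code, and the parallelism cap are hypothetical placeholders, not something proposed in this thread.

```python
import json


def indexed_job_manifest(name, completions, backoff_limit_per_index, fail_index_exit_codes):
    """Build an Indexed Job manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completionMode": "Indexed",
            "completions": completions,
            # Arbitrary cap chosen for illustration only.
            "parallelism": min(completions, 10),
            # Retries are counted separately for every index.
            "backoffLimitPerIndex": backoff_limit_per_index,
            "podFailurePolicy": {
                "rules": [
                    {
                        # Fail only the affected index, skipping its remaining retries.
                        "action": "FailIndex",
                        "onExitCodes": {
                            "operator": "In",
                            "values": sorted(fail_index_exit_codes),
                        },
                    }
                ]
            },
            "template": {
                "spec": {
                    # Pod failure policies require restartPolicy: Never.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "worker",      # placeholder container
                            "image": "busybox",    # placeholder image
                            "command": ["sh", "-c", "exit 0"],
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = indexed_job_manifest("sample-job", 1000, 1, {42})
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, provided the cluster has the feature enabled.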

@soltysh
Contributor

soltysh commented May 30, 2023

/milestone v1.28
/stage alpha
/label lead-opted-in

@k8s-ci-robot k8s-ci-robot added the stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status label May 30, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone May 30, 2023
@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label May 30, 2023
@aramase
Member

aramase commented Jun 14, 2023

Hello @mimowo 👋, Enhancements team here.

Just checking in as we approach enhancements freeze on 01:00 UTC Friday, 16th June 2023.

This enhancement is targeting stage alpha for 1.28 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: 1.28
  • KEP readme has an updated, detailed test plan section filled out
  • KEP readme has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@mimowo
Contributor

mimowo commented Jun 14, 2023

@aramase I think the first point is addressed as the KEP has been merged: #3967.

@mimowo
Contributor

mimowo commented Jun 15, 2023

@aramase is there anything missing to make it tracked?

@Atharva-Shinde
Contributor

Hey @mimowo
With all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀

The status of this enhancement is marked as tracked. Please keep the issue description up-to-date with appropriate stages as well. Thank you :)

@Rishit-dagli
Member

Hello @mimowo 👋, 1.28 Docs Lead here.

Does this enhancement work planned for 1.28 require any new docs or modification to existing docs?

If so, please follow the steps here to open a PR against the dev-1.28 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday 20th July 2023.

Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.

Thank you!

@aramase
Member

aramase commented Jul 17, 2023

Hey again @mimowo 👋

Just checking in as we approach code freeze at 01:00 UTC Friday, 19th July 2023.

Here’s the enhancement’s state for the upcoming code freeze:

  • All the PRs that are related to your enhancement are linked in the above issue description (for tracking purposes). This includes code, tests, and documentation related PR/s.
  • All code-related PR/s are merged or are in a merge-ready state (i.e., they have the approved and lgtm labels applied) by the code freeze deadline. This includes any test-related PR/s too.

I see kubernetes/kubernetes#118009 PR in the issue description. If there are any other k/k related PR(s) that we should be tracking for this KEP please link them in the issue description above.

As always, we are here to help if any questions come up. Thanks!

@Atharva-Shinde
Contributor

Hey @mimowo 👋 Enhancements Lead here,
With kubernetes/kubernetes#118009 and
kubernetes/kubernetes#119294 merged as per the issue description, this enhancement is now tracked for v1.28 Code Freeze!

@npolshakova

/remove-label lead-opted-in

@k8s-ci-robot k8s-ci-robot removed the lead-opted-in Denotes that an issue has been opted in to a release label Aug 27, 2023
@katcosgrove

Hey there @mimowo and @soltysh 👋, v1.29 Docs Lead here.
Does this enhancement work planned for v1.29 require any new docs or modification to existing docs?
If so, please follow the steps here to open a PR against the dev-1.29 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday, 19 October 2023.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.
Thank you!

@James-Quigley

Hi @jensentanlo 👋 from the v1.29 Communications Release Team! We would like to check if you have any plans to publish blogs for this KEP regarding new features, removals, and deprecations for this release.
If so, you need to open a PR placeholder in the website repository.
The deadline will be on Tuesday 14th November 2023 (after the Docs deadline PR ready for review)
Here's the 1.29 Calendar

@sanchita-07
Member

Hey again @mimowo 👋, 1.29 Enhancements team here.

Just checking in as we approach code freeze at 01:00 UTC Wednesday 1st November 2023:

Here's where this enhancement currently stands:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PR/s are ready to be merged (they have approved and lgtm labels applied) by the code freeze deadline. This includes tests.

For this enhancement, it looks like the following PR was merged before code freeze:

Also, please let me know if there are other PRs in k/k we should be tracking for this KEP.

With the PR merged and linked in the issue description, this KEP is tracked for code freeze for v1.29. 🚀
As always, we are here to help if any questions come up ✌. Thanks :)

@mimowo
Contributor

mimowo commented Oct 25, 2023

Also, please let me know if there are other PRs in k/k we should be tracking for this KEP.

Yes, there are two other PRs (both added to the description now):

@sanchita-07
Member

Thanks @mimowo for mentioning them!
Since both PRs have the lgtm and approved labels and are linked in the issue description, we are good to go. 😃 🚀

@jensentanlo
Contributor Author

I performed some manual testing on this feature and saw everything working as expected. A short summary of the details is below, if you're interested.


I ran indexed jobs with completions = 1000 on a local kind cluster (1.28) with the alpha feature gate enabled, mainly checking whether:

  1. All pods ran to completion or failure
  2. All failed indices are correctly recorded on the job object
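
A check like the second one can be scripted. The sketch below is one possible way to do it, not the procedure actually used in this test: it assumes the Job status exposes `completedIndexes` and `failedIndexes` as compressed strings such as `1,3-5,7` (the format documented for the `batch/v1` Job status), and that the status has already been fetched as a dict, for example from `kubectl get job <name> -o json`. The function names are hypothetical.

```python
def parse_indexes(text):
    """Expand a compressed index string such as "1,3-5,7" into a set of ints."""
    indexes = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # an empty string means no indexes were recorded
        if "-" in part:
            first, last = part.split("-", 1)
            indexes.update(range(int(first), int(last) + 1))
        else:
            indexes.add(int(part))
    return indexes


def check_job_status(status, completions):
    """Report indexes that are missing from, or duplicated across, the status fields."""
    completed = parse_indexes(status.get("completedIndexes", ""))
    failed = parse_indexes(status.get("failedIndexes", ""))
    expected = set(range(completions))
    return {
        # Indexes recorded as neither completed nor failed.
        "unaccounted": sorted(expected - completed - failed),
        # Indexes recorded as both completed and failed (should be empty).
        "overlap": sorted(completed & failed),
        "failed": sorted(failed),
    }


if __name__ == "__main__":
    # Made-up status for a 5-index job, used only to illustrate the report.
    sample_status = {"completedIndexes": "0-2,4", "failedIndexes": "3"}
    print(check_job_status(sample_status, completions=5))
```

For a finished job, an empty `unaccounted` list and an empty `overlap` list would indicate that every index ended up recorded exactly once.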

Related to indexed jobs in general rather than this specific feature: I was also interested in the delete behavior, because I've had trouble with bulk deletions of non-indexed jobs in the past. Everything appeared to be cleaned up correctly and relatively quickly, even though I was churning through a couple of indexed jobs (so thousands of pods) on my local machine.

@salehsedghpour
Contributor

/remove-label lead-opted-in

@k8s-ci-robot k8s-ci-robot removed the lead-opted-in Denotes that an issue has been opted in to a release label Jan 6, 2024
@salehsedghpour
Contributor

Hello 👋 1.30 Enhancements Lead here,

I'm closing milestone 1.29 now.
If you wish to progress this enhancement in v1.30, please follow the instructions here to opt in the enhancement, make sure the lead-opted-in label is set so it can be added to the tracking board, and finally add /milestone v1.30. Thanks!

/milestone clear

@kannon92
Contributor

kannon92 commented Feb 1, 2024

@mimowo if I am not mistaken, this feature should have a stage of beta.

It turns out I can update the label!

@kannon92
Contributor

kannon92 commented Feb 1, 2024

/stage beta

@k8s-ci-robot k8s-ci-robot added stage/beta Denotes an issue tracking an enhancement targeted for Beta status and removed stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Feb 1, 2024
@salehsedghpour
Contributor

Hi @soltysh, @mimowo, and @kannon92, Enhancements Team here! Just wondering if you are aiming to have this enhancement in 1.30. If so, please follow the instructions here to opt in the enhancement, make sure the lead-opted-in label is set so it can be added to the tracking board, and finally add /milestone v1.30. Thanks!

@alculquicondor
Member

No plans to graduate in this release.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 1, 2024
@alculquicondor
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 2, 2024
Projects
Status: Tracked
Status: Tracked for Code Freeze
Status: Backlog