Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(applicationset): ApplicationSets with rolling sync stuck in Pending #20230

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Fsero
Copy link
Contributor

@Fsero Fsero commented Oct 4, 2024

In issue #19535 we have discovered a bug where ApplicationSet Progressive Sync feature gets stuck and it never recovers and get stuck forever unless you manually delete the existing applicationsStatus of the ApplicationSet affected or trigger a manual sync.

When the ApplicationSet is performing a progressive sync, the apps in the step being synced get the status Pending in the ApplicationSet applicationStatus . This means that the apps are gonna be synced and is waiting for the sync to start progressing.

Let's set an example, applicationset generates 3 applications. In the initial moment applicationset points to commit A

applicationset will generate those 3 applications and start the progressive sync, then application 2 is in Pending status, the applicationset status for application 2 is marked to targetrevision for app2 to A

At this moment in time applicationset gets updated to point to commit B, since app2 is in pending state the progressive sync allows it to the app to be synced and hence the app2 is synced to commit B

since applicationset targetrevision for app2 expects to be A but it's B will never move app2 from 'Pending' to 'Progressing' state.

Here is the logic that performs this check.

This new check was introduced in ArgoCD 2.12 causing this bug when a progressive sync is already being performed.

  • Because first it will get the applicationStatus from the existing applicationStatus which is the one that has the old revision
  • And since the app inside the ApplicationSet applicationStatus is in "Pending" the revision is never updated when it enters the if statement (see how currentAppStatus.TargetRevision never will be updated)

This means that the ApplicationSet will always think that the app is not being synced to the latest version and never progress, but in reality the app is actually in a later version but tha ApplicationSet never updated it in the apps that are in "Pending".

  • This PR fixes this bug changing the logic that checks when an applications needs to be moved from Pending to Progressing, instead of rely on the targetrevision we actually rely just in the application being synced to move it. This also don't introduce a prior bug where it was cheched that the application was synced in a certain moment in time to ensure it was triggered by the applicationset controller.

  • Note that if someone manually sync one application of the applicationset while it's being progresively synced after merging this PR the applciationset controller will continue the rollout

  • Ensure that a certain revision is applied orderly in all applications generated from the applicationset it's certainly possible that a given application can be synced in a different revision than the one explicitly set in the appset

Fixes: #19535

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@Fsero Fsero requested a review from a team as a code owner October 4, 2024 13:15
Copy link

bunnyshell bot commented Oct 4, 2024

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔵 /bns:start to start the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

Copy link

bunnyshell bot commented Oct 4, 2024

✅ Preview Environment created on Bunnyshell but will not be auto-deployed

See: Environment Details

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

@crenshaw-dev
Copy link
Member

@wmgroot thoughts on this?

Copy link

codecov bot commented Oct 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.12%. Comparing base (e2bc96b) to head (3b62869).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20230      +/-   ##
==========================================
- Coverage   55.17%   55.12%   -0.05%     
==========================================
  Files         324      324              
  Lines       55257    55249       -8     
==========================================
- Hits        30489    30457      -32     
- Misses      22142    22173      +31     
+ Partials     2626     2619       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Fsero
Copy link
Contributor Author

Fsero commented Oct 21, 2024

Came here to report that we are running this in production since 2 weeks ago and it fixed our issues syncing applicationsets when the conditions explained happened

@Fsero
Copy link
Contributor Author

Fsero commented Oct 21, 2024

anyone else than @wmgroot can take a look?

@Fsero Fsero force-pushed the appset-deadlock branch 2 times, most recently from 54012b3 to 74abca1 Compare October 24, 2024 13:47
In issue argoproj#19535 we have discovered a bug where ApplicationSet Progressive Sync feature gets stuck and it never recovers and get stuck forever unless you manually delete the existing `applicationsStatus` of the ApplicationSet affected or trigger a manual sync.

When the ApplicationSet is performing a progressive sync, the apps in the step being synced get the status `Pending` in the ApplicationSet `applicationStatus `. This means that the apps are gonna be synced and is waiting for the sync to start progressing.

Let's set an example, applicationset generates 3 applications. In the initial moment applicationset points to commit A

applicationset will generate those 3 applications and start the progressive sync, then application 2 is in Pending status, the applicationset status for application 2 is marked to targetrevision for app2 to A

At this moment in time applicationset gets updated to point to commit B, since app2 is in pending state the progressive sync allows it to the app to be synced and hence the app2 is synced to commit B

since applicationset targetrevision for app2 expects to be A but it's B will never move app2 from 'Pending' to 'Progressing' state.

[Here](https://github.com/argoproj/argo-cd/blob/master/applicationset/controllers/applicationset_controller.go#L1078) is the logic that performs this check.

This new check was introduced in ArgoCD 2.12 causing this bug when a progressive sync is already being performed.

- Because first it will get the `applicationStatus` from the existing applicationStatus which is the one that has the [old revision](https://github.com/argoproj/argo-cd/blob/master/applicationset/controllers/applicationset_controller.go#L1045)
- And since the app inside the ApplicationSet `applicationStatus` is in "Pending" the revision is never updated when it enters the [if statement](https://github.com/argoproj/argo-cd/blob/master/applicationset/controllers/applicationset_controller.go#L1069-L1092) (see how currentAppStatus.TargetRevision never will be updated)

This means that the ApplicationSet will always think that the app is not being synced to the latest version and never progress, but in reality the app is actually in a later version but tha ApplicationSet never updated it in the apps that are in "Pending".

- This PR fixes this bug changing the logic that checks when an applications needs to be moved from Pending to Progressing, instead of rely on the targetrevision we actually rely just in the application being synced to move it. This also don't introduce a prior bug where it was cheched that the application was synced in a certain moment in time to ensure it was triggered by the applicationset controller.

- Note that if someone manually sync one application of the applicationset while it's being progresively synced after merging this PR the applciationset controller will continue the rollout

- Ensure that a certain revision is applied orderly in all applications generated from the applicationset it's certainly possible that a given application can be synced in a different revision than the one explicitly set in the appset

Fixes: argoproj#19535

<!--
Note on DCO:

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the *Details* link next to the DCO action for instructions on how to resolve this.
-->

Checklist:

* [X] Either (a) I've created an [enhancement proposal](https://github.com/argoproj/argo-cd/issues/new/choose) and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
* [X] The title of the PR states what changed and the related issues number (used for the release note).
* [X] The title of the PR conforms to the [Toolchain Guide](https://argo-cd.readthedocs.io/en/latest/developer-guide/toolchain-guide/#title-of-the-pr)
* [X] I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
* [ ] I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
* [ ] Does this PR require documentation updates?
* [ ] I've updated documentation as required by this PR.
* [X] I have signed off all my commits as required by [DCO](https://github.com/argoproj/argoproj/blob/master/community/CONTRIBUTING.md#legal)
* [X] I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
* [X] My build is green ([troubleshooting builds](https://argo-cd.readthedocs.io/en/latest/developer-guide/ci/)).
* [ ] My new feature complies with the [feature status](https://github.com/argoproj/argoproj/blob/master/community/feature-status.md) guidelines.
* [ ] I have added a brief description of why this PR is necessary and/or what this PR solves.
* [ ] Optional. My organization is added to USERS.md.
* [ ] Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

<!-- Please see [Contribution FAQs](https://argo-cd.readthedocs.io/en/latest/developer-guide/faq/) if you have questions about your pull-request. -->

Co-authored-by: Thibault Jamet <tjamet@users.noreply.github.com>
Co-authored-by: Carlos Rejano <carlosrejanoromeu@gmail.com>
Signed-off-by: Fabián Sellés <fabian.selles@adevinta.com>
@crenshaw-dev
Copy link
Member

I've chatted w/ some folks at Red Hat. They're interested in moving Progressive Syncs from alpha to beta, so there's a chance they'll be able to put some time into reviews of progressive syncs bugs soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ApplicationSet rolling sync stuck in ArgoCD 2.12
2 participants