
feat: Use actual deployment replica count instead of env variable for accuracy #20041

Draft
nueavv wants to merge 1 commit into master from issue-19928

Conversation

@nueavv (Member) commented on Sep 21, 2024:

fixes #19928

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@nueavv requested a review from a team as a code owner on September 21, 2024 03:56
bunnyshell bot commented Sep 21, 2024

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔵 /bns:start to start the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

bunnyshell bot commented Sep 21, 2024

✅ Preview Environment created on Bunnyshell but will not be auto-deployed

See: Environment Details

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

@nueavv marked this pull request as draft on September 21, 2024 03:56
@nueavv changed the title from "Use actual deployment replica count instead of env variable for accuracy" to "feat: Use actual deployment replica count instead of env variable for accuracy" on Sep 21, 2024
@nueavv force-pushed the issue-19928 branch 2 times, most recently from 85f0376 to 53cabce on September 21, 2024 11:30
return nil, fmt.Errorf("(dynamic cluster distribution) failed to get app controller deployment: %w", err)
}
applicationControllerName := env.StringFromEnv(common.EnvAppControllerName, common.DefaultApplicationControllerName)
appControllerDeployment, err := kubeClient.AppsV1().Deployments(settingsMgr.GetNamespace()).Get(context.Background(), applicationControllerName, metav1.GetOptions{})
Member commented:

The controller can be a Deployment or a StatefulSet. I am not sure whether enableDynamicClusterDistribution must be true when using a Deployment. If that is not possible, I don't think we should perform two Kubernetes API calls just for a warning log that helps detect a misconfiguration, considering that one of the calls will always fail.

@nueavv (Member Author) commented:

My issue was that I attempted sharding with enableDynamicClusterDistribution set to false, but the actual Deployment replica count was smaller than the sharding configuration. This caused a problem, and it was difficult to debug since no logs were recorded.

Since the Argo CD application controller typically runs as a Deployment, I believe it would be more accurate to retrieve the actual replica count using the Kubernetes API, even when enableDynamicClusterDistribution is set to false. This approach could help prevent discrepancies between the sharding configuration and the actual replica count, making debugging easier.

Would this be a suitable implementation? I would appreciate any feedback or suggestions on this matter.
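For illustration only (not the PR's actual code), a minimal client-go sketch of reading the replica count from the controller Deployment instead of the ARGOCD_CONTROLLER_REPLICAS environment variable; the function name and wiring are assumptions:

```go
import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getControllerReplicas is a hypothetical helper: it derives the shard count
// from the controller Deployment's spec.replicas rather than from an env variable.
func getControllerReplicas(ctx context.Context, kubeClient kubernetes.Interface, namespace, name string) (int, error) {
	deploy, err := kubeClient.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return 0, fmt.Errorf("failed to get application controller deployment: %w", err)
	}
	if deploy.Spec.Replicas != nil {
		return int(*deploy.Spec.Replicas), nil
	}
	// A Deployment defaults to 1 replica when spec.replicas is unset.
	return 1, nil
}
```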

@nueavv (Member Author) commented:

I encountered an issue when attempting sharding with enableDynamicClusterDistribution set to false. The root cause was my incorrect assumption that the ArgoCD application controller always runs as a Deployment, but I overlooked the fact that it can also operate as a StatefulSet.

In this scenario, even with enableDynamicClusterDistribution disabled, an error occurred due to a mismatch between the actual replica count and the sharding configuration. To prevent such discrepancies and make debugging easier, I have revised the code to account for both StatefulSet and Deployment replica counts. This change should help avoid potential issues arising from misconfigurations. Let me know what you think.
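For illustration, a sketch of the fallback order described above, checking the StatefulSet first and then the Deployment; the helper name is hypothetical and the imports match the earlier sketch:

```go
// getControllerReplicasFromWorkload is a hypothetical helper: the controller may
// run as either a StatefulSet or a Deployment, so both workload kinds are checked.
func getControllerReplicasFromWorkload(ctx context.Context, kubeClient kubernetes.Interface, namespace, name string) (int, error) {
	if sts, err := kubeClient.AppsV1().StatefulSets(namespace).Get(ctx, name, metav1.GetOptions{}); err == nil {
		if sts.Spec.Replicas != nil {
			return int(*sts.Spec.Replicas), nil
		}
		return 1, nil // spec.replicas unset defaults to 1
	}
	if deploy, err := kubeClient.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{}); err == nil {
		if deploy.Spec.Replicas != nil {
			return int(*deploy.Spec.Replicas), nil
		}
		return 1, nil // spec.replicas unset defaults to 1
	}
	return 0, fmt.Errorf("application controller %q not found as a StatefulSet or Deployment in namespace %q", name, namespace)
}
```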

@nueavv force-pushed the issue-19928 branch 2 times, most recently from bb33596 to 8b8db9f on October 1, 2024 06:51
@nueavv marked this pull request as ready for review on October 7, 2024 11:58
controller/sharding/sharding.go (resolved review thread)
appControllerStatefulSet, err := kubeClient.AppsV1().StatefulSets(namespace).Get(context.Background(), applicationControllerName, metav1.GetOptions{})
if err != nil {
replicasCount = 1
log.Warnf("Failed to retrieve StatefulSet '%s'. Defaulting replicasCount to 1.", applicationControllerName)
Contributor commented:

This seems like it won't work with the code below, e.g. if the number of replicas is > 1 and the call fails.

controller/sharding/sharding.go (resolved review thread)
@agaudreault marked this pull request as draft on November 11, 2024 23:26
Comment on lines +482 to +484
if err != nil {
replicasCount = 1
log.Warnf("Failed to retrieve StatefulSet '%s'. Defaulting replicasCount to 1.", applicationControllerName)
Member commented:

As discussed in the issue, if the call fails, we should default to the environment variable. We cannot assume that the call will always work, and defaulting to 1 is not accurate.
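For illustration, a sketch of the fallback the reviewer suggests: if the API lookup fails, read ARGOCD_CONTROLLER_REPLICAS rather than hard-coding 1. The helper name is hypothetical, log refers to sirupsen/logrus, and the env parsing is simplified with the standard library rather than Argo CD's own env helpers:

```go
import (
	"context"
	"os"
	"strconv"

	log "github.com/sirupsen/logrus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// replicasWithEnvFallback is a hypothetical helper: it prefers the live
// StatefulSet replica count and only falls back to the env variable on failure.
func replicasWithEnvFallback(ctx context.Context, kubeClient kubernetes.Interface, namespace, name string) int {
	sts, err := kubeClient.AppsV1().StatefulSets(namespace).Get(ctx, name, metav1.GetOptions{})
	if err == nil {
		if sts.Spec.Replicas != nil {
			return int(*sts.Spec.Replicas)
		}
		return 1 // spec.replicas unset defaults to 1
	}
	// The lookup failed: fall back to the environment variable instead of assuming 1.
	log.Warnf("Failed to retrieve StatefulSet '%s', falling back to ARGOCD_CONTROLLER_REPLICAS: %v", name, err)
	if v, convErr := strconv.Atoi(os.Getenv("ARGOCD_CONTROLLER_REPLICAS")); convErr == nil && v > 0 {
		return v
	}
	return 1
}
```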

Development

Successfully merging this pull request may close these issues.

Add warning log if ARGOCD_CONTROLLER_REPLICAS greater than configured replicas
3 participants