
Conversation

@codyrancher
Member

@codyrancher codyrancher commented Oct 16, 2025

Summary

Fixes #14201

Note: I can provide a Figma and an instance with the backend implementation.

Occurred changes and/or fixed issues

This adds the following abilities:

  • Turn autoscaling on/off
  • Set the min and max machine counts
  • Pause and resume scaling
  • Display status information in two locations
  • Display event information in an autoscaler tab within the management cluster detail page

Technical notes summary

  • There was some refactoring to better reuse components and concepts

Areas or cases that should be tested

  • Presence of the cluster-autoscaling feature flag
  • Management cluster list page
    • The autoscaler column is present
    • The various autoscaler statuses while the cluster is provisioning, when autoscaling is enabled on edit, and when it's scaling
  • The status popover is present on the management cluster detail page
  • The autoscaler tab events table
  • Pause and resume actions on the status popover and action menu

Notes:

  • For scaling up you can create a Deployment which looks like the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scaler
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 500
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-default-scaler
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-default-scaler
      namespace: default
    spec:
      affinity: {}
      containers:
        - image: nginx:latest
          imagePullPolicy: Always
          name: container-0
          resources:
            limits:
              memory: 2000Mi
            requests:
              memory: 1000Mi
          securityContext:
            allowPrivilegeEscalation: false
            privileged: false
            readOnlyRootFilesystem: false
            runAsNonRoot: false
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

The memory requests/limits and the replica count are the important bits.

  • For scaling down you can reduce the replicas or delete the deployment. There's a timeout (IIRC about 5 minutes) before scaling down occurs; see the example kubectl commands below.
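For example, assuming the manifest above is saved as scaler.yaml (the filename is just an example), the scale-up/scale-down flow can be exercised with standard kubectl commands:

  kubectl apply -f scaler.yaml                              # create the deployment; the pending pods trigger scale-up
  kubectl scale deployment scaler -n default --replicas=0   # reducing replicas triggers scale-down after the timeout
  kubectl delete deployment scaler -n default               # deleting the deployment also triggers scale-down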

Areas which could experience regressions

  • Explorer button on the management cluster list page
  • The namespace/project popover on detail pages

Screenshot/Video

Screenshots: autoscaling-featureflag, autoscaler-edit, autoscaler-popover-list, autoscaler-popover-detail, autoscaler-tab, image

Checklist

  • The PR is linked to an issue and the linked issue has a Milestone, or no issue is needed
  • The PR has a Milestone
  • The PR template has been filled out
  • The PR has been self reviewed
  • The PR has a reviewer assigned
  • The PR has automated tests or clear instructions for manual tests and the linked issue has appropriate QA labels, or tests are not needed
  • The PR has been reviewed with UX and tested in light and dark mode, or there are no UX changes
  • The PR has been reviewed in terms of Accessibility
  • The PR has considered, and if applicable tested with, the three Global Roles Admin, Standard User and User Base

@codyrancher codyrancher force-pushed the autoscaler branch 3 times, most recently from 4453845 to 23d9f68 on October 17, 2025 at 15:24
@rancher-ui-project-bot rancher-ui-project-bot bot added this to the v2.13.0 milestone Oct 17, 2025
@codyrancher codyrancher marked this pull request as ready for review October 17, 2025 15:51
@codyrancher codyrancher changed the title Autoscaler Initial Autoscaler Implementation Oct 17, 2025
@codyrancher codyrancher force-pushed the autoscaler branch 7 times, most recently from 0dd86b1 to c7a1e38 on October 19, 2025 at 22:24
Comment on lines +1085 to +1089
async loadAutoscalerConfigMap() {
  const url = `/k8s/clusters/${ this.mgmtClusterId }/v1/${ CONFIG_MAP }/${ AUTOSCALER_CONFIG_MAP_ID }`;

  return await this.$dispatch('cluster/request', { url }, { root: true });
}
Member Author

We do this instead of using the conventional cluster/find because we need to fetch the configmap from within a management product, which means the cluster/explorer store has not been loaded.

Loading the cluster seemed far too hackish, but this also means the configMap will not be cached or receive live updates wherever it's used. In those cases we poll instead.
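For reference, a rough sketch of what that polling fallback might look like; the method names, interval, and the component field being written are illustrative, not the actual implementation:

// Illustrative sketch only: refresh the configmap on an interval instead of relying
// on store caching/watch updates, since the cluster store isn't loaded in this context.
startAutoscalerPolling(intervalMs = 10000) {
  this.pollTimer = setInterval(async() => {
    try {
      // Same raw request as loadAutoscalerConfigMap() above
      const url = `/k8s/clusters/${ this.mgmtClusterId }/v1/${ CONFIG_MAP }/${ AUTOSCALER_CONFIG_MAP_ID }`;

      this.autoscalerConfigMap = await this.$dispatch('cluster/request', { url }, { root: true });
    } catch (err) {
      console.error('Failed to refresh autoscaler configmap', err);
    }
  }, intervalMs);
},

stopAutoscalerPolling() {
  clearInterval(this.pollTimer);
}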

Member

@mantis-toboggan-md mantis-toboggan-md Oct 21, 2025


Long term I think we do need to be able to use loadCluster or something similar in the cluster management context, but that's a can of worms. I think this is a good approach for this feature. loadCluster would be loading quite a bit of extra data each time a user hovers over a new row of the autoscaler column.

Member Author


Yeah, I agree. Ideally we should make the stores lightweight enough to be able to query resources in any context.

const params = stevePaginationUtils.createParamsForPagination({ schema: eventSchema, opt: { pagination } });
const url = `/k8s/clusters/${ this.mgmtClusterId }/v1/${ EVENT }?${ params }`;

const events = (await this.$dispatch('cluster/request', { url }, { root: true }))?.data || [];
Member Author


See loadAutoscalerConfigMap comments for details on why we're using cluster/request.

I will also note that I tested this with ui-sql-cache both enabled and disabled. I thought it had to be enabled for the filter fields to work but that doesn't appear to be the case.

Member Author


I factored this out into PopoverCard.vue so that it could be reused for the Autoscaler formatter.

Member

@mantis-toboggan-md mantis-toboggan-md left a comment


When a cluster with autoscaling is first provisioned, the autoscaling popover shows it's already been scaled up and down. Not sure we can address that from the UI side given what I've seen of the autoscaler api -- looks like the UI is just reporting what the configmap says in this case -- but figured I'd call it out anyway.
Screenshot 2025-10-21 at 2 49 58 PM

I'm not sure these buttons make sense at all with autoscaler enabled. The progress bar might need some adjustment too. Since we're disabling the field they correspond to in the cluster creation page, maybe they should be removed from this view entirely?

Screenshot 2025-10-21 at 3 02 45 PM

There are console errors when looking at the autoscaler popover or cluster detail page if the current user doesn't have access to the autoscaler configmap, but I know the plan around access to that configmap is still uncertain.

@codyrancher
Member Author

codyrancher commented Oct 22, 2025


Good callouts.

  • I'm going to leave the scale up/scale down in the popover but I'll bring it up with the backend.
  • I'm going to remove the scaling options when autoscaler is enabled.
  • I'm still waiting on the backend to respond before I change how we handle access.
    • Edit: I decided I'm going to restrict access to be consistent with the state of the backend and I'll relax the restrictions if the backend supports it.

Member

@mantis-toboggan-md mantis-toboggan-md left a comment


LGTM. I just had one question about validation. I was able to create a one-pool cluster with:

count: 1
min: 1
max: 0

The cluster provisioned fine, with one machine in its one pool, but the autoscaler doesn't seem to initialize. The autoscaler configmap reports that it's "initializing" and the autoscaler pod is full of errors about an invalid max count annotation.

It sounds like max count needs a bit more validation. I noticed we're relying on backend validation for this feature (which seems fine, given that all of our machine configs are validated once the user clicks create anyway) -- should I file an r/r issue about this?

@codyrancher
Member Author

I'll attempt to do front-end validation, and if there are problems I'll open the r/r issue.
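For illustration, a minimal sketch of what that front-end check might look like; the field names (minCount/maxCount) and messages are assumptions, not the final implementation:

// Minimal sketch only; pool field names and message strings are illustrative.
function validateAutoscalerCounts(pool) {
  const errors = [];
  const min = Number(pool.minCount);
  const max = Number(pool.maxCount);

  if (!Number.isInteger(min) || min < 0) {
    errors.push('Minimum machine count must be a non-negative integer');
  }
  if (!Number.isInteger(max) || max < 1) {
    errors.push('Maximum machine count must be at least 1');
  }
  if (Number.isInteger(min) && Number.isInteger(max) && max < min) {
    errors.push('Maximum machine count must be greater than or equal to the minimum');
  }

  return errors;
}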

@codyrancher
Member Author

I'm going to put the validation in a separate PR, which includes another change requested by the backend.

@codyrancher codyrancher merged commit 93bd6bc into rancher:master Oct 24, 2025
138 of 147 checks passed


Development

Successfully merging this pull request may close these issues.

Cluster Autoscaler Support - Initial Release

2 participants