Cluster controller first iteration#1688

Draft
DmytroPI-dev wants to merge 4 commits intofeature/database-controllersfrom
feature/database-controllers-cluster-controller

Conversation

@DmytroPI-dev

Description

Cluster controller for CNPG first iteration

Key Changes

Drafted API and controller for Cluster controller

Testing and Verification

Manual tests only

Related Issues

JIRA: CPI-1883

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request

// Storage overrides the storage size from ClusterClass.
// Example: "5Gi"
// +optional
// +kubebuilder:validation:XValidation:rule="self == null || oldSelf == null || quantity(self).compareTo(quantity(oldSelf)) >= 0",message="storage size can only be increased"


We aren't comparing it to the value inherited from ClusterClass. Is there any mechanism that would also prohibit the following scenario? Would CNPG block this behaviour?

  1. Create the cluster without explicitly specifying the Storage (it is inherited from the ClusterClass), e.g. 20Gi
  2. Cluster created
  3. Setting 10Gi here

Author

Storage is not downgradable, but good point, I will check. Did I understand you correctly, @limak9182, that you are describing a situation where the user created a default cluster and then updated the spec?

@@ -0,0 +1,111 @@
/*
Copyright 2021.
Collaborator

2021?

Author

will change.

// Storage overrides the storage size from ClusterClass.
// Example: "5Gi"
// +optional
// +kubebuilder:validation:XValidation:rule="self == null || oldSelf == null || quantity(self).compareTo(quantity(oldSelf)) >= 0",message="storage size can only be increased"
Collaborator

If I understand this rule correctly, self == null allows a situation where the user adds the attribute with a value and then removes it. In that case we would rely on the class-provided value, which might be lower. IMO we should not allow such behaviour; once set, the attribute should not be removable.

Author

But it also checks that the quantity is not lower than the one set previously, so I will check, but it should be fine.

Collaborator

I don't think this is the way it works. You put || between rules, so if either of the rules succeeds the whole validation passes, i.e.:

  • self refers to the current/new value being validated
  • oldSelf refers to the previous value (when updating an existing resource)
  • self == null || oldSelf == null - If either value is null, the validation passes (this allows creation and
    deletion)
  • quantity(self).compareTo(quantity(oldSelf)) >= 0 - If both values exist, convert them to Kubernetes
    quantities and compare them. The new value must be greater than or equal to the old value.
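If removal of a once-set field must also be blocked, one common pattern (a sketch, not what the PR currently does; the field name `storage` is assumed) is a transition rule on the enclosing spec struct rather than on the field itself:

```go
// Hypothetical marker on the enclosing spec type, not on the storage field:
// once storage has been set, it may not be removed on update.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.storage) || has(self.storage)",message="storage cannot be removed once set"
```

With removal blocked at the parent level, the field-level rule only needs to guard against decreases.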

Author

I have added separate validation methods to check values inherited from ClusterClass, since, IMO, with CEL I can only check against the old/new values of the Cluster, not the ClusterClass.
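That webhook-style check can be sketched as follows (an illustration only: `validateAgainstClass` and `parseGi` are hypothetical names, and real code would use `resource.ParseQuantity` from apimachinery rather than this simplified Gi-only parser):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGi is a deliberately simplified stand-in for resource.ParseQuantity:
// it only understands "<n>Gi" values, which is enough to sketch the check.
func parseGi(s string) (int, error) {
	n, err := strconv.Atoi(strings.TrimSuffix(s, "Gi"))
	if err != nil {
		return 0, fmt.Errorf("cannot parse %q: %w", s, err)
	}
	return n, nil
}

// validateAgainstClass rejects a Cluster storage override that is smaller than
// the ClusterClass default -- the case CEL cannot see, since CEL only compares
// the Cluster's own old and new values.
func validateAgainstClass(override, classDefault string) error {
	if override == "" {
		return nil // nothing overridden, class default applies
	}
	o, err := parseGi(override)
	if err != nil {
		return err
	}
	c, err := parseGi(classDefault)
	if err != nil {
		return err
	}
	if o < c {
		return fmt.Errorf("storage %s is below the ClusterClass default %s", override, classDefault)
	}
	return nil
}

func main() {
	fmt.Println(validateAgainstClass("10Gi", "20Gi")) // non-nil error
	fmt.Println(validateAgainstClass("30Gi", "20Gi")) // <nil>
}
```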

Author

Will update the logic, you are correct.

Author

And I will update it for the PG version as well.

}

func init() {
SchemeBuilder.Register(&Cluster{}, &ClusterList{})
Collaborator

I initially named the CRDs ClusterClass, Cluster and Database, but when I look at this now I think we can do better. Maybe we can rename ClusterClass to DatabaseClusterClass and Cluster to DatabaseCluster, and leave Database as is?

Author

We should discuss it, I have some ideas as well.


// CNPG config status, for reference when getting cluster status.
// +optional
CNPGConfigStatus *CNPGConfig `json:"cnpgConfigStatus,omitempty"`
Collaborator

We probably shouldn't return CNPG-specific data directly in our Cluster status. The aim of this abstraction is to allow swapping the lower-level PG operator, so the status (and input) should be generic.

Author

I agree, but we'd probably need some more details when describing our cluster, e.g. image version, storage size, etc., which are available in clusters.postgresql.cnpg.io; that's why I put it here (not working yet).

Collaborator

You have most of this info when you describe your Cluster CR, as you should see the merged params (from the class and the Cluster CR itself). Here we should keep the status, not static config.

@@ -0,0 +1,9 @@
apiVersion: enterprise.splunk.com/v4
Collaborator

Since these are examples, should we also create the class postgresql-dev? Otherwise this example fails.

Author

But you've created it already, in enterprise_v4_clusterclass_dev.yaml

Collaborator

Yes, but examples should probably be atomic?

Name: resourceName,
Namespace: "default",
},
// TODO(user): Specify other spec details if needed.
Collaborator

Do you plan to fill in these comments?

Author

This comes from a boilerplate, I didn't write any tests yet.

Collaborator

@mploski mploski Feb 6, 2026

OK, never mind, let's leave it for now.

if statusErr := r.updateClusterStatus(ctx, cluster, cnpgCluster, err); statusErr != nil {
logger.Error(statusErr, "Failed to update Cluster status after ensure up to date error")
}
return ctrl.Result{}, err
Collaborator

Should we return a combined error if statusErr is also present?

}
return ctrl.Result{}, err
}
if updated {
Collaborator

Do we need this variable? Maybe the returned err is enough: if err == nil we return "Cluster is up to date". You can add additional logs to ensureClusterUpToDate if you want, and that will be enough.

resultConfig.PostgresVersion = clusterClassDefaulfConfig.PostgresVersion
}
if resultConfig.Resources == nil {
resultConfig.Resources = clusterClassDefaulfConfig.Resources
Collaborator

What happens if clusterClassDefaulfConfig.Resources is also nil? We don't specify a default anywhere, and we cast this value to int in line https://github.com/splunk/splunk-operator/pull/1688/changes#diff-63aa8a2ed732cb58bc15b9cb714fe27e1f4ba134541b9b0219867d8e9f2b2326R218.

resultConfig.Storage = clusterClassDefaulfConfig.Storage
}
if len(resultConfig.PostgreSQLConfig) == 0 {
resultConfig.PostgreSQLConfig = clusterClassDefaulfConfig.PostgreSQLConfig
Collaborator

@mploski mploski Feb 6, 2026

We should consider initializing clusterClassDefaultConfig.PostgreSQLConfig when it is nil. While nil is currently accepted by CNPG, this becomes risky if we later decide to modify the value directly in the controller, for example:

cnpgCluster.Spec.PostgresConfiguration.Parameters["max_connections"] = "100"

This would cause a panic, and the attribute would also be omitted from kubectl describe output of our CR. Initializing it to an empty map ({}) would be safer.

Alternatively, we can set a default at the class level so the parameter is always initialized to an empty map:

// +kubebuilder:default={}
The same goes for pgHBA.
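The panic the comment warns about is easy to reproduce in plain Go (a self-contained illustration, not the operator's code):

```go
package main

import "fmt"

func main() {
	// Writing to a nil map panics; this is the risk the comment describes.
	var params map[string]string
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("recovered:", r) // "assignment to entry in nil map"
			}
		}()
		params["max_connections"] = "100"
	}()

	// Initializing to an empty map makes the same write safe.
	params = map[string]string{}
	params["max_connections"] = "100"
	fmt.Println(params["max_connections"])
}
```

Note that reading from a nil map is safe (it yields the zero value); only writes panic, which is why the bug can stay hidden until someone mutates the config in the controller.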

) (*cnpgv1.Cluster, error) {

// Validate that required fields are present in the merged configuration before creating the CNPG Cluster.
if err := validateClusterConfig(mergedConfig, clusterClass); err != nil {
Collaborator

@mploski mploski Feb 6, 2026

Do we need this? Why not enforce it at the OpenAPI level, i.e. mark the attribute as mandatory? It feels a bit redundant, plus with every change we will need to update this function.

if err := validateClusterConfig(mergedConfig, clusterClass); err != nil {
return false, err
}
// Validate that storage size is not decreased
Collaborator

this is already checked by CEL

if err := validateStorageSize(mergedConfig.Storage, cnpgCluster.Spec.StorageConfiguration.Size); err != nil {
return false, err
}
// Validate that PostgresVersion is not decreased
Collaborator

Do we have that requirement? Maybe let's also make it part of the CEL validation.
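A sketch of how the version requirement could live in CEL as well, assuming the field is optional and holds a plain integer major version (the field name and type are assumptions, mirroring the storage rule's null guards):

```go
// Hypothetical field-level marker for a *int postgresVersion field.
// +kubebuilder:validation:XValidation:rule="self == null || oldSelf == null || self >= oldSelf",message="postgres major version can only be increased"
```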

Resources: resources,
}

if !equality.Semantic.DeepEqual(cnpgCluster.Spec, desiredState) {
Collaborator

Can't we simply check whether the previous merged cluster config == the current merged config? We can just fetch the current cluster state and compare it with mergedConfig. If there is any difference, then initiate the clusterSpec. In that case we don't even need a separate function for a new cluster (defineNewCNPGCluster) vs. updating a cluster, as in the end the process is exactly the same.
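The suggested comparison can be sketched with a stand-in struct (`mergedConfig` and `needsUpdate` are hypothetical names; the operator would use its real spec type, and likely `equality.Semantic.DeepEqual` rather than stdlib `reflect.DeepEqual`):

```go
package main

import (
	"fmt"
	"reflect"
)

// mergedConfig is a stand-in for the merged Cluster/ClusterClass spec; the
// real type lives in the operator's API package.
type mergedConfig struct {
	PostgresVersion int
	Storage         string
	Parameters      map[string]string
}

// needsUpdate sketches the suggestion: instead of rebuilding the desired CNPG
// spec and DeepEqual-ing it, compare the previously applied merged config with
// the freshly merged one and only touch the cluster when they differ.
func needsUpdate(current, desired mergedConfig) bool {
	return !reflect.DeepEqual(current, desired)
}

func main() {
	cur := mergedConfig{PostgresVersion: 16, Storage: "20Gi", Parameters: map[string]string{"max_connections": "100"}}
	same := mergedConfig{PostgresVersion: 16, Storage: "20Gi", Parameters: map[string]string{"max_connections": "100"}}
	grown := mergedConfig{PostgresVersion: 16, Storage: "30Gi", Parameters: map[string]string{"max_connections": "100"}}
	fmt.Println(needsUpdate(cur, same))  // false
	fmt.Println(needsUpdate(cur, grown)) // true
}
```

With one comparison driving both paths, create and update collapse into a single "apply desired config" step, which is the reviewer's point about dropping defineNewCNPGCluster.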

}

// validateClusterConfig checks that all required fields are present in the merged configuration.
func validateClusterConfig(mergedConfig *enterprisev4.ClusterSpec, clusterClass *enterprisev4.ClusterClass) error {
Collaborator

As mentioned in one of the comments, I feel the validate functions are redundant here; we can use Kubernetes mechanisms to validate the input and reject the CR apply even before we trigger the reconciliation loop.



3 participants