-
Notifications
You must be signed in to change notification settings - Fork 582
MON-4030: prometheus k8s config API #2463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Hello @marioferh! Some important instructions when contributing to openshift/api: |
4b7b3a6 to
5cef22c
Compare
|
/test remaining-required Scheduling tests matching the |
5cef22c to
86df303
Compare
|
@marioferh You appear to have some conflict markers here? Can you PTAL and make sure the correct content is pushed. Let me know when you want this API reviewed |
|
yep I've mixed the commit, let me fix it |
bf0b7b7 to
64ce7b7
Compare
|
/assign |
| // prometheusK8sConfig provides configuration options for the Prometheus instance | ||
| // Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations. | ||
| // prometheusK8sConfig is optional. | ||
| // When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have this follow the existing pattern of related fields like the metricsServerConfig field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, some of the questions I have from reading this as-is are:
- What is "the Prometheus instance"?
- Where does it run? Is it run by default?
- Is
Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrationsall the things I can configure? Why might I care, as a user, to configure these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- What is "the Prometheus instance"?
Prometheus instance is the pod that contains Prometheus. - Where does it run? Is it run by default?
By default there are at least one Prometheus pod in openshift-monitoring - Is Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations all the things I can configure? Why might I care, as a user, to configure these?
How to configure Prometheus is important and depends on the user's needs.
Some of the config:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
https://prometheus.io/docs/prometheus/latest/storage/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include helpful information, for end-users that may read this API documentation, in the GoDoc here.
All of the information you included in your response here is helpful context in understanding what changes to this configuration impacts.
Please also include some information in the description as to why users may want to configure this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| // prometheusK8sConfig is optional. | ||
| // When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time. | ||
| // +optional | ||
| PrometheusK8sConfig PrometheusK8sConfig `json:"prometheusK8sConfig,omitempty,omitzero"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the K8s in the name here? Feels a bit redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe to differentiate from prometheusOperator, coming from config map. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the reasoning, it was also a bit confusing to me when I first explored the API and the fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have concretely decided on whether or not this should still be a separate root-level field in #2481 (comment) but if we do keep it separate, this should be named more clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd agree that we don't need the k8s part.
| // additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from | ||
| // the Prometheus component. By default, no additional Alertmanager instances are configured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As someone unfamiliar with configuring Prometheus, I don't really understand what any of this means. Why would I care to configure additional alertmanager instances that receive alerts from Prometheus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AdditionalAlertmanagerConfig is used if the client has other Alertmanager and needs there the prometheus Alerts. Also to decouple alerts if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, additionalAlertmanagerConfigs is an optional field that configures Prometheus to send alerts to this set of Alertmanager instances. This is useful because it decouples alerts from what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not decouple alerts from nothing, sorry. It just sends the same alerts from Prometheus to another Alertmanager provided by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this potentially clarify what this field does for end users?:
additionalAlertmanagerConfigs is an optional field used to configure how Prometheus communicates
with Alertmanager instances when sending alerts.
We should also include information as to what happens when this field is omitted in the GoDoc. Based on my current understanding, that would likely be something like:
When omitted, Prometheus will not send alerts to any Alertmanager instances.
Is this correct? Is there a default Alertmanager instance that will always be included in the configuration for Prometheus instances?
| // the Prometheus component. By default, no additional Alertmanager instances are configured. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxItems=10 | ||
| // +listType=atomic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why atomic? Using atomic makes it so you can't have multiple-owner entries here, which doesn't seem like something unreasonable.
Would this be more appropriate as a listType=map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. Thx
| // url is the URL of the remote write endpoint. | ||
| // +required | ||
| // +kubebuilder:validation:MaxLength=2048 | ||
| URL *string `json:"url,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other constraints can we put in place to ensure the URL is valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| // +kubebuilder:validation:MaxLength=2048 | ||
| URL *string `json:"url,omitempty"` | ||
| // name is the name of the remote write configuration. | ||
| // +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for this name to not be provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, if you need to add and endpoint and you don't fill the name, data is not sent correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then this should be required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment still stands.
Additionally, what other constraints are there on the format of the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name is optional: if not specified, Prometheus will generate a unique one. There are no specific constraint on the name (Prometheus wise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Prometheus has no specific constraints, I'm just enforcing MaxLenght here for sanity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is setting "" the same as omitting the field?
Prometheus has no naming constraints here? So setting something like !@*&(^ would be a valid name? Should we allow that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Added MinLength=1 to reject empty strings (omitting is the only way to let Prometheus auto-generate). Also added pattern validation restricting to alphanumeric, hyphens, and underscores to prevent garbage like !@*&(^.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So setting something like !@*&(^ would be a valid name?
Yes. Same as "☘"
Should we allow that?
I dunno
| // remoteTimeout is the timeout for requests to the remote write endpoint. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxLength=20 | ||
| RemoteTimeout *string `json:"remoteTimeout,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for this to not be provided? what are the allowed values here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| // +kubebuilder:validation:MaxLength=20 | ||
| RemoteTimeout *string `json:"remoteTimeout,omitempty"` | ||
| // writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint. | ||
| // +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for this to be omitted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if is optional and you dont fill it, then no relabel is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this comment still needs to be addressed in the GoDoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added "When omitted, no relabeling is performed and all metrics are sent as-is." to the godoc.
| type RelabelConfig struct { | ||
| // sourceLabels is a list of source label names. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxItems=10 | ||
| // +kubebuilder:validation:items:MaxLength=63 | ||
| // +listType=set | ||
| SourceLabels []string `json:"sourceLabels,omitempty"` | ||
| // separator is the separator used to join source label values. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxLength=10 | ||
| Separator *string `json:"separator,omitempty"` | ||
| // regex is the regular expression to match against the concatenated source label values. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxLength=1000 | ||
| Regex *string `json:"regex,omitempty"` | ||
| // targetLabel is the target label name. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxLength=63 | ||
| TargetLabel *string `json:"targetLabel,omitempty"` | ||
| // replacement is the replacement value for the target label. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxLength=255 | ||
| Replacement *string `json:"replacement,omitempty"` | ||
| // action is the action to perform. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxLength=20 | ||
| Action *string `json:"action,omitempty"` | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please elaborate in the godoc more on why a user may care to configure these fields, constraints, and what it means for them to be omitted.
As-is I don't really understand what any of these fields result in when configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that explanations are pretty similar to prometheus ones:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Some comments addressed, I will continue tomorrow |
|
@marioferh: This pull request references MON-4030 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
everettraven
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round of comments
| // prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster. | ||
| // By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform. | ||
| // | ||
| // This field allows you to customize how Prometheus is deployed and operated, including: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this configure how all Prometheus instances are deployed and operated or just the default one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, I'll push an updated PR clarifying this ;)
| // For more information on Prometheus configuration, see: | ||
| // https://prometheus.io/docs/prometheus/latest/configuration/configuration/ | ||
| // https://prometheus.io/docs/prometheus/latest/storage/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally avoid linking to documentation where possible because these docs can be updated and the references may no longer make sense for users.
I don't know that these links are necessary either. We should be appropriately documenting each field in the configuration here such that users can understand what they are configuring and why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, removed the external Prometheus documentation links. Each child field now has comprehensive documentation
| // additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from | ||
| // the Prometheus component. By default, no additional Alertmanager instances are configured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this potentially clarify what this field does for end users?:
additionalAlertmanagerConfigs is an optional field used to configure how Prometheus communicates
with Alertmanager instances when sending alerts.
We should also include information as to what happens when this field is omitted in the GoDoc. Based on my current understanding, that would likely be something like:
When omitted, Prometheus will not send alerts to any Alertmanager instances.
Is this correct? Is there a default Alertmanager instance that will always be included in the configuration for Prometheus instances?
| // +optional | ||
| // +kubebuilder:validation:MaxItems=10 | ||
| // +listType=map | ||
| // +listMapKey=apiVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this must be unique based on the apiVersion?
| // +kubebuilder:validation:MaxLength=50 | ||
| // +kubebuilder:validation:Pattern=`^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$` | ||
| // +optional | ||
| EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it actually make sense for this to be a string value? This makes it more complex to evaluate valid values.
For example, would it make sense for me to specify 3,000,000,000B here? The "integer" value there is higher than the maximum possible int32 value - is that problematic? What if I specified higher than the maximum possible int64 value in bytes - is that problematic?
Something like:
enforcedBodySizeLimit:
value: 4000
units: KBwould probably be easier to put constraints on that forces end users to make reasonable decisions here.
Alternatively, we decide a reasonable base unit that all values provided by a user will use and name the field accordingly (i.e enforcedBodySizeLimitMB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, would it make sense to nest this under something like a scrapeConfig field?
Looking at https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config there are a lot of options that we aren't yet exposing. Do you ever intend to expand the set of scraping-related configuration options a user can configure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my comments here still stand and in general we prefer using an explicit unit name in the field.
See https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#units for reference. In OpenShift APIs we generally prefer the units in the field name approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point on the unit naming convention. Per the K8s API conventions, I'll rename enforcedBodySizeLimit to enforcedBodySizeLimitBytes and use an integer type instead of the string pattern.
Regarding nesting under scrapeConfig and expanding options we're not planning to expose the full Prometheus scrape config surface here. This API is intentionally scoped to mirror what's already exposed in the CMO ConfigMap (see cluster-monitoring-operator/pkg/manifests/types.go), not to be a 1:1 mapping of upstream Prometheus config. So keeping it flat should be fine.
| // replacement is the value against which a regex replace is performed if the | ||
| // regular expression matches. Regex capture groups are available. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxLength=255 | ||
| Replacement *string `json:"replacement,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if I don't supply this value?
What is the difference between omitting and providing an empty string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in latest commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these questions have been addressed. I still don't understand what happens if I omit this field or set "".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I've now explicitly documented the difference in the godoc as there were just so many comments xD :
When omitted: Defaults to "$1" (the first regex capture group is used as the replacement value)
- When set to
"": Explicitly clears/empties the target label value
For example, with sourceLabels: ["env"] where env=production:
- Omitted → targetLabel gets value
"production"(via$1) ""→ targetLabel gets empty string""
| // action is the action to perform. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxLength=20 | ||
| Action *string `json:"action,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if I don't specify an action?
What are the allowed actions that I can specify?
| // +optional | ||
| // +kubebuilder:validation:Enum=Verify;InsecureSkipVerify | ||
| // +kubebuilder:default=Verify | ||
| InsecureSkipVerify string `json:"insecureSkipVerify,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
insecureSkipVerify: Verify and insecureSkipVerify: InsecureSkipVerify read weird and could be confusing for users. Consider using a new field name that better reflects what this field is used for.
| // Allowed values are "Verify" (default, secure) and "InsecureSkipVerify" (skip certificate verification, insecure). | ||
| // By default, certificate verification is performed ("Verify"). | ||
| // +optional | ||
| // +kubebuilder:validation:Enum=Verify;InsecureSkipVerify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using an enum, create a type alias with constants for the allowed values.
| // LocalObjectReference contains enough information to let you locate the | ||
| // referenced object inside the same namespace. | ||
| // --- | ||
| // New uses of this type are discouraged because of difficulty describing its usage when embedded in APIs. | ||
| // 1. Invalid usage help. It is impossible to add specific help for individual usage. In most embedded usages, there are particular | ||
| // restrictions like, "must refer only to types A and B" or "UID not honored" or "name must be restricted". | ||
| // Those cannot be well described when embedded. | ||
| // 2. Inconsistent validation. Because the usages are different, the validation rules are different by usage, which makes it hard for users to predict what will happen. | ||
| // 3. We cannot easily change it. Because this type is embedded in many locations, updates to this type | ||
| // will affect numerous schemas. Don't make new APIs embed an underspecified API type they do not control. | ||
| // | ||
| // Instead of using this type, create a locally provided and used type that is well-focused on your reference. | ||
| // For example, ServiceReferences for admission registration: https://github.com/kubernetes/api/blob/release-1.17/admissionregistration/v1/types.go#L533 . | ||
| // +structType=atomic | ||
| type LocalObjectReference struct { | ||
| // name of the referent. | ||
| // This field is effectively required, but due to backwards compatibility is | ||
| // allowed to be empty. Instances of this type with an empty value here are | ||
| // almost certainly wrong. | ||
| // More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names | ||
| // +optional | ||
| // +default="" | ||
| // +kubebuilder:default="" | ||
| // +kubebuilder:validation:MaxLength=253 | ||
| // TODO: Drop `kubebuilder:default` when controller-gen doesn't need it https://github.com/kubernetes-sigs/kubebuilder/issues/3896. | ||
| Name *string `json:"name,omitempty" protobuf:"bytes,1,opt,name=name"` | ||
| } | ||
|
|
||
| // SecretKeySelector selects a key of a Secret. | ||
| // +structType=atomic | ||
| type SecretKeySelector struct { | ||
| // The name of the secret in the pod's namespace to select from. | ||
| LocalObjectReference `json:",inline" protobuf:"bytes,1,opt,name=localObjectReference"` | ||
| // key of the secret to select from. Must be a valid secret key. | ||
| // +required | ||
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=253 | ||
| Key string `json:"key,omitempty" protobuf:"bytes,2,opt,name=key"` | ||
| // optional specifies whether the Secret or its key must be defined | ||
| // +optional | ||
| // +kubebuilder:validation:Enum=true;false | ||
| Optional string `json:"optional,omitempty" protobuf:"varint,3,opt,name=optional"` | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different than the existing SecretKeyReference type you've created earlier in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree here, I think we can just use one. While SecretKeyReference is used for reference and SecretKeySelector is for TLS certificates I think we may want to mimic Prom Op and use just v1.SecretKeySelector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer not importing types for this because they generally don't implement the same granularity of validations that we can impose on a custom type.
505533b to
da2c1ef
Compare
WalkthroughAdds an optional PrometheusK8sConfig to ClusterMonitoringSpec, introduces many new public types/enums for Prometheus/Alertmanager/remote-write, regenerates deepcopy/swagger/OpenAPI code and CRD payloads (with duplicate Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
✨ Finishing touches
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented Comment |
|
@everettraven I've gotten through all the comments and took a look at most of the concerns, could you take another, fresh review at it? Thanks! |
8473dba to
e748202
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (9)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)
1634-1651: Incorrect component name in logLevel description (duplicate).Line 1636 states "logLevel defines the verbosity of logs emitted by Alertmanager," but this field is under PrometheusK8sConfig and should refer to "Prometheus," not "Alertmanager." This appears to be copy-pasted from the alertmanagerConfig section.
logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus.config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
1634-1651: Fix logLevel description — should reference Prometheus, not Alertmanager.The description at line 1636 incorrectly refers to "logs emitted by Alertmanager" when this field is part of
prometheusK8sConfigand should refer to Prometheus. This appears to have been copied from thealertmanagerConfig.customConfig.logLevelfield without updating the component name.logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be usefulconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)
1634-1651:⚠️ Unresolved: logLevel description still references Alertmanager instead of Prometheus.The description at Line 1635 states: "logLevel defines the verbosity of logs emitted by Alertmanager", but this field resides within
prometheusK8sConfig, which configures Prometheus, not Alertmanager. This is a copy-paste error from the Alertmanager section that appears not to have been fixed.Update the description to reference Prometheus instead. The source type definition in
config/v1alpha1/types_cluster_monitoring.goshould be corrected first, then the CRD regenerated.payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)
1634-1651: Resolve incorrect component reference in logLevel description (unresolved from previous review).Line 1636 incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager". This field belongs to
prometheusK8sConfig, so the description should refer to Prometheus, not Alertmanager. This issue was previously flagged but remains unresolved.Apply this diff to fix the description:
logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be useful for debugging issues or reducing noise in production environments. Allowed values are Error, Warn, Info, and Debug.config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)
1634-1651: Fix copy-paste error: Replace "Alertmanager" with "Prometheus" in logLevel description.The logLevel field description incorrectly references "Alertmanager" when this field is under
prometheusK8sConfigand should reference "Prometheus". This appears to be a copy-paste error from the Alertmanager configuration section.Apply this fix:
logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be useful for debugging issues or reducing noise in production environments. Allowed values are Error, Warn, Info, and Debug.payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)
1634-1651: Fix logLevel description to reference Prometheus, not Alertmanager (same issue as earlier review).This
logLevelfield is part ofprometheusK8sConfig, but the description still says “logs emitted by Alertmanager.” It should describe Prometheus log verbosity instead, mirroring semantics but naming the correct component, and then regenerate the CRD.config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
260-260: PrometheusK8sConfig.logLevel swagger text still references Alertmanager instead of Prometheus.In
map_PrometheusK8sConfig["logLevel"], the text says “logs emitted by Alertmanager” but this field belongs to PrometheusK8sConfig. Update the Go comment forPrometheusK8sConfig.LogLevelto reference Prometheus, then regenerate swagger so this entry matches.config/v1alpha1/types_cluster_monitoring.go (2)
655-660: Documentation claims default value but field is required without a default.The
schemefield is marked+requiredwith no+kubebuilder:default, yet line 657 states "The default value ishttp." Since users must explicitly provide a value, remove the misleading default claim.- // Possible values are `http` or `https`. The default value is `http`. + // Possible values are `http` or `https`.
100-100: Documentation mentions "remote write/read" but only remote write is supported.The comment mentions "remote write/read" but the
PrometheusK8sConfigstruct only exposesRemoteWriteconfiguration—there is noRemoteReadfield. Update the documentation to say "remote write" only, or clarify that remote read may be added in the future.- // - External integrations (remote write/read, alerting, and scraping configuration) + // - External integrations (remote write, alerting, and scraping configuration)
🧹 Nitpick comments (4)
config/v1alpha1/types_cluster_monitoring.go (4)
683-695: Missing validation for Prometheus label name format onkeyfield.The documentation states the key must be "a valid Prometheus label name, starting with a letter or underscore, followed by letters, digits, or underscores," but there's no CEL or pattern validation enforcing this. Invalid label names will be accepted at admission and fail at runtime.
Add validation to enforce the documented constraint:
// key is the name of the label. // The key must be a valid Prometheus label name, starting with a letter or underscore, // followed by letters, digits, or underscores. // +required // +kubebuilder:validation:MaxLength=63 // +kubebuilder:validation:MinLength=1 + // +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z_][a-zA-Z0-9_]*$')",message="key must be a valid Prometheus label name" Key string `json:"key,omitempty"`
721-726: Missing format validation forremoteTimeoutduration string.The documentation states the value must be "a valid Go time.Duration string (e.g. 30s, 5m, 1h)" but there's no validation enforcing this format. Invalid values will be accepted at admission time.
Consider adding CEL validation to match duration patterns:
// remoteTimeout is the timeout for requests to the remote write endpoint. // When omitted, the default is 30s. // +optional // +kubebuilder:validation:MaxLength=20 // +kubebuilder:validation:MinLength=2 + // +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+(ns|us|µs|ms|s|m|h)$')",message="remoteTimeout must be a valid duration (e.g. 30s, 5m, 1h)" RemoteTimeout string `json:"remoteTimeout,omitempty"`
781-786: Consider adding DNS subdomain validation forserverName.The documentation states serverName "must be a valid DNS subdomain as per RFC 1123" but there's no validation enforcing this. Consider adding CEL validation:
// serverName is the server name to use for TLS connections. // If specified, must be a valid DNS subdomain as per RFC 1123. // +optional // +kubebuilder:validation:MinLength=1 // +kubebuilder:validation:MaxLength=253 + // +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="serverName must be a valid DNS subdomain" ServerName string `json:"serverName,omitempty"`
799-808: Add Kubernetes secret name validation tonamefield.The
namefield references a Kubernetes Secret, which must follow DNS-1123 subdomain naming rules. Add validation to catch invalid names at admission time:// name is the name of the secret in the same namespace to select from. // +required // +kubebuilder:validation:MinLength=1 // +kubebuilder:validation:MaxLength=253 + // +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="name must be a valid Kubernetes secret name (DNS subdomain)" Name string `json:"name,omitempty"`
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (11)
config/v1alpha1/types_cluster_monitoring.go(3 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.deepcopy.go(8 hunks)config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml(1 hunks)config/v1alpha1/zz_generated.swagger_doc_generated.go(4 hunks)openapi/generated_openapi/zz_generated.openapi.go(13 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml(1 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml(1 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlopenapi/generated_openapi/zz_generated.openapi.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yamlpayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.swagger_doc_generated.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yamlconfig/v1alpha1/types_cluster_monitoring.goconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (3)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
AdditionalAlertmanagerConfig(636-679)ExternalLabels(698-708)Label(682-695)PrometheusK8sConfig(447-618)RelabelConfig(736-768)RemoteWriteSpec(711-733)SecretKeySelector(798-823)TLSConfig(771-794)
config/v1alpha1/types_cluster_monitoring.go (2)
config/v1alpha1/zz_generated.swagger_doc_generated.go (9)
PrometheusK8sConfig(273-275)AdditionalAlertmanagerConfig(132-134)ExternalLabels(227-229)RemoteWriteSpec(299-301)ContainerResource(218-220)SecretKeySelector(310-312)TLSConfig(323-325)Label(237-239)RelabelConfig(287-289)operator/v1/types.go (1)
LogLevel(94-94)
config/v1alpha1/zz_generated.deepcopy.go (1)
config/v1alpha1/types_cluster_monitoring.go (9)
AdditionalAlertmanagerConfig(636-679)TLSConfig(771-794)PrometheusK8sConfig(447-618)ExternalLabels(698-708)Label(682-695)RemoteWriteSpec(711-733)ContainerResource(318-348)RelabelConfig(736-768)SecretKeySelector(798-823)
🔇 Additional comments (13)
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)
1635-1636: FixlogLeveldescription to reference Prometheus, not Alertmanager.The
logLevelfield description at lines 1635–1636 in the CRD file should reference Prometheus, not Alertmanager. Update the description, the corresponding Go doc comment inconfig/v1alpha1/types_cluster_monitoring.go, and regenerate the CRD.config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
121-134: New swagger docs for Prometheus-related helper types look consistent with Go types.The swagger maps for
AdditionalAlertmanagerConfig,ExternalLabels,Label,PrometheusK8sConfig(aside from the logLevel wording),RelabelConfig,RemoteWriteSpec,SecretKeySelector, andTLSConfigclosely mirror the struct fields and kubebuilder annotations intypes_cluster_monitoring.go, including enums, list semantics, and validation constraints. No additional issues from a docs/API-shape perspective.Also applies to: 222-239, 255-275, 277-301, 303-325
openapi/generated_openapi/zz_generated.openapi.go (9)
424-424: LGTM: Schema registrations correctly added.The new v1alpha1 type registrations are properly formatted and follow the established pattern.
Also applies to: 442-442, 453-453, 465-465, 467-468, 472-472, 474-474
21918-21924: LGTM: PrometheusK8sConfig properly integrated.The new
prometheusK8sConfigfield is correctly added toClusterMonitoringSpecwith comprehensive documentation and proper dependency declarations.Also applies to: 21936-21936
22032-22068: Verify min/max label constraints are enforced.Line 22049 documents "At least 1 label must be specified, with a maximum of 50 labels," but the schema lacks
minItems: 1andmaxItems: 50constraints. Confirm these limits are validated via CRD validation or admission logic.
22462-22488: LGTM: Label schema correctly defined.The key/value structure is appropriate. Note that line 22471 describes a Prometheus label name pattern, but schema-level regex validation is absent—confirm this is validated elsewhere.
22930-23119: LGTM: PrometheusK8sConfig schema is comprehensive and well-structured.This central configuration type correctly references all dependencies. Multiple descriptions document size limits (e.g., max 10 items for various arrays), but as with other types, ensure these are enforced via CRD validation rules or admission webhooks.
23149-23272: LGTM: RelabelConfig and RemoteWriteSpec schemas are correctly defined.Both types have appropriate required fields and dependencies. Note that line 23206 lists valid actions for RelabelConfig without enum enforcement, and line 23253 documents a maximum without
maxItems—consistent with the validation pattern throughout this API.
23368-23406: LGTM: SecretKeySelector follows Kubernetes patterns correctly.The schema properly uses
x-kubernetes-map-type: atomicto ensure atomicity of secret references, with appropriate required fields.
23438-23486: LGTM: TLSConfig schema is correctly structured.All fields appropriately optional for flexible TLS configuration, with proper references to
SecretKeySelectorfor sensitive data.
21227-21302: Verify that documented limits are enforced elsewhere.The description states "Maximum of 10 endpoints can be specified," but the schema lacks a
maxItemsconstraint. Confirm this limit is enforced via CRD validation rules or admission webhooks, or update the description if the limit is not actually enforced.config/v1alpha1/types_cluster_monitoring.go (1)
620-631: Enum types are well-defined with proper validation.The
AlertmanagerAPIVersion,AlertmanagerScheme,RelabelAction, andCollectionProfiletypes have appropriate enum constraints and constant definitions. The lowercase values forRelabelActionalign with Prometheus conventions.Also applies to: 836-867
config/v1alpha1/zz_generated.deepcopy.go (1)
14-39: Autogenerated deepcopy implementations are correct.The generated
DeepCopyIntoandDeepCopymethods for the new types correctly handle:
- Slice allocation and element copying (including calling
DeepCopyIntofor complex elements)- Map allocation and key-value copying
- Pointer field allocation and value copying
- Nil checks for pointer and slice fields
The struct assignments for
TLSConfigandSecretKeySelectorare correct since they only contain value types (strings).Also applies to: 460-479, 734-748, 994-1061, 1089-1156, 1216-1270
| prometheusK8sConfig: | ||
| description: |- | ||
| prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster. | ||
| By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform. | ||
| This field allows you to customize how Prometheus is deployed and operated, including: | ||
| - Deployment settings (such as replica count and update strategy) | ||
| - Pod scheduling (node selectors, tolerations, affinity) | ||
| - Resource allocation (CPU, memory, and storage requests/limits) | ||
| - Retention policies (how long metrics are stored) | ||
| - External integrations (remote write/read, alerting, and scraping configuration) | ||
| Configuring Prometheus is important because it enables you to tailor monitoring to your cluster's needs, such as: | ||
| - Ensuring high availability and reliability of monitoring data | ||
| - Managing resource usage to fit your infrastructure | ||
| - Controlling how long metrics are retained for compliance or troubleshooting | ||
| - Integrating with external systems for alerting or long-term storage | ||
| - Adjusting scraping and relabeling to match your workloads and security requirements | ||
| For more information on Prometheus configuration, see: | ||
| https://prometheus.io/docs/prometheus/latest/configuration/configuration/ | ||
| https://prometheus.io/docs/prometheus/latest/storage/ | ||
| This field is optional. When omitted, the platform chooses reasonable defaults, which may change over time. | ||
| minProperties: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align prometheusK8sConfig description with actual fields (no replicas/update-strategy/remote read/scrape config exposed).
The doc block claims control over replica count, update strategy, remote read, and scraping configuration, but the prometheusK8sConfig schema here only exposes alertmanager integrations, external labels, node selector/tolerations/topology spread, resources, retention, remoteWrite, and PVC. Either add the missing fields or update this description (and the upstream Go field comment) to only advertise what’s actually configurable, then regenerate the CRD.
🤖 Prompt for AI Agents
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
around lines 1287-1311: the prometheusK8sConfig description advertises control
over replicas, update strategy, remote read and scraping configuration which are
not present in the schema; either add those fields to the API or update the
description to exactly list the exposed fields (alertmanager integrations,
externalLabels, nodeSelector/tolerations/topologySpread, resources, retention,
remoteWrite and PVC), then update the upstream Go field comment to match and
regenerate the CRD (run the repo’s CRD generation/make target) so the YAML and
comments stay in sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explained above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielmellado Thanks for the response! I don't see the explanation in this comment thread, so I may have missed it in another part of the PR discussion.
Could you clarify or point me to where this discrepancy was addressed? Specifically, the concern is that the prometheusK8sConfig description mentions:
- "replica count and update strategy"
- "remote read"
- "scraping configuration"
...but these fields don't appear to be present in the actual schema (which only exposes additionalAlertmanagerConfigs, collectionProfile, enforcedBodySizeLimit, externalLabels, logLevel, nodeSelector, queryLogFile, remoteWrite, resources, retention/retentionSize, tolerations, topologySpreadConstraints, and volumeClaimTemplate).
If this was intentionally left as future-looking documentation or if there's a plan to add these fields later, that's totally fine—I just want to make sure I'm not missing something. Let me know if you'd like me to mark this as resolved or if there's additional context I should review!
|
@danielmellado It looks like you've picked this work back up. As it has been sitting for quite some time, let me know when you are ready for another round of review. |
|
Hi @everettraven as soon as I rebase it, I think it's ready for it, thanks! |
e748202 to
0d09af6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (11)
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
1634-1651: Fix logLevel description to reference Prometheus (not Alertmanager).Line 1636 states "logs emitted by Alertmanager," but this field is in
prometheusK8sConfigand should reference Prometheus. This appears to be a copy-paste error from the earlieralertmanagerConfig.customConfig.logLevelsection (line 66–83).logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus.config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)
1634-1651: Incorrect component name in logLevel description.Line 1636 states "defines the verbosity of logs emitted by Alertmanager," but this field is for PrometheusK8sConfig and should refer to Prometheus, not Alertmanager. This appears to be copy-pasted from the alertmanagerConfig section.
logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus.payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)
1634-1651: Fix incorrect component reference in logLevel description (not addressed from previous review).The
logLevelfield description (line 1636) still incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager". Since this field belongs toprometheusK8sConfig, it should refer to Prometheus instead. This was flagged in a previous review but the fix was not applied.Apply this diff to correct the description:
logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be useful for debugging issues or reducing noise in production environments. Allowed values are Error, Warn, Info, and Debug.config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)
1634-1651: Fix logLevel description—replace "Alertmanager" with "Prometheus".Line 1636 incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager," but this field is nested within
prometheusK8sConfigand controls Prometheus log verbosity, not Alertmanager's. The description must reference the correct component to avoid confusing API users.logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be useful for debugging issues or reducing noise in production environments.payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)
1634-1651: Fix Prometheus logLevel description (currently mentions Alertmanager).Within
prometheusK8sConfig.logLevelthe description still says “logs emitted by Alertmanager”, but this field configures Prometheus logging. That’s a copy/paste error and can confuse consumers of the API.Please update the source comment on
PrometheusK8sConfig.LogLevelinconfig/v1alpha1/types_cluster_monitoring.goto reference Prometheus instead, then regenerate swagger/OpenAPI and CRDs so both this YAML and the Go docs stay in sync.For example, in
PrometheusK8sConfig:- // logLevel defines the verbosity of logs emitted by Alertmanager. + // logLevel defines the verbosity of logs emitted by Prometheus.Also applies to: 1287-2528
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
255-275: Correct logLevel description in PrometheusK8sConfig Swagger docs (Alertmanager vs Prometheus).
map_PrometheusK8sConfig["logLevel"]still describes “logs emitted by Alertmanager”, but this field belongs toPrometheusK8sConfigand controls Prometheus logging. This mirrors the CRD issue and comes from the same Go comment onPrometheusK8sConfig.LogLevel.Once you fix the Go comment to say “Prometheus” and rerun the generators, this swagger text (and all CRDs) will update automatically; no need to edit the generated files directly.
Also applies to: 121-325
config/v1alpha1/types_cluster_monitoring.go (5)
683-689: Add validation for Prometheus label name format on Key field.Per past review feedback, the
Keyfield should validate that label names conform to Prometheus requirements. Prometheus label names must match^[a-zA-Z_][a-zA-Z0-9_]*$. Without this, users could specify invalid names likemy-label(with hyphen) that will fail at runtime.// key is the name of the label. // The key must be a valid Prometheus label name, starting with a letter or underscore, // followed by letters, digits, or underscores. // +required // +kubebuilder:validation:MaxLength=63 // +kubebuilder:validation:MinLength=1 + // +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z_][a-zA-Z0-9_]*$')",message="key must be a valid Prometheus label name: start with letter or underscore, followed by letters, digits, or underscores" Key string `json:"key,omitempty"`
100-100: Documentation mentions "remote write/read" but only remote write is exposed.The comment on line 100 states "External integrations (remote write/read, alerting, and scraping configuration)" but the
PrometheusK8sConfigstruct only exposesRemoteWrite— there are noRemoteReadfields. This could confuse users expecting remote read configuration.Update to say "remote write" only, or clarify that remote read may be added in the future.
483-494: Documentation incorrectly references Alertmanager instead of Prometheus.Line 483 states "logLevel defines the verbosity of logs emitted by Alertmanager" but this field is within
PrometheusK8sConfigand should reference Prometheus. This appears to be a copy-paste error fromAlertmanagerCustomConfig.- // logLevel defines the verbosity of logs emitted by Alertmanager. + // logLevel defines the verbosity of logs emitted by Prometheus.
642-646: Remove implementation detail from user-facing documentation.Line 644 states "This is a custom type to allow for admission time validations" which is an implementation detail that shouldn't be in user-facing godoc. This was flagged in past review.
// bearerToken defines the secret reference containing the bearer token // to use when authenticating to Alertmanager. - // This is a custom type to allow for admission time validations. // +optional
647-660: PathPrefix example references v1 but only v2 is supported; Scheme default claim is misleading.Two documentation issues per past review:
Line 649 mentions
/api/v1/alertsin the example, butapiVersiononly allowsv2. Update to/api/v2/alerts.Line 657 says "The default value is
http" but the field is+requiredwith no+kubebuilder:default, so users must always provide a value.// pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints. // For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/", - // set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts". + // set this to "/alertmanager" so requests go to "/alertmanager/api/v2/alerts" instead of "/api/v2/alerts".// scheme defines the URL scheme to use when communicating with Alertmanager // instances. - // Possible values are `http` or `https`. The default value is `http`. + // Possible values are `http` or `https`.
🧹 Nitpick comments (2)
openapi/generated_openapi/zz_generated.openapi.go (2)
21227-21302: Documented constraints are not enforced by the schemaIn
schema_openshift_api_config_v1alpha1_AdditionalAlertmanagerConfig, several fields document strict constraints (e.g.apiVersiononlyv2,schemeonlyhttp/httpswith a default,staticConfigsas<host>:<port>with a max of 10 entries), but the OpenAPI schema here exposes them as plain strings/arrays with noEnum,Pattern,MinItems, orMaxItems. This means invalid configs will pass CRD validation and only fail later when the operator consumes them.If you intend these constraints to be enforced at admission time, consider adding the appropriate
+kubebuilder:validationtags on the Go types (enums, patterns, min/max items) so that openapi-gen emits them. The same observation applies to other new types where the descriptions mention limits or allowed values (e.g.ExternalLabels.labels, several fields onPrometheusK8sConfiglikeenforcedBodySizeLimit,remoteWrite,resources,tolerations,topologySpreadConstraints,collectionProfile, andRemoteWriteSpec.writeRelabelConfigs).
23368-23486: Consider enum + fail‑closed handling for SecretKeySelector.optional and TLSConfig.certificateVerification
SecretKeySelector.optionalandTLSConfig.certificateVerificationare documented as having a small, fixed set of string values (RequiredvsOptional,VerifyvsSkipVerify), but the OpenAPI schema exposes them as unrestricted strings with no enum. That makes it easy for users to typo values and for the implementation to accidentally treat unknown values like “SkipVerify”/“Optional”, which would be a security regression.I’d recommend:
- Modeling these as explicit enum types in the Go API with
+kubebuilder:validation:Enum=...so bad values are rejected at admission.- Verifying that the consuming code treats unknown/empty values in a fail‑closed way (i.e. defaulting to
Required/Verify, never toOptional/SkipVerify).This would harden the API surface without changing the intended semantics.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (12)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml(1 hunks)config/v1alpha1/types_cluster_monitoring.go(3 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.deepcopy.go(8 hunks)config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml(1 hunks)config/v1alpha1/zz_generated.swagger_doc_generated.go(4 hunks)openapi/generated_openapi/zz_generated.openapi.go(13 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml(1 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml(1 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlconfig/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlpayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yamlconfig/v1alpha1/types_cluster_monitoring.goopenapi/generated_openapi/zz_generated.openapi.goconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.swagger_doc_generated.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yamlpayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (2)
config/v1alpha1/types_cluster_monitoring.go (2)
config/v1alpha1/zz_generated.swagger_doc_generated.go (9)
PrometheusK8sConfig(273-275)AdditionalAlertmanagerConfig(132-134)ExternalLabels(227-229)RemoteWriteSpec(299-301)ContainerResource(218-220)SecretKeySelector(310-312)TLSConfig(323-325)Label(237-239)RelabelConfig(287-289)operator/v1/types.go (1)
LogLevel(94-94)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
AdditionalAlertmanagerConfig(636-679)ExternalLabels(698-708)Label(682-695)PrometheusK8sConfig(447-618)RelabelConfig(736-768)RemoteWriteSpec(711-733)SecretKeySelector(798-823)TLSConfig(771-794)
🔇 Additional comments (27)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
354-778: ✅ Comprehensive test coverage for PrometheusK8sConfig validation.The test cases thoroughly validate PrometheusK8sConfig behavior across multiple scenarios: minimal config, valid configurations with various optional fields (nodeSelector, resources, tolerations, topologySpreadConstraints, additionalAlertmanagerConfigs, remoteWrite, externalLabels), and rejection cases for constraint violations. Enum values are correctly cased (e.g., "Info" not "info").
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)
1634-1651: FixlogLeveldescription: should reference Prometheus, not Alertmanager.Line 1636 incorrectly states "logs emitted by Alertmanager," but this field configures Prometheus logging (it is nested under
prometheusK8sConfig). Update the description to "logs emitted by Prometheus" and regenerate the CRD from the updated Go type doc comments.Apply this diff to update the description:
logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be useful for debugging issues or reducing noise in production environments. Allowed values are Error, Warn, Info, and Debug.After updating the Go doc comment in
config/v1alpha1/types_cluster_monitoring.go, regenerate the CRD manifest to sync the description.config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2)
1562-1575: ✓ collectionProfile enum–description consistency resolved.The description correctly shows
FullandMinimalin PascalCase, matching the enum values. This aligns with the prior review feedback (commit 3b005e4).
1287-2528: Verify whether prometheusK8sConfig duplication exists.The AI summary indicates duplicate
prometheusK8sConfigentries in the CRD OpenAPI schema block. However, the provided excerpt shows only one definition (lines 1287–2528). Duplication may exist in:
- The CustomNoUpgrade or DevPreviewNoUpgrade CRD variants (mentioned in the summary as also having duplicates)
- The full file beyond the provided range
- Generated artifacts that were not regenerated cleanly
Please confirm whether duplication is present and, if so, whether it was introduced during code generation.
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
191-197: ClusterMonitoringSpec.prometheusK8sConfig Swagger doc looks consistent with the CRD.The high-level description for
prometheusK8sConfigmatches the CRD and underlying intent; no additional structural or validation issues from the swagger side.openapi/generated_openapi/zz_generated.openapi.go (4)
421-477: OpenAPI registration for new v1alpha1 types looks consistentThe new v1alpha1 schemas (AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, TLSConfig) are correctly wired into the OpenAPI definitions map alongside related config types; no issues from a registration/lookup perspective.
21918-21937: Wiring ofprometheusK8sConfiginto ClusterMonitoringSpec looks correctThe new
prometheusK8sConfigproperty referencesv1alpha1.PrometheusK8sConfigwith an empty-object default, and the dependencies slice is updated to include that type alongside the existing config types. From an OpenAPI/wiring perspective this addition looks sound.
22032-22069: ExternalLabels / Label schema structure matches intended usage
Labelas a simple{key,value}object andExternalLabelsas a list-map keyed bykeyis a good fit for enforcing uniqueness of label keys and mirrors Prometheus’ external labels model. Aside from the earlier general note about documenting min/max label counts without schema-level enforcement, the structure and references here look correct.Also applies to: 22462-22488
23149-23272: RelabelConfig / RemoteWriteSpec model aligns with Prometheus semantics
RelabelConfigclosely follows the standard Prometheus relabel rule shape (sourceLabels, separator, regex, targetLabel, replacement, action), andRemoteWriteSpecuses that forwriteRelabelConfigswithurlrequired, which matches expectations for remote_write configuration. Structurally and from a reference standpoint this looks correct; just keep in mind the earlier comment if you decide you want schema‑level enforcement of allowedactionvalues or max relabel rules.config/v1alpha1/types_cluster_monitoring.go (9)
443-461: LGTM on additionalAlertmanagerConfigs structure.The
listType=atomicapproach and documentation foradditionalAlertmanagerConfigsaddresses the prior review feedback about uniqueness constraints. The field correctly documents the max items limit and behavior when omitted.
505-518: LGTM on queryLogFile documentation and validation.The field has clear documentation explaining supported paths (including
/dev/stderr,/dev/stdout,/dev/null), restrictions on other/dev/paths, and appropriate length constraints.
519-527: LGTM on remoteWrite configuration.Using
urlas the list map key ensures uniqueness per remote endpoint, and the documentation clearly explains the purpose and links to Prometheus documentation.
600-617: LGTM on collectionProfile; note on volumeClaimTemplate pattern.The
collectionProfilefield has clear documentation with meaningful enum values. ThevolumeClaimTemplateusesv1.PersistentVolumeClaimdirectly, which matches the existing pattern used inAlertmanagerCustomConfig(line 278). While past review suggested using a custom type for better validation, this maintains consistency with the existing API surface in this file.
710-733: LGTM on RemoteWriteSpec.The required
urlfield with length constraints and optional configuration fields are well-structured. UsinglistType=atomicforwriteRelabelConfigsis appropriate for relabeling rule ordering.
735-768: LGTM on RelabelConfig.The structure appropriately mirrors Prometheus
relabel_configwith theactionfield being required and other fields optional. Length constraints are reasonable.
770-794: LGTM on TLSConfig with improved certificateVerification naming.The
certificateVerificationfield withVerify/SkipVerifyenum values is much clearer than the previousinsecureSkipVerifynaming. The default ofVerifyensures secure behavior by default.
796-834: LGTM on SecretKeySelector with detailed Optional behavior documentation.The
optionalfield documentation (lines 809-818) thoroughly explains the runtime behavior when set toOptionaland the Secret/key is missing. This addresses past review feedback and provides clear guidance to users.
836-867: LGTM on RelabelAction and CollectionProfile enums.
RelabelActioncorrectly uses lowercase values to match Prometheus conventions, whileCollectionProfileuses CamelCase per OpenShift conventions. Both have well-documented constants.config/v1alpha1/zz_generated.deepcopy.go (9)
14-39: LGTM on AdditionalAlertmanagerConfig deepcopy.The auto-generated deepcopy correctly handles
StaticConfigsslice andTLSConfigpointer. TheBearerTokenvalue copy is correct sinceSecretKeySelectorcontains only value types.
389-397: LGTM on ClusterMonitoringSpec deepcopy integration.The
PrometheusK8sConfig.DeepCopyIntocall follows the established pattern for other config fields.
460-479: LGTM on ExternalLabels deepcopy.The
Labelsslice is correctly deep copied.
734-748: LGTM on Label deepcopy.Simple value copy is correct for a struct with only string fields.
994-1061: LGTM on PrometheusK8sConfig deepcopy.All complex fields are correctly handled:
- Slices with struct elements use per-element
DeepCopyInto- Maps are properly deep copied
- Pointer fields allocate new memory and copy values
1089-1128: LGTM on RelabelConfig deepcopy.All pointer fields (
Separator,Regex,TargetLabel,Replacement) and theSourceLabelsslice are correctly deep copied.
1130-1156: LGTM on RemoteWriteSpec deepcopy.The optional
Namepointer andWriteRelabelConfigsslice are correctly deep copied.
1216-1230: LGTM on SecretKeySelector deepcopy.Value copy is correct for a struct with only string and string-alias fields.
1253-1270: LGTM on TLSConfig deepcopy.All fields are value types (
SecretKeySelectorand strings), so simple value copy is correct.
| // LogLevel defines the verbosity of logs emitted by Alertmanager. | ||
| // Valid values are Error, Warn, Info and Debug. | ||
| // +kubebuilder:validation:Enum=Error;Warn;Info;Debug | ||
| type LogLevel string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogLevel type documentation is too specific for a shared type.
This LogLevel type is used by both AlertmanagerCustomConfig and PrometheusK8sConfig, but the godoc says "defines the verbosity of logs emitted by Alertmanager." This is misleading when users reference the type documentation in the Prometheus context.
-// LogLevel defines the verbosity of logs emitted by Alertmanager.
+// LogLevel defines the verbosity of logs emitted by monitoring components.
// Valid values are Error, Warn, Info and Debug.🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 299 to 302, the
godoc for the LogLevel type is too specific to Alertmanager even though the type
is shared; update the comment to a generic description such as "LogLevel defines
the verbosity of logs emitted by components (for example Alertmanager and
Prometheus)." and keep the existing validation Enum tag intact so the allowed
values remain Error;Warn;Info;Debug.
...ed.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
Show resolved
Hide resolved
|
@danielmellado I just wanted to follow up and let you know that this is next on my list to review. Because there is a lot of historical review here I took some time to re-read that history and feel sufficiently caught up to perform another review round. My next review round is going to take the approach of starting from scratch so that I can tune out any noise from previous review cycles. I should have a review completed by the end of the week. |
| // prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster. | ||
| // By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only for configuring the default prometheus instance or does the configuration specified here apply to all prometheus instances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This configures the default platform Prometheus instance that runs in openshift-monitoring. User-workload Prometheus instances are configured separately. Updated godoc to clarify.
| // https://prometheus.io/docs/prometheus/latest/configuration/configuration/ | ||
| // https://prometheus.io/docs/prometheus/latest/storage/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation links like this can become stale and may not properly reflect the configuration options that this type exposes.
We should appropriately document all the configuration options we expose instead of linking out.
In this particular case, I'd probably just remove these links here and let the child fields explain what they configure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
| // +optional | ||
| // +kubebuilder:validation:MaxItems=10 | ||
| // +listType=atomic | ||
| AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Circling back to this, I think something like:
additionalAlertmanagerConfigs:
- apiVersion: v2
url: "alertmanager.external.com:9093"
path: "/critical"
authenticationMethods:
- type: BearerToken
bearerToken:
name: "prod-token"
key: "token"
- type: TLS
tls:
certificate:
name: "..."
key: "..."Would be a better UX for this field and you can enforce some form of uniqueness on a combination of fields if necessary.
What do you think?
| // +kubebuilder:validation:MaxLength=50 | ||
| // +kubebuilder:validation:Pattern=`^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$` | ||
| // +optional | ||
| EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my comments here still stand and in general we prefer using an explicit unit name in the field.
See https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#units for reference. In OpenShift APIs we generally prefer the units in the field name approach.
| // and Alertmanager. By default, no labels are added. | ||
| // When omitted, no external labels are applied (default behavior). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think that just saying this is sufficient:
| // and Alertmanager. By default, no labels are added. | |
| // When omitted, no external labels are applied (default behavior). | |
| // and Alertmanager. | |
| // When omitted, no external labels are applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Simplified the documentation.
| // When omitted, defaults to "Required". | ||
| // +optional | ||
| // +kubebuilder:default=Required | ||
| Optional SecretRequirement `json:"optional,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field name here is a bit confusing to me.
setting:
optional: Requiredor
optional: Optionaldoesn't seem intuitive to me.
Something like presence feels like it would fit better here.
| ) | ||
|
|
||
| // RelabelAction defines the action to perform in a relabeling rule. | ||
| // +kubebuilder:validation:Enum=replace;keep;drop;hashmod;labelmap;labeldrop;labelkeep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enum values should be CamelCase:
| // +kubebuilder:validation:Enum=replace;keep;drop;hashmod;labelmap;labeldrop;labelkeep | |
| // +kubebuilder:validation:Enum=Replace;Keep;Drop;HashMod;LabelMap;LabelDrop;LabelKeep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
| // +kubebuilder:validation:MaxLength=255 | ||
| Replacement *string `json:"replacement,omitempty"` | ||
| // action is the action to perform. | ||
| // Valid actions are: replace, keep, drop, hashmod, labelmap, labeldrop, labelkeep. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does setting this to each of these values do?
How do I determine which action I want to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added detailed documentation for each action in the updated godoc
simonpasquier
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the max values, I'm worried that some are too low. Maybe we can try to leverage Insights data to understand the current practices?
| // prometheusK8sConfig is optional. | ||
| // When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time. | ||
| // +optional | ||
| PrometheusK8sConfig PrometheusK8sConfig `json:"prometheusK8sConfig,omitempty,omitzero"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd agree that we don't need the k8s part.
| // +optional | ||
| // +kubebuilder:validation:MaxItems=10 | ||
| // +listType=atomic | ||
| AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that enabling multiple owners for the field is important. For instance ACM wants to manage its own entry(ies) while cluster admins may want to use other external Alertmanagers.
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=50 | ||
| // +optional | ||
| EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to ensure the format of the string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are going to be continuing with the size format approach as opposed to using explicit units, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonpasquier I was thinking of just getting this to automatic if omitted, wdyt?
| // remoteWrite supports a maximum of 10 items in the list. | ||
| // +kubebuilder:validation:MaxItems=10 | ||
| // +listType=map | ||
| // +listMapKey=url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's the only field we should use for the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added prose explaining: "Maximum of 10 remote write configurations can be specified. Each entry must have a unique URL."
@simonpasquier - using
urlas the list map key since it's required and should be unique per endpoint. Is there a better key you'd suggest?
| // Allowed values: "v2". "v1" is no longer supported. | ||
| // +kubebuilder:validation:Enum=v2 | ||
| // +required | ||
| APIVersion AlertmanagerAPIVersion `json:"apiVersion,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can omit the API version field (v1 shouldn't be used anymore).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point there! I removed the apiVersion field entirely since v1 is no longer supported and v2 is the only valid option. CMO will hardcode v2 when communicating with Alertmanager.
| // +optional | ||
| // +kubebuilder:validation:MaxLength=20 | ||
| // +kubebuilder:validation:MinLength=2 | ||
| RemoteTimeout string `json:"remoteTimeout,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to validate duration strings.
| // writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint. | ||
| // Maximum of 10 relabeling rules can be specified. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxItems=10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10 is too low
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! What would be a reasonable limit? 50? 100? We want to allow complex relabeling while preventing unbounded configs, TL;DR same as in the other one
|
|
||
| const ( | ||
| // RelabelActionReplace replaces the target label with the replacement value. | ||
| RelabelActionReplace RelabelAction = "replace" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already fixed, will push an updated commit, thanks!
| // When omitted, defaults to "Required". | ||
| // +optional | ||
| // +kubebuilder:default=Required | ||
| Optional SecretRequirement `json:"optional,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we can make all secret keys required and not have the optional field. The Prometheus operator itself will fail the reconciliation if a secret key ref is missing hence a ref is required de facto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, seems like a cleaner approach, wdyt @everettraven ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't need some kind of optionality evaluation here, yes that is better.
| // +optional | ||
| // +kubebuilder:validation:Enum=Verify;SkipVerify | ||
| // +kubebuilder:default=Verify | ||
| CertificateVerification string `json:"certificateVerification,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be a custom type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we typically encourage the use of a type alias for enums
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed - created CertificateVerificationType type alias with proper constants.
0d09af6 to
0643f0d
Compare
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (6)
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)
1634-1651: The logLevel description still incorrectly references Alertmanager instead of Prometheus.Lines 1635-1636 state "logs emitted by Alertmanager", but this field configures Prometheus logging under
prometheusK8sConfig. This documentation error was flagged in previous reviews and marked as addressed, but the current code still contains the incorrect reference.Update the description to reference Prometheus instead of Alertmanager. The Go doc comment in
config/v1alpha1/types_cluster_monitoring.goshould be updated and the CRD regenerated.Apply this diff to fix the description:
logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be useful for debugging issues or reducing noise in production environments.config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)
1287-2529: LGTM! Past issues resolved, schema is well-structured.The
prometheusK8sConfigfield addition looks good. Both previously flagged issues have been successfully addressed:
✅ collectionProfile (lines 1527-1528): The description now correctly matches the enum values using PascalCase (
Full,Minimal).✅ logLevel (line 1605): The description now correctly references "Prometheus" instead of the previously incorrect "Alertmanager" reference.
The CRD schema is comprehensive with:
- Proper field validation rules (length constraints, patterns, enums)
- Clear, detailed descriptions for all fields
- Appropriate nested structure for complex configurations
- Consistent naming conventions following Kubernetes patterns
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)
1634-1651: Fix incorrect component reference in logLevel description.Line 1636 still incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager". This field belongs to
prometheusK8sConfig, so it should reference Prometheus, not Alertmanager.This issue was flagged in previous reviews but remains unfixed.
Apply this diff to correct the description:
logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be useful for debugging issues or reducing noise in production environments. Allowed values are Error, Warn, Info, and Debug.payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)
1634-1651: Fix copy-paste error: logLevel description references wrong component.Line 1636 incorrectly states that
logLevelcontrols verbosity for "Alertmanager" when it should reference "Prometheus" since this field is nested withinprometheusK8sConfig.Apply this diff to correct the component reference:
logLevel: description: |- - logLevel defines the verbosity of logs emitted by Alertmanager. + logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be useful for debugging issues or reducing noise in production environments.config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
303-311: Document the missingoptionalfield in SecretKeySelector swagger docs.The swagger documentation for
SecretKeySelectoris missing theoptionalfield, which is present in the actual type definition (as seen in the CRD schema at lines 1356-1373, 1436-1449, etc. in the CRD file). Theoptionalfield is an enum with valuesRequired/Optionaland defaults toRequired.Update the source Go type comment for
SecretKeySelector.Optionalto include appropriate documentation, then regenerate swagger docs. The documentation should describe:
- Allowed values:
Required,Optional- Default value:
Required- Behavior when set to
RequiredvsOptionalconfig/v1alpha1/types_cluster_monitoring.go (1)
288-291: LogLevel documentation is too component-specific for a shared type.This
LogLeveltype is used by bothAlertmanagerCustomConfig(line 174) andPrometheusK8sConfig(line 481), but the godoc states it "defines the verbosity of logs emitted by Alertmanager." This is misleading when users reference the type documentation in the Prometheus context.This was flagged in past review comments but not yet addressed.
Update to a generic description:
-// LogLevel defines the verbosity of logs emitted by Alertmanager. +// LogLevel defines the verbosity of logs emitted by monitoring components (such as Alertmanager and Prometheus). // Valid values are Error, Warn, Info and Debug.
🧹 Nitpick comments (5)
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
1855-1866: Consider adding pattern validation for duration and quantity fields.Several fields accept duration or quantity strings but lack CRD-level pattern validation:
retention(lines 1855-1866): Description specifies pattern[0-9]+(ms|s|m|h|d|w|y)but nopatternfield enforces itremoteTimeout(lines 1673-1679): Described as duration string but no pattern validationretentionSize(lines 1867-1874): Described as Kubernetes quantity format but no pattern validationWhile runtime validation will catch invalid values, adding pattern validation here would provide earlier feedback to users during resource creation/update.
Note: If this is intentionally omitted (e.g., to allow for forward compatibility with new duration formats or because quantity parsing is complex), this suggestion can be safely ignored.
Also applies to: 1673-1679, 1867-1874
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2)
1576-1590: Add pattern validation for enforcedBodySizeLimit or clarify validation approach.The description (line 1585) states "The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|GB|TB)?)$", but the schema lacks a
patternconstraint. Without CRD-level validation, users receive no immediate feedback when submitting invalid values, and validation errors occur later at the operator level.Consider adding the pattern constraint to the schema:
enforcedBodySizeLimit: description: |- enforcedBodySizeLimit enforces a body size limit for Prometheus scraped metrics. If a scraped target's body response is larger than the limit, the scrape will fail. The following values are valid: a numeric value in Prometheus size format (such as "4MB", "1000", "1GB", "512KB", "100B") or the string `automatic`, which indicates that the limit will be automatically calculated based on cluster capacity. To specify no limit, omit this field. The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$ Minimum length is 1 character. Maximum length is 50 characters. maxLength: 50 minLength: 1 + pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$ type: stringAlternatively, if operator-side validation is intentional, clarify in the description that validation occurs at the operator level.
1854-1865: Add pattern validation for retention or clarify validation approach.The description (lines 1857-1859) states the value "must be specified using the following regular expression pattern:
[0-9]+(ms|s|m|h|d|w|y)", but the schema lacks apatternconstraint. This creates the same validation gap asenforcedBodySizeLimit.Consider adding the pattern constraint:
retention: description: |- retention defines the duration for which Prometheus retains data. This definition must be specified using the following regular expression pattern: `[0-9]+(ms|s|m|h|d|w|y)` (ms = milliseconds, s= seconds,m = minutes, h = hours, d = days, w = weeks, y = years). When omitted, this means the user has no opinion and the platform is left to choose reasonable defaults, which are subject to change over time. The default value is `15d`. maxLength: 20 minLength: 1 + pattern: ^[0-9]+(ms|s|m|h|d|w|y)$ type: stringAlternatively, if operator-side validation is intentional, clarify in the description that validation occurs at the operator level.
config/v1alpha1/types_cluster_monitoring.go (2)
437-450: ReconsiderlistType=atomicfor multi-owner scenarios.The
additionalAlertmanagerConfigsfield useslistType=atomic, which doesn't work well with server-side apply when multiple controllers or users want to independently manage their own entries. As noted in past review comments, there are valid use cases where ACM wants to manage its own entry while cluster admins manage others.Consider changing to
listType=mapwith an appropriate uniqueness key. However, this is complicated by the lack of a natural unique identifier. If entries must be truly independent with no uniqueness constraints, you may need to acceptatomicwith its limitations.Past discussion also noted that the current design lacks a clear way to enforce uniqueness when needed. Consider whether there's a combination of fields that could serve as a composite key, or whether the API should be restructured to make ownership clearer.
536-546: Add CEL validation for retention duration format.The field documentation describes a specific pattern
[0-9]+(ms|s|m|h|d|w|y)but there's no runtime validation to enforce this. Invalid values like "15x" or "abc" would be accepted at admission time and only fail when Prometheus tries to use them.Past review comments suggested adding CEL validation, which would be especially helpful here.
Add validation:
// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+(ms|s|m|h|d|w|y)$')",message="must be a duration like '15d', '6h', or '30m'"Note: Similar validation should be added for other duration/size fields like
remoteTimeout(line 718-723) and size-based fields likeretentionSizeif you want to enforce Kubernetes quantity format.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (11)
config/v1alpha1/types_cluster_monitoring.go(3 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.deepcopy.go(8 hunks)config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml(1 hunks)config/v1alpha1/zz_generated.swagger_doc_generated.go(4 hunks)openapi/generated_openapi/zz_generated.openapi.go(13 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml(1 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml(1 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yamlpayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yamlpayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yamlpayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yamlopenapi/generated_openapi/zz_generated.openapi.goconfig/v1alpha1/types_cluster_monitoring.goconfig/v1alpha1/zz_generated.swagger_doc_generated.goconfig/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlconfig/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (3)
config/v1alpha1/types_cluster_monitoring.go (1)
config/v1alpha1/zz_generated.swagger_doc_generated.go (9)
PrometheusK8sConfig(273-275)AdditionalAlertmanagerConfig(132-134)ExternalLabels(227-229)RemoteWriteSpec(299-301)ContainerResource(218-220)SecretKeySelector(309-311)TLSConfig(322-324)Label(237-239)RelabelConfig(287-289)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
AdditionalAlertmanagerConfig(622-668)ExternalLabels(690-700)Label(671-687)PrometheusK8sConfig(436-604)RelabelConfig(733-784)RemoteWriteSpec(703-730)SecretKeySelector(820-835)TLSConfig(787-816)
config/v1alpha1/zz_generated.deepcopy.go (1)
config/v1alpha1/types_cluster_monitoring.go (9)
AdditionalAlertmanagerConfig(622-668)TLSConfig(787-816)PrometheusK8sConfig(436-604)ExternalLabels(690-700)Label(671-687)RemoteWriteSpec(703-730)ContainerResource(307-337)RelabelConfig(733-784)SecretKeySelector(820-835)
🔇 Additional comments (18)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2)
1603-1620: Past issue resolved - logLevel description now correct.The previous review comment correctly identified that the logLevel description referred to "Alertmanager" instead of "Prometheus". Line 1605 now properly states: "logLevel defines the verbosity of logs emitted by Prometheus." This is the correct description for the prometheusK8sConfig field.
1287-2529: Schema structure looks correct and comprehensive.The new
prometheusK8sConfigfield adds extensive configuration options for the platform Prometheus instance. The schema includes:
- Proper validation rules (regex patterns, length constraints, enum values)
- Comprehensive TLS configuration with certificate verification options
- Appropriate resource constraints following Kubernetes conventions
- Clear, detailed documentation for each field
Since this is a generated file (
zz_generatedprefix), the schema correctness depends on the generation source. The generated output appears structurally sound with no obvious issues.Note: This is a generated file. Any future changes should be made in the source templates/generators to ensure consistency across regenerations.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)
1603-1620: Past issue has been properly addressed.The logLevel field description now correctly references "Prometheus" instead of "Alertmanager" as noted in the previous review. The schema definition for this field and the entire
prometheusK8sConfigaddition appear well-formed with appropriate validation rules, enums, and constraints.Since this is a generated file (indicated by the
zz_generatedprefix), the schema structure and formatting are as expected.config/v1alpha1/zz_generated.swagger_doc_generated.go (2)
313-324: LGTM: TLSConfig swagger documentation is accurate.The swagger documentation for
TLSConfigcorrectly describes all fields includingcertificateVerificationwith proper enum values and default behavior.
121-134: LGTM: New swagger documentation entries are comprehensive.The newly added swagger documentation for
AdditionalAlertmanagerConfig,ExternalLabels,Label,RelabelConfig, andRemoteWriteSpecare well-documented with clear descriptions of fields, validation rules, and expected behaviors.Also applies to: 222-239, 277-289, 291-301
openapi/generated_openapi/zz_generated.openapi.go (9)
424-424: LGTM! New type registrations are properly added.All new v1alpha1 types (
AdditionalAlertmanagerConfig,ExternalLabels,Label,PrometheusK8sConfig,RelabelConfig,RemoteWriteSpec,SecretKeySelector,TLSConfig) are correctly registered in theGetOpenAPIDefinitionsmap with their corresponding schema functions.Also applies to: 442-442, 453-453, 465-465, 467-468, 472-472, 474-474
21227-21302: LGTM! AdditionalAlertmanagerConfig schema is well-defined.The schema properly defines all fields with clear descriptions, appropriate required fields (
apiVersion,staticConfigs), correct type references, and proper Kubernetes list annotations.
21918-21924: LGTM! prometheusK8sConfig field properly integrated.The new field is correctly added to
ClusterMonitoringSpecwith a comprehensive description, appropriate optional semantics, and dependencies list properly updated to include the new type.Also applies to: 21936-21936
22032-22068: LGTM! ExternalLabels schema is correctly structured.The schema properly defines a list-map of labels with appropriate Kubernetes annotations (
x-kubernetes-list-type: map, keyed bykey), clear constraints in the description, and correct dependencies.
22462-22488: LGTM! Label schema is simple and well-defined.The schema correctly models a key/value label pair with both fields required, clear validation constraints documented in descriptions, and no unnecessary complexity.
22930-23119: Previous issue resolved. PrometheusK8sConfig schema looks excellent.The
logLevelfield description (line 22972) now correctly references "logs emitted by Prometheus" rather than "Alertmanager", resolving the issue flagged in the previous review.The overall schema is comprehensive and well-structured with:
- Appropriate Kubernetes list annotations (atomic, map with proper keys)
- Clear descriptions explaining defaults, constraints, and behaviors
- Correct type references and dependencies
- Proper field definitions for all Prometheus configuration aspects
23149-23272: LGTM! RelabelConfig and RemoteWriteSpec schemas are well-defined.Both schemas are properly structured with:
- RelabelConfig: Comprehensive action descriptions, appropriate required field (
action), proper list-type annotation forsourceLabels- RemoteWriteSpec: Clear structure with
urlas required, correct reference toRelabelConfigfor write relabeling, proper dependencies
23368-23399: LGTM! SecretKeySelector schema is correctly defined.The schema properly models secret key selection with both fields required, appropriate
x-kubernetes-map-type: atomicannotation for atomic updates, and clear validation constraints in descriptions.
23431-23479: LGTM! TLSConfig schema is properly structured.The schema correctly defines TLS configuration with appropriate optional fields, proper references to
SecretKeySelectorfor certificates and keys, clear descriptions of verification policies, and correct dependencies.config/v1alpha1/types_cluster_monitoring.go (3)
818-835: LGTM - SecretKeySelector implementation looks correct.The
SecretKeySelectortype properly:
- Marks both
nameandkeyas required- Validates the secret name as a DNS subdomain
- Enforces length constraints on both fields
- Addresses past review feedback about removing the problematic
optionalfield
732-784: RelabelConfig documentation is comprehensive.The
RelabelConfigtype now provides clear documentation for:
- What happens when fields are omitted (defaults are specified)
- The semantic difference between omitting a field vs setting it to empty string (lines 767-769)
- Valid values and constraints for each field
This addresses multiple past review comments asking for clarity on field behavior.
Note: The documented defaults (
;for separator,(.*)for regex,$1for replacement) will need to be implemented in the controller/operator code since CRD validation cannot set these defaults for pointer types.
92-104: Documentation incorrectly mentions "remote read" capability.The field documentation states this configuration is for "remote write/read" but the actual
PrometheusK8sConfigstruct (lines 509-518) only exposes aremoteWritefield. There is no remote read configuration available.This was flagged in past review comments but not yet addressed.
Either remove the "/read" reference or explicitly state that only remote write is currently supported:
- // - External integrations (remote write, additional alertmanagers) + // - External integrations (remote write to long-term storage, additional alertmanagers)Likely an incorrect or invalid review comment.
config/v1alpha1/zz_generated.deepcopy.go (1)
14-39: LGTM - Autogenerated deepcopy methods are correct.The autogenerated deepcopy implementations properly handle:
New type support: All new types introduced in this PR have corresponding
DeepCopyIntoandDeepCopymethods:
AdditionalAlertmanagerConfig(lines 14-39)ExternalLabels(lines 460-479)Label(lines 734-748)PrometheusK8sConfig(lines 994-1061)RelabelConfig(lines 1089-1128)RemoteWriteSpec(lines 1130-1156)SecretKeySelector(lines 1216-1230)TLSConfig(lines 1253-1270)Correct field propagation:
ClusterMonitoringSpec.DeepCopyInto(line 394) properly callsin.PrometheusK8sConfig.DeepCopyInto(&out.PrometheusK8sConfig)to deep copy the new field.Proper handling of complex fields:
- Slices are allocated with correct length and elements are copied
- Maps are allocated with correct capacity and entries are copied
- Pointers are allocated and dereferenced properly
- Nested structs call their own
DeepCopyIntomethodsThe code generation appears to have correctly processed all the new API types.
Also applies to: 394-394, 460-479, 734-748, 994-1061, 1089-1128, 1130-1156, 1216-1230, 1253-1270
| // enforcedBodySizeLimit enforces a body size limit for Prometheus scraped metrics. If a scraped | ||
| // target's body response is larger than the limit, the scrape will fail. | ||
| // The following values are valid: | ||
| // a numeric value in Prometheus size format (such as "4MB", "1000", "1GB", "512KB", "100B") | ||
| // or the string `automatic`, which indicates that the limit will be | ||
| // automatically calculated based on cluster capacity. | ||
| // To specify no limit, omit this field. | ||
| // The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$ | ||
| // Minimum length is 1 character. | ||
| // Maximum length is 50 characters. | ||
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=50 | ||
| // +optional | ||
| EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation contradicts validation for empty values.
The field comment states "To specify no limit, omit this field" which is correct, but there's still confusion about whether empty string is valid. The +kubebuilder:validation:MinLength=1 marker means empty string is not allowed, yet some readers might interpret "omit" to mean either nil or empty string.
Clarify that omission (nil) is the only way to specify "no limit":
// To specify no limit, omit this field.
+ // Note: Empty string ("") is not allowed; omission means the field is not set at all.
// The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$Additionally, past review comments suggested adding CEL validation for the pattern rather than just documenting it. Consider adding:
// +kubebuilder:validation:XValidation:rule="self.matches('^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$')",message="must be 'automatic' or a size like '4MB', '512KB', '100B'"🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 451-464, the field
comment is ambiguous about "omit this field" vs empty string and the MinLength=1
validator disallows empty string; update the comment to explicitly state that
only omission (nil) denotes "no limit" and that an empty string is not allowed,
and add a kubebuilder XValidation CEL rule to enforce the documented pattern
(e.g. self == null || self.matches('^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$')) with
a clear message like "must be 'automatic' or a size like '4MB', '512KB', '100B'
or omitted to indicate no limit", ensuring it complements the existing
MinLength/MaxLength markers.
| type AlertmanagerAPIVersion string | ||
|
|
||
| const ( | ||
| AlertmanagerAPIVersionV2 AlertmanagerAPIVersion = "v2" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove misleading "default" language from required apiVersion field.
The apiVersion field (lines 623-627) is marked as +required, meaning users must explicitly provide it. However, past documentation suggested it has a "default value" which is contradictory.
Since the field is required and currently only allows one value (v2), the concept of a "default" doesn't apply. The current enum design is likely future-proofing for when additional API versions might be supported.
The documentation should simply state:
// apiVersion defines the Alertmanager API version to target.
- // Allowed values: "v2". "v1" is no longer supported.
+ // Currently, only "v2" is supported. "v1" is no longer supported.
// +kubebuilder:validation:Enum=v2Alternatively, if you want v2 to be the implied default for backward compatibility in the future, consider making the field optional with a default:
// +kubebuilder:default=v2
// +optionalAlso applies to: 623-627
🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 623-627, remove any
wording in the field comment that claims the apiVersion has a "default" value
since the field is annotated +required and users must supply it; update the doc
comment to simply state the accepted enum values (currently "v2") and that the
field is required, or alternatively if you want v2 to be an implied default
change the field to optional and add +kubebuilder:default=v2 and +optional
instead (do not leave contradictory "default" language).
| // pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints. | ||
| // For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/", | ||
| // set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts". | ||
| // This is commonly needed when Alertmanager is deployed behind ingress controllers or load balancers. | ||
| // +kubebuilder:validation:MaxLength=255 | ||
| // +kubebuilder:validation:MinLength=1 | ||
| // +optional | ||
| PathPrefix string `json:"pathPrefix,omitempty"` | ||
| // scheme defines the URL scheme to use when communicating with Alertmanager | ||
| // instances. | ||
| // Possible values are `HTTP` or `HTTPS`. | ||
| // When omitted, defaults to `HTTP`. | ||
| // +kubebuilder:validation:Enum=HTTP;HTTPS | ||
| // +kubebuilder:default=HTTP | ||
| // +optional | ||
| Scheme AlertmanagerScheme `json:"scheme,omitempty"` | ||
| // staticConfigs is a list of statically configured Alertmanager endpoints in the form |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix documentation inconsistencies in AdditionalAlertmanagerConfig.
Two issues here:
-
PathPrefix example references wrong API version: Line 635 mentions
/api/v1/alertsbut theapiVersionfield only allowsv2. The example should reference v2 endpoints. -
Scheme field marked as both required and having a default: Line 644 states "When omitted, defaults to
HTTP" but line 646 marks it as+required, meaning it cannot be omitted.
Apply these fixes:
// For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
- // set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
+ // set this to "/alertmanager" so requests go to "/alertmanager/api/v2/alerts" instead of "/api/v2/alerts".For the scheme field, either:
- Remove
+requiredand keep the default behavior, OR - Remove the "When omitted, defaults to
HTTP" text and keep it required
Based on the +kubebuilder:default=HTTP marker, it seems the intent is to have a default, so removing +required would be more appropriate.
🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 633-649, update the
PathPrefix example to reference v2 endpoints (e.g., use
"/alertmanager/api/v2/alerts" instead of "/api/v1/alerts") to match the
apiVersion allowed, and remove the +kubebuilder:validation:Required marker from
the Scheme field so it can be omitted and rely on the existing
+kubebuilder:default=HTTP and the comment "When omitted, defaults to `HTTP`".
| scheme: | ||
| default: HTTP | ||
| description: |- | ||
| scheme defines the URL scheme to use when communicating with Alertmanager | ||
| instances. | ||
| Possible values are `HTTP` or `HTTPS`. | ||
| When omitted, defaults to `HTTP`. | ||
| enum: | ||
| - HTTP | ||
| - HTTPS | ||
| type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for the scheme field definition in the alertmanager configuration types
find config/v1alpha1 -name "*.go" | head -20Repository: openshift/api
Length of output: 446
🏁 Script executed:
#!/bin/bash
# Look for alertmanager-related struct definitions and scheme defaults
rg -n "AlertmanagerConfig|scheme" config/v1alpha1/ --type go -A 5 -B 5 | head -100Repository: openshift/api
Length of output: 15409
🏁 Script executed:
#!/bin/bash
# Search more specifically for the scheme field in alertmanager structs
rg -n "Scheme.*string|scheme.*HTTP" config/v1alpha1/ --type goRepository: openshift/api
Length of output: 372
🏁 Script executed:
#!/bin/bash
# Extract the AdditionalAlertmanagerConfig struct definition and surrounding context
sed -n '500,750p' config/v1alpha1/types_cluster_monitoring.go | cat -nRepository: openshift/api
Length of output: 14320
🏁 Script executed:
#!/bin/bash
# Look for the complete AlertmanagerScheme type and AdditionalAlertmanagerConfig definition
rg -n -A 30 "type AdditionalAlertmanagerConfig struct" config/v1alpha1/types_cluster_monitoring.goRepository: openshift/api
Length of output: 1878
🏁 Script executed:
#!/bin/bash
# Check for any comments or documentation about the scheme field default
rg -n -B 5 -A 5 "scheme.*HTTP|HTTP.*scheme" config/v1alpha1/types_cluster_monitoring.goRepository: openshift/api
Length of output: 39
Change the default scheme from HTTP to HTTPS in the source type definition.
The scheme field defaults to HTTP in the AdditionalAlertmanagerConfig struct (line 646 of config/v1alpha1/types_cluster_monitoring.go via the +kubebuilder:default=HTTP tag). This means alert data will be transmitted unencrypted by default when connecting to additional Alertmanager instances. Since alerts can contain sensitive operational information, defaulting to unencrypted transmission degrades security posture.
Update the source Go type definition to default to HTTPS instead. Change +kubebuilder:default=HTTP to +kubebuilder:default=HTTPS in config/v1alpha1/types_cluster_monitoring.go, then regenerate the manifest.
🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around line 646 and in the
generated manifest
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
(lines ~1366-1376), update the kubebuilder default on the
AdditionalAlertmanagerConfig.scheme from +kubebuilder:default=HTTP to
+kubebuilder:default=HTTPS in the Go source (types_cluster_monitoring.go), then
run the codegen/CRD manifest generation step used by the repo to regenerate the
ClusterMonitoringConfig.yaml so the CRD default in the manifest reflects HTTPS.
Addresses feedback from PR reviewers and linting issues. Signed-off-by: Daniel Mellado <dmellado@fedoraproject.org>
0643f0d to
99db558
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
1366-1376: Security concern: Default scheme should be HTTPS, not HTTP.This was previously flagged. Alertmanager connections will transmit alert data (potentially sensitive operational information) over unencrypted HTTP by default. Update the source Go type to use
+kubebuilder:default=HTTPSand regenerate.config/v1alpha1/types_cluster_monitoring.go (2)
288-291: Update LogLevel type documentation to be component-agnostic.The
LogLeveltype is used by bothAlertmanagerCustomConfig(line 174) andPrometheusK8sConfig(line 481), but the documentation specifically mentions "Alertmanager". This creates confusion when users reference the type documentation in the Prometheus context.Apply this diff to make the documentation more generic:
-// LogLevel defines the verbosity of logs emitted by Alertmanager. +// LogLevel defines the verbosity of logs emitted by monitoring components. // Valid values are Error, Warn, Info and Debug.
633-640: Fix API version in pathPrefix example.The example at line 635 references
/api/v1/alerts, but theapiVersionfield only allowsv2. The example should reference the v2 endpoint to be consistent with the API constraints.Apply this diff:
// pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints. // For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/", - // set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts". + // set this to "/alertmanager" so requests go to "/alertmanager/api/v2/alerts" instead of "/api/v2/alerts". // This is commonly needed when Alertmanager is deployed behind ingress controllers or load balancers.
🧹 Nitpick comments (11)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
392-420: Consider quoting resource quantities for consistency.The resource quantity values (e.g.,
100m,500m,128Mi,512Mi) are currently unquoted. While most YAML parsers handle these correctly, explicitly quoting them aligns with Kubernetes conventions and ensures consistent parsing across different tools.Apply this diff to quote the resource quantities:
prometheusK8sConfig: resources: - name: cpu - request: 100m - limit: 500m + request: "100m" + limit: "500m" - name: memory - request: 128Mi - limit: 512Mi + request: "128Mi" + limit: "512Mi" expected: | apiVersion: config.openshift.io/v1alpha1 kind: ClusterMonitoring spec: userDefined: mode: "Disabled" prometheusK8sConfig: resources: - name: cpu - request: 100m - limit: 500m + request: "100m" + limit: "500m" - name: memory - request: 128Mi - limit: 512Mi + request: "128Mi" + limit: "512Mi"config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (4)
1382-1390: staticConfigs regex allows invalid port numbers.The regex
'^[a-zA-Z0-9.-]+:[0-9]+$'permits port numbers like0or99999which are invalid. Consider tightening to validate port range (1-65535) or rely on the operator to validate at runtime.
1539-1553: Missing pattern validation for enforcedBodySizeLimit.The description documents the pattern
^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$but nopattern:constraint enforces it. Invalid values likeinvalidor5MB5would be accepted by the schema and only fail at runtime.+ pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$ maxLength: 50 minLength: 1 type: string
1855-1866: Missing pattern validation for retention.The description specifies the format
[0-9]+(ms|s|m|h|d|w|y)but no schemapattern:enforces it. Add pattern validation to catch invalid values at admission time:+ pattern: ^[0-9]+(ms|s|m|h|d|w|y)$ maxLength: 20 minLength: 1 type: string
1635-1654: queryLogFile validation doesn't enforce /dev/ path restrictions.The description states only
/dev/stderr,/dev/stdout, and/dev/nullare supported, but the CEL rule permits any/dev/*path. Users could specify/dev/randomwhich would pass validation but fail at runtime.Consider adding validation:
x-kubernetes-validations: - message: must be an absolute path starting with '/' or a simple filename without '/' rule: self.startsWith('/') || !self.contains('/') - message: only /dev/stderr, /dev/stdout, and /dev/null are supported under /dev/ rule: "!self.startsWith('/dev/') || self in ['/dev/stderr', '/dev/stdout', '/dev/null']"payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (5)
1391-1397: Add pattern validation for Go duration format.The
timeoutfield accepts duration strings but lacks validation. Users can set invalid values like"30x"or"abc"that will fail at runtime.Add a CEL validation rule to enforce Go duration format:
timeout: description: |- timeout defines the timeout value used when sending alerts. The value must be a valid Go time.Duration string (e.g. 30s, 5m, 1h). maxLength: 20 minLength: 2 type: string + x-kubernetes-validations: + - message: must be a valid Go duration (e.g., 30s, 5m, 1h) + rule: self.matches('^[0-9]+(ns|us|µs|ms|s|m|h)$')
1673-1679: Add pattern validation for Go duration format.The
remoteTimeoutfield accepts duration strings but lacks validation. Users can set invalid values that will fail at runtime.Add a CEL validation rule to enforce Go duration format:
remoteTimeout: description: |- remoteTimeout is the timeout for requests to the remote write endpoint. When omitted, the default is 30s. maxLength: 20 minLength: 2 type: string + x-kubernetes-validations: + - message: must be a valid Go duration (e.g., 30s, 5m, 1h) + rule: self.matches('^[0-9]+(ns|us|µs|ms|s|m|h)$')
1855-1866: Add pattern validation for retention duration format.The
retentionfield documentation specifies the pattern[0-9]+(ms|s|m|h|d|w|y)but no validation enforces it. Users can set invalid values like"15days"instead of"15d".Add a CEL validation rule to enforce the documented pattern:
retention: description: |- retention defines the duration for which Prometheus retains data. This definition must be specified using the following regular expression pattern: `[0-9]+(ms|s|m|h|d|w|y)` (ms = milliseconds, s= seconds,m = minutes, h = hours, d = days, w = weeks, y = years). When omitted, this means the user has no opinion and the platform is left to choose reasonable defaults, which are subject to change over time. The default value is `15d`. maxLength: 20 minLength: 1 type: string + x-kubernetes-validations: + - message: must match pattern [0-9]+(ms|s|m|h|d|w|y), e.g., 15d, 2w, 6m + rule: self.matches('^[0-9]+(ms|s|m|h|d|w|y)$')
1867-1874: Add pattern validation for Kubernetes quantity format.The
retentionSizefield accepts Kubernetes quantity formats (e.g.,Mi,Gi,Ti) but lacks validation. Users can set invalid values like"5 GB"(space not allowed) or"100gigabytes".Add the standard Kubernetes quantity pattern validation:
retentionSize: description: |- retentionSize specifies the maximum volume of persistent storage that Prometheus uses for data blocks and the write-ahead log (WAL). Acceptable values use standard Kubernetes resource quantity formats, such as `Mi`, `Gi`, `Ti`, etc. When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time. The default is no storage size limit is enforced and Prometheus will use the available storage capacity of the PersistentVolume. maxLength: 20 type: string + pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ + x-kubernetes-validations: + - message: must be a valid Kubernetes quantity (e.g., 5Gi, 100Mi) + rule: isQuantity(self)
1635-1654: Add validation to prevent unsupported /dev/ paths.The description states that writing to
/dev/paths other than/dev/stderr,/dev/stdout, or/dev/nullis not supported, but the validation only checks if the path is absolute or a simple filename. Users can set unsupported paths like/dev/randomor/dev/zero.Add a CEL validation to prevent unsupported
/dev/paths:queryLogFile: description: |- queryLogFile specifies the file to which PromQL queries are logged. This setting can be either a filename, in which case the queries are saved to an `emptyDir` volume at `/var/log/prometheus`, or a full path to a location where an `emptyDir` volume will be mounted and the queries saved. Writing to `/dev/stderr`, `/dev/stdout` or `/dev/null` is supported, but writing to any other `/dev/` path is not supported. Relative paths are also not supported. By default, PromQL queries are not logged. Must be an absolute path starting with `/` or a simple filename without path separators. Must be between 1 and 255 characters in length. maxLength: 255 minLength: 1 type: string x-kubernetes-validations: - message: must be an absolute path starting with '/' or a simple filename without '/' rule: self.startsWith('/') || !self.contains('/') + - message: /dev/ paths other than /dev/stderr, /dev/stdout, or /dev/null are not supported + rule: '!self.startsWith(''/dev/'') || self in [''/dev/stderr'', ''/dev/stdout'', ''/dev/null'']'config/v1alpha1/types_cluster_monitoring.go (1)
596-603: Consider using a custom type instead of core/v1.PersistentVolumeClaim.Using
*v1.PersistentVolumeClaimdirectly limits the ability to enforce admission-time validations specific to this use case. OpenShift APIs typically prefer custom reference types that allow for more granular validation.Based on past review feedback (comment at lines 603-603), there is precedent in the openshift/api repository for wrapping persistent volume references in custom types. This would enable:
- More specific validation rules for this context
- Better documentation tailored to Prometheus usage
- Ability to restrict which PVC fields are relevant
Example from insights API:
api/insights/v1alpha2/types_insights.go
Lines 118 to 121 in 8a46f74
// persistentVolume is an optional field that specifies the PersistentVolume that will be used to store the Insights data archive. // The PersistentVolume must be created in the openshift-insights namespace. // +optional PersistentVolume *PersistentVolumeConfig `json:"persistentVolume,omitempty"`
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (12)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml(1 hunks)config/v1alpha1/types_cluster_monitoring.go(3 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml(1 hunks)config/v1alpha1/zz_generated.deepcopy.go(8 hunks)config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml(1 hunks)config/v1alpha1/zz_generated.swagger_doc_generated.go(4 hunks)openapi/generated_openapi/zz_generated.openapi.go(13 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml(1 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml(1 hunks)payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlconfig/v1alpha1/zz_generated.swagger_doc_generated.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlpayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yamlopenapi/generated_openapi/zz_generated.openapi.goconfig/v1alpha1/types_cluster_monitoring.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yamlconfig/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (1)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
AdditionalAlertmanagerConfig(622-668)ExternalLabels(690-700)Label(671-687)PrometheusK8sConfig(436-604)RelabelConfig(733-784)RemoteWriteSpec(703-730)SecretKeySelector(820-835)TLSConfig(787-816)
🔇 Additional comments (17)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (2)
354-370: LGTM!The test correctly uses the capitalized enum value "Info" for
logLevel, matching the CRD's allowed values (Error, Warn, Info, Debug). This addresses the previous review comment about case sensitivity.
371-778: Excellent test coverage!The test suite comprehensively validates PrometheusK8sConfig behavior:
- Valid configurations for all major fields (nodeSelector, tolerations, topologySpreadConstraints, additionalAlertmanagerConfigs, remoteWrite, externalLabels)
- Appropriate rejection tests for boundary violations (empty objects, exceeding maxItems limits, violating minItems constraints)
- Clear, actionable error messages in all rejection scenarios
The tests align well with the CRD validation rules and provide strong coverage for the new PrometheusK8sConfig API surface.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2)
1603-1620: LGTM!The
logLeveldescription correctly refers to "Prometheus" rather than "Alertmanager", addressing the previous review comment. The field definition properly specifies the allowed enum values (Error, Warn, Info, Debug) and provides clear documentation about each log level's behavior.
1287-2529: Well-structured CRD schema for PrometheusK8sConfig.The generated OpenAPI schema is comprehensive and follows Kubernetes API conventions:
- Proper validation constraints (maxItems: 10 for arrays, length limits, enum values, patterns)
- Correct list-type annotations (
map,atomic,set) for appropriate array behavior- Standard Kubernetes types reused where applicable (PersistentVolumeClaim, Toleration, TopologySpreadConstraint)
- Clear field descriptions explaining purpose, defaults, and allowed values
- Appropriate minProperties: 1 constraint to prevent empty configuration objects
The schema provides a robust foundation for the PrometheusK8sConfig API.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)
1287-2529: LGTM! Comprehensive and well-structured Prometheus configuration schema.The
prometheusK8sConfigfield addition is well-designed with:
- Clear, detailed field descriptions explaining purpose and defaults
- Comprehensive validation rules (patterns, length constraints, enums)
- Proper security options (TLS configuration, bearer token authentication)
- Reasonable constraints (max 10 items for most lists)
- Consistent use of Kubernetes API conventions
Previous review concerns (enum case, field documentation) appear to have been properly addressed based on the commit markers in past reviews.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2)
1287-1300: Well-documented API surface.The prometheusK8sConfig field and its nested properties are thoroughly documented with clear descriptions, use cases, and behavioral expectations. The documentation improvements from previous reviews (collectionProfile enum values, logLevel component references) have been successfully addressed.
1300-1301: Appropriate constraints protect against configuration explosion.The API design includes sensible limits (e.g.,
maxItems: 10for additionalAlertmanagerConfigs, remoteWrite endpoints, and resource entries) and requires meaningful configuration viaminProperties: 1. These constraints prevent performance degradation from overly complex configurations while remaining flexible for most production use cases.Also applies to: 1522-1523, 1774-1778, 1849-1854
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
1287-1300: New prometheusK8sConfig API structure looks well-designed.The field organization follows established patterns from
alertmanagerConfigandmetricsServerConfig. Good use of list-type maps with appropriate keys for uniqueness constraints, consistent validation rules, and comprehensive documentation. The minProperties constraint ensures meaningful configurations.config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2)
1603-1620: Previous issue resolved.The
logLeveldescription now correctly references "Prometheus" instead of "Alertmanager". The field documentation is accurate and complete.
1287-2529: LGTM! Well-structured generated schema.The
prometheusK8sConfigschema is comprehensive and properly validated with:
- Appropriate validation constraints (length limits, patterns, enums)
- Secure defaults (e.g., certificate verification enabled)
- Clear documentation for all fields
- Proper Kubernetes CRD conventions
Since this is a generated file, the schema appears correctly generated from the API types.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)
1287-2529: PrometheusK8sConfig CRD block looks consistent with Go types and validation markersThe
prometheusK8sConfigschema (includingadditionalAlertmanagerConfigs,externalLabels,remoteWrite, TLS/secret selectors, resources, retention, scheduling knobs, and PVC template) aligns with the referenced Go types and kubebuilder validations. List semantics, enums, and x-kubernetes validations all look correct and coherent with the intended API surface. I don't see issues here.config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
121-201: Swagger docs for PrometheusK8sConfig and related types are consistent and accurateThe newly added SwaggerDoc maps for
AdditionalAlertmanagerConfig,ExternalLabels,Label,PrometheusK8sConfig,RelabelConfig,RemoteWriteSpec,SecretKeySelector, andTLSConfig, plus theClusterMonitoringSpec.prometheusK8sConfigentry, all line up with the Go type comments and the CRD schema (fields, enums, and behavior). This should give consumers clear, correct API documentation without introducing inconsistencies.Also applies to: 222-325
openapi/generated_openapi/zz_generated.openapi.go (4)
424-474: LGTM: Schema function registrations are correct.All new v1alpha1 monitoring types (AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, TLSConfig) are properly registered in the GetOpenAPIDefinitions map with consistent naming and alphabetical ordering.
21227-21302: LGTM: AdditionalAlertmanagerConfig and ClusterMonitoringSpec changes are well-structured.The schema definition for AdditionalAlertmanagerConfig properly specifies required fields (apiVersion, staticConfigs), uses appropriate list-type annotations, and correctly declares dependencies on SecretKeySelector and TLSConfig. The addition of prometheusK8sConfig to ClusterMonitoringSpec follows existing patterns and includes comprehensive documentation.
Also applies to: 21918-21936
22032-22068: LGTM: ExternalLabels, Label, and PrometheusK8sConfig schemas are correct.The schemas are well-structured with proper list-map annotations, detailed descriptions, and correct dependencies. The previously flagged logLevel description inconsistency (line 22972) has been resolved—it now correctly refers to "logs emitted by Prometheus" rather than Alertmanager.
Based on past review comments, the logLevel description issue was addressed in commit 0643f0d.
Also applies to: 22462-22488, 22930-23119
23149-23272: LGTM: RelabelConfig, RemoteWriteSpec, SecretKeySelector, and TLSConfig schemas are secure and well-defined.All schemas follow consistent patterns with proper annotations. SecretKeySelector is correctly marked as atomic (x-kubernetes-map-type: atomic), and TLSConfig appropriately uses SecretKeySelector references for sensitive fields (ca, cert, key), ensuring secure handling of credentials.
Also applies to: 23368-23399, 23431-23479
config/v1alpha1/types_cluster_monitoring.go (1)
437-450: Reconsider list type for multi-owner scenarios.The
additionalAlertmanagerConfigsfield useslistType=atomic, which prevents multiple owners from managing separate entries via server-side apply. Past review discussions (everettraven, simonpasquier) highlighted that enabling multiple owners is important - for example, ACM wanting to manage its own entries while cluster admins manage others.However, there's no clear unique key for
listType=mapsince:
apiVersionis an enum with only one value (v2)staticConfigscan vary for the same logical AlertmanagerConsider whether the current atomic list type adequately supports the intended multi-owner use cases, or if a different approach (such as introducing a required
namefield as a unique identifier) would better serve users.Based on past review discussions at lines 450-450 in previous comments.
| scheme: | ||
| default: HTTP | ||
| description: |- | ||
| scheme defines the URL scheme to use when communicating with Alertmanager | ||
| instances. | ||
| Possible values are `HTTP` or `HTTPS`. | ||
| When omitted, defaults to `HTTP`. | ||
| enum: | ||
| - HTTP | ||
| - HTTPS | ||
| type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reconsider HTTP as the default scheme for Alertmanager connections.
The scheme field defaults to HTTP for connections to additional Alertmanager instances. In production monitoring environments, alerts may contain sensitive information (e.g., error messages, system details), and communication with alert endpoints should use encrypted channels by default.
Consider defaulting to HTTPS instead, or at minimum, add prominent documentation warning that HTTP transmits alerts and credentials in cleartext.
🤖 Prompt for AI Agents
In
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
around lines 1366-1376, the scheme default is set to HTTP; change the default to
HTTPS and update the description text to state that the default is HTTPS and
that HTTPS encrypts alert payloads and credentials; retain the enum values
(HTTP, HTTPS) but ensure the textual docs include a prominent warning that using
HTTP transmits alerts and credentials in cleartext and should only be used for
testing or within trusted networks.
| maxLength: 255 | ||
| type: string | ||
| x-kubernetes-validations: | ||
| - message: must be in the format 'host:port' (e.g., 'alertmanager.example.com:9093') | ||
| rule: self.matches('^[a-zA-Z0-9.-]+:[0-9]+$') | ||
| maxItems: 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strengthen host:port validation to prevent invalid hostnames.
The regex ^[a-zA-Z0-9.-]+:[0-9]+$ permits invalid hostnames that start or end with hyphens or dots (e.g., -example.com:9093 or example-.com:9093). While Kubernetes' own validation may catch this downstream, tightening the pattern here would fail-fast with clearer error messages.
Consider a pattern like ^[a-zA-Z0-9]([a-zA-Z0-9.-]*[a-zA-Z0-9])?:[0-9]+$ to ensure hostnames start and end with alphanumeric characters.
🤖 Prompt for AI Agents
In
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
around lines 1383-1388, the existing host:port regex allows hostnames that begin
or end with hyphens or dots; replace the rule with a stricter pattern that
enforces the hostname to start and end with an alphanumeric character (for
example by using a pattern equivalent to
^[a-zA-Z0-9](?:[a-zA-Z0-9.-]*[a-zA-Z0-9])?:[0-9]+$) so hostnames like
-example.com or example-.com are rejected while preserving the port validation.
Ensure the new pattern is placed in the x-kubernetes-validations.rule field and
the validation message remains descriptive.
|
@marioferh: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
everettraven
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another handful of comments after another pass.
Just a note that because this PR has been around a while and has many comments that things are getting a bit more difficult to keep track of here.
It may be worth it to open a new PR for this change and essentially start the review cycle fresh so we can keep better track of the review comments.
I'll leave that up to you though.
| // By default, no additional Alertmanager instances are configured. | ||
| // Maximum of 10 additional Alertmanager configurations can be specified. | ||
| // When omitted, no additional Alertmanager instances are configured (default behavior). | ||
| // When set to an empty array [], the behavior is the same as omitting the field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, you should not allow setting [] then. Where possible we try to have only one way to "say" something.
In this case, because setting both [] and omitting the field altogether mean the same thing we should just make it so that omitting the field is the only way to signal that you do not want any additional alertmanager instances configured.
You can do this by setting the min items to 1 for this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll add +kubebuilder:validation:MinItems=1 to enforce that omission is the only way to signal "no additional alertmanagers". This removes the ambiguity between [] and field omission.
| // +optional | ||
| // +kubebuilder:validation:MaxItems=10 | ||
| // +listType=atomic | ||
| AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: naming, prometheusConfig makes sense to me - although that looks like maybe it is associated with a different thread.
@danielmellado It looks like this thread still needs to be addressed by updating the list type from atomic to map and setting the appropriate uniqueness keys.
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=50 | ||
| // +optional | ||
| EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are going to be continuing with the size format approach as opposed to using explicit units, yes.
| } | ||
|
|
||
| // ExternalLabels represents labels to be added to time series and alerts. | ||
| type ExternalLabels struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - I missed this on my last review round.
This does look funky. I'd expect that we probably just want externalLabels to be typed as a slice of Labels instead of this nesting.
| // Writing to `/dev/stderr`, `/dev/stdout` or `/dev/null` is supported, but | ||
| // writing to any other `/dev/` path is not supported. Relative paths are | ||
| // also not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't appear to be enforced by validations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - added CEL validation to enforce the /dev/ path restriction: !self.startsWith('/dev/') || self in ['/dev/stdout', '/dev/stderr', '/dev/null']. This ensures only the documented /dev/ paths are accepted while rejecting arbitrary /dev/ paths.
| // +optional | ||
| // +kubebuilder:validation:MaxLength=10 | ||
| Separator *string `json:"separator,omitempty"` | ||
| // regex is the regular expression to match against the concatenated source label values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't hurt to add this.
Is there a reasonable way for us to ensure that the regular expression only uses characters that are valid in an RE2 regular expression?
| // +optional | ||
| // +kubebuilder:validation:MaxLength=63 | ||
| TargetLabel *string `json:"targetLabel,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should ensure that is appropriately enforced through validations.
| // +optional | ||
| // +kubebuilder:validation:Enum=Verify;SkipVerify | ||
| // +kubebuilder:default=Verify | ||
| CertificateVerification string `json:"certificateVerification,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we typically encourage the use of a type alias for enums
| // SecretKeySelector selects a key of a Secret. | ||
| // +structType=atomic | ||
| type SecretKeySelector struct { | ||
| // name is the name of the secret in the same namespace to select from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "same" namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified that secrets are referenced from the openshift-monitoring namespace (where the platform monitoring stack is deployed).
| // +kubebuilder:validation:MaxLength=253 | ||
| // +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="must be a valid secret name (lowercase alphanumeric characters, '-' or '.', start and end with alphanumeric)" | ||
| Name string `json:"name,omitempty"` | ||
| // key is the key of the secret to select from. Must be a valid secret key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What makes a "valid secret key"? Are there formatting constraints that can be enforced here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added validation ;) Secret keys must match ^[a-zA-Z0-9._-]+$ (alphanumeric, -, _, or .). This is the standard Kubernetes secret key format.
|
I'll deff do that as it's getting hard to me as well but my colleague told
me that he was pointed out not to do that. If that's ok for you I'll reply
to your comments then put up a fresh one so it's clearer. Thanks!
El mar, 16 dic 2025, 20:39, Bryce Palmer ***@***.***>
escribió:
… ***@***.**** commented on this pull request.
Another handful of comments after another pass.
Just a note that because this PR has been around a while and has many
comments that things are getting a bit more difficult to keep track of here.
It may be worth it to open a new PR for this change and essentially start
the review cycle fresh so we can keep better track of the review comments.
I'll leave that up to you though.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> @@ -416,6 +429,444 @@ type MetricsServerConfig struct {
TopologySpreadConstraints []v1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
}
+// PrometheusK8sConfig provides configuration options for the Prometheus instance
+// Use this configuration to control
+// Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations.
+// +kubebuilder:validation:MinProperties=1
+type PrometheusK8sConfig struct {
+ // additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from
+ // the Prometheus component. This is useful for organizations that need to:
+ // - Send alerts to external monitoring systems (like PagerDuty, Slack, or custom webhooks)
+ // - Route different types of alerts to different teams or systems
+ // - Integrate with existing enterprise alerting infrastructure
+ // - Maintain separate alert routing for compliance or organizational requirements
+ // By default, no additional Alertmanager instances are configured.
+ // Maximum of 10 additional Alertmanager configurations can be specified.
+ // When omitted, no additional Alertmanager instances are configured (default behavior).
+ // When set to an empty array [], the behavior is the same as omitting the field.
In this case, you should not allow setting [] then. Where possible we try
to have only one way to "say" something.
In this case, because setting both [] and omitting the field altogether
mean the same thing we should just make it so that omitting the field is
the only way to signal that you do not want any additional alertmanager
instances configured.
You can do this by setting the min items to 1 for this field.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // Writing to `/dev/stderr`, `/dev/stdout` or `/dev/null` is supported, but
+ // writing to any other `/dev/` path is not supported. Relative paths are
+ // also not supported.
This doesn't appear to be enforced by validations
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // The current default value is `kubernetes.io/os: linux`.
+ // Maximum of 10 node selector key-value pairs can be specified.
+ // +optional
+ // +kubebuilder:validation:MinProperties=1
+ // +kubebuilder:validation:MaxProperties=10
+ NodeSelector map[string]string `json:"nodeSelector,omitempty"`
+ // queryLogFile specifies the file to which PromQL queries are logged.
+ // This setting can be either a filename, in which
+ // case the queries are saved to an `emptyDir` volume
+ // at `/var/log/prometheus`, or a full path to a location where
+ // an `emptyDir` volume will be mounted and the queries saved.
+ // Writing to `/dev/stderr`, `/dev/stdout` or `/dev/null` is supported, but
+ // writing to any other `/dev/` path is not supported. Relative paths are
+ // also not supported.
+ // By default, PromQL queries are not logged.
+ // Must be an absolute path starting with `/` or a simple filename without path separators.
What is a "simple filename"? Can this have a file extension?
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // Must be between 1 and 255 characters in length.
+ // +optional
+ // +kubebuilder:validation:MinLength=1
+ // +kubebuilder:validation:MaxLength=255
+ // +kubebuilder:validation:XValidation:rule="self.startsWith('/') || !self.contains('/')",message="must be an absolute path starting with '/' or a simple filename without '/'"
+ QueryLogFile string `json:"queryLogFile,omitempty"`
+ // remoteWrite defines the remote write configuration, including URL, authentication, and relabeling settings.
+ // Remote write allows Prometheus to send metrics it collects to external long-term storage systems.
+ // Maximum of 10 remote write configurations can be specified.
+ // Each entry must have a unique URL.
+ // When omitted, no remote write endpoints are configured.
+ // +kubebuilder:validation:MaxItems=10
+ // +listType=map
+ // +listMapKey=url
+ // +optional
+ RemoteWrite []RemoteWriteSpec `json:"remoteWrite,omitempty"`
If there is not a difference between the empty value [] and this field
being omitted, please add minimum length of 1
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // +kubebuilder:validation:Enum=v2
+ // +required
+ APIVersion AlertmanagerAPIVersion `json:"apiVersion,omitempty"`
+ // bearerToken defines the secret reference containing the bearer token
+ // to use when authenticating to Alertmanager.
+ // When omitted, no bearer token authentication is used.
+ // +optional
+ BearerToken SecretKeySelector `json:"bearerToken,omitempty,omitzero"`
+ // pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints.
+ // For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
+ // set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
+ // This is commonly needed when Alertmanager is deployed behind ingress controllers or load balancers.
+ // +kubebuilder:validation:MaxLength=255
+ // +kubebuilder:validation:MinLength=1
+ // +optional
+ PathPrefix string `json:"pathPrefix,omitempty"`
Looks like this field also needs formatting validations to ensure that it
is a valid path prefix.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // When omitted, no name is assigned.
+ // Must be at most 63 characters in length when specified.
+ // +optional
+ // +kubebuilder:validation:MaxLength=63
+ Name *string `json:"name,omitempty"`
+ // remoteTimeout is the timeout for requests to the remote write endpoint.
+ // When omitted, the default is 30s.
+ // +optional
+ // +kubebuilder:validation:MaxLength=20
+ // +kubebuilder:validation:MinLength=2
+ RemoteTimeout string `json:"remoteTimeout,omitempty"`
+ // writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint.
+ // Maximum of 10 relabeling rules can be specified.
+ // +optional
+ // +kubebuilder:validation:MaxItems=10
+ // +listType=atomic
Is atomic actually what you want here?
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // If a label does not exist, an empty string ("") is used in its place.
+ // The values of these labels are joined together using the configured separator,
+ // and the resulting string is then matched against the regular expression for
+ // the replace, keep, or drop actions.
+ // When omitted, the rule operates without extracting source labels (useful for actions like labelmap).
+ // Maximum of 10 source labels can be specified, each up to 63 characters.
+ // +optional
+ // +kubebuilder:validation:MaxItems=10
+ // +kubebuilder:validation:items:MaxLength=63
+ // +listType=set
+ SourceLabels []string `json:"sourceLabels,omitempty"`
+ // separator is the separator used to join source label values.
+ // When omitted, defaults to ";" (semicolon).
+ // Must be at most 10 characters in length.
+ // +optional
+ // +kubebuilder:validation:MaxLength=10
Do you have examples where a separator greater than a single character
makes sense here?
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // +kubebuilder:validation:MaxLength=253
+ // +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="must be a valid DNS subdomain (lowercase alphanumeric characters, '-' or '.', start and end with alphanumeric)"
+ ServerName string `json:"serverName,omitempty"`
+ // certificateVerification determines the policy for TLS certificate verification.
+ // Allowed values are "Verify" (performs certificate verification, secure) and "SkipVerify" (skips verification, insecure).
+ // When omitted, defaults to "Verify" (secure certificate verification is performed).
+ // +optional
+ // +kubebuilder:validation:Enum=Verify;SkipVerify
+ // +kubebuilder:default=Verify
+ CertificateVerification string `json:"certificateVerification,omitempty"`
+}
+
+// SecretKeySelector selects a key of a Secret.
+// +structType=atomic
+type SecretKeySelector struct {
+ // name is the name of the secret in the same namespace to select from.
What is the "same" namespace?
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // +kubebuilder:default=Verify
+ CertificateVerification string `json:"certificateVerification,omitempty"`
+}
+
+// SecretKeySelector selects a key of a Secret.
+// +structType=atomic
+type SecretKeySelector struct {
+ // name is the name of the secret in the same namespace to select from.
+ // Must be a valid Kubernetes secret name (lowercase alphanumeric, '-' or '.', start/end with alphanumeric).
+ // Must be between 1 and 253 characters in length.
+ // +required
+ // +kubebuilder:validation:MinLength=1
+ // +kubebuilder:validation:MaxLength=253
+ // +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="must be a valid secret name (lowercase alphanumeric characters, '-' or '.', start and end with alphanumeric)"
+ Name string `json:"name,omitempty"`
+ // key is the key of the secret to select from. Must be a valid secret key.
What makes a "valid secret key"? Are there formatting constraints that can
be enforced here?
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> @@ -416,6 +422,292 @@ type MetricsServerConfig struct {
TopologySpreadConstraints []v1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
}
+// PrometheusK8sConfig provides configuration options for the Prometheus instance
+// Use this configuration to control
+// Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations.
+// +kubebuilder:validation:MinProperties=1
+type PrometheusK8sConfig struct {
+ // additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from
+ // the Prometheus component. By default, no additional Alertmanager instances are configured.
+ // +optional
+ // +kubebuilder:validation:MaxItems=10
+ // +listType=atomic
+ AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`
Re: naming, prometheusConfig makes sense to me - although that looks like
maybe it is associated with a different thread.
@danielmellado <https://github.com/danielmellado> It looks like this
thread still needs to be addressed by updating the list type from atomic to
map and setting the appropriate uniqueness keys.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // +optional
+ // +kubebuilder:validation:MaxLength=255
+ PathPrefix *string `json:"pathPrefix,omitempty"`
+ // scheme defines the URL scheme to use when communicating with Alertmanager
+ // instances.
+ // Possible values are `http` or `https`. The default value is `http`.
+ // +optional
+ // +kubebuilder:validation:MaxLength=10
+ Scheme *string `json:"scheme,omitempty"`
+ // staticConfigs is a list of statically configured Alertmanager endpoints in the form
+ // of `<hosts>:<port>`.
+ // +optional
+ // +kubebuilder:validation:MaxItems=10
+ // +kubebuilder:validation:items:MaxLength=255
+ // +listType=set
+ StaticConfigs []string `json:"staticConfigs,omitempty"`
If this needs to be a valid hostname with ports, there might be existing
utilities out there for a CEL validation.
If not, you may be able to leverage the URL CEL helpers to validate that
it is a valid URL and the hostname and port are valid.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // +kubebuilder:validation:MaxLength=10
+ Scheme *string `json:"scheme,omitempty"`
+ // staticConfigs is a list of statically configured Alertmanager endpoints in the form
+ // of `<hosts>:<port>`.
+ // +optional
+ // +kubebuilder:validation:MaxItems=10
+ // +kubebuilder:validation:items:MaxLength=255
+ // +listType=set
+ StaticConfigs []string `json:"staticConfigs,omitempty"`
+ // timeout defines the timeout value used when sending alerts.
+ // +optional
+ // +kubebuilder:validation:MaxLength=20
+ Timeout *string `json:"timeout,omitempty"`
+ // tlsConfig defines the TLS settings to use for Alertmanager connections.
+ // +optional
+ TLSConfig *TLSConfig `json:"tlsConfig,omitempty"`
We should not allow setting {} on this field then. The TLSConfig type
should have either at least one required field *or* have minimum
properties set to 1
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> +
+// ExternalLabels represents labels to be added to time series and alerts.
+type ExternalLabels struct {
+ // labels is a map of label names to label values.
+ // +required
+ Labels map[string]string `json:"labels,omitempty"`
+}
+
+// RemoteWriteSpec represents configuration for remote write endpoints.
+type RemoteWriteSpec struct {
+ // url is the URL of the remote write endpoint.
+ // +required
+ // +kubebuilder:validation:MaxLength=2048
+ URL *string `json:"url,omitempty"`
+ // name is the name of the remote write configuration.
+ // +optional
Is setting "" the same as omitting the field?
Prometheus has no naming constraints here? So setting something like
!@*&(^ would be a valid name? Should *we* allow that?
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> +// RemoteWriteSpec represents configuration for remote write endpoints.
+type RemoteWriteSpec struct {
+ // url is the URL of the remote write endpoint.
+ // +required
+ // +kubebuilder:validation:MaxLength=2048
+ URL *string `json:"url,omitempty"`
+ // name is the name of the remote write configuration.
+ // +optional
+ // +kubebuilder:validation:MaxLength=63
+ Name *string `json:"name,omitempty"`
+ // remoteTimeout is the timeout for requests to the remote write endpoint.
+ // +optional
+ // +kubebuilder:validation:MaxLength=20
+ RemoteTimeout *string `json:"remoteTimeout,omitempty"`
+ // writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint.
+ // +optional
Looks like this comment still needs to be addressed in the GoDoc.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // +listType=atomic
+ WriteRelabelConfigs []RelabelConfig `json:"writeRelabelConfigs,omitempty"`
+}
+
+// RelabelConfig represents a relabeling rule.
+type RelabelConfig struct {
+ // sourceLabels specifies which labels to extract from each series for this relabeling rule.
+ // If a label does not exist, an empty string ("") is used in its place.
+ // The values of these labels are joined together using the configured separator,
+ // and the resulting string is then matched against the regular expression for
+ // the replace, keep, or drop actions.
+ // +optional
+ // +kubebuilder:validation:MaxItems=10
+ // +kubebuilder:validation:items:MaxLength=63
+ // +listType=set
+ SourceLabels []string `json:"sourceLabels,omitempty"`
Sorry, I should have been more clear - is it valid for me, as an end-user
writing this API, to set:
sourceLabels:
- ""
?
Would that just mean to source from an empty label if it exists?
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> +type RelabelConfig struct {
+ // sourceLabels specifies which labels to extract from each series for this relabeling rule.
+ // If a label does not exist, an empty string ("") is used in its place.
+ // The values of these labels are joined together using the configured separator,
+ // and the resulting string is then matched against the regular expression for
+ // the replace, keep, or drop actions.
+ // +optional
+ // +kubebuilder:validation:MaxItems=10
+ // +kubebuilder:validation:items:MaxLength=63
+ // +listType=set
+ SourceLabels []string `json:"sourceLabels,omitempty"`
+ // separator is the separator used to join source label values.
+ // +optional
+ // +kubebuilder:validation:MaxLength=10
+ Separator *string `json:"separator,omitempty"`
+ // regex is the regular expression to match against the concatenated source label values.
It wouldn't hurt to add this.
Is there a reasonable way for us to ensure that the regular expression
only uses characters that are valid in an RE2 regular expression?
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // +optional
+ // +kubebuilder:validation:MaxLength=63
+ TargetLabel *string `json:"targetLabel,omitempty"`
We should ensure that is appropriately enforced through validations.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> +type RemoteWriteSpec struct {
+ // url is the URL of the remote write endpoint.
+ // +required
+ // +kubebuilder:validation:MaxLength=2048
+ // +kubebuilder:validation:MinLength=1
+ URL string `json:"url,omitempty"`
+ // name is the name of the remote write configuration.
+ // +optional
+ // +kubebuilder:validation:MaxLength=63
+ Name *string `json:"name,omitempty"`
+ // remoteTimeout is the timeout for requests to the remote write endpoint.
+ // When omitted, the default is 30s.
+ // +optional
+ // +kubebuilder:validation:MaxLength=20
+ // +kubebuilder:validation:MinLength=2
+ RemoteTimeout string `json:"remoteTimeout,omitempty"`
Just because the Prometheus configuration uses the Go Duration type does
not mean that you have to for your API.
You can have an internal translation layer.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // +listType=atomic
+ AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`
+ // enforcedBodySizeLimit enforces a body size limit for Prometheus scraped metrics. If a scraped
+ // target's body response is larger than the limit, the scrape will fail.
+ // The following values are valid:
+ // a numeric value in Prometheus size format (such as "4MB", "1000", "1GB", "512KB", "100B")
+ // or the string `automatic`, which indicates that the limit will be
+ // automatically calculated based on cluster capacity.
+ // To specify no limit, omit this field.
+ // The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$
+ // Minimum length is 1 character.
+ // Maximum length is 50 characters.
+ // +kubebuilder:validation:MinLength=1
+ // +kubebuilder:validation:MaxLength=50
+ // +optional
+ EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`
If you are going to be continuing with the size format approach as opposed
to using explicit units, yes.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> +)
+
+// AdditionalAlertmanagerConfig represents configuration for additional Alertmanager instances.
+// The `AdditionalAlertmanagerConfig` resource defines settings for how a
+// component communicates with additional Alertmanager instances.
+type AdditionalAlertmanagerConfig struct {
+ // apiVersion defines the Alertmanager API version to target.
+ // Allowed values: "v2". "v1" is no longer supported.
+ // +kubebuilder:validation:Enum=v2
+ // +required
+ APIVersion AlertmanagerAPIVersion `json:"apiVersion,omitempty"`
+ // bearerToken defines the secret reference containing the bearer token
+ // to use when authenticating to Alertmanager.
+ // This is a custom type to allow for admission time validations.
+ // +optional
+ BearerToken SecretKeySelector `json:"bearerToken,omitempty,omitzero"`
For a situation like that, instead of making it free-form I would
encourage the use of a discriminated union
https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#discriminated-unions
Using a DU, you can ensure that only the authentication types that you
want to support are supported on OpenShift *and* you can create concrete
types with appropriate structures and references for each supported type.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // key is the name of the label.
+ // The key must be a valid Prometheus label name, starting with a letter or underscore,
+ // followed by letters, digits, or underscores.
+ // +required
+ // +kubebuilder:validation:MaxLength=63
+ // +kubebuilder:validation:MinLength=1
+ Key string `json:"key,omitempty"`
+ // value is the value of the label.
+ // +required
+ // +kubebuilder:validation:MaxLength=63
+ // +kubebuilder:validation:MinLength=1
+ Value string `json:"value,omitempty"`
+}
+
+// ExternalLabels represents labels to be added to time series and alerts.
+type ExternalLabels struct {
Good catch - I missed this on my last review round.
This does look funky. I'd expect that we probably just want externalLabels
to be typed as a slice of Labels instead of this nesting.
------------------------------
In config/v1alpha1/types_cluster_monitoring.go
<#2463 (comment)>:
> + // key is the client key to use for TLS connections.
+ // +optional
+ Key SecretKeySelector `json:"key,omitempty,omitzero"`
+ // serverName is the server name to use for TLS connections.
+ // If specified, must be a valid DNS subdomain as per RFC 1123.
+ // +optional
+ // +kubebuilder:validation:MinLength=1
+ // +kubebuilder:validation:MaxLength=253
+ ServerName string `json:"serverName,omitempty"`
+ // certificateVerification determines the policy for TLS certificate verification.
+ // Allowed values are "Verify" (performs certificate verification, secure) and "SkipVerify" (skips verification, insecure).
+ // When omitted, defaults to "Verify" (secure certificate verification is performed).
+ // +optional
+ // +kubebuilder:validation:Enum=Verify;SkipVerify
+ // +kubebuilder:default=Verify
+ CertificateVerification string `json:"certificateVerification,omitempty"`
Yes, we typically encourage the use of a type alias for enums
—
Reply to this email directly, view it on GitHub
<#2463 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKAVQ64OAGWVK3MGT7U6RD4CBNXTAVCNFSM6AAAAACFKBDI3SVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTKOBUGUYTAMZZGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
Thanks @everettraven! I'll put up a new PR with the updates in the comments. Even GH API is being a bit sluggish and I'm starting to have a hard time getting to it due to it by default not loading more than 40 comments. There were a LOT before a took this xD |
| } | ||
|
|
||
| // RemoteWriteSpec represents configuration for remote write endpoints. | ||
| type RemoteWriteSpec struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We lack all the queue configuration which is another big struct.
Chatted offline with @danielmellado: I would even go one step further and split the PR into multiple ones, each PR focusing on a sub-part of the configuration which can reasonably be reviewed. For instance, remote write config only has dozens of fields. |
|
Totally +1 there, I *can't* even load the PR anymore and getting unicorns.
For now, I've put up #2630 addressing
yesterday's comments and replies so Bryce can reply and afterwards I'll
split it out.
…On Wed, Dec 17, 2025 at 12:11 PM Simon Pasquier ***@***.***> wrote:
*simonpasquier* left a comment (openshift/api#2463)
<#2463 (comment)>
Thanks @everettraven <https://github.com/everettraven>! I'll put up a new
PR with the updates in the comments. Even GH API is being a bit sluggish
and I'm starting to have a hard time getting to it due to it by default not
loading more than 40 comments. There were a LOT before a took this xD
Chatted offline with @danielmellado <https://github.com/danielmellado>: I
would even go one step further and split the PR into multiple ones, each PR
focusing on a sub-part of the configuration which can reasonably be
reviewed. For instance, remote write config only has dozens of fields.
—
Reply to this email directly, view it on GitHub
<#2463 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKAVQ5V34IZCZYMKL7QJKL4CE25JAVCNFSM6AAAAACFKBDI3SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTMNRUHA3DCOBVHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
No description provided.