
Conversation

@marioferh
Contributor

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 1, 2025
@openshift-ci
Contributor

openshift-ci bot commented Sep 1, 2025

Hello @marioferh! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 1, 2025
@marioferh marioferh force-pushed the prometheus_k8s_config_api branch from 4b7b3a6 to 5cef22c Compare September 15, 2025 13:11
@openshift-ci-robot

/test remaining-required

Scheduling tests matching the pipeline_run_if_changed parameter:
/test e2e-aws-ovn
/test e2e-aws-ovn-hypershift
/test e2e-aws-ovn-hypershift-conformance
/test e2e-aws-ovn-techpreview
/test e2e-aws-serial-1of2
/test e2e-aws-serial-2of2
/test e2e-aws-serial-techpreview-1of2
/test e2e-aws-serial-techpreview-2of2
/test e2e-azure
/test e2e-gcp
/test e2e-upgrade
/test e2e-upgrade-out-of-change

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2025
@marioferh marioferh force-pushed the prometheus_k8s_config_api branch from 5cef22c to 86df303 Compare September 16, 2025 08:52
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 16, 2025
@JoelSpeed
Contributor

@marioferh You appear to have some conflict markers here? Can you PTAL and make sure the correct content is pushed.

Let me know when you want this API reviewed

@marioferh
Contributor Author

Yep, I've mixed up the commits, let me fix it.

@marioferh marioferh force-pushed the prometheus_k8s_config_api branch 3 times, most recently from bf0b7b7 to 64ce7b7 Compare September 16, 2025 17:23
@marioferh marioferh changed the title from "WIP: Monitoring API prometheus k8s config API" to "Monitoring API: prometheus k8s config API" Sep 17, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 17, 2025
@everettraven
Contributor

/assign

Comment on lines 92 to 95
// prometheusK8sConfig provides configuration options for the Prometheus instance
// Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations.
// prometheusK8sConfig is optional.
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
Contributor

Can we have this follow the existing pattern of related fields like the metricsServerConfig field?
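For illustration, a minimal sketch of what that could look like; the field name, wording, and the metricsServerConfig phrasing it mirrors are assumptions here, not the merged API:

// prometheusConfig is an optional field that allows configuring the platform Prometheus
// instance deployed in the openshift-monitoring namespace.
// When omitted, this means no opinion and the platform is left to choose a reasonable
// default, which is subject to change over time.
// +optional
PrometheusConfig PrometheusConfig `json:"prometheusConfig,omitempty,omitzero"`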

Contributor

Specifically, some of the questions I have from reading this as-is are:

  • What is "the Prometheus instance"?
  • Where does it run? Is it run by default?
  • Is Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations all the things I can configure? Why might I care, as a user, to configure these?

Contributor Author

Contributor

Please include helpful information, for end-users that may read this API documentation, in the GoDoc here.

All of the information you included in your response here is helpful context in understanding what changes to this configuration impacts.

Please also include some information in the description as to why users may want to configure this field.

Contributor Author

done

// prometheusK8sConfig is optional.
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
// +optional
PrometheusK8sConfig PrometheusK8sConfig `json:"prometheusK8sConfig,omitempty,omitzero"`
Contributor

Why do we need the K8s in the name here? Feels a bit redundant?

Contributor Author

Maybe to differentiate from prometheusOperator, coming from config map. Fixed.

Contributor
@danielmellado Oct 22, 2025

Yes, that's the reasoning, it was also a bit confusing to me when I first explored the API and the fields.

Contributor
@everettraven Oct 22, 2025

I don't think we have concretely decided on whether or not this should still be a separate root-level field in #2481 (comment) but if we do keep it separate, this should be named more clearly.


I'd agree that we don't need the k8s part.

Comment on lines 430 to 431
// additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from
// the Prometheus component. By default, no additional Alertmanager instances are configured.
Contributor

As someone unfamiliar with configuring Prometheus, I don't really understand what any of this means. Why would I care to configure additional alertmanager instances that receive alerts from Prometheus?

Contributor Author

AdditionalAlertmanagerConfig is used when the client has another Alertmanager and needs the Prometheus alerts delivered there. It can also be used to decouple alerts if needed.

Contributor

To clarify, additionalAlertmanagerConfigs is an optional field that configures Prometheus to send alerts to this set of Alertmanager instances. This is useful because it decouples alerts from what?

Contributor Author

It doesn't decouple alerts from anything, sorry. It just sends the same alerts from Prometheus to another Alertmanager provided by the user.

Contributor

Would something like this potentially clarify what this field does for end users?:

additionalAlertmanagerConfigs is an optional field used to configure how Prometheus communicates
with Alertmanager instances when sending alerts.

We should also include information as to what happens when this field is omitted in the GoDoc. Based on my current understanding, that would likely be something like:

When omitted, Prometheus will not send alerts to any Alertmanager instances.

Is this correct? Is there a default Alertmanager instance that will always be included in the configuration for Prometheus instances?
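As a sketch only, assuming the in-cluster platform Alertmanager remains the default alert destination regardless of this field, the godoc could be assembled along these lines:

// additionalAlertmanagerConfigs is an optional field used to configure how Prometheus
// communicates with additional Alertmanager instances, beyond the platform Alertmanager,
// when sending alerts.
// When omitted, Prometheus sends alerts only to the platform Alertmanager (if enabled).
// +optional
// +kubebuilder:validation:MaxItems=10
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`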

// the Prometheus component. By default, no additional Alertmanager instances are configured.
// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
Contributor

Why atomic? Using atomic makes it so you can't have multiple-owner entries here, which doesn't seem like something unreasonable.

Would this be more appropriate as a listType=map?

Contributor Author

correct. Thx
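A rough sketch of the map-keyed variant discussed above; using url as the merge key is an assumption (any field guaranteed unique per entry would do):

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=url
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`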

// url is the URL of the remote write endpoint.
// +required
// +kubebuilder:validation:MaxLength=2048
URL *string `json:"url,omitempty"`
Contributor

What other constraints can we put in place to ensure the URL is valid?

Contributor Author

done
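For example, constraints along these lines could be applied (illustrative only; the exact CEL rule and bounds are assumptions, not necessarily what was pushed):

// url is the URL of the remote write endpoint. It must be a valid HTTP or HTTPS URL.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=2048
// +kubebuilder:validation:XValidation:rule="isURL(self)",message="url must be a valid URL"
URL string `json:"url,omitempty"`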

// +kubebuilder:validation:MaxLength=2048
URL *string `json:"url,omitempty"`
// name is the name of the remote write configuration.
// +optional
Contributor

What does it mean for this name to not be provided?

Contributor Author

Well, if you add an endpoint and don't fill in the name, data is not sent correctly.

Contributor

So then this should be required?

Contributor

This comment still stands.

Additionally, what other constraints are there on the format of the name?


Name is optional: if not specified, Prometheus will generate a unique one. There are no specific constraints on the name (Prometheus-wise).

Contributor

Since Prometheus has no specific constraints, I'm just enforcing MaxLength here for sanity.

Contributor

Is setting "" the same as omitting the field?

Prometheus has no naming constraints here? So setting something like !@*&(^ would be a valid name? Should we allow that?

Contributor

Agreed. Added MinLength=1 to reject empty strings (omitting is the only way to let Prometheus auto-generate). Also added pattern validation restricting to alphanumeric, hyphens, and underscores to prevent garbage like !@*&(^.
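A sketch of the resulting field; the MaxLength value and the exact pattern are assumptions:

// name is the name of the remote write configuration, used to identify the endpoint
// in Prometheus metrics and logs.
// When omitted, Prometheus generates a unique name for the endpoint.
// +optional
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^[a-zA-Z0-9_-]+$`
Name string `json:"name,omitempty"`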


So setting something like !@*&(^ would be a valid name?

Yes. Same as "☘"

Should we allow that?

I dunno

Comment on lines 637 to 640
// remoteTimeout is the timeout for requests to the remote write endpoint.
// +optional
// +kubebuilder:validation:MaxLength=20
RemoteTimeout *string `json:"remoteTimeout,omitempty"`
Contributor

What does it mean for this to not be provided? what are the allowed values here?

Contributor Author

done

// +kubebuilder:validation:MaxLength=20
RemoteTimeout *string `json:"remoteTimeout,omitempty"`
// writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint.
// +optional
Contributor

What does it mean for this to be omitted?

Contributor Author

It is optional; if you don't fill it in, no relabeling is done.

Contributor

Looks like this comment still needs to be addressed in the GoDoc.

Contributor

Done. Added "When omitted, no relabeling is performed and all metrics are sent as-is." to the godoc.
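For illustration, the field with the updated godoc might read roughly as follows (the MaxItems bound is an assumption):

// writeRelabelConfigs is a list of relabeling rules applied to samples before they are
// sent to the remote write endpoint.
// When omitted, no relabeling is performed and all metrics are sent as-is.
// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
WriteRelabelConfigs []RelabelConfig `json:"writeRelabelConfigs,omitempty"`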

Comment on lines 649 to 784
type RelabelConfig struct {
// sourceLabels is a list of source label names.
// +optional
// +kubebuilder:validation:MaxItems=10
// +kubebuilder:validation:items:MaxLength=63
// +listType=set
SourceLabels []string `json:"sourceLabels,omitempty"`
// separator is the separator used to join source label values.
// +optional
// +kubebuilder:validation:MaxLength=10
Separator *string `json:"separator,omitempty"`
// regex is the regular expression to match against the concatenated source label values.
// +optional
// +kubebuilder:validation:MaxLength=1000
Regex *string `json:"regex,omitempty"`
// targetLabel is the target label name.
// +optional
// +kubebuilder:validation:MaxLength=63
TargetLabel *string `json:"targetLabel,omitempty"`
// replacement is the replacement value for the target label.
// +optional
// +kubebuilder:validation:MaxLength=255
Replacement *string `json:"replacement,omitempty"`
// action is the action to perform.
// +optional
// +kubebuilder:validation:MaxLength=20
Action *string `json:"action,omitempty"`
}
Contributor

Please elaborate in the godoc more on why a user may care to configure these fields, constraints, and what it means for them to be omitted.

As-is I don't really understand what any of these fields result in when configured.

Contributor Author

Those explanations are pretty similar to the Prometheus ones:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config

@openshift-ci
Contributor

openshift-ci bot commented Sep 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from everettraven. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@marioferh
Contributor Author

Some comments addressed, I will continue tomorrow

@marioferh marioferh changed the title from "Monitoring API: prometheus k8s config API" to "MON-4030: prometheus k8s config API" Sep 23, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 23, 2025
@openshift-ci-robot

openshift-ci-robot commented Sep 23, 2025

@marioferh: This pull request references MON-4030 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor
@everettraven left a comment

Another round of comments

// prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster.
// By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform.
//
// This field allows you to customize how Prometheus is deployed and operated, including:
Contributor

Does this configure how all Prometheus instances are deployed and operated or just the default one?

Contributor

same as above, I'll push an updated PR clarifying this ;)

Comment on lines 109 to 111
// For more information on Prometheus configuration, see:
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/
// https://prometheus.io/docs/prometheus/latest/storage/
Contributor

We generally avoid linking to documentation where possible because these docs can be updated and the references may no longer make sense for users.

I don't know that these links are necessary either. We should be appropriately documenting each field in the configuration here such that users can understand what they are configuring and why.

Contributor

Agreed, removed the external Prometheus documentation links. Each child field now has comprehensive documentation.


// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=apiVersion
Contributor

So this must be unique based on the apiVersion?

// +kubebuilder:validation:MaxLength=50
// +kubebuilder:validation:Pattern=`^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$`
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`
Contributor

Does it actually make sense for this to be a string value? This makes it more complex to evaluate valid values.

For example, would it make sense for me to specify 3,000,000,000B here? The "integer" value there is higher than the maximum possible int32 value - is that problematic? What if I specified higher than the maximum possible int64 value in bytes - is that problematic?

Something like:

enforcedBodySizeLimit:
  value: 4000
  units: KB

would probably be easier to put constraints on that forces end users to make reasonable decisions here.

Alternatively, we decide a reasonable base unit that all values provided by a user will use and name the field accordingly (i.e enforcedBodySizeLimitMB)

Contributor

Additionally, would it make sense to nest this under something like a scrapeConfig field?

Looking at https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config there are a lot of options that we aren't yet exposing. Do you ever intend to expand the set of scraping-related configuration options a user can configure?

Contributor

I think my comments here still stand and in general we prefer using an explicit unit name in the field.

See https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#units for reference. In OpenShift APIs we generally prefer the units in the field name approach.

Contributor

Good point on the unit naming convention. Per the K8s API conventions, I'll rename enforcedBodySizeLimit to enforcedBodySizeLimitBytes and use an integer type instead of the string pattern.

Regarding nesting under scrapeConfig and expanding the options: we're not planning to expose the full Prometheus scrape config surface here. This API is intentionally scoped to mirror what's already exposed in the CMO ConfigMap (see cluster-monitoring-operator/pkg/manifests/types.go), not to be a 1:1 mapping of the upstream Prometheus config. So keeping it flat should be fine.
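As a sketch of that rename (the bounds are assumptions, and note that an integer field drops the `automatic` keyword the string form supported):

// enforcedBodySizeLimitBytes enforces an upper limit, in bytes, on the size of
// uncompressed response bodies accepted from scraped targets. Scrapes whose response
// exceeds the limit fail, protecting Prometheus against excessively large payloads.
// When omitted, no limit is enforced.
// +optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=1073741824
EnforcedBodySizeLimitBytes *int64 `json:"enforcedBodySizeLimitBytes,omitempty"`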

Comment on lines 767 to 772
// replacement is the value against which a regex replace is performed if the
// regular expression matches. Regex capture groups are available.
// +optional
// +kubebuilder:validation:MaxLength=255
Replacement *string `json:"replacement,omitempty"`
Contributor

What happens if I don't supply this value?

What is the difference between omitting and providing an empty string?

Contributor

Addressed in latest commit

Contributor

I don't think these questions have been addressed. I still don't understand what happens if I omit this field or set "".

Contributor
@danielmellado Dec 16, 2025

You're right, I've now explicitly documented the difference in the godoc, as there were just so many comments xD:

  • When omitted: defaults to "$1" (the first regex capture group is used as the replacement value)
  • When set to "": explicitly clears/empties the target label value

For example, with sourceLabels: ["env"] where env=production:

  • Omitted → targetLabel gets value "production" (via $1)
  • "" → targetLabel gets empty string ""

Comment on lines 772 to 775
// action is the action to perform.
// +optional
// +kubebuilder:validation:MaxLength=20
Action *string `json:"action,omitempty"`
Contributor

What happens if I don't specify an action?

What are the allowed actions that I can specify?

// +optional
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify
// +kubebuilder:default=Verify
InsecureSkipVerify string `json:"insecureSkipVerify,omitempty"`
Contributor

insecureSkipVerify: Verify and insecureSkipVerify: InsecureSkipVerify read weird and could be confusing for users. Consider using a new field name that better reflects what this field is used for.

// Allowed values are "Verify" (default, secure) and "InsecureSkipVerify" (skip certificate verification, insecure).
// By default, certificate verification is performed ("Verify").
// +optional
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify
Contributor

When using an enum, create a type alias with constants for the allowed values.
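For example, the enum-with-constants pattern could be sketched as follows (the type and constant names here are illustrative, not the names adopted in the PR):

// CertificateVerificationPolicy describes how Prometheus verifies the server certificate
// of a TLS endpoint.
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify
type CertificateVerificationPolicy string

const (
	// CertificateVerificationVerify verifies the server certificate chain and host name.
	CertificateVerificationVerify CertificateVerificationPolicy = "Verify"
	// CertificateVerificationInsecureSkipVerify skips certificate verification. Insecure.
	CertificateVerificationInsecureSkipVerify CertificateVerificationPolicy = "InsecureSkipVerify"
)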

Comment on lines 805 to 854
// LocalObjectReference contains enough information to let you locate the
// referenced object inside the same namespace.
// ---
// New uses of this type are discouraged because of difficulty describing its usage when embedded in APIs.
// 1. Invalid usage help. It is impossible to add specific help for individual usage. In most embedded usages, there are particular
// restrictions like, "must refer only to types A and B" or "UID not honored" or "name must be restricted".
// Those cannot be well described when embedded.
// 2. Inconsistent validation. Because the usages are different, the validation rules are different by usage, which makes it hard for users to predict what will happen.
// 3. We cannot easily change it. Because this type is embedded in many locations, updates to this type
// will affect numerous schemas. Don't make new APIs embed an underspecified API type they do not control.
//
// Instead of using this type, create a locally provided and used type that is well-focused on your reference.
// For example, ServiceReferences for admission registration: https://github.com/kubernetes/api/blob/release-1.17/admissionregistration/v1/types.go#L533 .
// +structType=atomic
type LocalObjectReference struct {
// name of the referent.
// This field is effectively required, but due to backwards compatibility is
// allowed to be empty. Instances of this type with an empty value here are
// almost certainly wrong.
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// +optional
// +default=""
// +kubebuilder:default=""
// +kubebuilder:validation:MaxLength=253
// TODO: Drop `kubebuilder:default` when controller-gen doesn't need it https://github.com/kubernetes-sigs/kubebuilder/issues/3896.
Name *string `json:"name,omitempty" protobuf:"bytes,1,opt,name=name"`
}

// SecretKeySelector selects a key of a Secret.
// +structType=atomic
type SecretKeySelector struct {
// The name of the secret in the pod's namespace to select from.
LocalObjectReference `json:",inline" protobuf:"bytes,1,opt,name=localObjectReference"`
// key of the secret to select from. Must be a valid secret key.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
Key string `json:"key,omitempty" protobuf:"bytes,2,opt,name=key"`
// optional specifies whether the Secret or its key must be defined
// +optional
// +kubebuilder:validation:Enum=true;false
Optional string `json:"optional,omitempty" protobuf:"varint,3,opt,name=optional"`
}
Contributor

How is this different than the existing SecretKeyReference type you've created earlier in this file?

Contributor

Agree here, I think we can just use one. While SecretKeyReference is used for references and SecretKeySelector is for TLS certificates, I think we may want to mimic Prom Op and use just v1.SecretKeySelector.

Contributor

We prefer not importing types for this because they generally don't implement the same granularity of validations that we can impose on a custom type.
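A minimal sketch of a single, locally-defined reference type that could replace both (names and limits are assumptions):

// SecretKeyReference identifies a key within a Secret in the same namespace.
// +structType=atomic
type SecretKeyReference struct {
	// name is the metadata.name of the referenced Secret.
	// +required
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=253
	Name string `json:"name,omitempty"`

	// key is the key within the Secret whose value will be used.
	// +required
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=253
	Key string `json:"key,omitempty"`
}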

@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from 505533b to da2c1ef Compare December 2, 2025 09:49
@coderabbitai

coderabbitai bot commented Dec 2, 2025

Walkthrough

Adds an optional PrometheusK8sConfig to ClusterMonitoringSpec, introduces many new public types/enums for Prometheus/Alertmanager/remote-write, regenerates deepcopy/swagger/OpenAPI code and CRD payloads (with duplicate prometheusK8sConfig insertions), and extends onCreate validation tests for PrometheusK8sConfig.

Changes

Cohort / File(s) Summary
Type definitions
config/v1alpha1/types_cluster_monitoring.go
Adds PrometheusK8sConfig field to ClusterMonitoringSpec and many exported types/enums (e.g., PrometheusK8sConfig, AdditionalAlertmanagerConfig, ExternalLabels, Label, RemoteWriteSpec, RelabelConfig, TLSConfig, SecretKeySelector, SecretRequirement, RelabelAction, CollectionProfile, AlertmanagerAPIVersion, AlertmanagerScheme) with comments and validations.
Generated deepcopy
config/v1alpha1/zz_generated.deepcopy.go
Adds DeepCopyInto/DeepCopy implementations for all new types and updates ClusterMonitoringSpec.DeepCopyInto to deep-copy PrometheusK8sConfig.
Generated swagger docs
config/v1alpha1/zz_generated.swagger_doc_generated.go
Adds SwaggerDoc maps and SwaggerDoc() methods for the new types and updates ClusterMonitoringSpec swagger entry to include prometheusK8sConfig.
Generated CRD manifests (config)
config/v1alpha1/zz_generated.crd-manifests/...clustermonitorings-*.crd.yaml, config/v1alpha1/zz_generated.featuregated-crd-manifests/.../ClusterMonitoringConfig.yaml
Inserts comprehensive prometheusK8sConfig property into ClusterMonitoring v1alpha1 CRD schemas with extensive nested fields/validation (additionalAlertmanagerConfigs, collectionProfile, enforcedBodySizeLimit, externalLabels, logLevel, nodeSelector, queryLogFile, remoteWrite, resources, retention/retentionSize, tolerations, topologySpreadConstraints, volumeClaimTemplate, TLS/secret refs, etc.). Duplicate insertions of the same block observed.
Payload CRD manifests
payload-manifests/crds/...clustermonitorings-*.crd.yaml
Adds the same prometheusK8sConfig property to payload CRD manifests (CustomNoUpgrade, DevPreviewNoUpgrade, TechPreviewNoUpgrade) with matching nested schema; duplicates present in some files.
OpenAPI / generated OpenAPI JSON
openapi/generated_openapi/zz_generated.openapi.go, openapi/openapi.json
Adds OpenAPI definitions for new v1alpha1 types and updates schema references so ClusterMonitoringSpec includes prometheusK8sConfig and related nested types.
Validation tests
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
Adds onCreate tests covering valid PrometheusK8sConfig variants (minimal, nodeSelector, resources, tolerations, topologySpreadConstraints, additionalAlertmanagerConfigs, remoteWrite, externalLabels) and numerous rejection scenarios (empty object, exceeding max items, empty entries).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • Duplicate prometheusK8sConfig blocks in generated and feature-gated CRD manifests.
    • Consistency between type validations, OpenAPI definitions, and CRD constraints (minProperties, maxItems, enums, patterns).
    • Correct deepcopy semantics for maps/slices/pointers in newly generated code.
    • Swagger/OpenAPI doc entries for duplication or conflicting descriptions.
    • Validation test coverage to ensure CRD schema and admission validation align.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented
The command is terminated due to an error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented



@danielmellado
Contributor

@everettraven I've gone through all the comments and taken a look at most of the concerns; could you take another, fresh look at it? Thanks!

@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from 8473dba to e748202 Compare December 3, 2025 12:50
@coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (9)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Incorrect component name in logLevel description (duplicate).

Line 1636 states "logLevel defines the verbosity of logs emitted by Alertmanager," but this field is under PrometheusK8sConfig and should refer to "Prometheus," not "Alertmanager." This appears to be copy-pasted from the alertmanagerConfig section.

                  logLevel:
                    description: |-
-                      logLevel defines the verbosity of logs emitted by Alertmanager.
+                      logLevel defines the verbosity of logs emitted by Prometheus.
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1634-1651: Fix logLevel description — should reference Prometheus, not Alertmanager.

The description at line 1636 incorrectly refers to "logs emitted by Alertmanager" when this field is part of prometheusK8sConfig and should refer to Prometheus. This appears to have been copied from the alertmanagerConfig.customConfig.logLevel field without updating the component name.

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1634-1651: ⚠️ Unresolved: logLevel description still references Alertmanager instead of Prometheus.

The description at Line 1635 states: "logLevel defines the verbosity of logs emitted by Alertmanager", but this field resides within prometheusK8sConfig, which configures Prometheus, not Alertmanager. This is a copy-paste error from the Alertmanager section that appears not to have been fixed.

Update the description to reference Prometheus instead. The source type definition in config/v1alpha1/types_cluster_monitoring.go should be corrected first, then the CRD regenerated.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Resolve incorrect component reference in logLevel description (unresolved from previous review).

Line 1636 incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager". This field belongs to prometheusK8sConfig, so the description should refer to Prometheus, not Alertmanager. This issue was previously flagged but remains unresolved.

Apply this diff to fix the description:

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
      for debugging issues or reducing noise in production environments.
      Allowed values are Error, Warn, Info, and Debug.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Fix copy-paste error: Replace "Alertmanager" with "Prometheus" in logLevel description.

The logLevel field description incorrectly references "Alertmanager" when this field is under prometheusK8sConfig and should reference "Prometheus". This appears to be a copy-paste error from the Alertmanager configuration section.

Apply this fix:

                  logLevel:
                    description: |-
-                     logLevel defines the verbosity of logs emitted by Alertmanager.
+                     logLevel defines the verbosity of logs emitted by Prometheus.
                      This field allows users to control the amount and severity of logs generated, which can be useful
                      for debugging issues or reducing noise in production environments.
                      Allowed values are Error, Warn, Info, and Debug.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Fix logLevel description to reference Prometheus, not Alertmanager (same issue as earlier review).

This logLevel field is part of prometheusK8sConfig, but the description still says “logs emitted by Alertmanager.” It should describe Prometheus log verbosity instead, mirroring semantics but naming the correct component, and then regenerate the CRD.

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

260-260: PrometheusK8sConfig.logLevel swagger text still references Alertmanager instead of Prometheus.

In map_PrometheusK8sConfig["logLevel"], the text says “logs emitted by Alertmanager” but this field belongs to PrometheusK8sConfig. Update the Go comment for PrometheusK8sConfig.LogLevel to reference Prometheus, then regenerate swagger so this entry matches.

config/v1alpha1/types_cluster_monitoring.go (2)

655-660: Documentation claims default value but field is required without a default.

The scheme field is marked +required with no +kubebuilder:default, yet line 657 states "The default value is http." Since users must explicitly provide a value, remove the misleading default claim.

-	// Possible values are `http` or `https`. The default value is `http`.
+	// Possible values are `http` or `https`.

100-100: Documentation mentions "remote write/read" but only remote write is supported.

The comment mentions "remote write/read" but the PrometheusK8sConfig struct only exposes RemoteWrite configuration—there is no RemoteRead field. Update the documentation to say "remote write" only, or clarify that remote read may be added in the future.

-	//   - External integrations (remote write/read, alerting, and scraping configuration)
+	//   - External integrations (remote write, alerting, and scraping configuration)
🧹 Nitpick comments (4)
config/v1alpha1/types_cluster_monitoring.go (4)

683-695: Missing validation for Prometheus label name format on key field.

The documentation states the key must be "a valid Prometheus label name, starting with a letter or underscore, followed by letters, digits, or underscores," but there's no CEL or pattern validation enforcing this. Invalid label names will be accepted at admission and fail at runtime.

Add validation to enforce the documented constraint:

 	// key is the name of the label.
 	// The key must be a valid Prometheus label name, starting with a letter or underscore,
 	// followed by letters, digits, or underscores.
 	// +required
 	// +kubebuilder:validation:MaxLength=63
 	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z_][a-zA-Z0-9_]*$')",message="key must be a valid Prometheus label name"
 	Key string `json:"key,omitempty"`

721-726: Missing format validation for remoteTimeout duration string.

The documentation states the value must be "a valid Go time.Duration string (e.g. 30s, 5m, 1h)" but there's no validation enforcing this format. Invalid values will be accepted at admission time.

Consider adding CEL validation to match duration patterns:

 	// remoteTimeout is the timeout for requests to the remote write endpoint.
 	// When omitted, the default is 30s.
 	// +optional
 	// +kubebuilder:validation:MaxLength=20
 	// +kubebuilder:validation:MinLength=2
+	// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+(ns|us|µs|ms|s|m|h)$')",message="remoteTimeout must be a valid duration (e.g. 30s, 5m, 1h)"
 	RemoteTimeout string `json:"remoteTimeout,omitempty"`

781-786: Consider adding DNS subdomain validation for serverName.

The documentation states serverName "must be a valid DNS subdomain as per RFC 1123" but there's no validation enforcing this. Consider adding CEL validation:

 	// serverName is the server name to use for TLS connections.
 	// If specified, must be a valid DNS subdomain as per RFC 1123.
 	// +optional
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=253
+	// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="serverName must be a valid DNS subdomain"
 	ServerName string `json:"serverName,omitempty"`

799-808: Add Kubernetes secret name validation to name field.

The name field references a Kubernetes Secret, which must follow DNS-1123 subdomain naming rules. Add validation to catch invalid names at admission time:

 	// name is the name of the secret in the same namespace to select from.
 	// +required
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=253
+	// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="name must be a valid Kubernetes secret name (DNS subdomain)"
 	Name string `json:"name,omitempty"`
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 8473dba and e748202.

📒 Files selected for processing (11)
  • config/v1alpha1/types_cluster_monitoring.go (3 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (8 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (4 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • openapi/generated_openapi/zz_generated.openapi.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/types_cluster_monitoring.go
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (3)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
  • AdditionalAlertmanagerConfig (636-679)
  • ExternalLabels (698-708)
  • Label (682-695)
  • PrometheusK8sConfig (447-618)
  • RelabelConfig (736-768)
  • RemoteWriteSpec (711-733)
  • SecretKeySelector (798-823)
  • TLSConfig (771-794)
config/v1alpha1/types_cluster_monitoring.go (2)
config/v1alpha1/zz_generated.swagger_doc_generated.go (9)
  • PrometheusK8sConfig (273-275)
  • AdditionalAlertmanagerConfig (132-134)
  • ExternalLabels (227-229)
  • RemoteWriteSpec (299-301)
  • ContainerResource (218-220)
  • SecretKeySelector (310-312)
  • TLSConfig (323-325)
  • Label (237-239)
  • RelabelConfig (287-289)
operator/v1/types.go (1)
  • LogLevel (94-94)
config/v1alpha1/zz_generated.deepcopy.go (1)
config/v1alpha1/types_cluster_monitoring.go (9)
  • AdditionalAlertmanagerConfig (636-679)
  • TLSConfig (771-794)
  • PrometheusK8sConfig (447-618)
  • ExternalLabels (698-708)
  • Label (682-695)
  • RemoteWriteSpec (711-733)
  • ContainerResource (318-348)
  • RelabelConfig (736-768)
  • SecretKeySelector (798-823)
🔇 Additional comments (13)
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1635-1636: Fix logLevel description to reference Prometheus, not Alertmanager.

The logLevel field description at lines 1635–1636 in the CRD file should reference Prometheus, not Alertmanager. Update the description, the corresponding Go doc comment in config/v1alpha1/types_cluster_monitoring.go, and regenerate the CRD.

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

121-134: New swagger docs for Prometheus-related helper types look consistent with Go types.

The swagger maps for AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig (aside from the logLevel wording), RelabelConfig, RemoteWriteSpec, SecretKeySelector, and TLSConfig closely mirror the struct fields and kubebuilder annotations in types_cluster_monitoring.go, including enums, list semantics, and validation constraints. No additional issues from a docs/API-shape perspective.

Also applies to: 222-239, 255-275, 277-301, 303-325

openapi/generated_openapi/zz_generated.openapi.go (9)

424-424: LGTM: Schema registrations correctly added.

The new v1alpha1 type registrations are properly formatted and follow the established pattern.

Also applies to: 442-442, 453-453, 465-465, 467-468, 472-472, 474-474


21918-21924: LGTM: PrometheusK8sConfig properly integrated.

The new prometheusK8sConfig field is correctly added to ClusterMonitoringSpec with comprehensive documentation and proper dependency declarations.

Also applies to: 21936-21936


22032-22068: Verify min/max label constraints are enforced.

Line 22049 documents "At least 1 label must be specified, with a maximum of 50 labels," but the schema lacks minItems: 1 and maxItems: 50 constraints. Confirm these limits are validated via CRD validation or admission logic.


22462-22488: LGTM: Label schema correctly defined.

The key/value structure is appropriate. Note that line 22471 describes a Prometheus label name pattern, but schema-level regex validation is absent—confirm this is validated elsewhere.


22930-23119: LGTM: PrometheusK8sConfig schema is comprehensive and well-structured.

This central configuration type correctly references all dependencies. Multiple descriptions document size limits (e.g., max 10 items for various arrays), but as with other types, ensure these are enforced via CRD validation rules or admission webhooks.


23149-23272: LGTM: RelabelConfig and RemoteWriteSpec schemas are correctly defined.

Both types have appropriate required fields and dependencies. Note that line 23206 lists valid actions for RelabelConfig without enum enforcement, and line 23253 documents a maximum without maxItems—consistent with the validation pattern throughout this API.


23368-23406: LGTM: SecretKeySelector follows Kubernetes patterns correctly.

The schema properly uses x-kubernetes-map-type: atomic to ensure atomicity of secret references, with appropriate required fields.


23438-23486: LGTM: TLSConfig schema is correctly structured.

All fields appropriately optional for flexible TLS configuration, with proper references to SecretKeySelector for sensitive data.


21227-21302: Verify that documented limits are enforced elsewhere.

The description states "Maximum of 10 endpoints can be specified," but the schema lacks a maxItems constraint. Confirm this limit is enforced via CRD validation rules or admission webhooks, or update the description if the limit is not actually enforced.

config/v1alpha1/types_cluster_monitoring.go (1)

620-631: Enum types are well-defined with proper validation.

The AlertmanagerAPIVersion, AlertmanagerScheme, RelabelAction, and CollectionProfile types have appropriate enum constraints and constant definitions. The lowercase values for RelabelAction align with Prometheus conventions.

Also applies to: 836-867

config/v1alpha1/zz_generated.deepcopy.go (1)

14-39: Autogenerated deepcopy implementations are correct.

The generated DeepCopyInto and DeepCopy methods for the new types correctly handle:

  • Slice allocation and element copying (including calling DeepCopyInto for complex elements)
  • Map allocation and key-value copying
  • Pointer field allocation and value copying
  • Nil checks for pointer and slice fields

The struct assignments for TLSConfig and SecretKeySelector are correct since they only contain value types (strings).

Also applies to: 460-479, 734-748, 994-1061, 1089-1156, 1216-1270

Comment on lines 1287 to 1300
prometheusK8sConfig:
description: |-
prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster.
By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform.
This field allows you to customize how Prometheus is deployed and operated, including:
- Deployment settings (such as replica count and update strategy)
- Pod scheduling (node selectors, tolerations, affinity)
- Resource allocation (CPU, memory, and storage requests/limits)
- Retention policies (how long metrics are stored)
- External integrations (remote write/read, alerting, and scraping configuration)
Configuring Prometheus is important because it enables you to tailor monitoring to your cluster's needs, such as:
- Ensuring high availability and reliability of monitoring data
- Managing resource usage to fit your infrastructure
- Controlling how long metrics are retained for compliance or troubleshooting
- Integrating with external systems for alerting or long-term storage
- Adjusting scraping and relabeling to match your workloads and security requirements
For more information on Prometheus configuration, see:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
https://prometheus.io/docs/prometheus/latest/storage/
This field is optional. When omitted, the platform chooses reasonable defaults, which may change over time.
minProperties: 1
@coderabbitai bot Dec 3, 2025

⚠️ Potential issue | 🟠 Major

Align prometheusK8sConfig description with actual fields (no replicas/update-strategy/remote read/scrape config exposed).

The doc block claims control over replica count, update strategy, remote read, and scraping configuration, but the prometheusK8sConfig schema here only exposes alertmanager integrations, external labels, node selector/tolerations/topology spread, resources, retention, remoteWrite, and PVC. Either add the missing fields or update this description (and the upstream Go field comment) to only advertise what’s actually configurable, then regenerate the CRD.

🤖 Prompt for AI Agents
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
around lines 1287-1311: the prometheusK8sConfig description advertises control
over replicas, update strategy, remote read and scraping configuration which are
not present in the schema; either add those fields to the API or update the
description to exactly list the exposed fields (alertmanager integrations,
externalLabels, nodeSelector/tolerations/topologySpread, resources, retention,
remoteWrite and PVC), then update the upstream Go field comment to match and
regenerate the CRD (run the repo’s CRD generation/make target) so the YAML and
comments stay in sync.

Contributor

explained above


@danielmellado Thanks for the response! I don't see the explanation in this comment thread, so I may have missed it in another part of the PR discussion.

Could you clarify or point me to where this discrepancy was addressed? Specifically, the concern is that the prometheusK8sConfig description mentions:

  • "replica count and update strategy"
  • "remote read"
  • "scraping configuration"

...but these fields don't appear to be present in the actual schema (which only exposes additionalAlertmanagerConfigs, collectionProfile, enforcedBodySizeLimit, externalLabels, logLevel, nodeSelector, queryLogFile, remoteWrite, resources, retention/retentionSize, tolerations, topologySpreadConstraints, and volumeClaimTemplate).

If this was intentionally left as future-looking documentation or if there's a plan to add these fields later, that's totally fine—I just want to make sure I'm not missing something. Let me know if you'd like me to mark this as resolved or if there's additional context I should review!

@everettraven
Contributor

@danielmellado It looks like you've picked this work back up. As it has been sitting for quite some time, let me know when you are ready for another round of review.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2025
@danielmellado
Contributor

Hi @everettraven, as soon as I rebase it, I think it's ready for another review, thanks!

@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from e748202 to 0d09af6 Compare December 9, 2025 15:21
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2025
@coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (11)
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1634-1651: Fix logLevel description to reference Prometheus (not Alertmanager).

Line 1636 states "logs emitted by Alertmanager," but this field is in prometheusK8sConfig and should reference Prometheus. This appears to be a copy-paste error from the earlier alertmanagerConfig.customConfig.logLevel section (line 66–83).

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Incorrect component name in logLevel description.

Line 1636 states "defines the verbosity of logs emitted by Alertmanager," but this field is for PrometheusK8sConfig and should refer to Prometheus, not Alertmanager. This appears to be copy-pasted from the alertmanagerConfig section.

                  logLevel:
                    description: |-
-                      logLevel defines the verbosity of logs emitted by Alertmanager.
+                      logLevel defines the verbosity of logs emitted by Prometheus.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Fix incorrect component reference in logLevel description (not addressed from previous review).

The logLevel field description (line 1636) still incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager". Since this field belongs to prometheusK8sConfig, it should refer to Prometheus instead. This was flagged in a previous review but the fix was not applied.

Apply this diff to correct the description:

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
      for debugging issues or reducing noise in production environments.
      Allowed values are Error, Warn, Info, and Debug.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Fix logLevel description—replace "Alertmanager" with "Prometheus".

Line 1636 incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager," but this field is nested within prometheusK8sConfig and controls Prometheus log verbosity, not Alertmanager's. The description must reference the correct component to avoid confusing API users.

                  logLevel:
                    description: |-
-                     logLevel defines the verbosity of logs emitted by Alertmanager.
+                     logLevel defines the verbosity of logs emitted by Prometheus.
                      This field allows users to control the amount and severity of logs generated, which can be useful
                      for debugging issues or reducing noise in production environments.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Fix Prometheus logLevel description (currently mentions Alertmanager).

Within prometheusK8sConfig.logLevel the description still says “logs emitted by Alertmanager”, but this field configures Prometheus logging. That’s a copy/paste error and can confuse consumers of the API.

Please update the source comment on PrometheusK8sConfig.LogLevel in config/v1alpha1/types_cluster_monitoring.go to reference Prometheus instead, then regenerate swagger/OpenAPI and CRDs so both this YAML and the Go docs stay in sync.

For example, in PrometheusK8sConfig:

-	// logLevel defines the verbosity of logs emitted by Alertmanager.
+	// logLevel defines the verbosity of logs emitted by Prometheus.

Also applies to: 1287-2528

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

255-275: Correct logLevel description in PrometheusK8sConfig Swagger docs (Alertmanager vs Prometheus).

map_PrometheusK8sConfig["logLevel"] still describes “logs emitted by Alertmanager”, but this field belongs to PrometheusK8sConfig and controls Prometheus logging. This mirrors the CRD issue and comes from the same Go comment on PrometheusK8sConfig.LogLevel.

Once you fix the Go comment to say “Prometheus” and rerun the generators, this swagger text (and all CRDs) will update automatically; no need to edit the generated files directly.

Also applies to: 121-325

config/v1alpha1/types_cluster_monitoring.go (5)

683-689: Add validation for Prometheus label name format on Key field.

Per past review feedback, the Key field should validate that label names conform to Prometheus requirements. Prometheus label names must match ^[a-zA-Z_][a-zA-Z0-9_]*$. Without this, users could specify invalid names like my-label (with hyphen) that will fail at runtime.

 	// key is the name of the label.
 	// The key must be a valid Prometheus label name, starting with a letter or underscore,
 	// followed by letters, digits, or underscores.
 	// +required
 	// +kubebuilder:validation:MaxLength=63
 	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z_][a-zA-Z0-9_]*$')",message="key must be a valid Prometheus label name: start with letter or underscore, followed by letters, digits, or underscores"
 	Key string `json:"key,omitempty"`

100-100: Documentation mentions "remote write/read" but only remote write is exposed.

The comment on line 100 states "External integrations (remote write/read, alerting, and scraping configuration)" but the PrometheusK8sConfig struct only exposes RemoteWrite — there are no RemoteRead fields. This could confuse users expecting remote read configuration.

Update to say "remote write" only, or clarify that remote read may be added in the future.


483-494: Documentation incorrectly references Alertmanager instead of Prometheus.

Line 483 states "logLevel defines the verbosity of logs emitted by Alertmanager" but this field is within PrometheusK8sConfig and should reference Prometheus. This appears to be a copy-paste error from AlertmanagerCustomConfig.

-	// logLevel defines the verbosity of logs emitted by Alertmanager.
+	// logLevel defines the verbosity of logs emitted by Prometheus.

642-646: Remove implementation detail from user-facing documentation.

Line 644 states "This is a custom type to allow for admission time validations" which is an implementation detail that shouldn't be in user-facing godoc. This was flagged in past review.

 	// bearerToken defines the secret reference containing the bearer token
 	// to use when authenticating to Alertmanager.
-	// This is a custom type to allow for admission time validations.
 	// +optional

647-660: PathPrefix example references v1 but only v2 is supported; Scheme default claim is misleading.

Two documentation issues per past review:

  1. Line 649 mentions /api/v1/alerts in the example, but apiVersion only allows v2. Update to /api/v2/alerts.

  2. Line 657 says "The default value is http" but the field is +required with no +kubebuilder:default, so users must always provide a value.

 	// pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints.
 	// For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
-	// set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
+	// set this to "/alertmanager" so requests go to "/alertmanager/api/v2/alerts" instead of "/api/v2/alerts".
 	// scheme defines the URL scheme to use when communicating with Alertmanager
 	// instances.
-	// Possible values are `http` or `https`. The default value is `http`.
+	// Possible values are `http` or `https`.
🧹 Nitpick comments (2)
openapi/generated_openapi/zz_generated.openapi.go (2)

21227-21302: Documented constraints are not enforced by the schema

In schema_openshift_api_config_v1alpha1_AdditionalAlertmanagerConfig, several fields document strict constraints (e.g. apiVersion only v2, scheme only http/https with a default, staticConfigs as <host>:<port> with a max of 10 entries), but the OpenAPI schema here exposes them as plain strings/arrays with no Enum, Pattern, MinItems, or MaxItems. This means invalid configs will pass CRD validation and only fail later when the operator consumes them.

If you intend these constraints to be enforced at admission time, consider adding the appropriate +kubebuilder:validation tags on the Go types (enums, patterns, min/max items) so that openapi-gen emits them. The same observation applies to other new types where the descriptions mention limits or allowed values (e.g. ExternalLabels.labels, several fields on PrometheusK8sConfig like enforcedBodySizeLimit, remoteWrite, resources, tolerations, topologySpreadConstraints, collectionProfile, and RemoteWriteSpec.writeRelabelConfigs).
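
For illustration, a minimal sketch of how those constraints could be expressed as kubebuilder markers so controller-gen emits them into the CRD schema. The type names, pattern, and limits below are assumptions made for the example, not the values agreed for this API:

// StaticConfigSketch is a hypothetical "<host>:<port>" endpoint string with
// the documented format enforced at admission time (simplified host pattern).
// +kubebuilder:validation:Pattern=`^[a-zA-Z0-9.-]+:[0-9]{1,5}$`
// +kubebuilder:validation:MaxLength=255
type StaticConfigSketch string

// AdditionalAlertmanagerConfigSketch shows constraints that currently live
// only in field descriptions being enforced at admission time instead.
type AdditionalAlertmanagerConfigSketch struct {
    // scheme is restricted to the two documented values.
    // +kubebuilder:validation:Enum=http;https
    // +kubebuilder:default=http
    // +optional
    Scheme string `json:"scheme,omitempty"`

    // staticConfigs entries must match "<host>:<port>" and the list is bounded.
    // +kubebuilder:validation:MinItems=1
    // +kubebuilder:validation:MaxItems=10
    // +listType=atomic
    // +required
    StaticConfigs []StaticConfigSketch `json:"staticConfigs,omitempty"`
}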


23368-23486: Consider enum + fail‑closed handling for SecretKeySelector.optional and TLSConfig.certificateVerification

SecretKeySelector.optional and TLSConfig.certificateVerification are documented as having a small, fixed set of string values (Required vs Optional, Verify vs SkipVerify), but the OpenAPI schema exposes them as unrestricted strings with no enum. That makes it easy for users to typo values and for the implementation to accidentally treat unknown values like “SkipVerify”/“Optional”, which would be a security regression.

I’d recommend:

  • Modeling these as explicit enum types in the Go API with +kubebuilder:validation:Enum=... so bad values are rejected at admission.
  • Verifying that the consuming code treats unknown/empty values in a fail‑closed way (i.e. defaulting to Required / Verify, never to Optional / SkipVerify).

This would harden the API surface without changing the intended semantics.
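
As a rough sketch of both halves of that suggestion, assuming an enum for the SecretKeySelector optional field (type, constant, and function names below are placeholders, not this PR's final API):

// SecretRequirementSketch restricts the "optional" field to the two documented
// values so typos are rejected at admission time.
// +kubebuilder:validation:Enum=Required;Optional
type SecretRequirementSketch string

const (
    // SecretRequirementRequired means a missing Secret or key is an error.
    SecretRequirementRequired SecretRequirementSketch = "Required"
    // SecretRequirementOptional means a missing Secret or key is tolerated.
    SecretRequirementOptional SecretRequirementSketch = "Optional"
)

// isRequired shows fail-closed handling in the consuming code: anything other
// than an explicit "Optional", including empty or unknown values, is treated
// as Required.
func isRequired(r SecretRequirementSketch) bool {
    return r != SecretRequirementOptional
}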

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between e748202 and 0d09af6.

📒 Files selected for processing (12)
  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/types_cluster_monitoring.go (3 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (8 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (4 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/types_cluster_monitoring.go
  • openapi/generated_openapi/zz_generated.openapi.go
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (2)
config/v1alpha1/types_cluster_monitoring.go (2)
config/v1alpha1/zz_generated.swagger_doc_generated.go (9)
  • PrometheusK8sConfig (273-275)
  • AdditionalAlertmanagerConfig (132-134)
  • ExternalLabels (227-229)
  • RemoteWriteSpec (299-301)
  • ContainerResource (218-220)
  • SecretKeySelector (310-312)
  • TLSConfig (323-325)
  • Label (237-239)
  • RelabelConfig (287-289)
operator/v1/types.go (1)
  • LogLevel (94-94)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
  • AdditionalAlertmanagerConfig (636-679)
  • ExternalLabels (698-708)
  • Label (682-695)
  • PrometheusK8sConfig (447-618)
  • RelabelConfig (736-768)
  • RemoteWriteSpec (711-733)
  • SecretKeySelector (798-823)
  • TLSConfig (771-794)
🔇 Additional comments (27)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

354-778: ✅ Comprehensive test coverage for PrometheusK8sConfig validation.

The test cases thoroughly validate PrometheusK8sConfig behavior across multiple scenarios: minimal config, valid configurations with various optional fields (nodeSelector, resources, tolerations, topologySpreadConstraints, additionalAlertmanagerConfigs, remoteWrite, externalLabels), and rejection cases for constraint violations. Enum values are correctly cased (e.g., "Info" not "info").

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1634-1651: Fix logLevel description: should reference Prometheus, not Alertmanager.

Line 1636 incorrectly states "logs emitted by Alertmanager," but this field configures Prometheus logging (it is nested under prometheusK8sConfig). Update the description to "logs emitted by Prometheus" and regenerate the CRD from the updated Go type doc comments.

Apply this diff to update the description:

                  logLevel:
                    description: |-
-                      logLevel defines the verbosity of logs emitted by Alertmanager.
+                      logLevel defines the verbosity of logs emitted by Prometheus.
                       This field allows users to control the amount and severity of logs generated, which can be useful
                       for debugging issues or reducing noise in production environments.
                       Allowed values are Error, Warn, Info, and Debug.

After updating the Go doc comment in config/v1alpha1/types_cluster_monitoring.go, regenerate the CRD manifest to sync the description.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2)

1562-1575: ✓ collectionProfile enum–description consistency resolved.

The description correctly shows Full and Minimal in PascalCase, matching the enum values. This aligns with the prior review feedback (commit 3b005e4).


1287-2528: Verify whether prometheusK8sConfig duplication exists.

The AI summary indicates duplicate prometheusK8sConfig entries in the CRD OpenAPI schema block. However, the provided excerpt shows only one definition (lines 1287–2528). Duplication may exist in:

  • The CustomNoUpgrade or DevPreviewNoUpgrade CRD variants (mentioned in the summary as also having duplicates)
  • The full file beyond the provided range
  • Generated artifacts that were not regenerated cleanly

Please confirm whether duplication is present and, if so, whether it was introduced during code generation.

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

191-197: ClusterMonitoringSpec.prometheusK8sConfig Swagger doc looks consistent with the CRD.

The high-level description for prometheusK8sConfig matches the CRD and underlying intent; no additional structural or validation issues from the swagger side.

openapi/generated_openapi/zz_generated.openapi.go (4)

421-477: OpenAPI registration for new v1alpha1 types looks consistent

The new v1alpha1 schemas (AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, TLSConfig) are correctly wired into the OpenAPI definitions map alongside related config types; no issues from a registration/lookup perspective.


21918-21937: Wiring of prometheusK8sConfig into ClusterMonitoringSpec looks correct

The new prometheusK8sConfig property references v1alpha1.PrometheusK8sConfig with an empty-object default, and the dependencies slice is updated to include that type alongside the existing config types. From an OpenAPI/wiring perspective this addition looks sound.


22032-22069: ExternalLabels / Label schema structure matches intended usage

Label as a simple {key,value} object and ExternalLabels as a list-map keyed by key is a good fit for enforcing uniqueness of label keys and mirrors Prometheus’ external labels model. Aside from the earlier general note about documenting min/max label counts without schema-level enforcement, the structure and references here look correct.

Also applies to: 22462-22488


23149-23272: RelabelConfig / RemoteWriteSpec model aligns with Prometheus semantics

RelabelConfig closely follows the standard Prometheus relabel rule shape (sourceLabels, separator, regex, targetLabel, replacement, action), and RemoteWriteSpec uses that for writeRelabelConfigs with url required, which matches expectations for remote_write configuration. Structurally and from a reference standpoint this looks correct; just keep in mind the earlier comment if you decide you want schema‑level enforcement of allowed action values or max relabel rules.

config/v1alpha1/types_cluster_monitoring.go (9)

443-461: LGTM on additionalAlertmanagerConfigs structure.

The listType=atomic approach and documentation for additionalAlertmanagerConfigs addresses the prior review feedback about uniqueness constraints. The field correctly documents the max items limit and behavior when omitted.


505-518: LGTM on queryLogFile documentation and validation.

The field has clear documentation explaining supported paths (including /dev/stderr, /dev/stdout, /dev/null), restrictions on other /dev/ paths, and appropriate length constraints.


519-527: LGTM on remoteWrite configuration.

Using url as the list map key ensures uniqueness per remote endpoint, and the documentation clearly explains the purpose and links to Prometheus documentation.


600-617: LGTM on collectionProfile; note on volumeClaimTemplate pattern.

The collectionProfile field has clear documentation with meaningful enum values. The volumeClaimTemplate uses v1.PersistentVolumeClaim directly, which matches the existing pattern used in AlertmanagerCustomConfig (line 278). While past review suggested using a custom type for better validation, this maintains consistency with the existing API surface in this file.


710-733: LGTM on RemoteWriteSpec.

The required url field with length constraints and optional configuration fields are well-structured. Using listType=atomic for writeRelabelConfigs is appropriate for relabeling rule ordering.


735-768: LGTM on RelabelConfig.

The structure appropriately mirrors Prometheus relabel_config with the action field being required and other fields optional. Length constraints are reasonable.


770-794: LGTM on TLSConfig with improved certificateVerification naming.

The certificateVerification field with Verify/SkipVerify enum values is much clearer than the previous insecureSkipVerify naming. The default of Verify ensures secure behavior by default.


796-834: LGTM on SecretKeySelector with detailed Optional behavior documentation.

The optional field documentation (lines 809-818) thoroughly explains the runtime behavior when set to Optional and the Secret/key is missing. This addresses past review feedback and provides clear guidance to users.


836-867: LGTM on RelabelAction and CollectionProfile enums.

RelabelAction correctly uses lowercase values to match Prometheus conventions, while CollectionProfile uses CamelCase per OpenShift conventions. Both have well-documented constants.

config/v1alpha1/zz_generated.deepcopy.go (9)

14-39: LGTM on AdditionalAlertmanagerConfig deepcopy.

The auto-generated deepcopy correctly handles StaticConfigs slice and TLSConfig pointer. The BearerToken value copy is correct since SecretKeySelector contains only value types.


389-397: LGTM on ClusterMonitoringSpec deepcopy integration.

The PrometheusK8sConfig.DeepCopyInto call follows the established pattern for other config fields.


460-479: LGTM on ExternalLabels deepcopy.

The Labels slice is correctly deep copied.


734-748: LGTM on Label deepcopy.

Simple value copy is correct for a struct with only string fields.


994-1061: LGTM on PrometheusK8sConfig deepcopy.

All complex fields are correctly handled:

  • Slices with struct elements use per-element DeepCopyInto
  • Maps are properly deep copied
  • Pointer fields allocate new memory and copy values

1089-1128: LGTM on RelabelConfig deepcopy.

All pointer fields (Separator, Regex, TargetLabel, Replacement) and the SourceLabels slice are correctly deep copied.


1130-1156: LGTM on RemoteWriteSpec deepcopy.

The optional Name pointer and WriteRelabelConfigs slice are correctly deep copied.


1216-1230: LGTM on SecretKeySelector deepcopy.

Value copy is correct for a struct with only string and string-alias fields.


1253-1270: LGTM on TLSConfig deepcopy.

All fields are value types (SecretKeySelector and strings), so simple value copy is correct.

Comment on lines +299 to +302
// LogLevel defines the verbosity of logs emitted by Alertmanager.
// Valid values are Error, Warn, Info and Debug.
// +kubebuilder:validation:Enum=Error;Warn;Info;Debug
type LogLevel string
Copy link


⚠️ Potential issue | 🟡 Minor

LogLevel type documentation is too specific for a shared type.

This LogLevel type is used by both AlertmanagerCustomConfig and PrometheusK8sConfig, but the godoc says "defines the verbosity of logs emitted by Alertmanager." This is misleading when users reference the type documentation in the Prometheus context.

-// LogLevel defines the verbosity of logs emitted by Alertmanager.
+// LogLevel defines the verbosity of logs emitted by monitoring components.
 // Valid values are Error, Warn, Info and Debug.
🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 299 to 302, the
godoc for the LogLevel type is too specific to Alertmanager even though the type
is shared; update the comment to a generic description such as "LogLevel defines
the verbosity of logs emitted by components (for example Alertmanager and
Prometheus)." and keep the existing validation Enum tag intact so the allowed
values remain Error;Warn;Info;Debug.

@everettraven
Copy link
Contributor

@danielmellado I just wanted to follow up and let you know that this is next on my list to review.

Because there is a lot of historical review here I took some time to re-read that history and feel sufficiently caught up to perform another review round.

My next review round is going to take the approach of starting from scratch so that I can tune out any noise from previous review cycles.

I should have a review completed by the end of the week.

Comment on lines 92 to 93
// prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster.
// By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform.
Copy link
Contributor


Is this only for configuring the default Prometheus instance or does the configuration specified here apply to all Prometheus instances?

Copy link
Contributor


This configures the default platform Prometheus instance that runs in openshift-monitoring. User-workload Prometheus instances are configured separately. Updated godoc to clarify.

Comment on lines 110 to 111
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/
// https://prometheus.io/docs/prometheus/latest/storage/
Copy link
Contributor


Documentation links like this can become stale and may not properly reflect the configuration options that this type exposes.

We should appropriately document all the configuration options we expose instead of linking out.

In this particular case, I'd probably just remove these links here and let the child fields explain what they configure.

Copy link
Contributor

@danielmellado danielmellado Dec 16, 2025


same as above

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`
Copy link
Contributor


Circling back to this, I think something like:

additionalAlertmanagerConfigs:
  - apiVersion: v2
    url: "alertmanager.external.com:9093"
    path: "/critical"
    authenticationMethods:
      - type: BearerToken
        bearerToken:
          name: "prod-token"
          key: "token"
      - type: TLS
        tls:
          certificate:
            name: "..."
            key: "..."

This would be a better UX for this field, and you could enforce some form of uniqueness on a combination of fields if necessary.

What do you think?
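
A hedged sketch of what that shape could look like as Go types, using type as a union discriminator; SecretKeySelector and TLSConfig are the types already proposed in this PR, while everything else here (names, markers, CEL rules) is illustrative:

// AlertmanagerAuthenticationType selects which authentication stanza applies.
// +kubebuilder:validation:Enum=BearerToken;TLS
type AlertmanagerAuthenticationType string

// AlertmanagerAuthenticationMethod is a discriminated union: exactly one of
// bearerToken or tls is expected to be set, matching the type field.
// +kubebuilder:validation:XValidation:rule="self.type == 'BearerToken' ? has(self.bearerToken) : !has(self.bearerToken)",message="bearerToken is required when type is BearerToken, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="self.type == 'TLS' ? has(self.tls) : !has(self.tls)",message="tls is required when type is TLS, and forbidden otherwise"
// +union
type AlertmanagerAuthenticationMethod struct {
    // type discriminates which authentication configuration is used.
    // +unionDiscriminator
    // +required
    Type AlertmanagerAuthenticationType `json:"type,omitempty"`

    // bearerToken references a Secret key containing the bearer token.
    // +optional
    BearerToken *SecretKeySelector `json:"bearerToken,omitempty"`

    // tls holds the client TLS settings.
    // +optional
    TLS *TLSConfig `json:"tls,omitempty"`
}

A composite list-map key on the parent list (for example url plus path) could then give the uniqueness you mention without forcing the whole list to be atomic.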

// +kubebuilder:validation:MaxLength=50
// +kubebuilder:validation:Pattern=`^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$`
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`
Copy link
Contributor


I think my comments here still stand and in general we prefer using an explicit unit name in the field.

See https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#units for reference. In OpenShift APIs we generally prefer the units in the field name approach.
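
For comparison, a sketch of the units-in-the-field-name approach under that convention; the field name, bounds, and choice of MiB are hypothetical, and the "automatic" sentinel would need its own field or enum:

// PrometheusScrapeLimitsSketch illustrates encoding the unit in the field name
// instead of accepting a free-form size string.
type PrometheusScrapeLimitsSketch struct {
    // enforcedBodySizeLimitMebibytes caps the uncompressed body size of a
    // scraped target's response, in MiB. When omitted, no limit is enforced.
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=10240
    // +optional
    EnforcedBodySizeLimitMebibytes *int32 `json:"enforcedBodySizeLimitMebibytes,omitempty"`
}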

Comment on lines 478 to 479
// and Alertmanager. By default, no labels are added.
// When omitted, no external labels are applied (default behavior).
Copy link
Contributor


Nit: I think that just saying this is sufficient:

Suggested change
// and Alertmanager. By default, no labels are added.
// When omitted, no external labels are applied (default behavior).
// and Alertmanager.
// When omitted, no external labels are applied.

Copy link
Contributor


Agreed. Simplified the documentation.

// When omitted, defaults to "Required".
// +optional
// +kubebuilder:default=Required
Optional SecretRequirement `json:"optional,omitempty"`
Copy link
Contributor


The field name here is a bit confusing to me.

setting:

optional: Required

or

optional: Optional

doesn't seem intuitive to me.

Something like presence feels like it would fit better here.

)

// RelabelAction defines the action to perform in a relabeling rule.
// +kubebuilder:validation:Enum=replace;keep;drop;hashmod;labelmap;labeldrop;labelkeep
Copy link
Contributor


Enum values should be CamelCase:

Suggested change
// +kubebuilder:validation:Enum=replace;keep;drop;hashmod;labelmap;labeldrop;labelkeep
// +kubebuilder:validation:Enum=Replace;Keep;Drop;HashMod;LabelMap;LabelDrop;LabelKeep

Copy link
Contributor


Done!

// +kubebuilder:validation:MaxLength=255
Replacement *string `json:"replacement,omitempty"`
// action is the action to perform.
// Valid actions are: replace, keep, drop, hashmod, labelmap, labeldrop, labelkeep.
Copy link
Contributor


What does setting this to each of these values do?

How do I determine which action I want to use?

Copy link
Contributor


Added detailed documentation for each action in the updated godoc
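
For reference, a sketch of the level of per-action detail such godoc can carry, based on the standard Prometheus relabel semantics; the constant names follow the CamelCase suggestion above and the exact wording is illustrative, not the godoc merged in this PR:

// RelabelActionSketch stands in for the PR's RelabelAction type in this example.
type RelabelActionSketch string

const (
    // Replace expands replacement against the regex match of the concatenated
    // sourceLabels and writes the result into targetLabel.
    RelabelActionSketchReplace RelabelActionSketch = "Replace"
    // Keep drops targets or samples whose concatenated sourceLabels do NOT match regex.
    RelabelActionSketchKeep RelabelActionSketch = "Keep"
    // Drop drops targets or samples whose concatenated sourceLabels DO match regex.
    RelabelActionSketchDrop RelabelActionSketch = "Drop"
    // HashMod sets targetLabel to the modulus of a hash of the concatenated
    // sourceLabels, typically used to shard targets.
    RelabelActionSketchHashMod RelabelActionSketch = "HashMod"
    // LabelMap copies labels whose names match regex to new label names produced
    // by expanding replacement.
    RelabelActionSketchLabelMap RelabelActionSketch = "LabelMap"
    // LabelDrop removes labels whose names match regex.
    RelabelActionSketchLabelDrop RelabelActionSketch = "LabelDrop"
    // LabelKeep removes labels whose names do NOT match regex.
    RelabelActionSketchLabelKeep RelabelActionSketch = "LabelKeep"
)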

Copy link

@simonpasquier simonpasquier left a comment


Regarding the max values, I'm worried that some are too low. Maybe we can try to leverage Insights data to understand the current practices?

// prometheusK8sConfig is optional.
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
// +optional
PrometheusK8sConfig PrometheusK8sConfig `json:"prometheusK8sConfig,omitempty,omitzero"`


I'd agree that we don't need the k8s part.

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`


I agree that enabling multiple owners for the field is important. For instance ACM wants to manage its own entry(ies) while cluster admins may want to use other external Alertmanagers.

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=50
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`


Don't we want to ensure the format of the string?

Copy link
Contributor


If you are going to be continuing with the size format approach as opposed to using explicit units, yes.

Copy link
Contributor


@simonpasquier I was thinking of just defaulting this to `automatic` when omitted, wdyt?

// remoteWrite supports a maximum of 10 items in the list.
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=url


Not sure if it's the only field we should use for the key.

Copy link
Contributor


Added prose explaining: "Maximum of 10 remote write configurations can be specified. Each entry must have a unique URL."

@simonpasquier - using url as the list map key since it's required and should be unique per endpoint. Is there a better key you'd suggest?

// Allowed values: "v2". "v1" is no longer supported.
// +kubebuilder:validation:Enum=v2
// +required
APIVersion AlertmanagerAPIVersion `json:"apiVersion,omitempty"`


we can omit the API version field (v1 shouldn't be used anymore).

Copy link
Contributor


Good point there! I removed the apiVersion field entirely since v1 is no longer supported and v2 is the only valid option. CMO will hardcode v2 when communicating with Alertmanager.

// +optional
// +kubebuilder:validation:MaxLength=20
// +kubebuilder:validation:MinLength=2
RemoteTimeout string `json:"remoteTimeout,omitempty"`


We need to validate duration strings.
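
A minimal sketch of enforcing that at admission time, assuming the Prometheus-style duration grammar; the exact pattern and length bounds are assumptions:

// RemoteWriteTimeoutSketch shows validating a duration string on the API type
// itself rather than at reconcile time (for example "30s", "1m", "2h30m").
type RemoteWriteTimeoutSketch struct {
    // remoteTimeout is the timeout for requests to the remote write endpoint.
    // +kubebuilder:validation:MinLength=2
    // +kubebuilder:validation:MaxLength=20
    // +kubebuilder:validation:Pattern=`^([0-9]+(ms|s|m|h|d|w|y))+$`
    // +optional
    RemoteTimeout string `json:"remoteTimeout,omitempty"`
}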

// writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint.
// Maximum of 10 relabeling rules can be specified.
// +optional
// +kubebuilder:validation:MaxItems=10


10 is too low

Copy link
Contributor


Good catch! What would be a reasonable limit? 50? 100? We want to allow complex relabeling while preventing unbounded configs. TL;DR: same question as in the other thread.


const (
// RelabelActionReplace replaces the target label with the replacement value.
RelabelActionReplace RelabelAction = "replace"


Copy link
Contributor


Already fixed, will push an updated commit, thanks!

// When omitted, defaults to "Required".
// +optional
// +kubebuilder:default=Required
Optional SecretRequirement `json:"optional,omitempty"`


I think that we can make all secret keys required and not have the optional field. The Prometheus operator itself will fail the reconciliation if a secret key ref is missing hence a ref is required de facto.

Copy link
Contributor


+1, seems like a cleaner approach, wdyt @everettraven ?

Copy link
Contributor


If you don't need some kind of optionality evaluation here, yes that is better.

// +optional
// +kubebuilder:validation:Enum=Verify;SkipVerify
// +kubebuilder:default=Verify
CertificateVerification string `json:"certificateVerification,omitempty"`


should there be a custom type?

Copy link
Contributor


Yes, we typically encourage the use of a type alias for enums

Copy link
Contributor


Addressed - created CertificateVerificationType type alias with proper constants.
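
For illustration, roughly the shape such a type alias takes; the constant names and godoc here are an assumption, not necessarily the exact code pushed:

// CertificateVerificationType defines how the certificate of an external
// Alertmanager is verified.
// +kubebuilder:validation:Enum=Verify;SkipVerify
type CertificateVerificationType string

const (
    // CertificateVerificationVerify validates the server certificate chain and hostname.
    CertificateVerificationVerify CertificateVerificationType = "Verify"
    // CertificateVerificationSkipVerify disables server certificate validation;
    // intended only for testing against endpoints with self-signed certificates.
    CertificateVerificationSkipVerify CertificateVerificationType = "SkipVerify"
)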

@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from 0d09af6 to 0643f0d Compare December 16, 2025 15:07
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 16, 2025
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (6)
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1634-1651: The logLevel description still incorrectly references Alertmanager instead of Prometheus.

Lines 1635-1636 state "logs emitted by Alertmanager", but this field configures Prometheus logging under prometheusK8sConfig. This documentation error was flagged in previous reviews and marked as addressed, but the current code still contains the incorrect reference.

Update the description to reference Prometheus instead of Alertmanager. The Go doc comment in config/v1alpha1/types_cluster_monitoring.go should be updated and the CRD regenerated.

Apply this diff to fix the description:

                   logLevel:
                     description: |-
-                      logLevel defines the verbosity of logs emitted by Alertmanager.
+                      logLevel defines the verbosity of logs emitted by Prometheus.
                       This field allows users to control the amount and severity of logs generated, which can be useful
                       for debugging issues or reducing noise in production environments.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1287-2529: LGTM! Past issues resolved, schema is well-structured.

The prometheusK8sConfig field addition looks good. Both previously flagged issues have been successfully addressed:

  1. collectionProfile (lines 1527-1528): The description now correctly matches the enum values using PascalCase (Full, Minimal).

  2. logLevel (line 1605): The description now correctly references "Prometheus" instead of the previously incorrect "Alertmanager" reference.

The CRD schema is comprehensive with:

  • Proper field validation rules (length constraints, patterns, enums)
  • Clear, detailed descriptions for all fields
  • Appropriate nested structure for complex configurations
  • Consistent naming conventions following Kubernetes patterns
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Fix incorrect component reference in logLevel description.

Line 1636 still incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager". This field belongs to prometheusK8sConfig, so it should reference Prometheus, not Alertmanager.

This issue was flagged in previous reviews but remains unfixed.

Apply this diff to correct the description:

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
      for debugging issues or reducing noise in production environments.
      Allowed values are Error, Warn, Info, and Debug.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Fix copy-paste error: logLevel description references wrong component.

Line 1636 incorrectly states that logLevel controls verbosity for "Alertmanager" when it should reference "Prometheus" since this field is nested within prometheusK8sConfig.

Apply this diff to correct the component reference:

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
      for debugging issues or reducing noise in production environments.
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

303-311: Document the missing optional field in SecretKeySelector swagger docs.

The swagger documentation for SecretKeySelector is missing the optional field, which is present in the actual type definition (as seen in the CRD schema at lines 1356-1373, 1436-1449, etc. in the CRD file). The optional field is an enum with values Required/Optional and defaults to Required.

Update the source Go type comment for SecretKeySelector.Optional to include appropriate documentation, then regenerate swagger docs. The documentation should describe:

  • Allowed values: Required, Optional
  • Default value: Required
  • Behavior when set to Required vs Optional
config/v1alpha1/types_cluster_monitoring.go (1)

288-291: LogLevel documentation is too component-specific for a shared type.

This LogLevel type is used by both AlertmanagerCustomConfig (line 174) and PrometheusK8sConfig (line 481), but the godoc states it "defines the verbosity of logs emitted by Alertmanager." This is misleading when users reference the type documentation in the Prometheus context.

This was flagged in past review comments but not yet addressed.

Update to a generic description:

-// LogLevel defines the verbosity of logs emitted by Alertmanager.
+// LogLevel defines the verbosity of logs emitted by monitoring components (such as Alertmanager and Prometheus).
 // Valid values are Error, Warn, Info and Debug.
🧹 Nitpick comments (5)
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1855-1866: Consider adding pattern validation for duration and quantity fields.

Several fields accept duration or quantity strings but lack CRD-level pattern validation:

  • retention (lines 1855-1866): Description specifies pattern [0-9]+(ms|s|m|h|d|w|y) but no pattern field enforces it
  • remoteTimeout (lines 1673-1679): Described as duration string but no pattern validation
  • retentionSize (lines 1867-1874): Described as Kubernetes quantity format but no pattern validation

While runtime validation will catch invalid values, adding pattern validation here would provide earlier feedback to users during resource creation/update.

Note: If this is intentionally omitted (e.g., to allow for forward compatibility with new duration formats or because quantity parsing is complex), this suggestion can be safely ignored.

Also applies to: 1673-1679, 1867-1874

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2)

1576-1590: Add pattern validation for enforcedBodySizeLimit or clarify validation approach.

The description (line 1585) states "The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|GB|TB)?)$", but the schema lacks a pattern constraint. Without CRD-level validation, users receive no immediate feedback when submitting invalid values, and validation errors occur later at the operator level.

Consider adding the pattern constraint to the schema:

  enforcedBodySizeLimit:
    description: |-
      enforcedBodySizeLimit enforces a body size limit for Prometheus scraped metrics. If a scraped
      target's body response is larger than the limit, the scrape will fail.
      The following values are valid:
      a numeric value in Prometheus size format (such as "4MB", "1000", "1GB", "512KB", "100B")
      or the string `automatic`, which indicates that the limit will be
      automatically calculated based on cluster capacity.
      To specify no limit, omit this field.
      The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$
      Minimum length is 1 character.
      Maximum length is 50 characters.
    maxLength: 50
    minLength: 1
+   pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$
    type: string

Alternatively, if operator-side validation is intentional, clarify in the description that validation occurs at the operator level.


1854-1865: Add pattern validation for retention or clarify validation approach.

The description (lines 1857-1859) states the value "must be specified using the following regular expression pattern: [0-9]+(ms|s|m|h|d|w|y)", but the schema lacks a pattern constraint. This creates the same validation gap as enforcedBodySizeLimit.

Consider adding the pattern constraint:

  retention:
    description: |-
      retention defines the duration for which Prometheus retains data.
      This definition must be specified using the following regular
      expression pattern: `[0-9]+(ms|s|m|h|d|w|y)` (ms = milliseconds,
      s= seconds,m = minutes, h = hours, d = days, w = weeks, y = years).
      When omitted, this means the user has no opinion and the platform is left
      to choose reasonable defaults, which are subject to change over time.
      The default value is `15d`.
    maxLength: 20
    minLength: 1
+   pattern: ^[0-9]+(ms|s|m|h|d|w|y)$
    type: string

Alternatively, if operator-side validation is intentional, clarify in the description that validation occurs at the operator level.

config/v1alpha1/types_cluster_monitoring.go (2)

437-450: Reconsider listType=atomic for multi-owner scenarios.

The additionalAlertmanagerConfigs field uses listType=atomic, which doesn't work well with server-side apply when multiple controllers or users want to independently manage their own entries. As noted in past review comments, there are valid use cases where ACM wants to manage its own entry while cluster admins manage others.

Consider changing to listType=map with an appropriate uniqueness key. However, this is complicated by the lack of a natural unique identifier. If entries must be truly independent with no uniqueness constraints, you may need to accept atomic with its limitations.

Past discussion also noted that the current design lacks a clear way to enforce uniqueness when needed. Consider whether there's a combination of fields that could serve as a composite key, or whether the API should be restructured to make ownership clearer.


536-546: Add CEL validation for retention duration format.

The field documentation describes a specific pattern [0-9]+(ms|s|m|h|d|w|y) but there's no runtime validation to enforce this. Invalid values like "15x" or "abc" would be accepted at admission time and only fail when Prometheus tries to use them.

Past review comments suggested adding CEL validation, which would be especially helpful here.

Add validation:

// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+(ms|s|m|h|d|w|y)$')",message="must be a duration like '15d', '6h', or '30m'"

Note: Similar validation should be added for other duration/size fields like remoteTimeout (line 718-723) and size-based fields like retentionSize if you want to enforce Kubernetes quantity format.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0d09af6 and 0643f0d.

📒 Files selected for processing (11)
  • config/v1alpha1/types_cluster_monitoring.go (3 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (8 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (4 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • openapi/generated_openapi/zz_generated.openapi.go
  • config/v1alpha1/types_cluster_monitoring.go
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • config/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (3)
config/v1alpha1/types_cluster_monitoring.go (1)
config/v1alpha1/zz_generated.swagger_doc_generated.go (9)
  • PrometheusK8sConfig (273-275)
  • AdditionalAlertmanagerConfig (132-134)
  • ExternalLabels (227-229)
  • RemoteWriteSpec (299-301)
  • ContainerResource (218-220)
  • SecretKeySelector (309-311)
  • TLSConfig (322-324)
  • Label (237-239)
  • RelabelConfig (287-289)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
  • AdditionalAlertmanagerConfig (622-668)
  • ExternalLabels (690-700)
  • Label (671-687)
  • PrometheusK8sConfig (436-604)
  • RelabelConfig (733-784)
  • RemoteWriteSpec (703-730)
  • SecretKeySelector (820-835)
  • TLSConfig (787-816)
config/v1alpha1/zz_generated.deepcopy.go (1)
config/v1alpha1/types_cluster_monitoring.go (9)
  • AdditionalAlertmanagerConfig (622-668)
  • TLSConfig (787-816)
  • PrometheusK8sConfig (436-604)
  • ExternalLabels (690-700)
  • Label (671-687)
  • RemoteWriteSpec (703-730)
  • ContainerResource (307-337)
  • RelabelConfig (733-784)
  • SecretKeySelector (820-835)
🔇 Additional comments (18)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2)

1603-1620: Past issue resolved - logLevel description now correct.

The previous review comment correctly identified that the logLevel description referred to "Alertmanager" instead of "Prometheus". Line 1605 now properly states: "logLevel defines the verbosity of logs emitted by Prometheus." This is the correct description for the prometheusK8sConfig field.


1287-2529: Schema structure looks correct and comprehensive.

The new prometheusK8sConfig field adds extensive configuration options for the platform Prometheus instance. The schema includes:

  • Proper validation rules (regex patterns, length constraints, enum values)
  • Comprehensive TLS configuration with certificate verification options
  • Appropriate resource constraints following Kubernetes conventions
  • Clear, detailed documentation for each field

Since this is a generated file (zz_generated prefix), the schema correctness depends on the generation source. The generated output appears structurally sound with no obvious issues.

Note: This is a generated file. Any future changes should be made in the source templates/generators to ensure consistency across regenerations.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1603-1620: Past issue has been properly addressed.

The logLevel field description now correctly references "Prometheus" instead of "Alertmanager" as noted in the previous review. The schema definition for this field and the entire prometheusK8sConfig addition appear well-formed with appropriate validation rules, enums, and constraints.

Since this is a generated file (indicated by the zz_generated prefix), the schema structure and formatting are as expected.

config/v1alpha1/zz_generated.swagger_doc_generated.go (2)

313-324: LGTM: TLSConfig swagger documentation is accurate.

The swagger documentation for TLSConfig correctly describes all fields including certificateVerification with proper enum values and default behavior.


121-134: LGTM: New swagger documentation entries are comprehensive.

The newly added swagger documentation for AdditionalAlertmanagerConfig, ExternalLabels, Label, RelabelConfig, and RemoteWriteSpec are well-documented with clear descriptions of fields, validation rules, and expected behaviors.

Also applies to: 222-239, 277-289, 291-301

openapi/generated_openapi/zz_generated.openapi.go (9)

424-424: LGTM! New type registrations are properly added.

All new v1alpha1 types (AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, TLSConfig) are correctly registered in the GetOpenAPIDefinitions map with their corresponding schema functions.

Also applies to: 442-442, 453-453, 465-465, 467-468, 472-472, 474-474


21227-21302: LGTM! AdditionalAlertmanagerConfig schema is well-defined.

The schema properly defines all fields with clear descriptions, appropriate required fields (apiVersion, staticConfigs), correct type references, and proper Kubernetes list annotations.


21918-21924: LGTM! prometheusK8sConfig field properly integrated.

The new field is correctly added to ClusterMonitoringSpec with a comprehensive description, appropriate optional semantics, and dependencies list properly updated to include the new type.

Also applies to: 21936-21936


22032-22068: LGTM! ExternalLabels schema is correctly structured.

The schema properly defines a list-map of labels with appropriate Kubernetes annotations (x-kubernetes-list-type: map, keyed by key), clear constraints in the description, and correct dependencies.


22462-22488: LGTM! Label schema is simple and well-defined.

The schema correctly models a key/value label pair with both fields required, clear validation constraints documented in descriptions, and no unnecessary complexity.


22930-23119: Previous issue resolved. PrometheusK8sConfig schema looks excellent.

The logLevel field description (line 22972) now correctly references "logs emitted by Prometheus" rather than "Alertmanager", resolving the issue flagged in the previous review.

The overall schema is comprehensive and well-structured with:

  • Appropriate Kubernetes list annotations (atomic, map with proper keys)
  • Clear descriptions explaining defaults, constraints, and behaviors
  • Correct type references and dependencies
  • Proper field definitions for all Prometheus configuration aspects

23149-23272: LGTM! RelabelConfig and RemoteWriteSpec schemas are well-defined.

Both schemas are properly structured with:

  • RelabelConfig: Comprehensive action descriptions, appropriate required field (action), proper list-type annotation for sourceLabels
  • RemoteWriteSpec: Clear structure with url as required, correct reference to RelabelConfig for write relabeling, proper dependencies

23368-23399: LGTM! SecretKeySelector schema is correctly defined.

The schema properly models secret key selection with both fields required, appropriate x-kubernetes-map-type: atomic annotation for atomic updates, and clear validation constraints in descriptions.


23431-23479: LGTM! TLSConfig schema is properly structured.

The schema correctly defines TLS configuration with appropriate optional fields, proper references to SecretKeySelector for certificates and keys, clear descriptions of verification policies, and correct dependencies.

config/v1alpha1/types_cluster_monitoring.go (3)

818-835: LGTM - SecretKeySelector implementation looks correct.

The SecretKeySelector type properly:

  • Marks both name and key as required
  • Validates the secret name as a DNS subdomain
  • Enforces length constraints on both fields
  • Addresses past review feedback about removing the problematic optional field

732-784: RelabelConfig documentation is comprehensive.

The RelabelConfig type now provides clear documentation for:

  • What happens when fields are omitted (defaults are specified)
  • The semantic difference between omitting a field vs setting it to empty string (lines 767-769)
  • Valid values and constraints for each field

This addresses multiple past review comments asking for clarity on field behavior.

Note: The documented defaults (; for separator, (.*) for regex, $1 for replacement) will need to be implemented in the controller/operator code since CRD validation cannot set these defaults for pointer types.


92-104: Documentation incorrectly mentions "remote read" capability.

The field documentation states this configuration is for "remote write/read" but the actual PrometheusK8sConfig struct (lines 509-518) only exposes a remoteWrite field. There is no remote read configuration available.

This was flagged in past review comments but not yet addressed.

Either remove the "/read" reference or explicitly state that only remote write is currently supported:

-	//   - External integrations (remote write, additional alertmanagers)
+	//   - External integrations (remote write to long-term storage, additional alertmanagers)

Likely an incorrect or invalid review comment.

config/v1alpha1/zz_generated.deepcopy.go (1)

14-39: LGTM - Autogenerated deepcopy methods are correct.

The autogenerated deepcopy implementations properly handle:

  1. New type support: All new types introduced in this PR have corresponding DeepCopyInto and DeepCopy methods:

    • AdditionalAlertmanagerConfig (lines 14-39)
    • ExternalLabels (lines 460-479)
    • Label (lines 734-748)
    • PrometheusK8sConfig (lines 994-1061)
    • RelabelConfig (lines 1089-1128)
    • RemoteWriteSpec (lines 1130-1156)
    • SecretKeySelector (lines 1216-1230)
    • TLSConfig (lines 1253-1270)
  2. Correct field propagation: ClusterMonitoringSpec.DeepCopyInto (line 394) properly calls in.PrometheusK8sConfig.DeepCopyInto(&out.PrometheusK8sConfig) to deep copy the new field.

  3. Proper handling of complex fields:

    • Slices are allocated with correct length and elements are copied
    • Maps are allocated with correct capacity and entries are copied
    • Pointers are allocated and dereferenced properly
    • Nested structs call their own DeepCopyInto methods

The code generation appears to have correctly processed all the new API types.

Also applies to: 394-394, 460-479, 734-748, 994-1061, 1089-1128, 1130-1156, 1216-1230, 1253-1270

Comment on lines +451 to +464
// enforcedBodySizeLimit enforces a body size limit for Prometheus scraped metrics. If a scraped
// target's body response is larger than the limit, the scrape will fail.
// The following values are valid:
// a numeric value in Prometheus size format (such as "4MB", "1000", "1GB", "512KB", "100B")
// or the string `automatic`, which indicates that the limit will be
// automatically calculated based on cluster capacity.
// To specify no limit, omit this field.
// The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$
// Minimum length is 1 character.
// Maximum length is 50 characters.
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=50
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`

⚠️ Potential issue | 🟡 Minor

Documentation contradicts validation for empty values.

The field comment states "To specify no limit, omit this field", which is correct, but there is still confusion about whether an empty string is valid. The +kubebuilder:validation:MinLength=1 marker means an empty string is not allowed, yet some readers might interpret "omit" to mean either nil or an empty string.

Clarify that omission (nil) is the only way to specify "no limit":

 	// To specify no limit, omit this field.
+	// Note: Empty string ("") is not allowed; omission means the field is not set at all.
 	// The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$

Additionally, past review comments suggested adding CEL validation for the pattern rather than just documenting it. Consider adding:

// +kubebuilder:validation:XValidation:rule="self.matches('^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$')",message="must be 'automatic' or a size like '4MB', '512KB', '100B'"
🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 451-464, the field
comment is ambiguous about "omit this field" vs empty string and the MinLength=1
validator disallows empty string; update the comment to explicitly state that
only omission (nil) denotes "no limit" and that an empty string is not allowed,
and add a kubebuilder XValidation CEL rule to enforce the documented pattern
(e.g. self == null || self.matches('^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$')) with
a clear message like "must be 'automatic' or a size like '4MB', '512KB', '100B'
or omitted to indicate no limit", ensuring it complements the existing
MinLength/MaxLength markers.

Comment on lines +606 to +610
type AlertmanagerAPIVersion string

const (
AlertmanagerAPIVersionV2 AlertmanagerAPIVersion = "v2"
)

⚠️ Potential issue | 🟡 Minor

Remove misleading "default" language from required apiVersion field.

The apiVersion field (lines 623-627) is marked as +required, meaning users must explicitly provide it. However, past documentation suggested it has a "default value", which is contradictory.

Since the field is required and currently only allows one value (v2), the concept of a "default" doesn't apply. The current enum design is likely future-proofing for when additional API versions might be supported.

The documentation should simply state:

 	// apiVersion defines the Alertmanager API version to target.
-	// Allowed values: "v2". "v1" is no longer supported.
+	// Currently, only "v2" is supported. "v1" is no longer supported.
 	// +kubebuilder:validation:Enum=v2

Alternatively, if you want v2 to be the implied default for backward compatibility in the future, consider making the field optional with a default:

// +kubebuilder:default=v2
// +optional

Also applies to: 623-627

🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 623-627, remove any
wording in the field comment that claims the apiVersion has a "default" value
since the field is annotated +required and users must supply it; update the doc
comment to simply state the accepted enum values (currently "v2") and that the
field is required, or alternatively if you want v2 to be an implied default
change the field to optional and add +kubebuilder:default=v2 and +optional
instead (do not leave contradictory "default" language).

Comment on lines +633 to +649
// pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints.
// For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
// set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
// This is commonly needed when Alertmanager is deployed behind ingress controllers or load balancers.
// +kubebuilder:validation:MaxLength=255
// +kubebuilder:validation:MinLength=1
// +optional
PathPrefix string `json:"pathPrefix,omitempty"`
// scheme defines the URL scheme to use when communicating with Alertmanager
// instances.
// Possible values are `HTTP` or `HTTPS`.
// When omitted, defaults to `HTTP`.
// +kubebuilder:validation:Enum=HTTP;HTTPS
// +kubebuilder:default=HTTP
// +optional
Scheme AlertmanagerScheme `json:"scheme,omitempty"`
// staticConfigs is a list of statically configured Alertmanager endpoints in the form

⚠️ Potential issue | 🟡 Minor

Fix documentation inconsistencies in AdditionalAlertmanagerConfig.

Two issues here:

  1. PathPrefix example references wrong API version: Line 635 mentions /api/v1/alerts but the apiVersion field only allows v2. The example should reference v2 endpoints.

  2. Scheme field marked as both required and having a default: Line 644 states "When omitted, defaults to HTTP" but line 646 marks it as +required, meaning it cannot be omitted.

Apply these fixes:

 	// For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
-	// set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
+	// set this to "/alertmanager" so requests go to "/alertmanager/api/v2/alerts" instead of "/api/v2/alerts".

For the scheme field, either:

  • Remove +required and keep the default behavior, OR
  • Remove the "When omitted, defaults to HTTP" text and keep it required

Based on the +kubebuilder:default=HTTP marker, it seems the intent is to have a default, so removing +required would be more appropriate.

🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 633-649, update the
PathPrefix example to reference v2 endpoints (e.g., use
"/alertmanager/api/v2/alerts" instead of "/api/v1/alerts") to match the
apiVersion allowed, and remove the +kubebuilder:validation:Required marker from
the Scheme field so it can be omitted and rely on the existing
+kubebuilder:default=HTTP and the comment "When omitted, defaults to `HTTP`".

Comment on lines +1366 to +1376
scheme:
default: HTTP
description: |-
scheme defines the URL scheme to use when communicating with Alertmanager
instances.
Possible values are `HTTP` or `HTTPS`.
When omitted, defaults to `HTTP`.
enum:
- HTTP
- HTTPS
type: string

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the scheme field definition in the alertmanager configuration types
find config/v1alpha1 -name "*.go" | head -20

Repository: openshift/api

Length of output: 446


🏁 Script executed:

#!/bin/bash
# Look for alertmanager-related struct definitions and scheme defaults
rg -n "AlertmanagerConfig|scheme" config/v1alpha1/ --type go -A 5 -B 5 | head -100

Repository: openshift/api

Length of output: 15409


🏁 Script executed:

#!/bin/bash
# Search more specifically for the scheme field in alertmanager structs
rg -n "Scheme.*string|scheme.*HTTP" config/v1alpha1/ --type go

Repository: openshift/api

Length of output: 372


🏁 Script executed:

#!/bin/bash
# Extract the AdditionalAlertmanagerConfig struct definition and surrounding context
sed -n '500,750p' config/v1alpha1/types_cluster_monitoring.go | cat -n

Repository: openshift/api

Length of output: 14320


🏁 Script executed:

#!/bin/bash
# Look for the complete AlertmanagerScheme type and AdditionalAlertmanagerConfig definition
rg -n -A 30 "type AdditionalAlertmanagerConfig struct" config/v1alpha1/types_cluster_monitoring.go

Repository: openshift/api

Length of output: 1878


🏁 Script executed:

#!/bin/bash
# Check for any comments or documentation about the scheme field default
rg -n -B 5 -A 5 "scheme.*HTTP|HTTP.*scheme" config/v1alpha1/types_cluster_monitoring.go

Repository: openshift/api

Length of output: 39


Change the default scheme from HTTP to HTTPS in the source type definition.

The scheme field defaults to HTTP in the AdditionalAlertmanagerConfig struct (line 646 of config/v1alpha1/types_cluster_monitoring.go via the +kubebuilder:default=HTTP tag). This means alert data will be transmitted unencrypted by default when connecting to additional Alertmanager instances. Since alerts can contain sensitive operational information, defaulting to unencrypted transmission degrades security posture.

Update the source Go type definition to default to HTTPS instead. Change +kubebuilder:default=HTTP to +kubebuilder:default=HTTPS in config/v1alpha1/types_cluster_monitoring.go, then regenerate the manifest.
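A minimal sketch of what the updated field would look like in types_cluster_monitoring.go, assuming the surrounding markers stay as they are today:

// scheme defines the URL scheme to use when communicating with Alertmanager instances.
// Possible values are `HTTP` or `HTTPS`.
// When omitted, defaults to `HTTPS`.
// +kubebuilder:validation:Enum=HTTP;HTTPS
// +kubebuilder:default=HTTPS
// +optional
Scheme AlertmanagerScheme `json:"scheme,omitempty"`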

🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around line 646 and in the
generated manifest
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
(lines ~1366-1376), update the kubebuilder default on the
AdditionalAlertmanagerConfig.scheme from +kubebuilder:default=HTTP to
+kubebuilder:default=HTTPS in the Go source (types_cluster_monitoring.go), then
run the codegen/CRD manifest generation step used by the repo to regenerate the
ClusterMonitoringConfig.yaml so the CRD default in the manifest reflects HTTPS.

Addresses feedback from PR reviewers and linting issues.

Signed-off-by: Daniel Mellado <dmellado@fedoraproject.org>
@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from 0643f0d to 99db558 Compare December 16, 2025 15:19
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 16, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1366-1376: Security concern: Default scheme should be HTTPS, not HTTP.

This was previously flagged. Alertmanager connections will transmit alert data (potentially sensitive operational information) over unencrypted HTTP by default. Update the source Go type to use +kubebuilder:default=HTTPS and regenerate.

config/v1alpha1/types_cluster_monitoring.go (2)

288-291: Update LogLevel type documentation to be component-agnostic.

The LogLevel type is used by both AlertmanagerCustomConfig (line 174) and PrometheusK8sConfig (line 481), but the documentation specifically mentions "Alertmanager". This creates confusion when users reference the type documentation in the Prometheus context.

Apply this diff to make the documentation more generic:

-// LogLevel defines the verbosity of logs emitted by Alertmanager.
+// LogLevel defines the verbosity of logs emitted by monitoring components.
 // Valid values are Error, Warn, Info and Debug.

633-640: Fix API version in pathPrefix example.

The example at line 635 references /api/v1/alerts, but the apiVersion field only allows v2. The example should reference the v2 endpoint to be consistent with the API constraints.

Apply this diff:

 	// pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints.
 	// For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
-	// set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
+	// set this to "/alertmanager" so requests go to "/alertmanager/api/v2/alerts" instead of "/api/v2/alerts".
 	// This is commonly needed when Alertmanager is deployed behind ingress controllers or load balancers.
🧹 Nitpick comments (11)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

392-420: Consider quoting resource quantities for consistency.

The resource quantity values (e.g., 100m, 500m, 128Mi, 512Mi) are currently unquoted. While most YAML parsers handle these correctly, explicitly quoting them aligns with Kubernetes conventions and ensures consistent parsing across different tools.

Apply this diff to quote the resource quantities:

           prometheusK8sConfig:
             resources:
               - name: cpu
-                request: 100m
-                limit: 500m
+                request: "100m"
+                limit: "500m"
               - name: memory
-                request: 128Mi
-                limit: 512Mi
+                request: "128Mi"
+                limit: "512Mi"
       expected: |
         apiVersion: config.openshift.io/v1alpha1
         kind: ClusterMonitoring
         spec:
           userDefined:
             mode: "Disabled"
           prometheusK8sConfig:
             resources:
               - name: cpu
-                request: 100m
-                limit: 500m
+                request: "100m"
+                limit: "500m"
               - name: memory
-                request: 128Mi
-                limit: 512Mi
+                request: "128Mi"
+                limit: "512Mi"
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (4)

1382-1390: staticConfigs regex allows invalid port numbers.

The regex '^[a-zA-Z0-9.-]+:[0-9]+$' permits port numbers such as 0 or 99999, which are invalid. Consider tightening the rule to validate the port range (1-65535), or rely on the operator to validate at runtime.
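One possible tightening, sketched here as a kubebuilder CEL marker to attach wherever the existing host:port rule lives in the Go source; the regex bounds the port to 1-65535 and the message wording is illustrative:

// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z0-9]([a-zA-Z0-9.-]*[a-zA-Z0-9])?:([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])$')",message="must be in the format 'host:port' with a port between 1 and 65535"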


1539-1553: Missing pattern validation for enforcedBodySizeLimit.

The description documents the pattern ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$ but no pattern: constraint enforces it. Invalid values such as "invalid" or "5MB5" would be accepted by the schema and only fail at runtime.

+                    pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$
                     maxLength: 50
                     minLength: 1
                     type: string

1855-1866: Missing pattern validation for retention.

The description specifies the format [0-9]+(ms|s|m|h|d|w|y) but no schema pattern: enforces it. Add pattern validation to catch invalid values at admission time:

+                    pattern: ^[0-9]+(ms|s|m|h|d|w|y)$
                     maxLength: 20
                     minLength: 1
                     type: string

1635-1654: queryLogFile validation doesn't enforce /dev/ path restrictions.

The description states only /dev/stderr, /dev/stdout, and /dev/null are supported, but the CEL rule permits any /dev/* path. Users could specify /dev/random which would pass validation but fail at runtime.

Consider adding validation:

x-kubernetes-validations:
- message: must be an absolute path starting with '/' or a simple filename without '/'
  rule: self.startsWith('/') || !self.contains('/')
- message: only /dev/stderr, /dev/stdout, and /dev/null are supported under /dev/
  rule: "!self.startsWith('/dev/') || self in ['/dev/stderr', '/dev/stdout', '/dev/null']"
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (5)

1391-1397: Add pattern validation for Go duration format.

The timeout field accepts duration strings but lacks validation. Users can set invalid values like "30x" or "abc" that will fail at runtime.

Add a CEL validation rule to enforce Go duration format:

                        timeout:
                          description: |-
                            timeout defines the timeout value used when sending alerts.
                            The value must be a valid Go time.Duration string (e.g. 30s, 5m, 1h).
                          maxLength: 20
                          minLength: 2
                          type: string
+                         x-kubernetes-validations:
+                         - message: must be a valid Go duration (e.g., 30s, 5m, 1h)
+                           rule: self.matches('^[0-9]+(ns|us|µs|ms|s|m|h)$')

1673-1679: Add pattern validation for Go duration format.

The remoteTimeout field accepts duration strings but lacks validation. Users can set invalid values that will fail at runtime.

Add a CEL validation rule to enforce Go duration format:

                        remoteTimeout:
                          description: |-
                            remoteTimeout is the timeout for requests to the remote write endpoint.
                            When omitted, the default is 30s.
                          maxLength: 20
                          minLength: 2
                          type: string
+                         x-kubernetes-validations:
+                         - message: must be a valid Go duration (e.g., 30s, 5m, 1h)
+                           rule: self.matches('^[0-9]+(ns|us|µs|ms|s|m|h)$')

1855-1866: Add pattern validation for retention duration format.

The retention field documentation specifies the pattern [0-9]+(ms|s|m|h|d|w|y) but no validation enforces it. Users can set invalid values like "15days" instead of "15d".

Add a CEL validation rule to enforce the documented pattern:

                  retention:
                    description: |-
                      retention defines the duration for which Prometheus retains data.
                      This definition must be specified using the following regular
                      expression pattern: `[0-9]+(ms|s|m|h|d|w|y)` (ms = milliseconds,
                      s= seconds,m = minutes, h = hours, d = days, w = weeks, y = years).
                      When omitted, this means the user has no opinion and the platform is left
                      to choose reasonable defaults, which are subject to change over time.
                      The default value is `15d`.
                    maxLength: 20
                    minLength: 1
                    type: string
+                   x-kubernetes-validations:
+                   - message: must match pattern [0-9]+(ms|s|m|h|d|w|y), e.g., 15d, 2w, 6m
+                     rule: self.matches('^[0-9]+(ms|s|m|h|d|w|y)$')

1867-1874: Add pattern validation for Kubernetes quantity format.

The retentionSize field accepts Kubernetes quantity formats (e.g., Mi, Gi, Ti) but lacks validation. Users can set invalid values like "5 GB" (space not allowed) or "100gigabytes".

Add the standard Kubernetes quantity pattern validation:

                  retentionSize:
                    description: |-
                      retentionSize specifies the maximum volume of persistent storage that Prometheus uses for data blocks and the write-ahead log (WAL).
                      Acceptable values use standard Kubernetes resource quantity formats, such as `Mi`, `Gi`, `Ti`, etc.
                      When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
                      The default is no storage size limit is enforced and Prometheus will use the available storage capacity of the PersistentVolume.
                    maxLength: 20
                    type: string
+                   pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
+                   x-kubernetes-validations:
+                   - message: must be a valid Kubernetes quantity (e.g., 5Gi, 100Mi)
+                     rule: isQuantity(self)

1635-1654: Add validation to prevent unsupported /dev/ paths.

The description states that writing to /dev/ paths other than /dev/stderr, /dev/stdout, or /dev/null is not supported, but the validation only checks if the path is absolute or a simple filename. Users can set unsupported paths like /dev/random or /dev/zero.

Add a CEL validation to prevent unsupported /dev/ paths:

                  queryLogFile:
                    description: |-
                      queryLogFile specifies the file to which PromQL queries are logged.
                      This setting can be either a filename, in which
                      case the queries are saved to an `emptyDir` volume
                      at `/var/log/prometheus`, or a full path to a location where
                      an `emptyDir` volume will be mounted and the queries saved.
                      Writing to `/dev/stderr`, `/dev/stdout` or `/dev/null` is supported, but
                      writing to any other `/dev/` path is not supported. Relative paths are
                      also not supported.
                      By default, PromQL queries are not logged.
                      Must be an absolute path starting with `/` or a simple filename without path separators.
                      Must be between 1 and 255 characters in length.
                    maxLength: 255
                    minLength: 1
                    type: string
                    x-kubernetes-validations:
                    - message: must be an absolute path starting with '/' or a simple
                        filename without '/'
                      rule: self.startsWith('/') || !self.contains('/')
+                   - message: /dev/ paths other than /dev/stderr, /dev/stdout, or /dev/null are not supported
+                     rule: '!self.startsWith(''/dev/'') || self in [''/dev/stderr'', ''/dev/stdout'', ''/dev/null'']'
config/v1alpha1/types_cluster_monitoring.go (1)

596-603: Consider using a custom type instead of core/v1.PersistentVolumeClaim.

Using *v1.PersistentVolumeClaim directly limits the ability to enforce admission-time validations specific to this use case. OpenShift APIs typically prefer custom reference types that allow for more granular validation.

Based on past review feedback (comment at lines 603-603), there is precedent in the openshift/api repository for wrapping persistent volume references in custom types. This would enable:

  • More specific validation rules for this context
  • Better documentation tailored to Prometheus usage
  • Ability to restrict which PVC fields are relevant

Example from insights API:

// persistentVolume is an optional field that specifies the PersistentVolume that will be used to store the Insights data archive.
// The PersistentVolume must be created in the openshift-insights namespace.
// +optional
PersistentVolume *PersistentVolumeConfig `json:"persistentVolume,omitempty"`
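For illustration, a wrapper in the spirit of the insights example might look like the sketch below. The type and field names are hypothetical and the exact shape would need to be agreed with API reviewers; this is not a proposal of the final API:

// PrometheusPersistentStorage is a hypothetical, trimmed-down alternative to
// embedding core/v1.PersistentVolumeClaim directly.
type PrometheusPersistentStorage struct {
	// storageClassName selects the StorageClass used for the Prometheus data volume.
	// +optional
	// +kubebuilder:validation:MaxLength=253
	StorageClassName string `json:"storageClassName,omitempty"`

	// size is the requested capacity of the volume, expressed as a Kubernetes
	// resource quantity such as 100Gi.
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=20
	// +required
	Size string `json:"size"`
}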

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0643f0d and 99db558.

📒 Files selected for processing (12)
  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/types_cluster_monitoring.go (3 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (8 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (4 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • openapi/generated_openapi/zz_generated.openapi.go
  • config/v1alpha1/types_cluster_monitoring.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (1)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
  • AdditionalAlertmanagerConfig (622-668)
  • ExternalLabels (690-700)
  • Label (671-687)
  • PrometheusK8sConfig (436-604)
  • RelabelConfig (733-784)
  • RemoteWriteSpec (703-730)
  • SecretKeySelector (820-835)
  • TLSConfig (787-816)
🔇 Additional comments (17)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (2)

354-370: LGTM!

The test correctly uses the capitalized enum value "Info" for logLevel, matching the CRD's allowed values (Error, Warn, Info, Debug). This addresses the previous review comment about case sensitivity.


371-778: Excellent test coverage!

The test suite comprehensively validates PrometheusK8sConfig behavior:

  • Valid configurations for all major fields (nodeSelector, tolerations, topologySpreadConstraints, additionalAlertmanagerConfigs, remoteWrite, externalLabels)
  • Appropriate rejection tests for boundary violations (empty objects, exceeding maxItems limits, violating minItems constraints)
  • Clear, actionable error messages in all rejection scenarios

The tests align well with the CRD validation rules and provide strong coverage for the new PrometheusK8sConfig API surface.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2)

1603-1620: LGTM!

The logLevel description correctly refers to "Prometheus" rather than "Alertmanager", addressing the previous review comment. The field definition properly specifies the allowed enum values (Error, Warn, Info, Debug) and provides clear documentation about each log level's behavior.


1287-2529: Well-structured CRD schema for PrometheusK8sConfig.

The generated OpenAPI schema is comprehensive and follows Kubernetes API conventions:

  • Proper validation constraints (maxItems: 10 for arrays, length limits, enum values, patterns)
  • Correct list-type annotations (map, atomic, set) for appropriate array behavior
  • Standard Kubernetes types reused where applicable (PersistentVolumeClaim, Toleration, TopologySpreadConstraint)
  • Clear field descriptions explaining purpose, defaults, and allowed values
  • Appropriate minProperties: 1 constraint to prevent empty configuration objects

The schema provides a robust foundation for the PrometheusK8sConfig API.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1287-2529: LGTM! Comprehensive and well-structured Prometheus configuration schema.

The prometheusK8sConfig field addition is well-designed with:

  • Clear, detailed field descriptions explaining purpose and defaults
  • Comprehensive validation rules (patterns, length constraints, enums)
  • Proper security options (TLS configuration, bearer token authentication)
  • Reasonable constraints (max 10 items for most lists)
  • Consistent use of Kubernetes API conventions

Previous review concerns (enum case, field documentation) appear to have been properly addressed based on the commit markers in past reviews.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2)

1287-1300: Well-documented API surface.

The prometheusK8sConfig field and its nested properties are thoroughly documented with clear descriptions, use cases, and behavioral expectations. The documentation improvements from previous reviews (collectionProfile enum values, logLevel component references) have been successfully addressed.


1300-1301: Appropriate constraints protect against configuration explosion.

The API design includes sensible limits (e.g., maxItems: 10 for additionalAlertmanagerConfigs, remoteWrite endpoints, and resource entries) and requires meaningful configuration via minProperties: 1. These constraints prevent performance degradation from overly complex configurations while remaining flexible for most production use cases.

Also applies to: 1522-1523, 1774-1778, 1849-1854

config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1287-1300: New prometheusK8sConfig API structure looks well-designed.

The field organization follows established patterns from alertmanagerConfig and metricsServerConfig. Good use of list-type maps with appropriate keys for uniqueness constraints, consistent validation rules, and comprehensive documentation. The minProperties constraint ensures meaningful configurations.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2)

1603-1620: Previous issue resolved.

The logLevel description now correctly references "Prometheus" instead of "Alertmanager". The field documentation is accurate and complete.


1287-2529: LGTM! Well-structured generated schema.

The prometheusK8sConfig schema is comprehensive and properly validated with:

  • Appropriate validation constraints (length limits, patterns, enums)
  • Secure defaults (e.g., certificate verification enabled)
  • Clear documentation for all fields
  • Proper Kubernetes CRD conventions

Since this is a generated file, the schema appears correctly generated from the API types.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1287-2529: PrometheusK8sConfig CRD block looks consistent with Go types and validation markers

The prometheusK8sConfig schema (including additionalAlertmanagerConfigs, externalLabels, remoteWrite, TLS/secret selectors, resources, retention, scheduling knobs, and PVC template) aligns with the referenced Go types and kubebuilder validations. List semantics, enums, and x-kubernetes validations all look correct and coherent with the intended API surface. I don't see issues here.

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

121-201: Swagger docs for PrometheusK8sConfig and related types are consistent and accurate

The newly added SwaggerDoc maps for AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, and TLSConfig, plus the ClusterMonitoringSpec.prometheusK8sConfig entry, all line up with the Go type comments and the CRD schema (fields, enums, and behavior). This should give consumers clear, correct API documentation without introducing inconsistencies.

Also applies to: 222-325

openapi/generated_openapi/zz_generated.openapi.go (4)

424-474: LGTM: Schema function registrations are correct.

All new v1alpha1 monitoring types (AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, TLSConfig) are properly registered in the GetOpenAPIDefinitions map with consistent naming and alphabetical ordering.


21227-21302: LGTM: AdditionalAlertmanagerConfig and ClusterMonitoringSpec changes are well-structured.

The schema definition for AdditionalAlertmanagerConfig properly specifies required fields (apiVersion, staticConfigs), uses appropriate list-type annotations, and correctly declares dependencies on SecretKeySelector and TLSConfig. The addition of prometheusK8sConfig to ClusterMonitoringSpec follows existing patterns and includes comprehensive documentation.

Also applies to: 21918-21936


22032-22068: LGTM: ExternalLabels, Label, and PrometheusK8sConfig schemas are correct.

The schemas are well-structured with proper list-map annotations, detailed descriptions, and correct dependencies. The previously flagged logLevel description inconsistency (line 22972) has been resolved—it now correctly refers to "logs emitted by Prometheus" rather than Alertmanager.

Based on past review comments, the logLevel description issue was addressed in commit 0643f0d.

Also applies to: 22462-22488, 22930-23119


23149-23272: LGTM: RelabelConfig, RemoteWriteSpec, SecretKeySelector, and TLSConfig schemas are secure and well-defined.

All schemas follow consistent patterns with proper annotations. SecretKeySelector is correctly marked as atomic (x-kubernetes-map-type: atomic), and TLSConfig appropriately uses SecretKeySelector references for sensitive fields (ca, cert, key), ensuring secure handling of credentials.

Also applies to: 23368-23399, 23431-23479

config/v1alpha1/types_cluster_monitoring.go (1)

437-450: Reconsider list type for multi-owner scenarios.

The additionalAlertmanagerConfigs field uses listType=atomic, which prevents multiple owners from managing separate entries via server-side apply. Past review discussions (everettraven, simonpasquier) highlighted that enabling multiple owners is important - for example, ACM wanting to manage its own entries while cluster admins manage others.

However, there's no clear unique key for listType=map since:

  • apiVersion is an enum with only one value (v2)
  • staticConfigs can vary for the same logical Alertmanager

Consider whether the current atomic list type adequately supports the intended multi-owner use cases, or if a different approach (such as introducing a required name field as a unique identifier) would better serve users.

Based on past review discussions at lines 450-450 in previous comments.
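If the multi-owner use case wins out, a map-keyed alternative could look roughly like the sketch below; the name field and its constraints are illustrative only:

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=name
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`

type AdditionalAlertmanagerConfig struct {
	// name uniquely identifies this Alertmanager configuration within the list,
	// so that separate owners can manage separate entries via server-side apply.
	// +required
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=253
	Name string `json:"name"`

	// ... existing fields unchanged ...
}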

Comment on lines +1366 to +1376
scheme:
default: HTTP
description: |-
scheme defines the URL scheme to use when communicating with Alertmanager
instances.
Possible values are `HTTP` or `HTTPS`.
When omitted, defaults to `HTTP`.
enum:
- HTTP
- HTTPS
type: string

⚠️ Potential issue | 🟠 Major

Reconsider HTTP as the default scheme for Alertmanager connections.

The scheme field defaults to HTTP for connections to additional Alertmanager instances. In production monitoring environments, alerts may contain sensitive information (e.g., error messages, system details), and communication with alert endpoints should use encrypted channels by default.

Consider defaulting to HTTPS instead, or at minimum, add prominent documentation warning that HTTP transmits alerts and credentials in cleartext.

🤖 Prompt for AI Agents
In
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
around lines 1366-1376, the scheme default is set to HTTP; change the default to
HTTPS and update the description text to state that the default is HTTPS and
that HTTPS encrypts alert payloads and credentials; retain the enum values
(HTTP, HTTPS) but ensure the textual docs include a prominent warning that using
HTTP transmits alerts and credentials in cleartext and should only be used for
testing or within trusted networks.

Comment on lines +1383 to +1388
maxLength: 255
type: string
x-kubernetes-validations:
- message: must be in the format 'host:port' (e.g., 'alertmanager.example.com:9093')
rule: self.matches('^[a-zA-Z0-9.-]+:[0-9]+$')
maxItems: 10

⚠️ Potential issue | 🟡 Minor

Strengthen host:port validation to prevent invalid hostnames.

The regex ^[a-zA-Z0-9.-]+:[0-9]+$ permits invalid hostnames that start or end with hyphens or dots (e.g., -example.com:9093 or example-.com:9093). While Kubernetes' own validation may catch this downstream, tightening the pattern here would fail fast with clearer error messages.

Consider a pattern like ^[a-zA-Z0-9]([a-zA-Z0-9.-]*[a-zA-Z0-9])?:[0-9]+$ to ensure hostnames start and end with alphanumeric characters.

🤖 Prompt for AI Agents
In
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
around lines 1383-1388, the existing host:port regex allows hostnames that begin
or end with hyphens or dots; replace the rule with a stricter pattern that
enforces the hostname to start and end with an alphanumeric character (for
example by using a pattern equivalent to
^[a-zA-Z0-9](?:[a-zA-Z0-9.-]*[a-zA-Z0-9])?:[0-9]+$) so hostnames like
-example.com or example-.com are rejected while preserving the port validation.
Ensure the new pattern is placed in the x-kubernetes-validations.rule field and
the validation message remains descriptive.

@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2025

@marioferh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn 5cef22c link true /test e2e-aws-ovn
ci/prow/e2e-gcp 5cef22c link false /test e2e-gcp
ci/prow/e2e-aws-serial-1of2 5cef22c link true /test e2e-aws-serial-1of2
ci/prow/okd-scos-e2e-aws-ovn 505533b link false /test okd-scos-e2e-aws-ovn
ci/prow/integration 99db558 link true /test integration

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Contributor

@everettraven everettraven left a comment

Another handful of comments after another pass.

Just a note that because this PR has been around a while and has many comments, things are getting a bit difficult to keep track of here.

It may be worth it to open a new PR for this change and essentially start the review cycle fresh so we can keep better track of the review comments.

I'll leave that up to you though.

// By default, no additional Alertmanager instances are configured.
// Maximum of 10 additional Alertmanager configurations can be specified.
// When omitted, no additional Alertmanager instances are configured (default behavior).
// When set to an empty array [], the behavior is the same as omitting the field.
Contributor

In this case, you should not allow setting [] then. Where possible we try to have only one way to "say" something.

In this case, because setting both [] and omitting the field altogether mean the same thing we should just make it so that omitting the field is the only way to signal that you do not want any additional alertmanager instances configured.

You can do this by setting the min items to 1 for this field.
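A minimal sketch of that change on the field, assuming the other markers stay as they are:

// +optional
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`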

Contributor

Makes sense, I'll add +kubebuilder:validation:MinItems=1 to enforce that omission is the only way to signal "no additional alertmanagers". This removes the ambiguity between [] and field omission.

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`
Contributor

Re: naming, prometheusConfig makes sense to me - although that looks like maybe it is associated with a different thread.

@danielmellado It looks like this thread still needs to be addressed by updating the list type from atomic to map and setting the appropriate uniqueness keys.

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=50
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`
Contributor

If you are going to be continuing with the size format approach as opposed to using explicit units, yes.

}

// ExternalLabels represents labels to be added to time series and alerts.
type ExternalLabels struct {
Contributor

Good catch - I missed this on my last review round.

This does look funky. I'd expect that we probably just want externalLabels to be typed as a slice of Labels instead of this nesting.

Comment on lines +498 to +500
// Writing to `/dev/stderr`, `/dev/stdout` or `/dev/null` is supported, but
// writing to any other `/dev/` path is not supported. Relative paths are
// also not supported.
Contributor

This doesn't appear to be enforced by validations

Contributor

Good catch - added CEL validation to enforce the /dev/ path restriction: !self.startsWith('/dev/') || self in ['/dev/stdout', '/dev/stderr', '/dev/null']. This ensures only the documented /dev/ paths are accepted while rejecting arbitrary /dev/ paths.
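Expressed as a kubebuilder marker on queryLogFile, that rule would look roughly like this (message text illustrative):

// +kubebuilder:validation:XValidation:rule="!self.startsWith('/dev/') || self in ['/dev/stdout', '/dev/stderr', '/dev/null']",message="writing to /dev/ paths other than /dev/stdout, /dev/stderr and /dev/null is not supported"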

// +optional
// +kubebuilder:validation:MaxLength=10
Separator *string `json:"separator,omitempty"`
// regex is the regular expression to match against the concatenated source label values.
Contributor

It wouldn't hurt to add this.

Is there a reasonable way for us to ensure that the regular expression only uses characters that are valid in an RE2 regular expression?

// +optional
// +kubebuilder:validation:MaxLength=63
TargetLabel *string `json:"targetLabel,omitempty"`
Contributor

We should ensure that is appropriately enforced through validations.
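One option, assuming targetLabel should be a classic Prometheus label name; note this would reject capture-group expansion such as "${1}", which Prometheus also permits in target_label, so treat it as a starting point rather than the final rule:

// +optional
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z_][a-zA-Z0-9_]*$')",message="must be a valid Prometheus label name"
TargetLabel *string `json:"targetLabel,omitempty"`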

// +optional
// +kubebuilder:validation:Enum=Verify;SkipVerify
// +kubebuilder:default=Verify
CertificateVerification string `json:"certificateVerification,omitempty"`
Contributor

Yes, we typically encourage the use of a type alias for enums
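The usual shape of that pattern, sketched for this field (type and constant names are illustrative):

// CertificateVerificationPolicy controls whether the server certificate is verified.
// +kubebuilder:validation:Enum=Verify;SkipVerify
type CertificateVerificationPolicy string

const (
	CertificateVerificationVerify     CertificateVerificationPolicy = "Verify"
	CertificateVerificationSkipVerify CertificateVerificationPolicy = "SkipVerify"
)

// certificateVerification controls whether the certificate presented by the server is verified.
// +optional
// +kubebuilder:default=Verify
CertificateVerification CertificateVerificationPolicy `json:"certificateVerification,omitempty"`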

// SecretKeySelector selects a key of a Secret.
// +structType=atomic
type SecretKeySelector struct {
// name is the name of the secret in the same namespace to select from.
Contributor

What is the "same" namespace?

Contributor

Clarified that secrets are referenced from the openshift-monitoring namespace (where the platform monitoring stack is deployed).

// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="must be a valid secret name (lowercase alphanumeric characters, '-' or '.', start and end with alphanumeric)"
Name string `json:"name,omitempty"`
// key is the key of the secret to select from. Must be a valid secret key.
Contributor

What makes a "valid secret key"? Are there formatting constraints that can be enforced here?

Contributor

Added validation ;) Secret keys must match ^[a-zA-Z0-9._-]+$ (alphanumeric, -, _, or .). This is the standard Kubernetes secret key format.
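As a marker on the key field, that could look roughly like this (length limits and message wording are illustrative):

// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z0-9._-]+$')",message="must consist of alphanumeric characters, '-', '_' or '.'"
Key string `json:"key"`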

@danielmellado
Contributor

danielmellado commented Dec 16, 2025 via email

@danielmellado
Contributor

Thanks @everettraven! I'll put up a new PR with the updates in the comments. Even the GH API is being a bit sluggish and I'm starting to have a hard time getting to it due to it by default not loading more than 40 comments. There were a LOT before I took this xD

}

// RemoteWriteSpec represents configuration for remote write endpoints.
type RemoteWriteSpec struct {


We lack all the queue configuration, which is another big struct.
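For context, a trimmed-down sketch of what such a struct could cover; the field set loosely mirrors Prometheus's queue_config and is illustrative, not a proposal:

// QueueConfig is a hypothetical subset of Prometheus's remote-write queue tuning.
type QueueConfig struct {
	// capacity is the number of samples buffered per shard before blocking reads from the WAL.
	// +optional
	Capacity *int32 `json:"capacity,omitempty"`
	// minShards and maxShards bound the number of concurrent shards used for sending.
	// +optional
	MinShards *int32 `json:"minShards,omitempty"`
	// +optional
	MaxShards *int32 `json:"maxShards,omitempty"`
	// maxSamplesPerSend caps how many samples are sent in a single request.
	// +optional
	MaxSamplesPerSend *int32 `json:"maxSamplesPerSend,omitempty"`
	// batchSendDeadline is the longest a sample may wait in the buffer, as a Go duration string such as 5s.
	// +optional
	// +kubebuilder:validation:MaxLength=20
	BatchSendDeadline *string `json:"batchSendDeadline,omitempty"`
}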

@simonpasquier

Thanks @everettraven! I'll put up a new PR with the updates in the comments. Even the GH API is being a bit sluggish and I'm starting to have a hard time getting to it due to it by default not loading more than 40 comments. There were a LOT before I took this xD

Chatted offline with @danielmellado: I would even go one step further and split the PR into multiple ones, each PR focusing on a sub-part of the configuration that can reasonably be reviewed. For instance, the remote write config alone has dozens of fields.

@danielmellado
Contributor

danielmellado commented Dec 17, 2025 via email


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
