
Conversation

@marioferh
Contributor

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 1, 2025
@openshift-ci
Contributor

openshift-ci bot commented Sep 1, 2025

Hello @marioferh! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 1, 2025
@marioferh marioferh force-pushed the prometheus_k8s_config_api branch from 4b7b3a6 to 5cef22c Compare September 15, 2025 13:11
@openshift-ci-robot

/test remaining-required

Scheduling tests matching the pipeline_run_if_changed parameter:
/test e2e-aws-ovn
/test e2e-aws-ovn-hypershift
/test e2e-aws-ovn-hypershift-conformance
/test e2e-aws-ovn-techpreview
/test e2e-aws-serial-1of2
/test e2e-aws-serial-2of2
/test e2e-aws-serial-techpreview-1of2
/test e2e-aws-serial-techpreview-2of2
/test e2e-azure
/test e2e-gcp
/test e2e-upgrade
/test e2e-upgrade-out-of-change

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2025
@marioferh marioferh force-pushed the prometheus_k8s_config_api branch from 5cef22c to 86df303 Compare September 16, 2025 08:52
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 16, 2025
@JoelSpeed
Contributor

@marioferh You appear to have some conflict markers here? Can you PTAL and make sure the correct content is pushed.

Let me know when you want this API reviewed

@marioferh
Contributor Author

Yep, I've mixed up the commits, let me fix it.

@marioferh marioferh force-pushed the prometheus_k8s_config_api branch 3 times, most recently from bf0b7b7 to 64ce7b7 Compare September 16, 2025 17:23
@marioferh marioferh changed the title from "WIP: Monitoring API prometheus k8s config API" to "Monitoring API: prometheus k8s config API" Sep 17, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 17, 2025
@everettraven
Contributor

/assign

Comment on lines 92 to 95
// prometheusK8sConfig provides configuration options for the Prometheus instance
// Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations.
// prometheusK8sConfig is optional.
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
Contributor

Can we have this follow the existing pattern of related fields like the metricsServerConfig field?
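For illustration, a minimal sketch of what that could look like; the field name, wording, and the metricsServerConfig phrasing it mirrors are assumptions here, not the merged API:

// prometheusConfig is an optional field that allows configuring the platform Prometheus
// instance deployed in the openshift-monitoring namespace.
// When omitted, this means no opinion and the platform is left to choose a reasonable
// default, which is subject to change over time.
// +optional
PrometheusConfig PrometheusConfig `json:"prometheusConfig,omitempty,omitzero"`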

Contributor

Specifically, some of the questions I have from reading this as-is are:

  • What is "the Prometheus instance"?
  • Where does it run? Is it run by default?
  • Is Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations all the things I can configure? Why might I care, as a user, to configure these?

Contributor Author

Contributor

Please include helpful information, for end-users that may read this API documentation, in the GoDoc here.

All of the information you included in your response here is helpful context in understanding what changes to this configuration impacts.

Please also include some information in the description as to why users may want to configure this field.

Contributor Author

done

// prometheusK8sConfig is optional.
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
// +optional
PrometheusK8sConfig PrometheusK8sConfig `json:"prometheusK8sConfig,omitempty,omitzero"`
Contributor

Why do we need the K8s in the name here? Feels a bit redundant?

Contributor Author

Maybe to differentiate from prometheusOperator, coming from config map. Fixed.

Contributor
@danielmellado Oct 22, 2025

Yes, that's the reasoning, it was also a bit confusing to me when I first explored the API and the fields.

Contributor
@everettraven Oct 22, 2025

I don't think we have concretely decided on whether or not this should still be a separate root-level field in #2481 (comment) but if we do keep it separate, this should be named more clearly.


I'd agree that we don't need the k8s part.

Comment on lines 430 to 431
// additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from
// the Prometheus component. By default, no additional Alertmanager instances are configured.
Contributor

As someone unfamiliar with configuring Prometheus, I don't really understand what any of this means. Why would I care to configure additional alertmanager instances that receive alerts from Prometheus?

Contributor Author

AdditionalAlertmanagerConfig is used when the client has another Alertmanager and needs the Prometheus alerts delivered there. It can also be used to decouple alerts if needed.

Contributor

To clarify, additionalAlertmanagerConfigs is an optional field that configures Prometheus to send alerts to this set of Alertmanager instances. This is useful because it decouples alerts from what?

Contributor Author

It doesn't decouple alerts from anything, sorry. It just sends the same alerts from Prometheus to another Alertmanager provided by the user.

Contributor

Would something like this potentially clarify what this field does for end users?:

additionalAlertmanagerConfigs is an optional field used to configure how Prometheus communicates
with Alertmanager instances when sending alerts.

We should also include information as to what happens when this field is omitted in the GoDoc. Based on my current understanding, that would likely be something like:

When omitted, Prometheus will not send alerts to any Alertmanager instances.

Is this correct? Is there a default Alertmanager instance that will always be included in the configuration for Prometheus instances?
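As a sketch only, assuming the in-cluster platform Alertmanager remains the default alert destination regardless of this field, the godoc could be assembled along these lines:

// additionalAlertmanagerConfigs is an optional field used to configure how Prometheus
// communicates with additional Alertmanager instances, beyond the platform Alertmanager,
// when sending alerts.
// When omitted, Prometheus sends alerts only to the platform Alertmanager (if enabled).
// +optional
// +kubebuilder:validation:MaxItems=10
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`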

// the Prometheus component. By default, no additional Alertmanager instances are configured.
// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
Contributor

Why atomic? Using atomic makes it so you can't have multiple-owner entries here, which doesn't seem like something unreasonable.

Would this be more appropriate as a listType=map?

Contributor Author

correct. Thx
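A rough sketch of the map-keyed variant discussed above; using url as the merge key is an assumption (any field guaranteed unique per entry would do):

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=url
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`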

// url is the URL of the remote write endpoint.
// +required
// +kubebuilder:validation:MaxLength=2048
URL *string `json:"url,omitempty"`
Contributor

What other constraints can we put in place to ensure the URL is valid?

Contributor Author

done
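For example, constraints along these lines could be applied (illustrative only; the exact CEL rule and bounds are assumptions, not necessarily what was pushed):

// url is the URL of the remote write endpoint. It must be a valid HTTP or HTTPS URL.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=2048
// +kubebuilder:validation:XValidation:rule="isURL(self)",message="url must be a valid URL"
URL string `json:"url,omitempty"`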

// +kubebuilder:validation:MaxLength=2048
URL *string `json:"url,omitempty"`
// name is the name of the remote write configuration.
// +optional
Contributor

What does it mean for this name to not be provided?

Contributor Author

Well, if you add an endpoint and don't fill in the name, data is not sent correctly.

Contributor

So then this should be required?

Contributor

This comment still stands.

Additionally, what other constraints are there on the format of the name?


Name is optional: if not specified, Prometheus will generate a unique one. There are no specific constraints on the name (Prometheus-wise).

Contributor

Since Prometheus has no specific constraints, I'm just enforcing MaxLength here for sanity.

Contributor

Is setting "" the same as omitting the field?

Prometheus has no naming constraints here? So setting something like !@*&(^ would be a valid name? Should we allow that?

Contributor

Agreed. Added MinLength=1 to reject empty strings (omitting is the only way to let Prometheus auto-generate). Also added pattern validation restricting to alphanumeric, hyphens, and underscores to prevent garbage like !@*&(^.
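A sketch of the resulting field; the MaxLength value and the exact pattern are assumptions:

// name is the name of the remote write configuration, used to identify the endpoint
// in Prometheus metrics and logs.
// When omitted, Prometheus generates a unique name for the endpoint.
// +optional
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^[a-zA-Z0-9_-]+$`
Name string `json:"name,omitempty"`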


So setting something like !@*&(^ would be a valid name?

Yes. Same as "☘"

Should we allow that?

I dunno

Comment on lines 637 to 640
// remoteTimeout is the timeout for requests to the remote write endpoint.
// +optional
// +kubebuilder:validation:MaxLength=20
RemoteTimeout *string `json:"remoteTimeout,omitempty"`
Contributor

What does it mean for this to not be provided? what are the allowed values here?

Contributor Author

done

// +kubebuilder:validation:MaxLength=20
RemoteTimeout *string `json:"remoteTimeout,omitempty"`
// writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint.
// +optional
Contributor

What does it mean for this to be omitted?

Contributor Author

It is optional; if you don't fill it in, no relabeling is done.

Contributor

Looks like this comment still needs to be addressed in the GoDoc.

Contributor

Done. Added "When omitted, no relabeling is performed and all metrics are sent as-is." to the godoc.
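For illustration, the field with the updated godoc might read roughly as follows (the MaxItems bound is an assumption):

// writeRelabelConfigs is a list of relabeling rules applied to samples before they are
// sent to the remote write endpoint.
// When omitted, no relabeling is performed and all metrics are sent as-is.
// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
WriteRelabelConfigs []RelabelConfig `json:"writeRelabelConfigs,omitempty"`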

Comment on lines 649 to 784
type RelabelConfig struct {
// sourceLabels is a list of source label names.
// +optional
// +kubebuilder:validation:MaxItems=10
// +kubebuilder:validation:items:MaxLength=63
// +listType=set
SourceLabels []string `json:"sourceLabels,omitempty"`
// separator is the separator used to join source label values.
// +optional
// +kubebuilder:validation:MaxLength=10
Separator *string `json:"separator,omitempty"`
// regex is the regular expression to match against the concatenated source label values.
// +optional
// +kubebuilder:validation:MaxLength=1000
Regex *string `json:"regex,omitempty"`
// targetLabel is the target label name.
// +optional
// +kubebuilder:validation:MaxLength=63
TargetLabel *string `json:"targetLabel,omitempty"`
// replacement is the replacement value for the target label.
// +optional
// +kubebuilder:validation:MaxLength=255
Replacement *string `json:"replacement,omitempty"`
// action is the action to perform.
// +optional
// +kubebuilder:validation:MaxLength=20
Action *string `json:"action,omitempty"`
}
Contributor

Please elaborate in the godoc more on why a user may care to configure these fields, constraints, and what it means for them to be omitted.

As-is I don't really understand what any of these fields result in when configured.

Contributor Author

Those explanations are pretty similar to the Prometheus ones:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config

@openshift-ci
Contributor

openshift-ci bot commented Sep 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from everettraven. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@marioferh
Contributor Author

Some comments addressed, I will continue tomorrow

@marioferh marioferh changed the title from "Monitoring API: prometheus k8s config API" to "MON-4030: prometheus k8s config API" Sep 23, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 23, 2025
@openshift-ci-robot

openshift-ci-robot commented Sep 23, 2025

@marioferh: This pull request references MON-4030 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor
@everettraven left a comment

Another round of comments

// prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster.
// By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform.
//
// This field allows you to customize how Prometheus is deployed and operated, including:
Contributor

Does this configure how all Prometheus instances are deployed and operated or just the default one?

Contributor

same as above, I'll push an updated PR clarifying this ;)

Comment on lines 109 to 111
// For more information on Prometheus configuration, see:
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/
// https://prometheus.io/docs/prometheus/latest/storage/
Contributor

We generally avoid linking to documentation where possible because these docs can be updated and the references may no longer make sense for users.

I don't know that these links are necessary either. We should be appropriately documenting each field in the configuration here such that users can understand what they are configuring and why.

Contributor

Agreed, removed the external Prometheus documentation links. Each child field now has comprehensive documentation.


// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=apiVersion
Contributor

So this must be unique based on the apiVersion?

// +kubebuilder:validation:MaxLength=50
// +kubebuilder:validation:Pattern=`^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$`
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`
Contributor

Does it actually make sense for this to be a string value? This makes it more complex to evaluate valid values.

For example, would it make sense for me to specify 3,000,000,000B here? The "integer" value there is higher than the maximum possible int32 value - is that problematic? What if I specified higher than the maximum possible int64 value in bytes - is that problematic?

Something like:

enforcedBodySizeLimit:
  value: 4000
  units: KB

would probably be easier to put constraints on that forces end users to make reasonable decisions here.

Alternatively, we decide a reasonable base unit that all values provided by a user will use and name the field accordingly (i.e enforcedBodySizeLimitMB)

Contributor

Additionally, would it make sense to nest this under something like a scrapeConfig field?

Looking at https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config there are a lot of options that we aren't yet exposing. Do you ever intend to expand the set of scraping-related configuration options a user can configure?

Contributor

I think my comments here still stand and in general we prefer using an explicit unit name in the field.

See https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#units for reference. In OpenShift APIs we generally prefer the units in the field name approach.

Contributor

Good point on the unit naming convention. Per the K8s API conventions, I'll rename enforcedBodySizeLimit to enforcedBodySizeLimitBytes and use an integer type instead of the string pattern.

Regarding nesting under scrapeConfig and expanding the options: we're not planning to expose the full Prometheus scrape config surface here. This API is intentionally scoped to mirror what's already exposed in the CMO ConfigMap (see cluster-monitoring-operator/pkg/manifests/types.go), not to be a 1:1 mapping of the upstream Prometheus config. So keeping it flat should be fine.
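As a sketch of that rename (the bounds are assumptions, and note that an integer field drops the `automatic` keyword the string form supported):

// enforcedBodySizeLimitBytes enforces an upper limit, in bytes, on the size of
// uncompressed response bodies accepted from scraped targets. Scrapes whose response
// exceeds the limit fail, protecting Prometheus against excessively large payloads.
// When omitted, no limit is enforced.
// +optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=1073741824
EnforcedBodySizeLimitBytes *int64 `json:"enforcedBodySizeLimitBytes,omitempty"`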

Comment on lines 767 to 772
// replacement is the value against which a regex replace is performed if the
// regular expression matches. Regex capture groups are available.
// +optional
// +kubebuilder:validation:MaxLength=255
Replacement *string `json:"replacement,omitempty"`
Contributor

What happens if I don't supply this value?

What is the difference between omitting and providing an empty string?

Contributor

Addressed in latest commit

Contributor

I don't think these questions have been addressed. I still don't understand what happens if I omit this field or set "".

Contributor
@danielmellado Dec 16, 2025

You're right, I've now explicitly documented the difference in the godoc, as there were just so many comments xD:

  • When omitted: defaults to "$1" (the first regex capture group is used as the replacement value)
  • When set to "": explicitly clears/empties the target label value

For example, with sourceLabels: ["env"] where env=production:

  • Omitted → targetLabel gets value "production" (via $1)
  • "" → targetLabel gets empty string ""

Comment on lines 772 to 775
// action is the action to perform.
// +optional
// +kubebuilder:validation:MaxLength=20
Action *string `json:"action,omitempty"`
Contributor

What happens if I don't specify an action?

What are the allowed actions that I can specify?

// +optional
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify
// +kubebuilder:default=Verify
InsecureSkipVerify string `json:"insecureSkipVerify,omitempty"`
Contributor

insecureSkipVerify: Verify and insecureSkipVerify: InsecureSkipVerify read weird and could be confusing for users. Consider using a new field name that better reflects what this field is used for.

// Allowed values are "Verify" (default, secure) and "InsecureSkipVerify" (skip certificate verification, insecure).
// By default, certificate verification is performed ("Verify").
// +optional
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify
Contributor

When using an enum, create a type alias with constants for the allowed values.
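For example, the enum-with-constants pattern could be sketched as follows (the type and constant names here are illustrative, not the names adopted in the PR):

// CertificateVerificationPolicy describes how Prometheus verifies the server certificate
// of a TLS endpoint.
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify
type CertificateVerificationPolicy string

const (
	// CertificateVerificationVerify verifies the server certificate chain and host name.
	CertificateVerificationVerify CertificateVerificationPolicy = "Verify"
	// CertificateVerificationInsecureSkipVerify skips certificate verification. Insecure.
	CertificateVerificationInsecureSkipVerify CertificateVerificationPolicy = "InsecureSkipVerify"
)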

Comment on lines 805 to 854
// LocalObjectReference contains enough information to let you locate the
// referenced object inside the same namespace.
// ---
// New uses of this type are discouraged because of difficulty describing its usage when embedded in APIs.
// 1. Invalid usage help. It is impossible to add specific help for individual usage. In most embedded usages, there are particular
// restrictions like, "must refer only to types A and B" or "UID not honored" or "name must be restricted".
// Those cannot be well described when embedded.
// 2. Inconsistent validation. Because the usages are different, the validation rules are different by usage, which makes it hard for users to predict what will happen.
// 3. We cannot easily change it. Because this type is embedded in many locations, updates to this type
// will affect numerous schemas. Don't make new APIs embed an underspecified API type they do not control.
//
// Instead of using this type, create a locally provided and used type that is well-focused on your reference.
// For example, ServiceReferences for admission registration: https://github.com/kubernetes/api/blob/release-1.17/admissionregistration/v1/types.go#L533 .
// +structType=atomic
type LocalObjectReference struct {
// name of the referent.
// This field is effectively required, but due to backwards compatibility is
// allowed to be empty. Instances of this type with an empty value here are
// almost certainly wrong.
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// +optional
// +default=""
// +kubebuilder:default=""
// +kubebuilder:validation:MaxLength=253
// TODO: Drop `kubebuilder:default` when controller-gen doesn't need it https://github.com/kubernetes-sigs/kubebuilder/issues/3896.
Name *string `json:"name,omitempty" protobuf:"bytes,1,opt,name=name"`
}

// SecretKeySelector selects a key of a Secret.
// +structType=atomic
type SecretKeySelector struct {
// The name of the secret in the pod's namespace to select from.
LocalObjectReference `json:",inline" protobuf:"bytes,1,opt,name=localObjectReference"`
// key of the secret to select from. Must be a valid secret key.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
Key string `json:"key,omitempty" protobuf:"bytes,2,opt,name=key"`
// optional specifies whether the Secret or its key must be defined
// +optional
// +kubebuilder:validation:Enum=true;false
Optional string `json:"optional,omitempty" protobuf:"varint,3,opt,name=optional"`
}
Contributor

How is this different than the existing SecretKeyReference type you've created earlier in this file?

Contributor

Agree here, I think we can just use one. While SecretKeyReference is used for references and SecretKeySelector is for TLS certificates, I think we may want to mimic Prom Op and use just v1.SecretKeySelector.

Contributor

We prefer not importing types for this because they generally don't implement the same granularity of validations that we can impose on a custom type.
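A minimal sketch of a single, locally-defined reference type that could replace both (names and limits are assumptions):

// SecretKeyReference identifies a key within a Secret in the same namespace.
// +structType=atomic
type SecretKeyReference struct {
	// name is the metadata.name of the referenced Secret.
	// +required
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=253
	Name string `json:"name,omitempty"`

	// key is the key within the Secret whose value will be used.
	// +required
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=253
	Key string `json:"key,omitempty"`
}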

@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from 505533b to da2c1ef Compare December 2, 2025 09:49
@coderabbitai

coderabbitai bot commented Dec 2, 2025

Walkthrough

Adds an optional PrometheusK8sConfig to ClusterMonitoringSpec, introduces many new public types/enums for Prometheus/Alertmanager/remote-write, regenerates deepcopy/swagger/OpenAPI code and CRD payloads (with duplicate prometheusK8sConfig insertions), and extends onCreate validation tests for PrometheusK8sConfig.

Changes

Cohort / File(s) Summary
Type definitions
config/v1alpha1/types_cluster_monitoring.go
Adds PrometheusK8sConfig field to ClusterMonitoringSpec and many exported types/enums (e.g., PrometheusK8sConfig, AdditionalAlertmanagerConfig, ExternalLabels, Label, RemoteWriteSpec, RelabelConfig, TLSConfig, SecretKeySelector, SecretRequirement, RelabelAction, CollectionProfile, AlertmanagerAPIVersion, AlertmanagerScheme) with comments and validations.
Generated deepcopy
config/v1alpha1/zz_generated.deepcopy.go
Adds DeepCopyInto/DeepCopy implementations for all new types and updates ClusterMonitoringSpec.DeepCopyInto to deep-copy PrometheusK8sConfig.
Generated swagger docs
config/v1alpha1/zz_generated.swagger_doc_generated.go
Adds SwaggerDoc maps and SwaggerDoc() methods for the new types and updates ClusterMonitoringSpec swagger entry to include prometheusK8sConfig.
Generated CRD manifests (config)
config/v1alpha1/zz_generated.crd-manifests/...clustermonitorings-*.crd.yaml, config/v1alpha1/zz_generated.featuregated-crd-manifests/.../ClusterMonitoringConfig.yaml
Inserts comprehensive prometheusK8sConfig property into ClusterMonitoring v1alpha1 CRD schemas with extensive nested fields/validation (additionalAlertmanagerConfigs, collectionProfile, enforcedBodySizeLimit, externalLabels, logLevel, nodeSelector, queryLogFile, remoteWrite, resources, retention/retentionSize, tolerations, topologySpreadConstraints, volumeClaimTemplate, TLS/secret refs, etc.). Duplicate insertions of the same block observed.
Payload CRD manifests
payload-manifests/crds/...clustermonitorings-*.crd.yaml
Adds the same prometheusK8sConfig property to payload CRD manifests (CustomNoUpgrade, DevPreviewNoUpgrade, TechPreviewNoUpgrade) with matching nested schema; duplicates present in some files.
OpenAPI / generated OpenAPI JSON
openapi/generated_openapi/zz_generated.openapi.go, openapi/openapi.json
Adds OpenAPI definitions for new v1alpha1 types and updates schema references so ClusterMonitoringSpec includes prometheusK8sConfig and related nested types.
Validation tests
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
Adds onCreate tests covering valid PrometheusK8sConfig variants (minimal, nodeSelector, resources, tolerations, topologySpreadConstraints, additionalAlertmanagerConfigs, remoteWrite, externalLabels) and numerous rejection scenarios (empty object, exceeding max items, empty entries).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • Duplicate prometheusK8sConfig blocks in generated and feature-gated CRD manifests.
    • Consistency between type validations, OpenAPI definitions, and CRD constraints (minProperties, maxItems, enums, patterns).
    • Correct deepcopy semantics for maps/slices/pointers in newly generated code.
    • Swagger/OpenAPI doc entries for duplication or conflicting descriptions.
    • Validation test coverage to ensure CRD schema and admission validation align.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented
The command is terminated due to an error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented



@danielmellado
Contributor

@everettraven I've gone through all the comments and taken a look at most of the concerns; could you take another, fresh look at it? Thanks!

@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from 8473dba to e748202 Compare December 3, 2025 12:50
@coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (9)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Incorrect component name in logLevel description (duplicate).

Line 1636 states "logLevel defines the verbosity of logs emitted by Alertmanager," but this field is under PrometheusK8sConfig and should refer to "Prometheus," not "Alertmanager." This appears to be copy-pasted from the alertmanagerConfig section.

                  logLevel:
                    description: |-
-                      logLevel defines the verbosity of logs emitted by Alertmanager.
+                      logLevel defines the verbosity of logs emitted by Prometheus.
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1634-1651: Fix logLevel description — should reference Prometheus, not Alertmanager.

The description at line 1636 incorrectly refers to "logs emitted by Alertmanager" when this field is part of prometheusK8sConfig and should refer to Prometheus. This appears to have been copied from the alertmanagerConfig.customConfig.logLevel field without updating the component name.

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1634-1651: ⚠️ Unresolved: logLevel description still references Alertmanager instead of Prometheus.

The description at Line 1635 states: "logLevel defines the verbosity of logs emitted by Alertmanager", but this field resides within prometheusK8sConfig, which configures Prometheus, not Alertmanager. This is a copy-paste error from the Alertmanager section that appears not to have been fixed.

Update the description to reference Prometheus instead. The source type definition in config/v1alpha1/types_cluster_monitoring.go should be corrected first, then the CRD regenerated.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Resolve incorrect component reference in logLevel description (unresolved from previous review).

Line 1636 incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager". This field belongs to prometheusK8sConfig, so the description should refer to Prometheus, not Alertmanager. This issue was previously flagged but remains unresolved.

Apply this diff to fix the description:

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
      for debugging issues or reducing noise in production environments.
      Allowed values are Error, Warn, Info, and Debug.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Fix copy-paste error: Replace "Alertmanager" with "Prometheus" in logLevel description.

The logLevel field description incorrectly references "Alertmanager" when this field is under prometheusK8sConfig and should reference "Prometheus". This appears to be a copy-paste error from the Alertmanager configuration section.

Apply this fix:

                  logLevel:
                    description: |-
-                     logLevel defines the verbosity of logs emitted by Alertmanager.
+                     logLevel defines the verbosity of logs emitted by Prometheus.
                      This field allows users to control the amount and severity of logs generated, which can be useful
                      for debugging issues or reducing noise in production environments.
                      Allowed values are Error, Warn, Info, and Debug.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Fix logLevel description to reference Prometheus, not Alertmanager (same issue as earlier review).

This logLevel field is part of prometheusK8sConfig, but the description still says “logs emitted by Alertmanager.” It should describe Prometheus log verbosity instead, mirroring semantics but naming the correct component, and then regenerate the CRD.

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

260-260: PrometheusK8sConfig.logLevel swagger text still references Alertmanager instead of Prometheus.

In map_PrometheusK8sConfig["logLevel"], the text says “logs emitted by Alertmanager” but this field belongs to PrometheusK8sConfig. Update the Go comment for PrometheusK8sConfig.LogLevel to reference Prometheus, then regenerate swagger so this entry matches.

config/v1alpha1/types_cluster_monitoring.go (2)

655-660: Documentation claims default value but field is required without a default.

The scheme field is marked +required with no +kubebuilder:default, yet line 657 states "The default value is http." Since users must explicitly provide a value, remove the misleading default claim.

-	// Possible values are `http` or `https`. The default value is `http`.
+	// Possible values are `http` or `https`.

100-100: Documentation mentions "remote write/read" but only remote write is supported.

The comment mentions "remote write/read" but the PrometheusK8sConfig struct only exposes RemoteWrite configuration—there is no RemoteRead field. Update the documentation to say "remote write" only, or clarify that remote read may be added in the future.

-	//   - External integrations (remote write/read, alerting, and scraping configuration)
+	//   - External integrations (remote write, alerting, and scraping configuration)
🧹 Nitpick comments (4)
config/v1alpha1/types_cluster_monitoring.go (4)

683-695: Missing validation for Prometheus label name format on key field.

The documentation states the key must be "a valid Prometheus label name, starting with a letter or underscore, followed by letters, digits, or underscores," but there's no CEL or pattern validation enforcing this. Invalid label names will be accepted at admission and fail at runtime.

Add validation to enforce the documented constraint:

 	// key is the name of the label.
 	// The key must be a valid Prometheus label name, starting with a letter or underscore,
 	// followed by letters, digits, or underscores.
 	// +required
 	// +kubebuilder:validation:MaxLength=63
 	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z_][a-zA-Z0-9_]*$')",message="key must be a valid Prometheus label name"
 	Key string `json:"key,omitempty"`

721-726: Missing format validation for remoteTimeout duration string.

The documentation states the value must be "a valid Go time.Duration string (e.g. 30s, 5m, 1h)" but there's no validation enforcing this format. Invalid values will be accepted at admission time.

Consider adding CEL validation to match duration patterns:

 	// remoteTimeout is the timeout for requests to the remote write endpoint.
 	// When omitted, the default is 30s.
 	// +optional
 	// +kubebuilder:validation:MaxLength=20
 	// +kubebuilder:validation:MinLength=2
+	// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+(ns|us|µs|ms|s|m|h)$')",message="remoteTimeout must be a valid duration (e.g. 30s, 5m, 1h)"
 	RemoteTimeout string `json:"remoteTimeout,omitempty"`

781-786: Consider adding DNS subdomain validation for serverName.

The documentation states serverName "must be a valid DNS subdomain as per RFC 1123" but there's no validation enforcing this. Consider adding CEL validation:

 	// serverName is the server name to use for TLS connections.
 	// If specified, must be a valid DNS subdomain as per RFC 1123.
 	// +optional
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=253
+	// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="serverName must be a valid DNS subdomain"
 	ServerName string `json:"serverName,omitempty"`

799-808: Add Kubernetes secret name validation to name field.

The name field references a Kubernetes Secret, which must follow DNS-1123 subdomain naming rules. Add validation to catch invalid names at admission time:

 	// name is the name of the secret in the same namespace to select from.
 	// +required
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=253
+	// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="name must be a valid Kubernetes secret name (DNS subdomain)"
 	Name string `json:"name,omitempty"`
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 8473dba and e748202.

📒 Files selected for processing (11)
  • config/v1alpha1/types_cluster_monitoring.go (3 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (8 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (4 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • openapi/generated_openapi/zz_generated.openapi.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/types_cluster_monitoring.go
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (3)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
  • AdditionalAlertmanagerConfig (636-679)
  • ExternalLabels (698-708)
  • Label (682-695)
  • PrometheusK8sConfig (447-618)
  • RelabelConfig (736-768)
  • RemoteWriteSpec (711-733)
  • SecretKeySelector (798-823)
  • TLSConfig (771-794)
config/v1alpha1/types_cluster_monitoring.go (2)
config/v1alpha1/zz_generated.swagger_doc_generated.go (9)
  • PrometheusK8sConfig (273-275)
  • AdditionalAlertmanagerConfig (132-134)
  • ExternalLabels (227-229)
  • RemoteWriteSpec (299-301)
  • ContainerResource (218-220)
  • SecretKeySelector (310-312)
  • TLSConfig (323-325)
  • Label (237-239)
  • RelabelConfig (287-289)
operator/v1/types.go (1)
  • LogLevel (94-94)
config/v1alpha1/zz_generated.deepcopy.go (1)
config/v1alpha1/types_cluster_monitoring.go (9)
  • AdditionalAlertmanagerConfig (636-679)
  • TLSConfig (771-794)
  • PrometheusK8sConfig (447-618)
  • ExternalLabels (698-708)
  • Label (682-695)
  • RemoteWriteSpec (711-733)
  • ContainerResource (318-348)
  • RelabelConfig (736-768)
  • SecretKeySelector (798-823)
🔇 Additional comments (13)
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1635-1636: Fix logLevel description to reference Prometheus, not Alertmanager.

The logLevel field description at lines 1635–1636 in the CRD file should reference Prometheus, not Alertmanager. Update the description, the corresponding Go doc comment in config/v1alpha1/types_cluster_monitoring.go, and regenerate the CRD.

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

121-134: New swagger docs for Prometheus-related helper types look consistent with Go types.

The swagger maps for AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig (aside from the logLevel wording), RelabelConfig, RemoteWriteSpec, SecretKeySelector, and TLSConfig closely mirror the struct fields and kubebuilder annotations in types_cluster_monitoring.go, including enums, list semantics, and validation constraints. No additional issues from a docs/API-shape perspective.

Also applies to: 222-239, 255-275, 277-301, 303-325

openapi/generated_openapi/zz_generated.openapi.go (9)

424-424: LGTM: Schema registrations correctly added.

The new v1alpha1 type registrations are properly formatted and follow the established pattern.

Also applies to: 442-442, 453-453, 465-465, 467-468, 472-472, 474-474


21918-21924: LGTM: PrometheusK8sConfig properly integrated.

The new prometheusK8sConfig field is correctly added to ClusterMonitoringSpec with comprehensive documentation and proper dependency declarations.

Also applies to: 21936-21936


22032-22068: Verify min/max label constraints are enforced.

Line 22049 documents "At least 1 label must be specified, with a maximum of 50 labels," but the schema lacks minItems: 1 and maxItems: 50 constraints. Confirm these limits are validated via CRD validation or admission logic.


22462-22488: LGTM: Label schema correctly defined.

The key/value structure is appropriate. Note that line 22471 describes a Prometheus label name pattern, but schema-level regex validation is absent—confirm this is validated elsewhere.


22930-23119: LGTM: PrometheusK8sConfig schema is comprehensive and well-structured.

This central configuration type correctly references all dependencies. Multiple descriptions document size limits (e.g., max 10 items for various arrays), but as with other types, ensure these are enforced via CRD validation rules or admission webhooks.


23149-23272: LGTM: RelabelConfig and RemoteWriteSpec schemas are correctly defined.

Both types have appropriate required fields and dependencies. Note that line 23206 lists valid actions for RelabelConfig without enum enforcement, and line 23253 documents a maximum without maxItems—consistent with the validation pattern throughout this API.


23368-23406: LGTM: SecretKeySelector follows Kubernetes patterns correctly.

The schema properly uses x-kubernetes-map-type: atomic to ensure atomicity of secret references, with appropriate required fields.


23438-23486: LGTM: TLSConfig schema is correctly structured.

All fields appropriately optional for flexible TLS configuration, with proper references to SecretKeySelector for sensitive data.


21227-21302: Verify that documented limits are enforced elsewhere.

The description states "Maximum of 10 endpoints can be specified," but the schema lacks a maxItems constraint. Confirm this limit is enforced via CRD validation rules or admission webhooks, or update the description if the limit is not actually enforced.

config/v1alpha1/types_cluster_monitoring.go (1)

620-631: Enum types are well-defined with proper validation.

The AlertmanagerAPIVersion, AlertmanagerScheme, RelabelAction, and CollectionProfile types have appropriate enum constraints and constant definitions. The lowercase values for RelabelAction align with Prometheus conventions.

Also applies to: 836-867

config/v1alpha1/zz_generated.deepcopy.go (1)

14-39: Autogenerated deepcopy implementations are correct.

The generated DeepCopyInto and DeepCopy methods for the new types correctly handle:

  • Slice allocation and element copying (including calling DeepCopyInto for complex elements)
  • Map allocation and key-value copying
  • Pointer field allocation and value copying
  • Nil checks for pointer and slice fields

The struct assignments for TLSConfig and SecretKeySelector are correct since they only contain value types (strings).

Also applies to: 460-479, 734-748, 994-1061, 1089-1156, 1216-1270

Comment on lines 1287 to 1300
prometheusK8sConfig:
description: |-
prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster.
By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform.
This field allows you to customize how Prometheus is deployed and operated, including:
- Deployment settings (such as replica count and update strategy)
- Pod scheduling (node selectors, tolerations, affinity)
- Resource allocation (CPU, memory, and storage requests/limits)
- Retention policies (how long metrics are stored)
- External integrations (remote write/read, alerting, and scraping configuration)
Configuring Prometheus is important because it enables you to tailor monitoring to your cluster's needs, such as:
- Ensuring high availability and reliability of monitoring data
- Managing resource usage to fit your infrastructure
- Controlling how long metrics are retained for compliance or troubleshooting
- Integrating with external systems for alerting or long-term storage
- Adjusting scraping and relabeling to match your workloads and security requirements
For more information on Prometheus configuration, see:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
https://prometheus.io/docs/prometheus/latest/storage/
This field is optional. When omitted, the platform chooses reasonable defaults, which may change over time.
minProperties: 1
@coderabbitai bot Dec 3, 2025

⚠️ Potential issue | 🟠 Major

Align prometheusK8sConfig description with actual fields (no replicas/update-strategy/remote read/scrape config exposed).

The doc block claims control over replica count, update strategy, remote read, and scraping configuration, but the prometheusK8sConfig schema here only exposes alertmanager integrations, external labels, node selector/tolerations/topology spread, resources, retention, remoteWrite, and PVC. Either add the missing fields or update this description (and the upstream Go field comment) to only advertise what’s actually configurable, then regenerate the CRD.

🤖 Prompt for AI Agents
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
around lines 1287-1311: the prometheusK8sConfig description advertises control
over replicas, update strategy, remote read and scraping configuration which are
not present in the schema; either add those fields to the API or update the
description to exactly list the exposed fields (alertmanager integrations,
externalLabels, nodeSelector/tolerations/topologySpread, resources, retention,
remoteWrite and PVC), then update the upstream Go field comment to match and
regenerate the CRD (run the repo’s CRD generation/make target) so the YAML and
comments stay in sync.

Contributor

explained above


@danielmellado Thanks for the response! I don't see the explanation in this comment thread, so I may have missed it in another part of the PR discussion.

Could you clarify or point me to where this discrepancy was addressed? Specifically, the concern is that the prometheusK8sConfig description mentions:

  • "replica count and update strategy"
  • "remote read"
  • "scraping configuration"

...but these fields don't appear to be present in the actual schema (which only exposes additionalAlertmanagerConfigs, collectionProfile, enforcedBodySizeLimit, externalLabels, logLevel, nodeSelector, queryLogFile, remoteWrite, resources, retention/retentionSize, tolerations, topologySpreadConstraints, and volumeClaimTemplate).

If this was intentionally left as future-looking documentation or if there's a plan to add these fields later, that's totally fine—I just want to make sure I'm not missing something. Let me know if you'd like me to mark this as resolved or if there's additional context I should review!

@everettraven
Contributor

@danielmellado It looks like you've picked this work back up. As it has been sitting for quite some time, let me know when you are ready for another round of review.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2025
@danielmellado
Contributor

Hi @everettraven, as soon as I rebase it, I think it's ready for another review, thanks!

@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from e748202 to 0d09af6 Compare December 9, 2025 15:21
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2025
@coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (11)
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1634-1651: Fix logLevel description to reference Prometheus (not Alertmanager).

Line 1636 states "logs emitted by Alertmanager," but this field is in prometheusK8sConfig and should reference Prometheus. This appears to be a copy-paste error from the earlier alertmanagerConfig.customConfig.logLevel section (line 66–83).

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Incorrect component name in logLevel description.

Line 1636 states "defines the verbosity of logs emitted by Alertmanager," but this field is for PrometheusK8sConfig and should refer to Prometheus, not Alertmanager. This appears to be copy-pasted from the alertmanagerConfig section.

                  logLevel:
                    description: |-
-                      logLevel defines the verbosity of logs emitted by Alertmanager.
+                      logLevel defines the verbosity of logs emitted by Prometheus.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Fix incorrect component reference in logLevel description (not addressed from previous review).

The logLevel field description (line 1636) still incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager". Since this field belongs to prometheusK8sConfig, it should refer to Prometheus instead. This was flagged in a previous review but the fix was not applied.

Apply this diff to correct the description:

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
      for debugging issues or reducing noise in production environments.
      Allowed values are Error, Warn, Info, and Debug.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Fix logLevel description—replace "Alertmanager" with "Prometheus".

Line 1636 incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager," but this field is nested within prometheusK8sConfig and controls Prometheus log verbosity, not Alertmanager's. The description must reference the correct component to avoid confusing API users.

                  logLevel:
                    description: |-
-                     logLevel defines the verbosity of logs emitted by Alertmanager.
+                     logLevel defines the verbosity of logs emitted by Prometheus.
                      This field allows users to control the amount and severity of logs generated, which can be useful
                      for debugging issues or reducing noise in production environments.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Fix Prometheus logLevel description (currently mentions Alertmanager).

Within prometheusK8sConfig.logLevel the description still says “logs emitted by Alertmanager”, but this field configures Prometheus logging. That’s a copy/paste error and can confuse consumers of the API.

Please update the source comment on PrometheusK8sConfig.LogLevel in config/v1alpha1/types_cluster_monitoring.go to reference Prometheus instead, then regenerate swagger/OpenAPI and CRDs so both this YAML and the Go docs stay in sync.

For example, in PrometheusK8sConfig:

-	// logLevel defines the verbosity of logs emitted by Alertmanager.
+	// logLevel defines the verbosity of logs emitted by Prometheus.

Also applies to: 1287-2528

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

255-275: Correct logLevel description in PrometheusK8sConfig Swagger docs (Alertmanager vs Prometheus).

map_PrometheusK8sConfig["logLevel"] still describes “logs emitted by Alertmanager”, but this field belongs to PrometheusK8sConfig and controls Prometheus logging. This mirrors the CRD issue and comes from the same Go comment on PrometheusK8sConfig.LogLevel.

Once you fix the Go comment to say “Prometheus” and rerun the generators, this swagger text (and all CRDs) will update automatically; no need to edit the generated files directly.

Also applies to: 121-325

config/v1alpha1/types_cluster_monitoring.go (5)

683-689: Add validation for Prometheus label name format on Key field.

Per past review feedback, the Key field should validate that label names conform to Prometheus requirements. Prometheus label names must match ^[a-zA-Z_][a-zA-Z0-9_]*$. Without this, users could specify invalid names like my-label (with hyphen) that will fail at runtime.

 	// key is the name of the label.
 	// The key must be a valid Prometheus label name, starting with a letter or underscore,
 	// followed by letters, digits, or underscores.
 	// +required
 	// +kubebuilder:validation:MaxLength=63
 	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z_][a-zA-Z0-9_]*$')",message="key must be a valid Prometheus label name: start with letter or underscore, followed by letters, digits, or underscores"
 	Key string `json:"key,omitempty"`

100-100: Documentation mentions "remote write/read" but only remote write is exposed.

The comment on line 100 states "External integrations (remote write/read, alerting, and scraping configuration)" but the PrometheusK8sConfig struct only exposes RemoteWrite — there are no RemoteRead fields. This could confuse users expecting remote read configuration.

Update to say "remote write" only, or clarify that remote read may be added in the future.


483-494: Documentation incorrectly references Alertmanager instead of Prometheus.

Line 483 states "logLevel defines the verbosity of logs emitted by Alertmanager" but this field is within PrometheusK8sConfig and should reference Prometheus. This appears to be a copy-paste error from AlertmanagerCustomConfig.

-	// logLevel defines the verbosity of logs emitted by Alertmanager.
+	// logLevel defines the verbosity of logs emitted by Prometheus.

642-646: Remove implementation detail from user-facing documentation.

Line 644 states "This is a custom type to allow for admission time validations" which is an implementation detail that shouldn't be in user-facing godoc. This was flagged in past review.

 	// bearerToken defines the secret reference containing the bearer token
 	// to use when authenticating to Alertmanager.
-	// This is a custom type to allow for admission time validations.
 	// +optional

647-660: PathPrefix example references v1 but only v2 is supported; Scheme default claim is misleading.

Two documentation issues per past review:

  1. Line 649 mentions /api/v1/alerts in the example, but apiVersion only allows v2. Update to /api/v2/alerts.

  2. Line 657 says "The default value is http" but the field is +required with no +kubebuilder:default, so users must always provide a value.

 	// pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints.
 	// For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
-	// set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
+	// set this to "/alertmanager" so requests go to "/alertmanager/api/v2/alerts" instead of "/api/v2/alerts".
 	// scheme defines the URL scheme to use when communicating with Alertmanager
 	// instances.
-	// Possible values are `http` or `https`. The default value is `http`.
+	// Possible values are `http` or `https`.
🧹 Nitpick comments (2)
openapi/generated_openapi/zz_generated.openapi.go (2)

21227-21302: Documented constraints are not enforced by the schema

In schema_openshift_api_config_v1alpha1_AdditionalAlertmanagerConfig, several fields document strict constraints (e.g. apiVersion only v2, scheme only http/https with a default, staticConfigs as <host>:<port> with a max of 10 entries), but the OpenAPI schema here exposes them as plain strings/arrays with no Enum, Pattern, MinItems, or MaxItems. This means invalid configs will pass CRD validation and only fail later when the operator consumes them.

If you intend these constraints to be enforced at admission time, consider adding the appropriate +kubebuilder:validation tags on the Go types (enums, patterns, min/max items) so that openapi-gen emits them. The same observation applies to other new types where the descriptions mention limits or allowed values (e.g. ExternalLabels.labels, several fields on PrometheusK8sConfig like enforcedBodySizeLimit, remoteWrite, resources, tolerations, topologySpreadConstraints, collectionProfile, and RemoteWriteSpec.writeRelabelConfigs).
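
For illustration, a minimal sketch of how those constraints could be expressed as kubebuilder markers so controller-gen emits them into the CRD schema. The type names, pattern, and limits below are assumptions made for the example, not the values agreed for this API:

// StaticConfigSketch is a hypothetical "<host>:<port>" endpoint string with
// the documented format enforced at admission time (simplified host pattern).
// +kubebuilder:validation:Pattern=`^[a-zA-Z0-9.-]+:[0-9]{1,5}$`
// +kubebuilder:validation:MaxLength=255
type StaticConfigSketch string

// AdditionalAlertmanagerConfigSketch shows constraints that currently live
// only in field descriptions being enforced at admission time instead.
type AdditionalAlertmanagerConfigSketch struct {
    // scheme is restricted to the two documented values.
    // +kubebuilder:validation:Enum=http;https
    // +kubebuilder:default=http
    // +optional
    Scheme string `json:"scheme,omitempty"`

    // staticConfigs entries must match "<host>:<port>" and the list is bounded.
    // +kubebuilder:validation:MinItems=1
    // +kubebuilder:validation:MaxItems=10
    // +listType=atomic
    // +required
    StaticConfigs []StaticConfigSketch `json:"staticConfigs,omitempty"`
}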


23368-23486: Consider enum + fail‑closed handling for SecretKeySelector.optional and TLSConfig.certificateVerification

SecretKeySelector.optional and TLSConfig.certificateVerification are documented as having a small, fixed set of string values (Required vs Optional, Verify vs SkipVerify), but the OpenAPI schema exposes them as unrestricted strings with no enum. That makes it easy for users to typo values and for the implementation to accidentally treat unknown values like “SkipVerify”/“Optional”, which would be a security regression.

I’d recommend:

  • Modeling these as explicit enum types in the Go API with +kubebuilder:validation:Enum=... so bad values are rejected at admission.
  • Verifying that the consuming code treats unknown/empty values in a fail‑closed way (i.e. defaulting to Required / Verify, never to Optional / SkipVerify).

This would harden the API surface without changing the intended semantics.
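
As a rough sketch of both halves of that suggestion, assuming an enum for the SecretKeySelector optional field (type, constant, and function names below are placeholders, not this PR's final API):

// SecretRequirementSketch restricts the "optional" field to the two documented
// values so typos are rejected at admission time.
// +kubebuilder:validation:Enum=Required;Optional
type SecretRequirementSketch string

const (
    // SecretRequirementRequired means a missing Secret or key is an error.
    SecretRequirementRequired SecretRequirementSketch = "Required"
    // SecretRequirementOptional means a missing Secret or key is tolerated.
    SecretRequirementOptional SecretRequirementSketch = "Optional"
)

// isRequired shows fail-closed handling in the consuming code: anything other
// than an explicit "Optional", including empty or unknown values, is treated
// as Required.
func isRequired(r SecretRequirementSketch) bool {
    return r != SecretRequirementOptional
}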

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between e748202 and 0d09af6.

📒 Files selected for processing (12)
  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/types_cluster_monitoring.go (3 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (8 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (4 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/types_cluster_monitoring.go
  • openapi/generated_openapi/zz_generated.openapi.go
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (2)
config/v1alpha1/types_cluster_monitoring.go (2)
config/v1alpha1/zz_generated.swagger_doc_generated.go (9)
  • PrometheusK8sConfig (273-275)
  • AdditionalAlertmanagerConfig (132-134)
  • ExternalLabels (227-229)
  • RemoteWriteSpec (299-301)
  • ContainerResource (218-220)
  • SecretKeySelector (310-312)
  • TLSConfig (323-325)
  • Label (237-239)
  • RelabelConfig (287-289)
operator/v1/types.go (1)
  • LogLevel (94-94)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
  • AdditionalAlertmanagerConfig (636-679)
  • ExternalLabels (698-708)
  • Label (682-695)
  • PrometheusK8sConfig (447-618)
  • RelabelConfig (736-768)
  • RemoteWriteSpec (711-733)
  • SecretKeySelector (798-823)
  • TLSConfig (771-794)
🔇 Additional comments (27)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

354-778: ✅ Comprehensive test coverage for PrometheusK8sConfig validation.

The test cases thoroughly validate PrometheusK8sConfig behavior across multiple scenarios: minimal config, valid configurations with various optional fields (nodeSelector, resources, tolerations, topologySpreadConstraints, additionalAlertmanagerConfigs, remoteWrite, externalLabels), and rejection cases for constraint violations. Enum values are correctly cased (e.g., "Info" not "info").

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1634-1651: Fix logLevel description: should reference Prometheus, not Alertmanager.

Line 1636 incorrectly states "logs emitted by Alertmanager," but this field configures Prometheus logging (it is nested under prometheusK8sConfig). Update the description to "logs emitted by Prometheus" and regenerate the CRD from the updated Go type doc comments.

Apply this diff to update the description:

                  logLevel:
                    description: |-
-                      logLevel defines the verbosity of logs emitted by Alertmanager.
+                      logLevel defines the verbosity of logs emitted by Prometheus.
                       This field allows users to control the amount and severity of logs generated, which can be useful
                       for debugging issues or reducing noise in production environments.
                       Allowed values are Error, Warn, Info, and Debug.

After updating the Go doc comment in config/v1alpha1/types_cluster_monitoring.go, regenerate the CRD manifest to sync the description.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2)

1562-1575: ✓ collectionProfile enum–description consistency resolved.

The description correctly shows Full and Minimal in PascalCase, matching the enum values. This aligns with the prior review feedback (commit 3b005e4).


1287-2528: Verify whether prometheusK8sConfig duplication exists.

The AI summary indicates duplicate prometheusK8sConfig entries in the CRD OpenAPI schema block. However, the provided excerpt shows only one definition (lines 1287–2528). Duplication may exist in:

  • The CustomNoUpgrade or DevPreviewNoUpgrade CRD variants (mentioned in the summary as also having duplicates)
  • The full file beyond the provided range
  • Generated artifacts that were not regenerated cleanly

Please confirm whether duplication is present and, if so, whether it was introduced during code generation.

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

191-197: ClusterMonitoringSpec.prometheusK8sConfig Swagger doc looks consistent with the CRD.

The high-level description for prometheusK8sConfig matches the CRD and underlying intent; no additional structural or validation issues from the swagger side.

openapi/generated_openapi/zz_generated.openapi.go (4)

421-477: OpenAPI registration for new v1alpha1 types looks consistent

The new v1alpha1 schemas (AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, TLSConfig) are correctly wired into the OpenAPI definitions map alongside related config types; no issues from a registration/lookup perspective.


21918-21937: Wiring of prometheusK8sConfig into ClusterMonitoringSpec looks correct

The new prometheusK8sConfig property references v1alpha1.PrometheusK8sConfig with an empty-object default, and the dependencies slice is updated to include that type alongside the existing config types. From an OpenAPI/wiring perspective this addition looks sound.


22032-22069: ExternalLabels / Label schema structure matches intended usage

Label as a simple {key,value} object and ExternalLabels as a list-map keyed by key is a good fit for enforcing uniqueness of label keys and mirrors Prometheus’ external labels model. Aside from the earlier general note about documenting min/max label counts without schema-level enforcement, the structure and references here look correct.

Also applies to: 22462-22488


23149-23272: RelabelConfig / RemoteWriteSpec model aligns with Prometheus semantics

RelabelConfig closely follows the standard Prometheus relabel rule shape (sourceLabels, separator, regex, targetLabel, replacement, action), and RemoteWriteSpec uses that for writeRelabelConfigs with url required, which matches expectations for remote_write configuration. Structurally and from a reference standpoint this looks correct; just keep in mind the earlier comment if you decide you want schema‑level enforcement of allowed action values or max relabel rules.

config/v1alpha1/types_cluster_monitoring.go (9)

443-461: LGTM on additionalAlertmanagerConfigs structure.

The listType=atomic approach and documentation for additionalAlertmanagerConfigs addresses the prior review feedback about uniqueness constraints. The field correctly documents the max items limit and behavior when omitted.


505-518: LGTM on queryLogFile documentation and validation.

The field has clear documentation explaining supported paths (including /dev/stderr, /dev/stdout, /dev/null), restrictions on other /dev/ paths, and appropriate length constraints.


519-527: LGTM on remoteWrite configuration.

Using url as the list map key ensures uniqueness per remote endpoint, and the documentation clearly explains the purpose and links to Prometheus documentation.


600-617: LGTM on collectionProfile; note on volumeClaimTemplate pattern.

The collectionProfile field has clear documentation with meaningful enum values. The volumeClaimTemplate uses v1.PersistentVolumeClaim directly, which matches the existing pattern used in AlertmanagerCustomConfig (line 278). While past review suggested using a custom type for better validation, this maintains consistency with the existing API surface in this file.


710-733: LGTM on RemoteWriteSpec.

The required url field with length constraints and optional configuration fields are well-structured. Using listType=atomic for writeRelabelConfigs is appropriate for relabeling rule ordering.


735-768: LGTM on RelabelConfig.

The structure appropriately mirrors Prometheus relabel_config with the action field being required and other fields optional. Length constraints are reasonable.


770-794: LGTM on TLSConfig with improved certificateVerification naming.

The certificateVerification field with Verify/SkipVerify enum values is much clearer than the previous insecureSkipVerify naming. The default of Verify ensures secure behavior by default.


796-834: LGTM on SecretKeySelector with detailed Optional behavior documentation.

The optional field documentation (lines 809-818) thoroughly explains the runtime behavior when set to Optional and the Secret/key is missing. This addresses past review feedback and provides clear guidance to users.


836-867: LGTM on RelabelAction and CollectionProfile enums.

RelabelAction correctly uses lowercase values to match Prometheus conventions, while CollectionProfile uses CamelCase per OpenShift conventions. Both have well-documented constants.

config/v1alpha1/zz_generated.deepcopy.go (9)

14-39: LGTM on AdditionalAlertmanagerConfig deepcopy.

The auto-generated deepcopy correctly handles StaticConfigs slice and TLSConfig pointer. The BearerToken value copy is correct since SecretKeySelector contains only value types.


389-397: LGTM on ClusterMonitoringSpec deepcopy integration.

The PrometheusK8sConfig.DeepCopyInto call follows the established pattern for other config fields.


460-479: LGTM on ExternalLabels deepcopy.

The Labels slice is correctly deep copied.


734-748: LGTM on Label deepcopy.

Simple value copy is correct for a struct with only string fields.


994-1061: LGTM on PrometheusK8sConfig deepcopy.

All complex fields are correctly handled:

  • Slices with struct elements use per-element DeepCopyInto
  • Maps are properly deep copied
  • Pointer fields allocate new memory and copy values

1089-1128: LGTM on RelabelConfig deepcopy.

All pointer fields (Separator, Regex, TargetLabel, Replacement) and the SourceLabels slice are correctly deep copied.


1130-1156: LGTM on RemoteWriteSpec deepcopy.

The optional Name pointer and WriteRelabelConfigs slice are correctly deep copied.


1216-1230: LGTM on SecretKeySelector deepcopy.

Value copy is correct for a struct with only string and string-alias fields.


1253-1270: LGTM on TLSConfig deepcopy.

All fields are value types (SecretKeySelector and strings), so simple value copy is correct.

Comment on lines +299 to +302
// LogLevel defines the verbosity of logs emitted by Alertmanager.
// Valid values are Error, Warn, Info and Debug.
// +kubebuilder:validation:Enum=Error;Warn;Info;Debug
type LogLevel string
Copy link


⚠️ Potential issue | 🟡 Minor

LogLevel type documentation is too specific for a shared type.

This LogLevel type is used by both AlertmanagerCustomConfig and PrometheusK8sConfig, but the godoc says "defines the verbosity of logs emitted by Alertmanager." This is misleading when users reference the type documentation in the Prometheus context.

-// LogLevel defines the verbosity of logs emitted by Alertmanager.
+// LogLevel defines the verbosity of logs emitted by monitoring components.
 // Valid values are Error, Warn, Info and Debug.
🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 299 to 302, the
godoc for the LogLevel type is too specific to Alertmanager even though the type
is shared; update the comment to a generic description such as "LogLevel defines
the verbosity of logs emitted by components (for example Alertmanager and
Prometheus)." and keep the existing validation Enum tag intact so the allowed
values remain Error;Warn;Info;Debug.

@everettraven
Copy link
Contributor

@danielmellado I just wanted to follow up and let you know that this is next on my list to review.

Because there is a lot of historical review here I took some time to re-read that history and feel sufficiently caught up to perform another review round.

My next review round is going to take the approach of starting from scratch so that I can tune out any noise from previous review cycles.

I should have a review completed by the end of the week.

Comment on lines 92 to 93
// prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster.
// By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform.
Copy link
Contributor


Is this only for configuring the default Prometheus instance or does the configuration specified here apply to all Prometheus instances?

Copy link
Contributor


This configures the default platform Prometheus instance that runs in openshift-monitoring. User-workload Prometheus instances are configured separately. Updated godoc to clarify.

Comment on lines 110 to 111
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/
// https://prometheus.io/docs/prometheus/latest/storage/
Copy link
Contributor


Documentation links like this can become stale and may not properly reflect the configuration options that this type exposes.

We should appropriately document all the configuration options we expose instead of linking out.

In this particular case, I'd probably just remove these links here and let the child fields explain what they configure.

Copy link
Contributor

@danielmellado danielmellado Dec 16, 2025


same as above

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`
Copy link
Contributor


Circling back to this, I think something like:

additionalAlertmanagerConfigs:
  - apiVersion: v2
    url: "alertmanager.external.com:9093"
    path: "/critical"
    authenticationMethods:
      - type: BearerToken
        bearerToken:
          name: "prod-token"
          key: "token"
      - type: TLS
        tls:
          certificate:
            name: "..."
            key: "..."

This would be a better UX for this field, and you could enforce some form of uniqueness on a combination of fields if necessary.

What do you think?
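
A hedged sketch of what that shape could look like as Go types, using type as a union discriminator; SecretKeySelector and TLSConfig are the types already proposed in this PR, while everything else here (names, markers, CEL rules) is illustrative:

// AlertmanagerAuthenticationType selects which authentication stanza applies.
// +kubebuilder:validation:Enum=BearerToken;TLS
type AlertmanagerAuthenticationType string

// AlertmanagerAuthenticationMethod is a discriminated union: exactly one of
// bearerToken or tls is expected to be set, matching the type field.
// +kubebuilder:validation:XValidation:rule="self.type == 'BearerToken' ? has(self.bearerToken) : !has(self.bearerToken)",message="bearerToken is required when type is BearerToken, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="self.type == 'TLS' ? has(self.tls) : !has(self.tls)",message="tls is required when type is TLS, and forbidden otherwise"
// +union
type AlertmanagerAuthenticationMethod struct {
    // type discriminates which authentication configuration is used.
    // +unionDiscriminator
    // +required
    Type AlertmanagerAuthenticationType `json:"type,omitempty"`

    // bearerToken references a Secret key containing the bearer token.
    // +optional
    BearerToken *SecretKeySelector `json:"bearerToken,omitempty"`

    // tls holds the client TLS settings.
    // +optional
    TLS *TLSConfig `json:"tls,omitempty"`
}

A composite list-map key on the parent list (for example url plus path) could then give the uniqueness you mention without forcing the whole list to be atomic.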

// +kubebuilder:validation:MaxLength=50
// +kubebuilder:validation:Pattern=`^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$`
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`
Copy link
Contributor


I think my comments here still stand and in general we prefer using an explicit unit name in the field.

See https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#units for reference. In OpenShift APIs we generally prefer the units in the field name approach.
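
For comparison, a sketch of the units-in-the-field-name approach under that convention; the field name, bounds, and choice of MiB are hypothetical, and the "automatic" sentinel would need its own field or enum:

// PrometheusScrapeLimitsSketch illustrates encoding the unit in the field name
// instead of accepting a free-form size string.
type PrometheusScrapeLimitsSketch struct {
    // enforcedBodySizeLimitMebibytes caps the uncompressed body size of a
    // scraped target's response, in MiB. When omitted, no limit is enforced.
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=10240
    // +optional
    EnforcedBodySizeLimitMebibytes *int32 `json:"enforcedBodySizeLimitMebibytes,omitempty"`
}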

Comment on lines 478 to 479
// and Alertmanager. By default, no labels are added.
// When omitted, no external labels are applied (default behavior).
Copy link
Contributor


Nit: I think that just saying this is sufficient:

Suggested change
// and Alertmanager. By default, no labels are added.
// When omitted, no external labels are applied (default behavior).
// and Alertmanager.
// When omitted, no external labels are applied.

Copy link
Contributor


Agreed. Simplified the documentation.

// When omitted, defaults to "Required".
// +optional
// +kubebuilder:default=Required
Optional SecretRequirement `json:"optional,omitempty"`
Copy link
Contributor


The field name here is a bit confusing to me.

setting:

optional: Required

or

optional: Optional

doesn't seem intuitive to me.

Something like presence feels like it would fit better here.

)

// RelabelAction defines the action to perform in a relabeling rule.
// +kubebuilder:validation:Enum=replace;keep;drop;hashmod;labelmap;labeldrop;labelkeep
Copy link
Contributor


Enum values should be CamelCase:

Suggested change
// +kubebuilder:validation:Enum=replace;keep;drop;hashmod;labelmap;labeldrop;labelkeep
// +kubebuilder:validation:Enum=Replace;Keep;Drop;HashMod;LabelMap;LabelDrop;LabelKeep

Copy link
Contributor


Done!

// +kubebuilder:validation:MaxLength=255
Replacement *string `json:"replacement,omitempty"`
// action is the action to perform.
// Valid actions are: replace, keep, drop, hashmod, labelmap, labeldrop, labelkeep.
Copy link
Contributor


What does setting this to each of these values do?

How do I determine which action I want to use?

Copy link
Contributor


Added detailed documentation for each action in the updated godoc
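
For reference, a sketch of the level of per-action detail such godoc can carry, based on the standard Prometheus relabel semantics; the constant names follow the CamelCase suggestion above and the exact wording is illustrative, not the godoc merged in this PR:

// RelabelActionSketch stands in for the PR's RelabelAction type in this example.
type RelabelActionSketch string

const (
    // Replace expands replacement against the regex match of the concatenated
    // sourceLabels and writes the result into targetLabel.
    RelabelActionSketchReplace RelabelActionSketch = "Replace"
    // Keep drops targets or samples whose concatenated sourceLabels do NOT match regex.
    RelabelActionSketchKeep RelabelActionSketch = "Keep"
    // Drop drops targets or samples whose concatenated sourceLabels DO match regex.
    RelabelActionSketchDrop RelabelActionSketch = "Drop"
    // HashMod sets targetLabel to the modulus of a hash of the concatenated
    // sourceLabels, typically used to shard targets.
    RelabelActionSketchHashMod RelabelActionSketch = "HashMod"
    // LabelMap copies labels whose names match regex to new label names produced
    // by expanding replacement.
    RelabelActionSketchLabelMap RelabelActionSketch = "LabelMap"
    // LabelDrop removes labels whose names match regex.
    RelabelActionSketchLabelDrop RelabelActionSketch = "LabelDrop"
    // LabelKeep removes labels whose names do NOT match regex.
    RelabelActionSketchLabelKeep RelabelActionSketch = "LabelKeep"
)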

Copy link

@simonpasquier simonpasquier left a comment


Regarding the max values, I'm worried that some are too low. Maybe we can try to leverage Insights data to understand the current practices?

// prometheusK8sConfig is optional.
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
// +optional
PrometheusK8sConfig PrometheusK8sConfig `json:"prometheusK8sConfig,omitempty,omitzero"`


I'd agree that we don't need the k8s part.

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`


I agree that enabling multiple owners for the field is important. For instance ACM wants to manage its own entry(ies) while cluster admins may want to use other external Alertmanagers.

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=50
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`


Don't we want to ensure the format of the string?

Copy link
Contributor


If you are going to be continuing with the size format approach as opposed to using explicit units, yes.

Copy link
Contributor


@simonpasquier I was thinking of just defaulting this to `automatic` when omitted, wdyt?

// remoteWrite supports a maximum of 10 items in the list.
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=url


Not sure if it's the only field we should use for the key.

Copy link
Contributor


Added prose explaining: "Maximum of 10 remote write configurations can be specified. Each entry must have a unique URL."

@simonpasquier - using url as the list map key since it's required and should be unique per endpoint. Is there a better key you'd suggest?

// Allowed values: "v2". "v1" is no longer supported.
// +kubebuilder:validation:Enum=v2
// +required
APIVersion AlertmanagerAPIVersion `json:"apiVersion,omitempty"`


we can omit the API version field (v1 shouldn't be used anymore).

Copy link
Contributor


Good point there! I removed the apiVersion field entirely since v1 is no longer supported and v2 is the only valid option. CMO will hardcode v2 when communicating with Alertmanager.

// +optional
// +kubebuilder:validation:MaxLength=20
// +kubebuilder:validation:MinLength=2
RemoteTimeout string `json:"remoteTimeout,omitempty"`


We need to validate duration strings.
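
A minimal sketch of enforcing that at admission time, assuming the Prometheus-style duration grammar; the exact pattern and length bounds are assumptions:

// RemoteWriteTimeoutSketch shows validating a duration string on the API type
// itself rather than at reconcile time (for example "30s", "1m", "2h30m").
type RemoteWriteTimeoutSketch struct {
    // remoteTimeout is the timeout for requests to the remote write endpoint.
    // +kubebuilder:validation:MinLength=2
    // +kubebuilder:validation:MaxLength=20
    // +kubebuilder:validation:Pattern=`^([0-9]+(ms|s|m|h|d|w|y))+$`
    // +optional
    RemoteTimeout string `json:"remoteTimeout,omitempty"`
}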

// writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint.
// Maximum of 10 relabeling rules can be specified.
// +optional
// +kubebuilder:validation:MaxItems=10


10 is too low

Copy link
Contributor


Good catch! What would be a reasonable limit? 50? 100? We want to allow complex relabeling while preventing unbounded configs. TL;DR: same question as in the other thread.


const (
// RelabelActionReplace replaces the target label with the replacement value.
RelabelActionReplace RelabelAction = "replace"


Copy link
Contributor


Already fixed, will push an updated commit, thanks!

// When omitted, defaults to "Required".
// +optional
// +kubebuilder:default=Required
Optional SecretRequirement `json:"optional,omitempty"`


I think that we can make all secret keys required and not have the optional field. The Prometheus operator itself will fail the reconciliation if a secret key ref is missing hence a ref is required de facto.

Copy link
Contributor


+1, seems like a cleaner approach, wdyt @everettraven ?

Copy link
Contributor


If you don't need some kind of optionality evaluation here, yes that is better.

// +optional
// +kubebuilder:validation:Enum=Verify;SkipVerify
// +kubebuilder:default=Verify
CertificateVerification string `json:"certificateVerification,omitempty"`


should there be a custom type?

Copy link
Contributor


Yes, we typically encourage the use of a type alias for enums

Copy link
Contributor


Addressed - created CertificateVerificationType type alias with proper constants.
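
For illustration, roughly the shape such a type alias takes; the constant names and godoc here are an assumption, not necessarily the exact code pushed:

// CertificateVerificationType defines how the certificate of an external
// Alertmanager is verified.
// +kubebuilder:validation:Enum=Verify;SkipVerify
type CertificateVerificationType string

const (
    // CertificateVerificationVerify validates the server certificate chain and hostname.
    CertificateVerificationVerify CertificateVerificationType = "Verify"
    // CertificateVerificationSkipVerify disables server certificate validation;
    // intended only for testing against endpoints with self-signed certificates.
    CertificateVerificationSkipVerify CertificateVerificationType = "SkipVerify"
)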

@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from 0d09af6 to 0643f0d Compare December 16, 2025 15:07
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 16, 2025
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (6)
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1634-1651: The logLevel description still incorrectly references Alertmanager instead of Prometheus.

Lines 1635-1636 state "logs emitted by Alertmanager", but this field configures Prometheus logging under prometheusK8sConfig. This documentation error was flagged in previous reviews and marked as addressed, but the current code still contains the incorrect reference.

Update the description to reference Prometheus instead of Alertmanager. The Go doc comment in config/v1alpha1/types_cluster_monitoring.go should be updated and the CRD regenerated.

Apply this diff to fix the description:

                   logLevel:
                     description: |-
-                      logLevel defines the verbosity of logs emitted by Alertmanager.
+                      logLevel defines the verbosity of logs emitted by Prometheus.
                       This field allows users to control the amount and severity of logs generated, which can be useful
                       for debugging issues or reducing noise in production environments.
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1287-2529: LGTM! Past issues resolved, schema is well-structured.

The prometheusK8sConfig field addition looks good. Both previously flagged issues have been successfully addressed:

  1. collectionProfile (lines 1527-1528): The description now correctly matches the enum values using PascalCase (Full, Minimal).

  2. logLevel (line 1605): The description now correctly references "Prometheus" instead of the previously incorrect "Alertmanager" reference.

The CRD schema is comprehensive with:

  • Proper field validation rules (length constraints, patterns, enums)
  • Clear, detailed descriptions for all fields
  • Appropriate nested structure for complex configurations
  • Consistent naming conventions following Kubernetes patterns
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1634-1651: Fix incorrect component reference in logLevel description.

Line 1636 still incorrectly states "logLevel defines the verbosity of logs emitted by Alertmanager". This field belongs to prometheusK8sConfig, so it should reference Prometheus, not Alertmanager.

This issue was flagged in previous reviews but remains unfixed.

Apply this diff to correct the description:

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
      for debugging issues or reducing noise in production environments.
      Allowed values are Error, Warn, Info, and Debug.
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1634-1651: Fix copy-paste error: logLevel description references wrong component.

Line 1636 incorrectly states that logLevel controls verbosity for "Alertmanager" when it should reference "Prometheus" since this field is nested within prometheusK8sConfig.

Apply this diff to correct the component reference:

  logLevel:
    description: |-
-     logLevel defines the verbosity of logs emitted by Alertmanager.
+     logLevel defines the verbosity of logs emitted by Prometheus.
      This field allows users to control the amount and severity of logs generated, which can be useful
      for debugging issues or reducing noise in production environments.
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

303-311: Document the missing optional field in SecretKeySelector swagger docs.

The swagger documentation for SecretKeySelector is missing the optional field, which is present in the actual type definition (as seen in the CRD schema at lines 1356-1373, 1436-1449, etc. in the CRD file). The optional field is an enum with values Required/Optional and defaults to Required.

Update the source Go type comment for SecretKeySelector.Optional to include appropriate documentation, then regenerate swagger docs. The documentation should describe:

  • Allowed values: Required, Optional
  • Default value: Required
  • Behavior when set to Required vs Optional
config/v1alpha1/types_cluster_monitoring.go (1)

288-291: LogLevel documentation is too component-specific for a shared type.

This LogLevel type is used by both AlertmanagerCustomConfig (line 174) and PrometheusK8sConfig (line 481), but the godoc states it "defines the verbosity of logs emitted by Alertmanager." This is misleading when users reference the type documentation in the Prometheus context.

This was flagged in past review comments but not yet addressed.

Update to a generic description:

-// LogLevel defines the verbosity of logs emitted by Alertmanager.
+// LogLevel defines the verbosity of logs emitted by monitoring components (such as Alertmanager and Prometheus).
 // Valid values are Error, Warn, Info and Debug.
🧹 Nitpick comments (5)
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1855-1866: Consider adding pattern validation for duration and quantity fields.

Several fields accept duration or quantity strings but lack CRD-level pattern validation:

  • retention (lines 1855-1866): Description specifies pattern [0-9]+(ms|s|m|h|d|w|y) but no pattern field enforces it
  • remoteTimeout (lines 1673-1679): Described as duration string but no pattern validation
  • retentionSize (lines 1867-1874): Described as Kubernetes quantity format but no pattern validation

While runtime validation will catch invalid values, adding pattern validation here would provide earlier feedback to users during resource creation/update.

Note: If this is intentionally omitted (e.g., to allow for forward compatibility with new duration formats or because quantity parsing is complex), this suggestion can be safely ignored.

Also applies to: 1673-1679, 1867-1874

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2)

1576-1590: Add pattern validation for enforcedBodySizeLimit or clarify validation approach.

The description (line 1585) states "The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|GB|TB)?)$", but the schema lacks a pattern constraint. Without CRD-level validation, users receive no immediate feedback when submitting invalid values, and validation errors occur later at the operator level.

Consider adding the pattern constraint to the schema:

  enforcedBodySizeLimit:
    description: |-
      enforcedBodySizeLimit enforces a body size limit for Prometheus scraped metrics. If a scraped
      target's body response is larger than the limit, the scrape will fail.
      The following values are valid:
      a numeric value in Prometheus size format (such as "4MB", "1000", "1GB", "512KB", "100B")
      or the string `automatic`, which indicates that the limit will be
      automatically calculated based on cluster capacity.
      To specify no limit, omit this field.
      The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$
      Minimum length is 1 character.
      Maximum length is 50 characters.
    maxLength: 50
    minLength: 1
+   pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$
    type: string

Alternatively, if operator-side validation is intentional, clarify in the description that validation occurs at the operator level.


1854-1865: Add pattern validation for retention or clarify validation approach.

The description (lines 1857-1859) states the value "must be specified using the following regular expression pattern: [0-9]+(ms|s|m|h|d|w|y)", but the schema lacks a pattern constraint. This creates the same validation gap as enforcedBodySizeLimit.

Consider adding the pattern constraint:

  retention:
    description: |-
      retention defines the duration for which Prometheus retains data.
      This definition must be specified using the following regular
      expression pattern: `[0-9]+(ms|s|m|h|d|w|y)` (ms = milliseconds,
      s= seconds,m = minutes, h = hours, d = days, w = weeks, y = years).
      When omitted, this means the user has no opinion and the platform is left
      to choose reasonable defaults, which are subject to change over time.
      The default value is `15d`.
    maxLength: 20
    minLength: 1
+   pattern: ^[0-9]+(ms|s|m|h|d|w|y)$
    type: string

Alternatively, if operator-side validation is intentional, clarify in the description that validation occurs at the operator level.

config/v1alpha1/types_cluster_monitoring.go (2)

437-450: Reconsider listType=atomic for multi-owner scenarios.

The additionalAlertmanagerConfigs field uses listType=atomic, which doesn't work well with server-side apply when multiple controllers or users want to independently manage their own entries. As noted in past review comments, there are valid use cases where ACM wants to manage its own entry while cluster admins manage others.

Consider changing to listType=map with an appropriate uniqueness key. However, this is complicated by the lack of a natural unique identifier. If entries must be truly independent with no uniqueness constraints, you may need to accept atomic with its limitations.

Past discussion also noted that the current design lacks a clear way to enforce uniqueness when needed. Consider whether there's a combination of fields that could serve as a composite key, or whether the API should be restructured to make ownership clearer.


536-546: Add CEL validation for retention duration format.

The field documentation describes a specific pattern [0-9]+(ms|s|m|h|d|w|y) but there's no runtime validation to enforce this. Invalid values like "15x" or "abc" would be accepted at admission time and only fail when Prometheus tries to use them.

Past review comments suggested adding CEL validation, which would be especially helpful here.

Add validation:

// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+(ms|s|m|h|d|w|y)$')",message="must be a duration like '15d', '6h', or '30m'"

Note: Similar validation should be added for other duration/size fields like remoteTimeout (line 718-723) and size-based fields like retentionSize if you want to enforce Kubernetes quantity format.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0d09af6 and 0643f0d.

📒 Files selected for processing (11)
  • config/v1alpha1/types_cluster_monitoring.go (3 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (8 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (4 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • openapi/generated_openapi/zz_generated.openapi.go
  • config/v1alpha1/types_cluster_monitoring.go
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • config/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (3)
config/v1alpha1/types_cluster_monitoring.go (1)
config/v1alpha1/zz_generated.swagger_doc_generated.go (9)
  • PrometheusK8sConfig (273-275)
  • AdditionalAlertmanagerConfig (132-134)
  • ExternalLabels (227-229)
  • RemoteWriteSpec (299-301)
  • ContainerResource (218-220)
  • SecretKeySelector (309-311)
  • TLSConfig (322-324)
  • Label (237-239)
  • RelabelConfig (287-289)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
  • AdditionalAlertmanagerConfig (622-668)
  • ExternalLabels (690-700)
  • Label (671-687)
  • PrometheusK8sConfig (436-604)
  • RelabelConfig (733-784)
  • RemoteWriteSpec (703-730)
  • SecretKeySelector (820-835)
  • TLSConfig (787-816)
config/v1alpha1/zz_generated.deepcopy.go (1)
config/v1alpha1/types_cluster_monitoring.go (9)
  • AdditionalAlertmanagerConfig (622-668)
  • TLSConfig (787-816)
  • PrometheusK8sConfig (436-604)
  • ExternalLabels (690-700)
  • Label (671-687)
  • RemoteWriteSpec (703-730)
  • ContainerResource (307-337)
  • RelabelConfig (733-784)
  • SecretKeySelector (820-835)
🔇 Additional comments (18)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2)

1603-1620: Past issue resolved - logLevel description now correct.

The previous review comment correctly identified that the logLevel description referred to "Alertmanager" instead of "Prometheus". Line 1605 now properly states: "logLevel defines the verbosity of logs emitted by Prometheus." This is the correct description for the prometheusK8sConfig field.


1287-2529: Schema structure looks correct and comprehensive.

The new prometheusK8sConfig field adds extensive configuration options for the platform Prometheus instance. The schema includes:

  • Proper validation rules (regex patterns, length constraints, enum values)
  • Comprehensive TLS configuration with certificate verification options
  • Appropriate resource constraints following Kubernetes conventions
  • Clear, detailed documentation for each field

Since this is a generated file (zz_generated prefix), the schema correctness depends on the generation source. The generated output appears structurally sound with no obvious issues.

Note: This is a generated file. Any future changes should be made in the source templates/generators to ensure consistency across regenerations.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1603-1620: Past issue has been properly addressed.

The logLevel field description now correctly references "Prometheus" instead of "Alertmanager" as noted in the previous review. The schema definition for this field and the entire prometheusK8sConfig addition appear well-formed with appropriate validation rules, enums, and constraints.

Since this is a generated file (indicated by the zz_generated prefix), the schema structure and formatting are as expected.

config/v1alpha1/zz_generated.swagger_doc_generated.go (2)

313-324: LGTM: TLSConfig swagger documentation is accurate.

The swagger documentation for TLSConfig correctly describes all fields including certificateVerification with proper enum values and default behavior.


121-134: LGTM: New swagger documentation entries are comprehensive.

The newly added swagger documentation for AdditionalAlertmanagerConfig, ExternalLabels, Label, RelabelConfig, and RemoteWriteSpec are well-documented with clear descriptions of fields, validation rules, and expected behaviors.

Also applies to: 222-239, 277-289, 291-301

openapi/generated_openapi/zz_generated.openapi.go (9)

424-424: LGTM! New type registrations are properly added.

All new v1alpha1 types (AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, TLSConfig) are correctly registered in the GetOpenAPIDefinitions map with their corresponding schema functions.

Also applies to: 442-442, 453-453, 465-465, 467-468, 472-472, 474-474


21227-21302: LGTM! AdditionalAlertmanagerConfig schema is well-defined.

The schema properly defines all fields with clear descriptions, appropriate required fields (apiVersion, staticConfigs), correct type references, and proper Kubernetes list annotations.


21918-21924: LGTM! prometheusK8sConfig field properly integrated.

The new field is correctly added to ClusterMonitoringSpec with a comprehensive description, appropriate optional semantics, and dependencies list properly updated to include the new type.

Also applies to: 21936-21936


22032-22068: LGTM! ExternalLabels schema is correctly structured.

The schema properly defines a list-map of labels with appropriate Kubernetes annotations (x-kubernetes-list-type: map, keyed by key), clear constraints in the description, and correct dependencies.


22462-22488: LGTM! Label schema is simple and well-defined.

The schema correctly models a key/value label pair with both fields required, clear validation constraints documented in descriptions, and no unnecessary complexity.


22930-23119: Previous issue resolved. PrometheusK8sConfig schema looks excellent.

The logLevel field description (line 22972) now correctly references "logs emitted by Prometheus" rather than "Alertmanager", resolving the issue flagged in the previous review.

The overall schema is comprehensive and well-structured with:

  • Appropriate Kubernetes list annotations (atomic, map with proper keys)
  • Clear descriptions explaining defaults, constraints, and behaviors
  • Correct type references and dependencies
  • Proper field definitions for all Prometheus configuration aspects

23149-23272: LGTM! RelabelConfig and RemoteWriteSpec schemas are well-defined.

Both schemas are properly structured with:

  • RelabelConfig: Comprehensive action descriptions, appropriate required field (action), proper list-type annotation for sourceLabels
  • RemoteWriteSpec: Clear structure with url as required, correct reference to RelabelConfig for write relabeling, proper dependencies

23368-23399: LGTM! SecretKeySelector schema is correctly defined.

The schema properly models secret key selection with both fields required, appropriate x-kubernetes-map-type: atomic annotation for atomic updates, and clear validation constraints in descriptions.


23431-23479: LGTM! TLSConfig schema is properly structured.

The schema correctly defines TLS configuration with appropriate optional fields, proper references to SecretKeySelector for certificates and keys, clear descriptions of verification policies, and correct dependencies.

config/v1alpha1/types_cluster_monitoring.go (3)

818-835: LGTM - SecretKeySelector implementation looks correct.

The SecretKeySelector type properly:

  • Marks both name and key as required
  • Validates the secret name as a DNS subdomain
  • Enforces length constraints on both fields
  • Addresses past review feedback about removing the problematic optional field

732-784: RelabelConfig documentation is comprehensive.

The RelabelConfig type now provides clear documentation for:

  • What happens when fields are omitted (defaults are specified)
  • The semantic difference between omitting a field vs setting it to empty string (lines 767-769)
  • Valid values and constraints for each field

This addresses multiple past review comments asking for clarity on field behavior.

Note: The documented defaults (; for separator, (.*) for regex, $1 for replacement) will need to be implemented in the controller/operator code since CRD validation cannot set these defaults for pointer types.


92-104: Documentation incorrectly mentions "remote read" capability.

The field documentation states this configuration is for "remote write/read" but the actual PrometheusK8sConfig struct (lines 509-518) only exposes a remoteWrite field. There is no remote read configuration available.

This was flagged in past review comments but not yet addressed.

Either remove the "/read" reference or explicitly state that only remote write is currently supported:

-	//   - External integrations (remote write, additional alertmanagers)
+	//   - External integrations (remote write to long-term storage, additional alertmanagers)

Likely an incorrect or invalid review comment.

config/v1alpha1/zz_generated.deepcopy.go (1)

14-39: LGTM - Autogenerated deepcopy methods are correct.

The autogenerated deepcopy implementations properly handle:

  1. New type support: All new types introduced in this PR have corresponding DeepCopyInto and DeepCopy methods:

    • AdditionalAlertmanagerConfig (lines 14-39)
    • ExternalLabels (lines 460-479)
    • Label (lines 734-748)
    • PrometheusK8sConfig (lines 994-1061)
    • RelabelConfig (lines 1089-1128)
    • RemoteWriteSpec (lines 1130-1156)
    • SecretKeySelector (lines 1216-1230)
    • TLSConfig (lines 1253-1270)
  2. Correct field propagation: ClusterMonitoringSpec.DeepCopyInto (line 394) properly calls in.PrometheusK8sConfig.DeepCopyInto(&out.PrometheusK8sConfig) to deep copy the new field.

  3. Proper handling of complex fields:

    • Slices are allocated with correct length and elements are copied
    • Maps are allocated with correct capacity and entries are copied
    • Pointers are allocated and dereferenced properly
    • Nested structs call their own DeepCopyInto methods

The code generation appears to have correctly processed all the new API types.

Also applies to: 394-394, 460-479, 734-748, 994-1061, 1089-1128, 1130-1156, 1216-1230, 1253-1270

Comment on lines +451 to +464
// enforcedBodySizeLimit enforces a body size limit for Prometheus scraped metrics. If a scraped
// target's body response is larger than the limit, the scrape will fail.
// The following values are valid:
// a numeric value in Prometheus size format (such as "4MB", "1000", "1GB", "512KB", "100B")
// or the string `automatic`, which indicates that the limit will be
// automatically calculated based on cluster capacity.
// To specify no limit, omit this field.
// The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$
// Minimum length is 1 character.
// Maximum length is 50 characters.
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=50
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`

⚠️ Potential issue | 🟡 Minor

Documentation contradicts validation for empty values.

The field comment states "To specify no limit, omit this field", which is correct, but there is still confusion about whether an empty string is valid. The +kubebuilder:validation:MinLength=1 marker means an empty string is not allowed, yet some readers might interpret "omit" to mean either nil or an empty string.

Clarify that omission (nil) is the only way to specify "no limit":

 	// To specify no limit, omit this field.
+	// Note: Empty string ("") is not allowed; omission means the field is not set at all.
 	// The value must match the following pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$

Additionally, past review comments suggested adding CEL validation for the pattern rather than just documenting it. Consider adding:

// +kubebuilder:validation:XValidation:rule="self.matches('^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$')",message="must be 'automatic' or a size like '4MB', '512KB', '100B'"
🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 451-464, the field
comment is ambiguous about "omit this field" vs empty string and the MinLength=1
validator disallows empty string; update the comment to explicitly state that
only omission (nil) denotes "no limit" and that an empty string is not allowed,
and add a kubebuilder XValidation CEL rule to enforce the documented pattern
(e.g. self == null || self.matches('^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$')) with
a clear message like "must be 'automatic' or a size like '4MB', '512KB', '100B'
or omitted to indicate no limit", ensuring it complements the existing
MinLength/MaxLength markers.

Comment on lines +606 to +610
type AlertmanagerAPIVersion string

const (
AlertmanagerAPIVersionV2 AlertmanagerAPIVersion = "v2"
)

⚠️ Potential issue | 🟡 Minor

Remove misleading "default" language from required apiVersion field.

The apiVersion field (lines 623-627) is marked as +required, meaning users must explicitly provide it. However, past documentation suggested it has a "default value", which is contradictory.

Since the field is required and currently only allows one value (v2), the concept of a "default" doesn't apply. The current enum design is likely future-proofing for when additional API versions might be supported.

The documentation should simply state:

 	// apiVersion defines the Alertmanager API version to target.
-	// Allowed values: "v2". "v1" is no longer supported.
+	// Currently, only "v2" is supported. "v1" is no longer supported.
 	// +kubebuilder:validation:Enum=v2

Alternatively, if you want v2 to be the implied default for backward compatibility in the future, consider making the field optional with a default:

// +kubebuilder:default=v2
// +optional

Also applies to: 623-627

🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 623-627, remove any
wording in the field comment that claims the apiVersion has a "default" value
since the field is annotated +required and users must supply it; update the doc
comment to simply state the accepted enum values (currently "v2") and that the
field is required, or alternatively if you want v2 to be an implied default
change the field to optional and add +kubebuilder:default=v2 and +optional
instead (do not leave contradictory "default" language).

Comment on lines +633 to +649
// pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints.
// For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
// set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
// This is commonly needed when Alertmanager is deployed behind ingress controllers or load balancers.
// +kubebuilder:validation:MaxLength=255
// +kubebuilder:validation:MinLength=1
// +optional
PathPrefix string `json:"pathPrefix,omitempty"`
// scheme defines the URL scheme to use when communicating with Alertmanager
// instances.
// Possible values are `HTTP` or `HTTPS`.
// When omitted, defaults to `HTTP`.
// +kubebuilder:validation:Enum=HTTP;HTTPS
// +kubebuilder:default=HTTP
// +optional
Scheme AlertmanagerScheme `json:"scheme,omitempty"`
// staticConfigs is a list of statically configured Alertmanager endpoints in the form

⚠️ Potential issue | 🟡 Minor

Fix documentation inconsistencies in AdditionalAlertmanagerConfig.

Two issues here:

  1. PathPrefix example references wrong API version: Line 635 mentions /api/v1/alerts but the apiVersion field only allows v2. The example should reference v2 endpoints.

  2. Scheme field marked as both required and having a default: Line 644 states "When omitted, defaults to HTTP" but line 646 marks it as +required, meaning it cannot be omitted.

Apply these fixes:

 	// For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
-	// set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
+	// set this to "/alertmanager" so requests go to "/alertmanager/api/v2/alerts" instead of "/api/v2/alerts".

For the scheme field, either:

  • Remove +required and keep the default behavior, OR
  • Remove the "When omitted, defaults to HTTP" text and keep it required

Based on the +kubebuilder:default=HTTP marker, it seems the intent is to have a default, so removing +required would be more appropriate.

🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around lines 633-649, update the
PathPrefix example to reference v2 endpoints (e.g., use
"/alertmanager/api/v2/alerts" instead of "/api/v1/alerts") to match the
apiVersion allowed, and remove the +kubebuilder:validation:Required marker from
the Scheme field so it can be omitted and rely on the existing
+kubebuilder:default=HTTP and the comment "When omitted, defaults to `HTTP`".

Comment on lines +1366 to +1376
scheme:
default: HTTP
description: |-
scheme defines the URL scheme to use when communicating with Alertmanager
instances.
Possible values are `HTTP` or `HTTPS`.
When omitted, defaults to `HTTP`.
enum:
- HTTP
- HTTPS
type: string

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the scheme field definition in the alertmanager configuration types
find config/v1alpha1 -name "*.go" | head -20

Repository: openshift/api

Length of output: 446


🏁 Script executed:

#!/bin/bash
# Look for alertmanager-related struct definitions and scheme defaults
rg -n "AlertmanagerConfig|scheme" config/v1alpha1/ --type go -A 5 -B 5 | head -100

Repository: openshift/api

Length of output: 15409


🏁 Script executed:

#!/bin/bash
# Search more specifically for the scheme field in alertmanager structs
rg -n "Scheme.*string|scheme.*HTTP" config/v1alpha1/ --type go

Repository: openshift/api

Length of output: 372


🏁 Script executed:

#!/bin/bash
# Extract the AdditionalAlertmanagerConfig struct definition and surrounding context
sed -n '500,750p' config/v1alpha1/types_cluster_monitoring.go | cat -n

Repository: openshift/api

Length of output: 14320


🏁 Script executed:

#!/bin/bash
# Look for the complete AlertmanagerScheme type and AdditionalAlertmanagerConfig definition
rg -n -A 30 "type AdditionalAlertmanagerConfig struct" config/v1alpha1/types_cluster_monitoring.go

Repository: openshift/api

Length of output: 1878


🏁 Script executed:

#!/bin/bash
# Check for any comments or documentation about the scheme field default
rg -n -B 5 -A 5 "scheme.*HTTP|HTTP.*scheme" config/v1alpha1/types_cluster_monitoring.go

Repository: openshift/api

Length of output: 39


Change the default scheme from HTTP to HTTPS in the source type definition.

The scheme field defaults to HTTP in the AdditionalAlertmanagerConfig struct (line 646 of config/v1alpha1/types_cluster_monitoring.go via the +kubebuilder:default=HTTP tag). This means alert data will be transmitted unencrypted by default when connecting to additional Alertmanager instances. Since alerts can contain sensitive operational information, defaulting to unencrypted transmission degrades security posture.

Update the source Go type definition to default to HTTPS instead. Change +kubebuilder:default=HTTP to +kubebuilder:default=HTTPS in config/v1alpha1/types_cluster_monitoring.go, then regenerate the manifest.
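A minimal sketch of what the updated field would look like in types_cluster_monitoring.go, assuming the surrounding markers stay as they are today:

// scheme defines the URL scheme to use when communicating with Alertmanager instances.
// Possible values are `HTTP` or `HTTPS`.
// When omitted, defaults to `HTTPS`.
// +kubebuilder:validation:Enum=HTTP;HTTPS
// +kubebuilder:default=HTTPS
// +optional
Scheme AlertmanagerScheme `json:"scheme,omitempty"`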

🤖 Prompt for AI Agents
In config/v1alpha1/types_cluster_monitoring.go around line 646 and in the
generated manifest
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
(lines ~1366-1376), update the kubebuilder default on the
AdditionalAlertmanagerConfig.scheme from +kubebuilder:default=HTTP to
+kubebuilder:default=HTTPS in the Go source (types_cluster_monitoring.go), then
run the codegen/CRD manifest generation step used by the repo to regenerate the
ClusterMonitoringConfig.yaml so the CRD default in the manifest reflects HTTPS.

Addresses feedback from PR reviewers and linting issues.

Signed-off-by: Daniel Mellado <dmellado@fedoraproject.org>
@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from 0643f0d to 99db558 Compare December 16, 2025 15:19
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 16, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1366-1376: Security concern: Default scheme should be HTTPS, not HTTP.

This was previously flagged. Alertmanager connections will transmit alert data (potentially sensitive operational information) over unencrypted HTTP by default. Update the source Go type to use +kubebuilder:default=HTTPS and regenerate.

config/v1alpha1/types_cluster_monitoring.go (2)

288-291: Update LogLevel type documentation to be component-agnostic.

The LogLevel type is used by both AlertmanagerCustomConfig (line 174) and PrometheusK8sConfig (line 481), but the documentation specifically mentions "Alertmanager". This creates confusion when users reference the type documentation in the Prometheus context.

Apply this diff to make the documentation more generic:

-// LogLevel defines the verbosity of logs emitted by Alertmanager.
+// LogLevel defines the verbosity of logs emitted by monitoring components.
 // Valid values are Error, Warn, Info and Debug.

633-640: Fix API version in pathPrefix example.

The example at line 635 references /api/v1/alerts, but the apiVersion field only allows v2. The example should reference the v2 endpoint to be consistent with the API constraints.

Apply this diff:

 	// pathPrefix defines an optional URL path prefix to prepend to the Alertmanager API endpoints.
 	// For example, if your Alertmanager is behind a reverse proxy at "/alertmanager/",
-	// set this to "/alertmanager" so requests go to "/alertmanager/api/v1/alerts" instead of "/api/v1/alerts".
+	// set this to "/alertmanager" so requests go to "/alertmanager/api/v2/alerts" instead of "/api/v2/alerts".
 	// This is commonly needed when Alertmanager is deployed behind ingress controllers or load balancers.
🧹 Nitpick comments (11)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

392-420: Consider quoting resource quantities for consistency.

The resource quantity values (e.g., 100m, 500m, 128Mi, 512Mi) are currently unquoted. While most YAML parsers handle these correctly, explicitly quoting them aligns with Kubernetes conventions and ensures consistent parsing across different tools.

Apply this diff to quote the resource quantities:

           prometheusK8sConfig:
             resources:
               - name: cpu
-                request: 100m
-                limit: 500m
+                request: "100m"
+                limit: "500m"
               - name: memory
-                request: 128Mi
-                limit: 512Mi
+                request: "128Mi"
+                limit: "512Mi"
       expected: |
         apiVersion: config.openshift.io/v1alpha1
         kind: ClusterMonitoring
         spec:
           userDefined:
             mode: "Disabled"
           prometheusK8sConfig:
             resources:
               - name: cpu
-                request: 100m
-                limit: 500m
+                request: "100m"
+                limit: "500m"
               - name: memory
-                request: 128Mi
-                limit: 512Mi
+                request: "128Mi"
+                limit: "512Mi"
config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (4)

1382-1390: staticConfigs regex allows invalid port numbers.

The regex '^[a-zA-Z0-9.-]+:[0-9]+$' permits port numbers such as 0 or 99999, which are invalid. Consider tightening the rule to validate the port range (1-65535), or rely on the operator to validate at runtime.
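One possible tightening, sketched here as a kubebuilder CEL marker to attach wherever the existing host:port rule lives in the Go source; the regex bounds the port to 1-65535 and the message wording is illustrative:

// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z0-9]([a-zA-Z0-9.-]*[a-zA-Z0-9])?:([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])$')",message="must be in the format 'host:port' with a port between 1 and 65535"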


1539-1553: Missing pattern validation for enforcedBodySizeLimit.

The description documents the pattern ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$ but no pattern: constraint enforces it. Invalid values such as "invalid" or "5MB5" would be accepted by the schema and only fail at runtime.

+                    pattern: ^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$
                     maxLength: 50
                     minLength: 1
                     type: string

1855-1866: Missing pattern validation for retention.

The description specifies the format [0-9]+(ms|s|m|h|d|w|y) but no schema pattern: enforces it. Add pattern validation to catch invalid values at admission time:

+                    pattern: ^[0-9]+(ms|s|m|h|d|w|y)$
                     maxLength: 20
                     minLength: 1
                     type: string

1635-1654: queryLogFile validation doesn't enforce /dev/ path restrictions.

The description states only /dev/stderr, /dev/stdout, and /dev/null are supported, but the CEL rule permits any /dev/* path. Users could specify /dev/random which would pass validation but fail at runtime.

Consider adding validation:

x-kubernetes-validations:
- message: must be an absolute path starting with '/' or a simple filename without '/'
  rule: self.startsWith('/') || !self.contains('/')
- message: only /dev/stderr, /dev/stdout, and /dev/null are supported under /dev/
  rule: "!self.startsWith('/dev/') || self in ['/dev/stderr', '/dev/stdout', '/dev/null']"
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (5)

1391-1397: Add pattern validation for Go duration format.

The timeout field accepts duration strings but lacks validation. Users can set invalid values like "30x" or "abc" that will fail at runtime.

Add a CEL validation rule to enforce Go duration format:

                        timeout:
                          description: |-
                            timeout defines the timeout value used when sending alerts.
                            The value must be a valid Go time.Duration string (e.g. 30s, 5m, 1h).
                          maxLength: 20
                          minLength: 2
                          type: string
+                         x-kubernetes-validations:
+                         - message: must be a valid Go duration (e.g., 30s, 5m, 1h)
+                           rule: self.matches('^[0-9]+(ns|us|µs|ms|s|m|h)$')

1673-1679: Add pattern validation for Go duration format.

The remoteTimeout field accepts duration strings but lacks validation. Users can set invalid values that will fail at runtime.

Add a CEL validation rule to enforce Go duration format:

                        remoteTimeout:
                          description: |-
                            remoteTimeout is the timeout for requests to the remote write endpoint.
                            When omitted, the default is 30s.
                          maxLength: 20
                          minLength: 2
                          type: string
+                         x-kubernetes-validations:
+                         - message: must be a valid Go duration (e.g., 30s, 5m, 1h)
+                           rule: self.matches('^[0-9]+(ns|us|µs|ms|s|m|h)$')

1855-1866: Add pattern validation for retention duration format.

The retention field documentation specifies the pattern [0-9]+(ms|s|m|h|d|w|y) but no validation enforces it. Users can set invalid values like "15days" instead of "15d".

Add a CEL validation rule to enforce the documented pattern:

                  retention:
                    description: |-
                      retention defines the duration for which Prometheus retains data.
                      This definition must be specified using the following regular
                      expression pattern: `[0-9]+(ms|s|m|h|d|w|y)` (ms = milliseconds,
                      s= seconds,m = minutes, h = hours, d = days, w = weeks, y = years).
                      When omitted, this means the user has no opinion and the platform is left
                      to choose reasonable defaults, which are subject to change over time.
                      The default value is `15d`.
                    maxLength: 20
                    minLength: 1
                    type: string
+                   x-kubernetes-validations:
+                   - message: must match pattern [0-9]+(ms|s|m|h|d|w|y), e.g., 15d, 2w, 6m
+                     rule: self.matches('^[0-9]+(ms|s|m|h|d|w|y)$')

1867-1874: Add pattern validation for Kubernetes quantity format.

The retentionSize field accepts Kubernetes quantity formats (e.g., Mi, Gi, Ti) but lacks validation. Users can set invalid values like "5 GB" (space not allowed) or "100gigabytes".

Add the standard Kubernetes quantity pattern validation:

                  retentionSize:
                    description: |-
                      retentionSize specifies the maximum volume of persistent storage that Prometheus uses for data blocks and the write-ahead log (WAL).
                      Acceptable values use standard Kubernetes resource quantity formats, such as `Mi`, `Gi`, `Ti`, etc.
                      When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
                      The default is no storage size limit is enforced and Prometheus will use the available storage capacity of the PersistentVolume.
                    maxLength: 20
                    type: string
+                   pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
+                   x-kubernetes-validations:
+                   - message: must be a valid Kubernetes quantity (e.g., 5Gi, 100Mi)
+                     rule: isQuantity(self)

1635-1654: Add validation to prevent unsupported /dev/ paths.

The description states that writing to /dev/ paths other than /dev/stderr, /dev/stdout, or /dev/null is not supported, but the validation only checks if the path is absolute or a simple filename. Users can set unsupported paths like /dev/random or /dev/zero.

Add a CEL validation to prevent unsupported /dev/ paths:

                  queryLogFile:
                    description: |-
                      queryLogFile specifies the file to which PromQL queries are logged.
                      This setting can be either a filename, in which
                      case the queries are saved to an `emptyDir` volume
                      at `/var/log/prometheus`, or a full path to a location where
                      an `emptyDir` volume will be mounted and the queries saved.
                      Writing to `/dev/stderr`, `/dev/stdout` or `/dev/null` is supported, but
                      writing to any other `/dev/` path is not supported. Relative paths are
                      also not supported.
                      By default, PromQL queries are not logged.
                      Must be an absolute path starting with `/` or a simple filename without path separators.
                      Must be between 1 and 255 characters in length.
                    maxLength: 255
                    minLength: 1
                    type: string
                    x-kubernetes-validations:
                    - message: must be an absolute path starting with '/' or a simple
                        filename without '/'
                      rule: self.startsWith('/') || !self.contains('/')
+                   - message: /dev/ paths other than /dev/stderr, /dev/stdout, or /dev/null are not supported
+                     rule: '!self.startsWith(''/dev/'') || self in [''/dev/stderr'', ''/dev/stdout'', ''/dev/null'']'
config/v1alpha1/types_cluster_monitoring.go (1)

596-603: Consider using a custom type instead of core/v1.PersistentVolumeClaim.

Using *v1.PersistentVolumeClaim directly limits the ability to enforce admission-time validations specific to this use case. OpenShift APIs typically prefer custom reference types that allow for more granular validation.

Based on past review feedback (comment at lines 603-603), there is precedent in the openshift/api repository for wrapping persistent volume references in custom types. This would enable:

  • More specific validation rules for this context
  • Better documentation tailored to Prometheus usage
  • Ability to restrict which PVC fields are relevant

Example from insights API:

// persistentVolume is an optional field that specifies the PersistentVolume that will be used to store the Insights data archive.
// The PersistentVolume must be created in the openshift-insights namespace.
// +optional
PersistentVolume *PersistentVolumeConfig `json:"persistentVolume,omitempty"`
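For illustration, a wrapper in the spirit of the insights example might look like the sketch below. The type and field names are hypothetical and the exact shape would need to be agreed with API reviewers; this is not a proposal of the final API:

// PrometheusPersistentStorage is a hypothetical, trimmed-down alternative to
// embedding core/v1.PersistentVolumeClaim directly.
type PrometheusPersistentStorage struct {
	// storageClassName selects the StorageClass used for the Prometheus data volume.
	// +optional
	// +kubebuilder:validation:MaxLength=253
	StorageClassName string `json:"storageClassName,omitempty"`

	// size is the requested capacity of the volume, expressed as a Kubernetes
	// resource quantity such as 100Gi.
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=20
	// +required
	Size string `json:"size"`
}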

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0643f0d and 99db558.

📒 Files selected for processing (12)
  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/types_cluster_monitoring.go (3 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (8 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (4 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • openapi/generated_openapi/zz_generated.openapi.go
  • config/v1alpha1/types_cluster_monitoring.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.deepcopy.go
🧬 Code graph analysis (1)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (8)
  • AdditionalAlertmanagerConfig (622-668)
  • ExternalLabels (690-700)
  • Label (671-687)
  • PrometheusK8sConfig (436-604)
  • RelabelConfig (733-784)
  • RemoteWriteSpec (703-730)
  • SecretKeySelector (820-835)
  • TLSConfig (787-816)
🔇 Additional comments (17)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (2)

354-370: LGTM!

The test correctly uses the capitalized enum value "Info" for logLevel, matching the CRD's allowed values (Error, Warn, Info, Debug). This addresses the previous review comment about case sensitivity.


371-778: Excellent test coverage!

The test suite comprehensively validates PrometheusK8sConfig behavior:

  • Valid configurations for all major fields (nodeSelector, tolerations, topologySpreadConstraints, additionalAlertmanagerConfigs, remoteWrite, externalLabels)
  • Appropriate rejection tests for boundary violations (empty objects, exceeding maxItems limits, violating minItems constraints)
  • Clear, actionable error messages in all rejection scenarios

The tests align well with the CRD validation rules and provide strong coverage for the new PrometheusK8sConfig API surface.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2)

1603-1620: LGTM!

The logLevel description correctly refers to "Prometheus" rather than "Alertmanager", addressing the previous review comment. The field definition properly specifies the allowed enum values (Error, Warn, Info, Debug) and provides clear documentation about each log level's behavior.


1287-2529: Well-structured CRD schema for PrometheusK8sConfig.

The generated OpenAPI schema is comprehensive and follows Kubernetes API conventions:

  • Proper validation constraints (maxItems: 10 for arrays, length limits, enum values, patterns)
  • Correct list-type annotations (map, atomic, set) for appropriate array behavior
  • Standard Kubernetes types reused where applicable (PersistentVolumeClaim, Toleration, TopologySpreadConstraint)
  • Clear field descriptions explaining purpose, defaults, and allowed values
  • Appropriate minProperties: 1 constraint to prevent empty configuration objects

The schema provides a robust foundation for the PrometheusK8sConfig API.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

1287-2529: LGTM! Comprehensive and well-structured Prometheus configuration schema.

The prometheusK8sConfig field addition is well-designed with:

  • Clear, detailed field descriptions explaining purpose and defaults
  • Comprehensive validation rules (patterns, length constraints, enums)
  • Proper security options (TLS configuration, bearer token authentication)
  • Reasonable constraints (max 10 items for most lists)
  • Consistent use of Kubernetes API conventions

Previous review concerns (enum case, field documentation) appear to have been properly addressed based on the commit markers in past reviews.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2)

1287-1300: Well-documented API surface.

The prometheusK8sConfig field and its nested properties are thoroughly documented with clear descriptions, use cases, and behavioral expectations. The documentation improvements from previous reviews (collectionProfile enum values, logLevel component references) have been successfully addressed.


1300-1301: Appropriate constraints protect against configuration explosion.

The API design includes sensible limits (e.g., maxItems: 10 for additionalAlertmanagerConfigs, remoteWrite endpoints, and resource entries) and requires meaningful configuration via minProperties: 1. These constraints prevent performance degradation from overly complex configurations while remaining flexible for most production use cases.

Also applies to: 1522-1523, 1774-1778, 1849-1854

config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

1287-1300: New prometheusK8sConfig API structure looks well-designed.

The field organization follows established patterns from alertmanagerConfig and metricsServerConfig. Good use of list-type maps with appropriate keys for uniqueness constraints, consistent validation rules, and comprehensive documentation. The minProperties constraint ensures meaningful configurations.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2)

1603-1620: Previous issue resolved.

The logLevel description now correctly references "Prometheus" instead of "Alertmanager". The field documentation is accurate and complete.


1287-2529: LGTM! Well-structured generated schema.

The prometheusK8sConfig schema is comprehensive and properly validated with:

  • Appropriate validation constraints (length limits, patterns, enums)
  • Secure defaults (e.g., certificate verification enabled)
  • Clear documentation for all fields
  • Proper Kubernetes CRD conventions

Since this is a generated file, the schema appears correctly generated from the API types.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1287-2529: PrometheusK8sConfig CRD block looks consistent with Go types and validation markers

The prometheusK8sConfig schema (including additionalAlertmanagerConfigs, externalLabels, remoteWrite, TLS/secret selectors, resources, retention, scheduling knobs, and PVC template) aligns with the referenced Go types and kubebuilder validations. List semantics, enums, and x-kubernetes validations all look correct and coherent with the intended API surface. I don't see issues here.

config/v1alpha1/zz_generated.swagger_doc_generated.go (1)

121-201: Swagger docs for PrometheusK8sConfig and related types are consistent and accurate

The newly added SwaggerDoc maps for AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, and TLSConfig, plus the ClusterMonitoringSpec.prometheusK8sConfig entry, all line up with the Go type comments and the CRD schema (fields, enums, and behavior). This should give consumers clear, correct API documentation without introducing inconsistencies.

Also applies to: 222-325

openapi/generated_openapi/zz_generated.openapi.go (4)

424-474: LGTM: Schema function registrations are correct.

All new v1alpha1 monitoring types (AdditionalAlertmanagerConfig, ExternalLabels, Label, PrometheusK8sConfig, RelabelConfig, RemoteWriteSpec, SecretKeySelector, TLSConfig) are properly registered in the GetOpenAPIDefinitions map with consistent naming and alphabetical ordering.


21227-21302: LGTM: AdditionalAlertmanagerConfig and ClusterMonitoringSpec changes are well-structured.

The schema definition for AdditionalAlertmanagerConfig properly specifies required fields (apiVersion, staticConfigs), uses appropriate list-type annotations, and correctly declares dependencies on SecretKeySelector and TLSConfig. The addition of prometheusK8sConfig to ClusterMonitoringSpec follows existing patterns and includes comprehensive documentation.

Also applies to: 21918-21936


22032-22068: LGTM: ExternalLabels, Label, and PrometheusK8sConfig schemas are correct.

The schemas are well-structured with proper list-map annotations, detailed descriptions, and correct dependencies. The previously flagged logLevel description inconsistency (line 22972) has been resolved—it now correctly refers to "logs emitted by Prometheus" rather than Alertmanager.

Based on past review comments, the logLevel description issue was addressed in commit 0643f0d.

Also applies to: 22462-22488, 22930-23119


23149-23272: LGTM: RelabelConfig, RemoteWriteSpec, SecretKeySelector, and TLSConfig schemas are secure and well-defined.

All schemas follow consistent patterns with proper annotations. SecretKeySelector is correctly marked as atomic (x-kubernetes-map-type: atomic), and TLSConfig appropriately uses SecretKeySelector references for sensitive fields (ca, cert, key), ensuring secure handling of credentials.

Also applies to: 23368-23399, 23431-23479

config/v1alpha1/types_cluster_monitoring.go (1)

437-450: Reconsider list type for multi-owner scenarios.

The additionalAlertmanagerConfigs field uses listType=atomic, which prevents multiple owners from managing separate entries via server-side apply. Past review discussions (everettraven, simonpasquier) highlighted that enabling multiple owners is important - for example, ACM wanting to manage its own entries while cluster admins manage others.

However, there's no clear unique key for listType=map since:

  • apiVersion is an enum with only one value (v2)
  • staticConfigs can vary for the same logical Alertmanager

Consider whether the current atomic list type adequately supports the intended multi-owner use cases, or if a different approach (such as introducing a required name field as a unique identifier) would better serve users.

Based on past review discussions at lines 450-450 in previous comments.
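If the multi-owner use case wins out, a map-keyed alternative could look roughly like the sketch below; the name field and its constraints are illustrative only:

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=name
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`

type AdditionalAlertmanagerConfig struct {
	// name uniquely identifies this Alertmanager configuration within the list,
	// so that separate owners can manage separate entries via server-side apply.
	// +required
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=253
	Name string `json:"name"`

	// ... existing fields unchanged ...
}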

Comment on lines +1366 to +1376
scheme:
default: HTTP
description: |-
scheme defines the URL scheme to use when communicating with Alertmanager
instances.
Possible values are `HTTP` or `HTTPS`.
When omitted, defaults to `HTTP`.
enum:
- HTTP
- HTTPS
type: string

⚠️ Potential issue | 🟠 Major

Reconsider HTTP as the default scheme for Alertmanager connections.

The scheme field defaults to HTTP for connections to additional Alertmanager instances. In production monitoring environments, alerts may contain sensitive information (e.g., error messages, system details), and communication with alert endpoints should use encrypted channels by default.

Consider defaulting to HTTPS instead, or at minimum, add prominent documentation warning that HTTP transmits alerts and credentials in cleartext.

🤖 Prompt for AI Agents
In
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
around lines 1366-1376, the scheme default is set to HTTP; change the default to
HTTPS and update the description text to state that the default is HTTPS and
that HTTPS encrypts alert payloads and credentials; retain the enum values
(HTTP, HTTPS) but ensure the textual docs include a prominent warning that using
HTTP transmits alerts and credentials in cleartext and should only be used for
testing or within trusted networks.

Comment on lines +1383 to +1388
maxLength: 255
type: string
x-kubernetes-validations:
- message: must be in the format 'host:port' (e.g., 'alertmanager.example.com:9093')
rule: self.matches('^[a-zA-Z0-9.-]+:[0-9]+$')
maxItems: 10

⚠️ Potential issue | 🟡 Minor

Strengthen host:port validation to prevent invalid hostnames.

The regex ^[a-zA-Z0-9.-]+:[0-9]+$ permits invalid hostnames that start or end with hyphens or dots (e.g., -example.com:9093 or example-.com:9093). While Kubernetes' own validation may catch this downstream, tightening the pattern here would fail fast with clearer error messages.

Consider a pattern like ^[a-zA-Z0-9]([a-zA-Z0-9.-]*[a-zA-Z0-9])?:[0-9]+$ to ensure hostnames start and end with alphanumeric characters.

🤖 Prompt for AI Agents
In
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
around lines 1383-1388, the existing host:port regex allows hostnames that begin
or end with hyphens or dots; replace the rule with a stricter pattern that
enforces the hostname to start and end with an alphanumeric character (for
example by using a pattern equivalent to
^[a-zA-Z0-9](?:[a-zA-Z0-9.-]*[a-zA-Z0-9])?:[0-9]+$) so hostnames like
-example.com or example-.com are rejected while preserving the port validation.
Ensure the new pattern is placed in the x-kubernetes-validations.rule field and
the validation message remains descriptive.

@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2025

@marioferh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn 5cef22c link true /test e2e-aws-ovn
ci/prow/e2e-gcp 5cef22c link false /test e2e-gcp
ci/prow/e2e-aws-serial-1of2 5cef22c link true /test e2e-aws-serial-1of2
ci/prow/okd-scos-e2e-aws-ovn 505533b link false /test okd-scos-e2e-aws-ovn
ci/prow/integration 99db558 link true /test integration

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Contributor

@everettraven everettraven left a comment

Another handful of comments after another pass.

Just a note that because this PR has been around a while and has many comments, things are getting a bit difficult to keep track of here.

It may be worth it to open a new PR for this change and essentially start the review cycle fresh so we can keep better track of the review comments.

I'll leave that up to you though.

// By default, no additional Alertmanager instances are configured.
// Maximum of 10 additional Alertmanager configurations can be specified.
// When omitted, no additional Alertmanager instances are configured (default behavior).
// When set to an empty array [], the behavior is the same as omitting the field.
Contributor

In this case, you should not allow setting [] then. Where possible we try to have only one way to "say" something.

In this case, because setting both [] and omitting the field altogether mean the same thing we should just make it so that omitting the field is the only way to signal that you do not want any additional alertmanager instances configured.

You can do this by setting the min items to 1 for this field.
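A minimal sketch of that change on the field, assuming the other markers stay as they are:

// +optional
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`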

Contributor

Makes sense, I'll add +kubebuilder:validation:MinItems=1 to enforce that omission is the only way to signal "no additional alertmanagers". This removes the ambiguity between [] and field omission.

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
AdditionalAlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs,omitempty"`
Contributor

Re: naming, prometheusConfig makes sense to me - although that looks like maybe it is associated with a different thread.

@danielmellado It looks like this thread still needs to be addressed by updating the list type from atomic to map and setting the appropriate uniqueness keys.

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=50
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`
Contributor

If you are going to be continuing with the size format approach as opposed to using explicit units, yes.

}

// ExternalLabels represents labels to be added to time series and alerts.
type ExternalLabels struct {
Contributor

Good catch - I missed this on my last review round.

This does look funky. I'd expect that we probably just want externalLabels to be typed as a slice of Labels instead of this nesting.

Comment on lines +498 to +500
// Writing to `/dev/stderr`, `/dev/stdout` or `/dev/null` is supported, but
// writing to any other `/dev/` path is not supported. Relative paths are
// also not supported.
Contributor

This doesn't appear to be enforced by validations

Contributor

Good catch - added CEL validation to enforce the /dev/ path restriction: !self.startsWith('/dev/') || self in ['/dev/stdout', '/dev/stderr', '/dev/null']. This ensures only the documented /dev/ paths are accepted while rejecting arbitrary /dev/ paths.
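Expressed as a kubebuilder marker on queryLogFile, that rule would look roughly like this (message text illustrative):

// +kubebuilder:validation:XValidation:rule="!self.startsWith('/dev/') || self in ['/dev/stdout', '/dev/stderr', '/dev/null']",message="writing to /dev/ paths other than /dev/stdout, /dev/stderr and /dev/null is not supported"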

// +optional
// +kubebuilder:validation:MaxLength=10
Separator *string `json:"separator,omitempty"`
// regex is the regular expression to match against the concatenated source label values.
Contributor

It wouldn't hurt to add this.

Is there a reasonable way for us to ensure that the regular expression only uses characters that are valid in an RE2 regular expression?

// +optional
// +kubebuilder:validation:MaxLength=63
TargetLabel *string `json:"targetLabel,omitempty"`
Contributor

We should ensure that is appropriately enforced through validations.
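One option, assuming targetLabel should be a classic Prometheus label name; note this would reject capture-group expansion such as "${1}", which Prometheus also permits in target_label, so treat it as a starting point rather than the final rule:

// +optional
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z_][a-zA-Z0-9_]*$')",message="must be a valid Prometheus label name"
TargetLabel *string `json:"targetLabel,omitempty"`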

// +optional
// +kubebuilder:validation:Enum=Verify;SkipVerify
// +kubebuilder:default=Verify
CertificateVerification string `json:"certificateVerification,omitempty"`
Contributor

Yes, we typically encourage the use of a type alias for enums
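The usual shape of that pattern, sketched for this field (type and constant names are illustrative):

// CertificateVerificationPolicy controls whether the server certificate is verified.
// +kubebuilder:validation:Enum=Verify;SkipVerify
type CertificateVerificationPolicy string

const (
	CertificateVerificationVerify     CertificateVerificationPolicy = "Verify"
	CertificateVerificationSkipVerify CertificateVerificationPolicy = "SkipVerify"
)

// certificateVerification controls whether the certificate presented by the server is verified.
// +optional
// +kubebuilder:default=Verify
CertificateVerification CertificateVerificationPolicy `json:"certificateVerification,omitempty"`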

// SecretKeySelector selects a key of a Secret.
// +structType=atomic
type SecretKeySelector struct {
// name is the name of the secret in the same namespace to select from.
Contributor

What is the "same" namespace?

Contributor

Clarified that secrets are referenced from the openshift-monitoring namespace (where the platform monitoring stack is deployed).

// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="must be a valid secret name (lowercase alphanumeric characters, '-' or '.', start and end with alphanumeric)"
Name string `json:"name,omitempty"`
// key is the key of the secret to select from. Must be a valid secret key.
Contributor

What makes a "valid secret key"? Are there formatting constraints that can be enforced here?

Contributor

Added validation ;) Secret keys must match ^[a-zA-Z0-9._-]+$ (alphanumeric, -, _, or .). This is the standard Kubernetes secret key format.
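As a marker on the key field, that could look roughly like this (length limits and message wording are illustrative):

// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:XValidation:rule="self.matches('^[a-zA-Z0-9._-]+$')",message="must consist of alphanumeric characters, '-', '_' or '.'"
Key string `json:"key"`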

@danielmellado
Contributor

danielmellado commented Dec 16, 2025 via email

@danielmellado
Contributor

Thanks @everettraven! I'll put up a new PR with the updates in the comments. Even the GH API is being a bit sluggish and I'm starting to have a hard time getting to it due to it by default not loading more than 40 comments. There were a LOT before I took this xD

}

// RemoteWriteSpec represents configuration for remote write endpoints.
type RemoteWriteSpec struct {


We lack all the queue configuration, which is another big struct.
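For context, a trimmed-down sketch of what such a struct could cover; the field set loosely mirrors Prometheus's queue_config and is illustrative, not a proposal:

// QueueConfig is a hypothetical subset of Prometheus's remote-write queue tuning.
type QueueConfig struct {
	// capacity is the number of samples buffered per shard before blocking reads from the WAL.
	// +optional
	Capacity *int32 `json:"capacity,omitempty"`
	// minShards and maxShards bound the number of concurrent shards used for sending.
	// +optional
	MinShards *int32 `json:"minShards,omitempty"`
	// +optional
	MaxShards *int32 `json:"maxShards,omitempty"`
	// maxSamplesPerSend caps how many samples are sent in a single request.
	// +optional
	MaxSamplesPerSend *int32 `json:"maxSamplesPerSend,omitempty"`
	// batchSendDeadline is the longest a sample may wait in the buffer, as a Go duration string such as 5s.
	// +optional
	// +kubebuilder:validation:MaxLength=20
	BatchSendDeadline *string `json:"batchSendDeadline,omitempty"`
}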

@simonpasquier

Thanks @everettraven! I'll put up a new PR with the updates in the comments. Even the GH API is being a bit sluggish and I'm starting to have a hard time getting to it due to it by default not loading more than 40 comments. There were a LOT before I took this xD

Chatted offline with @danielmellado: I would even go one step further and split the PR into multiple ones, each PR focusing on a sub-part of the configuration that can reasonably be reviewed. For instance, the remote write config alone has dozens of fields.

@danielmellado
Contributor

danielmellado commented Dec 17, 2025 via email


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
