Use PromQL info function instead of resource attribute promotion #2869
base: main
Conversation
Force-pushed from 2cb5b84 to 93d3951
- OTEL_EXPORTER_OTLP_ENDPOINT
- OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
- OTEL_RESOURCE_ATTRIBUTES
- OTEL_RESOURCE_ATTRIBUTES=${OTEL_RESOURCE_ATTRIBUTES},service.instance.id=checkout
service.instance.id is expected to be generated by SDKs or derived from the K8s environment; moreover, it should be a GUID.
Specs: https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id
K8s naming specs: https://opentelemetry.io/docs/specs/semconv/non-normative/k8s-attributes/
resource/postgresql:
  attributes:
    - key: service.name
      value: postgresql
      action: upsert
    - key: service.instance.id
      value: ${env:POSTGRES_HOST}
      action: upsert
Reading the service.name specs here: we could try to broaden the requirements in the specs to also define service.name for infrastructure monitoring use cases, and convince OTel Collector receiver maintainers to adopt this, but today no infrastructure monitoring receiver produces service.name or service.instance.id.
- context: resource
  statements:
    # Set service.instance.id to service.name if not already set (needed for Prometheus info() joins)
    - set(attributes["service.instance.id"], attributes["service.name"]) where attributes["service.instance.id"] == nil and attributes["service.name"] != nil
We would have collisions if the same service type (e.g. a Redis) is running multiple times. For infra monitoring metrics, we commonly use attributes like host.name to differentiate the instances.
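To make the collision concern concrete, here is a hedged PromQL sketch (the job and instance values are assumptions following the snippet above, where service.instance.id is copied from service.name):

```promql
# Hypothetical: if two Redis pods both get service.instance.id = "redis",
# their target_info series collide on the identifying (job, instance) pair
# and differ only in non-identifying labels, e.g.
#   target_info{job="redis", instance="redis", host_name="node-a"}
#   target_info{job="redis", instance="redis", host_name="node-b"}
# info() then fails with "duplicate series for info metric". Counting
# identities per (job, instance) surfaces such collisions:
count by (job, instance) (target_info) > 1
```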
@@ -0,0 +1,136 @@
# CLAUDE.md
not sure you want to commit this file?
Good point @jmichalek132 - it's useful while working out the PR; however, I would remove it before finalizing the PR.
Overall looks good outside of what @cyrille-leclerc already pointed out. Did you test this locally (given the large number of changes to the queries, even if just re-formatting)? Do all of the panels still show metrics? It would potentially be nice to show screenshots of it.
@jmichalek132 I did some simple testing locally, but I don't know the demo well yet, so I'm not a very effective tester :/ Do you know the demo well enough to look for discrepancies? I did fix the bugs I could find from checking the APM and PostgreSQL dashboards, on Docker Compose.
Force-pushed from 93d3951 to 43cdf83
As discussed offline with @cyrille-leclerc, it might be better to implement …
transform/postgresql:
  error_mode: ignore
  metric_statements:
    - context: resource
      statements:
        # Construct unique service.instance.id based on PostgreSQL resource scope.
        # The PostgreSQL receiver sets postgresql.database.name, postgresql.table.name,
        # postgresql.index.name as resource attributes, creating multiple target_info
        # entries with the same identifying labels. By including these in service.instance.id,
        # each scope gets a unique target_info, allowing info() to work correctly.
        - set(attributes["service.instance.id"], Concat([attributes["service.name"], "/", attributes["postgresql.database.name"], "/", attributes["postgresql.table.name"], "/", attributes["postgresql.index.name"]], "")) where attributes["postgresql.index.name"] != nil
        - set(attributes["service.instance.id"], Concat([attributes["service.name"], "/", attributes["postgresql.database.name"], "/", attributes["postgresql.table.name"]], "")) where attributes["postgresql.table.name"] != nil and attributes["postgresql.index.name"] == nil
        - set(attributes["service.instance.id"], Concat([attributes["service.name"], "/", attributes["postgresql.database.name"]], "")) where attributes["postgresql.database.name"] != nil and attributes["postgresql.table.name"] == nil
        - set(attributes["service.instance.id"], attributes["service.name"]) where attributes["postgresql.database.name"] == nil
After talking with @aknuds1, we noticed that I had a similar idea, though not using service.instance.id (or instance on the Prometheus side) but a different resource.uid.
One advantage of using service.instance.id is that it works out of the box for older Prometheus, but it adds an additional translation which might create issues. For instance, it performs a translation that may be surprising to a user, and it's not clear what will happen once the entity data model is supported. Should the user remove this translation and break the UI?
Using a different attribute has the advantage that, while not changing the status quo, it will work better with the introduction of the entity data model and is also a bit clearer. service.instance.id is not always enough; sometimes it requires service.name and service.namespace (i.e., job).
We will probably drop this client-side synthesis though, in favour of something on the Prometheus OTLP side instead.
Ok, and presumably the synthesis wouldn't be changing instance but another label, correct?
@ldufr The synthesis should generate the instance label - why not? If the OTLP endpoint cannot generate target_info because the identifying resource attribute triplet (service.namespace, service.name, service.instance.id) isn't present, the idea is to have a potential fallback for determining another identifying resource attribute subset (from which to generate target_info and the job and instance labels).
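For context, a hedged sketch of the mapping being discussed (label values are assumptions, following the usual Prometheus OTLP translation conventions): the identifying triplet becomes the job and instance labels on target_info.

```promql
# Conventionally, job = "<service.namespace>/<service.name>" and
# instance = "<service.instance.id>", so a target_info series for the demo's
# checkout service might look like (values assumed):
#   target_info{job="opentelemetry-demo/checkout", instance="checkout"}
# The fallback idea is to derive these labels from another identifying
# attribute subset when the full triplet is missing. Selecting the series:
target_info{job="opentelemetry-demo/checkout"}
```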
Force-pushed from 7210765 to a8a33a0
Add a dedicated pipeline for PostgreSQL metrics with a resource processor that sets service.name and service.instance.id. This ensures Prometheus generates target_info for PostgreSQL metrics, enabling the info() function to work correctly.

Signed-off-by: Arve Knudsen <[email protected]>
Replace info() with explicit group_left + max() joins to handle duplicate target_info series from the PostgreSQL receiver. The PostgreSQL receiver creates multiple target_info entries (one per database/table/index scope), causing info() to fail with a "duplicate series for info metric" error. The max() aggregation collapses duplicate target_info series while preserving all k8s and host variable filters for K8s compatibility.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
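For reference, a sketch of the verbose join pattern this commit describes (the metric and label names are illustrative assumptions, not lifted from the actual dashboards):

```promql
# Enrich a PostgreSQL metric with a k8s resource attribute from target_info.
# max() first collapses duplicate target_info series sharing (job, instance).
rate(postgresql_blocks_read_total[5m])
  * on (job, instance) group_left (k8s_namespace_name)
    max by (job, instance, k8s_namespace_name) (target_info)
```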
Update PostgreSQL dashboard to use the experimental Prometheus info() function for enriching metrics with resource attributes from target_info.

Changes:
- Add transform/postgresql processor to generate unique service.instance.id per PostgreSQL resource scope (database, table, index), fixing duplicate target_info entries
- Promote k8s.cluster.name and k8s.statefulset.name to metric labels to work around a Prometheus info() filtering bug when filter labels exist on the input metric
- Simplify dashboard queries from verbose group_left + max() to cleaner info() function calls

This approach works for both Kubernetes and Docker Compose deployments.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Use a custom collector image that generates unique service.instance.id per PostgreSQL resource scope to fix duplicate target_info entries.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The custom collector image now generates unique service.instance.id per PostgreSQL resource scope natively, making this workaround unnecessary.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
- Remove resource/postgresql processor (custom receiver sets service.name)
- Remove metric_statements from transform processor (info() only needs one of service.name or service.instance.id)
- Merge PostgreSQL into main metrics pipeline
- Remove transform from metrics pipeline (only needed for traces)

Signed-off-by: Arve Knudsen <[email protected]>
Promote service.name to metrics to work around a Prometheus bug where info() filtering doesn't work when the filter label already exists on the input metric. This fixes the APM dashboard, which filters by service_name.

Update PostgreSQL dashboard queries to filter by service_name directly on the metric (first argument to info()) rather than in the target_info filter (second argument), avoiding the same Prometheus bug.

TODO: Remove service.name promotion when upgrading to Prometheus >= v3.10.x.
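A sketch of the two query shapes involved (metric and label names are assumptions for illustration): filtering on the input metric sidesteps the bug, while filtering via info()'s second argument triggered it on affected Prometheus versions.

```promql
# Workaround form: service_name is promoted onto the metric itself, so the
# filter lives on info()'s first argument.
info(rate(postgresql_blocks_read_total{service_name="postgresql"}[5m]))

# Form that triggered the bug on affected versions: filter via the
# data-label selector (second argument).
# info(rate(postgresql_blocks_read_total[5m]), {service_name="postgresql"})
```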
Force-pushed from 9fd96c8 to 2b30a83
Switch to a custom Prometheus image that includes the fix for the info() function filtering bug. Remove the workarounds that were needed:

- Remove service.name, k8s.cluster.name, k8s.statefulset.name from promote_resource_attributes in prometheus-config.yaml
- Revert PostgreSQL dashboard queries to filter by service_name in the info() second argument instead of the first argument
- Update APM dashboard queries to filter by service_name in the info() second argument instead of the first argument

The info() function now correctly filters by labels in the second argument even when those labels exist on target_info.

Signed-off-by: Arve Knudsen <[email protected]>
Add Helm values overrides and deployment scripts for testing the experimental Prometheus info() function in Kubernetes environments:

- values-info-function.yaml: Custom Prometheus image (public) and OTLP config
- values-kind.yaml: NodePort service for Kind frontend access
- kind-config.yaml: Kind cluster config with frontend port mapping
- deploy-kind.sh: All-in-one script for Kind deployment
- deploy-info-function.sh: Generic Kubernetes deployment script

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Use ghcr.io/aknuds1/otelcontribcol:postgresreceiver-uuid-v0.143.0, which includes the PostgreSQL receiver fix for proper service.name resource attributes.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
product-catalog and flagd were getting OOMKilled with the default memory limits. Increase limits to prevent crashes:

- product-catalog: 100Mi (up from the 20Mi default)
- flagd: 500Mi (up from the default)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Delete conflicting APM and PostgreSQL dashboards from the Helm chart before deploying our custom info() function versions
- Increase memory limits for product-catalog (100Mi) and flagd (500Mi) to prevent OOMKill in Kind

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Configure the OTel Collector to set host.name from k8s.pod.name using an upsert action in the resource processor. This ensures services have their pod name as host_name instead of the collector's hostname.
- Set resourcedetection override: false to preserve existing attributes
- Disable the flagd-ui sidecar, which OOMKills even with a 1Gi memory limit

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Changes

Changing the demo to use the PromQL info() function instead of configuring Prometheus to promote resource attributes, for a more lightweight approach aligning with Prometheus recommendations. Bear in mind also that Prometheus might in the future store OTel resource attributes as native metadata; this PR would prepare for that, because info() should keep working. A hedged query sketch follows.
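As a rough illustration of the approach (metric and label names are assumptions, not the demo's exact queries), enrichment happens at query time instead of via promoted labels:

```promql
# Join a k8s resource attribute from target_info onto a service metric at
# query time; requires Prometheus started with
# --enable-feature=promql-experimental-functions.
info(
  rate(http_server_request_duration_seconds_count[5m]),
  {k8s_namespace_name="opentelemetry-demo"}
)
```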
Merge Requirements
For new feature contributions, please make sure you have completed the following essential items:

- CHANGELOG.md updated to document new feature additions

Maintainers will not merge until the above have been completed. If you're unsure which docs need to be changed, ping the @open-telemetry/demo-approvers.