4 changes: 2 additions & 2 deletions .env
@@ -9,15 +9,15 @@ OTEL_JAVA_AGENT_VERSION=2.23.0
OPENTELEMETRY_CPP_VERSION=1.24.0

# Dependent images
-COLLECTOR_CONTRIB_IMAGE=ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.142.0
+COLLECTOR_CONTRIB_IMAGE=ghcr.io/aknuds1/otelcontribcol:postgresreceiver-uuid-v0.143.0
FLAGD_IMAGE=ghcr.io/open-feature/flagd:v0.12.9
GRAFANA_IMAGE=grafana/grafana:12.3.1
JAEGERTRACING_IMAGE=jaegertracing/jaeger:2.12.0
# must also update version field in src/grafana/provisioning/datasources/opensearch.yaml
OPENSEARCH_IMAGE=opensearchproject/opensearch:3.4.0
OPENSEARCH_DOCKERFILE=./src/opensearch/Dockerfile
POSTGRES_IMAGE=postgres:17.6
-PROMETHEUS_IMAGE=quay.io/prometheus/prometheus:v3.8.1
+PROMETHEUS_IMAGE=ghcr.io/aknuds1/prometheus@sha256:5daac9ac954a23b1918d2dca10c0604355b3c2c5dbf0657e5a2358adea917e5c
VALKEY_IMAGE=valkey/valkey:9.0.1-alpine3.23
TRACETEST_IMAGE=kubeshop/tracetest:${TRACETEST_IMAGE_VERSION}

134 changes: 134 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,134 @@
# CLAUDE.md

> **Review comment (jmichalek132):** not sure you want to commit this file?
>
> **Author reply:** Good point @jmichalek132 - it's useful while working out the PR; however, I would remove it before finalizing the PR.

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is the **OpenTelemetry Astronomy Shop Demo** - a polyglot microservices e-commerce application showcasing OpenTelemetry instrumentation across multiple programming languages. It serves as a realistic example for demonstrating distributed tracing, metrics, and logging.

## Common Commands

### Running the Demo
```bash
make start # Start all services (http://localhost:8080)
make start-minimal # Start minimal set of services
make stop # Stop all services
```

### Building
```bash
make build # Build all Docker images
make redeploy service=<name> # Rebuild and restart a single service
```

### Testing
```bash
make run-tests # Run all tests (frontend + trace-based)
make run-tracetesting # Run trace-based tests only
make run-tracetesting SERVICES_TO_TEST="ad payment" # Test specific services
```

### Linting & Validation
```bash
make check # Run all checks (misspell, markdownlint, license, links)
make misspell # Check spelling in markdown files
make markdownlint # Lint markdown files
make checklicense # Check license headers
```

### Protobuf Generation
```bash
make generate-protobuf # Generate protobuf code (requires local tools)
make docker-generate-protobuf # Generate protobuf code via Docker
make clean # Remove generated protobuf files
```

## Architecture

### Service Communication
- **gRPC**: Primary protocol for inter-service communication (defined in `pb/demo.proto`)
- **HTTP/REST**: Used by frontend, email service, and external-facing endpoints
- **Kafka**: Async messaging for checkout -> accounting/fraud-detection flow
- **Envoy**: Frontend proxy handling routing to all services

### Microservices by Language

| Language | Services |
|----------|----------|
| **Go** | checkout, product-catalog |
| **Java** | ad, fraud-detection (with OTel Java agent) |
| **.NET/C#** | accounting, cart |
| **Python** | recommendation, product-reviews, load-generator |
| **TypeScript/Node.js** | frontend (Next.js), payment |
| **Ruby** | email |
| **PHP** | quote |
| **C++** | currency |
| **Rust** | shipping |
| **Elixir** | flagd-ui |

### Key Infrastructure Components
- **OpenTelemetry Collector**: Central telemetry pipeline (`src/otel-collector/`)
- **Jaeger**: Distributed tracing backend (http://localhost:8080/jaeger/ui)
- **Grafana**: Dashboards and visualization (http://localhost:8080/grafana)
- **Prometheus**: Metrics storage
- **Flagd**: Feature flags service (`src/flagd/demo.flagd.json`)
- **Kafka**: Event streaming for order processing
- **Valkey**: Cart session storage (Redis-compatible)
- **PostgreSQL**: Persistent storage for accounting

### Directory Structure
```
src/
├── <service>/        # Each microservice has its own directory
│   ├── Dockerfile    # Build definition
│   └── README.md     # Service-specific documentation
pb/
└── demo.proto        # Shared protobuf definitions for gRPC services
test/
└── tracetesting/     # Trace-based test definitions
```

## Configuration

- **Environment variables**: Defined in `.env` (base) and `.env.override` (local customizations)
- **Docker Compose**: Main orchestration in `docker-compose.yml`
- **Feature flags**: Configured in `src/flagd/demo.flagd.json`

**Note:** Do not commit changes to `.env.override` - it is for local customizations only.
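
For local experiments, a minimal sketch of how `.env.override` might be used - the variable name is taken from `.env` above, and exactly how the two files are merged depends on this repo's Makefile/Docker Compose setup:

```bash
# Hypothetical local customization: temporarily revert to the upstream
# collector image without touching the tracked .env file.
cat >> .env.override <<'EOF'
COLLECTOR_CONTRIB_IMAGE=ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.142.0
EOF

make start   # restart with the override applied
```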

## Development Workflow

1. Make code changes to a service in `src/<service>/`
2. Rebuild and restart only that service: `make redeploy service=<name>`
3. View traces in Jaeger and logs via `docker logs <container_name>`
4. For protobuf changes, update `pb/demo.proto` then run `make docker-generate-protobuf`
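
As a concrete sketch of that loop (paths and container names are illustrative; actual container names come from `docker-compose.yml`):

```bash
# 1. Edit the service
$EDITOR src/checkout/main.go

# 2. Rebuild and restart only that service
make redeploy service=checkout

# 3. Tail its logs while exercising http://localhost:8080
docker logs -f checkout

# 4. If pb/demo.proto changed, regenerate stubs first
make docker-generate-protobuf
```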

## PromQL Conventions

### Prefer `info()` over Resource Attribute Promotion

When writing PromQL queries that need to filter or group by OpenTelemetry resource attributes (e.g., `service_name`, `deployment_environment_name`, `k8s_cluster_name`), prefer using the experimental `info()` function over resource attribute promotion in the collector.

**Pattern:**
```promql
# Preferred: Use info() with data-label-selector
sum by (service_name) (
  info(
    rate(http_server_request_duration_seconds_count[$__rate_interval]),
    {deployment_environment_name=~"$env", service_name="$service"}
  )
)

# Avoid: Resource attributes promoted directly onto metrics
sum by (service_name) (
  rate(http_server_request_duration_seconds_count{
    deployment_environment_name=~"$env",
    service_name="$service"
  }[$__rate_interval])
)
```

**Why:**
- Reduces metric cardinality in Prometheus
- Resource attributes are stored once in `target_info` rather than on every metric
- The `info()` function joins metrics with `target_info` at query time

**Note:** Requires Prometheus with `--enable-feature=promql-experimental-functions`.
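
A quick way to confirm the flag is active is to run an `info()` query through the Prometheus HTTP API; without the flag, the expression fails to parse. This sketch assumes Prometheus is reachable on `localhost:9090` - adjust if it is only exposed through the frontend proxy in your setup:

```bash
# Returns status "success" when promql-experimental-functions is enabled;
# otherwise Prometheus rejects info() as an unknown function at parse time.
curl -s 'http://localhost:9090/api/v1/query' --data-urlencode 'query=info(up)' | head -c 300
echo
```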
7 changes: 4 additions & 3 deletions docker-compose.yml
@@ -147,7 +147,7 @@ services:
- GOMEMLIMIT=16MiB
- OTEL_EXPORTER_OTLP_ENDPOINT
- OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
- OTEL_RESOURCE_ATTRIBUTES
- OTEL_RESOURCE_ATTRIBUTES=${OTEL_RESOURCE_ATTRIBUTES},service.instance.id=checkout
> **Review comment (Member):** `service.instance.id` is expected to be generated by SDKs or derived from the K8s environment; moreover, it should be a GUID.
> Specs: https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id
> K8s naming specs: https://opentelemetry.io/docs/specs/semconv/non-normative/k8s-attributes/

- OTEL_SERVICE_NAME=checkout
depends_on:
cart:
@@ -500,7 +500,7 @@ services:
- GOMEMLIMIT=16MiB
- OTEL_EXPORTER_OTLP_ENDPOINT
- OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
- OTEL_RESOURCE_ATTRIBUTES
- OTEL_RESOURCE_ATTRIBUTES=${OTEL_RESOURCE_ATTRIBUTES},service.instance.id=product-catalog
- OTEL_SERVICE_NAME=product-catalog
- OTEL_SEMCONV_STABILITY_OPT_IN=database
- DB_CONNECTION_STRING=postgres://otelu:otelp@${POSTGRES_HOST}/${POSTGRES_DB}?sslmode=disable
@@ -669,7 +669,7 @@ services:
- FLAGD_OTEL_COLLECTOR_URI=${OTEL_COLLECTOR_HOST}:${OTEL_COLLECTOR_PORT_GRPC}
- FLAGD_METRICS_EXPORTER=otel
- GOMEMLIMIT=60MiB
- OTEL_RESOURCE_ATTRIBUTES
- OTEL_RESOURCE_ATTRIBUTES=${OTEL_RESOURCE_ATTRIBUTES},service.instance.id=flagd
- OTEL_SERVICE_NAME=flagd
command: [
"start",
@@ -907,6 +907,7 @@ services:
- --web.route-prefix=/
- --web.enable-otlp-receiver
- --enable-feature=exemplar-storage
- --enable-feature=promql-experimental-functions
volumes:
- ./src/prometheus/prometheus-config.yaml:/etc/prometheus/prometheus-config.yaml
deploy:
72 changes: 72 additions & 0 deletions kubernetes/deploy-info-function.sh
@@ -0,0 +1,72 @@
#!/bin/bash
# Deploy OpenTelemetry Demo with experimental Prometheus info() function support
#
# This script:
# 1. Installs/upgrades the Helm chart with custom values
# 2. Deploys custom Grafana dashboards that use the info() function

set -e

NAMESPACE="${NAMESPACE:-otel-demo}"
RELEASE_NAME="${RELEASE_NAME:-opentelemetry-demo}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(dirname "$SCRIPT_DIR")"

echo "=== Deploying OpenTelemetry Demo with info() function support ==="
echo "Namespace: $NAMESPACE"
echo "Release: $RELEASE_NAME"
echo ""

# Add Helm repo if not already added
echo "Adding Helm repository..."
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts 2>/dev/null || true
helm repo update

# Create namespace if it doesn't exist
kubectl create namespace "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -

# Install/upgrade the Helm chart
echo ""
echo "Installing/upgrading Helm chart..."
helm upgrade --install "$RELEASE_NAME" open-telemetry/opentelemetry-demo \
--namespace "$NAMESPACE" \
-f "$SCRIPT_DIR/values-info-function.yaml" \
--wait

# Deploy custom dashboards as ConfigMaps
echo ""
echo "Deploying custom Grafana dashboards..."

# APM Dashboard
echo " - APM Dashboard"
kubectl create configmap apm-dashboard \
--from-file=apm-dashboard.json="$REPO_ROOT/src/grafana/provisioning/dashboards/demo/apm-dashboard.json" \
--namespace "$NAMESPACE" \
--dry-run=client -o yaml | kubectl apply -f -
kubectl label configmap apm-dashboard grafana_dashboard=1 --namespace "$NAMESPACE" --overwrite

# PostgreSQL Dashboard
echo " - PostgreSQL Dashboard"
kubectl create configmap postgresql-dashboard \
--from-file=postgresql-dashboard.json="$REPO_ROOT/src/grafana/provisioning/dashboards/demo/postgresql-dashboard.json" \
--namespace "$NAMESPACE" \
--dry-run=client -o yaml | kubectl apply -f -
kubectl label configmap postgresql-dashboard grafana_dashboard=1 --namespace "$NAMESPACE" --overwrite

# Restart Grafana to pick up the new dashboards
echo ""
echo "Restarting Grafana to load dashboards..."
kubectl rollout restart deployment/grafana --namespace "$NAMESPACE" 2>/dev/null || \
kubectl rollout restart deployment/"$RELEASE_NAME"-grafana --namespace "$NAMESPACE" 2>/dev/null || \
echo " (Could not restart Grafana - dashboards will load on next restart)"

echo ""
echo "=== Deployment complete ==="
echo ""
echo "Access the demo:"
echo " kubectl port-forward svc/frontend-proxy 8080:8080 -n $NAMESPACE"
echo " Open http://localhost:8080"
echo ""
echo "Access Grafana:"
echo " kubectl port-forward svc/grafana 3000:80 -n $NAMESPACE"
echo " Open http://localhost:3000 (admin/admin)"
113 changes: 113 additions & 0 deletions kubernetes/deploy-kind.sh
@@ -0,0 +1,113 @@
#!/bin/bash
# Deploy OpenTelemetry Demo to a local Kind cluster
#
# This script:
# 1. Creates a Kind cluster (if it doesn't exist)
# 2. Installs the Helm chart with info() function support
# 3. Deploys custom Grafana dashboards
#
# Prerequisites:
# - kind: https://kind.sigs.k8s.io/docs/user/quick-start/#installation
# - kubectl
# - helm

set -e

CLUSTER_NAME="${CLUSTER_NAME:-otel-demo}"
NAMESPACE="${NAMESPACE:-otel-demo}"
RELEASE_NAME="${RELEASE_NAME:-opentelemetry-demo}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(dirname "$SCRIPT_DIR")"

echo "=== OpenTelemetry Demo on Kind ==="
echo "Cluster: $CLUSTER_NAME"
echo "Namespace: $NAMESPACE"
echo ""

# Check prerequisites
command -v kind >/dev/null 2>&1 || { echo "Error: kind is not installed. See https://kind.sigs.k8s.io/docs/user/quick-start/#installation"; exit 1; }
command -v kubectl >/dev/null 2>&1 || { echo "Error: kubectl is not installed."; exit 1; }
command -v helm >/dev/null 2>&1 || { echo "Error: helm is not installed."; exit 1; }

# Create Kind cluster if it doesn't exist
if ! kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
echo "Creating Kind cluster '$CLUSTER_NAME'..."
kind create cluster --config "$SCRIPT_DIR/kind-config.yaml" --name "$CLUSTER_NAME"
echo ""
else
echo "Kind cluster '$CLUSTER_NAME' already exists."
# Ensure kubectl context is set to the Kind cluster
kubectl config use-context "kind-${CLUSTER_NAME}"
echo ""
fi

# Add Helm repo
echo "Adding Helm repository..."
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts 2>/dev/null || true
helm repo update

# Create namespace
kubectl create namespace "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -

# Install/upgrade the Helm chart
echo ""
echo "Installing OpenTelemetry Demo (this may take a few minutes)..."
helm upgrade --install "$RELEASE_NAME" open-telemetry/opentelemetry-demo \
--namespace "$NAMESPACE" \
-f "$SCRIPT_DIR/values-info-function.yaml" \
-f "$SCRIPT_DIR/values-kind.yaml" \
--timeout 10m \
--wait

# Deploy custom dashboards
echo ""
echo "Deploying custom Grafana dashboards..."

# Delete conflicting dashboards from Helm chart that don't use info() function.
# The Helm chart bundles dashboards that query metrics directly with resource
# attributes as labels. Our custom dashboards use the info() function instead.
echo " - Removing default Helm chart dashboards..."
kubectl delete configmap grafana-dashboard-apm-dashboard --namespace "$NAMESPACE" 2>/dev/null || true
kubectl delete configmap grafana-dashboard-postgresql-dashboard --namespace "$NAMESPACE" 2>/dev/null || true

echo " - APM Dashboard"
kubectl create configmap apm-dashboard \
--from-file=apm-dashboard.json="$REPO_ROOT/src/grafana/provisioning/dashboards/demo/apm-dashboard.json" \
--namespace "$NAMESPACE" \
--dry-run=client -o yaml | kubectl apply -f -
kubectl label configmap apm-dashboard grafana_dashboard=1 --namespace "$NAMESPACE" --overwrite

echo " - PostgreSQL Dashboard"
kubectl create configmap postgresql-dashboard \
--from-file=postgresql-dashboard.json="$REPO_ROOT/src/grafana/provisioning/dashboards/demo/postgresql-dashboard.json" \
--namespace "$NAMESPACE" \
--dry-run=client -o yaml | kubectl apply -f -
kubectl label configmap postgresql-dashboard grafana_dashboard=1 --namespace "$NAMESPACE" --overwrite

# Restart Grafana to pick up dashboards
echo ""
echo "Restarting Grafana to load dashboards..."
kubectl rollout restart deployment/grafana --namespace "$NAMESPACE" 2>/dev/null || true

# Wait for pods
echo ""
echo "Waiting for pods to be ready..."
kubectl wait --for=condition=ready pod -l app.kubernetes.io/instance="$RELEASE_NAME" \
--namespace "$NAMESPACE" --timeout=5m 2>/dev/null || true

echo ""
echo "=== Deployment complete ==="
echo ""
echo "Access the demo:"
echo " Frontend: http://localhost:8080 (via Kind NodePort)"
echo ""
echo "For Grafana, Prometheus, Jaeger use port-forward:"
echo " kubectl port-forward svc/grafana 3000:80 -n $NAMESPACE"
echo " kubectl port-forward svc/prometheus 9090:9090 -n $NAMESPACE"
echo " kubectl port-forward svc/jaeger 16686:16686 -n $NAMESPACE"
echo ""
echo "View pods:"
echo " kubectl get pods -n $NAMESPACE"
echo ""
echo "Delete cluster when done:"
echo " kind delete cluster --name $CLUSTER_NAME"
15 changes: 15 additions & 0 deletions kubernetes/kind-config.yaml
@@ -0,0 +1,15 @@
# Kind cluster configuration for OpenTelemetry Demo
# Creates a cluster with port mapping for the frontend proxy
#
# Usage:
# kind create cluster --config kubernetes/kind-config.yaml --name otel-demo
#
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      # Frontend proxy (main entry point) - exposed via NodePort
      - containerPort: 30080
        hostPort: 8080
        protocol: TCP
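
The mapping above only wires host port 8080 to node port 30080; it assumes the chart's `frontend-proxy` service is exposed as a NodePort on 30080 (presumably configured in `values-kind.yaml`, which is not shown here). A quick post-deploy check under that assumption:

```bash
# Expect "NodePort 30080" if values-kind.yaml configures frontend-proxy that way
kubectl get svc frontend-proxy -n otel-demo \
  -o jsonpath='{.spec.type} {.spec.ports[0].nodePort}{"\n"}'
```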