panic: when using prometheus.write.queue #2074

Closed
tristanmorgan opened this issue Nov 13, 2024 · 5 comments

Labels: bug (Something isn't working), frozen-due-to-age

Comments

@tristanmorgan

What's wrong?

A panic "panic: duplicate metrics collector registration attempted" occurs during startup when trying to replace prometheus.remote_write with a comparable prometheus.write.queue. the panic seems to stem from internal/component/prometheus/write/queue/types/stats.go line 171.

Steps to reproduce

Panic occurs during startup:
alloy run --server.http.listen-addr=0.0.0.0:20604 --feature.community-components.enabled --stability.level=experimental .config.alloy

System information

Linux 6.6.40-v8+ aarch64

Software version

Grafana Alloy v1.5.0-rc.1

Configuration

// pull metrics from tagged services
discovery.consul "discovery" {
	server = "10.0.0.23:8500"
	tags   = ["prom-metrics"]
}

discovery.relabel "discovery" {
	targets = discovery.consul.discovery.targets

	rule {
		action        = "replace"
		source_labels = ["__meta_consul_service"]
		target_label  = "job"
	}
}

prometheus.scrape "services" {
	clustering {
		enabled = true
	}
	targets    = discovery.relabel.discovery.output
	forward_to = [prometheus.relabel.slim_metrics.receiver]
	params     = {
		format = ["prometheus"],
	}
}

prometheus.relabel "slim_metrics" {
	forward_to = [prometheus.write.queue.thanos.receiver]

	rule {
		action        = "drop"
		source_labels = ["__name__"]
		regex         = ".*_bucket"
	}

	rule {
		action        = "drop"
		source_labels = ["__name__"]
		regex         = ".*_requests_ttfb_seconds_distribution"
	}

	rule {
		action        = "drop"
		source_labels = ["__name__"]
		regex         = "grpc_server_handled_total"
	}
}

discovery.consul "remote_write" {
	server   = "10.0.0.23:8500"
	services = [
		"remote-write",
	]
}

prometheus.write.queue "thanos" {
	endpoint "remote_write" {
		url = format(
			"http://%s/api/v1/receive",
			concat(discovery.consul.remote_write.targets, [{"__address__" = "10.10.10.123:9009"}])[0]["__address__"],
		)
	}
}

Logs

panic: duplicate metrics collector registration attempted

goroutine 183 [running]:
github.com/prometheus/client_golang/prometheus.(*wrappingRegisterer).MustRegister(0x4003b62300, {0x400307ce70?, 0x0?, 0x0?})
	/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/wrap.go:104 +0x14c
github.com/grafana/alloy/internal/component/prometheus/write/queue/types.NewStats({0x9d196b0, 0x5}, {0x9d47e6e, 0xc}, {0xb4bf048, 0x4003b62300})
	/src/alloy/internal/component/prometheus/write/queue/types/stats.go:171 +0x11a8
github.com/grafana/alloy/internal/component/prometheus/write/queue.(*Queue).createEndpoints(0x40037669a0)
	/src/alloy/internal/component/prometheus/write/queue/component.go:123 +0x168
github.com/grafana/alloy/internal/component/prometheus/write/queue.(*Queue).Update(0x40037669a0, {0x90c6d20, 0x4003b62210})
	/src/alloy/internal/component/prometheus/write/queue/component.go:109 +0x1e4
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).evaluate(0x4003904b48, 0x4003afb1a0)
	/src/alloy/internal/runtime/internal/controller/node_builtin_component.go:293 +0x238
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).Evaluate(0x4003904b48, 0x9a38260?)
	/src/alloy/internal/runtime/internal/controller/node_builtin_component.go:248 +0x20
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).concurrentEvalFn(0x400303d1e0, {0x7f48162640, 0x4003904b48}, {0xb4e8d68, 0x4003b13dd0}, {0xb45e5e8, 0x4003b13d40}, 0x4003b13ce0)
	/src/alloy/internal/runtime/internal/controller/loader.go:787 +0x520
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).EvaluateDependants.func2()
	/src/alloy/internal/runtime/internal/controller/loader.go:736 +0x3c
github.com/grafana/alloy/internal/runtime/internal/worker.(*workQueue).emitNextTask.func1()
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:181 +0x6c
github.com/grafana/alloy/internal/runtime/internal/worker.(*fixedWorkerPool).start.func1()
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:87 +0x68
created by github.com/grafana/alloy/internal/runtime/internal/worker.(*fixedWorkerPool).start in goroutine 1
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:80 +0x2c
tristanmorgan added the bug label on Nov 13, 2024
@wildum (Contributor) commented Nov 15, 2024

@mattdurham fyi

@mattdurham (Collaborator)

This is fixed in #1994. Let me know if you can try out the dev image when it builds; it should be ready in an hour or so since I just merged the changes.
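
For reference, trying the dev image could look something like the following (the image tag and local config path are assumptions; use whichever dev tag contains the merged change):

docker run --rm -it -v ${PWD}/test.alloy:/config/test.alloy grafana/alloy-dev:latest run --feature.community-components.enabled --stability.level=experimental /config/test.alloy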

@tristanmorgan (Author)

I’ve been running the grafana/alloy-dev docker container for a few days and it’s working great.

@tristanmorgan (Author)

Sorry @mattdurham, this issue seems to have popped up in the v1.5.1 release, but it doesn't occur with the grafana/alloy-dev:latest build.

$ docker run --rm -it -v ${PWD}/test.alloy:/config/test.alloy grafana/alloy:v1.5.1 run --feature.community-components.enabled --stability.level=experimental /config/test.alloy
ts=2024-12-03T22:23:35.906788303Z level=info "boringcrypto enabled"=false
ts=2024-12-03T22:23:35.903567011Z level=info source=/go/pkg/mod/github.com/!kim!machine!gun/[email protected]/memlimit/memlimit.go:170 msg="memory is not limited, skipping: %v" package=github.com/KimMachineGun/automemlimit/memlimit !BADKEY="memory is not limited"
ts=2024-12-03T22:23:35.906890095Z level=info msg="no peer discovery configured: both join and discover peers are empty" service=cluster
ts=2024-12-03T22:23:35.90689347Z level=info msg="running usage stats reporter"
ts=2024-12-03T22:23:35.90689547Z level=warn msg="this stdlib function is deprecated; please refer to the documentation for updated usage and alternatives" controller_path=/ controller_id="" function=format
ts=2024-12-03T22:23:35.906898511Z level=warn msg="this stdlib function is deprecated; please refer to the documentation for updated usage and alternatives" controller_path=/ controller_id="" function=concat
ts=2024-12-03T22:23:35.906900928Z level=info msg="starting complete graph evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885
ts=2024-12-03T22:23:35.906909428Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=remotecfg duration=26.208µs
ts=2024-12-03T22:23:35.906920928Z level=info msg="applying non-TLS config to HTTP server" service=http
ts=2024-12-03T22:23:35.906922928Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=http duration=3.375µs
ts=2024-12-03T22:23:35.906926511Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=cluster duration=292ns
ts=2024-12-03T22:23:35.906929303Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=otel duration=208ns
ts=2024-12-03T22:23:35.906932011Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=discovery.consul.remote_write duration=200.5µs
ts=2024-12-03T22:23:35.90693472Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=prometheus.write.queue.thanos duration=606.417µs
ts=2024-12-03T22:23:35.906938053Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=prometheus.relabel.slim_metrics duration=795.459µs
ts=2024-12-03T22:23:35.906941178Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=livedebugging duration=8.042µs
ts=2024-12-03T22:23:35.906943678Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=ui duration=1.875µs
ts=2024-12-03T22:23:35.906946136Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=discovery.consul.discovery duration=35.541µs
ts=2024-12-03T22:23:35.90694872Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=discovery.relabel.discovery duration=33.208µs
ts=2024-12-03T22:23:35.906951303Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=prometheus.scrape.services duration=382.209µs
ts=2024-12-03T22:23:35.90695797Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=tracing duration=6.5µs
ts=2024-12-03T22:23:35.906961386Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=logging duration=177.667µs
ts=2024-12-03T22:23:35.906971261Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=labelstore duration=4.875µs
ts=2024-12-03T22:23:35.906982511Z level=info msg="finished complete graph evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 duration=2.410125ms
ts=2024-12-03T22:23:35.907124053Z level=info msg="scheduling loaded components and services"
ts=2024-12-03T22:23:35.90745922Z level=info msg="starting cluster node" service=cluster peers_count=0 peers="" advertise_addr=127.0.0.1:12345
ts=2024-12-03T22:23:35.908073678Z level=info msg="peers changed" service=cluster peers_count=1 peers=d013c0fe661e
ts=2024-12-03T22:23:35.908224928Z level=info msg="now listening for http traffic" service=http addr=127.0.0.1:12345
panic: duplicate metrics collector registration attempted

goroutine 178 [running]:
github.com/prometheus/client_golang/prometheus.(*wrappingRegisterer).MustRegister(0x4004086c90, {0x40040be0b0?, 0x0?, 0x0?})
	/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/wrap.go:104 +0x14c
github.com/grafana/alloy/internal/component/prometheus/write/queue/types.NewStats({0x9d1dc50, 0x5}, {0x9d4c430, 0xc}, {0xb4c5728, 0x4004086c90})
	/src/alloy/internal/component/prometheus/write/queue/types/stats.go:171 +0x11a8
github.com/grafana/alloy/internal/component/prometheus/write/queue.(*Queue).createEndpoints(0x40036e6c40)
	/src/alloy/internal/component/prometheus/write/queue/component.go:123 +0x168
github.com/grafana/alloy/internal/component/prometheus/write/queue.(*Queue).Update(0x40036e6c40, {0x90cae80, 0x4004086ba0})
	/src/alloy/internal/component/prometheus/write/queue/component.go:109 +0x1e4
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).evaluate(0x400382efc8, 0x40040901a0)
	/src/alloy/internal/runtime/internal/controller/node_builtin_component.go:293 +0x238
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).Evaluate(0x400382efc8, 0x9a3c540?)
	/src/alloy/internal/runtime/internal/controller/node_builtin_component.go:248 +0x20
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).concurrentEvalFn(0x40035b1a00, {0xffff63bd3988, 0x400382efc8}, {0xb4ef448, 0x40040867e0}, {0xb464cc8, 0x4004086750}, 0x4003b83170)
	/src/alloy/internal/runtime/internal/controller/loader.go:801 +0x520
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).EvaluateDependants.func2()
	/src/alloy/internal/runtime/internal/controller/loader.go:738 +0x3c
github.com/grafana/alloy/internal/runtime/internal/worker.(*workQueue).emitNextTask.func1()
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:181 +0x6c
github.com/grafana/alloy/internal/runtime/internal/worker.(*fixedWorkerPool).start.func1()
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:87 +0x68
created by github.com/grafana/alloy/internal/runtime/internal/worker.(*fixedWorkerPool).start in goroutine 1
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:80 +0x2c

@mattdurham (Collaborator)

It was also tied to a Go update, which was a bit big for a patch release. The fix will go out in 1.6.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Jan 4, 2025