panic: when using prometheus.write.queue #2074

Closed
tristanmorgan opened this issue Nov 13, 2024 · 5 comments

Labels: bug (Something isn't working), frozen-due-to-age

Comments

@tristanmorgan

What's wrong?

A panic "panic: duplicate metrics collector registration attempted" occurs during startup when trying to replace prometheus.remote_write with a comparable prometheus.write.queue. the panic seems to stem from internal/component/prometheus/write/queue/types/stats.go line 171.

Steps to reproduce

Panic occurs during startup:
alloy run --server.http.listen-addr=0.0.0.0:20604 --feature.community-components.enabled --stability.level=experimental .config.alloy

System information

Linux 6.6.40-v8+ aarch64

Software version

Grafana Alloy v1.5.0-rc.1

Configuration

// pull metrics from tagged services
discovery.consul "discovery" {
	server = "10.0.0.23:8500"
	tags   = ["prom-metrics"]
}

discovery.relabel "discovery" {
	targets = discovery.consul.discovery.targets

	rule {
		action        = "replace"
		source_labels = ["__meta_consul_service"]
		target_label  = "job"
	}
}

prometheus.scrape "services" {
	clustering {
		enabled = true
	}
	targets    = discovery.relabel.discovery.output
	forward_to = [prometheus.relabel.slim_metrics.receiver]
	params     = {
		format = ["prometheus"],
	}
}

prometheus.relabel "slim_metrics" {
	forward_to = [prometheus.write.queue.thanos.receiver]

	rule {
		action        = "drop"
		source_labels = ["__name__"]
		regex         = ".*_bucket"
	}

	rule {
		action        = "drop"
		source_labels = ["__name__"]
		regex         = ".*_requests_ttfb_seconds_distribution"
	}

	rule {
		action        = "drop"
		source_labels = ["__name__"]
		regex         = "grpc_server_handled_total"
	}
}

discovery.consul "remote_write" {
	server   = "10.0.0.23:8500"
	services = [
		"remote-write",
	]
}

prometheus.write.queue "thanos" {
	endpoint "remote_write" {
		url = format(
			"http://%s/api/v1/receive",
			concat(discovery.consul.remote_write.targets, [{"__address__" = "10.10.10.123:9009"}])[0]["__address__"],
		)
	}
}

Logs

panic: duplicate metrics collector registration attempted

goroutine 183 [running]:
github.com/prometheus/client_golang/prometheus.(*wrappingRegisterer).MustRegister(0x4003b62300, {0x400307ce70?, 0x0?, 0x0?})
	/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/wrap.go:104 +0x14c
github.com/grafana/alloy/internal/component/prometheus/write/queue/types.NewStats({0x9d196b0, 0x5}, {0x9d47e6e, 0xc}, {0xb4bf048, 0x4003b62300})
	/src/alloy/internal/component/prometheus/write/queue/types/stats.go:171 +0x11a8
github.com/grafana/alloy/internal/component/prometheus/write/queue.(*Queue).createEndpoints(0x40037669a0)
	/src/alloy/internal/component/prometheus/write/queue/component.go:123 +0x168
github.com/grafana/alloy/internal/component/prometheus/write/queue.(*Queue).Update(0x40037669a0, {0x90c6d20, 0x4003b62210})
	/src/alloy/internal/component/prometheus/write/queue/component.go:109 +0x1e4
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).evaluate(0x4003904b48, 0x4003afb1a0)
	/src/alloy/internal/runtime/internal/controller/node_builtin_component.go:293 +0x238
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).Evaluate(0x4003904b48, 0x9a38260?)
	/src/alloy/internal/runtime/internal/controller/node_builtin_component.go:248 +0x20
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).concurrentEvalFn(0x400303d1e0, {0x7f48162640, 0x4003904b48}, {0xb4e8d68, 0x4003b13dd0}, {0xb45e5e8, 0x4003b13d40}, 0x4003b13ce0)
	/src/alloy/internal/runtime/internal/controller/loader.go:787 +0x520
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).EvaluateDependants.func2()
	/src/alloy/internal/runtime/internal/controller/loader.go:736 +0x3c
github.com/grafana/alloy/internal/runtime/internal/worker.(*workQueue).emitNextTask.func1()
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:181 +0x6c
github.com/grafana/alloy/internal/runtime/internal/worker.(*fixedWorkerPool).start.func1()
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:87 +0x68
created by github.com/grafana/alloy/internal/runtime/internal/worker.(*fixedWorkerPool).start in goroutine 1
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:80 +0x2c
tristanmorgan added the bug label on Nov 13, 2024
@wildum (Contributor) commented Nov 15, 2024

@mattdurham fyi

@mattdurham (Collaborator)

This is fixed in #1994. Let me know if you can try out the dev image when it builds; it should be ready in an hour or so since I just merged the changes.
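
For reference, trying the dev image could look something like the following (the image tag and local config path are assumptions; use whichever dev tag contains the merged change):

docker run --rm -it -v ${PWD}/test.alloy:/config/test.alloy grafana/alloy-dev:latest run --feature.community-components.enabled --stability.level=experimental /config/test.alloy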

@tristanmorgan (Author)

I’ve been running the grafana/alloy-dev docker container for a few days and it’s working great.

@tristanmorgan (Author)

Sorry @mattdurham, this issue seems to have popped up in the v1.5.1 release, but it doesn't occur with the grafana/alloy-dev:latest build.

$ docker run --rm -it -v ${PWD}/test.alloy:/config/test.alloy grafana/alloy:v1.5.1 run --feature.community-components.enabled --stability.level=experimental /config/test.alloy
ts=2024-12-03T22:23:35.906788303Z level=info "boringcrypto enabled"=false
ts=2024-12-03T22:23:35.903567011Z level=info source=/go/pkg/mod/github.com/!kim!machine!gun/[email protected]/memlimit/memlimit.go:170 msg="memory is not limited, skipping: %v" package=github.com/KimMachineGun/automemlimit/memlimit !BADKEY="memory is not limited"
ts=2024-12-03T22:23:35.906890095Z level=info msg="no peer discovery configured: both join and discover peers are empty" service=cluster
ts=2024-12-03T22:23:35.90689347Z level=info msg="running usage stats reporter"
ts=2024-12-03T22:23:35.90689547Z level=warn msg="this stdlib function is deprecated; please refer to the documentation for updated usage and alternatives" controller_path=/ controller_id="" function=format
ts=2024-12-03T22:23:35.906898511Z level=warn msg="this stdlib function is deprecated; please refer to the documentation for updated usage and alternatives" controller_path=/ controller_id="" function=concat
ts=2024-12-03T22:23:35.906900928Z level=info msg="starting complete graph evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885
ts=2024-12-03T22:23:35.906909428Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=remotecfg duration=26.208µs
ts=2024-12-03T22:23:35.906920928Z level=info msg="applying non-TLS config to HTTP server" service=http
ts=2024-12-03T22:23:35.906922928Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=http duration=3.375µs
ts=2024-12-03T22:23:35.906926511Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=cluster duration=292ns
ts=2024-12-03T22:23:35.906929303Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=otel duration=208ns
ts=2024-12-03T22:23:35.906932011Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=discovery.consul.remote_write duration=200.5µs
ts=2024-12-03T22:23:35.90693472Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=prometheus.write.queue.thanos duration=606.417µs
ts=2024-12-03T22:23:35.906938053Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=prometheus.relabel.slim_metrics duration=795.459µs
ts=2024-12-03T22:23:35.906941178Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=livedebugging duration=8.042µs
ts=2024-12-03T22:23:35.906943678Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=ui duration=1.875µs
ts=2024-12-03T22:23:35.906946136Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=discovery.consul.discovery duration=35.541µs
ts=2024-12-03T22:23:35.90694872Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=discovery.relabel.discovery duration=33.208µs
ts=2024-12-03T22:23:35.906951303Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=prometheus.scrape.services duration=382.209µs
ts=2024-12-03T22:23:35.90695797Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=tracing duration=6.5µs
ts=2024-12-03T22:23:35.906961386Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=logging duration=177.667µs
ts=2024-12-03T22:23:35.906971261Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 node_id=labelstore duration=4.875µs
ts=2024-12-03T22:23:35.906982511Z level=info msg="finished complete graph evaluation" controller_path=/ controller_id="" trace_id=77b791085151fe9d032e8a4d6a950885 duration=2.410125ms
ts=2024-12-03T22:23:35.907124053Z level=info msg="scheduling loaded components and services"
ts=2024-12-03T22:23:35.90745922Z level=info msg="starting cluster node" service=cluster peers_count=0 peers="" advertise_addr=127.0.0.1:12345
ts=2024-12-03T22:23:35.908073678Z level=info msg="peers changed" service=cluster peers_count=1 peers=d013c0fe661e
ts=2024-12-03T22:23:35.908224928Z level=info msg="now listening for http traffic" service=http addr=127.0.0.1:12345
panic: duplicate metrics collector registration attempted

goroutine 178 [running]:
github.com/prometheus/client_golang/prometheus.(*wrappingRegisterer).MustRegister(0x4004086c90, {0x40040be0b0?, 0x0?, 0x0?})
	/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/wrap.go:104 +0x14c
github.com/grafana/alloy/internal/component/prometheus/write/queue/types.NewStats({0x9d1dc50, 0x5}, {0x9d4c430, 0xc}, {0xb4c5728, 0x4004086c90})
	/src/alloy/internal/component/prometheus/write/queue/types/stats.go:171 +0x11a8
github.com/grafana/alloy/internal/component/prometheus/write/queue.(*Queue).createEndpoints(0x40036e6c40)
	/src/alloy/internal/component/prometheus/write/queue/component.go:123 +0x168
github.com/grafana/alloy/internal/component/prometheus/write/queue.(*Queue).Update(0x40036e6c40, {0x90cae80, 0x4004086ba0})
	/src/alloy/internal/component/prometheus/write/queue/component.go:109 +0x1e4
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).evaluate(0x400382efc8, 0x40040901a0)
	/src/alloy/internal/runtime/internal/controller/node_builtin_component.go:293 +0x238
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).Evaluate(0x400382efc8, 0x9a3c540?)
	/src/alloy/internal/runtime/internal/controller/node_builtin_component.go:248 +0x20
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).concurrentEvalFn(0x40035b1a00, {0xffff63bd3988, 0x400382efc8}, {0xb4ef448, 0x40040867e0}, {0xb464cc8, 0x4004086750}, 0x4003b83170)
	/src/alloy/internal/runtime/internal/controller/loader.go:801 +0x520
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).EvaluateDependants.func2()
	/src/alloy/internal/runtime/internal/controller/loader.go:738 +0x3c
github.com/grafana/alloy/internal/runtime/internal/worker.(*workQueue).emitNextTask.func1()
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:181 +0x6c
github.com/grafana/alloy/internal/runtime/internal/worker.(*fixedWorkerPool).start.func1()
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:87 +0x68
created by github.com/grafana/alloy/internal/runtime/internal/worker.(*fixedWorkerPool).start in goroutine 1
	/src/alloy/internal/runtime/internal/worker/worker_pool.go:80 +0x2c

@mattdurham (Collaborator)

It was also tied to a Go update, which was a bit big for a patch release. The fix will go out in 1.6.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Jan 4, 2025