- All notable changes to this project will be documented in this file.
- Records in this file are not identical to the titles of their Pull Requests. A detailed description is necessary for understanding what the changes are and why they were made.
- Provide a new metric called kindling_k8s_workload_info, which supports workload filtering for k8s, thus preventing frequent crashes of Grafana topology. Please refer to the doc for any limitations.(#530)
- Added support for displaying trace-profiling data by querying from Elasticsearch. (#528)
- Display scheduler run queue latency on Trace-Profiling chart. To learn more about the concept of 'Run Queue Latency', refer to this blog post. You can also find a use case for this feature in this blog post. (#494)
- Upgrade the Grafana version to 8.5.26 (#533)
- MySQL command-line case: ignore the `quit` command and get the SQL statement when `CLIENT_QUERY_ATTRIBUTES` is used. (#523)
- ⚠️ Breaking change: Refactor the data format of on/off CPU events from "string" to "array". Note that the old data format cannot be parsed using the new version of the front-end. (#512 #520)
- Fix the bug where the DNS domain is not obtained when DNS is transported over TCP. (#524)
- Fix panic: send on closed channel. (#519)
- Fix the bug that the event detail panel doesn't hide when switching profiles.(#513)
- Fix span data deduplication issue.(#511)
In this release, we have a new contributor @hwz779866221. Thanks and welcome! 🥳
- Add an option `WithMemory` to OpenTelemetry's Prometheus exporter. It allows users to control whether metrics that haven't been updated in the most recent interval are reported. (#501)
- Add a config to cgoreceiver for suppressing events according to processes' comm (#495)
- Add `bind` syscall support to get the listening IP and port of a server. (#493)
- Add an option `enable_fetch_replicaset` to control whether to fetch ReplicaSet metadata. The default value is false, which aims to relieve pressure on the Kubernetes API server. (#492)
- Fix the memory leak issue by deleting vtid-tid map to avoid OOM. (#499)
- Fix the unrunnable bug caused by the error "Insufficient parameters of TCP retransmit". (#499)
- Fix the bug that in `cpuanalyzer`, no segments are sent if they contain no cpuevents. Now segments are sent as long as they contain events, regardless of what the events are. (#502)
- Fix the bug that the default configs of slice/map are not overridden. (#497)
Thanks to @yanhongchang for the significant help in providing OOM-killed information on #499.
- Support trace-profiling sampling to reduce data output. One trace is sampled every five seconds for each endpoint by default. (#446 #462)
- Upgrade the golang version to v1.19 in the requirement. (#463)
- Improve Kindling Event log format. (#455)
- Fix security alerts (CVE-2022-41721, CVE-2022-27664) by upgrading the package `golang.org/x/net`. (#463)
- Fix the potential endless loop in the rocketmq parser. (#465)
- Fix the bug that the retransmission count is not consistent with the real value on Linux 4.7 or higher. (#450)
- Reduce the cases where pods are not found when they belong to a DaemonSet. (#439 @llhhbc)
- The collector now subscribes to `sendmmsg` events to fix the bug that some DNS requests are missed. (#430)
- Fix the bug that the agent panics when it receives DeletedFinalStateUnknown by watching K8s metadata. (#456)
In this release, we have a new contributor @llhhbc. Thanks and welcome! 🥳
- Add a new simplified chart to display the trace-profiling data. It mixes `span` with profiling and is more user-friendly. Try the demo now on the website. (#443)
- Add trace to cpuevents to display the payload of network flows. (#442)
- Support Attach Agent for NoAPM Java Application. (#431)
- Add an option `edge_events_window_size` to allow users to reduce the size of the files by narrowing the time window in which the edge events are located. (#437)
- Rename the camera profiling file to make the timestamp of the profiling files readable. (#434)
- When using the file writer in `cameraexporter`, files are now rotated in chronological order, and half of the files are rotated at a time. (#420)
- Support identifying the MySQL protocol with the statements `commit` and `set`. (#417)
- Fix the bug that TCP metrics are not aggregated correctly. (#444)
- Fix the bug that cpuanalyzer missed some trigger events due to an incorrect variable reference. This could prevent some traces from being correlated with on/off CPU data. (#424)
- Support configuring `snaplen` through startup args. (#387)
- Add tracing span data in cpu events. (#384)
- Add a new tool: A debug tool for Trace Profiling is provided for developers to troubleshoot problems.(#363)
- Support the protocol RocketMQ.(#328)
- Add a self-monitor tool that includes the kernel event log and gdb information on exit. (#398)
- Adjust max depth of stack trace to 20. (#399)
- Add the field `end_timestamp` to the trace data to make it easier for querying. (#380)
- Add `request_tid` and `response_tid` for trace labels. (#379)
- Add no_response_threshold(120s) for No response requests. (#376)
- Add payload for all protocols.(#375)
- Add a new clustering method "blank" that is used to reduce the cardinality of metrics as much as possible. (#372)
- Modify the configuration file structure and add parameter fields for subscription events. (#368)
- Add the missing timestamp of TCP connect data and filter the incorrect one without srcPort.(#405)
- Fix the bug that multiple events cannot be correlated when they are in one ON-CPU data. (#395)
- Add the missed latency field for `cgoEvent` to fix the bug where the `request_sent_time` in `single_net_request_metric_group` is always 0. (#394)
- Fix the bug that http-100 requests are detected as NOSUPPORT. (#393)
- Fix the wrong thread name in the trace profiling function. (#385)
- Remove "reset" method of ScheduledTaskRoutine to fix a potential dead-lock issue. (#369)
- Fix the bug where the pod metadata with persistent IP in the map is deleted incorrectly due to the deleting mechanism with a delay. (#374)
- Fix the bug that when the response is nil, the NAT IP and port are not added to the labels of the "DataGroup". (#378)
- Fix potential deadlock of exited thread delay queue. (#373)
- Fix the bug that cpuEvent cache size continuously increases even if trace profiling is not enabled.(#362)
- Fix the bug that duplicate CPU events are indexed into Elasticsearch. (#359)
- Implement the delay queue for exited thread, so as to avoid losing the data in the period before the thread exits. (#365)
- Fix the bug of incomplete records when threads arrive at the cpu analyzer for the first time. (#364)
- Add request and response payload of `Redis` protocol message to `Span` data. (#325)
- Fix the topology node naming error in the default namespace.(#346)
- Fix the bug that if `ReadBytes` receives negative numbers as arguments, the program panics with a slice out-of-bounds error. (#327)
- When processing Redis' Requests, add additional labels to describe the key information of the message. Check Metrics Document for more details. (#321)
- Fix the bug that the probe crashes when the kernel does not support some kprobes. (#320)
- Optimize the log output. (#299)
- Print logs when subscribing to events. Print a warning message if there is no event the agent subscribes to. (#290)
- Allow the collector to run in a non-Kubernetes environment by setting the option `enable` to `false` under the `k8smetadataprocessor` section. (#285)
- Add a new environment variable: IS_PRINT_EVENT. When the value is true, sinsp events can be printed to the stdout. (#283)
- Declare the 9500 port in the agent's deployment file (#282)
- Avoid printing logs to console when both `observability.logger.file_level` and `observability.logger.console_level` are set to none (#316)
- Fix the userAttributes array out of range error caused by userAttNumber exceeding 8
- Fix the bug where no HTTP headers were obtained. (#301)
- Fix the bug that need_trace_as_span options cannot take effect (#292)
- Fix the loss of connection failure rate data when changing the topology layout in the Grafana plugin. (#289)
- Fix the bug that the external topologies' metric names use the `kindling_entity_request` prefix. Change the prefix of these metrics to `kindling_topology_request` (#287)
- Fix the bug where the table name of SQL is missed if there is no trailing character at the end of the table name. (#284)
- Add an option named `debug_selector` to filter debug_log from different components (#300)
- Add a URL clustering method to reduce the cardinality of the entity metrics. Configuration options are provided to choose which method to use. (#268)
- Display connection failure metrics in the Grafana-plugin (#255)
- Add the metrics that describe how many times the TCP connections have been made (#234 #235 #236 #237)
- Add a histogram aggregator in defaultAggregator (#226)
- (Experimental) Support Protocol Dubbo2 (#184)
- Improve the go project layout (#273)
- Correct the configurations and disable the `dubbo` protocol parser by default since it is still experimental now. (#270)
- Implement self-metrics using opentelemetry for cgoreceiver (#269)
- Use cgo to replace UDS for transferring data from the probe to the collector to improve the performance (#264)
- Add command labels in tcp connect metrics and span attributes (#260)
- Use the tcp_close events to generate the srtt metric (#256)
- Remove the histogram metrics by default to reduce the number of metrics (#253)
- k8sprocessor: use src IP for further searching if the dst IP is a loopback address (#251)
- docs: update developer links (#247)
- Add some self metrics for agent cpu and memory usage (#243)
- Export the trace of MySQL request when it contains an error (#241)
- Block in the application instead of the udsreceiver after running (#240)
- Decouple the logic of dispatching events from receivers (#232)
- Search for k8s metadata using `src_ip` when no containerid found (#233)
- Record the containers with `hostport` mode and fill the pod information of them in k8sprocessor (#219)
- Support building Grafana-plugin by using Actions (#218)
- Improve metrics description doc (#216)
- Update deployment files needed for releasing (#215)
- docs: fix language issues in documents (#258)
- Fix the bug where the pod information is missed after it is restarted (#245)
- Grafana-plugin: delete yarn.lock to remove unnecessary dependencies (#244)
- Fix the bug that the container name is incorrect when multiple containers in the pod don't specify ports by setting it empty. (#238)
- Fix the bug that sometimes the workload kind is `ReplicaSet` (#230)
- Fix "no such file or directory" when using the kubeconfig file. (#225)
- Fix several bugs in the Grafana plugin. (#220)
- Provide a kindling Prometheus exporter that can support integration with Prometheus easily. See kindling's metrics from the kindling website.
- Support network performance, DNS performance, service network maps, and workload performance analysis.
- Support HTTP, MySQL, and REDIS request analysis.
- Provide a Grafana-plugin with four built-in dashboards to support basic analysis features.