
Test Cases for JMX -> Prom Exporter Regexps #14155

Merged
merged 47 commits into from
Oct 23, 2024

Conversation

suddendust
Contributor

@suddendust suddendust commented Oct 3, 2024

Instructions:

This PR adds test cases verifying that all gauges/timers/meters for each Pinot component are exported with the right names and labels. Since most of these metrics are already in use, the tests validate that they keep getting exported exactly as they are today (to maintain backward compatibility).

Pinot currently uses the JMX->Prom exporter to export metrics to Prometheus. Normally, this runs as a Java agent in the JVM. In this case, however, the tests use four different config files (server.yml, broker.yml, controller.yml and minion.yml). Loading the same agent into the same JVM multiple times isn't allowed, so I had to copy the agent's code to some degree and start up the HTTP servers manually.

Edit: The tests detected that the following metrics were not being exported correctly:

  • pinot_server_luceneIndexingDelayMs
  • pinot_server_luceneIndexingDelayDocs
  • pinot_server_realtimeRowsSanitized: This metric is exported right now, but none of the useful labels (table, tableType, topic or partition) are exported, because it is matched by a catch-all rule that encodes very little information.
pinot_server_realtimeRowsSanitized_Count{app="pinot", apps_kubernetes_io_pod_index="0", cluster_name="pinot", component="pinot-server", component_name="server-default-tenant-1", controller_revision_hash="pinot-server-default-tenant-1-197ef0f2-dcfc7c4c4", heritage="StarTreeOperator", instance="10.168.39.97:8080", job="kubernetes-pods", kubernetes_namespace="managed", kubernetes_pod_name="pinot-server-default-tenant-1-197ef0f2-0", profile_id="197ef0f2", statefulset_kubernetes_io_pod_name="pinot-server-default-tenant-1-197ef0f2-0"}

Compare it to realtimeRowsConsumed, which has table, tableType, topic and partition:

pinot_server_realtimeRowsConsumed_Count{app="pinot", apps_kubernetes_io_pod_index="0", cluster_name="pinot", component="pinot-server", component_name="server-default-tenant-1", controller_revision_hash="pinot-server-default-tenant-1-197ef0f2-dcfc7c4c4", heritage="StarTreeOperator", instance="10.168.39.97:8080", job="kubernetes-pods", kubernetes_namespace="managed", kubernetes_pod_name="pinot-server-default-tenant-1-197ef0f2-0", partition="0", profile_id="197ef0f2", statefulset_kubernetes_io_pod_name="pinot-server-default-tenant-1-197ef0f2-0", table="f_express_order_v2", tableType="REALTIME", topic="stg-confluentproto-midas-express-order-fact"}

The new regex exports this metric correctly.

I have added regexps to server.yml to export these metrics.
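For reference, JMX exporter rules are regexes matched against the flattened MBean name, with capture groups mapped onto the metric name and labels. A sketch of what such a rule could look like for realtimeRowsSanitized, assuming the clientId layout tableNameWithType-partition-topic described above (the MBean naming and capture groups here are illustrative assumptions, not the exact contents of server.yml):

```yaml
rules:
# Illustrative rule: extract table, tableType, partition and topic from the
# consumer clientId embedded in the metric name.
- pattern: "\"org.apache.pinot.common.metrics\"<type=\"ServerMetrics\", name=\"pinot.server.(\\w+)_(OFFLINE|REALTIME)-([0-9]+)-(.+)-realtimeRowsSanitized\"><>(\\w+)"
  name: "pinot_server_realtimeRowsSanitized_$5"
  labels:
    table: "$1"
    tableType: "$2"
    partition: "$3"
    topic: "$4"
```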

@codecov-commenter

codecov-commenter commented Oct 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.82%. Comparing base (59551e4) to head (2fcd382).
Report is 1215 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14155      +/-   ##
============================================
+ Coverage     61.75%   63.82%   +2.07%     
- Complexity      207     1536    +1329     
============================================
  Files          2436     2626     +190     
  Lines        133233   144721   +11488     
  Branches      20636    22147    +1511     
============================================
+ Hits          82274    92371   +10097     
- Misses        44911    45537     +626     
- Partials       6048     6813     +765     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.77% <ø> (+2.06%) ⬆️
java-21 63.72% <ø> (+2.09%) ⬆️
skip-bytebuffers-false 63.81% <ø> (+2.06%) ⬆️
skip-bytebuffers-true 63.68% <ø> (+35.95%) ⬆️
temurin 63.82% <ø> (+2.07%) ⬆️
unittests 63.82% <ø> (+2.07%) ⬆️
unittests1 55.59% <ø> (+8.70%) ⬆️
unittests2 34.29% <ø> (+6.56%) ⬆️

Flags with carried forward coverage won't be shown.


Contributor

@Jackie-Jiang Jackie-Jiang left a comment


Thanks for adding the test!

Right now all tests are hard-coded and it won't be able to capture newly added metrics automatically. Instead, can we loop over the enums to ensure all the metrics are tested, which is also future proof?

@suddendust
Contributor Author

suddendust commented Oct 3, 2024

can we loop over the enums to ensure all the metrics are tested, which is also future proof?

Yes, that's the ideal way to do it. However, exported metric names are not standardized, so it's not possible to derive them from the enum names (we should standardize them going forward). Further, metrics accept different kinds of arguments for labelling. For example, some accept rawTableName, some accept tableNameWithType and some accept clientId (tableNameWithType-partition-topic). Determining these would have to be done on a case-by-case basis.
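To make the three labelling variants concrete, here is a minimal, self-contained sketch. The clientId layout (tableNameWithType-partition-topic) comes from the description above; the class and helper names are hypothetical, not Pinot code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the three labelling styles described above.
public class MetricLabelStyles {

  // Some metrics are keyed by rawTableName only.
  static Map<String, String> fromRawTableName(String rawTableName) {
    return Map.of("table", rawTableName);
  }

  // Some are keyed by tableNameWithType, e.g. "myTable_REALTIME".
  static Map<String, String> fromTableNameWithType(String tableNameWithType) {
    int idx = tableNameWithType.lastIndexOf('_');
    return Map.of("table", tableNameWithType.substring(0, idx),
                  "tableType", tableNameWithType.substring(idx + 1));
  }

  // Some are keyed by clientId: "<tableNameWithType>-<partition>-<topic>".
  // Split with limit 3 so a topic containing '-' stays intact.
  static Map<String, String> fromClientId(String clientId) {
    String[] parts = clientId.split("-", 3);
    Map<String, String> labels = new HashMap<>(fromTableNameWithType(parts[0]));
    labels.put("partition", parts[1]);
    labels.put("topic", parts[2]);
    return labels;
  }

  public static void main(String[] args) {
    // {table=myTable, tableType=REALTIME, partition=3, topic=myTopic} (map order may vary)
    System.out.println(fromClientId("myTable_REALTIME-3-myTopic"));
  }
}
```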

@suddendust
Contributor Author

suddendust commented Oct 3, 2024

it won't be able to capture newly added metrics automatically

For this, I have added a check in each test case that asserts on the count of exported metrics. Any newly added metric would cause this check to fail. It's not foolproof, but it does provide a basic safety net. Perhaps I can strengthen it further with a check that, for each enum value, there is an exported metric containing the enum string in some form.
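A self-contained sketch of that strengthened check; the enum values and exported names here are made up for illustration, and the real test would iterate ServerMeter etc. instead:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: report enum values with no matching exported metric.
public class EnumCoverageCheck {

  // Stand-in for an enum like ServerMeter.
  enum Meter { REALTIME_ROWS_CONSUMED, REALTIME_ROWS_SANITIZED }

  // Compare case-insensitively with underscores stripped, since exported
  // names use camelCase while enum constants use UPPER_SNAKE_CASE.
  static List<Meter> missing(List<String> exportedNames) {
    return Arrays.stream(Meter.values())
        .filter(m -> exportedNames.stream()
            .noneMatch(n -> n.toLowerCase(Locale.ROOT)
                .contains(m.name().replace("_", "").toLowerCase(Locale.ROOT))))
        .toList();
  }

  public static void main(String[] args) {
    List<String> exported = List.of("pinot_server_realtimeRowsConsumed_Count");
    // REALTIME_ROWS_SANITIZED has no matching exported metric
    System.out.println(missing(exported)); // prints "[REALTIME_ROWS_SANITIZED]"
  }
}
```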

@gortiz
Contributor

gortiz commented Oct 7, 2024

Where is the agent being added?

Contributor

@soumitra-st soumitra-st left a comment


LGTM

Comment on lines 103 to 128
public void setupTest()
    throws Exception {
  // read test configuration
  JsonNode jsonNode = JsonUtils.DEFAULT_READER.readTree(loadResourceAsString("metrics/testConfig.json"));
  _exporterConfigsParentDir = jsonNode.get(CONFIG_KEY_JMX_EXPORTER_PARENT_DIR).textValue();
  _pinotComponentToConfigFileMap.put(PinotComponent.CONTROLLER,
      jsonNode.get(CONFIG_KEY_CONTROLLER_CONFIG_FILE_NAME).textValue());
  _pinotComponentToConfigFileMap.put(PinotComponent.SERVER,
      jsonNode.get(CONFIG_KEY_SERVER_CONFIG_FILE_NAME).textValue());
  _pinotComponentToConfigFileMap.put(PinotComponent.BROKER,
      jsonNode.get(CONFIG_KEY_BROKER_CONFIG_FILE_NAME).textValue());
  _pinotComponentToConfigFileMap.put(PinotComponent.MINION,
      jsonNode.get(CONFIG_KEY_MINION_CONFIG_FILE_NAME).textValue());

  String pinotMetricsFactory = jsonNode.get(CONFIG_KEY_PINOT_METRICS_FACTORY).textValue();
  switch (pinotMetricsFactory) {
    case "YammerMetricsFactory":
      _pinotMetricsFactory = new YammerMetricsFactory();
      break;
    case "DropwizardMetricsFactory":
      _pinotMetricsFactory = new DropwizardMetricsFactory();
      break;
    default:
      throw new IllegalArgumentException("Unknown metrics factory specified in test config: " + pinotMetricsFactory
          + ", supported ones are: YammerMetricsFactory and DropwizardMetricsFactory");
  }
Contributor


I don't understand this code. Assuming in the future we test both Yammer and Dropwizard... how do you expect we should write the tests? It looks like it is reading some config from testConfig.json, but it is not possible to read two different files, so... why is this better than forcing Yammer in the code?

Contributor


Just to be clear. Due to the limitations of TestNG, what I would expect is to have 1 test class per component (broker, controller, etc) and per metric registry (Yammer, Dropwizard, etc).

In order to have so, I would expect PinotPrometheusMetricsTest to be abstract and have two abstract methods:

  1. PinotComponent getComponent(), as explained in another comment.
  2. PinotMetricsFactory getPinotMetricsFactory(), that returns the registry to be used.

Then for each component we may have another abstract class (like BrokerPrometheusMetricsTest) that inherits PinotPrometheusMetricsTest, implements getComponent() and includes all test methods; and finally two non-abstract classes, YammerBrokerPrometheusMetricsTest and DropwizardBrokerPrometheusMetricsTest, which extend BrokerPrometheusMetricsTest and only implement getPinotMetricsFactory().

Something like:

classDiagram
  class PinotPrometheusMetricsTest {
  }
  class BrokerPrometheusMetricsTest {
    getComponent
  }
  class DropwizardBrokerPrometheusMetricsTest {
    getPinotMetricsFactory
  }
  class YammerBrokerPrometheusMetricsTest {
    getPinotMetricsFactory
  }
  class ServerPrometheusMetricsTest {
    getComponent
  }
  class DropwizardServerPrometheusMetricsTest {
    getPinotMetricsFactory
  }
  class YammerServerPrometheusMetricsTest {
    getPinotMetricsFactory
  }

  PinotPrometheusMetricsTest <|-- BrokerPrometheusMetricsTest
  BrokerPrometheusMetricsTest <|-- DropwizardBrokerPrometheusMetricsTest
  BrokerPrometheusMetricsTest <|-- YammerBrokerPrometheusMetricsTest

  PinotPrometheusMetricsTest <|-- ServerPrometheusMetricsTest
  ServerPrometheusMetricsTest <|-- DropwizardServerPrometheusMetricsTest
  ServerPrometheusMetricsTest <|-- YammerServerPrometheusMetricsTest
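The proposed hierarchy could be sketched in Java as follows. The class names follow the diagram; the return types and bodies are illustrative stubs (strings standing in for the real PinotComponent and PinotMetricsFactory types):

```java
// Illustrative skeleton of the proposed test hierarchy.
abstract class PinotPrometheusMetricsTest {
  abstract String getComponent();
  abstract String getPinotMetricsFactory();
}

abstract class BrokerPrometheusMetricsTest extends PinotPrometheusMetricsTest {
  @Override String getComponent() { return "broker"; }
  // ...all broker test methods live here...
}

class YammerBrokerPrometheusMetricsTest extends BrokerPrometheusMetricsTest {
  @Override String getPinotMetricsFactory() { return "Yammer"; }
}

class DropwizardBrokerPrometheusMetricsTest extends BrokerPrometheusMetricsTest {
  @Override String getPinotMetricsFactory() { return "Dropwizard"; }
}

public class TestHierarchySketch {
  public static void main(String[] args) {
    // Each concrete class combines one component with one metrics factory.
    System.out.println(new YammerBrokerPrometheusMetricsTest().getComponent()); // prints "broker"
  }
}
```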

Contributor Author

@suddendust suddendust Oct 22, 2024


Thanks for the detailed comment, have restructured the code as discussed.

@@ -107,6 +107,10 @@
</plugins>
</build>
<dependencies>
<dependency>
Collaborator


Should the dependencies added here be in test scope ?

Contributor Author


It's already in test scope in the parent pom:

     <!-- JMX exporter-->
      <dependency>
        <groupId>io.prometheus.jmx</groupId>
        <artifactId>jmx_prometheus_javaagent</artifactId>
        <version>0.19.0</version>
        <scope>test</scope>
      </dependency>


private String _exporterConfigsParentDir;

private final Map<PinotComponent, String> _pinotComponentToConfigFileMap = new HashMap<>();
Contributor


This attribute is only used to do _pinotComponentToConfigFileMap.get(pinotComponent), where pinotComponent is defined by getPinotComponent(), which is an abstract method of this class. Instead I suggest to create an abstract getPrometheusConfigFile() we can implement on each test. For example, if yammer and dropwizard end up having different config files (which seems to be the solution for the future), we could override the method on each test class to return the specific prometheus file for each one.

Contributor Author


This is addressed, thanks!

Contributor

@gortiz gortiz left a comment


Apart from the (non blocker) comment I've just added, LGTM

//all exported server metrics have this prefix
private static final String EXPORTED_METRIC_PREFIX = "pinot_server_";

private static final List<ServerMeter> METERS_ACCEPTING_CLIENT_ID =
Collaborator


This is not DRY. If a new metric is added, then it has to be added to one of these lists? I am sure that will be missed.
Couple of options:

  • Add this metadata to ServerMeter etc
  • Check that every metric defined is in one of the lists.

Contributor Author


Any new metric that'll be added will have tableNameWithType supplied to it. Here's the relevant piece of code for ServerMeter:

    } else {
      if (METERS_ACCEPTING_CLIENT_ID.contains(serverMeter)) {
        addMeterWithLabels(serverMeter, CLIENT_ID);
        assertMeterExportedCorrectly(serverMeter.getMeterName(),
            ExportedLabels.PARTITION_TABLENAME_TABLETYPE_KAFKATOPIC);
      } else if (METERS_ACCEPTING_RAW_TABLE_NAMES.contains(serverMeter)) {
        addMeterWithLabels(serverMeter, ExportedLabelValues.TABLENAME);
        assertMeterExportedCorrectly(serverMeter.getMeterName(), ExportedLabels.TABLENAME);
      } else {
        //we pass tableNameWithType to all remaining meters
        addMeterWithLabels(serverMeter, TABLE_NAME_WITH_TYPE);
        assertMeterExportedCorrectly(serverMeter.getMeterName(), ExportedLabels.TABLENAME_TABLETYPE);
      }

We explicitly have to create lists like METERS_ACCEPTING_CLIENT_ID and METERS_ACCEPTING_RAW_TABLE_NAMES because this is how these metrics are used in the code right now. Ideally, we should be using tableNameWithType everywhere instead of rawTableName. However, to maintain backward compatibility with metrics as they are currently used, I have to filter them explicitly.

Now suppose a new metric is added for which the user expects partition as one of the exported labels. If they do not add an explicit check for it in the test, it will fall into the else block (with ExportedLabels.TABLENAME_TABLETYPE) and the test will pass, ensuring that at least the table and tableType labels are present. We'll need to document that if you want some special label, you must add an explicit test case for it; otherwise you'll only get table and tableType.



/**
* Disabling tests as Pinot currently uses Yammer and these tests fail for {@link DropwizardMetricsFactory}
Collaborator


Remove DropWizard tests instead ?

Contributor Author


Since we have two metric registries right now, we need tests for both. The only reason they're disabled is that they fail for Dropwizard. I have kept them in to make it explicit that we want these tests, but they are disabled because the Dropwizard code path seems broken.

@Jackie-Jiang Jackie-Jiang merged commit 09e8812 into apache:master Oct 23, 2024
22 of 23 checks passed
6 participants