New metric to expose the number of content groups #28983

Open
pfrybar opened this issue Oct 17, 2023 · 4 comments

Comments

@pfrybar

pfrybar commented Oct 17, 2023

Is your feature request related to a problem? Please describe.
When creating monitoring dashboards, it's difficult to tell from metric data alone how many documents exist when using a grouped distribution. The current metrics give the total document count across the entire cluster, but there is no way to find the "correct" (average) number of documents within a content group.

Describe the solution you'd like
Expose a metric for the number of content groups currently in use.

Describe alternatives you've considered
We have graphs of total document count which show a rough picture, but they get noisy when the number of content groups changes. See the screenshot below as an example:

[Image: Screenshot 2023-10-17 at 14 20 26]
@yngveaasheim yngveaasheim self-assigned this Oct 18, 2023
@yngveaasheim
Contributor

There are two ways you can get a better view of the number of documents in your setup:

  1. Split or filter on the "groupId" tag/dimension to aggregate the number of documents per content group, using the "searchnode.content.proton.documentdb.documents.ready" metric or similar. Note that you will also need to take "searchable-copies" into account here, or "redundancy" for some of the related metrics.
  2. Aggregate on the "distributor.vds.distributor.docsstored" metric instead, to get the number of unique documents per content cluster.
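
As a rough illustration of option 1, the per-group aggregation could look like the sketch below. It assumes each node-level sample arrives as a (dimensions, value) pair carrying a "groupId" dimension; the helper name, the sample values, and the `copies_per_group` parameter are hypothetical, and how that parameter follows from "searchable-copies" or "redundancy" depends on your configuration.

```python
from collections import defaultdict


def documents_per_group(samples, copies_per_group=1):
    """Sum a per-node document metric by its groupId dimension.

    samples: iterable of (dimensions, value) pairs, one per content node.
    copies_per_group: assumed number of copies of each document held
    within one group (to be derived from searchable-copies or redundancy).
    """
    totals = defaultdict(float)
    for dimensions, value in samples:
        totals[dimensions["groupId"]] += value
    # Divide out the copies so each group reports unique documents.
    return {group: total / copies_per_group for group, total in totals.items()}


# Made-up samples: two groups with two nodes each.
samples = [
    ({"groupId": "0"}, 100.0),
    ({"groupId": "0"}, 120.0),
    ({"groupId": "1"}, 110.0),
    ({"groupId": "1"}, 110.0),
]
per_group = documents_per_group(samples)
```

The number of distinct keys in the result also gives the number of content groups that are currently reporting.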

Please let me know if this helps you accomplish what you want.

Best, -Yngve

@pfrybar
Author

pfrybar commented Oct 18, 2023

Thanks for the quick response. I wasn't able to find a "groupId" dimension on any of the metrics, whether using the aggregated v2 metrics, the node-level v1 metrics, or the Prometheus endpoint.

I could try to do some aggregations on "distributor.vds.distributor.docsstored", but since we are using multiple content groups and I can't find a dimension to group by, I'm not sure how to do it in a generic way.

For example, here is /state/v1/metrics on a content node:

      {
        "name": "content.proton.documentdb.documents.total",
        "description": "The total number of documents in this documents db (ready + not-ready)",
        "values": {
          "average": 9743208.0,
          "sum": 116918496.0,
          "count": 12,
          "rate": 0.2,
          "min": 9743208,
          "max": 9743208,
          "last": 9743208
        },
        "dimensions": {
          "documenttype": "mydocument"
        }
      },
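
A check like the one described above could be scripted roughly as follows. This is only a sketch: it works on a list holding a trimmed copy of the entry shown, the helper name is hypothetical, and the layout of the full /state/v1/metrics response around such entries is not assumed here.

```python
import json

# Trimmed copy of the entry from the excerpt, wrapped in a list for illustration.
entries = json.loads("""
[
  {
    "name": "content.proton.documentdb.documents.total",
    "values": {"average": 9743208.0, "sum": 116918496.0, "count": 12, "last": 9743208},
    "dimensions": {"documenttype": "mydocument"}
  }
]
""")


def find_metric(entries, name):
    """Return all entries with the given metric name (hypothetical helper)."""
    return [entry for entry in entries if entry["name"] == name]


matches = find_metric(entries, "content.proton.documentdb.documents.total")

# True only if at least one matching entry carries a groupId dimension.
has_group_id = any("groupId" in entry["dimensions"] for entry in matches)

# "sum" adds up the 12 samples in the snapshot, so "last" is the current count.
last_total = matches[0]["values"]["last"]
```

For this entry the only dimension is "documenttype", so there is nothing to group by.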

@yngveaasheim
Contributor

Do not split on the groupId dimension for the distributor metric; sum over all groups instead. If you have multiple content clusters, you will need to aggregate per cluster using the clusterid dimension.

Unfortunately, it seems metrics are currently decorated with those dimensions only in Vespa Cloud.
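
Where the clusterid dimension is available, the distributor aggregation could be sketched like this. The sample shape, the helper name, and all values are made up for illustration; the only idea taken from the discussion is to ignore groupId and sum per clusterid.

```python
from collections import defaultdict


def unique_documents_per_cluster(samples):
    """Sum a distributor document-count metric per clusterid.

    samples: iterable of (dimensions, value) pairs, one per distributor.
    Any groupId dimension is deliberately ignored: values are summed
    across all groups within a cluster.
    """
    totals = defaultdict(float)
    for dimensions, value in samples:
        totals[dimensions["clusterid"]] += value
    return dict(totals)


# Made-up samples: cluster "music" spans two groups, cluster "books" one.
samples = [
    ({"clusterid": "music", "groupId": "0"}, 300.0),
    ({"clusterid": "music", "groupId": "1"}, 200.0),
    ({"clusterid": "books", "groupId": "0"}, 50.0),
]
per_cluster = unique_documents_per_cluster(samples)
```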

@yngveaasheim
Contributor

I will clear the assignee so this can be discussed during our upcoming ticket scrub.

@yngveaasheim yngveaasheim removed their assignment Oct 18, 2023
@kkraune kkraune added this to the soon milestone Oct 18, 2023