
[FEATURE] Deduplicate top queries records when exporting #199

Closed
ansjcy opened this issue Jan 23, 2025 · 1 comment
Labels
enhancement New feature or request

ansjcy (Member) commented Jan 23, 2025

Is your feature request related to a problem?

Currently, the exporter writes the top N query records for every metric dimension (latency, CPU, memory) to the same local index. A query that ranks in the top N for more than one metric is therefore stored as multiple identical records in that index.

To reproduce:

  • Initialize the settings with all metrics enabled and enable the exporter.
  • Run a search query against a single index:
curl -X GET "localhost:9200/my-index-0/_search?size=20&pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "user.id": "cyji"
    }
  }
}'
  • Search the top queries local index. The same record has been exported three times, once per enabled metric (all three copies share the same id and timestamp):
{
  "took" : 8,
  "phase_took" : {
    "dfs_pre_query" : 0,
    "query" : 6,
    "fetch" : 0,
    "dfs_query" : 0,
    "expand" : 0,
    "can_match" : 0
  },
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "top_queries-2025.01.23-62113",
        "_id" : "fNgCkpQBb1IjED2vvaem",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1737616431131,
          "id" : "1882ffd4-ee6e-47c3-9c19-08eb1869c93f",
          "indices" : [
            "my-index-0"
          ],
          "group_by" : "NONE",
          "task_resource_usages" : [
            {
              "action" : "indices:data/read/search[phase/query]",
              "taskId" : 169,
              "parentTaskId" : 168,
              "nodeId" : "G4cCLiIKSRKooRJzPtN4pw",
              "taskResourceUsage" : {
                "cpu_time_in_nanos" : 19581000,
                "memory_in_bytes" : 1907128
              }
            },
            {
              "action" : "indices:data/read/search",
              "taskId" : 168,
              "parentTaskId" : -1,
              "nodeId" : "G4cCLiIKSRKooRJzPtN4pw",
              "taskResourceUsage" : {
                "cpu_time_in_nanos" : 2750000,
                "memory_in_bytes" : 220016
              }
            }
          ],
          "labels" : { },
          "phase_latency_map" : {
            "expand" : 0,
            "query" : 30,
            "fetch" : 1
          },
          "search_type" : "query_then_fetch",
          "source" : {
            "size" : 20,
            "query" : {
              "term" : {
                "user.id" : {
                  "value" : "cyji",
                  "boost" : 1.0
                }
              }
            }
          },
          "total_shards" : 1,
          "measurements" : {
            "memory" : {
              "number" : 2127144,
              "count" : 1,
              "aggregationType" : "NONE"
            },
            "latency" : {
              "number" : 42,
              "count" : 1,
              "aggregationType" : "NONE"
            },
            "cpu" : {
              "number" : 22331000,
              "count" : 1,
              "aggregationType" : "NONE"
            }
          }
        }
      },
      {
        "_index" : "top_queries-2025.01.23-62113",
        "_id" : "fdgCkpQBb1IjED2vvaem",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1737616431131,
          "id" : "1882ffd4-ee6e-47c3-9c19-08eb1869c93f",
          "indices" : [
            "my-index-0"
          ],
          "group_by" : "NONE",
          "task_resource_usages" : [
            {
              "action" : "indices:data/read/search[phase/query]",
              "taskId" : 169,
              "parentTaskId" : 168,
              "nodeId" : "G4cCLiIKSRKooRJzPtN4pw",
              "taskResourceUsage" : {
                "cpu_time_in_nanos" : 19581000,
                "memory_in_bytes" : 1907128
              }
            },
            {
              "action" : "indices:data/read/search",
              "taskId" : 168,
              "parentTaskId" : -1,
              "nodeId" : "G4cCLiIKSRKooRJzPtN4pw",
              "taskResourceUsage" : {
                "cpu_time_in_nanos" : 2750000,
                "memory_in_bytes" : 220016
              }
            }
          ],
          "labels" : { },
          "phase_latency_map" : {
            "expand" : 0,
            "query" : 30,
            "fetch" : 1
          },
          "search_type" : "query_then_fetch",
          "source" : {
            "size" : 20,
            "query" : {
              "term" : {
                "user.id" : {
                  "value" : "cyji",
                  "boost" : 1.0
                }
              }
            }
          },
          "total_shards" : 1,
          "measurements" : {
            "memory" : {
              "number" : 2127144,
              "count" : 1,
              "aggregationType" : "NONE"
            },
            "latency" : {
              "number" : 42,
              "count" : 1,
              "aggregationType" : "NONE"
            },
            "cpu" : {
              "number" : 22331000,
              "count" : 1,
              "aggregationType" : "NONE"
            }
          }
        }
      },
      {
        "_index" : "top_queries-2025.01.23-62113",
        "_id" : "ftgCkpQBb1IjED2vvqc8",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1737616431131,
          "id" : "1882ffd4-ee6e-47c3-9c19-08eb1869c93f",
          "indices" : [
            "my-index-0"
          ],
          "group_by" : "NONE",
          "task_resource_usages" : [
            {
              "action" : "indices:data/read/search[phase/query]",
              "taskId" : 169,
              "parentTaskId" : 168,
              "nodeId" : "G4cCLiIKSRKooRJzPtN4pw",
              "taskResourceUsage" : {
                "cpu_time_in_nanos" : 19581000,
                "memory_in_bytes" : 1907128
              }
            },
            {
              "action" : "indices:data/read/search",
              "taskId" : 168,
              "parentTaskId" : -1,
              "nodeId" : "G4cCLiIKSRKooRJzPtN4pw",
              "taskResourceUsage" : {
                "cpu_time_in_nanos" : 2750000,
                "memory_in_bytes" : 220016
              }
            }
          ],
          "labels" : { },
          "phase_latency_map" : {
            "expand" : 0,
            "query" : 30,
            "fetch" : 1
          },
          "search_type" : "query_then_fetch",
          "source" : {
            "size" : 20,
            "query" : {
              "term" : {
                "user.id" : {
                  "value" : "cyji",
                  "boost" : 1.0
                }
              }
            }
          },
          "total_shards" : 1,
          "measurements" : {
            "memory" : {
              "number" : 2127144,
              "count" : 1,
              "aggregationType" : "NONE"
            },
            "latency" : {
              "number" : 42,
              "count" : 1,
              "aggregationType" : "NONE"
            },
            "cpu" : {
              "number" : 22331000,
              "count" : 1,
              "aggregationType" : "NONE"
            }
          }
        }
      }
    ]
  }
}
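A quick way to check a response like the one above for duplicates is to count hits per `_source.id`. This is a minimal Python sketch, assuming the search response has already been parsed into a dict; the function name `find_duplicate_records` is hypothetical and not part of the plugin.

```python
from collections import Counter


def find_duplicate_records(search_response: dict) -> dict:
    """Return {record id: occurrence count} for records stored more than once.

    Assumes the response shape shown above: documents under hits.hits,
    each carrying the top N record's identifier at _source.id.
    """
    hits = search_response.get("hits", {}).get("hits", [])
    counts = Counter(hit["_source"]["id"] for hit in hits)
    # Keep only ids that appear in more than one indexed document.
    return {record_id: n for record_id, n in counts.items() if n > 1}


# Trimmed-down copy of the response above: three documents, one record id.
response = {
    "hits": {
        "hits": [
            {"_id": "fNgCkpQBb1IjED2vvaem", "_source": {"id": "1882ffd4-ee6e-47c3-9c19-08eb1869c93f"}},
            {"_id": "fdgCkpQBb1IjED2vvaem", "_source": {"id": "1882ffd4-ee6e-47c3-9c19-08eb1869c93f"}},
            {"_id": "ftgCkpQBb1IjED2vvqc8", "_source": {"id": "1882ffd4-ee6e-47c3-9c19-08eb1869c93f"}},
        ]
    }
}
print(find_duplicate_records(response))
# {'1882ffd4-ee6e-47c3-9c19-08eb1869c93f': 3}
```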

What solution would you like?

Ideally, each record should be stored only once in the top N queries local index. Otherwise it is unclear whether identical entries are the same query collected under different metrics, or two distinct queries that happened to run at the same time.
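One way to get there is to collapse records that share an `id` before they are written, merging their per-metric measurements. The sketch below is a hypothetical Python illustration of that idea, not the fix that landed in the plugin; it assumes each record is a dict shaped like the `_source` documents above, and the function `deduplicate_records` and the sample per-metric lists are made up for the example.

```python
from __future__ import annotations

from typing import Iterable


def deduplicate_records(records: Iterable[dict]) -> list[dict]:
    """Collapse records that share an id into one, keeping first-seen order.

    Hypothetical sketch: each record has an "id" key and a "measurements"
    map keyed by metric name (latency, cpu, memory), as in the output above.
    """
    merged: dict[str, dict] = {}
    for record in records:
        record_id = record["id"]
        if record_id not in merged:
            # First time this query is seen: copy the record and its
            # measurements map so the caller's input is not mutated.
            merged[record_id] = {**record, "measurements": dict(record.get("measurements", {}))}
        else:
            # Same query collected by another metric dimension: fold in any
            # measurements the kept copy does not have yet.
            kept = merged[record_id]["measurements"]
            for metric, value in record.get("measurements", {}).items():
                kept.setdefault(metric, value)
    return list(merged.values())


# Hypothetical top N lists, one per metric dimension; "q1" appears in both.
latency_top_n = [{"id": "q1", "measurements": {"latency": {"number": 42}}}]
cpu_top_n = [
    {"id": "q1", "measurements": {"cpu": {"number": 22331000}}},
    {"id": "q2", "measurements": {"cpu": {"number": 5}}},
]
deduped = deduplicate_records(latency_top_n + cpu_top_n)
print([r["id"] for r in deduped])          # ['q1', 'q2']
print(sorted(deduped[0]["measurements"]))  # ['cpu', 'latency']
```

Keying on `id` assumes one id per query execution, which the sample output suggests (all three copies share id `1882ffd4-ee6e-47c3-9c19-08eb1869c93f` and the same timestamp). An alternative is to use that id as the document `_id` when indexing, so writing the same record again replaces the existing document instead of adding a copy.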


ansjcy added the enhancement and untriaged labels Jan 23, 2025

ansjcy (Member, Author) commented Jan 28, 2025

This is fixed in #210

ansjcy closed this as completed Jan 28, 2025
ansjcy removed the untriaged label Jan 28, 2025