Improve metrics building performance, limit endpoints to whitelist #1616

Merged: timvisee merged 9 commits into dev from improve-metrics on Mar 29, 2023

Conversation

timvisee
Member
@timvisee timvisee commented Mar 28, 2023

Improves #1541 with suggestions from #1541 (comment).

This improves on the existing metrics system: the metrics output is now collected and formatted more efficiently.

Endpoint metrics are now restricted to a whitelist, so only the most significant endpoints are reported. The whitelist is a selection of search, recommend and upsert endpoints:

const REST_ENDPOINT_WHITELIST: &[&str] = &[
    "/collections/{name}/index",
    "/collections/{name}/points",
    "/collections/{name}/points/payload",
    "/collections/{name}/points/recommend",
    "/collections/{name}/points/recommend/batch",
    "/collections/{name}/points/search",
    "/collections/{name}/points/search/batch",
];
const GRPC_ENDPOINT_WHITELIST: &[&str] = &[
    "/qdrant.Points/OverwritePayload",
    "/qdrant.Points/Recommend",
    "/qdrant.Points/RecommendBatch",
    "/qdrant.Points/Search",
    "/qdrant.Points/SearchBatch",
    "/qdrant.Points/SetPayload",
    "/qdrant.Points/Upsert",
];
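
A later commit in this PR notes that the whitelists must be kept sorted. One plausible reason is that membership is checked with a binary search; that is an assumption, not something the description confirms. A minimal sketch of such a lookup, using a shortened whitelist and a hypothetical helper named `is_whitelisted`:

```rust
/// Shortened copy of the REST whitelist, for illustration only.
/// Entries must stay in ascending order for the binary search to work.
const REST_ENDPOINT_WHITELIST: &[&str] = &[
    "/collections/{name}/index",
    "/collections/{name}/points",
    "/collections/{name}/points/search",
];

/// Hypothetical helper: returns true if `endpoint` is present in the
/// sorted `whitelist`. The result is unreliable if the list is unsorted.
fn is_whitelisted(whitelist: &[&str], endpoint: &str) -> bool {
    whitelist.binary_search(&endpoint).is_ok()
}

fn main() {
    // A whitelisted endpoint is reported, anything else is skipped.
    assert!(is_whitelisted(REST_ENDPOINT_WHITELIST, "/collections/{name}/points"));
    assert!(!is_whitelisted(REST_ENDPOINT_WHITELIST, "/collections/{name}"));
}
```

With a list this short a linear scan would work just as well; the binary search only matters in that it makes the sorted order a hard requirement.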

This also:

  • limits timing output to HTTP 200
  • sets Content-Type header for /metrics
  • fixes incorrect value for cluster commit metric
  • adds a basic OpenAPI test for /metrics
Snippet of output with whitelisting:
# HELP app_info information about qdrant server
# TYPE app_info counter
app_info{name="qdrant",version="0.11.1"} 1
# HELP collections_total number of collections
# TYPE collections_total gauge
collections_total 4
# HELP cluster_enabled is cluster support enabled
# TYPE cluster_enabled counter
cluster_enabled 0
# HELP rest_responses_total total number of responses
# TYPE rest_responses_total counter
rest_responses_total{method="PUT",endpoint="/collections/{name}/points",status="200"} 5
rest_responses_total{method="POST",endpoint="/collections/{name}/points/search/batch",status="200"} 5
rest_responses_total{method="POST",endpoint="/collections/{name}/points/search",status="200"} 10
rest_responses_total{method="PUT",endpoint="/collections/{name}/index",status="200"} 15
rest_responses_total{method="POST",endpoint="/collections/{name}/points",status="200"} 5
# HELP rest_responses_fail_total total number of failed responses
# TYPE rest_responses_fail_total counter
rest_responses_fail_total{method="PUT",endpoint="/collections/{name}/points",status="200"} 0
rest_responses_fail_total{method="POST",endpoint="/collections/{name}/points/search/batch",status="200"} 0
rest_responses_fail_total{method="POST",endpoint="/collections/{name}/points/search",status="200"} 0
rest_responses_fail_total{method="PUT",endpoint="/collections/{name}/index",status="200"} 0
rest_responses_fail_total{method="POST",endpoint="/collections/{name}/points",status="200"} 0
# HELP rest_responses_avg_duration_seconds average response duration
# TYPE rest_responses_avg_duration_seconds gauge
rest_responses_avg_duration_seconds{method="PUT",endpoint="/collections/{name}/points",status="200"} 0.008645599609375
rest_responses_avg_duration_seconds{method="POST",endpoint="/collections/{name}/points/search/batch",status="200"} 0.0003522000122070312
rest_responses_avg_duration_seconds{method="POST",endpoint="/collections/{name}/points/search",status="200"} 0.0004141000061035156
rest_responses_avg_duration_seconds{method="PUT",endpoint="/collections/{name}/index",status="200"} 0.0002236666717529297
rest_responses_avg_duration_seconds{method="POST",endpoint="/collections/{name}/points",status="200"} 0.000155
# HELP rest_responses_min_duration_seconds minimum response duration
# TYPE rest_responses_min_duration_seconds gauge
rest_responses_min_duration_seconds{method="PUT",endpoint="/collections/{name}/points",status="200"} 0.006217
rest_responses_min_duration_seconds{method="POST",endpoint="/collections/{name}/points/search/batch",status="200"} 0.000283
rest_responses_min_duration_seconds{method="POST",endpoint="/collections/{name}/points/search",status="200"} 0.000337
rest_responses_min_duration_seconds{method="PUT",endpoint="/collections/{name}/index",status="200"} 0.000145
rest_responses_min_duration_seconds{method="POST",endpoint="/collections/{name}/points",status="200"} 0.000142
# HELP rest_responses_max_duration_seconds maximum response duration
# TYPE rest_responses_max_duration_seconds gauge
rest_responses_max_duration_seconds{method="PUT",endpoint="/collections/{name}/points",status="200"} 0.012047
rest_responses_max_duration_seconds{method="POST",endpoint="/collections/{name}/points/search/batch",status="200"} 0.000429
rest_responses_max_duration_seconds{method="POST",endpoint="/collections/{name}/points/search",status="200"} 0.000577
rest_responses_max_duration_seconds{method="PUT",endpoint="/collections/{name}/index",status="200"} 0.00032
rest_responses_max_duration_seconds{method="POST",endpoint="/collections/{name}/points",status="200"} 0.00017
# HELP grpc_responses_total total number of responses
# TYPE grpc_responses_total counter
grpc_responses_total{endpoint="/qdrant.Points/Recommend"} 8
grpc_responses_total{endpoint="/qdrant.Points/Upsert"} 8
grpc_responses_total{endpoint="/qdrant.Points/Search"} 32
# HELP grpc_responses_fail_total total number of failed responses
# TYPE grpc_responses_fail_total counter
grpc_responses_fail_total{endpoint="/qdrant.Points/Recommend"} 0
grpc_responses_fail_total{endpoint="/qdrant.Points/Upsert"} 0
grpc_responses_fail_total{endpoint="/qdrant.Points/Search"} 0
# HELP grpc_responses_avg_duration_seconds average response duration
# TYPE grpc_responses_avg_duration_seconds gauge
grpc_responses_avg_duration_seconds{endpoint="/qdrant.Points/Recommend"} 0.0001723572998046875
grpc_responses_avg_duration_seconds{endpoint="/qdrant.Points/Upsert"} 0.001592212890625
grpc_responses_avg_duration_seconds{endpoint="/qdrant.Points/Search"} 0.0005532779541015625
# HELP grpc_responses_min_duration_seconds minimum response duration
# TYPE grpc_responses_min_duration_seconds gauge
grpc_responses_min_duration_seconds{endpoint="/qdrant.Points/Recommend"} 0.000151
grpc_responses_min_duration_seconds{endpoint="/qdrant.Points/Upsert"} 0.001379
grpc_responses_min_duration_seconds{endpoint="/qdrant.Points/Search"} 0.000419
# HELP grpc_responses_max_duration_seconds maximum response duration
# TYPE grpc_responses_max_duration_seconds gauge
grpc_responses_max_duration_seconds{endpoint="/qdrant.Points/Recommend"} 0.000292
grpc_responses_max_duration_seconds{endpoint="/qdrant.Points/Upsert"} 0.001893
grpc_responses_max_duration_seconds{endpoint="/qdrant.Points/Search"} 0.000765
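
The commit list says the metrics logic was rewritten to write values directly instead of going through a registry. The sketch below shows one way output in the shape above could be produced by appending lines to a `String`. The `EndpointStats` struct and all function names are hypothetical and not taken from the PR. It also assumes that response counts are reported for every status while timings are reported only for HTTP 200, which is one reading of "limits timing output to HTTP 200".

```rust
use std::fmt::Write;

/// Hypothetical per-endpoint statistics; the names are illustrative only.
struct EndpointStats {
    method: &'static str,
    endpoint: &'static str,
    status: u16,
    count: u64,
    avg_duration_seconds: f64,
}

/// Appends the `# HELP` and `# TYPE` lines that introduce a metric family.
fn write_header(out: &mut String, name: &str, kind: &str, help: &str) {
    // Writing into a String cannot fail, so the results are ignored.
    let _ = writeln!(out, "# HELP {name} {help}");
    let _ = writeln!(out, "# TYPE {name} {kind}");
}

/// Appends one sample line with method, endpoint and status labels.
fn write_sample(out: &mut String, name: &str, s: &EndpointStats, value: impl std::fmt::Display) {
    let _ = writeln!(
        out,
        "{}{{method=\"{}\",endpoint=\"{}\",status=\"{}\"}} {}",
        name, s.method, s.endpoint, s.status, value
    );
}

/// Renders REST metrics as text, without an intermediate registry.
fn render_rest_metrics(stats: &[EndpointStats]) -> String {
    let mut out = String::new();

    write_header(&mut out, "rest_responses_total", "counter", "total number of responses");
    for s in stats {
        write_sample(&mut out, "rest_responses_total", s, s.count);
    }

    // Assumption: timings are emitted only for responses with status 200.
    write_header(&mut out, "rest_responses_avg_duration_seconds", "gauge", "average response duration");
    for s in stats.iter().filter(|s| s.status == 200) {
        write_sample(&mut out, "rest_responses_avg_duration_seconds", s, s.avg_duration_seconds);
    }

    out
}

fn main() {
    let stats = [EndpointStats {
        method: "POST",
        endpoint: "/collections/{name}/points/search",
        status: 200,
        count: 10,
        avg_duration_seconds: 0.5,
    }];
    print!("{}", render_rest_metrics(&stats));
}
```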

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@timvisee timvisee changed the title from "Draft: Improve metrics building performance, limit endpoints to whitelist" to "Improve metrics building performance, limit endpoints to whitelist" Mar 28, 2023
@timvisee timvisee marked this pull request as ready for review March 28, 2023 15:51
Review thread on src/common/metrics.rs (outdated, resolved)
Contributor
@ffuugoo ffuugoo left a comment

LGTM. Great job, @timvisee! 👍

@timvisee
Member Author

Added the sorting comment. All is green. Merging now.

@timvisee timvisee merged commit 91a0200 into dev Mar 29, 2023
generall pushed a commit that referenced this pull request Apr 11, 2023
…1616)

* Fix incorrect metrics value for cluster commit

* Rewrite metrics logic, don't use registry, write values directly

* Only report REST timings for requests having HTTP 200 response

* Limit metrics reporting of endpoints to whitelist

The whitelist contains a selection of search, recommend and upsert endpoints.

* Add MetricsParam, remove detail level, keep anonymize

* Request metrics in basic API test

* Specify content type for metrics endpoint

* Add OpenAPI test for metrics endpoint, remove from basic API test

This test probes for some strings that must exist in the output

* Add note that metrics endpoint whitelist must be sorted
@generall generall mentioned this pull request Apr 19, 2023
8 tasks
@agourlay agourlay deleted the improve-metrics branch July 12, 2023 15:47