Commit 8afa5a7

Merge pull request #102308 from theashiot/OBSDOCS-1859-6-0
OBSDOCS-1859: Loki Query performance troubleshooting steps
2 parents 390b780 + 98e0178 commit 8afa5a7

8 files changed: +259 -0 lines changed

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -40,6 +40,8 @@ Topics:
 # File: configuring-lokistack-otlp
 #- Name: OpenTelemetry data model
 # File: opentelemetry-data-model
+- Name: Loki query performance troubleshooting
+  File: loki-query-performance-troubleshooting
 ---
 - Name: Upgrading logging
   Dir: upgrading

configuring/configuring-the-log-store.adoc

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ include::modules/logging-loki-reliability-hardening.adoc[leveloffset=+2]
 include::modules/loki-retention.adoc[leveloffset=+2]
 include::modules/loki-memberlist-ip.adoc[leveloffset=+2]
 include::modules/loki-restart-hardening.adoc[leveloffset=+2]
+//include::modules/enabling-automatic-stream-sharding.adoc[leveloffset=+2]

 //Advanced deployment and scalability
 [id="advanced_{context}"]

configuring/loki-query-performance-troubleshooting.adoc

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
:_newdoc-version: 2.18.4
:_template-generated: 2025-09-22
:_mod-docs-content-type: ASSEMBLY
include::_attributes/common-attributes.adoc[]

:toc:
[id="loki-query-performance-troubleshooting_{context}"]
= Loki query performance troubleshooting

:context: loki-query-performance-troubleshooting

This documentation details methods for optimizing your logging stack to improve query performance and provides troubleshooting steps.

include::modules/best-practices-for-loki-query-performance.adoc[leveloffset=+1]

include::modules/best-practices-for-loki-labels.adoc[leveloffset=+1]

include::modules/configuration-of-stream-labels-in-loki-operator.adoc[leveloffset=+1]

include::modules/analyzing-loki-query-performance.adoc[leveloffset=+1]

include::modules/query-performance-analysis.adoc[leveloffset=+1]

modules/analyzing-loki-query-performance.adoc

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc

:_newdoc-version: 2.18.4
:_template-generated: 2025-10-24
:_mod-docs-content-type: PROCEDURE

[id="analyzing-loki-query-performance_{context}"]
= Analyzing Loki query performance

Every query and subquery in Loki generates a `metrics.go` log line with performance statistics. Subqueries emit the log line in the queriers, and every query has a single summary `metrics.go` line emitted by the query frontend.
Use these statistics to calculate the query performance metrics.

.Prerequisites
* You have administrator permissions.
* You have access to the {ocp-product-title} web console.
* You installed and configured the {loki-op}.

.Procedure
. In the {ocp-product-title} web console, navigate to the *Observe* -> *Metrics* page.

. Note the following values:

* *duration*: Denotes the amount of time that a query took to run.
* *queue_time*: Denotes the time that a query spent in the queue before being processed.
* *chunk_refs_fetch_time*: Denotes the amount of time spent getting chunk information from the index.
* *store_chunks_download_time*: Denotes the amount of time spent getting chunks from cache or storage.

. Calculate the following performance metrics:

** Calculate the total query time as `total_duration`:
+
[subs=+quotes]
----
total_duration = *duration* + *queue_time*
----

** Calculate the percentage of the total duration that a query spent in the queue as `Queue Time`:
+
[subs=+quotes]
----
Queue Time = *queue_time* / total_duration * 100
----

** Calculate the percentage of the total duration that was spent getting chunk information from the index as `Chunk Refs Fetch Time`:
+
[subs=+quotes]
----
Chunk Refs Fetch Time = *chunk_refs_fetch_time* / total_duration * 100
----

** Calculate the percentage of the total duration that was spent getting chunks from cache or storage as `Chunks Download Time`:
+
[subs=+quotes]
----
Chunks Download Time = *store_chunks_download_time* / total_duration * 100
----

** Calculate the percentage of the total duration that was spent executing the query as `Execution Time`:
+
[subs=+quotes]
----
Execution Time = (*duration* - *chunk_refs_fetch_time* - *store_chunks_download_time*) / total_duration * 100
----
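+
For example, with hypothetical values of `duration` = 8s, `queue_time` = 2s, `chunk_refs_fetch_time` = 1s, and `store_chunks_download_time` = 3s, the calculations work out as follows:
+
----
total_duration        = 8s + 2s = 10s
Queue Time            = 2 / 10 * 100 = 20%
Chunk Refs Fetch Time = 1 / 10 * 100 = 10%
Chunks Download Time  = 3 / 10 * 100 = 30%
Execution Time        = (8 - 1 - 3) / 10 * 100 = 40%
----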

. Refer to https://docs.redhat.com/en/documentation/red_hat_openshift_logging/latest/html/about_openshift_logging/index/analyze-query-performance_loki-query-performance-troubleshooting[Query performance analysis] to understand what each metric indicates and how each metric affects query performance.

modules/best-practices-for-loki-labels.adoc

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc

:_newdoc-version: 2.18.4
:_template-generated: 2025-09-25
:_mod-docs-content-type: CONCEPT

[id="best-practices-for-loki-labels_{context}"]
= Best practices for Loki labels

Labels in Loki are the keyspace on which Loki shards incoming data. They are also the index used to find logs at query time. You can optimize query performance by using labels properly.

Consider the following criteria when creating labels, as illustrated by the example after the list:

* Labels should describe infrastructure, such as regions, clusters, servers, applications, namespaces, or environments.

* Labels are long-lived. Label values should generate logs perpetually, or at least for several hours.

* Labels are intuitive for querying.
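
For example, a hypothetical stream selector that follows these criteria uses a small set of long-lived, infrastructure-level labels. The label names and values here are illustrative only:

[source]
----
{cluster="prod-eu-1", namespace="payments", environment="production"}
----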

modules/best-practices-for-loki-query-performance.adoc

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc

:_newdoc-version: 2.18.4
:_template-generated: 2025-09-25
:_mod-docs-content-type: CONCEPT

[id="best-practices-for-loki-query-performance_{context}"]
= Best practices for Loki query performance

You can take the following steps to improve Loki query performance:

* Ensure that you are running the latest version of the {loki-op}.

* Ensure that you have migrated the LokiStack schema to the `v13` version. An example schema configuration is shown after this list.

* Ensure that you use reliable and fast object storage. Loki places significant demands on object storage.
If you are not using an object storage solution from a cloud provider, use solid-state drives (SSDs) for your object storage.
By using SSDs you can benefit from the high parallelization capabilities of Loki.
+
To better understand how Loki uses object storage, you can run the following query on the *Observe* -> *Metrics* page in the {ocp-product-title} web console:
+
[source]
----
sum by(status, container, operation) (label_replace(rate(loki_s3_request_duration_seconds_count{namespace="openshift-logging"}[5m]), "status", "${1}xx", "status_code", "([0-9]).."))
----

* {loki-op} enables automatic stream sharding by default. The default automatic stream sharding mechanism is adequate in most cases, and you typically do not need to configure the `perStream*` attributes.

* If you use the OpenTelemetry Protocol (OTLP) data model, you can configure additional stream labels in LokiStack. For more information, see link:https://docs.redhat.com/en/documentation/red_hat_openshift_logging/latest/html/configuring/configuring-the-log-store#best-practices-for-loki-labels_loki-query-performance-troubleshooting[Best practices for Loki labels].

* Different types of queries have different performance characteristics. Use simple filter queries instead of regular expressions for better performance. An example filter query is shown after this list.
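
The following is a minimal sketch of the schema section of a `LokiStack` resource that uses the `v13` schema. The `effectiveDate` value is an example, and other required fields, such as the object storage configuration, are omitted:

[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  storage:
    schemas:
    - version: v13
      effectiveDate: "2024-10-25"
  # ...
----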
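
For example, a hypothetical query that uses a line filter expression (`|=`), such as the following, generally performs better than an equivalent query that uses a regular expression filter (`|~`). The label names and values are illustrative:

[source]
----
{log_type="application", kubernetes_namespace_name="my-namespace"} |= "error"
----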

[role="_additional-resources"]
.Additional resources
* link:https://docs.redhat.com/en/documentation/red_hat_openshift_logging/latest/html/about_openshift_logging/index/analyzing-loki-query-performance_loki-query-performance-troubleshooting[Analyzing Loki query performance]

modules/configuration-of-stream-labels-in-loki-operator.adoc

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc

:_newdoc-version: 2.18.4
:_template-generated: 2025-09-25
:_mod-docs-content-type: CONCEPT

[id="configuration-of-stream-labels-in-loki-operator_{context}"]
= Configuration of stream labels in {loki-op}

Configuring which labels the {loki-op} uses as stream labels depends on the data model you are using: ViaQ or OpenTelemetry Protocol (OTLP).

Both models come with a predefined set of stream labels. For more information, see link:https://docs.redhat.com/en/documentation/red_hat_openshift_logging/latest/html/configuring_logging/opentelemetry-data-model[OpenTelemetry data model].

ViaQ model::
ViaQ does not support structured metadata.
To configure stream labels for the ViaQ model, add the configuration in the `ClusterLogForwarder` resource. For example:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: logging-collector
  outputs:
  - name: lokistack-out
    type: lokiStack
    lokiStack:
      target:
        name: logging-loki
        namespace: openshift-logging
      labelKeys:
        application:
          ignoreGlobal: <true_or_false>
          labelKeys: []
        audit:
          ignoreGlobal: <true_or_false>
          labelKeys: []
        infrastructure:
          ignoreGlobal: <true_or_false>
          labelKeys: []
        global: []
----
+
The `lokiStack.labelKeys` field contains the configuration that maps log record keys to the Loki labels used to identify streams.
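+
For example, a hypothetical excerpt of the `lokiStack.labelKeys` configuration that uses only the `kubernetes.namespace_name` log record key as a stream label for application logs, ignoring the keys listed under `global` for that tenant, might look like this:
+
[source,yaml]
----
labelKeys:
  application:
    ignoreGlobal: true
    labelKeys:
    - kubernetes.namespace_name
----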

OTLP model::
In the OTLP model, all labels that are not specified as stream labels are attached as structured metadata. A configuration sketch follows the list of best practices below.

The following are the best practices for creating stream labels:

* The labels have a low cardinality, with at most tens of values.
* The values are long-lived. For example, the first level of an HTTP path: `/load`, `/save`, and `/update`.
* The labels can be used in queries to improve query performance.
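
The following is a minimal sketch that assumes the `spec.limits.global.otlp.streamLabels.resourceAttributes` field that recent {loki-op} versions provide for selecting OTLP resource attributes as stream labels. The attribute names are examples; check the `LokiStack` custom resource definition for the exact schema in your version:

[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      otlp:
        streamLabels:
          resourceAttributes:
          - name: "k8s.namespace.name"
          - name: "k8s.container.name"
  # ...
----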

modules/query-performance-analysis.adoc

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc

:_newdoc-version: 2.18.4
:_template-generated: 2025-09-22
:_mod-docs-content-type: CONCEPT

[id="query-performance-analysis_{context}"]
= Query performance analysis

For best query performance, as much of the total duration as possible should be spent in query execution, denoted by the `Execution Time` metric.
See the following table for the reasons why the other performance metrics might be high and the steps that you can take to improve them. A configuration sketch for the `LokiStack`-level fixes follows the table.
You can also reduce the execution time by modifying your queries, thereby improving the overall performance.

[options="header",cols="2,5,5"]
|====
|Issue
|Reason
|Fix

.2+|High `Execution Time`
|Queries might be doing many CPU-intensive operations, such as regular expression processing.

a| You can make the following changes:

* Change your queries to reduce or remove regular expressions.
* Add more CPU resources.

|Your queries have many small log lines.

|If your queries have many small lines, execution depends on how fast Loki can iterate over the lines themselves, which becomes a CPU clock frequency bottleneck. To improve performance, use faster CPUs.


|High `Queue Time`
|You do not have enough queriers running.
|The only fix is to increase the number of querier replicas in the `LokiStack` spec.

|High `Chunk Refs Fetch Time`
|Insufficient number of index-gateway replicas in the `LokiStack` spec.
|Increase the number of index-gateway replicas or ensure that they have enough CPU resources.

|High `Chunks Download Time`
|The chunks might be too small.
|Check the average chunk size by dividing the `total_bytes` value by the `cache_chunk_req` value. The result represents the average uncompressed bytes per chunk. For best performance, the value should be in the order of magnitude of megabytes. If the chunks are only a few hundred bytes or kilobytes in size, revisit your labels to ensure that you are not splitting your data into very small chunks.

|Query timing out
|The query timeout value might be too low.
|Increase the `queryTimeout` value in the `LokiStack` spec.
|====
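
For the `Chunks Download Time` row, a worked example with hypothetical values: `total_bytes` = 5 GB across `cache_chunk_req` = 50,000 chunks gives an average chunk size of about 100 KB, which indicates that the chunks are too small.

For the fixes that change the `LokiStack` spec, the following is a minimal sketch that increases the querier and index-gateway replicas and raises the query timeout. The replica counts and timeout are example values, not recommendations:

[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  template:
    querier:
      replicas: 3
    indexGateway:
      replicas: 2
  limits:
    global:
      queries:
        queryTimeout: 5m
  # ...
----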
