Commit 353f16a

Merge: perf: arm_cspmu: nvidia: update event list and filter
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6836

JIRA: https://issues.redhat.com/browse/RHEL-89488

This patchset enables NVLINK-C2C port filtering, as requested by the customer. Apart from that, there is a fix of the filter, a docs fix, and some events that cannot be used are removed.

Signed-off-by: Michael Petlan <mpetlan@redhat.com>

Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: ashelat <ashelat@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Augusto Caringi <acaringi@redhat.com>

2 parents 2440740 + 122b16e commit 353f16a

2 files changed: +50 -77 lines changed

Documentation/admin-guide/perf/nvidia-pmu.rst

Lines changed: 43 additions & 9 deletions
@@ -34,7 +34,7 @@ strongly-ordered (SO) PCIE write traffic to local/remote memory. Please see
 traffic coverage.
 
 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_scf_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_scf_pmu_<socket-id>.
 
 Example usage:
 
@@ -66,7 +66,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
 the PMU traffic coverage.
 
 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
 
 Example usage:
 
@@ -86,6 +86,22 @@ Example usage:
 
    perf stat -a -e nvidia_nvlink_c2c0_pmu_3/event=0x0/
 
+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+   perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+   perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x3/
+
 NVLink-C2C1 PMU
 -------------------
 
@@ -96,7 +112,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
 the PMU traffic coverage.
 
 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
 
 Example usage:
 
@@ -116,6 +132,22 @@ Example usage:
 
    perf stat -a -e nvidia_nvlink_c2c1_pmu_3/event=0x0/
 
+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+   perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+   perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x3/
+
 CNVLink PMU
 ---------------
 
@@ -125,13 +157,14 @@ to local memory. For PCIE traffic, this PMU captures read and relaxed ordered
 for more info about the PMU traffic coverage.
 
 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>.
 
 Each SoC socket can be connected to one or more sockets via CNVLink. The user can
 use "rem_socket" bitmap parameter to select the remote socket(s) to monitor.
 Each bit represents the socket number, e.g. "rem_socket=0xE" corresponds to
-socket 1 to 3.
-/sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
+socket 1 to 3. The PMU will monitor all remote sockets by default if not
+specified.
+/sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
 shows the valid bits that can be set in the "rem_socket" parameter.
 
 The PMU can not distinguish the remote traffic initiator, therefore it does not
@@ -165,12 +198,13 @@ local/remote memory. Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section
 for more info about the PMU traffic coverage.
 
 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>.
 
 Each SoC socket can support multiple root ports. The user can use
 "root_port" bitmap parameter to select the port(s) to monitor, i.e.
-"root_port=0xF" corresponds to root port 0 to 3.
-/sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
+"root_port=0xF" corresponds to root port 0 to 3. The PMU will monitor all root
+ports by default if not specified.
+/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
 shows the valid bits that can be set in the "root_port" parameter.
 
 Example usage:
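
Note on the bitmap parameters: the "port", "rem_socket" and "root_port" parameters documented in this file all follow the same convention, where bit N of the value selects unit N. The following is a minimal C sketch of that decoding. The helper name `decode_bitmap` is hypothetical and is not part of the kernel or of perf; it only illustrates how a value such as 0xE maps to unit numbers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel or perf API): list the unit numbers
 * selected by a filter bitmap, where a set bit N means unit N is monitored.
 * Writes at most `max` entries to `out` and returns how many were written.
 */
static size_t decode_bitmap(uint32_t bitmap, unsigned int *out, size_t max)
{
	size_t n = 0;

	for (unsigned int bit = 0; bit < 32 && n < max; bit++) {
		if (bitmap & (UINT32_C(1) << bit))
			out[n++] = bit;
	}
	return n;
}
```

Under this convention, 0x1 selects unit 0, 0x3 selects units 0 and 1, 0xE selects units 1 to 3, and 0xF selects units 0 to 3, which matches the "port", "rem_socket" and "root_port" examples in the documentation text.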

drivers/perf/arm_cspmu/nvidia_cspmu.c

Lines changed: 7 additions & 68 deletions
@@ -54,65 +54,24 @@ static struct attribute *scf_pmu_event_attrs[] = {
 	ARM_CSPMU_EVENT_ATTR(scf_cache_wb,			0xF3),
 
 	NV_CSPMU_EVENT_ATTR_4(socket, rd_data,			0x101),
-	NV_CSPMU_EVENT_ATTR_4(socket, dl_rsp,			0x105),
 	NV_CSPMU_EVENT_ATTR_4(socket, wb_data,			0x109),
-	NV_CSPMU_EVENT_ATTR_4(socket, ev_rsp,			0x10d),
-	NV_CSPMU_EVENT_ATTR_4(socket, prb_data,			0x111),
 
 	NV_CSPMU_EVENT_ATTR_4(socket, rd_outstanding,		0x115),
-	NV_CSPMU_EVENT_ATTR_4(socket, dl_outstanding,		0x119),
-	NV_CSPMU_EVENT_ATTR_4(socket, wb_outstanding,		0x11d),
-	NV_CSPMU_EVENT_ATTR_4(socket, wr_outstanding,		0x121),
-	NV_CSPMU_EVENT_ATTR_4(socket, ev_outstanding,		0x125),
-	NV_CSPMU_EVENT_ATTR_4(socket, prb_outstanding,		0x129),
 
 	NV_CSPMU_EVENT_ATTR_4(socket, rd_access,		0x12d),
-	NV_CSPMU_EVENT_ATTR_4(socket, dl_access,		0x131),
 	NV_CSPMU_EVENT_ATTR_4(socket, wb_access,		0x135),
 	NV_CSPMU_EVENT_ATTR_4(socket, wr_access,		0x139),
-	NV_CSPMU_EVENT_ATTR_4(socket, ev_access,		0x13d),
-	NV_CSPMU_EVENT_ATTR_4(socket, prb_access,		0x141),
-
-	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_data,		0x145),
-	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_access,		0x149),
-	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_access,		0x14d),
-	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_outstanding,		0x151),
-	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_outstanding,		0x155),
-
-	NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_data,			0x159),
-	NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_access,		0x15d),
-	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_access,		0x161),
-	NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_outstanding,		0x165),
-	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_outstanding,		0x169),
 
 	ARM_CSPMU_EVENT_ATTR(gmem_rd_data,			0x16d),
 	ARM_CSPMU_EVENT_ATTR(gmem_rd_access,			0x16e),
 	ARM_CSPMU_EVENT_ATTR(gmem_rd_outstanding,		0x16f),
-	ARM_CSPMU_EVENT_ATTR(gmem_dl_rsp,			0x170),
-	ARM_CSPMU_EVENT_ATTR(gmem_dl_access,			0x171),
-	ARM_CSPMU_EVENT_ATTR(gmem_dl_outstanding,		0x172),
 	ARM_CSPMU_EVENT_ATTR(gmem_wb_data,			0x173),
 	ARM_CSPMU_EVENT_ATTR(gmem_wb_access,			0x174),
-	ARM_CSPMU_EVENT_ATTR(gmem_wb_outstanding,		0x175),
-	ARM_CSPMU_EVENT_ATTR(gmem_ev_rsp,			0x176),
-	ARM_CSPMU_EVENT_ATTR(gmem_ev_access,			0x177),
-	ARM_CSPMU_EVENT_ATTR(gmem_ev_outstanding,		0x178),
 	ARM_CSPMU_EVENT_ATTR(gmem_wr_data,			0x179),
-	ARM_CSPMU_EVENT_ATTR(gmem_wr_outstanding,		0x17a),
 	ARM_CSPMU_EVENT_ATTR(gmem_wr_access,			0x17b),
 
 	NV_CSPMU_EVENT_ATTR_4(socket, wr_data,			0x17c),
 
-	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_data,		0x180),
-	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_data,		0x184),
-	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_access,		0x188),
-	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_outstanding,		0x18c),
-
-	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_data,			0x190),
-	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_data,			0x194),
-	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_access,		0x198),
-	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_outstanding,		0x19c),
-
 	ARM_CSPMU_EVENT_ATTR(gmem_wr_total_bytes,		0x1a0),
 	ARM_CSPMU_EVENT_ATTR(remote_socket_wr_total_bytes,	0x1a1),
 	ARM_CSPMU_EVENT_ATTR(remote_socket_rd_data,		0x1a2),
@@ -122,35 +81,12 @@ static struct attribute *scf_pmu_event_attrs[] = {
 	ARM_CSPMU_EVENT_ATTR(cmem_rd_data,			0x1a5),
 	ARM_CSPMU_EVENT_ATTR(cmem_rd_access,			0x1a6),
 	ARM_CSPMU_EVENT_ATTR(cmem_rd_outstanding,		0x1a7),
-	ARM_CSPMU_EVENT_ATTR(cmem_dl_rsp,			0x1a8),
-	ARM_CSPMU_EVENT_ATTR(cmem_dl_access,			0x1a9),
-	ARM_CSPMU_EVENT_ATTR(cmem_dl_outstanding,		0x1aa),
 	ARM_CSPMU_EVENT_ATTR(cmem_wb_data,			0x1ab),
 	ARM_CSPMU_EVENT_ATTR(cmem_wb_access,			0x1ac),
-	ARM_CSPMU_EVENT_ATTR(cmem_wb_outstanding,		0x1ad),
-	ARM_CSPMU_EVENT_ATTR(cmem_ev_rsp,			0x1ae),
-	ARM_CSPMU_EVENT_ATTR(cmem_ev_access,			0x1af),
-	ARM_CSPMU_EVENT_ATTR(cmem_ev_outstanding,		0x1b0),
 	ARM_CSPMU_EVENT_ATTR(cmem_wr_data,			0x1b1),
-	ARM_CSPMU_EVENT_ATTR(cmem_wr_outstanding,		0x1b2),
-
-	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_data,		0x1b3),
-	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_access,		0x1b7),
-	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_access,		0x1bb),
-	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_outstanding,		0x1bf),
-	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_outstanding,		0x1c3),
-
-	ARM_CSPMU_EVENT_ATTR(ocu_prb_access,			0x1c7),
-	ARM_CSPMU_EVENT_ATTR(ocu_prb_data,			0x1c8),
-	ARM_CSPMU_EVENT_ATTR(ocu_prb_outstanding,		0x1c9),
 
 	ARM_CSPMU_EVENT_ATTR(cmem_wr_access,			0x1ca),
 
-	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_access,		0x1cb),
-	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_data,		0x1cf),
-	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_data,		0x1d3),
-	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_outstanding,		0x1d7),
-
 	ARM_CSPMU_EVENT_ATTR(cmem_wr_total_bytes,		0x1db),
 
 	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
@@ -194,6 +130,7 @@ static struct attribute *pcie_pmu_format_attrs[] = {
 
 static struct attribute *nvlink_c2c_pmu_format_attrs[] = {
 	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_ATTR(port, "config1:0-1"),
 	NULL,
 };
 
@@ -238,10 +175,12 @@ static u32 nv_cspmu_event_filter(const struct perf_event *event)
 	const struct nv_cspmu_ctx *ctx =
 		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
 
-	if (ctx->filter_mask == 0)
+	const u32 filter_val = event->attr.config1 & ctx->filter_mask;
+
+	if (filter_val == 0)
 		return ctx->filter_default_val;
 
-	return event->attr.config1 & ctx->filter_mask;
+	return filter_val;
 }
 
 enum nv_cspmu_name_fmt {
enum nv_cspmu_name_fmt {
@@ -274,7 +213,7 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
274213
{
275214
.prodid = 0x104,
276215
.prodid_mask = NV_PRODID_MASK,
277-
.filter_mask = 0x0,
216+
.filter_mask = NV_NVL_C2C_FILTER_ID_MASK,
278217
.filter_default_val = NV_NVL_C2C_FILTER_ID_MASK,
279218
.name_pattern = "nvidia_nvlink_c2c1_pmu_%u",
280219
.name_fmt = NAME_FMT_SOCKET,
@@ -284,7 +223,7 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 	{
 	  .prodid = 0x105,
 	  .prodid_mask = NV_PRODID_MASK,
-	  .filter_mask = 0x0,
+	  .filter_mask = NV_NVL_C2C_FILTER_ID_MASK,
 	  .filter_default_val = NV_NVL_C2C_FILTER_ID_MASK,
 	  .name_pattern = "nvidia_nvlink_c2c0_pmu_%u",
 	  .name_fmt = NAME_FMT_SOCKET,
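
Note on the filter change: the `nv_cspmu_event_filter()` hunk moves the "use the default" decision from the mask to the masked value. The sketch below restates both versions as standalone user-space C so the difference can be exercised outside the kernel. The `sketch_` names and the simplified context struct are hypothetical stand-ins, and the mask value 0x3 is an assumption inferred from the "config1:0-1" format attribute, since the definition of NV_NVL_C2C_FILTER_ID_MASK is not shown in this diff.

```c
#include <stdint.h>

/*
 * Assumption: two port bits, consistent with the "config1:0-1" format
 * attribute added by this commit. The real macro is not shown in the diff.
 */
#define SKETCH_C2C_FILTER_MASK 0x3u

/* Simplified, hypothetical stand-in for the driver's per-PMU context. */
struct sketch_ctx {
	uint32_t filter_mask;
	uint32_t filter_default_val;
};

/* Pre-patch logic: the default is used only when the mask itself is zero. */
static uint32_t sketch_filter_old(const struct sketch_ctx *ctx, uint64_t config1)
{
	if (ctx->filter_mask == 0)
		return ctx->filter_default_val;

	return (uint32_t)(config1 & ctx->filter_mask);
}

/* Post-patch logic: the default is used whenever no valid filter bit is set. */
static uint32_t sketch_filter_new(const struct sketch_ctx *ctx, uint64_t config1)
{
	const uint32_t filter_val = (uint32_t)(config1 & ctx->filter_mask);

	if (filter_val == 0)
		return ctx->filter_default_val;

	return filter_val;
}
```

With the mask and default both set to 0x3, the new logic returns 0x3 when config1 is 0 (no "port" given), 0x1 for "port=0x1", and 0x3 for "port=0x3". The old logic with the same nonzero mask would return 0 for an unset filter rather than falling back to the default, which appears to be the filter problem the merge description refers to.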
