
Commit b5de45a

Perf Event Array map user-mode API implementation with tests (#4302)
* Perf Event Array map user-mode API implementation with tests
* CR Feedback
* Refactored and Unified the Ring Buffer and perf event array user mode API. Incorporated test fixes from Michael.
* Fixed build and Test issues
* Review Feedback Changes
* Fixed the AV in Perf_Buffer_free
* Used Conditional Variables to wait for callbacks to complete
* Added support for Using multiple subscription rings
* CR Comments
* Rename `_ebpf_map_subscription_ring` to `_ebpf_map_async_query_context`

  For better readability, renaming `_ebpf_map_subscription_ring` to `_ebpf_map_async_query_context`.
* More CR comments
* Added Check to make sure ring_buffer_new does not pass more than 1 CPU ID
* More CR Feedback

---------

Co-authored-by: Shankar Seal <74580197+shankarseal@users.noreply.github.com>
1 parent 6a3480d commit b5de45a

20 files changed, +1961 -788 lines changed

docs/PerfEventArray.md

Lines changed: 11 additions & 43 deletions
@@ -25,31 +25,26 @@ There are 3 primary differences between ring buffer maps and perf event arrays:
2525
- `perf_event_output` takes the bpf context as an argument and the helper implementation copies the payload.
2626
- The payload is whatever the data pointer of the program context points to (e.g. packet data including headers). The payload does not include the bpf program context structure itself.
2727

28-
The main motivation for this proposal is to efficiently support payload capture from the context in bpf programs.
28+
The main motivation for this implementation is to efficiently support payload capture from the context in bpf programs.
2929
- Supporting ring buffer reserve and submit in ebpf-for-windows is currently blocked on verifier support [#273](https://github.com/vbpf/ebpf-verifier/issues/273).
3030
- Without reserve+submit, using `ringbuf_output` for payload capture requires using a per-CPU array as scratch space to append the payload to the event before calling ringbuf_output.
3131
- The CTXLEN field in the flags of `perf_event_output` tells the kernel to append bytes from the payload to the record, avoiding the extra copy.
3232
- On Linux this works for specific program types, on Windows this will work for any program type with a data pointer in the context.
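The flag layout described in the bullets above can be sketched in plain C. This is a minimal illustration, not code from this commit: the mask values are the ones defined in the Linux UAPI header `linux/bpf.h`, and it is an assumption that ebpf-for-windows uses identical values. The helper names `make_perf_output_flags` and `ctxlen_from_flags` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the Linux UAPI header <linux/bpf.h>.
 * Assumption: ebpf-for-windows uses the same values. */
#define BPF_F_INDEX_MASK 0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK
#define BPF_F_CTXLEN_MASK (0xfffffULL << 32)

/* Hypothetical helper: build the flags argument for perf_event_output so that
 * the record goes to the current CPU and payload_bytes bytes of the context
 * payload are appended to it. */
static uint64_t
make_perf_output_flags(uint32_t payload_bytes)
{
    /* The CTXLEN field is 20 bits wide. */
    assert(payload_bytes <= 0xfffff);
    return BPF_F_CURRENT_CPU | (((uint64_t)payload_bytes << 32) & BPF_F_CTXLEN_MASK);
}

/* Hypothetical helper: recover the requested payload length from flags. */
static uint32_t
ctxlen_from_flags(uint64_t flags)
{
    return (uint32_t)((flags & BPF_F_CTXLEN_MASK) >> 32);
}
```

With these definitions, `make_perf_output_flags(64)` evaluates to `0x40ffffffff`: the low 32 bits select the current CPU and the CTXLEN field requests 64 bytes of payload.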
3333

3434

35-
## Proposal
35+
The implementation behaviour matches Linux, but currently only supports user-space consumers and bpf-program producers with a subset of the features.
3636

37-
The proposed behaviour matches Linux, but currently only supports user-space consumers and bpf-program producers with a subset of the features.
37+
The perf buffers are implemented using the existing per-CPU and ring buffer map support in ebpf-for-windows.
3838

39-
The plan is to implement perf buffers using the existing per-CPU and ring buffer map support in ebpf-for-windows.
40-
41-
To match Linux behaviour, by default the callback will only be called inside calls to `perf_buffer__poll()`.
42-
If the PERFBUF_FLAG_AUTO_CALLBACK flag is set, the callback will be automatically invoked when there is data available.
43-
44-
1. Implement a new map type `BPF_MAP_TYPE_PERF_EVENT_ARRAY`.
39+
1. Implements a new map type `BPF_MAP_TYPE_PERF_EVENT_ARRAY`.
4540
1. Linux-compatible default behaviour.
4641
- With Linux-compatible behaviour and bpf interfaces, additional features from Linux should be possible to add in the future.
4742
2. Only support the perf ringbuffer (not other Linux perf features).
4843
- Only support bpf program producers with a single user-space consumer per event array.
4944
- Features not supported include perf counters, hardware-generated perf events,
5045
attaching bpf programs to perf events, and sending events from user-space to bpf programs.
5146
3. In addition to the Linux behaviour, automatically invoke the callback if the auto callback flag is set.
52-
2. Implement `perf_event_output` bpf helper function.
47+
2. Implements `perf_event_output` bpf helper function.
5348
1. Only support writing to the current CPU (matches current Linux restrictions).
5449
- Specify current CPU in flags using BPF_F_INDEX_MASK or pass BPF_F_CURRENT_CPU.
5550
2. Support BPF_F_CTXLEN_MASK flags for any bpf program types with a data pointer in the context.
@@ -58,15 +53,10 @@ If the PERFBUF_FLAG_AUTO_CALLBACK flag is set, the callback will be automaticall
5853
- The extension-provided ebpf_context_descriptor_t includes the offset of the data pointer.
5954
- Passing a non-zero value in BPF_F_CTXLEN_MASK will return an operation not supported error for program types
6055
without a data pointer in the context.
61-
2. Implement libbpf support for perf event arrays.
56+
2. Implements libbpf support for perf event arrays.
6257
1. `perf_buffer__new` - Create a new perfbuf manager (attaches callback).
6358
- Attaches to all CPUs automatically.
64-
2. `perf_buffer__new_raw` - Not supported initially (can be future work).
65-
- This function gives extra control over the perfbuf manager creation (e.g. which CPUs to attach).
6659
2. `perf_buffer__free` - Free perfbuf manager (detaches callback).
67-
3. `perf_buffer__poll` - Wait the buffer to be non-empty (or timeout), then invoke callback for each ready record.
68-
- By default (without `PERFBUF_FLAG_AUTO_CALLBACK`), the callback will not be called except inside poll() calls.
69-
- poll() should not be called if the auto callback flag is set.
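As a rough sketch of the consumer side of the list above, the following shows a pair of callbacks that share state through the `ctx` pointer. The callback signatures follow the `perf_buffer_sample_fn` and `perf_buffer_lost_fn` typedefs from this change. The `consumer_stats_t` type and the `simulate_delivery` driver are hypothetical and exist only so the sketch runs without the library; how and when the real library invokes the callbacks is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback signatures as declared for the perf buffer API in this change. */
typedef void (*perf_buffer_sample_fn)(void* ctx, int cpu, void* data, uint32_t size);
typedef void (*perf_buffer_lost_fn)(void* ctx, int cpu, uint64_t cnt);

/* Hypothetical consumer state, handed to both callbacks through ctx. */
typedef struct _consumer_stats
{
    uint64_t records;
    uint64_t bytes;
    uint64_t lost;
} consumer_stats_t;

/* Called once per received record. */
static void
on_sample(void* ctx, int cpu, void* data, uint32_t size)
{
    consumer_stats_t* stats = (consumer_stats_t*)ctx;
    assert(stats != NULL);
    (void)cpu;
    (void)data;
    stats->records++;
    stats->bytes += size;
}

/* Called when cnt records were dropped on the given CPU. */
static void
on_lost(void* ctx, int cpu, uint64_t cnt)
{
    consumer_stats_t* stats = (consumer_stats_t*)ctx;
    assert(stats != NULL);
    (void)cpu;
    stats->lost += cnt;
}

/*
 * With the real library the callbacks would be registered roughly as:
 *   struct perf_buffer* pb = perf_buffer__new(map_fd, 0, on_sample, on_lost, &stats, NULL);
 *   ...
 *   perf_buffer__free(pb);
 * Here a hypothetical driver calls them directly so the sketch is standalone.
 */
static consumer_stats_t
simulate_delivery(void)
{
    consumer_stats_t stats = {0};
    perf_buffer_sample_fn sample_cb = on_sample;
    perf_buffer_lost_fn lost_cb = on_lost;
    char payload[] = "abcd"; /* 5 bytes including the terminating NUL */

    sample_cb(&stats, 0, payload, (uint32_t)sizeof(payload));
    sample_cb(&stats, 1, payload, (uint32_t)sizeof(payload));
    lost_cb(&stats, 1, 3);
    return stats;
}
```

Passing the state through `ctx` keeps the callbacks free of globals, which matters if records from different CPUs are delivered on different threads; in that case the counters would also need synchronization, which this sketch omits.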
7060

7161
## bpf helpers
7262
```c
@@ -102,48 +92,26 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
10292
// Perf buffer manager options.
10393
struct perf_buffer_opts {
10494
size_t sz;
105-
uint64_t flags;
106-
};
107-
#define perf_buffer_opts__last_field flags
108-
109-
// Flags for configuring perf buffer manager.
110-
enum perf_buffer_flags {
111-
PERFBUF_FLAG_AUTO_CALLBACK = (uint64_t)1 << 0 /* Automatically invoke callback for each record */
11295
};
96+
#define perf_buffer_opts__last_field sz
11397
11498
/**
115-
* @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified
99+
* @brief **perf_buffer__new()** creates BPF perfbuffer manager for a specified
116100
* BPF_PERF_EVENT_ARRAY map
117101
* @param map_fd FD of BPF_PERF_EVENT_ARRAY BPF map that will be used by BPF
118102
* code to send data over to user-space
119-
* @param page_cnt number of memory pages allocated for each per-CPU buffer
103+
* @param page_cnt number of memory pages allocated for each per-CPU buffer. Should be set to 0.
120104
* @param sample_cb function called on each received data record
121105
* @param lost_cb function called when record loss has occurred
122106
* @param ctx user-provided extra context passed into *sample_cb* and *lost_cb*
123-
* @return a new instance of struct perf_buffer on success, NULL on error with
124-
* *errno* containing an error code
107+
* @param opts perfbuffer manager options. Not supported currently. Should be null.
108+
* @return a new instance of struct perf_buffer on success, NULL on error.
125109
*/
126110
LIBBPF_API struct perf_buffer *
127111
perf_buffer__new(int map_fd, size_t page_cnt,
128112
perf_buffer_sample_fn sample_cb, perf_buffer_lost_fn lost_cb, void *ctx,
129113
const struct perf_buffer_opts *opts);
130114
131-
/**
132-
* @brief poll perfbuf for new data
133-
* Poll for available data and consume records, if any are available.
134-
*
135-
* Must be called to receive callbacks by default (without auto callbacks).
136-
* NOT supported when PERFBUF_FLAG_AUTO_CALLBACK is set.
137-
*
138-
* If timeout_ms is zero, poll will not wait but only invoke the callback on records that are ready.
139-
* If timeout_ms is -1, poll will wait until data is ready (no timeout).
140-
*
141-
* @param[in] pb Pointer to perf buffer manager.
142-
* @param[in] timeout_ms maximum time to wait for (in milliseconds).
143-
*
144-
* @returns number of records consumed, INT_MAX, or a negative number on error
145-
*/
146-
int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
147115
/**
148116
* @brief Frees a perf buffer manager.
149117
*

ebpfapi/Source.def

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -133,6 +133,7 @@ EXPORTS
133133
ebpf_object_load_native_by_fds
134134
ebpf_object_set_execution_type
135135
ebpf_object_unpin
136+
ebpf_perf_event_array_map_write
136137
ebpf_program_attach
137138
ebpf_program_attach_by_fd
138139
ebpf_program_attach_by_fds
@@ -151,5 +152,7 @@ EXPORTS
151152
libbpf_num_possible_cpus
152153
libbpf_prog_type_by_name
153154
libbpf_strerror
155+
perf_buffer__free
156+
perf_buffer__new
154157
ring_buffer__free
155158
ring_buffer__new

include/bpf/libbpf.h

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -943,6 +943,34 @@ bpf_program__flags(const struct bpf_program* prog);
943943
int
944944
bpf_program__set_flags(struct bpf_program* prog, __u32 flags);
945945

946+
/**
947+
* @brief Create BPF perfbuf manager.
948+
*
949+
* @param[in] map_fd File descriptor to perf event array map.
950+
* @param[in] page_cnt Number of memory pages allocated for each per-CPU buffer. This should be set to 0.
951+
* @param[in] sample_cb Pointer to perf buffer notification callback function.
952+
* @param[in] lost_cb Function pointer for callback when record loss has occurred.
953+
* @param[in] ctx User provided extra context passed into sample_cb and lost_cb.
954+
* @param[in] opts The perf buffer manager options. This should be set to NULL.
955+
* @return Pointer to perf buffer manager.
956+
*/
957+
LIBBPF_API struct perf_buffer*
958+
perf_buffer__new(
959+
int map_fd,
960+
size_t page_cnt,
961+
perf_buffer_sample_fn sample_cb,
962+
perf_buffer_lost_fn lost_cb,
963+
void* ctx,
964+
const struct perf_buffer_opts* opts);
965+
966+
/**
967+
* @brief Free a perf buffer manager.
968+
*
969+
 * @param[in] pb Pointer to perf buffer manager to be freed.
970+
*/
971+
LIBBPF_API void
972+
perf_buffer__free(struct perf_buffer* pb);
973+
946974
#else
947975
#pragma warning(push)
948976
#pragma warning(disable : 4200) // Zero-sized array in struct/union

libs/api/api_internal.h

Lines changed: 18 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -13,8 +13,7 @@
1313

1414
struct bpf_object;
1515

16-
typedef struct _ebpf_ring_buffer_subscription ring_buffer_subscription_t;
17-
typedef struct _ebpf_perf_event_array_subscription perf_event_array_subscription_t;
16+
typedef struct _ebpf_map_subscription ebpf_map_subscription_t;
1817

1918
typedef struct bpf_program
2019
{
@@ -607,64 +606,38 @@ ebpf_object_load(_Inout_ struct bpf_object* object) noexcept;
607606
EBPF_API_LOCKING _Must_inspect_result_ ebpf_result_t
608607
ebpf_object_unload(_Inout_ struct bpf_object* object) noexcept;
609608

610-
typedef int (*ring_buffer_sample_fn)(void* ctx, void* data, size_t size);
611-
612609
/**
613-
* @brief Subscribe for notifications from the input ring buffer map.
610+
* @brief Subscribe for notifications from the input perf event array or a ring buffer map.
614611
*
615-
* @param[in] ring_buffer_map_fd File descriptor to the ring buffer map.
616-
* @param[in, out] sample_callback_context Pointer to supplied context to be passed in notification callback.
617-
* @param[in] sample_callback Function pointer to notification handler.
618-
* @param[out] subscription Opaque pointer to ring buffer subscription object.
619-
*
620-
* @retval EBPF_SUCCESS The operation was successful.
621-
* @retval EBPF_NO_MEMORY Out of memory.
622-
*/
623-
_Must_inspect_result_ ebpf_result_t
624-
ebpf_ring_buffer_map_subscribe(
625-
fd_t ring_buffer_map_fd,
626-
_Inout_opt_ void* sample_callback_context,
627-
ring_buffer_sample_fn sample_callback,
628-
_Outptr_ ring_buffer_subscription_t** subscription) noexcept;
629-
630-
/**
631-
* @brief Unsubscribe from the ring buffer map event notifications.
632-
*
633-
* @param[in] subscription Pointer to ring buffer subscription to be canceled.
634-
*/
635-
bool
636-
ebpf_ring_buffer_map_unsubscribe(_In_ _Post_invalid_ ring_buffer_subscription_t* subscription) noexcept;
637-
638-
typedef void (*perf_buffer_sample_fn)(void* ctx, int cpu, void* data, uint32_t size);
639-
typedef void (*perf_buffer_lost_fn)(void* ctx, int cpu, uint64_t cnt);
640-
641-
/**
642-
* @brief Subscribe for notifications from the input perf event array map.
643-
*
644-
* @param[in] perf_event_array_map_fd File descriptor to the perf event array map.
645-
* @param[in, out] callback_context Pointer to supplied context to be passed in notification callback.
612+
* @param[in] map_fd File descriptor to the perf event array or a ring buffer map.
613+
* @param[in] cpu_ids The CPU Ids corresponding to this subscription. For a ring buffer map this is a single value with
614+
* Id 0.
615+
* @param[in] cpu_id_count The count of the elements in the cpu_ids parameter.
616+
* @param[in] callback_context Pointer to supplied context to be passed in notification callback.
646617
* @param[in] sample_callback Function pointer to notification handler.
647618
* @param[in] lost_callback Function pointer to lost record notification handler.
648-
* @param[out] subscription Opaque pointer to perf event array subscription object.
619+
* @param[out] subscription Opaque pointer to the subscription object.
649620
*
650621
* @retval EBPF_SUCCESS The operation was successful.
651622
* @retval EBPF_NO_MEMORY Out of memory.
652623
*/
653624
_Must_inspect_result_ ebpf_result_t
654-
ebpf_perf_event_array_map_subscribe(
655-
fd_t perf_event_array_map_fd,
625+
ebpf_map_subscribe(
626+
fd_t map_fd,
627+
_In_reads_(cpu_id_count) uint32_t* cpu_ids,
628+
_In_ size_t cpu_id_count,
656629
_Inout_opt_ void* callback_context,
657-
perf_buffer_sample_fn sample_callback,
658-
perf_buffer_lost_fn lost_callback,
659-
_Outptr_ perf_event_array_subscription_t** subscription) noexcept;
630+
_In_ const void* sample_callback,
631+
_In_opt_ const void* lost_callback,
632+
_Outptr_ ebpf_map_subscription_t** subscription) noexcept;
660633

661634
/**
662-
* @brief Unsubscribe from the perf event array map event notifications.
635+
* @brief Unsubscribe from the map event notifications.
663636
*
664-
* @param[in] subscription Pointer to perf event array subscription to be canceled.
637+
* @param[in] subscription Pointer to subscription to be canceled.
665638
*/
666639
bool
667-
ebpf_perf_event_array_map_unsubscribe(_In_ _Post_invalid_ perf_event_array_subscription_t* subscription) noexcept;
640+
ebpf_map_unsubscribe(_In_ _Post_invalid_ ebpf_map_subscription_t* subscription) noexcept;
668641

669642
/**
670643
* @brief Get list of programs and stats in an ELF eBPF file.
