Commit 5a95c94

Merge: fwctl subsystem
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6905
JIRA: https://issues.redhat.com/browse/RHEL-86016

fwctl is a new subsystem intended to bring some common rules and order to the growing pattern of exposing a secure FW interface directly to userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are exposing a device for datapath operations fwctl is focused on debugging, configuration and provisioning of the device.

In particular, fwctl is an upstream approach to provide a configuration interface for low-level tunables for mlx5 devices.

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Michal Schmidt <mschmidt@redhat.com>
Approved-by: Kamal Heib <kheib@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Augusto Caringi <acaringi@redhat.com>
2 parents 407773f + 821429a commit 5a95c94

File tree

23 files changed

+1521
-2
lines changed

Documentation/admin-guide/tainted-kernels.rst

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -101,6 +101,7 @@ Bit Log Number Reason that got the kernel tainted
101101
16 _/X 65536 auxiliary taint, defined for and used by distros
102102
17 _/T 131072 kernel was built with the struct randomization plugin
103103
18 _/N 262144 an in-kernel test has been run
104+
19 _/J 524288 userspace used a mutating debug operation in fwctl
104105
=== === ====== ========================================================
105106

106107
Note: The character ``_`` is representing a blank in this table to make reading
@@ -182,3 +183,7 @@ More detailed explanation for tainting
182183
produce extremely unusual kernel structure layouts (even performance
183184
pathological ones), which is important to know when debugging. Set at
184185
build time.
186+
187+
19) ``J`` if userspace opened /dev/fwctl/* and performed a FWCTL_RPC_DEBUG_WRITE
188+
to use the device's debugging features. Device debugging features could
189+
cause the device to malfunction in undefined ways.
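
As a rough illustration of how a tool might detect this taint, the sketch below tests bit 19 of the bitmask reported in ``/proc/sys/kernel/tainted``. The function names are hypothetical; only the bit number and its decimal value (524288) come from the table above.

```python
from pathlib import Path

FWCTL_TAINT_BIT = 19  # 'J' in the taint table; 1 << 19 == 524288


def has_fwctl_taint(taint_mask: int) -> bool:
    """Return True if bit 19 (fwctl mutating debug operation) is set."""
    return bool(taint_mask & (1 << FWCTL_TAINT_BIT))


def read_taint_mask(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the kernel taint bitmask as an integer (Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        mask = read_taint_mask()
    except (OSError, ValueError):
        print("taint mask not available on this system")
    else:
        print("fwctl debug taint set:", has_fwctl_taint(mask))
```

Because the taint value is a bitmask, other taint bits may be set at the same time; the check only isolates bit 19.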
Lines changed: 285 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,285 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
===============
4+
fwctl subsystem
5+
===============
6+
7+
:Author: Jason Gunthorpe
8+
9+
Overview
10+
========
11+
12+
Modern devices contain extensive amounts of FW, and in many cases, are largely
13+
software-defined pieces of hardware. The evolution of this approach is largely a
14+
reaction to Moore's Law where a chip tape out is now highly expensive, and the
15+
chip design is extremely large. Replacing fixed HW logic with a flexible and
16+
tightly coupled FW/HW combination is an effective risk mitigation against chip
17+
respin. Problems in the HW design can be counteracted in device FW. This is
18+
especially true for devices which present a stable and backwards compatible
19+
interface to the operating system driver (such as NVMe).
20+
21+
The FW layer in devices has grown to incredible size and devices frequently
22+
integrate clusters of fast processors to run it. For example, mlx5 devices have
23+
over 30MB of FW code, and big configurations operate with over 1GB of FW managed
24+
runtime state.
25+
26+
The availability of such a flexible layer has created quite a variety in the
27+
industry where single pieces of silicon are now configurable software-defined
28+
devices and can operate in substantially different ways depending on the need.
29+
Further, we often see cases where specific sites wish to operate devices in ways
30+
that are highly specialized and require applications that have been tailored to
31+
their unique configuration.
32+
33+
Further, devices have become multi-functional and integrated to the point they
34+
no longer fit neatly into the kernel's division of subsystems. Modern
35+
multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
36+
subsystems while sharing the underlying hardware using the auxiliary device
37+
system.
38+
39+
All together this creates a challenge for the operating system, where devices
40+
have an expansive FW environment that needs robust device-specific debugging
41+
support, and FW-driven functionality that is not well suited to “generic”
42+
interfaces. fwctl seeks to allow access to the full device functionality from
43+
user space in the areas of debuggability, management, and first-boot/nth-boot
44+
provisioning.
45+
46+
fwctl is aimed at the common device design pattern where the OS and FW
47+
communicate via an RPC message layer constructed with a queue or mailbox scheme.
48+
In this case the driver will typically have some layer to deliver RPC messages
49+
and collect RPC responses from device FW. The in-kernel subsystem drivers that
50+
operate the device for its primary purposes will use these RPCs to build their
51+
drivers, but devices also usually have a set of ancillary RPCs that don't really
52+
fit into any specific subsystem. For example, a HW RAID controller is primarily
53+
operated by the block layer but also comes with a set of RPCs to administer the
54+
construction of drives within the HW RAID.
55+
56+
In the past when devices were more single function, individual subsystems would
57+
grow different approaches to solving some of these common problems. For instance
58+
monitoring device health, manipulating its FLASH, debugging the FW,
59+
provisioning, all have various unique interfaces across the kernel.
60+
61+
fwctl's purpose is to define a common set of limited rules, described below,
62+
that allow user space to securely construct and execute RPCs inside device FW.
63+
The rules serve as an agreement between the operating system and FW on how to
64+
correctly design the RPC interface. As a uAPI the subsystem provides a thin
65+
layer of discovery and a generic uAPI to deliver the RPCs and collect the
66+
response. It supports a system of user space libraries and tools which will
67+
use this interface to control the device using the device native protocols.
68+
69+
Scope of Action
70+
---------------
71+
72+
fwctl drivers are strictly restricted to being a way to operate the device FW.
73+
It is not an avenue to access random kernel internals, or other operating system
74+
SW states.
75+
76+
fwctl instances must operate on a well-defined device function, and the device
77+
should have a well-defined security model for what scope within the physical
78+
device the function is permitted to access. For instance, the most complex PCIe
79+
device today may broadly have several function-level scopes:
80+
81+
1. A privileged function with full access to the on-device global state and
82+
configuration
83+
84+
2. Multiple hypervisor functions, each with control over itself and child functions
85+
used with VMs
86+
87+
3. Multiple VM functions tightly scoped within the VM
88+
89+
The device may create a logical parent/child relationship between these scopes.
90+
For instance a child VM's FW may be within the scope of the hypervisor FW. It is
91+
quite common in the VFIO world that the hypervisor environment has a complex
92+
provisioning/profiling/configuration responsibility for the function VFIO
93+
assigns to the VM.
94+
95+
Further, within the function, devices often have RPC commands that fall within
96+
some general scopes of action (see enum fwctl_rpc_scope):
97+
98+
1. Access to function & child configuration, FLASH, etc. that becomes live at a
99+
function reset. Access to function & child runtime configuration that is
100+
transparent or non-disruptive to any driver or VM.
101+
102+
2. Read-only access to function debug information that may report on FW objects
103+
in the function & child, including FW objects owned by other kernel
104+
subsystems.
105+
106+
3. Write access to function & child debug information strictly compatible with
107+
the principles of kernel lockdown and kernel integrity protection. Triggers
108+
a kernel Taint.
109+
110+
4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
111+
112+
User space will provide a scope label on each RPC and the kernel must enforce the
113+
above CAPs and taints based on that scope. A combination of kernel and FW can
114+
enforce that RPCs are placed in the correct scope by user space.
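
The scope rules above can be modelled in a few lines. This is only a sketch of the policy as described in this section, not the kernel's implementation; the enum identifiers and values, the ``Decision`` type, and ``check_scope`` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class RpcScope(IntEnum):
    # Names mirror the four scopes listed above; treat the exact
    # identifiers and values as assumptions, not the uAPI definition.
    CONFIGURATION = 0
    DEBUG_READ_ONLY = 1
    DEBUG_WRITE = 2
    DEBUG_WRITE_FULL = 3


@dataclass(frozen=True)
class Decision:
    allowed: bool
    taints_kernel: bool


def check_scope(scope: RpcScope, has_cap_sys_rawio: bool) -> Decision:
    """Model the CAP and taint rules described for each scope."""
    if scope == RpcScope.DEBUG_WRITE_FULL and not has_cap_sys_rawio:
        # Full debug access requires CAP_SYS_RAWIO; refuse without it.
        return Decision(allowed=False, taints_kernel=False)
    # Both write-capable debug scopes trigger a kernel taint.
    taints = scope in (RpcScope.DEBUG_WRITE, RpcScope.DEBUG_WRITE_FULL)
    return Decision(allowed=True, taints_kernel=taints)
```

In this model a configuration or read-only debug RPC never taints, while a full debug RPC is refused outright without the capability.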
115+
116+
Denied behavior
117+
---------------
118+
119+
There are many things this interface must not allow user space to do (without a
120+
Taint or CAP), broadly derived from the principles of kernel lockdown. Some
121+
examples:
122+
123+
1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
124+
untrusted code, or otherwise compromise device or system security and
125+
integrity.
126+
127+
2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
128+
objects owned by kernel drivers.
129+
130+
3. Directly configure or otherwise control kernel drivers. A subsystem kernel
131+
driver can react to the device configuration at function reset/driver load
132+
time, but otherwise must not be coupled to fwctl.
133+
134+
4. Operate the HW in a way that overlaps with the core purpose of another
135+
primary kernel subsystem, such as read/write to LBAs, send/receive of
136+
network packets, or operate an accelerator's data plane.
137+
138+
fwctl is not a replacement for device direct access subsystems like uacce or
139+
VFIO.
140+
141+
Operations exposed through fwctl's non-tainting interfaces should be fully
142+
sharable with other users of the device. For instance exposing an RPC through
143+
fwctl should never prevent a kernel subsystem from also concurrently using that
144+
same RPC or hardware unit down the road. In such cases fwctl will be less
145+
important than proper kernel subsystems that eventually emerge. Mistakes in this
146+
area resulting in clashes will be resolved in favour of a kernel implementation.
147+
148+
fwctl User API
149+
==============
150+
151+
.. kernel-doc:: include/uapi/fwctl/fwctl.h
152+
.. kernel-doc:: include/uapi/fwctl/mlx5.h
153+
154+
sysfs Class
155+
-----------
156+
157+
fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
158+
(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
159+
operates the ioctl uAPI described above.
160+
161+
fwctl devices can be related to driver components in other subsystems through
162+
sysfs::
163+
164+
$ ls /sys/class/fwctl/fwctl0/device/infiniband/
165+
ibp0s10f0
166+
167+
$ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
168+
fwctl0/
169+
170+
$ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
171+
dev device power subsystem uevent
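
A tool could discover fwctl devices by walking the class directory shown above. The following is a minimal sketch assuming the ``/sys/class/fwctl/fwctlNN`` and ``/dev/fwctl/fwctlNN`` naming described in this section; the function name and its parameters are hypothetical.

```python
import os


def list_fwctl_devices(sysfs_class="/sys/class/fwctl", dev_dir="/dev/fwctl"):
    """Map each fwctlNN class entry to its expected character device path."""
    try:
        names = sorted(os.listdir(sysfs_class))
    except OSError:
        return {}  # class directory missing or unreadable: report no devices
    return {name: os.path.join(dev_dir, name)
            for name in names if name.startswith("fwctl")}


if __name__ == "__main__":
    for name, node in list_fwctl_devices().items():
        print(name, "->", node)
```

The directories are parameters so the function can be exercised against a scratch directory on a machine without fwctl hardware.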
172+
173+
User space Community
174+
--------------------
175+
176+
Drawing inspiration from nvme-cli, participating in the kernel side must come
177+
with a user space in a common TBD git tree, at a minimum to usefully operate the
178+
kernel driver. Providing such an implementation is a pre-condition to merging a
179+
kernel driver.
180+
181+
The goal is to build a user space community around some of the shared problems
182+
we all have, and ideally develop some common user space programs with some
183+
starting themes of:
184+
185+
- Device in-field debugging
186+
187+
- HW provisioning
188+
189+
- VFIO child device profiling before VM boot
190+
191+
- Confidential Compute topics (attestation, secure provisioning)
192+
193+
that stretch across all subsystems in the kernel. fwupd is a great example of
194+
how an excellent user space experience can emerge out of kernel-side diversity.
195+
196+
fwctl Kernel API
197+
================
198+
199+
.. kernel-doc:: drivers/fwctl/main.c
200+
:export:
201+
.. kernel-doc:: include/linux/fwctl.h
202+
203+
fwctl Driver design
204+
-------------------
205+
206+
In many cases a fwctl driver is going to be part of a larger cross-subsystem
207+
device possibly using the auxiliary_device mechanism. In that case several
208+
subsystems are going to be sharing the same device and FW interface layer so the
209+
device design must already provide for isolation and cooperation between kernel
210+
subsystems. fwctl should fit into that same model.
211+
212+
Part of the driver should include a description of how its scope restrictions
213+
and security model work. The driver and FW together must ensure that RPCs
214+
provided by user space are mapped to the appropriate scope. If the validation is
215+
done in the driver then the validation can read a 'command effects' report from
216+
the device, or hardwire the enforcement. If the validation is done in the FW,
217+
then the driver should pass the fwctl_rpc_scope to the FW along with the command.
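
One way to picture the "hardwire the enforcement" option is a per-command table of minimum scopes. The sketch below is illustrative only: the opcodes are invented for an imaginary device, the scope identifiers and values are assumptions, and it further assumes that scopes are ordered from least to most permissive so that a simple comparison suffices.

```python
from enum import IntEnum


class RpcScope(IntEnum):
    # Ordered from least to most permissive, following the list of scopes
    # earlier in this document; identifiers and values are assumptions.
    CONFIGURATION = 0
    DEBUG_READ_ONLY = 1
    DEBUG_WRITE = 2
    DEBUG_WRITE_FULL = 3


# Hypothetical opcodes for an imaginary device; a real driver would use
# its own command set or a 'command effects' report from the device.
MIN_SCOPE_BY_OPCODE = {
    0x01: RpcScope.CONFIGURATION,    # e.g. query or set a tunable
    0x02: RpcScope.DEBUG_READ_ONLY,  # e.g. dump a FW object
    0x03: RpcScope.DEBUG_WRITE,      # e.g. modify a debug setting
}


def validate_rpc(opcode: int, user_scope: RpcScope) -> bool:
    """Accept an RPC only if it is known and the caller's scope suffices."""
    required = MIN_SCOPE_BY_OPCODE.get(opcode)
    if required is None:
        return False  # unknown commands are rejected outright
    return user_scope >= required
```

Rejecting unknown opcodes by default keeps newly added FW commands out of reach until someone has classified them.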
218+
219+
The driver and FW must cooperate to ensure that either fwctl cannot allocate
220+
any FW resources, or any resources it does allocate are freed on FD closure. A
221+
driver primarily constructed around FW RPCs may find that its core PCI function
222+
and RPC layer belongs under fwctl with auxiliary devices connecting to other
223+
subsystems.
224+
225+
Each device type must be mindful of Linux's philosophy for stable ABI. The FW
226+
RPC interface does not have to meet a strictly stable ABI, but it does need to
227+
meet an expectation that userspace tools that are deployed and in significant
228+
use don't needlessly break. FW upgrade and kernel upgrade should keep widely
229+
deployed tooling working.
230+
231+
Development and debugging focused RPCs under more permissive scopes can have
232+
less stability if the tools using them are only run under exceptional
233+
circumstances and not for everyday use of the device. Debugging tools may even
234+
require exact version matching as they may require something similar to DWARF
235+
debug information from the FW binary.
236+
237+
Security Response
238+
=================
239+
240+
The kernel remains the gatekeeper for this interface. If violations of the
241+
scopes, security or isolation principles are found, we have options to let
242+
devices fix them with a FW update, push a kernel patch to parse and block RPC
243+
commands or push a kernel patch to block entire firmware versions/devices.
244+
245+
While the kernel can always directly parse and restrict RPCs, it is expected
246+
that the existing kernel pattern of allowing drivers to delegate validation to
247+
FW will be a useful design.
248+
249+
Existing Similar Examples
250+
=========================
251+
252+
The approach described in this document is not a new idea. Direct, or near
253+
direct device access has been offered by the kernel in different areas for
254+
decades. With more devices wanting to follow this design pattern it is becoming
255+
clear that it is not entirely well understood and, more importantly, the
256+
security considerations are not well defined or agreed upon.
257+
258+
Some examples:
259+
260+
- HW RAID controllers. This includes RPCs to do things like compose drives into
261+
a RAID volume, configure RAID parameters, monitor the HW and more.
262+
263+
- Baseboard managers. RPCs for configuring settings in the device and more
264+
265+
- NVMe vendor command capsules. nvme-cli provides access to some monitoring
266+
functions that different products have defined, but more exist.
267+
268+
- CXL also has a NVMe-like vendor command system.
269+
270+
- DRM allows user space drivers to send commands to the device via kernel
271+
mediation
272+
273+
- RDMA allows user space drivers to directly push commands to the device
274+
without kernel involvement
275+
276+
- Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.
277+
278+
The first 4 are examples of areas that fwctl intends to cover. The latter three
279+
are examples of denied behavior as they fully overlap with the primary purpose
280+
of a kernel subsystem.
281+
282+
A key lesson learned from these past efforts is the importance of having a
283+
common user space project to use as a pre-condition for obtaining a kernel
284+
driver. Developing a good community around useful software in user space is key to
285+
getting companies to fund participation to enable their products.
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
Firmware Control (FWCTL) Userspace API
4+
======================================
5+
6+
A framework that defines a common set of limited rules that allows user space
7+
to securely construct and execute RPCs inside device firmware.
8+
9+
.. toctree::
10+
:maxdepth: 1
11+
12+
fwctl

Documentation/userspace-api/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ place where this information is gathered.
2222
unshare
2323
spec_ctrl
2424
accelerators/ocxl
25+
fwctl/index
2526
ebpf/index
2627
ioctl/index
2728
iommu

Documentation/userspace-api/ioctl/ioctl-number.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -317,6 +317,7 @@ Code Seq# Include File Comments
317317
0x97 00-7F fs/ceph/ioctl.h Ceph file system
318318
0x99 00-0F 537-Addinboard driver
319319
<mailto:buk@buks.ipn.de>
320+
0x9A 00-0F include/uapi/fwctl/fwctl.h
320321
0xA0 all linux/sdp/sdp.h Industrial Device Project
321322
<mailto:kenji@bitgate.com>
322323
0xA1 0 linux/vtpm_proxy.h TPM Emulator Proxy Driver
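
To show what the 0x9A row reserves, the sketch below encodes ioctl request numbers for that code. It assumes the commands are built with the plain ``_IO()`` macro and the bit layout of ``asm-generic/ioctl.h`` (some architectures use different direction bits); the helper name ``fwctl_io`` is hypothetical.

```python
FWCTL_TYPE = 0x9A  # ioctl "Code" from the table row above

# Field positions from asm-generic/ioctl.h; some architectures differ.
_IOC_NRSHIFT = 0
_IOC_TYPESHIFT = 8
_IOC_SIZESHIFT = 16
_IOC_DIRSHIFT = 30
_IOC_NONE = 0


def fwctl_io(nr: int) -> int:
    """Encode _IO(FWCTL_TYPE, nr) for a sequence number in 0x00-0x0F."""
    if not 0x00 <= nr <= 0x0F:
        raise ValueError("sequence number outside the reserved 00-0F range")
    return ((_IOC_NONE << _IOC_DIRSHIFT) | (0 << _IOC_SIZESHIFT)
            | (FWCTL_TYPE << _IOC_TYPESHIFT) | (nr << _IOC_NRSHIFT))


if __name__ == "__main__":
    print(hex(fwctl_io(0)))
```

With no direction or size bits set, the request number reduces to the type byte followed by the sequence number.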

MAINTAINERS

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7976,6 +7976,24 @@ F: kernel/futex/*
79767976
F: tools/perf/bench/futex*
79777977
F: tools/testing/selftests/futex/
79787978

7979+
FWCTL SUBSYSTEM
7980+
M: Dave Jiang <dave.jiang@intel.com>
7981+
M: Jason Gunthorpe <jgg@nvidia.com>
7982+
M: Saeed Mahameed <saeedm@nvidia.com>
7983+
R: Jonathan Cameron <Jonathan.Cameron@huawei.com>
7984+
S: Maintained
7985+
F: Documentation/userspace-api/fwctl/
7986+
F: drivers/fwctl/
7987+
F: include/linux/fwctl.h
7988+
F: include/uapi/fwctl/
7989+
7990+
FWCTL MLX5 DRIVER
7991+
M: Saeed Mahameed <saeedm@nvidia.com>
7992+
R: Itay Avraham <itayavr@nvidia.com>
7993+
L: linux-kernel@vger.kernel.org
7994+
S: Maintained
7995+
F: drivers/fwctl/mlx5/
7996+
79797997
GATEWORKS SYSTEM CONTROLLER (GSC) DRIVER
79807998
M: Tim Harvey <tharvey@gateworks.com>
79817999
M: Robert Jones <rjones@gateworks.com>

drivers/Kconfig

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,8 @@ source "drivers/connector/Kconfig"
1919

2020
source "drivers/firmware/Kconfig"
2121

22+
source "drivers/fwctl/Kconfig"
23+
2224
source "drivers/gnss/Kconfig"
2325

2426
source "drivers/mtd/Kconfig"

drivers/Makefile

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -138,6 +138,7 @@ obj-$(CONFIG_MEMSTICK) += memstick/
138138
obj-$(CONFIG_NEW_LEDS) += leds/
139139
obj-$(CONFIG_INFINIBAND) += infiniband/
140140
obj-y += firmware/
141+
obj-$(CONFIG_FWCTL) += fwctl/
141142
obj-$(CONFIG_CRYPTO) += crypto/
142143
obj-$(CONFIG_SUPERH) += sh/
143144
obj-y += clocksource/

drivers/fwctl/Kconfig

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
# SPDX-License-Identifier: GPL-2.0-only
2+
menuconfig FWCTL
3+
tristate "fwctl device firmware access framework"
4+
help
5+
fwctl provides a userspace API for restricted access to communicate
6+
with on-device firmware. The communication channel is intended to
7+
support a wide range of lockdown compatible device behaviors including
8+
manipulating device FLASH, debugging, and other activities that don't
9+
fit neatly into an existing subsystem.
10+
11+
if FWCTL
12+
config FWCTL_MLX5
13+
tristate "mlx5 ConnectX control fwctl driver"
14+
depends on MLX5_CORE
15+
help
16+
MLX5 provides an interface for the user process to access the debug and
17+
configuration registers of the ConnectX hardware family
18+
(NICs, PCI switches and SmartNIC SoCs).
19+
This will allow configuration and debug tools to work out of the box on
20+
a mainstream kernel.
21+
22+
If you don't know what to do here, say N.
23+
endif
