diff --git a/Documentation/libibverbs.md b/Documentation/libibverbs.md
index 980f354a3..902b2d45d 100644
--- a/Documentation/libibverbs.md
+++ b/Documentation/libibverbs.md
@@ -1,10 +1,9 @@
 # Introduction
 
-libibverbs is a library that allows programs to use RDMA "verbs" for
-direct access to RDMA (currently InfiniBand and iWARP) hardware from
-userspace. For more information on RDMA verbs, see the InfiniBand
-Architecture Specification vol. 1, especially chapter 11, and the RDMA
-Consortium's RDMA Protocol Verbs Specification.
+libibverbs is a library that allows userspace programs direct
+access to high-performance network hardware. See the Verbs
+Semantics section at the end of this document for details
+on RDMA and verbs constructs.
 
 # Using libibverbs
@@ -74,3 +73,305 @@ The following table describes the expected behavior when VERBS_LOG_LEVEL is set:
 |-----------------|---------------------------------|------------------------------------------------|
 | Regular prints  | Output to VERBS_LOG_FILE if set | Output to VERBS_LOG_FILE, or stderr if not set |
 | Datapath prints | Compiled out, no output         | Output to VERBS_LOG_FILE, or stderr if not set |
+
+
+# Verbs Semantics
+
+Verbs is defined by the InfiniBand Architecture Specification
+(vol. 1, chapter 11) as an abstract definition of the functionality
+provided by an InfiniBand NIC. libibverbs was designed as a formal
+software API matching that abstraction. As a result, API names,
+including the library name itself, closely follow those defined
+for InfiniBand.
+
+However, the library and API have evolved to support additional
+high-performance transports and NICs. libibverbs constructs have
+expanded beyond their traditional roles and definitions, although
+the original InfiniBand naming has been kept for backwards
+compatibility.
+
+Today, verbs can be viewed as defining software primitives for
+network hardware supporting one or more of the following:
+
+- Network queues are directly accessible from user space.
+- Network hardware can directly access application memory buffers.
+- The transport supports RDMA operations.
+
+The following sections describe select libibverbs constructs in terms
+of their current semantics and, where appropriate, historical context.
+Items are ordered conceptually.
+
+*RDMA*
+: RDMA stands for remote direct memory access, and the term takes on
+  several different meanings based on context, as described below.
+  Historically, RDMA referred to network operations which could
+  directly read or write application data buffers at the target.
+  The use of the term RDMA has since evolved to encompass not just
+  network operations, but also the key features of such devices:
+
+  - Zero-copy: no intermediate buffering
+  - Low CPU utilization: transport offload
+  - High bandwidth and low latency
+
+*RDMA Verbs*
+: RDMA verbs is the more generic name given to the libibverbs API,
+  as it implies support for transports beyond InfiniBand. A device
+  which supports RDMA verbs is accessible through this library.
+
+  A common but narrower industry use of the term RDMA verbs implies
+  the subset of libibverbs APIs and semantics focused on reliable-
+  connected communication. This document uses the term RDMA verbs as
+  a synonym for the libibverbs API as a whole.
+
+*RDMA-Core*
+: rdma-core is a set of libraries for interfacing with the Linux
+  kernel RDMA subsystem. Two key rdma-core libraries are this one,
+  libibverbs, and librdmacm, which is used to establish connections.
+
+  rdma-core is considered an essential component of Linux RDMA.
+  It ensures that the kernel ABI remains stable and implements the
+  userspace portion of the kernel RDMA ioctl API.
+
+*RDMA Device / Verbs Device / NIC*
+: An RDMA or verbs device is one which is accessible through the Linux
+  RDMA subsystem, and as a result, plugs into the libibverbs and
+  rdma-core framework. NICs plug into the RDMA subsystem to expose
+  hardware primitives supported by verbs (described above) or
+  RDMA-like features.
+
+  NICs do not necessarily need to support RDMA operations or transports
+  in order to leverage the rdma-core infrastructure. It is sufficient
+  for a NIC to expose features similar to those found in RDMA devices.
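+
+  As a brief illustration, the following sketch (error handling mostly
+  omitted) shows how an application discovers and opens a verbs device:
+
+  ```c
+  #include <infiniband/verbs.h>
+
+  /* Open the first available verbs device and return its context. */
+  struct ibv_context *open_first_device(void)
+  {
+      int num;
+      struct ibv_device **list = ibv_get_device_list(&num);
+      struct ibv_context *ctx = NULL;
+
+      if (list && num > 0)
+          ctx = ibv_open_device(list[0]);
+      ibv_free_device_list(list);
+      return ctx;
+  }
+  ```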
+
+*RDMA Operation*
+: RDMA operations refer to network transport functions that read or
+  write data buffers at the target without involving the target's CPU.
+  RDMA reads copy data from a remote memory region to the network and
+  return the data to the initiator of the request. RDMA writes copy
+  data from a local memory region to the network and place it directly
+  into a memory region at the target.
+
+*RDMA Transport*
+: An RDMA transport can be considered any transport that supports RDMA
+  operations. Common RDMA transports include InfiniBand,
+  RoCE (RDMA over Converged Ethernet), RoCE version 2, and iWARP. RoCE
+  and RoCEv2 are InfiniBand transports over the Ethernet link layer,
+  with differences only in their lower-level addressing.
+  The term InfiniBand, however, usually refers to the InfiniBand
+  transport over the InfiniBand link layer, while RoCE is used when
+  explicitly referring to Ethernet-based solutions. RoCE version 2 is
+  often included or implied by references to RoCE.
+
+*Device Node*
+: The original intent of the device node type was to identify whether
+  an InfiniBand device was a NIC, switch, or router. InfiniBand NICs
+  were labeled as channel adapters (CA). Node type was extended to
+  identify the transport being manipulated by verb primitives. Devices
+  which implemented other transports were assigned new node types. As a
+  result, applications which targeted a specific transport, such as
+  InfiniBand or RoCE, relied on node type to indirectly identify the
+  transport.
+
+*Protection Domain (PD)*
+: A protection domain provides process-level isolation of resources and
+  is considered a fundamental security construct for Linux RDMA
+  devices. A PD defines a boundary between memory regions and queue
+  pairs. A network data transfer is associated with a single queue
+  pair. That queue pair may only access a memory region that shares the
+  same protection domain as itself. This prevents a user space process
+  from accessing memory buffers outside of its address space.
+
+  Protection domains provide security for regions accessed
+  by both local and remote operations. Local access includes work
+  requests posted to HW command queues which reference memory regions.
+  Remote access includes RDMA operations which read or write memory
+  regions.
+
+  A queue pair is associated with a single PD. The PD is used to verify
+  that hardware access to a given lkey or rkey is valid for the
+  specified QP and that the initiating or targeted process has
+  permission to use the lkey or rkey. Vendors may implement a PD using
+  a variety of mechanisms, but are required to meet the defined
+  security isolation.
+
+*Memory Region (MR)*
+: A memory region identifies a virtual address range known to the NIC.
+  MRs are registered address ranges accessible by the NIC for local and
+  remote operations. The process of creating an MR associates the given
+  virtual address range with a protection domain, in order to ensure
+  process-level isolation.
+
+  Once an MR has been registered, data transfers reference it using a
+  key value (lkey and/or rkey). When accessing an MR as part of a data
+  transfer, an offset into the memory region is specified. The offset
+  is relative to the start of the region and may either be 0-based or
+  based on the region's starting virtual address.
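+
+  A minimal sketch of allocating a PD and registering a buffer for both
+  local and remote access (the returned keys are in the ibv_mr):
+
+  ```c
+  #include <infiniband/verbs.h>
+
+  /* Register buf within a new PD; mr->lkey / mr->rkey key the region. */
+  struct ibv_mr *register_buffer(struct ibv_context *ctx,
+                                 void *buf, size_t len)
+  {
+      struct ibv_pd *pd = ibv_alloc_pd(ctx);
+
+      if (!pd)
+          return NULL;
+      return ibv_reg_mr(pd, buf, len,
+                        IBV_ACCESS_LOCAL_WRITE |
+                        IBV_ACCESS_REMOTE_READ |
+                        IBV_ACCESS_REMOTE_WRITE);
+  }
+  ```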
+
+*lkey*
+: The lkey is designed as a hardware identifier for a locally accessed
+  data buffer. Because work requests are formatted by user space
+  software and may be written directly to hardware queues, hardware
+  must validate that the memory buffers being referenced are accessible
+  to the application.
+
+  NIC hardware may not have access to the operating system's
+  virtual address translation tables. Instead, hardware can use the
+  lkey to identify the registered memory region, which in turn
+  identifies a protection domain, which finally identifies the calling
+  process. The protection domain of the processing queue pair must
+  match that of the accessed memory region. This prevents an
+  application from sending data from buffers outside of its virtual
+  address space.
+
+*rkey*
+: The rkey is designed as a transport identifier for remotely accessed
+  data buffers. It is conceptually similar to an lkey, but its value is
+  shared across the network and is associated with transport
+  permissions.
+
+*Completion Queue (CQ)*
+: A completion queue is designed to represent a hardware queue where
+  the status of asynchronous operations is reported. Each asynchronous
+  operation (i.e., a data transfer) is expected to write a single entry
+  into the completion queue.
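+
+  As an example, the sketch below creates a CQ and drains any available
+  completions from it:
+
+  ```c
+  #include <infiniband/verbs.h>
+
+  /* Create a 64-entry CQ, then reap completions until it is empty.
+   * Returns the number of successful completions, or -1 on error. */
+  int create_and_drain_cq(struct ibv_context *ctx)
+  {
+      struct ibv_cq *cq = ibv_create_cq(ctx, 64, NULL, NULL, 0);
+      struct ibv_wc wc;
+      int n, total = 0;
+
+      if (!cq)
+          return -1;
+      while ((n = ibv_poll_cq(cq, 1, &wc)) > 0) {
+          if (wc.status != IBV_WC_SUCCESS)
+              return -1;    /* inspect wc.status for the cause */
+          total++;
+      }
+      return n < 0 ? -1 : total;
+  }
+  ```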
+
+*Queue Pair (QP)*
+: A queue pair was originally defined as a transport-addressable set of
+  hardware queues, with a QP consisting of send and receive queues
+  (defined below). The evolved definition of a QP refers only to the
+  transport addressability of an endpoint. A QP's address is identified
+  by a queue pair number (QPN), which is conceptually similar to a
+  transport port number. In networking stack models, a QP is considered
+  a transport layer object.
+
+  The internal structure of the QP is not constrained to a pair of
+  queues. The number of hardware queues and their purpose may vary
+  based on how the QP is configured. A QP may have zero or more command
+  queues used for posting data transfer requests (send queues) and zero
+  or more command queues for posting data buffers used to receive
+  incoming messages (receive queues).
+
+*Receive Queue (RQ)*
+: Receive queues are command queues belonging to queue pairs. Receive
+  commands post application buffers to receive incoming data.
+
+  Receive queues are configured as part of queue pair setup. An RQ is
+  accessed indirectly through the QP when submitting receive work
+  requests.
+
+*Shared Receive Queue (SRQ)*
+: A shared receive queue is a single hardware command queue for posting
+  buffers to receive incoming data. This command queue may be shared
+  among multiple QPs, so that a message arriving on any associated QP
+  may retrieve a previously posted buffer from the SRQ. QPs that share
+  the same SRQ coordinate their access to posted buffers such that a
+  single posted operation is matched with a single incoming message.
+
+  Unlike receive queues, SRQs are accessed directly by applications to
+  submit receive work requests.
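+
+  A sketch of creating an RC QP whose send and receive queues share a
+  single CQ:
+
+  ```c
+  #include <infiniband/verbs.h>
+
+  /* Create a reliable-connected QP with modest queue depths. */
+  struct ibv_qp *create_rc_qp(struct ibv_pd *pd, struct ibv_cq *cq)
+  {
+      struct ibv_qp_init_attr attr = {
+          .send_cq = cq,
+          .recv_cq = cq,
+          .cap = {
+              .max_send_wr = 64,    /* send queue depth */
+              .max_recv_wr = 64,    /* receive queue depth */
+              .max_send_sge = 1,
+              .max_recv_sge = 1,
+          },
+          .qp_type = IBV_QPT_RC,
+      };
+
+      return ibv_create_qp(pd, &attr);
+  }
+  ```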
+
+*Send Queue (SQ)*
+: A send queue is, more generically, a transmit queue: a command queue
+  for operations that initiate a network transfer. A send queue may
+  also be used to submit commands that update hardware resources, such
+  as memory regions. Network operations submitted through the send
+  queue include message sends, RDMA reads, RDMA writes, and atomic
+  operations, among others.
+
+  Send queues are configured as part of queue pair setup. An SQ is
+  accessed indirectly through the QP when submitting send work requests.
+
+*Send Message*
+: A send message refers to a specific type of transport data transfer.
+  A send message operation copies data from a local buffer to the
+  network and transfers the data as a single transport unit. The
+  receiving NIC copies the data from the network into one or more
+  user-posted receive buffers.
+
+  Like the term RDMA, the meaning of send is context-dependent. Send
+  could refer to the transmit command queue, any operation posted to
+  the transmit (send) queue, or a send message operation.
+
+*Work Request (WR)*
+: A work request is a command submitted to a queue pair, work queue, or
+  shared receive queue. Work requests specify the type of network
+  operation to perform, along with references to any memory regions the
+  operation will access.
+
+  A send work request is a transmit operation that is directed to the
+  send queue of a queue pair. A receive work request is an operation
+  posted to either a shared receive queue or a QP's receive queue.
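+
+  A sketch of posting an RDMA write work request to a QP's send queue;
+  the lkey validates local access while the rkey names the remote MR:
+
+  ```c
+  #include <stdint.h>
+  #include <infiniband/verbs.h>
+
+  /* Post a signaled RDMA write of a registered local buffer. */
+  int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr, void *buf,
+                      uint32_t len, uint64_t remote_addr, uint32_t rkey)
+  {
+      struct ibv_sge sge = {
+          .addr = (uintptr_t)buf,
+          .length = len,
+          .lkey = mr->lkey,
+      };
+      struct ibv_send_wr wr = {
+          .wr_id = 1,
+          .sg_list = &sge,
+          .num_sge = 1,
+          .opcode = IBV_WR_RDMA_WRITE,
+          .send_flags = IBV_SEND_SIGNALED,  /* request a completion */
+      };
+      struct ibv_send_wr *bad_wr;
+
+      wr.wr.rdma.remote_addr = remote_addr;
+      wr.wr.rdma.rkey = rkey;
+      return ibv_post_send(qp, &wr, &bad_wr);
+  }
+  ```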
+
+*Address Handle (AH)*
+: An address handle identifies the link and/or network layer addressing
+  used to reach a network port or multicast group.
+
+  With legacy InfiniBand, an address handle is a link-layer object. For
+  other transports, including RoCE, the address handle is a
+  network-layer object.
+
+*Global Identifier (GID)*
+: InfiniBand defines a GID as an optional network-layer or multicast
+  address. Because GIDs are large enough to store an IPv6 address,
+  their use has evolved to support other transports. A GID identifies a
+  network port, with the best-known GID formats being IPv4 and IPv6
+  addresses.
+
+*GID Type*
+: The GID type determines the specific type of GID address being
+  referenced. Additionally, it identifies the set of addressing headers
+  underneath the transport header.
+
+  An RDMA transport protocol may be layered over different networking
+  stacks. An RDMA transport may layer directly over a link layer (such
+  as InfiniBand or Ethernet), over the network layer (such as IP), or
+  over another transport layer (such as TCP or UDP). The GID type
+  conveys how the RDMA transport stack is constructed, as well as how
+  the GID address is interpreted.
+
+*GID Index*
+: RDMA addresses are securely managed to ensure that unprivileged
+  applications do not inject arbitrary source addresses into the
+  network. Transport addresses are injected by the queue pair. Network
+  addresses are selected from a set of addresses stored in a source
+  addressing table.
+
+  The source addressing table is referred to as a GID table. The GID
+  index identifies an entry in that table. The GID table exposed to a
+  user space process contains only those addresses usable by that
+  process. Queue pairs are frequently assigned a specific GID index to
+  use for their source network address when initially configured.
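+
+  For example, an application can inspect a GID table entry and its
+  type with ibv_query_gid_ex() (the sketch below assumes port 1):
+
+  ```c
+  #include <stdio.h>
+  #include <infiniband/verbs.h>
+
+  /* Print the type of the GID at gid_index in port 1's GID table. */
+  int show_gid(struct ibv_context *ctx, uint32_t gid_index)
+  {
+      struct ibv_gid_entry entry;
+      int ret = ibv_query_gid_ex(ctx, 1, gid_index, &entry, 0);
+
+      if (ret)
+          return ret;
+      printf("gid_index %u: gid_type %d netdev ifindex %u\n",
+             entry.gid_index, entry.gid_type, entry.ndev_ifindex);
+      return 0;
+  }
+  ```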
+
+*Device Context*
+: Identifies an instance of an opened RDMA device.
+
+*command fd - cmd_fd*
+: File descriptor used to communicate with the kernel device driver.
+  Associated with the device context and opened by the library.
+  The cmd_fd communicates with the kernel via ioctls and is used
+  to allocate, configure, and release device resources.
+
+  Applications interact with the cmd_fd indirectly through libibverbs
+  function calls.
+
+*async_fd*
+: File descriptor used to report asynchronous events.
+  Associated with the device context and opened by the library.
+
+  Applications may interact directly with the async_fd, for example by
+  waiting on the fd via select() or poll(), to receive notifications
+  when an async event has been reported.
+
+*Job ID*
+: A job ID identifies a single distributed application. The job object
+  is a device-level object that maps to a job ID and may be shared
+  between processes. The configuration of a job object, such as
+  assigning its job ID value, is considered a privileged operation.
+
+  Multiple job objects, each assigned the same job ID value, may be
+  needed to represent a single, higher-level logical job running on the
+  network. This may be necessary for jobs that span multiple RDMA
+  devices, for example, where each job object may be configured for
+  different source addressing.
+
+*Job Key*
+: A job key associates a job object with a specific protection domain.
+  This provides secure access to the actual job ID value stored with
+  the job object, while restricting which memory regions data transfers
+  to and from that job may access.
+
+*Address Table*
+: An address table is a virtual address array associated with a job
+  object. The address table allows local processes that belong to the
+  same job to share addressing and scalable encryption information for
+  peer QPs.
+
+  The address table is an optional but integrated component of a job
+  object.
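+
+  A sketch of how these objects fit together, based on the declarations
+  this patch adds to verbs.h (flags, values, and error handling here
+  are illustrative only; exact semantics are device-specific):
+
+  ```c
+  #include <infiniband/verbs.h>
+
+  /* Create a job object and a PD-scoped jkey, then publish one peer
+   * address at index 0 of the job's address table. */
+  int setup_job(struct ibv_context *ctx, struct ibv_pd *pd,
+                struct ibv_ah_attr *peer, uint32_t peer_qpn)
+  {
+      struct ibv_job_attr attr = {
+          .id = 0x1234,           /* assigning the job ID is privileged */
+          .max_addr_entries = 16,
+          .qp_type = IBV_QPT_RU,
+      };
+      struct ibv_job *job = ibv_alloc_job(ctx, &attr, NULL);
+      struct ibv_job_key *jkey;
+
+      if (!job)
+          return -1;
+      jkey = ibv_create_jkey(pd, job, 0);  /* scope the job to this PD */
+      if (!jkey)
+          return -1;
+      return ibv_insert_addr(job, peer_qpn, *peer, 0, 0);
+  }
+  ```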
diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h
index 821341242..bcf0f3ab7 100644
--- a/libibverbs/verbs.h
+++ b/libibverbs/verbs.h
@@ -74,6 +74,8 @@ enum ibv_gid_type {
 	IBV_GID_TYPE_IB,
 	IBV_GID_TYPE_ROCE_V1,
 	IBV_GID_TYPE_ROCE_V2,
+	IBV_GID_TYPE_UET_UDP,
+	IBV_GID_TYPE_UET_IP,
 };
 
 struct ibv_gid_entry {
@@ -112,6 +114,46 @@ enum ibv_transport_type {
 	IBV_TRANSPORT_UNSPECIFIED,
 };
 
+enum ibv_qp_msg_order {
+	/* Atomic-Atomic Rd/Wr ordering */
+	IBV_ORDER_ATOMIC_RAR = (1 << 0),
+	IBV_ORDER_ATOMIC_RAW = (1 << 1),
+	IBV_ORDER_ATOMIC_WAR = (1 << 2),
+	IBV_ORDER_ATOMIC_WAW = (1 << 3),
+	/* RDMA-RDMA Rd/Wr ordering */
+	IBV_ORDER_RDMA_RAR = (1 << 4),
+	IBV_ORDER_RDMA_RAW = (1 << 5),
+	IBV_ORDER_RDMA_WAR = (1 << 6),
+	IBV_ORDER_RDMA_WAW = (1 << 7),
+	/* Send ordering wrt Atomic and RDMA Rd/Wr */
+	IBV_ORDER_RAS = (1 << 8),
+	IBV_ORDER_SAR = (1 << 9),
+	IBV_ORDER_SAS = (1 << 10),
+	IBV_ORDER_SAW = (1 << 11),
+	IBV_ORDER_WAS = (1 << 12),
+	/* Combined Atomic and RDMA Rd/Wr ordering */
+	IBV_ORDER_RAR = (1 << 13),
+	IBV_ORDER_RAW = (1 << 14),
+	IBV_ORDER_WAR = (1 << 15),
+	IBV_ORDER_WAW = (1 << 16),
+};
+
+enum ibv_qp_use_flags {
+	IBV_QP_USAGE_IMM_DATA_RQ = (1 << 0),
+	IBV_QP_USAGE_ATTACH_MR = (1 << 1),
+};
+
+struct ibv_qp_semantics {
+	uint32_t comp_mask;
+	uint32_t msg_order;		/* use enum ibv_qp_msg_order */
+	uint32_t max_rdma_raw_size;
+	uint32_t max_rdma_war_size;
+	uint32_t max_rdma_waw_size;
+	uint32_t max_pdu;
+	uint8_t imm_data_size;
+	unsigned int usage_flags;	/* use enum ibv_qp_use_flags */
+};
+
 enum ibv_device_cap_flags {
 	IBV_DEVICE_RESIZE_MAX_WR = 1,
 	IBV_DEVICE_BAD_PKEY_CNTR = 1 << 1,
@@ -151,6 +193,7 @@ enum ibv_fork_status {
  */
 #define IBV_DEVICE_RAW_SCATTER_FCS (1ULL << 34)
 #define IBV_DEVICE_PCI_WRITE_END_PADDING (1ULL << 36)
+#define IBV_DEVICE_USER_RKEY (1ULL << 37)
 
 enum ibv_atomic_cap {
 	IBV_ATOMIC_NONE,
@@ -361,6 +404,10 @@ struct ibv_device_attr_ex {
 	struct ibv_pci_atomic_caps pci_atomic_caps;
 	uint32_t xrc_odp_caps;
 	uint32_t phys_port_cnt_ex;
+	uint32_t max_job_ids;
+	uint32_t max_addr_entries;
+	uint32_t max_jkeys_per_pd;
+	uint16_t max_rwq_per_qp;
 };
 
 enum ibv_mtu {
@@ -557,6 +604,8 @@ enum ibv_create_cq_wc_flags {
 	IBV_WC_EX_WITH_FLOW_TAG = 1 << 9,
 	IBV_WC_EX_WITH_TM_INFO = 1 << 10,
 	IBV_WC_EX_WITH_COMPLETION_TIMESTAMP_WALLCLOCK = 1 << 11,
+	IBV_WC_EX_WITH_IMM64 = 1 << 12,
+	IBV_WC_EX_WITH_SRC_ID = 1 << 13,	/* implies job ID */
 };
 
 enum {
@@ -679,6 +728,7 @@ struct ibv_mr {
 	uint32_t handle;
 	uint32_t lkey;
 	uint32_t rkey;
+	uint64_t rkey64;
 };
 
 enum ibv_mr_init_attr_mask {
@@ -687,6 +737,10 @@ enum ibv_mr_init_attr_mask {
 	IBV_REG_MR_MASK_FD = 1 << 2,
 	IBV_REG_MR_MASK_FD_OFFSET = 1 << 3,
 	IBV_REG_MR_MASK_DMAH = 1 << 4,
+	IBV_REG_MR_MASK_JKEY = 1 << 5,
+	IBV_REG_MR_MASK_RKEY = 1 << 6,
+	IBV_REG_MR_MASK_CUR_MR = 1 << 7,
+	IBV_REG_MR_MASK_DERIVE_CNT = 1 << 8,
 };
 
 struct ibv_mr_init_attr {
@@ -698,6 +752,10 @@ struct ibv_mr_init_attr {
 	int fd;
 	uint64_t fd_offset;
 	struct ibv_dmah *dmah;
+	struct ibv_job_key *jkey;
+	uint64_t rkey;
+	struct ibv_mr *cur_mr;
+	uint32_t derive_cnt;
 };
 
 enum ibv_mw_type {
@@ -849,6 +907,7 @@ enum ibv_wq_type {
 
 enum ibv_wq_init_attr_mask {
 	IBV_WQ_INIT_ATTR_FLAGS = 1 << 0,
 	IBV_WQ_INIT_ATTR_RESERVED = 1 << 1,
+	IBV_WQ_INIT_ATTR_WQ_NUM = 1 << 2,
 };
 
 enum ibv_wq_flags {
@@ -868,6 +927,7 @@ struct ibv_wq_init_attr {
 	struct ibv_cq *cq;
 	uint32_t comp_mask;		/* Use ibv_wq_init_attr_mask */
 	uint32_t create_flags;		/* use ibv_wq_flags */
+	uint32_t wq_num;
 };
 
 enum ibv_wq_state {
@@ -930,6 +990,7 @@ enum ibv_qp_type {
 	IBV_QPT_RAW_PACKET = 8,
 	IBV_QPT_XRC_SEND = 9,
 	IBV_QPT_XRC_RECV,
+	IBV_QPT_RU,
 	IBV_QPT_DRIVER = 0xff,
 };
 
@@ -959,6 +1020,9 @@ enum ibv_qp_init_attr_mask {
 	IBV_QP_INIT_ATTR_IND_TABLE = 1 << 4,
 	IBV_QP_INIT_ATTR_RX_HASH = 1 << 5,
 	IBV_QP_INIT_ATTR_SEND_OPS_FLAGS = 1 << 6,
+	IBV_QP_INIT_ATTR_QP_ATTR = 1 << 7,
+	IBV_QP_INIT_ATTR_QP_SEMANTICS = 1 << 8,
+	IBV_QP_INIT_ATTR_SRC_ID = 1 << 9,
 };
 
 enum ibv_qp_create_flags {
@@ -1013,6 +1077,11 @@ struct ibv_qp_init_attr_ex {
 	uint32_t source_qpn;
 	/* See enum ibv_qp_create_send_ops_flags */
 	uint64_t send_ops_flags;
+
+	struct ibv_qp_attr *qp_attr;
+	int qp_attr_mask;
+	struct ibv_qp_semantics *qp_semantics;
+	uint32_t src_id;
 };
 
 enum ibv_qp_open_attr_mask {
@@ -1150,7 +1219,8 @@ enum ibv_send_flags {
 	IBV_SEND_SIGNALED = 1 << 1,
 	IBV_SEND_SOLICITED = 1 << 2,
 	IBV_SEND_INLINE = 1 << 3,
-	IBV_SEND_IP_CSUM = 1 << 4
+	IBV_SEND_IP_CSUM = 1 << 4,
+	IBV_SEND_DELIVERY_COMPLETE = 1 << 5,
 };
 
 enum ibv_placement_type {
@@ -1380,6 +1450,19 @@ struct ibv_qp_ex {
 	void (*wr_flush)(struct ibv_qp_ex *qp, uint32_t rkey,
 			 uint64_t remote_addr, size_t len,
 			 uint8_t type, uint8_t level);
+
+	void (*wr_send_imm64)(struct ibv_qp_ex *qp, __be64 imm_data);
+	void (*wr_rdma_read64)(struct ibv_qp_ex *qp, uint64_t rkey,
+			       uint64_t remote_addr);
+	void (*wr_rdma_write64)(struct ibv_qp_ex *qp, uint64_t rkey,
+				uint64_t remote_addr);
+	void (*wr_rdma_write64_imm)(struct ibv_qp_ex *qp, uint64_t rkey,
+				    uint64_t remote_addr, __be64 imm_data);
+	void (*wr_set_ru_addr)(struct ibv_qp_ex *qp, struct ibv_ah *ah,
+			       uint32_t remote_qpn, uint32_t jkey);
+	void (*wr_set_job_addr)(struct ibv_qp_ex *qp, unsigned int addr_idx,
+				uint32_t jkey);
+	void (*wr_set_wq_num)(struct ibv_qp_ex *qp, uint32_t wq_num);
 };
 
 struct ibv_qp_ex *ibv_qp_to_qp_ex(struct ibv_qp *qp);
@@ -1416,12 +1499,24 @@ static inline void ibv_wr_rdma_read(struct ibv_qp_ex *qp, uint32_t rkey,
 	qp->wr_rdma_read(qp, rkey, remote_addr);
 }
 
+static inline void ibv_wr_rdma_read64(struct ibv_qp_ex *qp, uint64_t rkey,
+				      uint64_t remote_addr)
+{
+	qp->wr_rdma_read64(qp, rkey, remote_addr);
+}
+
 static inline void ibv_wr_rdma_write(struct ibv_qp_ex *qp, uint32_t rkey,
 				     uint64_t remote_addr)
 {
 	qp->wr_rdma_write(qp, rkey, remote_addr);
 }
 
+static inline void ibv_wr_rdma_write64(struct ibv_qp_ex *qp, uint64_t rkey,
+				       uint64_t remote_addr)
+{
+	qp->wr_rdma_write64(qp, rkey, remote_addr);
+}
+
 static inline void ibv_wr_flush(struct ibv_qp_ex *qp, uint32_t rkey,
 				uint64_t remote_addr, size_t len,
 				uint8_t type, uint8_t level)
@@ -1435,6 +1530,12 @@ static inline void ibv_wr_rdma_write_imm(struct ibv_qp_ex *qp, uint32_t rkey,
 	qp->wr_rdma_write_imm(qp, rkey, remote_addr, imm_data);
 }
 
+static inline void ibv_wr_rdma_write64_imm(struct ibv_qp_ex *qp, uint64_t rkey,
+					   uint64_t remote_addr,
+					   __be64 imm_data)
+{
+	qp->wr_rdma_write64_imm(qp, rkey, remote_addr, imm_data);
+}
+
 static inline void ibv_wr_send(struct ibv_qp_ex *qp)
 {
 	qp->wr_send(qp);
@@ -1445,6 +1546,11 @@ static inline void ibv_wr_send_imm(struct ibv_qp_ex *qp, __be32 imm_data)
 	qp->wr_send_imm(qp, imm_data);
 }
 
+static inline void ibv_wr_send_imm64(struct ibv_qp_ex *qp, __be64 imm_data)
+{
+	qp->wr_send_imm64(qp, imm_data);
+}
+
 static inline void ibv_wr_send_inv(struct ibv_qp_ex *qp,
 				   uint32_t invalidate_rkey)
 {
@@ -1463,6 +1569,25 @@ static inline void ibv_wr_set_ud_addr(struct ibv_qp_ex *qp, struct ibv_ah *ah,
 	qp->wr_set_ud_addr(qp, ah, remote_qpn, remote_qkey);
 }
 
+static inline void ibv_wr_set_ru_addr(struct ibv_qp_ex *qp, struct ibv_ah *ah,
+				      uint32_t remote_qpn, uint32_t jkey)
+{
+	qp->wr_set_ru_addr(qp, ah, remote_qpn, jkey);
+}
+
+static inline void ibv_wr_set_job_addr(struct ibv_qp_ex *qp,
+				       unsigned int addr_idx,
+				       uint32_t jkey)
+{
+	qp->wr_set_job_addr(qp, addr_idx, jkey);
+}
+
+static inline void ibv_wr_set_wq_num(struct ibv_qp_ex *qp,
+				     uint32_t wq_num)
+{
+	qp->wr_set_wq_num(qp, wq_num);
+}
+
 static inline void ibv_wr_set_xrc_srqn(struct ibv_qp_ex *qp,
 				       uint32_t remote_srqn)
 {
@@ -1593,6 +1718,9 @@ struct ibv_cq_ex {
 	void (*read_tm_info)(struct ibv_cq_ex *current,
 			     struct ibv_wc_tm_info *tm_info);
 	uint64_t (*read_completion_wallclock_ns)(struct ibv_cq_ex *current);
+	__be64 (*read_imm64_data)(struct ibv_cq_ex *current);
+	uint64_t (*read_job_id)(struct ibv_cq_ex *current);
+	uint32_t (*read_src_id)(struct ibv_cq_ex *current);
 };
 
 static inline struct ibv_cq *ibv_cq_ex_to_cq(struct ibv_cq_ex *cq)
@@ -1651,6 +1779,11 @@ static inline __be32 ibv_wc_read_imm_data(struct ibv_cq_ex *cq)
 	return cq->read_imm_data(cq);
 }
 
+static inline __be64 ibv_wc_read_imm64_data(struct ibv_cq_ex *cq)
+{
+	return cq->read_imm64_data(cq);
+}
+
 static inline uint32_t ibv_wc_read_invalidated_rkey(struct ibv_cq_ex *cq)
 {
 #ifdef __CHECKER__
@@ -1670,6 +1803,16 @@ static inline uint32_t ibv_wc_read_src_qp(struct ibv_cq_ex *cq)
 	return cq->read_src_qp(cq);
 }
 
+static inline uint64_t ibv_wc_read_job_id(struct ibv_cq_ex *cq)
+{
+	return cq->read_job_id(cq);
+}
+
+static inline uint32_t ibv_wc_read_src_id(struct ibv_cq_ex *cq)
+{
+	return cq->read_src_id(cq);
+}
+
 static inline unsigned int ibv_wc_read_wc_flags(struct ibv_cq_ex *cq)
 {
 	return cq->read_wc_flags(cq);
@@ -1993,6 +2136,51 @@ struct ibv_flow_action_esp_attr {
 	uint32_t esn;
 };
 
+struct ibv_job {
+	struct ibv_context *context;
+	void *user_context;
+	uint32_t handle;
+};
+
+struct ibv_job_attr {
+	uint32_t comp_mask;
+	unsigned int flags;
+	uint64_t id;
+	uint32_t max_addr_entries;
+	enum ibv_qp_type qp_type;
+	struct ibv_ah_attr ah_attr;
+};
+
+struct ibv_job *
+ibv_alloc_job(struct ibv_context *context, struct ibv_job_attr *attr,
+	      void *user_context);
+int ibv_close_job(struct ibv_job *job);
+
+int ibv_insert_addr(struct ibv_job *job, uint32_t qpn,
+		    struct ibv_ah_attr ah_attr,
+		    unsigned int addr_idx, unsigned int flags);
+int ibv_remove_addr(struct ibv_job *job, unsigned int addr_idx,
+		    unsigned int flags);
+int ibv_query_addr(struct ibv_job *job, unsigned int addr_idx,
+		   uint32_t *qpn, struct ibv_ah_attr *ah_attr,
+		   unsigned int flags);
+
+int ibv_export_job(struct ibv_job *job, int *fd);
+int ibv_import_job(struct ibv_context *context, int fd,
+		   struct ibv_job **job);
+
+int ibv_query_job(struct ibv_job *job, struct ibv_job_attr *attr);
+
+struct ibv_job_key {
+	struct ibv_pd *pd;
+	uint32_t handle;
+	uint32_t jkey;
+};
+
+struct ibv_job_key *
+ibv_create_jkey(struct ibv_pd *pd, struct ibv_job *job, unsigned int flags);
+int ibv_destroy_jkey(struct ibv_job_key *job_key);
+
 struct ibv_device;
 struct ibv_context;
 
@@ -2182,6 +2370,13 @@ struct ibv_values_ex {
 
 struct verbs_context {
 	/* "grows up" - new fields go here */
+	int (*attach_mr)(struct ibv_qp *qp, struct ibv_mr *mr);
+	int (*detach_mr)(struct ibv_qp *qp, struct ibv_mr *mr);
+	int (*query_qp_semantics)(struct ibv_context *context,
+				  enum ibv_qp_type qp_type,
+				  struct ibv_ah_attr *ah_attr,
+				  struct ibv_qp_semantics *qp_semantics,
+				  size_t qp_semantics_len);
 	struct ibv_mr *(*reg_mr_ex)(struct ibv_pd *pd,
 				    struct ibv_mr_init_attr *mr_init_attr);
 	int (*dealloc_dmah)(struct ibv_dmah *dmah);
@@ -2524,6 +2719,21 @@ int ibv_query_pkey(struct ibv_context *context, uint8_t port_num,
 int ibv_get_pkey_index(struct ibv_context *context, uint8_t port_num,
 		       __be16 pkey);
 
+static inline int ibv_query_qp_semantics(struct ibv_context *context,
+					 enum ibv_qp_type qp_type,
+					 struct ibv_ah_attr *ah_attr,
+					 struct ibv_qp_semantics *qp_semantics,
+					 size_t qp_semantics_len)
+{
+	struct verbs_context *vctx = verbs_get_ctx_op(context,
+						      query_qp_semantics);
+
+	if (!vctx)
+		return EOPNOTSUPP;
+
+	return vctx->query_qp_semantics(context, qp_type, ah_attr,
+					qp_semantics, qp_semantics_len);
+}
+
 /**
  * ibv_alloc_pd - Allocate a protection domain
  */
@@ -2730,6 +2940,22 @@ static inline int ibv_dealloc_mw(struct ibv_mw *mw)
 	return mw->context->ops.dealloc_mw(mw);
 }
 
+static inline int ibv_attach_mr(struct ibv_qp *qp, struct ibv_mr *mr)
+{
+	struct verbs_context *vctx = verbs_get_ctx_op(qp->context, attach_mr);
+
+	if (!vctx)
+		return EOPNOTSUPP;
+
+	return vctx->attach_mr(qp, mr);
+}
+
+static inline int ibv_detach_mr(struct ibv_qp *qp, struct ibv_mr *mr)
+{
+	struct verbs_context *vctx = verbs_get_ctx_op(qp->context, detach_mr);
+
+	if (!vctx)
+		return EOPNOTSUPP;
+
+	return vctx->detach_mr(qp, mr);
+}
+
 /**
  * ibv_inc_rkey - Increase the 8 lsb in the given rkey
  */
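
As a closing illustration, a sketch of how the new entry points might be used
together. This is illustrative only: it assumes an extended QP created with
IBV_QP_INIT_ATTR_SEND_OPS_FLAGS, a device exposing these semantics, and it
elides payload SGE setup, addressing (ibv_wr_set_ru_addr() or
ibv_wr_set_job_addr()), and completion handling:

```c
#include <errno.h>
#include <infiniband/verbs.h>

/* Query an RU QP's semantics, then post a send carrying a 64-bit
 * immediate. A NULL ah_attr is assumed here to mean "device-wide
 * defaults" for the query. */
int send_with_imm64(struct ibv_context *ctx, struct ibv_qp *qp, __be64 imm)
{
	struct ibv_qp_semantics sem = {0};
	struct ibv_qp_ex *qpx = ibv_qp_to_qp_ex(qp);
	int ret;

	ret = ibv_query_qp_semantics(ctx, IBV_QPT_RU, NULL, &sem,
				     sizeof(sem));
	if (ret)
		return ret;
	if (sem.imm_data_size < sizeof(imm))
		return EOPNOTSUPP;	/* 64-bit immediates unsupported */

	ibv_wr_start(qpx);
	ibv_wr_send_imm64(qpx, imm);
	return ibv_wr_complete(qpx);
}
```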