123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286 |
- Kernel Connection Mulitplexor
- -----------------------------
- Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based
- interface over TCP for generic application protocols. With KCM an application
- can efficiently send and receive application protocol messages over TCP using
- datagram sockets.
- KCM implements an NxM multiplexor in the kernel as diagrammed below:
- +------------+ +------------+ +------------+ +------------+
- | KCM socket | | KCM socket | | KCM socket | | KCM socket |
- +------------+ +------------+ +------------+ +------------+
- | | | |
- +-----------+ | | +----------+
- | | | |
- +----------------------------------+
- | Multiplexor |
- +----------------------------------+
- | | | | |
- +---------+ | | | ------------+
- | | | | |
- +----------+ +----------+ +----------+ +----------+ +----------+
- | Psock | | Psock | | Psock | | Psock | | Psock |
- +----------+ +----------+ +----------+ +----------+ +----------+
- | | | | |
- +----------+ +----------+ +----------+ +----------+ +----------+
- | TCP sock | | TCP sock | | TCP sock | | TCP sock | | TCP sock |
- +----------+ +----------+ +----------+ +----------+ +----------+
- KCM sockets
- -----------
- The KCM sockets provide the user interface to the muliplexor. All the KCM sockets
- bound to a multiplexor are considered to have equivalent function, and I/O
- operations in different sockets may be done in parallel without the need for
- synchronization between threads in userspace.
- Multiplexor
- -----------
- The multiplexor provides the message steering. In the transmit path, messages
- written on a KCM socket are sent atomically on an appropriate TCP socket.
- Similarly, in the receive path, messages are constructed on each TCP socket
- (Psock) and complete messages are steered to a KCM socket.
- TCP sockets & Psocks
- --------------------
- TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
- for each bound TCP socket, this structure holds the state for constructing
- messages on receive as well as other connection specific information for KCM.
- Connected mode semantics
- ------------------------
- Each multiplexor assumes that all attached TCP connections are to the same
- destination and can use the different connections for load balancing when
- transmitting. The normal send and recv calls (include sendmmsg and recvmmsg)
- can be used to send and receive messages from the KCM socket.
- Socket types
- ------------
- KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.
- Message delineation
- -------------------
- Messages are sent over a TCP stream with some application protocol message
- format that typically includes a header which frames the messages. The length
- of a received message can be deduced from the application protocol header
- (often just a simple length field).
- A TCP stream must be parsed to determine message boundaries. Berkeley Packet
- Filter (BPF) is used for this. When attaching a TCP socket to a multiplexor a
- BPF program must be specified. The program is called at the start of receiving
- a new message and is given an skbuff that contains the bytes received so far.
- It parses the message header and returns the length of the message. Given this
- information, KCM will construct the message of the stated length and deliver it
- to a KCM socket.
- TCP socket management
- ---------------------
- When a TCP socket is attached to a KCM multiplexor data ready (POLLIN) and
- write space available (POLLOUT) events are handled by the multiplexor. If there
- is a state change (disconnection) or other error on a TCP socket, an error is
- posted on the TCP socket so that a POLLERR event happens and KCM discontinues
- using the socket. When the application gets the error notification for a
- TCP socket, it should unattach the socket from KCM and then handle the error
- condition (the typical response is to close the socket and create a new
- connection if necessary).
- KCM limits the maximum receive message size to be the size of the receive
- socket buffer on the attached TCP socket (the socket buffer size can be set by
- SO_RCVBUF). If the length of a new message reported by the BPF program is
- greater than this limit a corresponding error (EMSGSIZE) is posted on the TCP
- socket. The BPF program may also enforce a maximum messages size and report an
- error when it is exceeded.
- A timeout may be set for assembling messages on a receive socket. The timeout
- value is taken from the receive timeout of the attached TCP socket (this is set
- by SO_RCVTIMEO). If the timer expires before assembly is complete an error
- (ETIMEDOUT) is posted on the socket.
- User interface
- ==============
- Creating a multiplexor
- ----------------------
- A new multiplexor and initial KCM socket is created by a socket call:
- socket(AF_KCM, type, protocol)
- - type is either SOCK_DGRAM or SOCK_SEQPACKET
- - protocol is KCMPROTO_CONNECTED
- Cloning KCM sockets
- -------------------
- After the first KCM socket is created using the socket call as described
- above, additional sockets for the multiplexor can be created by cloning
- a KCM socket. This is accomplished by an ioctl on a KCM socket:
- /* From linux/kcm.h */
- struct kcm_clone {
- int fd;
- };
- struct kcm_clone info;
- memset(&info, 0, sizeof(info));
- err = ioctl(kcmfd, SIOCKCMCLONE, &info);
- if (!err)
- newkcmfd = info.fd;
- Attach transport sockets
- ------------------------
- Attaching of transport sockets to a multiplexor is performed by calling an
- ioctl on a KCM socket for the multiplexor. e.g.:
- /* From linux/kcm.h */
- struct kcm_attach {
- int fd;
- int bpf_fd;
- };
- struct kcm_attach info;
- memset(&info, 0, sizeof(info));
- info.fd = tcpfd;
- info.bpf_fd = bpf_prog_fd;
- ioctl(kcmfd, SIOCKCMATTACH, &info);
- The kcm_attach structure contains:
- fd: file descriptor for TCP socket being attached
- bpf_prog_fd: file descriptor for compiled BPF program downloaded
- Unattach transport sockets
- --------------------------
- Unattaching a transport socket from a multiplexor is straightforward. An
- "unattach" ioctl is done with the kcm_unattach structure as the argument:
- /* From linux/kcm.h */
- struct kcm_unattach {
- int fd;
- };
- struct kcm_unattach info;
- memset(&info, 0, sizeof(info));
- info.fd = cfd;
- ioctl(fd, SIOCKCMUNATTACH, &info);
- Disabling receive on KCM socket
- -------------------------------
- A setsockopt is used to disable or enable receiving on a KCM socket.
- When receive is disabled, any pending messages in the socket's
- receive buffer are moved to other sockets. This feature is useful
- if an application thread knows that it will be doing a lot of
- work on a request and won't be able to service new messages for a
- while. Example use:
- int val = 1;
- setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val))
- BFP programs for message delineation
- ------------------------------------
- BPF programs can be compiled using the BPF LLVM backend. For exmple,
- the BPF program for parsing Thrift is:
- #include "bpf.h" /* for __sk_buff */
- #include "bpf_helpers.h" /* for load_word intrinsic */
- SEC("socket_kcm")
- int bpf_prog1(struct __sk_buff *skb)
- {
- return load_word(skb, 0) + 4;
- }
- char _license[] SEC("license") = "GPL";
- Use in applications
- ===================
- KCM accelerates application layer protocols. Specifically, it allows
- applications to use a message based interface for sending and receiving
- messages. The kernel provides necessary assurances that messages are sent
- and received atomically. This relieves much of the burden applications have
- in mapping a message based protocol onto the TCP stream. KCM also make
- application layer messages a unit of work in the kernel for the purposes of
- steerng and scheduling, which in turn allows a simpler networking model in
- multithreaded applications.
- Configurations
- --------------
- In an Nx1 configuration, KCM logically provides multiple socket handles
- to the same TCP connection. This allows parallelism between in I/O
- operations on the TCP socket (for instance copyin and copyout of data is
- parallelized). In an application, a KCM socket can be opened for each
- processing thread and inserted into the epoll (similar to how SO_REUSEPORT
- is used to allow multiple listener sockets on the same port).
- In a MxN configuration, multiple connections are established to the
- same destination. These are used for simple load balancing.
- Message batching
- ----------------
- The primary purpose of KCM is load balancing between KCM sockets and hence
- threads in a nominal use case. Perfect load balancing, that is steering
- each received message to a different KCM socket or steering each sent
- message to a different TCP socket, can negatively impact performance
- since this doesn't allow for affinities to be established. Balancing
- based on groups, or batches of messages, can be beneficial for performance.
- On transmit, there are three ways an application can batch (pipeline)
- messages on a KCM socket.
- 1) Send multiple messages in a single sendmmsg.
- 2) Send a group of messages each with a sendmsg call, where all messages
- except the last have MSG_BATCH in the flags of sendmsg call.
- 3) Create "super message" composed of multiple messages and send this
- with a single sendmsg.
- On receive, the KCM module attempts to queue messages received on the
- same KCM socket during each TCP ready callback. The targeted KCM socket
- changes at each receive ready callback on the KCM socket. The application
- does not need to configure this.
- Error handling
- --------------
- An application should include a thread to monitor errors raised on
- the TCP connection. Normally, this will be done by placing each
- TCP socket attached to a KCM multiplexor in epoll set for POLLERR
- event. If an error occurs on an attached TCP socket, KCM sets an EPIPE
- on the socket thus waking up the application thread. When the application
- sees the error (which may just be a disconnect) it should unattach the
- socket from KCM and then close it. It is assumed that once an error is
- posted on the TCP socket the data stream is unrecoverable (i.e. an error
- may have occurred in in the middle of receiving a messssge).
- TCP connection monitoring
- -------------------------
- In KCM there is no means to correlate a message to the TCP socket that
- was used to send or receive the message (except in the case there is
- only one attached TCP socket). However, the application does retain
- an open file descriptor to the socket so it will be able to get statistics
- from the socket which can be used in detecting issues (such as high
- retransmissions on the socket).
|