Writing a Xen Paravirtualized (PV) Driver

Overview
This wiki explains the Xen architecture and support behind paravirtual drivers. It outlines Xen's interface for paravirtual drivers and describes the steps needed to set up a driver.

Prerequisites
You will need some sort of Xen setup to play with these drivers. If you're planning on using the J6 board, then see the Xen J6 Main Page.

Background
Paravirtualization is virtualization that is assisted by the guest operating system. This means that the guest is aware that it is running in a virtualized environment, and typically has hooks into the hypervisor that it calls when performing certain operations. This greatly improves the performance of many operations. For example, setting up page tables is much quicker because the guest makes a hypercall asking the hypervisor to update the page tables, rather than having each page-table write trapped and emulated by the hypervisor.
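As a rough illustration, on x86 a PV Linux guest can ask Xen to update a page-table entry with the mmu_update hypercall. The sketch below is illustrative only; the helper name is made up:

    #include <xen/interface/xen.h>
    #include <asm/xen/hypercall.h>

    /* Hypothetical helper: update one page-table entry with a single
     * hypercall instead of having the write trapped and emulated. */
    static int set_pte_via_hypercall(uint64_t pte_machine_addr, uint64_t new_val)
    {
        struct mmu_update req = {
            .ptr = pte_machine_addr,  /* machine address of the PTE */
            .val = new_val,           /* value to store there */
        };

        return HYPERVISOR_mmu_update(&req, 1, NULL, DOMID_SELF);
    }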

The purpose of paravirtual drivers is to enable hardware to be shared among multiple operating systems. They tend to be a relatively small layer bridging guest domains to Dom0, which has complete access to the hardware, and to the Linux driver that actually controls the device. Paravirtual drivers are also aware that they are running in a virtualized environment. This allows them to take advantage of many nice features, such as sharing pages between OSes and notifying other OSes via interrupts.

Split Driver Model
The standard model for paravirtual drivers written for Xen is what is called the Split Driver Model. The model gets its name from the fact that Xen paravirtual drivers are split into two components: one which resides in the privileged domain (either Dom0 or a driver domain), and the other which resides in the guest domain. The split driver model is pictured below:

[Figure: the Split Driver Model — a frontend driver in the guest domain paired with a backend driver in the privileged domain]

In the Split Driver Model, the frontend driver communicates requests from DomU to the backend driver using Xen's communication primitives. The typical control flow is as follows. First, a request is generated for the paravirtualized device in DomU. Typically, this comes from userspace, but the exact source doesn't matter. Next, the frontend driver constructs a request and sends it to the backend driver using Xen's communication primitives. The backend driver then decodes the message and translates it into a format that can be understood by the driver that controls the hardware device.

Xen's communication primitives consist of two components: ring buffers and event channels. These will be described in the next sections.

Ring Buffers
The first part of Xen's communication primitives is the ring buffer. Ring buffers are a mechanism for domains to send data back and forth to one another via a shared memory region. Xen's ring buffers are two-way producer-consumer buffers, so messages can be sent in either direction: the ring holds both requests and the responses to those requests, with separate producer indices tracking each. Behind the scenes, the ring buffer interface is implemented with a series of polymorphic macros. Thus, the user of the ring buffer defines the structure of the request and response messages, giving them complete freedom over the message format.

Ring buffers are simply shared memory between the two domains that contain the frontend and backend drivers. The standard interface only permits the ring buffer to span a single page. It is possible to build a larger buffer (if you have extremely large messages), but you would need to use the lower-level grant table interface directly.

The ring buffers are shared between domains using another one of Xen's primitives: grant tables. Grant tables are simply the mechanism used by Xen to allow two domains to share memory; both of the domains may be unprivileged. The grant table is a per-domain table, maintained by Xen, that tracks which other domains have access to which pages registered in the table. Unless a driver needs to share additional memory, it doesn't need to worry about grant tables; the details are abstracted away by Xen's paravirtual driver interface.

Event Channels
The second part of Xen's communication primitives is the event channel. Event channels are a mechanism for domains to asynchronously notify one another, via interrupts delivered to each OS. Event channels are used for both synchronization and notification between domains. An event channel is used in conjunction with a ring buffer to send messages: the ring buffer contains the message, and the sending domain then raises an interrupt in the other domain, telling it to process the message.

Event channels are simply IRQ lines in each domain connected to each other. When one domain wants to send a notification to another, Xen asserts the corresponding IRQ line in the receiving domain. The event channel is essentially a hook into the other domain's IRQ handling.
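Putting the two primitives together, a frontend typically queues a request on the ring and then kicks the backend. The sketch below uses the hypothetical mydev names developed later on this page; ring setup and event channel binding are covered in the following sections:

    struct mydev_request *req;
    int notify;

    /* Grab the next free slot in the (already initialized) front ring. */
    req = RING_GET_REQUEST(&ring, ring.req_prod_pvt);
    req->operation = MYDEV_OP_READ;   /* fill in the message */
    ring.req_prod_pvt++;

    /* Publish the request; only interrupt the backend if it needs it. */
    RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&ring, notify);
    if (notify)
        notify_remote_via_irq(irq);   /* 'irq' is bound as shown later */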

Xen Store/Xenbus
The final part of Xen's communication primitives is the Xen Store (also referred to as Xenbus inside the interface). The Xen Store is a mechanism for domains to communicate with one another before an explicit channel has been set up between them. The Xen Store is a directory-based data structure shared between all currently running domains: a pseudo-filesystem that exists in memory. Like a filesystem, it is path-based and has permissions as well.

For initially communicating the event channel and ring buffer grant reference IDs, the guest uses the Xen Store. The privileged domain then reads these parameters out of the Xen Store, which allows communication to be established between the two domains. The Xen Store is essentially just a shared filesystem between all domains. Notifications of Xen Store updates are handled by the kernel's Xen Store driver, which invokes callbacks (watches) registered by the domains.
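For example, a frontend might publish its ring grant reference and event channel (whose setup is described below) using xenbus_printf inside a transaction. This is a sketch; the "ring-ref" and "event-channel" node names follow common convention, and error handling is abbreviated:

    struct xenbus_transaction xbt;
    int err;

    err = xenbus_transaction_start(&xbt);
    if (err)
        return err;

    err = xenbus_printf(xbt, dev->nodename, "ring-ref", "%u", ring_ref);
    if (!err)
        err = xenbus_printf(xbt, dev->nodename, "event-channel", "%u", port);

    if (err) {
        xenbus_transaction_end(xbt, 1);    /* abort on error */
        return err;
    }
    /* Commit; a real driver retries the transaction on -EAGAIN. */
    return xenbus_transaction_end(xbt, 0);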

Defining Ring Buffers
First, you need to define the types and structures of the request and response messages. Typically, this is a structure consisting of an opcode and a union of structures, one for each type of message.
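For example, a hypothetical mydev driver might define its messages as follows (all names and fields are purely illustrative):

    /* Opcodes for the hypothetical device. */
    #define MYDEV_OP_READ   1
    #define MYDEV_OP_WRITE  2

    struct mydev_request {
        uint8_t  operation;        /* MYDEV_OP_* opcode */
        uint64_t id;               /* echoed back in the response */
        union {
            struct {
                uint64_t offset;   /* where to read from */
                uint32_t length;   /* how many bytes */
            } read;
            struct {
                uint64_t offset;   /* where to write to */
                uint32_t length;   /* how many bytes */
            } write;
        } u;
    };

    struct mydev_response {
        uint64_t id;               /* id of the request this answers */
        uint8_t  operation;        /* opcode of the original request */
        int16_t  status;           /* 0 on success, negative on error */
    };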

Of course, the ring buffer macros are polymorphic, so you can define any types you please.

Once your request and response messages are defined, the ring buffer types need to be defined. This is accomplished via the DEFINE_RING_TYPES macro:
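    #include <xen/interface/io/ring.h>

    /* Continuing the mydev sketch: generates struct mydev_sring,
     * struct mydev_front_ring and struct mydev_back_ring for the
     * message types defined above. */
    DEFINE_RING_TYPES(mydev, struct mydev_request, struct mydev_response);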

This macro defines three types: struct name_prefix_sring, struct name_prefix_front_ring, and struct name_prefix_back_ring. These represent the shared ring itself and the frontend's and backend's private views of it. request_type and response_type are the types of the request and response messages you defined. Note that a ring buffer entry is a union of the request and response types, so the size of a ring entry is the maximum of the two types' sizes; keep that in mind when defining the messages.

Allocating Ring Buffers
To create a ring buffer, you must first allocate a free page on the guest side. Many different functions can accomplish this, but one of the simplest is __get_free_page, sketched here:
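    /* Sketch: allocate one page to back the shared ring. */
    unsigned long page = __get_free_page(GFP_NOIO | __GFP_HIGH);
    if (!page)
        return -ENOMEM;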

Typically, this is called with the GFP_NOIO and __GFP_HIGH flags.

Once you get the address, you'll need to cast it to a pointer to your sring type (e.g. struct name_prefix_sring). Then you'll need to initialize the shared ring. Once the shared ring is initialized, you need to set up the front ring structure, which is typically stored in your driver's private data. These steps can be accomplished with the following macros:
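    /* Continuing the mydev sketch: cast the page, initialize the shared
     * ring, then the frontend's private ring structure. */
    struct mydev_sring *sring = (struct mydev_sring *)page;
    struct mydev_front_ring front_ring;   /* usually lives in driver-private data */

    SHARED_RING_INIT(sring);
    FRONT_RING_INIT(&front_ring, sring, PAGE_SIZE);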

SHARED_RING_INIT takes a pointer to the memory region allocated for the ring buffer. FRONT_RING_INIT takes the same pointer plus a pointer to the front ring structure. The ring_size parameter is the size of the memory region allocated for the ring (in bytes). FRONT_RING_INIT places a pointer to the shared ring structure in the field ring->sring.

Once the ring is initialized, you need to make it shareable with other domains. This involves allocating a grant table entry for the ring and getting back a grant reference number for it. This can be accomplished with the following function:
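    /* Sketch, using the older single-page form of xenbus_grant_ring
     * described below (newer kernels use a variant that takes a virtual
     * address and a page count). 'dev' is the struct xenbus_device. */
    int err;
    grant_ref_t ring_ref;

    err = xenbus_grant_ring(dev, virt_to_mfn(sring));
    if (err < 0)
        goto fail;
    ring_ref = err;   /* grant reference of the shared ring page */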

This function takes the xenbus device structure maintained by the xenbus core, which will be available in the context this function is called from (this will be covered more later). The ring_mfn argument is the machine frame number of the page containing the ring; you can get this from the ring's virtual address via the virt_to_mfn() macro. The function returns the grant reference number of the shared page. This is an ID used to refer to the shared page both by the current domain and by any other domains the page is being shared with. NOTE: this function assumes that the ring only occupies a single page.

The grant reference number needs to be sent to the backend so it can map the page. How to do this will be covered in the Xenbus discussion; for now, assume the backend has somehow obtained the reference number for the ring buffer. With the ring buffer's grant reference number, we now need to set up the virtual memory mapping for the shared page on the backend side. This can be accomplished via the function:
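    /* Backend-side sketch: map the page granted by the frontend, then
     * initialize the backend's private view of the ring. 'ring_ref' is
     * the grant reference read from the Xen Store. */
    struct mydev_back_ring back_ring;
    void *vaddr;
    int err;

    err = xenbus_map_ring_valloc(dev, ring_ref, &vaddr);
    if (err)
        goto fail;

    BACK_RING_INIT(&back_ring, (struct mydev_sring *)vaddr, PAGE_SIZE);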

This function takes the xenbus device structure and the grant reference number of the shared ring, as returned by xenbus_grant_ring in the frontend domain. It maps the shared ring into the backend's address space and returns the resulting virtual address at the address given by vaddr.

Allocating Event Channels
To create an event channel, you must first allocate one on the guest side and then bind it to an interrupt handler in the guest. First, allocate an event channel inside Xen:
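    /* Frontend-side sketch: ask Xen for a fresh, unbound event channel. */
    int port, err;

    err = xenbus_alloc_evtchn(dev, &port);
    if (err)
        goto fail;
    /* 'port' is the new event channel id; publish it in the Xen Store
     * together with the ring reference. */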

This function takes the xenbus device structure maintained by the xenbus core, which will be available from the context the function is called from. It returns the ID of the allocated event channel at the address given by port.

Now, with the event channel allocated, we need an interrupt handler that runs whenever the other domain sends a notification. We can bind the event channel to an interrupt handler with the following function:
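    /* Sketch: a handler plus the call that binds it to the event channel.
     * 'info' is a hypothetical pointer to driver-private data. */
    static irqreturn_t mydev_interrupt(int irq, void *dev_id)
    {
        struct mydev_info *info = dev_id;   /* the generic argument */

        /* Consume responses from the shared ring here. */
        return IRQ_HANDLED;
    }

    /* ...and at setup time: */
    irq = bind_evtchn_to_irqhandler(port, mydev_interrupt, 0, "mydev", info);
    if (irq < 0)
        goto fail;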

This function takes the event channel number returned by xenbus_alloc_evtchn and a pointer to a function to use as the interrupt handler. It also takes interrupt flags, the name of the device (this can be arbitrary), and a generic argument that is passed to the interrupt handler whenever it is invoked. It returns the Linux IRQ number that the event channel was bound to.

Next Steps
Now that you know the basics of writing a Xen paravirtual driver, you can see an example of the interface being used. For the example driver, see Example Xen PV Driver: Mailer.