Device Tree

= Introduction =

Linux and the Device Tree (Documentation/devicetree/usage-model): the Linux usage model for device tree data

Author: Grant Likely 

This article describes how Linux uses the device tree. An overview of the device tree data format can be found at the Device Tree Usage page on devicetree.org.

All the cool architectures are using device tree. I want to use device tree too!

The "Open Firmware Device Tree", or simply Device Tree (DT), is a data structure and language for describing hardware. More specifically, it is a description of hardware that is readable by an operating system so that the operating system doesn't need to hard code details of the machine.

Structurally, the DT is a tree, or acyclic graph with named nodes, and nodes may have an arbitrary number of named properties encapsulating arbitrary data. A mechanism also exists to create arbitrary links from one node to another outside of the natural tree structure.

Conceptually, a common set of usage conventions, called 'bindings', is defined for how data should appear in the tree to describe typical hardware characteristics including data busses, interrupt lines, GPIO connections, and peripheral devices.

As much as possible, hardware is described using existing bindings to maximize use of existing support code, but since property and node names are simply text strings, it is easy to extend existing bindings or create new ones by defining new nodes and properties.

History
The DT was originally created by Open Firmware as part of the communication method for passing data from Open Firmware to a client program (such as an operating system). An operating system used the Device Tree to discover the topology of the hardware at runtime, and thereby support a majority of available hardware without hard coded information (assuming drivers were available for all devices).

Since Open Firmware is commonly used on PowerPC and SPARC platforms, the Linux support for those architectures has for a long time used the Device Tree.

In 2005, when PowerPC Linux began a major cleanup and the merge of 32-bit and 64-bit support, the decision was made to require DT support on all powerpc platforms, regardless of whether or not they used Open Firmware. To do this, a DT representation called the Flattened Device Tree (FDT) was created which could be passed to the kernel as a binary blob without requiring a real Open Firmware implementation. U-Boot, kexec, and other bootloaders were modified to support both passing a Device Tree Binary (dtb) and modifying a dtb at boot time.

Some time later, FDT infrastructure was generalized to be usable by all architectures. At the time of this writing, 6 mainlined architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1 out of mainline (nios) have some level of DT support.

Data Model
If you haven't already read the Device Tree Usage page, then go read it now. It's okay, I'll wait....

High Level View
The most important thing to understand is that the DT is simply a data structure that describes the hardware. There is nothing magical about it, and it doesn't magically make all hardware configuration problems go away. What it does do is provide a language for decoupling the hardware configuration from the board and device driver support in the Linux kernel (or any other operating system for that matter). Using it allows board and device support to become data driven; to make setup decisions based on data passed into the kernel instead of on per-machine hard coded selections.

Ideally, data driven platform setup should result in less code duplication and make it easier to support a wide range of hardware with a single kernel image.

Linux uses DT data for three major purposes:
 * 1) platform identification,
 * 2) runtime configuration, and
 * 3) device population.

Platform Identification
First and foremost, the kernel will use data in the DT to identify the specific machine. In a perfect world, the specific platform shouldn't matter to the kernel because all platform details would be described perfectly by the device tree in a consistent and reliable manner. Hardware is not perfect though, and so the kernel must identify the machine during early boot so that it has the opportunity to run machine-specific fixups.

In the majority of cases, the machine identity is irrelevant, and the kernel will instead select setup code based on the machine's core CPU or SoC. On ARM for example, setup_arch in arch/arm/kernel/setup.c will call setup_machine_fdt in arch/arm/kernel/devicetree.c which searches through the machine_desc table and selects the machine_desc which best matches the device tree data. It determines the best match by looking at the 'compatible' property in the root device tree node, and comparing it with the dt_compat list in struct machine_desc.

The 'compatible' property contains a sorted list of strings starting with the exact name of the machine, followed by an optional list of boards it is compatible with sorted from most compatible to least. For example, the root compatible properties for the TI BeagleBoard and its successor, the BeagleBoard xM board might look like:

    compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3";
    compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3";

Where "ti,omap3-beagleboard-xm" specifies the exact model, it also claims that it is compatible with the OMAP 3450 SoC, and the omap3 family of SoCs in general. You'll notice that the list is sorted from most specific (exact board) to least specific (SoC family).

Astute readers might point out that the Beagle xM could also claim compatibility with the original Beagle board. However, one should be cautioned about doing so at the board level since there is typically a high level of change from one board to another, even within the same product line, and it is hard to nail down exactly what is meant when one board claims to be compatible with another. For the top level, it is better to err on the side of caution and not claim one board is compatible with another. The notable exception would be when one board is a carrier for another, such as a CPU module attached to a carrier board.

One more note on compatible values. Any string used in a compatible property must be documented as to what it indicates. Add documentation for compatible strings in Documentation/devicetree/bindings.

Again on ARM, for each machine_desc, the kernel looks to see if any of the dt_compat list entries appear in the compatible property. If one does, then that machine_desc is a candidate for driving the machine. After searching the entire table of machine_descs, setup_machine_fdt returns the 'most compatible' machine_desc based on which entry in the compatible property each machine_desc matches against. If no matching machine_desc is found, then it returns NULL.
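The selection rule described above can be modelled in a few lines of plain C. This is a minimal userland sketch, not the kernel's implementation: the struct below is a simplified, hypothetical stand-in for struct machine_desc, and match_score and select_machine are illustrative names. A lower index into the root 'compatible' list means a more specific match.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the kernel's struct machine_desc. */
struct machine_desc {
    const char *name;
    const char *const *dt_compat;   /* NULL-terminated list of strings */
};

/*
 * Return the index in the root 'compatible' list of the first entry that
 * also appears in dt_compat, or -1 if there is none. Lower is better,
 * because the root list is sorted from most to least specific.
 */
int match_score(const char *const *root_compat, size_t n,
                const char *const *dt_compat)
{
    for (size_t i = 0; i < n; i++)
        for (const char *const *c = dt_compat; *c; c++)
            if (strcmp(root_compat[i], *c) == 0)
                return (int)i;
    return -1;
}

/* Pick the machine_desc with the lowest score; NULL if none matches. */
const struct machine_desc *
select_machine(const struct machine_desc *table, size_t ntable,
               const char *const *root_compat, size_t n)
{
    const struct machine_desc *best = NULL;
    int best_score = -1;

    for (size_t i = 0; i < ntable; i++) {
        int score = match_score(root_compat, n, table[i].dt_compat);

        if (score >= 0 && (best == NULL || score < best_score)) {
            best = &table[i];
            best_score = score;
        }
    }
    return best;
}
```

With one entry matching "ti,omap3" and another matching only "ti,omap3-beagleboard", the original BeagleBoard's compatible list selects the board-specific entry, while the xM falls through to the generic one.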

The reasoning behind this scheme is the observation that in the majority of cases, a single machine_desc can support a large number of boards if they all use the same SoC, or same family of SoCs. However, invariably there will be some exceptions where a specific board will require special setup code that is not useful in the generic case. Special cases could be handled by explicitly checking for the troublesome board(s) in generic setup code, but doing so very quickly becomes ugly and/or unmaintainable if it is more than just a couple of cases.

Instead, the compatible list allows a generic machine_desc to provide support for a wide common set of boards by specifying "less compatible" values in the dt_compat list. In the example above, generic board support can claim compatibility with "ti,omap3" or "ti,omap3450". If a bug was discovered on the original beagleboard that required special workaround code during early boot, then a new machine_desc could be added which implements the workarounds and only matches on "ti,omap3-beagleboard".

PowerPC uses a slightly different scheme where it calls the .probe hook from each machine_desc, and the first one returning TRUE is used. However, this approach does not take into account the priority of the compatible list, and probably should be avoided for new architecture support.

Runtime configuration
In most cases, a DT will be the sole method of communicating data from firmware to the kernel, so it also gets used to pass in runtime and configuration data like the kernel parameters string and the location of an initrd image.

Most of this data is contained in the /chosen node, and when booting Linux it will look something like this:

    chosen {
        bootargs = "console=ttyS0,115200 loglevel=8";
        initrd-start = <0xc8000000>;
        initrd-end = <0xc8200000>;
    };

The bootargs property contains the kernel arguments, and the initrd-* properties define the start and end addresses of an initrd blob. The chosen node may also optionally contain an arbitrary number of additional properties for platform-specific configuration data.

During early boot, the architecture setup code calls of_scan_flat_dt several times with different helper callbacks to parse device tree data before paging is setup. The of_scan_flat_dt code scans through the device tree and uses the helpers to extract information required during early boot. Typically the early_init_dt_scan_chosen helper is used to parse the chosen node including kernel parameters, early_init_dt_scan_root to initialize the DT address space model, and early_init_dt_scan_memory to determine the size and location of usable RAM.
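The scan-with-callback pattern can be illustrated with a small userland model. This is only a sketch under stated assumptions: struct flat_node, scan_flat_nodes and scan_chosen are hypothetical names, and the array of records stands in for the packed binary blob that the real of_scan_flat_dt walks. The idea it shows is the same, though: a generic walker visits each node and a helper picks out the data it cares about, stopping the scan once it is found.

```c
#include <stddef.h>
#include <string.h>

#define CMDLINE_MAX 256

/* Hypothetical, simplified node record; the real blob is a packed binary. */
struct flat_node {
    const char *uname;     /* node name, e.g. "chosen" */
    int depth;             /* 0 for the root node */
    const char *bootargs;  /* NULL unless the node carries bootargs */
};

typedef int (*scan_cb)(const struct flat_node *node, void *data);

/* Visit nodes in order; stop at the first callback returning non-zero. */
int scan_flat_nodes(const struct flat_node *nodes, size_t n,
                    scan_cb cb, void *data)
{
    int rc = 0;

    for (size_t i = 0; i < n && rc == 0; i++)
        rc = cb(&nodes[i], data);
    return rc;
}

/* Helper in the spirit of early_init_dt_scan_chosen: copy out bootargs. */
int scan_chosen(const struct flat_node *node, void *data)
{
    char *cmdline = data;  /* caller provides CMDLINE_MAX bytes */

    if (node->depth != 1 || strcmp(node->uname, "chosen") != 0)
        return 0;          /* not /chosen, keep scanning */
    if (node->bootargs) {
        strncpy(cmdline, node->bootargs, CMDLINE_MAX - 1);
        cmdline[CMDLINE_MAX - 1] = '\0';
    }
    return 1;              /* found, stop the scan */
}
```

Calling scan_flat_nodes several times with different helpers mirrors how early boot code extracts the command line, the address model, and the memory layout in separate passes.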

On ARM, the function setup_machine_fdt is responsible for early scanning of the device tree after selecting the correct machine_desc that supports the board.

Device population
After the board has been identified, and after the early configuration data has been parsed, then kernel initialization can proceed in the normal way. At some point in this process, unflatten_device_tree is called to convert the data into a more efficient runtime representation. This is also when machine-specific setup hooks will get called, like the machine_desc .init_early, .init_irq and .init_machine hooks on ARM. The remainder of this section uses examples from the ARM implementation, but all architectures will do pretty much the same thing when using a DT.

As can be guessed by the names, .init_early is used for any machine-specific setup that needs to be executed early in the boot process, and .init_irq is used to set up interrupt handling. Using a DT doesn't materially change the behaviour of either of these functions. If a DT is provided, then both .init_early and .init_irq are able to call any of the DT query functions (of_* in include/linux/of*.h) to get additional data about the platform.

The most interesting hook in the DT context is .init_machine which is primarily responsible for populating the Linux device model with data about the platform. Historically this has been implemented on embedded platforms by defining a set of static clock structures, platform_devices, and other data in the board support .c file, and registering it en-masse in .init_machine. When DT is used, then instead of hard coding static devices for each platform, the list of devices can be obtained by parsing the DT, and allocating device structures dynamically.

The simplest case is when .init_machine is only responsible for registering a block of platform_devices. A platform_device is a concept used by Linux for memory or I/O mapped devices which cannot be detected by hardware, and for 'composite' or 'virtual' devices (more on those later). While there is no 'platform device' terminology for the DT, platform devices roughly correspond to device nodes at the root of the tree and children of simple memory mapped bus nodes.

About now is a good time to lay out an example. Here is part of the device tree for the NVIDIA Tegra board.

    /{
        compatible = "nvidia,harmony", "nvidia,tegra20";
        #address-cells = <1>;
        #size-cells = <1>;
        interrupt-parent = <&intc>;

        chosen { };
        aliases { };

        memory {
            device_type = "memory";
            reg = <...>;
        };

        soc {
            compatible = "nvidia,tegra20-soc", "simple-bus";
            #address-cells = <1>;
            #size-cells = <1>;
            ranges;

            intc: interrupt-controller@50041000 {
                compatible = "nvidia,tegra20-gic";
                interrupt-controller;
                #interrupt-cells = <1>;
                reg = <...>, <...>;
            };

            serial@70006300 {
                compatible = "nvidia,tegra20-uart";
                reg = <...>;
                interrupts = <...>;
            };

            i2s1: i2s@70002800 {
                compatible = "nvidia,tegra20-i2s";
                reg = <...>;
                interrupts = <77>;
                codec = <&wm8903>;
            };

            i2c@7000c000 {
                compatible = "nvidia,tegra20-i2c";
                #address-cells = <1>;
                #size-cells = <0>;
                reg = <...>;
                interrupts = <70>;

                wm8903: codec@1a {
                    compatible = "wlf,wm8903";
                    reg = <...>;
                    interrupts = <...>;
                };
            };
        };

        sound {
            compatible = "nvidia,harmony-sound";
            i2s-controller = <&i2s1>;
            i2s-codec = <&wm8903>;
        };
    };

At .init_machine time, Tegra board support code will need to look at this DT and decide which nodes to create platform_devices for. However, looking at the tree, it is not immediately obvious what kind of device each node represents, or even if a node represents a device at all. The /chosen, /aliases, and /memory nodes are informational nodes that don't describe devices (although arguably memory could be considered a device). The children of the /soc node are memory mapped devices, but the codec@1a is an i2c device, and the sound node represents not a device, but rather how other devices are connected together to create the audio subsystem. I know what each device is because I'm familiar with the board design, but how does the kernel know what to do with each node?

The trick is that the kernel starts at the root of the tree and looks for nodes that have a 'compatible' property. First, it is generally assumed that any node with a 'compatible' property represents a device of some kind, and second, it can be assumed that any node at the root of the tree is either directly attached to the processor bus, or is a miscellaneous system device that cannot be described any other way. For each of these nodes, Linux allocates and registers a platform_device, which in turn may get bound to a platform_driver.

Why is using a platform_device for these nodes a safe assumption? Well, for the way that Linux models devices, just about all bus_types assume that their devices are children of a bus controller. For example, each i2c_client is a child of an i2c_master. Each spi_device is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The same hierarchy is also found in the DT, where I2C device nodes only ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB, etc. The only devices which do not require a specific type of parent device are platform_devices (and amba_devices, but more on that later), which will happily live at the base of the Linux /sys/devices tree. Therefore, if a DT node is at the root of the tree, then it really probably is best registered as a platform_device.

Linux board support code calls of_platform_populate(NULL, NULL, NULL) to kick off discovery of devices at the root of the tree. The parameters are all NULL because when starting from the root of the tree, there is no need to provide a starting node (the first NULL), a parent struct device (the last NULL), and we're not using a match table (yet). For a board that only needs to register devices, .init_machine can be completely empty except for the of_platform_populate call.

In the Tegra example, this accounts for the /soc and /sound nodes, but what about the children of the SoC node? Shouldn't they be registered as platform devices too? For Linux DT support, the generic behaviour is for child devices to be registered by the parent's device driver at driver .probe time. So, an i2c bus device driver will register an i2c_client for each child node, an SPI bus driver will register its spi_device children, and similarly for other bus_types. According to that model, a driver could be written that binds to the SoC node and simply registers platform_devices for each of its children. The board support code would allocate and register an SoC device, an SoC device driver would bind to the SoC device, and register platform_devices for /soc/interrupt-controller, /soc/serial, /soc/i2s, and /soc/i2c in its .probe hook. Easy, right? Although it is a lot of mucking about for just registering platform devices.

It turns out that registering children of certain platform_devices as more platform_devices is a common pattern, and the device tree support code reflects that. The second argument to of_platform_populate is an of_device_id table, and any node that matches an entry in that table will also get its child nodes registered. In the Tegra case, the code can look something like this:

    static struct of_device_id harmony_bus_ids[] __initdata = {
        { .compatible = "simple-bus", },
        {}
    };

    static void __init harmony_init_machine(void)
    {
        /* ... */
        of_platform_populate(NULL, harmony_bus_ids, NULL);
    }
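To make the traversal rule concrete, here is a userland model of it in plain C. It is a sketch under assumptions, not kernel code: struct node, is_bus and populate are hypothetical names, and appending a node to the output array stands in for allocating and registering a platform_device. Children without a 'compatible' property are skipped, and only nodes matching the bus table have their own children visited.

```c
#include <stddef.h>
#include <string.h>

#define MAX_COMPAT   4
#define MAX_CHILDREN 8

/* Hypothetical in-memory node, loosely modelled on struct device_node. */
struct node {
    const char *name;
    const char *compatible[MAX_COMPAT];          /* NULL-terminated, may be empty */
    const struct node *children[MAX_CHILDREN];   /* NULL-terminated */
};

/* True if any of the node's compatible strings appears in the bus table. */
int is_bus(const struct node *np, const char *const *bus_ids)
{
    if (!bus_ids)
        return 0;
    for (size_t i = 0; i < MAX_COMPAT && np->compatible[i]; i++)
        for (const char *const *id = bus_ids; *id; id++)
            if (strcmp(np->compatible[i], *id) == 0)
                return 1;
    return 0;
}

/*
 * Append to 'out' every child of 'parent' that has a compatible property,
 * descending into children that match bus_ids. 'n' is the number of entries
 * already stored; the new count is returned. Entries beyond 'cap' are dropped.
 */
size_t populate(const struct node *parent, const char *const *bus_ids,
                const struct node **out, size_t cap, size_t n)
{
    for (size_t i = 0; i < MAX_CHILDREN && parent->children[i]; i++) {
        const struct node *np = parent->children[i];

        if (!np->compatible[0])
            continue;               /* e.g. /chosen, /aliases, /memory */
        if (n < cap)
            out[n++] = np;          /* stands in for device registration */
        if (is_bus(np, bus_ids))
            n = populate(np, bus_ids, out, cap, n);
    }
    return n;
}
```

Run against a model of the Tegra tree, a NULL bus table yields only soc and sound, while a table containing "simple-bus" also yields the four children of soc. The codec under the i2c node is never touched, since registering it is left to the i2c bus driver.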

"simple-bus" is defined in the ePAPR 1.0 specification as a property meaning a simple memory mapped bus, so the of_platform_populate code could be written to just assume simple-bus compatible nodes will always be traversed. However, we pass it in as an argument so that board support code can always override the default behaviour.

Appendix A: AMBA devices
ARM Primecells are a certain kind of device attached to the ARM AMBA bus that includes some support for hardware detection and power management. In Linux, struct amba_device and the amba_bus_type are used to represent Primecell devices. However, the fiddly bit is that not all devices on an AMBA bus are Primecells, and for Linux it is typical for both amba_device and platform_device instances to be siblings of the same bus segment.

When using the DT, this creates problems for of_platform_populate because it must decide whether to register each node as either a platform_device or an amba_device. This unfortunately complicates the device creation model a little bit, but the solution turns out not to be too invasive. If a node is compatible with "arm,primecell", then of_platform_populate will register it as an amba_device instead of a platform_device.

OMAP-specific bindings
For the moment (kernel 3.2), OMAP devices still rely on the omap_device / omap_hwmod mechanism to populate the IRQ, DMA and address space. A TI-specific binding was thus added for that purpose.

ti,hwmods = "hwmod1", "hwmod2";

Common Pitfalls of DT

 * All data is encoded in big endian format
 * Helpers are available to convert data from big endian format to whatever format a given CPU supports. For example, be32_to_cpu converts data from big endian 32-bit format to the format supported by the CPU
 * Typos in dts files can be very expensive, because no checking beyond syntax is done at compile time by the DTC compiler
 * Editing a .dtsi file does not recompile the .dts file which includes it. Manually remove the .dtb file (there may be a better alternative that I don't know of)
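The big endian pitfall is easiest to see on raw property bytes. The sketch below is plain userland C with hypothetical helper names (read_be32, read_cells). In kernel code the same job is done by be32_to_cpu and the of_property_read_* helpers. Assembling the value byte by byte gives the same result on little endian and big endian hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big endian 32-bit cell, independent of host byte order. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode 'ncells' consecutive cells from a raw property value. */
void read_cells(const uint8_t *prop, size_t ncells, uint32_t *out)
{
    for (size_t i = 0; i < ncells; i++)
        out[i] = read_be32(prop + 4 * i);
}
```

For instance, the bytes c8 00 00 00 decode to 0xc8000000. Casting the buffer to a uint32_t pointer on a little endian CPU would instead yield 0x000000c8, which is exactly the bug the conversion helpers exist to prevent.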

OMAP DT migration series
A couple of device tree series for OMAP were already posted and can be used as a reference / example by driver owners.

OMAP: Add initial support for DT on OMAP3 & OMAP4
 * git://gitorious.org/omap-pm/linux.git for_3.2/3_omap_devicetree

OMAP3+: Add DT support for early devices and i2c / twl6030
 * git://gitorious.org/omap-pm/linux.git for_3.2/4_omap_dt_i2c_twl

In this series, the GPIO driver can be used to see how to move some platform_data information to device tree properties.

Device tree support for regulators
 * git://gitorious.org/omap-pm/linux.git 5+_omap_dt_i2c_twl_reg

= Drivers / Features migration status =

= Booting with DT blob =

Using an updated/dt-enabled u-boot


 * Get the latest u-boot sources from git://git.denx.de/u-boot.git
 * The basic DT support for OMAP is available in mainline since 3.2-rc1
 * Use omap2plus_defconfig as it has DT support enabled by default
 * Build the dtb file using 'make ARCH=arm '
 * Supported for now are omap4-panda.dtb, omap4-sdp.dtb and omap3-beagle.dtb
 * Load the kernel and the dtb file to separate addresses in RAM over NFS or from an MMC card
 * Use the following u-boot command to specify the dtb file location to the kernel at boot

    set fdt_high 0xffffffff ; fdt addr ; fdt resize; bootm - '

Using legacy u-boot and DT-APPEND support in kernel

Grant wrote a patch that automatically builds the blob and appends it to the zImage before creating the uImage. The latest patch, based on kernel 3.4-rc5, is available here. The original patch can be found on the git tree below.

git://gitorious.org/omap-pm/linux.git dt_test/build_uimage_append

Thanks to that, you can create the proper uImage after enabling CONFIG_ARM_APPENDED_DTB in the Boot Options config. The syntax is:

make uImage-dtb.

The following board DTS are available for the moment:

    make uImage-dtb.omap4-sdp
    make uImage-dtb.omap4-panda
    make uImage-dtb.omap3-beagle

A legacy u-boot without DT support can then load this uImage like before. The kernel will detect the presence of the DTB blob and pass the physical address automatically.
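Detection of an appended blob relies on the flattened device tree header, whose first two 32-bit fields are a magic number (0xd00dfeed) and the total blob size, both stored big endian. The following is a minimal userland sketch of such a check, not the kernel's own code: be32_at and fdt_blob_size are hypothetical names, and only these two header fields are examined.

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu   /* first header field, stored big endian */

/* Read a big endian 32-bit value from a byte buffer. */
uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * If 'p' points at something that looks like a flattened device tree that
 * fits within 'avail' bytes, return its total size; otherwise return 0.
 */
size_t fdt_blob_size(const uint8_t *p, size_t avail)
{
    uint32_t total;

    if (avail < 8 || be32_at(p) != FDT_MAGIC)
        return 0;               /* too short, or no magic: not a blob */
    total = be32_at(p + 4);     /* totalsize field follows the magic */
    if (total < 8 || total > avail)
        return 0;               /* implausible or truncated blob */
    return total;
}
```

A loader could call fdt_blob_size on the bytes immediately following the zImage and treat a non-zero result as "a device tree is appended here".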

= Links =

The devicetree.org wiki web maintained by Grant: http://devicetree.org/Main_Page

The most interesting page: http://devicetree.org/Device_Tree_Usage

The official spec: http://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.0.pdf

ARM Device Tree Status Report presentation given by Grant during ELC 2010: http://elinux.org/images/b/b6/ARM_Device_Tree_Status_Report.pdf

ELCE: Grant Likely on device trees http://lwn.net/Articles/414016/