Ducati For Dummies

Intended readers

 * This wiki is targeted towards software professionals, at TI and with customers, who are new to Ducati, as well as those beginning to transition from OMAP3 to OMAP4. Professionals, including developers and development managers with OMAP customers, will find quick and easy information on the various software elements of the Ducati sub-system, and on how these tie together to provide a robust foundation for deploying video and image compression and processing algorithms. This wiki is also the recommended starting point for open source developers new to Ducati, for example, Panda Board users.


 * This wiki is not intended for professionals already using Ducati software, or developers who already have a reasonable understanding of Ducati sub-system. The wiki also does not delve deep into performance details and low level features, for which relevant links have been provided for the readers to explore further. It is not meant to be used as an Integration Guide, Interface Guide or Design Document.


 * The purpose of this wiki is to give its reader a quick yet useful insight into the architecture, software modules, features, tools and capabilities of the Ducati sub-system, so as to significantly shorten the time it takes a newcomer to ramp up on Ducati and/or to quickly raise their comfort level with Ducati.

Want to know more
Details of the Ducati Cortex-M3 Sub System, Ducati Imaging Sub System (ISS) and Ducati Image Video Accelerator - High Definition (IVA-HD) Sub System are available in Chapters 7, 8 and 6, respectively, of the OMAP4430 TRM.

Performance data at Ducati level for the popular use-cases is available separately.

The Interface Guide accompanying each Ducati OpenMAX™ component provides all the information that a Cortex A9 developer needs in order to integrate that Ducati OpenMAX™ component into the HLOS MM framework.

Pre-requisites
The reader is expected to know the basics of multimedia codecs and pre/post processing. For example, a reader who intends to understand how an H.264 video bitstream is decoded on the Ducati sub-system should know the fundamentals of H.264 video coding.

Some familiarity with color spaces, sub-sampling formats, OpenMAX™ IL Specification, and system concepts such as memory map, inter-processor communication, DMA, program compilation and program loading, is also assumed.

License Terms
CC-BY-SA 3.0. Please visit http://creativecommons.org/licenses/by-sa/3.0/.

Inside the Sub System
The Ducati Sub System comprises two ARM® Cortex-M3 processors (CPUs), the IVA-HD subsystem and the ISS subsystem. The two Cortex-M3 processors are known as Sys M3 and App M3, and together they form the MPU of the Ducati subsystem. The figure below shows a simplified block diagram of the Ducati subsystem, with arrows indicating the software flow.




 * Dual Core Cortex-A9 MPU is the OMAP4 Host Processor running a High Level Operating System (HLOS) such as Linux or Android in Symmetric Multi Processing (SMP) mode. It uses the Ducati Sub System for video and image acceleration.
 * One of the two Cortex-M3 processors (Sys M3) runs the Notify Driver, which accepts commands from the HLOS software and passes them to the other Cortex-M3 processor (App M3).
 * The entire Ducati multimedia software executes on App M3. OMX components on App M3 invoke the necessary APIs for video or image processing on IVA-HD subsystem or ISS subsystem.
 * On the reverse path, App M3 can directly acknowledge the Dual Core Cortex-A9 MPU.
 * BIOS 6.x runs on both Cortex-M3 MPUs in non-SMP mode.

IVA-HD Overview
The Image Video Accelerator - High Definition (IVA-HD) Sub System is composed of hardware accelerators which enable video encoding and decoding up to 1080 p/i resolution at 30 frames per second (or 60 fields per second).

Currently Supported Codecs and Features
IVA-HD sub-system supports multiple video coding/compression standards.

Both encoders and decoders of the following video standards are currently supported, with performance up to 1920x1080 resolution at 30 fps (1080p30) and bit rates up to 20 Mbps:
 * H.264: Constrained Baseline Profile (CBP), Main Profile (MP), High Profile (HP)
 * MPEG-4 Visual: Simple Profile (SP) and Advanced Simple Profile (ASP) for decode, SP for encode
 * H.263: Profiles 0 and 3 for decode, Profile 0 for encode

Only decoders of the following video standards are currently supported, with performance up to 1920x1080 resolution at 30 fps and 20 Mbps:
 * MPEG-2: Up to Simple Profile Main Level, Main Profile High Level.
 * VC-1: Simple Profile Low Level, Simple Profile Medium Level, Main Profile Low Level, Main Profile Medium Level, Main Profile High Level, Advanced Profile Level 0, Level 1, Level 2, and Level 3.
 * SVC: Scalable Baseline Profile, with some constraints on features and achievable performance.

The interfaces of all video encoders and decoders on IVA-HD are XDM compliant.

Please note that tools such as Resync Marker (RM), Data Partitioning (DP) and Reversible Variable Length Codes (RVLC) are supported by the MPEG-4 encoder/decoder codec at IVA-HD level, but currently not enabled by MPEG-4 encoder/decoder OMX component at Ducati sub-system level. For details and latest information about the tools and features enabled at Ducati level, please refer to the Interface Guide of corresponding OMX component.

ISS Overview
The Imaging Sub System (ISS) deals with the processing of pixel data coming from an external image sensor or from memory. ISS as a component forms part of still image capture, camera viewfinder and video record use-cases. Figure below shows the Imaging subsystem constituents.



ISS subparts include the following; among these, ISP and SIMCOP are the two major processing blocks:
 * Two CSI2 receivers: Receive data from sensors.
 * CCP2: Receives data from sensor or reads from memory.
 * ISP: Pre/post processing operations on pixel data (received from sensor or from memory).
 * SIMCOP: Imaging accelerator.
 * BTE: Burst Translation Engine. Converts from raster to 2D tiled order and vice versa, for data write and data read respectively.
 * CBUFF: Circular Buffer for linear space, physically located in memory.

ISP
The Image Signal Processor (ISP) subsystem is designed to perform various kinds of pre/post processing operations on pixel data.

ISP supports the following features:
 * On-the-fly or memory-to-memory processing
 * Up to 200 MPix/s throughput
 * Statistic data collection
 * Image pipe interface front-end raw data processing
 * RGB and YUV data processing
 * Hardware 3A statistics block for real-time auto focus, auto exposure, and auto white balance
 * Two real-time resizers
 * Video port for interfacing with the receivers and directing data to the ISP

SIMCOP
The Still IMage COProcessor (SIMCOP) subsystem is designed to encode, decode, and process image data. SIMCOP is a block-based memory-to-memory processing engine. It fetches blocks from system memory and stores them into local memories. Different accelerators take the fetched data, perform processing, and send the processed output back to local memories. From there the data can be further processed by other accelerators or be sent back to system memory. The SIMCOP needs an external central processing unit (CPU) to perform high-level control tasks and configurations; in the current software design it is closely coupled to the App M3 (Cortex-M3).

SIMCOP enables the following use-cases:
 * Single shot High Quality Image Capture (Lens Distortion Correction, Noise Filtering, JPEG Encode, with and without rotation) of resolutions up to 16 Mega Pixels
 * JPEG Decoding (with and without rotation) of resolutions up to 16 Mega Pixels
 * Burst mode (High Speed) Image Capture (JPEG Encode, with and without rotation) of resolutions up to 16 Mega Pixels
 * Video Noise Filter based Video Capture

Camera
ISS has two camera interfaces: primary and secondary. The primary interface is CSI2-A and the secondary interface is CSI2-B/CCP2. All interfaces can use the ISP, but not concurrently. When one interface uses the ISP, the other one must send data to memory. However, the ISP can still be used to process this data in memory-to-memory mode. Time-multiplexed processing is also possible.

The camera subsystem can manage up to two serial image sensors that can be active at the same time. However, only one data flow can use the ISP. Because CSI2-B and CCP2 share pins, they cannot be used simultaneously. CSI2-A and CSI2-B/CCP2 can function at the same time by sending data to memory.
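The interface rules in the two paragraphs above can be captured as a small validity check. This is a hypothetical sketch: the type and function names are illustrative and do not come from the Ducati camera driver, and it models only the stated constraints (time-multiplexed ISP use is not modelled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the ISS camera receivers; names are illustrative. */
typedef enum { IF_CSI2_A, IF_CSI2_B, IF_CCP2, IF_COUNT } cam_if;

typedef struct {
    cam_if iface;   /* receiver the sensor is connected to */
    bool uses_isp;  /* true: processed on the fly by the ISP; false: sent to memory */
} cam_flow;

/* Checks the concurrency rules stated above:
 *  - at most two sensors active at the same time,
 *  - each receiver carries at most one flow,
 *  - CSI2-B and CCP2 share pins, so they are never active together,
 *  - only one flow may use the ISP; the others must go to memory. */
static bool config_is_valid(const cam_flow *flows, int n)
{
    assert(n == 0 || flows != NULL);
    if (n < 0 || n > 2)
        return false;

    bool used[IF_COUNT] = { false };
    int isp_users = 0;
    for (int i = 0; i < n; i++) {
        if (used[flows[i].iface])
            return false;
        used[flows[i].iface] = true;
        if (flows[i].uses_isp)
            isp_users++;
    }
    if (used[IF_CSI2_B] && used[IF_CCP2])
        return false;
    return isp_users <= 1;
}
```

For example, CSI2-A through the ISP plus CSI2-B to memory passes the check, while two flows both claiming the ISP, or CSI2-B together with CCP2, do not.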

Ducati Camera software exists as an OpenMAX™ component compliant with Khronos OpenMAX™ IL V1.1.2 specification. JPEG encoding also takes place within OMX Camera component. Ducati camera component has been designed to bring in standardization (OpenMAX™), to enable easy 3rd party algorithm integration, to comfortably enable use-case creation and modification, and to conveniently support new sensor adaptation.

Camera driver supports switching between the two camera sensors - primary and secondary. Primary sensor may be up to 16 MPix and secondary up to 5 MPix. Camera includes a dynamic sensor detection algorithm that detects the sensor currently in use from an array of known possibilities, except when using an external-ISP.

The OMX Camera component supports:
 * Auto and manual control of ISO settings for the sensor.
 * YUV420P NV12 and YUV422I UYVY sub sampling formats for image capture preview and video preview.
 * YUV420P NV12 and JPEG formats for image capture output.
 * YUV420P NV12 format for video capture output.
 * Both ITU-R BT.601-5 range and full 8-bit range pixel scaling based on user input for video capture.
 * Configuration of output image resolution.
 * Control of the image contrast and brightness in manual and automatic modes.
 * On-the-fly rotation before JPEG encoding.
 * Zooming in video mode, aborting the image capture operation and frame rate query and configuration.
 * High quality image capture in both office and low light conditions, in single shot mode. After an image is captured camera will immediately return to viewfinder mode or stay in image capture mode until notified to switch to viewfinder mode.
 * Picture cropping and resizing, where the supported resizing ratio can vary from 16x to 1/16x. The resizing ratio is 256/N where N ranges from 16 to 4096.
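The resizing-ratio formula in the last item can be illustrated with a small helper that picks the divisor N for a requested output size. This is a hypothetical sketch of the arithmetic only; the function name and the round-to-nearest choice are assumptions, not the camera driver's register programming.

```c
#include <assert.h>

/* Range of the resizer divisor N stated in the text. */
#define RSZ_N_MIN 16    /* 256/16   = 16x upscale    */
#define RSZ_N_MAX 4096  /* 256/4096 = 1/16x downscale */

/* Returns the divisor N (ratio = 256/N) closest to the requested
 * output/input size ratio, clamped to the supported range. */
static int resize_divisor(int in_size, int out_size)
{
    assert(in_size > 0 && out_size > 0);
    /* out/in = 256/N  =>  N = 256 * in / out, rounded to nearest */
    long n = (256L * in_size + out_size / 2) / out_size;
    if (n < RSZ_N_MIN)
        n = RSZ_N_MIN;
    if (n > RSZ_N_MAX)
        n = RSZ_N_MAX;
    return (int)n;
}
```

For example, scaling a 1920-pixel line down to 640 pixels gives N = 768 (a ratio of 1/3), while a request beyond 16x upscaling is clamped to N = 16.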

ISS Design and Features
ISS is designed so that it can reach high throughput and low latency with large image sensors. In high-performance mode, ISS supports a pixel throughput of 200 MPix/s. Two programmable imaging extension (iMX) modules are included in the SIMCOP subsystem to add further flexibility to implement new algorithms or in case any issues are encountered with the image sensors. The iMX processors are also open for third-party algorithms.

ISS targets the following major use-cases:
 * Viewfinder with digital zoom, video stabilization, and rotation
 * Up to 1080p video record @ 30 fps with digital zoom, video stabilization, and rotation
 * Up to 16 MPix still image capture with digital zoom and rotation
 * High performance mode: Up to 200 MPix/s throughput
 * High quality and low light modes: Up to 50 MPix/s throughput
 * Still image capture during video record
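The throughput figures above translate into rough per-image processing times. The helpers below are a back-of-the-envelope sketch (names are hypothetical); they ignore sensor readout, memory bandwidth and software overhead, so they give lower bounds only, and they take "16 MPix" as 16 million pixels.

```c
#include <assert.h>

/* Idealised time, in milliseconds, for the ISS to process one image of
 * `megapixels` at a throughput of `mpix_per_s`. Lower bound only. */
static double process_time_ms(double megapixels, double mpix_per_s)
{
    assert(mpix_per_s > 0.0);
    return megapixels * 1000.0 / mpix_per_s;
}

/* Pixel rate, in MPix/s, that a video stream of the given size requires. */
static double video_mpix_per_s(unsigned width, unsigned height, double fps)
{
    return (double)width * height * fps / 1e6;
}
```

Under these assumptions a 16 MPix capture needs at least 80 ms in high performance mode (200 MPix/s) and at least 320 ms in the high quality and low light modes (50 MPix/s), while 1080p30 video needs about 62.2 MPix/s.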

Ducati Multimedia Software
Ducati code is organised into the Ducati multimedia (ducatimm), drivers, algorithm, 3A framework and tools modules. The following sections outline the Ducati Build Tools, Memory Map, Boot Sequence and Debug Features.

Ducati Build Tools
The Ducati multimedia software executes on App M3 which is running BIOS 6.x operating system.


 * BIOS is a real-time operating system (RTOS). It is designed to be used by applications that require real-time scheduling and synchronization, host-to-target communication, or real-time instrumentation. BIOS provides preemptive multi-threading, hardware abstraction, real-time analysis, and configuration tools. BIOS 6.x is the new version of BIOS, which consists entirely of RTSC components.


 * RTSC is a C-based programming model for developing, delivering, and deploying Real-Time Software Components targeted for diverse embedded platforms, without compromising system performance.


 * XDC is the build and configuration system used to build Ducati multimedia software. XDC is also based on RTSC. The XDCTools product includes foundational tooling and runtime elements for producing and consuming C-based embedded software components.


 * Code Generation Tools (CG Tools) is the compiler tool chain that is used for building the Ducati sources targeted for Cortex-M3.

Apart from the above, there are a few other modules which are dependencies for building the Ducati software. These dependencies are bundled as RTSC components. The following is the summary of the dependencies.


 * 1) XDC - Build system
 * 2) BIOS - OS
 * 3) CG Tools - Compiler
 * 4) Codec Engine
 * 5) Framework components
 * 6) XDAIS
 * 7) OSAL
 * 8) Ducati SysLink [rpmsg]

Debugging
Code Composer Studio (CCS) is used for debugging Ducati code. For more information regarding CCS and its installation, please email ducati_system-info@gforge.ti.com.

CCS provides full debug capability to allow source and assembly code debugging. It also contains built-in tools based on Unified Instrumentation Architecture (UIA) to allow low latency tracing and instrumentation. UIA is explained in a later section.

Configuring traces on Ducati

Within Ducati multimedia software, TIMM OSAL module provides OS abstraction of key functionalities. Within the same module, trace definitions and configurations are defined.

At the top level, two macros control the traces:

 * TIMM_OSAL_DEBUG_TRACE_DETAIL
 * TIMM_OSAL_DEBUG_TRACE_LEVEL

These macros are defined in \WTSD_DucatiMMSW\platform\osal\timm_osal_trace.h.

Note: Individual components can override these macros to change the trace level and detail. This is achieved by defining macros with the same name before including (#include) the file timm_osal_trace.h.

The TIMM_OSAL_DEBUG_TRACE_DETAIL macro can be configured to the values 0, 1 and 2:

 * TIMM_OSAL_DEBUG_TRACE_DETAIL 0 - no detail
 * TIMM_OSAL_DEBUG_TRACE_DETAIL 1 - prints the function name
 * TIMM_OSAL_DEBUG_TRACE_DETAIL 2 - prints the function name and line number along with the trace

The TIMM_OSAL_DEBUG_TRACE_LEVEL macro can be configured to the values 0, 1, 2, 3, 4 and 5:

 * TIMM_OSAL_DEBUG_TRACE_LEVEL 0 - no traces
 * TIMM_OSAL_DEBUG_TRACE_LEVEL 1 - Error
 * TIMM_OSAL_DEBUG_TRACE_LEVEL 2 - Warning
 * TIMM_OSAL_DEBUG_TRACE_LEVEL 3 - Info
 * TIMM_OSAL_DEBUG_TRACE_LEVEL 4 - Debug
 * TIMM_OSAL_DEBUG_TRACE_LEVEL 5 - Function Entry/Exit

By default, TIMM_OSAL_DEBUG_TRACE_DETAIL and TIMM_OSAL_DEBUG_TRACE_LEVEL are set to 1. The traces can be enabled or disabled only at compile time.
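The level and detail settings can be illustrated with a self-contained sketch. DEMO_TRACE and demo_trace_emit are hypothetical and are not the TIMM OSAL implementation; the #ifndef guards are an assumption about how the component override described in the note works. Because the level test compares compile-time constants, disabled traces can be discarded by the compiler, consistent with traces being configurable only at compile time.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical defaults. The #ifndef guards model the override described
 * above (assumed mechanism): a component could write
 *   #define TIMM_OSAL_DEBUG_TRACE_LEVEL 4
 *   #include "timm_osal_trace.h"
 * to raise its own trace level. */
#ifndef TIMM_OSAL_DEBUG_TRACE_LEVEL
#define TIMM_OSAL_DEBUG_TRACE_LEVEL 1
#endif
#ifndef TIMM_OSAL_DEBUG_TRACE_DETAIL
#define TIMM_OSAL_DEBUG_TRACE_DETAIL 1
#endif

/* Level values as listed above. */
enum { TRACE_ERROR = 1, TRACE_WARNING, TRACE_INFO, TRACE_DEBUG, TRACE_ENTRY };

/* Prints one trace line with the configured amount of detail and returns
 * the number of characters written. */
static int demo_trace_emit(const char *func, int line, const char *msg)
{
    assert(msg != NULL);
#if TIMM_OSAL_DEBUG_TRACE_DETAIL == 0
    (void)func;
    (void)line;
    return printf("%s\n", msg);
#elif TIMM_OSAL_DEBUG_TRACE_DETAIL == 1
    (void)line;
    return printf("%s: %s\n", func, msg);
#else
    return printf("%s:%d: %s\n", func, line, msg);
#endif
}

/* Evaluates to 0 when the level is above the compile-time threshold; the
 * comparison is a constant expression, so the dead call can be optimised away. */
#define DEMO_TRACE(level, msg)                                   \
    (((level) <= TIMM_OSAL_DEBUG_TRACE_LEVEL)                    \
         ? demo_trace_emit(__func__, __LINE__, (msg))            \
         : 0)
```

With the defaults (level 1, detail 1), DEMO_TRACE(TRACE_ERROR, "...") prints the calling function name and the message, while DEMO_TRACE(TRACE_DEBUG, "...") produces nothing.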

Ducati Memory Map
Content under preparation.

Ducati Boot Sequence
Content under preparation.

Ducati Resource And Power Management Framework
Content under preparation.

DOMX
Distributed OpenMAX™ Integration Layer (IL) framework (DOMX) is the multimedia framework used for Inter Processor Communication (IPC) on OMAP4 to enable easy and transparent use of OpenMAX IL (OMX) components on remote processing cores such as Ducati. It is built upon Remote Procedure Call (RPC) concepts.

On OMAP4, many of the (high performance) video/imaging codecs and algorithms execute on the hardware accelerator subsystem (Ducati), but the application/client which invokes these codecs and algorithms executes on the host processor (Cortex A9). The IPC between Cortex A9 and Ducati is handled by SysLink. The only interface that Ducati exposes to Cortex A9 is OMX.

DOMX framework provides a seamless abstraction for Cortex A9 and Ducati to communicate via SysLink. This framework abstracts all details like buffer address translations, maintaining cache-external memory coherence, exact mechanisms used to communicate to remote cores, etc.

IPC Overview
Based on the functionalities involved, DOMX is layered into OMX-Proxy, OMX-RPC (OpenMAX API specific Remote Procedure Call) and RCM (Remote Command Message is a generic RPC service maintained by SysLink).

All IPC between Cortex A9 and Ducati has to pass through these layers. The figure below shows a block diagram of communication between these two processing cores by means of DOMX. Note that the host processor is Cortex A9 and the remote processor is the Ducati sub system. There are proxies for each component (H.264 encoder, MPEG-4 Visual decoder, JPEG decoder, etc.). The code common to all these proxies is extracted into the Proxy Common block, leaving only the component-specific parts in the individual proxies.



A proxy is a thin component on the local core that exports the OpenMAX interface APIs to peer OpenMAX components and clients. As the name suggests, it is a ‘proxy’ for a real component implemented on a remote core.

The proxy component’s principal function is to forward calls to the remote component using RPC Stub functions and the RCM client-server infrastructure. Proxies also forward callbacks that originate on the remote core to the IL Client. All buffer address mapping, maintenance, etc. are the responsibility of the proxy component.

The OMX-RPC layer implements Stubs (which pack the information of an API call) and Skeletons (which unpack it) for all the OpenMAX APIs. For calls originating on the local core, the RPC Stub allocates a message, packs the arguments and, using the RCM layer, invokes the appropriate remote function for the OpenMAX API.

The RPC Skeleton (on the remote core) receives the packed data, unpacks it and invokes appropriate remote function with the unpacked arguments. RPC Skeleton is also responsible for packing any return values associated with the remote call and sending it back to the local core. On the return path, the RPC Stub extracts the remote invocation ‘Return’ value, frees the message and returns the remote invocation ‘Return’ value to the caller.
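The stub/skeleton round trip described above can be sketched in a single process. Everything here is hypothetical: the message layout, the function index and the example remote function are illustrative, not the real OMX-RPC or RCM formats, and the IPC transport is replaced by a direct call. Arguments are copied in native byte order, which assumes both ends agree on it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout; the real OMX-RPC/RCM format differs. */
typedef struct {
    uint32_t fxn_idx;      /* which remote function to invoke      */
    int32_t  result;       /* return value written by the skeleton */
    uint32_t payload_len;  /* number of payload bytes in use       */
    uint8_t  payload[64];  /* packed arguments                     */
} rpc_msg;

enum { FXN_MB_COUNT = 1 };

/* Remote-side function (illustrative): 16x16 macroblocks per frame. */
static int32_t remote_mb_count(uint32_t width, uint32_t height)
{
    return (int32_t)((width / 16) * (height / 16));
}

/* Skeleton: unpack the arguments, invoke the function, pack the result. */
static void skel_dispatch(rpc_msg *m)
{
    uint32_t w, h;
    switch (m->fxn_idx) {
    case FXN_MB_COUNT:
        assert(m->payload_len == sizeof w + sizeof h);
        memcpy(&w, m->payload, sizeof w);
        memcpy(&h, m->payload + sizeof w, sizeof h);
        m->result = remote_mb_count(w, h);
        break;
    default:
        m->result = -1;  /* unknown function index */
        break;
    }
}

/* Stub: pack the call, "send" it, then extract the return value.
 * Here the transport is a direct call to the skeleton; in DOMX it would
 * go through RCM and SysLink to the other core. */
static int32_t rpc_mb_count(uint32_t width, uint32_t height)
{
    rpc_msg m = { 0 };
    m.fxn_idx = FXN_MB_COUNT;
    memcpy(m.payload, &width, sizeof width);
    memcpy(m.payload + sizeof width, &height, sizeof height);
    m.payload_len = sizeof width + sizeof height;

    skel_dispatch(&m);   /* stand-in for the IPC round trip */
    return m.result;     /* message lives on the stack, so nothing to free */
}
```

The caller of rpc_mb_count sees an ordinary function call, which is the property DOMX aims to give OMX clients.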

The RCM layer is provided by SysLink; it abstracts the core-specific IPC and transport layers and manages messaging between processing cores. RCM is implemented on a client-server paradigm. The RCM client implements the transport mechanism to send a message to a given server. The server is identified by a name that is unique across the entire system. On reception of a message, the RCM server extracts the function index from it and invokes the associated remote function. To enable index-based remote invocation, the server maintains an index-to-remote-function table that is populated at initialization or by registration. The client discovers the index for a given remote function by querying the server with the remote function's name.
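The index-based invocation scheme can be sketched as a small in-process table. This is a hypothetical illustration of the idea (register by name, discover the index by name, invoke by index), not the SysLink RCM API; all names are made up for the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_REMOTE_FXNS 8

typedef int (*remote_fxn)(int arg);

/* Hypothetical RCM-style server: functions are registered by name and
 * invoked by index. Not the SysLink RCM API. */
typedef struct {
    const char *names[MAX_REMOTE_FXNS];
    remote_fxn  fxns[MAX_REMOTE_FXNS];
    int         count;
} rcm_server;

/* Registration: returns the index assigned to fxn, or -1 if the table is full. */
static int server_register(rcm_server *s, const char *name, remote_fxn fxn)
{
    if (s->count >= MAX_REMOTE_FXNS)
        return -1;
    s->names[s->count] = name;
    s->fxns[s->count] = fxn;
    return s->count++;
}

/* Discovery: the client asks the server for the index of a named function. */
static int server_get_index(const rcm_server *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return i;
    return -1;
}

/* Dispatch: the server looks up the index carried in the message and runs it. */
static int server_exec(const rcm_server *s, int idx, int arg, int *result)
{
    assert(result != NULL);
    if (idx < 0 || idx >= s->count)
        return -1;
    *result = s->fxns[idx](arg);
    return 0;
}

/* Two example remote functions. */
static int fxn_double(int x) { return 2 * x; }
static int fxn_negate(int x) { return -x; }
```

The name lookup is done once; afterwards each message only needs to carry the small integer index, which keeps per-call messages compact.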

All communication from the host to a remote OMX component goes via a proxy, which exposes the same OMX interface as the real component. The proxy sends a message to the RPC layer, which packs the data and uses the RCM client-server model to send it to the remote core via SysLink. On the remote core, the RPC layer unpacks the received message and passes it to the real OMX component. Callbacks, if any, follow a similar path in the reverse direction.

Thus the DOMX layer ensures that OMX components and the clients which invoke these components remain unaware of any IPC involved. As far as the components and their clients are concerned there is no difference between a local and remote call. DOMX framework ensures that a remote call is executed in the same way as a local call.

OpenMAX™ Test Bench
OpenMAX™ Test Bench (OMTB) is a target-based test bench, developed internally, to test OpenMAX™ components in a uniform manner. It provides the user with either a Command Line Interface or a Scripting Interface. Underneath, it has a Command Parser, a Template Manager and individual Usecase/Instance Managers.

OMTB resides on Host MPU and interfaces directly with DOMX Proxy components or Cortex A9 side OMX components. Figure below shows a block diagram of OMTB framework and interface.




OMTB currently supports testing of the following Ducati-based OMX components:
 * OMX Camera
 * OMX H.264 Video Decoder
 * OMX MPEG-4 Video Decoder
 * OMX MPEG-2 Video Decoder
 * OMX VC1 Video Decoder
 * OMX H.264 Video Encoder
 * OMX MPEG-4 Video Encoder

OMTB offers three types of commands at the OMTB command line: OMX, Utility and System.

OMTB Commands: func, api, api_test

Utility Commands: -s, setp, getp, omtb_rel_info, add, remove, load, store, reset, omtb_dbg_lvl

System Commands (no omx prefix): system, sleep, notify, conf [khronos Conf], # [Comment]


 * In FUNC mode, the entire use case can be executed in a single command. The commands below run a video encode using Instance 0 of the H.264 encoder; configuration parameters such as the I/O files are picked up from Template 2.

omx setp 2 venc compname OMX.TI.DUCATI1.VIDEO.H264E
omx setp 2 venc codingtype OMX_VIDEO_CodingAVC
omx setp 2 venc infile sample.yuv
omx setp 2 venc outfile sample.264
omx setp 2 venc frame_width 176
omx setp 2 venc frame_height 144

omx func videnc venc 0 2


 * In API mode, OMTB provides a set of commands that mimic OMX API usage. The following commands have to be executed to stitch together a use case:

omx api gethandle vdec 2 0
omx api sendcommand state vdec 0 idle
omx api sendcommand state vdec 0 exec
omx api sendcommand state vdec 0 idle
omx api sendcommand state vdec 0 loaded
omx api freehandle vdec 0

In FUNC and API modes, OMTB calls are non-blocking, which facilitates multi-instance testing. For back-to-back testing, the client should ensure that the “End Of Stream” string has been received before starting the next use case.
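The API-mode sequence above walks the component through the OpenMAX IL states Loaded, Idle, Executing, Idle and back to Loaded. The checker below is a hypothetical sketch covering only a subset of the OMX IL 1.1.2 state machine (WaitForResources and Invalid are omitted), and it ignores buffer allocation and the asynchronous completion events that a real client must wait for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative subset of OMX IL component states (not the OMX header enum). */
typedef enum { ST_LOADED, ST_IDLE, ST_EXECUTING, ST_PAUSE } demo_state;

/* Legal single-step transitions within this subset. */
static bool transition_allowed(demo_state from, demo_state to)
{
    switch (from) {
    case ST_LOADED:    return to == ST_IDLE;
    case ST_IDLE:      return to == ST_LOADED || to == ST_EXECUTING || to == ST_PAUSE;
    case ST_EXECUTING: return to == ST_IDLE || to == ST_PAUSE;
    case ST_PAUSE:     return to == ST_IDLE || to == ST_EXECUTING;
    }
    return false;
}

/* Walks a sequence of requested states starting from Loaded and returns
 * the index of the first illegal request, or -1 if the sequence is legal. */
static int first_illegal(const demo_state *seq, size_t n)
{
    assert(seq != NULL || n == 0);
    demo_state cur = ST_LOADED;
    for (size_t i = 0; i < n; i++) {
        if (!transition_allowed(cur, seq[i]))
            return (int)i;
        cur = seq[i];
    }
    return -1;
}
```

The idle, exec, idle, loaded sequence used above is legal; requesting exec directly from loaded is not, which is why the idle step cannot be skipped.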


 * API Test mode facilitates API testing of each OMX API for input parameters, and evaluates against the expected return codes. Example scenarios include Null pointer in input parameter of the API or Version mismatch in input parameter of the API.

OMTB Utility commands provide the following functionality:
 * 1) A scripting interface to execute OMTB commands. The OMTB framework executes the commands sequentially from the OMS (OMTB Script) file. It does not support any control commands other than sleep.
 * 2) Changing OMTB trace levels at run time.
 * 3) A template feature to aid in setting and viewing the configuration parameters needed for a test case. For example, at a minimum, the input and output files have to be specified before running the H.264 video encode use case:
   OMTB> omx setp 0 h264venc infile xyz.yuv
   OMTB> omx setp 0 h264venc outfile xyz.264
   These setp commands overwrite the default template variables. Other example configurations include whether the component should use buffers or allocate buffers for the use case, and the number of input and output buffers for the component.
 * 4) Executing system commands.

OMTB provides useful command-line help. It offers a strong, uniform and simple interface for testing OMX components via the use case, API and API test modes. The template feature offers an easy way of configuring test cases. OMTB also supports multi-instance testing, combo testing and multi-process testing of OMX components, and it can be integrated with tools such as TTL or PTL to enable automation.

OMX Components
The only interface exposed by Ducati multimedia software, for invocation by Cortex A9 for example, is OpenMAX™. Each OMX component on the Ducati subsystem encapsulates either a single algorithm or codec, or several algorithms and/or codecs. The existing video and imaging Ducati OMX components are:
 * H.264 encoder
 * H.264 decoder
 * MPEG-4/H.263 encoder
 * MPEG-4/H.263 decoder
 * MPEG-2 decoder
 * VC-1 decoder
 * OMX camera (includes JPEG encoder algorithm)

The following sub-sections illustrate an example: decoding an H.264 stream using the OMX common video decoder.
OMX Common video decoder
The OMX common video decoder in Ducati handles different formats; a format is selected by setting the appropriate role and coding type. It is accessible from the host side via the DOMX framework.

H.264 video decoder OMX component testing is done using the OpenMAX Test Bench (OMTB). OMTB’s use case mode can be used to decode an H.264 bit stream using the H.264 video decoder OMX component. To select H.264 mode, the role and coding type need to be set.

In this mode, the OMTB framework is supplied with the bit stream to be decoded, its resolution, and a text file containing the sizes of all frames in that bit stream. OMTB internally takes care of calling all the OMX APIs in the necessary sequence to decode the given bit stream. The individual OMTB commands needed to implement the use case mode test case can be entered one after another at the OMTB prompt, or they can be bunched together in an OMS (OMTB Script) file, and OMTB can be made to run all commands in this script file.
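One plausible use of the frame sizes file is to cut the bit stream into one frame per input buffer. The exact file format is not given here, so this hypothetical sketch assumes one decimal byte count per line; the function names are illustrative and not part of OMTB.

```c
#include <assert.h>
#include <stdio.h>

/* Reads up to max frame sizes from fp. Assumed (hypothetical) format:
 * one decimal byte count per line. Returns the number of sizes read. */
static size_t read_frame_sizes(FILE *fp, unsigned long *sizes, size_t max)
{
    size_t n = 0;
    while (n < max && fscanf(fp, "%lu", &sizes[n]) == 1)
        n++;
    return n;
}

/* Byte offset of frame k in the bit stream: the sum of all earlier sizes.
 * A feeder could read sizes[k] bytes from this offset to fill one input
 * buffer with exactly one frame. */
static unsigned long frame_offset(const unsigned long *sizes, size_t n, size_t k)
{
    assert(k <= n);
    unsigned long off = 0;
    for (size_t i = 0; i < k; i++)
        off += sizes[i];
    return off;
}
```

For a sizes file listing 1200, 350 and 410, the third frame would start at byte offset 1550 of the bit stream.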

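The following is the layout of a simple OMS file:

omx setp [template_no] vdec role [role]
omx setp [template_no] vdec codingtype [coding_type]
omx setp [template_no] vdec frame_width [width]
omx setp [template_no] vdec frame_height [height]
omx setp [template_no] vdec frame_size_file [frame_size_file]
omx setp [template_no] vdec infile [input_file]
omx setp [template_no] vdec outfile [output_file]
omx getp [template_no] vdec
omx func viddec vdec [instance_no] [template_no]

Generally, [template_no] and [instance_no] are set to ‘0’. For more information on these two parameters, please refer to the OMTB User Guide.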

The script above dumps the decoded output to a .yuv file. If the output needs to be sent to the LCD, the following line has to be added at the beginning of the script:

omx setp 0 vdec outdata_mode v4l2

To enable detailed OMTB logging, the following line needs to be added at the beginning of the script:

omx omtb_dbg_lvl 0x1E

A sample OMS file is shown below (OMTB logging is enabled; output is routed to LCD):

omx omtb_dbg_lvl 0x1E
omx setp 0 vdec role video_decoder.avc
omx setp 0 vdec codingtype OMX_VIDEO_CodingAVC
omx setp 0 vdec outdata_mode v4l2
omx setp 0 vdec frame_width 1920
omx setp 0 vdec frame_height 1088
omx setp 0 vdec frame_size_file /mnt/mmc/1080p_tc.txt
omx setp 0 vdec infile /mnt/mmc/1080p_tc.264
omx setp 0 vdec outfile /mnt/mmc/1080p_tc.yuv
omx getp 0 vdec
omx func viddec vdec 0 0

The following sequence of commands should be used to run the OMS (OMTB script) file:
 * 1) Boot Linux.
 * 2) Run the omtb.out file to obtain "OMTB>" prompt.
 * 3) Run the OMS file: omx -s 

FAQ
Where do I get started from?

Please read the complete Ducati for Dummies Guide.

Whom to contact for more details?

For further questions or more detailed information send an e-mail to ducati_system-info@gforge.ti.com.

How can I start using the existing Ducati Software?

A pre-built ready-to-try-out package and the file system will be made available soon.

How do I compile and execute my own code?

Section 3 of Ducati for Dummies Guide contains all details associated with Ducati Multimedia Software Compilation, Loading and Execution, in both Standalone and OMAP4 configurations.

What is the maximum supported resolution for camera capture and JPEG encode?

Maximum image width supported by Ducati Hardware (ISP) is 5376. There is no specific limit on image height. But both width and height may be further limited by the maximum values supported by sensor.

What is the maximum supported resolution for camera capture and video encode?

The current Ducati software implementation supports video encoding and decoding up to a maximum width and height of 1920 and 1088, respectively.
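The 1088 figure is likely 1080 rounded up to a whole number of 16x16 macroblocks (1088 = 68 x 16); H.264 codes pictures in whole macroblocks and crops the extra lines on output. The hypothetical helpers below show that alignment and the size of one NV12 frame at that resolution; they ignore any codec padding or stride alignment the actual Ducati buffers may require.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds a dimension up to the next multiple of 16 (the macroblock size). */
static unsigned align16(unsigned v)
{
    return (v + 15u) & ~15u;
}

/* Bytes in one NV12 (YUV 4:2:0) frame: a full-size Y plane plus an
 * interleaved CbCr plane at half height. No padding is included. */
static size_t nv12_frame_bytes(unsigned width, unsigned height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return (size_t)width * height * 3 / 2;
}
```

At 1920x1088, one unpadded NV12 frame is 3,133,440 bytes.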

What are the various software tools used on Ducati?

The build system, compilation and loading tools are described in Section 3. Ducati test framework and instrumentation tools are described in Section 5.