USB testing

=  USB TESTING  =

Formatting eMMC partition using g_file_storage gadget
@ OMAP side:

1. echo /dev/block/mmcblk1 >/sys/devices/platform/musb_hdrc/gadget/lun0/file

2. insmod g_file_storage file=/sys/devices/platform/musb_hdrc/gadget/lun0/file stall=0

@ Linux PC side:

1. Connect a USB cable from the OMAP board to the Linux PC

2. Run "cat /proc/partitions" to check that the partitions have been recognized

3. Use "mkfs.ext3" on the PC to format the partition

USB mass-storage throughput Measurement:
The USB mass-storage throughput depends on a number of variables:


 * Host operating system (Windows or Linux): a Linux host can send data at a faster rate (up to 120 KB per data phase) than a Windows host (only 60 KB per data phase)
 * Speed of the host CPU
 * Host-specific delay between receiving a CSW and transmitting the next CBW
 * Type of SD card used, e.g. SDHC Class 6 vs. Class 4
 * Type of filesystem on the SD card, e.g. VFAT, ext2, ext3
 * Formatter used to format the SD card
 * Amount of data transferred relative to the size of the filesystem page cache. A larger transfer fills the cache sooner, which forces more frequent writes to the MMC and therefore more delays.
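To make the effect of the data-phase size and the CSW-to-CBW delay concrete, here is a minimal Python sketch. It assumes 1 KB = 1024 bytes and that every command carries the maximum data phase. The function names and the gap value are hypothetical, chosen only for illustration; the gap is not a measured figure.

```python
def commands_needed(total_bytes: int, max_data_phase_bytes: int) -> int:
    """Number of CBW / data / CSW cycles needed to move total_bytes."""
    # Ceiling division without floating point.
    return -(-total_bytes // max_data_phase_bytes)


def host_gap_overhead_s(total_bytes: int, max_data_phase_bytes: int,
                        gap_s: float) -> float:
    """Total time spent in the host's gap between a CSW and the next CBW."""
    return commands_needed(total_bytes, max_data_phase_bytes) * gap_s


KIB = 1024
TOTAL = 100 * 1024 * KIB  # a 100 MB transfer

linux_cmds = commands_needed(TOTAL, 120 * KIB)    # 854 commands
windows_cmds = commands_needed(TOTAL, 60 * KIB)   # 1707 commands

# HYPOTHETICAL_GAP_S is an illustrative value, not a measurement.
HYPOTHETICAL_GAP_S = 100e-6
print(linux_cmds, windows_cmds)
print(host_gap_overhead_s(TOTAL, 120 * KIB, HYPOTHETICAL_GAP_S))
```

Halving the data-phase size roughly doubles the number of commands, so any fixed per-command delay on the host is paid about twice as often.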

For best performance, use a SanDisk SDHC Class 6 Extreme III card with a Linux host. Only use SD Formatter 2.0 for SD/SDHC cards. You can download the formatter tool from here:

http://www.sdcard.org/consumers/formatter/

A. Throughput with the filesystem involved:

1. Using a Windows Host Machine:

Make sure the following setting is selected:

 * Right-click the mounted drive in My Computer and click 'Properties'
 * Click the 'Hardware' tab
 * Select the appropriate USB device, e.g. Linux File-Storage Gadget USB Device
 * Click the 'Properties' button below
 * Click the 'Policies' tab
 * Select 'Optimize for Performance'

Asynchronous Transfers:

The FUA (Force Unit Access) bit of the write command carried in the CBW selects between synchronous and asynchronous write mode. When files are transferred from Windows/MS-DOS, the FUA bit is always set, so all write accesses are synchronous. Since the Windows/MS-DOS behavior cannot be changed, the FUA bit is ignored on the gadget side instead. In file_storage.c, comment out the following lines in the do_write routine:

 if (fsg->cmnd[1] & 0x08) {	// FUA
 	spin_lock(&curlun->filp->f_lock);
 	curlun->filp->f_flags |= O_DSYNC;
 	spin_unlock(&curlun->filp->f_lock);
 }

With these lines commented out, the gadget driver ignores whatever FUA value the host sets, so all write accesses are asynchronous. To measure throughput, proceed as follows:

From an MS-DOS command window, run: usb_ms_perf_benchmarking.bat [source] [destination]

For a write transfer, 'source' is a file on the PC host and 'destination' is the SD card as seen from the PC host. For a read transfer it is the reverse: 'source' is a file on the SD card and 'destination' is a location on the PC host's hard drive.
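To make the FUA test in do_write concrete, here is a minimal Python sketch of the same bit check. It assumes a 10-byte WRITE(10) command block (opcode 0x2A) with the FUA flag in bit 3 of byte 1, which is the 0x08 mask the driver code uses. The function and variable names are hypothetical and not part of the gadget driver.

```python
FUA_MASK = 0x08  # bit 3 of command byte 1, the same mask do_write tests


def fua_requested(cdb: bytes) -> bool:
    """Return True if the host asked for a synchronous (FUA) write."""
    return bool(cdb[1] & FUA_MASK)


# Two illustrative WRITE(10) command blocks: 8 blocks at LBA 0.
write10_with_fua = bytes([0x2A, 0x08, 0, 0, 0, 0, 0, 0, 8, 0])
write10_without_fua = bytes([0x2A, 0x00, 0, 0, 0, 0, 0, 0, 8, 0])

print(fua_requested(write10_with_fua))     # True
print(fua_requested(write10_without_fua))  # False
```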

Synchronous Transfers:

When files are transferred from a Windows/MS-DOS system, the FUA bit is always set and hence all transfers are synchronous. To measure throughput, proceed as follows:

From an MS-DOS command window, run: usb_ms_perf_benchmarking.bat [source] [destination]

For a write transfer, 'source' is a file on the PC host and 'destination' is the SD card as seen from the PC host. For a read transfer it is the reverse: 'source' is a file on the SD card and 'destination' is a location on the PC host's hard drive.

For instance execute:

C:\OMAP_PSI\OMAP3630\USB_Mass_Storage>usb_ms_perf_benchmarking.bat test_file.avi E:\

And here is the kind of output you get:

 USB MS Performance Benchmarking
 version : alpha -11/18/2009
 author : d-ramonda@ti.com
 15:39:18.92
        1 file(s) copied.
 15:42:06.93
 Process took 0 hours, 2 minutes, 48 seconds, 1 centiseconds
 Process took 16801 centiseconds

The throughput then follows from a simple computation: 701 MB (size of test_file.avi in this example) / 168.01 s = 4.17 MB/s
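The same computation can be scripted. The following Python sketch assumes the script output contains a line of the form "Process took N centiseconds", as in the sample above; the function names are hypothetical, and the file size in MB has to be supplied by hand.

```python
import re


def parse_centiseconds(output: str) -> int:
    """Extract N from the 'Process took N centiseconds' line."""
    match = re.search(r"Process took (\d+) centiseconds", output)
    if match is None:
        raise ValueError("no 'Process took N centiseconds' line found")
    return int(match.group(1))


def throughput_mb_per_s(size_mb: float, centiseconds: int) -> float:
    """File size divided by the elapsed time in seconds."""
    return size_mb / (centiseconds / 100.0)


sample = """\
15:39:18.92
        1 file(s) copied.
15:42:06.93
Process took 0 hours, 2 minutes, 48 seconds, 1 centiseconds
Process took 16801 centiseconds
"""

elapsed_cs = parse_centiseconds(sample)
print(f"{throughput_mb_per_s(701, elapsed_cs):.2f} MB/s")  # prints "4.17 MB/s"
```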

The script is attached as usb_ms_perf_benchmarking.txt; save it with a .BAT extension on your Windows host.

2. Using a Linux Host Machine:

Asynchronous Transfer :

 /usr/bin/time -p dd if=/dev/zero of=/media/boot/test.bin bs=102400 count=1024    # Write 100 MB

Unmount the drive, then detach and re-attach the MUSB cable.

 /usr/bin/time -p dd if=/media/boot/test.bin of=/dev/null bs=102400 count=1024    # Read 100 MB

Synchronous Transfer :

 /usr/bin/time -p dd if=/dev/zero of=/media/boot/test.bin bs=102400 count=1024; /usr/bin/time -p sync    # Write 100 MB

Unmount the drive, then detach and re-attach the cable.

 /usr/bin/time -p dd if=/media/boot/test.bin of=/dev/null bs=102400 count=1024; /usr/bin/time -p sync    # Read 100 MB

Here is the kind of output you get from the dd command. Note that this sample reports 64857600 bytes rather than the full 100 MB requested by the command line; always use the byte count that dd actually prints. The MBps figures below use 1 MB = 1048576 bytes.

Asynchronous write:

 /usr/bin/time -p dd if=/dev/zero of=/media/boot/test.bin bs=102400 count=1024
 100+0 records in
 100+0 records out
 64857600 bytes (65 MB) copied, 5.40164 s, 12.0 MB/s
 real 5.40
 user 0.00
 sys 0.76

Then calculate throughput = 64857600 bytes / 5.40 s = 11.45 MBps

Synchronous write:

 /usr/bin/time -p dd if=/dev/zero of=/media/boot/test.bin bs=102400 count=1024; /usr/bin/time -p sync
 100+0 records in
 100+0 records out
 64857600 bytes (65 MB) copied, 5.40164 s, 12.0 MB/s
 real 5.40
 user 0.00
 sys 0.76
 real 1.92
 user 0.00
 sys 0.01

Then calculate throughput = 64857600 bytes / (5.40 s + 1.92 s) = 8.45 MBps

Note:
 * Use the 'real' times for all calculations.
 * For a synchronous write, add the real time of the sync command (1.92 in this example) to the real time of dd.
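The calculation described in these notes can be sketched in Python as follows. The sketch assumes the captured text contains dd's "N bytes ... copied" line plus one "real" line per timed command, and it uses 1 MB = 1048576 bytes; the function names are hypothetical.

```python
import re

MIB = 1024 * 1024


def dd_bytes(output: str) -> int:
    """Byte count reported by dd on its 'N bytes ... copied' line."""
    match = re.search(r"^(\d+) bytes", output, re.MULTILINE)
    if match is None:
        raise ValueError("no dd byte count found")
    return int(match.group(1))


def real_times(output: str) -> list:
    """All 'real' times printed by /usr/bin/time -p, in order."""
    return [float(t) for t in re.findall(r"^real\s+([\d.]+)", output, re.MULTILINE)]


def throughput_mb_per_s(output: str) -> float:
    """Bytes copied divided by the sum of all real times (dd plus sync)."""
    times = real_times(output)
    if not times:
        raise ValueError("no 'real' time found")
    return dd_bytes(output) / sum(times) / MIB


async_sample = """\
64857600 bytes (65 MB) copied, 5.40164 s, 12.0 MB/s
real 5.40
user 0.00
sys 0.76
"""
sync_sample = async_sample + "real 1.92\nuser 0.00\nsys 0.01\n"

print(f"{throughput_mb_per_s(async_sample):.2f}")  # prints "11.45"
print(f"{throughput_mb_per_s(sync_sample):.2f}")   # prints "8.45"
```

Summing every "real" line means the same function covers both cases: one line for an asynchronous write, two (dd and sync) for a synchronous one.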

B. RAW throughput without the filesystem involved using MSC and Hdparm:

MSC:

MSC is a USB Mass Storage Class verification tool written by Felipe Balbi as part of his usb-tools tree, for testing Mass Storage Class (MSC) devices. The tool provides a number of read/write test cases, selected with the -t (test number) parameter.

Ex:

 -t 0: Simple write/Read/verify
 -t 1: write/Read/verify 1 sector at a time
 -t 2: write/Read/verify 8 sectors at a time
 -t 3: write/Read/verify 32 sectors at a time
 -t 4: write/Read/verify 64 sectors at a time
 -t 5: SG write/read/verify 2 sectors at a time
 ...
 -t n: attempt to read past last sector
 -t n: attempt to read starting past the last sector
 -t n: attempt to write past last sector
 -t n: write 1 64k sg and read in several of random size
 -t n: write several of random size, read 1 64k
 -t n: write and read several of random size
 ...many more

 * n is the appropriate test number; read the source code for the exact test number needed

You can download the source code using:

git clone git://gitorious.org/usb/usb-tools.git

Compile the code:

 make CROSS_COMPILE=arm-none-linux-gnueabi-
 gcc -Wall -O2 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -g -o msc msc.c

This will give you the executable msc

Hdparm:

'hdparm' is a command line utility for the Linux and Windows operating systems to set and view SATA and IDE hard disk hardware parameters. It can set parameters such as drive caches, sleep mode, power management, acoustic management, and DMA settings. Changing hardware parameters from suboptimal conservative defaults to their optimal settings can improve performance greatly. For example, turning on DMA can in some instances double or triple data throughput.

Unfortunately at present there's no reliable method for determining the optimal settings for a given controller/drive combination, except careful trial and error; nor is there yet any central database that collects and shares the combined experience of hdparm users.

The following options are of interest here:

-T    Perform timings of cache reads for benchmark and comparison purposes.

For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading directly from the Linux buffer cache without disk access. This measurement is essentially an indication of the throughput of the processor, cache, and memory of the system under test.

-t    Perform timings of device reads for benchmark and comparison purposes.

For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading through the buffer cache to the disk without any prior caching of data. This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead. To ensure accurate measurements, the buffer cache is flushed during the processing of -t using the BLKFLSBUF ioctl.

I was able to get these tools running on my system. Here are the results:

 $ sudo ./msc -t 0 -o /dev/sdi -s 65536 -c 1024
 test 0: sent   64.0000 MBytes read   17937.22 kB/s write    7471.40 kB/s ... success
 $ sudo hdparm -tT /dev/sdi
 /dev/sdi:
  Timing cached reads:  558 MB in  2.00 seconds = 278.89 MB/sec
  Timing buffered disk reads:  40 MB in  3.05 seconds =  13.13 MB/sec

USB mass storage Average CPU Load measurement
A. Using LINUX TOP command :

Determining the average CPU load from the %idle field of the top command is not correct.

%idle is the share of time during which the CPU is idle. During this test, however, the CPU spends most of its time waiting for I/O operations to complete (reads and writes to the SD card, probably in mmcqd). That time is counted as I/O wait rather than idle, so top shows a low %idle and a high %io during the transfer. The CPU can still do other work while in this wait state, since mmcqd calls 'schedule', but the time is accounted as waiting on I/O and %idle stays low.

Sample:

 CPU: 0.0% usr 13.0% sys  0.0% nic  0.0% idle 87.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 21.2% sys  0.0% nic  0.0% idle 78.7% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 30.0% sys  0.0% nic  0.0% idle 70.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 30.6% sys  0.0% nic  0.0% idle 69.3% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 32.3% sys  0.0% nic 12.1% idle 55.5% io  0.0% irq  0.0% sirq

Command used: top d 1 | grep '^C'

%idle is therefore not a reliable measure of CPU load. Use one of the following methods to determine the average CPU load instead:

1. Using the 'Load average' field of top command:

-Execute this command on OMAP: top d 1 | grep '^L'

-Execute the transfer of data between the Host and SD card [You will see the load numbers increasing]

Sample top output on OMAP:

 Load average: 0.01 0.02 0.00 1/41 695
 Load average: 0.01 0.02 0.00 1/41 695
 Load average: 0.01 0.02 0.00 1/41 695
 Load average: 0.01 0.02 0.00 1/41 695
 Load average: 0.01 0.02 0.00 4/41 695
 Load average: 0.01 0.02 0.00 2/41 695
 Load average: 0.33 0.08 0.02 1/41 695
 Load average: 0.33 0.08 0.02 2/41 695
 Load average: 0.54 0.13 0.04 2/41 695
 Load average: 0.54 0.13 0.04 2/41 695
 Load average: 0.54 0.13 0.04 2/41 695
 Load average: 0.74 0.18 0.05 2/41 695
 Load average: 0.74 0.18 0.05 1/41 695
 Load average: 0.76 0.19 0.06 1/41 695
 Load average: 0.76 0.19 0.06 1/41 695
 …

Command used: top d 1 | grep '^L'

-When this transfer is completed, let the top command keep running until the first (leftmost) load number has slowly decayed back to the steady value it started from (~0.01 in this example)

-Then, for the collected output, add up all the leftmost numbers and divide by the total number of lines printed by top (i.e. take the average). This should come out below roughly 0.25. Use this as the % CPU utilization, i.e. (25% * ARM frequency)
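The averaging step can be sketched in Python as follows. The sketch assumes the output of top was saved as text with one "Load average:" line per sample, in the format shown in the sample output; the function names are hypothetical.

```python
import re


def one_minute_loads(top_output: str) -> list:
    """Leftmost number of every 'Load average:' line."""
    return [float(v) for v in
            re.findall(r"^Load average:\s+([\d.]+)", top_output, re.MULTILINE)]


def average_load(top_output: str) -> float:
    """Mean of the leftmost load numbers over all captured lines."""
    loads = one_minute_loads(top_output)
    if not loads:
        raise ValueError("no 'Load average:' lines found")
    return sum(loads) / len(loads)


# A short excerpt for illustration; a real capture should run until the
# load has decayed back to its starting value.
excerpt = """\
Load average: 0.01 0.02 0.00 1/41 695
Load average: 0.33 0.08 0.02 1/41 695
Load average: 0.54 0.13 0.04 2/41 695
"""

print(f"{average_load(excerpt):.2f}")  # prints "0.29"
```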

2. Use the average load of all the active threads over the whole file transfer duration to determine average CPU load

-Execute this command on OMAP: top d 1 | egrep 'mmcqd|file-storage|flush'

-Execute the transfer of data between the Host and SD card [you will see the %cpu numbers increasing for mmcqd, file-storage-ga, pdflush ]

Note: Number before the process name is the %cpu

Sample top output on OMAP:

 702  659 root     S     2536  1.0   0  0.0 egrep mmcqd|file-storage|flush
 562    2 root     SW<      0  0.0   0  0.0 [mmcqd]
 299    2 root     SW       0  0.0   0  0.0 [pdflush]
 662    2 root     DW<      0  0.0   0 19.7 [file-storage-ga]
 299    2 root     SW       0  0.0   0  3.9 [pdflush]
 562    2 root     DW<      0  0.0   0  1.9 [mmcqd]
 702  659 root     S     2536  1.0   0  0.0 egrep mmcqd|file-storage|flush
 562    2 root     RW<      0  0.0   0  3.9 [mmcqd]
 702  659 root     S     2536  1.0   0  0.0 egrep mmcqd|file-storage|flush
 662    2 root     DW<      0  0.0   0  0.0 [file-storage-ga]
 299    2 root     SW       0  0.0   0  0.0 [pdflush]
 562    2 root     DW<      0  0.0   0 16.6 [mmcqd]
 662    2 root     DW<      0  0.0   0  4.9 [file-storage-ga]
 702  659 root     S     2536  1.0   0  0.0 egrep mmcqd|file-storage|flush
 299    2 root     SW       0  0.0   0  0.0 [pdflush]
 562    2 root     DW<      0  0.0   0 27.6 [mmcqd]
 662    2 root     DW<      0  0.0   0  8.9 [file-storage-ga]

Command used: top d 1 | egrep 'mmcqd|file-storage|flush'

-When this transfer is completed, stop the top command.

-Now add all %cpu numbers for the mmcqd occurrences and divide by the number of mmcqd occurrences; this gives the average %cpu utilization of mmcqd over the duration of the file transfer. Calculate the average %cpu for the 'file-storage' and 'pdflush' processes in the same way

-Then,

Total Avg %cpu = Avg %cpu[mmcqd] + Avg %cpu[file-storage] + Avg %cpu[pdflush] + ~1.5% [other commands which we don't calculate]

This total Avg %cpu should come out below roughly 25%. Use this as the % CPU utilization, i.e. (25% * ARM frequency)
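The per-thread averaging can be sketched in Python as follows. The sketch assumes the column layout of the sample output, where the eighth whitespace-separated field is %cpu and the ninth is the command name; other top versions may order the columns differently. The function names are hypothetical, and the 1.5% allowance is the figure from the formula above.

```python
from collections import defaultdict

THREADS = ("mmcqd", "file-storage-ga", "pdflush")
OTHER_OVERHEAD_PCT = 1.5  # allowance for other commands, per the formula


def average_cpu_by_thread(top_output: str) -> dict:
    """Mean %cpu of each thread of interest over all of its samples."""
    samples = defaultdict(list)
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue
        # The egrep process itself has 'egrep' here, so it is skipped.
        name = fields[8].strip("[]")
        if name in THREADS:
            samples[name].append(float(fields[7]))
    return {name: sum(vals) / len(vals) for name, vals in samples.items()}


def total_average_cpu(top_output: str) -> float:
    """Sum of the per-thread averages plus the fixed allowance."""
    return sum(average_cpu_by_thread(top_output).values()) + OTHER_OVERHEAD_PCT


excerpt = """\
 562    2 root     DW<      0  0.0   0  1.9 [mmcqd]
 662    2 root     DW<      0  0.0   0 19.7 [file-storage-ga]
 299    2 root     SW       0  0.0   0  3.9 [pdflush]
 702  659 root     S     2536  1.0   0  0.0 egrep mmcqd|file-storage|flush
 562    2 root     RW<      0  0.0   0  3.9 [mmcqd]
"""

for name, avg in average_cpu_by_thread(excerpt).items():
    print(f"{name}: {avg:.2f}")
print(f"total: {total_average_cpu(excerpt):.2f}")
```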

3. Use the '%sys' field from the top command output, which is the 'System CPU Time: Time the CPU has spent running the kernel and its processes'. This more or less reflects the load on the system.

-Execute this command on OMAP: top d 1 | grep '^C'

-Execute the transfer of data between the Host and SD card [You will see the %sys numbers increasing ]

Note: use the number before the %sys

Sample top output on OMAP:

 CPU: 0.0% usr  9.0% sys  0.0% nic 90.9% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr  0.9% sys  0.0% nic 99.0% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr  0.9% sys  0.0% nic 99.0% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 0.9% usr  0.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 1.0% usr  4.0% sys  0.0% nic 95.0% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 16.8% sys  0.0% nic 63.3% idle 19.8% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 13.0% sys  0.0% nic  0.0% idle 87.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 21.2% sys  0.0% nic  0.0% idle 78.7% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 30.0% sys  0.0% nic  0.0% idle 70.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 30.6% sys  0.0% nic  0.0% idle 69.3% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 32.3% sys  0.0% nic 12.1% idle 55.5% io  0.0% irq  0.0% sirq
 CPU: 0.4% usr 21.3% sys  0.0% nic  0.0% idle 78.1% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 31.3% sys  0.0% nic  0.0% idle 68.6% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 27.7% sys  0.0% nic  0.0% idle 72.2% io  0.0% irq  0.0% sirq
 CPU: 0.9% usr  8.9% sys  0.0% nic  0.0% idle 90.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 31.6% sys  0.0% nic  0.0% idle 68.3% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr 10.9% sys  0.0% nic 35.7% idle 53.2% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr  1.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 0.9% usr  0.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr  0.9% sys  0.0% nic 59.4% idle 39.6% io  0.0% irq  0.0% sirq
 CPU: 0.9% usr  0.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr  0.9% sys  0.0% nic 99.0% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 0.9% usr  0.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
 CPU: 0.0% usr  1.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq

Command used: top d 1 | grep '^C'

-When this transfer is completed, stop the top command.

-Now add all %sys numbers and divide by the number of lines printed; this gives the average %cpu utilization over the duration of the file transfer.

This total Avg %sys should come out below roughly 25%. Use this as the % CPU utilization, i.e. (25% * ARM frequency)
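The %sys averaging can be sketched in Python as follows. The sketch assumes each captured line has the "CPU: ... N% sys ..." form shown in the sample output; the function name is hypothetical.

```python
import re


def average_sys_percent(top_output: str) -> float:
    """Mean of the %sys values over all captured CPU summary lines."""
    values = [float(v) for v in re.findall(r"([\d.]+)%\s+sys\b", top_output)]
    if not values:
        raise ValueError("no %sys values found")
    return sum(values) / len(values)


# A short excerpt for illustration; use the whole capture in practice.
excerpt = """\
CPU: 0.0% usr 13.0% sys  0.0% nic  0.0% idle 87.0% io  0.0% irq  0.0% sirq
CPU: 0.0% usr 21.2% sys  0.0% nic  0.0% idle 78.7% io  0.0% irq  0.0% sirq
CPU: 0.0% usr 30.0% sys  0.0% nic  0.0% idle 70.0% io  0.0% irq  0.0% sirq
"""

print(f"{average_sys_percent(excerpt):.1f}")  # prints "21.4"
```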

== B. Using OPROFILE: ==

Here is the wiki link to get oprofile running:

[]