iflag=direct bypasses both the data cache and I/O queueing; iflag=nocache bypasses only the data cache. Performance can differ dramatically depending on how much I/O the device supports and how it manages its I/O requests without the kernel I/O queue:
No caches, I/O queue, 512-byte read block (dd's default block size):
root@Servidor:~# echo 3 > /proc/sys/vm/drop_caches
root@Servidor:~# dd if=/dev/sdh of=/dev/null count=1048576 iflag=nocache
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 2.14517 s, 250 MB/s
No caches, I/O queue, 1M read block:
root@Servidor:~# echo 3 > /proc/sys/vm/drop_caches
root@Servidor:~# dd if=/dev/sdh of=/dev/null bs=1M count=512 iflag=nocache
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.98191 s, 271 MB/s
No caches, no I/O queue, 512-byte read block (dd's default block size):
root@Servidor:~# echo 3 > /proc/sys/vm/drop_caches
root@Servidor:~# dd if=/dev/sdh of=/dev/null count=1048576 iflag=direct
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 217.961 s, 2.5 MB/s
No caches, no I/O queue, 1M read block:
root@Servidor:~# echo 3 > /proc/sys/vm/drop_caches
root@Servidor:~# dd if=/dev/sdh of=/dev/null bs=1M count=512 iflag=direct
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 2.12653 s, 252 MB/s
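To see why the 512-byte direct run is so slow, it can help to turn dd's totals into per-request figures. The sketch below does only that arithmetic on the numbers printed above; the helper name per_request_stats is made up for this illustration, and it assumes dd issues one read per record, one after another.

```shell
#!/bin/sh
# Hypothetical helper (not part of dd): turns dd's summary figures into
# per-request numbers. Assumes one read request per record, issued serially.
per_request_stats() {
    # $1 = records read, $2 = bytes copied, $3 = elapsed seconds
    awk -v r="$1" -v b="$2" -v s="$3" 'BEGIN {
        printf "%.0f requests/s, %.1f us/request, %.2f MB/s\n",
               r / s, s / r * 1e6, b / s / 1e6
    }'
}

# Figures copied from the iflag=direct runs above.
per_request_stats 1048576 536870912 217.961   # 512-byte blocks
per_request_stats 512     536870912 2.12653   # 1M blocks
```

For the 512-byte direct run this works out to roughly 4,800 requests per second, or about 208 microseconds per request. That suggests the time goes into per-request overhead rather than into moving data, which matches the 1M direct run reaching about the same throughput as the queued runs.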