07/15/2009 3:12 PM
fs-dos FAT scan performance
Our data acquisition systems record data onto FAT32 drives for easy interoperability with our Windows processing nodes.
Recently we've been testing a new high-bandwidth instrument, and we've discovered that the recording system will
occasionally choke and die for no obvious external reason: devb-eide suddenly ramps up to 100% CPU for 15-30
seconds, then returns to normal operation, leaving a pile of overflowed buffers in its wake.
I tried instrumenting the block subsystem by enabling 'verbose' output on both blk and eide and logging the sloginfo
output to a higher-priority network filesystem. The results were interesting: during these dropouts the driver appears
to spend its time scanning through the FAT one block at a time. It also appears to busy-wait on each transaction, and
since every transaction covers a single block with no read-ahead, the summed latency is enormous.
We originally observed this behaviour on 6.3.2. I also compiled and tested the latest io-blk and fs-* from the Subversion
repository, but that made no noticeable difference.
My questions are:
-> What circumstances trigger a FAT scan like this? Is there some way of preventing this behaviour?
-> If altering the scanning behaviour is difficult or impossible, is it possible to make io-blk do some kind of
read-ahead, so as not to lock up the disk controller for such a long time? Is there any reason that the usual read-ahead
and buffering are bypassed?