fs-dos FAT scan performance
07/15/2009 3:12 PM
post33863
Hi,
Our data acquisition systems record data onto FAT32 drives for easy interoperability with our Windows processing nodes.
Recently we've been testing a new, high-bandwidth instrument, and we've discovered that the recording system will
occasionally choke and die for no obvious external reason: devb-eide will suddenly ramp up to 100% CPU for 15-30
seconds, then go back to normal operation, leaving a pile of overflowed buffers in its wake.

I tried instrumenting the block subsystem by adding 'verbose' prints on both blk and eide and logging the sloginfo
output to a higher-priority network filesystem. The results were interesting: it appears that during these dropouts
the driver is spending its time scanning through the FAT one block at a time. It would appear the driver is
busy-waiting on each transaction, and since each transaction is for a single block with no read-ahead, the summed
latency is enormous.
We originally observed this behaviour on 6.3.2; I compiled and tested the latest io-blk and fs-* from the subversion
repository, but it didn't make any noticeable difference.
My questions are:
-> What circumstances trigger a FAT scan like this? Is there some way of preventing this behaviour?
-> If altering the scanning behaviour is difficult or impossible, is it possible to make io-blk do some kind of read-
ahead, so as not to lock up the disk controller for such a long time? Is there any reason that the usual read-ahead and
buffering is bypassed?
Thanks,
-Will
Re: fs-dos FAT scan performance
07/15/2009 3:35 PM
post33866
This does sound like a known bug in the FAT scanning code (PR/62370: not
using the last-known-free-cluster hint properly). Things are bearable
while the FAT is all cached, but once the fsys reaches a certain
fullness (the first free cluster lies deeper into the FAT than the blk
cache= setting covers) it catastrophically degrades into exponential input.

The fix should have been rolled into 6.4.1, and is possibly in the code
you built (r204653, r213668). Can you confirm you've either tried
release 6.4.1 or that your private compilation of dos_extendfile() has
"mount->m_cluster_hint = cllook + 1;" and not the flawed conditional
increment?

Re read-ahead: yes, it probably could do that (with a hint that the FAT
was being scanned for a free slot and not chaining for a file), although
6.4.2 has I/O klustering, so this will now happen for free anyway. And
fixing the scan to start at the last-known place is likely to find
a free cluster in the suggested block most of the time now.
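As an illustration of what that hint fix buys (a toy model with invented names, not the actual fs-dos source): a free-cluster scan that starts at a last-known-free hint and unconditionally advances it past each allocation, in the spirit of "mount->m_cluster_hint = cllook + 1;".

```c
#include <stddef.h>

#define FAT_FREE 0u

/* Toy in-memory FAT: entry i is 0 if cluster i is free, nonzero if used.
 * As on a real FAT volume, clusters 0 and 1 are reserved. */
typedef struct {
    unsigned *fat;
    size_t    nclusters;
    size_t    cluster_hint;   /* where the next free-cluster scan starts */
} toy_mount;

/* Scan for a free cluster starting at the hint, wrapping around once.
 * Returns the allocated cluster number, or 0 if the volume is full. */
size_t toy_alloc_cluster(toy_mount *m)
{
    size_t scanned, cl = m->cluster_hint;

    for (scanned = 0; scanned < m->nclusters - 2; ++scanned) {
        if (cl < 2 || cl >= m->nclusters)
            cl = 2;                   /* wrap to the first data cluster */
        if (m->fat[cl] == FAT_FREE) {
            m->fat[cl] = 1;           /* mark allocated (toy end-of-chain) */
            m->cluster_hint = cl + 1; /* always advance the hint */
            return cl;
        }
        ++cl;
    }
    return 0;                         /* no free cluster found */
}
```

With the unconditional update, back-to-back allocations each pick up where the last one left off; the pathological case in this thread corresponds to something resetting the hint back to an early cluster, forcing the next scan to walk the whole used region again.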
Re: fs-dos FAT scan performance
07/15/2009 4:12 PM
post33870
Hi John,
Thanks for the quick reply! Unfortunately I don't have the ability to
test 6.4.1 yet (migrating our platform is time-consuming and alas isn't
scheduled until later this year), but yes - I built and tested SVN
r225680, which is documented as including the fix, but still ran into
the same behaviour. I can forward a chunk of the log file if you'd
like, though it's rather verbose. What seems to be happening is this:
-> Disk is being filled up linearly; next available free block is the
next FAT block, everything seems sane.
-> A cluster is freed up near the beginning of the FAT. This must
reset the hint, because:
-> The next file write causes a FAT scan starting from the freed
cluster. This can involve tens of thousands of block reads to get back
to the end of the used area.
I'm glad to hear that the 6.4.2 core will implement a readahead in
the blk layer - it seems a more natural place for that sort of
logic. In fact, while investigating the source code, I was actually a
bit surprised that it bypasses the read-ahead caching for FAT reads -
though you're correct that apart from free-block scans there's little
advantage to readaheads there.
Unfortunately, though, in the short term I do need a fix for this - we
don't have enough RAM to buffer our data output for the 10+ seconds it
takes for the scan to complete while reading single blocks (not to
mention the havoc it causes to our UI when the system wanders off at
priority 21). Would you recommend I try to implement some read-ahead
logic, or would it be simpler to just not change the hint when
shrinking a file?
Thanks again for your help so far,
-Will
On Wed, 15 Jul 2009 15:35:23 -0400 (EDT)
John Garvey <community-noreply@qnx.com> wrote:
> This does sound like a known bug in the FAT scanning code (PR/62370: not
> using the last-known-free-cluster hint properly). Things are bearable
> while the FAT is all cached, but once the fsys reaches a certain
> fullness (the first free cluster lies deeper into the FAT than the blk
> cache= setting covers) it catastrophically degrades into exponential input.
>
> The fix should have been rolled into 6.4.1, and is possibly in the code
> you built (r204653, r213668). Can you confirm you've either tried
> release 6.4.1 or that your private compilation of dos_extendfile() has
> "mount->m_cluster_hint = cllook + 1;" and not the flawed conditional
> increment?
>
> Re read-ahead: yes, it probably could do that (with a hint that the FAT
> was being scanned for a free slot and not chaining for a file), although
> 6.4.2 has I/O klustering, so this will now happen for free anyway. And
> fixing the scan to start at the last-known place is likely to find
> a free cluster in the suggested block most of the time now.
--
Will Miles <wmiles@sgl.com>
Re: fs-dos FAT scan performance
07/15/2009 4:40 PM
post33876
Will Miles wrote:
> -> A cluster is freed up near the beginning of the FAT. This must
> reset the hint, because:
> -> The next file write causes a FAT scan starting from the freed
> cluster. This can involve tens of thousands of block reads to get back
> to the end of the used area.
Yes, the hint is reset. There is no pending change to this behaviour,
although you can obviously construct some degenerate scenarios where
immediately re-using clusters is good or bad :-/
> Unfortunately, though, in the short term I do need a fix for this - we
> don't have enough RAM to buffer our data output for the 10+ seconds it
> takes for the scan to complete while reading single blocks (not to
> mention the havoc it causes to our UI when the system wanders off at
Have you considered pre-growing the files? With DCMD_FSYS_PREGROW_FILE
this is very fast (one chunk of FAT/bitmap allocated all at once,
data blocks not zero-filled, subsequent writes are then over the top
of allocated blocks with no additional FAT access required). It does
require that you be able to recognise how much of the written data
is valid in case of a power failure (before you get a chance to ftruncate
it back to the real size if you guessed wildly big). A logfile or
data capture you might be able to extend in large chunks as you fill.
Hmm, although as the same allocation scan is used, it too will be
fooled by delete fragmentation resetting the hint.
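For anyone wanting to prototype this off-target: DCMD_FSYS_PREGROW_FILE is QNX-specific, but the pregrow-write-truncate pattern itself is portable. A minimal sketch (the helper name is invented for illustration) using POSIX posix_fallocate() where the QNX devctl() call would go:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Pre-grow a capture file, write into the reserved space, then trim it
 * back to the bytes actually recorded. On QNX the reservation step would
 * be a devctl() using DCMD_FSYS_PREGROW_FILE rather than posix_fallocate().
 * Returns 0 on success, -1 on any error. */
int record_pregrown(const char *path, const void *data, size_t len, off_t reserve)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (posix_fallocate(fd, 0, reserve) != 0 ||  /* one allocation pass */
        write(fd, data, len) != (ssize_t)len ||  /* no further alloc scans */
        ftruncate(fd, (off_t)len) != 0) {        /* give back the excess */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

As noted above, on power failure you still need some way to tell how much of the pregrown region holds valid data, e.g. a length field or record framing inside the file.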
> priority 21). Would you recommend I try to implement some read-ahead
> logic, or would it be simpler to just not change the hint when
> shrinking a file?
So, you can disable the hint resetting (in dos_makesmaller()). The
side-effect will be a sequential pass right through to the end of
the partition (or until you reboot/remount). The end of the disc is
slower than the beginning (a property of HDDs), and many factory-
formatted USB sticks use a canned fsys which doesn't actually fit
on the partition (so will try to use non-existent clusters).
I can't think of any other short-term solution. Longer term, some
form of FAT overview (like the fs-qnx4/qnx6 bitmap summaries) that
could be used to skip parts of the FAT known from a previous scan
to be totally full, rather than just the single cluster_hint
from the FSInfo32, might be reasonable ... ?
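A sketch of what such an overview might look like (purely illustrative; invented names, and not how the real fs-qnx4/qnx6 summaries work): a per-group "known full" flag that lets the allocation scan skip whole stretches of FAT it has already proven to hold no free clusters.

```c
#include <stdbool.h>
#include <stddef.h>

#define GROUP_SIZE 128   /* clusters summarised per flag; arbitrary */

/* Toy allocator with a FAT overview: full[g] is set once group g is seen
 * to contain no free cluster, so later scans skip those GROUP_SIZE
 * entries instead of re-reading them. Returns the allocated cluster,
 * or 0 if the volume is full (clusters 0 and 1 are reserved). */
size_t summary_alloc(unsigned *fat, size_t nclusters, bool *full)
{
    size_t g, i;

    for (g = 0; g * GROUP_SIZE < nclusters; ++g) {
        if (full[g])
            continue;                 /* skip a known-full stretch of FAT */
        size_t lo = g * GROUP_SIZE;
        size_t hi = lo + GROUP_SIZE < nclusters ? lo + GROUP_SIZE : nclusters;
        for (i = lo; i < hi; ++i) {
            if (i >= 2 && fat[i] == 0) {
                fat[i] = 1;           /* toy allocation mark */
                return i;
            }
        }
        full[g] = true;               /* learned: nothing free in here */
    }
    return 0;
}
```

Freeing a cluster would of course have to clear the group's flag again, and the flags would need rebuilding (or distrusting) after an unclean mount.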
Re: fs-dos FAT scan performance
07/16/2009 11:04 AM
post33935
On Wed, 15 Jul 2009 16:40:58 -0400 (EDT)
John Garvey <community-noreply@qnx.com> wrote:
> Will Miles wrote:
> > -> A cluster is freed up near the beginning of the FAT. This must
> > reset the hint, because:
> > -> The next file write causes a FAT scan starting from the freed
> > cluster. This can involve tens of thousands of block reads to get back
> > to the end of the used area.
>
> Yes, the hint is reset. There is no pending change to this behaviour,
> although you can obviously construct some degenerate scenarios where
> immediately re-using clusters is good or bad :-/
Alas, it would seem our recording engine is one of them. :(
>
> > Unfortunately, though, in the short term I do need a fix for this - we
> > don't have enough RAM to buffer our data output for the 10+ seconds it
> > takes for the scan to complete while reading single blocks (not to
> > mention the havoc it causes to our UI when the system wanders off at
>
> Have you considered pre-growing the files? With DCMD_FSYS_PREGROW_FILE
> this is very fast (one chunk of FAT/bitmap allocated all at once,
> data blocks not zero-filled, subsequent writes are then over the top
> of allocated blocks with no additional FAT access required). It does
> require that you be able to recognise how much of the written data
> is valid in case of a power failure (before you get a chance to ftruncate
> it back to the real size if you guessed wildly big). A logfile or
> data capture you might be able to extend in large chunks as you fill.
> Hmm, although as the same allocation scan is used, it too will be
> fooled by delete fragmentation resetting the hint.
We did try a pre-allocation system in the past, if only to reduce
filesystem wear. We were quickly shot down by our users who were
annoyed at both not being able to tell how much data was actually
recorded by the file size, as well as having to copy significantly more
data in the event of early termination. Our current recording format
uses a COTS database engine which, while it's technically possible to
do some preallocation, would suffer from the same basic problem, just
internal to the database (scanning for the next 'free' entry).
But you're right, even with a preallocation scheme, every time a new
filesystem allocation is performed it will still suffer from the
FAT-scan performance hit; the only real difference would be that we can
more precisely predict when those hits will occur.
>
> > priority 21). Would you recommend I try to implement some read-ahead
> > logic, or would it be simpler to just not change the hint when
> > shrinking a file?
>
> So, you can disable the hint resetting (in dos_makesmaller()). The
> side-effect will be a sequential pass right through to the end of
> the partition (or you reboot/remount). The end of the disc is
> slower than the beginning (property of HDD) and many factory-
> formatted USB sticks use a canned fsys which doesn't actually fit
> on the partition (so will try to use non-existent clusters).
OK - I'll give that a try. I think the loss in performance at the end
of the drive will have less real-time impact on our system than the FAT
scans do. I'm not too worried about invalid partitions; our
policy is always to repartition and reformat our recording drives
ourselves before deploying them to the field. Unfortunately we haven't
had much success with recording to USB sticks either, but that's an
unrelated problem - devb-umass just keeps locking up altogether (on
four totally different USB devices, including a USB->SATA adapter). We
haven't followed up on that since we're still using 6.3.2; if we still
run into it when we get to 6.4.x, I'm sure you'll hear back from us
then. :)
>
> I can't think of any other short-term solution. ...