Re: The 9997th file specific DOSFS problem
- From: yue@xxxxxx
- Date: Fri, 11 Jul 2008 00:06:57 -0700 (PDT)
Thanks a lot, Peter. You helped me understand the problem better.
A few things I need to clarify:
1) The average performance of accessing the RAID, single file I/O, is
~40 MB/s, except when it hits the ~10000th file. That is, it normally
takes ~0.1 second to write a 4 MB file. When it hits the ~10000th file,
it takes ~30 seconds, which is ~300 times slower. Writing the file right
after it then takes the normal 0.1 seconds, until it hits the next one.
So it does not look like a pure performance problem to me.
2) I thought it was already using a 32 KB cluster size. I will double
check that. We do need it to be compatible with Windows / Solaris, so
32 KB is the max. We also need it to be on a single partition, for
a few reasons.
3) Adjusting the size of the cache in dcacheLib(?) didn't affect the
problem much.
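
The figures in point 1 can be checked with a few lines of arithmetic. This is only a host-side sketch in Python, not code that runs on the target; the function names are made up for illustration, and the inputs are the approximate numbers quoted above.

```python
def expected_write_seconds(file_mb, throughput_mb_per_s):
    """Time to write one file at a steady throughput."""
    return file_mb / throughput_mb_per_s


def slowdown_factor(observed_seconds, normal_seconds):
    """How many times slower an observed write is than a normal one."""
    return observed_seconds / normal_seconds


normal = expected_write_seconds(4, 40)   # 4 MB file at ~40 MB/s
factor = slowdown_factor(30, normal)     # ~30 s stall on the slow file
print(f"normal write: {normal:.1f} s, slow write is ~{factor:.0f}x slower")
```

At ~40 MB/s the whole 30-second stall would be enough time to move roughly 1.2 GB, far more than one 4 MB file, which is consistent with the suspicion that the time goes into metadata traffic rather than into the file data itself.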
If we add the volume and the two FATs to the number 9997, we get exactly
10000 (speculating). This makes me think that there has to be a magic
number somewhere in the code that is related to the problem.
The 512-byte "tiny cache" for the FAT is very interesting. Is it
possible that a breakdown of the algorithm, say an overflow / underflow,
is causing it to re-cache every sector of the FAT? I just realized that
the 30-second duration is very consistent every time it hits this magic
file.
Any idea whether this 512 B "tiny cache" can be adjusted? Does dosFs try
to remember / use the last searched sector?
Any other ideas? I appreciate your help!
On Jul 10, 6:39 am, peter.mit...@xxxxxxxxx wrote:
I used to be part of the VxWorks file systems team and I did my fair
share of time working with DosFS (though not-so-much with the 5.x
version). It has been over a year since I last worked with VxWorks,
but maybe some of the following comments will help shed some light.
One of the things that we noticed was that the size of the storage did
affect the file system performance to one degree or another. We
thought that this was due to two things: resulting cluster size, and
position seeking. The larger the cluster size, the more efficient
VxWorks becomes at reading and writing its data. If I remember
correctly, the maximum recommended cluster size for DosFS is 32 kB.
VxWorks does allow a cluster size of 64 kB, but this is not
universally supported by other operating systems. With regard to
position seeking, the larger the storage, the larger the FAT. This
generally means that the disk head must physically travel longer
distances when writing/reading data, as it must read the FAT to get the
number of
the next cluster to read. Caching helps with this. So too can
partitioning.
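
To put rough numbers on "the larger the storage, the larger the FAT", here is a minimal Python sketch that estimates the size of one FAT copy. It relies on the fact that a FAT32 entry is 4 bytes; the 512-byte sector size, the binary (GiB) interpretation of the volume sizes, and the function name are assumptions for illustration, and the estimate ignores reserved sectors and the space taken by the FATs themselves.

```python
FAT32_ENTRY_BYTES = 4   # each FAT32 entry is 32 bits wide
SECTOR_BYTES = 512      # assumed sector size


def fat_geometry(volume_bytes, cluster_bytes):
    """Return (clusters, fat_bytes, fat_sectors) for one copy of the FAT."""
    clusters = volume_bytes // cluster_bytes
    fat_bytes = clusters * FAT32_ENTRY_BYTES
    fat_sectors = -(-fat_bytes // SECTOR_BYTES)  # round up to whole sectors
    return clusters, fat_bytes, fat_sectors


GIB = 1024 ** 3
for size_gib in (128, 500, 900):
    clusters, fat_bytes, fat_sectors = fat_geometry(size_gib * GIB, 32 * 1024)
    print(f"{size_gib} GiB: {clusters} clusters, "
          f"{fat_bytes / 2**20:.1f} MiB per FAT, {fat_sectors} sectors")
```

Under these assumptions a 500 GiB volume with 32 KB clusters has about 16.4 million clusters and a FAT of 62.5 MiB, or 128,000 sectors, so anything that walks the whole FAT one sector at a time has a lot of I/O to do.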
I don't know offhand why you are encountering such suddenly large
delays at the ~10000th file. My suspicion, and it is only a
suspicion, would be some sort of interplay between the various
caches. The VxWorks implementation of DosFS uses a "tiny cache" (512
bytes) for one sector of the FAT, and one big cache for caching other
FAT, data and directories. There have been problems in that area in
the past. To toss out an idea, the tiny cache could potentially be
getting flushed more often than necessary (causing both extra writes
and disk seeks).
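
As a toy illustration of why a one-sector cache is sensitive to the access pattern, the Python sketch below counts how often such a cache has to be reloaded. It is a simplified model, not the DosFS implementation: the only inputs taken from this thread are the 512-byte cache size and FAT32, whose entries are 4 bytes each; the function name and the access patterns are invented.

```python
ENTRIES_PER_SECTOR = 512 // 4  # 128 four-byte FAT32 entries per 512-byte sector


def count_sector_loads(cluster_accesses):
    """Count how often a one-sector FAT cache must be reloaded.

    `cluster_accesses` is the sequence of cluster numbers whose FAT
    entries are looked up, in order.
    """
    cached_sector = None
    loads = 0
    for cluster in cluster_accesses:
        sector = cluster // ENTRIES_PER_SECTOR
        if sector != cached_sector:
            cached_sector = sector   # evict the old sector, load the new one
            loads += 1
    return loads


# 1024 lookups of neighbouring entries
sequential = list(range(2, 1026))
# 1024 lookups alternating between two distant regions of the FAT
ping_pong = [c for i in range(512) for c in (2 + i, 100_000 + i)]

print(count_sector_loads(sequential))
print(count_sector_loads(ping_pong))
```

In this model the same 1024 lookups cost 9 sector loads when they are sequential and 1024 loads when they alternate between two regions, which is the kind of thrashing that would show up as extra seeks and, if the sector is dirty, extra writes.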
Another thought that crossed my mind is that it could simply have to
do with the cluster allocation strategy. Perhaps it is spending too
much time trying to find available clusters. But if that were the
case, I would expect more files to be experiencing the same slow
behaviour from that point forward.
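
To make the cluster-allocation idea concrete, here is a hedged Python sketch of a plain linear search for a free FAT entry. It is not the DosFS allocator, whose strategy is not described in this thread; the only format facts used are that a free cluster has the entry value 0 and that clusters 0 and 1 are reserved. The function name and the example table are hypothetical.

```python
FREE = 0  # a free cluster is marked by a FAT entry value of 0


def find_free_cluster(fat, start):
    """Scan forward from `start`, wrapping once, for a free entry.

    Returns (cluster_index, entries_examined), or (None, entries_examined)
    if no free entry exists. Clusters 0 and 1 are reserved and skipped.
    """
    n = len(fat)
    examined = 0
    for offset in range(n):
        idx = (start + offset) % n
        if idx < 2:
            continue
        examined += 1
        if fat[idx] == FREE:
            return idx, examined
    return None, examined


# Two reserved entries, 1000 allocated clusters, then 10 free ones.
fat = [0x0FFFFFF8, 0x0FFFFFFF] + [0x0FFFFFFF] * 1000 + [FREE] * 10
print(find_free_cluster(fat, 2))      # search restarted from the beginning
print(find_free_cluster(fat, 1002))   # search resumed at the last known free spot
```

With a search like this, the cost of one allocation depends on where the scan starts: resuming next to the last allocation examines one entry, while restarting from the beginning examines every allocated entry first. Whether anything similar happens in DosFS would have to be confirmed from the source.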
Without any more access to VxWorks, the source code or a target
machine, I can't really add anything else. I do know that there were
a LOT of DosFS fixes between 5.5 and 6.2 and later. Many of these
were for stability and some for performance. Unfortunately, that
probably does not help your situation much.
Hopefully something in the above will be useful.
Peter Mitsis (pcm)
On Jul 9, 9:22 pm, y...@xxxxxx wrote:
setup: Tornado 2.2.1, VxWorks 5.5, dosFs 2.0, FAT32, connected to a
~500 GB RAID, single partition (no dpartLib)
problem: It takes a very long time, ~30 seconds, to write the 9997th
file. It takes ~0.1 seconds for the other (4 MB) files.
details:
The time it takes actually depends on the size of the storage; it
ranges from ~16 seconds for a 128 GB RAID to ~35 seconds for a ~900 GB
RAID. It has to be DOSFS, since both the CPU and the RAID are busy when
the problem happens. It appears that cbio is continuously reading and
writing, possibly accessing the FAT?
The problem happens only once if the RAID is written continuously from
the beginning to the end. If the RAID is unmounted / mounted in between,
the problem can happen multiple times, but still around the ~10000th
file. Sometimes it also happens around the 20000th, 30000th ... files,
or it can happen continuously for consecutive files.
The files are stored in different directories (up to 8000 files max
per directory). The time is spent on write rather than on file
open / close.
Has anyone seen this problem, found a solution / workaround, or got any
suggestions for things to try?
Thanks!