Re: Can anyone explain why this is happening?



syraero@xxxxxxxxx wrote:
> I am using 16 CPU-local clusters to run a parallelized Computational
> Fluid Dynamics code.
>
> In the middle of calculation, I frequently overwrite a restart file so
> that this restart file can be read into all CPUs simultaneously.
>
> The problem is that, sometimes, it looks like one or two nodes read
> unreasonable values from this restart file.
>
> So I rebooted the computers and tried again, and it worked fine after
> the reboot.
>
> Can anyone please tell me why this is happening?
>
> Thank you.

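Is it possible that your writes do not include a local buffer flush
before the next read? In C, printing a \n (newline) forces a flush only
on a line-buffered stream such as a terminal; output to a file is
normally fully buffered, so you need fflush(3) to push the data to the
kernel, and fsync(2) (or sync(2)) to push it from the kernel to disk.
In many Fortrans, you must call flush (the exact name varies by
compiler; Fortran 2003 has a FLUSH statement) to expel the data. Of
course, if the writing process closes the file, that also flushes the
library's buffer.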
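
The write-side flush described above can be sketched in C roughly as
follows. This is only a minimal sketch: write_restart is a hypothetical
helper name, not anything from the original code, and it assumes a POSIX
system where fileno(3) and fsync(2) are available.

```c
#include <stdio.h>
#include <unistd.h>   /* fsync */

/* Hypothetical helper: write n bytes to path and push them out of
 * both the stdio buffer and the kernel cache before returning.
 * Returns 0 on success, -1 on any failure. */
int write_restart(const char *path, const void *data, size_t n)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    int ok = fwrite(data, 1, n, fp) == n   /* copy into the stdio buffer   */
          && fflush(fp) == 0               /* stdio buffer -> kernel       */
          && fsync(fileno(fp)) == 0;       /* kernel cache -> disk/server  */

    if (fclose(fp) != 0)                   /* close can also report errors */
        ok = 0;
    return ok ? 0 : -1;
}
```

The writer would call something like write_restart("restart.dat", buf, len)
and only then signal the other processes that the file is ready to read.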

It's possible that the shared file is sometimes read while a process's
recently written data are still sitting in a buffer and have not yet
been flushed to disk. If a remote process prints, but does not flush the
data from its library and kernel I/O buffers to NFS, then the NFS server
remains unaware of the new data, and another process may see stale or
partial contents instead of the update.
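
To illustrate the effect, here is a small C sketch that checks how much
of a file is visible from outside the writing stream before and after a
flush. The names visible_size and flush_demo are made up for this
example, and it runs on a local file only; it does not reproduce NFS
client caching.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file as another process would see it right now,
 * or -1 if stat(2) fails. */
long visible_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Writes one line and records the visible size before and after
 * fflush(). Returns 0 on success, -1 on failure. */
int flush_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Full buffering is the usual default for files; set it
     * explicitly so the demo does not depend on that default. */
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);

    fputs("step 100\n", fp);          /* newline included, still buffered */
    *before = visible_size(path);

    fflush(fp);                       /* hand the bytes to the kernel */
    *after = visible_size(path);

    return fclose(fp) == 0 ? 0 : -1;
}
```

Here *before comes back as 0 even though the line ends in a newline, and
*after is 9 once fflush has run, so a reader arriving between the two
points would find an empty file.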

You should also be able to avoid this by:

1) having the writing process call fsync(2) on the file after it writes
and before any other process reads it (to guarantee that newly written
data have been flushed to disk).

2) having the writing nodes (or your batch scheduler) run unix's sync(1)
before any process reads from a new file (which also flushes all
kernel-buffered written data to disk).

3) switching to MPI I/O read and write routines, instead of plain
read/write via NFS. These are part of ROMIO or Open MPI (or some of the
avant garde or commercial versions of MPI). Note that MPI I/O does not
necessarily flush each process's write buffers with every print/write;
MPI_File_sync and MPI_File_close are the calls that force written data
out to storage, so you would still need one of them before the read.

Randy

--
Randy Crawford http://www.ruf.rice.edu/~rand rand AT rice DOT edu