Re: Reading and evaluating huge ASCII Files



On Jun 9, 11:18 pm, Scott <n...@xxxxxxxx> wrote:
Crypto wrote:

Hi,

I am having trouble with reading and evaluating ASCII-Files.

In this case, the files can be 1GB in size.

They are standard ASCII-text files *.txt.

They contain tables with headlines and data of measurements.

The problem is that they cannot be imported/read into Matlab because of memory errors (they are too big), so I need to read and evaluate them step by step.

Another problem is the format of the data inside the files. The data consists of many rows. The first column denotes the headline of the particular data part, then come the values or other descriptive comments.

The lines/rows must have such a headline in the first column, but the length of the lines differs: there may be only one value for a headline, or two, three... there can be hundreds of values in one line.

So I do not have a "straight" table with headlines and data, and I do not know in advance how long one line is.

It looks like this (it CAN look like this):

Headline1 Data1
Headline2 Data2 Data3 Data4
Headline3 Data5 Data6
Headline4 Data7
Headline5 Data8 Data9 Data10 Data11

The lines are terminated with EOL, and the delimiter between headlines and data values is TAB.
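For illustration, here is a minimal Python sketch (not MATLAB, and not a method anyone in the thread used) that splits one such line on TAB into a headline and a variable-length list of values. It assumes the first field is always the headline. Fields that parse as numbers become floats, and anything else, such as descriptive comments, is kept as text. `parse_line` is a hypothetical helper name.

```python
def parse_line(line):
    """Split one TAB-delimited record into (headline, values).

    Assumes the first field is the headline. Numeric fields become
    floats; non-numeric fields (comments) are kept as strings.
    """
    fields = line.rstrip("\r\n").split("\t")
    headline = fields[0]
    values = []
    for field in fields[1:]:
        if field == "":
            continue  # skip empty fields from doubled or trailing TABs
        try:
            values.append(float(field))
        except ValueError:
            values.append(field)  # keep descriptive text as-is
    return headline, values


# Rows of different lengths go through the same code path.
print(parse_line("Headline2\t2.5\t3\t4\n"))
print(parse_line("Headline4\t7\n"))
```

Because each line is handled on its own, the varying row length never has to be known in advance.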

So what can I do to read my headlines and data values? It seems to me that textscan does not help, because it cannot read text whose lines vary in length. There is a function in the Matlab discussion forums named "readtext", but it does not help either, because using it I run into the memory problem.

Do you have suggestions, ideas...?

Thanks in advance for any hint,
Crypto.

Well, I can't speak for 1GB files but I have some recent experience
with data files in the 800MB range.

We couldn't read them.

My I/O routines call dataread.dll directly. It was running out of memory, and an exception error was being generated when ML (MATLAB) addressed memory reserved by the OS.

Someone pointed out the /3GB switch available in boot.ini for 32-bit
WinXP systems.

Of the 4GB address space available on a 32-bit architecture, XP normally allocates 2GB to applications and 2GB to the OS kernel. Setting the /3GB switch causes the OS to allocate 3GB to applications and 1GB to the kernel.

This worked for us but it isn't without a price. Some of the network
stuff couldn't load its .dlls at bootup. We were able to update a
couple of drivers and these problems cleared.

We're now routinely reading 800MB flight data files containing mixed data (text and numeric) into a structure in a couple of minutes.

If you are running XP you might want to take a look at it. The OS
must also be told via a registry setting that ML is "3GB aware". You
must also alter the page file size.

You can google /3GB and find lots of instructions on how to implement it.

Regarding method, you might be OK using fgetl in a while loop and parsing on the fly, as others have mentioned. It's surprisingly fast.
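As a rough analogue of the fgetl-in-a-while-loop idea, here is a Python sketch, not MATLAB code. It reads a TAB-delimited stream one line at a time and keeps only running per-headline counts and means, so memory grows with the number of distinct headlines rather than with the file size. It assumes non-numeric fields can simply be skipped. The function name `summarize` and the file name in the comment are hypothetical.

```python
import io
from collections import defaultdict


def summarize(stream):
    """Stream a TAB-delimited file line by line and return
    {headline: (number_of_values, mean_value)}.

    Only running totals are kept in memory. Non-numeric or empty
    fields are skipped.
    """
    count = defaultdict(int)
    total = defaultdict(float)
    for line in stream:  # a file object yields one line at a time
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        headline = fields[0]
        for field in fields[1:]:
            try:
                value = float(field)
            except ValueError:
                continue  # comment text or empty field
            count[headline] += 1
            total[headline] += value
    return {h: (count[h], total[h] / count[h]) for h in count}


# In practice: with open("measurements.txt") as f: summarize(f)
sample = io.StringIO("Temp\t20\t22\nSpeed\t5\nTemp\t24\tsensor ok\n")
print(summarize(sample))
```

Any per-line evaluation (filtering, writing results to a smaller file) can replace the running totals. The key point is that no step holds the whole file at once.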

hth,
Scott

Scott,
If the /3GB switch is giving you trouble, add "/USERVA=2800" to your boot.ini line.

You can use whatever value you want; it is a way to fine-tune the 3GB switch. With this value it basically becomes a "2.8GB switch", which leaves enough for the operating system.
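The arithmetic behind these splits is simple. A 32-bit pointer addresses 2^32 bytes = 4096 MB, and the /USERVA value is given in MB, so whatever the application side gets, the kernel keeps the rest. The short Python sketch below computes the three configurations mentioned in this thread. These are nominal address-space figures only. How much a given program can actually allocate, especially as one contiguous block, will be less.

```python
TOTAL_MB = 2**32 // 2**20  # 32-bit virtual address space = 4096 MB


def split(user_mb):
    """Return (application MB, kernel MB) for a given user-mode share."""
    return user_mb, TOTAL_MB - user_mb


print("default          ", split(2048))  # 2 GB apps / 2 GB kernel
print("/3GB             ", split(3072))  # 3 GB apps / 1 GB kernel
print("/3GB /USERVA=2800", split(2800))  # 2800 MB apps / 1296 MB kernel
```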

HTH
Dan
