Re: SAX - is there an equivalent to the DOM .nodeTypedValue for reading the whole node data at once?
- From: yf110@xxxxxxxxxxxxxxxxxxx (Malcolm Dew-Jones)
- Date: 9 Sep 2005 09:55:33 -0700
jimmyfishbean@xxxxxxxxxxx wrote:
: Hi,
: I am using VB6, SAX (implementing IVBSAXContentHandler).
: I need to extract binary encoded data (images) from large XML files and
: decode this data and generate the appropriate images onto disk. My XML
: files have the following structure:
: <?xml version="1.0" encoding="utf-8" ?>
: <imagepla xmlns:dt="urn:schemas-microsoft-com:datatypes">
: <attachment>
: <primary_id>28899</primary_id>
: <filename>userguide3.pdf</filename>
: <file
: dt:dt="bin.base64">JVBERi0xLjMNJeLjz9MNCjU5NTAgMCBvYmoNPDwgDS9MaW5lYXJpemVkIDEgDS9PIDU5NTMgDS9I
: IFsgMTM4OSAzODY0IF0gDS9MIDUwNTEyOTggDS9FIDEwMTQ3NCANL04gMTUzIA0vVCA0OTMyMTc4
: ........
: ..................
: </file>
: </attachment>
: <attachment>
: ......
: ......
: </attachment>
: </imagepla>
: The encoded data (in the <file> element) needs to be extracted and then
: decoded. I am trying to use SAX but I cannot read the whole of the
: <file> element data at once (i.e. using DOM I would use
: DOMDoc.nodeTypedValue). I understand that the DOM loads the whole
: document into memory therefore the nodeTypedValue can be used.
: I am using the following extract of code:
: Dim strTmp As String
: Dim btArr() As Byte
: Private Sub IVBSAXContentHandler_characters(text As String)
: ...
: strTmp = strTmp & text
: ...
: btArr = strTmp
: Open strAttFile For Binary As #1
: Put #1, 1, btArr
: Close #1
: ...
: End Sub
: The problem is that only 1 line at a time of the <file> node data is
: passed to this sub. Therefore I need to reconstruct the whole of the
: binary data for the image in a temp variable (strTmp), before I
: determine the end of the file and then write it to disk.
: This takes a vast amount of time (i.e. 20 minutes to try and decode a
: 4MB image). The XML file will contain 100s of images, so really the
: current way of processing is no good at all.
: Is there a way to read the whole of the data from the <file> node in
: one go?
With SAX in general you can never count on receiving all of an element's
character data in a single call; the parser is free to split it across
several characters() events. There is a slim chance that the SAX module
you have available in VB has an option to change that (I have no idea, I
wouldn't count on it).
But why do you need to read the whole thing into memory? Base64 can be
decoded on the fly. Each sequence of four characters gives you three
bytes of data. Read a chunk, strip the line breaks, decode as many
complete groups of four characters as you have in one go and write the
bytes out. The only thing to worry about is the last few characters (at
most three) that have to be held over from one read to the next to make
up a multiple of four.
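To make the hold-over idea concrete, here is a minimal sketch in Python rather than VB6 (I can't test VB here), so treat it as an illustration of the bookkeeping only. The class name Base64StreamDecoder and its feed/close methods are made up for this example; the only library call it relies on is base64.b64decode.

```python
import base64
import io


class Base64StreamDecoder:
    """Decode base64 text that arrives in arbitrarily sized chunks."""

    def __init__(self, sink):
        self.sink = sink    # any binary file-like object opened for writing
        self.pending = ""   # 0 to 3 characters held over from the last chunk

    def feed(self, text):
        # Drop line breaks and other whitespace, then put the held-over
        # characters in front of the new ones.
        data = self.pending + "".join(text.split())
        usable = len(data) - (len(data) % 4)   # largest multiple of four
        if usable:
            self.sink.write(base64.b64decode(data[:usable]))
        self.pending = data[usable:]

    def close(self):
        # Valid base64 always ends on a four-character boundary.
        if self.pending:
            raise ValueError("base64 data ended in the middle of a group")


# Usage: feed the pieces in whatever sizes the parser hands them over.
out = io.BytesIO()
decoder = Base64StreamDecoder(out)
for piece in ("aGVs", "\nbG", "8="):
    decoder.feed(piece)
decoder.close()
```

In the real handler each characters() call would go to feed(), and the sink would be the attachment file opened once when the <file> element starts and closed when it ends, so nothing bigger than one chunk ever sits in memory.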
And where is the slowdown? I suspect that the string concatenation is to
blame. VB may be allocating a longer string each time and then copying
all the existing data plus the appended data into it. If you keep doing
that for an eventually large string it could get very slow. Can you
preallocate a much larger string and use the Mid statement to write each
chunk into that single large string? (In VB that is
Mid(buffer, offset, length) = data_to_insert, if I have it right.)
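The preallocation pattern looks roughly like this. It is a sketch in Python, not VB6, and Python's own strings do not necessarily have the same copying problem, so read it only as a picture of the bookkeeping: one big buffer, a write position, and growth by doubling when the buffer fills up. The class name PreallocatedBuffer and the default capacity are invented for the example.

```python
class PreallocatedBuffer:
    """Append chunks into one preallocated buffer instead of concatenating."""

    def __init__(self, capacity=1 << 20):
        self.data = bytearray(capacity)   # allocated once, up front
        self.length = 0                   # number of bytes actually in use

    def append(self, chunk):
        end = self.length + len(chunk)
        if end > len(self.data):
            # Grow to at least double the size so that regrowth stays rare.
            new_capacity = max(end, 2 * len(self.data))
            self.data.extend(bytes(new_capacity - len(self.data)))
        # Overwrite in place, the same idea as assigning through Mid in VB.
        self.data[self.length:end] = chunk
        self.length = end

    def value(self):
        # Only the used part of the buffer is real data.
        return bytes(self.data[:self.length])
```

Each append copies only the new chunk, so the total work grows with the size of the data rather than with its square.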
: Also, I will be extracting the binary data and then use DOM to rewrite
: the XML file without the binary data (so the user has a copy of the
: original XML file - but a much smaller one since no binary in it).
: Should I use DOM or SAXReader/SAXWriter?
If you are not changing anything else in the XML except removing the
file data (and possibly replacing that one tag) then I would think it
easiest to use a SAX approach. As you read the data you also spool it back
out, except for that one tag. I suppose a SAXWriter would help do that.
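Here is the spool-it-back-out idea sketched with Python's xml.sax, since I can't vouch for the exact MSXML SAXWriter calls. XMLGenerator is the standard library's event-to-text writer; the class name StripFileData, the on_file_chunk callback and the helper strip_file_data are hypothetical names for this example. This version keeps the <file> tag itself and only drops its content.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class StripFileData(XMLGenerator):
    """Echo every SAX event to `out`, except character data inside <file>."""

    def __init__(self, out, on_file_chunk):
        super().__init__(out, encoding="utf-8")
        self.on_file_chunk = on_file_chunk   # receives the raw base64 text
        self.in_file = False

    def startElement(self, name, attrs):
        if name == "file":
            self.in_file = True
        super().startElement(name, attrs)

    def endElement(self, name):
        if name == "file":
            self.in_file = False
        super().endElement(name)

    def characters(self, content):
        if self.in_file:
            self.on_file_chunk(content)      # hand off instead of echoing
        else:
            super().characters(content)


def strip_file_data(xml_bytes):
    """Demo helper: return the stripped XML and the base64 pieces it removed."""
    out = io.StringIO()
    chunks = []
    xml.sax.parseString(xml_bytes, StripFileData(out, chunks.append))
    return out.getvalue(), chunks
```

The demo helper collects the pieces in a list only to keep the example short. In practice the callback would pass each piece straight to an incremental base64 decoder writing to disk, so one pass over the input produces both the image files and the smaller XML.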
$0.10
--
This programmer available for rent.