Re: libxml: is it possible not to use doctype declaration?
- From: ruud grosmann <r.grosmann@xxxxxxxxx>
- Date: Sat, 2 Aug 2008 12:38:42 -0500
thanks everybody,
I think I'd rather do a system call for saxon. There are just too many little
bugs and uncertainties for me. Thanks anyway for your efforts and
for helping me.
Regards, Ruud
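
For readers who take the same route, a system call to Saxon from Ruby might look roughly like the sketch below. It is not from the thread. The jar path and the helper names (saxon_command, run_saxon) are hypothetical, and the -s: and -xsl: options assume a Saxon 9 style command line.

```ruby
require 'open3'

# Hypothetical location of the Saxon jar; adjust to the local installation.
SAXON_JAR = '/usr/local/share/java/saxon9.jar'

# Build the argument list for a Saxon XSLT run (assumes Saxon 9 style
# options). Passing the arguments as an array avoids shell quoting
# problems with unusual file names.
def saxon_command(source, stylesheet, jar = SAXON_JAR)
  ['java', '-jar', jar, "-s:#{source}", "-xsl:#{stylesheet}"]
end

# Run Saxon and return its standard output; raise if it exits non-zero.
def run_saxon(source, stylesheet, jar = SAXON_JAR)
  stdout, stderr, status = Open3.capture3(*saxon_command(source, stylesheet, jar))
  raise "saxon failed: #{stderr}" unless status.success?
  stdout
end
```

Open3.capture3 is used here instead of backticks so that the error output and the exit status are available to the caller.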
On 01/08/2008, Robert Klemme <shortcutter@xxxxxxxxxxxxxx> wrote:
2008/7/30 Phill Davies <binary011010@xxxxxxxxxxx>:
ruud grosmann wrote:
hi Phill,
I've tried it right away. I ended up with the following:
XML::Parser.default_load_external_dtd = false
XML::Parser.default_validity_checking = false
XML::Parser.default_substitute_entities = false
parser = XML::Parser.file( file)
#parser.default_substitute_entities = false
#parser.default_load_external_dtd = false
#parser.default_validity_checking = false
doc = parser.parse
node = doc.find( xpath).first
But the script still tries to resolve the entity. The doctype
definition is a slightly changed real one. The message I get with the
above code is:
Operation in progress./tmp/ut21.uit:3: I/O warning : failed to load
external entity "http://ruud.grosmann.nl/op/dtd/publicatie.dtd"
e publicaties 1.0//NL" "http://ruud.grosmann.nl/op/dtd/publicatie.dtd"
You were right that the methods are not instance methods, although I
am not sure how to conclude that from the documentation.
Did I do something wrong in the script?
regards, Ruud
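
When the documentation leaves it unclear whether a setter lives on the class or on its instances, Ruby itself can be asked. The sketch below uses a stand-in class (DemoParser, a made-up name) rather than the real XML::Parser, so it runs without libxml-ruby. The same respond_to? checks should work against the real class, but that is not verified here.

```ruby
# Stand-in for XML::Parser, used only so that this sketch runs without
# libxml-ruby installed; the real class is not reproduced here.
class DemoParser
  class << self
    # Class-level attribute, comparable to the default_* settings.
    attr_accessor :default_load_external_dtd
  end

  def parse
    :parsed
  end
end

# A class-level setter answers on the class itself ...
on_class = DemoParser.respond_to?(:default_load_external_dtd=)        # true
# ... but not on an instance of that class.
on_instance = DemoParser.new.respond_to?(:default_load_external_dtd=) # false
# Instance methods are listed separately.
has_parse = DemoParser.instance_methods(false).include?(:parse)       # true

puts "class: #{on_class}, instance: #{on_instance}, parse: #{has_parse}"
```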
On 30/07/2008, Phill Davies <binary011010@xxxxxxxxxxx> wrote:
Whoops, those were supposed to be class variables. What you really want
to do (I think) is more like:
LibXML::XML::Parser.default_load_external_dtd = false
LibXML::XML::Parser.default_validity_checking = false
And then:
parser = LibXML::XML::Parser.file(<file>)
doc = parser.parse
That seems to work with your example.
Phill Davies wrote:
To start, the rdoc documentation can be found at
http://libxml.rubyforge.org/rdoc/index.html. Now I don't know this for
sure, but
<!DOCTYPE test PUBLIC "-//FARAWAY//DTD-verweg//NL"
"http://some.site.nl/dtd/test.dtd">
doesn't look like a real doctype definition, so if you can pull it out
of your xml (by hand, not programmatically) before trying to parse it,
I'd say that would be a good idea. That being said, there are two
attributes of the XML::Parser class that look like they may be of
interest: default_load_external_dtd and default_validity_checking. Try
setting both of those to false, unless you have a real dtd to validate
against and the example above was fake. Of course, since this is using
XML::Parser instead of XML::Document I think you would need to do
e.g.:
parser = XML::Parser.file(<file>)
parser.default_load_external_dtd = false
parser.default_validity_checking = false
doc = parser.parse
... and then go from there.
Phill D.
Phlip wrote:
ruud grosmann wrote:
Is using libxml the right thing to do to, or are there smarter
alternatives?
Libxml-ruby is the most complete & accurate parser of the big three
(REXML, Libxml-ruby, and Hpricot), and its documentation can be very
challenging. How much of the original C Libxml documentation have you
been able to read?
Hey Ruud,
Nope, I can't see that you're doing anything wrong. I guess all I can say
is that if you can send the actual XML, I can give it a try (because when I
use your original example it seems to work fine as long as I set those class
variables). Also, the error message you sent was broken up; if you could
please try to send that again it would probably help. Here's what I'm
using:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test PUBLIC "-//FARAWAY//DTD-verweg//NL"
"http://some.site.nl/dtd/test.dtd">
<test>
<p>this is a test</p>
</test>
And here's the error I get when I don't set those class variables:
test.xml:2: I/O warning : failed to load HTTP resource
TYPE test PUBLIC "-//FARAWAY//DTD-verweg//NL"
"http://some.site.nl/dtd/test.dtd"
Hm, Java XML parsers I know have a special callback that you can set
that will deal with resolving external entities. I could not find
anything similar in libxml documentation but maybe I just looked in
the wrong places. With that you could load the file just once (or
even fetch it from some internal memory or file system). Also, I find
it a bit strange that those flags are global - this can introduce
weird bugs when using an application which parses XML concurrently and
needs different flags for each process...
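
One way to limit the risk Robert describes, at least between threads of a single process, is to change the global flags only inside a lock and restore them afterwards. The sketch below is an assumption-laden illustration: GlobalFlagsDemo is a stand-in for a class with process-wide defaults (it is not libxml-ruby), with_parser_flags and FLAG_LOCK are made-up names, and the lock only protects code that goes through this helper.

```ruby
# Stand-in for a class with process-wide parser defaults, so the sketch
# runs without libxml-ruby installed.
class GlobalFlagsDemo
  class << self
    attr_accessor :default_load_external_dtd, :default_validity_checking
  end
  self.default_load_external_dtd = true
  self.default_validity_checking = true
end

FLAG_LOCK = Mutex.new

# Set the given class-level flags for the duration of the block, then
# restore the previous values, all while holding FLAG_LOCK.
def with_parser_flags(klass, flags)
  FLAG_LOCK.synchronize do
    saved = flags.keys.map { |name| [name, klass.public_send(name)] }.to_h
    begin
      flags.each { |name, value| klass.public_send("#{name}=", value) }
      yield
    ensure
      saved.each { |name, value| klass.public_send("#{name}=", value) }
    end
  end
end

inside = with_parser_flags(GlobalFlagsDemo, default_load_external_dtd: false) do
  GlobalFlagsDemo.default_load_external_dtd   # false inside the block
end
after = GlobalFlagsDemo.default_load_external_dtd   # true again afterwards

puts "inside: #{inside}, after: #{after}"
```

This serializes all parsing that needs non-default flags, which trades concurrency for predictability. Separate processes each have their own copy of the flags and are not affected.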
Kind regards
robert
--
use.inject do |as, often| as.you_can - without end