Re: libxml: is it possible not to use doctype declaration?



thanks everybody,

I think I'd rather make a system call to saxon. There are just too many
little bugs and uncertainties for me. Thanks anyway for your efforts and
for helping me.

Regards, Ruud

On 01/08/2008, Robert Klemme <shortcutter@xxxxxxxxxxxxxx> wrote:
2008/7/30 Phill Davies <binary011010@xxxxxxxxxxx>:
ruud grosmann wrote:

hi Phill,

I've tried it right away. I ended up with the following:

XML::Parser.default_load_external_dtd = false
XML::Parser.default_validity_checking = false
XML::Parser.default_substitute_entities = false

parser = XML::Parser.file( file)
#parser.default_substitute_entities = false
#parser.default_load_external_dtd = false
#parser.default_validity_checking = false
doc = parser.parse
node = doc.find( xpath).first

But the script still tries to resolve the entity. The doctype
definition is a slightly changed real one. The message I get with the
above code is:

Operation in progress./tmp/ut21.uit:3: I/O warning : failed to load
external entity "http://ruud.grosmann.nl/op/dtd/publicatie.dtd"
e publicaties 1.0//NL" "http://ruud.grosmann.nl/op/dtd/publicatie.dtd"

You were right that the methods are not instance methods, although I
am not sure how to conclude that from the documentation.

Did I do something wrong in the script?

regards, Ruud

On 30/07/2008, Phill Davies <binary011010@xxxxxxxxxxx> wrote:


Whoops, those were supposed to be class variables. What you really want
to do (I think) is more like:

LibXML::XML::Parser.default_load_external_dtd = false
LibXML::XML::Parser.default_validity_checking = false

And then:
parser = LibXML::XML::Parser.file(<file>)
doc = parser.parse

That seems to work with your example.

Phill Davies wrote:


To start, the rdoc documentation can be found at
http://libxml.rubyforge.org/rdoc/index.html. Now I don't know this for
sure, but

<!DOCTYPE test PUBLIC "-//FARAWAY//DTD-verweg//NL"
"http://some.site.nl/dtd/test.dtd">

doesn't look like a real doctype definition, so if you can pull it out
of your xml (by hand, not programmatically) before trying to parse it,
I'd say that would be a good idea. That being said, there are two
attributes of the XML::Parser class that look like they may be of
interest: default_load_external_dtd and default_validity_checking. Try
setting both of those to false, unless you have a real dtd to validate
against and the example above was fake. Of course, since this is using
XML::Parser instead of XML::Document I think you would need to do
e.g.:

parser = XML::Parser.file(<file>)
parser.default_load_external_dtd = false
parser.default_validity_checking = false
doc = parser.parse

... and then go from there.
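
If editing the file by hand is not practical, the declaration can also be removed from the XML text before it reaches the parser. What follows is only a minimal sketch in plain Ruby, not part of libxml-ruby: the name strip_doctype and the pattern are made up for illustration. It assumes the DOCTYPE contains no ">" inside its quoted strings and no "]" inside an internal subset, and any entities the DTD would have defined stay undefined afterwards.

```ruby
# Sketch only: removes the first DOCTYPE declaration from an XML string.
# Assumes no ">" inside the quoted identifiers and no "]" inside an
# internal subset; a real DOCTYPE can be more complicated than this.
DOCTYPE_PATTERN = /<!DOCTYPE[^>\[]*(?:\[.*?\]\s*)?>\s*/m

# Hypothetical helper name, not a libxml-ruby method.
def strip_doctype(xml)
  xml.sub(DOCTYPE_PATTERN, "")
end

sample = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE test PUBLIC "-//FARAWAY//DTD-verweg//NL"
  "http://some.site.nl/dtd/test.dtd">
  <test>
  <p>this is a test</p>
  </test>
XML

puts strip_doctype(sample)
```

The cleaned string could then be handed to whichever parser is in use, so the parser never sees a DTD reference it might try to fetch.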
Phill D.

Phlip wrote:


ruud grosmann wrote:



Is using libxml the right thing to do, or are there smarter
alternatives?


Libxml-ruby is the most complete & accurate parser of the big three
(REXML, Libxml-ruby, and Hpricot), and its documentation can be very
challenging. How much of the original C Libxml documentation have you
been able to read?











Hey Ruud,
Nope, I can't see that you're doing anything wrong. I guess all I can
say is that if you can send the actual XML, I can give it a try (because
when I use your original example it seems to work fine as long as I set
those class variables). Also, the error message you sent was broken up;
if you could please try to send that again it would probably help.
Here's what I'm using:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test PUBLIC "-//FARAWAY//DTD-verweg//NL"
"http://some.site.nl/dtd/test.dtd">
<test>
<p>this is a test</p>
</test>

And here's the error I get when I don't set those class variables:

test.xml:2: I/O warning : failed to load HTTP resource
TYPE test PUBLIC "-//FARAWAY//DTD-verweg//NL"
"http://some.site.nl/dtd/test.dtd"

Hm, Java XML parsers I know have a special callback that you can set
that will deal with resolving external entities. I could not find
anything similar in libxml documentation but maybe I just looked in
the wrong places. With that you could load the file just once (or
even fetch it from some internal memory or file system). Also, I find
it a bit strange that those flags are global - this can introduce
weird bugs when using an application which parses XML concurrently and
needs different flags for each process...
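
One way to limit the effect of such global flags, at least between threads of a single process, is to set them only for the duration of a block and restore them afterwards under a lock. The sketch below is an assumption-laden illustration: StandInParser is a hypothetical stand-in class that merely borrows the flag names from this thread (it is not the libxml-ruby class), and with_parser_defaults is an invented helper name.

```ruby
require "monitor"

# Hypothetical stand-in for a parser class with process-wide defaults.
# The flag names are borrowed from this thread; this is NOT libxml-ruby.
class StandInParser
  class << self
    attr_accessor :default_load_external_dtd, :default_validity_checking
  end
  self.default_load_external_dtd = true
  self.default_validity_checking = true
end

# A single reentrant lock, so only one thread at a time changes the defaults.
SETTINGS_LOCK = Monitor.new

# Invented helper: applies the given defaults, runs the block, then
# restores the previous values even if the block raises.
def with_parser_defaults(klass, settings)
  SETTINGS_LOCK.synchronize do
    saved = settings.keys.to_h { |name| [name, klass.public_send(name)] }
    begin
      settings.each { |name, value| klass.public_send("#{name}=", value) }
      yield
    ensure
      saved.each { |name, value| klass.public_send("#{name}=", value) }
    end
  end
end

inside = with_parser_defaults(StandInParser, default_load_external_dtd: false) do
  StandInParser.default_load_external_dtd
end
after = StandInParser.default_load_external_dtd
```

The price is that all parsing done through the helper is serialized, and code that touches the flags without the helper is not protected at all.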

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end


