Re: Bug in URI.parse?
- From: John Joyce <dangerwillrobinsondanger@xxxxxxxxx>
- Date: Thu, 30 Aug 2007 04:45:19 +0900
I wouldn't call it a bug exactly; it does what it was written to do.
Instead, let's just say that URI.parse isn't very robust.
It doesn't handle lots of real-world situations in ways you would expect.
You would expect some sort of message saying that the TLD (top-level domain) is missing or bad, and you would not expect the failure to end your program abruptly.
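As a rough sketch of how a caller can keep a bad string from ending the program: URI.parse raises URI::InvalidURIError on input it cannot handle, so the call can be wrapped and the failure turned into a message. The helper name safe_parse and the sample string are my own, made up for illustration.

```ruby
require 'uri'

# Hypothetical helper (not part of the URI library): returns
# [uri, nil] on success or [nil, message] when parsing fails.
def safe_parse(text)
  [URI.parse(text), nil]
rescue URI::InvalidURIError => e
  [nil, "could not parse #{text.inspect}: #{e.message}"]
end

# A space in the host is rejected by URI.parse, so this yields an error message.
uri, error = safe_parse("http://bad host/")
puts error if uri.nil?
```

This only reports that parsing failed; it does not say which part (the TLD, for instance) was the problem.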
A good URI parser should also accept IP addresses, since those are valid too, at least in the sense that they are real, they exist, and users are likely to enter them.
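For what it's worth, one way to tell whether the host part of an input is an IP address is to hand it to IPAddr from the standard library. This is only a sketch: ip_host? is a name I made up, and it assumes the input already carries a scheme so that URI.parse can find a host at all.

```ruby
require 'uri'
require 'ipaddr'

# Hypothetical helper: true when the URI's host parses as an IP address.
def ip_host?(text)
  host = URI.parse(text).host
  return false if host.nil?
  IPAddr.new(host)   # raises a subclass of ArgumentError for non-IP hosts
  true
rescue URI::InvalidURIError, ArgumentError
  false
end

ip_host?("http://192.168.0.1/index.html")  # IP host
ip_host?("http://example.com/")            # named host
```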
Another problem is the way it handles URLs that are missing the www or the http:// or https://.
Strictly speaking these should be required, but that clearly is not the reality of URLs in the world or of how humans use them. People have become accustomed to using what are officially partial or bad URLs.
Most web browsers will accept a simple string and attempt to find it, even if it means adding a TLD.
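To illustrate the missing-scheme case: given "example.com/docs", URI.parse does not fail, but it treats the whole string as a path and leaves the host empty. A minimal sketch of the browser-like workaround is to prepend a default scheme before parsing. The helper with_default_scheme and the choice of http as the default are assumptions of mine, not anything URI provides.

```ruby
require 'uri'

# Hypothetical helper: prepend a scheme when the input has none.
def with_default_scheme(text, scheme = "http")
  text =~ %r{\A[a-z][a-z0-9+.\-]*://}i ? text : "#{scheme}://#{text}"
end

raw = URI.parse("example.com/docs")
raw.host    # nil, the whole string ended up in raw.path

fixed = URI.parse(with_default_scheme("example.com/docs"))
fixed.host  # "example.com"
fixed.path  # "/docs"
```

Guessing a TLD the way browsers do is a separate matter and is not attempted here.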
ARPANET is pretty pointless now.
I've begun my own script to check whether a URL is correct, but only for the human-readable variety.
One of the biggest problems is the transitory nature of URLs. They can change or disappear without notice.
Another problem is the path after the TLD. The path can be nearly anything, and its start can only be identified as the first single / after the apparent TLD.
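A small sketch of that rule, assuming the human-readable form may or may not carry a scheme: strip any leading scheme://, then split at the first remaining /. The name split_host_and_path is hypothetical, and this is not how URI.parse itself works.

```ruby
# Hypothetical helper: returns [host, path], where path starts at the
# first single "/" after the host (or is "" when there is none).
def split_host_and_path(text)
  rest = text.sub(%r{\A[a-z][a-z0-9+.\-]*://}i, "")
  host, slash, path = rest.partition("/")
  [host, slash + path]
end

split_host_and_path("example.com/a/b.html")  # ["example.com", "/a/b.html"]
split_host_and_path("http://example.com")    # ["example.com", ""]
```

It makes no attempt to judge whether the host or the path is sensible, only where one ends and the other begins.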