Re: Client-side search engine capable of indexing .pdf files is needed.
- From: SAM <stephanemoriaux.NoAdmin@xxxxxxxxxxxxxxxxxx>
- Date: Fri, 04 Sep 2009 03:51:06 +0200
On 9/4/09 1:11 AM, Stefan Weiss wrote:
On 03/09/09 22:15, SAM wrote:
On 9/3/09 3:43 PM, Stefan Weiss wrote:
On 03/09/09 14:56, SAM wrote:
I certainly do not understand well what you mean by indexing files.
If it is only to report the list of the names of the PDF files stored in a folder (on the CD), the browser must be able to display it.
Then in that window there is certainly a search button, no?
I doubt gordom was interested in a list of file names. Creating a full
text index is quite a bit more complex than simply listing directory
contents. <http://fr.wikipedia.org/wiki/Indexation>
Well, who will choose the terms to index?
Who will build, for each file, its own array of terms?
Who will build the links for each term (to the files and inside them)?
The indexer will do all of that.
Once the data are complete and held in an object (or a simple array), I suppose most of the job is done.
Not necessarily. You need both parts for an efficient search engine: the
index and the lookup algorithm. The index lookup needs to be fast, and
able to sort the results in a meaningful way.
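To make the "sort the results in a meaningful way" part concrete, here is a minimal sketch. It assumes an index that maps each term to the numbers of the documents containing it; `sampleIndex`, its contents and `rankedLookup` are invented for illustration and come from no tool mentioned in this thread. Documents are ranked by how many of the query terms they contain.

```javascript
// Hypothetical index: term -> numbers of the documents containing it.
var sampleIndex = {
  search: [0, 1, 3],
  engine: [1, 3],
  pdf: [3]
};

// Score each document by the number of query terms it contains,
// then return the document numbers, best match first.
function rankedLookup(idx, query) {
  var scores = {};
  query.toLowerCase().split(/\s+/).filter(Boolean).forEach(function (term) {
    var docs = Object.prototype.hasOwnProperty.call(idx, term) ? idx[term] : [];
    docs.forEach(function (doc) {
      scores[doc] = (scores[doc] || 0) + 1;
    });
  });
  return Object.keys(scores)
    .map(Number)
    .sort(function (a, b) {
      // Higher score first; ties broken by document number.
      return scores[b] - scores[a] || a - b;
    });
}
```

Counting matched terms is only the crudest possible ranking; a real engine would probably also weight by term frequency or position.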
<http://cjoint.com/?jdvO4bUE6Q> 1500 items
(without index ... not in SANstore)
| var liste = [
| '00.htm',
| '000.htm',
| '0000000000000001.txt',
| '001.htm',
| '12-1.gif',
| '20-100_100tre.htm',
| '20-100_100tre2.htm',
That's just a list of file names again, not a full text index. It has
only 1500 entries, which isn't even close to what we're dealing with here.
It has 1500 entries; will the CD contain more than 1500 files?
These are simple entries (they could have been lines of a CSV file, each line being a record for one file with name, date, list of indexed terms, short introduction ...).
I didn't understand the "not in SANstore" part - how is that relevant?
I don't have a more complicated example in stock (in store? in SAM's shop).
If you have one, I'll be glad to see it.
Searching for one or more terms in this list is very fast because we only have to keep each line containing one of the terms: a single loop over the 1500 lines (or entries). The new list of files, expected to be relatively short, can then be easily manipulated to show what is wanted.
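That single loop could be sketched roughly as follows. The entries are the first few from the list quoted above; `searchList` is a name made up for this example, and the case-insensitive substring match is an assumption about what "containing one of the terms" should mean.

```javascript
// First few entries of the list quoted above (the full list has about 1500).
var liste = [
  '00.htm',
  '000.htm',
  '0000000000000001.txt',
  '001.htm',
  '12-1.gif',
  '20-100_100tre.htm',
  '20-100_100tre2.htm'
];

// Keep every entry that contains at least one of the search terms.
// One pass over the list, case-insensitive substring match.
function searchList(entries, query) {
  var terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  return entries.filter(function (entry) {
    var line = entry.toLowerCase();
    return terms.some(function (term) {
      return line.indexOf(term) !== -1;
    });
  });
}
```

For example, `searchList(liste, 'gif 100tre')` keeps the three entries whose names contain "gif" or "100tre". The cost grows with the number of entries times the number of terms, which is negligible at 1500 lines.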
As for indexing the terms found in the files, I suppose we can have an array of them:
terms = [
'add 12 125 956',
'addition 1 8 274 315 977 1235',
...
where the numbers are the indexes of the matching files, stored in another array.
This method ought to be faster.
Maybe it takes more room in memory? Not sure.
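A rough sketch of that idea, assuming each line holds a term followed by the indexes of the files that contain it. The file names, the term lines and the function names `buildIndex` and `lookup` are all invented for the example.

```javascript
// Hypothetical file list; the numbers in `terms` point into this array.
var files = ['intro.pdf', 'chapter1.pdf', 'chapter2.pdf', 'appendix.pdf'];

// Same layout as above: a term, then the indexes of the files containing it.
var terms = [
  'add 0 2',
  'addition 1 2 3',
  'index 0 3'
];

// Build a lookup table once: term -> array of file indexes.
function buildIndex(termLines) {
  var table = Object.create(null); // no inherited keys such as "constructor"
  termLines.forEach(function (line) {
    var parts = line.split(' ');
    table[parts[0]] = parts.slice(1).map(Number);
  });
  return table;
}

// Return the names of the files that contain the given term.
function lookup(table, fileList, term) {
  var hits = table[term.toLowerCase()] || [];
  return hits.map(function (i) {
    return fileList[i];
  });
}

var termIndex = buildIndex(terms);
```

With the table built, `lookup(termIndex, files, 'addition')` is a single property access instead of a scan over every line. The price is the memory for the table, which grows with the number of distinct terms.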
Regarding your other post: Spotlight is only available on OSX, and
(AFAIK) doesn't have a JavaScript front-end. It may be possible to burn
its index to a CD, but without the Spotlight executable, that won't
help much.
At least that could be a solution for a specific environment ;-)
<http://www.apple.com/downloads/macosx/home_learning/deliciouslibrary.html>
TNO's suggestion has a similar problem: it requires WSH to be installed
and accessible from an HTML page (unlikely). It will be awfully slow as
well, because each search will have to read the complete contents of the
CD. And then it probably won't find "à bientôt" because the source
encoding doesn't match the search encoding.
I suppose it would be better to have all the content loaded in memory.
Once regular expressions treat \w as more than just ASCII characters and cover more complete character sets, perhaps we will be able to match non-English words more seriously (or easily), even if the search functions were written by an illiterate guy from the US.
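For illustration only: in engines that support Unicode property escapes (the `u` flag with `\p{...}`, standardized in ES2018, so this is an assumption about the target browser), words such as "à bientôt" can be matched without relying on `\w`. The function names are made up for this sketch.

```javascript
// \w only covers [A-Za-z0-9_], so accented letters break words apart.
function asciiWords(text) {
  return text.match(/\w+/g) || [];
}

// \p{L} matches any Unicode letter and \p{N} any Unicode number.
// normalize('NFC') folds "a" + combining accent into the single
// precomposed letter, so both spellings tokenize the same way.
function unicodeWords(text) {
  return text.normalize('NFC').match(/[\p{L}\p{N}]+/gu) || [];
}
```

On "à bientôt", `asciiWords` yields the fragments "bient" and "t", while `unicodeWords` yields the two whole words. Normalizing both the indexed text and the query is one way to reduce the mismatch described above, although it does nothing for files stored in a different byte encoding.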
JSSINDEX still looks like the way to go (didn't test it, though). BTW, I
just checked, Lush is available as Debian and Ubuntu packages. If there
aren't any other requirements, getting the indexer to work should be a
piece of cake.
Something in Ruby?
<http://books.google.fr/books?id=OBhAuww-OokC&pg=PA137&lpg=PA137&dq=ruby+file+indexer&source=bl&ots=2yh2lSt1bK&sig=0vjYl4cMJ-3PxayHwg0YJOGYnbk&hl=fr&ei=t1ugSr24Ac74-QaGqsD0Dw&sa=X&oi=book_result&ct=result&resnum=8#v=onepage&q=&f=false>
--
sm