Re: Can Results From A Search By Adobe Reader 9 Or Another Software Be Exported?



Martin wrote:
> Hello,
>
> I'm currently using Adobe Reader 9 (free) to search for common terms within
> a folder in which I keep certain PDF files.
>
> Quite frequently, I locate multiple instances of the searched term, which
> then results in several hours of noting, by pen on paper, the particulars
> involving each term located.
>
> Though I have yet to locate it, does Adobe Reader 9 have a function that can
> be enabled to export the results, either one-by-one or as a batch (much
> preferred!) into TXT or another format?
>
> Failing the above, is there any software available that can accomplish my
> objective of saving/exporting results?

It depends on what "particulars" you need to extract or record for each hit. Most PDF files contain little structural information that a computer can detect, so if you wanted, for example, to record the structural location (e.g. chapter 17, section 5, subsection 4), you'd probably have to do that by hand no matter what kind of searching you used.

On the other hand, if all you need is some words of context (e.g. the sentence and the page), then it's not hard to do this with a simple script. pdftotext, for example, outputs paragraphs as long lines and inserts a form feed (FF) character at each page break, so a few lines of Perl or awk can easily find the word you are looking for, counting pages and paragraphs as it goes. I did something similar recently to extract a context of five words either side for a simple search:

for f in *.pdf; do
  echo "$f"
  pdftotext "$f" - |
  awk -v f="$f" -v q="text" '
    # pdftotext starts each new page with a form feed (\f)
    /^\f/ { ++page; line = NR }
    $0 ~ q {
      n = split($0, w, " ")
      for (i = 1; i <= n; ++i)
        if (w[i] ~ q) {
          print f, "p." page+1, "para", NR-line
          # up to 5 words of context either side of the hit
          s = ""
          for (j = (i > 5 ? i-5 : 1); j <= i+5 && j <= n; ++j) s = s " " w[j]
          print s
        }
    }'
done

(substitute the word you want for "text"). This formats the output as:

thesis.pdf p.83 para 12
the hierarchy which merely quoted text from an earlier post
thesis.pdf p.101 para 12
in almost any situation where text has to be typed or

etc.
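
If the goal is a batch export into a TXT file, the same idea can be wrapped in a small shell function that prints one tab-separated line per hit. This is only a sketch under a few assumptions: the function name hits_tsv and the column order (file, page, line within page, context) are made up for illustration, the search word is matched as a plain substring rather than a regular expression, and lines are numbered from 1 at the top of every page.

```shell
#!/bin/sh
# hits_tsv LABEL WORD   (hypothetical helper, not part of pdftotext)
# Reads pdftotext-style text on stdin, where a form feed marks a page break,
# and prints one tab-separated line per hit:
#   LABEL <TAB> page <TAB> line within page <TAB> context
hits_tsv() {
  awk -v f="$1" -v q="$2" '
    BEGIN { OFS = "\t" }
    # A leading form feed means a new page starts on this line.
    /^\f/ { ++page; line = NR - 1; sub(/^\f/, "") }
    index($0, q) {
      n = split($0, w, " ")
      for (i = 1; i <= n; ++i)
        if (index(w[i], q)) {
          # Collect up to 5 words either side, clipped to the line.
          s = ""
          for (j = (i > 5 ? i - 5 : 1); j <= i + 5 && j <= n; ++j)
            s = s (s == "" ? "" : " ") w[j]
          print f, page + 1, NR - line, s
        }
    }'
}

# Example use over a folder (needs pdftotext installed):
#   for f in *.pdf; do pdftotext "$f" - | hits_tsv "$f" text; done > results.txt
```

The resulting results.txt should open directly in a spreadsheet, since each hit is a single row with tab-separated columns.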

///Peter


