Re: count occurance of a word/string in the body of an HTML page



In comp.lang.javascript message
<aec1b339-3206-4aa8-b374-7943f02aee3f@c29g2000yqd.googlegroups.com>,
Thu, 27 Aug 2009 11:16:27, Question Boy <question.boy@xxxxxxxxxxx> posted:
> I'm trying to find an easy way to count how many time a given word
> appear on a webpage. For instance, I would like to be able to count
> the number of occurance of the word 'Accepted', how would I go about
> this?

No, occurrences.

If the Web page is not yours, you can take a copy of the source and work
on that, so one can assume the source to be available. However,
straightforwardly counting words in the source will not reliably give
the right answer. The word may appear in a comment, or within HTML
tags, or in JavaScript or VBScript; and code may write it conditionally
or repeatedly. The word may be in an undisplayed or hidden part of the
page. The word may be generated by included script, and not be in the
source at all. The word may be computed - consider what
document.write( ['mk'+'op', '\x44um'].reverse().join("")+"f" )
might give.
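
To make the point concrete, here is a minimal sketch. The page source in
it is made up for illustration (it is not from any real page), and the
variable names are my own. It counts the word naively in raw source,
then checks whether the string built by the expression above is
literally present in that expression's own source text.

```javascript
// Hypothetical page source: the word sits in a comment, in an attribute
// and in visible text, so a reader of the displayed page sees it once.
var source = '<!-- Accepted --><p title="Accepted">Accepted</p>';

// Naive count over the raw source text.
var inSource = (source.match(/Accepted/g) || []).length;

// The expression from the paragraph above, evaluated rather than read.
var written = ['mk'+'op', '\x44um'].reverse().join("") + "f";

// The same expression held as source text (note the escaped backslash).
var scriptSource = "['mk'+'op', '\\x44um'].reverse().join(\"\")+\"f\"";

// Is the written string literally present in the script's source?
var literallyPresent = scriptSource.indexOf(written) !== -1;
```

Comparing inSource with what is displayed, and looking at
literallyPresent, shows both failure modes: the source count can be too
high, and a search of the source can miss a word that the script writes.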

You wrote "appear on a webpage". Display the web page, use Select All
and Copy; then paste it into something which can count words. I think
MS Word can do it; alternatively, you can paste it into a textarea and
match its value property with a well-chosen RegExp. See my
<URL:http://www.merlyn.demon.co.uk/js-valid.htm>.
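
A minimal sketch of the textarea approach follows. The function names
escapeRegExp and countWord are my own, not from the page linked above,
and the element id "pasted" in the closing comment is an assumption.
Note that \b treats only A-Z, a-z, 0-9 and underscore as word
characters, which may or may not suit the text in hand.

```javascript
// Hypothetical helper: escape RegExp metacharacters so that the search
// word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count occurrences of `word` in `text`, bounded by \b on both sides.
// Pass a true third argument to ignore case.
function countWord(text, word, ignoreCase) {
  var flags = ignoreCase ? "gi" : "g";
  var re = new RegExp("\\b" + escapeRegExp(word) + "\\b", flags);
  var matches = text.match(re);   // null when nothing matches
  return matches ? matches.length : 0;
}

// In a page, assuming a textarea with the (hypothetical) id "pasted":
//   countWord(document.getElementById("pasted").value, "Accepted")
```

Whether case should be ignored is one of the decisions to make before
trusting the count.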

You will need to be very careful to see that you implement an
appropriate definition of a word. Will, for example, the word "Accep-
ted" be found? If looking for "paw", should it be found in "cat's-paw"?
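
A sketch of how two definitions of a word give different counts. All
three function names are my own, and the search word is assumed to
consist of plain letters only, so it is not escaped. The strict version
assumes that letters joined by a hyphen or apostrophe form one word;
that is only one possible choice.

```javascript
// Loose definition: anything bounded by \b counts, so "paw" is found
// inside "cat's-paw". Assumes `word` contains letters only.
function countLoose(text, word) {
  var m = text.match(new RegExp("\\b" + word + "\\b", "g"));
  return m ? m.length : 0;
}

// Strict definition (an assumption): runs of letters joined by a single
// hyphen or apostrophe are one word, so "cat's-paw" is not "paw".
function countStrict(text, word) {
  var tokens = text.match(/[A-Za-z]+(?:['-][A-Za-z]+)*/g) || [];
  var n = 0;
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i] === word) n++;
  }
  return n;
}

// Optional step: rejoin a word hyphenated across a line break.
// Caveat: this also fuses a genuine hyphen that falls at a line end.
function joinBrokenWords(text) {
  return text.replace(/-[ \t]*\r?\n[ \t]*/g, "");
}
```

Running both counters over the same text, with and without
joinBrokenWords first, is a quick way to see which definition matches
what you mean by "word".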

Given what you wrote above, should you also be looking for alternative
spellings?

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)