Re: NetTools--Accessing All Levels from the First Level of a Website



In <1147030100.716476.256720@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> TIGER wrote:

Matt Wills wrote:
In <1146836795.782137.255670@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> TIGER
wrote:
Matt,

The response problem seems to centre around perceptions of whether the
issue is FileMaker- or NetTools-based.

Yes, you are right, I am trying to access all levels. What I am
doing now is finding the contacts page, then manually pasting in
the URL, and then using your syntax to find the email address.
Acrobat 7.0 has an option to open a website as a PDF in which you can
specify all levels, and it rolls out the text for all the links. I
wonder if there is a way to link to that routine.

Also, I am at a dead end expanding your syntax to pick up multiple
email addresses.

The last issue I brought up with Wayne is the handling of records that
do not have a website. What his syntax does is post, into a record with
no website, the results from the last previous record that has one. I
believe we need an If statement in his syntax to overcome this
problem. I haven't heard from him yet on this one.
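For what it's worth, the If statement described above amounts to clearing the result whenever a record has no website, rather than letting the last fetched page carry over. A minimal sketch of that logic, in Python rather than FileMaker script, assuming each record is a dict with hypothetical "website" and "results" keys and that fetch_page() stands in for the NetTools fetch (it is not a real NetTools call):

```python
# Hypothetical stand-in for the NetTools page fetch; not a real NetTools call.
def fetch_page(url):
    return "<html>page for " + url + "</html>"


def process_records(records, fetch=fetch_page):
    """Fill each record's "results" from its "website", clearing it when
    there is no website instead of reusing the previous record's page."""
    page = ""  # like a global field that survives from one record to the next
    for record in records:
        url = (record.get("website") or "").strip()
        if url:
            page = fetch(url)
        else:
            # The Else branch: without this reset, the last page fetched
            # would be copied into a record that has no website.
            page = ""
        record["results"] = page
    return records


records = [
    {"name": "Acme", "website": "http://www.example.com"},
    {"name": "No Site Ltd", "website": ""},
]
for r in process_records(records):
    print(r["name"], "->", repr(r["results"]))
```

The point is only the reset in the Else branch; in FileMaker the equivalent would be an If/Else around the fetch step, with the Else clearing the field.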

I am getting up to speed with FileMaker after 5 years of not using it.
It's amazing how quickly one loses the logic of what has always
been, to me, a program with a steep learning curve. So be patient with
me and I will get there.

I have a major presentation on my project status to the CEO
here and am going to advise employing a backup FileMaker consultant,
but I don't know whom to choose. Any ideas you may have would be
appreciated.


The email extraction routine I gave you handles just one address per
page.

Off the top of my head, handling multiple addresses would involve first
determining how many there are, using PatternCount to see how many @'s
there are. Then it would be a matter of stepping up the
positioning in the formula to locate each complete address in
succession.
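A rough Python sketch of the count-then-step idea, assuming addresses in the page are bounded by spaces, quotes, angle brackets, or similar punctuation (an assumption; real pages vary). Here str.count plays the part of PatternCount and str.find the part of Position; extract_emails is a hypothetical name:

```python
# Characters assumed to mark the edges of an address in page text.
DELIMITERS = set(" \t\r\n\"'<>()[],;:")


def extract_emails(text):
    """Return each address found around an '@', in order of appearance."""
    total = text.count("@")          # like PatternCount(text; "@")
    emails = []
    start = 0
    for _ in range(total):
        at = text.find("@", start)   # like Position(text; "@"; start; 1)
        # Walk left from the '@' to the start of the local part.
        left = at
        while left > 0 and text[left - 1] not in DELIMITERS:
            left -= 1
        # Walk right to the end of the domain.
        right = at + 1
        while right < len(text) and text[right] not in DELIMITERS:
            right += 1
        local = text[left:at]
        domain = text[at + 1:right].rstrip(".")  # drop a sentence-ending period
        if local and domain:         # skip a lone '@' with nothing around it
            emails.append(local + "@" + domain)
        start = at + 1               # step past this '@' for the next pass
    return emails


page = 'Write <a href="mailto:sales@example.com">sales</a> or jim@example.org.'
print(extract_emails(page))  # ['sales@example.com', 'jim@example.org']
```

The "stepping up" is the start = at + 1 line: each pass begins its search just past the previous @.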

It would be pretty much the same thing for spidering pages starting
at the top level: use PatternCount to determine how many links there
are on a page, then extract each full address into a temporary table.
I would probably do some checking to make sure each extracted URL is
in the same domain before going to any of those pages.
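The same-domain check could look something like the Python sketch below, which resolves relative links against the page they came from and keeps only http/https links on the same host. same_domain_links is a hypothetical helper; note that it treats www.example.com and example.com as different hosts, which may or may not be what you want:

```python
from urllib.parse import urljoin, urlparse


def same_domain_links(page_url, links):
    """Resolve each link against page_url and keep only http/https links
    on the same host, without duplicates, in their original order."""
    host = urlparse(page_url).hostname
    keep = []
    for link in links:
        absolute = urljoin(page_url, link)
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.hostname == host:
            if absolute not in keep:
                keep.append(absolute)
    return keep


links = [
    "contact.html",
    "http://www.example.com/about",
    "http://www.other.org/",
    "mailto:info@example.com",
]
print(same_domain_links("http://www.example.com/index.html", links))
# ['http://www.example.com/contact.html', 'http://www.example.com/about']
```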

Matt

Matt,

Thanks. I understand where you are coming from. I don't know the
sequence for "stepping up the positioning in the formula to locate
each complete address in succession." What function do I use here?

Also, what feedback did you get from Wayne concerning support, and more
specifically about gaining access to all levels (pages) of a website
using a modified NetTools plugin?

Thanks,

Jim


If you look at the calculation that extracts the address, you'll note it
uses the Position function to locate the first occurrence of "mailto"
starting from position 1 (beginning of the field).

Replace the occurrence element of the function with a script variable
$Count or a global field Count. In a loop, once you retrieve an address,
increment $Count or Count by 1.

Precede all this with a PatternCount to determine how many "mailtos"
there are in the page, and exit the loop when the value of $Count or
Count (addresses extracted) exceeds the pattern count.
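To make the loop concrete, here is a Python sketch that mimics FileMaker's Position(text; searchString; start; occurrence), which returns a 1-based position or 0 when there is no match, and then walks the "mailto" occurrences with a counter standing in for $Count. Searching for "mailto:" with the colon, and ending an address at a quote, '?', '>' or whitespace, are assumptions; the sketch is also case-sensitive, so adjust it if pages use "MAILTO":

```python
def position(text, search, start, occurrence):
    """Mimic FileMaker's Position() for forward searches: the 1-based
    position of the nth occurrence of search, looking from the 1-based
    start, or 0 when there is no such occurrence."""
    if occurrence < 1 or start < 1:
        return 0  # simplification: treat out-of-range arguments as no match
    index = start - 1
    found = -1
    for _ in range(occurrence):
        found = text.find(search, index)
        if found == -1:
            return 0
        index = found + 1
    return found + 1


def extract_mailtos(page):
    """Pull out every address that follows "mailto:" in page."""
    total = page.count("mailto:")   # like PatternCount(page; "mailto:")
    addresses = []
    count = 1                       # stands in for $Count
    while count <= total:           # exit once $Count exceeds the pattern count
        pos = position(page, "mailto:", 1, count)
        begin = pos - 1 + len("mailto:")   # 0-based start of the address
        end = begin
        while end < len(page) and page[end] not in "\"'?> \t\r\n":
            end += 1
        addresses.append(page[begin:end])
        count += 1                  # increment $Count after each address
    return addresses


page = ('<a href="mailto:info@example.com">Info</a> '
        "<a href='mailto:sales@example.com?subject=Hi'>Sales</a>")
print(extract_mailtos(page))  # ['info@example.com', 'sales@example.com']
```

In FileMaker itself the counter would go in the occurrence argument, with the start left at 1, exactly as described above; the position() helper here exists only because Python has no direct equivalent.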

Roughly the same procedure would be used to spider the page for links:
use PatternCount to count how many "http"s there are, then extract them
one at a time using the same method (storing them in a utility table to
come back to them later).
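A Python sketch of the link-gathering step, assuming the "utility table" is simply a list of rows, each holding a URL and a visited flag (both hypothetical). Like the "http" PatternCount approach, it only picks up absolute links; relative links such as contact.html would be missed:

```python
def collect_links(page, utility_table):
    """Append each absolute http/https URL in page to utility_table (a list
    of row dicts) unless it is already there, so it can be visited later."""
    total = page.count("http")          # like PatternCount(page; "http")
    stored = {row["url"] for row in utility_table}
    start = 0
    for _ in range(total):
        begin = page.find("http", start)
        if begin == -1:                 # defensive; should not happen
            break
        end = begin
        while end < len(page) and page[end] not in "\"'<> \t\r\n":
            end += 1
        url = page[begin:end]
        start = begin + 1               # next search starts after this match
        # "http" can also appear inside a URL or in plain text; keep only
        # strings that actually look like absolute links.
        if url.startswith(("http://", "https://")) and url not in stored:
            utility_table.append({"url": url, "visited": False})
            stored.add(url)
    return utility_table


table = []
page = ('<a href="http://www.example.com/about">About</a> '
        '<a href="https://www.example.com/http-notes">Notes</a> '
        '<a href="http://www.example.com/about">About again</a>')
collect_links(page, table)
for row in table:
    print(row["url"], row["visited"])
```

Keeping a visited flag in the table lets a later pass pick up unvisited rows one at a time, and it is a natural place to apply a same-domain check before fetching.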

#2: Something must be going on with Wayne. He hasn't been immediately
available on AOL IM in the evenings as before.

Matt