Re: HTML - Extract Specific Text



In article <2006031914484216807-jh@postercom>, Jason <jh@xxxxxxxxxx>
wrote:

Hi,
I'm trying to develop a database that uses a plug-in (Troi or yooWeb)
to get HTML data from a database. The return HTML appears fine and I
can put this into a text field on its own, but I need to search the
text and extract only a few pieces of information from each page, and
then put those small pieces into my own DB.
For Example:

---Whole Page ---
find text "<br>No: xxxx-xxx-xx <br>" and paste xxxx-xxx-xx into a field
find text "<br>Type: Some Data <br>" and paste 'Some Data' into a
field... etc etc
---End Page ----

I can find the end of the data by searching for the HTML line break
code (which is always the same on each page's return).

Is there any way of doing this from scripts within FMP? AppleScripts? I
have the possibility of passing the data into MS Word, and creating a
VBA script to do this for me, but it seems a very long-winded way of
doing an apparently simple job!

Can anybody point me in the right direction?

There's no need for AppleScripts or VBA scripts (or even FileMaker
scripts for that matter).

FileMaker has its own text functions built-in (Left, Right, Middle,
Position, etc.) that can be used in any calculation. As long as the
imported text is standardised you can easily extract whatever you want.
By looking for the standard pieces using the Position function you can
work out where to start and finish (if needed) extracting the text ...
probably in a very similar way to how you were going to achieve it in
a VBA script.

If the "xxxx-xxx-xx" number is ALWAYS preceded by "No: " and is ALWAYS
11 characters long then you can extract just the number using a
calculation like:

Middle(ImportedText,
Position(ImportedText, "No: ", 1, 1) + 4,
11)

This looks for the position in ImportedText of the first match to the
text "No: " (Position returns where the 'N' is), adds 4 to skip over the
four characters of "No: ", and then takes the 11 characters starting
there - the result is just the "xxxx-xxx-xx".
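
If you want to check the arithmetic outside FileMaker, here is a rough
Python sketch of the same idea. The helpers position() and middle() are
my own stand-ins that imitate FileMaker's 1-based Position and Middle
functions (they are not FileMaker code, and negative occurrence values
are not handled), and the sample page text is made up to match the
example in the question.

```python
def position(text, search, start=1, occurrence=1):
    """1-based position of the Nth match at or after start; 0 if not found."""
    index = start - 1                 # convert 1-based start to 0-based
    for _ in range(occurrence):
        index = text.find(search, index)
        if index == -1:
            return 0                  # mirrors Position returning 0 on no match
        index += 1                    # now 1-based, and one past the match start
    return index


def middle(text, start, length):
    """Return length characters of text beginning at 1-based start."""
    return text[start - 1:start - 1 + length]


# Made-up page text in the layout described in the question.
imported_text = "<br>No: 1234-567-89 <br><br>Type: Some Data <br>"

label_at = position(imported_text, "No: ")
# Skip the four characters of "No: ", then take a fixed 11 characters.
number = middle(imported_text, label_at + 4, 11) if label_at else ""
print(number)
```

The check on label_at is there because a position of 0 means "No: " was
not found, and in that case the Middle step would otherwise pull 11
unrelated characters from near the start of the page.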


Similarly, if "Type: " ALWAYS precedes the "Some Data" text that you
want then it can be extracted with a Calculation like:

Middle(ImportedText,
Position(ImportedText, "Type: ", 1, 1) + 6,
Position(ImportedText, " <br>",
Position(ImportedText, "Type: ", 1, 1), 1)
- Position(ImportedText, "Type: ", 1, 1) - 6)

Just like the calculation above, this one looks for the position of the
first match to the text "Type: " in ImportedText and extracts the text
starting with the sixth character after the 'T' of "Type: " ... but
this time we don't have an exact number of characters to extract, so
instead of just taking 11 characters we have to search for the position
of the first " <br>" (a space followed by the tag) AFTER "Type: " and
stop just before that. The number of characters we need to extract is
found by knowing the location of the first " <br>" after the "Type: "
text:
ie.
Position(ImportedText, " <br>",
Position(ImportedText, "Type: ", 1, 1),
1)

and subtracting the location of the "Type: " text and those six
characters. The result this time is just "Some Data" ... as long as
"Some Data" doesn't contain a <br> tag within it.
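
The same start / end / subtract steps can be sketched in Python to
confirm the character count. Again, position() and middle() are only my
imitations of FileMaker's 1-based functions, extract_between() is a
made-up helper name, and the sample text is invented to match the
question.

```python
def position(text, search, start=1, occurrence=1):
    """1-based position of the Nth match at or after start; 0 if not found."""
    index = start - 1
    for _ in range(occurrence):
        index = text.find(search, index)
        if index == -1:
            return 0
        index += 1                    # 1-based, and one past the match start
    return index


def middle(text, start, length):
    """Return length characters of text beginning at 1-based start."""
    return text[start - 1:start - 1 + length]


def extract_between(text, label, terminator):
    """Text after label up to the first terminator that follows it."""
    label_at = position(text, label)
    if label_at == 0:
        return ""                     # label missing: nothing to extract
    end_at = position(text, terminator, label_at)
    if end_at == 0:
        return ""                     # no terminator after the label
    first_char = label_at + len(label)
    # Same subtraction as the calculation: end - label position - label length.
    return middle(text, first_char, end_at - label_at - len(label))


imported_text = "<br>No: 1234-567-89 <br><br>Type: Some Data <br>"
print(extract_between(imported_text, "Type: ", " <br>"))
```

Because the terminator search starts at the label's position, an earlier
" <br>" on the page (such as the one after the "No: " line) is ignored.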

Both of these work for your simple example above and can be used in any
FileMaker way you need:

- a simple Calculation field (with a Text result),

or - an Auto-Enter calculation for a normal Text field,

or - in a Script using the Set Field or Insert Text / Insert
Calculated Result commands,

or - CAREFULLY in a calculation for the Replace command (via
script or Records menu).

They can probably also be easily modified for use as a Custom Function
in newer versions of FileMaker, but I don't have a version with that
ability.


Note: Some versions of FileMaker use the ";" character to separate the
parameters in functions rather than the "," character, so you may have
to change them.





Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)


