Re: HTML - Extract Specific Text
- From: Helpful Harry <helpful_harry@xxxxxxxxxxxxxxxx>
- Date: Mon, 20 Mar 2006 12:39:06 +1200
In article <2006031914484216807-jh@postercom>, Jason <jh@xxxxxxxxxx>
wrote:
Hi,
I'm trying to develop a database that uses a plug-in (Troi or yooWeb)
to get HTML data from a database. The returned HTML appears fine and I
can put this into a text field on its own, but I need to search the
text and extract only a few pieces of information from each page, and
then put those small pieces into my own DB.
For example:
---Whole Page ---
find text "<br>No: xxxx-xxx-xx <br>" and paste xxxx-xxx-xx into a field
find text "<br>Type: Some Data <br>" and paste 'Some Data' into a
field... etc etc
---End Page ----
I can find the end of the data by searching for the HTML line break
code (which is always the same on each page's return).
Is there any way of doing this from scripts within FMP? AppleScripts? I
have the possibility of passing the data into MS Word, and creating a
VBA script to do this for me, but it seems a very long-winded way of
doing an apparently simple job!
Can anybody point me in the right direction?
There's no need for AppleScripts or VBA scripts (or even FileMaker
scripts, for that matter).
FileMaker has its own text functions built in (Left, Right, Middle,
Position, etc.) that can be used in any calculation. As long as the
imported text is standardised, you can easily extract whatever you want.
By looking for the standard pieces using the Position function, you can
work out where to start and (if needed) where to finish extracting the
text ... probably in a very similar way to how you were going to achieve
it in a VBA script.
If the "xxxx-xxx-xx" number is ALWAYS preceded by "No: " and is ALWAYS
11 characters long, then you can extract just the number using a
calculation like:
Middle(ImportedText,
Position(ImportedText, "No: ", 1, 1) + 4,
11)
This looks for the position in ImportedText of the first match to the
text "No: ", and then takes the 11 characters starting with the fourth
character after the 'N' of "No: " - the result is just the
"xxxx-xxx-xx".
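To double-check the "+ 4" arithmetic outside FileMaker, here is a rough Python sketch (not FileMaker code, and not part of the original answer). The `position` and `middle` helpers are hypothetical stand-ins that mimic only the parts of FileMaker's Position and Middle used here: 1-based counting, and 0 when the search text isn't found. The sample page text is made up to match the example in the question.

```python
def position(text: str, search: str, start: int) -> int:
    """Hypothetical stand-in for FileMaker's Position(text, search, start, 1).

    Returns a 1-based position, or 0 when `search` is not found.
    Uses Python's case-sensitive str.find; it does not try to copy
    every detail of FileMaker's own Position function.
    """
    index = text.find(search, start - 1)  # FileMaker counts from 1, Python from 0
    return index + 1  # -1 (not found) becomes 0


def middle(text: str, start: int, count: int) -> str:
    """Hypothetical stand-in for FileMaker's Middle(text, start, count)."""
    begin = max(start - 1, 0)  # convert the 1-based start to a 0-based index
    return text[begin:begin + count]


# Made-up sample page, modelled on the example in the question.
imported_text = "<br>No: 1234-567-89 <br>Type: Some Data <br>"

# Same arithmetic as: Middle(ImportedText, Position(ImportedText, "No: ", 1, 1) + 4, 11)
number = middle(imported_text, position(imported_text, "No: ", 1) + 4, 11)
print(number)  # 1234-567-89
```

Here Position finds the 'N' at character 5, so "+ 4" lands on character 9, which is the first digit after "No: ".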
Similarly, if "Type: " ALWAYS precedes the "Some Data" text that you
want, then it can be extracted with a calculation like:
Middle(ImportedText,
Position(ImportedText, "Type: ", 1, 1) + 6,
Position(ImportedText, " <br>",
Position(ImportedText, "Type: ", 1, 1), 1)
- Position(ImportedText, "Type: ", 1, 1) - 6)
Just like the calculation above, this one looks for the position of the
first match to the text "Type: " in ImportedText and extracts the text
starting with the sixth character after the 'T' of "Type: " ... but
this time we don't have an exact number of characters to extract, so
instead of just taking 11 characters we have to search for the position
of the first " <br>" (a space followed by the <br> tag) AFTER "Type: "
and stop just before that. The number of characters we need to extract
is found by taking the location of the first " <br>" after the "Type: "
text:
i.e.
Position(ImportedText, " <br>",
Position(ImportedText, "Type: ", 1, 1),
1)
and subtracting the location of the "Type: " text and those six
characters. The result this time is just "Some Data" ... as long as
"Some Data" doesn't contain a <br> tag within it.
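The length calculation can be checked the same way. This Python sketch is again only an illustration of the arithmetic, not FileMaker code. It re-declares small hypothetical stand-ins for Position and Middle so that it runs on its own, and it uses a made-up sample page.

```python
def position(text: str, search: str, start: int) -> int:
    """Hypothetical stand-in for FileMaker's Position(text, search, start, 1):
    1-based result, 0 if not found (case-sensitive str.find)."""
    return text.find(search, start - 1) + 1


def middle(text: str, start: int, count: int) -> str:
    """Hypothetical stand-in for FileMaker's Middle(text, start, count)."""
    begin = max(start - 1, 0)
    return text[begin:begin + count]


imported_text = "<br>No: 1234-567-89 <br>Type: Some Data <br>"

type_pos = position(imported_text, "Type: ", 1)      # the 'T' of "Type: "
end_pos = position(imported_text, " <br>", type_pos)  # first " <br>" after it

# Start 6 characters after 'T'; length = end - start of "Type: " - 6
value = middle(imported_text, type_pos + 6, end_pos - type_pos - 6)
print(value)  # Some Data
```

With this sample, "Type: " starts at character 25 and the following " <br>" at character 40, so the length works out to 40 - 25 - 6 = 9, exactly the length of "Some Data".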
Both of these work for your simple example above and can be used in
whichever way suits you in FileMaker:
- a simple Calculation field (with a Text result),
or - an Auto-Enter calculation for a normal Text field,
or - in a Script using the Set Field or Insert Text / Insert
Calculated Result commands,
or - CAREFULLY in a calculation for the Replace command (via
script or Records menu).
They can probably also be easily modified for use as a Custom Function
in newer versions of FileMaker, but I don't have a version with that
ability.
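One thing a reusable version could add is a guard for missing markers. FileMaker's Position returns 0 when the text isn't found, so if a page has no "No: " at all, the first calculation above would quietly return 11 characters starting at character 4 of the page. The sketch below is a hypothetical `text_between` helper written in Python purely to illustrate the idea. It is not a FileMaker custom function. It returns an empty string when either marker is missing, and because it stops at an end marker, it also drops the assumption that the number is always 11 characters long.

```python
def text_between(text: str, before: str, after: str) -> str:
    """Return the text between the first `before` marker and the next
    `after` marker, or "" when either marker is missing.

    Hypothetical helper for illustration only.
    """
    start = text.find(before)
    if start == -1:
        return ""  # marker missing: return nothing rather than unrelated text
    start += len(before)  # skip over the marker itself (the "+ 4" / "+ 6" above)
    end = text.find(after, start)
    if end == -1:
        return ""
    return text[start:end]


page = "<br>No: 1234-567-89 <br>Type: Some Data <br>"
print(text_between(page, "No: ", " <br>"))            # 1234-567-89
print(text_between(page, "Type: ", " <br>"))          # Some Data
print(repr(text_between(page, "Colour: ", " <br>")))  # ''
```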
Note: Some versions of FileMaker use the ";" character to separate the
parameters in functions rather than the "," character, so you may have
to change them.
Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)