Re: WORD doc info into FMP?



I think it's actually considerably easier than these methods, and doesn't
require a script.

Scanning the text, it appears that each line of the entry follows the format

Field name<colon><space>Data<return>

I'm assuming the line breaks which appear here within the data are the
result of posting to Usenet, and that in the Word file each field's data
sits on a single line.

Now, given this structure, you can use the following formula template to
extract any bit of information you want:

_________________________________________

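Let([
  // Label to find; the leading ¶ makes it match only at the start of a line
  Field = "¶" & "Alternate Title: ";
  // Count the returns up to the label to get its line number, then take that line
  Line = MiddleValues(Source;
    PatternCount(Left(Source; Position(Source; Field; 1; 1)); "¶") + 1; 1)
];
  // Drop the label (Field minus its leading ¶) from the front of the line
  Right(Line; Length(Line) - Length(Field) + 1)
)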

_________________________________________

Where "Source" is the name of the field you imported/pasted the Word data
into. For example:

1) Create a new calculation field, "Alternate title" with a text result.
2) Enter the formula above. The label in the Field line ("Alternate Title: ")
is the part you change for each field you want to extract; keep the trailing
colon and space.
3) Repeat steps 1 and 2 for all the other fields you're interested in
extracting. (e.g.: "Release date" or "Assistant director")
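
If you want to try the logic on a sample entry outside FileMaker first, here
is a rough Python sketch of the same steps: find the label at the start of a
line, count the returns in front of it, take that line, drop the label. The
function name extract_field and the shortened SAMPLE text are my own, purely
for illustration. Like Position in FileMaker, the match ignores case.

```python
import re

# Shortened copy of the posted entry, one field per line (an assumption about
# how the text looks once the Usenet line wrapping is undone).
SAMPLE = (
    "627\n"
    "D.W. Griffith, Inc.\n"
    "THE STRUGGLE\n"
    "\n"
    "Alternate title: Ten Nights in a Barroom (New York State Archives)\n"
    "Release date: 6 February 1932\n"
    "\n"
    "Director: D.W. Griffith\n"
    "Assistant director: Richard A. Blaydon\n"
)


def extract_field(source: str, label: str) -> str:
    """Return the data that follows 'label: ' on the line starting with it."""
    marker = "\n" + label + ": "  # leading return: match only at start of line
    match = re.search(re.escape(marker), source, re.IGNORECASE)
    if match is None:
        return ""  # label not present in this entry
    # The returns up to and including the one in front of the label give the
    # 0-based index of the label's line (FileMaker counts from 1, hence its +1).
    line_index = source[: match.start() + 1].count("\n")
    line = source.split("\n")[line_index]
    return line[len(marker) - 1:]  # drop "label: " from the front


print(extract_field(SAMPLE, "Director"))
```

Because of the leading return, asking for "Director" does not pick up the
"Assistant director" line.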

This leaves the three lines at the beginning of the file. In your example:

627
D.W. Griffith, Inc.
THE STRUGGLE

Is it always three lines, and are they always in the same order? If so you
can just use the MiddleValues function again.

FileNum (calculation, text result)
= MiddleValues(Source;1;1) = "627"

Studio (calculation, text result)
= MiddleValues(Source;2;1) = "D.W. Griffith, Inc."

Title (calculation, text result)
= MiddleValues(Source;3;1) = "THE STRUGGLE"
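
To answer the "always three lines?" question for all the entries before
relying on fixed positions, a quick check outside FileMaker might look like
the Python sketch below. The helper names are hypothetical, and it assumes the
heading block is separated from the labelled fields by a blank line, as in the
posted entry.

```python
def header_lines(entry: str) -> list[str]:
    """Return the lines that come before the first blank line of an entry."""
    head = []
    for line in entry.strip().split("\n"):
        if not line.strip():
            break  # the first blank line ends the heading block
        head.append(line.strip())
    return head


def looks_regular(entry: str) -> bool:
    """True if the heading is exactly three lines and the first is a number."""
    head = header_lines(entry)
    return len(head) == 3 and head[0].isdigit()


# Shortened copy of the posted entry.
ENTRY = (
    "627\n"
    "D.W. Griffith, Inc.\n"
    "THE STRUGGLE\n"
    "\n"
    "Alternate title: Ten Nights in a Barroom (New York State Archives)\n"
)
print(header_lines(ENTRY), looks_regular(ENTRY))
```

Any entry for which looks_regular is false is one to inspect by hand before
trusting the three MiddleValues calculations.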

Am I missing anything?



"Kent" <kent.news.account@xxxxxxxxx> wrote in message
news:a0_qg.127196$IK3.96682@xxxxxxxxxxx
A lot is going to depend on how consistently the records follow a format.
If, generally, the information is in the same order for each one, it will
be easier to use one of the first two methods I suggest below (1, 2). If
they are a hodge-podge of order, your original idea is likely the best bet
(4).

(1) (guessing, since XML is not my thing) If there is a fixed maximum
number of any given type of field (Location, Script, Sets, etc.), you might
have luck by first using some find and replace and macros in Word to turn
it into an XML file, then importing from that. It would require you or
someone else around knowing XML and understanding Filemaker's
implementation thereof.

(2) If I were doing it, I'd likely use brute force. If the line headings
are consistent, I'd be tempted to first use find and replace to mark ends
of records, then turn all carriage returns into tabs, then turn my
previously marked end of records back into carriage returns. Using an
extremely wide page, and carefully positioned tabs, you can get a rough
idea whether things are lining up consistently, and insert some extra tabs
to get a better match. Then import the whole thing into a spreadsheet to
figure out how well they line up and do further cleaning. Once it looks
like they line up, use search and replace to remove all the headings,
which would also let you know if there are any misspelled headings
anywhere. Once it is a spreadsheet (saved in Excel format) Filemaker can
open it directly and convert it on the fly.

(3) Another method (and the one I am leaning toward after typing all the
below) might be to use a table, or Excel, to put an identifier at the
beginning of each line (either the number [627 in your example] or the
title "THE STRUGGLE") followed by a tab. Then each line would be imported
into a separate record in a database, and a separate related "database 2"
could contain nothing but the number and use the match to display all
related records. In that case, a small sample before importing of the
record you included would look like:

627 D.W. Griffith, Inc.
627 THE STRUGGLE
627
627 Alternate title: Ten Nights in a Barroom (New York State Archives)
627 Filming date: 6 July-9 or 14 August 1931
627 Location: Audio Cinema studios, 198th and Decatur Avenue,
The Bronx, New York;
627 exteriors: 175th Street, The Bronx; Stamford Rolling Mills,
Springdale, Connecticut

Then in the second database, selecting product 627 would display all of
that information, either via a portal or perhaps take you to a found set
in the other database. That would let you search the long database of
lines, and use a button to take you to a view of just the matching product
in Database 2, where you would see all information for product xxx.

You could add new records via a portal, and adding new information lines
to a record would be as simple as adding a line with the right ID number.
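
If the tagging is easier to do with a small program than in Word or Excel,
something like this Python sketch might do it. The function name tag_lines is
made up, and it assumes the first line of every entry is the number.

```python
def tag_lines(entry: str) -> list[str]:
    """Prefix every line after the first with the entry's number and a tab."""
    lines = entry.strip().split("\n")
    entry_id = lines[0].strip()  # assumed: the first line is the film number
    return [entry_id + "\t" + line for line in lines[1:]]


# Shortened copy of the posted entry.
ENTRY = (
    "627\n"
    "D.W. Griffith, Inc.\n"
    "THE STRUGGLE\n"
    "\n"
    "Alternate title: Ten Nights in a Barroom (New York State Archives)\n"
)
for row in tag_lines(ENTRY):
    print(row)
```

Each resulting row is one tab-separated line ready to import as its own
record, with the number in the first column as the match field.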

(4) As you suggested, an alternative would be to replace all carriage
returns (except the one at the end of each record) with a unique string,
such as x*x*x, then import each complete record into a single field of a
different record, then use the headers and position commands to populate
other calculated fields with the appropriate content. If the headings are
not consistent, that may prove to be a nightmare.
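
A rough Python sketch of the replace step, in case find and replace in Word
gets unwieldy. The function name is hypothetical, and the big assumption is
that a line containing nothing but digits marks the start of a new record. The
second record in TEXT is invented only to show the split.

```python
import re

MARK = "x*x*x"  # the placeholder string suggested above


def flatten_records(text: str) -> list[str]:
    """Return one string per record, with internal returns replaced by MARK.

    Assumption: a line holding nothing but digits starts a new record.
    """
    records = []
    current = []
    for line in text.splitlines():
        if re.fullmatch(r"\d+", line.strip()) and current:
            records.append(current)  # a new number closes the previous record
            current = []
        if current or line.strip():  # ignore blank lines before a record starts
            current.append(line.rstrip())
    if current:
        records.append(current)
    flat = []
    for rec in records:
        while rec and not rec[-1]:
            rec.pop()  # drop trailing blank lines
        flat.append(MARK.join(rec))
    return flat


TEXT = (
    "627\nD.W. Griffith, Inc.\nTHE STRUGGLE\n\nDirector: D.W. Griffith\n\n"
    "628\nStudio name\nSECOND TITLE\n"  # made-up record, for illustration only
)
for row in flatten_records(TEXT):
    print(row)
```

After importing one row per record, a calculation along the lines of
Substitute(Source; "x*x*x"; "¶") should turn the placeholders back into
returns inside FileMaker.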

Kent



Albert wrote:
Hello there --

I would like to explore the feasibility of importing bibliographic-type
information from a Word document into a Filemaker database. An example of
a single entry is provided at the bottom of this page. I suspect that
the undertaking is too complex, given the variations in the information
from entry to entry, but I want to at least get some other opinions.

Would I be right in thinking that, conceptually, the approach would be to
first import each entry as a single block of data into a single field, so
that I would have one record for each film title -- and then write a
series of scripts for all the different fields I wanted (Film ID #,
Alternate Title, Location . . . etc. etc. etc.) that would in a sense
"mine" the original text block for the needed info?

Let's say there are 700 records like the one below. Does this sound like
the sort of task that is practicable to undertake -- or is it so likely
to involve glitches and errors due to inconsistencies/variations in the
original text that it might be easier (and more accurate) to have someone
input the data by hand?

Any observations appreciated.

Albert

------------------------------------------------------------------------
627
D.W. Griffith, Inc.
THE STRUGGLE

Alternate title: Ten Nights in a Barroom (New York State Archives)
Filming date: 6 July-9 or 14 August 1931
Location: Audio Cinema studios, 198th and Decatur Avenue, The Bronx, New
York; exteriors: 175th Street, The Bronx; Stamford Rolling Mills,
Springdale, Connecticut
Distribution: United Artists Corp.
Connecticut preview: late November 1931
New York premiere: 10 December 1931, Rivoli Theatre
Release date: 6 February 1932
Release length: nine reels, 77 or 87 minutes
Copyright date: 25 November 1931 (LP2843)

Director: D.W. Griffith
Assistant director: Richard A. Blaydon
Second assistant director: Jack Aichele
Production manager: Raymond A. Clune
Production advisor: A. Griffith Grey
Script: Anita Loos, John Emerson, (uncredited:) D.W. Griffith
Story: Anita Loos, John Emerson
Source: loosely based on L'Assommoir, the novel (1877) by Emile Zola; The
Demon Drink, the play by Augustin Daly; L'assommoire, the play by William
Busrach
Cinematographer: Joseph Ruttenberg; G.W. Bitzer?; Larry Williams?
Camera crew: Nick Rogalli, Richard Hertel, Ben Wetzler, Paul Rogalli
Sets: Clement Williams
Electrician: Johnny Murphy
Supervisor to makeup: Edward Scanlon
Film editor: Barney Rogan
Sound system: Western Electric Recording
Sound recording: Joe W. Coffman
Music arranger/effects: Philip Scheib, D.W. Griffith
Script girl: Alice Hunter
Still photographer: Frank Kirby
Cast: Hal Skelly (Jimmie Wilson); Zita Johann (Florrie Wilson); Charlotte
Wynters (Nina); Evelyn Baldwin (Nan Wilson); Jackson Halliday (Johnny
Marshall); Edna Hagan (Mary Wilson); Claude Cooper (Sam); Arthur Lipson
(Cohen); Charles Richman (Mr. Craig); Helen Mack (A catty girl); Scott
Moore (Al, a gigolo); Dave Manley (Tony, a mill worker)

