Re: I would've sworn it was mentioned here



Ian Goddard wrote:

singhals wrote:

Didn't someone in the past, oh, say, month, mention software that would compare databases and flag matches?

Not necessarily a *specific* genealogy program database, just databases in general?

I'm looking for an easy way to vacuum up "hit" lists from Ancestry, WC, Google, et al, and find the common ones.


Cheryl


I don't recall anything like that and a quick google doesn't find anything. Wishful thinking?

It's an interesting problem. First of all, what's the format of the hit lists? Are the hits from all the sources in the same format?

Secondly, most comparison tools that I can think of work on a specific file format, usually a flat text file although there are some that work on XML files. You would need to get the files into the appropriate format.

Thirdly, many comparison tools do the opposite of what you want - they look for differences. My favourite approach to looking for multiple occurrences of *identical* lines across multiple files would be the Unix command

cat x y z|sort|uniq -c|sort -rn|more

where x, y & z would be 3 file names (you can cat as few or as many files as you like). This will merge the contents and sort them so that duplicates follow each other, collapse each run of identical lines into a single line prefixed with the number of times it was found, re-sort the result in descending order of count and page the output. You can then see which lines were in more than one file (assuming no single file lists the same hit twice) but not which files they were in.
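If knowing which lists a hit came from matters, something along these lines might do it. This is only a rough sketch, not a tested tool: the function name common_hits and the sample file names are made up for illustration, and it assumes one hit per line and an awk on the system. It counts each line at most once per file, so a hit repeated inside a single list doesn't look like a match.

```shell
# common_hits FILE...: print "count<TAB>line<TAB>files" for every line
# that occurs in two or more of the given files, most widespread first.
# (Hypothetical helper name; one hit per line is assumed.)
common_hits() {
  awk '
    # Count each line at most once per file, and remember the file name.
    !seen[FILENAME, $0]++ {
      count[$0]++
      files[$0] = (files[$0] == "" ? "" : files[$0] ",") FILENAME
    }
    END {
      for (line in count)
        if (count[line] > 1)
          printf "%d\t%s\t%s\n", count[line], line, files[line]
    }
  ' "$@" | sort -rn
}

# Illustrative data only: three made-up hit lists in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'John Smith 1820' 'Mary Jones 1845' > "$tmp/ancestry.txt"
printf '%s\n' 'Mary Jones 1845' 'Ann Brown 1790'  > "$tmp/wc.txt"
printf '%s\n' 'Mary Jones 1845' 'John Smith 1820' > "$tmp/google.txt"

(cd "$tmp" && common_hits ancestry.txt wc.txt google.txt)
rm -rf "$tmp"
```

With the sample data, "Mary Jones 1845" should come out first with a count of 3 and all three file names, followed by "John Smith 1820" with a count of 2; "Ann Brown 1790" appears in only one list and is dropped.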

This requires that you have the hits in a common flat file format or can convert them to that; that hits which you would consider matching are identical within the files; that you either don't care which lists the matches were in, don't mind just comparing them in pairs or are prepared to hunt for them in the files; and finally that you have access to Unix-style commands (if you're on Windows only, google for "cygwin").
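On the "identical within the files" point, a small clean-up pass before sorting can help when hits differ only in capitalisation or stray spacing. The following is a sketch under that assumption; the function name normalize and the sample lines are invented for illustration, and what counts as "the same hit" in real lists may well need more than this.

```shell
# normalize: lowercase everything, collapse runs of whitespace to one
# space, and trim leading/trailing space. (Hypothetical helper; this is
# just one assumed notion of two hits being "the same".)
normalize() {
  tr '[:upper:]' '[:lower:]' |
    sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//'
}

# Illustrative input: the first two lines differ only in case and spacing,
# so after normalizing they are counted as one line found twice.
printf '%s\n' 'Mary  JONES 1845' ' mary jones 1845' 'Ann Brown 1790' |
  normalize | sort | uniq -c | sort -rn
```

For real lists you would run each file through normalize first (for example normalize < x > x.clean) and then compare the cleaned copies.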



Yes, quite possibly I was mis-remembering either the details or the list. I couldn't find it either. (g)

I've done it by hand, and it's not /that/ onerous, but the person who needs it would reach for the smellin' salts if I mentioned Unix or even CMD lines.

Thanks.

Cheryl