Re: [ENG] closer result



"Bob Bedford" <bedford1@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> The program that sends the data is made by a third-party company that doesn't
> do it for me alone but for the needs of hundreds of clients.
> I also use a database that comes from a third-party company. I have to try to
> match data coming from the 2 companies that, of course, don't have the same
> data structure (as they are competitors).

So I see no need to do this *in* the database. Typical "string
distance" functions like Levenshtein can't use indexes and are
therefore seldom implemented *in* the database. Other solutions
like bi- or trigram counting may benefit from storing preprocessed
data, however I don't know of any database that supports it.
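To make the trigram idea a bit more concrete, here is a rough Python sketch. It is only one possible variant: it compares *sets* of trigrams (Jaccard similarity) instead of counting them, and the names `trigrams` and `trigram_similarity` as well as the padding with blanks are my own assumptions, not something a particular database provides. The set for each stored string could be computed once and kept, which is the "preprocessed data" part.

```python
def trigrams(s):
    """Return the set of 3-character substrings of s (lower-cased, blank-padded)."""
    # Padding is an assumption of this sketch; it gives word starts extra weight.
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the trigram sets: 1.0 = same set, 0.0 = nothing shared."""
    ta, tb = trigrams(a), trigrams(b)
    # Thanks to the padding, neither set is ever empty, so no division by zero.
    return len(ta & tb) / len(ta | tb)


# Preprocessing step: compute the trigram set once per stored string.
names = ["Mueller AG", "Meier GmbH"]
precomputed = {name: trigrams(name) for name in names}
```

Note that this is a similarity (higher is better), not a distance like Levenshtein, so the threshold test goes the other way round.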

So you have two choices:

1. Write (or have somebody write) a string distance function, e.g. for
the Levenshtein algorithm. Preferably this function would be
implemented in C/C++ and loaded into MySQL as a UDF (user-defined function).

Then you could use this function as follows:

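-- Levenshtein() is the UDF from above, not a built-in MySQL function.
SELECT ... , Levenshtein(a.string, b.string) AS distance
FROM table1 AS a
CROSS JOIN table2 AS b
-- WHERE cannot refer to a column alias, so filter with HAVING
HAVING distance < $threshold
ORDER BY distance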


2. Do the same outside the database. In both cases you have to test
the full product (= every combination of entries) of both data sets.
Some distance approaches are faster if you compare a single string
against a list of candidates. So if your task is "find the best match
from a list and take it if the distance is small enough", this would be
the approach for you.
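As an illustration of choice 2, here is a minimal Python sketch of "find best match from list and take it if distance is small enough". It uses the standard dynamic-programming form of the Levenshtein distance; the function names `levenshtein` and `best_match`, the sample strings and the threshold value are made up for the example.

```python
def levenshtein(a, b):
    """Edit distance: number of single-character inserts, deletes, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def best_match(needle, candidates, threshold):
    """Return (candidate, distance) for the closest candidate, or None."""
    best = None
    for candidate in candidates:
        d = levenshtein(needle, candidate)
        if best is None or d < best[1]:
            best = (candidate, d)
    # Same test as "distance < $threshold" in the SQL variant.
    if best is not None and best[1] < threshold:
        return best
    return None


result = best_match("Mueller AG", ["Muller AG", "Meier GmbH"], 3)
```

Calling `best_match` once per entry of the first data set still tests the full product of both sets; the sketch only moves that work out of the database.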


But FIRST you should LEARN; google the following keywords:

string distance
approximate string matching


XL