Re: [ENG] closer result



Axel Schwenke wrote:
> "Bob Bedford" <bedford1@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>The program that sends the data is written by a third-party company, which
>>does it not for me but for the needs of hundreds of clients.
>>I also use a database that comes from a third-party company. I have to try to
>>match data coming from the two companies, which of course don't use the same
>>data structure (as they are competitors).
>
>
> So I see no need to do this *in* the database. Typical "string
> distance" functions like Levenshtein can't use indexes and are
> therefore seldom implemented *in* the database. Other solutions
> like bi- or trigram counting may benefit from storing preprocessed
> data, however I don't know of any database that supports it.
>
> So you have two choices:
>
> 1. Write (or let somebody write) a string distance function, e.g. for the
> Levenshtein algorithm. Preferably this function would be
> implemented in C/C++ and loaded into MySQL as a UDF.
>
> Then you could use this function as follows:
>
> SELECT ... , Levenshtein(a.string, b.string) AS distance
> FROM table1 AS a
> JOIN table2 AS b
> HAVING distance < $threshold
> ORDER BY distance
>
>
> 2. Do the same outside the database. In both cases you have to test
> the full product (= every combination of entries) of both data sets.
> Some distance approaches are faster if you compare a single string
> against a list of candidates. So if your task is "find the best match
> from a list and take it if the distance is small enough", this would be
> the approach for you.
>
>

Not quite (unless Levenshtein is not a real metric). Since metric(a,b)
= metric(b,a) and metric(a,a) = 0, it's sufficient to check
(N*(N-1))/2 combinations when N strings are compared among themselves.
That can still be a lot, but fortunately fewer than N*N.
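
To illustrate the pair counting, here is a minimal sketch in Python, done
outside the database. It assumes a plain dynamic-programming Levenshtein
distance with unit costs; the names levenshtein and close_pairs are made up
for this example and are not part of MySQL or any library.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def close_pairs(strings, threshold):
    """Return sorted (distance, a, b) for unordered pairs below threshold."""
    hits = []
    # combinations() yields each unordered pair once and never (a, a),
    # so N strings give N*(N-1)/2 distance computations.
    for a, b in combinations(strings, 2):
        d = levenshtein(a, b)
        if d < threshold:
            hits.append((d, a, b))
    return sorted(hits)
```

For example, close_pairs(["Meier", "Meyer", "Schmidt"], 2) computes three
distances instead of nine and keeps only the Meier/Meyer pair.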

> But FIRST you should LEARN; google the following keywords
>
> string distance
> approximate string matching

I agree.
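
As a starting point for the "find the best match from a list" variant
mentioned above, a rough Python sketch could look like this. It is only an
illustration under assumptions: a unit-cost Levenshtein distance, a linear
scan over the candidates, and hypothetical names (levenshtein, best_match,
max_distance) that do not come from any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(query, candidates, max_distance):
    """Return (candidate, distance) for the closest candidate,
    or None if no candidate is within max_distance."""
    best = None
    best_distance = max_distance + 1
    for candidate in candidates:
        d = levenshtein(query, candidate)
        if d < best_distance:  # strictly better than anything seen so far
            best, best_distance = candidate, d
    if best is None:
        return None
    return best, best_distance
```

Calling best_match once per entry of the first data set against the full
list from the second data set still visits every combination; the threshold
only decides which matches are accepted.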