Re: [ENG] closer result
- From: Christian Kirsch <ck@xxxxxxx>
- Date: Mon, 12 Dec 2005 12:36:38 +0100
Axel Schwenke wrote:
> "Bob Bedford" <bedford1@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>The program that sends the data is written by a third-party company, which
>>doesn't do it for me alone but for the needs of hundreds of clients.
>>I also use a database that comes from a third-party company. I have to try to
>>match data coming from the two companies, which, of course, don't share the
>>same data structure (as they are competitors).
>
>
> So I see no need to do this *in* the database. Typical "string
> distance" functions like Levenshtein can't use indexes and are
> therefore seldom implemented *in* the database. Other solutions
> like bi- or trigram counting may benefit from storing preprocessed
> data, however I don't know of any database that supports it.
>
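To make the trigram idea concrete, here is a minimal sketch in Python of what "storing preprocessed data" could look like: each string is turned into its trigram counts once, and later comparisons only touch those stored profiles. The function names, the padding scheme, the Dice-style score and the sample names are my own assumptions for illustration, not something a particular database provides.

```python
from collections import Counter

def trigrams(text):
    """Return a multiset of the character trigrams in text (lowercased, padded)."""
    padded = "  " + text.lower() + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def trigram_similarity(a, b):
    """Dice coefficient over trigram counts: 1.0 = identical, 0.0 = nothing shared."""
    shared = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0

# Preprocess once (made-up sample names), then reuse the stored profiles.
names = ["Mueller GmbH", "Miller Ltd", "Schmidt AG"]
profiles = {name: trigrams(name) for name in names}

query = trigrams("Muller GmbH")
best = max(names, key=lambda n: trigram_similarity(query, profiles[n]))
```

With these sample names, `best` ends up as "Mueller GmbH". Unlike an edit distance, the score is a similarity, so higher means closer.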
> So you have two choices:
>
> 1. (let somebody) write a string distance function, e.g. for the
> Levenshtein algorithm. Preferably this function would be
> implemented in C/C++ and loaded into MySQL as a UDF
>
> Then you could use this function as follows:
>
> SELECT ... , Levenshtein(a.string, b.string) AS distance
> FROM table1 AS a
> CROSS JOIN table2 AS b
> HAVING distance < $threshold
> ORDER BY distance
>
>
> 2. Do the same outside the database. In both cases you have to test
> the full product (= every combination of entries) of both data sets.
> Some distance approaches are faster if you compare a single string
> against a list of candidates. So if your task is "find the best match
> from a list and take it if the distance is small enough", this would
> suit you.
>
>
Not quite (unless Levenshtein is not a real metric). Since metric(a,b)
= metric(b,a) and metric(a,a) = 0, it's sufficient to check for
(N*(N-1))/2 combinations. That can still be a lot, but fortunately it is
less than N*N.
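A small sketch of that counting argument in Python, assuming a single list of N strings that is compared against itself (for two different lists, every cross pair still has to be tested). The helper names are made up, and Hamming distance stands in here only because it is a short, symmetric metric; any other metric, such as Levenshtein, can be passed in instead.

```python
from itertools import combinations

def hamming(a, b):
    """Number of differing positions; a simple symmetric metric for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("hamming() needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))

def pairwise_distances(items, metric):
    """Evaluate metric once per unordered pair: N*(N-1)/2 calls instead of N*N."""
    return {(a, b): metric(a, b) for a, b in combinations(items, 2)}

words = ["karin", "karen", "koren", "maria"]  # made-up sample data
distances = pairwise_distances(words, hamming)
```

For the four sample words this gives 6 entries rather than 16, because (b, a) and (a, a) are never evaluated.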
> But FIRST you should LEARN; google the following keywords
>
> string distance
> approximate string matching
I agree.
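For option 2 (doing the matching outside the database), a minimal sketch in Python of "find the best match from a list and take it if the distance is small enough". It uses the textbook dynamic-programming form of Levenshtein; `best_match`, its `threshold` parameter and the sample names are hypothetical and only there for illustration.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def best_match(needle, candidates, threshold):
    """Return the closest candidate, or None if even that one is too far away."""
    if not candidates:
        return None
    best = min(candidates, key=lambda c: levenshtein(needle, c))
    return best if levenshtein(needle, best) < threshold else None

# Made-up sample data: one incoming name against a list of known names.
match = best_match("Muller", ["Mueller", "Schmidt", "Meier"], threshold=3)
```

The threshold is compared with `<`, as in the SQL quoted above, so a candidate whose distance equals the threshold is rejected.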