Re: [ENG] closer result
- From: Axel Schwenke <axel.schwenke@xxxxxx>
- Date: Mon, 12 Dec 2005 12:27:02 +0100
"Bob Bedford" <bedford1@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> The program that sends the data is written by a third-party company that doesn't
> do it for me but for the needs of hundreds of clients.
> I use a database that also comes from a third-party company. I have to try to match
> data coming from the 2 companies that, of course, don't have the same
> data structure (as they are competitors).
So I see no need to do this *in* the database. Typical "string
distance" functions like Levenshtein can't use indexes and are
therefore seldom implemented *in* the database. Other solutions
like bi- or trigram counting may benefit from storing preprocessed
data, however I don't know of any database that supports it.
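To make the trigram idea concrete, here is a minimal sketch in Python. It is only an illustration, not a recommendation of a particular library: the function names, the space padding and the choice of the Dice coefficient as similarity measure are all assumptions of mine. The point is that the trigram profile of each string can be computed once and stored, so that only the cheap comparison step runs per pair.

```python
from collections import Counter


def trigrams(s: str) -> Counter:
    """Count the overlapping 3-character substrings of s.

    The string is lowercased and padded with spaces (an assumed convention)
    so that the beginning and end of the string produce trigrams too.
    """
    padded = f"  {s.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def trigram_similarity(a: Counter, b: Counter) -> float:
    """Dice coefficient on two trigram profiles.

    1.0 means identical profiles, 0.0 means no trigram in common.
    """
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum((a & b).values())  # multiset intersection
    return 2.0 * shared / total


# Preprocess once, then compare the stored profiles.
names = ["Mueller", "Muller", "Schmidt"]
profiles = {name: trigrams(name) for name in names}
```

With these profiles, trigram_similarity(profiles["Mueller"], profiles["Muller"]) is much higher than the value for "Mueller" against "Schmidt", which share no trigram at all.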
So you have two choices:
1. (let somebody) write a string distance function, e.g. for the
Levenshtein algorithm. Preferably this function would be
implemented in C/C++ and loaded into MySQL as a UDF.
Then you could use this function as follows (with HAVING, because
MySQL does not allow a column alias in the WHERE clause):
SELECT ... , Levenshtein(a.string, b.string) AS distance
FROM table1 AS a
JOIN table2 AS b
HAVING distance < $threshold
ORDER BY distance
2. Do the same outside the database. In both cases you have to test
the full product (= every combination of entries) of both data sets.
Some distance approaches are faster if you compare a single string
against a list of candidates. So if your task is "find the best match
from a list and take it if the distance is small enough", this would be
the way to go.
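For the second choice, a rough sketch of how this could look outside the database, here in Python. The names levenshtein and best_match and the example strings are made up for illustration. The distance function is the textbook dynamic-programming version, kept simple rather than fast. best_match does the "find the best match from a list and take it if the distance is small enough" step, using the same "distance < threshold" test as the query above.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions
    needed to turn a into b (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(query: str, candidates: Iterable[str],
               threshold: int) -> Optional[Tuple[str, int]]:
    """Return (candidate, distance) for the closest candidate whose
    distance is below threshold, or None if there is no such candidate."""
    best = None
    best_distance = threshold
    for candidate in candidates:
        d = levenshtein(query, candidate)
        if d < best_distance:
            best, best_distance = candidate, d
    return (best, best_distance) if best is not None else None


# Hypothetical example data standing in for the two companies' records.
result = best_match("Mueller AG", ["Muller AG", "Meier GmbH"], threshold=3)
```

Calling best_match once per entry of the first data set still visits every combination, so the cost grows with the product of the two list sizes.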
But FIRST you should LEARN; google the following keywords:
string distance
approximate string matching
XL