Click here to Skip to main content
15,885,546 members
Home / Discussions / Algorithms
   

Algorithms

 
GeneralRe: Hi There is a function in PhotoShop Color match! Pin
shengcheng_jin10-Nov-11 20:57
shengcheng_jin10-Nov-11 20:57 
AnswerRe: Hi There is a function in PhotoShop Color match! Pin
Richard MacCutchan29-Oct-11 7:16
mveRichard MacCutchan29-Oct-11 7:16 
GeneralRe: Hi There is a function in PhotoShop Color match! Pin
shengcheng_jin30-Oct-11 19:37
shengcheng_jin30-Oct-11 19:37 
GeneralRe: Hi There is a function in PhotoShop Color match! Pin
the north face15-Nov-11 15:57
the north face15-Nov-11 15:57 
QuestionLevenshtein Distance Pin
PIEBALDconsult14-Oct-11 6:05
mvePIEBALDconsult14-Oct-11 6:05 
AnswerRe: Levenshtein Distance Pin
TheGreatAndPowerfulOz18-Oct-11 12:44
TheGreatAndPowerfulOz18-Oct-11 12:44 
GeneralRe: Levenshtein Distance Pin
PIEBALDconsult19-Oct-11 6:27
mvePIEBALDconsult19-Oct-11 6:27 
AnswerRe: Levenshtein Distance Pin
Luc Pattyn18-Oct-11 14:08
sitebuilderLuc Pattyn18-Oct-11 14:08 
For similar strings the relevant data in the 2D matrix eventually is located around the diagonal (although the naive implementation of the algorithm, such as shown by Wikipedia, needs all the matrix elements to come up with the distance). So what you could do is:

1. use a sparse matrix, as ahmed already suggested; [ADDED] maybe in the top triangle, you don't need to store those elements that equal their top neighbour plus one, and in the bottom triangle those that equal their left neighbour plus one. However this is likely to fail dramatically if the strings start out with a couple of differences...[/ADDED]
2. or use a band centered around the main diagonal, making the memory cost linear rather than quadratic (unless you want to have the band's thickness a constant fraction of the problem size);
3. or develop something similar to the sliding window approach I applied here: LPTextFileDiff: another textfile compare utility.[^]

For #2 and #3 you would need:
- to estimate the first row and column values from what you already have, that could be tricky and somewhat unreliable;
- to detect pathological cases
You would not get exact distance for very distant strings, which I guess isn't really important.

Another way to attack the problem is by "tokenizing", i.e. replace your input (a sequence of characters) by a sequence of tokens, where a token would be a specific identifier, a keyword, an operator, white space, etc., whatever is relevant in the language you're dealing with. That may reduce the dimension of the problem by a factor of 3 to 5 I'd say.

If you're not inclined to create a full tokenizer (which is much simpler than a full parser), you could still look for popular strings (say all the keywords) and replace each of them by a different Unicode character that is not supposed to be present otherwise.

[ADDED]And then, maybe you don't really want Levinshtein at all, any good measure of difference could be sufficient. How about this stochastic one:
int distance=0;
for(a fixed number of iterations, say 300) {
    from the shortest string select a random substring with length 1%
    search for it in the longer string;
    if (no hit) {
        distance+=BIG_NUMBER;
    } else {
        if (multiple hits) find the one that is closest to the original, position wise (and relative, work in %)
        distance+=square of distance of positions in smaller and larger string
    }
}

This may take more cycles but it won't cost you a lot of memory!
[/ADDED]

Smile | :)
Luc Pattyn [My Articles] Nil Volentibus Arduum
iSad


modified 18-Oct-11 22:02pm.

AnswerRe: Levenshtein Distance Pin
Luc Pattyn18-Oct-11 17:23
sitebuilderLuc Pattyn18-Oct-11 17:23 
GeneralRe: Levenshtein Distance Pin
PIEBALDconsult18-Oct-11 17:54
mvePIEBALDconsult18-Oct-11 17:54 
AnswerRe: Levenshtein Distance Pin
Luc Pattyn19-Oct-11 0:31
sitebuilderLuc Pattyn19-Oct-11 0:31 
GeneralRe: Levenshtein Distance Pin
PIEBALDconsult19-Oct-11 2:40
mvePIEBALDconsult19-Oct-11 2:40 
AnswerRe: Levenshtein Distance Pin
Luc Pattyn19-Oct-11 3:01
sitebuilderLuc Pattyn19-Oct-11 3:01 
GeneralRe: Levenshtein Distance Pin
PIEBALDconsult19-Oct-11 3:16
mvePIEBALDconsult19-Oct-11 3:16 
AnswerRe: Levenshtein Distance Pin
Luc Pattyn19-Oct-11 3:51
sitebuilderLuc Pattyn19-Oct-11 3:51 
AnswerRe: Levenshtein Distance Pin
TheGreatAndPowerfulOz19-Oct-11 4:58
TheGreatAndPowerfulOz19-Oct-11 4:58 
GeneralRe: Levenshtein Distance Pin
PIEBALDconsult19-Oct-11 6:24
mvePIEBALDconsult19-Oct-11 6:24 
AnswerRe: Levenshtein Distance Pin
YvesDaoust19-Oct-11 3:31
YvesDaoust19-Oct-11 3:31 
AnswerRe: Levenshtein Distance Pin
Luc Pattyn19-Oct-11 3:49
sitebuilderLuc Pattyn19-Oct-11 3:49 
GeneralRe: Levenshtein Distance Pin
YvesDaoust19-Oct-11 4:06
YvesDaoust19-Oct-11 4:06 
GeneralRe: Levenshtein Distance Pin
Luc Pattyn19-Oct-11 4:16
sitebuilderLuc Pattyn19-Oct-11 4:16 
GeneralRe: Levenshtein Distance Pin
YvesDaoust19-Oct-11 4:27
YvesDaoust19-Oct-11 4:27 
GeneralRe: Levenshtein Distance Pin
PIEBALDconsult19-Oct-11 6:26
mvePIEBALDconsult19-Oct-11 6:26 
GeneralRe: Levenshtein Distance Pin
PIEBALDconsult19-Oct-11 6:24
mvePIEBALDconsult19-Oct-11 6:24 
GeneralRe: Levenshtein Distance Pin
PIEBALDconsult19-Oct-11 6:28
mvePIEBALDconsult19-Oct-11 6:28 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.