|
Face detection is quite a complex subject and requires a fair amount of research. However, there are some libraries available, which you can probably find via Google.
Unrequited desire is character building. OriginalGriff
I'm sitting here giving you a standing ovation - Len Goodman
|
Yes, face detection is difficult. I use several libraries too.
Everything will be happy!
|
I have been looking at Levenshtein Distance again, as part of a tool to display differences in database schema items (scripts of views and such).
I am hoping to find a more memory-efficient way of calculating the distance; I don't like having to allocate a huge two-dimensional array. The best I have come up with so far is to allocate only a 2-by-[length of the shorter string] array.
Has anyone here devised an implementation of Levenshtein Distance that requires less memory?
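For reference, this is roughly what my 2-by-(length of the shorter string) version looks like in C# (just a sketch, with names of my own choosing):

static int Levenshtein(string s, string t)
{
    if (s.Length < t.Length) { var u = s; s = t; t = u; }   // make t the shorter string
    int[] prev = new int[t.Length + 1];                     // previous row of the virtual matrix
    int[] curr = new int[t.Length + 1];                     // current row
    for (int j = 0; j <= t.Length; j++) prev[j] = j;        // row 0: distance from the empty prefix
    for (int i = 1; i <= s.Length; i++)
    {
        curr[0] = i;                                        // column 0: i deletions
        for (int j = 1; j <= t.Length; j++)
        {
            int cost = s[i - 1] == t[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1,    // insertion
                                        prev[j] + 1),       // deletion
                               prev[j - 1] + cost);         // substitution (or match)
        }
        var swap = prev; prev = curr; curr = swap;          // recycle the two rows
    }
    return prev[t.Length];
}

Each cell only ever looks at the previous row and the current row, which is why two rows suffice.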
|
You could try the idea of sparse arrays. A sparse array is an array that has a default value for a given element unless it's been set. The setting of the element actually allocates space for that "slot".
So, the underlying implementation of the "array" may be a dictionary or hashtable where the index is the key to the element. If the element doesn't "exist" then the default value is returned.
Capiche?
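In C# terms, a minimal sketch might look like this (the class shape is just one way to do it):

using System.Collections.Generic;

class SparseArray<T>
{
    private readonly Dictionary<long, T> cells = new Dictionary<long, T>();
    private readonly T defaultValue;
    public SparseArray(T defaultValue) { this.defaultValue = defaultValue; }
    public T this[long index]
    {
        get { T v; return cells.TryGetValue(index, out v) ? v : defaultValue; }
        set { cells[index] = value; }   // space is only allocated when a slot is assigned
    }
}

For a 2D matrix you could key on row * width + column, so untouched cells cost no storage at all.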
"If your actions inspire others to dream more, learn more, do more and become more, you are a leader." - John Quincy Adams "You must accept one of two basic premises: Either we are alone in the universe, or we are not alone in the universe. And either way, the implications are staggering." - Wernher von Braun
|
I don't think that would help, because the basic Levenshtein Distance algorithm fills the entire array.
|
For similar strings, the relevant data in the 2D matrix is concentrated around the main diagonal (although the naive implementation of the algorithm, such as the one shown on Wikipedia, needs all the matrix elements to come up with the distance). So what you could do is:
1. use a sparse matrix, as ahmed already suggested; [ADDED] maybe in the top triangle you don't need to store the elements that equal their top neighbour plus one, and in the bottom triangle those that equal their left neighbour plus one. However, this is likely to fail dramatically if the strings start out with a couple of differences...[/ADDED]
2. or use a band centered around the main diagonal, making the memory cost linear rather than quadratic (unless you want the band's thickness to be a constant fraction of the problem size); there is a sketch of this below;
3. or develop something similar to the sliding window approach I applied here: LPTextFileDiff: another textfile compare utility.[^]
For #2 and #3 you would need:
- to estimate the first row and column values from what you already have, which could be tricky and somewhat unreliable;
- to detect pathological cases.
You would not get the exact distance for very distant strings, which I guess isn't really important.
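Here is a rough C# sketch of option #2; the band half-width parameter and the INF sentinel are my own choices, and cells outside the band are simply treated as unreachable:

static int BandedLevenshtein(string s, string t, int band)
{
    const int INF = int.MaxValue / 2;                       // sentinel for cells outside the band
    int[] prev = new int[t.Length + 1];
    int[] curr = new int[t.Length + 1];
    for (int j = 0; j <= t.Length; j++) prev[j] = j <= band ? j : INF;
    for (int i = 1; i <= s.Length; i++)
    {
        int lo = Math.Max(1, i - band), hi = Math.Min(t.Length, i + band);
        curr[0] = i <= band ? i : INF;
        if (lo > 1) curr[lo - 1] = INF;                     // left edge of the band
        for (int j = lo; j <= hi; j++)
        {
            int cost = s[i - 1] == t[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        }
        if (hi < t.Length) curr[hi + 1] = INF;              // right edge of the band
        var swap = prev; prev = curr; curr = swap;
    }
    return prev[t.Length];                                  // exact only if the distance fits in the band
}

The result is exact only when the true distance fits within the band; for very different strings it saturates, which is the pathological case to detect.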
Another way to attack the problem is by "tokenizing", i.e. replacing your input (a sequence of characters) with a sequence of tokens, where a token would be a specific identifier, a keyword, an operator, white space, etc., whatever is relevant in the language you're dealing with. That may reduce the dimension of the problem by a factor of 3 to 5, I'd say.
If you're not inclined to create a full tokenizer (which is much simpler than a full parser), you could still look for popular strings (say all the keywords) and replace each of them by a different Unicode character that is not supposed to be present otherwise.
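A crude version of that keyword trick might be (a sketch; the private-use character range is an arbitrary choice, assumed absent from real input):

static string Tokenize(string text, string[] keywords)
{
    // map each keyword to a private-use Unicode character (U+E000 and up)
    for (int k = 0; k < keywords.Length; k++)
        text = text.Replace(keywords[k], ((char)(0xE000 + k)).ToString());
    return text;
}

Run both strings through the same replacement before computing the distance.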
[ADDED]And then, maybe you don't really want Levenshtein at all; any good measure of difference could be sufficient. How about this stochastic one (a C# sketch; BIG_NUMBER and the iteration count are arbitrary):
static int StochasticDistance(string s, string t)
{
    const int BIG_NUMBER = 1000000;                        // penalty when a sample has no hit at all
    if (s.Length > t.Length) { var u = s; s = t; t = u; }  // make s the shorter string
    int len = Math.Max(1, s.Length / 100);                 // sample length: 1% of the shorter string
    var rng = new Random();
    int distance = 0;
    for (int i = 0; i < 300; i++)                          // a fixed number of iterations, say 300
    {
        int pos = rng.Next(s.Length - len + 1);            // select a random substring
        string sub = s.Substring(pos, len);
        double pShort = 100.0 * pos / s.Length;            // relative position, in %
        double best = double.MaxValue;
        for (int h = t.IndexOf(sub); h >= 0; h = t.IndexOf(sub, h + 1))
        {                                                  // multiple hits: keep the closest, position-wise
            double d = Math.Abs(100.0 * h / t.Length - pShort);
            if (d < best) best = d;
        }
        distance += best == double.MaxValue ? BIG_NUMBER : (int)(best * best);
    }
    return distance;
}
This may take more cycles but it won't cost you a lot of memory!
[/ADDED]
modified 18-Oct-11 22:02pm.
|
I gave it another pass, and discovered that Wikipedia[^] holds the obvious optimization: "We can adapt the algorithm to use less space, O(min(n,m)) instead of O(mn), since it only requires that the previous row and current row be stored at any one time." (I had come to a similar conclusion, holding two lines perpendicular to the major diagonal.)
And you would only need one sweep, as all you want is the distance, not the way to get there with insert/delete/replace operations.
|
I don't know... it sounds similar to what I'm doing, which only reduces the memory footprint, not the complexity of the algorithm. I expect it's still O(mn), but I'll give it another read.
|
Of course the calculations are O(mn); each matrix element needs some (simple) calculations. If you don't like that (your original question was about memory footprint, not calculations), you shouldn't be using Levenshtein at all; I've never used it myself, as I consider it too much trouble to get an estimated difference between strings. That is why I suggested some alternatives. Tokenizing would save both cycles and bytes; the stochastic approach probably only saves bytes, not cycles, as it needs a lot of string scanning.
|
I was merely stating that I don't think the page is correct when it says the stated algorithm is O(min(n,m)).
Anyway, I've had an idea of a different technique forming in my head, but before I try it I wanted to find out what other Code Projectors had used.
|
It is O(min(n,m)) in space, and that is what Wikipedia said. The computational effort remains quadratic no matter what.
|
I didn't realize that Big-O could be used for space.
|
You could use Big-O notation for any cost function you choose; time is obviously the most popular one.
|
PIEBALDconsult wrote: Code Projectors
LOL. I like that. I'm gonna have to remember that and use it.
"If your actions inspire others to dream more, learn more, do more and become more, you are a leader." - John Quincy Adams "You must accept one of two basic premises: Either we are alone in the universe, or we are not alone in the universe. And either way, the implications are staggering." - Wernher von Braun
|
The French Wikipedia article points to two references that should be of interest to you: http://fr.wikipedia.org/wiki/Distance_de_Levenshtein[^]. Check the "Notes et références" section.
These describe implementations that are faster than O(NM) when you bound the number of differences.
|
Great link. I seldom see a regional Wikipedia page offering more or better information than the English one does.
BTW: the work by Myers is what Nick Butler[^] started from when creating his entry for the lean-and-mean competition[^] two years ago. However IMO it is overkill for what PIEBALD is aiming at.
|
and a great algorithmician
Well, from the little info provided by PIEBALD, it is hard to figure out what he needs.
|
YvesDaoust wrote: a great algorithmician
YvesDaoust wrote: it is hard to figure out what he needs
he wants the codez.
|
Shouldn't be a big deal to find source code of a good diff utility.
|
I'm not looking for a diff utility; as stated in my post, I'm just looking for a more efficient distance calculation.
|
Hey! I'm standing right here, I can hear you!
|
I'll give it a look later. Merci.
|
You can also have a look at
"A faster algorithm computing string edit distances, William J. Masek, Michael S. Paterson, Journal of Computer and System Sciences, Volume 20, Issue 1, February 1980, Pages 18-31."
It has O(n^2/log(n)) behavior.
There are faster algorithms for approximate Levenshtein distance computation.
|