|
Ironically, googling that brings it up for you haha
|
|
|
|
|
Ummm... what?
Apparently there is an Easter Egg for it, but I don't see how that qualifies as "ironic".
modified 29-Dec-14 16:49pm.
|
|
|
|
|
Hint: Start by writing a tokenizer.
Advanced: Write / use a multi-lingual tokenizer like the ones used in enterprise search solutions.
Extra points: Make a performance comparison running the tokenizer in expansion and reduction mode.
Cheers!
"I had the right to remain silent, but I didn't have the ability!"
Ron White, Comedian
|
|
|
|
|
What's the tokenizer for?
|
|
|
|
|
Hmm, how do I phrase this?
A tokenizer creates tokens.
Cheers!
"I had the right to remain silent, but I didn't have the ability!"
Ron White, Comedian
|
|
|
|
|
Of course it does, but what is its purpose in the problem?
|
|
|
|
|
Procrastination!
You move the problem of finding the largest common substring in two strings to finding the largest common sequence of tokens in two token streams.
You know that it ruins a joke if you have to explain it, don't you?
"I had the right to remain silent, but I didn't have the ability!"
Ron White, Comedian
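For anyone who wants the non-joke version, here is a minimal Python sketch of moving the problem from characters to tokens. The `tokenize` helper and its regex are assumptions made for illustration (there is nothing multilingual about it), and the search is a plain dynamic-programming pass, not a method anyone in this thread claimed to use.

```python
import re

def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())

def longest_common_run(a, b):
    """Longest contiguous run of equal items shared by sequences a and b.

    Classic dynamic programming, O(len(a) * len(b)) time, two rows of memory.
    Works on token lists and on plain strings alike.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of items.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]
```

Passing two strings instead of two token lists treats every character as a token, which gives the ordinary longest common substring.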
|
|
|
|
|
Right, the tokens are just characters, so that's done.
|
|
|
|
|
The fastest Levenshtein distance algorithm I've tried converted strings to integer arrays first.
Wrong is evil and must be defeated. - Jeff Ello
(√-sh*t)²
|
|
|
|
|
Might be sacrificing space for speed?
What size character and integer? Did it convert individual 8-bit characters to 32-bit integers? Or put four characters in each integer?
|
|
|
|
|
PIEBALDconsult wrote: Might be sacrificing space for speed?
Yes, but not badly so.
PIEBALDconsult wrote: What size character and integer?
UTF16 (windows internal) and Int32.
The point is simply that integer math is faster than string comparisons.
Wrong is evil and must be defeated. - Jeff Ello
(√-sh*t)²
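A rough Python sketch of that idea: convert both strings to integer arrays once, then run the usual two-row Levenshtein recurrence over the integers. This is not the implementation referred to above. It uses Unicode code points rather than UTF-16 code units, and whether the conversion actually buys any speed depends on the language and runtime.

```python
def levenshtein(a, b):
    # Convert once to integer arrays so the inner loop only compares ints.
    # Assumption: code points via ord(), not UTF-16 code units.
    x = [ord(c) for c in a]
    y = [ord(c) for c in b]
    if len(x) < len(y):
        x, y = y, x  # keep the DP row as short as possible

    # prev[j] is the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, start=1):
        cur = [i]
        for j, yj in enumerate(y, start=1):
            cost = 0 if xi == yj else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]
```

Only two rows are kept at a time, so the extra space is the two integer arrays plus a row the length of the shorter string.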
|
|
|
|
|
Jörgen Andersson wrote: string comparisons
Sure, but you don't do any actual string comparisons, only character comparisons, which are very similar to integer comparisons in most respects.
|
|
|
|
|
To create tokens for video game play. Like the old arcades.
|
|
|
|
|
I'm going to help Santa out with his luggage that is left over.
The sh*t I complain about
It's like there ain't a cloud in the sky and it's raining out - Eminem
~! Firewall !~
|
|
|
|
|
Jörgen Andersson wrote: To find the longest common substring between two strings.
Good times, you can do it in O(n+m). Suffix trees are pretty cool.
Jörgen Andersson wrote: Assume that both strings are in the same language, etc
Why? None of this matters. It's a problem on strings, what they mean is of no consequence.
"tag" is a substring of both "let's play laser tag next weekend" and of "Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn".
|
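A linear-time approach can be sketched in Python with a suffix automaton, which is swapped in here for the suffix tree mentioned above only because it is shorter to write down. The function names are made up for this example, and the roughly O(n+m) behaviour assumes a bounded alphabet and constant-time dict lookups.

```python
def build_suffix_automaton(s):
    # Parallel arrays per state: outgoing transitions, suffix link,
    # and the length of the longest substring that reaches the state.
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for ch in s:
        cur = len(nxt)
        nxt.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q by cloning it with a shorter length.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return nxt, link, length

def longest_common_substring(s, t):
    nxt, link, length = build_suffix_automaton(s)
    state, cur_len = 0, 0
    best_len, best_end = 0, 0
    for i, ch in enumerate(t):
        # Fall back along suffix links until ch can be matched.
        while state != 0 and ch not in nxt[state]:
            state = link[state]
            cur_len = length[state]
        if ch in nxt[state]:
            state = nxt[state][ch]
            cur_len += 1
        else:
            cur_len = 0  # no substring of s ends with ch here
        if cur_len > best_len:
            best_len, best_end = cur_len, i + 1
    return t[best_end - best_len:best_end]
```

Nothing in it looks at what the characters mean. Run on the laser tag sentence and the Cthulhu chant, it returns "tag".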
|
|
|
|
harold aptroot wrote: Why? None of this matters. It's a problem on strings, what they mean is of no consequence.
Depends on how you attack the problem. I'm not limiting you to existing solutions you find on Wikipedia.
Or maybe I'm just confusing you.
Wrong is evil and must be defeated. - Jeff Ello
(√-sh*t)²
|
|
|
|
|
This question was asked in the lounge in the last couple of months... well, at least I'm sure it was.
|
|
|
|
|
Link?
Wrong is evil and must be defeated. - Jeff Ello
(√-sh*t)²
|
|
|
|
|
|
Stealing that one.
Wrong is evil and must be defeated. - Jeff Ello
(√-sh*t)²
|
|
|
|
|
No need to, it is all over the internet, it was one of the 'big things' that were discussed earlier in the year.
Just go to Google and search 'shruggie'.
I have it as an alternate email signature at work in Outlook, and if someone asks a daft question I just reply with a blank email and change the signature! Comes in handy.
|
|
|
|
|
Well, haven't seen it before and I can see the use.
Wrong is evil and must be defeated. - Jeff Ello
(√-sh*t)²
|
|
|
|
|
|
|
Search what?
|
|
|
|