|
If the set is fully populated, then it simply deserves one bit per plate, obviously; and then it doesn't need any intelligence. It is when the set is sparse that some intelligence can improve the situation. You don't want 78B license plates, do you? (that would be 10 per human being) So don't judge the quality, efficiency, or whatever characteristic for a sparse solution by feeding it the numbers of a full set.
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
Oh right... I missed a minor part of your solution... you are only storing the prefixes in use . Just places like Google, when they interview you, the want solution that work at all levels .
|
|
|
|
|
Again, nothing to do with license plates, just how you handle vast amounts of data (think Google or something along that scale).
With your solution, you are using the first 6 chars as the "prefix" / key and then "compressing" the last char... (36 plates into one entry).
So wouldn't you need the FULLY populated dictionary if say, every 36th plate was taken .
So your storage requirement is 20GB between 2B plates and 78B plates . 2B in 20GB vs. 78B in 9GB with the bit array.
Sorry, hope I'm not getting you angry or anything , just this is how the interview went, so its a real world thing. They kept poking holes in everything haha...
But unless I misunderstood your algorithm, it does seem like you need the full 20GB as soon as you hit 2B plates (assuming 1 in every range of course)... if you had them in tightly packed groups, then your requirements wouldn't be as great.
I even told the interviewer once I started getting annoyed at his hole poking "Well, at that point I would probably change the license plate # selector algorithm to hand out #'s from a more tightly packed range". Lol.
|
|
|
|
|
Again, I offered 3 ideas. Your current concerns get handled by the first one. I gave a detailed implementation for the third and simplest approach, I am not going to do the others as well.
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
If the data is going to be the worst case for whatever datastructure you pick, I pick the packed bit array. Information theory gets in the way otherwise. Any datastructure that can be smaller than the packed bit array can only do so because it's using some sort of regularity in the data - if there isn't one then there are 2^(36^7) possible states requiring 36^7 bits to store and that's the end of it.
|
|
|
|
|
right.
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
Yeah, I dunno what structure he was expecting. My guess is he was just trying to get me to punch him in his face or something. Who knows? The packed bit array is best in the worst case obviously, but it's not good in the best case or an even remotely "average" case. 1 license plate should not take up 9GB of memory . I think he wanted something along the lines of a DAWG. If Office can store the entire English dictionary in 4MB, then surely this problem can be solved in less space .
|
|
|
|
|
Was googling how spell checkers and dictionaries store words. Seems like most use a DAWG.
http://en.wikipedia.org/wiki/Directed_acyclic_word_graph[^]
I guess if I stored each license plate in the DAWG and did a "spell check" on it... it would work, although I'm not sure how well that'll scale.
Says a DAWG is most space efficient, so maybe thats the answer.
EDIT: saw a sample on the net where the guy said a dictionary was 17MB and the DAWG version was only 4MB. He didn't say how many words, etc.
|
|
|
|
|
|
Thanks for the suggestion.
Does it work in such a way that Chris understands when and how texts gets pasted in forum messages, resulting in CP article links being turned into article titles, other links being linkified, and pasted code being formatted with PRE tags?
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
my previous reply was using Dragon and CP auto-magically linked the URLs and created tags.
|
|
|
|
|
Thanks. I'll give it a spin.
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
|
|
|
SledgeHammer01 wrote: I had one large company disqualify me because they assumed I couldn't write
socket code because I didn't memorize the 7 layer OSI model
lol.
I had one interviewer who got visibly upset with me after I said I was capable of writing database code but I couldn't explain the 5 rules of normalization.
Might note that I still can't. But I do know that 2 of them are absolutely worthless for practical programming.
|
|
|
|
|
jschell wrote: I had one interviewer who got visibly upset with me after I said I was capable of writing database code but I couldn't explain the 5 rules of normalization.
SEVEN!
There are seven levels of normalization as defined, with 6NF being the top. None of the companies that I worked for went beyond BNF. Now, would that moron be able to explain why he'd need to normalize to the fifth level, or was it merely random?
Bastard Programmer from Hell
|
|
|
|
|
You know, I couldn't name a single one of these rules, but I went and looked up the poster. OMG!!! Instead of having duplicate data break it out into a separate table and create a FK index into it!! OMG... you are so high brow!! (not directed at you lol, but the people who care about knowing the 'book term' I mean) . Ok, I've been designing my tables like this for 16 yrs, but seriously had ZERO clue that this is what it was called. Ok... guess I suck at SQL now too .
Many years ago, I went to an interview at AutoByTel when I was first starting out in C# and the guy asked me if I knew what boxing & unboxing was. I had no clue. I went home and looked it up and was like OMG!!!! its passing around something as an object!!! OMG!!! I don't know C# at all. I have never done anything so complex!!! That guy must have 7 phd's in C#!!! He is brilliant!!
|
|
|
|
|
I once got ask why a variable would be scoped with friend (back in my VB days) I had no idea, 3 hours later I certainly did but the opportunity was long gone!
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
SledgeHammer01 wrote: Pretty much Think most "real" companies ask these retarded doomsday questions now.
That's a good thing; it's not like there's only one job available, and I almost always walk away smiling
SledgeHammer01 wrote: I had one large company disqualify me because they assumed I couldn't write socket code because I didn't memorize the 7 layer OSI model
You need a thief to catch a thief, and a programmer to identify a software-developer. If they're looking for someone who can memorize well, they are obviously not in the position to pick a decent developer.
Sounds like a bureaucracy, and you can't put a worker between people who merely shove with paper and responsibilities.
Bastard Programmer from Hell
|
|
|
|
|
SledgeHammer01 wrote: so stuff like databases, etc. are irrelevant...Say you are working for the DMV
First thought - the requirements are broken, because of course a database is the solution.
Absolutely no one is going to just store the plate number. And of course no one would save the entire range.
So one is left with the silly task of creating a database from scratch.
Databases are based on indexes and pages. Presuming all they care about is the indexed plate number you just create a sparse b-tree index. First block in the file is the first letter followed by an page index (or zero for no use) to the next page and next letter. Each page in this structure has a fixed size with a pointer to the real record.
36 * 8 bytes
For optimization reasons you can increase node size to 2 characters, which is more likely to be reasonable with native storage device page size.
SledgeHammer01 wrote: ...because it didn't scale to handle the worst case or even beyond 200M ranges.
And *again* broken requirements. There are about 250 million vehicles in the United states. License plates are issued by state and other entities. So a valid bid for a system at the current time would never need to manage 200 million.
|
|
|
|
|
The point was to deal with a large amount of data, not license plates specifically .
|
|
|
|
|
|
Collin Jasnoch wrote: I have noticed a pattern with him missing the point on many posts You're just bitching b/c you got stuck in a troll thread in the Silverlight forum!
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
Collin Jasnoch wrote: I have noticed a pattern with him missing the point on many posts. Gets stuck on the hypothetical part.
When people decide to denigrate me I always appreciate it if they try to be a bit creative about it.
Just saying.
|
|
|
|