|
Yes, that's my point. Potential plates (at least in this system) are uniformly distributed, so you can't use the clustering to your advantage. For words, there are (say) 100,000 starting with 'exa' but 0 starting with 'ejf'. So your top level of tree, if you make it 3 characters (max 26³ entries), can cut out whole sections of the theoretical word space. That is because of the clustering of words within the search space which is predictable and non-random.
|
|
|
|
|
I think it was you who misunderstood me.
The whole hypothetical situation of storing a buttload of data in memory is fine, but it's not because "no database is available".
SledgeHammer01 wrote: Would you store a complete list of English words in a database
Yes, I have, for checking Scrabble words. In fact I used Access to do it.
For an actual spell-checker I would use a spell check tree, but I still need to persist the list between runs -- ergo, a database, which can then be loaded (perhaps on demand) the next time I run it. (The tree could be more of a cache.)
On a mobile, I expect a Web Service would be more appropriate.
SledgeHammer01 wrote: the packed bit array is the most scalable, compact & fastest runtime.
And it's even better if you can leave it on disk rather than hold it in memory
.
|
|
|
|
|
PIEBALDconsult wrote: it's not because "no database is available".
No, it's because that's what you were asked in the interview . If you told your interviewer that you refused to answer the question aside from using a database and you would refuse to implement any solution that did not use one, I'm guessing you probably wouldn't get the job .
Anyways, while its not likely the majority of the time NOW, there are plenty of scenarios where a database might not be available: mobile, embedded, military, etc. Now sure, I could add a requirement to the Air Force Smart Bomb design where I need to install an Access database on every bomb, but seeing as I'm just going to blow it up, I'm guessing they want to make it as cheap as possible.
Now, before you go off all half-cocked, foaming at the mouth about why a smart bomb would have a spell checker, it wouldn't. It was a hypothetical example of a situation where you MIGHT not have one.
I have an iPhone now, but I used to have a Moto Razr. I'm guessing the iPhone has a database and the Moto Razr didn't. Hmm... weird... the Moto Razr had EXTREMELY limited resources, but it still happened to have a full spell checker in it.
EDIT: the Moto Razr had 5MB of available space. That's for the OS, phone book, pics, videos, games, email, etc. STILL managed to squeeze a spell checker in there... bet there wasn't room for an access database or a database of any variety . Bet they had to use some really efficient data structures.
|
|
|
|
|
V. wrote: Why not just the ones that are available eg?
That doesn't alter the essential question (stupid question it may be) -- creating an in-memory structure to hold a whole lot of data.
|
|
|
|
|
PIEBALDconsult wrote: stupid question it may be) -- creating an in-memory structure to hold a whole
lot of data.
Yeah, true. Cuz you NEVER need to do that . Maybe not that AMOUNT of data, but...
|
|
|
|
|
Such a vague scenario is pretty much impossible to answer, because how you'd go about it depends on where the data has patterns. For example, if IDs are allocated in blocks, use a range based structure. If it will always be very sparse, store only the 'hits'. If the search space is likely to be over 50% full, you might want to use an inverse (i.e. store the empties).
License plates are a clearly unrealistic scenario because the number you need to store will only be of the order of 100m at most and therefore you can just use a conventional database (even at 1kb per record you're still only talking 100GB which is within the capacity of a normal computer, never mind an institutional server).
9GB is not big enough to be a problem, as long as you don't hook it all into memory, so I'd just dump the whole bit array to a file, in sequence, and use seek to look in the right place (and poke a bit back when a plate is allocated), if a database was disallowed and there was no aggregation or querying requirement.
Edit: obviously, if I was writing this system for real, I'd just use a database, heh.
|
|
|
|
|
I'm confused: you say the specific question asked was specifically about checking available license plates, but, then, you say the question was about data structures, and databases are "irrelevant:" did the interviewer(s) tell you the question was about data structures, and exclude "databases," specifically, or not ?
Or are these conclusions that you inferred from the interviewer(s) question ?
imho any perceived ambiguity in the interviewer(s)' question "opens the door" for you to follow-through with clarifying questions that attempt to frame the question asked in terms of real-world scenarios: like, as mentioned in a comment above, that the largest possible number of license plates for any one real state (in the US) would be far less than 78 billion ... as in California's 37 million, of which we can safely assume that many do not have a driver's license.
My answer would have been to say straight out: that storing the total possible set of all possible variations of license plates was absurd, no matter what data structures would be used, and that the database should store only the plates already allocated, and use a function to generate new ones at random, or new ones that meet some predefined criteria like having the letters "IAMOK" somewhere in the sequence (similar to the custom tag services offered by many states for which a fee is charged).
If the interviewer(s) then altered the question, so it became, in general: how would you store very large amounts of data: I would have replied that to answer that question requires a consideration of the internal organization of the data, and its usage, and storage (centralized vs. distributed), and there is no "one-size-fits-all" answer, and that a good programmer becomes intimately familiar with the internal organization of the data, its real-world "incarnation," and how the data is used in the real world, and then selects data structures based on several criteria, including efficiency of storage, compatibility with existing hardware and network constraints, the speed of access that end-users will demand/require, etc.
And then, I'd probably walk over to the food-stamp office, to collect my monthly allocation, and think about the next job interview
best, Bill
"Our life is a faint tracing on the surface of mystery, like the idle, curved tunnels of leaf miners on the surface of a leaf. We must somehow take a wider view, look at the whole landscape, really see it, and describe what's going on here. Then we can at least wail the right question into the swaddling band of darkness, or, if it comes to that, choir the proper praise." Annie Dillard
|
|
|
|
|
BillWoodruff wrote: I'm confused: you say the specific question asked was specifically about
checking available license plates, but, then, you say the question was about
data structures, and databases are "irrelevant:" did the interviewer(s) tell you
the question was about data structures, and exclude "databases," specifically,
or not ?
The question was to determine how you think when large amounts of data are involved. Do you think you would not have to solve such problems if you were working at say Google? or Amazon?, etc.
Your response is actually pretty similiar to the majority I have gotten . "This problem is stupid", "there aren't that many license plates in the entire world", etc. rather then thinking of the problem in generic terms. For example.. oh hey, maybe this problem is a variation on the traveling salesman problem, or the backpack problem and I can apply that solution .
That was the point of the exercise , but everybody seems to be focusing on the license plate aspect of the question .
modified 3-Feb-12 12:31pm.
|
|
|
|
|
SledgeHammer01 wrote: The question was to determine how you think when large amounts of data are involved.
SledgeHammer01 wrote: everybody seems to be focusing on the license plate aspect of the question I think the very simple reason for the focus on the "license plate" issue is: that's what you told us the actual question was.
Now, if you had told us the question actually asked was:
"We want you to tell us the strategies you would employ for storing very large amounts of data, as Google, and Amazon, do: you might use as an example, to make this question more specific: the issue of a license-plate management system where all permutations of the sets ... blah and blah ... are ... blah ... blah ... blah ... and an immediate task is the allocation of an unused license number.
Then, I think you would have received a very different set of replies.
To that (hypothetical) question, I would have replied: "what is the correlation of adding an unused license number of a car from a set of all permutations, to adding a new user, book review, to Amazon, or to adding a new link-listing to Google: are you asking about the issue of generating unique identification indexes in master databases (GUIDS, or some hash) ?"
Note my assumption (which you have every right to be skeptical about) that: in technical interviews, where you know the people interviewing you are "smart," and informed about what they are asking you about: that they respect someone who responds to vague, overly-broad, questions, by pro-actively demanding they be clarified: in the process of which you reveal your awareness of the issues and factors that surround real-world implementations.
Also, I believe a "challenging interviewee" ... who shows intellectual balance in the act of challenging ... and demonstrates emotional equanimity in the art of doing so ... is impressive ... or at least would impress persons at companies I would wish to be employed by.
But, I would make these distinctions: HR "screening" interviews are inappropriate places to be challenging: they ask "mush," and should be given back whatever "mush" they want to hear. "Rubber-stamp" interviews with managers, when you've been offered the job based on the real technical interviews, are also scenarios where you just want to "go with the flow," and appear as a "good citizen" who looks forward to "learning the job."
"Float like a butterfly, sting like a bee." Muhammad Ali
Where'd I put those food stamps
best, Bill
"Our life is a faint tracing on the surface of mystery, like the idle, curved tunnels of leaf miners on the surface of a leaf. We must somehow take a wider view, look at the whole landscape, really see it, and describe what's going on here. Then we can at least wail the right question into the swaddling band of darkness, or, if it comes to that, choir the proper praise." Annie Dillard
|
|
|
|
|
BillWoodruff wrote: I think the very simple reason for the focus on the "license plate" issue is:
that's what you told us the actual question was.
That's my whole point. The question given was very specific. A senior / principle / lead software engineer who makes $120k to $150k / yr should not have to be told that he is allowed to generalize a problem or how to generalize it . I'm sorry if that comes out bad, its not intended as an attack on you or anybody else. Its just how they expect you to think. Think of the problem in generic terms and then, hey, maybe you can do something specific for this type of data that wouldn't work for other types of data, even in the same problem type.
BillWoodruff wrote: To that (hypothetical) question, I would have replied: "what is the correlation
of adding an unused license number of a car from a set of all permutations, to
adding a new user, book review, to Amazon, or to adding a new link-listing to
Google: are you asking about the issue of generating unique identification
indexes in master databases (GUIDS, or some hash) ?"
Again, you are focusing on specifics . None of that matters to the hypothetical question. It's just "given this data and the vast amount of it, how would you store / work with it?". Thats just a question to see if you come back and say "I can't, its too hard", or "I can do it, but my algorithm requires 10TB of hard drive space and 5 days of processing per request". Maybe Amazon and Google don't issue license plates today, but they are planning to in the future? It doesn't really matter, that wasn't the point of the question.
One responder came back and said he would just store a flat list of used license plates. Ok. Awesome. Lets say 50B of the 78B plates were taken. Thats 50B x 6 bytes = 279GB. Unless you maintain the list in sorted order, your search time to check a plate # could be O ( n ). 279GB and worst case O ( n ) for only 64% of the data. Through discussion, we have already found a solution for O ( 1 ) for ALL operations and 9GB worst case storage requirements for 100% of the data and actually, you would only need to read/write a SINGLE BYTE out of the 9GB file so your memory requirements are 0. What's the better solution?
BillWoodruff wrote: Note my assumption (which you have every right to be skeptical about) that: in
technical interviews, where you know the people interviewing you are "smart,"
and informed about what they are asking you about: that they respect someone who
responds to vague, overly-broad, questions, by pro-actively demanding they be
clarified: in the process of which you reveal your awareness of the issues and
factors that surround real-world implementations.
You are certainly allowed to ask questions. HOWEVER... are you a sr. / principle / lead software engineer in real life? I am, and I can tell you that if my boss came to me and asked me to implement feature X and I started asking him to clarify / solve the problem / lead me to water / etc. whatever you want to call it... he'd probably get annoyed and say "I'm paying you to solve the technical problems, thats your job, its not mine". That is actually true. As senior / lead software engineers, its our job to take a problem and make it work .
However, since you are skeptical about how Google interviews and probably think I made this whole scenario up ... I *DID* really have an interview there and one of the questions was "we have millions of servers in production running millions of queries & processes per second... lets say your piece was not working properly (memory leak was the exact scenario), how would you go about debugging it?"
Me: If I did not instantly know what the issue was, I would first try to reproduce it on my own 1 or 2 PC test environment
Him: Lets say it was not an issue that showed up on 1 or 2 PCs and you needed say 10,000 PCs to see the issue?
Me: I would put logging information in the suspect areas
Him: What would you log?
Me: memory allocations and deletions
Him: Ok, but remember, we are talking about 10,000 machines doing millions of allocations and deletions, do you really think you could make sense of a log file with billions of entries?
Me: good point (chuckle)
Me: Well, I've always been a big fan of what Microsoft did with MFC. They overloaded the new and delete operators and kept track of allocated memory in a linked list or whatever and when the process would exit dump out the list of leaked memory and its contents. That way it would solve the billion entry log file issue and you would only get a log file with the leaks and nothing else and your code wouldn't be cluttered with logging writes since it all happened automatically.
Him: Cool, that would work in theory, but remember, these are production servers and you are not allowed to deploy test code to production servers since a programming error could bring down the entire site.
Me < at this point, I started to get annoyed and headed down your road >... hmm... well, I'm assuming you have a smaller scale test lab or a procedure in place for testing scenarios like that?
Him: Yeah, we have a test lab with 1000 machines, but your leak or issue only shows up with 10,000+ machines
Me: And you don't have a procedure in place for testing large scale issues?
Him: Yeah, we do. Assume we don't and that you are responsible for coming up with one.
etc. and it went on and on like that.
|
|
|
|
|
Hi, thanks for taking the time to respond in such depth: I find your answers fascinating.
I think the "dis-connect" in our conversation is that I am assuming you are being interviewed by your technical peers, other programmers, for a position on their team; from what you have told me, now, I believe you are describing a situation where you are being interviewed by project managers as a technical lead, and you wouldn't have gotten to that interview without a very impressive resume. And I wonder if you would have gotten to that interview without first being interviewed by the programmers you would "lead."
But, the question you describe, does not seem to me the type of questions I'd expect project managers to ask. But, my experience is based on a reality twenty or more years ago, so it should be discounted heavily.
I was asked in a phone interview at Microsoft maybe ten years ago, for a technical documentation lead position in a certain area of .NET, by the manager of the group: "tell me about object-oriented programming ?"
When I responded: "that's such vast area we need to narrow the focus down to have any meaningful discussion ... I mean are we talking history ... Xerox Parc, SmallTalk, Eiffel, Yourdon, Gang of Four ... or are we talking OOP as implemented in .NET with an emulation of multiple inheritance via Interfaces, or are we talking theory of n-Tier development as found in modern "business applications" ?"
His response was to act as if my response was totally stupid But, I sensed he was bored anyway, and was only doing the interview as a favor to his boss who a friend of mine at MS had prevailed upon to grant me the interview: so I think it was "rigged game."
We speak of two very different scenarios ?
best, Bill
"Our life is a faint tracing on the surface of mystery, like the idle, curved tunnels of leaf miners on the surface of a leaf. We must somehow take a wider view, look at the whole landscape, really see it, and describe what's going on here. Then we can at least wail the right question into the swaddling band of darkness, or, if it comes to that, choir the proper praise." Annie Dillard
|
|
|
|
|
BillWoodruff wrote: I think the "dis-connect" in our conversation is that I am assuming you are
being interviewed by your technical peers, other programmers, for a position on
their team; from what you have told me, now, I believe you are describing a
situation where you are being interviewed by project managers as a technical
lead, and you wouldn't have gotten to that interview without a very impressive
resume. And I wonder if you would have gotten to that interview without first
being interviewed by the programmers you would "lead."
The Google interview process is 15-30 min HR prescreen, then a 1hr technical phone screen by phone w/ a random Sr. Engineer @ Google who is a "certified interviewer" (whatever that means) and then they fly you in for an all day interview barage. All with "certified interviewers" Sr. Engineers. All the interviews are technical and ask you data structure / algorithm type questions only. None of this "give me the difference between delete and delete[]" garbage, but questions like I have given you. Afterwards, the 8 people who have interviewed you submit a thumbs up or thumbs down. I don't know what % of thumbs up you need to get "voted in", but I did not get it
I have had similiar experiences at Amazon, etc.
That is why I asked CP if they had good data structures for dealing with large data sets .
Obviously in a real world solution, you would store license plate #, make, model, owner, etc. in a database, but its more a "how do you think?" type question. You don't get disqualified because you don't come up with the perfect solution, often in these interviews they will lead you to what they want to hear as I have shown you with my Google interview . Its a process designed to weed out guys who are close minded / can't or won't think outside the box.
Reason why is think about it... lots of people know C++ and/or C#. I know both languages, I'm assuming you are an expert at both as well. Meaning you know the syntax by heart, a lot of the APIs, pointers, classes, etc.
There is a big difference between the guy who only knows how to code vs. the guy who maybe isn't as strong as you in the coding, but is awesome at DESIGNING an algorithm and then handing it off to you or me to implement.
Google, Amazon, etc. want the type of guys who can design & invent.
|
|
|
|
|
Once again, your patient reply, is appreciated !
Actually, I don't know C++, and I consider myself only at "journeyman" status with C#, but I was once a guru-level PostScript programmer who ended up (where else), at Adobe.
It's funny: at Adobe the programmers who could really produce, on the application side (PhotoShop, Illustrator, Premiere, etc.) were often hard-core programmers who were self-taught, and the CS graduates who were "ace" at algorithms often turned out to be "dead wood."
But, then, there were "geniuses" like Mark Hamburg who transformed PhotoShop into its "post-Knoll" incarnation: I was once talking to Mark, and the topic of what he majored in college came up: he told me he majored in math because he had already read, and understood, all of Knuth's books in high-school, and felt CS had little to offer him. No boundaries in that man's mind ! He was later awarded the Gordon Moore prize by his Silicon Valley peers for his remarkable accomplishments.
The four-person team that created the Acrobat prototype from "zilch" (I was one), were all new hires from a company Adobe acquired, and none of us had formal CS background But, please, don't blame me for what Acrobat became Our "skunkworks" project, under the direct supervision of John Warnock, was under constant attack by other groups within Adobe who considered us, I guess, as "barbarians at the gates"
But, a key difference I saw in the programmers who were so productive on the application side was that they intimately understood the quirks and features of the hardware/OS combination (Mac or Windows) they worked on. On Windows: they understood COM; on the Mac the understood things like custom window-defs, and its intricate, arcane, convoluted system of interacting system rom calls.
And, the fresh CS graduates often had no clue to inner architecture and hardware of personal PC's. Why should they: back in those days they did their homework on terminals wired up to mainframes, usually DEC.
There are people who have strengths in both algorithms and coding, as well as great depth in specific platforms: I have great respect for them !
What I don't understand is how Google ever got Jon Skeet to jump ship from C# ! Skeet, to me, is the guru's guru in .NET and C#.
cheers, Bill
"Our life is a faint tracing on the surface of mystery, like the idle, curved tunnels of leaf miners on the surface of a leaf. We must somehow take a wider view, look at the whole landscape, really see it, and describe what's going on here. Then we can at least wail the right question into the swaddling band of darkness, or, if it comes to that, choir the proper praise." Annie Dillard
|
|
|
|
|
Well, I don't really put much weight into having a formal CS degree. I have one, but I didn't learn how to program in college. I was self taught before that. I went to college right before the DOT COM bubble, so the school didn't really care about computers at the time. Everything was taught in VAX ADA (!) and there was only a 1 unit C++ course.
I worked with a self taught guy who never went to college, but could freestyle regex off the top of his head and get it perfect and could look at a disassembly of your application and figure out within a day or so how to crash it in various ways (via specially formatted user input type attacks I mean). I would hire him in a second to do security testing, etc. but I would never hire him to write an app because he could really care less about having a tool usable by other people besides himself.
I came to the conclusion a few days ago, that interview styles go through certain phases...
* in the early 2000's, Windows interviews were all about MFC & COM and asking you the stupid questions like "what class would you use for this?", "what class would you use for that?" They essentially expected you to memorize the MFC framework.
* in the early-mid 2000's, all the interviews were hitting heavy on design patterns. Singleton, class factory, etc.
* in the mid to late 2000's, they moved on to asking C# .NET questions and expected you to have a solid understanding of the entire framework
* 2009 - 2011, they wanted you to know C#, Winforms, ASP.NET, asp.net web services, JavaScript, html, etc. They wanted desktop + web hybrid guys
This time around, the trend I'm seeing is that very few places are doing desktop apps, so all the C# jobs are web and expect you to know the 15 to 20 web technologies at the expert level.
If you aren't a web guy, the only other jobs are server, mobile, linux type jobs. Those are all in C++. Since there is no .NET in the C++ world, they are asking these types of data structure questions again. Back when C++ was the main language, there wasn't any libraries like this and you had to invent everything yourself. MFC standardized a lot of the libraries.
Oh, I guess there is STL as well. They want you to know that now too. Lol...
|
|
|
|
|
Hello,
If i my application handles an event in the middle of other code running, does it return to previously running code after the event was handled? if not, how can i do it?
|
|
|
|
|
If you are writing a GUI application, the main GUI thread is event driven and will handle an event and execute all code for that handler before returning to the code that raised the event. If you are writing a multithreaded application, the background thread works independently of the main thread and would not be interrupted unless its listening to events as well.
|
|
|
|
|
AFAICT your suggestion has two major drawbacks, memory-wise:
1. it repeats the leading symbols over and over;
2. for each new chunk of license plates it costs a pointer.
I'd consider an N-tier approach; for N=2 that would be some data structure for the rightmost say 3 symbols, and then a second layer of structure to hold all of those, representing the remaining symbols.
The first one could be a run-length as you described, or a linked list, or switch from one to the other scheme depending on the population.
The second one could be a full array, or maybe a dictionary.
Maybe a 3-tier approach would be optimal: an array for 2 symbols (36^2 elements), pointing to an array for 2 symbols, pointing to an array for 2 symbols, holding a 36-bit bitmask for the last symbol. (Yes it would work better if there were only 32 symbols in the alphabet used).
Note: the "pointers" above don't have to be real pointers, they could be indexes into a huge array, requiring say 4B each rather than 8B.
And the simplest approach that might be good enough would be a Dictionary<string,long> where the key holds the first N-1 symbols, and the value is a bitmask covering the last symbol.
PS: this is what I was going to reply to your vanishing question in Algos
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
Lol... saw that forum was dead, so I moved it over here .
|
|
|
|
|
That is exactly how forums die.
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
Luc Pattyn wrote: That is exactly how forums die. "Not with a ! (bang), but by forum transmigration," T.S. Elliot did not write in "Hollow Men."
"Our life is a faint tracing on the surface of mystery, like the idle, curved tunnels of leaf miners on the surface of a leaf. We must somehow take a wider view, look at the whole landscape, really see it, and describe what's going on here. Then we can at least wail the right question into the swaddling band of darkness, or, if it comes to that, choir the proper praise." Annie Dillard
|
|
|
|
|
1. your threads do whatever you make them do, and will continue to do so.
And events get handled by either the main aka GUI thread, or one of the threadpool threads. See here[^].
2. If you have long winding code that executes on the main thread (e.g. in a button click handler), then other GUI events (and Windows.Forms.Timer tick events, and more) will be stalled. That basically is why long winding operations should NOT be executed by the main thread.
3. If other threads need to access WinForm GUI components (Controls), you need to use Invoke. See here[^].
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
Beware of Application.DoEvents() ! Using such a terrible line of code can cause chaos, e.g. a second click on the same button can "overtake" the first click somewhere in the middle of execution.
While you can avoid those calls in your code, you'll never be safe when you have to use ThirdParty (often closed-source) code.
|
|
|
|
|
ok.
can i raise event in a thread (Task) and handle this event in the main thread? so task will keep running when event is handled?
|
|
|
|
|
Not exactly (event handlers are always called in the same thread that called them) but if you're talking about a GUI application you can have event handlers that marshal the call to the UI thread if they're called on a different one:
void MyEventHandler(object sender, EventArgs ea){
if(InvokeRequired) {
BeginInvoke(new EventHandler(MyEventHandler), new object[] { sender, ea });
return;
}
}
Using BeginInvoke (instead of Invoke) causes the call to be asynchronous, i.e. the calling thread continues executing while the event handler runs. This is what you requested, but be aware of synchronisation issues with any form of parallel execution. Because you usually don't care about the result of a delegate call, you can ignore the return value and not bother calling EndInvoke at any point.
|
|
|
|
|
OK.
So what will happen if i call an event that is handled in the main thread (not UI thread)?
|
|
|
|
|