I prefer anything to be done at the insert stage. So, that's taken.
Storing null values depends on whether it is an object or a list. In the case of a class object, the constructor may demand all values, hence the null value. It can be skipped if possible.
Doesn't any implementation exist in this format? A very practical example would be Configuration Plans, which are time-bound. We may only want to save the planned changes, and not the state at every stage.
|
Som Shekhar wrote: Doesn't any implementation exist in this format?
I don't know of any in the Framework, but then again, I don't know the entire framework by heart yet.
Som Shekhar wrote: We may only want to save the planned changes and not at every stage.
Like I said, memory is cheap these days - and I doubt that you'll save a lot by inserting null values for a non-changed field. These kinds of constructs are very common in SQL Server, where one would use a query to recreate the full list. The query does the lookups (over a speedy index) and you can read it as if it were the original table.
..but in-memory? Short lists should just store the values in a redundant way, and long lists shouldn't be kept in memory at all.
I are Troll
|
Eddy Vluggen wrote: Like I said, memory is cheap these days
Memory is not the trouble. Speed is.
This is for a single such list. When this kind of list exists in the hundreds, and each has to go through a for loop with multiple calculations in between, the calculations land in the 0.2-0.3 sec range. That is not too much on its own, but with drag and drop, a 0.2-0.3 sec lag is not acceptable. I am just trying to get into the 0.05-0.06 sec range.
This kind of change cannot come by simply tweaking a line or two. It can only come by changing the whole method of searching records.
By the way, this whole thing is not even happening in the server/SQL. It's happening in memory... hence the speed trouble.
|
Som Shekhar wrote: By the way, this whole thing is not even happening in the server/SQL. It's happening in memory... hence the speed trouble.
The good news is that memory is usually faster to access than a server.
Som Shekhar wrote: each has to be going through a for loop with multiple calculations in between
The only option that I'm aware of is to precalculate the missing values. That way you only have a lookup in a list, which is quite fast. So, instead of inserting the nulls for the unchanged values, insert the value.
That would move the cost for inserting from the "read from the list"-part to the "add to the list"-part of the code.
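A minimal sketch of that trade-off, assuming three integer columns (the class name and layout are made up for illustration, not taken from the Framework):

```csharp
using System;
using System.Collections.Generic;

// Sketch: a row store that forward-fills nulls at insert time,
// so reads become a plain O(1) indexed lookup.
public class FilledRowList
{
    private readonly List<int[]> rows = new List<int[]>();
    private readonly int[] lastKnown;

    public FilledRowList(int columns)
    {
        lastKnown = new int[columns];
    }

    // null means "unchanged"; the stored row gets the last known value.
    public void Add(int?[] row)
    {
        var filled = new int[lastKnown.Length];
        for (int i = 0; i < lastKnown.Length; i++)
        {
            if (row[i].HasValue)
                lastKnown[i] = row[i].Value;
            filled[i] = lastKnown[i];
        }
        rows.Add(filled);
    }

    public int[] this[int index] => rows[index];  // cheap read, no scanning
}
```

Reads now cost a single list lookup; the forward-fill loop runs once per Add.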
I are Troll
|
If you read the modified question, I am suggesting keeping an index alongside, as a hint for where to start the lookup next time.
Saving values instead of nulls may cause problems later on, when more values are inserted or removed. The whole purpose of the list is to save the value only where it changed.
|
Som Shekhar wrote: Saving values instead of nulls may cause problems later on, when more values are inserted or removed. The whole purpose of the list is to save the value only where it changed.
It sounded like the main issue was to have a list with values that you can query at a high speed. If you save all the values, then you won't have the problems you mentioned when removing an item;
Col1 Col2 Col3
100  200  300
100  200  100
100  100  100

Remove the second line, and all data will still be correct. On the other hand, if you store the null values (to indicate a non-change), then you might run into trouble;

Col1 Col2 Col3
100  200  300
?    ?    100
?    100  ?

If you remove the second line now, you'll get this;

Col1 Col2 Col3
100  200  300
?    100  ?

Which decodes to this when you try to track back the changes;

Col1 Col2 Col3
100  200  300
100  100  300

As you can see, the last item has changed. I don't see the advantage of storing only the changes in this particular structure, only more challenges.
It seems that I don't understand the question well enough to provide you with an answer.
I are Troll
|
Hey there!!!
I really appreciate your efforts.
But the question is to know if I am missing something fundamental. Is there an implementation already present? Like we have Hashtables, Lists, and Dictionaries for various purposes; is there any other tool that I missed which can handle such a case?
Or maybe there is a need to develop such a list, one which records only the changes, does all the calculations internally, and is fast enough to match dictionary/indexed methods.
In any case, thanks once again.
|
Som Shekhar wrote: I really appreciate your efforts.
Nice to see someone who's biting into a subject, instead of just asking for code
Som Shekhar wrote: But the question is to know if I am missing something fundamental. Is there an implementation already present? Like we have Hashtables, Lists, and Dictionaries for various purposes; is there any other tool that I missed which can handle such a case?
Not that I know of. Yes, we've got generic lists that can take all kinds of data, and we've got observable lists that give you a notification if anything changes. But there is no list that's specialized in doing an incremental save.
I think that most of us would cache the result, storing redundant values. It's a waste of memory, I know, but we often make these kinds of trades. If you've got some spare CPU time, then it might make sense to add this optimization. You'd lose a bit of speed reconstructing the data at a particular index, but in return you'd have some extra memory.
The guys who work with Windows Mobile might have more experience with this, as they have fewer resources and actually need to think about using them effectively. On my desktop, I don't mind wasting a megabyte or so, if it means that I can spend my time on more critical issues.
Som Shekhar wrote: Or maybe there is a need to develop such a list, one which records only the changes, does all the calculations internally, and is fast enough to match dictionary/indexed methods.
Again, recalculating the data will (logically seen) cost more processor time than just reading it. Then again, the time that it takes might be negligible, and it may also be true that you win back a fair amount of memory. That would depend on the amount of data, and on the number of 'holes' that one has to move through to get the 'last known values' for the columns at that particular index.
At the start of this thread, I would have advised against it on the assumption that there's not much to gain. I'm not so sure anymore. The only way to get a definite answer is by building a prototype and measuring the results. Therein lies another consideration; would it be worth spending the time on building such a prototype?
I are Troll
|
Eddy Vluggen wrote: Nice to see someone who's biting into a subject, instead of just asking for code
Coding is easy. Concepts are difficult to grasp. If you know the direction, you can reach anywhere. If you only know the target, God save you.
Eddy Vluggen wrote: It's a waste of memory
Memory is not really an issue. I am building an application for bigger use, and hence am throwing all kinds of hardware resources at it. I can tell my clients to use better hardware. This means a good-speed CPU and a good amount of RAM. Hence I really don't mind 1-2 MB extra here.
I am already looping to create a lookup-ready dictionary. Hence that is already covered.
As I mentioned, the trouble comes when many of these calculations happen together. I am currently working on multi-threading the different instances, at least to save some more time.
Let me give you a link to another problem that I posted. You will see the use of such a datatype there.
http://www.codeproject.com/Messages/3304858/What-will-be-the-height-of-fluid-columns-in-a-vari.aspx
In this problem, a calculation of fluid height is needed. There are multiple fluid columns and many such tubes, with drag and drop functionality.
Usually, working with already-implemented concepts is better. Consider using a dictionary vs. a hand-implemented list with keys.
Eddy Vluggen wrote: would it be worth to spend the time on building such a prototype?
You would be surprised that I have come across such a situation more than 4-5 times already while designing my applications. I usually work on disconnected database systems, and speed is a primary concern in loading and saving data.
I initially worked with datatables, which worked fine when my application was young. As it grew older, datatables proved damn slow. I moved to dictionaries. So far, they are fine. Even today, I experience a max lag of 0.5-0.6 sec on a drag-drop operation, which isn't too much to worry about.
By multi-threading, I hope to reduce it to around 0.1-0.2 sec, which should be manageable. But it is good to keep up with the concepts.
Usually a parallel solution does wonders, and that's what I was hoping for here.
|
Som Shekhar wrote: Coding is easy.
I'm looking at a buglist right now which tells me that it's not as easy as English.
Som Shekhar wrote: As I mentioned, the trouble comes when many of these calculations happen together. I am currently working on multi-threading the different instances, at least to save some more time.
Have you seen the article[^] on the AForge.Parallel.For class? It might help in building a prototype to measure against.
Som Shekhar wrote: In this problem, calculation of fluid height is needed. There are multiple fluid columns and many such tubes. With drag and drop functionality
True, but it would also make an impressive interface
Som Shekhar wrote: Usually, working with already-implemented concepts is better. Consider using a dictionary vs. a hand-implemented list with keys.
I'd try to mirror the concept of a database in memory: creating a list of the records, and the equivalent of an index. IQueryable[^] springs to mind.
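As a sketch of that in-memory index idea (the record layout below is invented for illustration, with the first element of each record playing the role of the key):

```csharp
using System;
using System.Collections.Generic;

static class RecordIndex
{
    // Build the "index" once: key -> position in the record list.
    // After that, a lookup is a dictionary hit plus an array access,
    // instead of a linear scan over the records.
    public static Dictionary<int, int> Build(List<int[]> records)
    {
        var index = new Dictionary<int, int>();
        for (int pos = 0; pos < records.Count; pos++)
            index[records[pos][0]] = pos;
        return index;
    }
}
```

A lookup then reads as `records[index[key]]`, much like a database reading through its index.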
Som Shekhar wrote: You would be surprised that I have come across such a situation more than 4-5 times already while designing my applications. I usually work on disconnected database systems, and speed is a primary concern in loading and saving data.
I initially worked with datatables, which worked fine when my application was young. As it grew older, datatables proved damn slow. I moved to dictionaries. So far, they are fine. Even today, I experience a max lag of 0.5-0.6 sec on a drag-drop operation, which isn't too much to worry about.
This post[^] confirms that although databases manipulate data very fast, your results are faster.
Som Shekhar wrote: By multi-threading, I hope to reduce it to around 0.1-0.2 sec, which should be manageable. But it is good to keep up with the concepts.
Usually a parallel solution does wonders, and that's what I was hoping for here.
One could consider multiple ways to optimize, and I'm sure there would be some creative ways posted to do so. Using the Parallel.For class to look up all the elements could be a good start.
Another, perhaps better implementation yet, would be a readonly list, to describe a table like the one presented below. Instead of writing a null, you could launch a short-lived thread to calculate its distance to the youngest version in the list; that distance gives you the index of the value that it actually stands for. This should be done when you load the data; you'd spend a bit of time decoding it, but that also shortens the time needed to retrieve data from that list. This would be an optimization of the read process, as you can forget about fetching at all if the value is really there. Moving this particular task to the method that's doing the initialization, lookups would be faster. The initialization routine could also be (ab)used to dynamically enrich your data, if that were required.
You could then do parallel lookups, each lookup falling back on its PK - perhaps a HashTable&lt;key, record [as struct!]&gt;. You would then already be pointing at all the correct values, for all the correct columns, without having to worry about corruption. That's as long as the data is readonly and easily accessible by threads.
I'm off to bed; this kept going through my head all the time. I wonder if I'm now gonna dream about it?
I are Troll
|
Parallel.For looks promising. Will dig into it.
Currently, since I have already implemented multithreading, I guess there is no need to implement that for now.
Eddy Vluggen wrote: True, but it would also make an impressive interface
Oh, you bet. These days, looks may not be everything, but that is what sells first.
I guess that is it for now... I gotta be happy with multithreading, since no other implementation already exists in this area.
It was great having some meaningful conversations. 
|
Som Shekhar wrote: Oh, you bet. These days, looks may not be everything, but that is what sells first.
Sad, but true.
Som Shekhar wrote: I guess, that is it for now... I gotta be happy with multithreading for now.
You're dividing your workload over multiple CPUs; there's not much room for improvement there. If you get unhappy in the future, try Brahma[^]; that would give you the option to offload some work from the CPU to the GPU, abusing the graphics card.
Som Shekhar wrote: It was great having some meaningful conversations.
Yup, engaging in a conversation is simply more fun than posting an answer. Good luck with your venture.
I are Troll
|
Brahma looks interesting!!!
Now that we are talking of multi-threading, why is it that we need to code specifically for multi-threading?
If you look at any operating system that supports multiple processors, it automatically distributes work onto the different cores. Can't there be a framework which does the same without the need to code differently?
If a user sets a program's priority to "High" or "RealTime", it only increases the share of CPU time given to the current process. But there is no change in the threading...
Am I missing something?
|
Som Shekhar wrote: Am I missing something?
The perversion to launch a new thread from a Visual Basic 6 application, I hope
Som Shekhar wrote: If a user sets a program's priority to "High" or "RealTime", it only increases the share of CPU time given to the current process. But there is no change in the threading...
..and "realtime" isn't really realtime, but just the name of the highest level of priority.
Som Shekhar wrote: If you look at any operating system that supports multiple processors, it automatically distributes work onto the different cores.
Though it feels that way, it's an illusion. A program is made up of a logical set of commands/instructions that get executed one after another. That's reflected in our applications; we expect that the second instruction won't be executed before the first instruction. A short example;
10 A$ = "Hello"
20 B$ = "World"
30 PRINT A$ + " " + B$

These three lines of code should be considered atomic, meaning that you don't want to distribute them over 2 different people to interpret. This is a task that can't be divided. One processor had access to its own cache and its memory. Windows was created and started to fake multitasking. Applications would run (to line 20 in our example), get thrown into the deep-freeze, another application would be defrozen and run, ad infinitum. Do that very fast, and it seems to become a fluid movement.
Threads were already there; it was preferred to launch your own thread instead of spawning a new process if you needed to do some additional tasks. Using a thread would cost fewer resources, and it behaved like an additional process, owned by some other (main) thread. Fibers were introduced as well, but those never gained popularity.
You wanted this processing to happen in "some other place" than the thread that ran your interface. Every Windows application has a method called "WndProc", which Windows calls now and then to inform your application of mouse movements that have occurred, or that certain parts of the form need to be repainted. Let's extend our example application;
10 REM Example :)
20 REM
30 WndProc:
40 MSG = GWBASIC_INTEROP.GETMESSAGE()
50 IF (MSG = WM_QUIT) THEN
60 GOTO THE_END
70 END IF
80 IF (MSG = WM_PAINT) THEN
90 GOSUB SAY_HI
100 END IF
110 GOTO WndProc
120 SAY_HI:
130 FOR X = 1 TO 100
140 PRINT "Hello World, number " + X
150 NEXT X
160 RETURN
170 THE_END:
180

There's a loop that processes the messages, and there's code. This meant that if the processor was doing line 140, it would get frozen there in the middle of the job. This, as a consequence, means that the application wouldn't accept a "quit" message until it has finished doing those 100 iterations!
A thread gets frozen with its state; that's the reason why it's "illegal" to write into memory that another thread is working with, and the reason why the main thread of any application is reserved to handle the UI.
The Parallel.For is an abstraction that creates multiple threads (let's take 4 as an example) to run a loop. One of the prerequisites is that they shouldn't share variables that could mess up the way they work (because one has X=3, one has X=4, and two have X=5). They should also say hello to the main thread before changing any of its values. This model scales to multiple processors.
SQL Express is limited to using a single processor, whereas SQL Server goes as far as making the processor affinity a mere setting. Some applications still do their processing in the UI thread, easily recognizable by the white space that they show where a form should be. It's not a perfect situation, but it's often hard enough just to make an application run correctly with a single path of execution.
There is indeed a growing need for extra tools. The .NET Framework has a BackgroundWorker, which makes it easy to manage a new line of execution, and you'll often find an async version of a method call.
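The Parallel.For model described above can be sketched like this (using the .NET 4.0 System.Threading.Tasks version for brevity; the AForge class mentioned earlier has a similar shape):

```csharp
using System;
using System.Threading.Tasks;

static class ParallelSketch
{
    // Each iteration writes only its own slot of the output array,
    // so there is no shared mutable state between the threads and
    // the runtime is free to spread the iterations over the cores.
    public static double[] SqrtAll(double[] input)
    {
        var output = new double[input.Length];
        Parallel.For(0, input.Length, i =>
        {
            output[i] = Math.Sqrt(input[i]);
        });
        return output;
    }
}
```

The body must not depend on iteration order; that is exactly the "don't share variables" prerequisite from above.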
Som Shekhar wrote: Brahma looks interesting!!!
Sure does - there's a lot of GPU's onboard of the motherboards in the office without being used very much
I are Troll
|
Topic is long over, I am just loving the conversations
Eddy Vluggen wrote: and "realtime" isn't really realtime, but just the name of the highest level of priority
Yes, I know that... I guess you would know that Microsoft has said that one shouldn't use "RealTime" in their applications, as it will freeze the OS. I wonder if they created a check to freeze the system if someone used "RealTime", instead of letting the processor do the job.
Well, you are right about the process and threading. I agree fully when it comes to the concept of threading vs processes.
My question was a little different. I know that two processes are resource-heavy and that threads do the job quite well.
Let me try to suggest my concept here.
Let's say we've got two classes, "Car" and "Bike". Car has its own methods, and so does Bike. We create two objects, "Car1" and "Bike1". If these two are there in an application, all internal methods could be handled through a new thread, and thus they will always be thread-safe. Even two objects "Car1" and "Car2" will always be thread-safe.
Instead of the programmer creating such new threads and their completion events for each of the methods, the framework could automatically run them on new threads.
Is it that I have a plan for a new programming language? Am I talking weird?
|
Som Shekhar wrote: Topic is long over, I am just loving the conversations
Ditto, but we'd better move to the soapbox, or email
Som Shekhar wrote: Let me try to suggest my concept here.
Let's say we've got two classes, "Car" and "Bike". Car has its own methods, and so does Bike. We create two objects, "Car1" and "Bike1". If these two are there in an application, all internal methods could be handled through a new thread, and thus they will always be thread-safe. Even two objects "Car1" and "Car2" will always be thread-safe.
Instead of the programmer creating such new threads and their completion events for each of the methods, the framework could automatically run them on new threads.
Is it that I have a plan for a new programming language? Am I talking weird?
Not at all; it sounds like a convenient way to distribute the load. One way to do so would be by instantiating a BackgroundWorker and passing the Bike to the RunWorkerAsync[^] method. You'd only need locks where the objects need to share data.
Those objects still need to be 'invoked' from somewhere, and that somewhere is most likely going to be the main thread. It doesn't make much sense to create an async version of every method or class, since threads still cost performance. Creating a thread to change Form.Visible is going to be rather inefficient.
If the Bike is a webserver kind of class, then yes, it's the correct pattern. If the Bike is a DataGridViewColumn, then it might be wiser to keep the code rather short and simple anyway. If you need a long-running process happening in such a place, then move it to a BackgroundWorker and have it signal status.
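A console-sized sketch of that pattern (Bike and its Compute method are hypothetical stand-ins; in a real Windows Forms app the completion event would land back on the UI thread, and nothing would block):

```csharp
using System.ComponentModel;
using System.Threading;

// "Bike" is a made-up class with a long-running internal method.
class Bike
{
    public int Compute()
    {
        Thread.Sleep(50);  // pretend this is expensive
        return 42;
    }
}

static class WorkerSketch
{
    // Push the Bike's work onto a BackgroundWorker; the completion
    // event hands the result back to the caller.
    public static int RunOnWorker(Bike bike)
    {
        var done = new ManualResetEvent(false);
        int result = 0;
        var worker = new BackgroundWorker();
        worker.DoWork += (s, e) => e.Result = ((Bike)e.Argument).Compute();
        worker.RunWorkerCompleted += (s, e) =>
        {
            result = (int)e.Result;
            done.Set();
        };
        worker.RunWorkerAsync(bike);
        done.WaitOne();  // only for this console sketch; a UI app would not block
        return result;
    }
}
```

Locks only become necessary once two such workers start sharing data, as noted above.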
There are two other interesting places to visit:
- Rx extensions[^], since collections are another example where easy multithreading makes sense
- This cheatsheet[^] might provide some valuable tips on optimizing. It made me think twice about progress bars.
I are Troll
|
I have mailed you this time instead of replying here. Given my id too.
|
I do not believe that there is something built into the framework. There may be something in the big wide internet, but I'm not sure what I'd use to look it up.
Are the items all indexed consecutively? If so, you will probably want to look for something that implements IList instead of IDictionary. The reason other people suggested Dictionary is probably because you said your list had "keys" and "values", implying possibly non-consecutive indexes.
Making such a collection yourself is not that hard. I put together a sample of how I would do it. It turned out to be ~400 lines, most of which were boilerplate. The Tuple class from .NET 4.0 would be helpful here, but you could create one yourself pretty easily. My sample class definition was:
public class DeltaCollection<T1, T2, T3> :
    IList,
    IList<Tuple<T1, T2, T3>>,
    IList<Tuple<T1?, T2?, T3?>>
    where T1 : struct
    where T2 : struct
    where T3 : struct
(I did it in VB, so I may have messed up the generic constraint syntax.)
Update
I played around some more and was able to get it down to ~200 lines, plus 15-30 lines per tuple size (number of values per row) you want to support. Again, this would benefit greatly from .NET 4.0's Tuple class, which had to be reimplemented for use in my class for now. The main difference between this version and the original is that the Item property and foreach enumerator will now return Tuple<T1?, T2?, T3?> instead of Tuple<T1, T2, T3>.
modified on Wednesday, December 16, 2009 11:04 PM
|
I am working on 3.5 SP1, so I can't use Tuple for now.
Please read the comments in the modified question.
|
Well, you could make your own Tuple if you really wanted, but it looks like you already have a similar class to use instead.
Like Eddy said, for your situation, you will have to loop somewhere. Either you keep a continuous list with the calculated values for all indexes up to the highest one, which implies a loop on every item insert/change, or you perform a search for the next lowest item and calculate its values on item access. It sounds like you want array-like access performance, so you will probably want to take the first option of calculating the values on insert (as Eddy suggested).
If you will be accessing the items in sequence and not randomly indexing, you could get the best of both worlds by writing an iterator that keeps a "current value" and "index" to keep the calculation time minimal.
A quick search did not find any existing implementation for such a collection, but it should not be that difficult to roll your own (especially since you know exactly what features you want).
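That iterator idea can be sketched like this (the names are illustrative): carrying the last known values forward makes a sequential pass O(n), with no per-item backward search.

```csharp
using System.Collections.Generic;

static class DeltaSequence
{
    // Sketch: decode a null-delta list in sequence. The "current"
    // values are carried forward from row to row, so each item costs
    // one small fill loop instead of a scan back through the holes.
    public static IEnumerable<int[]> Filled(IEnumerable<int?[]> deltas, int columns)
    {
        var current = new int[columns];
        foreach (var row in deltas)
        {
            for (int i = 0; i < columns; i++)
                if (row[i].HasValue)
                    current[i] = row[i].Value;
            yield return (int[])current.Clone();
        }
    }
}
```

Random access would still need a search or a precomputed table; this only helps when the items are consumed in order.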
|
Gideon Engelberth wrote: Like Eddy said, for your situation, you will have to loop somewhere.
Yes, I am already doing that. That is how it is being done currently, even in the class. The options are obvious: either at insert time or at lookup time, looping is inevitable.
The point is that I want to achieve a performance boost by some better technique instead of the obvious looping.
Hashtables improve the efficiency of a dictionary, and similarly a datatable uses indexing techniques. A datatable-like structure could also be created by using a simple dictionary.
Since I have a basic functionality need, I only wonder if there could be something very basic. The fewer the lines, the faster the code. I am already working on reducing the lines further by creating internal classes. Done most of it already.
I will try to refine that, and may post it here later. I owe a lot to CP already.
Seems like there is nothing to save me here right now. In any case, thanks a lot, guys.
|
Namaste Sri Som,
[edit: made the array of "current value" nullable integers private and non-static, after realizing that by making the array static you could only have one usable instance of the class. Please remember this code was written in less than five minutes, "off the top of my head"; I welcome any suggestions to improve it!]
Here's an idea off the "top of my head":
1. Use a class inheriting from a generic dictionary (we'll use an int [non-nullable] here for the key, but, of course, you could use something else as a key).
2. For the value part of each dictionary entry, use a list of nullable ints, List<int?>; we're going to make use of nullables here to simulate "missing" data entries.
3. Use an array of nullable ints inside the dictionary to track the current values you may need to replace "missing values" with; that gets us around the problem that a .NET Dictionary has no inherent "ordinality" you can rely on.
public class keyedNullableIntTable : Dictionary<int, List<int?>>
{
    private int?[] referenceValues = new int?[3];

    // 'new' because this intentionally hides Dictionary.Add for this signature
    public new void Add(int theKey, List<int?> theValues)
    {
        for (int i = 0; i < 3; i++)
        {
            if (theValues[i] == null)
            {
                if (referenceValues[i] != null)
                {
                    theValues[i] = referenceValues[i];
                }
            }
            else
            {
                referenceValues[i] = theValues[i];
            }
        }
        base.Add(theKey, theValues);
    }
}
Here's a sample test; it assumes you have a form with a button1 on it to call the event handler:
private void button1_Click(object sender, EventArgs e)
{
    keyedNullableIntTable t1 = new keyedNullableIntTable();
    t1.Add(1, new List<int?> { 100, null, 300 });
    t1.Add(2, new List<int?> { null, null, 300 });
    t1.Add(3, new List<int?> { 222, 333, 444 });
    t1.Add(4, new List<int?> { 111, null, null });
    t1.Add(5, new List<int?> { null, null, null });

    foreach (var theEntry in t1)
    {
        Console.Write("key = " + theEntry.Key + "\tlist = ");
        foreach (var listValue in theEntry.Value)
        {
            Console.Write(listValue + " : ");
        }
        Console.WriteLine();
    }
}
I bet there's some really elegant way you could use Linq to get this done, too, but I am only a larva when it comes to Linq. Hope I understood your question, and that this is useful.
best, Bill
"Many : not conversant with mathematical studies, imagine that because it [the Analytical Engine] is to give results in numerical notation, its processes must consequently be arithmetical, numerical, rather than algebraical and analytical. This is an error. The engine can arrange and combine numerical quantities as if they were letters or any other general symbols; and in fact it might bring out its results in algebraical notation, were provisions made accordingly." Ada, Countess Lovelace, 1844
modified on Thursday, December 17, 2009 1:35 AM
|
This only solves the problem of creating the dictionary. The question is about retrieving the values back.
I have added my comments to the question itself. There are some more areas to look at.
Please give it a read.
And yes, thanks for the attempt.
|
Namaste suits just fine. However, you could do with a Hello too. (We are too used to "Hello World" anyway)
|
Som Shekhar wrote:
"I have added my comments to the question itself. There are some more areas to look at."
Hi Som,
I have edited the first code example to make the internal List<int?> of "current values to replace with if the incoming item is null" private and not static.
I will be able to review your comments later tonight (I live at GMT+7, by the way) to try to understand what you mean by "retrieving them back". Isn't the test example I show in the code ... where the keys and list values are being read out in a foreach loop ... and printed to the console ... an example of retrieving back the values?
If I want the 3rd item in the List<int?> in the dictionary t1 which is accessed by key 4, I access it via:
t1[4][2]
Isn't that retrieving?
Namaste, Bill
"Many : not conversant with mathematical studies, imagine that because it [the Analytical Engine] is to give results in numerical notation, its processes must consequently be arithmetical, numerical, rather than algebraical and analytical. This is an error. The engine can arrange and combine numerical quantities as if they were letters or any other general symbols; and in fact it might bring out its results in algebraical notation, were provisions made accordingly." Ada, Countess Lovelace, 1844