|
Hey
I have a problem: I have a lot of files, and each file contains a lot of strings of length 4.
There may be duplicate strings in each file.
I need a fast way to count how many different strings there are across all the files together.
I can't load all the strings into memory; there are too many of them.
The number of different strings can easily reach 100 million.
Clint
|
|
|
|
|
If you need a distinct count then you have to loop! You don't need to load the entire file into memory; you can use a stream to scan the file quickly.
100 million seems like a large estimate for the number of distinct 4-character strings. Are you really looking at an alphabet of 256 characters (or worse, if Unicode)? If you really will have that many distinct records, create a file whose size is the number of possible distinct records * 4 bytes, and store the count for each record as an unsigned int at that record's offset in the file.
A man said to the universe:
"Sir I exist!"
"However," replied the Universe, "The fact has not created in me A sense of obligation."
-- Stephen Crane
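If only the number of distinct records is needed, and not how often each one occurs, one bit per possible record is enough. The sketch below swaps the count file suggested above for an in-memory bitmap; that is a different technique, shown here only as an assumption about what might fit the problem. The class name `DistinctRecordCounter` and its members are hypothetical. It assumes every record is exactly 4 bytes and that the process can allocate one 512 MiB array, which in practice means a 64-bit process.

```csharp
using System;
using System.IO;

// Hypothetical helper: counts distinct 4-byte records using one bit per possible value.
public sealed class DistinctRecordCounter
{
    // 2^32 possible records / 64 bits per ulong = 2^26 words (512 MiB in total).
    private readonly ulong[] bits = new ulong[1 << 26];
    private long distinctCount;

    public long DistinctCount { get { return distinctCount; } }

    // Marks one record as seen; the counter only grows the first time a value appears.
    public void Add(uint record)
    {
        int word = (int)(record >> 6);
        ulong mask = 1UL << (int)(record & 63);
        if ((bits[word] & mask) == 0)
        {
            bits[word] |= mask;
            distinctCount++;
        }
    }

    // Streams the input in chunks so that the file itself is never held in memory.
    public void AddStream(Stream input)
    {
        byte[] buffer = new byte[64 * 1024];
        int filled = 0;
        int read;
        while ((read = input.Read(buffer, filled, buffer.Length - filled)) > 0)
        {
            filled += read;
            int usable = filled - (filled % 4);
            for (int i = 0; i < usable; i += 4)
            {
                Add(BitConverter.ToUInt32(buffer, i));
            }
            // Carry an incomplete record (1 to 3 bytes) over to the next read.
            int leftover = filled - usable;
            Array.Copy(buffer, usable, buffer, 0, leftover);
            filled = leftover;
        }
        // Fewer than 4 trailing bytes at the end of the stream are ignored.
    }
}
```

A caller would create one counter, call `AddStream` once per file (for example with the stream returned by `File.OpenRead`), and read `DistinctCount` at the end. The byte order used by `BitConverter` does not matter here, because only distinctness is being tracked.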
|
|
|
|
|
That's actually a low estimate.
I will have at most 2 to the power of 32 records. I don't know if it is a good idea to build such a big file.
|
|
|
|
|
There is a difference between distinct records and records. If you want to count distinct records you must have a way of storing the count. 2^32 is exactly 256^4, which makes me wonder why it is even string data (it would more accurately be called binary at this point).
"Got Ram?"
clint1982 wrote: I don't know if it is a good idea to build such a big file
|
|
|
|
|
You are right, it's 4-character binary data. I need an efficient way to count the number of distinct records.
|
|
|
|
|
Index the files, then loop through them to count. However, 256^4 (about 4.3 billion) possible values is huge, and with a count for each of them you will not be able to hold the results in RAM, so you will need a file structure just to store the results. It would be better if you could work on small subsets at a time.
|
|
|
|
|
Build a self-building Huffman tree to get the counts. It would be slightly slower than an incremental loop but would occupy 50% less space unless the counts are uniform.
|
|
|
|
|
I have developed extensive socket handling routines in C/C++.
These routines read and write TCP blocks using my own compressed/encrypted protocols.
Some of these routines don't return until a full sequence of sends and receives is satisfied.
Actions are controlled using states with work loops broken by Sleep(20) calls.
When these routines are compiled into a C++ server with multi-threading, everything works fine. You can have hundreds of simultaneous sockets in use, all in various protocol states.
Now I'm developing the server in .NET, and I've simply wrapped the socket handling routines in a class within a COM DLL.
I have to create the COM DLL Class once (since the socket handling routines communicate via an external socket number that the routines hand out) and then proceed to create threads in .NET to handle concurrent actions within the DLL.
When one .NET thread issues a call to the DLL that implements a Sleep loop internally, it locks up any calls to the DLL from any other .NET threads.
I have tried setting the module and thread apartment state on the COM handling to Multi-threaded mode in the belief this might help but no go...
Surely there is a way of multi-threading the DLL.
Thanks.
psernz
|
|
|
|
|
I am reading a book, "Thinking in C++". On page 694 (Introduction to Templates) the author writes:
"The Smalltalk solution. Smalltalk (and Java, following its example) took a simple and straightforward approach: You want to reuse code, so use inheritance. To implement this, each container class holds items of the generic base class Object (similar to the example at the end of Chapter 15). But because the library in Smalltalk is of such fundamental importance, you don’t ever create a class from scratch. Instead, you must always inherit it from an existing class. You find a class as close as possible to the one you want, inherit from it, and make a few changes. Obviously, this is a benefit because it minimizes your effort (and explains why you spend a lot of time learning the class library before becoming an effective Smalltalk programmer).

But it also means that all classes in Smalltalk end up being part of a single inheritance tree. You must inherit from a branch of this tree when creating a new class. Most of the tree is already there (it’s the Smalltalk class library), and at the root of the tree is a class called Object – the same class that each Smalltalk container holds.

This is a neat trick because it means that every class in the Smalltalk (and Java [1]) class hierarchy is derived from Object, so every class can be held in every container (including that container itself). This type of single-tree hierarchy based on a fundamental generic type (often named Object, which is also the case in Java) is referred to as an "object-based hierarchy." You may have heard this term and assumed it was some new fundamental concept in OOP, like polymorphism. It simply refers to a class hierarchy with Object (or some similar name) at its root and container classes that hold Object.

Because the Smalltalk class library had a much longer history and experience behind it than did C++, and because the original C++ compilers had no container class libraries, it seemed like a good idea to duplicate the Smalltalk library in C++. This was done as an experiment with an early C++ implementation [2], and because it represented a significant body of code, many people began using it. In the process of trying to use the container classes, they discovered a problem.

The problem was that in Smalltalk (and most other OOP languages that I know of), all classes are automatically derived from a single hierarchy, but this isn’t true in C++. You might have your nice object-based hierarchy with its container classes, but then you might buy a set of shape classes or aircraft classes from another vendor who didn’t use that hierarchy. (For one thing, using that hierarchy imposes overhead, which C programmers eschew.) How do you insert a separate class tree into the container class in your object-based hierarchy? Here’s what the problem looks like:"

[1] With the exception, in Java, of the primitive data types. These were made non-Objects for efficiency.
[2] The OOPS library, by Keith Gorlen while he was at NIH.

And a diagram
and he writes:

"Because C++ supports multiple independent hierarchies, Smalltalk’s object-based hierarchy does not work so well."
Is this true?
I have two questions:
1) Can we say the same thing about C#? (I mean: does C#'s object-based hierarchy not work so well either?) What is C#'s solution?
2) We know C# is very different from C++. Some people see that every line of code ends with ";" and say "C# is similar to C++", but this is not true.
Can we take the author's statement as showing the difference between C++'s style of OO and C#'s style of OO?
What are your opinions?
I am looking forward to your answers.
|
|
|
|
|
Smalltalk (as the author says) requires that all objects inherit from another object, and new objects are added to the tree. Based on my (surface-level) experience with Smalltalk, part of the weakness comes in where a problem in one object causes a cascading failure throughout all objects in the tree. So, for example, you have a message object that contains a collection of messages. If you add a new message but do not explicitly tell that message object it has a new object, it will fail, and the object it inherited from fails, and the one that inherited from it fails, etc. There is no way to handle a failure like that gracefully.
C++ does not require that you inherit from anything. However, you also need either an existing header file or one you write yourself for everything you want to do.
C# provides inheritance where everything starts by inheriting from Object. However, you also have a framework that supports handling a failure in the hierarchy gracefully.
This actually sounds like an old article explaining why Smalltalk is bad. There was a time when that language was one of the few OO languages in existence. But later languages improve upon the basics to provide a richer environment and better behaviour.
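To make the C# side of the comparison concrete, here is a minimal sketch. Every C# type ultimately derives from `System.Object`, so the "separate class tree from another vendor" situation the book describes cannot arise in the same way, and since .NET 2.0 generic collections add compile-time typing on top of that. The classes `VendorShape` and `VendorAircraft` are hypothetical stand-ins for code bought from two unrelated vendors.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical classes from two unrelated vendors; neither declares a base class,
// yet both implicitly derive from System.Object.
public class VendorShape { }
public class VendorAircraft { }

public static class SingleRootDemo
{
    // An object-based container accepts all of them, because Object is the common root.
    public static ArrayList MixedBag()
    {
        ArrayList bag = new ArrayList();
        bag.Add(new VendorShape());
        bag.Add(new VendorAircraft());
        bag.Add(42); // a value type is boxed into an object when stored here
        return bag;
    }

    // A generic container fixes the element type at compile time instead.
    public static List<VendorShape> ShapesOnly()
    {
        List<VendorShape> shapes = new List<VendorShape>();
        shapes.Add(new VendorShape());
        // shapes.Add(new VendorAircraft()); // would not compile
        return shapes;
    }
}
```

Reading from the `ArrayList` requires a cast back to the concrete type, which is the cost of the object-based approach; the generic list returns `VendorShape` directly.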
|
|
|
|
|
Hey all!
I have implemented a forms designer with a toolbar, a property grid, etc. It all works well, and I can move, resize, and add controls on the designer host. I am building a sort of monitoring application with graphs, gauges, etc. My plan for this application is to let the user enter "design mode", where he/she can choose which controls to show and customize them using the forms designer. When satisfied, the user should be able to switch back to "run mode", where all the designer-related behavior and properties are hidden and the controls act like normal controls (not in DesignMode).
How do I achieve this? Does anyone have any tips on how to do this? Simply put, I have a bunch of controls in DesignMode, and I want to be able to toggle them between design mode and run mode.
Any help would be very appreciated!
Best regards,
Peter
|
|
|
|
|
Reading the elements you describe, it seems that what you need is a runtime environment that is interpretive rather than compiled. The runtime manager would contain the forms and expose an element for switching between execution mode and design mode.
The runtime manager would then change the containing class as it moves between run mode and design mode. The containing class would be like an MDI container, with the toolbox and property grid as decorators in the design mode view of the container.
The hard part will be coming up with a way to represent this stuff, interpret it, and then display it with a reasonable response time.
-- modified at 15:20 Friday 28th July, 2006
|
|
|
|
|
What can I use instead of a struct to store a small amount of information? I want to create instances, put them in a dictionary, then iterate over and modify the elements in the dictionary. I know every addition will involve a boxing operation, and every modification will involve an unboxing followed by a boxing operation. What is the best thing to do?
|
|
|
|
|
Use .NET 2.0 with generics to alleviate the boxing involved. If you don't have .NET 2.0, ask yourself just how many elements you will be boxing and unboxing. In business apps the difference between 1 ms and 3 ms is very small. If you are modifying extremely large sets, maybe C# is the wrong language. Also, consider your dictionary as a source of problems. If you are iterating, a dictionary may be the wrong structure, since lists are efficient in this respect.
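As a rough illustration of the generics suggestion, the sketch below stores a struct in a `Dictionary<TKey, TValue>`, which holds the values without boxing. The struct `Reading` and the helper names are hypothetical. Note that a struct obtained from the indexer is a copy, so a modification has to be written back explicitly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical small value type of the kind the question describes.
public struct Reading
{
    public int Value;
    public Reading(int value) { Value = value; }
}

public static class StructDictionaryDemo
{
    public static Dictionary<string, Reading> Build()
    {
        // The generic dictionary stores Reading values directly; nothing is boxed.
        Dictionary<string, Reading> readings = new Dictionary<string, Reading>();
        readings.Add("a", new Reading(1));
        readings.Add("b", new Reading(2));
        return readings;
    }

    public static void IncrementAll(Dictionary<string, Reading> readings)
    {
        // Copy the keys first: changing the dictionary while enumerating it throws.
        List<string> keys = new List<string>(readings.Keys);
        foreach (string key in keys)
        {
            Reading r = readings[key]; // the indexer returns a copy of the struct
            r.Value++;
            readings[key] = r;         // write the modified copy back
        }
    }
}
```

With a class instead of a struct, the copy and write-back step disappears because the dictionary stores references, which is one argument for switching to a class when elements are modified often.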
|
|
|
|
|
Could you please explain what you mean by this - "Use .NET 2.0 with generics". What generics are you talking about?
|
|
|
|
|
Instead of defining a struct you should define a class.
    public class MySpecialData
    {
        private bool isDataDirty;
        private string someValue;

        // True once SomeValue has been changed through the setter.
        public bool IsDataDirty { get { return isDataDirty; } }

        public string SomeValue
        {
            get { return someValue; }
            set { someValue = value; isDataDirty = true; }
        }

        // Hash on the stored value so that equal data yields equal hash codes.
        public override int GetHashCode()
        {
            return someValue == null ? 0 : someValue.GetHashCode();
        }
    }
It was a known issue that the non-generic collections (even hand-written strongly typed ones) store everything as object, which brings boxing for value types and casts on every read. Visual Studio 2005 and the .NET Framework 2.0 solve that with what is called generics. The element types become part of the collection's definition and are checked at compile time, so your collection is fully strongly typed.
    public class MySpecialDataCollection : Dictionary<int, MySpecialData> {}
The statement above uses generics, and that is all you need to define for a fully functional collection. The compiler knows that the dictionary expects a key of type int and a value of type MySpecialData, so at execution time no boxing or casting occurs at all.
Then you use it like any other dictionary:
    MySpecialDataCollection list = new MySpecialDataCollection();
    foreach (Twizzle element in MyTwizzleCollection)
    {
        // Add throws an ArgumentException if two objects share the same hash code.
        list.Add(element.SpecialObject.GetHashCode(), element.SpecialObject);
    }
|
|
|
|
|
I've created an application using VS2005 Express and SQL Server 2005 Express. Now the fun bit: how do I create an installation program that installs the SQL Server Express runtime and my application's SQL Server Express database on a server, and updates my client connection strings to point to that database?
Regards,
|
|
|
|
|
Hello,
What I have is this:
    foreach (FileInfo f in di.GetFiles("*.txt"))
    {
        // do something
    }
    foreach (FileInfo f in di.GetFiles("*.log"))
    {
        // do the same thing
    }
What I want is this:
    foreach (FileInfo f in di.GetFiles("*.txt, *.log"))
    {
        // do something
    }
but di.GetFiles("*.txt, *.log") returns nothing.
Can I put in a RegEx or something?
Ronald Hahn, CNT - Computer Engineering Technologist
New Technologies Analyst
HahnTech Affiliated With Code Constructors
Edmonton, Alberta, Canada
Email: rhahn82@telus.net
|
|
|
|
|
Nope, you're pretty limited in this regard. An alternative is to combine the two result sets:
    List<FileInfo> lst = new List<FileInfo>();
    lst.AddRange(di.GetFiles("*.txt"));
    lst.AddRange(di.GetFiles("*.log"));
    foreach (FileInfo fi in lst) { ... }
If you want them to be in a particular order, call Sort() with a comparison. FileInfo does not implement IComparable, so the parameterless Sort() will throw.
Logifusion
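The combine-then-sort idea above can be wrapped in a small helper. This is only a sketch: `MultiPatternFiles.GetFiles` is a hypothetical name, and sorting by file name is an arbitrary choice made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MultiPatternFiles
{
    // Hypothetical helper: runs one GetFiles call per pattern and merges the results.
    public static List<FileInfo> GetFiles(DirectoryInfo dir, params string[] patterns)
    {
        List<FileInfo> result = new List<FileInfo>();
        foreach (string pattern in patterns)
        {
            result.AddRange(dir.GetFiles(pattern));
        }
        // FileInfo is not IComparable, so Sort needs an explicit comparison.
        result.Sort(delegate(FileInfo a, FileInfo b)
        {
            return string.Compare(a.Name, b.Name, StringComparison.OrdinalIgnoreCase);
        });
        return result;
    }
}
```

It would be called as `MultiPatternFiles.GetFiles(di, "*.txt", "*.log")`. If two patterns can match the same file, that file appears twice, so the patterns should not overlap.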
|
|
|
|
|
Settled on this:
    foreach (string f in Directory.GetFiles(_ControleFileBucket))
    {
        // EndsWith compares literally, so the suffix must not contain the "*" wildcard.
        if (f.EndsWith("." + CDoc.EmailCF) || f.EndsWith("." + CDoc.FaxCF) || f.EndsWith("." + CDoc.GMailCF))
        { ...
|
|
|
|
|
Hello,
I am trying to log to a SQL Server 2000 database using the Database Trace Listener. The problem I have is very strange:
Scenario 1:
I have SQL Express installed on my machine, and I was able to log to the database in this scenario using integrated security. I got all the desired values in the tables, such as Log, Category, etc.
Scenario 2:
I took the experiment further by logging to SQL Server 2000, which is installed on a separate machine but within the same domain where my machine is registered. I created the tables and stored procedures in an existing database. Whenever I try to log to the database, nothing happens; no data gets populated in tables such as Category, Log, etc. I tried using integrated security after impersonating an administrator account in my application, and I tried building the connection string with the "sa" account. Nothing is working.
Can somebody give me any clue?
Thanks in advance
|
|
|
|
|
Hi
My app has a wizard-like interface with 4 pages.
I have created a phone class in this app. On the second page, depending on the user's selection, I have to instantiate either a digital phone class or a normal phone class (both inherit from the parent phone class).
I instantiate the appropriate class and set all the properties in the "Leave" event of the second page, and I use the "Enter" event of the third page to use these values.
The problem is: how can I tell the third page which class to instantiate, and how can I transfer these values (the class properties set on the second page) to the third page?
I am thinking that maybe the phone class has to be a singleton, so that whenever it is instantiated I get the first instance every time.
Is there a better way of communicating data between pages?
Thanks,
|
|
|
|
|
jerrymei wrote: Is there a better way of communicating data between pages?
MVC design
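One way to read the MVC suggestion is to keep the wizard's data in a single model object that every page receives, instead of making the phone class a singleton. The sketch below leaves out the Windows Forms plumbing; `WizardModel`, `SecondPage`, `ThirdPage`, and the `Number` property are hypothetical names, and the `OnLeave` and `OnEnter` methods stand in for the pages' Leave and Enter event handlers.

```csharp
using System;

public class Phone
{
    private string number;
    public string Number { get { return number; } set { number = value; } }
}
public class DigitalPhone : Phone { }
public class NormalPhone : Phone { }

// The model: the one place where the wizard's state lives.
public class WizardModel
{
    private Phone selectedPhone;
    public Phone SelectedPhone { get { return selectedPhone; } set { selectedPhone = value; } }
}

public class SecondPage
{
    private readonly WizardModel model;
    public SecondPage(WizardModel model) { this.model = model; }

    // Stand-in for the Leave handler: create the chosen phone and store it in the model.
    public void OnLeave(bool userChoseDigital, string number)
    {
        Phone phone = userChoseDigital ? (Phone)new DigitalPhone() : new NormalPhone();
        phone.Number = number;
        model.SelectedPhone = phone;
    }
}

public class ThirdPage
{
    private readonly WizardModel model;
    public ThirdPage(WizardModel model) { this.model = model; }

    // Stand-in for the Enter handler: read whatever the earlier page stored.
    public string OnEnter()
    {
        Phone phone = model.SelectedPhone;
        return phone.GetType().Name + ": " + phone.Number;
    }
}
```

The wizard form would create one `WizardModel` and pass the same instance to every page. The third page never needs to know which subclass was chosen, because it only sees the `Phone` stored in the model.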
|
|
|
|
|
    string translationText = "";
    if(dicText.TryGetValue(translation[r].EntryCode, out translationText))
    {
        dicText.Add(translation[r].EntryCode, translationText);
    }
I don't get this method. As far as I understand it, it should do one lookup and if found then assign the related value of "translation[r].EntryCode" to translationText. Is this correct?
|
|
|
|
|
TTFCAFO or RTFM or STFW or ...
Just type the code and time it and see what works best. That's what we do when we don't know the answer.
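Typing it in might look like the sketch below. `Dictionary<TKey, TValue>.TryGetValue` does a single lookup: it returns true and assigns the stored value to the out parameter when the key exists, and returns false and assigns the type's default value (null for a string) when it does not. `Add`, on the other hand, throws an `ArgumentException` when the key is already present, so calling it inside the "found" branch will fail. The class and method names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class TryGetValueDemo
{
    // Returns the stored text, or null when the key is not in the dictionary.
    public static string Lookup(Dictionary<string, string> dicText, string entryCode)
    {
        string translationText;
        if (dicText.TryGetValue(entryCode, out translationText))
        {
            return translationText; // key found: the out parameter holds its value
        }
        return null; // key missing: the out parameter was set to null
    }

    // Adds the pair only when the key is absent; returns whether it was added.
    public static bool AddIfMissing(Dictionary<string, string> dicText, string entryCode, string text)
    {
        string existing;
        if (dicText.TryGetValue(entryCode, out existing))
        {
            return false; // calling Add here would throw an ArgumentException
        }
        dicText.Add(entryCode, text);
        return true;
    }
}
```

If the intent of the original snippet was to add a translation only when none exists yet, the condition needs to be negated, as `AddIfMissing` does.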
|
|
|
|