|
Hi, they are both right. To be more accurate, Where() does not necessarily enumerate all the elements; that depends on what you put in the condition and what you do with the result. In general the advice is to stick to Any().
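To make that concrete, here is a minimal sketch (the original question isn't quoted above, so the int sequence and the even-number check are just placeholder assumptions). The predicate logs each element it inspects, which shows how far each query enumerates:
using System;
using System.Linq;
class Program
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };
        Func<int, bool> isEven = n =>
        {
            Console.WriteLine("checking " + n);   // log every element the query inspects
            return n % 2 == 0;
        };
        // Any(predicate) stops as soon as it finds a match: checks 1 and 2, then stops.
        bool viaAny = numbers.Any(n => isEven(n));
        Console.WriteLine("Any: " + viaAny);
        // Where(...).Count() > 0 enumerates the whole sequence: checks 1 through 5.
        bool viaWhereCount = numbers.Where(n => isEven(n)).Count() > 0;
        Console.WriteLine("Where/Count: " + viaWhereCount);
        // Where(...).Any() is lazy and also stops at the first match, which is why
        // Where() does not necessarily see every element.
        bool viaWhereAny = numbers.Where(n => isEven(n)).Any();
        Console.WriteLine("Where/Any: " + viaWhereAny);
    }
}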
|
|
|
|
|
A very good afternoon/time of day to all.
So, I want to read very large files sequentially. By very big I mean anything from 1GB up to maybe several hundred GB. First question: how sensible/practical/possible is it to have a file of, say, 250GB?
Probably not very sensible at all, but I'm observing strange effects even at a couple of GB. 1GB files behave the way I'd expect all the time, but 3GB files often do not.
To demonstrate what I mean, I create 8 files, the first 1GB in size, each subsequent one 1GB bigger. File 1 = 1GB, file 8 = 8GB:
private static void Main(string[] args)
{
    // Reusable 1MB buffer, filled with fresh random data before each write.
    byte[] buffer = new byte[1 << 20];
    Random r = new Random();
    // Create GB1.dat (1GB) through GB8.dat (8GB), writing 1MB at a time.
    for (int mb = 1024; mb <= 8192; mb += 1024)
    {
        using (FileStream fs = File.Create(string.Format(@"c:\temp\GB{0}.dat", mb >> 10)))
        {
            for (long index = 0; index < mb; index++)
            {
                r.NextBytes(buffer);
                fs.Write(buffer, 0, buffer.Length);
            }
        }
    }
}
14 minutes later I have my test files, all full of lots of randomness. And all I'm going to do is read each file in its entirety using FileStream.Read:
private static void Main(string[] args)
{
    byte[] buffer = new byte[1 << 20];
    for (int mb = 1024; mb <= 8192; mb += 1024)
    {
        Console.Write("GB{0}.dat: ", mb >> 10);
        Stopwatch sw = Stopwatch.StartNew();
        using (FileStream fs = File.Open(string.Format(@"c:\temp\GB{0}.dat", mb >> 10), FileMode.Open))
        {
            // Read the whole file 1MB at a time; the return value of Read is
            // deliberately ignored, we only care about the elapsed time.
            for (long index = 0; index < mb; index++)
            {
                fs.Read(buffer, 0, buffer.Length);
            }
        }
        Console.WriteLine("{0:0.00}s, {1:0.00}MB/s", sw.ElapsedMilliseconds / 1000d, mb * 1000d / sw.ElapsedMilliseconds);
    }
}
Now, regardless of the file size I'd expect the sequential read speed to be similar; here's what I get:
GB1.dat: 17.54s, 58.38MB/s
GB2.dat: 39.20s, 52.25MB/s
GB3.dat: 149.56s, 20.54MB/s
GB4.dat: 92.97s, 44.06MB/s
GB5.dat: 175.25s, 29.22MB/s
GB6.dat: 84.29s, 72.90MB/s
GB7.dat: 104.96s, 68.29MB/s
GB8.dat: 179.43s, 45.66MB/s
First thought would be that things slow down when other processes are accessing the disk, so the inconsistency could just be other processes doing stuff, but the results seem a bit too consistent for that. At about 3GB, the speed always drops off sharply.
I don't know what's going on. Windows is clearly caching, the disk probably is too, and maybe there's some interaction (garbage collection/virtual memory) which kicks in at certain sizes.
My second question: do you think the size of the file affects the speed at which you can read it sequentially? Actually, I'd like random access, but let's do the simple things first.
Regards,
Rob Philpott.
|
|
|
|
|
How would "random access" work on a file that large? Imagine someone editing and locking the file.
You want a database.
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
|
|
|
|
|
Well, the file is immutable, so no locking should be required. I don't know the underlying file system structures, but I would think that random access would probably degrade with size.
Regards,
Rob Philpott.
|
|
|
|
|
It won't be something that fits in memory, meaning that it will probably be seeking a lot, starting from the beginning of the file.
Any special format? CSV, XML?
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
|
|
|
|
|
Variable-length byte arrays, somewhere between 50 bytes and 5KB, numbered 0, 1, 2, 3 and so on, up to 500 million of them. No more than 1TB in total.
The requirement is quite simple really: retrieve a byte array by index as fast as possible (random access), and provide sequential access as fast as possible (all of them, one by one - obviously this will take a while).
My current thinking is a format of a 4-byte length + byte array, repeated, with a separate index file to provide the necessary indirection to cope with the variable-length nature (rough sketch below).
If there were 100 of them you'd stick them in an array, so this really is just a problem of scale. Quite an interesting one.
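For what it's worth, a minimal sketch of that layout - the RecordStore name, the file handling and the 8-byte offsets are just my assumptions, not a settled design:
using System.Collections.Generic;
using System.IO;

static class RecordStore
{
    // data file  : [4-byte length][payload] repeated
    // index file : one 8-byte offset per record, so record i starts at index[i]

    public static void WriteRecords(string dataPath, string indexPath, IEnumerable<byte[]> records)
    {
        using (var data = new BinaryWriter(File.Create(dataPath)))
        using (var index = new BinaryWriter(File.Create(indexPath)))
        {
            foreach (byte[] record in records)
            {
                index.Write(data.BaseStream.Position);   // 8-byte offset of this record
                data.Write(record.Length);               // 4-byte length prefix
                data.Write(record);                      // payload
            }
        }
    }

    public static byte[] ReadRecord(FileStream data, FileStream index, long recordNumber)
    {
        index.Position = recordNumber * sizeof(long);    // fixed 8 bytes per index entry
        long offset = new BinaryReader(index).ReadInt64();

        data.Position = offset;                          // jump straight to the record
        var reader = new BinaryReader(data);
        int length = reader.ReadInt32();
        return reader.ReadBytes(length);
    }
}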
Regards,
Rob Philpott.
|
|
|
|
|
Rob Philpott wrote: a separate index file to provide the necessary indirection to cope with the variable-length nature.
I'd still recommend a database.
Your large file will be fragmented and hard to handle. Since the file doesn't fit in the cache of the hard disk, you'd be reading a lot, mostly just to change position.
500 million records are somewhat easier to manage that way, and it would also make it easier to locate a particular instance.
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
|
|
|
|
|
Thanks for your thoughts, Eddy.
Hmm, I don't know. The data is immutable, non-relational and not ACID - no fill factors, B-trees or other DB weirdness required. Mad dog that I am, I shall keep trying - I'm confident I can do it faster than SQL Server.
Or at least - try.
Regards,
Rob Philpott.
|
|
|
|
|
You're welcome 
|
|
|
|
|
I would say that this is just a bad way to code. Why do you need to use a file that spans 3GB in the first place, and then say you want to go to file sizes of 32GB+ or 100GB? Why, why do you want to do that to the .NET framework?
I would recommend that you split the file into small chunks of bytes. 1GB can be split into 10 chunks of 100MB each. This would allow you to work with all of them much more quickly.
The problem is that the .NET framework supports objects of at most 2GB; files of that size need to come into memory, and .NET wastes most of its cycles just keeping the content of your files in RAM. That is why the process takes longer as the file size increases: it has to make sure the resources are kept in memory, and the managed framework has to keep control of those resources as well. As suggested, keep the chunk size to 100MB at most and then work on the chunks separately.
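A rough sketch of that chunking idea - the source path, chunk file names and the 100MB figure are placeholders, not a recommendation of specific values:
using System;
using System.IO;

class Chunker
{
    static void Main()
    {
        const int chunkSize = 100 * 1024 * 1024;      // 100MB per chunk
        byte[] buffer = new byte[1 << 20];            // copy 1MB at a time

        using (FileStream source = File.OpenRead(@"c:\temp\GB3.dat"))
        {
            int chunkIndex = 0;
            while (source.Position < source.Length)
            {
                using (FileStream chunk = File.Create(string.Format(@"c:\temp\chunk{0}.dat", chunkIndex++)))
                {
                    long written = 0;
                    while (written < chunkSize)
                    {
                        int toRead = (int)Math.Min(buffer.Length, chunkSize - written);
                        int read = source.Read(buffer, 0, toRead);
                        if (read == 0) break;         // end of source file
                        chunk.Write(buffer, 0, read);
                        written += read;
                    }
                }
            }
        }
    }
}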
The sh*t I complain about
It's like there ain't a cloud in the sky and it's raining out - Eminem
~! Firewall !~
|
|
|
|
|
Afzaal Ahmad Zeeshan wrote: Why, why do you want to do that to .NET framework?
Well, because I have a lot of data. See further down the thread for explanations, but I could be working with up to 1TB of data - splitting that into 100MB files isn't going to be ideal. NTFS allegedly supports files into the exabytes, so why not?
I'm aware of the unsatisfactory 2GB limit on arrays and such, even on x64, but I'm using streams here and at no point am I bringing the whole file into memory; in fact, I'm only ever looking at a 1MB sliver of it in the example code.
Unusual, yes; unrealistic, well, I'd argue not.
Regards,
Rob Philpott.
|
|
|
|
|
We are not talking about NTFS in this context at all, but about the .NET framework, which does not support objects of more than 2GB in size. NTFS indeed supports very large files and directories, but the .NET framework does not support that much data. NTFS is the on-disk file system format; the .NET framework keeps everything managed in memory.
If that is the case, then I don't see why you are worrying about the 3GB file size, since in memory the object is only 1MB. Even with a 1MB buffer, your application is going to load and reload the data in memory, managing the references and all. That would take a lot of time, wouldn't it?
The sh*t I complain about
It's like there ain't a cloud in the sky and it's raining out - Eminem
~! Firewall !~
|
|
|
|
|
If you look at the code, the same 1MB buffer is being reused. The process footprint is small, and there's practically no garbage collection going on.
How long it takes is irrelevant; how the performance degrades with file size is the pertinent question. This is not typical of forward stream operations.
Regards,
Rob Philpott.
|
|
|
|
|
Ahem, you can have arrays larger than 2GB now. Source[^]
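If I remember correctly, it's an opt-in for 64-bit processes on .NET 4.5 and later; roughly this in app.config:
<configuration>
  <runtime>
    <!-- Opt in to arrays larger than 2GB in total size (64-bit processes only) -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>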
|
|
|
|
|
I have a multi-project C# solution which builds in VS2013 on my local machine. When I do a TFS MSBuild it fires the following error (it's just a _default.aspx page):
The type or namespace name 'Class1' could not be found (are you missing a using directive or an assembly reference?)
What surprises me is that it has no external references and it's calling a class in the same namespace, just in another file.
default.aspx:
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="default.aspx.cs" Inherits="MyProject.Service.Proxy._default" %>
default.aspx.cs:
namespace MyProject.Service.Proxy
{
    public partial class _default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            var a = new Class1();
            ......
        }
    }
}
class1.cs:
namespace MyProject.Service.Proxy
{
    public class Class1
    {
        public int Number { get; set; }

        public Class1()
        {
            Number = 1;
        }
    }
}
As you can see all classes and files are in the very same namespace.
But if I write the Class1 code in default.aspx.cs IT WORKS!
namespace MyProject.Service.Proxy
{
    public class Class1
    {
        public int Number { get; set; }

        public Class1()
        {
            Number = 1;
        }
    }

    public partial class _default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            var a = new Class1();
            ......
        }
    }
}
I wouldn't like to end up with an overly long and unreadable default.aspx.cs.
Do you have any clue what I'm doing wrong?
Thanks so much
|
|
|
|
|
Did you include the file in the project? Is it included in the .csproj file?
Are you using an IDE or a text-editor?
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
|
|
|
|
|
Yeah, I did include the file.
I'm using the VS2013 IDE.
Should I use csc from the command line?
|
|
|
|
|
Should work from the IDE as well; it will look in the csproj file to determine which files belong to the project. Make sure that the file exists.
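For a classic (non-website) project file, the relevant entry would look roughly like this - the file name here is just a guess at yours:
<ItemGroup>
  <!-- If class1.cs is missing from an ItemGroup like this, MSBuild won't compile it -->
  <Compile Include="class1.cs" />
</ItemGroup>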
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
|
|
|
|
|
I am working on a web project and the issue is that whenever I make any changes in the ASP.NET designer code, my session ends automatically.
Please help.
|
|
|
|
|
Hi all.
I want to get all the network activity data shown in Resource Monitor using C#.
Can anyone tell me how I can do it?
I really need some help. Hope someone can help me.
|
|
|
|
|
|
Thanks for your reply.
I have already read them all but have not yet found an answer.
|
|
|
|
|
I don't think you have read them all. For example, A Network Sniffer in C#[^] offers at least a starting point for you to build on.
|
|
|
|
|
|
Thanks for your reply.
But I have to write an application that gets all the network activity data shown in Resource Monitor, not use existing tools.
|
|
|
|
|