The Lounge - CodeProject

Re: Storing huge numbers of files

JasonSQ 17-Jul-20 12:48

File size is critically important. If a file runs past a block boundary by even a little bit, the rest of that last block is dead space.

Assume a 4K block size and files storing 1K of data: that's 3K of wasted space on disk, per file.
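
To put numbers on that, here is a minimal sketch of the slack calculation. It assumes space is allocated in whole blocks and ignores file-system specifics (some file systems keep very small files inside their own metadata records). The function names are made up for illustration.

```python
import math

def allocated_size(file_size: int, block_size: int = 4096) -> int:
    """Bytes occupied on disk when space is handed out in whole blocks."""
    return math.ceil(file_size / block_size) * block_size

def slack(file_size: int, block_size: int = 4096) -> int:
    """Dead space left in the last block of a file."""
    return allocated_size(file_size, block_size) - file_size

print(slack(1024))  # 3072: a 1K file in a 4K block wastes 3K
print(slack(4097))  # 4095: one byte over the boundary costs almost a whole block
```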

If you zip up the files, they'll store much, much more efficiently.

We have this problem with hundreds of thousands of small text files. We sweep them up and zip them into archive folders on occasion to clean up the folders and reclaim disk space.
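
A rough sketch of that kind of sweep, using Python's standard `zipfile` module. The function name, the `*.txt` pattern and the "verify, then delete" order are assumptions for illustration, not a description of our actual job.

```python
import zipfile
from pathlib import Path

def sweep_into_zip(folder: Path, archive: Path, pattern: str = "*.txt") -> int:
    """Pack matching files into one zip archive, then delete the originals."""
    files = sorted(p for p in folder.glob(pattern) if p.is_file())
    if not files:
        return 0
    # Mode "w" creates the archive (and overwrites an existing one).
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.name)
    # Only delete the originals once the archive passes its integrity check.
    with zipfile.ZipFile(archive) as zf:
        if zf.testzip() is not None:
            raise RuntimeError("archive failed its integrity check")
    for p in files:
        p.unlink()
    return len(files)
```

Many small files then occupy the blocks of one archive instead of one partly empty block each.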

Re: Storing huge numbers of files

harold aptroot 11-Jul-20 9:51

As far as I know, Windows itself doesn't mind too much if there are lots of files in a folder. Explorer is another matter. So you can put lots of files in a folder, but you can never look at them.

And FAT32 can only have 65534 files in a folder.

Re: Storing huge numbers of files

kalberts 11-Jul-20 10:03

I hope to persuade most users to go for NTFS rather than FAT32.

The most common access will be through an application, which will read the directory programmatically. Windows Explorer access can be considered an exception (although not that exceptional!).
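
For illustration, reading a directory programmatically could look like the sketch below (Python here, purely as an example; the function name is made up). `os.scandir` hands back entries one at a time, so the application never has to build one huge list or a visual element per file.

```python
import os
from typing import Iterator

def iter_file_names(directory: str) -> Iterator[str]:
    """Yield the names of regular files in a directory, one at a time."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.name
```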

Re: Storing huge numbers of files

Dave Kreskowiak 11-Jul-20 10:33

If users will be copying these files to a USB stick for any reason, you may run into a problem, since there is a distinct possibility the stick is formatted as FAT32.

Asking questions is a skill
CodeProject Forum Guidelines
Google: C# How to debug code
Seriously, go read these articles.

Dave Kreskowiak

Re: Storing huge numbers of files

Member 13256750 13-Jul-20 5:29

You could always format the USB stick with NTFS.

Re: Storing huge numbers of files

Dave Kreskowiak 13-Jul-20 5:38

You could, but how many users actually read the documentation for your app?

Re: Storing huge numbers of files

Gerry Schmitz 12-Jul-20 3:06

Windows Explorer will be your bottleneck ... while you sit and wait while it "builds" a 100k tree view. Odds are, it will "hang". "Reading" directories is not a big deal; how you "display" them is.

It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food

Re: Storing huge numbers of files

JohnnyCee 13-Jul-20 2:38

I’m “curious” why you “quoted” those “words” in your “post”.

JohnnyCee

Re: Storing huge numbers of files

Gerry Schmitz 13-Jul-20 5:19

Too lazy to use italics.

"Builds": iterating and instantiating.
"Hangs": no response or exceeding an acceptable response time.
"reading": file i/o
"display": where one loads a visual element for each file object.

Better?

Re: Storing huge numbers of files

JFCee 13-Jul-20 5:29

I don't think emphasis is required for those words, but you do you.

Re: Storing huge numbers of files

Gerry Schmitz 15-Jul-20 4:44

It's from writing too many User Manuals.

As in: the CD disk "tray" is not a "cup holder."

Glad to know you and your users are more sophisticated and have time to sweat this stuff.

Re: Storing huge numbers of files

Daniel Pfeffer 11-Jul-20 10:20

IIRC, NTFS uses a B-tree variant to store file names in a directory. This guarantees fast access to a single file, but may slow down access if you are trying e.g. to enumerate all files in the directory.
FAT32 has a limit of just under 64K entries. The search is linear. Note that a long filename takes at least two entries - one for the short name and one for the long name.
I don't know how exFAT stores directories.
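
A back-of-the-envelope sketch of what long names do to that limit. It assumes each long-name entry holds 13 UTF-16 characters and that every file needs one short-name entry on top. The 65,534 figure is the one quoted in this thread, and the function names are made up.

```python
import math

LFN_CHARS_PER_ENTRY = 13  # UTF-16 characters stored per long-name entry

def fat32_entries_for_name(name: str) -> int:
    """Directory entries used by one file: short-name entry plus long-name entries."""
    utf16_units = len(name.encode("utf-16-le")) // 2
    return 1 + math.ceil(utf16_units / LFN_CHARS_PER_ENTRY)

def max_files(sample_name: str, entry_limit: int = 65_534) -> int:
    """How many files fit if every name is as long as sample_name."""
    return entry_limit // fat32_entries_for_name(sample_name)

name = "measurement_2020-07-11_0001.dat"  # 31 characters
print(fat32_entries_for_name(name))  # 4
print(max_files(name))               # 16383
```

So with names of around 30 characters, the practical ceiling is roughly a quarter of the headline number.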

Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.

Re: Storing huge numbers of files

Eddy Vluggen 11-Jul-20 11:36

File-access; so, mostly reading "files"?

A database would give you the most flexibility and performance.

--edit
You can easily expand SQL Server over multiple servers if need be, with more control over sharding and backups than with a regular filesystem.

Bastard Programmer from Hell :suss:

If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

Re: Storing huge numbers of files

obermd 13-Jul-20 3:23

NTFS uses database techniques for file management.

Re: Storing huge numbers of files

Eddy Vluggen 13-Jul-20 5:55

Which is not the same as using a database. The Dokan libraries have proven that a DB is very capable as a FS.

Re: Storing huge numbers of files

Patrice T 11-Jul-20 20:28

Member 7989122 wrote:
If there are reasons to distribute the files over a series of subdirectories, what are the reasons (/explanations) why it would be an advantage?

If performance degrades with the number of files in a directory, there is only one explanation:
The directory is organized as a flat, unsorted list of files.
This implies that to find a file, you have to scan the directory sequentially, in O(n).
If the OS keeps the directory sorted by the key you look up (the file name), the cost of finding a file is O(log(n)).
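
A toy illustration of the two cases, with an in-memory list of names standing in for the directory. This is only a sketch of the complexity argument, not how any particular file system is implemented.

```python
import bisect
from typing import List

def find_linear(names: List[str], target: str) -> int:
    """Unsorted directory: check entries one by one, O(n)."""
    for i, name in enumerate(names):
        if name == target:
            return i
    return -1

def find_sorted(sorted_names: List[str], target: str) -> int:
    """Sorted directory: binary search, O(log n)."""
    i = bisect.bisect_left(sorted_names, target)
    if i < len(sorted_names) and sorted_names[i] == target:
        return i
    return -1
```

With a million names, the linear scan may compare up to a million entries, while the binary search needs about 20 comparisons.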

Patrice

“Everything should be made as simple as possible, but no simpler.” Albert Einstein

Re: Storing huge numbers of files

#realJSOP 12-Jul-20 0:24

Maximum number of files on an NTFS volume: 4,294,967,295

As already mentioned, the problems will start when you try to browse the disk in question with pretty much any existing application.

A better option would be to put the files in a database as blobs. At that point, you'll only have one file on the disk for the database itself. It would also be easier to organize and manage than a complex folder hierarchy.
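
A minimal sketch of the idea, using SQLite from the Python standard library as a stand-in for whatever database you would actually pick. The table layout and function names are assumptions for illustration.

```python
import sqlite3
from typing import Optional

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a single-file store with one row per stored file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "name TEXT PRIMARY KEY, content BLOB NOT NULL)"
    )
    return conn

def put_file(conn: sqlite3.Connection, name: str, content: bytes) -> None:
    """Insert a file, replacing any existing one with the same name."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, content) VALUES (?, ?)",
            (name, content),
        )

def get_file(conn: sqlite3.Connection, name: str) -> Optional[bytes]:
    """Return the stored bytes, or None if there is no such file."""
    row = conn.execute(
        "SELECT content FROM files WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```

The primary key gives you an indexed lookup by name, which is the part a flat folder listing struggles with.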

".45 ACP - because shooting twice is just silly" - JSOP, 2010
-----
You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
-----
When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

Re: Storing huge numbers of files

Nelek 12-Jul-20 2:07

We have some directories that contain that many files; the record I can remember right now is around 450k files in a folder.

They come from long-running measurements that produce between 3 and 5 data files a minute, each between 1 and 5 MB.

Accessing the directory is slow, changing the sort order from name to timestamp is slow, moving the directory to another place is slow, getting the properties of the folder is slow, and deleting the folder once it is no longer needed is slow.

Windows 10 is even slower, especially "folder properties": it needs over 15 minutes to count the files and give the size of the folder.
Windows 7 did it in 30 or 40 seconds.

We can't move that to FAT drives, due to the file-count limits others mentioned. It needs to be NTFS.

M.D.V.

If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.

Re: Storing huge numbers of files

Jörgen Andersson 12-Jul-20 19:34

What are you saving the files for?

How will you access them?
And how will you search for them?
One at a time, sequential, by date, by name...?

Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger

Re: Storing huge numbers of files

soulesurfer 12-Jul-20 22:11

Neither Windows nor Linux does well with too many files in a single folder. I've tried it with a million files, and it is very painful. Some operations, like simply listing the directory or even deleting the files, take absurdly long.

It seems to be doing some operations that are simply not designed for large numbers of files.

As already said, around 10,000 files in a folder is a reasonable max. I simply make it 1,000. So for a million files, spread them across 1,000 folders. There is a nice symmetry here, and it works like a charm.
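
One possible way to do the spreading, sketched in Python: hash the file name and use the result to pick one of 1,000 buckets. Hashing is an assumption here (any stable rule works, such as a running counter), and `shard_path` is a made-up name.

```python
import hashlib
from pathlib import Path

def shard_path(root: Path, file_name: str, shards: int = 1000) -> Path:
    """Map a file name to root/<bucket>/<file_name>, with a stable bucket per name."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % shards
    return root / f"{bucket:03d}" / file_name
```

Because the bucket is derived from the name alone, you can find a file again without keeping a separate index of where it went.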

Re: Storing huge numbers of files

JohaViss61 12-Jul-20 23:26

A few years ago I worked on a system that generated around 50,000 to 100,000 files a day.
We ran into trouble right away.
Storing the files was not a problem, but retrieving them was impossible.
A second problem was that we needed to search the contents of the files to find all files containing a certain string.

We eventually chose to store all files in a database. This was quite easy because the files were small (less than 10K).
We chose an Oracle database because of the CLOB datatype, which allows for indexing and searching.

We have had no problems since, and now have more than 200 million files. :cool:

Re: Storing huge numbers of files

jarvisa 13-Jul-20 0:21

I worked on a system that had to stream 1MB images to disk at 75fps. I found that once there were about 700 files in a directory, creating new files suddenly became slower and the required transfer rate was unachievable. I ended up creating a new subdirectory every 500 files.
Of course this won't be a problem if your system is purely for archive.
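
A sketch of that rollover scheme, assuming a simple running counter. The class name and the `batch_`/`frame_` naming are invented for the example; the original system's layout is not described here.

```python
from pathlib import Path

class RollingWriter:
    """Hands out file paths, starting a new subdirectory every N files."""

    def __init__(self, root: Path, files_per_dir: int = 500) -> None:
        self.root = root
        self.files_per_dir = files_per_dir
        self.count = 0

    def next_path(self) -> Path:
        index = self.count // self.files_per_dir
        directory = self.root / f"batch_{index:05d}"
        # Create the subdirectory only when the first file of a batch arrives.
        if self.count % self.files_per_dir == 0:
            directory.mkdir(parents=True, exist_ok=True)
        path = directory / f"frame_{self.count:08d}.raw"
        self.count += 1
        return path
```

The caller writes each image to the returned path, so no directory ever holds more than `files_per_dir` files.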

Re: Storing huge numbers of files

agolddog 13-Jul-20 4:40

I don't know about access issues for a large number of files in a directory, but you might also consider security issues.

If, for example, you have several different users whose files should not be accessible by the others, creating a subfolder for each user might allow you to secure them such that only their user has access to their subfolder (plus maybe some 'admin' user that you use which can see all directories). Obvious organizational advantages as well.

Re: Storing huge numbers of files

Member 1231137 13-Jul-20 7:43

It really depends on your use case for accessing/managing these files. If you're going to be enumerating the files a lot (or portions of the files) then everything in one directory/folder may not be the best. You can at least "chunk up" the enumeration by subfolder if you create those.

Also, if you break them up into subfolders in some logical way, then managing those units and/or groupings of files will become much easier, e.g. backups, restoring, archiving, deleting.

If you are storing the path to each file in a database, then you're going to get the same lookup performance either way (subdirectories or everyone in the pool together).

Can you explain a little more about the repository and how you'll be using it?

Re: Storing huge numbers of files

englebart 13-Jul-20 7:44

Consider drive corruption, backups, replication, file listeners, aging/document retention and all of the other access aspects as well. A folder per day/month/year can help with some of those items, as suggested in another post.
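
As a sketch, a year/month/day layout can be as small as the function below. The name `dated_path` and the exact folder format are assumptions for illustration.

```python
from datetime import date
from pathlib import Path

def dated_path(root: Path, file_name: str, day: date) -> Path:
    """Build root/YYYY/MM/DD/file_name for the given date."""
    return root / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / file_name
```

Retention then becomes a matter of deleting or archiving whole day, month or year folders instead of picking through individual files.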
