|
|
Simples.
Formats like ZIP employ compression on a file-by-file basis. This is obviously prone to poor rates of compression as compared to a scheme that can compress the entire contents of an archive, or in some cases, data that is contained within lots of small files.
The solution is to slap all of the files together first in a monolithic chunk. You then run compression on that chunk in the (almost always delivered) hope that you'll achieve a smaller output than if the compressed output of all the contained files was then glued together into a single chunk.
TAR - turn a bunch of files into one.
GZ - compress a file.
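To put a rough number on that, here is a minimal sketch using Python's zlib module, which implements the same DEFLATE algorithm that ZIP and gzip typically use. The 200 tiny "files" are invented for the illustration, and container headers (ZIP directory entries, tar headers) are deliberately ignored so that only the compression effect shows.

```python
import zlib

# Hypothetical data: 200 small, similar "files" (think config snippets).
files = [
    f"name=item{i}\ntype=widget\ncolor=blue\nsize=medium\n".encode()
    for i in range(200)
]

# ZIP-style: compress every file on its own, then add up the results.
per_file_total = sum(len(zlib.compress(data)) for data in files)

# tar.gz-style: glue everything together first, then compress once.
solid_total = len(zlib.compress(b"".join(files)))

print(f"per-file: {per_file_total} bytes, solid: {solid_total} bytes")
```

The gap is large here because the files are tiny and nearly identical; with big, unrelated files it would shrink a lot.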
|
|
|
|
|
enhzflep wrote: Formats like ZIP employ compression on a file-by-file basis. This is obviously prone to poor rates of compression as compared to a scheme that can compress the entire contents of an archive
If you experience significantly better compression by merging a lot of small files into one, either your average file size is extremely small (as in one classic Unix study showing that, for the system as a whole, more than 80% of the files were smaller than 5 kbytes), or you are misinterpreting the data: it is not poorer compression, but more metadata, i.e. administrative information. One large file requires one descriptor; five thousand tiny files require five thousand descriptors. That is not a poorer rate of compression; it is similar to gathering the five thousand files into one even without compression. That would save the space of 4999 inodes, as well as the internal fragmentation loss, which, if file sizes are evenly distributed, averages half an allocation unit (disk block) per file. You save space by making this huge file, but it has nothing to do with data compression.
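The fragmentation part of that argument can be checked with a little arithmetic. This is only a sketch: the 4096-byte allocation unit and the evenly spread file sizes are assumptions chosen for the example, and inode/descriptor costs are left out.

```python
def slack_bytes(file_sizes, block_size=4096):
    """Bytes lost to internal fragmentation when every file occupies whole blocks."""
    allocated = sum(
        ((size + block_size - 1) // block_size) * block_size for size in file_sizes
    )
    return allocated - sum(file_sizes)


# Assumed workload: 5000 files whose sizes are spread evenly over one block.
sizes = [i % 4096 + 1 for i in range(5000)]

separate = slack_bytes(sizes)       # each file stored on its own
merged = slack_bytes([sum(sizes)])  # same bytes stored as one big file

print(f"slack as separate files: {separate} bytes")
print(f"slack as one merged file: {merged} bytes")
```

The merged file can waste at most one partial block, while the separate files each waste part of a block, and no compression is involved in either figure.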
If you want to make an exact comparison, you cannot compare the size of the .tar file to the size of the .tar.gz file. That sure would give you the compression rate of the .tar file, but to create the .tar file you had to add a noticeable amount of metadata. So what you save by having only one file/compression descriptor, you partially lose to .tar administrative information.
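The tar overhead is easy to see with Python's tarfile module. A minimal sketch follows; the helper name tar_size and the 100 six-byte files are made up for the illustration.

```python
import io
import tarfile


def tar_size(files):
    """Size in bytes of an uncompressed tar archive holding the given {name: bytes} files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return len(buffer.getvalue())


# Hypothetical input: 100 tiny text files of 6 bytes each.
files = {f"note{i}.txt": b"hello\n" for i in range(100)}

raw_total = sum(len(data) for data in files.values())
archive_total = tar_size(files)

print(f"raw data: {raw_total} bytes, tar archive: {archive_total} bytes")
```

Each member gets a 512-byte header and its data is padded to a multiple of 512 bytes, so for tiny files the uncompressed .tar is far larger than the data it holds. Most of that padding compresses away again under gzip, which is why a .tar versus .tar.gz comparison flatters the compressor.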
I keep a number of 'archives' of many small files in .zip format, saving space due to the compression, of course, but also a lot is saved by not wasting 2 Kbyte on each file in internal fragmentation.
Another advantage of .zipping up these file groups: I frequently move the files between machines on USB sticks. Writing a few thousand files to a USB stick takes a lot of time, mostly spent creating the files. I guess that it has to do with USB stick writes not being cached, at least not to the same degree, and file creation requires lots of writes, even if the file contents are written in one single write. Writing a single .zip archive to a USB stick is several times faster than writing two thousand tiny files.
A similar situation: We run a fairly large build system, with about a hundred build agents. A build may be producing dozens, in some cases hundreds, of individual artifacts. On the central server, distributing these artifacts, the inode table exploded when each artifact was treated separately. We were forced to modify the builds to pack up related files into archives (usually a single one) to be saved centrally as an artifact of the build.
Most of these advantages come from the archive file, whether compressed or not. Compression comes as an additional benefit.
When you use .tar.gz as a distribution format, having to untar and ungzip the entire collection is perfectly fine. When using a .zip file as an (often mostly or fully read-only) 'working' archive, extracting a single file quickly is essential. For my use, .tar.gz would be very cumbersome. Also, having the file system retrieve zipped files for applications that do not have unzipping built into the code is great. Of course: A self-explanatory user interface that doesn't require you to memorize a zillion options and command words, can display the directory structure in the archive, and preview files, is also nice. The ability to encrypt files is valuable as well.
I haven't discovered any real disadvantage of .zip even as a distribution format, but for that purpose, .tar.gz is also fine. However, for daily work, I most certainly prefer a format that lets me access individual files in the archive without having to decrypt, untar and ungzip the entire archive.
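For anyone who wants to see the single-file access in code, here is a small sketch with Python's zipfile module. The archive is built in memory and the member names are invented for the example.

```python
import io
import zipfile

# Build a small in-memory archive; the member names are made up for the example.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for i in range(1000):
        archive.writestr(f"docs/page{i}.txt", f"contents of page {i}\n")

# Reading one member: the central directory says where it starts, so the
# reader seeks there and decompresses only that member.
with zipfile.ZipFile(buffer) as archive:
    page = archive.read("docs/page742.txt")

print(page.decode())
```

With a .tar.gz the reader would have to decompress the stream from the start until it reaches the wanted member, because the gzip layer knows nothing about file boundaries.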
|
|
|
|
|
What are you downloading? Most source code bundles are just compressed with the .gz extension.
Maybe if there were a bunch of videos, jpeg or pdf files, which have inbuilt compression, then the uncompressed .tar file might be as short as, if not shorter than, the tar.gz file. But muscle memory will automatically add the z option to tar to invoke gzip to compress the output. Similarly, there's little point in adding the -C option to scp when transferring a .tar.gz file.
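A quick way to convince yourself, sketched in Python: pseudo-random bytes stand in for already-compressed media (an assumption, since real video or JPEG data is not literally random, but it is similarly short on redundancy), and a repeated line of C stands in for source code.

```python
import gzip
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for already-compressed media: bytes with no exploitable redundancy.
media_like = bytes(rng.getrandbits(8) for _ in range(100_000))

# Stand-in for source code: highly repetitive text.
source_like = b"for (int i = 0; i < n; i++) { total += values[i]; }\n" * 2000

for label, data in [("media-like", media_like), ("source-like", source_like)]:
    packed = gzip.compress(data)
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

The media-like input comes out essentially the same size as it went in, while the source-like input shrinks dramatically.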
|
|
|
|
|
Because GZ refers to gzip - and TAR refers to Tape Archive - which is not compressed (though some Unixes do use "compress" to compress the files in the TAR archive as a default setting).
You can pass -zxvf to tar to decompress and extract the archive all in one step.
You probably also noticed that some have TGZ extensions, and some have TAR.GZ extensions - depending on the process used to create the archive.
In short - the reason is the Unix tradition for small programs that do one thing well and allow you to pipe them into one another.
|
|
|
|
|
A TAR file in your sense is indeed a TAR.GZ file, which embeds two formats: TAR and GZ. Here's the process:
- A TAR file is created, concatenating several files together in their uncompressed form; note that the resulting TAR file is uncompressed,
- A GZ file is created by compressing the previous TAR file.
So to decompress a TAR.GZ file, you have to :
- Decompress the compressed GZ file and
- "Untar" (unarchive) the uncompressed resulting TAR file.
Note that you can compress a TAR file with other popular compressors (bzip2 => TAR.BZ2, 7zip => TAR.7Z...).
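The two layers can be peeled apart explicitly, or both at once. Below is a minimal sketch with Python's gzip and tarfile modules; the archive is built in memory and the member name src/main.c is invented for the example.

```python
import gzip
import io
import tarfile

# Build a tiny .tar.gz in memory (hypothetical content).
data = b"int main(void) { return 0; }\n"
payload = io.BytesIO()
with tarfile.open(fileobj=payload, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="src/main.c")
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))
tar_gz_bytes = payload.getvalue()

# Step 1: strip the GZ layer, which leaves a plain, uncompressed TAR.
tar_bytes = gzip.decompress(tar_gz_bytes)

# Step 2: "untar" the result ("r:" means: expect no compression).
with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as archive:
    two_step = archive.extractfile("src/main.c").read()

# Both steps in one go, comparable to tar's z option on the command line.
with tarfile.open(fileobj=io.BytesIO(tar_gz_bytes), mode="r:gz") as archive:
    one_step = archive.extractfile("src/main.c").read()

print(two_step == one_step)
```

Swapping "gz" for "bz2" or "xz" in the mode strings gives the .tar.bz2 and .tar.xz variants; the TAR layer underneath is unchanged.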
modified 21-May-19 4:10am.
|
|
|
|
|
The decompression can be combined, but the format is indeed packed twice, as .tar.gz, .tar.bz2 or such.
This is à la Unix, where small operations are combined into one large operation.
Its advantage here: tar (tape archive) concatenates all files, and the ensuing gz compression can do a "far" better compression over all content. As opposed to .zip compression.
|
|
|
|
|
Let's not forget that a fundamental thought in Unix was small, individual programs strung together through pipes and stdin/stdout redirection.
So:
find . -print | grep "draw" | grep "\.c$" | tar -c -T - -f - | gzip -c > temp.tar.gz
is an elaborate (albeit inefficient) way of finding all files in the current directory and all subdirectories whose names end in ".c" and whose paths contain "draw", creating a tar archive of them, and gzipping that archive, with stdout redirected to a file named "temp.tar.gz". (The explicit "-f -" makes tar write the archive to stdout regardless of its built-in default.)
The point here is to demonstrate hooking together small programs through piping.
So, because they probably found that many tar files got gzipped, someone got the idea of building gzip into the tar command through "tar -czvf temp.tar.gz filelist" (the f has to come last in the option group so that it picks up the archive name). A good idea.
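For comparison, the same job done inside one program rather than a pipeline might look like the sketch below. The function name archive_matching and its parameters are hypothetical; it mirrors the two grep filters by matching on the path relative to the starting directory.

```python
import tarfile
from pathlib import Path


def archive_matching(root, out_path, substring="draw", suffix=".c"):
    """Write a .tar.gz of files under root whose relative path contains
    substring and ends with suffix; return the archived names."""
    root = Path(root)
    names = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob(f"*{suffix}")
        if path.is_file() and substring in path.relative_to(root).as_posix()
    )
    # "w:gz" does the tar and gzip steps in a single pass.
    with tarfile.open(out_path, "w:gz") as archive:
        for name in names:
            archive.add(root / name, arcname=name)
    return names
```

Calling archive_matching(".", "temp.tar.gz") would be the rough equivalent of the one-liner. It is more typing than the pipeline, which is rather the point of the Unix approach.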
|
|
|
|
|
|
Well yes. We support Wales, and whoever is playing against England.
Sent from my Amstrad PC 1640
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
F***ing troglodyte.
|
|
|
|
|
This was intended to be a humorous discussion, but I clearly failed in that regard. Based on the responses, I seem to have unintentionally touched a nerve. Since I can't simply delete the post, and I do believe in transparency, I'll leave it unedited below. Needless to say, I won't be collating responses. I'll just take my lumps and move on...lesson learned
--- Original Message ---
I've been lurking around CodeProject's Q&A forums a little bit lately and noticing a few trends. I thought it might be fun to share pet peeves...I'm coming down with a case of them lately.
I truly don't want to discourage anyone from posting questions, so please be nice, keep it general, and don't call out anyone by name. Remember we all started out needing to learn.
So, to get it started, here are some of my pet peeves. What are yours?
"it didn't work" - Help us. What was "it"? And, how did it fail to meet your expectations?
"nothing happened" - Really, nothing? Could you please be less specific?
"help urgently needed" - What makes it urgent? Should I rush to help?
Also...no quote here, but the purity police rush in, leave an opinion, and utterly fail to answer the question.
I am curious to hear your experiences. In a few days, if there is any interest, I'll try to collate and rank similar observations.
Also, what's the consensus? Should I simply edit this post to include the results? Or, should I make a new post with the results?
modified 16-Jun-18 14:31pm.
|
|
|
|
|
|
My biggest pet peeve is "Help = write my code for me".
Having said that, nobody who NEEDS to read your post is ever going to.
|
|
|
|
|
well, if it's just complaining about it: soapbox
if you have solutions you wish to propose: bugs & suggs
This internet thing is amazing! Letting people use it: worst idea ever!
|
|
|
|
|
I would like to respond to you, Eric, but I am not clear what you are talking about, and whether you are describing your own personal reactions, or describing what you have actually observed.
Rather than "lurking," why not lend a hand, and see how that goes ?
«... thank the gods that they have made you superior to those events which they have not placed within your own control, rendered you accountable for that only which is within your own control. For what, then, have they made you responsible? For that which is alone in your own power—a right use of things as they appear.» Discourses of Epictetus Book I:12
|
|
|
|
|
I just left a comment on a week-old question that I offered a solution for with no feedback at all from the OP...and it's not the first time. The only time I go to QA is usually on the weekends when it gets boring here.
An interesting question might be 'how many questions have you personally asked in a forum/qa?' I've been here for 11 years and have posted (without looking) maybe 2 questions. Why so few? It's rare that I can't find what I'm looking for without bothering a bunch of strangers who probably have better things to do. I also wouldn't want to discourage anyone from posting questions, but it should be done as a last resort. C'mon, everyone should know how google works in this day and age.
I don't know about others, but I stay away from homework questions. That's what instructors/aides/tutors are paid for.
"Go forth into the source" - Neal Morse
|
|
|
|
|
kmoorevs wrote: I've been here for 11 years and have posted (without looking) maybe 2 questions. Why so few?
Hi, this is not intended as a "trick" question: have you asked questions on StackOverflow during this time period?
I would interpret your few questions posted on QA here as being possibly correlated with:
1. you had a good technical education, or you self-educated with focus and direction. specifically: you learned the art/skill of debugging, and you learned some-kind-of-SOLID organizing principle for development.
2. you have natural analytic and logical problem-solving skills.
3. you have a work/study ethos that emphasizes self-reliance. you're willing to bear down, and work through frustration.
Hypothetically, I would say you are far from the average QA poster here.
cheers, Bill
|
|
|
|
|
BillWoodruff wrote: have you asked questions on StackOverflow during this time period
I don't have a membership at 'that other website'. Google sends me there quite a bit, and I find answers there, but I've never felt like actually joining.
As for the rest, you pretty much nailed it, but all those traits seem to be required for those in our field, or those looking to join it...or at least they were when I started. (PG or Pre-Google)
BillWoodruff wrote: Hypothetically, I would say you are far from the average QA poster here.
Point taken. Thanks Bill!
|
|
|
|
|
Eric Lynch wrote: "help urgently needed" - What makes it urgent? Should I rush to help?
If it is urgent, call 911. Anything else can wait until tomorrow.
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Fun stuff...I was checking out your articles. One of them is almost spot-on for a Q&A I recently tried to help with...getting a place name from a postal code. As with your article, I recommended GeoNames to the OP.
It would have saved me some time if I had simply posted a link to your article.
|
|
|
|
|
That was my first article on CodeProject; there's a more modern version by another member - GeoNames .NET WCF Client[^]
|
|
|
|
|
Eddy Vluggen wrote: "help urgently needed"
Google translates this to "I have to hand it in as my homework this morning".
|
|
|
|
|
Regarding your addendum I wouldn't take the responses to your post too much to heart. You may be a little innocent regarding the general tenor of the lounge - probably been too busy writing articles.
In response to your original post I might add that the answers given in Q&A are not always edifying.
Peter Wasser
"The whole problem with the world is that fools and fanatics are always so certain of themselves, and wiser people so full of doubts." - Bertrand Russell
modified 16-Jun-18 20:30pm.
|
|
|
|
|
Thank you, much appreciated. I promise I'm not taking it too much to heart.
I was (slightly) surprised that I upset folks with my post; that was not my intent. What can I say...I've got a weird sense of humor. At this point, I'm simply looking to move on.
You're completely correct, I enjoy writing articles much more. I think I'll stick to that in the future.
Ironically, I also enjoy helping in Q&A, when I've been able. I'll still do that as well.
|
|
|
|
|