|
I've got a SQL table that holds about 7MB of data. I have to pull all of it down, and some offshore users are complaining about the performance, so I thought I'd compress the big column. If I dump the entire column (400 rows) to a text file and run it through 7zip, it goes down to 200k, which I'm happy with. When I compress row by row I do get compression, but I end up with 900k of output. If I write all my rows out to separate .txt files, I get 4 or 5MB of .txt files. If I add them all to one 7z archive, the archive ends up at 200k as well, so I know my compression code is working. The question is: why is the result about 25% of the size when they are all added to one archive? Is it doing some larger-scale RLE or something when you have a bunch of files in the same archive?
|
|
|
|
|
Compression can only work when the data is statistically unbalanced; if all byte values and byte sequences had the same probability, there would be no way to get any compression.
Adding larger chunks of data is likely to result in more compression, as there is more of a statistical trend and hence more opportunity to compress.
Also, a lot of compression schemes have some overhead with a more or less fixed size (think of it as a dictionary describing the code words that will be used), so more data typically results in a relatively smaller overhead. Part of the rationale here is that for small amounts of data the compression ratio isn't all that relevant.
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
Well, let me give you more info then:
* table has ~400 rows
* schema is int, int, varchar(80), varchar(256), varchar(max), varchar(80), TimeStamp
- varchar(256) column is mostly NULL
- last varchar(80) column is generally NULL, but occasionally may have a < 16 char string in it...
- the varchar(max) column is what I am dealing with
* the varchar(max) column contains javascripts
* javascript size varies from about 700 bytes to 43k
* total size of all javascript written out to separate files is 4.8MB
* if I dump all those 400+ javascripts into a 7zip archive, the resultant archive is ~200k
* if I loop through the rows, compress each column value in memory and add up all the compressed sizes, I end up with 800k of data
I was expecting to end up with something around 200k.
I realize there is probably some header info included, but over 400 files that shouldn't account for a 600k difference.
That's why I was asking whether, if you throw a bunch of files into the same archive, it compresses all the files as one chunk instead of as individual files, which would let it compress "larger chunks" or whatever.
Trying to find a way to get rid of that extra 600k if possible.
|
|
|
|
|
SledgeHammer01 wrote: if you throw a bunch of files into the same archive it archives all the files as one chunk vs. individual files
I have never met a compressor that does that; in apps such as 7zip and WinZip, each file is compressed individually, and can be extracted on its own, even after deleting some or all of the others.
SledgeHammer01 wrote: if I loop through the rows and compress each column in memory ...
How do you compress that? Which classes and/or algorithms are involved?
Luc Pattyn [My Articles] Nil Volentibus Arduum
|
|
|
|
|
7zip has a "solid" option that does that
|
|
|
|
|
Luc Pattyn wrote: I have never met a compressor that does that; in apps such as 7zip and WinZip, each file is compressed individually, and can be extracted on its own, even after deleting some or all of the others.
So I'm kind of assuming that if I can compress 400 .js files into a single .7z archive, that compressing 400 .js files into 400 .7z files should be roughly the same size?
Luc Pattyn wrote: How do you compress that? which classes and/or algorithms are involved?
I downloaded the SDK from http://www.7-zip.org/sdk.html (the 9.22 version). I'm using the native C# version. This is the code I'm using to compress the text blocks.
int a = 0; // running total of uncompressed sizes
int b = 0; // running total of compressed sizes
foreach (SoftDataSet.tblScriptRow row in ds.tblScript.Rows)
{
    int i = row.Text.Length;
    a += i;
    System.Diagnostics.Debug.WriteLine("BEFORE: " + row.Text.Length);

    // A fresh encoder per row: every row is compressed with no knowledge of the others.
    SevenZip.Compression.LZMA.Encoder encoder = new SevenZip.Compression.LZMA.Encoder();
    SevenZip.CoderPropID[] propIDs =
    {
        SevenZip.CoderPropID.DictionarySize,
        SevenZip.CoderPropID.PosStateBits,
        SevenZip.CoderPropID.LitContextBits,
        SevenZip.CoderPropID.LitPosBits,
        SevenZip.CoderPropID.Algorithm,
        SevenZip.CoderPropID.NumFastBytes,
        SevenZip.CoderPropID.MatchFinder,
        SevenZip.CoderPropID.EndMarker
    };
    Int32 dictionary = 0x00800000; // 8 MB dictionary
    Int32 posStateBits = 2;
    Int32 litContextBits = 3;
    Int32 litPosBits = 0;
    Int32 algorithm = 2;
    Int32 numFastBytes = 128;
    string mf = "bt4";
    bool eos = false;
    object[] properties =
    {
        (Int32)(dictionary),
        (Int32)(posStateBits),
        (Int32)(litContextBits),
        (Int32)(litPosBits),
        (Int32)(algorithm),
        (Int32)(numFastBytes),
        mf,
        eos
    };
    encoder.SetCoderProperties(propIDs, properties);

    // Assumes row.Text is a string: MemoryStream needs bytes, so encode the text first.
    byte[] input = System.Text.Encoding.UTF8.GetBytes(row.Text);
    MemoryStream ms = new MemoryStream(input);
    MemoryStream msOut = new MemoryStream();
    encoder.WriteCoderProperties(msOut); // 5-byte LZMA properties header
    encoder.Code(ms, msOut, -1, -1, null);

    System.Diagnostics.Debug.WriteLine("AFTER: " + msOut.Position + " " + (msOut.Position) * 100 / i + "%");
    b += (int)msOut.Position;
}
System.Diagnostics.Debug.WriteLine("1: " + a + " 2: " + b);
If I use the above code to compress the single 4.5MB file, it does get down to 200k, like the real 7zip app.
But with the above code doing each row by itself, I get: 1: 4551167 2: 885162
So somehow it's picking up 685k of extra crap, or what?
EDIT: FYI, writing the properties to the output stream is only 5 bytes per row, so that accounts for only about 2k.
|
|
|
|
|
Thanks. I wasn't aware 7zip offered an API. I do see a dictionary and some "fast bytes", which tell me there is some overhead to be expected; a few KB wouldn't surprise me. I suggest you perform the little experiment I described in another post in this thread.
Luc Pattyn [My Articles] Nil Volentibus Arduum
|
|
|
|
|
I don't think it can be a few KB. The 946 byte script comes out as 377 bytes. 11771 -> 2431 bytes, etc. Weird. Very Weird.
|
|
|
|
|
Luc Pattyn wrote: I have never met a compressor that does that; in apps such as 7zip and WinZip, each file is compressed individually, and can be extracted on its own, even after deleting some or all of the others.
Hmm... you are mistaken, sir, unfortunately.
Had a brainstorm and did a simple test:
400 .js -> test.7z = 196353
200 .js -> test1.7z = 138912
200 (the other 200) .js -> test2.7z = 128506
So 400 files in one .7z is 196k, while the two 200-file .7z archives add up to 267418.
So I guess there is a global, archive-wide "shared dictionary" for all the files, like I thought. I guess it would be silly to have a dictionary for each file.
Just splitting it in half bloated it by 70k.
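The effect measured in this test can be reproduced without 7-Zip. What follows is only a sketch under stated assumptions: it uses DeflateStream from System.IO.Compression as a stand-in for the LZMA SDK encoder (so it runs with no extra references), and the scripts produced by BuildScripts are made-up placeholders, not the real table contents. It compresses every script on its own and sums the sizes, then compresses all of them as one stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class SolidVersusPerItem
{
    // Compress a buffer with Deflate (stand-in for LZMA) and return the compressed length.
    public static long CompressedSize(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }

    // Hypothetical rows: small scripts that share most of their text.
    public static List<string> BuildScripts(int count)
    {
        var scripts = new List<string>();
        for (int n = 0; n < count; n++)
        {
            scripts.Add(
                "function onLoad" + n + "() {\n" +
                "  var form = document.getElementById('form" + n + "');\n" +
                "  if (!form) { return false; }\n" +
                "  form.addEventListener('submit', function (evt) {\n" +
                "    evt.preventDefault();\n" +
                "    validateFields(form, " + n + ");\n" +
                "  });\n" +
                "  return true;\n" +
                "}\n");
        }
        return scripts;
    }

    // Row by row: each script gets its own compressor and starts from scratch.
    public static long SumOfSeparate(List<string> scripts)
    {
        long total = 0;
        foreach (string script in scripts)
        {
            total += CompressedSize(Encoding.UTF8.GetBytes(script));
        }
        return total;
    }

    // One stream: later scripts can reuse matches found in earlier ones.
    public static long OneStream(List<string> scripts)
    {
        var all = new StringBuilder();
        foreach (string script in scripts)
        {
            all.Append(script);
        }
        return CompressedSize(Encoding.UTF8.GetBytes(all.ToString()));
    }

    public static void Main()
    {
        List<string> scripts = BuildScripts(400);
        Console.WriteLine("sum of separate: " + SumOfSeparate(scripts));
        Console.WriteLine("one stream:      " + OneStream(scripts));
    }
}
```

With inputs that share a lot of text, the single stream should come out much smaller, because later scripts can refer back to earlier ones. The price is that a single row can no longer be decompressed on its own.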
|
|
|
|
|
Interesting. I'll investigate further when I get some free time.
Luc Pattyn [My Articles] Nil Volentibus Arduum
|
|
|
|
|
Suggestion:
take a realistic plain string, about 1000 chars long, and then:
1. compress it
2. concatenate to self 100 times, then compress it
Now compare the sizes, that should tell you about the overhead.
Luc Pattyn [My Articles] Nil Volentibus Arduum
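A minimal sketch of that experiment, under some assumptions: DeflateStream from System.IO.Compression stands in for the LZMA encoder so that it runs without the SDK, and BuildSample generates a hypothetical script-like string of roughly 1000 characters rather than a real row from the table.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class OverheadExperiment
{
    // Compress a buffer with Deflate (stand-in for LZMA) and return the compressed length.
    public static long CompressedSize(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }

    // Hypothetical sample: script-like lines appended until the text is about 1000 chars long.
    public static string BuildSample()
    {
        var sb = new StringBuilder();
        for (int n = 0; sb.Length < 1000; n++)
        {
            sb.AppendLine("function handler" + n + "(evt) { return validate(evt.target, " + (n * 37 % 101) + "); }");
        }
        return sb.ToString();
    }

    // The same text concatenated to itself the given number of times.
    public static string Repeat(string text, int times)
    {
        var sb = new StringBuilder(text.Length * times);
        for (int n = 0; n < times; n++)
        {
            sb.Append(text);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string sample = BuildSample();
        long once = CompressedSize(Encoding.UTF8.GetBytes(sample));
        long hundred = CompressedSize(Encoding.UTF8.GetBytes(Repeat(sample, 100)));

        Console.WriteLine("1 copy:     " + sample.Length + " chars -> " + once + " bytes");
        Console.WriteLine("100 copies: " + (sample.Length * 100) + " chars -> " + hundred + " bytes");
        Console.WriteLine("100 x (1 copy compressed) = " + (once * 100) + " bytes");
    }
}
```

If the 100-copy result is far below 100 times the single-copy result, most of the per-item cost is not a fixed header but the compressor having to relearn the same text every time.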
|
|
|
|
|
Hello everyone, I am using a ListView control from two threads. The second thread, which did not create the control, adds items to the ListView. When I call the Add() function to add the item, the call does not return; it looks as if the function has entered a loop. I changed the code to invoke a function through a delegate, as below:
ListViewItem listItem = new ListViewItem("mathematics");
if (lvSubjects.InvokeRequired)
{
    AddSubjectDelegate addSubject = new AddSubjectDelegate(AddListItem);
    addSubject.AddListItem(addSubject, new ListViewItem[] {listItem});
}
'lvSubjects' is a ListView control. I have not typed all the code here, but 'AddSubjectDelegate' is a delegate type and 'AddListItem' is a method in the class. Even with the code implemented this way, the same problem occurs: when 'AddListItem' is called, the call to ListView.Items.Add() does not return.
Is there any solution to this, please? Thanks in advance.
|
|
|
|
|
Seems like you are doing something wrong then.
Here is an article that should help you out.
Luc Pattyn [My Articles] Nil Volentibus Arduum
|
|
|
|
|
Not sure what you're doing wrong - but plainly something!
This test code works and adds "mathematics" to the ListView twice, once from the UI thread and once from another thread by calling Invoke on the ListView :
using System.Threading;
using System.Windows.Forms;

public delegate void AddSubjectDelegate(ListViewItem item);

public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
        StartAddItem();
        new ThreadStart(StartThread).BeginInvoke(null, null);
    }

    private void StartThread()
    {
        StartAddItem();
    }

    private void StartAddItem()
    {
        ListViewItem listItem = new ListViewItem("mathematics");
        AddSubjectDelegate addSubject = new AddSubjectDelegate(AddListItem);
        if (lvSubjects.InvokeRequired)
        {
            lvSubjects.Invoke(addSubject, listItem);
        }
        else
        {
            addSubject(listItem);
        }
    }

    private void AddListItem(ListViewItem item)
    {
        lvSubjects.Items.Add(item);
    }
}
|
|
|
|
|
Hi everyone, I'm working in VS2010 and trying to create an app that interacts with Excel. I've added the Microsoft Excel 14.0 Object Library reference (COM tab), and at the top of my code I have the following:
using Microsoft.Office.Interop;
using Microsoft.Office.Interop.Excel;
using Microsoft.Office.Interop.Word;
I've tried to do:
var x1 = new Excel.Application();
and I get the error. My project targets .NET Framework 4. I've searched Google and tried what was suggested on a few other forums, but it still isn't working for me. Any ideas why?
|
|
|
|
|
pmcm wrote: var x1 = new Excel.Application();
try: var x1 = new Microsoft.Office.Interop.Excel.Application();
Why is common sense not common?
Never argue with an idiot. They will drag you down to their level where they are an expert.
Sometimes it takes a lot of work to be lazy
Individuality is fine, as long as we do it together - F. Burns
|
|
|
|
|
Your suggestion worked, but instead of doing that every time throughout my code, I've set this up:
using Excel = Microsoft.Office.Interop.Excel;
Any idea why my project doesn't seem to be picking up the references that I added?
Thanks
|
|
|
|
|
This is the correct way to do it, as described here.
Unrequited desire is character building. OriginalGriff
I'm sitting here giving you a standing ovation - Len Goodman
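A small sketch of why the alias works, using System.Text in place of the Excel interop assembly so it compiles without any COM reference (the alias name Text and the class AliasDemo are made up for illustration). A plain using directive imports the types of a namespace but not its child namespaces, whereas a using alias gives a whole namespace a short name that can then be used as a prefix.

```csharp
using System;                 // imports types in System, but does NOT make "Text.StringBuilder" resolvable
using Text = System.Text;     // alias, analogous to: using Excel = Microsoft.Office.Interop.Excel;

public static class AliasDemo
{
    public static string Build()
    {
        // "Text" is the alias declared above, so this resolves to System.Text.StringBuilder.
        var sb = new Text.StringBuilder();
        sb.Append("alias ");
        sb.Append("works");
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

Removing the alias line while keeping `using System;` should make `Text.StringBuilder` fail to compile, which mirrors what happens with `new Excel.Application()` when only `using Microsoft.Office.Interop;` is present.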
|
|
|
|
|
That doesn't work in .NET 2.0+, since you can't prefix a class name with a "part" of the namespace like that.
Bastard Programmer from Hell
|
|
|
|
|
OP is using .NET 4.0.
Unrequited desire is character building. OriginalGriff
I'm sitting here giving you a standing ovation - Len Goodman
|
|
|
|
|
Shouldn't work in 4.0 either, but I'm too lazy to give it a try right now.
Bastard Programmer from Hell
|
|
|
|
|
Eddy Vluggen wrote: Shouldn't work in 4.0 either
I've used it in 3.0 and it worked fine. Did you read the linked article I referred to?
Unrequited desire is character building. OriginalGriff
I'm sitting here giving you a standing ovation - Len Goodman
|
|
|
|
|
That tells us that you have probably used VB to implement the example, as C# doesn't allow prefixing the class name with a partial namespace:
' VB.NET: compiles
Namespace Mine.Test
    Public Class SomeClass
        Public Property P As Guid
    End Class
End Namespace
--
Imports ScratchVb.Mine
Module Module1
    Sub Main()
        Dim X As Object = New Test.SomeClass()
    End Sub
End Module
// C#
using System;
namespace Mine.Test
{
    class SomeClass
    {
        public Guid P { get; set; }
    }
}
--
using System;
using Mine;
namespace Scratch
{
    class Program
    {
        static void Main(string[] args)
        {
            // Does not compile: "using Mine;" does not bring the nested namespace Test into scope.
            // Object X = new Test.SomeClass();

            // Compiles: the namespace is fully qualified.
            Object X = new Mine.Test.SomeClass();
        }
    }
}
Yes, read the article some time ago. Did you try it?
Bastard Programmer from Hell
|
|
|
|
|
Eddy Vluggen wrote: That tells us that you have probably used VB to implement the example, as C# doesn't allow a prefix of the classname with a partial namespace;
Wrong on both counts.
Eddy Vluggen wrote: Yes, read the article some time ago. Did you try it?
Yes, I tried it, using C#, as (i) I never use or have used VB/VB.NET and (ii) the title of the article is "How to automate Microsoft Excel from Microsoft Visual C#.NET"!
Unrequited desire is character building. OriginalGriff
I'm sitting here giving you a standing ovation - Len Goodman
|
|
|
|
|
Richard MacCutchan wrote: Yes I tried it, using C#
Never mind
Bastard Programmer from Hell
|
|
|
|