|
The open_quantity is never a search criterion. Its value is zero 99.99% of the time, and we never search by open_quantity or order_type; if we did, the order_id and date would be included as well. Have you ever inserted a large set of data into a table with a clustered index, where the inserted data had to be physically stored in the middle of the table because of the index? It's quite ugly.
I'm not suggesting eliminating the clustered index, but there's no reason the data need be physically stored in order of open quantity and order type.
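Something like this is what I mean; the table and column names are just guesses from the description above, and DROP_EXISTING assumes the new index keeps the old one's name:
SqlCommand cmd = new SqlCommand(
    @"CREATE CLUSTERED INDEX IX_orders_id_date
          ON orders (order_id, order_date)
          WITH (DROP_EXISTING = ON);    -- cluster on the ever-increasing key, so inserts append at the end
      CREATE NONCLUSTERED INDEX IX_orders_openqty_type
          ON orders (open_quantity, order_type);    -- only if those searches ever matter", con);
cmd.ExecuteNonQuery();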
"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!"
— Hunter S. Thompson
|
You are right, there's not really a good reason to store that type of data using a clustered index. Still, I don't see that anything you've come across would fix the problem you described. Would inserting a value with a badly chosen clustered index cause all the rows in the table to be moved, so that the physical location on the hard drive is strictly enforced (i.e., so rows will be ordered on the HD by the clustered index)? Or is SQL smart enough to insert the data without rearranging it (i.e., by using a tree-like structure)?
|
Clustered indexes physically store the rows in the order defined by the index, as far as I know. It can be quite taxing on resources, so I'm not sure what logic they use for it.
"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!"
— Hunter S. Thompson
|
See here. Basically, the row data is stored with the clustered index, which is stored as a B-tree, so insertion is not an expensive operation (i.e., this should not be the cause of your problem).
|
Thanks, that is what I suspected. Looks like the row data is stored with the clustered index. Since the indexes, including the clustered index, are stored as B-trees, there should indeed be no huge performance hit (i.e., all the data isn't going to have to be reorganized on disk when a row is inserted into the middle of the clustered index). The only time a performance hit would occur would be during a reindexing of the table (and even then, the hit would turn an O(1) operation into an O(x) one, and since x is effectively constant, that reduces back to O(1)). Since reindexes shouldn't be occurring very often, that should not cause the problem described by the OP.
|
What happens is the pages fill up and then get split as records are inserted in the middle. Splitting pages is significantly slower than just appending records at the end. It's not the O(whatever) performance that hurts, but the big-ass constant of rearranging data on a disk.
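One common way to take the edge off the splits, for what it's worth (a sketch only; the index name is hypothetical), is to rebuild the index with free space left on each leaf page:
SqlCommand cmd = new SqlCommand(
    "ALTER INDEX IX_orders_openqty_type ON orders REBUILD WITH (FILLFACTOR = 80);", con);
cmd.ExecuteNonQuery();  // ~20% of each leaf page stays empty, so mid-table inserts split far less often, at the cost of a bigger index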
|
A page is 8KB. That's not going to take long to move around. The time to split a page also averages out in the long run to O(1). Say a page holds 50 records: it may split on a single record insertion, but the other 49 insertions would not cause a split, so the average time per insertion is O(1).
Now, if SQL sorts data within the page on each insert, that may slow things down a bit, as the entire page would essentially be rewritten each time randomly ordered data got inserted. I'm not sure exactly how it works at that low a level. Even in this case, though, SQL should be able to optimize things. For example, if records are inserted in batches, adjacent records need not cause multiple sorts within a page: SQL can figure out the positions within the page and sort all the data at once before writing it to disk. And in the case where many small insertions are performed, those individual inserts shouldn't take long enough to slow other things down. Of course, transaction locks and such could bork things up, but that's a different issue.
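To put a number on the amortized claim: if a leaf page holds $k$ rows, a split moves roughly $k/2$ of them, and each half can then absorb about $k/2$ further inserts before it splits again. So $n$ inserts trigger at most about $2n/k$ splits, i.e. $(2n/k)\cdot(k/2) = n$ moved rows in total, which is $O(1)$ amortized moves per insert; that's the "49 out of 50" argument above, just with the constant made explicit.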
|
It all depends. If the entire table can fit inside the memory of your SQL Server and there isn't a lot of contention, then it doesn't really matter one way or the other. If you have 32 GB of memory in your server and 50 databases holding about 1 TB of data, then it makes the difference between inserting 5 million records in a few minutes or in a few days.
On another server it made the difference between inserting 35,000 records a second for 22 hours straight and starting to slow down after a few hours.
|
Andy Brummer wrote: inserting 35,000 records a second for 22 hours straight
Inserting nearly 3 billion records is going to take a while no matter what you do.
Andy Brummer wrote: it makes the difference between inserting 5 million records in a few minutes or in a few days.
I don't think it would cause a 1,000x slowdown. 10x, maybe... but 1,000x? That doesn't sound right.
|
I am only speaking from experience here: inserting large amounts of data at a high rate into the middle of a table with a seemingly random clustered index can really hurt performance. I've done it in the past, at a rate of ~10-30k rows per second on a table of ~100m to 1b rows. With a clustered index that didn't match the order the data arrived in, queries became nearly unusable, or at the very least lengthy; however, doing a time-series based search on a table with a time-series clustered index was fine and allowed for a realtime monitoring solution.
"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!"
— Hunter S. Thompson
|
Well, the clustered index you talked about is certainly not ideal and is less performant than it could be. I suppose the magnitude of the problem depends on the exact amount of data you are using and the frequency of inserts/updates/reads/deletes. I guess the "ultimate" fail here would be that the others you work with didn't find the problem sooner.
|
A lot of the time has to do with speculative reads missing, and the random read/write access on the disks pushing them from <10ms access times to over 100ms. SQL Server optimizes its accesses for overall throughput, and that can drive latency to ridiculous levels. As much as it doesn't sound like it should be a big deal, I can tell you from real-world experience that it is.
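A ballpark sanity check, using assumed textbook spinning-disk numbers rather than anything measured on those servers: at $10\,\mathrm{ms}$ per random seek a single spindle sustains roughly $100$ random IOs per second, so if every insert dirties one random $8\,\mathrm{KB}$ page you are capped near $100 \times 8\,\mathrm{KB} \approx 0.8\,\mathrm{MB/s}$, versus tens of MB/s for sequential appends. That is a factor of $100$ or so before any cache effects, the same order as the "few minutes vs. a few days" gap above.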
|
I don't know why this was downvoted; hopefully I corrected it enough.
"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!"
— Hunter S. Thompson
|
I'm not saying exactly where this came from (to protect the guilty), but it was a question on this very site. (I have also changed the field and variable names)
A code fragment you may appreciate:
float result = 0F;
SqlCommand cmd = new SqlCommand("Select SUM(myField) From myTable Where myOtherField = 'Value'", con);
if (cmd.ExecuteScalar().ToString() != "0" && cmd.ExecuteScalar().ToString() != "")
result= float.Parse(cmd.ExecuteScalar().ToString().Trim());
Three lines of code; how many don't-do-its can you spot?
Two extra database accesses
Three unnecessary int-to-string conversions
One unnecessary Trim operation (with the output guaranteed to equal the input)
One unnecessary Parse operation
Six unnecessary string creations
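For reference, a minimal single-round-trip sketch; note that SUM over zero matching rows comes back as DBNull rather than a CLR null, which is the case actually worth guarding:
object raw = cmd.ExecuteScalar();               // one database access instead of three
float result = (raw == null || raw == DBNull.Value)
    ? 0F
    : Convert.ToSingle(raw);                    // no string round-trip, no Trim, no Parse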
Real men don't use instructions. They are only the manufacturer's opinion on how to put the thing together.
Digital man: "You are, in short, an idiot with the IQ of an ant and the intellectual capacity of a hose pipe."
|
I think I work with this guy.... Just kidding, but I do know a developer who insisted on always doing this sort of thing. He was like a little kitten. He was always compelled to use them.
I wasn't, now I am, then I won't be anymore.
|
Marcus Kramer wrote: I think I work with this guy
If you do, could you give him a slap round the back of the head, and tell him it's from me?
Real men don't use instructions. They are only the manufacturer's opinion on how to put the thing together.
Digital man: "You are, in short, an idiot with the IQ of an ant and the intellectual capacity of a hose pipe."
|
You have no idea how much I would like to accommodate you on this one, but I'm not ready to retire quite yet and I don't think I could stop at just one, so I can't make it look like an accident.
I wasn't, now I am, then I won't be anymore.
|
You know you want to... 5!
Real men don't use instructions. They are only the manufacturer's opinion on how to put the thing together.
Digital man: "You are, in short, an idiot with the IQ of an ant and the intellectual capacity of a hose pipe."
|
...so how would you write that bit of code?
OriginalGriff wrote: Three unnecessary int-to-string conversions
There are 0 int-to-string conversions in the code you posted
OriginalGriff wrote: One unnecessary Parse operation
Need to get it to a float somehow, and parse will only take a string. You could use ...
(float)Convert.ToDouble(cmd.ExecuteScalar());
...but not sure how good that is for performance, or perhaps take the chance of a straight cast (but could be risky)...
float result = (float)cmd.ExecuteScalar();
EDIT:
What deserves a vote of 1? The fact that somebody failed to note that ExecuteScalar returns an object, not an int? Or the fact that people don't like other people not being in complete agreement?
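A note on the straight cast above, since that is where the risk actually lives: unboxing must match the runtime type exactly, and T-SQL's SUM typically hands back float(53), which arrives as a boxed CLR double. A sketch:
object raw = cmd.ExecuteScalar();
// (float)raw throws InvalidCastException on a boxed double even when the query succeeds;
// convert via the runtime type instead of unboxing straight to float.
float result = (raw is double) ? (float)(double)raw : 0F;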
Illogical thoughts make me ill
|
More like this. I didn't do any tests, so I'm sure it can still be improved:
var cmd = new SqlCommand("Select SUM(myField) From myTable Where myOtherField = 'Value'", con);
var resultString = cmd.ExecuteScalar().ToString();
var result = (!string.IsNullOrEmpty(resultString) && !resultString.Equals("0"))
? (float)System.Convert.ToDouble(resultString)
: 0F;
|
get rid of the vars and then we can talk
Certainly an improvement, though you would be wasting your time checking for resultString.Equals("0") only to end up setting it to 0 anyway; also, Convert.ToDouble() will return 0 for a null object anyway. The problem is if the result is non-numeric. That's why I would lean more towards a (Try)Parse:
SqlCommand cmd = new SqlCommand("...", con);
float result = 0;
float.TryParse(cmd.ExecuteScalar().ToString(), out result);
return result;
I am not saying this is the best way, but it covers null, empty and non-numeric values
EDIT: as I have correctly been corrected, I should test for null before using ToString(); a shameful mistake...
SqlCommand cmd = new SqlCommand("...", con);
float result = 0;
float.TryParse(cmd.ExecuteScalar() as string, out result);
return result;
GETTING THERE...
SqlCommand cmd = new SqlCommand("...", con);
float result = 0;
object cmdResult = cmd.ExecuteScalar();
if(cmdResult != null)
float.TryParse(cmdResult.ToString(), out result);
return result;
...of course _Erik_ has the better (and shorter) code
Don't vote my posts down just because you don't understand them - if you lack the superior intelligence that I possess then simply walk away
modified on Wednesday, March 2, 2011 7:10 AM
|
I was copying the logic of the OP; like I said, I hadn't tested anything. But reducing the DB accesses from 3 to 1 will make a HUGE performance gain.
PS: 'var' is gold. I use it everywhere since my variable names are clear, and I refactor a lot. And not having to change types saves a lot of time.
|
Yep, I agree: reducing the DB access is definitely the best performance optimisation.
GibbleCH wrote: I use it everywhere since my variable names are clear
That's fair enough; I just prefer having the type declared at the start, as it makes for easier scanning IMO.
Don't vote my posts down just because you don't understand them - if you lack the superior intelligence that I possess then simply walk away
|
musefan wrote: float.TryParse(cmd.ExecuteScalar().ToString(), out result);
That will throw a null reference exception if ExecuteScalar returns null.
|