|
JChrisCompton wrote: I know (or at least 'think') Specify Fill Factor for an Index | Microsoft Docs[^]
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Yes, it says "The fill-factor option is provided for... [when] an index is created or rebuilt..."
"I think" was intended to mean "I think fill-factor has no influence except during create/rebuild".
The issue that concerns me is page splits.
From memory of an MSFT SQL Server Performance Tuning class I took five years ago, I think tables with a clustered index are stored as a B+ tree with doubly linked leaf nodes (class and school were long ago :-).
When an insert happens, and the bottom node of the tree is full, then a split happens.
When SSvr splits a node, my understanding is that it just makes a new node - the fill factor doesn't enter into it. If I specify my fill factor as "only fill 1/3" it doesn't matter - a single new node is created (instead of two extra nodes being created and each adjusted to 1/3 full). This is done without rebalancing, because it is the fastest solution.
Is my mental model right, or are there things in newer versions that make 'real life' different?
(Maybe some auto-tuning feature I've missed.)
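For reference, this is the sort of statement I mean when I say fill factor only applies at create/rebuild time; the index and table names here are just placeholders, and the DMV query is one way to check how full the leaf pages actually end up:
-- Fill factor is only honored when the index is created or rebuilt
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80);
-- Afterwards, the leaf-level pages should be roughly 80% full
SELECT index_id, avg_page_space_used_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'DETAILED')
WHERE index_level = 0;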
|
|
|
|
|
I never did trees in school.
JChrisCompton wrote: The issue that concerns me is page splits. From the page I linked: "When an index is created or rebuilt, the fill-factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth."
"How do you avoid them? There are strategies for doing this, one such strategy is to use a low fill factor when you create and index, but that will be at the expense of read operations and because you usually have more reads to writes then that’s bad."
What is a page split? What happens? Why does it happen? Why worry? - Tony Rogerson's ramblings on SQL Server[^]
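If you want to watch it happen, one rough way (a sketch, and server-wide rather than per index) is the cumulative page-split counter:
-- Cumulative number of page splits since the instance started (despite the "/sec" name)
SELECT cntr_value AS page_splits
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page Splits/sec'
AND object_name LIKE '%Access Methods%';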
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Thanks for the article reference, it is good reading.
It is great background and wonderfully answers "what is a page split."
I do wonder, however, how much of that has changed in SSrv 2008+. I thought, for example, that VarChar columns in SQL Server can be moved off to a different storage location to prevent splits due to a column changing its size. (Row overflow? Or is that just for blobs?)
I didn't find anything to corroborate my thought when I did a quick search, so if someone could let me know whether I'm on or off base in the previous paragraph it would be great. Just specifically about VarChar (not text/blob; that's a different world I'm rarely in).
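For the record, this is the sort of check I was thinking of (the table name is hypothetical); ROW_OVERFLOW_DATA allocation units are where variable-length columns land when a row no longer fits in-row:
SELECT p.index_id, au.type_desc, au.total_pages, au.used_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
ON au.container_id = p.hobt_id
AND au.type IN (1, 3) -- 1 = IN_ROW_DATA, 3 = ROW_OVERFLOW_DATA
WHERE p.object_id = OBJECT_ID('dbo.Orders');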
|
|
|
|
|
JChrisCompton wrote: It is great background and wonderfully answers "what is a page split." Better than I could in a single post, and it is nice to have an actual example that you can copy/paste to try for yourself and see.
JChrisCompton wrote: I didn't find Let me share some more bookmarks:
How are varchar values stored in a SQL Server database? - Stack Overflow[^] - first answer. I had already forgotten about the option to compress those fields.
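If you want to try that option, it is specified when the table or index is (re)built, something like this (table and index names are placeholders):
-- Row or page compression is applied at rebuild time
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);
ALTER INDEX IX_Orders_Customer ON dbo.Orders REBUILD WITH (DATA_COMPRESSION = ROW);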
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Thanks again. I believe the term I was looking for is "off-row pages".
Here's a recent (Feb 2018) link that talks about it briefly, then takes the undocumented DBCC IND command really deep into the RowID and beyond: SQL Server Row Data Linking to Off Row Data[^], in case you're interested.
I enjoyed it, but got a little lost.
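In case anyone wants to follow along, the commands the article leans on look roughly like this (they are undocumented, so the syntax may vary by version; the database, table, and page numbers here are placeholders):
DBCC IND('MyDatabase', 'dbo.Orders', 1); -- list the pages belonging to index_id 1 (the clustered index)
DBCC TRACEON(3604); -- send DBCC PAGE output to the client instead of the error log
DBCC PAGE('MyDatabase', 1, 12345, 3); -- file 1, page 12345, print option 3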
|
|
|
|
|
Cool, thanks for the link
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
JChrisCompton wrote: The index isn't just for lookup speed, it is a unique index (same functionality as a unique constraint unless I'm missing something... I'm more of an 'app dba' than a 'real DBA') to enforce referential integrity because... referential integrity.
You're not missing anything, but you need to differentiate the purpose.
A key (unique or primary) needs a unique index for its function, and while it's enough to create a unique index to enforce the uniqueness of a column (or columns), you should really create a unique (or primary) key to show the intent, if nothing else.
From the help file: "Creating a unique index guarantees that any attempt to duplicate key values fails. There are no significant differences between creating a UNIQUE constraint and creating a unique index that is independent of a constraint. Data validation occurs in the same manner, and the query optimizer does not differentiate between a unique index created by a constraint or manually created. However, you should create a UNIQUE constraint on the column when data integrity is the objective. This makes the objective of the index clear." (emphasis mine.) There's also another slight difference: on a unique index you can actually add the option "Ignore duplicate values", which is not allowed for a constraint.
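To illustrate the two forms (table and column names here are made up):
-- Constraint: states the intent (data integrity); SQL Server creates the backing unique index for you
ALTER TABLE dbo.Customers
ADD CONSTRAINT UQ_Customers_Email UNIQUE (Email);
-- Stand-alone unique index: same enforcement, but options like IGNORE_DUP_KEY are only available here
CREATE UNIQUE INDEX UX_Customers_Email
ON dbo.Customers (Email)
WITH (IGNORE_DUP_KEY = ON);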
JChrisCompton wrote: So I came up with the idea of a Clustered Identity as the 'physical sorting field'
This is a good idea, but it would need to be the primary key in that case.
Which doesn't matter at all, from a relational point of view there's no difference between a primary key and a unique key. It's only about how the data is stored in the case of clustered tables and magnetic tapes.
JChrisCompton wrote: Someone mentioned the volume being low. Yes, last month was 3,000 but this will be expanded from 1 to 5 categories. This could cause 3,100 entries next month or 31,000 entries
To put it in perspective: at my job we insert approximately 300,000 rows every week into a clustered table with 300 columns and a semi-random natural key. It takes a couple of minutes.
|
|
|
|
|
Just so you know (in the event that I ask a question here again), those inserts would be against the same database server, which supports thousands of real-time users. I was experimenting because it seemed needlessly inefficient to have a clustered char(36) as the key.
Turns out that all my questions may be moot (aside from my own learning).
For review, exec sp_helpindex <table name> gives the following description:
Table #1:
- clustered, unique, primary key located on PRIMARY (the guid column)
Table #2:
- clustered located on PRIMARY (the int column)
- nonclustered, unique, primary key located on PRIMARY (the guid column)
There is no detectable difference in insert speed when the table size is <100k rows.
There is a difference, but it seems trivial, when the table contains more than two million rows (a 14-second difference when inserting 300k rows into each table).
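For anyone who wants to reproduce it, the two layouts look roughly like this (a simplified sketch; the column names are made up and the real tables obviously have more columns):
-- Table #1: the guid is the clustered primary key
CREATE TABLE dbo.Table1 (
    guid_col char(36) NOT NULL CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED,
    payload varchar(100) NULL
);
-- Table #2: clustered on an int identity, guid is a nonclustered primary key
CREATE TABLE dbo.Table2 (
    int_col int IDENTITY(1,1) NOT NULL,
    guid_col char(36) NOT NULL CONSTRAINT PK_Table2 PRIMARY KEY NONCLUSTERED,
    payload varchar(100) NULL
);
CREATE CLUSTERED INDEX CX_Table2 ON dbo.Table2 (int_col);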
P.S.
Love the magnetic tape reference.
P.P.S.
> a clustered table with 300 columns and a semi-random natural key
Yikes!
|
|
|
|
|
The difference is bigger than you might think.
On table #1 you have one index, and on table #2 you have two indexes.
Insert performance is basically linear with the number of indexes[^].
So the fact that it doesn't take twice as long to do the inserts on table #2 means that your idea was quite correct after all. And this is on a narrow table; on a wide clustered table the difference would be bigger.
|
|
|
|
|
One has twice the indices... well duh.
I should have thought of that... but I didn't
Thanks for sticking with me until I got it!
|
|
|
|
|
Hi there,
Below is an Oracle 'EXCEPTIONS INTO' statement I'd like to replicate in SQL Server. Is there an equivalent type of statement in SQL Server?
ALTER TABLE tablename ADD (
PRIMARY KEY(a,b)
EXCEPTIONS INTO exceptionTable);
TIA
|
|
|
|
|
There isn't, AFAIK. If you want to insert, you wrap it in a transaction and insert. Roll back if there's any conversion error, fix the error, and insert again. No exception table.
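Roughly like this (a minimal sketch with made-up table names):
BEGIN TRY
    BEGIN TRANSACTION;
    INSERT INTO dbo.TargetTable (a, b)
    SELECT a, b FROM dbo.StagingTable;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    -- inspect the error, fix the offending rows, then run the insert again
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;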
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Thanks Eddy.
|
|
|
|
|
hi,
I get an error:
The multi-part identifier "e.ent_name" could not be bound.
when trying to access a field from a temp table in a sub-query.
declare @entities table(ent_id int, ent_name varchar(50), startDT datetime)
insert into @entities
select ent.ent_id, ent_name, prod_job.act_start_time_local
from job as prod_job
inner join ent as ent on prod_job.run_ent_id = ent.parent_ent_id
where prod_job.wo_id = 'B.100077779'
select * from @entities
select * from @entities as e
inner join
(
select top(1) act_finish_time_local, oper_id from job as cleaningJob
where cleaningJob.oper_id like '%'+ e.ent_name +'%'
order by act_finish_time_local desc
)
as j on j.act_finish_time_local < e.startDT
How can I fix this?
Thanks
|
|
|
|
|
|
Reverse your operation:
Populate the @Entities table as you are doing.
Change the select query:
Select from job,
inner join on @Entities, and apply the filter there - see the sketch below.
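Something along these lines (assuming the @entities table variable declared in your post is still in scope; this returns every matching cleaning job, so you'd still need TOP or CROSS APPLY if you only want the latest one per entity):
SELECT e.ent_id, e.ent_name, cleaningJob.oper_id, cleaningJob.act_finish_time_local
FROM job AS cleaningJob
INNER JOIN @entities AS e
    ON cleaningJob.oper_id LIKE '%' + e.ent_name + '%'
    AND cleaningJob.act_finish_time_local < e.startDT;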
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
|
Try using CROSS APPLY instead:
Making OUTER and CROSS APPLY work for you[^]
select * from @entities as e
CROSS APPLY
(
select top(1) act_finish_time_local, oper_id
from job as cleaningJob
where cleaningJob.oper_id like '%'+ e.ent_name +'%'
and cleaningJob.act_finish_time_local < e.startDT
order by act_finish_time_local desc
)
as j
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
hi all,
I am exporting the record set of a select statement into a CSV, but if the fields already have values containing commas, those commas go into the CSV as-is, which makes the import difficult. I want to escape the commas that come from the field values.
Any help would be greatly appreciated; thanks in advance, friends.
Thanks,
Abdul Aleem
"There is already enough hatred in the world lets spread love, compassion and affection."
|
|
|
|
|
You could use the BCP command with the -t option. See bcp Utility
And there are a lot of other ways; just google for something like "MSSQL export to .csv".
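If you'd rather handle it in the SELECT itself, one common approach is to quote the text fields so embedded commas survive, or pick a field terminator that never appears in the data (e.g. bcp ... -t"|"). A sketch, with placeholder table and column names:
SELECT
    '"' + REPLACE(CustomerName, '"', '""') + '"' AS CustomerName, -- wrap in quotes, double any embedded quotes
    '"' + REPLACE(Address, '"', '""') + '"' AS Address
FROM dbo.Customers;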
|
|
|
|
|
Hi all,
I am trying to get all the columns and rows of a CSV file using a select statement. When I try the query below, it retrieves all the rows and columns as one column and one row.
SELECT * FROM OPENROWSET(
BULK 'D:\Users\Abdul\Provider.csv',
SINGLE_CLOB) AS DATA;
I have tried in the following way
select *
into #T
from openrowset('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};
DefaultDir=D:\Users\Abdul;Extensions=csv;',
'select * from \Provider.csv') Test;
select *
from #T;
It's giving me the following error:
OLE DB provider "MSDASQL" for linked server "(null)" returned message "[Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified".
Msg 7303, Level 16, State 1, Line 11
Cannot initialize the data source object of OLE DB provider "MSDASQL" for linked server "(null)".
Can anybody please help me? Am I missing anything in the above query? Any help would be very much appreciated; thanks in advance.
Thanks,
Abdul Aleem
"There is already enough hatred in the world lets spread love, compassion and affection."
|
|
|
|
|
|
hi,
I don't have permissions to run OPENROWSET, and it needs a format file. I tried creating the format file, but for some reason it's not letting me do it.
I tried the same thing using an SSIS package; the problem is it's giving me the error below:
Error: 0xC0202055 at Data Flow Task 1, Source - ProviderDetails_csv [61]: The column delimiter for column "Column 18" was not found.
Error: 0xC0202092 at Data Flow Task 1, Source - ProviderDetails_csv [61]: An error occurred while processing file "C:\xxxxxxx\OPS\ProviderODS-ScriptsAndData\ProviderDetails.csv" on data row 542006.
Error: 0xC0047038 at Data Flow Task 1, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on Source - ProviderDetails_csv returned error code 0xC0202092. The component returned a failure code when the
But when I try to import using
if not exists (select * from sysobjects where name='Import_21Columns' and xtype='U')
CREATE TABLE Import_21Columns(
[Column 0] varchar(max) NULL,
[Column 1] varchar(max) NULL,
[Column 2] varchar(max) NULL,
[Column 3] varchar(max) NULL,
[Column 4] varchar(max) NULL,
[Column 5] varchar(max) NULL,
[Column 6] varchar(max) NULL,
[Column 7] varchar(max) NULL,
[Column 8] varchar(max) NULL,
[Column 9] varchar(max) NULL,
[Column 10] varchar(max) NULL,
[Column 11] varchar(max) NULL,
[Column 12] varchar(max) NULL,
[Column 13] varchar(max) NULL,
[Column 14] varchar(max) NULL,
[Column 15] varchar(max) NULL,
[Column 16] varchar(max) NULL,
[Column 17] varchar(max) NULL,
[Column 18] varchar(max) NULL,
[Column 19] varchar(max) NULL,
[Column 20] varchar(max) NULL
)
if not exists (select * from sysobjects where name='Import_18Columns' and xtype='U')
CREATE TABLE [Import_18Columns](
[Column 0] varchar(max) NULL,
[Column 1] varchar(max) NULL,
[Column 2] varchar(max) NULL,
[Column 3] varchar(max) NULL,
[Column 4] varchar(max) NULL,
[Column 5] varchar(max) NULL,
[Column 6] varchar(max) NULL,
[Column 7] varchar(max) NULL,
[Column 8] varchar(max) NULL,
[Column 9] varchar(max) NULL,
[Column 10] varchar(max) NULL,
[Column 11] varchar(max) NULL,
[Column 12] varchar(max) NULL,
[Column 13] varchar(max) NULL,
[Column 14] varchar(max) NULL,
[Column 15] varchar(max) NULL,
[Column 16] varchar(max) NULL,
[Column 17] varchar(max) NULL,
[Column 18] varchar(max) NULL
)
if not exists (select * from sysobjects where name='Import_4Columns' and xtype='U')
create table Import_4Columns (
[Column 0] varchar(max) null
, [Column 1] varchar(max) null
, [Column 2] varchar(max) null
, [Column 3] varchar(max) null
)
if not exists (select * from sysobjects where name='Import_2Columns' and xtype='U')
create table Import_2Columns (
[Column 0] varchar(max) null
, [Column 1] varchar(max) null
)
truncate table Import_21Columns
SET QUOTED_IDENTIFIER OFF
BULK INSERT Import_21Columns FROM 'D:\Users\Abdul\Provider.csv' WITH (FIELDTERMINATOR = '","', ROWTERMINATOR = '\n')
(I tried both FIELDTERMINATOR values, '","' and plain ',')
It's putting a couple of rows into one single column, which is not the expected behavior, and the OPENROWSET attempt gives me the linked-server error, which means the DBAs are not allowing developers to run certain SQL commands. Any help please.
Thanks,
Abdul Aleem
"There is already enough hatred in the world lets spread love, compassion and affection."
|
|
|
|
|
Hello every one,
I need to improve my SQL skills as much as I can
I know how to do the basic stuff, but when I try to do something a bit complex
I'm stuck.
Any ideas? Recommendations?
thanx a lot
Emily
|
|
|
|
|