|
Perhaps you missed my point.
You have a list of pointers. Nothing more. The pointer must be set to point to something. Where does that happen?
normga wrote: the code always query mysql successful
That doesn't mean anything. Code can run successfully, sometimes, even with uninitialized pointers. It depends on how the memory is laid down.
|
|
|
|
|
So I need to regularly update a table with data from another table.
The problem is that if I update the normal way I get a table lock on the target table for half an hour, which is frowned upon by the users. So I need to run the update in batches.
The other problem is that the ID sequence is having gaps in it. Larger gaps than the batch size.
At the moment I have this solution:
DECLARE
@LastID int = 0,
@NextID int,
@RC int = 1;
WHILE (@RC > 0)
BEGIN
SELECT TOP 5000
@NextID = s.id
FROM Source s
WHERE s.id> @LastID
ORDER BY s.id
;
UPDATE t
SET
FROM Source s
JOIN Target t ON t.id = s.id
WHERE s.id > @LastID
AND s.id <= @NextID
;
SET @RC = @@ROWCOUNT;
SET @LastID = @NextID ;
END
Which works just fine, but using two selects is getting under my skin.
Any better suggestions for how to do it?
|
|
|
|
|
How about something like:
DROP TABLE IF EXISTS #ProcessedIDs;
CREATE TABLE #ProcessedIDs (id int NOT NULL Primary Key);
DECLARE @RC int = 5000;
WHILE @RC = 5000
BEGIN
UPDATE TOP (5000)
T
SET
...
OUTPUT
inserted.id INTO #ProcessedIDs
FROM
Target As T
INNER JOIN Source As S
ON S.id = T.id
WHERE
Not Exists
(
SELECT 1
FROM #ProcessedIDs As P
WHERE P.id = T.id
)
;
SET @RC = @@ROWCOUNT;
END;
DROP TABLE IF EXISTS #ProcessedIDs;
NB: The DROP TABLE IF EXISTS syntax is new in SQL Server 2016. If you're using an earlier version, you'll need to use an alternative syntax[^].
The OUTPUT clause should work in SQL Sever 2005 or later.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Ah, yes!
I always tend to forget the output clause.
Thanks!
|
|
|
|
|
Using a WHERE NOT EXISTS turned out to be very slow because of the antijoin using index seeks for every row.
I changed it to WHERE ID > (SELECT ISNULL(max(ID),0) FROM @ProcessedIDs) which allows an index scan.
This is magnitudes faster than the original nonbatched update.
The question is how to use this with composite keys?
|
|
|
|
|
When you use the TOP clause with the UPDATE statement, there's no guarantee that the rows to update will be picked in any particular order. Using the MAX(id) option, you could end up missing rows.
I notice you've replaced the temporary table with a table variable. Was there a reason for that? IIRC, execution plans for table variables tend to assume they contain a very low number of rows, which might explain the poor performance.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Richard Deeming wrote: When you use the TOP clause with the UPDATE statement, there's no guarantee that the rows to update will be picked in any particular order. Using the MAX(id) option, you could end up missing rows.
I know, and you can't add an order by to an UPDATE or INSERT.
But you can put the SELECT with TOP and ORDER BY in a CTE.
Richard Deeming wrote: I notice you've replaced the temporary table with a table variable. Was there a reason for that?
No particular reason. I like to keep the scope as local as possible, so it's mostly a habit.
Richard Deeming wrote: IIRC, execution plans for table variables tend to assume they contain a very low number of rows, which might explain the poor performance.
Table variables don't have statistics, which obviously could affect the plan, but since all ID's are unique I don't think it would make a big difference in this case
But I will test it.
<edit>Oh, and table variables can't go parallell, which obviously can affect performance a lot in this case.</edit>
|
|
|
|
|
Done some testing now.
And as I suspected, there is no difference in either performance or plan as long as the temp table has one column with unique values.
Until the query goes parallel that is. Then the difference is quickly getting huge.
But as long as I'm batching the query it stays the same until the batch is big enough to go parallel (which happens between 10000 and 20000 rows in this case). But then I will also get a table lock.
And oddly enough, it also goes much slower when parallel until reaching 100000 rows per batch. I will do some more testing on this.
|
|
|
|
|
Jörgen Andersson wrote: The other problem is that the ID sequence is having gaps in it. That doesn't change the functionality, and since it should not be visible to the outside world it should not be a problem.
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
It was important to mention so that I don't get suggestions like WHERE ID BETWEEN @LastID AND @LastID + 5000
|
|
|
|
|
Add WITH(NOLOCK) to your selects and joins:
DECLARE
@LastID int = 0,
@NextID int,
@RC int = 1;
WHILE (@RC > 0)
BEGIN
SELECT TOP 5000
@NextID = s.id
FROM Source s WITH(NOLOCK)
WHERE s.id> @LastID
ORDER BY s.id
;
UPDATE t
SET
FROM Source s
JOIN Target t WITH(NOLOCK) ON t.id = s.id
WHERE s.id > @LastID
AND s.id <= @NextID
;
SET @RC = @@ROWCOUNT;
SET @LastID = @NextID ;
END
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
|
I didn't put nolock on the update statement - I put it on the join.
You could just create a job that does the monster update at night.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
Quote:
UPDATE t
...
FROM Source s
JOIN Target t WITH(NOLOCK) ON t.id = s.id
...
That NOLOCK hint is on the target table. It's exactly the same as the first example from the article I linked to:
UPDATE t1
SET t1.x = something
FROM dbo.t1 WITH (NOLOCK)
INNER JOIN ...;
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
We use WITH(NOLOCK) prolifically. Of course, we have indexes on all of our tables, and don't generally do massive updates in the middle of the work day. We have no issues.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
NOLOCK can cause nonclustered index corruption, and it's also deprecated[^].
|
|
|
|
|
That's something for our DBAs to worry about.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
Have you tried using a MERGE statement?
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
Yes.
The update in my OP is greatly simplified.
The actual update is a merge with more than 4000 rows.
|
|
|
|
|
I'm not too familiar with MongoDB. In our MongoDb we're currently using the default Guid keys.
However, I don't see any reason not to use Init PKs.
Our base Entity has the following
public class _EntityBase : _BindableBase
{
[BsonId(IdGenerator = typeof(GuidGenerator))]
[BsonRepresentation(BsonType.String)]
[DataMember]
public Guid Id { get; set; }
}
Any reason not to change it to ints?
public class _EntityBase : _BindableBase
{
[BsonId(IdGenerator = typeof(MongoDBIntIdGenerator))]
[BsonRepresentation(BsonType.Int64)]
public int Id { get; set; }
}
If it's not broken, fix it until it is.
Everything makes sense in someone's mind.
Ya can't fix stupid.
modified 17-May-19 14:47pm.
|
|
|
|
|
Because the key is generated on the client not the server.
So if you only have one client, ever, then no not much of a problem. But if you want to add another server then it starts to be a problem.
MongoDB integer primary key (nodejs example) using server side functions[^]
In general the sort of things that a int id in a SQL database which might be solved are not the same as for NoSQL (mongodb). For example the overhead for storing each document (instance) is already so high that and saving with an int is almost insignificant. And if it is a matter of retrieval speed then look to some other solution that would provide real performance rather than the small (if any) gain that this might produce.
|
|
|
|
|
jschell wrote: Because the key is generated on the client not the server.
So what if I created a PK table on the sever:
public int GetNextPrimaryKey(string collectionName)
{
lock(_lockObj)
{
IMongoCollection<PrimaryKeyEntity> pkCol = GetCollection<PrimaryKeyEntity>("PrimaryKeys");
PrimaryKeyEntity entity = pkCol.Find(x => x.CollectionName == collectionName).FirstOrDefault();
if (entity == null)
{
entity = new PrimaryKeyEntity
{
PrimaryKey = 1,
CollectionName = collectionName
};
pkCol.InsertOne(entity);
}
else
{
entity.PrimaryKey = entity.PrimaryKey + 1;
var filter = Builders<PrimaryKeyEntity>.Filter.Eq(x => x.CollectionName, collectionName);
var update = Builders<PrimaryKeyEntity>.Update.Set(x => x.PrimaryKey, entity.PrimaryKey);
var result = pkCol.UpdateOneAsync(filter, update).Result;
}
return entity.PrimaryKey;
}
}
If it's not broken, fix it until it is.
Everything makes sense in someone's mind.
Ya can't fix stupid.
|
|
|
|
|
Kevin Marois wrote: So what if I created a PK table on the sever:
In this scenario...
(some caller) -> (app server) -> (database server)
In the above the 'app server' is the 'client' for the 'database server'
If you create the key on the 'app server' then you have the following choices (locking in your code has no impact on this.)
1. There can only be one server.
2. You will need to create/find a mechanism the correlates the id creation between multiple 'app server' instances.
|
|
|
|
|
How to add a single data for multiple times using MySQL
|
|
|
|
|
At a guess, by writing some code that runs in a loop. But if you want a proper answer you will need to provide more information about what you are trying to do.
|
|
|
|