Click here to Skip to main content
15,867,968 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I have huge transactions table in azure database, where we import files with +1 million objects.

public class Transaction
   {
       [Key]
       public int Id { get; set; }
       public int TransactionId { get; set; }
       public DateTime Date { get; set; }
       public decimal Price { get; set; }
       public int UserId { get; set; }
       public string Product { get; set; }
       public int ClientId { get; set; }
       public int Uploaed { get; set; }
       public string UniqueId { get; set; }
       public string Custom1 { get; set; }
       public string Custom2 { get; set; }
       public string Custom3{ get; set; }

   }


after importing all new data I take all new transaction ids, and take all transaction ids for that client from database.

What I have tried:

// ids from import
        string transactionsString = string.Join(",", transactionIdsCsv);
        var result = await   _transactionsDataRepository.GetByTransactionIdsAndClientId(transactionIdsCsv.ToArray(), clientId);
        // ids from repository
        string transactionsDBString = string.Join(",", result.ToList());

        // remove rows in db where duplicate transactions ids and clientId=ClientId


but I am struggling to find the most effective way. I wanted to do something like delete from transactions `where transactionId IN (transactionsDBString) and clientId = ClientID ` but that would delete both values and I only want new value to be deleted (and old value to stay)

but would that be a good way? even fetching `var result = await _transactionsDataRepository... ` can take a lot of time since there are millions of rows.
Posted
Updated 3-Jun-20 2:37am

In SQL Server, you'd want something like:
SQL
WITH cteTransactions As
(
    SELECT
        Id,
        ROW_NUMBER() OVER (PARTITION BY ClientId, TransactionId ORDER BY Id) As RN
    FROM
        Transactions
)
DELETE
FROM
    T
FROM
    Transactions As T
    INNER JOIN cteTransactions As D
    ON D.Id = T.Id
    And D.RN > 1
;
ROW_NUMBER (Transact-SQL) - SQL Server | Microsoft Docs[^]

You can execute a raw SQL command in Entity Framework Core by using the context.Database.ExecuteSqlRaw extension method:
RelationalDatabaseFacadeExtensions.ExecuteSqlRaw Method (Microsoft.EntityFrameworkCore) | Microsoft Docs[^]
 
Share this answer
 
Comments
Maciej Los 3-Jun-20 6:46am    
Wow!
5ed!
Member 12885549 3-Jun-20 14:11pm    
Wow this is amazing solution! I was trying to write a procedure with merge but this beats everything!
Richard Deeming 3-Jun-20 14:13pm    
It leaves the value with the lowest ID.

If you want to leave the highest ID instead, change:
ORDER BY Id
to:
ORDER BY Id DESC
Member 12885549 3-Jun-20 14:27pm    
thank you! you are hero!
Well...
Quote:
after importing all new data I take all new transaction ids, and take all transaction ids for that client from database.
...
but I am struggling to find the most effective way. I wanted to do something like delete from transactions `where transactionId IN (transactionsDBString) and clientId = ClientID ` but that would delete both values and I only want new value to be deleted (and old value to stay)


I'd rather avoid of importing duplicate data. So, create a method/stored procedure which will insert only those data, which aren't duplicated. Something like:
SQL
INSERT INTO ExistingTable (...)
SELECT ...
FROM ImportedData IMDA
WHERE NOT EXISTS(
SELECT ...
FROM ExistingData EXDA
WHERE IMDA.ClientID = EXDA.ClientID AND IMDA.TransactionID = EXDA.TransactionID)


When you do that, you'll never be forced to delete duplicate data.

For further details, please see: EXISTS (Transact-SQL) - SQL Server | Microsoft Docs[^]

Good luck!
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900