Error/Fix: Cannot Insert Duplicate Key for UPSERT Code in SQL Server

I work with developers quite a lot. There are scenarios where an application receives data from various sources and pushes it to the database asynchronously over multiple sessions/connections.

When concurrency and transaction rates are high enough, UPSERT code blocks like the one below can fail with an error message like "Cannot insert duplicate key row in object dbo.person with unique index 'pk_person'", or, for a primary key:

Msg 2627, Level 14, State 1, Line 10
Violation of PRIMARY KEY constraint 'PK__person__3213E83F627AACAE'. Cannot insert duplicate key in object 'dbo.person'. The duplicate key value is (105).

Below is a typical UPSERT code block I see in application code:

begin tran

update dbo.person
set address = 'Address for '+name
where id = 105;

if @@ROWCOUNT = 0
	INSERT INTO dbo.person (id, name, address)
	values (105, 'Person_105', 'Address of Person_105');

while @@TRANCOUNT > 0
	commit tran;
go

IMPORTANT: The first solution that usually comes to mind is the MERGE statement. I advise against MERGE because of the many bugs associated with it, described in this blog.

The failure stems from the assumption that if @@ROWCOUNT is 0, it is 100% safe to insert a record with the same key. But we forget that what data is visible to a transaction depends entirely on the isolation level of the transaction.

In the code above, assuming Read Committed Snapshot Isolation for the transaction, in a fast, highly concurrent environment it can happen that, by the time we reach the INSERT statement of the UPSERT, a record with the same key has already been inserted by another concurrent session.

For example, for test purposes, we can introduce a one-minute delay between the UPDATE and the INSERT of the UPSERT code block, and perform the insert from another concurrent session during that delay.
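Whether this scenario applies depends on how the database is configured. As a quick check, the query below (a small sketch that only reads the sys.databases catalog view) shows whether Read Committed Snapshot is enabled for the current database:

```sql
-- Check the row-versioning settings of the current database
select  name,
        is_read_committed_snapshot_on,   -- 1 = READ COMMITTED uses row versioning (RCSI)
        snapshot_isolation_state_desc    -- state of SNAPSHOT isolation for this database
from    sys.databases
where   name = db_name();
```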

Session 01 –

-- create dummy table
-- create table dbo.person (id int primary key not null, name varchar(50) not null, address char(1000) not null);

begin tran

update dbo.person
set address = 'Address for '+name
where id = 105;

declare @rows int = @@ROWCOUNT; -- capture right after the UPDATE, before any other statement runs

waitfor delay '00:01:00'; -- do something for 1 minute just for POC

if @rows = 0
	INSERT INTO dbo.person (id, name, address)
	values (105, 'Person_105', 'Address of Person_105');

while @@TRANCOUNT > 0
	commit tran;
go

Session 02 –

INSERT INTO dbo.person (id, name, address)
values (105, 'Person_105', 'Address of Person_105');

If we execute the session 02 query while the session 01 query is still running, the session 02 query executes successfully under optimistic isolation levels like Read Committed Snapshot. When session 01 then reaches its INSERT, it fails with the duplicate key error shown above.

So, what are the solutions to this problem?

The first solution is to use a stricter isolation level such as Serializable, either for the whole session or through table hints. Serializable guarantees that, within a transaction, data that has already been read cannot be modified by other sessions. Even if the exact key is not present in the table, the key range in which it would fall is locked until the end of the current transaction, so no other session can insert that key in the meantime.

Sample code for this solution would look like this:

set transaction isolation level serializable;

begin tran

update dbo.person
set address = 'Address for '+name
where id = 105;

declare @rows int = @@ROWCOUNT; -- capture right after the UPDATE, before any other statement runs

waitfor delay '00:01:00'; -- do something for 1 minute just for POC

if @rows = 0
	INSERT INTO dbo.person (id, name, address)
	values (105, 'Person_105', 'Address of Person_105');

while @@TRANCOUNT > 0
	commit tran;
go

set transaction isolation level read committed;
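The table-hint variant mentioned above could look like the sketch below. It assumes the same dbo.person table, and the @rows variable is only an illustrative helper. The SERIALIZABLE hint (HOLDLOCK is an equivalent hint) requests key-range locking for this one statement, so the session's isolation level does not have to be changed and reset afterwards.

```sql
-- Sketch: UPSERT with key-range locking requested through table hints
declare @rows int;

begin tran;

update dbo.person with (updlock, serializable)
set address = 'Address for ' + name
where id = 105;

set @rows = @@ROWCOUNT;  -- capture before any other statement runs

-- The range lock taken by the UPDATE is held until commit,
-- so no other session can insert id = 105 in between.
if @rows = 0
    insert into dbo.person (id, name, address)
    values (105, 'Person_105', 'Address of Person_105');

commit tran;
```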

The key problem with the above solution is that there can be heavy blocking on the table if the ratio of new records is high. Inserts of new records can block each other even when their key values are different. This happens because Serializable locks a key range (the gap between existing keys) rather than a single key value, so all new keys that fall into the same gap compete for the same range lock.

Another solution, which is not so well known, is to use sp_getapplock and sp_releaseapplock. But this solution demands that inserts/updates to the table happen only within a defined set of procedures, or from known application sources, because the lock call is a code change that has to be applied to every INSERT or UPDATE on the table.

Below is a sample script for this second solution. It takes the application lock and runs the UPDATE; the commented-out blocks are meant to be run from other sessions while the transaction is still open, to observe which inserts get blocked.
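To observe this, the range locks can be listed from sys.dm_tran_locks. The sketch below is meant to be run from a second session, connected to the same database, while the serializable transaction is still open (for example during the WAITFOR). It filters on key-range lock modes, whose names start with "Range". Querying this view typically requires the VIEW SERVER STATE permission.

```sql
-- Run from a second session while the serializable transaction is still open
select  tl.request_session_id,
        tl.resource_type,
        tl.resource_description,
        tl.request_mode,      -- e.g. RangeS-U for a range lock taken by an UPDATE
        tl.request_status     -- whether the lock is held or being waited for
from    sys.dm_tran_locks as tl
where   tl.resource_database_id = db_id()
  and   tl.request_mode like 'Range%'
order by tl.request_session_id;
```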

BEGIN TRAN;
	declare @id int = 105;
	declare @resource nvarchar(255);

	set @resource = 'person_id-'+convert(varchar(255),@id);
	
	exec sp_getapplock @Resource = @resource, @LockMode = 'Exclusive';

	update dbo.person
	set address = 'Address for '+name
	where id = 105;

	select @@TRANCOUNT;

	/*	--- Insert Same Row in another Session ---------
		begin tran

			declare @id int = 105;
			declare @resource nvarchar(255);

			set @resource = 'person_id-'+convert(varchar(255),@id);
	
			exec sp_getapplock @Resource = @resource, @LockMode = 'Exclusive';

		-- Will it block new row insert in another session?
			INSERT INTO dbo.person (id, name, address)
			values (105, 'Person_105', 'Address of Person_105');

		commit tran
	*/

	/*	--- Insert Different Row in another Session ---------
		begin tran

			declare @id int = 101;
			declare @resource nvarchar(255);

			set @resource = 'person_id-'+convert(varchar(255),@id);
	
			exec sp_getapplock @Resource = @resource, @LockMode = 'Exclusive';

		-- Will it block new row insert in another session?
			INSERT INTO dbo.person (id, name, address)
			values (101, 'Person_101', 'Address of Person_101');

		commit tran
	*/

	exec sp_releaseapplock @Resource = @resource;

--waitfor delay '00:01:00';

while @@trancount > 0
begin
    COMMIT TRAN
	-- ROLLBACK TRAN
end

/*
-- create dummy table
create table dbo.person (id int primary key not null, name varchar(50) not null, address char(1000) not null);

truncate table dbo.person;

-- Insert 100 dummy records into dbo.person
WITH Numbers AS (
    SELECT TOP 100 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects  -- any sufficiently large table
)
INSERT INTO dbo.person (id, name, address)
SELECT 
    n,
    CONCAT('Person_', n),  -- Name like Person_1, Person_2...
    LEFT(REPLICATE('Address_', 200) + CAST(n AS VARCHAR), 1000)  -- 1000-char address
FROM Numbers;

select * from dbo.person;
*/

/*
https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-locking-and-row-versioning-guide?view=sql-server-ver16#:~:text=Use%20the%20following%20table%20to%20determine%20the%20compatibility%20of%20all%20the%20lock%20modes%20available%20in%20the%20Database%20Engine.

-- List the locks held by a session
SELECT 
    tl.resource_type,
    tl.resource_subtype,
    tl.resource_description,
    tl.resource_associated_entity_id AS object_id,
    tl.request_mode,
    tl.request_status,
    DB_NAME(tl.resource_database_id) AS database_name,
    tl.request_session_id
FROM sys.dm_tran_locks AS tl
WHERE tl.request_session_id = 67 -- replace 67 with the session id (SPID) to inspect
ORDER BY tl.resource_type, tl.request_mode;

*/

The key benefit of the second solution is that the lock is taken on a virtual resource whose name is defined by the developer, so it can be very granular. UPSERT operations on different key values of the same table do not block each other, while concurrent DML on the same key value is still prevented.

I hope this will be useful to developers looking for a reliable UPSERT solution in SQL Server.
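Put together, a complete UPSERT guarded by an application lock could look like the minimal sketch below. It is an illustration under assumptions, not a definitive implementation: the 5-second timeout, the error number 50001 and the @rc / @rows variables are hypothetical choices. The lock is owned by the transaction, so it is released automatically at commit or rollback.

```sql
declare @id int = 105;
declare @resource nvarchar(255) = N'person_id-' + convert(nvarchar(20), @id);
declare @rc int, @rows int;

begin tran;

-- One lock name per key value: sessions working on other ids are not blocked
exec @rc = sp_getapplock
     @Resource    = @resource,
     @LockMode    = 'Exclusive',
     @LockOwner   = 'Transaction',
     @LockTimeout = 5000;          -- assumption: wait at most 5 seconds

-- sp_getapplock returns a negative value when the lock was not granted
if @rc < 0
begin
    rollback tran;
    throw 50001, 'Could not acquire application lock for UPSERT.', 1;
end;

update dbo.person
set address = 'Address for ' + name
where id = @id;

set @rows = @@ROWCOUNT;            -- capture before any other statement runs

if @rows = 0
    insert into dbo.person (id, name, address)
    values (@id, concat('Person_', @id), concat('Address of Person_', @id));

commit tran;                       -- the transaction-owned lock is released here
```

This only protects the key if every code path that inserts into dbo.person takes the same lock name first.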
