Relationship data in Easy Way

Spread the love

Recently I worked on one optimisation job. It was procedure that processed multi-relationship data provided by clients in excel format, and created relationship tables.

For example data like :-
1) Domain, SubDomain, Technology, Service, Activity, SubActivity
or
2) Country, State, province, county, city, town

The code handled the task using T-SQL cursor, and was taking more than an hour. So let’s see the sample data first, and then design our simple and elegent solution using Ranking Functions, which ultimately in my case took only 10 minutes.

--	Create table to import relationship data from client
CREATE TABLE Locations_Raw
(
    Country VARCHAR(50),	
    State	VARCHAR(50),
    Province VARCHAR(50),
    County VARCHAR(50),
    City VARCHAR(50),
    Town VARCHAR(50)
)
GO

--	Populate with relationship data
INSERT INTO Locations_Raw
SELECT  'C1', 'S1', 'P1', 'CNTY01', 'CTY01', 'T1'
UNION ALL
SELECT 'C1', 'S2', 'P2', 'CNTY01', 'CTY01', 'T2'
UNION ALL
SELECT 'C1', 'S2', 'P3', 'CNTY02', 'CTY02', 'T1'
UNION ALL
SELECT 'C1', 'S1', 'P2', 'CNTY02', 'CTY01', 'T2'
UNION ALL
SELECT 'C2', 'S1', 'P1', 'CNTY01', 'CTY01', 'T1'
UNION ALL
SELECT 'C2', 'S2', 'P2', 'CNTY01', 'CTY01', 'T2'
UNION ALL
SELECT 'C2', 'S2', 'P3', 'CNTY02', 'CTY02', 'T1'
UNION ALL
SELECT 'C2', 'S1', 'P2', 'CNTY02', 'CTY01', 'T2'
GO

--	Show me sample data
SELECT * FROM Locations_Raw
GO

Relationship-Data

Now, as per our requirement, we need above data to populate below tables:-

CREATE TABLE [dbo].[TBL_Location] 
(	LocationID INT NOT NULL PRIMARY KEY,
	Location_Type VARCHAR(20), 
	Name VARCHAR(50)
)
GO
CREATE TABLE [dbo].[TBL_Location_XREF] 
(	LocationID INT FOREIGN KEY REFERENCES [dbo].[TBL_Location] (LocationID),
	Parent_LocationID INT FOREIGN KEY REFERENCES [dbo].[TBL_Location] (LocationID)
)
GO

After populating above relationship tables, result set should look like as below:-

-- Find LocationID for country 'C1'
SELECT * FROM TBL_LOCATION WHERE NAME = 'C1' AND LOCATION_TYPE = 'COUNTRY'

--Find all states for above country
SELECT L.LocationID, L.NAME, L.Location_Type FROM TBL_Location_XREF X inner join TBL_LOCATION L ON L.LocationID = X.LocationID WHERE X.Parent_LocationID = 1
--User chooses 'S1' state with LocationID = 3

--Find all Province for above State
SELECT L.LocationID, L.NAME, L.Location_Type FROM TBL_Location_XREF X inner join TBL_LOCATION L ON L.LocationID = X.LocationID WHERE X.Parent_LocationID = 3
--User chooses 'P2' Province with LocationID = 8

--Find all Counties for above Province
SELECT L.LocationID, L.NAME, L.Location_Type FROM TBL_Location_XREF X inner join TBL_LOCATION L ON L.LocationID = X.LocationID WHERE X.Parent_LocationID = 8

So how do we achieve that?

With T-SQL cursor, it took 500+ lines of code with 4-5 levels of control-flow nesting (If Else) to implement the logic. Even after this much effort, it was executing in more than an hour’s time. So what option do we have?

Ranking Functions which were first introduced with SQL Server 2008 comes to our rescue. Let’s see the solution implementation using it.

-- Take out distinct raw data provided in separate temp table
--	=========================================================
IF OBJECT_ID('tempdb..#Locations_Raw') IS NOT NULL
	DROP TABLE #Locations_Raw;

SELECT	DISTINCT
		Country, State, Province, County, City, Town
INTO	  #Locations_Raw
FROM	  Locations_Raw

--	====================================================================================================================
-- Generate IDs for Country, State, Province, County, City, Town using ranking function DENSE_RANK(), and store result in another temp table
IF OBJECT_ID('tempdb..#Locations_Temp') IS NOT NULL
	DROP TABLE #Locations_Temp;

SELECT	*	    --  Country, State, Province, County, City, Town
		,TownID = DENSE_RANK()OVER(ORDER BY T6.CountryID, T6.StateID, T6.ProvinceID, T6.CountyID, T6.CityID, T6.Town)
INTO	  #Locations_Temp
FROM  (
		SELECT	T5.*
				,CityID = DENSE_RANK()OVER(ORDER BY T5.CountryID, T5.StateID, T5.ProvinceID, T5.CountyID, T5.City)
		FROM  (
				SELECT	T4.*, CountyID = DENSE_RANK()OVER(ORDER BY T4.CountryID, T4.StateID, T4.ProvinceID, T4.County)
				FROM  (
						SELECT	*
								,ProvinceID = DENSE_RANK()OVER(ORDER BY T3.CountryID, T3.StateID, T3.Province)
						FROM  (
								SELECT	T2.*
										,StateID = DENSE_RANK()OVER(ORDER BY T2.CountryID, T2.State)
								FROM (
										SELECT	T1.*
												,CountryID = DENSE_RANK()OVER(ORDER BY T1.Country)
										FROM	    #Locations_Raw AS T1	--Country
										) AS T2	-- State
								) AS T3	-- Province
						) AS T4	-- County
				) AS T5	-- City
		) AS T6	-- Town

Lines 1-9: Creates table with distinct relationship records provided by client. Yes, clients can make mistake by providing duplicate entries.
Lines 11-38: Creates dummy location id for each relationship element maintaining its relationship

Let’s see output of temp table #Locations_Temp.

As the result show, we have generated dummy location ids for our base logic based on below combination sets:-
(Country),
(Country, State),
(Country, State, Province),
(Country, State, Province, County),
(Country, State, Province, County, City),
(Country, State, Province, County, City, Town).

This serves as primary data for our subsequent tables.

Till now, we have generated dummy location ids for our relationship element, but the field LocationID of table dbo.TBL_LOCATION is primary key. So location id for country and State cannot be same as in above result. Hence, we need below step to modify the Location ids for child elements by adding up count of parent elements.

--  Increment ID values for location types based on their place value in relationship data
DECLARE	@CountryID_Max INT
		,@StateID_Max INT
		,@ProvinceID_Max INT
		,@CountyID_Max INT
		,@CityID_Max INT

SET	@CountryID_Max = (SELECT MAX(CountryID) FROM #Locations_Temp)
SET @StateID_Max = (SELECT MAX(StateID) FROM #Locations_Temp)
SET @ProvinceID_Max = (SELECT MAX(ProvinceID) FROM #Locations_Temp)
SET @CountyID_Max = (SELECT MAX(CountyID) FROM #Locations_Temp)
SET @CityID_Max = (SELECT MAX(CityID) FROM #Locations_Temp)

IF OBJECT_ID('tempdb..#Locations_Temp_Derived') IS NOT NULL
	DROP TABLE #Locations_Temp_Derived;

SELECT	TT.Country, TT.State, TT.Province, TT.County, TT.City, TT.Town
		,TT.CountryID
		,StateID = TT.StateID + @CountryID_Max
		,ProvinceID = TT.ProvinceID + @CountryID_Max + @StateID_Max
		,CountyID = TT.CountyID + @CountryID_Max + @StateID_Max + @ProvinceID_Max
		,CityID = TT.CityID + @CountryID_Max + @StateID_Max + @ProvinceID_Max + @CountyID_Max
		,TownID = TT.TownID + @CountryID_Max + @StateID_Max + @ProvinceID_Max + @CountyID_Max + @CityID_Max
INTO	#Locations_Temp_Derived
FROM	#Locations_Temp AS TT

Let’s see output of temp table [#Locations_Temp_Derived]

That’s it. Our salad is ready, we need to decorate it on plate. Let’s do that as well.

-- Now populate table [TBL_Location] with ID values of step 3 keeping INDENTITY_INSERT ON for the table.

INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type )
SELECT DISTINCT CountryID, Country, 'Country' FROM #Locations_Temp_Derived

INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type )
SELECT DISTINCT StateID, State, 'State' FROM #Locations_Temp_Derived

INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type )
SELECT DISTINCT ProvinceID, Province, 'Province' FROM #Locations_Temp_Derived

INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type )
SELECT DISTINCT CountyID, County, 'County' FROM #Locations_Temp_Derived

INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type )
SELECT DISTINCT CityID, City, 'City' FROM #Locations_Temp_Derived

INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type )
SELECT DISTINCT TownID, Town, 'Town' FROM #Locations_Temp_Derived

So our data for first and also the main relationship table [TBL_Location] is ready. Below is the output.

SELECT * FROM [TBL_Location]
LocationIDLocation_TypeName
1CountryC1
2CountryC2
3StateS1
4StateS2
5StateS1
6StateS2
7ProvinceP1
8ProvinceP2
9ProvinceP2
10ProvinceP3
11ProvinceP1
12ProvinceP2
13ProvinceP2
14ProvinceP3
15CountyCNTY01
16CountyCNTY02
17CountyCNTY01
18CountyCNTY02
19CountyCNTY01
20CountyCNTY02
21CountyCNTY01
22CountyCNTY02
23CityCTY01
24CityCTY01
25CityCTY01
26CityCTY02
27CityCTY01
28CityCTY01
29CityCTY01
30CityCTY02
31TownT1
32TownT2
33TownT2
34TownT1
35TownT1
36TownT2
37TownT2
38TownT1

Now, lets populate out derived reference table TBL_Location_XREF].

-- As last populate the relationship cross reference table [TBL_Location_XREF]
INSERT INTO [TBL_Location_XREF] (LocationID, Parent_LocationID )
SELECT	DISTINCT LocationID = RFC.StateID, Parent_LocationID =  RFC.CountryID
FROM	#Locations_Temp_Derived AS RFC
--
UNION ALL
--
SELECT	DISTINCT LocationID = RFC.ProvinceID, Parent_LocationID =  RFC.StateID
FROM	#Locations_Temp_Derived AS RFC
--
UNION ALL
--
SELECT	DISTINCT LocationID = RFC.CountyID, Parent_LocationID =  RFC.ProvinceID
FROM	#Locations_Temp_Derived AS RFC
--
UNION ALL
--
SELECT	DISTINCT LocationID = RFC.CityID, Parent_LocationID =  RFC.CountyID
FROM	#Locations_Temp_Derived AS RFC
--
UNION ALL
--
SELECT	DISTINCT LocationID = RFC.TownID, Parent_LocationID =  RFC.CityID
FROM	#Locations_Temp_Derived AS RFC

And we are done. Hurrah! Below is the output for [TBL_Location_XREF] table.

SELECT * FROM [TBL_Location_XREF]
LocationIDParent_LocationID
31
41
52
62
73
83
94
104
115
125
136
146
157
168
179
1810
1911
2012
2113
2214
2315
2416
2517
2618
2719
2820
2921
3022
3123
3224
3325
3426
3527
3628
3729
3830

CONCLUSION: It’s very easy to work with multi-relationship or hierarchical data using Ranking Functions. This way we can eliminate hours of execution time along with complex and lengthy code. Thanks for reading 🙂

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
5 Comments

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.