Relationship data in Easy Way
|Recently I worked on one optimisation job. It was procedure that processed multi-relationship data provided by clients in excel format, and created relationship tables.
For example data like :-
1) Domain, SubDomain, Technology, Service, Activity, SubActivity
or
2) Country, State, province, county, city, town
The code handled the task using T-SQL cursor, and was taking more than an hour. So let’s see the sample data first, and then design our simple and elegent solution using Ranking Functions, which ultimately in my case took only 10 minutes.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
-- Create table to import relationship data from client CREATE TABLE Locations_Raw ( Country VARCHAR(50), State VARCHAR(50), Province VARCHAR(50), County VARCHAR(50), City VARCHAR(50), Town VARCHAR(50) ) GO -- Populate with relationship data INSERT INTO Locations_Raw SELECT 'C1', 'S1', 'P1', 'CNTY01', 'CTY01', 'T1' UNION ALL SELECT 'C1', 'S2', 'P2', 'CNTY01', 'CTY01', 'T2' UNION ALL SELECT 'C1', 'S2', 'P3', 'CNTY02', 'CTY02', 'T1' UNION ALL SELECT 'C1', 'S1', 'P2', 'CNTY02', 'CTY01', 'T2' UNION ALL SELECT 'C2', 'S1', 'P1', 'CNTY01', 'CTY01', 'T1' UNION ALL SELECT 'C2', 'S2', 'P2', 'CNTY01', 'CTY01', 'T2' UNION ALL SELECT 'C2', 'S2', 'P3', 'CNTY02', 'CTY02', 'T1' UNION ALL SELECT 'C2', 'S1', 'P2', 'CNTY02', 'CTY01', 'T2' GO -- Show me sample data SELECT * FROM Locations_Raw GO |
Now, as per our requirement, we need above data to populate below tables:-
1 2 3 4 5 6 7 8 9 10 11 |
CREATE TABLE [dbo].[TBL_Location] ( LocationID INT NOT NULL PRIMARY KEY, Location_Type VARCHAR(20), Name VARCHAR(50) ) GO CREATE TABLE [dbo].[TBL_Location_XREF] ( LocationID INT FOREIGN KEY REFERENCES [dbo].[TBL_Location] (LocationID), Parent_LocationID INT FOREIGN KEY REFERENCES [dbo].[TBL_Location] (LocationID) ) GO |
After populating above relationship tables, result set should look like as below:-
1 2 3 4 5 6 7 8 9 10 11 12 13 |
-- Find LocationID for country 'C1' SELECT * FROM TBL_LOCATION WHERE NAME = 'C1' AND LOCATION_TYPE = 'COUNTRY' --Find all states for above country SELECT L.LocationID, L.NAME, L.Location_Type FROM TBL_Location_XREF X inner join TBL_LOCATION L ON L.LocationID = X.LocationID WHERE X.Parent_LocationID = 1 --User chooses 'S1' state with LocationID = 3 --Find all Province for above State SELECT L.LocationID, L.NAME, L.Location_Type FROM TBL_Location_XREF X inner join TBL_LOCATION L ON L.LocationID = X.LocationID WHERE X.Parent_LocationID = 3 --User chooses 'P2' Province with LocationID = 8 --Find all Counties for above Province SELECT L.LocationID, L.NAME, L.Location_Type FROM TBL_Location_XREF X inner join TBL_LOCATION L ON L.LocationID = X.LocationID WHERE X.Parent_LocationID = 8 |
So how do we achieve that?
With T-SQL cursor, it took 500+ lines of code with 4-5 levels of control-flow nesting (If Else) to implement the logic. Even after this much effort, it was executing in more than an hour’s time. So what option do we have?
Ranking Functions which were first introduced with SQL Server 2008 comes to our rescue. Let’s see the solution implementation using it.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
-- Take out distinct raw data provided in separate temp table -- ========================================================= IF OBJECT_ID('tempdb..#Locations_Raw') IS NOT NULL DROP TABLE #Locations_Raw; SELECT DISTINCT Country, State, Province, County, City, Town INTO #Locations_Raw FROM Locations_Raw -- ==================================================================================================================== -- Generate IDs for Country, State, Province, County, City, Town using ranking function DENSE_RANK(), and store result in another temp table IF OBJECT_ID('tempdb..#Locations_Temp') IS NOT NULL DROP TABLE #Locations_Temp; SELECT * -- Country, State, Province, County, City, Town ,TownID = DENSE_RANK()OVER(ORDER BY T6.CountryID, T6.StateID, T6.ProvinceID, T6.CountyID, T6.CityID, T6.Town) INTO #Locations_Temp FROM ( SELECT T5.* ,CityID = DENSE_RANK()OVER(ORDER BY T5.CountryID, T5.StateID, T5.ProvinceID, T5.CountyID, T5.City) FROM ( SELECT T4.*, CountyID = DENSE_RANK()OVER(ORDER BY T4.CountryID, T4.StateID, T4.ProvinceID, T4.County) FROM ( SELECT * ,ProvinceID = DENSE_RANK()OVER(ORDER BY T3.CountryID, T3.StateID, T3.Province) FROM ( SELECT T2.* ,StateID = DENSE_RANK()OVER(ORDER BY T2.CountryID, T2.State) FROM ( SELECT T1.* ,CountryID = DENSE_RANK()OVER(ORDER BY T1.Country) FROM #Locations_Raw AS T1 --Country ) AS T2 -- State ) AS T3 -- Province ) AS T4 -- County ) AS T5 -- City ) AS T6 -- Town |
Lines 1-9: Creates table with distinct relationship records provided by client. Yes, clients can make mistake by providing duplicate entries.
Lines 11-38: Creates dummy location id for each relationship element maintaining its relationship
Let’s see output of temp table #Locations_Temp.
As the result show, we have generated dummy location ids for our base logic based on below combination sets:-
(Country),
(Country, State),
(Country, State, Province),
(Country, State, Province, County),
(Country, State, Province, County, City),
(Country, State, Province, County, City, Town).
This serves as primary data for our subsequent tables.
Till now, we have generated dummy location ids for our relationship element, but the field LocationID of table dbo.TBL_LOCATION is primary key. So location id for country and State cannot be same as in above result. Hence, we need below step to modify the Location ids for child elements by adding up count of parent elements.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
-- Increment ID values for location types based on their place value in relationship data DECLARE @CountryID_Max INT ,@StateID_Max INT ,@ProvinceID_Max INT ,@CountyID_Max INT ,@CityID_Max INT SET @CountryID_Max = (SELECT MAX(CountryID) FROM #Locations_Temp) SET @StateID_Max = (SELECT MAX(StateID) FROM #Locations_Temp) SET @ProvinceID_Max = (SELECT MAX(ProvinceID) FROM #Locations_Temp) SET @CountyID_Max = (SELECT MAX(CountyID) FROM #Locations_Temp) SET @CityID_Max = (SELECT MAX(CityID) FROM #Locations_Temp) IF OBJECT_ID('tempdb..#Locations_Temp_Derived') IS NOT NULL DROP TABLE #Locations_Temp_Derived; SELECT TT.Country, TT.State, TT.Province, TT.County, TT.City, TT.Town ,TT.CountryID ,StateID = TT.StateID + @CountryID_Max ,ProvinceID = TT.ProvinceID + @CountryID_Max + @StateID_Max ,CountyID = TT.CountyID + @CountryID_Max + @StateID_Max + @ProvinceID_Max ,CityID = TT.CityID + @CountryID_Max + @StateID_Max + @ProvinceID_Max + @CountyID_Max ,TownID = TT.TownID + @CountryID_Max + @StateID_Max + @ProvinceID_Max + @CountyID_Max + @CityID_Max INTO #Locations_Temp_Derived FROM #Locations_Temp AS TT |
Let’s see output of temp table [#Locations_Temp_Derived]
That’s it. Our salad is ready, we need to decorate it on plate. Let’s do that as well.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
-- Now populate table [TBL_Location] with ID values of step 3 keeping INDENTITY_INSERT ON for the table. INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type ) SELECT DISTINCT CountryID, Country, 'Country' FROM #Locations_Temp_Derived INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type ) SELECT DISTINCT StateID, State, 'State' FROM #Locations_Temp_Derived INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type ) SELECT DISTINCT ProvinceID, Province, 'Province' FROM #Locations_Temp_Derived INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type ) SELECT DISTINCT CountyID, County, 'County' FROM #Locations_Temp_Derived INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type ) SELECT DISTINCT CityID, City, 'City' FROM #Locations_Temp_Derived INSERT INTO [TBL_Location] ( LocationID, Name, Location_Type ) SELECT DISTINCT TownID, Town, 'Town' FROM #Locations_Temp_Derived |
So our data for first and also the main relationship table [TBL_Location] is ready. Below is the output.
SELECT * FROM [TBL_Location]LocationID | Location_Type | Name |
---|---|---|
1 | Country | C1 |
2 | Country | C2 |
3 | State | S1 |
4 | State | S2 |
5 | State | S1 |
6 | State | S2 |
7 | Province | P1 |
8 | Province | P2 |
9 | Province | P2 |
10 | Province | P3 |
11 | Province | P1 |
12 | Province | P2 |
13 | Province | P2 |
14 | Province | P3 |
15 | County | CNTY01 |
16 | County | CNTY02 |
17 | County | CNTY01 |
18 | County | CNTY02 |
19 | County | CNTY01 |
20 | County | CNTY02 |
21 | County | CNTY01 |
22 | County | CNTY02 |
23 | City | CTY01 |
24 | City | CTY01 |
25 | City | CTY01 |
26 | City | CTY02 |
27 | City | CTY01 |
28 | City | CTY01 |
29 | City | CTY01 |
30 | City | CTY02 |
31 | Town | T1 |
32 | Town | T2 |
33 | Town | T2 |
34 | Town | T1 |
35 | Town | T1 |
36 | Town | T2 |
37 | Town | T2 |
38 | Town | T1 |
Now, lets populate out derived reference table TBL_Location_XREF].
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
-- As last populate the relationship cross reference table [TBL_Location_XREF] INSERT INTO [TBL_Location_XREF] (LocationID, Parent_LocationID ) SELECT DISTINCT LocationID = RFC.StateID, Parent_LocationID = RFC.CountryID FROM #Locations_Temp_Derived AS RFC -- UNION ALL -- SELECT DISTINCT LocationID = RFC.ProvinceID, Parent_LocationID = RFC.StateID FROM #Locations_Temp_Derived AS RFC -- UNION ALL -- SELECT DISTINCT LocationID = RFC.CountyID, Parent_LocationID = RFC.ProvinceID FROM #Locations_Temp_Derived AS RFC -- UNION ALL -- SELECT DISTINCT LocationID = RFC.CityID, Parent_LocationID = RFC.CountyID FROM #Locations_Temp_Derived AS RFC -- UNION ALL -- SELECT DISTINCT LocationID = RFC.TownID, Parent_LocationID = RFC.CityID FROM #Locations_Temp_Derived AS RFC |
And we are done. Hurrah! Below is the output for [TBL_Location_XREF] table.
SELECT * FROM [TBL_Location_XREF]LocationID | Parent_LocationID |
---|---|
3 | 1 |
4 | 1 |
5 | 2 |
6 | 2 |
7 | 3 |
8 | 3 |
9 | 4 |
10 | 4 |
11 | 5 |
12 | 5 |
13 | 6 |
14 | 6 |
15 | 7 |
16 | 8 |
17 | 9 |
18 | 10 |
19 | 11 |
20 | 12 |
21 | 13 |
22 | 14 |
23 | 15 |
24 | 16 |
25 | 17 |
26 | 18 |
27 | 19 |
28 | 20 |
29 | 21 |
30 | 22 |
31 | 23 |
32 | 24 |
33 | 25 |
34 | 26 |
35 | 27 |
36 | 28 |
37 | 29 |
38 | 30 |
CONCLUSION: It’s very easy to work with multi-relationship or hierarchical data using Ranking Functions. This way we can eliminate hours of execution time along with complex and lengthy code. Thanks for reading 🙂
In your example, same province “P1” is part of two countries “C1″ and C2”, but at the end it is generating two province ids for the same province name “P1” i.e. 7 and 11.
In this particular example, it may be fine as two country cannot have same province. But even if the name are same, two ids are generated.
We may have to give more thought to this so that if province name is same e.g. “P1” then only one province id is generated for country C1 and C2.
Hi Deepak,
Thanks for comment.
You are correct. But as in my case, Target table [TBL_Location_XREF] structure has only LocationID and Parent_LocationID as columns.
If our requirement says to generate same LocationID for province “P1” irrespective of country, then we need to modify the table [TBL_Location_XREF] structure to have columns something like (CountryID, StateID, ProvinceID, CountyID, CityID, TownID).
Hi Ajay, your logic is good.
I mentioned this point, because it may be used for Domain/SubDomain/Technology/Service/Activity/SubActivity, where for same Technology e.g. GSM, there will be two Technology IDs which won’t be correct logically.
For this particular example of Country,province,etc., as two country cannot have one province, it would be correct as even if province name is same, they are different.
I;m not an expert like you but I have one question that isn’t writing multiple sub query can’t we write in one single query with multiple rank statement
IF OBJECT_ID(‘tempdb..#Locations_Temp’) IS NOT NULL
DROP TABLE #Locations_Temp;
SELECT * — Country, State, Province, County, City, Town
,TownID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State, T1.Province, T1.County, T1.City, T6.Town)
,CityID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State, T1.Province, T1.County, T1.City)
, CountyID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State, T1.Province, T1.County)
,ProvinceID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State, T1.Province)
,StateID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State)
,CountryID = DENSE_RANK()OVER(ORDER BY T1.Country)
INTO #Locations_Temp
FROM #Locations_Raw AS T1
instead of
IF OBJECT_ID(‘tempdb..#Locations_Temp’) IS NOT NULL
DROP TABLE #Locations_Temp;
SELECT * — Country, State, Province, County, City, Town
,TownID = DENSE_RANK()OVER(ORDER BY T6.CountryID, T6.StateID, T6.ProvinceID, T6.CountyID, T6.CityID, T6.Town)
INTO #Locations_Temp
FROM (
SELECT T5.*
,CityID = DENSE_RANK()OVER(ORDER BY T5.CountryID, T5.StateID, T5.ProvinceID, T5.CountyID, T5.City)
FROM (
SELECT T4.*, CountyID = DENSE_RANK()OVER(ORDER BY T4.CountryID, T4.StateID, T4.ProvinceID, T4.County)
FROM (
SELECT *
,ProvinceID = DENSE_RANK()OVER(ORDER BY T3.CountryID, T3.StateID, T3.Province)
FROM (
SELECT T2.*
,StateID = DENSE_RANK()OVER(ORDER BY T2.CountryID, T2.State)
FROM (
SELECT T1.*
,CountryID = DENSE_RANK()OVER(ORDER BY T1.Country)
FROM #Locations_Raw AS T1 –Country
) AS T2 — State
) AS T3 — Province
) AS T4 — County
) AS T5 — City
) AS T6 — Town
Will the first query work and if it work will it take less time to execute?
Hi Nitesh,
Thanks for your comment. Yes, you are totally right. Your’s query is cleaner and faster than mine. Even though my query is using same worktable internally as yours, it will have extra CPU overhead due to nesting which will add approx 30-55 ms extra execution time. Other than this, there should not be any difference in Logical & Physical reads.
I made my choice just for sequential reason of relationship data. Thanks for pointing this out.
Regards,
Ajay Dwivedi