Relationship data in Easy Way

Previev-Image-Relationship-Data-In-Easy-Way
Spread the love

Recently I worked on one optimisation job. It was procedure that processed multi-relationship data provided by clients in excel format, and created relationship tables.

For example data like :-
1) Domain, SubDomain, Technology, Service, Activity, SubActivity
or
2) Country, State, province, county, city, town

The code handled the task using T-SQL cursor, and was taking more than an hour. So let’s see the sample data first, and then design our simple and elegent solution using Ranking Functions, which ultimately in my case took only 10 minutes.

Relationship-Data

Now, as per our requirement, we need above data to populate below tables:-

After populating above relationship tables, result set should look like as below:-

So how do we achieve that?

With T-SQL cursor, it took 500+ lines of code with 4-5 levels of control-flow nesting (If Else) to implement the logic. Even after this much effort, it was executing in more than an hour’s time. So what option do we have?

Ranking Functions which were first introduced with SQL Server 2008 comes to our rescue. Let’s see the solution implementation using it.

Lines 1-9: Creates table with distinct relationship records provided by client. Yes, clients can make mistake by providing duplicate entries.
Lines 11-38: Creates dummy location id for each relationship element maintaining its relationship

Let’s see output of temp table #Locations_Temp.

As the result show, we have generated dummy location ids for our base logic based on below combination sets:-
(Country),
(Country, State),
(Country, State, Province),
(Country, State, Province, County),
(Country, State, Province, County, City),
(Country, State, Province, County, City, Town).

This serves as primary data for our subsequent tables.

Till now, we have generated dummy location ids for our relationship element, but the field LocationID of table dbo.TBL_LOCATION is primary key. So location id for country and State cannot be same as in above result. Hence, we need below step to modify the Location ids for child elements by adding up count of parent elements.

Let’s see output of temp table [#Locations_Temp_Derived]

That’s it. Our salad is ready, we need to decorate it on plate. Let’s do that as well.

So our data for first and also the main relationship table [TBL_Location] is ready. Below is the output.

SELECT * FROM [TBL_Location]
LocationIDLocation_TypeName
1CountryC1
2CountryC2
3StateS1
4StateS2
5StateS1
6StateS2
7ProvinceP1
8ProvinceP2
9ProvinceP2
10ProvinceP3
11ProvinceP1
12ProvinceP2
13ProvinceP2
14ProvinceP3
15CountyCNTY01
16CountyCNTY02
17CountyCNTY01
18CountyCNTY02
19CountyCNTY01
20CountyCNTY02
21CountyCNTY01
22CountyCNTY02
23CityCTY01
24CityCTY01
25CityCTY01
26CityCTY02
27CityCTY01
28CityCTY01
29CityCTY01
30CityCTY02
31TownT1
32TownT2
33TownT2
34TownT1
35TownT1
36TownT2
37TownT2
38TownT1

Now, lets populate out derived reference table TBL_Location_XREF].

And we are done. Hurrah! Below is the output for [TBL_Location_XREF] table.

SELECT * FROM [TBL_Location_XREF]
LocationIDParent_LocationID
31
41
52
62
73
83
94
104
115
125
136
146
157
168
179
1810
1911
2012
2113
2214
2315
2416
2517
2618
2719
2820
2921
3022
3123
3224
3325
3426
3527
3628
3729
3830

CONCLUSION: It’s very easy to work with multi-relationship or hierarchical data using Ranking Functions. This way we can eliminate hours of execution time along with complex and lengthy code. Thanks for reading 🙂

5 Comments

  1. Deepak Khatri

    In your example, same province “P1” is part of two countries “C1″ and C2”, but at the end it is generating two province ids for the same province name “P1” i.e. 7 and 11.

    In this particular example, it may be fine as two country cannot have same province. But even if the name are same, two ids are generated.

    We may have to give more thought to this so that if province name is same e.g. “P1” then only one province id is generated for country C1 and C2.

    • Ajay Dwivedi

      Hi Deepak,
      Thanks for comment.
      You are correct. But as in my case, Target table [TBL_Location_XREF] structure has only LocationID and Parent_LocationID as columns.

      If our requirement says to generate same LocationID for province “P1” irrespective of country, then we need to modify the table [TBL_Location_XREF] structure to have columns something like (CountryID, StateID, ProvinceID, CountyID, CityID, TownID).

      • Deepak Khatri

        Hi Ajay, your logic is good.

        I mentioned this point, because it may be used for Domain/SubDomain/Technology/Service/Activity/SubActivity, where for same Technology e.g. GSM, there will be two Technology IDs which won’t be correct logically.
        For this particular example of Country,province,etc., as two country cannot have one province, it would be correct as even if province name is same, they are different.

  2. Nitesh Singla

    I;m not an expert like you but I have one question that isn’t writing multiple sub query can’t we write in one single query with multiple rank statement

    IF OBJECT_ID(‘tempdb..#Locations_Temp’) IS NOT NULL
    DROP TABLE #Locations_Temp;

    SELECT * — Country, State, Province, County, City, Town
    ,TownID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State, T1.Province, T1.County, T1.City, T6.Town)
    ,CityID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State, T1.Province, T1.County, T1.City)
    , CountyID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State, T1.Province, T1.County)
    ,ProvinceID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State, T1.Province)
    ,StateID = DENSE_RANK()OVER(ORDER BY T1.Country, T1.State)
    ,CountryID = DENSE_RANK()OVER(ORDER BY T1.Country)
    INTO #Locations_Temp
    FROM #Locations_Raw AS T1

    instead of
    IF OBJECT_ID(‘tempdb..#Locations_Temp’) IS NOT NULL
    DROP TABLE #Locations_Temp;

    SELECT * — Country, State, Province, County, City, Town
    ,TownID = DENSE_RANK()OVER(ORDER BY T6.CountryID, T6.StateID, T6.ProvinceID, T6.CountyID, T6.CityID, T6.Town)
    INTO #Locations_Temp
    FROM (
    SELECT T5.*
    ,CityID = DENSE_RANK()OVER(ORDER BY T5.CountryID, T5.StateID, T5.ProvinceID, T5.CountyID, T5.City)
    FROM (
    SELECT T4.*, CountyID = DENSE_RANK()OVER(ORDER BY T4.CountryID, T4.StateID, T4.ProvinceID, T4.County)
    FROM (
    SELECT *
    ,ProvinceID = DENSE_RANK()OVER(ORDER BY T3.CountryID, T3.StateID, T3.Province)
    FROM (
    SELECT T2.*
    ,StateID = DENSE_RANK()OVER(ORDER BY T2.CountryID, T2.State)
    FROM (
    SELECT T1.*
    ,CountryID = DENSE_RANK()OVER(ORDER BY T1.Country)
    FROM #Locations_Raw AS T1 –Country
    ) AS T2 — State
    ) AS T3 — Province
    ) AS T4 — County
    ) AS T5 — City
    ) AS T6 — Town

    Will the first query work and if it work will it take less time to execute?

    • Ajay Dwivedi

      Hi Nitesh,

      Thanks for your comment. Yes, you are totally right. Your’s query is cleaner and faster than mine. Even though my query is using same worktable internally as yours, it will have extra CPU overhead due to nesting which will add approx 30-55 ms extra execution time. Other than this, there should not be any difference in Logical & Physical reads.

      I made my choice just for sequential reason of relationship data. Thanks for pointing this out.

      Regards,
      Ajay Dwivedi

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.