Still Wet Behind the Ears – A Master Data Journey

It’s the year 2000, and I’ve just completed my N4 qualification in Industrial Engineering (whatever that means). Soon after this my Dad organized a job for me in a little computer shop on Kritzinger Road in Alberton, pretty much giving me the following solid advice before shipping me off to work: “Colin, keep your eyelids up and see what you can see” (borrowed from Dr. Seuss’ “And to Think That I Saw It on Mulberry Street”, which my 21-month-old daughter loves reading with me before bedtime).

The working world wasn’t foreign to me. I had been a waiter on a temporary basis earlier in life, had helped my Dad in his many business ventures before that, and had also worked in a dodgy car audio shop called “A to Z Car Radio” in Alberton North during college holidays. This time round, things were different. No more school, no more college, and the reality of having to move out of home within the next year was setting in more and more every day. I was 17, and compared to the other people working in the shop I was very much “Wet Behind the Ears”. I had no idea what to do, how to act, or how to address people when I assisted them.

Having been brought up Afrikaans (because as you know my name is Jan Poggenpoel…) I addressed everyone as “Oom” and “Tannie”, had a very heavy Afrikaans accent, and was shyer and more bewildered than a wombat caught in a truck’s headlights. All of these “features” of mine, and many more, annoyed my boss very much and made me the object of many antics during my 3.5-year tenure at Matrix Warehouse. I had no idea where my “career” would lead me, but that was OK. The fact that I had a job and would have my own place soon was enough to keep me happy for a while.

As a side note, I had a quick look on Google Maps and the shop still exists, but is now an Indian Food joint. I wonder how I would have acted had I been brought up as a Moodley or Naidoo… I digress.

 

So why MDM?

I “stumbled” across Master Data Management roughly 2 years ago and fell in love with its concept, its methodologies and its clear messaging around the impact correctly managed data can have on a business. This intrinsic understanding of the value of data didn’t just appear overnight, but was crafted in me by working with data on a daily basis, day in and day out, for 8 years. As a BI Professional my duties ranged from “ETL-ing” data, cleaning data, merging data, analysing data, modelling data and “beautifying” data (reports and dashboards), to ultimately selling tools to customers that helped them make decisions on their data. All this work taught me a very important lesson: data are dirty, most people don’t know about the state their data are in, and those who do know either don’t have the power to change it, don’t want it changed for job-security purposes, or believe there are “bigger fish to fry” than poor-quality data.

The sad reality is that, as a BI Professional, I know the amount of work involved in cleaning data in order to present it to business is often massive. I also know that this is a daily struggle faced by my brothers and sisters in data and that a movement needs to be started to take CONTROL of the data!! Wait, I am getting ahead of myself. Let me rephrase that last sentence. I also know that there are masses of developers who, on a daily basis, have to bear the brunt of the blame from business users who lament that their reports are never correct, while remaining blissfully unaware that the source of their issues starts with the person they see in the mirror every morning.

Looking for a solution to the problem I knew originated in the hands of business, I started scratching around and was handed a book by a little-known professional called David Loshin. I was blown away! He wrote with such clarity and certainty that I was convinced I could conquer the world, freeing my brothers and sisters!! Mmm. There I go again. Carrying on swiftly… I was soon handed a book which spoke about “Process Driven Master Data”, which by the way seemed to be a buzzword going around at that time. Right! PROCESS!! THAT is the answer! Address the process and you address the data issues! No. Not quite. What about “People, Technology and Data”, Mr. Smarty Pants? What about “Data Governance”, DAMA and the DMBOK? Have you joined DAMA in South Africa? Do you know whether they are effective or ineffective? Is your customer mature enough to handle Master Data? Haven’t you heard what Nicola Askham and Robert Seiner have to say about Data Governance? What about Danette McGilvray’s book and view? Data Modelling? Have you read Steve Hoberman’s books? What about Zetta’s view on Master Data and Data Governance? What about this? What about that?

And there I was, 17 years old again and still “Wet Behind the Ears”, staring at the “Oom” in front of me whose Pentium III PC had a corrupt operating system as a result of a virus he obtained from visiting a dirty website (which his wife doesn’t know about yet). What do I do? Do I fix his OS and send him home (do I clean your data, Mr. Customer, and then leave you to yourself again)? Do I sell him an anti-virus tool (do I sell you an MDM tool to fix and manage your data, Mr. Customer)? Do I advise him not to visit such sites for fear of righteous retribution (do I try to fix your processes and advise you on your lack of Data Governance, Mr. Customer)? Mmm…

I believe that solving my customers’ data issues is not a simple task, and is a problem which requires a combination of all of the solutions above along with a lot of help from the Data Community out there, and much wisdom and patience along the way.

Thanks for reading,
Colin

Ragged Hierarchies, HideMemberIf and MDX Compatibility

Struggling to get ol’ Excel to handle ragged hierarchies? Look no further than Chris Webb’s blog post.

Chris Webb's BI Blog

Here’s something that I suspect a few people out there will be surprised I didn’t know – but as far as I’m concerned, if I didn’t know it then it’s worth blogging about.

Anyway, it regards ragged hierarchies, the HideMemberIf property and the MDX Compatibility connection string property. Now you probably know that if you want to turn a user hierarchy into a ragged hierarchy (perhaps to avoid using a parent-child hierarchy) you need to use the HideMemberIf property on the user hierarchy. For example, imagine you were using the following SQL query as the basis of your customer dimension:

SELECT        'UK' AS Country, 'Bucks' AS State, 'Amersham' AS City, 'Chris Webb' AS Customer
UNION ALL
SELECT        'Italy' AS Country, 'BI' AS State, 'Biella' AS City, 'Alberto Ferrari' AS Customer
UNION ALL
SELECT        'Vatican' AS Country, 'Vatican' AS State, 'Vatican' AS City, 'The Pope' AS Customer

We could build…

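For reference, the connection string property in the post’s title is set on the Analysis Services connection string; as I understand it, HideMemberIf only takes effect when MDX Compatibility is set to 2 (the server and catalog names below are placeholders):

Provider=MSOLAP;Data Source=MyServer;Initial Catalog=MyCube;MDX Compatibility=2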

Reset Identity Seed

Sometimes the need arises to reset the identity seed on an MS-SQL table. One might want to do this because records have been deleted from the end of the table, or because a DELETE was performed on the table instead of a TRUNCATE (DELETE does not reset the identity seed, whereas TRUNCATE does).

 

Within SQL Server there is a built-in command called DBCC CHECKIDENT, and it allows you to reseed the identity value to start at a specified number. Let’s say that a table (let’s call it TableA) has rows in it ranging from 1 through 100, and for some reason rows 90 through 100 need to be deleted. Clearance is received from the auditors that these records can be deleted, and voilà, a few minutes later the offending rows have been removed from the table. As time goes by an application inserts new rows into the table; however, when the table is inspected you notice that the new rows have identity values starting at 101 upwards, and that the numbering didn’t continue from 90 as expected. This is because SQL Server remembers the last identity value generated for each table and simply continues from that number when new data is inserted, regardless of whether some or all of the data has been deleted from the table.

 

To correct this, you need to determine the identity value that you would like new rows to start at (in this case it’s 90), subtract one from it (making it 89), and then run the following command:

 

DBCC CHECKIDENT ('TableA', RESEED, 89);

 

This tells SQL Server that the last identity value used for the specified table is 89, and so it will add one (1) to this value when inserting new data into the table. Of course, the new identity values don’t need to continue exactly from the previous row. If you want them to start at 150, the same command as above can be run, with the 89 replaced by 149.
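To make the whole sequence concrete, here is a minimal sketch (the table and its columns are invented for illustration):

-- Create a demo table with an identity column seeded at 1
CREATE TABLE TableA
(RowID BIGINT IDENTITY(1,1)
,Name  NVARCHAR(100)
);

-- Insert three rows (RowIDs 1 to 3), then delete the last one
INSERT INTO TableA (Name) VALUES ('Alpha'), ('Bravo'), ('Charlie');
DELETE FROM TableA WHERE RowID = 3;

-- SQL Server still remembers 3 as the last identity value used
DBCC CHECKIDENT ('TableA', NORESEED);

-- Reseed to 2 so that the next insert receives RowID 3 again
DBCC CHECKIDENT ('TableA', RESEED, 2);
INSERT INTO TableA (Name) VALUES ('Delta');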

 

There are a few caveats to the use of this command, so for more information on these and other uses please visit the MSDN website. Thanks to Pinal Dave for the reference to this piece of code.

 

God bless.

Colin

Retrieve SQL Run Time Information

Within our company, one of the requirements we have is to log statistics, for audit purposes, about what happens within stored procedures and SSIS packages while they are executing. Examples of the required statistics are:

  • Process Name *1
  • Process GUID ID
  • Process Start Time
  • Process End Time
  • Process Duration
  • Source and Destination table names
  • Rows written successfully *2
  • Rows rejected *3
  • Row ID for record inserted into Audit table *4
Within SSIS, retrieving the required audit values is easy enough, as there are toolbox items one can use, such as the Row Count component, as well as system variables that come stock with SSIS. Within a stored procedure a little more work is required. I particularly wanted to retrieve the items above marked with an asterisk from within the executing stored procedure.

Process Name (*1)

To retrieve the name of the running stored procedure, one needs to use the built-in function “OBJECT_NAME”. The exact syntax to use is:

 

SELECT OBJECT_NAME(@@PROCID);

The output of this query can be returned as an output from the procedure, or stored in a variable and used later on. Thanks to “Hunchback” for providing the answer on the MSDN forum.
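As a quick sketch, this is how it might look inside a procedure (the procedure and variable names are invented for illustration):

CREATE PROCEDURE dbo.LoadCustomers
AS
BEGIN
    -- Capture the name of the currently executing procedure
    DECLARE @ProcessName SYSNAME = OBJECT_NAME(@@PROCID);

    -- ... the rest of the procedure's work goes here ...

    -- The captured name can now be written to an audit table
    -- or returned to the caller
    SELECT @ProcessName AS ProcessName;
END;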

Rows Written and Rejected (*2 & *3)

In order to retrieve the number of rows affected in a particular table (whether by inserts, updates or deletes), one needs to read the T-SQL system function @@ROWCOUNT directly after the statement that affected the table. So, for instance, if I have three statements, one an insert, one an update, and one a delete (all on the same table), and I only want to log the rows that were inserted, then I would use @@ROWCOUNT as follows:
DECLARE @RowCount BIGINT;

-- INSERT
INSERT INTO TableA (Column01, Column02, Column03)
VALUES (10, 'Antibacterial Soap', 7.99);
-- Capture the rows inserted immediately, before any other statement runs
SET @RowCount = @@ROWCOUNT;

-- UPDATE
UPDATE TableA
   SET Column02 = 'Sunlight Liquid'
 WHERE Column02 = 'Sonlight Liquid';

-- DELETE
DELETE FROM TableA
 WHERE Column02 IS NULL
   AND Column03 < 0;
You will notice that I captured @@ROWCOUNT immediately after the insert and didn’t use it after the update or delete statements. Every time a table is affected by a T-SQL operation the value returned by @@ROWCOUNT is replaced, so it has to be read straight after the statement you want to measure.

Row ID

We have created an Audit table that stores the run-time information for our SSIS packages and SQL stored procedures. The way we use this table is that each process that runs inserts a record into the Audit table as its first step, runs the rest of its code, and then updates that Audit record with the correct run-time information after the process has completed successfully. The same process can be run a few times (such as when it is run within a loop, or a few times in sequence).
Because the same process can run more than once, it is important to update the correct record in the Audit table. The way to do this is to retrieve the Row ID from the Audit table (if it has one) after a record has been written to it by a particular process, and then to store this ID in a variable for later use. The safest way to retrieve this value is to use the T-SQL function SCOPE_IDENTITY(), which returns the last identity value generated within the current scope (unlike @@IDENTITY, it is not affected by triggers).
An example of its use can be seen below:
DECLARE @RowID BIGINT;

CREATE TABLE Audit
(RowID    BIGINT IDENTITY(1,1)
,Column01 INT
,Column02 NVARCHAR(100)
,Column03 MONEY
);

-- First step of the process: write the audit record
INSERT INTO Audit (Column01, Column02, Column03)
VALUES (10, 'Antibacterial Soap', 7.99);

-- Capture the identity value of the record just inserted, in this scope
SET @RowID = CAST(SCOPE_IDENTITY() AS BIGINT);
NOTE: I cast the result of the function (which is returned as NUMERIC(38,0)) to a BIGINT because the RowID column in the Audit table was defined as a BIGINT.
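Putting the pieces together, here is a minimal sketch of the insert-then-update audit pattern described above, as it might run inside a stored procedure (it assumes an Audit table with ProcessName, ProcessStartTime, ProcessEndTime and RowsWritten columns, and a StagingTableA source table; these names are invented for illustration):

DECLARE @RowID       BIGINT;
DECLARE @RowsWritten BIGINT;

-- Step 1: register this run in the Audit table and remember its RowID
INSERT INTO Audit (ProcessName, ProcessStartTime)
VALUES (OBJECT_NAME(@@PROCID), GETDATE());
SET @RowID = CAST(SCOPE_IDENTITY() AS BIGINT);

-- Step 2: do the actual work and capture the rows written
INSERT INTO TableA (Column01, Column02, Column03)
SELECT Column01, Column02, Column03
FROM   StagingTableA;
SET @RowsWritten = @@ROWCOUNT;

-- Step 3: close off the audit record for this specific run
UPDATE Audit
   SET ProcessEndTime = GETDATE()
      ,RowsWritten    = @RowsWritten
 WHERE RowID = @RowID;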

Conclusion

I hope this article helps you get a bit more out of your procedures than you used to.
God bless.
Colin

MS-SQL – Write Stored Procedure Results to Table

I browsed the web looking for a way to write the results from one of my stored procedures to a table. There were quite a few ways to do it; however, this one worked best for me:

-- Drop the temp table if it still exists from a previous run
IF OBJECT_ID(N'tempdb..#Temp') IS NOT NULL
BEGIN
    DROP TABLE #Temp
END

-- The temp table's columns must match the procedure's result set
-- exactly: same number of columns, same order, compatible types
CREATE TABLE #Temp
          (Date_Flag          [varchar](12)
          ,Account_Code       [varchar](20)
          ,Calendar_Month     [datetime]
          ,Last_Receipt_Month [date]
          ,Last_Receipt_Date  [date]
          ,Last_Receipt_AMT   [money])

-- INSERT ... EXEC captures the procedure's result set into the table
INSERT INTO #Temp
EXEC        [Database].[dbo].[Procedure];

SELECT * FROM #Temp
GO
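For context, a hypothetical procedure whose result set would line up with the temp table above might look something like this (the dbo.Receipts source table is invented for illustration):

CREATE PROCEDURE [dbo].[Procedure]
AS
BEGIN
    -- The SELECT list must line up with #Temp: same column count,
    -- same order and compatible data types
    SELECT Date_Flag
          ,Account_Code
          ,Calendar_Month
          ,Last_Receipt_Month
          ,Last_Receipt_Date
          ,Last_Receipt_AMT
    FROM   dbo.Receipts;
END;
GO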

I hope this helps.

God bless,
Colin