Blog II - Agile Data Warehouse Design

Module 1 was jam packed with information and we covered topics regarding the data warehouse design cycle, the balanced scorecard, performance management, star schema design, data quality analysis, and dashboard design and analysis. This was the module I most looked forward to because of my past BI experience as well as my interest in data engineering as a potential career path. So with that I'm going to be focusing this blog entry on data modeling.

Our lecture material was focused on various sections from The Data Warehouse Toolkit by Ralph Kimball and Margy Ross. When designing a dimensional model it was broken down into the following four steps.

  1. Select the business process to model
  2. Declare the grain of the business process 
  3. Choose dimensions that apply to each fact row
  4. Identify the numeric facts
Now these four steps worked great for the assignments but I wanted to take this idea deeper. The last class I took in the MSMIS program, Information Systems Analysis and Design, focused heavily on agile software development life cycle (SDLC) and the different frameworks that can be applied for it. So how would agile dimensional modeling look? Coincidentally I got the chance to meet Shawna Baskin, the BI manager at Renew Financial, a few months ago. I asked her what resources she recommended for someone interested in learning data engineering. She pointed me to a book titled Agile Data Warehouse Design by Lawrence Corr with Jim Stagnitto. So I thought this would be an excellent chance to pick this book and cover some of the information for this blog post. 

What is agile development?

If you aren't familiar with agile development I highly recommend checkout out the Agile Manifesto and reading up on the principles that it's made up of.  The idea behind agile development is to break down the project into smaller chunks that can be completed in multiple iterations. Each iteration then goes through each phase of the SDLC. Benefits of this approach include end users can test and provide feedback earlier into the project, less time spent developing unimportant features, easily adapt to requirements changes, and favors experimentation and intuitive solutions. Some of the more common agile methodologies are scrum, feature driven development, lean software development, and the Kanban Method. 

BEAM*

BEAM* is an agile data modeling method for designing dimensional data warehouses and data marts. It stands for Business Event Analysis and Modeling while the asterisk represents a star schema, the primary deliverable. The cool thing about BEAM* is it's techniques allow us to discover details and then document them in a way that's easily understood by the stakeholders but also easily translated into logical/physical models by the developers. When doing analysis of a business event with the stakeholders we use what's called the 7Ws Framework. Focusing on the 7Ws (who, what, where, when, how many, why and how) forces everyone involved to think dimensionally. Here is an example of a BEAM* table that showcases this framework as well as the document notation. 

Image result for BEAM* table and notation data


Once all the details for the business process have been defined and event stories are created a similar process is done to model the dimensions for our data warehouse. Along side of the 7Ws framework BEAM* uses hierarchy charts and change stories to define the dimensions. This is important because the hierarchy definitions provide the relationship between attributes that is used for drilling down/across during analysis. Also, change stories define the business rules for dealing with slowly changing dimensions.  

Once these details are fully "modelstormed" with the stakeholders and the appropriate documents are created the event matrix is updated. One difference I noticed in this book's matrix compared to the one in the lecture is that there was a new section of columns added that identify the different business groups/stakeholders for each event. 

So how do these transform into a star schema? That was the goal of the four steps I laid out earlier were for anyways. Well again we can apply the 7Ws to help layout the schema. Below is a diagram from the book that shows how the 7Ws fit with a star schema. We can now take the fields from our data table and create diagram representing the star schema. 



Summary

Agile Data Warehouse Design is a 300 page book and a single blog post doesn't do it's content justice. For those of you reading this from the class you will find most of the technical information about data warehousing and the structures that make one up are very similar to our lectures. It does however dive into more detail and introduces different design patterns. Along side of good information about agile design and requirements gathering it does provide some templates and models to help keep your designs consistent with other agile developers. I hope you enjoyed the brief introduction to BEAM* and would love to hear your feedback! 


Comments

Popular Posts