Blog I - Introduction to BI and Big Data

September 03, 2017


Hello everyone and welcome to my blog! I will be using this as a tool for my Business Intelligence class to discuss the modules we cover.

The first module was an introduction to business intelligence and big data. Here we covered some quick topics such as internal vs. external data, the paradigm shift caused by big data, traditional vs. modern BI, and the development cycle for BI. Along with these topics, we also had some supplemental readings on IT-enabled business trends, recent statistics on the demand for BI talent, and how big data and personal data are converging.

What shocked me most in this module is the scale of data available in the world. According to the IDC article "The Digital Universe Decade - Are You Ready?", by 2020 the size of the world's data will exceed 35 zettabytes! But how big is a zettabyte? Here is an infographic from Cisco, published in 2011, which helps put this unit into perspective. The graphic uses the example of an 11 oz coffee cup: if that cup represented a gigabyte, a zettabyte would have the same volume as the Great Wall of China. Now that's incredible!
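To make the jump from gigabytes to zettabytes a bit more concrete, here is a quick sketch of the unit arithmetic. It assumes the decimal (SI) definitions, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the function name is my own and is only for illustration.

```python
# Decimal (SI) definitions of the units; binary units (GiB, ZiB) would differ slightly.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zettabytes):
    """Convert whole zettabytes to gigabytes using power-of-ten units."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE


# One zettabyte expressed in gigabytes ("coffee cups" in the infographic's analogy).
gigabytes_per_zettabyte = zettabytes_to_gigabytes(1)

# The 35 zettabyte projection for 2020 quoted from the IDC article.
projected_gigabytes = zettabytes_to_gigabytes(35)

print(f"1 ZB  = {gigabytes_per_zettabyte:,} GB")
print(f"35 ZB = {projected_gigabytes:,} GB")
```

In other words, a single zettabyte is a trillion gigabytes, so the 35 zettabyte projection works out to 35 trillion gigabytes.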

Of that data, 80% will be unstructured as well. So what does this mean for BI? Well, this is where we see its modernization. Traditional business intelligence is mostly based on data from transactional databases, such as Oracle and MySQL, which is extracted, transformed, and loaded (ETL) into data warehouses. These data warehouses are then used to report on and track established metrics. This style of BI is what I experienced at my internship as a BI Analyst during undergrad. While the technology is very interesting, it is really limited to data generated by internal applications, which is pretty much all historical. Modern business intelligence takes these traditional elements but also adds data from social media, physical assets with networked sensors, external APIs, and many other sources. In order to stream these new data sources, big data infrastructure has evolved into "ecosystems" of NoSQL systems, HDFS systems, and other systems that enable predictive analytics.
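For anyone who has not seen ETL before, the traditional flow described above can be sketched in a few lines of Python. This is only a toy illustration and not how any particular company does it: it uses in-memory SQLite databases from the standard library as stand-ins for both the transactional system and the data warehouse, and the `orders` and `daily_sales` tables and the sample rows are all made up.

```python
import sqlite3


def extract(source):
    """Extract: read raw order rows from the transactional database."""
    return source.execute(
        "SELECT order_date, quantity, unit_price FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: roll individual orders up into revenue per day."""
    revenue_by_day = {}
    for order_date, quantity, unit_price in rows:
        revenue_by_day[order_date] = (
            revenue_by_day.get(order_date, 0.0) + quantity * unit_price
        )
    return sorted(revenue_by_day.items())


def load(warehouse, daily_revenue):
    """Load: write the aggregated metric into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(order_date TEXT PRIMARY KEY, revenue REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO daily_sales VALUES (?, ?)", daily_revenue
    )
    warehouse.commit()


# Hypothetical transactional source with a few sample orders.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (order_date TEXT, quantity INTEGER, unit_price REAL)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("2017-09-01", 2, 10.0), ("2017-09-01", 1, 5.0), ("2017-09-02", 3, 4.0)],
)

# Hypothetical warehouse that reports are run against.
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))

report = warehouse.execute(
    "SELECT order_date, revenue FROM daily_sales ORDER BY order_date"
).fetchall()
print(report)
```

The point is that the report only ever sees the aggregated, historical `daily_sales` table, which is exactly the limitation of traditional BI mentioned above.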

I found the following graphic on the blog Martin's Insights. It showcases what a modern BI ecosystem might look like. The company I work for currently uses something similar. One interesting side effect of this evolution in infrastructure is how widespread BI jobs and titles are now. In my internship experience, our team handled the analysis, administration, and modeling all together. Now more companies are using multiple teams of data engineers, data architects, data analysts, systems analysts, and data scientists to work with and maintain these ecosystems. However, this could also be a result of the hyper-specialization trend that's been happening in the tech industry. Either way, I think it's great and allows young professionals such as myself to focus on the areas that are most interesting to us and develop the skill sets required to succeed.

Comments

Anonymous, September 6, 2017 at 2:31 PM
Hi Thomas,

Thanks for posting this blog. I agree that it is extremely hard to fathom that much data, and even harder to imagine gleaning any information from the mess. I do have one question concerning hyper-specialization in the context of the ecosystem. When you mentioned that your team handles the entire life cycle of data design, extraction, interpretation, etc., do you see a lot of hyper-specialization? I only ask because I generally see the opposite in academia. It seems that universities or companies want someone with quite a bit of experience who wears 10 different hats (often for minimal pay), much like the first module suggests for data scientists being versed in both tech and business. I know that was the case with your internship and just wondered if it was the same with your current company.

Thank you.