Blog I - Introduction to BI and Big Data

Hello everyone and welcome to my blog! I will be using this as a tool for my Business Intelligence class to discuss the modules we cover.

The first module was an introduction to business intelligence and big data. We covered some quick topics such as internal vs external data, the paradigm shift caused by big data, traditional vs modern BI, and the development cycle for BI. Along with these topics we also had some supplemental readings on IT-enabled business trends, recent statistics for BI talent demand, and how big data and personal data are converging.

What shocked me most in this module was the scale of data available in the world. According to the IDC article "The Digital Universe Decade - Are you ready?", by 2020 the size of the world's data will exceed 35 zettabytes! But how big is a zettabyte? Here is an infographic from Cisco, published in 2011, which helps put this unit into perspective. The graphic uses an 11 oz coffee cup as an example: if that cup represented a gigabyte, a zettabyte would have the same volume as the Great Wall of China. Now that's incredible!
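To put numbers on that comparison, here is a quick back-of-the-envelope sketch. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the function name is just something I made up for illustration.

```python
# Assumption: decimal (SI) units, 1 GB = 10**9 bytes and 1 ZB = 10**21 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * BYTES_PER_ZB // BYTES_PER_GB


# How many "coffee cups" (gigabytes) fit in one "Great Wall" (zettabyte)?
gb_per_zb = zettabytes_to_gigabytes(1)

# The 35 ZB projection from the IDC article, expressed in gigabytes.
projected_gb = zettabytes_to_gigabytes(35)

print(f"1 ZB  = {gb_per_zb:,} GB")
print(f"35 ZB = {projected_gb:,} GB")
```

In other words, one zettabyte is a trillion gigabytes, so the 35 zettabyte projection works out to 35 trillion of those coffee cups.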

Of that data, 80% will be unstructured. So what does this mean for BI? Well, this is where we see its modernization. Traditional business intelligence is mostly based on transactional databases, such as Oracle and MySQL, whose data is extracted, transformed, and loaded (ETL) into data warehouses. These data warehouses are then used to report on and track established metrics. This style of BI is what I experienced at my internship as a BI Analyst during undergrad. While the technology is very interesting, it is really limited to data generated by internal applications, which is pretty much all historical. Modern business intelligence takes these traditional elements but also adds data from social media, physical assets with networked sensors, external APIs, and many other sources. In order to stream these new data sources, the big data infrastructure has evolved into "ecosystems" of NoSQL systems, HDFS, and other systems that enable predictive analytics.
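For anyone who hasn't seen the traditional extract, transform, load flow in action, here is a minimal sketch of the idea. It is only an illustration under my own assumptions: two in-memory SQLite databases stand in for the transactional system and the data warehouse, and the `orders` and `daily_revenue` tables, their columns, and the sample rows are all hypothetical. A real pipeline would run against production databases with dedicated ETL tooling.

```python
import sqlite3


def extract(source):
    """Pull raw order rows out of the (hypothetical) transactional database."""
    return source.execute(
        "SELECT order_date, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Aggregate raw orders into one revenue total (in dollars) per day."""
    totals = {}
    for order_date, amount_cents in rows:
        totals[order_date] = totals.get(order_date, 0) + amount_cents
    return [(day, cents / 100) for day, cents in sorted(totals.items())]


def load(warehouse, daily_totals):
    """Write the aggregated metric into the (hypothetical) warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_revenue "
        "(day TEXT PRIMARY KEY, revenue REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO daily_revenue VALUES (?, ?)", daily_totals
    )
    warehouse.commit()


# Stand-in for an internal transactional system, filled with made-up rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_date TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2016-01-04", 1250), ("2016-01-04", 750), ("2016-01-05", 3000)],
)

# Stand-in for the data warehouse that the reports read from.
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))

report = warehouse.execute(
    "SELECT day, revenue FROM daily_revenue ORDER BY day"
).fetchall()
print(report)
```

Notice that everything in the final report comes from rows the internal application already recorded, which is exactly the "historical and internal only" limitation described above.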

I found the following graphic on the blog Martin's Insights. It showcases what a modern BI ecosystem might look like. The company I work for currently uses something similar. One interesting side effect of this evolution of infrastructure is how widespread BI jobs and titles are now. In my internship experience, our team handled the analysis, administration, and modeling all together. Now more companies are using multiple teams of data engineers, data architects, data analysts, systems analysts, and data scientists to work with and maintain these ecosystems. However, this could also be a result of the hyper-specialization trend that's been happening in the tech industry. Either way, I think it's great, and it allows young professionals such as myself to focus on the areas that are most interesting to us and develop the skill sets required to succeed.


Big Data BI Ecosystem

Comments

  1. Hi Thomas,

    Thanks for posting this blog. I agree that it is extremely hard to fathom that much data, and even harder to imagine gleaning any information from the mess. I do have one question concerning hyper-specialization in the context of the ecosystem. When you mentioned that your team handled the entire life cycle of data design, extraction, interpretation, etc., do you see a lot of hyper-specialization? I only ask because I generally see the opposite in academia. It seems that universities and companies want someone with quite a bit of experience who wears 10 different hats (often for minimal pay), much like the first module suggests for data scientists being versed in both tech and business. I know that was the case with your internship and just wondered if it was the same with your current company.

    Thank you.

    Replies
    1. Thanks for the question, Damien! Also, it's great to see a familiar and friendly face in this class. It's hard to say specifically for my current organization, as I work in new member integrations and not the BI realm. But I do know that the hardware and cloud portion is handled by a team in Mandan, the Hadoop and related programming groups are in Lake St. Louis, and then there is a separate group that handles the reporting, but I'm not sure where they are based. Alongside that, the support for our BI product also lies within its own team. However, I'm sure there is a lot of cross talk between all the groups, and each has members who would have no problem succeeding if transferred to one of the other teams mentioned. I think the other thing that leads to the division into individual teams is the company culture and processes. A lot of times teams will have multiple projects they are in charge of. For example, the guys in charge of the hardware are also doing the security patches, load monitoring, and the other fun sys admin tasks, not only for the BI clusters but for anything cloud related in our organization. So in that sense it makes sense to have a team dedicated to that rather than having a single team tackle everything that goes with modern BI. Hope that did a good enough job answering! If not, just let me know and I'd be happy to clarify.
