MIS 4380/MBA 5380—Business Intelligence—Fall 2018
Write your answers in the separate answer document available at the exam link and submit the document there when completed. Your SafeAssign score should be visible shortly after submission.
(20 pts) Compare & contrast the definitions and meanings of decision support systems (DSS), business intelligence (BI), and business analytics (BA). Explain why it is so difficult to have one definition of each. Finally, draw (use any software or scan something hand-drawn) a conceptual model of how the ideas fit together in your mind.
A Decision Support System (DSS) is a computerized application or system that gathers and analyzes data to help businesses or organizations make quality decisions. Business Intelligence (BI) is a term used to describe programs or applications that help to organize, analyze, and manage the data within an organization. Business Analytics (BA) refers to the methods and techniques used to analyze data. Those methods and techniques range from statistical and operational analysis all the way to the formation of models. It is very difficult to have one concrete definition of each because they are all linked in one way or another. BI is considered a decision support system since its purpose is to support better business decision-making, yet the same can be said for BA because its methods and techniques also make up a large portion of DSS. So, since both BI and BA largely describe the practice of using data to make better decisions, it comes down to the differences between the two. I believe they go hand in hand, since analysis can’t be done without intelligence. But one major distinction between the two is that BI looks at historical data, telling you what has already happened, whereas BA looks at what is going to happen, so you can anticipate what’s to come. My conceptual model shows how this fits together in my mind. BI and BA are both decision support systems that work side by side: one runs the business based on past results, and the other looks to change the business as it moves forward.
[Figure: conceptual model with the labels Decision Support Systems (DSS), Business Analytics (BA), and Business Intelligence (BI)]
(20 pts) Refer to Figure 1.11 on pg. 23 of your text. Explain the differences/similarities between descriptive, predictive, and prescriptive analytics. Next, answer the question: “Is it a good idea to follow a hierarchy of descriptive and predictive analytics before applying prescriptive analytics?” Why or why not?
The best way to explain the differences between descriptive, predictive, and prescriptive analytics is to split them up by what they do. One tells us what has happened, one tells us what could happen, and the last one tells us what we should do. Descriptive analytics tells us what has happened, giving us an insight into the past. It analyzes real-time and historical data to help us learn from the past and understand how it might influence the future. Descriptive analytics is important for businesses and organizations because it helps them find the reasons behind any previous successes or failures. Predictive analytics helps in understanding the future and answers the question, what could happen? Just like descriptive analytics, predictive analytics looks at past data, but rather than giving an insight into the past, it uses that data to predict what might happen. Even though the predictions will not be 100% accurate, this is very important for businesses because it helps them set realistic goals. Lastly, prescriptive analytics highlights problems and gives advice on possible outcomes. This is important for businesses because it helps them understand which possible actions would maximize their key business metrics. Prescriptive analytics is seen as the next step because it goes further than descriptive and predictive analytics: it gives advice on outcomes and results rather than giving an insight into the past or helping to understand the future. Because of this, I believe that it is a good idea to follow a hierarchy of descriptive and predictive analytics before applying prescriptive analytics. As a business, you want to know what you can do in the future to maximize your business’s potential, but before that can be done, you need to understand what has and hasn’t worked for you in the past. Descriptive analytics does that, since it works from data about what actually happened rather than from estimates. Without descriptive analytics you won’t be able to know the reasons behind previous success. So it is very important to do these analytics in steps.
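As a rough illustration of how the three levels build on each other, here is a minimal Python sketch. The sales figures, the function names (forecast_next, best_order), the candidate order quantities, and the profit and waste costs are all made up for this example; it is only meant to show the descriptive, predictive, and prescriptive steps in order, not a real forecasting or optimization method.

```python
from statistics import mean

# Hypothetical monthly unit sales (made-up numbers for illustration only).
sales = [100, 110, 120, 130, 140, 150]

# Descriptive: summarize what has already happened.
average_sales = mean(sales)

# Predictive: fit a least-squares straight line to the history and
# extrapolate one month ahead.
def forecast_next(history):
    n = len(history)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(history)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n

predicted_demand = forecast_next(sales)

# Prescriptive: use the forecast to recommend an action. Here the action is
# an order quantity picked from a few candidate options to maximize profit.
def best_order(demand, options, unit_profit=5.0, unit_waste_cost=2.0):
    def profit(qty):
        sold = min(qty, demand)
        return sold * unit_profit - (qty - sold) * unit_waste_cost
    return max(options, key=profit)

recommended_order = best_order(predicted_demand, options=[140, 160, 180])

print(average_sales, predicted_demand, recommended_order)
```

The prescriptive step cannot run without the forecast, and the forecast cannot be built without the historical summary, which is the hierarchy argued for above.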
Answer the following questions:
(10 pts) Describe the data warehousing process/framework. What are the different parts/pieces & how do they fit together?
The data warehousing process/framework is an integrated system in which information from other systems is collected and stored. All of this data (current or historical), which is taken from other information systems, is put into one place and becomes the basis for data analysis and reporting. The aim is that individuals within an organization can conveniently access the information they need in order to make decisions. Of course, for this information to be readily available, a number of things have to come together before the data warehouse can even be established. The first part is deciding what type of information should be included in the data warehouse. This is very important because without knowing the information requirements, the data warehouse will not be effective for the organization. Only once the information requirements have been established should the design be converted into a physical data warehouse. The second part is to actually populate the data warehouse once a physical design has been put into place. The last part is to make the information available so that it can be used for analysis and reporting.
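The populate and make-available parts described above can be sketched in a few lines of Python using the built-in sqlite3 module as a stand-in for a real warehouse. The two source systems, their field names, the fact_sales table, and the function names are all hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical source systems: a sales system and a CRM, each with its own format.
sales_system = [
    {"cust": "C1", "amount": "120.50", "date": "2018-09-01"},
    {"cust": "C2", "amount": "80.00", "date": "2018-09-02"},
]
crm_system = [
    {"customer_id": "C1", "region": "south"},
    {"customer_id": "C2", "region": "North"},
]

def extract():
    """Pull raw records from each source system."""
    return sales_system, crm_system

def transform(sales, customers):
    """Clean the records and integrate them into one consistent shape."""
    region_by_id = {c["customer_id"]: c["region"].title() for c in customers}
    return [
        (s["cust"], region_by_id[s["cust"]], float(s["amount"]), s["date"])
        for s in sales
    ]

def load(conn, rows):
    """Populate the physical warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(customer_id TEXT, region TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def report(conn):
    """Make the data available for analysis and reporting."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM fact_sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))
sales_by_region = report(warehouse)
print(sales_by_region)
```

The first part of the process, deciding the information requirements, shows up here only indirectly, as the choice of which columns the fact_sales table holds.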
(15 pts) What are the different architectures that organizations have to choose from when implementing a data warehouse? Give a basic description of each. Which one is considered the best & why?
There are a number of different architectures that organizations can choose from when implementing a data warehouse. Among the most common are n-tier architectures. Of these, two-tier and three-tier are mainly used by organizations, but there can also be just one tier at times. N-tier architectures can be divided into three parts. The first is the data warehouse itself, which contains the data and the associated software. The second part is the data acquisition software. This is where data is taken from different systems and sources, sorted, and then loaded into the data warehouse. The third and last part is the client software. This lets users get the data from the warehouse and analyze it. There are also a number of alternative data warehousing architectures: independent data marts, data mart bus, hub-and-spoke, centralized data warehouse, and federated data warehouse. Independent data marts are the simplest and cheapest. The data marts are made to work independently of each other. Data mart bus architectures are individual marts linked to each other via some type of middleware. Hub-and-spoke is probably the best-known data warehousing architecture. Its main focus is to build a scalable and maintainable infrastructure so that user interfaces and reports can be easily customized. The centralized enterprise data warehouse architecture is very much like the hub-and-spoke, with the only difference being that there are no dependent data marts, but in their place one enormous enterprise data warehouse. Lastly, there is the federated data warehouse. This approach does everything possible to incorporate analytical resources from other sources so that business conditions and needs can be successfully met. Out of all of these architectures, either hub-and-spoke or data mart bus is usually considered the best. It depends on the organization, but I would say that hub-and-spoke is the best one to implement because of how effective it is for an organization, even though it can lead to data redundancy and latency.
(20 pts) What is Business Performance Management (BPM)? How does it relate to BI (in general), business reporting, and visual analytics?
Business Performance Management (BPM) involves a number of different approaches used by businesses to monitor and manage the company’s overall performance. It helps to determine how a business can better reach its goals. BPM relates to BI because it is a form of BI, and what sets it apart from other BI tools is its strategy focus. One of BPM’s key components is the set of tools it provides for organizations to outline strategic goals and then measure and manage performance against them. BPM also helps an organization do a number of things, such as monitor the business’s processes, increase efficiency, and analyze risks. The information it provides can be shared throughout the organization, which is how it relates to business reporting. Business reporting is an essential piece of any organization. Successful BPM will help the organization in reporting information to its employees, customers, and stakeholders/shareholders, and most importantly, it will help with internal decision making. BPM relates to visual analytics because it provides organizations with modeling and analysis. Since the data is provided, it can be displayed in a way that helps the organization identify trends and patterns that might not have been identifiable before.
(20 pts) What is the difference between data or information visualization and visual analytics? Why should storytelling be part of reporting and data visualization?
Data or information visualization is any type of visual representation of quantitative information. By turning datasets into visuals, individuals can identify patterns and trends that might have gone unnoticed if they were in reports, spreadsheets, or any kind of text-based data. Data visualization simply helps you understand data in a visual way, so you can see what is happening or what has happened. Visual analytics, on the other hand, goes a step further. It lets you dig deeper into the visual analysis by telling you why something is happening in what you are seeing, leaning more on computation and analytical reasoning. Another difference is that information visualization is linked with business intelligence, while visual analytics is linked with business analytics. Storytelling should be a part of reporting and data visualization because simply visualizing data effectively is not enough. At the end of the day, you want the data to be remembered, and for it to be remembered, it needs to tell a story and have a voice behind it. Storytelling goes further than just showing data charts because it brings out important things in the data that might not be apparent from just looking at it. This can really help a business that is trying to convince people in decision-making positions to do certain things, because it adds weight to what is being presented and supports the point being made. Storytelling is very important in business intelligence since it gives context to the dataset.
(30 pts) Read the Harvard Business Review article, “The CEO of Williams-Sonoma on Blending Instinct with Analysis” (found with the exam file on Blackboard). First, comment on the ways Williams-Sonoma addresses the non-technical side of what we have discussed with regard to BI/analytics in class so far. Secondly, comment on how the company has refined its thinking with regard to analytics along the lines of: structure, people, tools, and culture.
Think about the data mining process using the CRISP-DM model.
(10 pts) Why do the earliest phases take so long to accomplish?
I believe the earliest phases take so long to accomplish because they involve learning and understanding. The first two steps are “Business Understanding” and “Data Understanding.” These steps require a lot of learning about and understanding of the business and the data, so they cannot be automated. If this additional time is not spent in the early phases, then mistakes are very likely, and they can affect the entire data-mining project and even result in a failed project.
(5 pts) If an organization uses SEMMA instead of CRISP-DM, what do they supposedly already know?
If an organization uses SEMMA instead of CRISP-DM, then they supposedly already know and understand the project goals and objectives and know what data sources need to be used for the project.
(25 pts) Explain the key differences between prediction (classification/regression), cluster analysis, and association rule mining models?
Classification and regression are probably among the most used data mining methods in the world. Both share the same concept in that they feed off patterns from past data to make predictions or decisions. What separates the two is that in classification, the dependent variables are categorical and unordered, whereas in regression, the dependent variables are continuous values or ordered whole values. So when doing regression, you are predicting a value based on the past, which is not like classification, where you are putting things into different categories. Clustering is another popular data mining method. Clustering is very different from classification. Unlike classification/regression, clustering doesn’t use any prior knowledge of classes. Clusters are established when the selected algorithm goes through the data and finds some relationship between objects. Clustering is all about finding natural groupings in unlabeled data rather than trying to predict something from pre-labeled data. One major difference when doing cluster analysis in comparison with classification/regression is that there are no known correct answers to validate the results against, since its goal is just to create groups whose members are as similar to each other as possible. Association rule mining is also a popular technique. Its purpose is to find interesting relationships, patterns, and correlations in databases. The aim is to find rules that describe how or why certain items may have been bought together. It is widely used in the retail industry and is often called market basket analysis. The main difference between clustering and association rules is that clustering groups a set of objects together based on similarity, but association rule mining finds associations among items. Classification and regression can be described as predictive methods, and clustering and association rules as descriptive methods.
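To make the contrast concrete, here is a small Python sketch that runs one toy version of each kind of model on made-up data. The nearest-neighbor classifier, the two-cluster 1-D k-means, and the bread/milk/butter baskets are my own simplified stand-ins chosen for illustration; they are not the specific algorithms covered in the text.

```python
# Made-up data for illustration only.

# 1) Prediction (classification): labels are known in advance.
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]

def classify(x):
    """1-nearest-neighbor: copy the label of the closest known example."""
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

# 2) Clustering: no labels; groups emerge from similarity alone.
def two_means(points, iterations=10):
    """Very small 1-D k-means with k=2, seeded with the min and max."""
    centers = [min(points), max(points)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

# 3) Association rules: which items tend to appear together in transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

predicted_label = classify(7.2)
clusters = two_means([1.0, 1.5, 8.0, 9.0])
bread_to_milk = confidence({"bread"}, {"milk"})
print(predicted_label, clusters, bread_to_milk)
```

Only the classifier needs the "low"/"high" labels up front. The clustering function sees bare numbers, and the association rule works on co-occurrence in baskets rather than on similarity between objects.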
(25 pts) Describe at least 3 problems you might find in data that must be corrected before the data can be mined. Make up your own example to illustrate each one.
Incomplete data is a problem you might find in data. This is something that should be corrected before the data can be mined, since we want the best model possible. Trying to figure out what needs to go in the missing fields can be a real challenge. There are several possible reasons for missing data, and finding a solution can be quite difficult. Even one missing value can have an effect on what we are trying to find out. Of course, many data mining algorithms have built-in ways of handling missing values, for example by simply ignoring them. Another solution is data imputation. This is the process of replacing missing data with the most frequently observed values, or even building learning models that can predict possible values for the missing fields.
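As a simple illustration of imputation, the Python sketch below fills missing categories with the most frequent observed value, as described above, and fills missing numbers with the mean, which is an additional common choice I am assuming here. The customer records and the impute function are hypothetical.

```python
from statistics import mean, mode

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "city": "Austin"},
    {"age": None, "city": "Dallas"},
    {"age": 40, "city": None},
    {"age": 28, "city": "Austin"},
]

def impute(rows):
    """Fill missing ages with the mean and missing cities with the most frequent city."""
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    known_cities = [r["city"] for r in rows if r["city"] is not None]
    fill_age = mean(known_ages)
    fill_city = mode(known_cities)
    return [
        {
            "age": r["age"] if r["age"] is not None else fill_age,
            "city": r["city"] if r["city"] is not None else fill_city,
        }
        for r in rows
    ]

cleaned = impute(records)
print(cleaned)
```

The original list is left untouched, so the imputed copy can always be compared against the raw data to see which values were filled in.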
The size of the data is another problem. I took a data analysis class a few semesters ago, and for one of the assignments we were given a dataset that included flight departure and arrival times and the reasons for delays from airports all around the world. The first time I opened the file, it took around 5 minutes to download. After I had it downloaded, it was nearly impossible to do anything with the data on my laptop. The scale and complexity of the data were simply too much for the device I was using. For that assignment I just had to pick a city and analyze the results for that city, so once I filtered the data down to what I needed and saved just that, I was able to work on it on my laptop. But the original file was too large for me to work on efficiently.
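One way to do that kind of filtering without loading the whole file at once is to stream it row by row. The Python sketch below uses the standard csv module; the column names (origin, dep_delay_min, reason), the airport codes, and the filter_rows function are hypothetical, and a small in-memory string stands in for the large flight file.

```python
import csv
import io

def filter_rows(source, destination, column, wanted):
    """Copy only the rows where `column` equals `wanted`, one row at a time."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(destination, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:  # rows are read and written one at a time
        if row[column] == wanted:
            writer.writerow(row)
            kept += 1
    return kept

# Tiny in-memory stand-in for the large flight file (made-up values).
big_file = io.StringIO(
    "origin,dep_delay_min,reason\n"
    "DFW,12,weather\n"
    "LHR,0,none\n"
    "DFW,45,carrier\n"
)
subset = io.StringIO()
rows_kept = filter_rows(big_file, subset, column="origin", wanted="DFW")
print(rows_kept)
print(subset.getvalue())
```

With real files, the two StringIO objects would be replaced by files opened with open(..., newline=""), and the smaller output file is the one that would then be analyzed.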
The last problem is timeliness. The bigger the dataset gets, the longer it will take to analyze. Sticking with the same example from my last class, I remember when we did another assignment with the original dataset. This time I made sure I used a desktop computer and not a laptop. But even then, for some of the exercises, I needed to analyze some information immediately, and it was impossible to scroll through the document just to find the information I needed. If this had been real-time data, it would have just gotten bigger and bigger; thankfully, this wasn’t the case for me. But if it had been, then creating index structures in advance is something that can help speed things up, since it’s impractical to scan the entire dataset with the naked eye. Even though this is only a partial fix, since index structures can only support some classes of criteria, it is still something that can be done before data mining to eliminate the problem of having to spend extra time looking for something in the data.
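A very small version of an index built in advance can be sketched in Python with a dictionary that maps each value to the positions of the rows containing it. The flight records and the function names (build_index, lookup) are hypothetical, and a real database index would be more sophisticated than this.

```python
from collections import defaultdict

# Hypothetical flight records (made-up values).
flights = [
    {"origin": "DFW", "delay": 12},
    {"origin": "LHR", "delay": 0},
    {"origin": "DFW", "delay": 45},
    {"origin": "JFK", "delay": 7},
]

def build_index(rows, key):
    """Map each value of `key` to the positions of the rows that contain it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[key]].append(position)
    return index

# Built once, in advance, before any questions are asked of the data.
origin_index = build_index(flights, "origin")

def lookup(index, rows, value):
    """Fetch matching rows directly instead of scanning the whole dataset."""
    return [rows[i] for i in index.get(value, [])]

dfw_delays = [f["delay"] for f in lookup(origin_index, flights, "DFW")]
print(dfw_delays)
```

This index only answers exact-match questions about the origin column. A question such as "all delays over 30 minutes" would need a different index, which is the partial-fix limitation mentioned above.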