Introduction - Data Analysis Flashcards
(26 cards)
What is Business Intelligence (BI)?
Business Intelligence is the use of data and software tools to analyze historical data for actionable insights, mainly through descriptive analytics like dashboards and reports.
What are relational databases and why are they important?
Relational databases organize data into tables and allow querying large datasets. They became popular in the 1990s and enable integrated enterprise systems for strategic decisions.
How is Data Analytics different from Business Intelligence?
Data Analytics includes a broader range of techniques, from basic descriptive stats to predictive models, and focuses on answering questions and solving problems through data.
What is Data Science?
Data Science integrates data manipulation, visualization, statistics, and machine learning to build models that predict outcomes and automate decisions.
What is Data Mining?
Data Mining is a subset of data science that discovers patterns, associations, and anomalies in large datasets using statistical and logical methods.
What are the four analytics classifications?
Descriptive (what happened?), Diagnostic (why did it happen?), Predictive (what is likely to happen?), Prescriptive (what should we do about it?)
What is CRISP-DM?
A process for data mining that includes: Business understanding, Data understanding, Data preparation, Modeling, Evaluation & Communication, and Deployment.
What is an example of predictive analytics?
Predicting future sales based on historical customer data.
What is the goal of prescriptive analytics?
To recommend actions based on predictive insights, often identifying the best option under constraints.
Why has analytics become important in modern business?
Due to hyper-competition and rapid tech change, companies use analytics to adapt, compete, and make data-driven decisions.
What is Descriptive Analytics?
Summarizes historical data to identify patterns and trends. Answers: What happened?
Example: A company reviews monthly sales data to identify its top-performing products.
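The example above can be sketched in a few lines of Python. This is only an illustration: the records, product names, and the helper `top_products` are hypothetical and are not part of the course material.

```python
from collections import defaultdict

# Hypothetical monthly sales records: (month, product, units sold).
sales = [
    ("2024-01", "espresso", 120),
    ("2024-01", "latte", 200),
    ("2024-02", "espresso", 150),
    ("2024-02", "latte", 180),
    ("2024-02", "tea", 60),
]

def top_products(records, n=2):
    """Return the n products with the highest total units sold."""
    totals = defaultdict(int)
    for _month, product, units in records:
        totals[product] += units
    # Sort by total units, largest first, and keep the first n.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

print(top_products(sales))  # [('latte', 380), ('espresso', 270)]
```

The code only summarizes what already happened; it says nothing about why or what comes next, which is what makes it descriptive.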
What is Diagnostic Analytics?
Explores data to find causes behind certain outcomes. Answers: Why did it happen?
Example: A business finds that sales dropped due to a competitor launching a major promotion.
What is Predictive Analytics?
Uses statistical models and historical data to forecast future trends. Answers: What is likely to happen?
Example: A retailer forecasts next month’s demand based on holiday trends and past sales.
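One very simple way to forecast is to fit a straight-line trend to past values and extend it by one period. The sketch below assumes made-up monthly figures and hypothetical helper names (`fit_line`, `forecast_next`); it ignores seasonality such as holiday effects, which a real retail forecast would need to model.

```python
def fit_line(ys):
    """Ordinary least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast_next(ys):
    """Extend the fitted trend line one period past the last observation."""
    slope, intercept = fit_line(ys)
    return slope * len(ys) + intercept

# Hypothetical units sold in the last four months.
monthly_units = [100, 110, 120, 130]
print(forecast_next(monthly_units))  # 140.0
```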
What is Prescriptive Analytics?
Recommends actions based on predictions and constraints. Answers: What should we do about it?
Example: Walmart’s system recommends restocking high-demand products at specific stores to avoid stockouts.
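A restocking recommendation of this kind can be sketched as "fill the largest forecast shortfalls first, within a delivery limit". The function `recommend_restock`, the figures, and the greedy rule are all assumptions made for illustration; this is not how Walmart's system works, and real prescriptive systems typically use formal optimization rather than a simple heuristic.

```python
def recommend_restock(forecast, on_hand, capacity):
    """Allocate up to `capacity` units, largest forecast shortfall first."""
    # Shortfall = forecast demand minus current stock, never below zero.
    shortfalls = {
        item: max(forecast[item] - on_hand.get(item, 0), 0) for item in forecast
    }
    plan = {}
    remaining = capacity
    for item, gap in sorted(shortfalls.items(), key=lambda kv: kv[1], reverse=True):
        qty = min(gap, remaining)
        if qty > 0:
            plan[item] = qty
            remaining -= qty
    return plan

# Hypothetical forecast demand and current stock for one store.
forecast = {"snacks": 500, "water": 300, "soda": 100}
on_hand = {"snacks": 200, "water": 250, "soda": 120}

print(recommend_restock(forecast, on_hand, capacity=320))
# {'snacks': 300, 'water': 20}
```

The constraint here is the delivery capacity: the prediction says what demand will be, and the recommendation decides what to do about it given limited resources.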
What is the first step in the CRISP-DM process?
Business Understanding: Define the problem clearly and understand the business goals.
Ask: What decision or issue are we trying to improve with data?
What is the second step in the CRISP-DM process?
Data Understanding: Gather initial data and explore it.
Identify missing values, inconsistencies, and get familiar with its structure.
What is the third step in the CRISP-DM process?
Data Preparation: Clean the data (remove duplicates, handle missing values).
Transform it into a format suitable for analysis (e.g., filtering, merging).
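As a rough illustration of data preparation, the sketch below removes duplicate rows and fills a missing value with the mean of the known values. The rows, field names, and the function `prepare` are hypothetical, and mean imputation is only one of several ways to handle missing data.

```python
from statistics import mean

# Hypothetical raw order rows with one duplicate and one missing amount.
raw_rows = [
    {"order_id": 1, "store": "A", "amount": 20.0},
    {"order_id": 1, "store": "A", "amount": 20.0},  # duplicate of the row above
    {"order_id": 2, "store": "B", "amount": None},  # missing value
    {"order_id": 3, "store": "A", "amount": 40.0},
]

def prepare(rows):
    """Drop rows with a repeated order_id, then fill missing amounts with the mean."""
    seen = set()
    unique = []
    for row in rows:
        if row["order_id"] not in seen:
            seen.add(row["order_id"])
            unique.append(dict(row))  # copy so the raw data is left untouched
    # Assumes at least one amount is known; mean() raises on an empty list.
    fill_value = mean(r["amount"] for r in unique if r["amount"] is not None)
    for row in unique:
        if row["amount"] is None:
            row["amount"] = fill_value
    return unique

cleaned = prepare(raw_rows)
print([row["amount"] for row in cleaned])  # [20.0, 30.0, 40.0]
```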
What is the fourth step in the CRISP-DM process?
Modeling: Select appropriate models (e.g., regression, decision trees, clustering).
Apply tools like Excel, Tableau, or R depending on the problem and data type.
What is the fifth step in the CRISP-DM process?
Evaluation & Communication: Check how well your model performs.
Interpret and validate results in the context of the business question. Communicate findings using visualizations, reports, or dashboards.
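One common way to check how well a forecasting model performs is to compare its predictions with what actually happened, for example using the mean absolute error (MAE). The numbers below are invented, and MAE is just one possible metric; the right choice depends on the business question.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, ignoring their direction."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual sales versus the model's forecasts for three months.
actual = [100, 120, 130]
predicted = [110, 115, 130]

print(mean_absolute_error(actual, predicted))  # 5.0
```

An MAE of 5 means the forecasts were off by 5 units per month on average; whether that is acceptable is a business judgment, not a statistical one.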
What is the sixth step in the CRISP-DM process?
Deployment: Implement the model’s insights into real business processes.
Share results with stakeholders and use insights for decision-making.
A hypothetical scenario of all four analytics classifications: Weather Example
1. Descriptive
Over the past 30 days, rainfall has increased by more than 25%
2. Diagnostic
We must investigate why rainfall has suddenly increased and whether it is related to the geographic area or seasonal storm patterns
3. Predictive
Our model shows a 65% chance of heavy rain over the next 8 days, taking the current atmospheric data into account
4. Prescriptive
This information should be reported to cities, as they may need to close certain roads in case of flooding and reschedule important events given the sharp increase in rainfall
A hypothetical scenario of all four analytics classifications: Declining sales in coffee shop
1. Descriptive
Analyzed sales data and noticed a 10% drop over the past 3 months
2. Diagnostic
We have determined that the decline in coffee sales is linked to very slow service during busy hours, which several customers have complained about
3. Predictive
We need to forecast whether this issue is likely to continue into the future. If it is not addressed, sales are projected to fall another 4% by the next quarter
4. Prescriptive
Recommend hiring more baristas to prepare coffee during rush hour, or using a delivery service app so that orders come in ahead of time and can be prepared quickly. This should improve customer satisfaction in the long run
A hypothetical scenario of all four analytics classifications: Inventory restocking in advance
1. Descriptive
Based on Walmart’s sales over the last month, there has been high demand for snacks and bottled water during the week
2. Diagnostic
This increase in sales during the week is because school has started and there are more students in the area
3. Predictive
Our system predicts a 45% increase in snack sales over the coming weeks, driven by students looking for convenient snacks
4. Prescriptive
It is recommended that Walmart restock bottled water and snacks early to ensure proper inventory management, increase revenue, and avoid a shortage of snacks
What potential ethical issues do you anticipate using data analytics for hiring practices?
○ Significant bias in data
Because analytics algorithms learn from historical hiring data, they are likely to favour candidates from a particular demographic. Decisions then reproduce the bias in past hiring rather than being impartial
○ It is also difficult to understand why certain candidates were rejected, as a fully automated process is not transparent
○ Relying heavily on automation can lead to unfair decisions, as it does not always capture soft skills such as teamwork, responsibility, or adaptability that an interviewer would easily observe during an interview. This is because such qualitative information is not captured in the database being analyzed