Exam revision Flashcards

(71 cards)

1
Q

What is Data Analytics

A

The process of developing actionable decisions or recommendations for actions based on insights generated from historical data (involves tech, scienc,e and stats)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the difference between descriptive, predictive, and prescriptive analytics, and what do they all comprise of?

A

Look at the slides for week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Define data

A

Data comprises of particular features and can be regarded as the description of a specific phenomenon.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the main 2 types of data

A

Quantitative and Qualitiative

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the 3 V’s of data

A

Volume, velocity and variety

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How does BI fit with data?

A

BI turns data into “actionable” information.
Business analyse the information provided by these data and act on it.
Once the information is there, the BI people can use tools such as data discovery, data visualisation, OLAP analytics, dashboards, reports and more to discover the backbone of the information.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the 5 C’s of data

A

Clean, comprehensive, current, conformed and consistent

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Define Data integration

A

Combining data from different resources

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Define data warehousing

A

A Data Warehousing (DW) is process for collecting and managing data from varied sources to provide meaningful business insights.

A Data warehouse is typically used to connect and analyze business data from heterogeneous sources.

The data warehouse is the core of the BI system which is built for data analysis and reporting.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Define a data-driven model and some benefits

A

. A model is a deliberate simplification of reality.
. A good model retains the most important features of reality and ignores less important details.
. The benefit of a model is that you simplify the complex real world.
. They can be used for predict future(forecasting), making decesions based on probability, quantifying uncertatinty, showing relatiuonships between variables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

State some differences between a database and a datawarehouse

A

. Database is a collection of related data that represents some elements of the real world, whereas Data warehouse is an information system that stores historical and commutative data from single or multiple sources.
. Database is designed to record data whereas the Data warehouse is designed to analyze data.
. Database is application-oriented-collection of data whereas Data Warehouse is the subject-oriented collection of data.
. Database uses Online Transactional Processing (OLTP) whereas Data warehouse uses Online Analytical Processing (OLAP).
. ER modeling techniques are used for designing Database whereas data modeling techniques are used for designing Data Warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

State some benefits of a datawarehouse

A

. Delivers Enhanced Business Intelligence
. Ensures Data Quality and Consistency
. Saves Time and Money
. Tracks Historically Intelligent Data
. Generates high ROI

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

State some limitations of a datawarehouse

A

. Extra Report Work
. Inflexibility and homogenization of data
. Ownership Concerns
. Demands for large amounts of resources
. Hidden issues consume time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

State some characterisitics of a datawarehouse

A

. Subject oriented
. Integrated
. Time-variant (time series)
. Nonvolatile
. Summarized/Not normalized
. Metadata
. Web based, relational/multidimensional
. Client/server, real-time/right-time/active.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Defined subject-oriented and application-oriented

A

Subject-oriented: it offers information regarding a theme instead of companies’ ongoing operations. These subjects can be sales, marketing, distributions, etc.

Application-oriented: It offers information on ongoing operations wether it be a company or anybodyelse

A data warehouse never focuses on the ongoing operations. Instead, it put emphasis on modeling and analysis of data for decision making.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What does integration mean in terms of data warehousing .

A

The establishment of a common unit of measure for all similar data from the dissimilar database. The data also needs to be stored in the Data warehouse in common and universally acceptable manner.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Is ETL a part of integration yes or no

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What does ETL stand for

A

Extract, Transform, Load

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

When thw word time-variant comes up what should come to mind?

A

It contains an element of time, explicitly or implicitly.
Another aspect of time variance is that once data is inserted in the warehouse, it can’t be updated or changed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What does non-volatile mean in terms of data warehousing?

A

Means the previous data is not erased when new data is entered in it.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What data operations are performed in Datawarehousing

A

Data loading
Data access

Activities like delete, update, and insert which are performed in an operational application environment are omitted in Data warehouse environment

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Name the 3 types of datawarehouses

A

. Enterprise Datawarehouse
. Operational Data Store
. Data Mart

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Give me some info on a Enteprise Data Warehouse

A

. It’s a centralized warehouse.
. Provides decision support service across the enterprise.
. Offers a unified approach for organizing and representing data.
. It provides the ability to classify data according to the subject and give access according to those divisions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Give me some info on Operationa Data Store

A

. A data store required when neither Data warehouse nor OLTP systems support organizations reporting needs.
. In ODS, Data warehouse is refreshed in real time.
. It is widely preferred for routine activities like storing records of the Employees.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Give me some info on a Data Mart
. A data mart is a subset of the data warehouse(a datawarehouse split into multiple smaller datamarts see slides) . In an independent data mart, data can collect directly from sources. . The data in each data mart stays separate, in its own system . Data marts are fast and easy to use, as they make use of small amounts of data.
26
What are the 3 datawarehosue Architectures?
Single-tier architecture, Two-tier architecture, Three-tier architecture
27
Define a single-tier architecture
This architecture is not frequently used in practice. The objective of a single layer is to minimize the amount of data stored. This goal is to remove data redundancy.
28
Define a two-tier architecture
Two-layer architecture separates physically available sources and data warehouse. This architecture is not expandable and also not supporting a large number of end-users. It also has connectivity problems because of network limitations.
29
Define a three-tier architecture
This is the most widely used architecture.
30
List some factors that influence data warehouse architecture selection decisions
1. Information interdependence between organizational units 2. Upper management’s information needs 3. Urgency of need for a data warehouse 4. Nature of end-user tasks 5. Constraints on resources 6. Strategic view of the data warehouse prior to implementation 7. Compatibility with existing systems 8. Perceived ability of the in-house IT staff 9. Technical issues 10. Social/political factors
31
What is a dimensional data model?
A dimensional model is a data structure technique optimized for Data warehousing tools.
32
List some elements of a dimensional data model
Facts - measurment/metric from your business process Fact Table - A primary table in a dimesional model it contains facts and foreign keys which reference dimension tables Dimension: provides the context surrounding a business process event. In simple terms, they give who, what, where of a fact Attributes: the various characteristics of the dimension; i.e. County, Country
33
What do fact tables aggregate the measurements based on
The granularity of its dimensions. Granularity is a limit on the number of links with other objects an object can participate in. These numbers of links are called cardinality: they reflect the enterprise rules (constraints).
34
What is the difference between a surrogate and natural key
A surrogate key is a system generated value with no business meaning. A natural key is a column or set of columns that already exist in the table.
35
What is a BI strategy
A roadmap that enables businesses to measure their performance and seek out competitive advantages and truly "listen to their customers" using data mining and statistics. By 2020, more than 40% of all data analytics projects will relate to an aspect of customer experience.
36
Name some customer analytics
. People Make the . Difference . Deep Listening . Sharing Insights . Ensuring “It Just Works” . Massive . Customization . Changing Behaviors
37
What is the name of the famous persons DA strategy
Gartner's data analytics startegy
38
Describe some of the steps in data pre-processing
- Data consolidation: Filter, unify and integrate the data - Data cleaning: Handle missing values, reduce noise, minimise error - Data transformation: Normalisation, aggregation, creation of new attributes - Data reduction: Reduce attributes, number of records, balance skewed data
39
which is a top-down approach and which is a bottom-up approach
Inmon Model: EDW approach (top-down) Kimball Model: Data mart approach (bottom-up) Think I comes before K in the alphabet so for I its top-down
40
Describe Inmons model
1. Create a data model of all business data and use this to create an EDW (enterprise data warehouse) 2. Use the EDW to feed data marts, CRMs etc Uses traditional database methods and techniques to develop the EDW Fully normalised EDW is required In short EDW first and then data marts afterwards
41
Describe Kimball's approach
Identify information requirements and associated business processes of the enterprise and create the Data Warehouse Bus Matrix . Lists the key business processes with an indicator of analysis . Facilitates the selection and development of the first database (data mart) . The integration of further data marts leads to the EDW Uses dimensionality modeling to establish the data model In short data marts first then EDW
42
What does EDW stand for
Enterprise Data Warehouse
43
Compare inmon and kimball
Top-down (Inmon) vs Bottom-up (Kimball) Kimball: . Data marts . Logical data warehouse . Decentralised . Quick results, iterative approach Inmon: . Enterprise data model . Centralised . Later create data marts . More upfront work, but less repetitive work
44
Describe the steps of dimension modelling
1. Identify Business Process 2. Identify Grain (level of detail) 3. Identify Dimensions 4. Identify Facts 5. Build Star
45
Describe a star schema
Has one fact table and a number of associated dimension tables. The star schema is the simplest type of Data Warehouse schema. A snowflake can have multiple facts and tables connected to the dimensions
46
Name some OLAP operations
Slice - a subset of a multidimensional array Dice - a slice on more than two dimensions Drill Down/Up - navigating among levels of data ranging from the most summarized (up) to the most detailed (down) Roll Up - computing all of the data relationships for one or more dimensions Pivot - used to change the dimensional orientation of a report or an ad hoc query-page display
47
Name the 4 main techniques used for optimisation of a data warehouse design
. Normalising dimensional tables . Having multiple fact tables to represent different aggregation levels . De-normalising fact tables . Partitioning and replicating tables
48
How would you choose a datawarehouse
For cloud: Scalability, Low entry cost, Connectivity, Security and Reliability For on Premise: Speed, Security and Reliability
49
What does SQL stand for
Structured Query Language
50
What is a subquery
A query that is nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery.
51
What is an aggregate function
An aggregate function performs a calculation on a set of values, and returns a single value. Except for COUNT(*), aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the SELECT statement.
52
Name some aggregate functions
AVG, COUNT, MAX, MIN, SUM
53
When is the HAVING clause used and what is it similar to
Its usually used with group by and it works almost the same or similar to where. Essentially its WHERE but for when you use group by
54
What is data governance
Aims to enforce consistent definitions, rules, measurements, policies and procedures in data related areas
55
Give some pros and cons of data governance
Pros: . Helps protect against damaging and costly cyber attacks and data breaches . Reduces the cost of managing data and increases ROI of your data analytics . Reduces the load of data management from the IT team and spreads the burden throughout the organization Cons: . requires a company-wide mandate to implement correctly . taking time from your teams to manage data, so make sure it’s worth it in the end.
56
How is data governance monitored
. Make sure people use the data definitions . Communicate changes . Deal with changes
57
What will a good visualisation be able to do
A good visulaisation will be able to allow the person viewing it to get the highest lowest value instantly
58
Name some deliverables of data visulisation
- Layout - Type - Frequency - Quantity
59
Name some constraints of data visualisation
-Time - Knowledge - Anxiety - Technology
60
Name some classifications of visualisations:
Exploratory - Allows you to explore the vis e.g. a map Explanatory - Explains the info in the viz Exhibitory - Shows alot of info with keys and colours
61
What is in the good design of data visualisation
Trustworthy - is it reliable Accessible - Is it usable Elegant - Is it good to look at aesthetically
62
Name some visualisations
. Bar Chart - category comparison (with target line) . Line Chart - time series data . Flow Chart – process flow (also Swimlane diagram) . Bullet Graph – actual to target . Dot Plot or Strip Plot . Sparklines . Histogram Map . 100% Stacked Bar Chart (with caution) . Scatter Plot – relationship/correlation . Box Plot – grouping with summaries . Area Chart (with caution) . Control Charts (statistical process control)
63
What is a dashboard
A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance A dashboard is a visual display of data used to monitor conditions and/or facilitate understanding
64
Name some attributes of a dashboard
Information is presented using small, concise, direct, & clear display of media: . Clearly stated messages . Each point should be limited to the space needed Customized: . tailored to the needs of a specific group or individual Consistent layout: . Data changes over time . Interface is consistent
65
What is a business dashboard
They use visualisations to show indicators and implementation of key business operations at any time, helping managers make informed decisions. A business dashboard is a graphical display of various indicators, reflecting the operating status of the enterprise in real time and visualizing the collected data.
66
Name some features of a business dahboard
. Measurable values . Intuitive display system . Timely and authentic data . Represents the system architecture
67
Name the 3 types of business dashboards
Strategic: . Allows users to quickly understand the operations of the company, and to make decisions based on this understanding. . Operations here are mainly those that have happened in the past. Helps managers make decisions and locate and diagnose problems in bad operations Analytical : . Design of the dashboard needs to directly reflect the problems, and sort them in order, and correlate the actions to be taken . Can be either strategic or operational. The main difference lies in the level of information Operational : . Emphasizes continuous and real-time information reporting, so the timeliness of the data on it is relatively high. . used to monitor daily progress to ensure that the expected plan matches the actual performance achieved. . allows us to resolve problems in a timely manner before they turn into thorny big risks and helps to incrementally improve performance.
68
What is a time series
Refers to a series of data points over time where the observed variable (such as a share price) is the y-variable, and time is the x-variable.
69
What are some uses of tome series
Predicting the future performance of a variable, whether it is profit or the price of a stock, allows you to make informed decisions based on historical data. The more data are available on a variable, the better the prediction for that variable should be, as forecasting relies on observed trends and patterns. Any time series can be forecasted.
70
Name some components of a time series model
. Cycles - e.g. The same cycle of sales occurs every month. . Seasonality . Trends . Black swan - This is something unpredictiable or unexpected . Frequency e.g. daily,weekly, monthly or yearly
71
Revise sql logbook tasks 13-18