# Lecture 3 – Visualising statistics Flashcards

1
Q

What is the difference between:

Data analysts
Data scientists
Data engineers

A

• Data analysts are primarily people who develop insights with data …
• Data scientists are primarily people who develop data models and products that in turn produce insights …
• Data engineers are primarily people who manage data infrastructure, automate data processing and deploy models at scale …
2
Q

Explain the different analytic levels

A

Descriptive Analytics: gain insight from historical data
* plot sales results by region and product category
* correlate with advertising revenue per region

Predictive analytics: make predictions using statistical and machine learning techniques
* predict next quarter’s sales results using economic projections and advertising targets

Prescriptive analytics: recommend decisions using optimisation, simulation, etc.
* recommend which regions to advertise in given a fixed budget
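
A minimal Python sketch of the three levels applied to the sales example above. All region names, sales figures and advertising costs are made up for illustration, and the "forecast" is a deliberately naive assumption (it extends the last quarter-on-quarter change), not a method from the lecture.

```python
from itertools import combinations
from statistics import mean

# Hypothetical quarterly sales per region (made-up numbers).
sales = {
    "North": [100, 110, 120],
    "South": [80, 80, 80],
    "West": [50, 70, 90],
}
# Hypothetical cost of advertising in each region.
ad_cost = {"North": 60, "South": 30, "West": 40}

# Descriptive: summarise historical data.
average_sales = {region: mean(values) for region, values in sales.items()}

# Predictive: naive forecast that repeats the last quarter-on-quarter change.
forecast = {region: values[-1] + (values[-1] - values[-2])
            for region, values in sales.items()}


# Prescriptive: pick the set of regions with the highest total forecast
# whose combined advertising cost fits within a fixed budget.
def recommend_regions(budget):
    best, best_value = (), 0
    regions = sorted(sales)
    for size in range(1, len(regions) + 1):
        for subset in combinations(regions, size):
            cost = sum(ad_cost[r] for r in subset)
            value = sum(forecast[r] for r in subset)
            if cost <= budget and value > best_value:
                best, best_value = subset, value
    return best
```

Trying every subset is only feasible for a handful of regions; a real prescriptive task would typically hand this to an optimisation solver.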

3
Q

Which of the following is a prescriptive analytics task (as opposed to a predictive analytics task)?

A. Suggesting a traffic route based on prior data for the time of day and incident reports.

B. Predicting travel time of multiple traffic routes

C. Estimating the student enrolment number of FIT5145 in 2023 Sem 1

D. Measuring the likelihood of a student getting HD in the final exam of FIT5145

A

A. Suggesting a traffic route based on prior data for the time of day and incident reports.

4
Q

What are influence diagrams?

A

A method for modelling data and decision making.

Influence diagrams (a.k.a. decision graphs):
* are directed graphical models with 4 types of nodes:
- chance nodes, known variable nodes, action/decision nodes and objective/utility nodes
* model the “influences”, “causes”, random (“chance”) outcomes, “actions” and “goals” involved in a decision problem
* provide a coarse abstraction, a conceptual model
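
One way to picture the structure is as a small directed graph in Python. The umbrella scenario, the node names and the arcs below are hypothetical and chosen only to show one node of each of the four types; they do not come from the lecture.

```python
# Hypothetical umbrella decision encoded as a directed graph.
# Each node is tagged with one of the four node types.
node_types = {
    "weather": "chance",
    "forecast": "known variable",
    "take_umbrella": "decision",
    "satisfaction": "utility",
}

# Each arc (source, target) reads "source influences target".
arcs = [
    ("weather", "forecast"),
    ("forecast", "take_umbrella"),
    ("weather", "satisfaction"),
    ("take_umbrella", "satisfaction"),
]


def parents(node):
    """Return the nodes that have an arc pointing into `node`."""
    return [source for source, target in arcs if target == node]
```

Calling `parents("satisfaction")` lists what the utility node depends on, which here is the chance outcome and the decision taken.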
5
Q

Explain the node types of an influence diagram

A
6
Q

An Influence Diagram:

A. is a model giving possible situations or outcomes.

B. consists of nodes and arcs.

C. is an alternative to a decision tree.

D. consists of nodes and arcs and is an alternative to a decision tree.

A

D. consists of nodes and arcs and is an alternative to a decision tree.

7
Q

Name the four growth laws

A

Explanations about change in IT and society:

• Moore’s Law
• Koomey’s Law
• Bell’s Law
• Zimmermann’s Law
8
Q

What does Moore’s Law say?

A

==> capability and size of IT

Number of transistors per chip doubles every 2 years (starting from 1975)

Transistor count translates to:
* more memory
* bigger CPUs
* faster memory, CPUs (smaller==faster)

Pace currently slowing
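
To get a feel for what doubling every 2 years implies, the growth factor can be computed directly. This is a small sketch; the function name and the `doubling_period` parameter are illustrative choices, and it assumes the doubling rate stays constant, which the note above says is no longer quite true.

```python
def moore_growth_factor(years, doubling_period=2.0):
    """Factor by which the transistor count grows after `years`,
    assuming it doubles once every `doubling_period` years."""
    return 2 ** (years / doubling_period)
```

Under this assumption, 10 years gives 5 doublings and 20 years gives 10 doublings.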

9
Q

What does Koomey’s Law say?

A

==> capability and size of IT

• Corollary of Moore’s Law
• For a fixed computing load, the amount of battery needed will fall by a factor of 100 every decade
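
The factor of 100 per decade can be converted into a halving time, which makes it easier to compare with Moore's Law. This is a sketch that assumes a constant exponential rate; the function and parameter names are illustrative.

```python
import math


def halving_time_years(factor_per_decade=100.0):
    """Years for the battery requirement to halve, given the factor
    by which it falls over 10 years (assumes a constant rate)."""
    return 10 * math.log(2) / math.log(factor_per_decade)
```

With the default factor of 100 this comes out at roughly a year and a half.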
10
Q

What does Bell’s Law say?

A

==> purpose of IT

• Corollary of Moore’s Law and Koomey’s Law
• “Roughly every decade a new, lower priced computer class forms based on a new programming platform, network, and interface resulting in new usage and the establishment of a new industry.”
e.g., PCs -> mobile computing -> cloud -> internet-of-things
11
Q

What does Zimmermann’s Law say?

A

==> relationship between privacy and IT

• Zimmermann is the creator of Pretty Good Privacy (PGP), an early encryption system
• “Surveillance is constantly increasing”
• Privacy constantly decreasing
12
Q

A

As information technology develops and with more data collected, businesses utilise it and incorporate it in their business models (–> innovation)

A business model describes the rationale of how an organization creates, delivers, and captures value, in economic, social, cultural or other contexts.

13
Q

What kinds of businesses do we have operating in the Data Science world?

A

Information brokering service: buys and sells data/information for others.

Information-based differentiation: satisfies customers by providing a differentiated service built on the data/information.

Information-based delivery network: delivers data/information for others.

Information provider: business selling the data/information it collects.

The Bloomberg Terminal:
* a computer system provided by Bloomberg L.P.
* enables professionals to monitor and analyse real-time financial market data
* is a proprietary secure network

Amazon.com
* An assembly line for the retail industry, with support for embedded online retailers.
* Huge stock of books, DVDs, CDs, etc. easily searchable.
* extensive customer reviews

–> Information-based differentiation: satisfies customers by providing a differentiated service (superior information (reviews), range)
–> Information-based delivery network:
- they deliver information for others;
- retailers in the Amazon marketplace get customers directed to them and other retailer’s support

LexisNexis
- provides the world’s largest electronic database for legal and public-records related information.

14
Q

What is statistics?

A

“The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample”.

Two main statistical analytical methods:
* descriptive statistics – explaining data
* inferential statistics – finding regularities in irregular data

15
Q

Mode, median, mean, variance, standard deviation

A

Mode: the value that is most common.
Median: the value in the middle of the sorted data.
Mean: the average value.
Variance: the average of the squared differences between each value and the mean.
Standard deviation: the square root of the variance.

Example
Data: 2, 4, 4, 4, 5, 5, 7, 9
Mode: 4
Median: 4.5
Mean: 5
var = ((2-5)^2 + (4-5)^2 +(4-5)^2 + … + (9-5)^2)/8 = 4

sd = 4^0.5 = 2
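
The worked example can be checked with Python's standard `statistics` module. The example divides by 8 (the number of values), so the sketch uses the population versions `pvariance` and `pstdev`.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mode = statistics.mode(data)           # most common value
median = statistics.median(data)       # average of the two middle values here
mean = statistics.mean(data)           # arithmetic average
variance = statistics.pvariance(data)  # population variance: divides by n
std_dev = statistics.pstdev(data)      # square root of the population variance
```

Note that `statistics.variance` and `statistics.stdev` divide by n - 1 instead (the sample versions), so they give slightly larger values on the same data.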

16
Q

Name the different variable types

A

Categorical, qualitative
* Groups or categories
* Nominal – no natural ordering e.g. blood type, eye color
* Ordinal – ordered e.g. education level

Quantitative
* Numerical
* Discrete – specific values –> counts like number of customer complaints

• Continuous – infinite number of values between any two values e.g.
• Temporal: time and dates
• Space: locations
17
Q

What are outliers?

A

Outliers are values outside of the expected parameters for the data
- Errors
- Exceptional circumstances
- Chance

• Outliers need to be identified and decided on before the analysis is completed
• They will influence the calculation of the mean
• So wrangle them!
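
A short sketch of how a single outlier pulls the mean while leaving the median almost untouched. The measurements are made up for illustration.

```python
from statistics import mean, median

values = [10, 12, 11, 13, 12]   # hypothetical measurements
with_outlier = values + [100]   # same data plus one extreme value

# How far each summary moves once the outlier is included.
mean_shift = mean(with_outlier) - mean(values)
median_shift = median(with_outlier) - median(values)
```

Here the mean moves by well over 10 units while the median does not move at all, which is why outliers should be dealt with before means are reported.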
18
Q

What is a boxplot?

A

A plot that combines quartiles, median and outliers

Quartile: divide the data into quarters
Interquartile range (IQR): The difference between the lower and upper quartiles
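
The numbers behind a boxplot can be computed with the standard library. This is a sketch: the function name `box_summary` is illustrative, and flagging points more than 1.5 × IQR beyond the quartiles is a common plotting convention assumed here, not something stated on the card. Quartile values also depend on the interpolation method; `method="inclusive"` is one of the two options `statistics.quantiles` offers.

```python
import statistics


def box_summary(data):
    """Quartiles, IQR and outliers under the 1.5 * IQR convention (assumed)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low or x > high]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}
```

Calling `box_summary` on a list of numbers returns the values that a boxplot would draw as the box edges, the middle line and the individually plotted points.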

19
Q

What are the pros and cons of motion charts?

A