Topic 2: Data & Graphical Summaries Flashcards

1
Q

What is the process of first looking at the data to get a general snapshot, without yet answering the research question?

A

Initial Data Analysis (IDA)

2
Q

During the first snapshot of the data, what criteria are used to analyse it?

A
  1. Data background (quality + integrity)
  2. Data structure (characteristics of data including types, size, number)
  3. Data wrangling (making changes like scraping, cleaning, tidying, reshaping, splitting, combining)
  4. Data summaries (graphical & numerical)
3
Q

Why is having the first look at the data critical?

A

IDA helps to capture the main features of the data and suggests what they may indicate about the population.
It also helps to assess whether the data can answer the research question, and to pose follow-up questions.

4
Q

Which two aspects need to be considered when analyzing the data?

A

Size: how many variables/subjects (p/n)
Type: quantitative or qualitative
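
A minimal sketch of checking size and type, assuming a made-up dataset stored as a list of dicts. The names `subjects`, `describe_size` and `guess_type` are hypothetical, and the type guess is deliberately crude: numeric codes such as postcodes would be wrongly labelled quantitative.

```python
# Hypothetical toy dataset: each dict is one subject, each key is one variable.
subjects = [
    {"age": 21, "height_cm": 170.5, "degree": "Science"},
    {"age": 19, "height_cm": 162.0, "degree": "Arts"},
    {"age": 24, "height_cm": 181.2, "degree": "Science"},
    {"age": 20, "height_cm": 175.0, "degree": "Commerce"},
]


def describe_size(rows):
    """Return (n, p): the number of subjects and the number of variables."""
    n = len(rows)
    p = len(rows[0]) if rows else 0
    return n, p


def guess_type(values):
    """Crude guess: all-numeric values are quantitative, anything else qualitative."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_number else "qualitative"


n, p = describe_size(subjects)
types = {name: guess_type([row[name] for row in subjects]) for name in subjects[0]}

print(f"n = {n} subjects, p = {p} variables")
for name, kind in types.items():
    print(f"{name}: {kind}")
```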

5
Q

How can the size of data be described?

A

Multivariate (2+ variables)
+ Bivariate (2 variables)
+ Univariate (1 variable)

6
Q

Explain different types of data

A
  1. Qualitative (category):
    + Ordinal (in order): binary/3+
    + Nominal (no order): binary/3+
  2. Quantitative (measurement):
    + Discrete (separated)
    + Continuous (continuum)
7
Q

Explain data and variable

A

Data is information about sets of subjects being studied.
Variables are the different measurements or categories describing attributes of the subjects.

8
Q

What type of graphical summary can be used for 1/2 qualitative variables?

A

1 qual variable: simple bar plot (make a frequency table first, then plot it)
2 qual variables: double bar plot (stacked or side-by-side)
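
A minimal sketch of the tables that sit behind these bar plots, assuming made-up records. The variable names and categories are hypothetical.

```python
from collections import Counter

# Hypothetical records: (degree, lives_on_campus), two qualitative variables.
records = [
    ("Science", "yes"), ("Arts", "no"), ("Science", "no"),
    ("Commerce", "yes"), ("Science", "yes"), ("Arts", "no"),
]

# One qualitative variable: a frequency table gives the bar heights
# of a simple bar plot.
degree_counts = Counter(degree for degree, _ in records)

# Two qualitative variables: a two-way table gives the bar segments
# of a stacked or side-by-side bar plot.
two_way = Counter(records)

print(dict(degree_counts))
for (degree, on_campus), count in sorted(two_way.items()):
    print(degree, on_campus, count)
```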

9
Q

What type of graphical summary can be used for 1/2 quantitative variables?

A

1 quan variable: histogram/box plot
2 quan variables: scatter plot

10
Q

What type of graphical summary can be used for 1 quan and 1 qual variables?

A

Comparative box plot

11
Q

What is the function of a histogram?

A

Present the overall distribution of the data
Highlight the percentage of data in one interval and compare it with the other intervals

12
Q

How is the block height in a histogram calculated?
What is this scale called?

A

Density scale
Block height = (% of data in the block) / (interval length)
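
A minimal sketch of the density-scale calculation, assuming made-up counts and interval breaks. The function name `density_heights` is hypothetical. On this scale the block areas (height times width) add up to 100%.

```python
def density_heights(counts, breaks):
    """Return block heights on the density scale, in % per unit of the variable."""
    if len(breaks) != len(counts) + 1:
        raise ValueError("need exactly one more break than counts")
    total = sum(counts)
    heights = []
    for count, left, right in zip(counts, breaks, breaks[1:]):
        percent = 100 * count / total              # % of data in this block
        heights.append(percent / (right - left))   # divide by interval length
    return heights


# Hypothetical data: 10, 30 and 60 observations in three unequal intervals.
counts = [10, 30, 60]
breaks = [0, 10, 20, 50]

heights = density_heights(counts, breaks)
areas = [h * (right - left) for h, left, right in zip(heights, breaks, breaks[1:])]

print(heights)     # [1.0, 3.0, 2.0]
print(sum(areas))  # 100.0
```

Note that the last interval holds the most data (60%) but is not the tallest block, because it is three times as wide as the others.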

13
Q

What is the endpoint convention? Give an example.

A

A rule that decides which interval a point belongs to when it falls exactly on a border
e.g. (20,45]: 20 is not included, but 45 is
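
A minimal sketch of the two possible conventions for the interval from 20 to 45. The function `in_interval` and its `closed` argument are hypothetical names used only for illustration.

```python
def in_interval(x, left, right, closed="right"):
    """Check whether x falls in the interval under the given endpoint convention.

    closed="right" means (left, right]: the right endpoint is included.
    closed="left"  means [left, right): the left endpoint is included.
    """
    if closed == "right":
        return left < x <= right
    if closed == "left":
        return left <= x < right
    raise ValueError("closed must be 'left' or 'right'")


# Right-closed interval (20, 45], as in the card above.
print(in_interval(20, 20, 45))                 # False
print(in_interval(45, 20, 45))                 # True

# Left-closed interval [20, 45) flips both answers.
print(in_interval(20, 20, 45, closed="left"))  # True
print(in_interval(45, 20, 45, closed="left"))  # False
```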

14
Q

What information is conveyed from a box plot?

A

A box plot presents “anchor” points, including the median, the middle 50% and outliers, which makes different data sets easy to compare
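
A minimal sketch of the numbers behind a box plot, assuming made-up data. Two things here are assumptions rather than part of the card: the quartiles use the "inclusive" method of Python's `statistics.quantiles` (other software may give slightly different quartiles), and outliers are flagged with the common 1.5 x IQR rule.

```python
from statistics import quantiles

# Hypothetical data with one unusually large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 40]

# Quartiles: the box spans q1 to q3 (the middle 50%); the line inside is the median.
q1, med, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, med, q3)  # 5.0 7.0 9.0
print(iqr)          # 4.0
print(outliers)     # [40]
```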

15
Q

In what case can a comparative box plot be used?

A

When there is 1 quan and 1 qual variable
The quan variable is split into groups by the qual variable
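
A minimal sketch of the split behind a comparative box plot, assuming made-up observations. The group labels and values are hypothetical, and only the median of each group is computed to keep the example short.

```python
from collections import defaultdict
from statistics import median

# Hypothetical observations: (qualitative group, quantitative value).
observations = [
    ("A", 3), ("B", 10), ("A", 5),
    ("B", 12), ("A", 4), ("B", 20),
]

# Split the quantitative variable by the qualitative variable.
groups = defaultdict(list)
for category, value in observations:
    groups[category].append(value)

# Each group would get its own box; compare, for example, the medians.
medians = {category: median(values) for category, values in groups.items()}

print(dict(groups))  # {'A': [3, 5, 4], 'B': [10, 12, 20]}
print(medians)       # {'A': 4, 'B': 12}
```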

16
Q

Describe good, poor, and bad data viz

A

Good data viz: interesting story + visually appealing

Poor data viz: boring, distracting presentation (chartjunk)

Bad data viz: misleading story

17
Q

What is ggplot2 and why do we use it?

A

An R package that offers an intuitive way to produce plots
Why: it lets us create separate building blocks of a plot, then combine them in layers to produce a complex viz

18
Q

What are the parameters of ggplot?

A

Aesthetic (aes): what we want to see in the viz (presented variable)

Geometric objects (geom_xxx): the type of graphical summary (actual marks)

Facet: splits the plot into panels, one for each subset of the data