Tables and graphics Flashcards

1
Q

Nate Silver

A

Author of The Signal and the Noise. 90% of all data was collected in the last 2 years.

2
Q

Charles Babbage

A

1791-1871
"Errors using inadequate data are much less than those using no data at all."

3
Q

John Wilder Tukey

A

1915-2000
"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."

4
Q

Quantitative data reporting through figures

A

Aims to communicate information without distorting underlying results

5
Q

Three main applications in science

A
  1. Experimental design
  2. Exploratory Data Analysis (EDA)
  3. Presentation/publication of results
6
Q

Tables

A

Best for complex data and for reporting actual values.

Don't tabulate what can easily be said in the text.

7
Q

Figures

A

Best for showing trends and patterns and for highlighting differences.

8
Q

Constructing tables

A

Always tabulate vertically so you read down, not across.

9
Q

Figure axes

A

X/horizontal axis = what we manipulate (the independent variable).

Y/vertical axis = the response (the dependent variable).

10
Q

Figure variability

A

Include a measure of variation (e.g. standard deviation or standard error) and the sample size.
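
A minimal Python sketch of the numbers a figure might report for each group: mean, standard deviation, standard error and sample size. The group names and measurements are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements for two groups (illustrative values only).
groups = {
    "control": [4.1, 3.9, 4.3, 4.0],
    "treated": [5.2, 5.6, 5.1, 5.5],
}

def summarise(values):
    """Return the numbers to report alongside each plotted mean."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"n": n, "mean": mean(values), "sd": sd, "se": sd / sqrt(n)}

summary = {name: summarise(vals) for name, vals in groups.items()}

for name, s in summary.items():
    print(f"{name}: mean = {s['mean']:.2f}, SD = {s['sd']:.2f}, "
          f"SE = {s['se']:.2f}, n = {s['n']}")
```

Whichever measure is drawn as error bars, the caption should say which one it is and give n.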

11
Q

Exploratory Data Analysis (EDA)

A

What variation occurs within and between my variables? Plot the data before running statistics.
Helps to identify underlying structure and decide on the most parsimonious model.
Identifies outliers.

12
Q

Anscombe Quartet

A

4 pairs of variables (x, y), 11 observations in each, with the same means, the same fitted regression line and the same R^2, but very different patterns when plotted.
Everything is not as it seems.
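
The point can be checked numerically. The sketch below fits a least-squares line to each of Anscombe's four published datasets using only the Python standard library. The helper name `fit` is an assumption of this example.

```python
from statistics import mean

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus R^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

results = {name: fit(x, y) for name, (x, y) in quartet.items()}

for name, (slope, intercept, r2) in results.items():
    print(f"{name}: y = {intercept:.2f} + {slope:.3f}x, R^2 = {r2:.3f}")
```

All four sets give a slope near 0.5, an intercept near 3 and an R^2 near 0.67. Only a plot shows that one set is curved, one has a single outlier and one depends entirely on a single extreme x value.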

13
Q

Linear model assumptions

A
  1. Linearity between response and predictors
  2. Residuals are normally distributed
  3. Residuals have equal variances
  4. No overly influential points
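
A sketch of one way to screen for assumption 4 only. For simple linear regression, the leverage of each point depends only on its x value. Here it is computed for the x values of Anscombe's fourth dataset, using the common rule of thumb that leverage above 2p/n (with p = 2 fitted parameters) deserves a closer look. The function name and the choice of threshold are assumptions of this example. The other three assumptions are usually checked with residual plots.

```python
from statistics import mean

# x values of Anscombe's fourth dataset: ten points at 8 and one at 19.
x = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

def leverages(x):
    """Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for simple linear regression."""
    n = len(x)
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]

h = leverages(x)
threshold = 2 * 2 / len(x)  # rule of thumb: 2p/n with p = 2 (slope and intercept)
flagged = [i for i, hi in enumerate(h) if hi > threshold]

print("leverages:", [round(hi, 3) for hi in h])
print("flagged indices:", flagged)
```

The point at x = 19 has a leverage of essentially 1, so the fitted line is forced through it. The slope is determined by that single observation.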
14
Q

Classes of data

A

Categorical (nominal), ordinal (ranked, ordered), measurement (ratio, with a true zero, or interval)

15
Q

Categorical graph type

A

Bar graph

16
Q

Quantitative graph type

A

One variable - Box-plot
Two variables - maps
Many - Icon

17
Q

Mixed graph type

A

Two variables - Bar graph

Many - Bar graph

18
Q

Histograms

A

For frequency distributions of continuous variables.
Bars drawn together.
X = classes, Y = frequency
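
A minimal sketch of how a histogram is built: continuous values are binned into equal-width classes and the frequency in each class is counted. The function name, the sample values and the half-open class convention [a, b) are assumptions of this example.

```python
from math import floor

def histogram_counts(values, lo, width, n_classes):
    """Count values falling in n_classes half-open classes [a, b) starting at lo."""
    counts = [0] * n_classes
    for v in values:
        i = floor((v - lo) / width)
        if 0 <= i < n_classes:  # values outside the range are ignored
            counts[i] += 1
    return counts

# Hypothetical continuous measurements.
values = [1.2, 1.9, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4]
counts = histogram_counts(values, lo=1.0, width=1.0, n_classes=4)

# Text rendering: one row per class, bar length = frequency.
for k, c in enumerate(counts):
    a = 1.0 + k * 1.0
    print(f"[{a:.0f}, {a + 1:.0f}) {'#' * c}")
```

Changing `width` or `lo` changes the shape of the histogram. That subjective choice is what dot plots avoid.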

19
Q

Frequency polygon

A

Frequency distribution - similar to histogram but with lines

20
Q

Dot plots

A

Density-style plot showing the full dataset. Not affected by the subjective choice of number/width of bars.

21
Q

Box-plots

A

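50% of the data lie in the box (the interquartile range, with the median marked inside). Here the whiskers span 90% of the data (5th to 95th percentile). Other conventions exist, e.g. whiskers extending to 1.5 × IQR beyond the box.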
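
A short sketch of the box-plot summary, assuming the whiskers are drawn at the 5th and 95th percentiles. That matches the 90% figure on this card, although many plotting tools default to 1.5 × IQR instead. The data are simply the integers 1 to 100, chosen so the shares are easy to check.

```python
from statistics import quantiles

# Illustrative data: the integers 1..100.
data = list(range(1, 101))

# Box: first quartile, median, third quartile.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Whiskers under the 5th/95th percentile convention (an assumption here).
cuts = quantiles(data, n=20, method="inclusive")
p5, p95 = cuts[0], cuts[-1]

def share_between(values, low, high):
    """Fraction of values lying in the closed interval [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)

in_box = share_between(data, q1, q3)
in_whiskers = share_between(data, p5, p95)

print(f"box: {q1} to {q3}, median {q2}, share {in_box:.0%}")
print(f"whiskers: {p5} to {p95}, share {in_whiskers:.0%}")
```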

22
Q

Notched box-plots

A

The notch marks the median and its 95% confidence interval

23
Q

Violin plots

A

Similar to box plots but reveal more features of the data, including the shape of the distribution. Width is proportional to frequency. The widest point is the mode.

24
Q

Pie charts

A

Rarely used in science - consider stacked bar chart instead

25
Q

Bar graphs

A

NOT the same as histograms. Bars are separate and represent summary data (mean and variation).
They don't reveal the distribution.

26
Q

Scatter plots

A

Show degree of association between two continuous variables

27
Q

SPLOM

A

Scatter plot matrix
Useful for EDA.
Shows shape and distribution of all variables and bivariate relationships between them.

28
Q

Category plots

A

Scatter plots stratified by a third (categorical or ordinal) variable

29
Q

Line graphs

A

Time series or temporal data