OpenIntro 2 Flashcards

(28 cards)

1
Q

intensity map

A

colors are used to show higher and lower values of a variable

2
Q

contingency table

A

A data matrix that displays the frequency of some combination of possible responses to multiple variables; cross tabulation results

3
Q

mosaic plot

A

uses the area of rectangles to display the relative frequency of occurrence of all combinations of two categorical variables

4
Q

column proportions

A

computed as the count divided by the corresponding column total

5
Q

row proportions

A

computed as the counts divided by their row totals
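A minimal sketch of the two kinds of proportions on these cards. The 2x2 table, its counts, and the names `table`, `row_proportions` and `column_proportions` are made up for illustration and are not from the deck.

```python
# Hypothetical contingency table: rows are one categorical variable,
# columns are another. All counts are invented.
table = {
    "yes": {"group_a": 30, "group_b": 10},
    "no": {"group_a": 20, "group_b": 40},
}


def row_proportions(table):
    """Divide each count by the total of its row."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: count / total for col, count in cells.items()}
    return result


def column_proportions(table):
    """Divide each count by the total of its column."""
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return {
        row: {col: count / col_totals[col] for col, count in cells.items()}
        for row, cells in table.items()
    }


print(row_proportions(table)["yes"])     # each row sums to 1
print(column_proportions(table)["yes"])  # each column sums to 1
```

The same count (30) gives 0.75 as a row proportion and 0.6 as a column proportion, so it matters which total the count is divided by.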

6
Q

Dotplot

A

A one-variable scatterplot, used to show patterns in a single variable from a data set. The transparency of the dots represents proportion (for example, how many of all the loan interest rates are around 10%). Below the dots is a triangle marking the mean, and below that, the axis values.

7
Q

stacked dotplot

A

Like a dot plot, but instead of using transparency as the measure of frequency, the dots are stacked, so you get whole-number counts. Everything else is the same. Naturally this only works for relatively small data sets, since you can easily run out of space.

8
Q

Mean vs median

A
9
Q

Median

A

The middle value in a SORTED data set. If n / 2 is an integer (an even number of observations), the median is the average of the two middle values. If not, the number of observations is odd, and you simply pick the middle value. It represents the center of a data set and is more robust than the mean (try adding a huge or tiny number to your observations and see how the mean is no longer representative). 50% of the data lie above this marker and 50% below it.
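A small sketch of the even/odd rule and of the robustness point. The data values and the helper names `median` and `mean` are illustrative assumptions.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle ones if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]


def mean(values):
    return sum(values) / len(values)


data = [3, 1, 4, 2, 5]          # odd number of observations
with_outlier = data + [1000]    # even number, with one huge value added

print(median(data), mean(data))
print(median(with_outlier), mean(with_outlier))
```

The single extreme value moves the median only from 3 to 3.5, while the mean jumps from 3 to roughly 169.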

10
Q

Q1

A

First quartile (the bottom / left side of the box plot's “body”) - 25% of the values are below it.

Remember: the 50th percentile (the second quartile, Q2) is the median

11
Q

Q3

A

Third quartile (the top / right side of the box plot's “body”) - 75% of the values are below it, or 25% are larger than this marker.
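A short sketch of Q1, the median and Q3 using Python's standard `statistics.quantiles`. The data are made up, and the choice of `method="inclusive"` is an assumption: different textbooks and programs use slightly different quartile conventions, so results on other data may differ a little between tools.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # invented, already sorted for readability

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # interquartile range: the length of the box in a box plot

print(q1, q2, q3, iqr)
```

Here the middle cut point equals `statistics.median(data)`, which is the point the Q1 card makes about the 50th percentile.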

12
Q

Mean vs median

A
13
Q

modality and skewness

A
14
Q

Distribution

A

A function showing all possible values/intervals of the data & how often they occur.

15
Q

Data Density

A

How many observations fall at or near a certain value. A histogram shows it through the height of its bars.

16
Q

Mode/Unimodal/Bimodal/Multimodal

17
Q

Deviation

A

The distance of an observation from its mean.

18
Q

Standard Deviation

A

The square root of the variance. It describes how far a typical observation lies from the mean.

19
Q

Sample Variance

A

Roughly the average squared deviation from the mean: the squared deviations are summed and divided by n - 1.

How wide is the distribution?
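A minimal sketch tying together the deviation, sample variance and standard deviation cards. The data and the function names are illustrative assumptions.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]  # distance of each observation from the mean
    return sum(d ** 2 for d in deviations) / (n - 1)


def sample_sd(values):
    """Standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # invented; the mean is 5

print(sample_variance(data))
print(sample_sd(data))
```

The variance is in squared units, which is why the standard deviation (same units as the data) is usually the one reported.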

20
Q

Mode

A

In statistics, the mode is the value that occurs most often in the data.

Unlike the median and the mean, the mode is not necessarily unique. There might be several different values that occur the same number of times.
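A quick illustration that the mode need not be unique, using `statistics.multimode` from the Python standard library (Python 3.8 or later). The data are made up.

```python
import statistics

one_mode = [1, 2, 2, 3]
two_modes = [1, 2, 2, 3, 3]

# multimode returns every value that ties for the highest count.
print(statistics.multimode(one_mode))
print(statistics.multimode(two_modes))
```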

21
Q

Bar plot

A

For showing categorical data

(remember that a bar plot can show “counts” or “proportions” on the y axis)

22
Q

Comparing numerical data

23
Q

Null hypothesis vs alternative hypothesis

A

Null hypothesis: “there is nothing going on.” Alternative hypothesis: “there is something going on.”

24
Q

Hypothesis test

A

Example: run 100 simulations of the test to judge whether or not there is discrimination. The outcome of one or two runs could easily be down to chance alone, so you have to run many simulations.

  1. “There is nothing going on.”
    Promotion and gender are independent, no gender
    discrimination, observed difference in proportions is simply due
    to chance. → Null hypothesis
  2. “There is something going on.”
    Promotion and gender are dependent, there is gender
    discrimination, observed difference in proportions is not due to
    chance. → Alternative hypothesis
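The simulation idea can be sketched as follows. The counts (24 male and 24 female files, 21 and 14 promoted) are an assumption, since the card gives no numbers, and the names `simulate_diff` and `share_as_extreme` are hypothetical. Each run shuffles the promotion outcomes so that they are independent of gender, which is what the null hypothesis claims.

```python
import random

# Assumed counts, not stated on the card.
MALE_TOTAL, FEMALE_TOTAL = 24, 24
MALE_PROMOTED, FEMALE_PROMOTED = 21, 14

observed_diff = MALE_PROMOTED / MALE_TOTAL - FEMALE_PROMOTED / FEMALE_TOTAL


def simulate_diff(rng):
    """One run under the null: deal the promotions out at random."""
    n_promoted = MALE_PROMOTED + FEMALE_PROMOTED
    n_total = MALE_TOTAL + FEMALE_TOTAL
    outcomes = [1] * n_promoted + [0] * (n_total - n_promoted)
    rng.shuffle(outcomes)
    male = outcomes[:MALE_TOTAL]
    female = outcomes[MALE_TOTAL:]
    return sum(male) / MALE_TOTAL - sum(female) / FEMALE_TOTAL


rng = random.Random(0)  # fixed seed so the run is repeatable
simulated = [simulate_diff(rng) for _ in range(100)]

# Share of simulated differences at least as large as the observed one.
share_as_extreme = sum(d >= observed_diff - 1e-12 for d in simulated) / len(simulated)

print(round(observed_diff, 3), share_as_extreme)
```

If chance alone rarely produces a difference as large as the observed one, the data favor the alternative hypothesis; if it happens often, the null hypothesis remains plausible.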
25
cluster sampling
Cluster sampling is the "poor man's" stratified sample: we do not choose the groups ourselves, we assume that, for example, countries or school classes are good enough to form the groups
26
stratified sample
Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. (This is hard to do, which is why a cluster sample is easier.)
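A rough sketch of the difference between the two sampling schemes in their simplest forms. The population of four school classes and the function names are invented for illustration.

```python
import random

# Hypothetical population: 4 school classes (the groups) of 5 students each.
population = {
    f"class_{c}": [f"class_{c}_student_{s}" for s in range(5)]
    for c in "ABCD"
}


def stratified_sample(groups, per_group, rng):
    """Draw a few members from every group (stratum)."""
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, per_group))
    return sample


def cluster_sample(groups, n_clusters, rng):
    """Pick a few whole groups (clusters) and keep everyone in them."""
    chosen = rng.sample(sorted(groups), n_clusters)
    return [member for name in chosen for member in groups[name]]


rng = random.Random(1)  # fixed seed so the run is repeatable
print(stratified_sample(population, 2, rng))  # 2 students from each of the 4 classes
print(cluster_sample(population, 2, rng))     # all students from 2 of the classes
```

The stratified sample always represents every group, while the cluster sample leaves some groups out entirely, which is why it is cheaper to collect.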
27
Ordinal vs not ordinal?
Ordinal data is categorical data whose levels have a natural order. For example, ordinal data is collected when a respondent rates their financial happiness on a scale of 1-10. In ordinal data, there is no standard scale on which the difference between scores is measured.
28
Skew vs mean, median
Right-skewed: mean > median | Left-skewed: mean < median
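A tiny check of this rule of thumb on made-up data (the two lists are illustrative assumptions; the rule describes typical skewed distributions and is not a guarantee for every data set).

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]    # long tail to the right
left_skewed = [-x for x in right_skewed]    # mirror image: long tail to the left

print(mean(right_skewed), median(right_skewed))
print(mean(left_skewed), median(left_skewed))
```

The long tail pulls the mean toward it, while the median stays near the bulk of the data.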