OpenIntro 2 Flashcards by Sammy Christensen

intensity map

colors are used to show higher and lower values of a variable

contingency table

A data matrix that displays the frequency of some combination of possible responses to multiple variables; cross tabulation results
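
A minimal sketch of building such a table by counting combinations of two categorical variables. The variable names and responses are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical responses: (homeownership, application type) pairs.
responses = [
    ("rent", "individual"), ("rent", "joint"), ("own", "individual"),
    ("rent", "individual"), ("own", "joint"), ("own", "individual"),
]

# Count how often each combination of the two variables occurs.
counts = Counter(responses)

rows = sorted({r for r, _ in responses})
cols = sorted({c for _, c in responses})

# Nested dict: table[row][col] -> frequency of that combination.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, [table[r][c] for c in cols])
```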

mosaic plot

uses the area of rectangles to display the relative frequency of occurrence of all combinations of two categorical variables

column proportions

computed as the count divided by the corresponding column total

row proportions

computed as the counts divided by their row totals
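
A small sketch of both kinds of proportions, computed from one contingency table. The table values are made up for illustration.

```python
# Hypothetical contingency table: rows are one variable, columns another.
table = {
    "rent": {"individual": 30, "joint": 10},
    "own": {"individual": 20, "joint": 40},
}
cols = ["individual", "joint"]

row_totals = {r: sum(cells.values()) for r, cells in table.items()}
col_totals = {c: sum(table[r][c] for r in table) for c in cols}

# Row proportions: each count divided by its row total (each row sums to 1).
row_props = {r: {c: table[r][c] / row_totals[r] for c in cols} for r in table}

# Column proportions: each count divided by its column total (each column sums to 1).
col_props = {r: {c: table[r][c] / col_totals[c] for c in cols} for r in table}

print(row_props["rent"]["individual"])  # share of renters who applied individually
print(col_props["rent"]["individual"])  # share of individual applicants who rent
```

The same count gives different proportions depending on which total it is divided by, so the two tables answer different questions.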

Dotplot

A one-variable scatterplot, used to show trends in a single variable from a dataset. The transparency of the dots represents proportion (for example, how many of all the loan interest rates are around 10%). Below the dots is a triangle that marks the mean, and below that, the axis values.

stacked dotplot

Like a dot plot, but instead of using transparency as the measure of frequency, the dots are stacked so that you get whole-number counts. Everything else is the same. Naturally this only works for relatively small datasets, since you can easily run out of space.
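
A rough text-mode sketch of the stacking idea. The helper `stacked_dotplot` is hypothetical and assumes small, single-digit integer values so the columns line up.

```python
from collections import Counter

def stacked_dotplot(values):
    """Return a text stacked dot plot: one column per value, one dot per observation."""
    counts = Counter(values)
    xs = range(min(values), max(values) + 1)
    height = max(counts.values())
    lines = []
    # Draw from the top level down; a column gets a dot if it is tall enough.
    for level in range(height, 0, -1):
        row = " ".join("o" if counts[x] >= level else " " for x in xs)
        lines.append(row.rstrip())
    # Axis labels go underneath the dots.
    lines.append(" ".join(str(x) for x in xs))
    return "\n".join(lines)

print(stacked_dotplot([5, 6, 6, 7, 7, 7, 9]))
```

The height of each stack is the exact count for that value, which is why the plot needs more vertical space as the dataset grows.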

Mean vs median

Median

The middle value in a SORTED dataset. If n / 2 is an integer (an even number of observations), the median is the average of the two middle values. Otherwise the number of observations is odd, and you simply pick the middle value. It represents the center of a dataset and is more robust than the mean (try adding one huge or tiny number to your observations and see how the mean is no longer representative). 50% of the data lie above this marker and 50% below it.
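
The even/odd rule and the robustness claim can be checked with a short sketch. The data values are made up for illustration.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]

data = [3, 1, 4, 1, 5]
with_outlier = data + [1000]  # one extreme observation added

print(median(data), statistics.mean(data))
print(median(with_outlier), statistics.mean(with_outlier))
```

The single extreme value moves the mean a long way while the median barely changes.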

First quartile (bottom / left side of the boxplot's “body”): 25% of the values are below this marker.

Remember: the 50th percentile (the second quartile) is the median

Third quartile (top / right side of the boxplot's “body”): 75% of the values are below this marker, or equivalently 25% are larger.
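
A brief sketch of computing the quartiles with Python's `statistics.quantiles`. The data are made up, and the function's default "exclusive" method is only one of several quartile conventions, so other tools may give slightly different values.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical observations

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)
print(q3 - q1)  # interquartile range: the length of the boxplot's body
```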

modality and skewness

Distribution

A function showing all possible values or intervals of the data and how often they occur.

Data Density

How common the data are at or near a certain value. A histogram shows it: higher bars mark where the data are relatively more common.
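
A minimal sketch of the binning behind a histogram. The helper `bin_counts` and the rounded interest rates are hypothetical.

```python
from collections import Counter

def bin_counts(values, width):
    """Count observations per bin; each key is the left edge of [edge, edge + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical loan interest rates, rounded to whole percent.
rates = [5, 6, 7, 9, 9, 10, 10, 11, 12, 21]

for edge, count in bin_counts(rates, 5).items():
    print(f"{edge:>2}-{edge + 5:<2} {'#' * count}")
```

The bin with the most observations is where the data are densest.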

Mode/Unimodal/Bimodal/Multimodal

Deviation

The distance of an observation from the mean of the data.

Standard Deviation

Roughly the typical distance of the observations from their mean; the square root of the variance.

Sample Variance

Roughly the average squared deviation of the observations from their mean (the sum of squared deviations divided by n - 1).

How wide is the distribution?
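
A short sketch tying deviation, sample variance, and standard deviation together. The data are made up for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean = statistics.mean(data)

# Deviation: each observation minus the mean (deviations always sum to zero).
deviations = [x - mean for x in data]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(d ** 2 for d in deviations) / (len(data) - 1)

# Standard deviation: square root of the variance, in the data's own units.
standard_deviation = math.sqrt(sample_variance)

print(mean, sample_variance, standard_deviation)
```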

Mode

In statistics, the mode is the value that occurs most often.

Unlike the median and the mean, the mode is not necessarily unique. There might be several different values that occur the same number of times.
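
A small sketch of the non-uniqueness, using `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count. The data are made up.

```python
import statistics

one_mode = [1, 2, 2, 3]
two_modes = [1, 2, 2, 3, 3, 4]

# multimode returns all most-frequent values, in order of first appearance.
print(statistics.multimode(one_mode))
print(statistics.multimode(two_modes))
```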

Bar plot

For showing categorical data

(remember that a bar plot can show “counts” or “proportions” on the y axis)
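
A quick sketch of the two possible y-axis scales, with a text bar per category. The category labels are hypothetical.

```python
from collections import Counter

# Hypothetical categorical variable.
homeownership = ["rent", "own", "mortgage", "rent", "mortgage", "mortgage", "own", "mortgage"]

counts = Counter(homeownership)
total = sum(counts.values())

# Proportions: each count divided by the total number of observations.
proportions = {level: count / total for level, count in counts.items()}

for level, count in counts.most_common():
    print(f"{level:<10}{'#' * count:<6}count={count} proportion={proportions[level]:.2f}")
```

The bars have the same shape on either scale; only the axis labels change.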

Comparing numerical data

Null hypothesis vs alternative hypothesis

Null hypothesis: “there is nothing going on.” Alternative hypothesis: “there is something going on.”

Hypothesis test

Example: run 100 simulations of the test to judge whether or not there is discrimination. The outcome of one or two simulations could easily be down to chance alone, so you need to run many of them.

“There is nothing going on.”
Promotion and gender are independent, no gender discrimination, observed difference in proportions is simply due to chance. → Null hypothesis

“There is something going on.”
Promotion and gender are dependent, there is gender discrimination, observed difference in proportions is not due to chance. → Alternative hypothesis
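
A hedged sketch of how such a simulation could look. The counts (24 male and 24 female files, 35 promotions, 21 of them male) are illustrative assumptions, not given on the cards, and `simulate_differences` is a hypothetical helper. Under the null hypothesis the promotions are shuffled at random across the files, and the simulated differences are compared with the observed one.

```python
import random

def simulate_differences(n_male, n_female, n_promoted, n_sims, seed=0):
    """Simulate male-minus-female promotion rates when promotion ignores gender."""
    rng = random.Random(seed)
    n_total = n_male + n_female
    # 1 marks a promoted file, 0 a file that was not promoted.
    outcomes = [1] * n_promoted + [0] * (n_total - n_promoted)
    diffs = []
    for _ in range(n_sims):
        rng.shuffle(outcomes)  # random assignment, as the null hypothesis assumes
        male, female = outcomes[:n_male], outcomes[n_male:]
        diffs.append(sum(male) / n_male - sum(female) / n_female)
    return diffs

# Assumed observed data: 21 of 24 men and 14 of 24 women promoted.
observed = 21 / 24 - 14 / 24

diffs = simulate_differences(24, 24, 35, n_sims=100)

# Share of simulations with a difference at least as large as the observed one.
share = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed difference: {observed:.3f}, share at least as large: {share:.2f}")
```

If only a small share of simulations reach the observed difference, chance alone is a poor explanation and the data favor the alternative hypothesis.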

cluster sampling

Cluster sampling is the "poor man's" stratified sample: we do not choose the groups ourselves; we assume that, for example, countries or school classes are good enough to form the groups.

stratified sample

Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. (This is hard to do, which is why a cluster sample is easier.)
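
A rough sketch contrasting the two designs. The helpers `stratified_sample` and `cluster_sample` and the school-class data are hypothetical; the cluster version keeps every unit in the chosen groups, which is one common form of cluster sampling.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum."""
    return [unit for members in strata.values() for unit in rng.sample(members, per_stratum)]

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: three school classes of five students each.
groups = {name: [f"{name}{i}" for i in range(5)] for name in ("A", "B", "C")}

rng = random.Random(0)
print(stratified_sample(groups, 2, rng))  # two students from every class
print(cluster_sample(groups, 2, rng))     # all students from two classes
```

The stratified draw touches every group, while the cluster draw skips some groups entirely, which is cheaper but relies on each group resembling the population.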

Ordinal vs not ordinal?

Ordinal data is categorical data whose levels have a natural order. For example, ordinal data is collected when a respondent rates their financial happiness on a scale of 1-10. There is no standard scale for measuring the difference between scores, so the gaps between levels are not necessarily equal.

Skew vs mean, median

Right-skewed: mean > median
Left-skewed: mean < median
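
A tiny sketch of why the inequality holds: the long tail pulls the mean toward it while the median stays near the bulk of the data. The datasets are made up.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # long tail to the right
left_skewed = [-x for x in right_skewed]   # mirror image: long tail to the left

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, statistics.mean(data), statistics.median(data))
```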

OpenIntro 2 Flashcards

(28 cards)