What are the two types of statistics?

Descriptive and Inferential

Define population

An aggregate of subjects we want to study

- things
- cases
- Bacteria
- Animals
- Humans

Define sample

A sample is a set of observations drawn from a population.

Define observation

Study unit / subject / individual

Define variable

Quality or quantity measured for each subject in the sample (age, sex, colour, weight)

Define dataset

A set of values on all variables of interest for all observations in the study

Define parameters

Parameters are quantities used to describe characteristics of the population

Parameters are quantities such as:

Mean height of Swedish men

Prevalence of Hepatitis C in Swedish drug users

Proportion of breast cancer patients who develop another cancer

μ

Population mean

σ^{2}

population variance

*p*

population proportion

Define **target population**

The population to which we wish to generalize our findings

Define study population

The population from which we sample

What are the measures of central tendency?

Median

Mean

Mode

What measure of central tendency is good to use when the data contain outliers?

Median

Define *mode*

The *mode* is the most frequently occurring value in the data
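The three measures of central tendency can be compared on a small data set. This is a minimal sketch using Python's standard `statistics` module; the data values are made up for illustration, with one deliberate outlier to show why the median is preferred in that case.

```python
import statistics

# Hypothetical data: five typical values plus one extreme outlier (100).
values = [2, 3, 3, 4, 5, 100]

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # barely affected by the outlier
mode_value = statistics.mode(values)      # most frequently occurring value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

The mean lands far above most of the observations, while the median and mode stay close to the bulk of the data.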

S^{2}

Sample variance

S

Standard deviation of a sample

How is the standard deviation calculated?

By taking the square root of the variance

What does a low standard deviation indicate?

A low standard deviation indicates that the data points tend to be very close to the mean

What does a high standard deviation indicate?

A high standard deviation indicates that the data points are spread out over a large range of values

What does the standard deviation tell us?

It tells us how much variation or "dispersion" exists around the average (mean, or expected value)

What does the variance tell us?

The variance describes how far, on average, the values lie from the mean (expected value), measured in squared units
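The relationship between variance and standard deviation can be sketched as follows. The data are hypothetical, and the sample variance is assumed to use the usual n - 1 denominator, which is also what `statistics.variance` uses.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = statistics.mean(data)

# Sample variance S^2: sum of squared deviations from the mean, divided by n - 1.
sample_variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# Standard deviation S: square root of the variance, in the same unit as the data.
sample_sd = math.sqrt(sample_variance)

# Population variance divides by n instead of n - 1.
population_variance = statistics.pvariance(data)

print(sample_variance, sample_sd, population_variance)
```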

What is the *constant* for a 90 % confidence interval?

C = 1.645 (often rounded to 1.64)

What is the *constant* for a 95 % confidence interval?

C = 1.96

What is the *constant* for a 99 % confidence interval?

C = 2.58
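These constants are quantiles of the standard Normal distribution, so they can be recomputed rather than memorised. A minimal sketch, assuming a two-sided interval; the helper name `critical_value` is introduced here for illustration.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Two-sided critical value C of the standard Normal distribution."""
    alpha = 1 - confidence_level
    # Leave alpha / 2 in each tail of the distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(critical_value(level), 3))
```

The values on the cards are these quantiles rounded to two decimals.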

What is a stochastic or random variable?

A variable whose value is subject to variation due to chance

x̄

Sample mean

μ

Population mean

σ^{2}

Population variance (sigma squared)

S^{2}

Sample variance

What is a nominal variable?

A variable that assumes values that fall into unordered categories (e.g. marital status, place of birth)

What is a binary or dichotomous variable?

A nominal variable with only two categories (e.g. gender, yes/no)

What is an **ordinal** variable?

A variable that assumes values that fall into ordered categories

Disease status: minor, moderate, and severe

Blood pressure: low, normal, and high

What is the *interquartile range*?

The interquartile range is equal to Q3 minus Q1

Quantitative variables can either be:

Discrete or continuous

Define **discrete variable**

A variable that can only take distinct, countable values. For example number of children in a family or number of cigarettes smoked per day.

Define *continuous variable*

A variable with a potentially infinite number of possible values along a continuum. For example height and weight

Explain **range of distribution**

The *difference* between the largest and smallest values in a distribution.

The number of successes that result from the binomial experiment is denoted by the symbol

X

The number of trials in the binomial experiment is denoted by the symbol

*n*

The probability of success on an individual trial in a binomial experiment is denoted by the symbol...

*P*

The probability of failure on an individual trial in a binomial experiment is denoted by

1 - *P*
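With this notation, the probability of observing exactly X successes in n trials follows the binomial formula. The sketch below uses the standard formula with `math.comb`; the function name `binomial_pmf` and the example values n = 10, P = 0.5 are chosen here for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) counts the ways to place x successes among n trials;
    # each such arrangement has probability p^x * (1 - p)^(n - x).
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5
prob_three = binomial_pmf(3, n, p)

# The probabilities over all possible values of X add up to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(prob_three, total)
```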

The mean of any distribution is also called...

Expectation

Both **standard deviation** and **standard error (SE)** are calculated from the...

Variance

When calculating variance why do we square the deviations?

To eliminate negative values, so that positive and negative deviations do not cancel each other out

How is the standard error calculated?

By dividing the standard deviation by the square root of the sample size *n*
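As a minimal sketch of this calculation, with a hypothetical sample of eight values:

```python
import math
import statistics

sample = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample, n = 8

sd = statistics.stdev(sample)        # sample standard deviation S
se = sd / math.sqrt(len(sample))     # standard error of the mean

print("SD:", sd)
print("SE:", se)
```

The standard error shrinks as *n* grows, even when the standard deviation stays the same.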

What measure of spread is good to use with the **median**?

Percentiles or quartiles

What is a *type I error*?

A type I error occurs when the researcher rejects a null hypothesis that is true.

What is a **type II error**?

A type II error occurs when the researcher fails to reject a null hypothesis that is false.

What is the **confidence interval** used for?

The *confidence interval* is used to express the degree of uncertainty associated with a sample statistic.

What is a continuous variable?

A variable that can take on any value between its minimum value and its maximum value.

Z-score is also called...

Standard score

What does a Z-score indicate?

It indicates how many standard deviations an element is from the mean.

How is the Z-score calculated?
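Assuming the usual definition, with x the observed value, μ the population mean and σ the population standard deviation:

```latex
z = \frac{x - \mu}{\sigma}
```

For example, with μ = 100 and σ = 15, an observation of 130 has z = (130 - 100) / 15 = 2.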

How is the variance of a population calculated?
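Assuming the usual definition, with N the number of units in the population, x_i the value for unit i and μ the population mean:

```latex
\sigma^{2} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^{2}
```

The sample variance S^{2} has the same form, but uses the sample mean in place of μ and divides by n - 1 instead of N.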

What does the horizontal line in a box plot diagram represent?

It represents the median or the 50th percentile

What type of variables are histograms good for?

Continuous variables

What does the lower limit of the box in a box plot represent?

the 25th percentile

What does the upper limit of the box in a box plot represent?

The 75th percentile

What does the lower whisker of a box plot represent?

It is the smallest value within 1.5 times the interquartile range of the lower limit of the box

What does the upper whisker of a box plot represent?

It is the largest value within 1.5 times the interquartile range of the upper limit of the box

What do the outer dots in a box plot represent?

**Outliers**

Values greater than the upper whisker or smaller than the lower whisker
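The box plot quantities above can be computed by hand. This is a sketch on hypothetical data; quartile conventions differ between textbooks and software, and the "inclusive" method of `statistics.quantiles` is only one of several possible choices.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data with one extreme value

# Box: 25th percentile (Q1), median, and 75th percentile (Q3).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range = Q3 - Q1

# Whiskers reach the most extreme values within 1.5 * IQR of the box.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
lower_whisker = min(x for x in data if x >= lower_limit)
upper_whisker = max(x for x in data if x <= upper_limit)

# Anything beyond the whiskers is drawn as a separate dot (outlier).
outliers = [x for x in data if x < lower_whisker or x > upper_whisker]

print(q1, median, q3, iqr)
print(lower_whisker, upper_whisker, outliers)
```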

In a Normal distribution, what percentage of the observations lie within 1 standard deviation of the mean?

About 68 %

In a Normal distribution, what percentage of the observations lie within 2 standard deviations of the mean?

About 95 %
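These percentages hold for Normally distributed data and can be checked from the cumulative distribution function. A minimal sketch; the helper name `share_within` is introduced here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a Normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = share_within(1)  # roughly 0.68
within_two = share_within(2)  # roughly 0.95

print(round(within_one, 4), round(within_two, 4))
```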

The standard deviation has the same unit as the...?

Mean

Name four characteristics of the Normal distribution

• meant for continuous variables

• defined from minus infinity to plus infinity

• symmetrical and bell-shaped

• centered about its mean

A Normal distribution with mean zero and variance one is called

The standard Normal distribution.

Name five sampling schemes

**Simple random** sampling

**Systematic** sampling

**Stratified** sampling

**Cluster** sampling

**Non-probability** sampling

Simple random sample

All sampling units are equally likely to be included in the sample

Systematic sampling

A statistical method involving the selection of elements from an ordered sampling frame.

Example: a random starting point is generated, then every 5th unit is chosen.

Stratified sampling

Divide the population into strata; draw random samples within each stratum; sampling fractions may vary across strata

It ensures that all the strata are represented

Cluster sampling

Identify clusters or groups of units in the population (e.g. families); draw a random sample of clusters rather than of units (e.g. individuals)

Non-probability sampling

Convenience sampling schemes (e.g. volunteers)

Prone to bias
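The four probability-based schemes can be sketched with Python's `random` module. Everything here is illustrative: the population of 100 numbered units, the two strata, the clusters of ten, and the sample sizes are assumptions made up for the example.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is reproducible
population = list(range(1, 101))  # hypothetical sampling frame of 100 units

# Simple random sampling: every unit is equally likely to be selected.
simple = rng.sample(population, 10)

# Systematic sampling: random starting point, then every k-th unit.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: draw a random sample within each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = {name: rng.sample(units, 5) for name, units in strata.items()}

# Cluster sampling: draw whole clusters, then keep every unit inside them.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [unit for cluster in chosen_clusters for unit in cluster]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```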

Probability can also be said to be the....?

Relative frequency in the long run

The probability is always a number between...?

0 and 1

In linear regressions the independent variable is denoted by what letter?

X

In linear regressions the dependent variable is denoted by what letter?

Y

Positive linear association means

Positive covariance

Negative linear association means

Negative covariance

What is the association?

Positive

What is the association?

Negative

What is the association?

None!

Independent

The correlation coefficient can never be greater than...?

1

The correlation coefficient can never be smaller than...?

-1

What does it mean if the correlation coefficient is equal to 0?

There is no linear association (zero covariance) between the two variables
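Covariance and the correlation coefficient can be computed directly from their definitions. This is a sketch with hypothetical data and hand-written helper functions (`covariance`, `correlation`); the last data set shows that a correlation of 0 rules out a linear association only, not every kind of relationship.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

def correlation(xs, ys):
    """Correlation coefficient: covariance scaled to lie between -1 and 1."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # positive linear association
falling = [10, 8, 6, 4, 2]   # negative linear association
u_shaped = [4, 1, 0, 1, 4]   # clear pattern, but no linear association

print(correlation(x, rising), correlation(x, falling), correlation(x, u_shaped))
```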

Explain residuals

A residual is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ)

What is the coefficient of determination (r^{2}) if x does not affect y at all?

The coefficient of determination (r^{2}) is 0 (0%)

What does the intercept of an equation mean?

The intercept is the value of the dependent variable when the value of the independent variable is 0

What does β (*slope*) represent?

β is the number of units by which y changes when x increases by one unit.
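Intercept, slope, residuals and r^{2} can all be seen in one small least-squares fit. The sketch below assumes simple linear regression with a single independent variable and uses made-up data; the variable names are chosen for illustration.

```python
x = [1, 2, 3, 4, 5]  # independent variable (hypothetical data)
y = [2, 4, 5, 4, 5]  # dependent variable (hypothetical data)

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope: co-variation of x and y divided by the variation of x.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
# Intercept: predicted y when x is 0.
intercept = mean_y - slope * mean_x

# Residual = observed y minus predicted y.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

# Coefficient of determination: share of the variation in y explained by x.
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(slope, intercept, r_squared)
```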

What types of variables are used in binomial distributions?

Categorical binary variables

The null hypothesis is denoted by...?

H_{0}

The alternative hypothesis is denoted by...?

H_{1} or H_{A}

What are the most common α-levels?

0.01

0.05

0.10

If the confidence level is 95%, then alpha equals

0.05

What do we do if the P-value is less than the significance level?

We reject the null hypothesis H_{0}

The criteria for rejecting the null hypothesis are:

p ≤ α

reject the null hypothesis

The criteria for rejecting the null hypothesis are:

p > α

do not reject the null hypothesis
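The decision rule can be illustrated with a two-sided one-sample z-test. This is only a sketch: it assumes the population standard deviation is known, and the function name `two_sided_z_test` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject_h0) for H0: population mean equals mu0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-sided p-value: probability of a result at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision rule: reject H0 when p <= alpha, otherwise do not reject.
    return z, p_value, p_value <= alpha

z, p_value, reject = two_sided_z_test(sample_mean=103, mu0=100, sigma=15, n=100)
print(z, round(p_value, 4), reject)
```

With the same data and a stricter level such as α = 0.01, the p-value exceeds α and the null hypothesis is not rejected.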

What values can a p-value take?

Only values between 0 and 1

The 95% confidence interval for the mean represents

An interval constructed so that, in repeated sampling, 95% of such intervals would contain the true mean value in the population.
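A 95% confidence interval for the mean is the sample mean plus and minus C = 1.96 standard errors. The sketch below uses a hypothetical sample and the Normal-based constant; for a sample this small, a t-distribution quantile would normally be used instead, so treat the constant as a simplifying assumption.

```python
import math
import statistics

sample = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample

mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

C = 1.96  # constant for a 95 % confidence interval (Normal approximation)
lower = mean - C * standard_error
upper = mean + C * standard_error

print(round(lower, 3), round(upper, 3))
```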

A binomial distribution must meet these four requirements

1. A fixed number of trials

2. Each trial must be independent (no trial has any impact on any other trial)

3. There can be only two results (Success or Failure)

4. The probability of success is the same in every trial

Define Z-score

A z-score is defined as the number of standard deviations a specific point is away from the mean.