Statistics Flashcards

1
Q

What is variance?

A

Variance is the mathematical expectation of the squared deviation of a random variable from its mean.
Variance is a measure of dispersion: it expresses how far a set of numbers is spread out from their average value.

Variance (sigma^2) measures how far the values in a dataset lie from the mean. The higher the variance, the more spread out the dataset. It measures magnitude only, not direction.

Calculation:
The average of the squared differences from the mean over all data points (divide by n for a population, or by n - 1 for a sample estimate).
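
A minimal sketch of the calculation, using a small made-up dataset (the numbers are illustrative only) and the population form of the formula:

```python
from statistics import mean, pvariance

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)  # arithmetic mean of the data

# Average of the squared differences from the mean (population variance).
variance = sum((x - mu) ** 2 for x in data) / len(data)

# The standard library computes the same quantity.
library_variance = pvariance(data)
```

Here the mean is 5 and the variance is 4.0. For a sample estimate, `statistics.variance` divides by n - 1 instead.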

2
Q

What is covariance?

A

In probability theory and statistics, covariance is a measure of how the changes in two variables are related.

Covariance measures how two random variables in a dataset vary together. If the covariance is positive, the variables tend to move in the same direction; if it is negative, they tend to move in opposite directions; if it is near zero, there is no linear relationship. It indicates direction, but its magnitude depends on the units of the variables.
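
A small sketch of the sample covariance, written out by hand. The data and the names `hours` and `score` are hypothetical:

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance: sum of paired deviation products over n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]

cov_same_direction = covariance(hours, score)            # positive
cov_opposite_direction = covariance(hours, score[::-1])  # negative
```

Reversing one of the lists flips the sign of the result, which is the "direction" the card refers to.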

3
Q

What is correlation?

A

Correlation is the mutual relationship between two or more random variables (given the value of one random variable, we can predict the value of another with a certain probability).
Covariance measures how two random variables vary together.
The correlation coefficient measures the degree of association between two random variables.

Correlation is the degree to which two random variables in a dataset change together. It measures both magnitude and direction. Covariance tells you whether the two variables move in the same or in opposite directions; the correlation coefficient, which ranges from -1 to 1, tells you how strongly they move together.
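
As an illustration, here is a sketch of the Pearson correlation coefficient (one common correlation measure; the card does not name a specific one). The data and variable names are made up:

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: the same heights expressed in two different units.
age = [1, 2, 3, 4, 5]
height_cm = [150, 160, 165, 172, 180]
height_m = [h / 100 for h in height_cm]

r_cm = pearson_r(age, height_cm)
r_m = pearson_r(age, height_m)  # same value: correlation is unit-free
```

Changing the units rescales the covariance but leaves the correlation unchanged, which is why correlation can be read as a strength between -1 and 1.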

4
Q

What Is a Normal Distribution?

A

A normal distribution, also called a Gaussian distribution, is a bell-shaped distribution that is symmetric about the mean. This means that half the data is on one side of the mean and half the data is on the other. Normal distributions occur in many natural situations, such as the heights of a population.

In a graph, a normal distribution appears as a bell curve.

The mean, median, and mode are equal
All of them are located in the center of the distribution
About 68% of the data falls within one standard deviation of the mean
About 95% of the data falls within two standard deviations of the mean
About 99.7% of the data falls within three standard deviations of the mean

For the standard normal distribution, the expectation is 0 and the variance (sigma^2) is 1.
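
The 68 / 95 / 99.7 figures can be checked with the error function, since for a normal variable P(|X - mean| < k * sd) = erf(k / sqrt(2)). The helper name below is hypothetical:

```python
from math import erf, sqrt

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return erf(k / sqrt(2))

one_sd = share_within(1)    # roughly 0.6827
two_sd = share_within(2)    # roughly 0.9545
three_sd = share_within(3)  # roughly 0.9973
```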

5
Q

What are the different types of hypotheses in hypothesis testing?

A

Hypothesis testing is the procedure statisticians and scientists use to decide whether to reject or fail to reject a statistical hypothesis.

Null hypothesis: It states that there is no relation between the predictor and outcome variables in the population. It is denoted by H0.
Example: There is no association between a patient’s BMI and diabetes.

Alternative hypothesis: It states that there is some relation between the predictor and outcome variables in the population. It is denoted by H1.
Example: There is an association between a patient’s BMI and diabetes.

6
Q

Explain Type I and Type II errors in statistics.

A

Errors in hypothesis testing.

A Type I error occurs when a true null hypothesis is rejected, that is, when a false alternative hypothesis is accepted. It is also known as a false positive.

A Type II error occurs when a false null hypothesis is not rejected, that is, when a true alternative hypothesis is dismissed. It is also known as a false negative.
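
A small simulation can make the Type I error concrete. This sketch assumes a two-sided z-test with known standard deviation and a significance level of 0.05; none of these choices come from the card. Because the null hypothesis is true in every trial, every rejection is a false positive:

```python
import random
from math import sqrt
from statistics import mean

random.seed(0)
Z_CRIT = 1.96  # two-sided critical value for a significance level of 0.05

def rejects_h0(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return abs(z) > Z_CRIT

trials = 2000
# Every sample is drawn with true mean 0, so H0 holds in each trial.
false_positives = sum(
    rejects_h0([random.gauss(0.0, 1.0) for _ in range(30)])
    for _ in range(trials)
)
type_i_rate = false_positives / trials  # should be close to 0.05
```

The observed rate lands near the chosen significance level, which is the probability of a Type I error by construction.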

7
Q

Descriptive vs. Inferential Statistics

A

Descriptive statistics deals with describing, organizing, and summarizing the data collected through surveys or measurements (graphical displays, arithmetic mean, standard deviation). It is a simple way to describe, show, and summarize a dataset in a meaningful way, and its conclusions apply only to the data actually collected.

Inferential statistics analyzes samples to find patterns or differences within or between them, and draws conclusions about populations from those samples (this includes testing hypotheses with statistical tests).

8
Q

Mean vs. Median vs. Mode

A

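The mean of a dataset is its average value: the sum of the values divided by the number of values.

The median is the middle value of a dataset once the values are sorted (the average of the two middle values when the count is even).

The mode of a set of values is the most frequently occurring value in the set.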
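
A quick sketch with Python's `statistics` module, on a made-up list of values:

```python
from statistics import mean, median, mode

# Hypothetical values, for illustration only.
values = [1, 2, 2, 3, 4, 7, 9]

avg = mean(values)       # sum of values divided by their count
middle = median(values)  # middle value after sorting
most_common = mode(values)  # most frequent value
```

For this list the three measures differ: the mean is 4, the median is 3, and the mode is 2.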

9
Q

When do we use mean?

A

It’s best to use the mean to describe the center of a dataset when the distribution is mostly symmetrical and there are no outliers.

10
Q

When do we use the median?

A

It is best to use the median when the distribution is either skewed or there are outliers present.
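
To see why, compare how the mean and the median react to a single outlier. The salary figures below are invented for the example:

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the last list adds one extreme value.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [500]

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)
```

The mean jumps from 35 to 112.5, while the median only moves from 35 to 36.5, so the median remains a reasonable description of a typical value.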

11
Q

What is Sample Size?

A

Sample size is the number of individual observations included in a sample or used in an experiment.

12
Q

What is Standard Deviation?

A

It tells us how much, on average, the elements of a set deviate from the arithmetic mean of the set.

It’s a measure of how spread out the data is.

It is the square root of the variance.
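
A short sketch of the relationship to variance, on illustrative data. It also shows the population form (divide by n) next to the sample form (divide by n - 1):

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev

# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = pstdev(data)           # square root of the population variance
from_variance = sqrt(pvariance(data))  # same value, computed explicitly
sample_sd = stdev(data)                # uses n - 1, so it is slightly larger
```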

13
Q

Quantitative vs. Qualitative Data

A

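Qualitative (categorical) = gender, color, car type… (often shown with pie charts or bar charts)

Quantitative (numerical) = numbers, such as counts or measurements (often shown with histograms)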

14
Q

What is a scatterplot?

A

It’s used to visualize the relationship between data that comes in pairs (two variables).

15
Q

What is Population?

A

A population is the set of all members that share a certain common characteristic; the set of all people or things of interest in a particular study.

The size of a population is the number of its members.

It is the entire group of subjects about which we want information.

16
Q

What is a Sample?

A

A sample is a subset of the base set, that is, of the population.
The main purpose of a sample is to draw conclusions about the population from measurements or tests performed on the sample.

A statistic refers to a characteristic of a sample, while a parameter refers to a characteristic of the population.

It is the part of the population from which we collect information.

17
Q

What is Statistical Inference?

A

The process of drawing conclusions about a population from statistics calculated on a sample of data drawn from that population.

18
Q

What is the Probability of an event?

A

A number between 0 and 1 that expresses how likely the event is to occur.

Under the frequency interpretation, it’s the proportion of times the event occurs over many repetitions.

19
Q

Central Limit Theorem

A

The central limit theorem states that if you take sufficiently large random samples from a population with finite variance, the sample means will be approximately normally distributed, even if the population itself isn’t normally distributed.
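
A rough simulation sketch of the theorem. The exponential population, the sample size of 50, and the number of samples are all arbitrary choices made for this example:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)
n = 50              # size of each sample (assumed "sufficiently large")
num_samples = 2000  # how many samples to draw

# The population is exponential with mean 1 and sd 1: strongly skewed, not normal.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = mean(sample_means)   # should be close to the population mean, 1
spread = stdev(sample_means)  # should be close to 1 / sqrt(n)
expected_spread = 1 / sqrt(n)

# For a normal distribution, about 68% of values fall within one sd of the mean.
share_within_one_sd = sum(
    abs(m - center) <= spread for m in sample_means
) / num_samples
```

Even though individual draws are skewed, the sample means cluster symmetrically around 1, with roughly the share within one standard deviation that a normal distribution would give.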

20
Q

The Chi-Square Test for Homogeneity

A

It tests the null hypothesis that the distribution of a categorical variable (e.g. color) is the same across several populations (e.g. milk, caramel, and peanut M&M's).

It assumes that the samples are drawn independently, both within and across the populations.
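
A sketch of the test statistic, computed by hand from a table of counts. The counts are invented, and only two of the populations are shown to keep the example short:

```python
# Hypothetical counts of three colors in two populations (rows).
observed = {
    "milk": [30, 20, 50],
    "peanut": [25, 25, 50],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells, where the expected
# count under H0 is row_total * column_total / grand_total.
chi2 = 0.0
for row, row_total in zip(rows, row_totals):
    for count, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        chi2 += (count - expected) ** 2 / expected

degrees_of_freedom = (len(rows) - 1) * (len(col_totals) - 1)
```

The statistic is then compared with a chi-square distribution with the given degrees of freedom. With 2 degrees of freedom the 5% critical value is about 5.99, so a statistic of about 1.01 would not lead to rejecting the null hypothesis.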

21
Q

What is a t-test and when do we use it?

A

A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest.
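
As a sketch, here is the t statistic for two independent groups in Welch's form, which does not assume equal variances. This is one of several t-test variants and the card does not say which one it means. The data and function name are hypothetical:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """t statistic: difference in means divided by its estimated standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical measurements for a treated group and a control group.
treated = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.1, 4.5, 4.0, 4.8, 4.6]

t_stat = welch_t(treated, control)
```

A large absolute value of the statistic (here about 4.19) suggests that the group means differ by more than chance alone would explain; the p-value comes from the t distribution with the appropriate degrees of freedom.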

22
Q

ANOVA

A

ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups.
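
A minimal sketch of the one-way ANOVA F statistic, which compares the variation between group means with the variation within groups. The groups are made up:

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)
```

For these groups the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = 13. A large F indicates that at least one group mean differs from the others.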

23
Q

What is Interquartile Range (IQR) and what can it be used for?

A

The quartiles divide the distribution into four parts, each containing 25% of the values.

IQR = 75th percentile - 25th percentile
IQR = Q3 - Q1

Outlier removal:
- lower limit = Q1 - 1.5 * IQR
- upper limit = Q3 + 1.5 * IQR

Then we remove all values above the upper limit or below the lower limit.
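
The steps above can be sketched as follows. The data is hypothetical, and `statistics.quantiles` is used with its default method; other quartile conventions can give slightly different limits:

```python
from statistics import quantiles

# Hypothetical data with one obvious outlier (102).
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102, 12]

q1, _, q3 = quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Keep only the values inside the limits.
kept = [x for x in data if lower_limit <= x <= upper_limit]
```

With this data Q1 is 12 and Q3 is 14, so the limits are 9 and 17 and only the value 102 is removed.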

24
Q

What is the Z-score?

A
  • The z-score tells us how many standard deviations a data point is away from the mean.
  • A z-score can be computed for every single data point.

z = (x - mean) / standard deviation
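
A small sketch applying the formula to every point of a made-up dataset, using the population standard deviation:

```python
from statistics import mean, pstdev

# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)       # 5
sigma = pstdev(data)  # 2.0

# One z-score per data point: distance from the mean in standard deviations.
z_scores = [(x - mu) / sigma for x in data]
```

The value 9 gets a z-score of 2.0 (two standard deviations above the mean) and the value 2 gets -1.5.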

25
Q

What is A/B testing?

A
  • A/B testing is a simple and powerful technique for choosing the better of two options (A and B) by comparing how each one performs.
  • Used for product recommendations on websites, deciding on the layout of tabs and buttons, social media ad campaigns…
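
One common way to compare the two options is a two-proportion z-test on conversion rates. This is an assumption for illustration; the card does not prescribe a specific test. The function name and the visitor and conversion counts are hypothetical:

```python
from math import erfc, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p_value

# Hypothetical experiment: 1000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=1000, conv_b=250, n_b=1000)
```

Here variant B converts at 25% against 20% for A, and the small p-value suggests the difference is unlikely to be due to chance alone.
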
26
Q

What are the characteristics of a random sample?

A

A random sample has the following characteristics:

  • every member of the population has the same probability of being selected for the sample;
  • the selection of each member of the sample is independent, i.e. it does not affect the selection of any other member; and
  • every sample of a given size has the same probability of being selected as any other sample of the same size.