Quantitative Statistics Lecture 1 Flashcards

1
Q

What are the major types of data statisticians work with?

A

Nominal, ordinal, interval and ratio data.

2
Q

Why do we use sampling in statistics? What is the meaning of random sampling? Can you provide an example?

A

Researchers rarely have the time or money to study an entire population. Instead they study a sample, a small part of the population, and use it to draw conclusions about the whole. Because a sample is much smaller, the data can be collected more carefully, which helps keep errors down.

3
Q

Why do we use sampling in statistics? What is the meaning of random sampling? Can you provide an example?

A

We use sampling in statistics to draw conclusions about a whole population from a smaller part of it. Random sampling means that every member of the population has an equal chance of being selected.
- Random sampling example:
- We have 1,000 students and want to estimate their average GPA.
- We assign a unique number to each of the 1,000 students.
- We use a random number generator to select 100 students, so every student has an equal chance of being included.
- We collect data from these 100 students.
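The student example above can be sketched with Python's standard `random` module. The GPAs here are randomly generated placeholders (assumed to lie between 2.0 and 4.0), not real data; the point is only that `sample` picks 100 distinct student IDs, each with the same chance of selection.

```python
import random
import statistics

# Hypothetical population: student IDs 1..1000, each with a made-up GPA.
rng = random.Random(42)  # fixed seed so the example is reproducible
population = {student_id: round(rng.uniform(2.0, 4.0), 2) for student_id in range(1, 1001)}

# Simple random sample: every student has the same chance of being picked,
# and no student can be picked twice.
sample_ids = rng.sample(sorted(population), k=100)
sample_gpas = [population[i] for i in sample_ids]

sample_mean = statistics.mean(sample_gpas)
population_mean = statistics.mean(population.values())
print(f"Sample mean GPA:     {sample_mean:.2f}")
print(f"Population mean GPA: {population_mean:.2f}")
```

The sample mean will usually land close to, but not exactly on, the population mean; that gap is the sampling error.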

4
Q

What are some of the challenges or issues in sampling of data? List out any three.

A
  • Selection bias: selection bias occurs when the sampling method used does not truly represent the population being studied.
  • Sample size: choosing an appropriate sample size is critical. If the sample is too small, it may not provide enough information to draw meaningful conclusions or detect significant patterns.
  • Non-response bias: non-response bias occurs when a significant portion of the selected sample does not participate in or respond to the survey or study.

5
Q

What is mean, median? Why should we care about these statistics?

A
  • Mean is the average of a sample or population: the sum of the values divided by the number of values.
  • Median is the middle value when the values of a sample or population are sorted.
  • They give a quick understanding of your data and make it easy to compare with other data, for example the average salary in a company or the median GPA of students. The median is less affected by extreme values than the mean.
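A small sketch of the salary example, using Python's standard `statistics` module. The salaries are made up, with one very high value included to show how an extreme value pulls the mean but not the median.

```python
import statistics

# Made-up monthly salaries in a small company; the last one is the CEO.
salaries = [30_000, 32_000, 35_000, 38_000, 200_000]

mean_salary = statistics.mean(salaries)      # sum / count
median_salary = statistics.median(salaries)  # middle value after sorting

print(mean_salary)    # 67000
print(median_salary)  # 35000
```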
6
Q

What does standard deviation in a dataset tell us? What does it mean when standard
deviation is small or large?

A
  • Standard deviation tells us how spread out the values in a dataset are around the mean.
  • A small standard deviation means the values lie close to the mean.
  • A large standard deviation means the values are spread out far from the mean.
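To make small versus large spread concrete, here is a sketch with two made-up datasets that share the same mean but differ in spread. It uses `statistics.stdev`, the sample standard deviation (dividing by n - 1); `statistics.pstdev` would give the population version.

```python
import statistics

# Two made-up datasets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, data in [("tight", tight), ("wide", wide)]:
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    print(name, statistics.mean(data), round(statistics.stdev(data), 2))
# tight 50 1.58
# wide 50 15.81
```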
7
Q

Define numerical variable and categorical variable. Provide examples.

A

- Numerical variable: a variable whose values are numbers that you can perform mathematical operations on, for example age, income and temperature.
- Categorical variable: a variable whose values are categories, so mathematical operations on them make no sense, for example gender, education level and marital status.

8
Q

Provide examples for discrete and continuous data.

A
  • Discrete data: data that can only take separate, countable values such as whole numbers. Rolling a die gives a whole number from 1 to 6.
  • Continuous data: data that can take any value within a range, including decimals and fractions. The height of an individual is a good example.
9
Q

List out one difference between discrete and continuous variables. Provide an example of
each.

A

One difference between discrete and continuous variables is that discrete variables take separate, countable values such as whole numbers, while continuous variables can take any value within a range, including decimals and fractions.
- Discrete: rolling a die gives a whole number.
- Continuous: the height of an individual, because heights vary from person to person and are usually not a whole number.

10
Q

What does it mean when one says - “90th percentile”?

A
  • It is the value below which 90% of the data falls.
11
Q

What does it mean when one says - “10th percentile”?

A
  • It is the value below which 10% of the data falls.
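As a sketch of the 10th and 90th percentiles, Python's standard `statistics.quantiles` with `n=10` returns the nine cut points between deciles (10th, 20th, ..., 90th percentile). The scores 1 to 100 are made up for illustration. Different tools use different interpolation conventions for percentiles, so other software may report slightly different cut points.

```python
import statistics

scores = list(range(1, 101))  # made-up data: the scores 1, 2, ..., 100

# n=10 gives the nine cut points between deciles: 10th, 20th, ..., 90th percentile.
deciles = statistics.quantiles(scores, n=10)
p10, p90 = deciles[0], deciles[-1]

share_below_p10 = sum(s < p10 for s in scores) / len(scores)
share_below_p90 = sum(s < p90 for s in scores) / len(scores)
print(p10, share_below_p10)  # 10.1 0.1
print(p90, share_below_p90)  # 90.9 0.9
```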
12
Q

If one flips a fair coin many times, what is the probability of getting heads?

A
  • 50%
13
Q

If one flips a fair coin many times, what is the probability of getting tails?

A

50%
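A quick simulation sketch of many fair coin flips with Python's `random` module. A flip is assumed to be heads when a uniform random number falls below 0.5; the share of heads and tails should both come out close to 50%.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility
n_flips = 100_000
heads = sum(rng.random() < 0.5 for _ in range(n_flips))  # True counts as 1

print(f"P(heads) ≈ {heads / n_flips:.3f}")
print(f"P(tails) ≈ {(n_flips - heads) / n_flips:.3f}")
```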

14
Q

If one rolls a fair six-sided die {1, 2, 3, 4, 5, 6}, what is the probability of each face occurring?

A

1/6, or about 16.67%
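The same calculation as exact arithmetic, sketched with Python's `fractions.Fraction`: each of the six equally likely faces gets probability 1/6, and the probabilities add up to 1.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
# A fair die: every face is equally likely, so each gets 1 / (number of faces).
prob = {face: Fraction(1, len(faces)) for face in faces}

print(prob[3])                         # 1/6
print(f"{float(prob[3]) * 100:.2f}%")  # 16.67%
print(sum(prob.values()))              # 1
```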

15
Q

Explain any one symmetric distribution - either discrete or continuous.

A

The normal distribution is a symmetric, continuous distribution. It has a single peak at the mean, and the curve on one side of the peak is a mirror image of the other side.

16
Q

Properties of a binomial distribution: mention any two. Provide an example.

A

- Fixed number of trials, where each trial has only two possible outcomes, usually labeled success or failure.
- Independence of trials: each trial is independent, which means the outcome of one trial does not affect the outcome of any other trial.
- Example: flipping a coin 3 times gives a fixed number of trials (3), and the first flip does not affect the two other flips.
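For the 3-flip example, a sketch of the binomial probability formula P(X = k) = C(n, k) · p^k · (1 - p)^(n - k), using `math.comb` and exact fractions. Here X is taken to be the number of heads and p = 1/2 for a fair coin; the helper name `binomial_pmf` is just for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Number of heads in 3 independent flips of a fair coin.
n, p = 3, Fraction(1, 2)
for k in range(n + 1):
    print(k, binomial_pmf(k, n, p))
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```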

17
Q

List out properties of the normal distribution. Mention any three.

A

- Symmetry: the normal distribution is symmetric, which means the two sides of the curve around the peak are mirror images of each other.
- Unimodality: it has a single peak, located at the mean.
- Asymptotic tails: the tails extend indefinitely in both directions and get ever closer to zero but never reach it.
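These three properties can be checked numerically with `statistics.NormalDist` from Python's standard library. The standard normal (mean 0, standard deviation 1) is chosen here only for illustration; any mean and standard deviation would show the same pattern.

```python
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)  # standard normal, chosen for illustration

# Symmetry: the density at mu - x equals the density at mu + x.
print(nd.pdf(-1.5) == nd.pdf(1.5))          # True
# Unimodality: the density is highest at the mean and falls off on both sides.
print(nd.pdf(0) > nd.pdf(0.5) > nd.pdf(1))  # True
# Asymptotic tails: far out the density is tiny but still positive.
print(nd.pdf(8) > 0)                        # True
```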

18
Q

Properties of continuous random variables. Mention any three.

A
  • Infinitely many possible values: a continuous random variable can take any of the infinitely many values within an interval.
  • Probability density function: a continuous random variable does not assign a probability to individual values; instead, a continuous curve describes the relative likelihood of different values.
  • No probability mass at individual values: the probability of any single exact value, for example exactly 2, is zero. Only intervals of values have a non-zero probability.
19
Q

What is the area under the curve for a continuous distribution?

A

The area under the curve over an interval is the probability that the random variable takes a value within that interval. The total area under the curve is 1.
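A sketch of "area under the curve = probability" using the cumulative distribution function of `statistics.NormalDist`: the area between a and b is `cdf(b) - cdf(a)`. The height model (normal with mean 170 cm and standard deviation 10 cm) is a made-up assumption for illustration.

```python
from statistics import NormalDist

# Hypothetical example: heights in cm modelled as Normal(mean 170, sd 10).
heights = NormalDist(mu=170, sigma=10)

# Area under the curve between 170 and 185 = probability of a height in that range.
p_interval = heights.cdf(185) - heights.cdf(170)
print(round(p_interval, 4))  # 0.4332

# A single exact value has zero width, hence zero area (zero probability).
print(heights.cdf(175) - heights.cdf(175))  # 0.0
```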

20
Q

How do the mean and standard deviation affect the shape of the normal distribution?

A

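- Mean: the mean sets the location of the peak. If the mean shifts to the right or left, the whole curve, including its peak, shifts with it.
- Standard deviation: it sets how wide or narrow the curve is. A larger standard deviation gives a wider, flatter curve; a smaller one gives a narrower, taller curve.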
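A sketch with `statistics.NormalDist` showing both effects: changing the mean moves where the peak sits, while a larger standard deviation lowers the peak (the curve spreads out, since the total area stays 1). The specific means and standard deviations are arbitrary choices for illustration.

```python
from statistics import NormalDist

# Same standard deviation, different means: the peak moves with the mean.
for mu in (0, 5):
    nd = NormalDist(mu=mu, sigma=1)
    print(f"mean={mu}: density at x={mu} is {nd.pdf(mu):.3f}")

# Same mean, different standard deviations: larger sigma -> lower, wider curve.
for sigma in (0.5, 1, 2):
    nd = NormalDist(mu=0, sigma=sigma)
    print(f"sigma={sigma}: peak height {nd.pdf(0):.3f}")
```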

21
Q

What is the relationship between mode, mean and median in a normal distribution?

A

They all share the same value and location, at the center of the normal distribution.
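A sketch using `statistics.NormalDist`, whose `mean`, `median` and `mode` properties all return the same value for a normal distribution. The parameters (mean 100, standard deviation 15) are arbitrary; drawing a large random sample also gives a sample mean and median that land close together.

```python
import statistics
from statistics import NormalDist

nd = NormalDist(mu=100, sigma=15)   # arbitrary example parameters
print(nd.mean, nd.median, nd.mode)  # 100.0 100.0 100.0

# A large random sample from the same distribution: mean and median nearly agree.
samples = nd.samples(100_000, seed=7)
print(round(statistics.mean(samples), 1), round(statistics.median(samples), 1))
```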

22
Q

What is an outlier and what are the ways of dealing with them?

A
  • An outlier is an observation or data point that is significantly different from the rest of the dataset.
  • Identify and examine: make visualizations to identify the outliers, and check whether they are errors or genuine values.
  • Replace: you can replace the outlier, but it will affect your results.
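One common way to identify outliers, not mentioned on this card, is the 1.5 × IQR rule of thumb (Tukey's fences), which is also what box plots typically use. The sketch below assumes that rule; the function name `iqr_outliers` and the measurements are made up for illustration. It computes quartiles with Python's standard `statistics.quantiles`.

```python
import statistics

def iqr_outliers(data: list[float], k: float = 1.5) -> list[float]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

# Made-up measurements with one suspicious value.
values = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(values))  # [102]
```

Flagging a value this way is only the "identify" step; whether to keep, correct or replace it still needs a closer look.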
23
Q

Under a normal distribution, what interval does 95% of the probability fall within? And
for 90%?

A

- 95%: the probability falls within 1.96 standard deviations of the mean, i.e. the z-score interval from -1.96 to 1.96.
- 90%: the probability falls within 1.645 standard deviations of the mean, i.e. the z-score interval from -1.645 to 1.645.
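The critical values 1.96 and 1.645 can be recovered from the inverse CDF of the standard normal. A sketch with `statistics.NormalDist().inv_cdf`: for a central interval at a given level, half of the leftover probability goes in each tail.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for level in (0.95, 0.90):
    # Central interval: leave (1 - level) / 2 in each tail.
    upper = z.inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%}: from {-upper:.3f} to {upper:.3f}")
```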

24
Q

What is the empirical rule, and when can it be helpful?

A

The empirical rule is also known as the 68-95-99.7 rule. It states that approximately 68% of the data falls within one standard deviation of the mean, approximately 95% within two standard deviations and approximately 99.7% within three standard deviations.
- The rule holds exactly only for a normal distribution, and only approximately for data that is roughly bell-shaped. It can be helpful for quickly judging how unusual a value is and for setting approximate confidence intervals.
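The three percentages follow from the standard normal CDF: the share within k standard deviations of the mean is `cdf(k) - cdf(-k)`. A short sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)  # probability within k standard deviations
    print(f"within {k} SD: {share:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```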

25
Q

Define central limit theorem. Why is the central limit theorem important in statistics?

A

Regardless of the shape of the original population (as long as it has a finite variance), the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
- Statistical inference: t-tests and confidence intervals rely on the CLT to make inferences about population parameters, so we can work with real-world data that may not be normally distributed.
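A simulation sketch of the central limit theorem. The population is assumed to be exponential with mean 1 and standard deviation 1, which is strongly skewed; sample sizes and counts are arbitrary choices. The CLT predicts that means of samples of size n are roughly normal with mean 1 and standard deviation 1/√n, so about 95% of them should fall within 1.96 of those standard errors.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility
n = 50                  # size of each sample
n_samples = 2_000       # how many samples we draw

# Population: exponential with mean 1 and SD 1, which is strongly right-skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

# CLT prediction: means are roughly Normal(1, 1 / sqrt(n)).
se = 1 / n ** 0.5
share_within = sum(abs(m - 1) < 1.96 * se for m in sample_means) / n_samples
print(f"mean of sample means: {statistics.fmean(sample_means):.3f}")
print(f"SD of sample means:   {statistics.stdev(sample_means):.3f} (CLT: {se:.3f})")
print(f"share within ±1.96 SE: {share_within:.3f} (normal: 0.950)")
```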

26
Q

What is the Law of large numbers?

A

The law of large numbers has two versions: the weak law of large numbers and the strong law of large numbers.
- The weak law of large numbers states that as you take larger and larger samples from a population and calculate the mean, the probability that the sample mean is close to the population mean (within any chosen margin) approaches 1.
- The strong law of large numbers states that, with probability 1 (that is, in almost every possible outcome), the sample mean converges to the population mean as the sample size grows.
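A simulation sketch of the law of large numbers with a fair die, whose population mean is (1 + 2 + ... + 6) / 6 = 3.5. The running average of the rolls is printed at a few arbitrarily chosen checkpoints; it typically wanders early on and settles close to 3.5 as the number of rolls grows.

```python
import random

rng = random.Random(3)  # fixed seed for reproducibility
total = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}

for n in range(1, 100_001):
    total += rng.randint(1, 6)  # one roll of a fair die
    if n in checkpoints:
        print(f"after {n:>6} rolls: mean = {total / n:.3f}")

running_mean = total / 100_000
```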

27
Q

Explain sampling distribution.

A

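A sampling distribution is the probability distribution of a sample statistic, such as the sample mean, over all possible samples of the same size drawn from a population. It provides valuable insight into the behavior of sample statistics and forms the basis for making inferences about the population and assessing the reliability of sample estimates.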
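For a tiny population, the whole sampling distribution can be listed by hand. This sketch uses a made-up population of four values and every possible sample of size 2 drawn without replacement (via `itertools.combinations`); the mean of all the sample means equals the population mean.

```python
import statistics
from collections import Counter
from itertools import combinations

population = [2, 4, 6, 8]  # tiny made-up population, mean 5
sample_size = 2

# Every possible sample of size 2 (without replacement) and its mean.
sample_means = [statistics.mean(s) for s in combinations(population, sample_size)]

print(sorted(sample_means))           # [3, 4, 5, 5, 6, 7]
print(Counter(sample_means))          # the sampling distribution of the mean
print(statistics.mean(sample_means))  # 5
```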

28
Q

In experimental design - what is a treatment group, and what is a control group?

A

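- Treatment group: the subjects or participants who are exposed to the experimental treatment.
- Control group: subjects or participants treated in the same way as the treatment group, except that they do not receive the experimental treatment. It serves as a baseline for comparison.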