Mid-Term Exam Flashcards

(81 cards)

1
Q

population

A

the group of all items (data) of interest.

- frequently very large; sometimes infinite.

2
Q

sample

A

a set of items (data) drawn from the population of interest.

  • potentially large, but much smaller than the population.
  • the sample is a subset of the population.
3
Q

parameter

A

a descriptive measure of a population.

- Ex. population mean

4
Q

statistic

A

a descriptive measure of a sample.

- Ex. sample mean

5
Q

statistical inference

A

sample statistics are used to make inferences about population parameters, meaning an estimate, prediction or decision about a population can be produced from sample data. what is known about a sample can therefore be applied, with some uncertainty, to the larger population.

6
Q

numerical data

A
  • values are real numbers
  • all calculations are valid
  • data may be treated as ordinal or nominal
7
Q

nominal data

A
  • values are arbitrary numbers that represent categories
  • only calculations based on the frequencies of occurrence, such as proportions, are valid
  • data may not be treated as ordinal or numerical
8
Q

ordinal data

A
  • values must represent the ranked order of the data
  • calculations based on an ordering process are valid
  • data may be treated as nominal but not as numerical
9
Q

bar chart

A

a bar chart is mainly used for nominal data and graphically represents the frequency of each category as a bar rising vertically from the horizontal axis.
- bar height is proportional to frequency of the corresponding category

10
Q

pie chart

A

a circle that is subdivided into slices whose areas are proportional to the frequencies, thereby displaying the proportion of occurrences of each category.
- popular tool for representing the proportions of each category in nominal data

11
Q

steps to building a histogram (3)

A

1) collect the data
2) create a frequency distribution for the data
- determine number of classes
- determine class width
3) draw a histogram of rectangular bars using the class intervals and the corresponding frequencies

12
Q

class width

A
it is generally best to use equal class widths.
unequal class widths are used when the frequency associated with some classes is too low. in that case:
- several classes are combined to form a wider and more populated class
- it is possible to form an open-ended class at the higher or lower end of the histogram
13
Q

relative frequency

A

the proportion of observations falling into each class; should be used when comparing two or more histograms that are based on different numbers of observations.
- often preferable to the frequency itself

14
Q

class relative frequency (formula)

A

(class frequency) divided by (total number of observations)

15
Q

equal class width (formula)

A

(largest value - smallest value) divided by (number of classes)
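A minimal Python sketch of the class width and class relative frequency formulas. The data values and the choice of 4 classes are made up for illustration, and placing the largest value in the last class is an assumed convention, not something these cards specify.

```python
# Hypothetical data; any list of numbers would do.
data = [12, 15, 21, 22, 25, 28, 31, 34, 38, 40]
num_classes = 4

# Equal class width = (largest value - smallest value) / (number of classes)
low, high = min(data), max(data)
width = (high - low) / num_classes

# Count the observations in each class.
# Assumption: the largest value is placed in the last class.
freq = [0] * num_classes
for x in data:
    idx = min(int((x - low) / width), num_classes - 1)
    freq[idx] += 1

# Class relative frequency = class frequency / total number of observations
rel_freq = [f / len(data) for f in freq]

print(width, freq, rel_freq)
```

In practice the computed width is often rounded up to a convenient number before the classes are drawn.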

16
Q

cumulative frequency of a class

A

the number of measurements less than the upper limit of that class.

17
Q

to obtain the cumulative frequency of a class

A

add the frequency of that class to the frequencies of all previous classes.

18
Q

cumulative relative frequency of a particular class

A

the proportion of measurements that are less than the upper limit of that class.
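The cumulative frequency cards can be checked with a short Python sketch. The class frequencies below are hypothetical; `accumulate` simply keeps a running total.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest.
freq = [2, 3, 2, 3]
total = sum(freq)

# Cumulative frequency: this class's frequency plus all previous classes.
cum_freq = list(accumulate(freq))

# Cumulative relative frequency: cumulative frequency / total observations.
cum_rel_freq = [c / total for c in cum_freq]

print(cum_freq, cum_rel_freq)
```

The last cumulative frequency always equals the number of observations, so the last cumulative relative frequency is 1.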

19
Q

arithmetic mean

A

most popular and useful measure of central location.

  • all values are used
  • it is unique
  • the sum of the deviations from the mean is 0
  • calculated by summing the values and dividing by the number of values
20
Q

median of a set of measurements

A

the value that falls in the middle when the measurements are arranged in order of magnitude.

  • unique median for each data set
  • commonly used measure of central location
21
Q

mode of a set of observations

A

the value that occurs most frequently.

  • data set may have one, two or more modes (modal classes)
  • useful for all data, mainly used for nominal
  • for large data sets, modal class is more relevant than a single-value mode
22
Q

which measure of central location?

A
  • mean is generally the first choice, unless outliers are present in the data set, in which case the median should be used.
  • mode is seldom the best measure of central location.
  • median is not as sensitive to extreme values as the mean is.
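A small Python sketch of how one outlier affects the mean and the median, using the standard `statistics` module. The values are invented for illustration.

```python
import statistics

values = [2, 3, 3, 4, 5]          # hypothetical measurements
with_outlier = values + [100]     # same data plus one extreme value

mean_clean = statistics.mean(values)
median_clean = statistics.median(values)
mode_clean = statistics.mode(values)

mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

# The deviations from the mean always sum to zero (up to rounding).
deviation_sum = sum(x - mean_clean for x in values)

print(mean_clean, median_clean, mode_clean)
print(mean_out, median_out)
```

Here the outlier pulls the mean from 3.4 up to 19.5, while the median only moves from 3 to 3.5.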
23
Q

variance

A

a measure of dispersion that reflects the values of all the measurements; it is based on the squared deviations of the measurements from their mean.

24
Q

standard deviation

A

the square root of the variance of the measurements.
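A hedged Python sketch of variance and standard deviation. The cards do not say whether the data are a population or a sample, so both versions are shown as an assumption: the population variance divides by n, the sample variance by n - 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements, mean = 5

# Treating the data as a whole population (divide by n).
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Treating the data as a sample (divide by n - 1).
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(pop_var, pop_sd, sample_var, sample_sd)
```

The squared deviations sum to 32, giving a population variance of 32 / 8 = 4 and a standard deviation of 2.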

25
empirical rules
for a bell-shaped distribution:
- approximately 68% of all observations fall within 1 standard deviation of the mean
- approximately 95% of all observations fall within 2 standard deviations of the mean
- approximately 99.7% of all observations fall within 3 standard deviations of the mean
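The three percentages can be checked against an exactly normal distribution. This is a sketch using `statistics.NormalDist` from the Python standard library; real data that are only roughly bell-shaped will match less closely.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of observations within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```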
26
probability of an event
the probability P(A) of event A is the sum of the probabilities assigned to the simple events contained in A.
27
intersection of event A and B
the event that occurs when both A and B occur.
28
joint probability of A and B
the probability of the intersection of A and B.
29
conditional probability
conditional probability is used to determine how two events are related; that is, it gives the probability of one event given the occurrence of another related event.
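A worked sketch of the last few probability cards, using the standard formula P(A | B) = P(A and B) / P(B). The fair die and the two events are assumptions chosen for illustration.

```python
from fractions import Fraction

# Simple events: the six faces of a fair die, each with probability 1/6.
outcomes = range(1, 7)
p = {o: Fraction(1, 6) for o in outcomes}

A = {o for o in outcomes if o % 2 == 0}   # event A: an even number
B = {o for o in outcomes if o > 3}        # event B: a number greater than 3

def prob(event):
    """P(event) = sum of the probabilities of its simple events."""
    return sum(p[o] for o in event)

p_joint = prob(A & B)          # joint probability of A and B
p_cond = p_joint / prob(B)     # conditional probability of A given B

print(p_joint, p_cond)
```

The intersection of A and B is {4, 6}, so the joint probability is 1/3 and P(A | B) = (1/3) / (1/2) = 2/3.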
30
discrete random variable
one that takes on a countable number of values (e.g. integers).
31
continuous random variable
one whose values are uncountable (e.g. any real number within an interval).
32
discrete probability distribution
a table, formula or graph that lists all possible values a discrete random variable can assume, together with their associated probabilities.
33
expected value
the weighted average of the possible values a random variable can assume, where the weights are the corresponding probabilities of each xi.
34
population variance
the weighted average of the squared deviations of the values of x from their mean, where the weights are the corresponding probabilities of each xi.
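Expected value and population variance as weighted averages, sketched in Python. The distribution (number of heads in two fair coin tosses) is an assumed example.

```python
# Hypothetical discrete probability distribution: value xi -> probability.
dist = {0: 0.25, 1: 0.5, 2: 0.25}

# Expected value: weighted average of the values xi.
mu = sum(x * prob for x, prob in dist.items())

# Population variance: weighted average of the squared deviations from mu.
var = sum((x - mu) ** 2 * prob for x, prob in dist.items())

print(mu, var)
```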
35
Statistical inference
The process of drawing conclusions about the properties of a population based on information obtained from a sample.
36
Sampling distribution
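The probability distribution of a statistic over all possible samples of the same size; it is the tool that tells us how close the statistic is likely to be to the parameter.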
37
Standard error
The standard deviation of the sampling distribution of the sample mean.
38
Central limit theorem
- Random sample from a normal population: the sampling distribution of the sample mean is normally distributed
- Random sample from any population: the sampling distribution of the sample mean is approximately normal for a large sample size (n >= 30)
39
What causes the sampling distribution of the sample mean to more closely resemble a normal distribution?
A larger sample size (n)
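A small simulation sketch of the sampling distribution of the sample mean. The uniform population on [0, 1) (mean 0.5, standard deviation sqrt(1/12)), the sample size of 30 and the 2000 repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable

n = 30              # sample size
num_samples = 2000  # number of repeated samples

# Draw repeated samples from a non-normal (uniform) population
# and record each sample mean.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Standard error: population standard deviation / sqrt(n).
standard_error = (1 / 12) ** 0.5 / n ** 0.5

print(mean_of_means, sd_of_means, standard_error)
```

The mean of the sample means should land close to 0.5 and their standard deviation close to the standard error, and a histogram of `sample_means` should look roughly bell-shaped.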
40
What does capital N mean?
Population size
41
When the population size is large relative to the sample size, the finite population correction factor is ...
Close to 1 and can be ignored
42
How large does a population have to be, to be considered “large”?
At least 20 times larger than the sample size
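A sketch of why the correction factor can be ignored for large populations. It assumes the usual finite population correction factor, sqrt((N - n) / (N - 1)), which these cards do not write out; the population and sample sizes are made up.

```python
def correction_factor(N, n):
    """Finite population correction factor, assumed to be sqrt((N - n) / (N - 1))."""
    return ((N - n) / (N - 1)) ** 0.5

n = 50
small = correction_factor(200, n)      # population only 4 times the sample size
at_rule = correction_factor(1000, n)   # population 20 times the sample size
large = correction_factor(100000, n)   # population 2000 times the sample size

print(small, at_rule, large)
```

The factor moves toward 1 as N grows relative to n, so multiplying the standard error by it changes very little.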
43
Method for making statistical inferences:
- identify the parameter to be estimated
- specify the parameter's estimator and its sampling distribution
- construct an interval estimator
44
Types of estimation (2)
- point estimator
- interval estimator
45
Point estimator
Estimates the value of an unknown parameter using a single value calculated from the sample data.
46
Interval estimator
Draws inferences about a population by estimating the value of an unknown population parameter by using an interval.
47
Estimator characteristics (3)
- unbiasedness
- consistency
- relative efficiency
48
Unbiasedness
An unbiased estimator is one whose expected value is equal to the parameter it estimates.
49
Consistency
An unbiased estimator is said to be consistent if the difference between the estimator and the population parameter grows smaller as the sample size increases.
50
Relative efficiency
If there are two unbiased estimators available, the one with a smaller variance is said to be relatively efficient.
51
Examples of unbiased estimators
- sample mean
- sample median
- sample variance
- sample proportion
52
Examples of consistent estimators
- sample mean
- sample median
53
Examples of efficient estimators
Both the sample mean and the sample median are unbiased estimators of the population mean. However, the sample median has a greater variance than the sample mean, so the sample mean is relatively efficient compared with the sample median.
54
Which is the “best” estimator?
The sample mean as it is unbiased, consistent and relatively efficient.
55
The expected value (E(X̄)) of the sampling distribution of the sample mean equals the population mean...
...for all populations.
56
As the level of confidence increases...
...the width of the confidence interval also increases.
57
If the standard deviation is doubled...
...the width of the confidence interval (2B) is doubled, and vice versa.
58
when n increases...
...the width of the confidence interval decreases.
59
The width of the confidence interval (2B) is affected by:
- level of confidence
- population standard deviation
- sample size
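A quick numerical check of how each factor moves the width. This sketch assumes the interval for a population mean with known standard deviation, x-bar ± z·σ/√n, so that the width is 2B = 2·z·σ/√n; all numbers are hypothetical.

```python
from statistics import NormalDist

def ci_width(confidence, sigma, n):
    """Width 2B of the interval x-bar +/- z * sigma / sqrt(n) (assumed form)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / n ** 0.5

base = ci_width(0.95, sigma=10, n=100)
higher_confidence = ci_width(0.99, sigma=10, n=100)   # wider
doubled_sigma = ci_width(0.95, sigma=20, n=100)       # width doubles
larger_sample = ci_width(0.95, sigma=10, n=400)       # width halves

print(base, higher_confidence, doubled_sigma, larger_sample)
```

Under this form the width grows with the confidence level and the standard deviation, and shrinks as n grows: quadrupling n halves the width.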
60
Wide confidence intervals provide:
Little information
61
t-distribution
Mound-shaped and symmetrical around zero.
62
Degrees of freedom (n-1)
A function of the sample size that determines how spread out the t-distribution is compared to the normal distribution.
63
Purpose of hypothesis testing
To determine whether there is enough statistical evidence in favour of a certain belief about a population parameter.
64
Rejection region
Consists of all values of the test statistic for which Ho is rejected.
65
Acceptance region
Consists of all values of the test statistic for which Ho is not rejected.
66
Critical value
The value that separates the acceptance region from the rejection region.
67
Decision rule
Defines the range of values of the test statistic for which Ho is rejected in favour of HA.
68
A 90% confidence interval estimate of the population mean can be interpreted to mean...
If we repeatedly draw samples of the same size from the same population, 90% of the values of the sample mean will result in a confidence interval that includes the population mean.
69
P-value
The minimum level of significance that is required to reject the null hypothesis.
70
If a hypothesis is not rejected at the 0.10 level of significance it will...
...not be rejected at the 0.05 level.
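A sketch of a p-value for a one-tail z-test, to connect the p-value with the significance level. The hypotheses (Ho: mean = 100 against HA: mean > 100), the known standard deviation and the sample figures are all hypothetical.

```python
from statistics import NormalDist

mu_0 = 100       # mean under Ho (hypothetical)
sigma = 15       # population standard deviation, assumed known
n = 36           # sample size
x_bar = 105      # observed sample mean

# Test statistic: how many standard errors x_bar lies above mu_0.
z_stat = (x_bar - mu_0) / (sigma / n ** 0.5)

# One-tail p-value: probability of a test statistic at least this large under Ho.
p_value = 1 - NormalDist().cdf(z_stat)

# Ho is rejected at any significance level that is at least the p-value.
reject_at = {alpha: p_value <= alpha for alpha in (0.10, 0.05, 0.01)}

print(z_stat, round(p_value, 4), reject_at)
```

Here the p-value is about 0.0228, so Ho is rejected at the 0.10 and 0.05 levels but not at 0.01. A p-value above 0.10 is automatically above 0.05 as well, which is why a hypothesis not rejected at 0.10 is not rejected at 0.05 either.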
71
P-value method:
- Good measure of the amount of statistical evidence supporting HA
- Employed only with statistical computer software
- Yields the same conclusions as the rejection region method
72
The statement that the expected value of the difference of two sample means equals the difference of the corresponding population means is...
...always correct.
73
Description of linear relationship between two variables:
- covariance
- correlation coefficient
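A minimal Python sketch of both measures, computed by hand for made-up paired data. It assumes the sample versions, which divide by n - 1.

```python
x = [1, 2, 3, 4, 5]   # hypothetical values of the first variable
y = [2, 4, 5, 4, 5]   # hypothetical values of the second variable
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: average product of the paired deviations (divide by n - 1).
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable.
sd_x = (sum((a - mean_x) ** 2 for a in x) / (n - 1)) ** 0.5
sd_y = (sum((b - mean_y) ** 2 for b in y) / (n - 1)) ** 0.5

# Correlation coefficient: covariance scaled to lie between -1 and +1.
r = cov_xy / (sd_x * sd_y)

print(cov_xy, r)
```

The covariance (1.5 here) depends on the units of the data, while the correlation coefficient (about 0.77) does not, which makes it easier to judge the strength of the linear relationship.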
74
If the problem objective is to analyse the relationship...
Use correlation and regression analysis
75
Regression analysis
Used to predict the value of one variable on the basis of other variables.
76
Deterministic model
An equation or set of equations that allow us to fully determine the value of the dependent variable from the values of the independent variables.
77
Probabilistic model
A model used to capture the randomness that is part of a real-life process.
78
To create a probabilistic model:
Start with deterministic model that approximates the relationship we want to model and add a random term that measures the error of the deterministic model.
79
Random term (error variable)
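The difference between the actual value of the dependent variable and the value estimated by the deterministic model, e.g. the difference between a house's actual selling price and the price estimated from its size.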
80
Estimated least squares regression line
The least squares method produces a straight line that minimises the sum of the squared differences between the points and the line.
81
The smaller the sum of the squared differences...
... the better the fit.
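A sketch of the least squares line for simple linear regression, using the standard slope and intercept formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The data and the comparison line are hypothetical.

```python
x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares estimates of the slope (b1) and intercept (b0).
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

def sse(intercept, slope):
    """Sum of squared differences between the points and a given line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

best = sse(b0, b1)        # the least squares line
other = sse(2.5, 0.5)     # an arbitrary alternative line, for comparison

print(b0, b1, best, other)
```

The fitted line is y = 2.2 + 0.6x with a sum of squared differences of 2.4; the alternative line gives 2.5, a slightly worse fit.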