Final of Everything Flashcards
Data Matrix
A convenient way to store data (e.g., a spreadsheet or table). Each row is a unique case (observational unit). Each column corresponds to a variable.
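As a small illustration, a data matrix can be sketched in Python as a list of rows, one dictionary per case. The variable names and values here are made up for the example and are not from the course data.

```python
# Hypothetical data matrix: each dict is one row (case),
# and each key is one column (variable).
data_matrix = [
    {"student": 1, "siblings": 2, "height_cm": 165.0},
    {"student": 2, "siblings": 0, "height_cm": 172.5},
    {"student": 3, "siblings": 1, "height_cm": 158.2},
]

n_cases = len(data_matrix)        # number of rows (cases)
variables = list(data_matrix[0])  # column names (variables)

# Reading down one column gives one variable across all cases.
siblings_column = [row["siblings"] for row in data_matrix]
```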
The two types of variables
Numerical or Categorical
Numerical Variables
Can be discrete or continuous
Categorical Variables
Can be ordered or nominal
What type of variable is “Number of Siblings”?
Numerical (discrete)
What type of variable is “Student Height”?
Numerical (continuous)
What type of variable is “Previous Stats Courses Taken”?
Categorical (nominal)
Explanatory variables might affect
Response variable
Two types of data collection
Observational Studies and Experiments
Researchers collect data passively; they merely observe
Observational studies
Researchers actively control the data collection, trying to establish causation
Experiments
Sampling principles and strategies
1st step: Identify topics and questions to be investigated
2nd: A clearly laid out research question helps identify which subjects or cases should be studied and which variables are important
3rd: Consider how data are collected
Example: Suppose we want to estimate household size, where a household is defined as people living together in the same dwelling and sharing living accommodation. If we selected students at random at an elementary school and asked them what their family size is, would this be a good measure of household size?
- Average will be biased
- Only measuring households with children, not single people or people without children.
- Would likely estimate a higher number than the true number.
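The bias in this example can be made concrete with a tiny calculation. The households below are hypothetical numbers chosen only for illustration. The sketch assumes every elementary student is equally likely to be picked, so a household is counted once for each child it has at the school and never if it has none.

```python
# Hypothetical population: (household size, number of elementary-school children).
households = [(1, 0), (1, 0), (2, 0), (2, 0), (3, 1), (4, 2), (5, 3)]

# True mean household size: every household counts once.
true_mean = sum(size for size, _ in households) / len(households)

# Expected answer when sampling students: each household is weighted
# by its number of children, so childless households never appear.
total_children = sum(kids for _, kids in households)
student_based_mean = sum(size * kids for size, kids in households) / total_children

print(f"true mean: {true_mean:.2f}")
print(f"student-based mean: {student_based_mean:.2f}")
```

In this made-up population the student-based figure comes out larger than the true mean, matching the points above.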
Relationship between Sample and Population
Sample is a subset of population:
Population- people
Sample- a group of selected people
Three sampling methods
1) simple random sample
2) stratified sample
3) cluster sample
Simple random sample
Cases are selected at random from the population, each with an equal chance of being chosen
What type of sample is cars passing through intersections in Kelowna?
Simple random sample
Stratified sample
Cases are grouped into strata of similar cases, then a simple random sample is taken from each stratum
Cluster sample
Divide the population into clusters, randomly select some clusters, then sample all cases within the selected clusters
Multistage sampling
Clusters are sampled randomly, then cases are randomly sampled within each selected cluster
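The sampling methods can be compared side by side in a short Python sketch. The function names, the `groups` dictionary, and the case numbers are all hypothetical; the same groups are reused as strata and as clusters only to keep the example short. It uses the standard library's `random.Random.sample`, which draws without replacement.

```python
import random

def simple_random_sample(population, n, rng):
    # Every case has the same chance of being chosen.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # Take a simple random sample inside every stratum.
    return {name: rng.sample(cases, n_per_stratum) for name, cases in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly pick clusters, then keep all cases in the picked clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    # Randomly pick clusters, then randomly sample cases within each one.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical population of 12 cases, split into three groups.
groups = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]}
population = [case for cases in groups.values() for case in cases]

rng = random.Random(0)  # fixed seed so repeated runs match
print(simple_random_sample(population, 4, rng))
print(stratified_sample(groups, 2, rng))
print(cluster_sample(groups, 2, rng))
print(multistage_sample(groups, 2, 2, rng))
```

The stratified sample always contains cases from every group, while the cluster and multistage samples contain cases only from the groups that were picked.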
Scatterplot
Provides a case-by-case view of the data. Can visualize the relationship between two numerical variables.
Dot plot
Visualize one numerical variable
Sample mean (sample average formula)
x̄ = (x1 + x2 + x3 + … + xn)/n
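The formula translates directly into a few lines of Python. This is a minimal sketch; the function name `sample_mean` and the heights are made up for illustration.

```python
def sample_mean(values):
    # Add up all observations and divide by how many there are (n).
    if not values:
        raise ValueError("the sample mean of an empty sample is undefined")
    return sum(values) / len(values)

heights_cm = [160.0, 170.0, 180.0]        # hypothetical sample, in centimetres
mean_height_cm = sample_mean(heights_cm)  # (160 + 170 + 180) / 3 = 170.0
```

Heights measured in centimetres give a mean in centimetres.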
What is the unit of the sample mean?
The same unit as the observations in the sample