Lecture 3 Flashcards
(11 cards)
1
Q
Observations/Cases
A
the objects we collect data on
2
Q
Unit of observation
A
the level at which observations are recorded in our dataset, i.e., what each row represents (e.g., an individual, a household, a country)
3
Q
Variables
A
- Variables are “the pieces of information we collect on our units of observation” (Haan & Godley 2017, p. 12); in a dataset, they are the columns
- Any property that varies (i.e., takes on two or more values) can potentially be a variable
- The categories of a variable should be both exhaustive and mutually exclusive
Exhaustive: there are enough categories to classify every observation; every observation or case has a place to go
Mutually exclusive: each observation fits into one and only one category; no observation can fit into multiple categories
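As a sketch of how the two properties can be checked, the snippet below tests a set of income brackets by counting how many categories each observed value falls into. The bracket names, cut-offs, and incomes are hypothetical, chosen only for illustration. Zero matches means the scheme is not exhaustive; more than one match means the categories are not mutually exclusive.

```python
# Hypothetical income brackets in dollars (illustrative assumptions, not from the lecture).
# Each bracket is a half-open interval [low, high).
BRACKETS = {
    "under 20k": (0, 20_000),
    "20k-50k": (20_000, 50_000),
    "50k+": (50_000, float("inf")),
}


def matching_categories(value, brackets):
    """Return the names of every bracket that contains `value`."""
    return [name for name, (low, high) in brackets.items() if low <= value < high]


def check_scheme(values, brackets):
    """Flag values with no category (not exhaustive) or several (not mutually exclusive)."""
    unclassified = [v for v in values if len(matching_categories(v, brackets)) == 0]
    overlapping = [v for v in values if len(matching_categories(v, brackets)) > 1]
    return {"unclassified": unclassified, "overlapping": overlapping}


incomes = [0, 15_000, 20_000, 49_999, 50_000, 120_000]
print(check_scheme(incomes, BRACKETS))  # {'unclassified': [], 'overlapping': []}

# A flawed scheme: "low" and "middle" overlap between 20k and 25k,
# and there is no category at all for incomes of 50k or more.
FLAWED = {"low": (0, 25_000), "middle": (20_000, 50_000)}
print(check_scheme(incomes, FLAWED))  # {'unclassified': [50000, 120000], 'overlapping': [20000]}
```

Using half-open intervals is one simple way to make numeric brackets mutually exclusive: a boundary value such as 20,000 belongs only to the bracket that starts there.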
4
Q
2 Purposes/Types of Variables
A
- Identification variables: variables that uniquely identify each observational unit
- Characteristic or measurement variables: variables that describe properties of our observations
Best practice: identification variables should be the first column(s) in our dataset
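A minimal sketch of both ideas, assuming a small dataset stored as a list of Python dicts (the column names `respondent_id`, `age`, and `province` and their values are hypothetical): first confirm that the identification variable really is unique for every row, then move it to the front of each row.

```python
# Hypothetical dataset: one dict per observation (row); keys are variables (columns).
rows = [
    {"age": 34, "province": "ON", "respondent_id": 101},
    {"age": 52, "province": "BC", "respondent_id": 102},
    {"age": 27, "province": "QC", "respondent_id": 103},
]


def is_unique_id(rows, id_col):
    """True if `id_col` takes a different value in every row."""
    ids = [row[id_col] for row in rows]
    return len(ids) == len(set(ids))


def id_first(rows, id_cols):
    """Return new rows with the identification column(s) placed first."""
    reordered = []
    for row in rows:
        front = {col: row[col] for col in id_cols}
        rest = {col: val for col, val in row.items() if col not in id_cols}
        reordered.append({**front, **rest})  # dicts keep insertion order (Python 3.7+)
    return reordered


assert is_unique_id(rows, "respondent_id")
ordered_rows = id_first(rows, ["respondent_id"])
print(list(ordered_rows[0]))  # ['respondent_id', 'age', 'province']
```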
5
Q
4 Levels of measurement
A
- Nominal
- Ordinal
- Interval
- Ratio
6
Q
Nominal variables
A
- Categorical
- There is no quantifiable difference between categories
- It is not possible to rank the categories
- Numeric values used to represent categories are not meaningful (they imply nothing about the magnitude of, or differences between, categories)
- Numbers or symbols are assigned to the values of the variable for the purpose of classifying, naming, or labelling
- Often referred to as “qualitative”
- Examples: sex, religion, political party, ethnicity
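To illustrate why the numeric codes of a nominal variable carry no meaning, the sketch below codes the same made-up party-support responses in two equally arbitrary ways. The party names and codes are hypothetical. The mean of the codes changes with the coding, while the most frequent category (the mode) does not.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses (made-up data).
responses = ["Party A", "Party C", "Party A", "Party B", "Party A"]

# Two equally valid, arbitrary numeric codings of the same categories.
coding_1 = {"Party A": 1, "Party B": 2, "Party C": 3}
coding_2 = {"Party A": 3, "Party B": 1, "Party C": 2}

codes_1 = [coding_1[r] for r in responses]  # [1, 3, 1, 2, 1]
codes_2 = [coding_2[r] for r in responses]  # [3, 2, 3, 1, 3]

# The mean depends on which arbitrary numbers we picked, so it is meaningless here.
print(mean(codes_1), mean(codes_2))  # 1.6 2.4

# The most frequent category does not depend on the coding.
most_common_category = Counter(responses).most_common(1)[0][0]
print(most_common_category)  # Party A
```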
7
Q
Ordinal variables
A
- You can rank the categories from low to high, but you cannot calculate the difference between them
- Many attitudes we measure are ordinal-level variables (e.g., level of agreement or satisfaction)
- Examples: social class, education; age and income are also often measured on ordinal scales (e.g., age groups, income brackets)
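A small sketch of what an ordinal variable does and does not support, assuming a hypothetical 5-point agreement scale and made-up answers. The order of the labels is stored explicitly, so sorting and finding the median category work; the comment at the end notes the operation that is not justified.

```python
from statistics import median_low

# Hypothetical 5-point agreement scale; list position encodes the order.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
RANK = {label: position for position, label in enumerate(SCALE)}

# Made-up responses for illustration.
answers = ["agree", "neutral", "strongly agree", "disagree", "agree"]

# Ranking is meaningful: sort responses from low to high agreement.
ordered = sorted(answers, key=RANK.get)
print(ordered)  # ['disagree', 'neutral', 'agree', 'agree', 'strongly agree']

# The median is meaningful too; median_low always returns one of the observed
# ranks, so it maps back to a real category even for an even number of answers.
median_category = SCALE[median_low(RANK[a] for a in answers)]
print(median_category)  # agree

# Not meaningful: treating RANK["agree"] - RANK["neutral"] as a distance.
# Python would return 1, but the scale does not guarantee equal spacing.
```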
8
Q
Interval/ratio
A
- You can compare values not only in terms of which is larger or smaller, but also in terms of how much larger or smaller one is than another
- A distinction is sometimes drawn between variables with a natural zero point, where zero means the absence of the property (ratio), and those where zero is arbitrary, meaning there is no true zero (interval)
- Examples: temperature in degrees Celsius (interval); income or age in years (ratio)
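A quick numeric sketch of the interval/ratio distinction. Temperature in Celsius or Fahrenheit has an arbitrary zero, so the "ratio" between two temperatures changes with the unit, while differences stay consistent. Income has a true zero, so ratios survive a change of units. The temperatures and incomes are hypothetical values chosen for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


# Interval level: Celsius and Fahrenheit both have an arbitrary zero.
warm_c, cool_c = 20, 10
ratio_c = warm_c / cool_c  # 2.0
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)  # 68 / 50 = 1.36
# The two ratios disagree, so "20 degrees is twice as warm as 10 degrees" is not meaningful.

# Differences are meaningful: a 10-degree Celsius gap is always an 18-degree Fahrenheit gap.
diff_f = celsius_to_fahrenheit(warm_c) - celsius_to_fahrenheit(cool_c)  # 18.0

# Ratio level: income has a true zero (no income), so ratios survive a change of units.
high_income, low_income = 40_000, 20_000  # hypothetical incomes in dollars
ratio_dollars = high_income / low_income  # 2.0
ratio_cents = (high_income * 100) / (low_income * 100)  # 2.0

print(ratio_c, ratio_f, diff_f, ratio_dollars, ratio_cents)
```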
9
Q
Dataset Codebook
A
- A codebook describes the contents, structure, and layout of a data collection
- It should include a description of the study, the variable names, and variable descriptions
- It may also include question wording (for survey data), information about weights, and summary statistics
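One way to make a codebook useful beyond documentation is to store it in a machine-readable form and check the data against it. The sketch below is one possible layout, not a standard format: the `CodebookEntry` fields, the variables, the question wording, and the code 9 in the data are all hypothetical. It flags values that appear in the data but are not defined in the codebook.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    """One variable in a codebook (a hypothetical, minimal layout)."""
    name: str
    description: str
    value_labels: dict = field(default_factory=dict)  # code -> label, for coded variables
    question: str = ""  # question wording, for survey data


CODEBOOK = [
    CodebookEntry("respondent_id", "Unique identifier for each respondent"),
    CodebookEntry(
        "party",
        "Party the respondent feels closest to",
        value_labels={1: "Party A", 2: "Party B", 3: "Party C"},
        question="Which party do you feel closest to?",
    ),
]


def undocumented_codes(rows, codebook):
    """List (variable, value) pairs that appear in the data but not in the codebook."""
    problems = []
    for entry in codebook:
        if not entry.value_labels:
            continue  # variables without value labels (e.g. IDs) are not checked
        for row in rows:
            if row[entry.name] not in entry.value_labels:
                problems.append((entry.name, row[entry.name]))
    return problems


data = [{"respondent_id": 101, "party": 2}, {"respondent_id": 102, "party": 9}]
print(undocumented_codes(data, CODEBOOK))  # [('party', 9)]
```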
10
Q
Tidy Data
A
Tidy data refers to data that is stored in the following format:
* Each observation is a row
* Each variable is a column
* Each type of observational unit is a table
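As a sketch of reshaping data into this format, the snippet below takes a hypothetical "wide" table (made-up countries and counts) where one row holds several years, and turns it into one row per country-year observation. pandas users would typically do this with `DataFrame.melt` (`id_vars`, `var_name`, `value_name`); the plain-Python version here just makes the mechanics visible.

```python
# Hypothetical "wide" table (made-up numbers): the headers "2019" and "2020"
# are values of a year variable rather than variable names.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_col, var_name, value_name):
    """Turn every non-ID column into its own row: one observation per row."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue
            long_rows.append({id_col: row[id_col], var_name: col, value_name: value})
    return long_rows


tidy = wide_to_long(wide, "country", "year", "cases")
for row in tidy:
    print(row)
# {'country': 'A', 'year': '2019', 'cases': 10}
# {'country': 'A', 'year': '2020', 'cases': 12}
# {'country': 'B', 'year': '2019', 'cases': 7}
# {'country': 'B', 'year': '2020', 'cases': 9}
```

After reshaping, each row is one country-year and each of `country`, `year`, and `cases` is its own column.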
11
Q
Five common problems that make for untidy data
A
- Column headers are values, not variable names.
- Multiple variables are stored in one column.
- Variables are stored in both rows and columns.
- Multiple types of observational units are stored in the same table.
- Data values about a single type of observational unit are spread out over multiple datasets.
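As an example of fixing the second problem (multiple variables stored in one column), the sketch below splits a hypothetical `sex_age` column holding codes such as `"f_18-34"` into separate `sex` and `age_group` columns. The column name, the codes, and the `"_"` separator are assumptions for illustration; in pandas, `Series.str.split(..., expand=True)` does a similar job.

```python
# Hypothetical untidy rows: one column packs two variables (sex and age group).
untidy = [
    {"id": 1, "sex_age": "f_18-34"},
    {"id": 2, "sex_age": "m_35-54"},
    {"id": 3, "sex_age": "f_55+"},
]


def split_column(rows, col, new_cols, sep="_"):
    """Replace `col` with one column per variable it contains."""
    out = []
    for row in rows:
        parts = row[col].split(sep, maxsplit=len(new_cols) - 1)
        if len(parts) != len(new_cols):
            raise ValueError(f"cannot split {row[col]!r} into {new_cols}")
        new_row = {k: v for k, v in row.items() if k != col}
        new_row.update(zip(new_cols, parts))
        out.append(new_row)
    return out


split_rows = split_column(untidy, "sex_age", ["sex", "age_group"])
print(split_rows[0])  # {'id': 1, 'sex': 'f', 'age_group': '18-34'}
```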