Midterm Flashcards
(65 cards)
Categorical Variables
These variables represent categories or groups and cannot be used in mathematical operations like addition or subtraction to derive meaningful results. They are qualitative in nature.
Ordinal Variables
These have a natural order or ranking, but the differences between categories are not quantifiable or uniform. For example:
Shirt Sizes: Small, Medium, Large. While there is an order, the difference between Small and Medium may not be the same as between Medium and Large.
Tax Brackets: Low, Medium, High. The brackets are ordered, but the difference between Low and Medium is not necessarily the same as between Medium and High.
Importance: Ordinal variables are useful in surveys and rankings where relative positioning matters, but exact differences do not.
Nominal Variables
These represent categories without any inherent order. They are essentially labels or names.
Yes or No: Binary responses like “Yes” or “No” are nominal.
Colors: Red, Blue, Green, etc. These are different but cannot be ordered.
Types of Animals: Dog, Cat, Bird. These are distinct categories without any natural order.
Importance: Nominal variables are crucial in classification tasks where the goal is to group data into distinct categories without implying any hierarchy.
Numerical Variables
These variables represent quantities and can be used in mathematical operations. They are quantitative in nature.
Discrete Variables
These take on specific, separate values, typically integers. They do not vary continuously.
Number of Times Done: For example, the number of times a person has visited a doctor. This is a count and can only be a whole number.
Importance: Discrete variables are essential in counting and frequency analysis, where the focus is on the number of occurrences.
Continuous Variables
These can take on any value within a continuous interval, including fractions and decimals.
Height: A person’s height can be measured to any degree of precision.
Concentration: The concentration of a chemical in a solution can be measured with high precision.
Importance: Continuous variables are vital in measurements and modeling where precision is required, such as in scientific experiments and engineering.
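A minimal sketch of how the four variable types above constrain which operations make sense. The records and field names are hypothetical, invented only for illustration.

```python
from collections import Counter

# Hypothetical records, made up for illustration.
records = [
    {"size": "Medium", "color": "Red", "doctor_visits": 2, "height_cm": 171.4},
    {"size": "Small", "color": "Blue", "doctor_visits": 0, "height_cm": 158.9},
    {"size": "Large", "color": "Red", "doctor_visits": 5, "height_cm": 183.2},
]

# Ordinal: the order is meaningful, so sorting by rank is valid,
# but subtracting ranks would not give a meaningful "distance".
SIZE_RANK = {"Small": 0, "Medium": 1, "Large": 2}
sizes_in_order = sorted((r["size"] for r in records), key=SIZE_RANK.get)

# Nominal: no order at all, so the sensible summary is a count per label.
color_counts = Counter(r["color"] for r in records)

# Discrete: whole-number counts, so sums are meaningful.
total_visits = sum(r["doctor_visits"] for r in records)

# Continuous: measurements, so averages are meaningful.
mean_height = sum(r["height_cm"] for r in records) / len(records)

print(sizes_in_order, dict(color_counts), total_visits, round(mean_height, 1))
```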
Independent Variable (Predictor)
This is the variable that you believe may influence or cause changes in the response variable. It is the “input” in an experiment or study.
Importance: Identifying the independent variable is crucial for designing experiments and understanding causal relationships.
“INPUT”
Dependent Variable (Response)
This is the variable that you are trying to explain or predict. It is the “output” in an experiment or study.
Importance: The dependent variable is the focus of analysis in many studies, as it represents the outcome of interest.
“OUTPUT”
Causal Relationship
This is a relationship where changing the independent variable directly affects the dependent variable.
Observational Studies
These generally cannot establish causal relationships on their own because the researcher only observes the variables and does not manipulate them, so confounding variables cannot be ruled out.
Experimental Studies
These can establish causal relationships by manipulating the independent variable and observing the effect on the dependent variable.
Correlation vs. Causation
It is important to remember that correlation does not imply causation. Just because two variables are correlated does not mean that one causes the other.
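A small simulation can make this concrete. In the sketch below, two variables are strongly correlated only because a third variable drives both. The variable names (temperature, ice cream sales, sunburns) and all coefficients are assumptions chosen for illustration, not real data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical common cause: daily temperature.
temperature = [rng.uniform(10, 35) for _ in range(1000)]

# Both outcomes depend on temperature plus independent noise.
# Neither outcome appears in the formula for the other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburns = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r = pearson(ice_cream_sales, sunburns)
print(f"correlation between ice cream sales and sunburns: {r:.2f}")
```

The correlation comes out strongly positive even though, by construction, neither variable causes the other.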
Sampling
The process of selecting a subset of individuals from a larger population to estimate characteristics of the whole population.
Population
The entire set of elements you are interested in studying.
Sample
A subset of the population that you collect data from. The goal is for the sample to be representative of the population.
Parameters
Numerical values that describe some characteristic of a population.
Statistics
Numerical values that describe some characteristic of a sample.
The goal is to use statistics to estimate parameters.
Importance: Sampling is essential because it is often impractical or impossible to study an entire population. A well-chosen sample can provide accurate estimates of population parameters.
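As a rough sketch of the parameter/statistic distinction: the population below is simulated (heights drawn from an assumed normal distribution), its mean plays the role of the parameter, and the mean of a random sample is the statistic that estimates it.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated population (an assumption for illustration): 100,000 heights in cm.
population = [rng.gauss(170, 10) for _ in range(100_000)]

# Parameter: describes the whole population. Usually unknown in practice.
parameter = mean(population)

# Statistic: describes a sample and is used to estimate the parameter.
sample = rng.sample(population, 500)
statistic = mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```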
Random Sampling
Each member of the population has an equal chance of being selected. This helps avoid selection bias and makes the sample more likely to be representative.
Importance: Random sampling is crucial for generalizing results from the sample to the population.
Avoiding Selection Bias
Ensuring that no part of the population is systematically overrepresented or underrepresented in the sample. This includes avoiding convenience bias, where samples are chosen based on ease of access.
Importance: Selection bias can lead to incorrect conclusions, so it is vital to ensure that the sample is representative.
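The sketch below contrasts a convenience sample with a random sample of the same size. The setup is an assumption made for illustration: a simulated income list in which the people who are easiest to reach (the front of the list) happen to have the highest incomes.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated population, sorted so that the "easy to reach" people come first
# and differ systematically from everyone else.
population = sorted((rng.gauss(50_000, 15_000) for _ in range(10_000)), reverse=True)
true_mean = mean(population)

# Convenience sample: take whoever is easiest to access.
convenience_sample = population[:200]

# Random sample: every member has the same chance of selection.
random_sample = rng.sample(population, 200)

convenience_error = abs(mean(convenience_sample) - true_mean)
random_error = abs(mean(random_sample) - true_mean)

print(f"convenience sample error: {convenience_error:,.0f}")
print(f"random sample error:      {random_error:,.0f}")
```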
Sample Size
The sample should be large enough to minimize sampling error but not so large that the marginal gain in accuracy is negligible.
Importance: An appropriately sized sample ensures that the results are reliable and valid.
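The diminishing return from larger samples can be seen in the standard error of a sample mean, which for a simple random sample of independent observations is the population standard deviation divided by the square root of the sample size. The value of sigma below is an assumed number used only for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for n independent observations."""
    return sigma / math.sqrt(n)

sigma = 10.0  # assumed population standard deviation

# Each step quadruples the sample size but only halves the standard error.
errors = {n: standard_error(sigma, n) for n in (25, 100, 400, 1600)}
print(errors)  # {25: 2.0, 100: 1.0, 400: 0.5, 1600: 0.25}
```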
Coverage of Key Demographics or Features
Ensuring that all relevant subgroups are included in the sample in proportion to their share of the population.
Importance: This ensures that the sample accurately reflects the diversity of the population.
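One common way to get proportional coverage is stratified sampling: split the population into subgroups and draw the same fraction from each. The sketch below is one possible implementation; the helper name `stratified_sample` and the urban/rural population are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, key, fraction, rng):
    """Draw the same fraction from every subgroup (stratum) defined by key."""
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # proportional share of this stratum
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: 70% urban, 30% rural.
population = [{"id": i, "region": "urban" if i < 700 else "rural"} for i in range(1000)]

sample = stratified_sample(population, key=lambda p: p["region"], fraction=0.1, rng=rng)
urban_count = sum(p["region"] == "urban" for p in sample)
print(len(sample), urban_count)  # the sample keeps the 70/30 split
```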
Randomized Allocation
Randomly assigning subjects to different groups to ensure that each group is comparable. This is crucial for establishing causal relationships.
Importance: Randomized allocation balances confounding variables across groups on average, so that observed differences between groups can more credibly be attributed to the independent variable.
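A minimal sketch of randomized allocation into two equal groups. The subject labels and the helper name `randomize` are hypothetical.

```python
import random

def randomize(subjects, rng):
    """Shuffle a copy of the subjects and split it into two equal-sized groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(3)  # fixed seed so the allocation can be reproduced
subjects = [f"S{i:02d}" for i in range(20)]

treatment, control = randomize(subjects, rng)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who lands in which group, neither the researcher nor the subjects can steer particular kinds of people into the treatment group.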
Confronting Variance/Confounding Variable
A confounding variable influences both the predictor and the response variables.
Variance can arise from differences in subjects’ backgrounds, environments, etc. Methods to control variance include:
Matching: Pairing subjects based on similar characteristics to control for differences.
Replication: Applying each treatment to many subjects (a larger sample size) so that chance differences average out and the results are more precise.
Blocking: Grouping subjects into blocks based on certain characteristics and then randomizing within each block.
Importance: Controlling variance is essential for ensuring that the results of an experiment are valid and reliable.
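Blocking, as described above, can be sketched as grouping subjects by a characteristic and then randomizing within each group. The subjects, the `age_group` field, and the helper name `blocked_assignment` are hypothetical.

```python
import random
from collections import defaultdict

def blocked_assignment(subjects, block_of, rng):
    """Randomize to treatment/control separately inside each block."""
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomization happens within the block
        half = len(members) // 2
        for subject in members[:half]:
            assignment[subject["id"]] = "treatment"
        for subject in members[half:]:
            assignment[subject["id"]] = "control"
    return assignment

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical subjects: four younger and four older participants.
subjects = [{"id": i, "age_group": "young" if i < 4 else "old"} for i in range(8)]

assignment = blocked_assignment(subjects, block_of=lambda s: s["age_group"], rng=rng)
print(assignment)
```

Each age group contributes equally to treatment and control, so age cannot differ between the two groups by chance.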
Hypothesis Testing
A statistical method used to determine whether there is enough evidence to reject a null hypothesis.
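One simple way to see the idea is a permutation test, sketched below. It is only one of many hypothesis tests and is not necessarily the one covered in the course. The null hypothesis is that group labels do not matter. The measurements and the helper name `permutation_p_value` are invented for illustration.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_perm, rng):
    """Estimate how often shuffled labels give a mean difference at least as large as observed."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # under the null hypothesis, labels are interchangeable
        diff = abs(mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

rng = random.Random(11)  # fixed seed so the run is repeatable

# Invented measurements for a treatment group and a control group.
treatment = [12.1, 13.4, 11.8, 14.0, 13.1, 12.7]
control = [10.2, 9.8, 11.0, 10.5, 9.9, 10.7]

p_value = permutation_p_value(treatment, control, n_perm=2000, rng=rng)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means the observed difference would rarely arise from random labeling alone, which counts as evidence against the null hypothesis.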