Chpt 2 - Organizing and Graphing Data Flashcards
Why do we need to learn how to organize data?
Without organizing the raw data, we cannot obtain useful information
What is data that is non-numerical, such as gender, called?
Categorical
Also called qualitative
What is data that is numerical, such as the ages of the people involved in the study?
Numerical
Also called quantitative
What is a variable?
A variable is a characteristic that varies from one person or thing to another.
Examples include gender, age, marital status, number of email accounts, monthly salary, height, weight, etc.
Basically, every column heading in a data set is a variable
What is the formal definition of data?
The values of a variable in a sample.
How can we categorize variables? How can these be further broken down?
Categorical (qualitative) or Numerical (quantitative)
Numerical variables can be further broken down to:
Discrete or continuous
Categorical variables can be further broken down to:
Nominal or ordinal
What are the types of numerical variables?
Discrete or continuous
What is a discrete variable? Give some examples
A numerical variable whose possible values can be listed (counted), such as 0, 1, 2, 3, ...
An example is how many email addresses you have: the possible answers are 0, 1, 2, 3, and so on, with no values in between (no one has 2.5 email addresses).
What is a continuous variable? Give some examples
A numerical variable whose possible values form some interval of numbers
Height is a continuous variable because its possible values form an interval, for example any number from 40 cm to 300 cm. Even though an adult's height may not change, we don't measure it only in whole feet…we use feet and inches (or even finer units), so values in between are possible, which makes height continuous
Other examples are weight, temperature, and length.
Is age a discrete or continuous variable?
Age is usually treated as discrete because, while you are constantly aging, we normally record age as whole numbers.
Any time we are counting (1, 2, 3, ...), the variable is considered discrete
That being said, with babies we do measure partial intervals, such as 1.5 years, and with kids we might say they are only 5 days away from their next birthday. In this context, where age is measured in smaller increments, it is considered a continuous variable
What is a non-numerical variable that has no specified order called?
Give an example
Nominal
Marital status…there is no set order in life: you may get married, be common-law, get separated, be single, etc., and there is no defined order that everyone follows
What is a non-numerical variable that has an order called?
Give an example
Ordinal
In many studies, responses are rated as very like, like, neutral, dislike, or very dislike, which has a natural order
If in a study, they coded females as 1 and males as 2, does this make it a numerical variable?
How can you test if the value is numerical or non-numerical?
No, it’s still a categorical variable because the numbers are just labels and have no mathematical meaning
To check, try some basic math operations: if the result makes sense, it’s a numerical variable; if not, it’s a non-numerical (categorical) variable. For example, if females are 1 and males are 2, you cannot add two females together to make 1 male. haha
What is the purpose of organizing data?
To analyze the distribution of a variable
What methods can we use to organize categorical data?
Numerical methods (frequency and relative frequency tables)
Graphical methods (pie charts and bar charts that help visualize the frequency/relative frequency tables)
What is the distribution of a variable?
A table or formula (he also said graph) that tells us:
- What values this variable takes
- How often it takes these values in a population
If we are interested in the distribution of gender, what would we need to know?
- What values this variable takes (so male/female)
- How often it takes these values in a population (so number of males and number of females)
We might also include transmen, transwomen, and nonbinary as other values; this would mean that we would also need to know the number of transmen, transwomen, and nonbinary people in the population
What is frequency?
What is the frequency of genders if the sample of 10 students has 6 girls and 4 boys in it?
A numerical method of organizing categorical data
It’s counting :) It’s the number of times a particular distinct value of a variable occurs in a sample
Frequency of males is 4
Frequency of females is 6
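Counting frequencies is exactly what Python's `collections.Counter` does, so it makes a handy check for this card. This is a minimal sketch; the raw list of responses is an assumption that simply spells out the 10-student sample (6 girls, 4 boys) as one entry per student.

```python
from collections import Counter

# Assumed raw data: one entry per student in the sample of 10
# (6 girls and 4 boys, as in the example above)
genders = ["female"] * 6 + ["male"] * 4

# Frequency f = how many times each distinct value occurs in the sample
frequency = Counter(genders)

for value, f in frequency.items():
    print(f"Frequency of {value}s is {f}")
```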
What is relative frequency?
What is the relative frequency of genders if the sample of 10 students has 6 girls and 4 boys in it?
A numerical method of organizing categorical data
It’s the ratio of the frequency to the sample size. The relative frequency of a particular distinct value of a variable shows the proportion (or percentage) of the sample that has this value
Relative frequency = frequency/sample size
Females: 6/10 or 60%
Males: 4/10 or 40%
Once we have determined the frequencies or relative frequencies, we can make a frequency/relative frequency table to show the distribution of the data. What are the steps to this?
Give an example using the possible genders of a group of 10 students with 6 women and 4 men
- List out all the possible distinct values (male, female)
- Find the frequencies (f) and relative frequencies (f/n) of the distinct values
Gender | f | f/n
Female | 6 | 6/10 = 0.6
Male | 4 | 4/10 = 0.4
Total | 10 | 10/10 = 1
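The two steps for building a frequency/relative frequency table can be sketched in Python. This is only an illustration under assumptions: the function name `frequency_table` is hypothetical, and the raw list is the 10-student example (6 women, 4 men) written out one entry per student.

```python
from collections import Counter

def frequency_table(values):
    """Return (distinct value, f, f/n) rows for a list of categorical values."""
    n = len(values)                 # sample size
    counts = Counter(values)        # step 1: distinct values; step 2: their frequencies
    return [(value, f, f / n) for value, f in counts.items()]

# Assumed raw data: the group of 10 students (6 women, 4 men)
students = ["female"] * 6 + ["male"] * 4
rows = frequency_table(students)

print(f"{'Gender':<8}{'f':>4}{'f/n':>8}")
for value, f, rel in rows:
    print(f"{value:<8}{f:>4}{rel:>8.2f}")
# The frequencies add up to n, and the relative frequencies add up to 1
print(f"{'Total':<8}{sum(f for _, f, _ in rows):>4}{sum(r for _, _, r in rows):>8.2f}")
```

The total row is a quick sanity check: if the relative frequencies don't add up to 1 (100%), a category was missed or miscounted.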
What are the graphical methods used to organize categorical data?
Pie charts
Bar charts
What are the numerical methods to organize categorical data?
Frequency/relative frequency tables
What is a pie chart?
A disk divided into wedge-shaped pieces proportional to the relative frequencies of the categorical data
One slice is allotted to each category, and the angle of the slice = relative frequency x 360 degrees
A study of 10 students showed that 6 liked rom-com movies. How would this be expressed on a pie chart?
f = 6
f/n = 6/10 or 60%
Equation for pie chart:
angle = f/n x 360
angle = 0.60 x 360
angle = 216 degrees
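The angle formula can be applied to every category at once. In this minimal sketch, the function name `pie_angles` is hypothetical, and the "did not like" category (the other 4 of the 10 students) is an assumption added so the slices fill the whole disk.

```python
def pie_angles(frequencies):
    """Slice angle for each category: relative frequency (f/n) x 360 degrees."""
    n = sum(frequencies.values())   # sample size
    return {category: f / n * 360 for category, f in frequencies.items()}

# The rom-com example: 6 of 10 students liked rom-com movies.
# Assumption: the other 4 students form a "did not like" slice,
# so that the slices fill the whole disk.
angles = pie_angles({"liked rom-coms": 6, "did not like rom-coms": 4})

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

The rom-com slice comes out to the same 216 degrees as the hand calculation, and the remaining 144 degrees belong to the other slice, so the angles add up to 360.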
What is the notation for frequency?
f
What is the notation for relative frequency?
f/n
What is a bar chart?
A display of the distinct values of categorical data on a horizontal axis and the relative frequencies or frequencies (depending on what we want) of those values on a vertical axis
One bar is for one category
The height of the bar = relative frequency, or frequency, of a value (so we might have a frequency of 6, or a relative frequency of 0.6, depending on what we want to present)
Bars should be positioned so that they DO NOT TOUCH each other
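To see the three bar chart rules together (one bar per category, bar height = frequency, bars not touching), here is a rough plain-text sketch. The function `text_bar_chart` is hypothetical and only handles whole-number frequencies; to plot relative frequencies instead, you would need to rescale them to whole-number heights first.

```python
def text_bar_chart(frequencies, bar="###", gap="  "):
    """Plain-text vertical bar chart: one bar per category, bar height equal
    to the frequency, and a gap between bars so that they do not touch."""
    categories = list(frequencies)
    col = max([len(bar)] + [len(c) for c in categories])  # width of each column
    rows = []
    # Vertical axis: draw from the tallest bar's height down to 1
    for level in range(max(frequencies.values()), 0, -1):
        cells = [(bar if frequencies[c] >= level else "").center(col) for c in categories]
        rows.append(f"{level:>2} | " + gap.join(cells))
    # Horizontal axis, with the category labels underneath
    rows.append("   +-" + "-" * (col * len(categories) + len(gap) * (len(categories) - 1)))
    rows.append("     " + gap.join(c.center(col) for c in categories))
    return "\n".join(row.rstrip() for row in rows)

# Frequencies from the 10-student example (6 females, 4 males)
print(text_bar_chart({"female": 6, "male": 4}))
```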
What is the difference between frequency/relative frequency tables and contingency tables?
Frequency and relative frequency tables summarize information about a SINGLE (univariate) categorical variable
Contingency tables summarize information of TWO SETS (bivariate) of categorical data
What method will help us answer the question of whether there are more divorced female residents or more divorced male residents in a sample?
What would this look like?
Contingency table
There are TWO SETS of data to be examined, gender and marital status
Gender and marital status would both be titles: under gender we would split into males and females, and under marital status we would use each type of status as a subheading (married, divorced, etc.). We would count each group of participants to fill in the table (how many females are married, how many males are married, etc.). The totals would be counted both horizontally (across each row) and vertically (down each column)
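The counting described above can be sketched in Python by cross-counting (gender, marital status) pairs. Everything here is illustrative: the function name `contingency_table` is hypothetical, and the 25 residents and their counts are made up, since the card gives no real data.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-count two categorical variables given as (row value, column value) pairs."""
    counts = Counter(pairs)                         # frequency of each combination
    row_totals = Counter(r for r, _ in pairs)       # totals counted horizontally
    col_totals = Counter(c for _, c in pairs)       # totals counted vertically
    return counts, row_totals, col_totals, len(pairs)

# Hypothetical sample of 25 residents as (gender, marital status) pairs.
# These counts are made up purely for illustration.
residents = (
    [("female", "married")] * 5 + [("female", "divorced")] * 3 + [("female", "single")] * 4
    + [("male", "married")] * 6 + [("male", "divorced")] * 2 + [("male", "single")] * 5
)

counts, row_totals, col_totals, n = contingency_table(residents)
statuses = ["married", "divorced", "single"]
print(f"{'':<8}" + "".join(f"{s:>10}" for s in statuses) + f"{'Total':>8}")
for g in ["female", "male"]:
    print(f"{g:<8}" + "".join(f"{counts[(g, s)]:>10}" for s in statuses) + f"{row_totals[g]:>8}")
print(f"{'Total':<8}" + "".join(f"{col_totals[s]:>10}" for s in statuses) + f"{n:>8}")
```

Reading down the "divorced" column answers the card's question for this made-up sample: 3 divorced females versus 2 divorced males. The row totals and the column totals both add up to the sample size n.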
What are contingency tables?
Tables that allow us to summarize and analyze the relationship between TWO variables
What is the relationship between 2 variables?
Association
What is an association?
The relationship between 2 variables
How do we know 2 variables are associated?
If knowing the values of one of the variables tells us something about the values of the other variable
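One informal way to apply this definition is to compare the distribution of one variable within each value of the other: if these conditional distributions differ, knowing the first variable tells us something about the second. This is a descriptive sketch, not a formal statistical test; the function name `conditional_distributions` is hypothetical and the season/weather observations are made up for illustration.

```python
from collections import Counter, defaultdict

def conditional_distributions(pairs):
    """For each value of the first variable, the relative frequencies of the
    second variable among the observations with that value."""
    groups = defaultdict(Counter)
    for a, b in pairs:
        groups[a][b] += 1
    return {
        a: {b: f / sum(counter.values()) for b, f in counter.items()}
        for a, counter in groups.items()
    }

# Hypothetical (season, weather) observations, made up for illustration only
days = (
    [("winter", "snowy")] * 6 + [("winter", "sunny")] * 4
    + [("summer", "sunny")] * 9 + [("summer", "raining")] * 1
)

for season, dist in conditional_distributions(days).items():
    print(season, {weather: round(p, 2) for weather, p in dist.items()})
```

In this made-up data the weather distribution in winter looks very different from the one in summer (snow shows up only in winter), which points to an association. If every group had roughly the same distribution, knowing one variable would tell us little about the other.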
What are possible values of seasons? What about weather?
Are these 2 variables associated? explain.
Season - spring, summer, fall, winter
Weather - sunny, cloudy, snowy, windy, thunder, raining etc.
Yes, they are associated. Given the information about the season, we know more about the weather. For example, knowing it is winter tells us it is likely to be cold and snowy
What are possible values of clothing size? What about GPA?
Are these 2 variables associated? explain
Clothing size - S, M, L, XL
GPA - 0.00-4.00
No, they are not associated because knowing your clothing size tells us nothing about your GPA
What are some of the ways we can further examine the relationship between 2 variables if it is difficult to observe in a contingency table?
Apply other statistical methods such as side-by-side bar or pie charts, stacked bar charts, etc.