Chapter 2 - Frequency Distributions Flashcards

1
Q

What are raw scores?

A

The data gathered from participants: all the numbers before they have been organized, graphed, or cleaned up.

WHY not use raw data?
* Finding a pattern in raw data is difficult
* We want to visualize and summarize the data
* We also need to inspect the data for outliers and data entry errors.

2
Q

What are the steps to create a frequency distribution table and a grouped frequency table?

A
  1. Frequency Distribution Table: a visual depiction of data that shows how often each value occurred (how many scores fell at each value, e.g., how many students got exactly 7 hours of sleep? exactly 5 hours?). SEE PIC BELOW for how it’s done.
  2. Grouped Frequency Table: groups the data into intervals. Use one for 2 reasons:
    1. When the data have a large range of possible values (e.g., IQ scores from 70 to 149); see the table on the next card.
    2. When the data have decimal places (i.e., are continuous).

Principles to keep in mind for a Grouped Table:
a) Determine the full range of the data, and include values that have a frequency of zero. Full range = highest value - lowest value + 1 (e.g., 8 - 3.5 + 1 = 5.5).

b) Aim for roughly 5 to 10 intervals (no fewer than 5 and no more than 15).

c) For continuous data, give each interval a lower and an upper limit (the lowest and highest possible values in that interval).

[Figure: Frequency Distribution Table]

3
Q

GROUPED FREQUENCY DATA

A

GROUPED FREQUENCY TABLE (the data initially)

4
Q

GROUPED FREQUENCY TABLE - for continuous data

A

HISTOGRAM - for continuous data

5
Q

What is a PIE CHART?

A

A circular graph in which each slice represents one category's proportion of the whole; used when you want to show proportions of the whole picture.

6
Q

What is a BAR GRAPH?

[Figure: bar graph, 2nd way]

A

A visual depiction of data when the independent variable is nominal and the dependent variable is scale (interval or ratio):

TWO WAYS:

  1. Present frequency or proportion data. EX: a graph showing the percentage of girls and boys who get more than 9 hours of sleep per night.
  2. Present mean (average) values. EX: the graph pictured above (2nd way) shows the mean scores for two conditions, neutral and emotional. The thin black lines on top of the bars are ‘standard error bars’.

EX: a chart showing the cost of tuition (dependent variable) for 3 types of schools: public, semi-public, and private (independent variable).

[Figure: bar graph, 1st way]

7
Q

What is a SCATTERPLOT?

A

A graph that depicts the relationship between 2 scale variables; each dot represents one participant's scores on both variables.

EX: amount of abdominal fat and dementia symptoms.

8
Q

What is a HISTOGRAM?

[Figure: histogram bar graph]

A

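A histogram looks like a bar graph but shows the frequency of each value of a scale variable: the values (or interval midpoints) go on the x-axis and the frequencies on the y-axis. It holds the same information as a frequency table, visualised differently.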

9
Q

What is the Biased Scale Lie?

What is the Sneaky Sample Lie?

What is an Interpolation Lie?

What is an Extrapolation Lie?

What is an Inaccurate Value Lie?

A
  1. Biased Scale Lie: the response choices are biased toward an outcome. EX: a scale of ‘Not Satisfactory, Good, Excellent, Truly Superior’ offers only one negative rating against three positive ones. Another example is ‘Rate Toronto as 1st, 2nd, 3rd, or 4th’, after which the person reports ‘Toronto is in the top 4 cities in Canada!’. The question is set up to produce a biased outcome.
  2. Sneaky Sample Lie: the data split into two extremes because the people who respond tend to have had either very good or very bad experiences (TripAdvisor, Rate My Professors, Yelp). People self-select to participate, so it is not random sampling.
  3. Interpolation Lie: a line is drawn between data points that have been selectively placed on the graph, implying values in between that may not be accurate.
  4. Extrapolation Lie: the line is extended beyond the observed data points, assuming the trend will keep going up, down, or across.
  5. Inaccurate Value Lie: scaling is used to distort the data. In the pic below, the Tim Hortons and Starbucks graphics use different scales, so the comparison is hard to read at a glance. (Axes should start at 0, and scales should be labelled.)

All of these need to have representative sampling.

[Figure: #5, Inaccurate Value Lie]

10
Q

What is a normal distribution?

A

A distribution shaped like a symmetric bell curve: most participants' scores fall in the middle, and the frequencies taper off equally toward both ends.

11
Q

How do positively skewed distributions and negatively skewed distributions deviate from a normal distribution?

A

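Instead of a symmetric bell curve in the middle, the distribution has a tail extending to one side; it is non-normal and asymmetrical. A positive skew has its tail on the right (toward high values); a negative skew has its tail on the left (toward low values).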

POSITIVE — generally has ‘floor’ effects
NEGATIVE — generally has ‘ceiling’ effects

12
Q

What is the benefit of creating a visual distribution of data rather than simply looking at a list of the data?

A

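It lets us see the shape of the distribution (e.g., whether it is normal or skewed, and whether there are outliers), which is hard to spot in a list of numbers.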

13
Q

What is a floor effect and how does it affect a distribution?

A

A situation in which a constraint prevents a variable from taking values below a certain point. Scores pile up on the LEFT (low) side of the graph with a tail to the right, producing a positive skew.

14
Q

CALCULATING STATS:

What is 63 out of 1264 in %
What is 2 out of 88 in %
What is 7 out of 39 in %
What is 122 out of 300 in %

What type of variable (nominal, ordinal, scale) are these data as counts?

What kind of variable are they as percentages?

A
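A quick Python sketch for checking the percentage arithmetic on this card (the helper name `percent` is made up for illustration): divide the count by the total, multiply by 100, and round to 2 decimal places. It does not answer the variable-type questions.

```python
def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to 2 decimal places."""
    return round(count / total * 100, 2)


# The four questions on this card, as (count, total) pairs
questions = [(63, 1264), (2, 88), (7, 39), (122, 300)]

for count, total in questions:
    print(f"{count} out of {total} = {percent(count, total):.2f}%")
# prints 4.98%, 2.27%, 17.95%, and 40.67% on the four lines
```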
15
Q

Report these values to 2 decimal places:
1888.999
2.6454
0.0833

A
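A minimal Python sketch for checking the rounding, assuming the usual "round half up" convention (5 or more in the third decimal place rounds up). It uses the standard `decimal` module because Python's built-in `round()` works on binary floats and rounds exact ties to the even digit (e.g., `round(0.125, 2)` gives `0.12`). The helper name `to_two_places` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_two_places(value: str) -> Decimal:
    """Round a number given as a string to 2 decimal places, rounding halves up.

    Passing a string avoids binary floating-point error (e.g., 0.125 -> 0.13).
    """
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for raw in ["1888.999", "2.6454", "0.0833"]:
    print(raw, "->", to_two_places(raw))
# prints 1889.00, 2.65, and 0.08 as the rounded values
```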
16
Q

On a test of marital satisfaction, scores could range from 0 to 27:
1. What is the full range of data, according to the calculation procedure described in this chapter?
2. What would the interval size be if we wanted 6 intervals?
3. List the 6 intervals.

A
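A sketch of the grouping steps from card 2 in Python, under two assumptions that the card does not state: the interval size is rounded UP to a whole number, and the first interval starts at a multiple of the interval size. The function name `grouped_intervals` is hypothetical.

```python
import math


def grouped_intervals(lowest: int, highest: int, n_intervals: int):
    """Sketch: equal-width intervals for whole-number scores.

    Returns (full_range, interval_size, list of (lower, upper) limits).
    """
    full_range = highest - lowest + 1            # principle (a): top - bottom + 1
    size = math.ceil(full_range / n_intervals)   # assumption: round up to a whole number
    lower = (lowest // size) * size              # assumption: start at a multiple of size
    intervals = []
    while lower <= highest:
        intervals.append((lower, lower + size - 1))
        lower += size
    return full_range, size, intervals


full_range, size, intervals = grouped_intervals(0, 27, 6)
print(full_range)  # 28
print(size)        # 5
print(intervals)   # [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
```

The same call, `grouped_intervals(2, 68, 7)`, works for the next card's data. Rounding up and starting at a multiple can occasionally produce one more interval than requested, so check the count by hand.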
17
Q

If you have data that range from 2 - 68 and you want seven intervals in a grouped frequency table, what would the intervals be?

A
18
Q

A grouped frequency table has the following intervals:
30-44
45-59
60-74
If converted into a histogram, what would the midpoints be?

A
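A small Python check, assuming the midpoint is the average of an interval's lower and upper limits. (If you use real limits instead, e.g. 29.5 to 44.5, both ends shift by 0.5 and the midpoint comes out the same.)

```python
def midpoint(lower: float, upper: float) -> float:
    """Midpoint of an interval: the average of its lower and upper limits."""
    return (lower + upper) / 2


intervals = [(30, 44), (45, 59), (60, 74)]
print([midpoint(lo, hi) for lo, hi in intervals])  # [37.0, 52.0, 67.0]
```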
19
Q

Referring to the grouped frequency table (2.6), how many countries had at least 30 volcanoes?

A
20
Q

Referring to the histogram (2.1), how many countries had one or two volcanoes?

A
21
Q

If the average person convicted of murder killed only 1 person, serial killers would create what kind of skew?

Would the data for number of murders by those convicted of the crime be an example of a floor effect or a ceiling effect?

A
22
Q

A researcher collects data on the ages of university students. As you have probably observed, the distribution of ages clusters around 19 to 22 years, but there are extremes at both the low end (high school prodigies) and the high end (non-traditional students returning to school):

  1. What type of skew might you expect for such data?
  2. Do the skewed data represent a floor effect or a ceiling effect?
A
23
Q

If you have an Instagram account, you are allowed to follow up to 7,500 other accounts. At that point, Instagram cuts you off, and you have to unfollow people to add more. Imagine you collected data from Instagram users at your university about the number of accounts each one follows:

  1. What type of skew might you expect for such data?
  2. Do the skewed data represent a floor effect or a ceiling effect?
A
24
Q

APPLYING THE CONCEPTS:

Frequency tables, histograms, and the National Survey of Student Engagement: The National Survey of Student Engagement (NSSE) surveys U.S. first-year university students and seniors about their level of engagement in campus and classroom activities that enhance learning. Hundreds of thousands of students at almost 1000 schools have completed surveys since 1999, when the NSSE was first administered. Among the many questions, students are asked how often they have been assigned a paper of 20 pages or more during the academic year. For a sample of 19 institutions classified as national universities that made their data publicly available through the U.S. News & World Report Web site, here are the percentages of students who said they were assigned between 5 and 10 twenty-page papers:

0 5 3 3 1 10 2
2 3 1 2 4 2 1
1 1 4 3 5

a. Create a frequency table for these data. Include a third column for percentages.
b. For what percentage of these schools did exactly 4% of the students report that they wrote between 5 and 10 twenty-page papers that year?
c. Is this a random sample? Explain your answer.
d. Create a histogram of grouped data, using six intervals.
e. In how many schools did 6% or more of the students report that they wrote between 5 and 10 twenty-page papers that year?
f. How are the data distributed?

A
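A Python sketch of part (a), assuming the common layout of listing values from highest to lowest. Following principle (a) on card 2, every value in the range is listed, even those with zero frequency (6 to 9 here). The function name `frequency_table` is hypothetical.

```python
from collections import Counter

# Percentages reported by the 19 institutions on this card
scores = [0, 5, 3, 3, 1, 10, 2,
          2, 3, 1, 2, 4, 2, 1,
          1, 1, 4, 3, 5]


def frequency_table(data):
    """Rows of (value, frequency, percentage), highest value first.

    Values inside the range that never occur are kept with frequency 0.
    """
    counts = Counter(data)
    n = len(data)
    rows = []
    for value in range(max(data), min(data) - 1, -1):
        freq = counts[value]  # Counter returns 0 for values it has not seen
        rows.append((value, freq, round(100 * freq / n, 2)))
    return rows


print(f"{'X':>3} {'f':>3} {'%':>7}")
for value, freq, pct in frequency_table(scores):
    print(f"{value:>3} {freq:>3} {pct:>7.2f}")
```

Reading off the table: the row for 4 gives the percentage asked for in part (b), and adding the frequencies for values 6 and up gives the count for part (e).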
25
Q

APPLYING THE CONCEPTS:

The United Nations Development Programme (2015) published life expectancy rates (the number of years an adult can expect to live) for 195 countries around the world. Following is a randomly selected sample of 30 of them (see graph below).

a. Create a grouped frequency table for these data.
b. The data have quite a range, with the lowest life expectancy of 50.72 years in Côte d’Ivoire and the highest life expectancy of 83.58 years in Japan. What research hypotheses come to mind when you examine these data? State at least one research question that these data suggest to you.
c. Create a grouped histogram for these data. As always, be careful when determining the midpoints of the intervals.
d. Examine the histogram and give a brief description of the distribution. Are there unusual scores? Are the data symmetric, or are they skewed? If they are skewed, in which direction?

A
26
Q

Types of distributions: Consider these three variables: finishing times in a marathon, number of university dining hall meals eaten in a semester on a three-meal-a-day plan, and scores on a scale of extroversion.
a. Which of these variables is most likely to have a normal distribution? Explain your answer.
b. Which of these variables is most likely to have a positively skewed distribution? Explain your answer, stating the possible contribution of a floor effect.
c. Which of these variables is most likely to have a negatively skewed distribution? Explain your answer, stating the possible contribution of a ceiling effect.

A
27
Q

Number of televisions and a grouped frequency distribution: The Canadian Radio-television and Telecommunications Commission (crtc.gc.ca/eng/publications) gathered data on the numbers of television sets in Canadian homes. Two percent of homes had no television; 28% had one television; 32% had two televisions; 20% had three televisions; and 18% had four or more televisions. Create a histogram for these percentages. (Treat “four or more televisions” as four for the purposes of this exercise.)

A
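A rough text-only stand-in for the histogram, sketched in Python: each `#` stands for 2 percentage points, and the bars run sideways, so the number of televisions sits on the vertical axis here rather than on the x-axis of a real histogram. The helper name `text_histogram` is hypothetical.

```python
# Percentage of Canadian homes by number of televisions (from this card);
# "four or more" is treated as 4, as the exercise instructs.
tv_percentages = {0: 2, 1: 28, 2: 32, 3: 20, 4: 18}


def text_histogram(freqs, scale=1):
    """Return one text bar per value, lowest value first.

    Each '#' stands for `scale` units; any remainder is dropped.
    """
    lines = []
    for value in sorted(freqs):
        bar = "#" * (freqs[value] // scale)
        lines.append(f"{value} | {bar} {freqs[value]}%")
    return lines


for line in text_histogram(tv_percentages, scale=2):
    print(line)
```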
28
Q

Skew and movie ratings: IMDb (Internet Movie Database) publishes average ratings of movies worldwide. Anyone can log on and rate a film. What’s the worst-rated film of the more than 235,000 that are listed on IMDb? The Bollywood action-romance Gunday, which earned a rating of 1.4 on a scale of 1–10 (Goldenburg, 2014). Hardly any other movies even came close to that low rating. In fact, the average film is rated 6.3, and most of the movies garnered ratings between 5.5 and 7.2. Even though Gunday got pretty good critical reviews, it tanked on the crowd-sourced IMDb. Why? Activists in Bangladesh harnessed social media to give it a bad rating. One posted, “If you’re a Bangladeshi and care enough to not let some Indian crappy movie distort our history of independence, let’s unite and boycott this movie!!!”

a. Based on what you know about the typical ratings on IMDb, is a histogram based on these data likely to be normally distributed, negatively skewed, or positively skewed? Explain your answer.
b. Is there more likely to be a floor effect or a ceiling effect for these data? Explain your answer.
c. Based on this story, are audience-generated IMDb ratings a good way to operationalize the quality of a movie? What might be a better way?

A
29
Q

Frequencies, distributions, and obesity around the world: The World Happiness Report publishes a number of indicators related to physical and psychological well-being (Helliwell et al., 2018). For example, it publishes adult obesity rates for more than 30 countries, with those percentages ranging from 3.7% in Japan to 38.2% in the United States. Percentages for 20 of these countries are presented below.

30.7 32.4 12.3 12.8 10.3
22.6 9.8 24.8 16.6 38.2
17.8 3.7 27.9 15.3 21.3
19.0 23.6 14.7 25.8 19.2

a. Create a grouped frequency table for these data. Include a third column for percentages.
b. Create a histogram of these data.
c. Write a summary describing the distribution of these data with respect to shape and direction of any skew.
d. If you wanted the data to be positively skewed, how would the data have to shift to fit that goal? How could you use knowledge about the current distribution to target certain countries?
e. Are these data from correlational or experimental research? Explain your answer.

A
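A Python sketch of part (a). The interval size of 5 is an assumption (it gives 8 intervals, inside the 5 to 10 guideline from card 2). Each interval runs from its lower limit up to, but not including, the next one, i.e. 0-4.9, 5-9.9, and so on for data with one decimal place. The function name `grouped_table` is hypothetical.

```python
import math
from collections import Counter

# Adult obesity rates (%) for the 20 countries on this card
rates = [30.7, 32.4, 12.3, 12.8, 10.3,
         22.6, 9.8, 24.8, 16.6, 38.2,
         17.8, 3.7, 27.9, 15.3, 21.3,
         19.0, 23.6, 14.7, 25.8, 19.2]

SIZE = 5  # assumed interval width


def grouped_table(data, size):
    """Rows of (lower, upper, frequency, percentage), highest interval first.

    A value x falls in the interval whose lower limit is floor(x / size) * size.
    """
    counts = Counter(math.floor(x / size) * size for x in data)
    n = len(data)
    first = math.floor(min(data) / size) * size
    last = math.floor(max(data) / size) * size
    rows = []
    for lower in range(last, first - 1, -size):
        f = counts[lower]  # 0 for intervals with no countries
        rows.append((lower, lower + size, f, round(100 * f / n, 2)))
    return rows


for lower, upper, f, pct in grouped_table(rates, SIZE):
    print(f"{lower:>2}-{upper - 0.1:.1f}  f={f}  {pct:.1f}%")
```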
30
Q

Frequencies, distributions, and graduate advising: In a study of mentoring in chemistry fields, a team of chemists and social scientists identified the most successful U.S. mentors—professors whose students were hired by the top 50 chemistry departments in the United States (Kuck et al., 2007). Fifty-four professors had at least 3 students go on to such jobs. Here are the data for the 54 professors. Each number indicates the number of students successfully mentored by each different professor. (see table below)

a. Construct a frequency table for these data. Include a third column for percentages.
b. Construct a histogram for these data.
c. Describe the shape of this distribution.
d. How did the researchers operationalize the variable of mentoring success? Suggest at least two other ways in which they might have operationalized mentoring success.
e. Imagine that researchers hypothesized that an independent variable, good mentoring, predicts the dependent variable of mentoring job success. One professor, Dr. Yuan T. Lee from the University of California at Berkeley, trained 13 future top faculty members. Dr. Lee won a Nobel Prize. Explain how such a prestigious and public accomplishment might present a confounding variable to the hypothesis described here.
f. Dr. Lee had many students who went on to top professorships before he won his Nobel Prize. Several other chemistry Nobel Prize winners in the United States serve as graduate advisors but have not had Dr. Lee’s level of success as mentors. What are other possible variables that might predict the dependent variable of attaining a top professor position?

A