AP Stat Ch 1 Flashcards

Question

Ways to display categorical data

Answer 1

Bar graphs and relative frequency bar graphs. Pie charts and segmented bar charts. Two-way tables.

Answer 2

Label variables and scales. The bars should be the same width and should not touch each other. The order of the categories doesn't matter. Relative frequency bar charts make it easier to compare multiple distributions, especially when the sample sizes are different.
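
A minimal sketch of why relative frequencies help when sample sizes differ. The counts and category names below are made up for illustration, and `relative_frequencies` is a hypothetical helper, not something from the course materials.

```python
def relative_frequencies(counts):
    """Convert a dict of category counts into proportions of the total."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Made-up counts for two groups of different sizes (illustration only).
small_class = {"walk": 5, "bus": 10, "car": 5}      # 20 students
large_class = {"walk": 20, "bus": 60, "car": 20}    # 100 students

small_rel = relative_frequencies(small_class)
large_rel = relative_frequencies(large_class)

# Raw counts (10 vs 60) are hard to compare; proportions are on the same scale.
print(small_rel["bus"], large_rel["bus"])  # 0.5 0.6
```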

Answer 3

Label variables and categories. Pie charts are easier to construct with a computer spreadsheet program or stat software. Pie charts help us see what part of the whole each group forms. Segmented bar charts are basically rectangular pie charts: each bar is a whole, divided proportionally into segments corresponding to the percentage in each group. Segmented bar charts make it easier to compare distributions.

Answer 4

BE SURE TO LABEL GRAPHS!!!

Answer 5

Segmented bar chart | Three bars, one with tenth, one with eleventh, one with twelfth

Answer 6

A table with two categorical variables

Answer 7

Distributions of categorical data that appear at the right and bottom margins of a two-way table. They help us look at the distribution of each variable separately.

Answer 8

Categorical distributions inside a two-way table that deal with a specific number inside the table
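
A small sketch of how the margin distributions and an inside-the-table distribution could be computed from a two-way table. The table is made up, and the reading of "inside the table" as one row divided by that row's total is an assumption.

```python
# Made-up two-way table: rows are grade levels, columns are answers to a yes/no question.
table = {
    "tenth":    {"yes": 30, "no": 20},
    "eleventh": {"yes": 25, "no": 25},
    "twelfth":  {"yes": 45, "no": 55},
}
columns = ["yes", "no"]

grand_total = sum(sum(row.values()) for row in table.values())  # 200

# Marginal distribution of grade: row totals divided by the grand total.
row_marginal = {grade: sum(row.values()) / grand_total for grade, row in table.items()}

# Marginal distribution of answer: column totals divided by the grand total.
col_marginal = {c: sum(row[c] for row in table.values()) / grand_total for c in columns}

# Distribution of answer within one row (tenth grade only): row entries / row total.
tenth = table["tenth"]
answer_given_tenth = {c: tenth[c] / sum(tenth.values()) for c in columns}

print(row_marginal)
print(col_marginal)
print(answer_given_tenth)
```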

Answer 9

Rows + columns

Answer 10

An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This reversal is called Simpson's paradox. Therefore, you must be careful when data from several groups are combined to form a single group! In short: data that suggest one conclusion when aggregated and a different conclusion when presented in subcategories.

Answer 11

With Simpson's paradox, sometimes the relationship between two variables is influenced by other variables that we did not measure or even think about! Because these variables are lurking in the background, we call them lurking variables. They are not among the explanatory or response variables in a study, but they may influence the interpretation of the relationship among these variables.

Answer 12

It is caused by a combination of a lurking variable and data from unequal-sized groups being combined into a single data set. The unequal group sizes, in the presence of a lurking variable, can weight the results incorrectly. This can lead to seriously flawed conclusions. The obvious way to prevent it is to not combine data sets of different sizes from diverse sources! A great deal of care has to be taken when combining small data sets into a larger one. Sometimes conclusions from the large combined data set are the opposite of conclusions from the smaller ones. When that happens, the conclusion from the combined set is usually the misleading one!
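
A sketch of the reversal with made-up numbers (not real study data). Method A has the higher success rate inside each group, but the group sizes are unequal, so method B looks better once the groups are combined. The group and method names are hypothetical.

```python
# Made-up (successes, trials) for two methods in two groups of unequal size.
data = {
    "group 1": {"A": (9, 10),   "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials

# Success rate of each method within each group.
within = {
    group: {method: rate(*counts) for method, counts in methods.items()}
    for group, methods in data.items()
}

# Combine the groups: add successes and trials separately, then divide.
combined = {}
for method in ("A", "B"):
    successes = sum(data[group][method][0] for group in data)
    trials = sum(data[group][method][1] for group in data)
    combined[method] = rate(successes, trials)

print(within)    # A beats B in group 1 (0.9 vs 0.8) and in group 2 (0.3 vs 0.2)
print(combined)  # but combined, A is about 0.35 and B is about 0.75
```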

Answer 13

A simple way to display quantitative data when the set is reasonably small

Answer 14

Label your axis (horizontal line) with the variable and title your graph. Scale the axis based on the values of the variable. Mark a dot above the number on the horizontal axis corresponding to each data value. Stack multiple dots vertically.

Answer 15

Another way to display a relatively small numerical data set. Often the values of the variable are too spread out to make a dotplot, so this is a better option. The stem is the first part of the number and the leaf is the last digit.

Answer 16

Separate each observation into a stem, consisting of all but the rightmost digit, and a leaf, the final digit. Write the stems vertically in increasing order from top to bottom, and draw a vertical line to the right of the stems. Write each leaf to the right of its stem. Numbers to the left of the line are the stems and numbers to the right are the leaves. MUST INCLUDE A KEY WITH UNITS. LEAVES MUST BE SINGLE DIGITS, NO COMMAS. It is best if leaves are in numerical order.
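
The steps above can be sketched in a few lines. This is one possible approach for non-negative whole numbers only; `stemplot` is a hypothetical helper, and the data and the unit in the key are made up.

```python
def stemplot(values):
    """Return the lines of a stemplot for non-negative whole numbers."""
    leaves_by_stem = {}
    for v in sorted(values):  # sorting keeps the leaves in numerical order
        stem, leaf = divmod(v, 10)  # stem = all but the last digit, leaf = last digit
        leaves_by_stem.setdefault(stem, []).append(leaf)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

scores = [12, 15, 21, 21, 28, 40, 43]  # made-up data
for line in stemplot(scores):
    print(line)
print("Key: 1 | 2 = 12 points")  # the unit "points" is an assumption
```

Empty stems (here, 3) are kept so that gaps in the data stay visible.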

Answer 17

Useful for comparing distributions. Example: comparing female and male weights. Put the stems in the middle and the leaves on both sides, with male above one side and female above the other.

Answer 18

When a data set is very compact, it is often useful to split stems to stretch the display to investigate the shape. Whenever you split stems, be sure that each stem is assigned an equal number of possible leaf digits. When given data all between 96 and 99, make stems 96,96,97,97,98,98,99,99 and have the top be 0-4 for leaves and bottom be 5-9

Answer 19

Truncate or round the data to shrink the display | Change 10.53 to 11

Answer 20

SHAPE, CENTER, AND SPREAD

Answer 21

Symmetric, skewed right, or skewed left. Unimodal if one peak, bimodal if two peaks. Uniform if a plateau: all values occur about equally often.

Answer 22

Data values that fall outside the overall pattern of the rest of the distribution. Rule: a value is an outlier if it is above Q3 + 1.5 × IQR or below Q1 - 1.5 × IQR.
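
A short sketch of the 1.5 × IQR rule, checking both the high side and the low side. The quartiles are passed in as inputs; the data are made up, and Q1 = 10 and Q3 = 20 are their quartiles when each is taken as the median of one half of the ordered data. The function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """List the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

values = [1, 10, 10, 12, 15, 18, 20, 20, 40]  # made-up data, Q1 = 10, Q3 = 20
print(outlier_fences(10, 20))         # (-5.0, 35.0)
print(find_outliers(values, 10, 20))  # [40]
```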

Answer 23

Isolated groups of points

Answer 24

Large spaces between points

Answer 25

If the right and left sides of the histogram are approximately mirror images of each other

Answer 26

The thinner ends of a distribution are called the tails. If one tail stretches out further than the other, the histogram is said to be skewed to the side of the longer tail.

Answer 27

Used to display larger data sets for quantitative data

Answer 28

In discrete histograms, center each bar over its number on the x-axis. In continuous histograms, make classes and draw the bars between the class edges. For example, make classes of width 5 with edges at 40, 45, 50, and 55.

Answer 29

Label axes and scales. Bars should touch. The y-axis is frequency or relative frequency. The x-axis is the variable.

Answer 30

Same as a regular histogram, but with relative frequency (percent of total) rather than frequency (number of observations) on the vertical axis. Relative frequency histograms are more useful because you can compare two distributions more easily.

Answer 31

Histograms use QUANTITATIVE variables while bar graphs use CATEGORICAL data. Histograms don't have spaces between bars; bar graphs have spaces.

Answer 32

Divide the range of the data into classes of equal width that never overlap. Count the number of observations in each class. Five classes is a good minimum. Too few will give a skyscraper graph and too many will give a pancake graph. Label and scale your axes. If an observation falls on a boundary, put the value into the upper class.
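
A minimal sketch of counting observations in equal-width classes, with boundary values going into the upper class. The data and class edges are made up, and `class_counts` is a hypothetical helper.

```python
def class_counts(values, start, width, n_classes):
    """Count observations in equal-width classes starting at `start`.

    Class k covers start + k*width up to (but not including) start + (k+1)*width,
    so a value sitting exactly on a boundary is counted in the upper class.
    Values outside all the classes are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

# Made-up data; classes are 40-45, 45-50, 50-55, 55-60, 60-65.
values = [41, 44, 45, 47, 50, 50, 52, 58, 61, 64]
counts = class_counts(values, start=40, width=5, n_classes=5)
print(counts)  # [2, 2, 3, 1, 2]
```

Dividing each count by the total number of observations would give the heights for a relative frequency histogram.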

Answer 33

Cumulative relative frequency graph | Cumulative relative frequency is the percentile
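
A small sketch of how the heights of such a graph could be computed from class counts: keep a running total, then divide by the overall total. The counts are made up.

```python
from itertools import accumulate

counts = [2, 2, 3, 1, 2]  # made-up class counts, lowest class first
total = sum(counts)

# Running total of observations at or below the top of each class.
cumulative_counts = list(accumulate(counts))
# Divide by the total to get the cumulative relative frequency of each class.
cumulative_rel = [c / total for c in cumulative_counts]

print(cumulative_counts)  # [2, 4, 7, 8, 10]
print(cumulative_rel)
```

The last value is always 1 (100%), since every observation is at or below the top of the highest class.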

Answer 34

Look at the mean and the median

Answer 35

Greek letter mu (u with long stem) | The arithmetic average of all values in the entire population

Answer 36

X with a bar above it (x bar). Since we rarely study the entire population, we estimate the population mean with the sample mean: x bar = sum of all values / number of values

Answer 37

The middle score | To find which value is the middle score, put all the data in order

Answer 38

Most frequent observation. Not a useful measure of center.

Answer 39

Measure not affected by outliers

Answer 40

Median is resistant: it is not affected by outliers, so it is better for a skewed data set. Mean is not resistant: it is affected by outliers, since outliers change the arithmetic average.
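
A quick sketch of resistance using Python's standard `statistics` module. The two data sets are made up and differ only in the largest value.

```python
from statistics import mean, median

typical = [10, 12, 13, 15, 20]        # made-up data
with_outlier = [10, 12, 13, 15, 200]  # same data with one extreme value

print(mean(typical), median(typical))            # 14 13
print(mean(with_outlier), median(with_outlier))  # 50 13
```

The outlier drags the mean from 14 to 50, while the median stays at 13.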

Answer 41

Median can be used with any data | Use the mean with symmetrical data, since the mean is not resistant and the median is

Answer 42

Skewed left is when the tail is to the left. Median > mean. Lower values stretch the graph to the left. Bell curve on the right, tail on the left. Skewed right is when the tail is to the right. Mean > median. Bell curve on the left.

Answer 43

Skewed left: median > mean. Skewed right: mean > median. Symmetric: mean roughly equal to median.

Answer 44

Full spread of the data, found by taking the difference between the largest and the smallest observation. It is ONE NUMBER: MAX - MIN. BUT it is not resistant: outliers heavily influence the range.

Answer 45

Range for roughly symmetric data without outliers. | IQR when the data are skewed or have outliers

Answer 46

A resistant measure of spread. It is the distance between the first and third quartiles. The range of the middle half of the data. IQR=Q3-Q1

Answer 47

Q1 is the first quartile: the point that divides the lowest 25% of the data from the upper 75%. Q2 is the median. Q3 is the third quartile: the point that divides the lowest 75% of the data from the upper 25%.

Answer 48

Put the data in order. Find the median; the median is Q2. Take the half of the data above the median and find the median of that half: that value is Q3. Do the same for the half of the data below the median to get Q1.
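
A sketch of this procedure in code. It assumes the common classroom convention of leaving the overall median out of both halves when the number of observations is odd; calculators and software sometimes use other conventions and can give slightly different quartiles. `quartiles` is a hypothetical helper and needs at least two data values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([1, 10, 10, 12, 15, 18, 20, 20, 40])  # made-up data
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 10.0 15 20.0 10.0
```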

Answer 49

Shape: skewed (give the direction) or symmetric; unimodal or bimodal. Center: mean or median. Spread: IQR, range, or standard deviation.

Answer 50

(MIN,Q1,MED,Q3,MAX)

Answer 51

A graph of the five-number summary. Easy to make and clearly shows the center and spread of the distribution. The distribution is skewed toward the side with the longer box. Useful for comparing multiple distributions with side-by-side boxplots, which are usually drawn vertically.

Answer 52

Central box spans the quartiles Q1 and Q3. A line in the box marks the median, M. Lines extend from the box out to the smallest and largest observations. Width of the box = IQR. Label axes and scale.

Answer 53

Specifically identifies outliers, in addition to the median and quartiles. A regular boxplot runs its lines out to the outliers instead of marking them separately.

Answer 54

Average squared deviation of the observations from the mean. Written s squared.

Answer 55

The deviation of an observation is its distance from the mean (x-x bar). The mean is the point that makes the sum of the deviations=0. We square the deviations to make negatives positives.
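
A short check of both claims on made-up data: the deviations from the mean add up to zero, which is why they are squared before averaging.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
x_bar = sum(data) / len(data)    # mean is 5.0

deviations = [x - x_bar for x in data]  # some negative, some positive
squared = [d ** 2 for d in deviations]  # all zero or positive

print(sum(deviations))  # 0.0
print(sum(squared))     # 32.0
```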

Answer 56

Greek letter sigma, which looks like the letter "o". Standard deviation of all the values in the entire population. Typical deviation from the mean, or the average distance from the average = SQRT (sum of (x-mu)^2 / n)

Answer 57

Square of the population standard deviation | Greek letter looking like "o" squared

Answer 58

Represented by s. Since we rarely study entire populations, use this. Your distance from the center, or your average distance from the average. Approximates the average, or typical, deviation = SQRT ( sum of (x-x bar)^2 / (n-1))
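
A sketch comparing the two formulas: divide by n for a whole population, by n-1 for a sample. The data are made up and the function names are hypothetical. The standard library's `statistics.pstdev` and `statistics.stdev` are printed alongside as a cross-check.

```python
from math import sqrt
from statistics import pstdev, stdev

def population_sd(values):
    """SQRT(sum of (x - mu)^2 / n)."""
    mu = sum(values) / len(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """SQRT(sum of (x - x bar)^2 / (n - 1))."""
    x_bar = sum(values) / len(values)
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
print(population_sd(data), pstdev(data))  # both 2.0
print(sample_sd(data), stdev(data))       # both about 2.14
```

The sample version is always a little larger, because it divides the same sum by a smaller number.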

Answer 59

Square of the sample standard deviation. | S squared

Answer 60

There is some error between x bar and mu, so this helps to account for it.

Answer 61

Use sample unless told otherwise

Answer 62

When talking about mean, as this measures spread about the mean.

Answer 63

When there is no spread: all observations have the same value. Otherwise, s > 0. As observations become more spread out about their mean, s gets larger.

Answer 64

No, because like the mean it is not resistant | Strong skewness or outliers can make s very large

Answer 65

Usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers. Use the mean and s with a reasonably symmetric distribution free of outliers.

Answer 66

Shape never changes. Center always changes: adding the same number a to each observation adds a to the mean and median, and multiplying each observation by b multiplies both the mean and median by b. Spread stays the same when adding the same amount to each observation but changes when multiplying: when each data point is multiplied by b, the spread is multiplied by b (for positive b).
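
A sketch that checks these rules on made-up data, using the standard `statistics` module. The values a = 10 and b = 3 are arbitrary choices for illustration.

```python
from statistics import mean, median, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
a, b = 10, 3                     # arbitrary shift and (positive) multiplier

shifted = [x + a for x in data]  # add a to every observation
scaled = [x * b for x in data]   # multiply every observation by b

# Center: adding a shifts the mean and median by a; multiplying scales them by b.
print(mean(data), mean(shifted), mean(scaled))        # 5 15 15
print(median(data), median(shifted), median(scaled))  # 4.5 14.5 13.5

# Spread: unchanged by adding a, multiplied by b when multiplying.
print(stdev(data), stdev(shifted), stdev(scaled))
print(max(data) - min(data), max(shifted) - min(shifted), max(scaled) - min(scaled))  # 7 7 21
```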

(90 cards)