Notes Flashcards

Question

Multimodal

Answer 1

Distributions with multiple peaks.

Answer 2

The mean of a specific subset of data. | We apply a condition and calculate the mean for values that meet that condition.

Answer 3

=AVERAGEIF(range, criteria, [average_range]) * range contains the one or more cells to which we want to apply the criteria or condition. * criteria is the condition that is to be applied to the range. * [average_range] is the range of cells containing the data we wish to average. If it is omitted, the cells in range are averaged.
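
A minimal Python sketch of the same idea, assuming a Python predicate stands in for Excel's criteria text (such as ">5"). The function `average_if` and the sample regions and sales figures are hypothetical, not part of Excel or the original notes.

```python
from typing import Any, Callable, Optional, Sequence


def average_if(
    criteria_range: Sequence[Any],
    criteria: Callable[[Any], bool],
    average_range: Optional[Sequence[float]] = None,
) -> float:
    """Conditional mean in the spirit of =AVERAGEIF(range, criteria, [average_range])."""
    # With no average range, average the criteria range itself.
    values = criteria_range if average_range is None else average_range
    if len(values) != len(criteria_range):
        raise ValueError("criteria_range and average_range must be the same length")
    # Keep only the values whose paired criteria cell meets the condition.
    selected = [v for c, v in zip(criteria_range, values) if criteria(c)]
    if not selected:
        # Excel reports #DIV/0! when no cell meets the condition.
        raise ZeroDivisionError("no values meet the criteria")
    return sum(selected) / len(selected)


# Hypothetical data: average sales for the East region only.
regions = ["East", "West", "East", "North"]
sales = [100.0, 80.0, 140.0, 60.0]
print(average_if(regions, lambda r: r == "East", sales))  # 120.0
```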

Answer 4

The value beneath which a certain percentage of the data lie.

Answer 5

The smallest value that is greater than or equal to 25% of the data points.

Answer 6

The answer cannot be determined without further information. The mean's location depends upon the distribution of the data set. In a symmetrical distribution the mean sits at the center (the 50th percentile), but in a skewed distribution it is pulled toward the longer tail. Therefore, there is no way to determine the percentile of the mean without more information about the data set.

Answer 7

50%. Half of a distribution's data points are less than or equal to the median. Therefore, the median is the 50th percentile, because 50% of the data points are at or below this value.

Answer 8

The answer cannot be determined without further information. The mode's location depends upon the distribution of the data set. Therefore, there is no way to determine the percentile of the mode without more information about the data set.

Answer 9

=PERCENTILE.INC(array, k) * array is the range of data for which we want to calculate a given percentile. * k is the percentile value. For example, if we want to know the 95th percentile, k would be 0.95.
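
PERCENTILE.INC interpolates linearly between neighboring sorted values. The Python sketch below assumes the inclusive rule places the k-th percentile at 0-based position k * (n - 1) in the sorted data; `percentile_inc` and the sample scores are hypothetical, not an Excel or library function.

```python
import math
from typing import Sequence


def percentile_inc(data: Sequence[float], k: float) -> float:
    """Inclusive percentile with linear interpolation, assumed to match PERCENTILE.INC."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1, inclusive")
    xs = sorted(data)
    # 0-based position of the k-th percentile among the n sorted values.
    pos = k * (len(xs) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    # Move the fractional part of the way from xs[lower] toward xs[upper].
    return xs[lower] + (pos - lower) * (xs[upper] - xs[lower])


scores = list(range(1, 11))  # hypothetical data: 1, 2, ..., 10
print(percentile_inc(scores, 0.25))  # 3.25
print(percentile_inc(scores, 0.50))  # 5.5, the median
```

With k = 0.5 this rule lands exactly on the middle value (or halfway between the two middle values), so it always agrees with the median, consistent with Answer 7.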

Answer 10

The range. Range = Maximum value - Minimum value. In Excel: =MAX(number 1, [number 2], ...)-MIN(number 1, [number 2], ...), for example =MAX(A2:A11)-MIN(A2:A11).

Answer 11

=VAR.S(number 1, [number 2]...) * number 1 is the first number, cell reference, or range of cells for which to calculate the specified value. * [number 2],... represents additional numbers, cell references, or range of cells. The square brackets indicate that the argument is optional.

Answer 12

=STDEV.S(number 1, [number 2]...) The "S" in VAR.S and STDEV.S indicates sample. Alternatively, take the square root of the variance: =SQRT(number), where number is the variance (for example, =SQRT(VAR.S(A2:A11))).
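
To show what VAR.S and STDEV.S compute, here is a minimal Python sketch of the sample formulas, which divide the sum of squared deviations by n - 1 rather than n. The function names and data are hypothetical; Python's standard `statistics.variance` and `statistics.stdev` compute the same sample quantities.

```python
import math
from typing import Sequence


def sample_variance(data: Sequence[float]) -> float:
    """Sample variance (the quantity VAR.S reports): divide by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_stdev(data: Sequence[float]) -> float:
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(f"{sample_variance(data):.4f}")  # 4.5714
print(f"{sample_stdev(data):.4f}")  # 2.1381
```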

Answer 13

Step 1. From the Data menu, select Data Analysis, then select Descriptive Statistics. Step 2. Enter the appropriate Input Range: * The Input Range is the data in column A with its label, A1:A11. * Make sure to include A1, the cell containing the label, when entering the range, and check the Labels in first row box so that the output table is labeled appropriately. Step 3. Enter the appropriate Output Range. Step 4. Select Summary Statistics.
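
For readers working outside Excel, the Python sketch below computes most of the fields that, to our understanding, appear in the Descriptive Statistics summary table (Kurtosis and Skewness are left out for brevity). It relies on Python's standard `statistics` module; `summary_statistics` and the sample data are hypothetical.

```python
import math
import statistics
from collections import Counter
from typing import Dict, Optional, Sequence


def summary_statistics(data: Sequence[float]) -> Dict[str, Optional[float]]:
    """Most of the fields in a Descriptive Statistics summary table."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    sd = statistics.stdev(data)  # sample standard deviation, as with STDEV.S
    value, freq = Counter(data).most_common(1)[0]
    return {
        "Mean": statistics.mean(data),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        # Report a mode only when some value repeats (Excel shows #N/A otherwise);
        # on ties, Counter returns the value encountered first.
        "Mode": value if freq > 1 else None,
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


for label, stat in summary_statistics([2, 4, 4, 4, 5, 5, 7, 9]).items():
    print(f"{label:<20}{stat}")
```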

Answer 14

Estimates how close the mean of the sample is to the mean of the overall population. It is calculated by dividing the standard deviation of the sample by the square root of the number of data points. In Excel: =STDEV.S(number 1, [number 2], ...)/SQRT(COUNT(number 1, [number 2], ...)), for example =STDEV.S(A2:A11)/SQRT(COUNT(A2:A11)).

Answer 15

Flatness or sharpness of a distribution. | A flat distribution has low kurtosis; a very sharp distribution has high kurtosis.
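
Kurtosis can be computed with the bias-corrected sample excess-kurtosis formula that, as we understand it, Excel's KURT function uses; a minimal Python sketch follows. On this scale a normal distribution scores 0, flatter distributions score below 0, and sharper, heavier-tailed ones score above 0. `excess_kurtosis` and both data sets are hypothetical illustrations.

```python
import math
from typing import Sequence


def excess_kurtosis(data: Sequence[float]) -> float:
    """Sample excess kurtosis with the bias-corrected formula (assumed to match KURT)."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least four data points")
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
    if s == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    # Sum of standardized deviations raised to the fourth power.
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


flat = list(range(1, 11))  # evenly spread values: a flat distribution
sharp = [5, 5, 5, 5, 1, 9, 5, 5, 5, 5]  # a tall peak at 5 with two outliers
print(f"{excess_kurtosis(flat):.2f}")  # -1.20
print(f"{excess_kurtosis(sharp):.2f}")  # 4.50
```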

Answer 16

The ratio of the standard deviation to the mean. Coefficient of Variation = Standard Deviation / Mean. Used to compare the relative variation of two data sets, even when their means differ.
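
A minimal Python sketch of the ratio, assuming the sample standard deviation (as with STDEV.S) is the numerator; the function name and the two data sets are hypothetical. Both data sets have the same standard deviation, but the coefficient of variation shows that the second varies far less relative to its mean.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(data: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(data) / mean


small = [10, 20, 30]  # hypothetical: standard deviation 10, mean 20
large = [1010, 1020, 1030]  # hypothetical: standard deviation 10, mean 1020
print(f"{coefficient_of_variation(small):.4f}")  # 0.5000
print(f"{coefficient_of_variation(large):.4f}")  # 0.0098
```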

Answer 17

To visualize the relationship between two variables. One variable ("independent variable") is plotted on the horizontal axis (x-axis), and the other ("dependent variable") is plotted on the vertical axis (y-axis).

Answer 18

Step 1. From the Insert menu, select Scatter, then select Scatter With Only Markers. Step 2. Enter the appropriate Input Y Range and Input X Range: * The Input Y Range is ... data in column C with its label, C1:C11. * The Input X Range is ... data in column B with its label, B1:B11. * Make sure to include the cells containing labels when entering the ranges, and check the Labels in first row box so that the scatter plot is labeled appropriately.

Answer 19

* The strength of the linear relationship between two variables. * The extent to which the data points on a scatter plot form a line, on a scale from -1 to +1.

Answer 20

A relationship between two variables might exist; it just might not be a linear one. The relationship may look more like a curve.

Answer 21

1) Range: Correlation coefficients can take any value from -1 to 1, and only values in that range.
2) Magnitude: Correlations are stronger the closer the coefficient is to -1 or 1, that is, the farther it is from 0.
3) Directionality: A positive correlation coefficient indicates a positive relationship, meaning that as one variable increases, the other variable increases. A negative correlation coefficient indicates a negative relationship, meaning that as one variable increases, the other variable decreases.
4) Non-linearity: Correlation coefficients measure only linear relationships; they may not provide insight into other types of relationships. Two variables with a correlation at or near 0 have little or no linear relationship; they may have no relationship at all or may have another type of relationship. A non-linear relationship may be visible in a scatter plot.

Answer 22

=CORREL(array 1, array 2) * array 1 is a set of numerical values or cell references containing data for one variable of interest. * array 2 is a set of numerical values or cell references containing data for the other variable of interest. * Note that array 1 and array 2 must contain the same number of observations.
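
A minimal Python sketch of the Pearson correlation coefficient that CORREL returns, assuming plain Python lists stand in for the two arrays; `correl` and the sample data are hypothetical. The last example uses y = x squared, a perfect but non-linear relationship, and returns 0, which illustrates the non-linearity property of correlation coefficients.

```python
import math
from typing import Sequence


def correl(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys):
        # Excel's CORREL returns #N/A when the arrays differ in length.
        raise ValueError("both arrays need the same number of observations")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mx, my = sum(xs) / n, sum(ys) / n
    # Co-movement of the two variables around their means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # Excel's CORREL returns #DIV/0! when either variable has no variation.
        raise ZeroDivisionError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


print(f"{correl([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]):.2f}")  # 1.00
print(f"{correl([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]):.2f}")  # -1.00
print(f"{correl([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]):.2f}")  # 0.00
```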

Answer 23

A variable that is correlated with each of two variables that are not fundamentally related to each other. That is, there is no reason to think that a change in one of the two variables will lead to a change in the other; in fact, the correlation between them may seem surprising until the hidden variable is considered. Although there is no direct relationship between the two variables, they are mathematically correlated because each is individually correlated with a third, "hidden" variable. Therefore, for a variable to act as a hidden variable, there must be three variables, all of which are mathematically correlated (either directly or indirectly). For example, consider a correlation between weight gain and grades, driven by the hidden variable worry. Students couldn't just eat more food and expect their grades to improve, nor could they do poorly in their courses on purpose just to lose weight. These two variables are not fundamentally related.

Answer 24

Variables that are affected by one variable and then affect another variable in turn. For example, being worried about grades:
1. may cause a student to study harder, and thus get better grades. We wouldn't consider studying to be a hidden variable linking worry and better grades: those two variables ARE fundamentally related, because the worry leads to the better grades. If students are more worried, they may study harder and get even better grades.
2. may cause a student to stress eat and gain weight. We wouldn't consider eating to be a hidden variable linking worry and weight gain: those two variables ARE fundamentally related, because the worry leads to the weight gain. If students are more worried, they may gain even more weight.

Answer 25

Not an example of a hidden variable. GDP is likely correlated with oil consumption. To determine whether there is a hidden variable, first identify two variables that are not fundamentally related to each other, and then identify a third "hidden" variable that is correlated with each. In this example, what would the two variables be? One would be oil consumption, but no second variable is proposed that is fundamentally unrelated to it.

Answer 26

Example of a hidden variable. In this case, the two variables are the number of traffic lights and the number of crimes. The third variable, population, is related to both. Population is related to traffic lights: higher populations lead to more traffic, which in turn creates the need for more lights. Population is also related to the number of crimes: even if the crime rate holds constant, as the population increases, the number of criminals, and thus the number of crimes, increases. Traffic lights, however, do not lead to crime, or vice versa.

Answer 27

Not an example of a hidden variable. Here there are not two correlated variables; there is only one: hot dog sales. Although dietary habits may be "hidden" from the researchers in a conversational sense, they are not a hidden variable in the statistical meaning of the term.

Answer 28

Not an example of a hidden variable. Although the weather is probably correlated with the increase in same-day delivery, it is not related to the discount, and so it does not function as a hidden variable linking same-day delivery and the discount.

Answer 29

Example of a hidden variable. In this case, the first two variables are acne and music volume. The third variable, age, is related to both. Age is related to acne, with acne decreasing once a person passes adolescence. Age is also related to music volume, with younger people tending to listen to louder music. Loud music does not lead to acne, or vice versa.

Answer 30

A data set in which one of the variables is time. Time series data contain data about a given subject in temporal order, measured at regular time intervals (e.g. minutes, months, or years). Managers collect and analyze time series to identify trends and predict future outcomes.

Answer 31

Cross-sectional data contain data that measure an attribute across multiple different subjects (e.g. people, organizations, countries) at a given moment in time or during a given time period. Managers use cross-sectional data to compare metrics across multiple groups.

Answer 32

We want to compare the daily sales of stores in a mall during a day-long, mall-wide event. (Cross-Sectional) Since we are interested in the sales of different stores on a single day (a single point in time), we should analyze a cross-section of the stores in the mall.
We want to see if the Red Sox's performance changes over the course of the baseball season. (Time Series) Since we are interested in comparing the Red Sox's performance at different points in time during the baseball season, we should analyze time series data.
We want to know the current average height and weight of citizens in each country that belongs to the European Union. (Cross-Sectional) Since we are interested in the average height and weight of citizens living in different European Union countries at a specific point in time ("currently"), we should analyze a cross-section of citizens.
We want to know if a company's profits have increased after it started advertising more. (Time Series) To determine whether profits have increased during a period of time, we must compare profits over time. Therefore, we should analyze time series data.
We want to compare the final exam scores of students this semester. (Cross-Sectional) Since we are interested in final exam scores for a single point in time (this semester), we should analyze cross-sectional data of this semester's results.
We want to know if rates of dementia in the U.S. have decreased. (Time Series) To determine whether rates of dementia have decreased, we must compare dementia rates over time. Therefore, we should analyze time series data.
