Chapter 1 Flashcards
(52 cards)
Cases
The objects described by a set of data.
Ex. Customers, companies, subjects in a study, stocks
Label
A SPECIAL VARIABLE used in some data sets to distinguish the different cases
Variable
A characteristic of a case–> different cases can have different values of a variable
Observation
Describes the data for a particular case
Categorical Variable
Places a case into one of several groups or categories
Displayed with bar graphs, pie charts, and Pareto charts
Quantitative Variable
Takes numerical values for which arithmetic operations, such as adding and averaging, make sense
Statistical Software
In some statistical software, spaces are not allowed in variable names–> use an underscore instead
Ordered Categorical Variable
A categorical variable whose categories have a natural order
Ex. Possible values for a grade (A, B, C, D, etc.), because A is better than B, which is better than C, and so on
Nominal Variable
A categorical variable that is not ordered
Instruments
Different areas of application (such as marketing) can have their own special variables–> these variables are measured with instruments
Rate
Computing a rate is one of several ways of adjusting one variable to create another–> sometimes more meaningful than a count
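A minimal sketch of adjusting a count into a rate. The store names and figures are made up for illustration and are not from the chapter:

```python
# Hypothetical data: complaints received and customers served per store.
complaints = {"Store A": 30, "Store B": 45}
customers = {"Store A": 1000, "Store B": 3000}

# Rate = count divided by the size of the group it came from.
complaint_rate = {
    store: complaints[store] / customers[store] for store in complaints
}

for store, rate in complaint_rate.items():
    print(f"{store}: {complaints[store]} complaints, rate = {rate:.1%}")
```

Store B has the larger count, but Store A has the higher rate, which is why the rate can be the more meaningful variable.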
Distribution
Describes how the values of a variable vary from case to case
Pareto Chart
A bar graph whose categories are ordered from MOST frequent–> LEAST frequent, so the most important categories of a categorical variable stand out
Ex. frequently used in quality control settings
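A small sketch of the ordering behind a Pareto chart, assuming a hypothetical defect log (the category names are invented for illustration):

```python
from collections import Counter

# Hypothetical defect log from a quality-control check.
defects = ["scratch", "dent", "scratch", "misprint", "scratch", "dent"]

counts = Counter(defects)

# most_common() lists categories from most frequent to least frequent,
# which is the left-to-right order of the bars in a Pareto chart.
pareto_order = counts.most_common()

for category, count in pareto_order:
    print(f"{category:<10}{'#' * count} ({count})")
```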
Histogram
The most common graph of the distribution of a quantitative variable, where we group nearby values into classes–> for small data sets a stemplot can be used
How can you describe the overall pattern of a histogram?
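A rough sketch of how values are grouped into classes for a histogram. The scores and the class width of 10 are assumptions chosen for illustration:

```python
from collections import Counter

# Hypothetical quantitative data (e.g., exam scores).
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 93]
class_width = 10  # assumed class width

# Map each value to the lower bound of its class, e.g. 74 -> 70.
class_counts = Counter((score // class_width) * class_width for score in scores)

# Print one text bar per class, from the lowest class to the highest.
for lower in sorted(class_counts):
    upper = lower + class_width - 1
    print(f"{lower}-{upper}: {'*' * class_counts[lower]}")
```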
You can describe the overall pattern of a histogram by its SHAPE, CENTER, and SPREAD
Outlier
The most important type of deviation–> an individual value that falls outside the overall pattern
When is a distribution symmetric?
If the right and left sides of the histogram are mirror images of each other
Skewed to the right
If the right side of the histogram extends much farther out than the left side (skewed to the left is the reverse)
Positively skewed
Data that skews to the right–> positive skewness is the MOST common type of skewness that we see in real data
Time plot
Plots each observation against the time it was measured–> time on the horizontal scale and the variable you are measuring on the vertical scale
Mean
The ordinary arithmetic average, the most common measure of center–> NOT a resistant measure of center because it can be influenced by outliers
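A short sketch of why the mean is not resistant, using Python's standard `statistics` module. The salary figures are hypothetical:

```python
from statistics import mean, median

salaries = [40, 42, 45, 47, 50]   # hypothetical values, in $1000s
with_outlier = salaries + [400]   # the same data plus one extreme value

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

One extreme value pulls the mean far from the bulk of the data, while the median barely moves.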
Median
The midpoint of a distribution: the number such that half the observations are smaller and half are larger
Median Odd
The center observation of the ordered list, located (N+1)/2 observations up from the bottom
Median Even
The mean of the two middle observations in the ordered list
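The odd and even rules can be sketched as one function. The name `median_by_position` is hypothetical, chosen for this illustration:

```python
def median_by_position(values):
    """Return the median using the odd/even position rules."""
    ordered = sorted(values)  # the rules apply to the ordered list
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # Odd n: take the observation (n + 1) / 2 up from the bottom.
        position = (n + 1) // 2
        return ordered[position - 1]  # convert 1-based position to 0-based index
    # Even n: take the mean of the two middle observations.
    upper = n // 2
    return (ordered[upper - 1] + ordered[upper]) / 2


print(median_by_position([5, 1, 3]))
print(median_by_position([4, 1, 3, 2]))
```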