QMB 3200 Flashcards

(110 cards)

1
Q

Data

A

the facts & figures collected, analyzed, and summarized for presentation and interpretation.

2
Q

Dataset

A

all the data collected for a particular analysis.

3
Q

Element

A

the entity on which data is collected.

4
Q

Variable

A

a characteristic of interest of an element.

5
Q

Observation

A

the variables associated with an individual element.

6
Q

Categorical

A

use labels or names to identify categories of like items; the scale of measurement is nominal or ordinal.

7
Q

Quantitative

A

use numeric values that indicate how much or how many.

8
Q

What does the type of statistical analysis depend on?

A

The type of statistical analysis depends on whether the variable is categorical or quantitative.

9
Q

Cross Sectional

A

data collected at the same (or approximately the same) point in time.

10
Q

Time Series

A

data collected over several time periods.

11
Q

Panel

A

combination of cross-sectional and time series data.

12
Q

Descriptive Statistics

A

tabular, graphical, and numerical summaries used to describe data or variables.

13
Q

Population

A

is the set of all elements (all the data) of interest in a statistical analysis.

14
Q

Sample

A

is a subset of the population.

15
Q

Statistical Analysis

A

uses data from a sample to make estimates and test hypotheses about the characteristics of a population.

16
Q

Row

A

the first row of a dataset contains the variable names.

17
Q

Column

A

the first column of a dataset identifies the elements.

18
Q

Average Formula

A

=average(A:A)

19
Q

Median Formula

A

=median(A:A)

20
Q

Analytics

A

is the scientific process of transforming data into insight for making better decisions.

21
Q

Descriptive Analytics

A

analytical techniques that describe what has happened in the past.

22
Q

Predictive Analytics

A

uses statistical models built from past data to predict the future [forecasting] or to assess the impact of one variable on another [inference].

23
Q

Prescriptive Analytics

A

uses models seeking to find a best (optimal) solution. Often these are some type of optimization model.

24
Q

Differences between Data and Big Data

A

Volume – the number of observations.
Velocity – the speed at which data is collected.
Variety – the different types and forms of data collected.
Veracity – the reliability of the data generated.
*The focus is on extracting predictive information from big data.

25
Frequency Distribution

a tabular summary of data showing the number (i.e., frequency) of observations in each of several non-overlapping categories.
26
Relative Frequency
= frequency of the class / n, where n is the total number of observations.
27
Percent Frequency
= relative frequency * 100
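A minimal sketch of how these three summaries relate, in Python with a small set of hypothetical categorical responses:

    from collections import Counter

    responses = ["Coke", "Pepsi", "Coke", "Sprite", "Coke", "Pepsi"]  # hypothetical data
    n = len(responses)
    for drink, freq in Counter(responses).items():   # frequency of each class
        rel = freq / n                                # relative frequency = frequency of the class / n
        pct = rel * 100                               # percent frequency = relative frequency * 100
        print(drink, freq, round(rel, 3), pct)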
28
Bar Chart
a visual display of frequency, relative frequency, and percent frequency distributions for categorical data (to compare two variables, see the side-by-side bar chart).
29
Pie Chart
a visual display of frequency, relative frequency, and percent frequency distributions, with each category shown as a slice of a circle proportional to its relative frequency.
30
With quantitative data, the classes of a frequency distribution are defined by:
a. determining the number of non-overlapping classes; b. determining the width of each class; c. determining the class limits.
31
Number of Classes
Typically between 5 and 20. Smaller datasets use fewer classes; larger datasets use more.
32
Width of the Class
Generally, it should be the same for each class. Approximate class width = (largest data value – smallest data value)/number of classes.
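A quick arithmetic check of the class-width rule, assuming a hypothetical dataset and a chosen class count of 5:

    data = [12, 15, 22, 27, 31, 34, 40, 44, 51, 58]   # hypothetical quantitative data
    num_classes = 5                                    # chosen from the 5-to-20 guideline
    width = (max(data) - min(data)) / num_classes      # (largest - smallest) / number of classes
    print(width)                                       # 9.2, which would usually be rounded up to 10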
33
Class Limits
each data observation must only belong to one class.
34
Relative Frequency Distributions
= frequency of the class/n.
35
Histogram
A visual display of a frequency, relative frequency or percent frequency distribution, where the variable of interest is on the horizontal axis and the frequency, relative frequency or percent frequency is on the vertical axis. * Shows the shape of the distribution of the variable of interest.
36
Cumulative Distribution
Presents the number of data items with values less than or equal to the upper class limit for each class.
37
Cumulative Relative Frequency Distribution
Shows the proportion of data items with values less than or equal to the upper limit of each class.
38
Cumulative Percent Frequency Distribution
Shows the percentage of data items with values less than or equal to the upper limit of each class.
39
Crosstabulation
a tabular summary of data for two variables (either categorical or quantitative)
40
Scatter Diagram & Trendline
A scatter diagram is a graphical display of the relationship between two quantitative variables and a trendline provides an approximation (i.e. an estimate) of the relationship; which can be positive, negative or none.
41
Side-by-Side Bar Chart

Depicts multiple bar charts on the same display.
42
Stacked Bar Chart 

Has one bar broken into segments of a different color showing the relative frequency of each class.
43
Mean(average)

is the average value of a variable; the sample mean is denoted x̄ and the population mean is denoted μ.
44
Sample Mean
x̄ = Σxᵢ / n, where Σ means to sum (add up) all the xᵢ's. For a variable, the first observation is x₁, the second is x₂, the i-th is xᵢ, and n is the number of observations.
45
Median
is the value in the middle when the data are arranged in ascending order. When the number of observations is odd, the median is the middle value; when it is even, the median is the average of the two middle values. The median avoids problems caused by extremely high or low values of x.
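A small Python illustration of why the median resists extreme values, using hypothetical salary figures (in thousands):

    from statistics import mean, median

    salaries = [48, 50, 52, 55, 250]   # hypothetical data with one extreme value
    print(mean(salaries))              # 91.0 -- the mean is pulled toward the extreme value
    print(median(salaries))            # 52   -- the middle value is unaffected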
46
Mode
is the value that occurs with the greatest frequency. If two values are most frequent, the variable is bimodal; if more than two, it is multimodal.
47
Mode Formula
=mode.sngl
48
Weighted Mean
used when observations have different weights (relative importance).
49
Percentile
provides information about how the data is spread over the interval from the smallest to the largest value. The pth percentile divides the data into two parts – approximately p% of the observations are less than the pth percentile and approx. (100-p)% are greater.
50
Location of the pth percentile

Lₚ = (p / 100)(n + 1)
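A sketch of this location formula, plus the interpolation step it implies (the same convention PERCENTILE.EXC follows), with hypothetical data:

    data = sorted([20, 25, 28, 34, 41, 45, 52, 60])   # hypothetical data, n = 8
    p = 75
    Lp = p / 100 * (len(data) + 1)                     # Lp = (p/100)(n + 1) = 6.75
    lower, upper = data[int(Lp) - 1], data[int(Lp)]    # 6th and 7th values: 45 and 52
    pth_percentile = lower + (Lp - int(Lp)) * (upper - lower)
    print(Lp, pth_percentile)                          # 6.75 and 50.25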
51
Quartiles
represent how the data are spread over four parts, each containing approximately 25% of the observations. Q1 is the first quartile (25th percentile); Q2 is the second quartile (50th percentile, the median); Q3 is the third quartile (75th percentile). The calculation is the same as for a percentile, using only p = 25, 50, and 75.
52
Percentile Formula
=percentile.exc( array, k)
53
Quartile Formula
=quartile.exc( array, quart )
54
Measures of variability or dispersion of the data
Range: largest value - smallest value. Interquartile Range: Q3 - Q1, the range of the middle 50% of the data. Variance: measures variability using all the data; it is based on the difference between each value xᵢ and the mean. This difference is called a deviation about the mean: for a sample it is xᵢ - x̄, and for a population it is xᵢ - μ. Each deviation is then squared.
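A numerical check of the deviation-about-the-mean idea, using Python's statistics module and hypothetical sample values:

    from statistics import mean, variance, stdev

    x = [4, 7, 9, 12, 18]                                  # hypothetical sample
    xbar = mean(x)                                         # 10
    deviations = [xi - xbar for xi in x]                   # deviations about the mean
    s2 = sum(d ** 2 for d in deviations) / (len(x) - 1)    # sample variance: squared deviations / (n - 1)
    print(s2, variance(x))                                 # both 28.5
    print(stdev(x))                                        # sample standard deviation = sqrt(28.5)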
55
Benefit of Variance
It is useful for comparing the variability of two or more variables; one difficulty is the units, since the variance is expressed in squared units.
56
Standard Deviation
The population standard deviation: σ = √σ². The sample standard deviation: s = √s². The advantage is that the units are no longer squared, making it easier to compare the result to the mean and to other statistics.
57
Variance Formula
=var.s (A:A)
58
Standard Deviation Formula
=stdev.s (A:A)
59
Coefficient of Variation

A measure of how large the standard deviation is relative to the mean. Coefficient of Variation = (s / x̄ * 100)%.
60
Distribution Shape

is measured by skewness. If the shape of the data is skewed to the left, the skewness is negative; if to the right then skewness is positive; and if the data is symmetric, then skewness is zero.
61
Symmetric Distribution
the mean and median are equal.
62
Positive Skew Distribution
the mean is usually greater than the median
63
Negative Skew Distribution
the mean is usually less than the median.
64
Skewness Formula

=SKEW(A:A)
65
Measure of Relative Location
Measures the relative location of values in the dataset. This helps determine how far a particular value is from the mean.
66
Z-Score
The z-Score yields a standardized value and is the number of standard deviations from the mean. The z-Score for any observation is a measure of the relative location of the observation in the dataset.
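A minimal z-score sketch in Python, with hypothetical exam scores:

    from statistics import mean, stdev

    scores = [62, 68, 71, 74, 85]                   # hypothetical sample
    xbar, s = mean(scores), stdev(scores)
    z = [(x - xbar) / s for x in scores]            # number of standard deviations from the mean
    print([round(zi, 2) for zi in z])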
67
Chebyshev's Theorem

Allows us to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean: at least (1 - 1/z²) of the data values must be within z standard deviations of the mean (for z > 1).
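Plugging a few values of z into the bound shows the guaranteed proportions:

    for z in (2, 3, 4):
        print(z, 1 - 1 / z ** 2)   # at least 75%, ~88.9%, and 93.75% of values lie within z standard deviations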
68
Chebyshev's Theorem Advantage

It applies to any dataset, regardless of the shape of the distribution (the empirical rule, by contrast, requires data that are bell-shaped around the mean).
69
Detecting Outliers 

Outliers are extreme values relative to the rest of the data. The z-score can help identify outliers: typically, any observation with a z-score greater than +3 or less than -3 is treated as an outlier. Alternatively, we can use the interquartile range, with lower limit Q1 - 1.5(IQR) and upper limit Q3 + 1.5(IQR); values outside these limits are outliers.
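A rough sketch of the IQR rule in Python with hypothetical data; the quartiles here are taken without interpolation, which is a simplification of the percentile formula above:

    data = sorted([5, 7, 8, 9, 10, 11, 12, 13, 40])        # hypothetical data; 40 looks extreme
    n = len(data)
    q1 = data[int(25 / 100 * (n + 1)) - 1]                 # location 2.5, truncated to the 2nd value: 7
    q3 = data[int(75 / 100 * (n + 1)) - 1]                 # location 7.5, truncated to the 7th value: 12
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr          # limits: -0.5 and 19.5
    print([x for x in data if x < lower or x > upper])     # [40] is flagged as an outlier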
70
Covariance
is a descriptive measure of the linear association between two variables.
71
Covariance Formula
=COVARIANCE.S(A:A, B:B) or =COVARIANCE.S(array1, array2)
72
Descriptive Measures

Two descriptive measures of the relationship between two variables are covariance and correlation.
73
Interpreting Covariance

If sxy > 0, there is a positive linear association between x and y; if sxy < 0, there is a negative linear association. Note: the magnitude of the covariance depends on the units of measurement.
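A small check of both descriptive measures, assuming Python 3.10+ (which provides covariance and correlation in the statistics module) and hypothetical paired data:

    from statistics import covariance, correlation

    x = [2, 4, 6, 8, 10]                   # hypothetical paired data
    y = [3, 7, 5, 11, 14]
    print(covariance(x, y))                # sample covariance (positive here), like =COVARIANCE.S
    print(correlation(x, y))               # correlation coefficient, like =CORREL; unit-free, between -1 and +1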
74
Correlation Coefficient Formula
=correl
75
Probability
A numerical measure of the likelihood of an event occurring. A probability ranges from 0 to 1, such as the probability it will rain tomorrow.
76
Permutations
A counting rule for computing the number of experimental outcomes when n objects are selected from a set of N objects and the order of selection is important.
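A quick count using Python's math module, with hypothetical N and n:

    from math import comb, perm

    N, n = 5, 3                  # hypothetical: select 3 objects from a set of 5
    print(perm(N, n))            # 60 ordered selections: N! / (N - n)!
    print(comb(N, n))            # 10 unordered selections (combinations), shown for contrast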
77
Basic Requirements Assigning Probabilities
The probability assigned to each experimental outcome must be between 0 and 1, inclusively. The sum of the probabilities for all experimental outcomes must be equal to 1.
78
3 Methods Assigning Probabilities
Classical Method – used when outcomes are equally likely, such as a coin toss or the roll of a fair 6-sided die. Relative Frequency Method – used when data are available to estimate the proportion of the time the experimental outcome will occur if the experiment is repeated a large number of times. Subjective Method – used when outcomes are not equally likely and data are unavailable.
79
Events
a collection of sample points
80
Probability of an Event
is equal to the sum of the probabilities of the sample points in the event.
81
Complement of Event A 

(A^c) is the event consisting of all sample points that are not in Event A.
82
Union of 2 Events
is the event containing all sample points belonging to Event A, Event B or both. The union of Event A and Event B is denoted by: 𝐴 ∪ 𝐵.
83
Intersection of 2 Events
is the event containing the sample points belonging to both A and B. Intersection is denoted by: 𝐴 ∩ 𝐵
84
Addition Law
Useful when we want to know the probability that at least one of two events occurs. The addition law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
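A one-line check with hypothetical probabilities:

    P_A, P_B, P_A_and_B = 0.40, 0.35, 0.15     # hypothetical values
    print(P_A + P_B - P_A_and_B)               # P(A or B) = 0.60; the overlap is subtracted so it is counted once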
85
Mutually Exclusive Events
occur when two events have no sample points in common. Addition Law for mutually exclusive events: 𝑃 (𝐴 ∪ 𝐵) = P (A) + P(B)
86
Marginal Probabilities
the sums of the joint probabilities across each row and down each column of a joint probability table.
87
Conditional Probability
P(A|B) = P(A ∩ B) / P(B), or P(B|A) = P(A ∩ B) / P(A)
88
Independent Events
Event A and Event B are independent if P(A|B) = P(A) or P(B|A) = P(B).
89
Multiplication Law

Used to compute the probability of the intersection of two events: P(A ∩ B) = P(B) * P(A|B), or P(A ∩ B) = P(A) * P(B|A). For independent events: P(A ∩ B) = P(A) * P(B).
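A small sketch tying the multiplication law back to conditional probability, with hypothetical values:

    P_B = 0.50                                 # hypothetical values
    P_A_given_B = 0.30
    P_A_and_B = P_B * P_A_given_B              # multiplication law: P(A and B) = P(B) * P(A|B) = 0.15
    print(P_A_and_B / P_B)                     # recovers the conditional probability P(A|B) = 0.30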
90
Discrete Random Variables
may assume either a finite number of values or an infinite sequence of values such as 0, 1, 2, ...
91
Expected Value (or mean)

The expected value of a random variable is a measure of its central location.
92
Variance
measures the variability or dispersion of the random variable
93
Standard Deviation

is the positive square root of the variance.
94
Binomial Probability Distribution
1. The experiment consists of a sequence of n identical trials. 2. Two outcomes are possible on each trial: success or failure. 3. The probability of success (p) and the probability of failure (1 - p) do not change from trial to trial. 4. The trials are independent.
95
Binomial Distribution Formula 

=Binom.Dist
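A sketch of the binomial probability function that =BINOM.DIST evaluates, with hypothetical n, p, and x:

    from math import comb

    n, p, x = 10, 0.3, 4                                  # hypothetical: 10 trials, p = 0.3, exactly 4 successes
    prob = comb(n, x) * p ** x * (1 - p) ** (n - x)       # binomial probability f(x)
    print(round(prob, 4))                                 # ~0.2001, the non-cumulative (FALSE) case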
96
Poisson Probability Distribution

This distribution is used to estimate the number of occurrences over a specified interval of time or space.
97
Properties of a Poisson Experiment
1. The probability of an occurrence is the same for any two intervals of equal length. 2. The occurrence or non-occurrence in any interval is independent of the occurrence or non-occurrence in any other interval. Note: in the Poisson distribution, the mean and variance are equal.
98
Poisson Probability Formula
=POISSON.DIST(x, μ, TRUE/FALSE)
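A sketch of the Poisson probability function behind this formula, with a hypothetical mean of 3 occurrences per interval:

    from math import exp, factorial

    mu, x = 3.0, 5                                 # hypothetical: mean of 3 occurrences, P(exactly 5)
    prob = mu ** x * exp(-mu) / factorial(x)       # Poisson probability f(x)
    print(round(prob, 4))                          # ~0.1008, the non-cumulative (FALSE) case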
99
Hypergeometric Probability Distribution
is similar to the binomial distribution, except the trials are not independent and the probability of success changes from trial to trial. NOTE: r is the number of successes in population N, and N-r is the number of failures.
100
Hypergeometric Probability Distribution Formula
=HYPGEOM.DIST(x, n, r, N, TRUE/FALSE)
101
Difference Between Discrete and Continuous Random Variables

For discrete random variables, probabilities are computed for the random variable taking on a specific value; for continuous random variables, probabilities are computed for the random variable falling within an interval.
102
Exponential Probability Distribution

This distribution is useful for a random variable measuring the time between occurrences (e.g., the time between arrivals). The exponential probability density function is f(x) = (1/μ)e^(-x/μ) for x ≥ 0.
103
Exponential Probabilities Formula
=EXPON.DIST(x, λ, TRUE/FALSE), where λ = 1/μ
104
Characteristics of the Normal Distribution
1. Only two parameters: μ and σ. 2. The highest point is at the mean, which is also the median and the mode. 3. The mean can take on any numerical value. 4. The normal distribution is symmetric; skewness = 0. 5. The standard deviation determines how flat or wide the normal curve is; larger standard deviations result in wider, flatter curves. 6. Probabilities for a normal random variable are given by the area under the normal curve; the total area under the curve equals 1.
105
Normal Distribution Formula
=NORM.DIST(x, μ, σ, TRUE/FALSE)
106
Inverse Norm Formula

=Norm.Inv
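A minimal Python counterpart to these two formulas, using statistics.NormalDist with hypothetical μ and σ:

    from statistics import NormalDist

    dist = NormalDist(mu=100, sigma=15)            # hypothetical mean and standard deviation
    print(round(dist.cdf(120), 4))                 # P(X <= 120), the cumulative case like NORM.DIST(..., TRUE)
    print(round(dist.inv_cdf(0.95), 2))            # 95th percentile, like NORM.INV(0.95, 100, 15)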
107
Standard Normal Probability Distribution
The normal distribution with mean μ = 0 and standard deviation σ = 1.
108
Standard Normal Distribution Formula
=NORM.S.DIST(z, TRUE)
109
Outliers Formula
=QUARTILE.EXC(array, quart)
110
Poisson Probabilities Formula
=POISSON.DIST(x, μ, TRUE/FALSE)