Mid-term Flashcards

(43 cards)

1
Q

What is data?

A

The facts and figures collected, analyzed, and summarized for presentation and
interpretation.

2
Q

What is quantitative data?

A

Data are considered quantitative if they are numeric and arithmetic operations, such as addition,
subtraction, multiplication, and division, can be performed on them.

3
Q

What is categorical data?

A

Data are categorical if arithmetic operations cannot be performed on them.

Categorical data are separated into groups or classes, as in a Venn diagram.

4
Q

Variable

A

A variable is a characteristic or a quantity of interest that can take on different values.

5
Q

What is variation?

A

Variation is the difference in a variable measured over observations.

6
Q

What is cross-sectional data?

A

Cross-sectional data are data collected from several entities at the same, or approximately the same, point in time.

7
Q

What is time series data?

A

Time series data are data collected over several time periods.

8
Q

What is random sampling?

Give an example.

A

Random sampling is a sampling method that allows gathering a representative sample
from the population data.

Example: age, education, location, income

9
Q

Sample data

A

A sample is a subset of the population. The population consists of all the elements of interest.

Working from a sample instead of the whole population leads to uncertainty.

Example: What is your sample? (816 voters)

10
Q

Experimental study

A

In an experimental study, a variable of interest is first identified.

Then, one or more other variables are identified and controlled or manipulated to obtain data about how they influence the variable of interest.

11
Q

Statistical data

A

Data necessary to analyze a business problem can often be obtained with a statistical study.

Statistical studies can be classified as experimental or observational.

12
Q

Observational or Nonexperimental Study

A

An observational study does not attempt to control the variables of interest.

A survey is perhaps the most common type of observational study.

13
Q

Frequency distribution

A

A frequency distribution is a summary of data showing the number (frequency) of observations
in several nonoverlapping classes, typically referred to as bins.

Example: 50 soft drink purchases are distributed over 5 types of soft drink.
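
A minimal sketch of building a frequency distribution for categorical data. The `purchases` list is made-up data for illustration, not the 50-purchase data set the card refers to.

```python
from collections import Counter

# Hypothetical soft drink purchases (illustrative data only).
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Dr. Pepper", "Sprite", "Coke", "Diet Coke"]

# Each distinct drink is a bin; the count is its frequency.
frequency = Counter(purchases)

for drink, count in frequency.most_common():
    print(f"{drink:12s} {count}")
```

The frequencies always add up to the number of observations, which is a quick check that no observation was dropped or double counted.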

14
Q

Relative Frequency

State the equation.

A

For a data set with n observations, the relative frequency of each bin can be determined as follows:

Relative frequency of a bin = frequency of the bin / n
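
The formula can be sketched in a few lines of Python. The `purchases` list is hypothetical data chosen only to illustrate the calculation.

```python
from collections import Counter

# Hypothetical observations (illustrative data only).
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Dr. Pepper", "Sprite", "Coke", "Diet Coke"]

n = len(purchases)  # number of observations

# Relative frequency of a bin = frequency of the bin / n
relative_frequency = {drink: count / n
                      for drink, count in Counter(purchases).items()}

for drink, rel in relative_frequency.items():
    print(f"{drink:12s} {rel:.2f}")
```

The relative frequencies sum to 1 (up to floating-point rounding).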

15
Q

Relative frequency distribution

A

A relative frequency distribution is a tabular summary of data (any summary that uses a table) showing the relative frequency for each bin.

16
Q

Percent frequency distribution

A

A percent frequency distribution summarizes the percent frequency of the data for each bin. The percent frequency of a bin is its relative frequency multiplied by 100.
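
As a rough illustration, a percent frequency distribution can be derived from bin counts as below. The counts in `frequency` are assumed values that total 50, used only as an example.

```python
# Hypothetical bin counts totalling 50 observations (illustrative only).
frequency = {"Coke": 19, "Diet Coke": 8, "Dr. Pepper": 5,
             "Pepsi": 13, "Sprite": 5}

total = sum(frequency.values())

# Percent frequency = (frequency / total) * 100
percent_frequency = {drink: 100 * count / total
                     for drink, count in frequency.items()}

for drink, pct in percent_frequency.items():
    print(f"{drink:12s} {pct:5.1f}%")
```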

17
Q

Bin Width formula

A

BW = (largest data value − smallest data value) / number of bins (always round up)

Bin limits, where MIN is the smallest data value and UB is the upper bound of the previous bin:

(MIN, MIN+BW), (UB1, UB1+BW), (UB2, UB2+BW), ...
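
A small sketch of the bin width calculation and the resulting bin limits. The data values and the choice of 5 bins are assumptions made for the example.

```python
import math

# Hypothetical quantitative data (illustrative only).
data = [12, 15, 21, 22, 27, 33, 18, 14, 19, 23]
num_bins = 5  # assumed number of bins

# BW = (largest - smallest) / number of bins, rounded up.
bin_width = math.ceil((max(data) - min(data)) / num_bins)

# Each bin starts at the upper bound of the previous one.
edges = [min(data) + i * bin_width for i in range(num_bins + 1)]
bins = list(zip(edges[:-1], edges[1:]))

print("bin width:", bin_width)
print("bins:", bins)
```

Here (33 − 12) / 5 = 4.2, which rounds up to a bin width of 5, so the bins run from (12, 17) to (32, 37).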

18
Q

What kind of data does a histogram use?

A

Quantitative data. A histogram is a common graphical presentation of quantitative data.

19
Q

What is a histogram?

A

A histogram is a column chart with no spaces between the columns, whose heights represent the frequencies of the corresponding bins.

20
Q

Frequency polygon

A

A frequency polygon is useful for comparing quantitative distributions.

A frequency polygon uses lines to connect the frequency counts of observations from
different bins.

21
Q

Cumulative Frequency distributions

A

A cumulative frequency distribution is a variation of the frequency distribution that provides another tabular summary of quantitative data.

  • It uses the number of classes, class widths, and
    class limits developed for the frequency
    distribution.
  • It shows the number of data items with values less
    than or equal to the upper class limit of each class.
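
A brief sketch of turning a frequency distribution into a cumulative one. The upper class limits and frequencies are made-up values for illustration.

```python
from itertools import accumulate

# Hypothetical upper class limits and class frequencies (illustrative only).
upper_limits = [17, 22, 27, 32, 37]
frequencies = [3, 3, 2, 1, 1]

# Running total: number of items less than or equal to each upper limit.
cumulative = list(accumulate(frequencies))

for limit, cum in zip(upper_limits, cumulative):
    print(f"<= {limit}: {cum}")
```

The last cumulative frequency equals the total number of observations.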
22
Q

Skewness

A

Skewness, or lack of symmetry, is an important characteristic of the shape of a distribution.

Skewed left = the tail extends to the left; the histogram is higher at the right and lower at the left
Skewed right = the tail extends to the right; the histogram is higher at the left and lower at the right
Symmetrical = the histogram looks like a pyramid

A distribution can be highly or moderately skewed, depending on how pronounced the skew is.

23
Q

Mean

equation

A

The most common measure of central location is the mean, the average of all the data values.
The population mean is denoted by the Greek letter 𝜇.

Mean (average) = sum of all data values / number of data values
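
A quick sketch of the mean calculation, using a small made-up data set.

```python
import statistics

# Hypothetical data values (illustrative only).
data = [46, 54, 42, 46, 32]

# Mean = sum of all data values / number of data values
mean_by_hand = sum(data) / len(data)

# The standard library gives the same result.
mean_builtin = statistics.mean(data)

print(mean_by_hand, mean_builtin)
```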

24
Q

Mode

A

The mode of a data set is the value that occurs with the greatest frequency (the value seen most often).

The greatest frequency may occur at two or more different values. In these instances, more
than one mode exists.
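
A short sketch of finding the mode or modes, assuming Python 3.8 or later for `statistics.multimode`. Both data sets are invented for the example.

```python
import statistics

# Hypothetical data sets (illustrative only).
one_mode = [1, 2, 2, 3, 4]
two_modes = [1, 2, 2, 3, 3, 4]

# multimode returns every value tied for the greatest frequency.
print(statistics.multimode(one_mode))
print(statistics.multimode(two_modes))
```

A result with two values corresponds to a bimodal data set; more than two corresponds to a multimodal one.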

25
Bimodal
Data set has exactly 2 modes
26
Multimodal
Data has more than 2 modes
27
Median
The median is the value in the middle of a data set when the data are arranged in ascending order.
a) If n is odd, the median is the middle value.
b) If n is even, the median is the average of the two middle values.
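
Both cases can be sketched as follows. The `median` helper and the data are written for this example only.

```python
def median(values):
    """Median of a list: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n is odd: the single middle value
        return ordered[mid]
    # n is even: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [3, 1, 4, 1, 5]        # hypothetical, n = 5
even_data = [3, 1, 4, 1, 5, 9]    # hypothetical, n = 6

print(median(odd_data), median(even_data))
```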
28
What would a perfectly Symmetrical distribution look like?
The histogram would look like a pyramid, with the highest frequencies in the middle. The mean would be equal to the median.
29
Geometric Mean | equation
The geometric mean is often used in analyzing growth rates in financial data, where using the arithmetic mean will provide misleading results. It should be applied any time you want to determine the mean rate of change over several successive periods (years, quarters, weeks, etc.).
Equation: geometric mean = nth root of (x1 × x2 × ... × xn), where n is the number of data values (growth factors) multiplied together.
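
A minimal sketch of the geometric mean applied to growth factors, assuming Python 3.8 or later for `math.prod`. The three growth factors are invented: +10%, −5%, and +20% over three periods.

```python
import math

# Hypothetical growth factors for three successive periods (illustrative only).
growth_factors = [1.10, 0.95, 1.20]

n = len(growth_factors)

# Geometric mean = nth root of the product of the n values.
geometric_mean = math.prod(growth_factors) ** (1 / n)

# The arithmetic mean overstates the average growth rate.
arithmetic_mean = sum(growth_factors) / n

print(f"geometric:  {geometric_mean:.4f}")
print(f"arithmetic: {arithmetic_mean:.4f}")
```

Compounding the geometric mean over the same number of periods reproduces the total growth, which is why it is the appropriate average for rates of change.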
30
Range | equation
The simplest measure of variability.
Range = largest value − smallest value
31
Variance and Computation of variance | equation
- The variance is a measure of variability that utilizes all the data.
- In most statistical applications, when we compute a sample variance, we are often interested in using it to estimate the unknown population variance.
Word equation: subtract the mean from each data value, square each difference, add the squares together, then divide by the number of observations minus 1.
Sample variance = [(data 1 − mean)^2 + (data 2 − mean)^2 + ... + (data n − mean)^2] / (n − 1)
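
The sample variance calculation can be sketched step by step. The data set is made up for the example.

```python
import statistics

# Hypothetical sample (illustrative only).
data = [46, 54, 42, 46, 32]

n = len(data)
mean = sum(data) / n

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)

print(sample_variance)
print(statistics.variance(data))  # standard library, also divides by n - 1
```

For this data the mean is 44, the squared deviations are 4, 100, 4, 4, and 144, and their sum of 256 divided by 4 gives a sample variance of 64.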
32
Standard Deviation
Standard deviation = square root of the variance.
It is a measure of how dispersed the data are in relation to the mean. A low, or small, standard deviation indicates data are clustered tightly around the mean, and a high, or large, standard deviation indicates data are more spread out.
33
Coefficient of Variation | equation
The coefficient of variation, usually expressed as a percentage, measures how large the standard deviation is relative to the mean.
Coefficient of variation = (standard deviation / mean) × 100%
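
A short sketch of the standard deviation and coefficient of variation for a sample. The data values are assumed for illustration.

```python
import statistics

# Hypothetical sample (illustrative only).
data = [46, 54, 42, 46, 32]

mean = statistics.mean(data)

# Sample standard deviation: square root of the sample variance.
std_dev = statistics.stdev(data)

# Coefficient of variation, expressed as a percentage.
coefficient_of_variation = std_dev / mean * 100

print(f"standard deviation: {std_dev:.2f}")
print(f"coefficient of variation: {coefficient_of_variation:.1f}%")
```

Because it is relative to the mean, the coefficient of variation can be compared across data sets measured in different units or on different scales.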
34
Percentiles
A percentile is the value below which a specific percentage of a data set falls.
Example: if a value is at the 95th percentile, about 95% of the data are at or below it and only about 5% of the data are higher.
To calculate the 𝑝th percentile of a data set, we must first sort the data in ascending order.
35
Percentiles/Quartiles equation
Example: find the 85th percentile when the sample size is 12.
L85 = (p/100)(n + 1)
L85 = (85/100)(12 + 1)
L85 = 0.85 × 13
L85 = 11.05
This means the 85th percentile is 5% of the way between values 11 and 12.
If value 11 = 298,000 and value 12 = 456,250:
85th percentile = 298,000 + 0.05 × (456,250 − 298,000) = 305,912.50
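
The location method above can be sketched as a small function. The `percentile` helper is written for this example, and the first ten data values are assumptions; only values 11 and 12 come from the card.

```python
def percentile(sorted_data, p):
    """pth percentile using the location L = (p/100)(n + 1)."""
    n = len(sorted_data)
    location = p / 100 * (n + 1)
    position = int(location)          # whole part: which value to start from
    fraction = location - position    # how far toward the next value
    if position < 1:
        return sorted_data[0]
    if position >= n:
        return sorted_data[-1]
    lower = sorted_data[position - 1]  # positions are 1-based
    upper = sorted_data[position]
    return lower + fraction * (upper - lower)

# Sorted sample of 12 values; the first ten are hypothetical.
data = [108000, 138000, 138000, 142000, 186000, 199500,
        208000, 254000, 254000, 257500, 298000, 456250]

print(round(percentile(data, 85), 2))
```

Other statistical packages may use slightly different interpolation rules, so their percentiles can differ a little from this method.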
36
Quartiles
Quartiles are specific percentiles that divide the data set into four parts, with each part containing approximately 25% of the observations. Quartiles use the same formula as percentiles: Q1 is the 25th percentile, Q2 (the median) is the 50th, and Q3 is the 75th.
37
Interquartile range | equation
The difference between the third and first quartiles is often referred to as the interquartile range, or IQR.
IQR = Q3 − Q1
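
A sketch of the quartiles and IQR using `statistics.quantiles` (Python 3.8 or later). Its default `"exclusive"` method places the quartiles at (p/100)(n + 1), the same location rule used for percentiles in this deck. The data are illustrative.

```python
import statistics

# Hypothetical sample of 12 values (illustrative only).
data = [108000, 138000, 138000, 142000, 186000, 199500,
        208000, 254000, 254000, 257500, 298000, 456250]

# n=4 splits the data into four parts and returns Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```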
38
Boxplots
A boxplot, also known as box-and-whisker plot, is a graphical summary of the distribution of data developed from the quartiles for a data set.
39
Outliers: why should they be taken into consideration?
An outlier is an unusually small or unusually large value in a data set. Care should be taken when handling outliers, as they might be:
- an incorrectly recorded data value
40
Covariance
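A measure of the direction of the linear relationship between two variables, for example the returns on 2 assets. It is similar to correlation but gives only the direction of the relationship, not its strength.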
41
Scatter chart
A scatter chart is a graph for analyzing the relationship between two variables. A linear trendline is often added to show the direction of the correlation.
42
Correlation (coefficient)
One limitation of the covariance as a description of the relationship between two variables is that its magnitude depends on the variables’ units of measurement. The correlation coefficient is a standardized measure of linear association between two variables that takes on values between −1 and +1. Values near +1 indicate a strong positive linear relationship, values near −1 indicate a strong negative linear relationship, and values near zero indicate the lack of a linear relationship. The magnitude gives the strength of the relationship and the sign gives its direction.
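
A minimal sketch of the sample covariance and correlation coefficient, showing why the covariance depends on units while the correlation does not. The helper functions and data are written for this example only.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations / (n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    std_x = math.sqrt(sample_covariance(x, x))
    std_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (std_x * std_y)

# Hypothetical paired observations (illustrative only).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]          # rises with x
y_down = [10, 8, 6, 4, 2]        # falls as x rises
y_up_rescaled = [v * 100 for v in y_up]  # same data in smaller units

print(sample_covariance(x, y_up), correlation(x, y_up))
print(sample_covariance(x, y_up_rescaled), correlation(x, y_up_rescaled))
print(sample_covariance(x, y_down), correlation(x, y_down))
```

Rescaling `y` multiplies the covariance by 100 but leaves the correlation unchanged, and reversing the direction of the relationship flips the sign of both.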
43
Bins
The nonoverlapping groupings of data used to create a frequency distribution. Bins for categorical data are also known as classes.