Probability and Statistics Flashcards

(66 cards)

1
Q

It is the study of the collection, analysis, interpretation, presentation, and organization of data.

A

Statistics

2
Q

What are the 2 Types of Data?

A

Qualitative and Quantitative Data

3
Q

It deals with categories or attributes.

Examples: Color of eyes, ethnicity, and brand of ice cream

A

Qualitative Data

4
Q

These are numerical data.

A

Quantitative Data

5
Q

What are the two types of quantitative data?

A

Discrete data and Continuous Data

6
Q

These are data obtained through counting. They can be expressed only as whole numbers.

Examples: number of countries in Southeast Asia, number of courses in a school term

A

Discrete data

7
Q

These are data obtained by measuring. They can take any value within a range, including fractions and decimals.

Examples: Weight, age

A

Continuous data

8
Q

What are the four classifications of data (levels of measurement)?

A
  • Nominal Level
  • Ordinal Level
  • Interval Level
  • Ratio Level
9
Q

It is used for categorical data where the values represent different categories without any inherent order or ranking.

Examples:

  • Gender: Male, Female, Non-Binary
  • Color: Red, Blue, Green
  • Type of Animal: Dog, Cat, Bird
A

Nominal Level

10
Q

It involves categories that can be ordered or ranked based on some criterion. However, the differences between ranks are not necessarily equal or measurable.

Examples:

  • Education Level: High School, Bachelor’s Degree, Master’s Degree, Ph.D.
  • Customer Satisfaction: Poor, Fair, Good, Excellent.
  • Socioeconomic Status: Low, Middle, High.
A

Ordinal Level

11
Q

It involves numerical data with meaningful, equal distances between values, but there is no true zero point.

Examples:

  • Temperature (in °C or °F)
  • IQ Scores
  • Calendar Dates
A

Interval Level

12
Q

It has all the properties of the interval level, plus a true zero point.

Examples:

  • Height
  • Weight
  • Income
A

Ratio Level

13
Q

These are statistical measures used to describe the center or typical value of a data set. They summarize the data by identifying a central point around which the data points tend to cluster.

A

Measures of central tendency

14
Q

What are the three primary measures of central tendency?

A
  • Mean
  • Median
  • Mode
15
Q

It is commonly known as the average. It is the sum of all values in a data set divided by the number of values. It provides a measure of the central location of the data.

A

Mean
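A minimal Python sketch of the mean as defined on this card. The function name `mean` and the sample scores are illustrative assumptions; Python's standard `statistics.mean` does the same job.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


scores = [85, 90, 78, 92, 88]  # hypothetical quiz scores
print(mean(scores))  # 86.6
```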

16
Q

It is the middle value in a data set when it is ordered from smallest to largest.

A

Median

Note:

  • For an odd number of observations, the median is the middle value.
  • For an even number of observations, the median is the average of the two middle values.
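The two rules in the note can be sketched in Python as follows (a hypothetical helper, not a library function; `statistics.median` behaves the same way):

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    if not values:
        raise ValueError("median() needs at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd number of observations: the single middle value
        return ordered[mid]
    # even number of observations: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9]))     # 7
print(median([7, 3, 9, 1]))  # 5.0
```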
17
Q

It is the value(s) that occur most frequently (repeated values) in a data set.

A

Mode

Note:

  • A data set may have no mode, one mode (unimodal), two modes (bimodal), or more than two modes (multimodal).
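A small Python sketch that returns every mode, so the no-mode, unimodal, and bimodal cases from the note all show up. It assumes the convention that a data set has no mode when no value repeats; textbooks differ on edge cases such as every value appearing equally often.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, as a list.

    Assumed convention: if no value repeats, the data set has no mode ([]).
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:  # nothing repeats, so there is no mode
        return []
    return [value for value, count in counts.items() if count == top]


print(modes([3, 5, 5, 7]))                                 # [5]  (one mode)
print(modes(["brown", "blue", "brown", "blue", "green"]))  # ['brown', 'blue']  (bimodal)
print(modes([1, 2, 3]))                                    # []  (no mode)
```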
18
Q

Also known as measures of variability or spread, these describe the extent to which data values in a data set differ from the central value (such as the mean or median). They provide insights into the variability or consistency of the data.

A

Measures of Dispersion

19
Q

It is the difference between the maximum and minimum values in a data set. It gives a basic indication of the spread of the data.

A

Range

Formula: Range = Maximum value − Minimum value
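The range formula as a one-line Python sketch (the helper name `data_range` and the sample values are hypothetical):

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    if not values:
        raise ValueError("data_range() needs at least one value")
    return max(values) - min(values)


print(data_range([12, 5, 20, 8]))  # 15
```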

20
Q

It measures the range within which the central 50% of the data values lie. It is the difference between the third quartile (Q3) and the first quartile (Q1), that is, IQR = Q3 − Q1, and it provides a robust measure of dispersion that is less sensitive to outliers.

A

Interquartile Range
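Several conventions exist for locating quartiles, and they can give slightly different answers. The Python sketch below assumes one common textbook convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when n is odd. The function names are hypothetical.

```python
def _median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles() needs at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]  # skip the middle value when n is odd
    return _median(lower), _median(upper)


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1


print(quartiles([1, 3, 5, 7, 9, 11, 13, 15]))  # (4.0, 12.0)
print(iqr([2, 4, 6, 8, 10, 12, 14]))           # 8
```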

21
Q

It measures the average squared deviation of each data point from the mean. It quantifies how much the data values spread out from the mean. It is useful for understanding the dispersion of the data, but it is expressed in squared units of the original data.

A

Variance
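A Python sketch of the variance. The card describes the population variance (divide by n); for a sample, the sum of squared deviations is usually divided by n − 1 instead, so the sketch exposes both through a hypothetical `sample` flag.

```python
def variance(values, sample=False):
    """Average squared deviation from the mean.

    sample=False divides by n (population variance);
    sample=True divides by n - 1 (sample variance).
    """
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    mu = sum(values) / n
    squared_deviations = sum((x - mu) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data, mean = 5
print(variance(data))               # 4.0
print(variance(data, sample=True))  # about 4.5714 (32 / 7)
```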

22
Q

It is the square root of the variance and provides a measure of dispersion in the same units as the original data. Roughly, it indicates the typical distance of the data points from the mean.

A

Standard Deviation
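Python's standard `statistics` module already provides both versions: `pstdev` (population) and `stdev` (sample, dividing by n − 1). This sketch, on hypothetical data, also checks that squaring the standard deviation gives back the variance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

population_sd = statistics.pstdev(data)  # square root of the population variance
sample_sd = statistics.stdev(data)       # square root of the sample variance (n - 1)

print(population_sd)        # 2.0
print(round(sample_sd, 3))  # 2.138
# The standard deviation is the square root of the variance:
print(math.isclose(population_sd ** 2, statistics.pvariance(data)))  # True
```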

23
Q

Where do we use standard deviation in real life and what does the value represent?

A

One useful application is ranking students by general average. If two students have exactly the same general average and we want to rank them, we compute the standard deviation (SD) of each student's grades: the student with the LOWER SD ranks 1st, and the student with the HIGHER SD ranks 2nd.

The lower the SD, the more CONSISTENT the data are.
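A Python sketch of this tie-breaking rule, using hypothetical grades for two students who share a general average of 90. It assumes the population SD (`statistics.pstdev`); the sample SD gives the same order as long as both students have the same number of grades.

```python
from statistics import mean, pstdev

# Hypothetical grades of two students who share the same general average (90)
grades = {
    "Student A": [90, 90, 90, 90],
    "Student B": [80, 100, 85, 95],
}

# Rank by highest average first; break ties with the LOWER standard deviation.
ranking = sorted(grades, key=lambda name: (-mean(grades[name]), pstdev(grades[name])))

for rank, name in enumerate(ranking, start=1):
    print(rank, name, mean(grades[name]), round(pstdev(grades[name]), 2))
```

Student A (SD 0.0, perfectly consistent) ranks ahead of Student B (SD about 7.91).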

24
Q

A property of a distribution in which the mean is at the center and the two sides of the distribution are mirror images of each other. Most of the data values are found near the mean, tapering off on both sides.

mean = median

A

SYMMETRIC DISTRIBUTION

mean = median

25
The mean and the median are not equal (mean ≠ median).
Asymmetric Distribution
26
  • Most of the data values are found on the left side of the distribution, with the tail extending to the right.
  • The mean is greater than the median. **[mean > median]**
RIGHT-SKEWED DISTRIBUTION
27
  • Most of the data values are found on the right side of the distribution, with the tail extending to the left.
  • The mean is less than the median. **[mean < median]**
LEFT-SKEWED DISTRIBUTION
28
If PC (Pearson's coefficient of skewness) = 0, then the data is ____.
Symmetric (as in a normal distribution).
29
If PC ≥ 1, then the data is ________.
Right-skewed / positively skewed.
30
If PC ≤ −1, then the data is _______.
Left-skewed / negatively skewed.
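The cards do not give the PC formula. Assuming PC is Pearson's median-based coefficient of skewness, PC = 3(mean − median) / s, the formula commonly paired with these ±1 thresholds, a Python sketch looks like this. Using the sample standard deviation for s and the wording for values between the thresholds are both assumptions; the helper names are hypothetical.

```python
import math
import statistics


def pearson_skewness(values):
    """Pearson's coefficient of skewness: PC = 3 * (mean - median) / s."""
    s = statistics.stdev(values)  # sample standard deviation (assumed)
    if s == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return 3 * (statistics.mean(values) - statistics.median(values)) / s


def describe(pc):
    """Apply the thresholds from the cards above."""
    if math.isclose(pc, 0.0, abs_tol=1e-12):
        return "symmetric"
    if pc >= 1:
        return "right-skewed / positively skewed"
    if pc <= -1:
        return "left-skewed / negatively skewed"
    return "between the thresholds on these cards"


print(describe(pearson_skewness([1, 2, 3, 4, 5])))            # symmetric
print(describe(pearson_skewness([1, 1, 1, 1, 2, 2, 3, 20])))  # right-skewed / positively skewed
```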
31
Data in its ___ form can be arranged and organized into tables and graphs.
raw
32
It is the arrangement of raw data into class intervals and their corresponding frequencies.
Frequency distribution table
33
What is the procedure for constructing a frequency distribution table?
  • Step 1: Calculate the range.
  • Step 2: Calculate the class width by dividing the range by the number of class intervals. ROUND UP the result.
  • Step 3: Start the frequency distribution with the lowest value, and add the class width repeatedly to obtain the lower limits of the class intervals.
  • Step 4: Since class intervals cannot overlap, set the upper limit of each class interval just below the lower limit of the next one.
  • Step 5: Count how many of the values fall within each class interval.
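The five steps can be sketched in Python for whole-number data with inclusive class limits (both assumptions). One edge case the steps leave open: when the range divides evenly by the number of classes, the highest value falls past the last class, so this sketch keeps adding classes until the maximum is covered. The function name `frequency_table` is hypothetical.

```python
import math


def frequency_table(values, num_classes):
    """Return a list of (lower limit, upper limit, frequency) for integer data."""
    low, high = min(values), max(values)
    data_range = high - low                                # Step 1: range
    width = max(1, math.ceil(data_range / num_classes))   # Step 2: class width, rounded up
    table = []
    lower = low                                            # Step 3: start at the lowest value
    while lower <= high:                                   # add classes until the maximum is covered
        upper = lower + width - 1                          # Step 4: just below the next lower limit
        freq = sum(1 for v in values if lower <= v <= upper)  # Step 5: count values in the class
        table.append((lower, upper, freq))
        lower += width
    return table


scores = [12, 15, 18, 22, 25, 27, 31, 35, 38, 40]  # hypothetical raw data
for lower, upper, freq in frequency_table(scores, 3):
    print(f"{lower}-{upper}: {freq}")
# 12-21: 3
# 22-31: 4
# 32-41: 3
```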
34
It is a **graph that consists of vertical, rectangular bars** which represent the frequencies of ranges of values. The rectangular bars have no gaps between them.
HISTOGRAM
35
Each data value is divided into two parts: a “stem” and a “leaf”. This is used when the distribution is symmetric.
Stem-and-Leaf Plot
36
It displays the minimum, lower quartile (Q1), median, upper quartile (Q3), and maximum. These values are known as the **five-number summary**. Best used when the data has extreme values.
BOX-AND-WHISKER PLOT
37
It is a continuous probability distribution. This means that it generally uses either interval or ratio data.
Normal distribution
38
It can give a good visual approximation of a normal distribution.
Histogram
39
Drawing a bell-shaped curve over the histogram helps determine whether the data follows a normal distribution. TRUE OR FALSE
TRUE
40
A normal distribution has the following properties:
  • It is a bell-shaped curve.
  • The total area under a normal curve is 1.
  • The tails of the normal curve are asymptotic to the horizontal axis.
  • The curve is symmetric about the mean.
  • It is determined by the population mean and the population standard deviation. The mean controls the center and the standard deviation controls the spread of the distribution.
  • The mean, median, and mode have the same value.
41
The standard normal distribution has the same properties as that of normal distribution except that the mean is 1 and the standard deviation is 0. TRUE OR FALSE
**FALSE**. The mean is 0 and the standard deviation is 1.
42
It is the **variable you manipulate, control**, or vary in an experimental study to explore its effects.
Independent variable
43
It is the variable that is measured and depends on other factors, such as the independent variable.
Dependent variable
44
It is the **study of the relationship between independent and dependent** variables.
Correlation analysis
45
It is used to determine whether there is a **linear relationship between two variables**. Its value ranges from -1 to 1.
Correlation coefficient (r)
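The cards do not give the formula for r. Assuming the usual Pearson product-moment correlation, a Python sketch is below; the helper name `pearson_r` is hypothetical (Python 3.10+ also ships `statistics.correlation`).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy), always between -1 and 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0   perfect positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0  perfect negative
print(pearson_r([1, 2, 3], [1, 3, 2]))        # 0.5   moderate positive
```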
46
If the correlation coefficient r = -1, what does this indicate?
Perfect negative relationship
47
If the correlation coefficient r = 1, what does this indicate?
Perfect positive relationship
48
If the correlation coefficient r = 0, what does this indicate?
No linear relationship
49
A value of r closer to either -1 or 1 means that there is a _______ negative or positive linear relationship. A. weak B. strong
B. strong
50
It is a **visual representation of the linear relationship** between the two variables.
Scatter plot
51
It consists of data on two variables, where each value of one **variable is paired with a value of the other variable**.
Bivariate data
52
It is obtained by simply taking the **square of the correlation coefficient (r)**. It tells how much of the variance in the values of one variable can be explained by the values of the other variable.
Coefficient of Determination (r^2)
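A short worked example, using a hypothetical correlation of r = 0.8:

```latex
r = 0.8 \quad\Rightarrow\quad r^2 = (0.8)^2 = 0.64
```

So about 64% of the variance in one variable can be explained by the values of the other.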
53
This analysis is related to, but slightly different from, linear correlation analysis. Its aim is to **DEVELOP AN EQUATION** that describes the relationship between the variables and predicts future values of the dependent variable from values of the independent variable.
Simple Linear Regression
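A Python sketch of fitting the regression equation y' = a + bx, assuming ordinary least squares (the usual method in introductory courses; the cards do not name one). The function name and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y' = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept
    return a, b


hours = [1, 2, 3, 4, 5]   # hypothetical independent variable
scores = [2, 4, 5, 4, 5]  # hypothetical dependent variable
a, b = fit_line(hours, scores)
print(f"y' = {a:.1f} + {b:.1f}x")                # y' = 2.2 + 0.6x
print(f"predicted at x = 6: {a + b * 6:.1f}")   # predicted at x = 6: 5.8
```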
54
Also known as the **prediction line**, it is drawn on the scatter plot.
Regression Line
55
It is a **sum of money received** or paid for the use of someone else's money.
Interest (I)
56
It is the **original amount borrowed, deposited or invested**.
Principal (P)
57
It is the **percent of the principal paid per time period**.
Rate of interest (r)
58
It is the **number of years, months, or days** for which the money is borrowed, deposited, or invested.
Time (t)
59
It is the **interest earned at the end of the allotted time** agreed upon by the lender and the borrower, computed on the principal only.
Simple interest
60
It is the **total amount** when the **principal is added to the interest**.
Maturity Value (M)
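The cards name the terms without the formulas; the standard ones are I = Prt and M = P + I, with r as a decimal per year and t in years (t = months / 12 for a term given in months). A Python sketch with hypothetical amounts:

```python
def simple_interest(principal, rate, time_years):
    """I = P * r * t, with r as a decimal per year and t in years."""
    return principal * rate * time_years


def maturity_value(principal, rate, time_years):
    """M = P + I."""
    return principal + simple_interest(principal, rate, time_years)


P, r, t = 10_000, 0.05, 3  # hypothetical: 10,000 at 5% per year for 3 years
print(round(simple_interest(P, r, t), 2))  # 1500.0
print(round(maturity_value(P, r, t), 2))   # 11500.0
```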
61
It is the **interest earned on previously earned interest** as well as on the principal.
Compound Interest (I)
62
In compound interest, it is the term **used instead of principal.**
Present value (P)
63
It is the number of times per year that interest is added to the present value.
**Frequency of conversion (m)**: Annually (1), Semi-annually (2), Quarterly (4), Bi-monthly (6), Monthly (12)
64
It is the annual interest rate.
Nominal rate (j)
65
It is the interest rate per conversion period, obtained by dividing the nominal rate by the frequency of conversion (i = j/m).
Periodic rate (i)
66
It is the product of the frequency of conversion and the time in years (n = mt).
Number of conversions (n)
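Putting the last few cards together: with i = j/m and n = mt, the standard compound-interest maturity value is M = P(1 + i)^n, and the compound interest is M − P. A Python sketch with hypothetical amounts (10,000 at 6% compounded quarterly for 2 years):

```python
def compound_maturity(present_value, nominal_rate, m, t_years):
    """M = P(1 + i)^n with i = j/m and n = m*t."""
    i = nominal_rate / m  # periodic rate
    n = m * t_years       # number of conversions
    return present_value * (1 + i) ** n


P = 10_000  # hypothetical present value
j = 0.06    # hypothetical nominal rate: 6% per year
m = 4       # compounded quarterly
t = 2       # years

M = compound_maturity(P, j, m, t)
print(round(M, 2))      # 11264.93
print(round(M - P, 2))  # 1264.93  (compound interest)
```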