Probability and Statistics Flashcards
(66 cards)
It is the study of the collection, analysis, interpretation, presentation, and organization of data.
Statistics
What are the 2 Types of Data?
Qualitative and Quantitative Data
It deals with categories or attributes.
Examples: Color of eyes, ethnicity, and brand of ice cream
Qualitative Data
These are numerical data.
Quantitative Data
What are the 2 types of Quantitative Data?
Discrete data and Continuous Data
These are data obtained through counting. They can be expressed as whole numbers.
Examples: Number of countries in Southeast Asia, number of courses in a school term
Discrete data
These are data obtained by measuring. They can be expressed as fractions and decimals.
Examples: Weight, age
Continuous data
What are the 4 Classifications of Data / Levels of Measurement?
- Nominal Level
- Ordinal Level
- Interval Level
- Ratio Level
It is used for categorical data where the values represent different categories without any inherent order or ranking.
Examples:
- Gender: Male, Female, Non-Binary
- Color: Red, Blue, Green
- Type of Animal: Dog, Cat, Bird
Nominal Level
Involves categories that can be ordered or ranked based on some criteria. However, the differences between ranks are not necessarily equal or measurable.
Examples:
- Education Level: High School, Bachelor’s Degree, Master’s Degree, Ph.D.
- Customer Satisfaction: Poor, Fair, Good, Excellent.
- Socioeconomic Status: Low, Middle, High.
Ordinal Level
Involves numerical data with meaningful distances between values, but there is no true zero point.
Examples:
- Temperature (in Celsius or Fahrenheit)
- IQ Scores
- Calendar Dates
Interval Level
Includes all the properties of the interval level, plus a true zero point.
Examples:
- Height
- Weight
- Income
Ratio Level
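The difference between the interval and ratio levels can be checked with a little arithmetic. The sketch below is only an illustration with made-up temperatures: it compares Celsius (interval, arbitrary zero) with Kelvin (ratio, true zero), using the standard conversion K = °C + 273.15. The variable names are hypothetical.

```python
# Celsius is interval-level: its zero is arbitrary, so ratios mislead.
c_low, c_high = 10.0, 20.0
naive_ratio = c_high / c_low  # 2.0, yet 20 °C is not "twice as hot" as 10 °C

# Kelvin is ratio-level: 0 K is a true zero, so ratios are meaningful.
k_low, k_high = c_low + 273.15, c_high + 273.15
true_ratio = k_high / k_low  # roughly 1.035

# Differences (distances between values) agree on both scales.
same_difference = abs((c_high - c_low) - (k_high - k_low)) < 1e-9

print(naive_ratio, round(true_ratio, 3), same_difference)
```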
These are statistical metrics used to describe the center or typical value of a data set. They provide a summary of the data by identifying a central point around which the data points tend to cluster.
Measures of central tendency
What are the three primary measures of central tendency?
- Mean
- Median
- Mode
It is commonly known as the average. It is the sum of all values in a data set divided by the number of values. It provides a measure of the central location of the data.
Mean
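As a quick illustration, the mean can be computed directly from its definition. This is a minimal sketch; the `scores` list is hypothetical sample data.

```python
from statistics import mean

scores = [82, 90, 76, 88, 94]  # hypothetical sample data

# Mean = sum of all values / number of values
manual_mean = sum(scores) / len(scores)

# The standard library gives the same result.
library_mean = mean(scores)

print(manual_mean)  # 86.0
```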
It is the middle value in a data set when it is ordered from smallest to largest.
Median
Note:
- For an odd number of observations, the median is the middle value.
- For an even number of observations, the median is the average of the two middle values.
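The odd and even cases in the note can be sketched as follows. The helper name `manual_median` and the sample lists are assumptions made for illustration.

```python
from statistics import median

def manual_median(values):
    """Return the middle value of the sorted data (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: take the middle value.
        return ordered[mid]
    # Even number of observations: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 3, 9, 1, 5]        # sorted: 1, 3, 5, 7, 9
even_data = [7, 3, 9, 1, 5, 11]   # sorted: 1, 3, 5, 7, 9, 11

print(manual_median(odd_data))    # 5
print(manual_median(even_data))   # 6.0
```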
It is the value or values that occur most frequently (repeated values) in a data set.
Mode
Note:
- A data set may have no mode, one mode (unimodal), two modes (bimodal), or more than two modes (multimodal).
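The possible cases (no mode, one mode, several modes) can be sketched with a frequency count. The helper `modes` is hypothetical, and it assumes the convention that a data set in which no value repeats has no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; empty if no value repeats (assumed convention)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 3]))              # []         no mode
print(modes([1, 2, 2, 3]))           # [2]        one mode
print(modes([1, 1, 2, 2, 3]))        # [1, 2]     bimodal
print(modes([1, 1, 2, 2, 3, 3, 4]))  # [1, 2, 3]  multimodal
```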
Also known as measures of variability or spread, these describe the extent to which data values in a data set differ from the central value (such as the mean or median). They provide insights into the variability or consistency of the data.
Measures of Dispersion
It is the difference between the maximum and minimum values in a data set. It gives a basic indication of the spread of the data.
Range
Formula: Range = Maximum value - Minimum value
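A minimal sketch of the range formula, using a hypothetical data set:

```python
data = [4, 8, 15, 16, 23, 42]  # hypothetical sample data

# Range = maximum value - minimum value
data_range = max(data) - min(data)

print(data_range)  # 38
```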
It measures the range within which the central 50% of the data values lie. It is the difference between the third quartile (Q3) and the first quartile (Q1), that is, Q3 - Q1, and provides a robust measure of dispersion that is less sensitive to outliers.
Interquartile Range
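The sketch below computes the IQR as Q3 - Q1 and compares it with the range when one extreme value is present. It assumes the default quartile method of Python's `statistics.quantiles`; other textbooks and tools use slightly different quartile conventions, so results on small data sets may differ. The data and the helper names are hypothetical.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: Q3 - Q1 (default 'exclusive' quartile method)."""
    q1, _q2, q3 = quantiles(values, n=4)
    return q3 - q1

def value_range(values):
    return max(values) - min(values)

data = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 700]  # same data, one extreme value

print(iqr(data), value_range(data))                  # 4.0 6
print(iqr(with_outlier), value_range(with_outlier))  # 4.0 699
```

The outlier changes the range dramatically but leaves the IQR unchanged, which is what "less sensitive to outliers" means here.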
It measures the average squared deviation of each data point from the mean. It quantifies how much the data values spread out from the mean. It is useful for understanding the dispersion of the data, but it is expressed in squared units of the original data.
Variance
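A minimal sketch of the variance as the average squared deviation from the mean. It assumes the data are a whole population, so it divides by the number of values; a sample variance would divide by one less than that. The data set is hypothetical.

```python
from statistics import pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population data

mean_value = sum(data) / len(data)  # 5.0

# Squared deviation of each data point from the mean
squared_deviations = [(x - mean_value) ** 2 for x in data]

# Population variance = average of the squared deviations
variance = sum(squared_deviations) / len(data)

print(variance)  # 4.0
```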
It is the square root of the variance and provides a measure of dispersion in the same units as the original data. It indicates roughly how far a typical data point lies from the mean.
Standard Deviation
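As a sketch, the standard deviation can be obtained by taking the square root of the variance. This assumes population data (dividing by the number of values); the data set is hypothetical.

```python
import math
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population data

mean_value = sum(data) / len(data)
variance = sum((x - mean_value) ** 2 for x in data) / len(data)

# Standard deviation = square root of the variance
std_dev = math.sqrt(variance)

print(std_dev)  # 2.0
```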
Where do we use standard deviation in real life and what does the value represent?
One useful application is ranking students by general average. When two students have exactly the same average and we need to rank them, we compute the standard deviation of each student's grades: the student with the LOWER standard deviation (SD) ranks 1st, and the student with the HIGHER SD ranks 2nd.
The lower the SD, the more CONSISTENT the data are.
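The tie-breaking idea can be sketched as follows. The two grade lists are invented for illustration, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

# Hypothetical grades for two students with the same average
student_a = [85, 86, 84, 85, 85]
student_b = [95, 75, 90, 80, 85]

avg_a, avg_b = mean(student_a), mean(student_b)    # both 85
sd_a, sd_b = pstdev(student_a), pstdev(student_b)  # about 0.63 and 7.07

# Same average: the lower SD (more consistent grades) ranks first.
ranking = sorted([("A", sd_a), ("B", sd_b)], key=lambda pair: pair[1])
first_in_rank = ranking[0][0]

print(first_in_rank)  # A
```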
A distribution in which the mean is the center and the two sides are mirror images of each other. Most of the data values are found near the mean, tapering off on both sides.
mean = median
SYMMETRIC DISTRIBUTION
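The relationship mean = median can be checked on a small example. Both data sets below are hypothetical; the second replaces one value with an extreme one to show how the equality breaks when the data are no longer symmetric.

```python
from statistics import mean, median

symmetric = [2, 4, 4, 6, 6, 6, 8, 8, 10]  # mirror image around 6
skewed = [2, 4, 4, 6, 6, 6, 8, 8, 100]    # one extreme value on the right

print(mean(symmetric), median(symmetric))  # mean and median are both 6
print(mean(skewed), median(skewed))        # mean rises to 16, median stays 6
```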