💝✨Statistics✨💝 Flashcards

(175 cards)

1
Q

What is statistics?

🍬Hint: ITSOHTCOAAID

A

✨It's the study of how to collect, organise, analyse and interpret data.✨

2
Q

Individuals = ?

A

✨People or objects included in a study.✨

3
Q

Variable = ?

A

✨Characteristics of the individual to be measured or observed.✨

4
Q

Population = ?

A

✨The whole group of individuals of interest, sharing a common characteristic.✨

5
Q

Sample = ?

🍬Hint: ASPOTPICBROB

A

✨A small portion of the population. It can be representative or biased✨

6
Q

Census

🍬Hint: ARACIAAWP

A

✨Acquiring, recording, and calculating information (ARC) about a whole population.✨

7
Q

N = ?

A

✨Population size (the total number of individuals in the population).✨

8
Q

n = ?

A

✨Sample size (the number of individuals in the sample).✨

9
Q

Parameters = ?

A

✨A measure that describes the entire population.✨

10
Q

Statistic = ?

🍬Hint: AMTDAS

A

✨A measure that describes a sample.✨

11
Q

🌸Descriptive statistics = ?🌸

🍬Hint: MOOPASIFSAP

A

✨Methods of organising, picturing, and summarising (OPS) information from samples and populations.✨

12
Q

🌸Inferential statistics = ?🌸

🍬Hint: MOUIFASTDCRAP

A

✨Methods of using information from a sample to draw conclusions regarding a population.✨

13
Q

⚠ It's important to properly identify measures as [ 👛 ] or [ 🐽 ] in statistics…?

A

✨ 👛Population Parameters 👛 or 🐽Sample Statistics.🐽✨

14
Q

⚠️ Different types of data are……?

🍬Hint: UFPAS

A

✨Used for parameters and statistics.✨

15
Q

Quantitative = ?

🍬Hint: ANMOS

A

✨A numerical measurement of something.✨

16
Q

🌸Qualitative = ?🌸

🍬Hint: AQOCCOS

A

✨A quality or categorical characteristic of something.✨

17
Q

🌸Interval = ?🌸

A

✨no true 0✨

18
Q

Ratio = ?

A

✨True 0✨

19
Q

Nominal = ?

🍬Hint: CLONTDHAHO

A

✨Categories, labels or names that don’t have a hierarchical order✨

20
Q

Ordinal = ?

🍬Hint: CLN

A

✨Categories, labels and names that do have a hierarchy✨

21
Q

QUANTITATIVE ——– QUALITATIVE
🎀👇🏻🎀 💅🏻👇🏻💅🏻

A

✨🎀Interval and ratio🎀 💅🏻Nominal and ordinal💅🏻✨

22
Q

🌸All data can be classified as……?🌸

🍬Hint: QOQATF

A

✨Quantitative or Qualitative and then further.✨

23
Q

🌸Sampling frame = ?🌸

🍬Hint: LOIFWASIAS

A

✨List of individuals from which a sample is actually selected.✨

24
Q

🌸Undercoverage = ?🌸

🍬Hint: OPMFTSF

A

✨Omitting population members from the sampling frame.✨

25
🌸Sampling error = ?🌸 | 🍬Hint: IETCWSBATEOTDWAIANWWTFP
✨Inevitable errors that come with sampling because at the end of the day we are inferring, and not working with the full population.✨
26
🌸Non-sampling error = ?🌸 | 🍬Hint: AVOAHE
✨Any variety of avoidable human errors.✨
27
Simulations = ?
✨Numerical facsimiles that mimic real-world processes using random sampling to estimate probabilities or outcomes.✨
28
⭐️A common rule of thumb is...?⭐️
✨The more complex the model and assumptions, the higher the likelihood of error. Generally, aim for at least a 95% confidence level in results, but always validate with real data when possible✨
29
🎀Simple random sampling (SRS)💅🏻 = ?
✨A method where each member of a population has an equal chance of being selected for the sample✨
30
⚠️ A high quality sampling frame is crucial in....?
✨🎀SRS💅🏻✨
31
🎀💅🏻 = ?
🎀✨Known terms✨💅🏻
32
🎀Stratified sampling💅🏻
✨Dividing your list into subgroups (strata) based on specific variables.✨
33
⚠️ If you have an individual that falls somewhere in between strata then you must....?
✨choose a stratum in an objective manner to put the individual into✨
34
The strata are based on.....?
✨a specific characteristic✨
35
⭐️Draw an SRS from each stratum because....?⭐️
✨To ensure each subgroup (stratum) is properly represented, reducing bias and increasing accuracy✨
36
⭐️ You can further [ ] your strata for better accuracy.⭐️
✨stratify✨
37
⚠️ Oversampling = ?
✨Intentionally collecting more data from an underrepresented group.✨ ⚠️ It distorts the natural proportions of the data leading to bias if not handled correctly
38
kᵗʰ = ?
✨Every "k" number in a sequence✨ Example: If k = 5, you pick every 5th item (5th, 10th, 15th, etc.). 📊
39
🎀Systematic sampling💅🏻 = ?
✨Systematic sampling is when you pick every kᵗʰ person from a list after randomly starting somewhere ✨
40
Steps of 🎀Simple random sampling💅🏻 = ?
1. 💖Define the population: Identify everyone or everything in your group.💖 2. 💖List all individuals: Write down each item or person in the population (could be in a list or database). 💖 3. 💖Random selection: Use a random method (like drawing names from a hat or using a computer generator) to pick individuals.💖 4. 💖Select your sample: The number of individuals you pick depends on your desired sample size, but every person has an equal chance of being chosen.💖
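The four steps above can be sketched in a few lines of Python. This is only a minimal illustration: the population of 20 labelled people and the sample size of 5 are made up, and the fixed seed is there just so the sketch gives the same result each run.

```python
import random

# Steps 1-2: define and list the population (hypothetical: 20 labelled people).
population = [f"person_{i}" for i in range(1, 21)]

# Step 3: a random method; the fixed seed only makes the sketch repeatable.
rng = random.Random(42)

# Step 4: draw 5 individuals without replacement, each equally likely.
srs = rng.sample(population, k=5)

print(srs)
```

`random.sample` never picks the same individual twice, which matches drawing names from a hat without putting them back.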
41
Steps of 🎀Systematic sampling💅🏻 = ?
💖 Define the population: Identify the full group of items or individuals (like all your Barbie accessories). 💖 💖 Choose your sample size: Decide how many you want to pick (e.g., 10 Barbie accessories). 💖 💖 Calculate the interval (k): Divide the total number of items by the sample size. For example, if you have 100 accessories and want 10, your interval (k) is 10. 💖 💖 Pick a random starting point: Choose a random number between 1 and k. Let’s say you randomly choose 3. 💖 💖 Select every kᵗʰ item: Starting from your random point (e.g., 3), pick every 10th item (3, 13, 23, etc.). 💖
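The same steps can be written as a small Python sketch. The function name `systematic_sample` and the list of 100 numbered accessories are hypothetical, chosen only to mirror the example in the card.

```python
import random

def systematic_sample(items, sample_size, rng=random):
    """Pick every k-th item after a random start between 1 and k."""
    k = len(items) // sample_size      # interval, e.g. 100 // 10 = 10
    start = rng.randint(1, k)          # random start in 1..k (inclusive)
    # Positions are counted from 1 in the card, so subtract 1 for Python indexing.
    picks = [items[pos - 1] for pos in range(start, len(items) + 1, k)]
    return picks[:sample_size]

accessories = list(range(1, 101))      # hypothetical items numbered 1..100
picked = systematic_sample(accessories, 10, random.Random(3))
print(picked)
```

Whatever the random start turns out to be, consecutive picks are always exactly k = 10 positions apart.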
42
🎀Stratified sampling💅🏻 steps = ?
1. 💖 Define the population: Think of this as the whole brain—every neuron and pathway you need to consider in your study. 💖 2. 💖 Divide into strata: Just like how a neurosurgeon divides the brain into different regions (frontal, temporal, occipital, etc.), you categorize your population into specific groups based on a characteristic—such as age or gender. 💖 3. 💖 Randomly sample within each stratum: After identifying the brain regions, you carefully extract samples from each region (just like taking samples from specific parts of the brain during surgery) to make sure each region is well-represented in your data. 💖 4. 💖 Combine the samples: Once you’ve sampled from each region, you reconnect the brain regions into one holistic view, just like stitching up the brain after surgery, combining all your samples into one complete dataset. 💖
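A minimal Python sketch of these steps, assuming a made-up list of six people with an age-group label as the stratifying characteristic. The helper name `stratified_sample` is hypothetical, not a library function.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, per_stratum, rng):
    """Group individuals into strata, then draw an SRS inside each stratum."""
    strata = defaultdict(list)
    for person in individuals:
        strata[stratum_of(person)].append(person)   # step 2: divide into strata
    sample = []
    for members in strata.values():
        size = min(per_stratum, len(members))
        sample.extend(rng.sample(members, size))     # step 3: SRS within stratum
    return sample                                    # step 4: combined sample

# Hypothetical population: (name, age group)
people = [("Ana", "under 30"), ("Ben", "under 30"), ("Cy", "under 30"),
          ("Di", "30+"), ("Ed", "30+"), ("Flo", "30+")]

chosen = stratified_sample(people, lambda p: p[1], 2, random.Random(0))
print(chosen)
```

Every stratum contributes to the final sample, which is the point of stratifying.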
43
⚠️ You can't use 🎀systematic sampling💅🏻 when the list has [ ]
✨Patterns✨ (if the list repeats in a cycle that lines up with k, the sample will be biased)
44
🎀Cluster Sampling💅🏻 = ?
✨A method where the population is divided into groups (clusters), and entire clusters are randomly selected for study.✨
45
🎀Cluster Sampling💅🏻 steps = ?
✨💖 1. Define the population: Identify the entire group you're studying (e.g., all schools, hospitals, or neighborhoods). 💖 2. Divide the population into clusters: Break the population into natural groups or clusters (e.g., different cities, schools, or regions). 💖 3. Randomly select clusters: Choose entire clusters at random from the list of clusters you created. 💖 4. Study all individuals in selected clusters: Collect data from every individual within the randomly chosen clusters. 💖 5. Analyze the data: Combine the data from all the individuals within the selected clusters to draw conclusions about the population. 💖✨
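A short Python sketch of the steps above, using four made-up schools as the clusters. The school and student names are purely illustrative.

```python
import random

# Step 2: the population already divided into clusters (hypothetical schools).
clusters = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
    "School D": ["d1"],
}

# Step 3: randomly select whole clusters (fixed seed for repeatability).
rng = random.Random(1)
chosen_clusters = rng.sample(sorted(clusters), k=2)

# Step 4: study every individual inside the chosen clusters.
cluster_sample = [student for name in chosen_clusters for student in clusters[name]]

print(chosen_clusters, cluster_sample)
```

Note the contrast with stratified sampling: here only some groups are chosen, but everyone inside a chosen group is included.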
46
⚠️ 🎀Cluster Sampling💅🏻 is better used when.....?
✨💖 When the population is large or geographically spread out: It’s more practical to divide the population into clusters (e.g., cities, schools, or neighborhoods) and randomly select entire clusters to make the sampling process more manageable. 💖 💖 When you can’t list every individual: If it’s hard or time-consuming to list every individual in the population, using clusters allows you to focus on groups, making it more feasible. 💖 💖 When cost and time are factors: Studying entire groups within clusters can save you time and money compared to sampling individuals from every possible group. 💖 💖 When the clusters are similar to each other: Cluster sampling works well when each cluster looks like a small version of the whole population, so studying a few clusters can give you a good understanding of the whole population. 💖 In short, use it when it’s more efficient and practical than other sampling methods, especially for large or hard-to-reach populations. If time and money aren't a factor it's usually better to do more individualised studies💖✨
47
🎀Convenience Sampling💅🏻 = ?
✨A non-probability sampling method where participants are selected based on their ease of access (convenience) rather than randomly.✨
48
Steps of 💅🏻Convenience Sampling🎀 = ?
✨💖 1. Define your population: Identify the group you want to study (e.g., customers, students, etc.). 💖 2. Select participants based on availability: Choose participants who are easiest to reach or access, such as people nearby, volunteers, or those readily available. 💖 3. Collect data: Gather information from the selected individuals. 💖 4. Analyze the data: Use the collected data to draw conclusions about the population (though remember, the results may not be fully representative due to potential bias). 💖 It's all about ease and accessibility, but it can introduce bias because it doesn’t provide a random selection. 💖✨
49
🎀Multistage Sampling💅🏻 = ?
✨A method where sampling happens in steps, selecting large groups first and then narrowing down using different sampling methods at each stage.✨
50
🎀Multistage Sampling💅🏻 steps = ?
💖 1. Define the population: Identify the entire group you want to study. 💖 2. Divide the population into large clusters: Break it into big, manageable groups (e.g., cities, schools, hospitals). 💖 3. Select clusters using a sampling method: Use random sampling (or another method) to pick some clusters. 💖 4. Further divide the selected clusters: Break them down into smaller subgroups (e.g., schools → classrooms → students). 💖 5. Use a different sampling method at each stage: For example, SRS for cities, cluster sampling for schools, and systematic sampling for students. 💖 6. Collect data from the final sample: Gather data from the individuals selected in the last stage.
51
⚠️Multistage sampling can be less accurate due to.....?
✨multiple sampling steps (each stage adds its own sampling error). It is also more complex to design and execute, and has a higher risk of bias if the sampling methods at each stage aren’t properly chosen.✨
52
🌸Probability Sampling = ?🌸
✨Every member of the population has a known and non-zero chance of being selected (e.g., SRS, stratified sampling).✨
53
🌸Eight rules of thumb for conducting a Statistical Study = ?🌸
✨1.💖 State a hypothesis.💖 2. 💖 Identify individuals of interest.💖 3. 💖 Specify the variables to measure.💖 4. 💖 Determine if you will use an entire population or a sample. (If you choose a sample, choose a sampling method)💖 5. 💖 Address ethical concerns before data collection💖 6. 💖Collect data💖 7. 💖 Use descriptive or inferential statistics to answer your hypothesis💖 8. 💖Note any concerns about your data collection or analyses and make recommendations for future studies💖✨
54
⚠️ 👛Which review board decides if your study is considered "Human Research".👛 ⚠️ 🐽What are the implications of your study being considered "Human Research".🐽
✨👛Institutional Review Board (IRB)👛✨ ✨🐽Consent: If the study is being conducted with children you'll need it from BOTH their parents and the children🐽✨
55
💫 What are two options for Data Collection?💫
✨1. 💖Collect from Existing Data sets💖 2. 💖Collect Manually💖✨
56
⭐️ If you need population measures you can collect from [ ] data sets.
✨Government✨
57
💫👛In a [ ] measurements or observations from the entire population are used.👛💫 💫🐽In a [ ] measurements or observations from part of the population are used.🐽💫
✨👛Census👛 ✨ ✨ 🐽Sample🐽✨
58
💫 What are the two main types of studies?💫
✨ Experimental and Observational✨
59
🌸Experimental study = ?🌸
✨A treatment or intervention is deliberately assigned to the individuals. i.e. Controlling conditions, applying treatments. ✨
60
🌸Observational study = ?🌸
✨No treatment or intervention is deliberately assigned to individuals. i.e. Watching and recording, no interference. ✨
61
💫Purpose of Experimental study?💫
✨To study the possible effect of the treatment or intervention on the variables measured✨
62
💫Studies must be done rigorously enough to be [ ]💫
✨Replicated✨
63
💫Purpose of Observational study?💫
✨To analyse relationships between variables without applying a treatment or intervention✨
64
💫What are the 7 types of bias?💫 🍬 Hint: Sassy Researchers Really Can Mess Numbers, Right Harper?
✨1. 💖S – Selection Bias (Sample isn’t random or representative) 💖 2. 💖R – Response Bias 🎤 (People alter answers due to wording or pressure)💖 3. 💖R – Recall Bias 🧠 (People misremember past events)💖 4. 💖C – Confirmation Bias 🔍 (Looking for data that supports beliefs)💖 5. 💖M – Measurement Bias 📏 (Flawed tools or inconsistent data collection)💖 6. 💖N – Nonresponse Bias 📭 (People who don’t respond may be different)💖 7.💖 R – Survivorship Bias 💃 (Only looking at successful cases)💖 8. 💖Hidden Bias = Bias that influences results unnoticed, skewing the study without anyone realising it💖✨
65
🌸Lurking Variable = ?🌸
✨A hidden factor that influences both the independent and dependent variables, causing a false impression of a relationship between them. ✨
66
🌸Blocked Randomisation = ?🌸
✨Grouping participants by a characteristic and then randomly assigning them to treatment groups within each group.✨
67
Steps of Blocked 🎀Randomisation💅🏻 = ?
✨1. 💖Identify blocks: Group participants by a characteristic (e.g., age, gender).💖 2. 💖Randomly assign: Within each block, randomly assign participants to different treatment groups.💖 3. 💖Repeat: Continue for all blocks to ensure balanced groups.💖✨
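One way to sketch these steps in Python, assuming eight made-up participants blocked by age group and two treatment groups. The function name `blocked_randomisation` and the alternating assignment after shuffling are illustrative choices, not a prescribed procedure.

```python
import random
from collections import defaultdict

def blocked_randomisation(participants, block_of, rng):
    """Shuffle participants inside each block, then alternate group labels."""
    blocks = defaultdict(list)
    for person in participants:
        blocks[block_of(person)].append(person)      # step 1: identify blocks
    assignment = {}
    for members in blocks.values():                  # step 3: repeat per block
        shuffled = members[:]
        rng.shuffle(shuffled)                        # step 2: random order
        for i, person in enumerate(shuffled):
            assignment[person] = "treatment" if i % 2 == 0 else "control"
    return assignment

# Hypothetical participants: (id, age group)
participants = [("P1", "18-30"), ("P2", "18-30"), ("P3", "18-30"), ("P4", "18-30"),
                ("P5", "31-50"), ("P6", "31-50"), ("P7", "31-50"), ("P8", "31-50")]

groups = blocked_randomisation(participants, lambda p: p[1], random.Random(7))
print(groups)
```

Because the split happens inside each block, both treatment groups end up with the same mix of age groups.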
68
🌸Blinding = ?🌸
✨where a person (participant, research staff) is deliberately not told of a treatment assignment in a study so s/he is not biased in reporting study information.✨
69
🌸Double Blinding = ?🌸
✨where both the study staff and the participant do not know the treatment assignment✨
70
🌸Unblinding procedures = ?🌸
✨1.💖Emergency Unblinding – Done when a participant's safety is at risk.💖 2. 💖Planned Unblinding – Occurs at pre-specified points in the study.💖 3.💖 Accidental Unblinding – Happens unintentionally due to errors or clues.💖 4. 💖Partial Unblinding – Only specific study personnel are unblinded.💖 5. 💖Complete Unblinding – Everyone is unblinded, usually at study completion.💖✨
71
🌸Frequency Histogram = ?🌸
✨A graph that shows how data is distributed across classes (Tupperwares), with bars representing the frequency of data in each class.✨
72
💫Steps to a 🎀Frequency Histogram💅🏻?💫
✨1. 💖Collect all your data points💖 2. 💖Choose your classes💖 3. 💖Sort your data points into the classes💖 4. 💖Note your classes on the x axis💖 5. 💖 Note the frequency on the y axis 💖 6. 💖For each class, draw a bar that reaches up to the corresponding frequency.💖 💖P.S. Consider making a Frequency Table first for clarity💖✨
73
🌸Relative Frequency Histogram = ?🌸
✨A Relative Frequency Histogram shows the percentage of data in each class instead of the count.✨
74
Steps to 🎀Relative Frequency Histogram💅🏻?
✨1. 💖 Collect all your data points.💖 2.💖 Choose your classes.💖 3.💖 Sort your data points into the classes.💖 4.💖 Calculate the relative frequency for each class (class frequency divided by total number of data points).💖 5.💖 Note your classes on the x-axis.💖 6.💖 Note the relative frequency on the y-axis.💖 7.💖 For each class, draw a bar that reaches up to the corresponding relative frequency.💖 💖P.S. It will be very helpful to make a Frequency Table here 💖 ✨
75
🌸Distribution = ?🌸
✨The description of how the data points of a variable are spread or arranged. It shows the frequency or probability of different outcomes in a dataset.✨
76
5 Main types of 🎀Distribution💅🏻?
✨1. 💖Normal Distribution: Most numbers are close to the middle. It’s balanced. 💖 2. 💖Uniform Distribution: Every number has the same chance of happening. No number is more likely than the other. 💖 3. 💖Skewed Right: Most numbers are small, but a few big numbers stretch the group to the right. 💖 4. 💖Skewed Left: Most numbers are big, but a few small numbers stretch the group to the left. 💖 5. 💖Bimodal Distribution: There are two separate groups of numbers that appear the most. 💖✨
77
🌸Outliers = ?🌸
✨Data points that are significantly different from the rest of the data.✨
78
2 Main types of 🎀Outliers💅🏻?
✨1. 💖Global Outliers: These are numbers that are really far away from the rest of the numbers.💖 2.💖Contextual Outliers: These numbers are strange in one group, but okay in another group.💖✨
79
🌸Cumulative Distribution = ?🌸
✨It’s the total number of times something has happened up to a certain point (Your maximum). ✨
80
💫The point of a 🎀Histogram💅🏻 is to reveal the [ ]💫
✨Distribution✨
81
🌸Time Series Graph = ?🌸
✨A time series graph shows how data changes over time, with time on the x-axis and values on the y-axis. It helps identify trends, patterns, and anomalies.✨
82
💫🎀Time Series Graph💅🏻 steps?💫
✨1. 💖Collect your data over time💖 2. 💖Label the x-axis with time intervals💖 3. 💖Label the y-axis with the measured values💖 4. 💖Plot each data point at the correct time💖 5. 💖Connect the points with a line to show trends💖 6. 💖Analyse for patterns, trends, and outliers💖✨
83
🌸Bar Graph = ?🌸
✨A chart that uses rectangular bars to show the size of different categories✨
84
⚠️Changing the scale of the y axis can be [🐽].⚠️ ⚠️When comparing to another Data point use the [👛] on the y axis.⚠️
✨🐽Misleading🐽✨ ✨👛same scale👛✨
85
🌸Clustering = ?🌸
✨Splitting bars in bar charts into subcategories for further precision✨
86
🌸Pie Chart = ?🌸
✨A circular chart that is divided into slices to show how different categories compare as parts of a whole✨
87
💫Pie charts work best with [ ] categories, e.g., small/medium/large, yes/no, white/black.💫
✨Mutually Exclusive✨
88
🎀Pie Chart💅🏻 steps?
✨1.💖Add up all the numbers to get the total. (Example: 5 + 3 + 8 = 16)💖 2.💖Find the percentage for each part by dividing each part by the total and multiplying by 100. (Example: (5 ÷ 16) × 100 = 31.25%, (3 ÷ 16) × 100 = 18.75%, (8 ÷ 16) × 100 = 50%)💖 3.💖Convert percentages to decimals by dividing each by 100. (Example: 31.25 ÷ 100 = 0.3125, 18.75 ÷ 100 = 0.1875, 50 ÷ 100 = 0.5)💖 4.💖Multiply the decimal by 360 to get the degrees for each section. (Example: 0.3125 × 360 = 112.5°, 0.1875 × 360 = 67.5°, 0.5 × 360 = 180°)💖 5.💖Draw a circle and mark the center.💖 6.💖Use a protractor to measure the angles for each section based on the degrees you calculated. (Example: 112.5° for the first part, 67.5° for the second, and 180° for the third.)💖 7.💖Label each section with the corresponding category and percentage.💖✨
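The arithmetic in steps 1 to 4 can be checked with a short Python sketch. It uses the card's own numbers (5, 3 and 8); the category names are placeholders.

```python
# Parts from the card's example; the names are placeholders.
parts = {"first": 5, "second": 3, "third": 8}

total = sum(parts.values())                       # step 1: 5 + 3 + 8 = 16

percentages = {name: value / total * 100 for name, value in parts.items()}   # step 2
degrees = {name: value / total * 360 for name, value in parts.items()}       # steps 3-4

for name in parts:
    print(f"{name}: {percentages[name]}% -> {degrees[name]} degrees")
```

The degrees always add up to 360, which is a quick sanity check before picking up the protractor.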
89
💫Rules of thumb for ALL Graphs?💫
1. 💖 Provide a title Always include a clear and descriptive title that explains what the graph shows.💖 2. 💖 Label axes Clearly label both the x-axis and y-axis with their respective variables. This helps explain what each axis represents.💖 3. 💖 Identify units of measure Always include the units of measurement (e.g., dollars, hours, percentage) so people understand the scale of the data.💖 4. 💖 Make the graph clear Font size: Use legible fonts and avoid making text too small. Graph complexity: Avoid overcrowding the graph with too many data points or categories. Keep it simple and focused. Colors: Use contrasting colors for clarity and to differentiate between categories, but don’t overdo it. Legends: Use legends or labels to explain what different colors or lines represent if needed.💖 5. 💖 Scale appropriately Choose an appropriate scale for the data. Ensure the graph is not misleading by adjusting the scale to fit the data points properly.💖 6. 💖 Show trends clearly If possible, emphasize the patterns or trends in the data (e.g., using lines or markers on a line graph) to make the message easier to understand.💖 7. 💖 Consistent formatting Make sure your graphs are consistent, especially if you’re comparing multiple graphs. Use the same colors, fonts, and layout style.💖
90
💫Cases where each graph is most useful?💫 1.🌈Frequency Histogram?🌈 2.🌺Relative Frequency Histogram ?🌺 3.🏝️Stem-And-Leaf Display?🏝️ 4.🦩Time Series Graph?🦩 5.🐠Bar Graph?🐠 6.🍍Pareto Chart?🍍 7.🦜Pie Graph?🦜
1.🌈For quantitative data, when you want to see the distribution.🌈 2. 🌺For quantitative data, when you want to see the distribution. Also, good for comparing to other data.🌺 3.🏝️For quantitative data, when you want to see the distribution. Easier to make by hand than histogram.🏝️ 4.🦩For graphing a variable that changes over time and is measured at regular intervals.🦩 5.🐠For qualitative or quantitative data, and for displaying frequency or percentage.🐠 6.🍍For frequencies of rare events in descending order.🍍 7.🦜For mutually-exclusive categories (quantitative or qualitative).🦜
91
🌸Frequency table = ?🌸
✨organises data by showing how often each value appears. It’s useful for spotting patterns, summarising large datasets, and preparing for further analysis.✨
92
🌸Class = ?🌸
✨An interval grouping data values. Example: Between 30 and 40 miles.✨
93
🌸Class Limit = ?🌸
✨The smallest and largest values that fit in a class. Example: 30 is the lower class limit, and 40 is the upper class limit.✨
94
🌸Class Width = ?🌸
✨The size of a class. Example: Upper class limit (40) minus lower class limit (30) = 10, then add 1 → 11. Example: 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 → 11 values.✨
95
🌸Frequency = ?🌸
✨ The number of data points that fall within a class. Example: The number of patients transported 30 to 40 miles.✨
96
💫How to decide on Classes?💫
✨1.💖Find the range → Biggest number − Smallest number. Example: 98 − 12 = 86💖 2.💖Pick the number of classes (usually 5 to 10). Let’s say 6💖 3.💖Find class width → Range ÷ Number of classes. Example: 86 ÷ 6 = 14.3 (Always round up, so we use 15)💖 4.💖Make class groups: start at (or just below) the smallest number and keep adding the width. Example, starting at 10: 10 - 24, 25 - 39, 40 - 54, 55 - 69, 70 - 84, 85 - 99💖 5.💖Count how many numbers fit in each group → That’s the frequency!💖✨
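Here is a minimal Python sketch of the class-width calculation, using the card's numbers (smallest 12, largest 98, 6 classes). The starting value of 10 is taken from the card's class list; the variable names are illustrative.

```python
import math

smallest, largest, num_classes = 12, 98, 6

data_range = largest - smallest                       # 98 - 12 = 86
class_width = math.ceil(data_range / num_classes)     # 14.33... rounds up to 15

start = 10   # the card's classes begin at 10, just below the smallest value
classes = [(start + i * class_width, start + (i + 1) * class_width - 1)
           for i in range(num_classes)]

print(class_width, classes)
```

Each class covers `class_width` whole numbers, for example 10 through 24 is 15 values, which agrees with the "upper limit minus lower limit, plus 1" way of counting.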
97
🌸Relative Frequency Table = ?🌸
✨A table that shows the proportion of data that falls into each class relative to the total sample size.✨
98
🌸Relative = ?🌸
✨In relationship to the rest of the data.✨
99
🌸f = ?🌸
✨Frequency✨
100
🌸n = ?🌸
✨Total Sample Size✨
101
🌸f/n = ?🌸
✨Relative Frequency✨
102
🌸Relative Frequency = ?🌸
✨Is the proportion of the values that are in that class.✨
103
💫How to calculate Relative Frequency?💫
✨1.💖Find the frequency (f) of the class you're interested in.💖 2.💖Find the total sample size (n) by adding up the frequencies of all classes. 💖 3.💖Calculate relative frequency by dividing frequency (f) by total sample size (n): relative frequency = f ÷ n.💖 4.💖Convert to percentage if needed by multiplying by 100.💖✨
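As a quick sketch of these steps in Python, assuming a made-up frequency table with three classes (the counts are invented for illustration):

```python
# Hypothetical frequency table: class -> frequency (f)
frequencies = {"10-24": 4, "25-39": 6, "40-54": 10}

n = sum(frequencies.values())                                   # step 2: total sample size
relative = {cls: f / n for cls, f in frequencies.items()}       # step 3: f / n
percent = {cls: rf * 100 for cls, rf in relative.items()}       # step 4: as a percentage

print(n, relative, percent)
```

The relative frequencies of all classes together add up to 1 (or 100%).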
104
💫How do you do a 🎀 Stem-And-Leaf Display💅🏻?💫
✨1. 💖Sort the data: Put the data in order from smallest to largest.💖 2.💖 Split the numbers into stem and leaf: The stem is everything except the last digit (the tens, hundreds, etc.). The leaf is just the last digit (ones, tenths, etc.). For example, 52 → stem = 5, leaf = 2.💖 3. 💖List the stems: Write down all the stems (without repeating). If you have 52, 54, and 58, you will just write "5" for the stem.💖 4.💖 Add the leaves: For each stem, list all the corresponding leaves next to it. For example, if you have 52, 54, and 58, the stem "5" will have leaves "2", "4", and "8".💖 5. 💖Organise the leaves: Arrange the leaves in numerical order for each stem. For example, "5 | 8, 2, 4" becomes "5 | 2, 4, 8" after ordering.💖 6. 💖Repeat for all stems: Do the same for every stem in your data.💖 7. 💖Final display: Each stem shows a group of numbers, and the leaves show the details of each number.💖 Example: Data: 52, 54, 58, 61, 63, 67, 72, 74, 75 💖Sort the data: 52, 54, 58, 61, 63, 67, 72, 74, 75💖 💖Split into stems and leaves: 52 → Stem = 5, Leaf = 2; 54 → Stem = 5, Leaf = 4; 58 → Stem = 5, Leaf = 8; 61 → Stem = 6, Leaf = 1; 63 → Stem = 6, Leaf = 3; 67 → Stem = 6, Leaf = 7; 72 → Stem = 7, Leaf = 2; 74 → Stem = 7, Leaf = 4; 75 → Stem = 7, Leaf = 5💖 💖List the stems: 5, 6, 7💖 💖Add the leaves: 5 | 2, 4, 8 6 | 1, 3, 7 7 | 2, 4, 5💖 💖Organise the leaves: Already in order.💖 💖Final stem-and-leaf display: 5 | 2, 4, 8 6 | 1, 3, 7 7 | 2, 4, 5 And that's it! You've got your stem-and-leaf display.💖✨
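The steps can also be sketched in Python using the card's example data. This version assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the ones digit.

```python
from collections import defaultdict

data = [52, 54, 58, 61, 63, 67, 72, 74, 75]

stems = defaultdict(list)
for value in sorted(data):              # step 1: sort the data
    stem, leaf = divmod(value, 10)      # step 2: split into stem and leaf
    stems[stem].append(leaf)            # steps 3-5: leaves arrive already ordered

display = [f"{stem} | {', '.join(str(leaf) for leaf in leaves)}"
           for stem, leaves in sorted(stems.items())]

print("\n".join(display))
```

Sorting first means the leaves are appended in order, so no separate ordering step is needed.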
105
🌸Central Tendency = ?🌸
✨tells us where the centre or middle of the dataset is.✨
106
💫Three main types of 🎀Central Tendency💅🏻?💫
✨Mean (Average) Add up all the numbers. Divide by how many numbers there are. Example: (5 + 10 + 15) ÷ 3 = 10 Mean = Σx / n Median (Middle Value) Put the numbers in order from smallest to largest. If there’s an odd number of values, pick the middle one. If there’s an even number, take the average of the two middle numbers. Example: Ordered list: 3, 5, 8, 12, 15 → Median = 8 Example (even count): Ordered list: 2, 6, 9, 11 → (6 + 9) ÷ 2 = 7.5 Mode (Most Frequent Number) Find the number that appears the most. There can be one mode, multiple modes, or no mode if all numbers appear equally. Example: 4, 7, 7, 9, 10 → Mode = 7✨
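Python's standard `statistics` module can reproduce the card's examples, which is a handy way to check hand calculations. The last list, used to show more than one mode, is an extra made-up example.

```python
import statistics

mean_value = statistics.mean([5, 10, 15])             # (5 + 10 + 15) / 3
median_odd = statistics.median([3, 5, 8, 12, 15])     # middle value of an odd count
median_even = statistics.median([2, 6, 9, 11])        # average of the two middle values
mode_value = statistics.mode([4, 7, 7, 9, 10])        # most frequent value

# Hypothetical extra example: a dataset with two modes.
modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean_value, median_odd, median_even, mode_value, modes)
```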
107
💫When to use each main type of 🎀Central Tendency💅🏻?💫
✨Mean: Best when data has no extreme values (outliers). Median: Best when data has outliers or is skewed. Mode: Best for categorical data (e.g., favourite colour, most common shoe size).✨
108
🌸Greek letter capital sigma Σ = ?🌸
✨shorthand for "sum of" Example: if x = {3,4,7} then Σx = 14 Also, ∑xy is shorthand for "multiply x and y for each pair, then add them up." Example: If x = {2, 5, 7} and y = {3, 4, 1}, First, multiply: (2×3), (5×4), (7×1) → {6, 20, 7} Now, add: 6 + 20 + 7 = 33✨
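Both sums from the card can be checked with Python's built-in `sum`; the variable names here are just illustrative.

```python
values = [3, 4, 7]
sum_x = sum(values)                                  # Σx

x = [2, 5, 7]
y = [3, 4, 1]
sum_xy = sum(a * b for a, b in zip(x, y))            # Σxy: multiply each pair, then add

print(sum_x, sum_xy)
```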
109
🌸x̄ (x-bar) = ?🌸
✨is the average of a sample (a small group from a big group).✨
110
🌸μ (mu) = ?🌸
✨is the average of the whole group.✨
111
⚠️P.S. Means are very sensitive to [👛], Medians aren't. ⚠️
✨ 👛outliers👛 Example {1, 2, 3, 4, 100} Median: The middle number is 3. It stays the same even though 100 is way higher than the other numbers. Mean: If you add all the numbers up (1 + 2 + 3 + 4 + 100 = 110) and divide by 5, the mean is 22. The mean is much higher because 100 is an outlier, and it affects the mean a lot. So, in cases with outliers, the median is more reliable for representing the “typical” value. The mean is less reliable because it can be easily skewed by extreme numbers.✨
112
🌸Trimmed Mean = ?🌸
✨A method of ameliorating the influence of the outliers How to Calculate a 5% Trimmed Mean: 💖Find the total number of data points (n): Example: 100 data points.💖 💖Calculate 5% of the data points: 5% of 100 = 5.💖 💖Order the data: Sort from lowest to highest (or vice versa).💖 💖Remove the 5% from both ends: 💖 💖Remove the 5 smallest and 5 largest values.💖 💖Calculate the new mean: Find the mean of the remaining data, which is now less affected by outliers.💖✨
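A minimal Python sketch of a trimmed mean, assuming a made-up dataset of 20 values with one large outlier. The helper name `trimmed_mean` is hypothetical, and the number of values to cut is rounded down when the percentage does not divide evenly (one common convention, assumed here).

```python
def trimmed_mean(values, percent):
    """Drop `percent`% of the values from each end, then average the rest."""
    ordered = sorted(values)                     # order the data
    cut = len(ordered) * percent // 100          # how many to remove from each end
    kept = ordered[cut:len(ordered) - cut]       # remove smallest and largest
    return sum(kept) / len(kept)                 # mean of what is left

# Hypothetical data: 1..19 plus one outlier.
data = list(range(1, 20)) + [1000]

plain_mean = sum(data) / len(data)
five_percent_trimmed = trimmed_mean(data, 5)

print(plain_mean, five_percent_trimmed)
```

With 20 values, 5% is one value from each end, so the outlier 1000 is dropped and the trimmed mean sits near the middle of the remaining data.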
113
⭐️To find the percentage of anything quickly⭐️
✨Multiply the whole by the percentage written as a decimal (percentage ÷ 100). Example: To find 5% of 1200: 1200 × 0.05 = 60✨
114
🌸Weighted Average = ?🌸
✨a method of computing an average where some data points contribute more than others✨
115
💫How to calculate a 🎀Weighted Average💅🏻? 💫
✨1.💖Identify Given Information💖 2.💖Multiply Each Value by Its Weight💖 3.💖Sum Up the Weighted Values💖 4.💖Divide by the Sum of Weights💖✨
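The four steps can be sketched in Python. The scores and weights below are made up for illustration (homework counted at 30, exams at 70).

```python
# Step 1: hypothetical given information: (value, weight)
scores = {"homework": (80, 30), "exams": (90, 70)}

# Steps 2-3: multiply each value by its weight and add them up.
weighted_sum = sum(value * weight for value, weight in scores.values())

# Step 4: divide by the sum of the weights.
total_weight = sum(weight for _, weight in scores.values())
weighted_average = weighted_sum / total_weight

print(weighted_average)
```

Because exams carry more weight, the result lands closer to the exam score than a plain average of 80 and 90 would.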
116
💫How to decide the importance of 🎀Weighted Averages💅🏻?💫
✨The weights depend on importance and are usually assigned in one of three ways: Arbitrarily – If no real justification exists, weights might just be assigned based on intuition or preference. Example: A teacher decides homework is worth 30% and exams 70% just because they think exams matter more. Empirically – Based on data or past trends. Example: A company weighs customer feedback scores differently based on how predictive they are of future sales. By Policy/Rules – Set by an institution, standard, or contract. Example: University grading systems (e.g., final exams = 50%, quizzes = 30%, participation = 20%). It all depends on what matters most in the context.✨
117
🌸Normal Distribution = ?🌸
✨A symmetric, bell-shaped distribution in which the mean, median, and mode are all the same value. Mean: The average of all the data points. Median: The middle value when all data points are arranged in order. Mode: The value that appears most frequently. In a perfect normal distribution (like a bell curve), these three measures of central tendency (mean, median, and mode) will be the same and perfectly aligned with the center of the curve. Why? The data is symmetrically distributed around a single peak, so the average, middle, and most frequent values all occur at the same point.✨
118
🌸Skewed Distributions = ?🌸
✨the positions of the mean, median, and mode change depending on the direction of the skew: Right-skewed distribution (positively skewed): The tail of the distribution is stretched to the right, meaning there are more lower values and a few higher values. In this case, the mean is greater than the median, and the median is greater than the mode. The order of the measures from left to right is: Mode < Median < Mean. Left-skewed distribution (negatively skewed): The tail of the distribution is stretched to the left, meaning there are more higher values and a few lower values. In this case, the mean is less than the median, and the median is less than the mode. The order of the measures from left to right is: Mean < Median < Mode. Example: Right-skewed: {1, 2, 2, 2, 3, 3, 4, 5, 9, 15} Mode: 2 (most frequent) Median: 3 (middle value) Mean: 4.6 (average) Left-skewed: {1, 7, 11, 12, 13, 13, 14, 14, 14, 15} Mode: 14 (most frequent) Median: 13 (middle value) Mean: 11.4 (average) In skewed distributions, the skew affects where the mean gets pulled relative to the median and mode.✨
119
🌸Variation = ?🌸
✨How spread out the data is. If numbers are close together, variation is low (consistent). If numbers are far apart, variation is high (all over the place). Example: Two classes have the same average grade of 75, but the grades could look very different: Class A: Everyone got 75, 75, 75, 75, 75 → No variation, everyone got the same grade. Class B: Some got 50, 60, 75, 90, 100 → Big variation, some did much better or worse than others. Even though both have the same mean (75), Class B has more spread-out grades. That’s why we need variation (like range, variance, or standard deviation) to see how much the data is spread out or clustered together. ✨
120
💫Measures of Variation = ?💫
✨1.💖Range → Difference between the largest and smallest values.💖 2.💖Variance → The average of the squared differences from the mean.💖 3.💖Standard Deviation → The square root of the variance; shows how much data deviates from the mean.💖 4.💖Coefficient of Variation → Standard deviation divided by the mean; useful for comparing variability between different datasets.💖✨
121
🌸Range = ?🌸
✨The Largest Value - The Smallest Value 🧮 For example: Data: 42, 33, 21, 78, 62 Maximum = 78 Minimum = 21 Range = 78 − 21 = 57 ✅ The data spreads 57 units from the smallest to the largest value.✨
122
💫⚠️Why isn't the 🎀Range💅🏻 all that useful?⚠️💫
✨Range is too sensitive to outliers, so we use better tools like standard deviation, variance, and interquartile range. Range only looks at two numbers: the highest and the lowest. It ignores everything else in the middle. So if you change just one of those two numbers (highest or lowest), the range changes a lot, even if the rest of the data stays the same. Example: Original: [21, 33, 42, 62, 78] → Range = 78 − 21 = 57 Now change one number: [21, 33, 42, 62, 90] → Range = 90 − 21 = 69 The range changed a lot (from 57 to 69), but most of the numbers didn’t.✨
123
🌸Variance = ?🌸
✨Variance – average of the squared differences from the mean It tells you how spread out the numbers are from the mean. If the variance is small, your numbers are huddled close together. If it’s big, your numbers are scattered like confetti 🎉. Think of it like: “How well does the average (mean) describe everyone?” ✨
124
🌸Standard Deviation = ?🌸
✨Standard deviation – the square root of the variance Variance gives you the squared spread — a bit awkward. Standard deviation is just the square root of that, so it’s easier to interpret and in the same units as your data. So if you’re measuring grades, standard deviation is also in grade units (not grade² like variance is).✨
125
💫How to calculate Variance and Standard deviation? 💫
✨Find the mean  (8+6+2)/3 = 5.333 Subtract the mean from each value (get deviations)  8−5.333 = 2.667  6−5.333 = 0.667  2−5.333 = −3.333 Square each deviation  2.667² = 7.113  0.667² = 0.445  (−3.333)² = 11.109 Add them up  7.113 + 0.445 + 11.109 = 18.667 Divide by n−1 (for sample variance)  18.667 ÷ (3−1) = 9.334 → This is the variance Square root the variance  √9.334 = 3.055 → This is the standard deviation ✅✨
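The same steps can be sketched in Python. This is a minimal illustration, not an official method; the function name `sample_variance_steps` is made up. Because the code does not round the intermediate steps, it gives a variance of about 9.333 rather than the 9.334 you get when each step is rounded by hand.

```python
import math

def sample_variance_steps(values):
    """Follow the card's steps: mean, deviations, squares, sum, divide by n - 1, square root."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    squared = [d ** 2 for d in deviations]
    sum_of_squares = sum(squared)
    variance = sum_of_squares / (n - 1)   # sample variance
    std_dev = math.sqrt(variance)         # sample standard deviation
    return mean, sum_of_squares, variance, std_dev

mean, ss, variance, std_dev = sample_variance_steps([8, 6, 2])
print(round(mean, 3), round(ss, 3), round(variance, 3), round(std_dev, 3))
```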
126
💫What do 🎀Variance💅🏻 and 🎀Standard Deviation💅🏻 tell you?💫
✨🔢 Variance Tells you how spread out your data is. ➡️ Higher variance = more spread ➡️ Lower variance = more consistent Example: Set A: [5,5,5] → Variance = 0 (super consistent) Set B: [2,5,8] → Variance > 0 (more spread) 📏 Standard Deviation Tells you on average how far values are from the mean. ➡️ It’s the "typical" distance from the center. Example: Mean = 5 Standard Deviation = 2 → Most values are about 2 units away from 5 ✅✨
127
🔔Reminder!🔔
✨🎯 Focus on Sample Formulas We usually don’t know the whole population — just a sample. So we use sample variance and sample standard deviation formulas. s² = (sum of (x - mean)²) ÷ (n - 1) s = square root of s² Where: x = each value in the dataset mean = average of all values n = number of values s² = variance s = standard deviation ✨
128
💫How many ways are there to calculate variance and standard deviation?💫
✨Two — the defining formula and the computational formula.✨
129
🌸Defining formula = ?🌸
✨The formula that works directly from the deviations: s² = Σ(x − x̄)² / (n − 1). The step-by-step method: mean → deviations → squares → sum → divide by (n − 1) → square root.✨
130
🌸Computational Formula = ?🌸
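✨A shortcut using Σx² and (Σx)²: s² = [Σx² − (Σx)² / n] / (n − 1). Watch the difference: Σx² means "square all the x-values first, then add the results", while (Σx)² means "add all the x-values first, then square the total". It gives the same result as the defining formula but skips the logic.✨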
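The two routes can be compared side by side. This sketch assumes the usual textbook computational form s² = [Σx² − (Σx)² / n] / (n − 1); the function names are hypothetical.

```python
def variance_defining(values):
    """Defining formula: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def variance_computational(values):
    """Computational formula: uses the sum of x squared and the square of the sum of x."""
    n = len(values)
    sum_x = sum(values)                      # Σx
    sum_x_sq = sum(x * x for x in values)    # Σx² (square first, then add)
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)

data = [8, 6, 2]
print(variance_defining(data), variance_computational(data))
```

Both calls should agree up to floating-point rounding, which is the point of the card: same answer, different bookkeeping.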
131
🌸Which formula is easier to understand?🌸
✨ The defining formula — it shows what’s really happening in the data.✨
132
💫Which formula is faster for large datasets but harder to understand?💫
✨The computational formula.✨
133
🔔Reminder🔔
✨Sample Defining Formulas These are the formulas used to calculate sample variance and standard deviation. The main part of the formula, Σ(x − x̄)², is called the Sum of Squares. This captures how much each value differs from the mean, squared and then added together. Sample Variance (s²):  s² = Σ(x − x̄)² / (n − 1) Sample Standard Deviation (s):  s = √[Σ(x − x̄)² / (n − 1)] We divide by (n − 1) when using a sample. First calculate the Sum of Squares, then plug it into the formula. Standard deviation is just the square root of the variance.✨
134
💫How do we calculate sum of squares ? 💫
✨Steps to Calculate Sum of Squares (SS): Find the mean of the data set. Subtract the mean from each value (get the deviations). Square each deviation. Add all the squared deviations. That final total = Sum of Squares SS = Σ(x - x̄)² Where: SS = Sum of Squares Σ = Sum of... x = each value in the dataset x̄ = the sample mean (x - x̄)² = square of the difference between each value and the mean ✅✨
135
🌸Sample variance = ?🌸
✨Sample variance tells you how spread out the data is from the mean in a sample. Formula: s² = [Σ(x − x̄)²] / (n − 1) This means: Subtract the mean from each value. Square each result. Add them (this is the "sum of squares"). Divide by (n − 1).✨
136
💫How do I get standard deviation from sum of squares?💫
✨Divide sum of squares by (n − 1) → that's the variance. Take the square root of that → that's the standard deviation. Example: Sum of squares = 70, n = 6 Variance = 70 ÷ 5 = 14 Standard deviation = √14 ≈ 3.74 ✅✨
137
🔔Reminder Flashcard🔔
✨Descriptive Statistics Quick Reference Descriptive Statistics Concepts Mean (Average): The mean is the sum of all numbers divided by how many numbers there are. It shows the center (typical value) of the data. Example: For numbers 2 and 5, add them (2 + 5 = 7) and divide by 2. The mean = 3.5. Deviation from the Mean: A deviation is a number minus the mean. It shows how far each number is from the average. Example: For 2 and 4, the mean = 3, so deviations are 2 – 3 = –1 and 4 – 3 = 1. Squaring a Number: Squaring a number means multiplying it by itself. It makes negative numbers positive and shows the number’s size. Example: 3 × 3 = 9 (so 3² = 9), and (–2) × (–2) = 4. Square Root: The square root of a number is a value that, when multiplied by itself, gives the original number. It is the reverse of squaring. Example: √9 = 3, because 3 × 3 = 9. Sum of Squares (SS): Sum of squares is adding up each deviation from the mean squared. It tells the total squared distance of values from the mean. Example: For 2 and 4 (mean = 3), deviations are –1 and 1; their squares are 1 and 1, so SS = 1 + 1 = 2. Sample Variance (s²): Sample variance is the average of squared deviations (using n – 1 in the bottom). It shows how spread out the data are (in squared units). Example: For 2 and 4, SS = 2 (as above), n = 2, so variance s² = 2 ÷ (2 – 1) = 2. Standard Deviation (s): Standard deviation is the square root of the variance. It tells the typical distance of data points from the mean (in original units). Example: If variance = 2 (as above), then s = √2 ≈ 1.4. Range: The range is the largest number minus the smallest number. It shows the total spread of the data from low to high. Example: For 2, 7, and 10, range = 10 – 2 = 8. Conceptual Ideas Sample vs Population: The population is the whole group of interest; a sample is a smaller part of it. We often use a sample when we cannot measure the whole population. 
Example: Surveying 100 people (sample) to learn about all 1000 people (population). n − 1 (Bessel’s correction): When calculating the variance of a sample, we divide by (n – 1) instead of n. This makes the sample variance a fair (unbiased) estimate of the true population variance. Example: For 3 data points, divide by 2 (3 – 1). Consistency vs Inconsistency: Consistency means data points are similar (low spread); inconsistency means they vary a lot (high spread). This is seen in measures like range or SD. Example: [4, 5, 6] is consistent (values close together), while [1, 5, 10] is inconsistent (values far apart). Defining vs Computational Formula: The defining formula for variance uses (value – mean)² for each data point. The computational formula rearranges sums of values and squares to calculate the same result more quickly. Both give the same answer; usually the computational form is used for easier calculation. Why Variance & SD Are Better Than Range: Variance and SD use all data points, so they show how spread out the data are on average. The range only uses the smallest and largest values, so it can be misleading if one value is very big or small. Example: [5, 6, 7] has range = 2; if one value changes to 20 (making [5, 6, 20]), range = 15 (huge jump) even though two numbers stayed close. Variance/SD reflect how all values spread out.✨
138
🌸Variance formulas = ? 🌸
✨Sample variance formula: s² = Σ(x − x̄)² / (n − 1) Population variance formula: σ² = Σ(x − μ)² / N✨
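Python's standard library has both versions built in, which makes the n − 1 versus N difference easy to see. A small sketch; the two-value dataset is just an illustration.

```python
from statistics import pvariance, variance

data = [2, 4]  # mean = 3, sum of squares = 2

sample_var = variance(data)       # divides by n - 1 = 1
population_var = pvariance(data)  # divides by N = 2

print(sample_var, population_var)
```

The sample variance comes out larger because it divides the same sum of squares by a smaller number.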
139
🌸Coefficient of Variation?🌸
✨A % that shows how much the data varies relative to the mean. Formula: CV = (Standard Deviation ÷ Mean) × 100 → Higher % = more spread out (less consistent) → Lower % = tighter around the mean (more consistent)✨
140
💫Coefficient of Variation formulas = ?💫
✨Sample CV = (s / x̄) × 100 Population CV = (σ / μ) × 100 In the sample formula: s is the sample standard deviation (how much data jumps around), x̄ (x-bar) is the sample mean (average). In the population formula, σ and μ play the same roles for the whole population. Multiply the fraction by 100 to turn it into a percentage.✨
141
💫Why is CV important?💫
✨It shows how reliable or consistent the data is — and lets you compare variability across different units or scales. 📊 Example: Group A: Mean = 6, SD = 3.74 → CV = 62% Group B: Mean = 8, SD = 4 → CV = 50% → Group B is more consistent, even with a bigger SD!✨
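The comparison above is a one-line calculation. A minimal sketch with a hypothetical helper name, using the means and SDs from the card.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(3.74, 6)  # Group A
cv_b = coefficient_of_variation(4, 8)     # Group B

print(f"Group A: {cv_a:.0f}%  Group B: {cv_b:.0f}%")
```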
142
🌸Chebyshev's Theorem = ? 🌸
✨A rule that tells you the minimum % of values that fall within a certain number of standard deviations (k) from the mean, no matter the shape of the data. ✅ Great for non-normal distributions ✅ Helps you spot outliers and understand variability Here is an analogy that worked for me: Imagine you’re watching campers from a drone above. The campfire is your mean, the centre of the gathering. When you zoom in tight, you only see the campers right next to the fire. As you zoom out, your view widens and you start to see more campers who are a bit further away. Chebyshev’s Theorem says: No matter how the campers are scattered, if you zoom out to a certain radius (say “k” steps away from the fire), you’ll always capture at least a certain percentage of the campers inside that circle. So as you zoom out: The percentage of campers inside your circle goes up, because you include more who were further away. The theorem tells you the minimum guaranteed percentage inside any zoom level, even if some campers are far off wandering. It’s like a safety net for how spread out the group can be, no matter how weird or uneven their distribution.✨
143
💫Chebyshev's Theorem Formula = ? 💫
✨At least (1 − 1/k²) × 100% of data falls within k standard deviations of the mean ✅ k must be > 1 (at k = 1 the formula gives 0%, which tells you nothing) Example: If k = 2 → at least 1 − (1/4) = 75% of data within 2 SDs✨
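The bound is simple enough to compute and then compare against real data. A sketch under stated assumptions: the function names and the sample list are made up, and the check uses the sample standard deviation from Python's `statistics` module.

```python
from statistics import mean, stdev

def chebyshev_min_percent(k):
    """Minimum % of data within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the formula goes negative below k = 1")
    return (1 - 1 / k ** 2) * 100  # k = 1 gives 0%, i.e. no information

def percent_within(values, k):
    """Actual % of values within k sample standard deviations of the mean."""
    centre, spread = mean(values), stdev(values)
    inside = [v for v in values if abs(v - centre) <= k * spread]
    return len(inside) / len(values) * 100

sample = [21, 33, 42, 62, 78]  # hypothetical data
print(chebyshev_min_percent(2))   # guaranteed minimum
print(percent_within(sample, 2))  # what this dataset actually does
```

The actual percentage is usually well above the guaranteed minimum; Chebyshev only promises the floor.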
144
🌸Percentiles = ?🌸
✨A percentile shows the % of scores below a certain value in a dataset. Percentiles tell you how you rank compared to everyone else in a set of quantitative (number-based) data. 🧪 Example (Standardized Test Vibes): If you're in the 77th percentile, it doesn't mean you got 77% of the answers right. It does mean you did better than 77% of the people who took the test. If 100 people took it, that puts you ahead of 77 people. 💡 Why It's Useful: Percentiles are position indicators. They help compare individual performance or values to a larger group. Percentile = the score below which a given % of data falls. Example: 25th percentile = 25% of scores are less than this number. ✨
145
💫How do you find out what percentile a value is in?💫
✨Count how many values are less than your score. Divide that count by the total number of values. Multiply by 100. ✨
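Those three steps translate almost word for word into code. A minimal sketch; the function name and the example scores are assumptions.

```python
def percentile_rank(values, score):
    """% of values strictly below the given score."""
    below = sum(1 for v in values if v < score)  # step 1: count values below
    return below / len(values) * 100             # steps 2 and 3: divide, then x 100

scores = [2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
print(percentile_rank(scores, 5))  # 3 of 8 values are below 5
```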
146
💫How do you find the value at a given percentile?💫
✨🎯 To find the value at a given percentile (like “what score is the 30th percentile?”), use: Rank = (P / 100) × (N + 1) Rank equals P divided by 100, multiplied by N plus 1 Where: P = the percentile you're interested in (like 30 for the 30th percentile), N = total number of values in your data set. 🔍 Steps: Sort your data from lowest to highest. Plug into the formula: e.g. 30th percentile in a list of 10 values → Rank = (30 / 100) × (10 + 1) = 3.3 If the rank is a whole number (say 4), then the 4th value is your percentile. If it's a decimal (like 3.3), interpolate: → Take the 3rd value + 0.3 × (4th − 3rd value) 🧠 Example: Data: 2, 3, 4, 5, 6, 7, 8, 9 Percentile: 25th N = 8 → Rank = (25 / 100) × (8 + 1) = 2.25 → Look between 2nd and 3rd value: 3 + 0.25 × (4 − 3) = 3.25 So, the 25th percentile = 3.25✨
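The rank-and-interpolate recipe can be sketched as below. The function name is hypothetical, and clamping ranks that fall off either end of the list is my own assumption; the card does not say what to do in that case.

```python
def value_at_percentile(values, p):
    """Value at the p-th percentile using Rank = (P / 100) x (N + 1) with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    rank = p / 100 * (n + 1)      # 1-based position in the sorted list
    if rank <= 1:                 # assumption: clamp to the smallest value
        return ordered[0]
    if rank >= n:                 # assumption: clamp to the largest value
        return ordered[-1]
    lower = int(rank)             # whole-number part of the rank
    fraction = rank - lower       # decimal part used for interpolation
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [2, 3, 4, 5, 6, 7, 8, 9]
print(value_at_percentile(data, 25))  # rank 2.25, between the 2nd and 3rd values
```

Other textbooks and software use slightly different rank conventions, so results can differ a little from tool to tool.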
147
🌸 Quartiles = ?🌸
✨Quartiles are special percentiles that split your data into four equal parts. Think of your data like a pizza 🍕: quartiles cut it into 4 slices that each hold about the same number of data points (25% each), even though the slices can cover very different widths of values. 📊 The 3 Main Quartiles: 1st Quartile (Q1) = 25th percentile → 25% of the data is below this value. 2nd Quartile (Q2) = 50th percentile → This is just the median. Half the data is below, half is above. 3rd Quartile (Q3) = 75th percentile → 75% of the data is below this value. So if you line up all your data from smallest to largest, the quartiles are the values that mark the cutoff points between the bottom 25%, middle 50%, and top 25%.✨
148
💫How do you calculate Q1, Q2, and Q3?💫
✨Order the data from smallest to largest. Q2 (median): Find the middle number. Q1: Find the median of the lower half (left of Q2). Q3: Find the median of the upper half (right of Q2). 👉 If there's an odd number of data points, don’t include the median when finding Q1 and Q3. Whenever a list has an odd count, its median is the single middle value; with an even count, it is the average of the two middle values.✨
149
🌸Interquartile Range = ? 🌸
✨IQR = Q3 − Q1 That’s it. It’s the range between the third quartile (75th percentile) and the first quartile (25th percentile). It shows where the middle 50% of your data lives: the core chunk, the comfy middle.✨
150
💫How to calculate interquartile range?💫
✨🧮 How to calculate it: Order your data from smallest to largest. Find the median (Q2). This splits your data in half. Find Q1: the median of the lower half (don’t include Q2 if odd!). Find Q3: the median of the upper half (same rule applies). Subtract Q1 from Q3: IQR = Q3 − Q1 🧠 Why’s it useful? It ignores extreme values (outliers). Helps you see how spread out the middle of your data is. Used in boxplots and spotting outliers. 🎲 Example: Data: 2, 4, 6, 8, 10, 12, 14 Median (Q2) = 8 Lower half: 2, 4, 6 → Q1 = 4 Upper half: 10, 12, 14 → Q3 = 12 IQR = 12 − 4 = 8 ✨
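The median-of-halves method can be sketched like this. The function name `quartiles` is made up, and the code follows the rule of excluding the median when the count is odd; other software may use different quartile conventions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([2, 4, 6, 8, 10, 12, 14])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```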
151
🌈A little joke to remember what statistics is mainly about.....🌈
✨“Statistics: the desperate search party for your runaway data.” And yeah, a shocking amount of stats is just figuring out where most of the data went, how spread out it is, and whether it's acting dodgy. It's like a game of hide-and-seek with numbers: Mean: "Come back here, average!" Median: "Let’s find the middle child, they’re usually sensible." Mode: "Who’s showing up the most, the attention seeker?" Standard deviation: "How rebellious is everyone feeling?" IQR: "Ignore the weirdos on the edges. What are the normal kids doing?" So yes, stats is a bit like emotionally profiling a classroom of numbers. 😅 “Statistics: beating predictions out of your data like a desperate oracle with performance anxiety.” And yeah, once you've finally wrangled your chaotic little data gremlins into some kind of order, it’s time for act two: forcing them to cough up answers about the future. It’s like: Linear regression: “If I draw a line through this mess, maybe I can guess what happens next. Please cooperate.” Probability: “On a scale of 0 to 1, how likely are you to betray me?” Confidence intervals: “I’m 95% sure this won’t completely humiliate me. The other 5% is quietly sweating.” Hypothesis testing: “I’m not saying you're wrong... but I am going to run 10,000 simulations to prove it.” Bayes’ Theorem: “Let’s update our beliefs, like a polite but suspicious detective.” It’s not so much prediction as it is data interrogation: “Dear data: I don’t care what you’ve done. Tell me what you’re going to do. Or else I’m switching to tarot.”✨
152
🌸Box-and-whisker plot = ?🌸
✨A box plot is a visual summary of a dataset using five-number summary stats: Minimum Q1 (25th percentile) Median (Q2 / 50th percentile) Q3 (75th percentile) Maximum It shows the spread, center, and skew of the data with a box (the interquartile range) and two whiskers extending to the minimum and maximum values.✨
153
💫How do you make a Box-and-whisker plot?💫
✨Order the data from smallest to largest. Find the median (Q2). Split the data into lower and upper halves. If odd number of data points, exclude the median. If even, include all. Find Q1 = median of lower half (25th percentile) Find Q3 = median of upper half (75th percentile) Minimum = smallest value Maximum = largest value Plot it: Draw a box from Q1 to Q3 Draw a line inside the box for Q2 Draw whiskers from min to Q1 and from Q3 to max✨
154
🌸Scatter Diagram = ?🌸
✨A scattergram (or scatter plot) is a graph plotting pairs of values (x,y) to show the relationship between two variables. The x variable is the explanatory (independent) variable, graphed on the x-axis, and the y variable is the response (dependent) variable, graphed on the y-axis. It helps visualize how changes in x relate to changes in y.✨
155
💫How do you create a Scatter Diagram?💫
✨Identify the explanatory (independent) variable (x) and response (dependent) variable (y). Draw the x-axis (horizontal) and y-axis (vertical). For each pair of data points (x,y), plot a dot where x is on the x-axis and y is on the y-axis. Look for patterns or trends among the dots to understand the relationship between variables. If the dots roughly form a line going upwards from left to right, that’s a positive correlation — as x increases, y tends to increase. Like more study hours, better scores. If the dots form a line going downwards from left to right, that’s a negative correlation — as x increases, y tends to decrease. Like more hours watching TV, lower test scores (maybe). If the dots are all over the place with no clear pattern, that means no correlation — x and y aren’t related in any obvious way. This is the first, simplest way to visually check if two variables might be connected before you get fancy with numbers. It’s like spotting a trend at a glance!✨
156
🌸Linear Correlation = ? 🌸
✨Linear correlation is when two variables, x and y, show a straight-line relationship on a scatterplot. As x increases, y increases (positive correlation) or decreases (negative correlation) in a consistent pattern. If the points form a straight line, it’s a perfect linear correlation.✨
157
💫What are the 2 main traits of Linear Correlation? 💫
✨Strength and Direction Direction 📈 Positive: As x increases, y increases. 📉 Negative: As x increases, y decreases. ❌ No correlation: No clear pattern — just chaos. Strength 💪 Strong: Dots hug the line tightly. 😐 Moderate: Dots kind of follow the line but wander a bit. 😵 Weak: Dots are all over the place — barely forming a line.✨
158
🌸Correlation coefficient, r = ?🌸
✨A number between -1 and +1 that measures the strength and direction of a linear relationship between two variables. Positive r means both variables increase together. Negative r means one variable increases as the other decreases. r close to 0 means little or no linear relationship. ✨
159
💫 How do you interpret the value of r?💫
✨r = +1: Perfect positive correlation (dots lie exactly on an upward line) r = -1: Perfect negative correlation (dots lie exactly on a downward line) r near 0: No linear correlation |r| close to 1: Strong correlation |r| moderate (around 0.5): Moderate correlation |r| close to 0: Weak correlation✨
160
💫How do you calculate coefficient r?💫
✨Raw formula ∑xy Plain English Multiply each x and y that go together (like partners), then add all those products up. Raw formula ∑x, ∑y Plain English Add up all the x values into one total. Do the same for all the y values. You’ll now have two totals: one for x and one for y. Raw formula (∑x)(∑y) Plain English Multiply your two totals from above (x total × y total). Raw formula n∑xy − (∑x)(∑y) Plain English Take the number of data pairs (n). Multiply it by the total you got from multiplying each x and y. Now subtract the number you got when you multiplied the x and y totals. This is your numerator (top of the fraction). Raw formula ∑x² Plain English Square each x (like 80², 90², etc.) and add them all together. This gives the "sum of x squared". Raw formula ∑y² Plain English Do the same thing for y values: square each one and add them. This gives the "sum of y squared". Raw formula n∑x² − (∑x)² Plain English Multiply n (the number of pairs) by the sum of x squared. Then subtract the square of the sum of x (not the same as the sum of x squared!). This gives the x-part of the denominator. Raw formula n∑y² − (∑y)² Plain English Same as above, but for y. This gives the y-part of the denominator. Raw formula √[(x-part)(y-part)] Plain English Multiply the x-part and y-part together, then take the square root of the result. This is the full denominator. Raw formula [numerator] ÷ [denominator] Plain English Divide the numerator (from earlier) by the denominator (the square root you just found). This gives you the correlation coefficient, r. ✨
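Each row of that breakdown becomes one line of code. A minimal sketch; the function name and the example pairs (hours studied and exam scores) are assumptions for illustration.

```python
import math

def correlation_r(xs, ys):
    """Correlation coefficient r built from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)               # Σx, Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy
    sum_x2 = sum(x * x for x in xs)               # Σx²
    sum_y2 = sum(y * y for y in ys)               # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    x_part = n * sum_x2 - sum_x ** 2              # not the same as Σx²!
    y_part = n * sum_y2 - sum_y ** 2
    return numerator / math.sqrt(x_part * y_part)

hours = [1, 2, 3]
scores = [50, 60, 65]
print(round(correlation_r(hours, scores), 3))
```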
161
💫What kind of data distribution does the correlation coefficient r assume?💫
✨ A bivariate normal distribution. (We don’t check for this in class, but it’s an important assumption.)✨
162
💫Does the correlation coefficient r have units?💫
✨No, r is unitless.✨
163
💫What values of r indicate perfect linear correlation?💫
✨r = 1.0 (perfect positive) or r = −1.0 (perfect negative).✨
164
💫What value of r means no linear correlation?💫
✨r = 0✨
165
💫What does a positive r mean?💫
✨As x goes up, y goes up. As x goes down, y goes down.✨
166
💫What does a negative r mean?💫
✨As x goes up, y goes down. As x goes down, y goes up.✨
167
💫What happens to r if you switch the x and y axes?💫
✨Nothing — the value of r stays the same✨
168
💫What happens to r if you convert x and y to different units?💫
✨Nothing — r stays the same even if the units change (like cm to inches)✨
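Both claims (swapping the axes, changing the units) can be checked numerically. A sketch with made-up height and weight data; `pearson_r` here is a hypothetical helper written in deviation form.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights_cm = [150, 160, 170, 180]             # hypothetical data
weights_kg = [52, 58, 69, 77]
heights_in = [h / 2.54 for h in heights_cm]   # same heights in inches

r_original = pearson_r(heights_cm, weights_kg)
r_swapped = pearson_r(weights_kg, heights_cm)  # x and y switched
r_inches = pearson_r(heights_in, weights_kg)   # units changed

print(r_original, r_swapped, r_inches)
```

All three values should match up to floating-point rounding.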
169
🌸Linear Regression = ?🌸
✨A method for drawing the best-fitting straight line through a scatter plot, so you can predict one thing (y) from another thing (x). It shows the overall trend or pattern between 2 variables.✨
170
🌸The Coefficient of Determination = ?🌸
✨A number that tells you how well your line fits the data. R² = the proportion of the variation in y that can be explained by x. The closer R² is to 1, the stronger the connection; closer to 0 means your line is mostly guessing.✨
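One common way to compute R² is to compare the leftover error around the fitted line with the total variation in y. A sketch assuming a least-squares line ŷ = a + bx; the function name and the three data pairs are illustrative only.

```python
def r_squared(xs, ys):
    """R² = 1 - (unexplained variation / total variation) for a least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # scatter around the line
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                    # scatter around the mean
    return 1 - ss_res / ss_tot

print(round(r_squared([1, 2, 3], [50, 60, 65]), 4))
```

For simple linear regression this equals the square of the correlation coefficient r.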
171
💫How do you calculate Linear Regression?💫
✨FORMULA (Raw):   ŷ = a + bx Where: ŷ = predicted y a = intercept = ȳ − b𝑥̄ b = slope = Σ[(x−𝑥̄)(y−ȳ)] ÷ Σ(x−𝑥̄)² PLAIN ENGLISH TRANSLATION: To get the best-fit line, we need two things: 💖 The slope (b) = how steep the line is 💖 💖 The intercept (a) = where it crosses the y-axis 💖 STEP-BY-STEP (With Example) Let’s say we have 3 students: x = hours studied y = exam score Student 1: x = 1 y = 50 Student 2: x = 2 y = 60 Student 3: x = 3 y = 65 Step 1: Find the means 𝑥̄ = (1+2+3)/3 = 2 ȳ = (50+60+65)/3 = 58.33 Step 2: Calculate the slope (b) Use: b = Σ[(x−𝑥̄)(y−ȳ)] ÷ Σ(x−𝑥̄)² Now build your table vertically for each student: Student 1 x = 1 y = 50 x−𝑥̄ = 1−2 = -1 y−ȳ = 50−58.33 = -8.33 (x−𝑥̄)(y−ȳ) = (-1)(-8.33) = 8.33 (x−𝑥̄)² = (-1)² = 1 Student 2 x = 2 y = 60 x−𝑥̄ = 0 y−ȳ = 1.67 (x−𝑥̄)(y−ȳ) = 0 (x−𝑥̄)² = 0 Student 3 x = 3 y = 65 x−𝑥̄ = 1 y−ȳ = 6.67 (x−𝑥̄)(y−ȳ) = (1)(6.67) = 6.67 (x−𝑥̄)² = (1)² = 1 Summing up: Σ[(x−𝑥̄)(y−ȳ)] = 8.33 + 0 + 6.67 = 15 Σ(x−𝑥̄)² = 1 + 0 + 1 = 2 So: b = 15 / 2 = 7.5 Step 3: Find the intercept (a) a = ȳ − b𝑥̄ a = 58.33 − (7.5)(2) = 58.33 − 15 = 43.33 Final regression equation:   ŷ = 43.33 + 7.5x✨
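The worked example above can be reproduced in a few lines. A minimal sketch; the function name `fit_line` is made up, and predicting a score for 4 hours is an extra illustration that goes beyond the three data points.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # Σ[(x−x̄)(y−ȳ)]
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # Σ(x−x̄)²
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

hours = [1, 2, 3]
scores = [50, 60, 65]
a, b = fit_line(hours, scores)
print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"predicted score for 4 hours: {a + b * 4:.2f}")
```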
172
🌸b (slope) = ?🌸
✨How much y changes for each 1-unit increase in x✨
173
💫How to calculate b slope?💫
✨Slope of the Line (b) FORMULA (Raw): b = Σ[(x−𝑥̄)(y−ȳ)] ÷ Σ(x−𝑥̄)² Where: x̄ = mean of x    ȳ = mean of y (x−x̄) = how far each x is from the mean (y−ȳ) = how far each y is from the mean Σ[(x−x̄)(y−ȳ)] = total of x-y pairwise movement Σ(x−x̄)² = total spread of x around its mean PLAIN ENGLISH TRANSLATION: To find the slope of the best-fit line: 💖 Multiply how far each x is from the average x by how far each y is from the average y💖 💖 Add all those up💖 💖 Then divide by how spread out the x-values are from the average x (squared)💖 This tells us how much y changes for every 1 unit increase in x STEP-BY-STEP (With Example): We have 3 students: x-values: 1 2 3 y-values: 50 60 65 Step 1: Find the means 𝑥̄ = (1 + 2 + 3) ÷ 3 = 2 ȳ = (50 + 60 + 65) ÷ 3 = 58.33 Step 2: Build the calculation table Row 1 x = 1 y = 50 x−𝑥̄ = 1 − 2 = −1 y−ȳ = 50 − 58.33 = −8.33 (x−𝑥̄)(y−ȳ) = (−1)(−8.33) = 8.33 (x−𝑥̄)² = (−1)² = 1 Row 2 x = 2 y = 60 x−𝑥̄ = 2 − 2 = 0 y−ȳ = 60 − 58.33 = 1.67 (x−𝑥̄)(y−ȳ) = (0)(1.67) = 0 (x−𝑥̄)² = (0)² = 0 Row 3 x = 3 y = 65 x−𝑥̄ = 3 − 2 = 1 y−ȳ = 65 − 58.33 = 6.67 (x−𝑥̄)(y−ȳ) = (1)(6.67) = 6.67 (x−𝑥̄)² = (1)² = 1 Step 3: Add it all up Σ[(x−𝑥̄)(y−ȳ)] = 8.33 + 0 + 6.67 = 15 Σ(x−𝑥̄)² = 1 + 0 + 1 = 2 Step 4: Final calculation b = 15 ÷ 2 = 7.5 So, the slope = 7.5 That means: for every extra hour studied, the exam score goes up by 7.5 points 🎯📈✨
174
🌸Intercept a = ?🌸
✨Intercept a is the predicted value of y when x = 0 (where the line crosses the y-axis).✨
175
💫How to calculate 💫