💝✨Statistics✨💝 Flashcards

Question

🌸Sampling error = ?🌸 | 🍬Hint: IETCWSBATEOTDWAIANWWTFP

Answer 1

✨Inevitable errors that come with sampling because at the end of the day we are inferring, and not working with the full population.✨

Answer 2

✨Any variety of avoidable human errors.✨

Answer 3

✨Numerical facsimiles that mimic real-world processes using random sampling to estimate probabilities or outcomes.✨

Answer 4

✨The more complex the model and assumptions, the higher the likelihood of error. Generally, aim for at least a 95% confidence level in results, but always validate with real data when possible✨

Answer 5

✨A method where each member of a population has an equal chance of being selected for the sample✨

Answer 6

✨🎀SRS💅🏻✨

Answer 7

🎀✨Known terms✨💅🏻

Answer 8

✨Dividing your list into subgroups (strata) based on specific variables.✨

Answer 9

✨choose a variable in an objective manner to put the strata into✨

Answer 10

✨a specific characteristic✨

Answer 11

✨To ensure each subgroup (stratum) is properly represented, reducing bias and increasing accuracy✨

Answer 12

✨stratify✨

Answer 13

✨Intentionally collecting more data from an underrepresented group.✨ ⚠️ It distorts the natural proportions of the data leading to bias if not handled correctly

Answer 14

✨Every "k" number in a sequence✨ Example: If k = 5, you pick every 5th item (5th, 10th, 15th, etc.). 📊

Answer 15

✨Systematic sampling is when you pick every kᵗʰ person from a list after randomly starting somewhere ✨

Answer 16

1. 💖Define the population: Identify everyone or everything in your group.💖 2. 💖List all individuals: Write down each item or person in the population (could be in a list or database). 💖 3. 💖Random selection: Use a random method (like drawing names from a hat or using a computer generator) to pick individuals.💖 4. 💖Select your sample: The number of individuals you pick depends on your desired sample size, but every person has an equal chance of being chosen.💖
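A minimal sketch of those four steps, assuming the population is already written down as a Python list. The names `simple_random_sample` and `person_1` … `person_20` are made up for illustration; the "hat" is the standard library's `random.sample`, which draws without replacement and gives every individual the same chance.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size individuals so that each has an equal chance."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(population, sample_size)

# Steps 1-2: define and list the population (hypothetical names).
names = [f"person_{i}" for i in range(1, 21)]

# Steps 3-4: random selection of the desired sample size.
sample = simple_random_sample(names, 5, seed=42)
print(sample)
```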

Answer 17

💖 Define the population: Identify the full group of items or individuals (like all your Barbie accessories). 💖 💖 Choose your sample size: Decide how many you want to pick (e.g., 10 Barbie accessories). 💖 💖 Calculate the interval (k): Divide the total number of items by the sample size. For example, if you have 100 accessories and want 10, your interval (k) is 10. 💖 💖 Pick a random starting point: Choose a random number between 1 and k. Let’s say you randomly choose 3. 💖 💖 Select every kᵗʰ item: Starting from your random point (e.g., 3), pick every 10th item (3, 13, 23, etc.). 💖
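The same recipe can be sketched in Python. This is only an illustration under the card's own numbers (100 items, sample of 10, so k = 10); `systematic_sample` and the numbered `accessories` list are hypothetical names, and the sketch assumes the population size divides evenly by the sample size.

```python
import random

def systematic_sample(items, sample_size, seed=None):
    """Pick every k-th item after a random start between 1 and k."""
    k = len(items) // sample_size            # interval k
    rng = random.Random(seed)
    start = rng.randint(1, k)                # random start, 1-based, inclusive
    positions = range(start, len(items) + 1, k)
    return [items[p - 1] for p in positions][:sample_size]

accessories = list(range(1, 101))            # items numbered 1..100
picked = systematic_sample(accessories, 10, seed=1)
print(picked)                                # a start of 3 would give 3, 13, 23, ...
```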

Answer 18

1. 💖 Define the population: Think of this as the whole brain—every neuron and pathway you need to consider in your study. 💖 2. 💖 Divide into strata: Just like how a neurosurgeon divides the brain into different regions (frontal, temporal, occipital, etc.), you categorize your population into specific groups based on a characteristic—such as age or gender. 💖 3. 💖 Randomly sample within each stratum: After identifying the brain regions, you carefully extract samples from each region (just like taking samples from specific parts of the brain during surgery) to make sure each region is well-represented in your data. 💖 4. 💖 Combine the samples: Once you’ve sampled from each region, you reconnect the brain regions into one holistic view, just like stitching up the brain after surgery, combining all your samples into one complete dataset. 💖
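Here is one way the four steps might look in code. It is a sketch, not the only way to do it: the `people` records, the `age_group` labels, and the choice of taking the same number from every stratum are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_stratum, seed=None):
    """Group records into strata, then randomly sample within each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:                   # step 2: divide into strata
        strata[key(record)].append(record)
    sample = []
    for label in sorted(strata):             # step 3: sample inside each stratum
        sample.extend(rng.sample(strata[label], per_stratum))
    return sample                            # step 4: combined sample

# Step 1: a made-up population of 20 people in two age groups.
people = [{"id": i, "age_group": "under_30" if i % 2 else "30_plus"}
          for i in range(20)]
chosen = stratified_sample(people, key=lambda p: p["age_group"],
                           per_stratum=3, seed=0)
print(chosen)
```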

Answer 19

✨Patterns✨

Answer 20

✨A method where the population is divided into groups (clusters), and entire clusters are randomly selected for study.✨

Answer 21

✨💖 1. Define the population: Identify the entire group you're studying (e.g., all schools, hospitals, or neighborhoods). 💖 2. Divide the population into clusters: Break the population into natural groups or clusters (e.g., different cities, schools, or regions). 💖 3. Randomly select clusters: Choose entire clusters at random from the list of clusters you created. 💖 4. Study all individuals in selected clusters: Collect data from every individual within the randomly chosen clusters. 💖 5. Analyze the data: Combine the data from all the individuals within the selected clusters to draw conclusions about the population. 💖✨
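A small sketch of steps 2 to 4, assuming the clusters are stored as a dictionary from cluster name to its members. The school names and student labels are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # step 3: pick clusters
    return {name: clusters[name] for name in chosen}    # step 4: study everyone

# Step 2: a hypothetical population already divided into clusters.
schools = {
    "North": ["n1", "n2", "n3"],
    "South": ["s1", "s2"],
    "East": ["e1", "e2", "e3", "e4"],
    "West": ["w1", "w2"],
}
selected = cluster_sample(schools, 2, seed=7)
print(selected)
```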

Answer 22

✨💖 When the population is large or geographically spread out: It’s more practical to divide the population into clusters (e.g., cities, schools, or neighborhoods) and randomly select entire clusters to make the sampling process more manageable. 💖 💖 When you can’t list every individual: If it’s hard or time-consuming to list every individual in the population, using clusters allows you to focus on groups, making it more feasible. 💖 💖 When cost and time are factors: Studying entire groups within clusters can save you time and money compared to sampling individuals from every possible group. 💖 💖 When the clusters are similar: Cluster sampling works well when the clusters are similar to one another and each one is a mixed, mini version of the population, so studying a few clusters can give you a good understanding of the whole population. 💖 In short, use it when it’s more efficient and practical than other sampling methods, especially for large or hard-to-reach populations. If time and money aren't a factor, it's usually better to do more individualised studies💖✨

Answer 23

✨A non-probability sampling method where participants are selected based on their ease of access (convenience) rather than randomly.✨

Answer 24

✨💖 1. Define your population: Identify the group you want to study (e.g., customers, students, etc.). 💖 2. Select participants based on availability: Choose participants who are easiest to reach or access, such as people nearby, volunteers, or those readily available. 💖 3. Collect data: Gather information from the selected individuals. 💖 4. Analyze the data: Use the collected data to draw conclusions about the population (though remember, the results may not be fully representative due to potential bias). 💖 It's all about ease and accessibility, but it can introduce bias because it doesn’t provide a random selection. 💖✨

Answer 25

✨A method where sampling happens in steps, selecting large groups first and then narrowing down using different sampling methods at each stage.✨

Answer 26

💖 1. Define the population: Identify the entire group you want to study. 💖 2. Divide the population into large clusters: Break it into big, manageable groups (e.g., cities, schools, hospitals). 💖 3. Select clusters using a sampling method: Use random sampling (or another method) to pick some clusters. 💖 4. Further divide the selected clusters: Break them down into smaller subgroups (e.g., schools → classrooms → students). 💖 5. Use a different sampling method at each stage: For example, SRS for cities, cluster sampling for schools, and systematic sampling for students. 💖 6. Collect data from the final sample: Gather data from the individuals selected in the last stage.

Answer 27

✨Due to multiple sampling steps, it is more complex to design and execute, and has a higher risk of bias if the sampling methods at each stage aren’t properly chosen.✨

Answer 28

✨Every member of the population has a known and non-zero chance of being selected (e.g., SRS, stratified sampling).✨

Answer 29

✨1.💖 State a hypothesis.💖 2. 💖 Identify individuals of interest.💖 3. 💖 Specify the variables to measure.💖 4. 💖 Determine if you will use an entire population or a sample. (If you choose a sample, choose a sampling method)💖 5. 💖 Address ethical concerns before data collection💖 6. 💖Collect data💖 7. 💖 Use descriptive or inferential statistics to answer your hypothesis💖 8. 💖Note any concerns about your data collection or analyses and make recommendations for future studies💖✨

Answer 30

✨👛Institutional Review Board (IRB)👛✨ ✨🐽Consent: If the study is being conducted on children, you'll need it from BOTH their parents and the children🐽✨

Answer 31

✨1. 💖Collect from Existing Data sets💖 2. 💖Collect Manually💖✨

Answer 32

✨Government✨

Answer 33

✨👛Census👛 ✨ ✨ 🐽Sample🐽✨

Answer 34

✨ Experimental and Observational✨

Answer 35

✨A treatment or intervention is deliberately assigned to the individuals. i.e. Controlling conditions, applying treatments. ✨

Answer 36

✨No treatment or intervention is deliberately assigned to individuals. i.e. Watching and recording, no interference. ✨

Answer 37

✨To study the possible effect of the treatment or intervention on the variables measured✨

Answer 38

✨Replicated✨

Answer 39

✨To analyse relationships between variables without applying a treatment or intervention✨

Answer 40

✨1. 💖S – Selection Bias (Sample isn’t random or representative) 💖 2. 💖R – Response Bias 🎤 (People alter answers due to wording or pressure)💖 3. 💖R – Recall Bias 🧠 (People misremember past events)💖 4. 💖C – Confirmation Bias 🔍 (Looking for data that supports beliefs)💖 5. 💖M – Measurement Bias 📏 (Flawed tools or inconsistent data collection)💖 6. 💖N – Nonresponse Bias 📭 (People who don’t respond may be different)💖 7. 💖S – Survivorship Bias 💃 (Only looking at successful cases)💖 8. 💖Hidden Bias = Bias that influences results unnoticed, skewing the study without anyone realising it💖✨

Answer 41

✨A hidden factor that influences both the independent and dependent variables, causing a false impression of a relationship between them. ✨

Answer 42

✨Grouping participants by a characteristic and then randomly assigning them to treatment groups within each group.✨

Answer 43

✨1. 💖Identify blocks: Group participants by a characteristic (e.g., age, gender).💖 2. 💖Randomly assign: Within each block, randomly assign participants to different treatment groups.💖 3. 💖Repeat: Continue for all blocks to ensure balanced groups.💖✨
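The three steps can be sketched like this. Treat it as an illustration only: the participants, the `age_group` blocks, and the treatment names are made up, and dealing shuffled members out in turn is just one simple way to keep the groups balanced inside each block.

```python
import random
from collections import defaultdict

def block_randomize(participants, block_of, treatments, seed=None):
    """Randomly assign treatments separately inside each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for person in participants:              # step 1: identify blocks
        blocks[block_of(person)].append(person)
    assignment = {}
    for label in sorted(blocks):             # step 3: repeat for every block
        members = blocks[label][:]
        rng.shuffle(members)                 # step 2: random order within block
        for i, person in enumerate(members):
            assignment[person] = treatments[i % len(treatments)]
    return assignment

# Hypothetical participants as (name, age_group) pairs.
participants = [("p1", "young"), ("p2", "young"), ("p3", "young"), ("p4", "young"),
                ("p5", "old"), ("p6", "old"), ("p7", "old"), ("p8", "old")]
groups = block_randomize(participants, block_of=lambda p: p[1],
                         treatments=["treatment", "control"], seed=3)
print(groups)
```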

Answer 44

✨where a person (participant, research staff) is deliberately not told of a treatment assignment in a study so s/he is not biased in reporting study information.✨

Answer 45

✨where both study staff and participants do not know the treatment assignment✨

Answer 46

✨1.💖Emergency Unblinding – Done when a participant's safety is at risk.💖 2. 💖Planned Unblinding – Occurs at pre-specified points in the study.💖 3.💖 Accidental Unblinding – Happens unintentionally due to errors or clues.💖 4. 💖Partial Unblinding – Only specific study personnel are unblinded.💖 5. 💖Complete Unblinding – Everyone is unblinded, usually at study completion.💖✨

Answer 47

✨a graph that shows how data is distributed across classes (Tupperwares), with bars representing the frequency of data in each class.✨

Answer 48

✨1. 💖Collect all your data points💖 2. 💖Choose your classes💖 3. 💖Sort your data points into the classes💖 4. 💖Note your classes on the x axis💖 5. 💖 Note the frequency on the y axis 💖 6. 💖For each class, draw a bar that reaches up to the corresponding frequency.💖 💖P.S. Consider making a Frequency Table for clarity💖✨

Answer 49

✨A Relative Frequency Histogram shows the percentage of data in each class instead of the count.✨

Answer 50

✨1. 💖 Collect all your data points.💖 2.💖 Choose your classes.💖 3.💖 Sort your data points into the classes.💖 4.💖 Calculate the relative frequency for each class (class frequency divided by total number of data points).💖 5.💖 Note your classes on the x-axis.💖 6.💖 Note the relative frequency on the y-axis.💖 7.💖 For each class, draw a bar that reaches up to the corresponding relative frequency.💖 💖P.S. It will be very helpful to make a Frequency Table here 💖 ✨

Answer 51

✨The description of how the data points of a variable are spread or arranged. It shows the frequency or probability of different outcomes in a dataset.✨

Answer 52

✨1. 💖Normal Distribution: Most numbers are close to the middle. It’s balanced. 💖 2. 💖Uniform Distribution: Every number has the same chance of happening. No number is more likely than the other. 💖 3. 💖Skewed Right: Most numbers are small, but a few big numbers stretch the group to the right. 💖 4. 💖Skewed Left: Most numbers are big, but a few small numbers stretch the group to the left. 💖 5. 💖Bimodal Distribution: There are two separate groups of numbers that appear the most. 💖✨

Answer 53

✨Data points that are significantly different from the rest of the data.✨

Answer 54

✨1. 💖Global Outliers: These are numbers that are really far away from the rest of the numbers.💖 2. 💖Contextual Outliers: These numbers are strange in one group, but okay in another group.💖✨

Answer 55

✨It’s the total number of times something has happened up to a certain point (Your maximum). ✨

Answer 56

✨Distribution✨

Answer 57

✨A time series graph shows how data changes over time, with time on the x-axis and values on the y-axis. It helps identify trends, patterns, and anomalies.✨

Answer 58

✨1. 💖Collect your data over time💖 2. 💖Label the x-axis with time intervals💖 3. 💖Label the y-axis with the measured values💖 4. 💖Plot each data point at the correct time💖 5. 💖Connect the points with a line to show trends💖 6. 💖Analyse for patterns, trends, and outliers💖✨

Answer 59

✨A chart that uses rectangular bars to show the size of different categories✨

Answer 60

✨🐽Misleading🐽✨ ✨👛same scale👛✨

Answer 61

✨Splitting bars in bar charts into subcategories for further precision✨

Answer 62

✨A circular chart that is divided into slices to show how different categories compare as parts of a whole✨

Answer 63

✨Mutually Exclusive✨

Answer 64

✨1.💖Add up all the numbers to get the total. (Example: 5 + 3 + 8 = 16)💖 2.💖Find the percentage for each part by dividing each part by the total and multiplying by 100. (Example: (5 ÷ 16) × 100 = 31.25%, (3 ÷ 16) × 100 = 18.75%, (8 ÷ 16) × 100 = 50%)💖 3.💖Convert percentages to decimals by dividing each by 100. (Example: 31.25 ÷ 100 = 0.3125, 18.75 ÷ 100 = 0.1875, 50 ÷ 100 = 0.5)💖 4.💖Multiply the decimal by 360 to get the degrees for each section. (Example: 0.3125 × 360 = 112.5°, 0.1875 × 360 = 67.5°, 0.5 × 360 = 180°)💖 5.💖Draw a circle and mark the center.💖 6.💖Use a protractor to measure the angles for each section based on the degrees you calculated. (Example: 112.5° for the first part, 67.5° for the second, and 180° for the third.)💖 7.💖Label each section with the corresponding category and percentage.💖✨
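Steps 1 to 4 are plain arithmetic, so they can be checked with a few lines of Python. This sketch reuses the card's numbers (5, 3, 8); the category labels "A", "B", "C" and the function name `pie_slices` are placeholders.

```python
def pie_slices(counts):
    """Return (percentage, degrees) for each category of a pie chart."""
    total = sum(counts.values())             # step 1: total
    slices = {}
    for label, count in counts.items():
        share = count / total                # the part as a decimal
        slices[label] = (share * 100,        # step 2: percentage
                         share * 360)        # steps 3-4: degrees of the slice
    return slices

slices = pie_slices({"A": 5, "B": 3, "C": 8})
for label, (percent, degrees) in slices.items():
    print(f"{label}: {percent:.2f}% -> {degrees:.1f} degrees")
```

The degrees should always add up to 360, which is a quick check before reaching for the protractor.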

Answer 65

1. 💖 Provide a title Always include a clear and descriptive title that explains what the graph shows.💖 2. 💖 Label axes Clearly label both the x-axis and y-axis with their respective variables. This helps explain what each axis represents.💖 3. 💖 Identify units of measure Always include the units of measurement (e.g., dollars, hours, percentage) so people understand the scale of the data.💖 4. 💖 Make the graph clear Font size: Use legible fonts and avoid making text too small. Graph complexity: Avoid overcrowding the graph with too many data points or categories. Keep it simple and focused. Colors: Use contrasting colors for clarity and to differentiate between categories, but don’t overdo it. Legends: Use legends or labels to explain what different colors or lines represent if needed.💖 5. 💖 Scale appropriately Choose an appropriate scale for the data. Ensure the graph is not misleading by adjusting the scale to fit the data points properly.💖 6. 💖 Show trends clearly If possible, emphasize the patterns or trends in the data (e.g., using lines or markers on a line graph) to make the message easier to understand.💖 7. 💖 Consistent formatting Make sure your graphs are consistent, especially if you’re comparing multiple graphs. Use the same colors, fonts, and layout style.💖

Answer 66

1.🌈For quantitative data, when you want to see the distribution.🌈 2. 🌺For quantitative data, when you want to see the distribution. Also, good for comparing to other data.🌺 3.🏝️For quantitative data, when you want to see the distribution. Easier to make by hand than histogram.🏝️ 4.🦩For graphing a variable that changes over time and is measured at regular intervals.🦩 5.🐠For qualitative or quantitative data, and for displaying frequency or percentage.🐠 6.🍍For frequencies of rare events in descending order.🍍 7.🦜For mutually-exclusive categories (quantitative or qualitative).🦜

Answer 67

✨organises data by showing how often each value appears. It’s useful for spotting patterns, summarising large datasets, and preparing for further analysis.✨

Answer 68

✨An interval grouping data values. Example: Between 30 and 40 miles.✨

Answer 69

✨The smallest and largest values that fit in a class. Example: 30 is the lower class limit, and 40 is the upper class limit.✨

Answer 70

✨The size of a class. Example: Upper class limit (40) minus lower class limit (30) = 10, then add 1 → 11. Example: 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 → 11 values.✨

Answer 71

✨ The number of data points that fall within a class. Example: The number of patients transported 30 to 40 miles.✨

Answer 72

Find the range → Biggest number − Smallest number Example: 98 − 12 = 86 Pick the number of classes (usually 5 to 10) Let’s say 6 Find class width → Range ÷ Number of classes 86 ÷ 6 = 14.3 (Always round up, so we use 15) Make class groups (start at a convenient value at or just below the smallest number, here 10, and keep adding the width) 10 - 24 25 - 39 40 - 54 55 - 69 70 - 84 85 - 99 Count how many numbers fit in each group → That’s the frequency!
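A sketch of the same procedure in Python, using the card's numbers (smallest 12, largest 98, 6 classes, first class starting at 10). Two things here are assumptions rather than part of the card: the `data` list is invented, and the width is bumped to the next whole number even when the division comes out exact, so that the largest value still fits.

```python
import math

def class_limits(smallest, largest, n_classes, start):
    """Build (lower, upper) class limits for whole-number data."""
    # Assumption: go to the next whole number even for an exact quotient.
    width = math.floor((largest - smallest) / n_classes) + 1
    return [(start + i * width, start + (i + 1) * width - 1)
            for i in range(n_classes)]

def frequency_table(data, limits):
    """Count how many data points fall inside each class."""
    return {(lo, hi): sum(lo <= x <= hi for x in data) for lo, hi in limits}

limits = class_limits(12, 98, 6, start=10)
data = [12, 18, 25, 31, 40, 47, 55, 61, 70, 77, 85, 98]   # made-up values
table = frequency_table(data, limits)
for (lo, hi), freq in table.items():
    print(f"{lo} - {hi}: {freq}")
```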

Answer 73

✨A table that shows the proportion of data that falls into each class relative to the total sample size.✨

Answer 74

✨In relationship to the rest of the data.✨

Answer 75

✨Frequency✨

Answer 76

✨Total Sample Size✨

Answer 77

✨Relative Frequency✨

Answer 78

✨Is the proportion of the values that are in that class.✨

Answer 79

✨1.💖Find the frequency (f) of the class you're interested in.💖 2.💖Find the total sample size (n) by adding up the frequencies of all classes. 💖 3.💖Calculate relative frequency by dividing frequency (f) by total sample size (n): relative frequency = f ÷ n💖 4.💖Convert to percentage if needed by multiplying by 100.💖✨
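As a quick illustration, here are those steps for a whole frequency table at once. The class labels and counts are invented for the example.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency f by the total sample size n."""
    n = sum(frequencies.values())            # step 2: total sample size
    return {label: f / n for label, f in frequencies.items()}   # step 3

# Hypothetical frequency table: class label -> frequency.
rel = relative_frequencies({"30-40": 5, "41-51": 10, "52-62": 5})
for label, value in rel.items():
    print(f"{label}: {value:.2f} ({value * 100:.0f}%)")   # step 4: percentage
```

Relative frequencies always add up to 1 (or 100%), which makes a handy self-check.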

Answer 80

✨1. 💖Sort the data: Put the data in order from smallest to largest.💖 2.💖 Split the numbers into stem and leaf: The stem is everything except the last digit (the tens, hundreds, etc.). The leaf is just the last digit (ones, tenths, etc.). For example, 52 → stem = 5, leaf = 2.💖 3. 💖List the stems: Write down all the stems (without repeating). If you have 52, 54, and 58, you will just write "5" for the stem.💖 4.💖 Add the leaves: For each stem, list all the corresponding leaves next to it. For example, if you have 52, 54, and 58, the stem "5" will have leaves "2", "4", and "8".💖 5. 💖Organise the leaves: Arrange the leaves in numerical order for each stem. For example, "5 | 8, 2, 4" becomes "5 | 2, 4, 8" after ordering.💖 6. 💖Repeat for all stems: Do the same for every stem in your data.💖 7. 💖Final display: Now, your stem-and-leaf display is ready! Each stem shows a group of numbers, and the leaves show the details of each number. Example: Data: 52, 54, 58, 61, 63, 67, 72, 74, 75💖 Steps: 💖Sort the data: 52, 54, 58, 61, 63, 67, 72, 74, 75💖 💖Split into stems and leaves: 52 → Stem = 5, Leaf = 2 54 → Stem = 5, Leaf = 4 58 → Stem = 5, Leaf = 8 61 → Stem = 6, Leaf = 1 63 → Stem = 6, Leaf = 3 67 → Stem = 6, Leaf = 7 72 → Stem = 7, Leaf = 2 74 → Stem = 7, Leaf = 4 75 → Stem = 7, Leaf = 5 List the stems: 5, 6, 7💖 💖Add the leaves: 5 | 2, 4, 8 6 | 1, 3, 7 7 | 2, 4, 5💖 💖Organise the leaves: Already in order.💖 💖Final stem-and-leaf display: 5 | 2, 4, 8 6 | 1, 3, 7 7 | 2, 4, 5 And that's it! You've got your stem-and-leaf display.💖✨

Answer 81

✨tells us where the centre or middle of the dataset is.✨

Answer 82

✨Mean (Average) Add up all the numbers. Divide by how many numbers there are. Example: (5 + 10 + 15) ÷ 3 = 10 Mean = Σx / n Median (Middle Value) Put the numbers in order from smallest to largest. If there’s an odd number of values, pick the middle one. If there’s an even number, take the average of the two middle numbers. Example: Ordered list: 3, 5, 8, 12, 15 → Median = 8 Example (even count): Ordered list: 2, 6, 9, 11 → (6 + 9) ÷ 2 = 7.5 Mode (Most Frequent Number) Find the number that appears the most. There can be one mode, multiple modes, or no mode if all numbers appear equally. Example: 4, 7, 7, 9, 10 → Mode = 7✨
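The card's examples can be double-checked with Python's built-in `statistics` module. This is just a verification sketch; `multimode` (Python 3.8+) is used because it returns every mode, which matches the note that there can be more than one.

```python
import statistics

mean_value = statistics.mean([5, 10, 15])              # (5 + 10 + 15) / 3
median_odd = statistics.median([3, 5, 8, 12, 15])      # middle value
median_even = statistics.median([2, 6, 9, 11])         # average of 6 and 9
modes = statistics.multimode([4, 7, 7, 9, 10])         # list of all modes

print(mean_value, median_odd, median_even, modes)
```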

Answer 83

✨Mean: Best when data has no extreme values (outliers). Median: Best when data has outliers or is skewed. Mode: Best for categorical data (e.g., favourite colour, most common shoe size).✨

Answer 84

✨shorthand for "sum of" Example: if x = {3,4,7} then Σx = 14 Also, ∑xy is shorthand for "multiply x and y for each pair, then add them up." Example: If x = {2, 5, 7} and y = {3, 4, 1}, First, multiply: (2×3), (5×4), (7×1) → {6, 20, 7} Now, add: 6 + 20 + 7 = 33✨

Answer 85

✨is the average of a sample (a small group from a big group).✨

Answer 86

✨is the average of the whole group.✨

Answer 87

✨ 👛outliers👛 Example {1, 2, 3, 4, 100} Median: The middle number is 3. It stays the same even though 100 is way higher than the other numbers. Mean: If you add all the numbers up (1 + 2 + 3 + 4 + 100 = 110) and divide by 5, the mean is 22. The mean is much higher because 100 is an outlier, and it affects the mean a lot. So, in cases with outliers, the median is more reliable for representing the “typical” value. The mean is less reliable because it can be easily skewed by extreme numbers.✨

Answer 88

✨A method of ameliorating the influence of the outliers How to Calculate a 5% Trimmed Mean: 💖Find the total number of data points (n): Example: 100 data points.💖 💖Calculate 5% of the data points: 5% of 100 = 5.💖 💖Order the data: Sort from lowest to highest (or vice versa).💖 💖Remove the 5% from both ends: 💖 💖Remove the 5 smallest and 5 largest values.💖 💖Calculate the new mean: Find the mean of the remaining data, which is now less affected by outliers.💖✨

Answer 89

✨Multiply the whole by the percentage written as a decimal (percentage ÷ 100). Example: To find 5% of 1200: 1200 × 0.05 = 60✨

Answer 90

✨a method of computing an average where some data points contribute more than others✨

Answer 91

✨1.💖Identify Given Information💖 2.💖Multiply Each Value by Its Weight💖 3.💖Sum Up the Weighted Values💖 4.💖Divide by the Sum of Weights💖✨
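Those four steps fit in one small function. In this sketch the scores (80 for homework, 90 for exams) are made up, and the weights 30 and 70 are just an example split between homework and exams.

```python
def weighted_mean(values, weights):
    """Sum of value x weight, divided by the sum of the weights."""
    weighted_total = sum(v * w for v, w in zip(values, weights))   # steps 2-3
    return weighted_total / sum(weights)                           # step 4

# Step 1: given information (hypothetical scores, weights in percent).
scores = [80, 90]        # homework, exams
weights = [30, 70]
grade = weighted_mean(scores, weights)
print(grade)
```

Dividing by the sum of the weights means the weights do not have to add up to 100 (or to 1) for the result to be right.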

Answer 92

✨The weights depend on importance and are usually assigned in one of three ways: Arbitrarily – If no real justification exists, weights might just be assigned based on intuition or preference. Example: A teacher decides homework is worth 30% and exams 70% just because they think exams matter more. Empirically – Based on data or past trends. Example: A company weighs customer feedback scores differently based on how predictive they are of future sales. By Policy/Rules – Set by an institution, standard, or contract. Example: University grading systems (e.g., final exams = 50%, quizzes = 30%, participation = 20%). It all depends on what matters most in the context.✨

Answer 93

✨when the mean, median, and mode are all the same value Mean: The average of all the data points. Median: The middle value when all data points are arranged in order. Mode: The value that appears most frequently. In a perfect normal distribution (like a bell curve), these three measures of central tendency (mean, median, and mode) will be the same and perfectly aligned with the center of the curve. Why? The data is symmetrically distributed, so the average, middle, and most frequent values all occur at the same point.✨

Answer 94

✨the positions of the mean, median, and mode change depending on the direction of the skew: Right-skewed distribution (positively skewed): The tail of the distribution is stretched to the right, meaning there are more lower values and a few higher values. In this case, the mean is typically greater than the median, and the median is greater than the mode. The order of the measures from left to right is: Mode < Median < Mean. Left-skewed distribution (negatively skewed): The tail of the distribution is stretched to the left, meaning there are more higher values and a few lower values. In this case, the mean is typically less than the median, and the median is less than the mode. The order of the measures from left to right is: Mean < Median < Mode. Example: If you have a distribution of values: Right-skewed: {1, 2, 2, 2, 3, 3, 4, 5, 9, 12} Mode: 2 (most frequent) Median: 3 (middle value) Mean: 4.3 (average) Left-skewed: {1, 4, 8, 9, 10, 10, 11, 11, 11, 12} Mode: 11 (most frequent) Median: 10 (middle value) Mean: 8.7 (average) In skewed distributions, the skew pulls the mean toward the tail, away from the median and mode.✨

Answer 95

✨How spread out the data is. If numbers are close together, variation is low (consistent). If numbers are far apart, variation is high (all over the place). Example: Two classes have the same average grade of 75, but the grades could look very different: Class A: Everyone got 75, 75, 75, 75, 75 → No variation, everyone got the same grade. Class B: Some got 50, 60, 75, 90, 100 → Big variation, some did much better or worse than others. Even though both have the same mean (75), Class B has more spread-out grades. That’s why we need variation (like range, variance, or standard deviation) to see how much the data is spread out or clustered together. ✨

Answer 96

✨1.💖Range → Difference between the largest and smallest values.💖 2.💖Variance → The average of the squared differences from the mean.💖 3.💖Standard Deviation → The square root of the variance; shows how much data deviates from the mean.💖 4.💖Coefficient of Variation → Standard deviation divided by the mean; useful for comparing variability between different datasets.💖✨

Answer 97

✨The Largest Value - The Smallest Value 🧮 For example: Data: 42, 33, 21, 78, 62 Maximum = 78 Minimum = 21 Range = 78 − 21 = 57 ✅ The data spreads 57 units from the smallest to the largest value.✨

Answer 98

✨Range is too sensitive to outliers, so we use better tools like: Standard deviation Variance Interquartile range Range only looks at two numbers: → The highest and the lowest. It ignores everything else in the middle. So if you change just one of those two numbers (highest or lowest), the range changes a lot, even if the rest of the data stays the same. Example: Original: [21, 33, 42, 62, 78] → Range = 78 − 21 = 57 Now change one number: [21, 33, 42, 62, 90] → Range = 90 − 21 = 69 The range changed a lot (from 57 to 69), but the other four numbers didn’t change at all.✨

Answer 99

✨Variance – average of the squared differences from the mean It tells you how spread out the numbers are from the mean. If the variance is small, your numbers are huddled close together. If it’s big, your numbers are scattered like confetti 🎉. Think of it like: “How well does the average (mean) describe everyone?” ✨

Answer 100

✨Standard deviation – the square root of the variance Variance gives you the squared spread — a bit awkward. Standard deviation is just the square root of that, so it’s easier to interpret and in the same units as your data. So if you’re measuring grades, standard deviation is also in grade units (not grade² like variance is).✨

Answer 101

✨Find the mean (8+6+2)/3 = 5.333 Subtract the mean from each value (get deviations) 8−5.333 = 2.667 6−5.333 = 0.667 2−5.333 = −3.333 Square each deviation 2.667² = 7.113 0.667² = 0.445 (−3.333)² = 11.109 Add them up 7.113 + 0.445 + 11.109 = 18.667 Divide by n−1 (for sample variance) 18.667 ÷ (3−1) = 9.333 → This is the variance Square root the variance √9.333 = 3.055 → This is the standard deviation ✅✨
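To check the hand calculation, the same steps can be run in Python. The function names are made up for this sketch; it follows the defining formula (mean, deviations, squares, divide by n − 1, square root) and should agree with the worked values up to rounding.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    sum_of_squares = sum((x - mean) ** 2 for x in data)
    return sum_of_squares / (n - 1)

def sample_std_dev(data):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(data))

variance = sample_variance([8, 6, 2])
std_dev = sample_std_dev([8, 6, 2])
print(round(variance, 3), round(std_dev, 3))
```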

Answer 102

✨🔢 Variance Tells you how spread out your data is. ➡️ Higher variance = more spread ➡️ Lower variance = more consistent Example: Set A: [5,5,5] → Variance = 0 (super consistent) Set B: [2,5,8] → Variance > 0 (more spread) 📏 Standard Deviation Tells you on average how far values are from the mean. ➡️ It’s the "typical" distance from the center. Example: Mean = 5 Standard Deviation = 2 → Most values are about 2 units away from 5 ✅✨

Answer 103

✨🎯 Focus on Sample Formulas We usually don’t know the whole population — just a sample. So we use sample variance and sample standard deviation formulas. s² = (sum of (x - mean)²) ÷ (n - 1) s = square root of s² Where: x = each value in the dataset mean = average of all values n = number of values s² = variance s = standard deviation ✨

Answer 104

✨Two — the defining formula and the computational formula.✨

Answer 105

✨“Square all the x-values first, then add the result.” The step-by-step method: mean → deviations → squares → average → square root.✨

Answer 106

✨"Add all the x-values first, then square the result." A shortcut using Σx² and (Σx)² that gives the same result but skips the logic.✨

Answer 107

✨ The defining formula — it shows what’s really happening in the data.✨

Answer 108

✨The computational formula.✨

Answer 109

✨Sample Defining Formulas These are the formulas used to calculate sample variance and standard deviation. The main part of the formula, Σ(x − x̄)², is called the Sum of Squares. This captures how much each value differs from the mean, squared and then added together. Sample Variance (s²): s² = Σ(x − x̄)² / (n − 1) Sample Standard Deviation (s): s = √[Σ(x − x̄)² / (n − 1)] We divide by (n − 1) when using a sample. First calculate the Sum of Squares, then plug it into the formula. Standard deviation is just the square root of the variance.✨

Answer 110

✨Steps to Calculate Sum of Squares (SS): Find the mean of the data set. Subtract the mean from each value (get the deviations). Square each deviation. Add all the squared deviations. That final total = Sum of Squares SS = Σ(x - x̄)² Where: SS = Sum of Squares Σ = Sum of... x = each value in the dataset x̄ = the sample mean (x - x̄)² = square of the difference between each value and the mean ✅✨

Answer 111

✨Sample variance tells you how spread out the data is from the mean in a sample. Formula: s² = [Σ(x − x̄)²] / (n − 1) This means: Subtract the mean from each value. Square each result. Add them (this is the "sum of squares"). Divide by (n − 1).✨

Answer 112

✨Divide sum of squares by (n − 1) → that's the variance. Take the square root of that → that's the standard deviation. Example: Sum of squares = 70, n = 6 Variance = 70 ÷ 5 = 14 Standard deviation = √14 ≈ 3.74 ✅✨

Answer 113

✨Descriptive Statistics Quick Reference Descriptive Statistics Concepts Mean (Average): The mean is the sum of all numbers divided by how many numbers there are. It shows the center (typical value) of the data. Example: For numbers 2 and 5, add them (2 + 5 = 7) and divide by 2. The mean = 3.5. Deviation from the Mean: A deviation is a number minus the mean. It shows how far each number is from the average. Example: For 2 and 4, the mean = 3, so deviations are 2 – 3 = –1 and 4 – 3 = 1. Squaring a Number: Squaring a number means multiplying it by itself. It makes negative numbers positive and shows the number’s size. Example: 3 × 3 = 9 (so 3² = 9), and (–2) × (–2) = 4. Square Root: The square root of a number is a value that, when multiplied by itself, gives the original number. It is the reverse of squaring. Example: √9 = 3, because 3 × 3 = 9. Sum of Squares (SS): Sum of squares is adding up each deviation from the mean squared. It tells the total squared distance of values from the mean. Example: For 2 and 4 (mean = 3), deviations are –1 and 1; their squares are 1 and 1, so SS = 1 + 1 = 2. Sample Variance (s²): Sample variance is the average of squared deviations (using n – 1 in the bottom). It shows how spread out the data are (in squared units). Example: For 2 and 4, SS = 2 (as above), n = 2, so variance s² = 2 ÷ (2 – 1) = 2. Standard Deviation (s): Standard deviation is the square root of the variance. It tells the typical distance of data points from the mean (in original units). Example: If variance = 2 (as above), then s = √2 ≈ 1.4. Range: The range is the largest number minus the smallest number. It shows the total spread of the data from low to high. Example: For 2, 7, and 10, range = 10 – 2 = 8. Conceptual Ideas Sample vs Population: The population is the whole group of interest; a sample is a smaller part of it. We often use a sample when we cannot measure the whole population. 
Example: Surveying 100 people (sample) to learn about all 1000 people (population). n − 1 (Bessel’s correction): When calculating the variance of a sample, we divide by (n – 1) instead of n. This makes the sample variance a fair (unbiased) estimate of the true population variance. Example: For 3 data points, divide by 2 (3 – 1). Consistency vs Inconsistency: Consistency means data points are similar (low spread); inconsistency means they vary a lot (high spread). This is seen in measures like range or SD. Example: [4, 5, 6] is consistent (values close together), while [1, 5, 10] is inconsistent (values far apart). Defining vs Computational Formula: The defining formula for variance uses (value – mean)² for each data point. The computational formula rearranges sums of values and squares to calculate the same result more quickly. Both give the same answer; usually the computational form is used for easier calculation. Why Variance & SD Are Better Than Range: Variance and SD use all data points, so they show how spread out the data are on average. The range only uses the smallest and largest values, so it can be misleading if one value is very big or small. Example: [5, 6, 7] has range = 2; if one value changes to 20 (making [5, 6, 20]), range = 18 (huge jump) even though two numbers stayed close. Variance/SD reflect how all values spread out.✨

Answer 114

✨Sample variance formula: s² = ∑(x − x̄)² / (n − 1) Population variance formula: σ² = ∑(x − μ)² / N✨
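A quick way to see the difference between the two denominators is Python's built-in `statistics` module: `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by N. The data below is just the 2-and-4 example reused from the quick-reference card.

```python
import statistics

data = [2, 4]

# Sample variance: sum of squared deviations divided by (n - 1).
sample_var = statistics.variance(data)

# Population variance: the same sum of squares divided by N.
population_var = statistics.pvariance(data)

print("sample variance:", sample_var)
print("population variance:", population_var)
```

With only two values the gap is large (2 versus 1); it shrinks as the dataset grows.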

Answer 115

✨A % that shows how much the data varies relative to the mean. Formula: CV = (Standard Deviation ÷ Mean) × 100 → Higher % = more spread out (less consistent) → Lower % = tighter around the mean (more consistent)✨

Answer 116

✨Sample CV = (s / x̄) × 100 This is the formula for a sample: s is the standard deviation (how much data jumps around), x̄ (x-bar) is the mean (average), Multiply the fraction by 100 to turn it into a percentage. ✨

Answer 117

✨It shows how reliable or consistent the data is — and lets you compare variability across different units or scales. 📊 Example: Group A: Mean = 6, SD = 3.74 → CV = 62% Group B: Mean = 8, SD = 4 → CV = 50% → Group B is more consistent, even with a bigger SD!✨
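The Group A versus Group B comparison can be checked with a few lines of Python. This is only a sketch: `coefficient_of_variation` is a hypothetical helper name, and it takes the standard deviation and mean directly, as the card does.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100


cv_a = coefficient_of_variation(3.74, 6)  # Group A from the card
cv_b = coefficient_of_variation(4, 8)     # Group B from the card

print(f"Group A CV: {cv_a:.1f}%")
print(f"Group B CV: {cv_b:.1f}%")
print("More consistent group:", "B" if cv_b < cv_a else "A")
```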

Answer 118

✨A rule that tells you the minimum % of values that fall within a certain number of standard deviations (k) from the mean, no matter the shape of the data. ✅ Great for non-normal distributions ✅ Helps you spot outliers and understand variability Here is an analogy that worked for me: Imagine you’re watching campers from a drone above. The campfire is your mean, the centre of the gathering. When you zoom in tight, you only see the campers right next to the fire. As you zoom out, your view widens and you start to see more campers who are a bit further away. Chebyshev’s Theorem says: No matter how the campers are scattered, if you zoom out to a certain radius (say “k” steps away from the fire, where each step is one standard deviation), you’ll always capture at least a certain percentage of the campers inside that circle. So as you zoom out: The percentage of campers inside your circle goes up, because you include more who were further away. The theorem tells you the minimum guaranteed percentage inside any zoom level, even if some campers are far off wandering. It’s like a safety net for how spread out the group can be, no matter how weird or uneven their distribution.✨

Answer 119

✨At least (1 − 1/k²) × 100% of the data falls within k standard deviations of the mean ✅ k must be greater than 1 (at k = 1 the bound is 0%, which tells you nothing) Example: If k = 2 → at least 1 − (1/4) = 75% of the data is within 2 SDs✨
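Here is a small sketch of the bound as a function, so you can try different zoom levels. The name `chebyshev_min_percent` is made up for this example. It accepts k = 1 but returns 0% there, which is why the bound only becomes useful for k greater than 1.

```python
def chebyshev_min_percent(k):
    """Minimum % of data within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("k must be at least 1 for the bound to make sense")
    return (1 - 1 / k ** 2) * 100


for k in (1, 2, 3):
    print(f"k = {k}: at least {chebyshev_min_percent(k):.1f}% of the data")
```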

Answer 120

✨A percentile shows the % of scores below a certain value in a dataset. Percentiles tell you how you rank compared to everyone else in a set of quantitative (number-based) data. 🧪 Example (Standardized Test Vibes): If you're in the 77th percentile, it doesn't mean you got 77% of the answers right. It does mean you did better than 77% of the people who took the test. If 100 people took it, that puts you ahead of 77 people. 💡 Why It's Useful: Percentiles are position indicators. They help compare individual performance or values to a larger group. Percentile = the score below which a given % of data falls. Example: 25th percentile = 25% of scores are less than this number. ✨

Answer 121

✨Count how many values are less than your score. Divide that count by the total number of values. Multiply by 100. ✨
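The three steps above can be sketched in Python like this. The test scores are invented purely for illustration, and `percentile_rank` is a hypothetical helper name.

```python
def percentile_rank(values, score):
    """Percentage of values strictly below the given score."""
    below = sum(1 for v in values if v < score)  # step 1: count values below
    return 100 * below / len(values)             # steps 2 and 3: divide, then x100


# Made-up scores, not from the flashcards.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank = percentile_rank(scores, 85)
print(f"A score of 85 sits at the {rank:.0f}th percentile")
```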

Answer 122

✨🎯 To find the value at a given percentile (like “what score is the 30th percentile?”), use: Rank = (P / 100) × (N + 1) Rank equals P divided by 100, multiplied by N plus 1 Where: P = the percentile you're interested in (like 30 for the 30th percentile), N = total number of values in your data set. 🔍 Steps: Sort your data from lowest to highest. Plug into the formula: e.g. 30th percentile in a list of 10 values → Rank = (30 / 100) × (10 + 1) = 3.3 If the rank is a whole number (say 4), then the 4th value is your percentile. If it's a decimal (like 3.3), interpolate: → Take the 3rd value + 0.3 × (4th − 3rd value) 🧠 Example: Data: 2, 3, 4, 5, 6, 7, 8, 9 Percentile: 25th N = 8 → Rank = (25 / 100) × (8 + 1) = 2.25 → Look between 2nd and 3rd value: 3 + 0.25 × (4 − 3) = 3.25 So, the 25th percentile = 3.25✨
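The rank-and-interpolate recipe translates fairly directly into code. This is a sketch of the (N + 1) method from the card only; other textbooks and libraries use slightly different rank formulas. The clamping to the smallest and largest value when the rank falls off either end of the list is my own assumption, since the card does not cover that case.

```python
def percentile_value(values, p):
    """Value at the p-th percentile using Rank = (P / 100) * (N + 1)."""
    data = sorted(values)            # step 1: sort lowest to highest
    n = len(data)
    rank = p / 100 * (n + 1)         # step 2: 1-based position in the sorted list

    # Assumption (not on the card): clamp ranks that fall outside the list.
    if rank <= 1:
        return data[0]
    if rank >= n:
        return data[-1]

    lower = int(rank)                # whole part of the rank
    fraction = rank - lower          # decimal part used for interpolation
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


data = [2, 3, 4, 5, 6, 7, 8, 9]
print(percentile_value(data, 25))    # the card's worked example
```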

Answer 123

✨Quartiles are special percentiles that split your data into four equal parts. Think of your data like a pizza 🍕: once the values are sorted, quartiles cut it into 4 slices that each hold about a quarter of the data points. 📊 The 3 Main Quartiles: 1st Quartile (Q1) = 25th percentile → 25% of the data is below this value. 2nd Quartile (Q2) = 50th percentile → This is just the median. Half the data is below, half is above. 3rd Quartile (Q3) = 75th percentile → 75% of the data is below this value. So if you line up all your data from smallest to largest, the quartiles are the values that mark the cutoff points between the bottom 25%, middle 50%, and top 25%.✨

Answer 124

✨Order the data from smallest to largest. Q2 (median): Find the middle number. Q1: Find the median of the lower half (left of Q2). Q3: Find the median of the upper half (right of Q2). 👉 If there's an odd number of data points, don’t include the median when finding Q1 and Q3. So yes: odd = simple pick, even = quick calc.✨

Answer 125

✨IQR = Q3 − Q1 That’s it. It’s the range between the third quartile (75th percentile) and the first quartile (25th percentile). It shows where the middle 50% of your data lives: the core chunk, the comfy middle.✨

Answer 126

✨🧮 How to calculate it: Order your data from smallest to largest. Find the median (Q2). This splits your data in half. Find Q1: the median of the lower half (don’t include Q2 if odd!). Find Q3: the median of the upper half (same rule applies). Subtract Q1 from Q3: IQR = Q3 − Q1 🧠 Why’s it useful? It ignores extreme values (outliers). Helps you see how spread out the middle of your data is. Used in boxplots and spotting outliers. 🎲 Example: Data: 2, 4, 6, 8, 10, 12, 14 Median (Q2) = 8 Lower half: 2, 4, 6 → Q1 = 4 Upper half: 10, 12, 14 → Q3 = 12 IQR = 12 − 4 = 8 ✨
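The median-of-halves steps can be sketched as below. It follows the card's rule (drop the median from both halves when the count is odd). Statistical packages often use other quartile conventions, so their answers can differ slightly. `median` and `quartiles` are illustrative names.

```python
def median(sorted_values):
    """Middle value of an already-sorted list (average of two if even length)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Odd count: skip the median itself. Even count: split straight down the middle.
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


q1, q2, q3 = quartiles([2, 4, 6, 8, 10, 12, 14])
iqr = q3 - q1
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```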

Answer 127

✨“Statistics: the desperate search party for your runaway data.” And yeah, a shocking amount of stats is just figuring out where most of the data went, how spread out it is, and whether it's acting dodgy. It's like a game of hide-and-seek with numbers: Mean: "Come back here, average!" Median: "Let’s find the middle child, they’re usually sensible." Mode: "Who’s showing up the most, the attention seeker?" Standard deviation: "How rebellious is everyone feeling?" IQR: "Ignore the weirdos on the edges. What are the normal kids doing?" So yes, stats is a bit like emotionally profiling a classroom of numbers. 😅 “Statistics: beating predictions out of your data like a desperate oracle with performance anxiety.” And yeah, once you've finally wrangled your chaotic little data gremlins into some kind of order, it’s time for act two: forcing them to cough up answers about the future. It’s like: Linear regression: “If I draw a line through this mess, maybe I can guess what happens next. Please cooperate.” Probability: “On a scale of 0 to 1, how likely are you to betray me?” Confidence intervals: “I’m 95% sure this won’t completely humiliate me. The other 5% is quietly sweating.” Hypothesis testing: “I’m not saying you're wrong... but I am going to run 10,000 simulations to prove it.” Bayes’ Theorem: “Let’s update our beliefs, like a polite but suspicious detective.” It’s not so much prediction as it is data interrogation: “Dear data: I don’t care what you’ve done. Tell me what you’re going to do. Or else I’m switching to tarot.”✨

Answer 128

✨A box plot is a visual summary of a dataset using five-number summary stats: Minimum Q1 (25th percentile) Median (Q2 / 50th percentile) Q3 (75th percentile) Maximum It shows the spread, center, and skew of the data with a box (the interquartile range) and two whiskers extending to the minimum and maximum values.✨

Answer 129

✨Order the data from smallest to largest. Find the median (Q2). Split the data into lower and upper halves. If odd number of data points, exclude the median. If even, include all. Find Q1 = median of lower half (25th percentile) Find Q3 = median of upper half (75th percentile) Minimum = smallest value Maximum = largest value Plot it: Draw a box from Q1 to Q3 Draw a line inside the box for Q2 Draw whiskers from min to Q1 and from Q3 to max✨

Answer 130

✨A scattergram (or scatter plot) is a graph plotting pairs of values (x,y) to show the relationship between two variables. The x variable is the explanatory (independent) variable, graphed on the x-axis, and the y variable is the response (dependent) variable, graphed on the y-axis. It helps visualize how changes in x relate to changes in y.✨

Answer 131

✨Identify the explanatory (independent) variable (x) and response (dependent) variable (y). Draw the x-axis (horizontal) and y-axis (vertical). For each pair of data points (x,y), plot a dot where x is on the x-axis and y is on the y-axis. Look for patterns or trends among the dots to understand the relationship between variables. If the dots roughly form a line going upwards from left to right, that’s a positive correlation — as x increases, y tends to increase. Like more study hours, better scores. If the dots form a line going downwards from left to right, that’s a negative correlation — as x increases, y tends to decrease. Like more hours watching TV, lower test scores (maybe). If the dots are all over the place with no clear pattern, that means no correlation — x and y aren’t related in any obvious way. This is the first, simplest way to visually check if two variables might be connected before you get fancy with numbers. It’s like spotting a trend at a glance!✨

Answer 132

✨Linear correlation is when two variables, x and y, show a straight-line relationship on a scatterplot. As x increases, y increases (positive correlation) or decreases (negative correlation) in a consistent pattern. If every point falls exactly on a sloped straight line, it’s a perfect linear correlation.✨

Answer 133

✨Strength and Direction Direction 📈 Positive: As x increases, y increases. 📉 Negative: As x increases, y decreases. ❌ No correlation: No clear pattern — just chaos. Strength 💪 Strong: Dots hug the line tightly. 😐 Moderate: Dots kind of follow the line but wander a bit. 😵 Weak: Dots are all over the place — barely forming a line.✨

Answer 134

✨A number between -1 and +1 that measures the strength and direction of a linear relationship between two variables. Positive r means both variables increase together. Negative r means one variable increases as the other decreases. r close to 0 means little or no linear relationship. ✨

Answer 135

✨r = +1: Perfect positive correlation (dots lie exactly on an upward line) r = -1: Perfect negative correlation (dots lie exactly on a downward line) r near 0: No linear correlation |r| close to 1: Strong correlation |r| moderate (around 0.5): Moderate correlation |r| close to 0: Weak correlation✨

Answer 136

✨Raw formula ∑xy Plain English Multiply each x and y that go together (like partners), then add all those products up. Raw formula ∑x, ∑y Plain English Add up all the x values into one total. Do the same for all the y values. You’ll now have two totals: one for x and one for y. Raw formula (∑x)(∑y) Plain English Multiply your two totals from above (x total × y total). Raw formula n∑xy − (∑x)(∑y) Plain English Take the number of data pairs (n). Multiply it by the total you got from multiplying each x and y. Now subtract the number you got when you multiplied the x and y totals. This is your numerator (top of the fraction). Raw formula ∑x² Plain English Square each x (like 80², 90², etc.) and add them all together. This gives the "sum of x squared". Raw formula ∑y² Plain English Do the same thing for y values: square each one and add them. This gives the "sum of y squared". Raw formula n∑x² − (∑x)² Plain English Multiply n (the number of pairs) by the sum of x squared. Then subtract the square of the sum of x (not the same as the sum of x squared!). This gives the x-part of the denominator. Raw formula n∑y² − (∑y)² Plain English Same as above, but for y. This gives the y-part of the denominator. Raw formula √[(x-part)(y-part)] Plain English Multiply the x-part and y-part together, then take the square root of the result. This is the full denominator. Raw formula [numerator] ÷ [denominator] Plain English Divide the numerator (from earlier) by the denominator (the square root you just found). This gives you the correlation coefficient, r. ✨
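Each "raw formula" piece above maps onto one line of code. Here is a sketch that builds r the same way, piece by piece. The hours-and-scores data is borrowed from the regression cards later in the deck, and `pearson_r` is just an illustrative name.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r via the computational formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")

    sum_x, sum_y = sum(xs), sum(ys)                 # ∑x and ∑y
    sum_xy = sum(x * y for x, y in zip(xs, ys))     # ∑xy
    sum_x2 = sum(x * x for x in xs)                 # ∑x²
    sum_y2 = sum(y * y for y in ys)                 # ∑y²

    numerator = n * sum_xy - sum_x * sum_y          # n∑xy − (∑x)(∑y)
    x_part = n * sum_x2 - sum_x ** 2                # n∑x² − (∑x)²
    y_part = n * sum_y2 - sum_y ** 2                # n∑y² − (∑y)²
    denominator = math.sqrt(x_part * y_part)
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no spread")
    return numerator / denominator


hours = [1, 2, 3]
scores = [50, 60, 65]
print(round(pearson_r(hours, scores), 3))
```

You can reuse the same function to try out the later cards' claims: swapping x and y, or converting units (for example multiplying every x by 2.54), should leave r unchanged.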

Answer 137

✨ A bivariate normal distribution. (We don’t check for this in class, but it’s an important assumption.)✨

Answer 138

✨No, r is unitless.✨

Answer 139

✨r = 1.0 (perfect positive) or r = −1.0 (perfect negative).✨

Answer 140

✨r = 0✨

Answer 141

✨As x goes up, y goes up. As x goes down, y goes down.✨

Answer 142

✨As x goes up, y goes down. As x goes down, y goes up.✨

Answer 143

✨Nothing — the value of r stays the same✨

Answer 144

✨Nothing — r stays the same even if the units change (like cm to inches)✨

Answer 145

✨A method for drawing the best-fitting straight line through a scatter plot, so you can predict one thing (y) from another thing (x). It shows the overall trend or pattern between 2 variables.✨

Answer 146

✨A number that tells you how well your line fits the data. R² = the proportion of the variation in y that can be explained by x. The closer R² is to 1, the stronger the connection; closer to 0 means your line is mostly guessing.✨
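One common way to compute R² is 1 − (leftover squared error ÷ total squared spread of y). The sketch below assumes that definition, which the card does not state explicitly. It uses the three-student data and the line ŷ = 43.33 + 7.5x from the regression cards in this deck. `r_squared` is an illustrative name.

```python
def r_squared(actual, predicted):
    """R² = 1 - SS_residual / SS_total (assumed definition, see lead-in)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when every y value is the same")
    return 1 - ss_res / ss_tot


hours = [1, 2, 3]
scores = [50, 60, 65]
predicted = [43.33 + 7.5 * x for x in hours]  # line taken from the regression card

print(round(r_squared(scores, predicted), 3))
```

For these three points R² comes out at roughly 0.96, so the line explains most of the variation in the scores.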

Answer 147

✨FORMULA (Raw): ŷ = a + bx Where: ŷ = predicted y a = intercept = ȳ − b𝑥̄ b = slope = Σ[(x−𝑥̄)(y−ȳ)] ÷ Σ(x−𝑥̄)² PLAIN ENGLISH TRANSLATION: To get the best-fit line, we need two things: 💖 The slope (b) = how steep the line is 💖 💖 The intercept (a) = where it crosses the y-axis 💖 STEP-BY-STEP (With Example) Let’s say we have 3 students: x = hours studied y = exam score Student 1: x = 1 y = 50 Student 2: x = 2 y = 60 Student 3: x = 3 y = 65 Step 1: Find the means 𝑥̄ = (1+2+3)/3 = 2 ȳ = (50+60+65)/3 = 58.33 Step 2: Calculate the slope (b) Use: b = Σ[(x−𝑥̄)(y−ȳ)] ÷ Σ(x−𝑥̄)² Now build your table vertically for each student: Student 1 x = 1 y = 50 x−𝑥̄ = 1−2 = -1 y−ȳ = 50−58.33 = -8.33 (x−𝑥̄)(y−ȳ) = (-1)(-8.33) = 8.33 (x−𝑥̄)² = (-1)² = 1 Student 2 x = 2 y = 60 x−𝑥̄ = 0 y−ȳ = 1.67 (x−𝑥̄)(y−ȳ) = 0 (x−𝑥̄)² = 0 Student 3 x = 3 y = 65 x−𝑥̄ = 1 y−ȳ = 6.67 (x−𝑥̄)(y−ȳ) = (1)(6.67) = 6.67 (x−𝑥̄)² = (1)² = 1 Summing up: Σ[(x−𝑥̄)(y−ȳ)] = 8.33 + 0 + 6.67 = 15 Σ(x−𝑥̄)² = 1 + 0 + 1 = 2 So: b = 15 / 2 = 7.5 Step 3: Find the intercept (a) a = ȳ − b𝑥̄ a = 58.33 − (7.5)(2) = 58.33 − 15 = 43.33 Final regression equation: ŷ = 43.33 + 7.5x✨
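The table-building steps above boil down to two sums, so they are easy to check in code. This is a minimal sketch using the card's three students. `fit_line` is a made-up helper name, and the 4-hour prediction at the end is only there to show how the finished equation is used.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Σ[(x − x̄)(y − ȳ)] and Σ(x − x̄)², the two columns summed on the card.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when every x value is the same")

    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


hours = [1, 2, 3]
scores = [50, 60, 65]
a, b = fit_line(hours, scores)
print(f"y-hat = {a:.2f} + {b:.2f}x")

# Illustration only: predicted score for a hypothetical student who studies 4 hours.
print(f"predicted score for 4 hours: {a + b * 4:.2f}")
```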

Answer 148

✨How much y changes for each 1-unit increase in x✨

Answer 149

✨Slope of the Line (b) FORMULA (Raw): b = Σ[(x−𝑥̄)(y−ȳ)] ÷ Σ(x−𝑥̄)² Where: x̄ = mean of x ȳ = mean of y (x−x̄) = how far each x is from the mean (y−ȳ) = how far each y is from the mean Σ[(x−x̄)(y−ȳ)] = total of x-y pairwise movement Σ(x−x̄)² = total spread of x around its mean PLAIN ENGLISH TRANSLATION: To find the slope of the best-fit line: 💖 Multiply how far each x is from the average x by how far each y is from the average y💖 💖 Add all those up💖 💖 Then divide by how spread out the x-values are from the average x (squared)💖 This tells us how much y changes for every 1 unit increase in x STEP-BY-STEP (With Example): We have 3 students: x-values: 1 2 3 y-values: 50 60 65 Step 1: Find the means 𝑥̄ = (1 + 2 + 3) ÷ 3 = 2 ȳ = (50 + 60 + 65) ÷ 3 = 58.33 Step 2: Build the calculation table Row 1 x = 1 y = 50 x−𝑥̄ = 1 − 2 = −1 y−ȳ = 50 − 58.33 = −8.33 (x−𝑥̄)(y−ȳ) = (−1)(−8.33) = 8.33 (x−𝑥̄)² = (−1)² = 1 Row 2 x = 2 y = 60 x−𝑥̄ = 2 − 2 = 0 y−ȳ = 60 − 58.33 = 1.67 (x−𝑥̄)(y−ȳ) = (0)(1.67) = 0 (x−𝑥̄)² = (0)² = 0 Row 3 x = 3 y = 65 x−𝑥̄ = 3 − 2 = 1 y−ȳ = 65 − 58.33 = 6.67 (x−𝑥̄)(y−ȳ) = (1)(6.67) = 6.67 (x−𝑥̄)² = (1)² = 1 Step 3: Add it all up Σ[(x−𝑥̄)(y−ȳ)] = 8.33 + 0 + 6.67 = 15 Σ(x−𝑥̄)² = 1 + 0 + 1 = 2 Step 4: Final calculation b = 15 ÷ 2 = 7.5 So, the slope = 7.5 That means: for every extra hour studied, the exam score goes up by 7.5 points 🎯📈✨

Answer 150

✨Intercept a is the predicted value of y when x = 0 (where the line crosses the y-axis)✨

(175 cards)