Data Insights Flashcards
DI notes
> goal of the Data Insights section is to test your ability to MAKE SENSE OF COMPLICATED DATA efficiently --> work out which data you need to answer the question
READ WORDING CAREFULLY
> even though you will have an on-screen calculator, ESTIMATION is an important skill to master for DI questions that use the words “approximately”, “nearest to” or “closest to” --> before resorting to the calculator, see if you can solve the problem in a smarter way
> in fact, it IS a SOUND STRATEGY to estimate your way to the correct answer whenever possible in Data Insights
> not all DI questions require calculations (even the quant ones) --> some can be answered simply by interpreting the information provided
> be comfortable with interacting with DI questions displayed on the computer screen
e.g., multiple tabs for Multi-Source reasoning; sort function for Table analysis
> Develop a smart timing strategy (e.g., spending 1 min on Graphics Qs and saving more time for complex questions)
> still better to FINISH all questions with some guesses than leave answers blank
> keep in mind many of the DI questions involve multiple parts --> ALL parts need to be answered correctly (no partial credit)
> watch out for EXTRANEOUS INFO in DI questions that won’t be needed to answer the questions —-> USE THE ANSWER CHOICES as a guide
Graphics Interpretation
What: select correct answer from a DROP-DOWN LIST based on information presented in a GRAPH or CHART
> typically TWO questions / statements
> “From each drop-down menu, select the option that creates the most accurate statement based on the information provided”
Types of charts:
> bar charts
> scatterplots **
> column charts / pareto charts
> cluster charts
> stacked column charts ***
> bubble diagrams
> Venn diagram
> Other DIAGRAMS with symbols
> Flow charts
> Frequency table and Histograms
> line graphs **
> Concept maps
> Pie charts
> hybrid charts with double axis
etc.
Most often, graphics interpretation questions are QUANTITATIVE
> ask about probability, slope of a line, direct and inverse variation, averages, ratios, standard deviation, mean, median, mode, range, percent of and percent change
Strategy for Solving:
(1) Summarize the high level message of the graph + textual information (“this graph shows the number of mangos sold during each ten-year period from 1980 to 2020.”) – “simple story”
> no need to get caught up in the details yet
(2) Go to the sentences — BEFORE DETAILS (might not need all the info)
(3) Pay attention to DETAILS of the graphic and its components (title, units of measurement, axes, axis titles, axis GRIDLINES, legend, colours, SYMBOLS, patterns)
> READ the labels
> Symbols might be used to represent numbers
> be careful of y axis scales that DON’T start at 0 and can be misleading in terms of relative size (avoid using visual comparisons to make conclusions about values); use actual values instead
> Some visual estimation requires precision of reading values from the chart –> add imaginary gridlines if you must! And compare relative to other values in the chart (e.g., max value for Range calcs)
Notes:
> when you see estimation markers like “approximately”, “nearest to”, “closest to” —> solve question using savvy and disciplined estimation
> another trigger for estimation is when the ANSWER OPTIONS are SPREAD RELATIVELY FAR apart
> when a chart has NO numerical scales on its y axis –> can only perform RELATIVE comparisons across categories (e.g., one value is greater than or less than another value) ——> cannot say anything related to absolute value differences or ratios of values
ESTIMATION STRATEGIES:
> Division —> round to easy DECIMALS (NOT always nearest integer), then set fraction = x so you can cross multiply
e.g., 6.25/0.62 = x —-> ~6.2/0.62 = x —–> 6.2 = 0.62x —-> x = 10
> can also write down necessary info from charts in a TABLE (helpful to calculate percent change)
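A minimal Python sketch of the division estimate above, using the numbers from the example. It only checks how far the rounded estimate lands from the exact quotient; the variable names are made up for illustration.

```python
# Exact division vs. the rounded estimate from the example above.
exact = 6.25 / 0.62      # what the calculator would give
estimate = 6.2 / 0.62    # numerator rounded to an easy decimal, about 10

# Relative error of the estimate, as a fraction of the exact value.
relative_error = abs(exact - estimate) / exact

print(f"exact = {exact:.3f}, estimate = {estimate:.3f}")
print(f"relative error = {relative_error:.2%}")
```

An error this small is harmless when the answer options are spread far apart, which is when estimation is the intended route.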
Graphics interpretation: Column and pareto charts
Category on x axis, and frequency or relative frequency on y axis
Questions focus on comparing RELATIVE heights of vertical bars
See word doc
Watch out for:
> y axes that don’t start at 0 --> cannot rely on visual comparison
> y axis with no numerical values at all –> cannot determine actual values or ratios of values; just know if something is bigger/biggest vs smaller/smallest
e.g., 1, 2, 3 increments OR 101, 102, 103 increments
> HOWEVER, if we are given: axis starts at 0 and the increments are CONSTANT (difference) —> then we can determine RATIO VALUES on the chart
Try: One line’s value / lower line’s value —> see if this ratio is constant when you vary values
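A small Python sketch of why the zero baseline matters for ratios. The bar values (104 and 102) and the helper name `visual_ratio` are hypothetical, chosen only to show the effect.

```python
def visual_ratio(a, b, axis_start):
    """Ratio of bar heights as drawn when the y axis starts at axis_start."""
    return (a - axis_start) / (b - axis_start)

a, b = 104, 102                             # hypothetical bar values

true_ratio = a / b                          # about 1.02
ratio_from_zero = visual_ratio(a, b, 0)     # axis starts at 0: matches the true ratio
ratio_truncated = visual_ratio(a, b, 100)   # axis starts at 100: bars look 2 to 1

print(true_ratio, ratio_from_zero, ratio_truncated)
```

With a truncated axis the taller bar looks twice as tall even though the values differ by about 2%, so use the actual values.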
Graphics interpretation: Stacked column chart **
Allows us to compare relative frequencies of a single column (% split) and nicely illustrates the SUM of the series of numbers
Be careful when calculating the PARTS
e.g., if A + B = total
Then A = Total - B
Also don’t get overwhelmed by too much information in the CHART AND text --> keep track of what you need to solve
Other tips:
> can shorten the list of potential answers asking about the proportion of a series using a VISUAL APPROACH --> does the series represent more or less than 50% of the bar?
Graphics Interpretation: Histograms and frequency tables
Represent FREQUENCY (count) of certain INDIVIDUAL VALUES or RANGES OF VALUES
Cumulative Frequency => helpful for “at least” or “at most” questions (sum of multiple categories)
e.g., How many attorneys ate at least 4 pieces of fruit per day? = number of attorneys who ate 4 pieces + number who ate 5 pieces + number who ate 6 pieces
e.g., How many attorneys ate at most 3 pieces of fruit per day? = number of attorneys who ate 0 pieces + 1 piece + 2 pieces + 3 pieces
Or total number of attorneys - number of attorneys who ate at least 4 pieces per day
Frequency of certain RANGES of values e.g., 3 ppl aged 40-49
Histograms are similar to column chart except the x axis has NUMBER RANGES
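The “at least” / “at most” sums above can be sketched in Python. The frequency table below is invented for illustration (the notes do not give the actual counts).

```python
# Hypothetical frequency table: pieces of fruit per day -> number of attorneys.
frequency = {0: 2, 1: 5, 2: 8, 3: 6, 4: 4, 5: 3, 6: 2}

total = sum(frequency.values())

# "At least 4" = add the counts for 4, 5 and 6 pieces.
at_least_4 = sum(count for pieces, count in frequency.items() if pieces >= 4)

# "At most 3" = add the counts for 0, 1, 2 and 3 pieces ...
at_most_3 = sum(count for pieces, count in frequency.items() if pieces <= 3)

# ... or take the complement: total minus "at least 4".
at_most_3_by_complement = total - at_least_4

print(total, at_least_4, at_most_3, at_most_3_by_complement)
```

Both routes to “at most 3” agree, so pick whichever needs fewer additions.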
Graphical Interpretation: Hybrid and double axis
Usually a bar and line chart with two y axes
> PAY ATTENTION to which axis to use (especially pernicious when the units are the SAME on both sides, like $)
Graphical Interpretation: Scatter plots
Allows us to analyze any RELATIONSHIPS between TWO VARIABLES (represented by the x and y axis)
> positive relationship
> negative relationship
> no relationship (almost horizontal line)
Trend lines make it very easy to identify relationship between two variables
(you can add a trend line to see relationships more clearly)
Scatterplots can aid us in making predictions
> e.g., temperature (x axis) vs number of customers (y axis). When temperature is 70 F, we find the region in the scatterplot WHERE MOST OF THE DOTS ARE, then read off the corresponding y values (can also determine the max and min prediction based on actual data points near it)
HOWEVER: when working with scatterplots, do NOT make predictions about data OUTSIDE of the range that was measured, UNLESS language in the question states that we can extend the relationship (“extrapolation”)
> extrapolation can be dangerous unless the QUESTION makes an explicit assumption that the trend will continue (who knows whether the opposite trend or unexpected trend could happen!)
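A rough Python sketch of the prediction habit described above: read the min and max of the measured points near the x value, and refuse to predict outside the measured range. The data points, the `window` width and the function name `predict_range` are all assumptions made for illustration.

```python
# Hypothetical scatterplot data: (temperature in F, number of customers).
points = [(60, 110), (65, 130), (68, 150), (70, 160), (71, 155),
          (72, 170), (75, 185), (80, 210)]

def predict_range(points, x, window=2):
    """Return (min, max) of y for measured points within `window` of x.

    Refuses to predict outside the measured x range (no extrapolation).
    """
    xs = [px for px, _ in points]
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the measured data; extrapolation not supported")
    nearby = [py for px, py in points if abs(px - x) <= window]
    if not nearby:
        raise ValueError("no measured points near x")
    return min(nearby), max(nearby)

low, high = predict_range(points, 70)
print(low, high)
```

Asking for 90 F here raises an error, which mirrors the rule: no prediction beyond the data unless the question explicitly allows it.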
Graphics Interpretation: Correlation
Line charts and scatter plots can depict relationships:
> positive relationship (as x increases, y increases; as x decreases, y decreases)
> negative relationship (as x increases, y decreases; as x decreases, y increases)
> no relationship (almost horizontal trend line)
Remember: correlation ≠ causation
Graphics Interpretation questions will also often present us with one or more LINE GRAPHS and ask what type of correlation exists between the data sets (groups) represented by the lines
Detecting correlation among two or more lines?
> see if the lines MOVE TOGETHER from left to right (don’t need to move perfectly by the same amount every time, just IN THE SAME DIRECTION)
> Always TRACK INTERVALS along the X axis for EACH LINE (which denote a CHANGE IN DIRECTION for that line) –> observe whether those intervals line up and what happens to each line
KEEP TRACK OF TWO CORRELATIONS:
> 1) X variable and Y variable (generally, applies to both groups of data)
> 2) Group A vs Group B (moving together or moving in opposite directions over same interval)
—-> test by: as x increases, A (increases/decreases/stable) and B (increases/decreases/stable)
What happens if there are a few cases where data does NOT follow a trend?
> COULD still be a correlation between the two variables –> look at the GENERAL TREND
> sometimes though there could just be NO GENERAL TREND
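The “do the lines move together” test can be sketched in Python by recording the direction of each line over every x interval and comparing. The two series and the helper name `directions` are hypothetical.

```python
def directions(values):
    """Direction of each interval: +1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]

# Hypothetical yearly values for two lines on the same x axis.
group_a = [10, 12, 15, 14, 18, 21]
group_b = [40, 43, 47, 48, 52, 55]

dir_a = directions(group_a)
dir_b = directions(group_b)

# Count the intervals where both lines move in the same direction.
same = sum(1 for da, db in zip(dir_a, dir_b) if da == db)
share_same = same / len(dir_a)

print(dir_a, dir_b, share_same)
```

Here one interval breaks the pattern (A dips while B rises), yet the lines move together in 4 of 5 intervals, so the general trend is still a positive relationship.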
Bivariate data
Data point representing TWO VALUES (x, y)
> scatter plots
> line charts
Graphics Interpretation: Scatter plots with double axis and PAIR of symbols
Keep careful track of which symbols tie to which axis AND GRIDLINES
Pair of points aligned vertically, and SHARE the same x axis
> Careful to ALIGN CORRECTLY
Graphics Interpretation: Bubble chart
Center of the circle = data point
Represents 3 variables –> x, y and size of the bubble
Graphics Interpretation: Pie charts
Be careful of complex pie chart questions involving:
> Pie chart
> Plus column or bar chart
> Plus tables
You need to expertly decide which data to use (you may not need to use them all!)
Supplemental data could be a double click into one slice of the pie chart or be something completely unrelated
Tips:
> might not need to calculate actual Total count to know actual count of a slice —> can creatively use other slices and their % (proportions of actual counts)
e.g., A = 12% * T
Looking for B = 36%*T
B/A = 3
So if A = 240, B = 3*240 = 720
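A short Python sketch of the slice-ratio shortcut above, using the numbers from the example (A = 12%, B = 36%, A = 240). The function name is made up; the long route through the total T is included only as a cross-check.

```python
def count_from_other_slice(known_count, known_pct, target_pct):
    """Scale a known slice's count by the ratio of the two percentages."""
    return known_count * target_pct / known_pct

# Shortcut: B/A = 36/12 = 3, so B = 3 * 240.
b_count = count_from_other_slice(known_count=240, known_pct=12, target_pct=36)

# Long route for comparison: recover the total T first, then take 36% of it.
total = 240 * 100 / 12
b_via_total = total * 36 / 100

print(b_count, total, b_via_total)
```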
Graphics Interpretation: Venn Diagrams
Venn Diagram Qs are similar to what we learned in PS
> often accompanied by probability questions —> be careful which region you are taking the values from
Be careful:
> AND vs OR
e.g., ketchup AND mayonnaise = overlap only; ketchup OR mayonnaise = ketchup only + mayonnaise only + overlap
Tip:
> use the sub-part view (4 or 8) to understand which sub-parts must be included or excluded
> for counting symbols:
» Go top to bottom (3 sets)
» Go left to right (2 sets)
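The AND vs OR distinction and the 4 sub-parts of a two-set diagram map directly onto Python set operations. The names in the sets below are invented for illustration.

```python
# Hypothetical survey: who uses each condiment.
ketchup = {"Ann", "Ben", "Cy", "Dee"}
mayonnaise = {"Cy", "Dee", "Eve"}
everyone = {"Ann", "Ben", "Cy", "Dee", "Eve", "Fay"}

# The 4 sub-parts of a two-set Venn diagram.
ketchup_only = ketchup - mayonnaise
mayonnaise_only = mayonnaise - ketchup
both = ketchup & mayonnaise                  # AND = overlap only
neither = everyone - (ketchup | mayonnaise)

either = ketchup | mayonnaise                # OR = both "only" regions + overlap

# Probabilities come from the region counts over the total.
p_and = len(both) / len(everyone)
p_or = len(either) / len(everyone)

print(len(ketchup_only), len(mayonnaise_only), len(both), len(neither))
```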
Fractions (for estimating probability)
7/8
87.5%
Graphics Interpretation: Flowcharts
Describes a process using shapes, arrows, and text
e.g., decision process
Tip:
> read the accompanying description of how the flow chart works and LOOK AT THE CHART each time the text mentions a part of the chart
> Before reading the questions, develop a GENERAL UNDERSTANDING of how the chart works, but don’t seek to understand every detail of the chart
Table Analysis
What: Data is provided in a table with COLUMNS that CAN BE SORTED (least to greatest)
> generally quantitative info stored in tables accompanied by an explanatory text (but can also be text in cells e.g., “High / Low”, “Yes / No”, “Name 1”)
> followed by 3 True or False statements
> “For each of the following statements, select True if the statement can be verified to be true based on the information provided. Otherwise select False” —–> does the data SUPPORT / validate the statement?
> True = data supports the statement
> False = data does not support the statement (incorrect or not enough data)
—> ADD a QUESTION MARK to evaluate the statement
Specially worded Qs: still two options per statement, but just need to read carefully
> “Less than the median” vs “Greater than or equal to the median”
Content:
> change
> percent difference
> average, median, range, standard deviation
> ratios (of values or counts) and proportions (part to whole involving values or counts)
> probability (often will be = criteria within subgroup / subgroup count)
> Correlation
> comparative ranking (highest, lowest, higher, lower)
> DS variations (does the table have enough info to support a conclusion?)
> nonstandard table analysis Qs (incl. verbal data)
How to solve:
(1) Read explanatory text to understand the table and come up with a single sentence that captures your understanding of the info in the table
(2) Focus on high level understanding and look for most obvious patterns, relationships and trends
> don’t get into the details yet
(3) Go to questions and determine EXACTLY what data you need (without getting distracted by all the other info in the table)
Tips:
> Skillful sorting: not all Table Analysis qs will require using info from multiple sorting screens
> Estimation is key for qs that don’t need high level of accuracy for SAVING TIME (e.g., if N is the same in average calculations, just compare numerators, eyeballing data that is always greater to determine highest average, using fractions to compare sizes of ratios)
> Double check that you are using the right column (labels)
> sometimes the table will provide TOTALS at the BOTTOM of the columns --> can be useful when calculating average
> be careful NOT to count the total or mean rows as part of datasets
> COUNT CAREFULLY
> be careful when doing “mental filter” –> sorting using function, THEN mentally or manually sorting again (don’t expect sub-group of values to be sorted)
e.g., first sort for all chocolate chip cookies, then need to arrange manually prices from least to greatest
Table Analysis: Values that fit a specified criterion in a table
Often to answer a Table Analysis question, we must determine something about a SUBSET of data that fits a specified criterion
> e.g., sort by Country first, then find the median value of a specific country
(kind of like creating a “mental filter” using Sort –> helps GROUP relevant values together, then you have to manually sort again to find median)
Sorting can help GROUP relevant values together and focus our attention to the right subgroups
Table Analysis: Calculating mean
Tips
> sometimes the table will come with TOTAL ROW –> useful for calculating average (sum)
> you can sometimes also skip the actual calculation and EYEBALL the data (e.g., if you notice ALL the data in one average calculation is smaller than the data in another average calculation –> can conclude average is smaller)
** eyeball complicated additions first (lots of decimal sums) to see if you can find a short cut
Table Analysis: Calculating median
To calculate the median of values that meet a criterion (‘mentally filtered’ via sorting) --> we can use the sort functionality to shortlist our values
BUT THEN we have to MANUALLY sort the next level of values to calculate the median (because we cannot apply more than 1 sort)
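A Python sketch of the “sort once, then sort the subgroup by hand” routine. The cookie rows and prices are hypothetical; `sorted` stands in for the single sort the on-screen table allows.

```python
from statistics import median

# Hypothetical table rows: (cookie type, price).
rows = [("oatmeal", 2.50), ("chocolate chip", 3.25), ("sugar", 1.75),
        ("chocolate chip", 2.00), ("chocolate chip", 4.10), ("oatmeal", 3.00)]

# Step 1: the one sort the table allows, which groups each type together.
by_type = sorted(rows, key=lambda row: row[0])

# Step 2: "mental filter" down to the subgroup; its prices are not yet in order.
chip_prices = [price for kind, price in by_type if kind == "chocolate chip"]

# Step 3: order the subgroup by hand, then take the middle value.
chip_prices_sorted = sorted(chip_prices)
chip_median = median(chip_prices_sorted)

print(chip_prices, chip_prices_sorted, chip_median)
```

Reading the middle value of the unsorted subgroup (2.00 here) would give the wrong median, which is the trap the note warns about.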
Table Analysis: Standard Deviation
You won’t have to calculate standard deviation, but instead be asked to make judgements about Standard Deviation
Recall: Standard deviation = HOW FAR values in a set are FROM THE MEAN
Can compare standard deviations of sets by:
> considering the SPACING of values in sets (range)
> considering how FAR THE VALUES in the sets are FROM the set’s MEAN
Dealing with EQUAL NUMBER OF TERMS in sets
> look at the MAX and MIN values and how far they are from the mean —> Range is a good proxy here (smaller range, smaller SD)
e.g., mean is 205 and max value is 220 while min value is 190. Each is within 15 units from the mean
> sum of ABSOLUTE DIFFERENCES | value - mean | is a good proxy (if it’s really hard to make a judgement about dispersion of values)
Also may be asked about CHANGES TO STANDARD DEVIATION IF values are changed
> depends on how far the value being removed or added is FROM the MEAN
> the MORE values close to or at the mean are ADDED to a set, the SMALLER the SD of the set
> the MORE values close to or at the mean are REMOVED from a set, the LARGER the SD of the set
e.g., let’s say mean goals scored is 61 in a set containing {14, 54, 55, 61, 61, 62, 76, 79, 87}. If we remove 61, 61 and 62 from the set (values equal to or very close to the mean), the standard deviation would INCREASE (mean might stay around the same)
Note:
> EVEN BETTER if the table gives us the MEAN so we don’t even have to calculate it!
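The goals example above can be checked with a short Python sketch. It uses the population standard deviation (`pstdev`); that choice is an assumption, and the direction of the change is the same with the sample version. The helper `total_abs_deviation` is a made-up name for the absolute-difference proxy mentioned earlier.

```python
from statistics import mean, pstdev

goals = [14, 54, 55, 61, 61, 62, 76, 79, 87]

# Remove the values at or very close to the mean (61, 61 and 62).
trimmed = [g for g in goals if g not in (61, 62)]

def total_abs_deviation(values):
    """Sum of |value - mean|, a rough stand-in for spread."""
    m = mean(values)
    return sum(abs(v - m) for v in values)

sd_before = pstdev(goals)
sd_after = pstdev(trimmed)

print(mean(goals), round(mean(trimmed), 1))
print(round(sd_before, 1), round(sd_after, 1))
print(total_abs_deviation(goals) / len(goals),
      total_abs_deviation(trimmed) / len(trimmed))
```

The mean barely moves while the SD goes up, because the values that were pulling the spread down are gone. Note that the absolute-difference proxy only compares fairly across sets with an equal number of terms, so the sketch divides by the count.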
Table Analysis: Percents
1) Asked to calculate percent change or percent of based on VALUES in the table
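> Time-saving tactic: estimate the % change --> if a value at least doubles, the increase is at least 100%, so it clearly exceeds a smaller threshold such as 20% --> don’t need to calculate the exact % change
e.g., 300 to 1240 (more than quadruples, so the increase is over 300%)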
2) Asked to work with percentage information presented in the table
> beware of statements that make conclusions about an ABSOLUTE NUMBER rather than a percentage (might not be supported)
e.g., just because one percentage value is greater than another percentage value, this does NOT necessarily mean the absolute count is greater
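A Python sketch of both percent points above: the percent change for the 300 to 1240 example, and a case where the larger percentage goes with the smaller absolute count. The two groups and their sizes are hypothetical.

```python
def percent_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100

change = percent_change(300, 1240)   # far above 100%, no exact figure needed

# Hypothetical groups: higher percentage, smaller absolute count.
group_a_total, group_a_pct = 200, 60     # 60% of 200
group_b_total, group_b_pct = 1000, 25    # 25% of 1,000

count_a = group_a_total * group_a_pct / 100
count_b = group_b_total * group_b_pct / 100

print(round(change, 1), count_a, count_b)
```

Without the group totals, a statement comparing absolute counts cannot be verified from the percentages alone.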
Table Analysis: Correlation
Very easy to check for correlation between variables by SORTING ONE column and seeing if any trends emerge on the OTHER COLUMN
> move in same direction, opposite direction or no correlation
Note:
> it’s okay if a FEW VALUES are not correlating in perfect unison with another variable
e.g., despite the # of beachgoers decreasing when the Average Temperature increases from 56 to 62, 45 to 52, and 40 to 44, there is STILL a strong positive correlation between Average Temperature and # of Beachgoers
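A Python sketch of the sort-one-column check. The rows are invented, but they are built so that beachgoers dip over the same temperature steps named in the example (40 to 44, 45 to 52, 56 to 62). Counting up and down steps is only a rough reading aid, not a formal correlation measure.

```python
# Hypothetical rows: (average temperature, number of beachgoers).
rows = [(62, 300), (40, 80), (75, 520), (56, 310), (44, 70),
        (52, 150), (45, 160), (68, 410), (80, 600)]

# Sort by ONE column (temperature), then read down the OTHER column.
by_temp = sorted(rows)
beachgoers = [count for _, count in by_temp]

# Compare each value with the next one down the sorted table.
steps = list(zip(beachgoers, beachgoers[1:]))
ups = sum(1 for a, b in steps if b > a)
downs = sum(1 for a, b in steps if b < a)

print(beachgoers)
print(ups, downs)
```

A few dips do not undo the overall climb from the coldest row to the warmest, so the general trend is still positive.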
Two Part Analysis
Asks you to select TWO CORRECT ANSWERS about a scenario or mathematical expression, with the answer choices presented in a table
> Either (1) Quant-based (often algebra word problems) or (2) Verbal-based (resemble CR)
> mark 1 answer per column (possible for answers to be the same)
> “select values… that are jointly consistent with the given information. Make only two selections, one in each column”
> Quant is mostly word problems that have two questions using the SAME SCENARIO
Possible Quant topics:
> Word problem - one equation with two variables: Use TRIAL AND ERROR with answer choices to determine the correct values for the variables
> Word problem: Rate-time-distance
> Unit conversions with rates
> Word problems: work (shortest time / longest time)
> Word problems: ratios (remember ratios indicate MULTIPLES of something, can help you shortlist answers; three part ratios, hypothetical situations where you add or subtract to achieve desired ratio)
> Word problems: Percents
> Word problems: Statistics (average, median, range)
> General word problems (interest, how much do I pay, growth)
> Combinatorics
> Sequences
Possible verbal topics:
> CR-based
> RC-based
> Both CR and RC based
> Criteria-based logic qs **
> Order-based logic qs
Tips:
> use calculator smartly (e.g., ugly exponential growth problems where you have to figure out t in the exponent, unit conversions requiring high precision, percents, compound interest)
> READ the COLUMN HEADERS CAREFULLY (so you don’t reverse the order of your answer!)—-> FOLLOW ORDER OF THE QUESTION STEM
Two part analysis: One equation with two variables
e.g., x + 8y = 42. If x and y are both positive integers, what are possible values for x and y?
x    y    Value
( )  ( )  1
( )  ( )  2
( )  ( )  3
( )  ( )  4
( )  ( )  5
Word problem - one equation with two variables: Use TRIAL AND ERROR with answer choices to determine the correct values for the variables
> we know there must be ONE correct pair
> easiest if you make ONE SIDE A SINGLE VARIABLE and the other side an operation containing the other variable --> plug in easily
Trial and error: x = 42 - 8y
> plug in values for y and see if matching x value appears in the table
ans: x = 2 and y = 5
Other tips
> to reduce number of possible answer choices quickly, use Even and Odd properties
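The trial-and-error routine and the even/odd shortcut for x + 8y = 42 can be sketched in Python. The answer choices 1 to 5 come from the example; the variable names are made up.

```python
choices = [1, 2, 3, 4, 5]

# Even/odd shortcut: 8y and 42 are both even, so x = 42 - 8y must be even.
even_x_choices = [x for x in choices if x % 2 == 0]

# Trial and error: plug each y into x = 42 - 8y and keep the pair
# only if the resulting x is also one of the listed choices.
solutions = [(42 - 8 * y, y) for y in choices if (42 - 8 * y) in choices]

print(even_x_choices)
print(solutions)
```

Only one (x, y) pair survives, matching the answer x = 2 and y = 5.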