Chapter 3: Performing the Test Plan and Analyzing the Results Flashcards
(37 cards)
What are the 4 main types of data analytics?
Descriptive
Diagnostic
Predictive
Prescriptive
What is Descriptive Analytics?
procedures that summarize existing data to determine WHAT HAS HAPPENED IN THE PAST
What are common tools in Descriptive Analytics?
Summary statistics (mean, median, etc.)
Data reduction/filtering (fuzzy matching)
summary statistics
describes set of data in terms of location, range, shape
helps to quickly see how data is distributed
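A minimal sketch of summary statistics using Python's standard `statistics` module. The invoice amounts are invented for illustration and are not from the chapter.

```python
import statistics

# Hypothetical invoice amounts (made-up data).
amounts = [120, 150, 150, 200, 980]

mean = statistics.mean(amounts)            # location
median = statistics.median(amounts)        # location, less sensitive to outliers
value_range = max(amounts) - min(amounts)  # range
stdev = statistics.stdev(amounts)          # spread (sample standard deviation)

print(mean, median, value_range, round(stdev, 1))
```

Here the mean sits well above the median, which hints at a right-skewed shape caused by one large amount.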
(descriptive) data reduction or filtering
reduce observations to focus on relevant items –> reducing a large set to a smaller set
helps to isolate high-risk items, so you can focus on what matters
ex: filtering a large vendor list to only include those with transactions over 10k
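The vendor example can be sketched as a simple filter. The vendor names, totals, and the choice to filter on a per-vendor total are assumptions made for illustration.

```python
# Hypothetical vendor records (names and totals are made up).
vendors = [
    {"name": "Acme Supply", "total": 4_500},
    {"name": "Birch Ltd", "total": 12_000},
    {"name": "Cedar Co", "total": 10_000},
    {"name": "Delta Inc", "total": 25_300},
]

THRESHOLD = 10_000  # "over 10k" is read here as strictly greater than 10,000

# Keep only the vendors above the threshold.
high_value = [v["name"] for v in vendors if v["total"] > THRESHOLD]
print(high_value)
```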
What is Diagnostic Analytics?
exploring current data to determine WHY SOMETHING HAS HAPPENED, typically comparing data to benchmark
What are the 4 key methods in Diagnostic Analytics?
Profiling
Clustering
Similarity Matching
Co-occurrence Grouping
(diagnostic) profiling
identifies typical behaviour of individual/group by compiling summary statistics
–> COMPARING INDIVIDUALS TO POPULATION
shows typical behaviour of a group –> helps to detect abnormal patterns (for ex: fraud)
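One possible way to compare individuals to the population is a z-score against the group's mean and standard deviation. The expense figures and the cutoff of 2 are assumptions for illustration, not values from the chapter.

```python
import statistics

# Hypothetical monthly expense claims per employee (made-up data).
claims = {"ana": 410, "ben": 390, "cho": 405, "dev": 395, "eli": 400, "fay": 1200}

values = list(claims.values())
mean = statistics.mean(values)      # typical behaviour of the group
stdev = statistics.pstdev(values)   # population standard deviation

Z_CUTOFF = 2  # assumed cutoff for "abnormal"

# Flag individuals whose claim is far from the group's typical value.
flagged = [name for name, amount in claims.items()
           if abs(amount - mean) / stdev > Z_CUTOFF]
print(flagged)
```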
(diagnostic) clustering
identifies groups that have common underlying characteristics –> reveals hidden relationships
Grouping data into clusters based on natural patterns, without pre-defined labels.
finds groups without prior labels
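A toy sketch of clustering without labels: a one-dimensional k-means written in plain Python. The amounts, the starting centers, and the helper name `kmeans_1d` are hypothetical. Real analyses would normally use a dedicated library.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Hypothetical transaction amounts with no labels attached (made-up data).
amounts = [12, 15, 14, 210, 198, 205]
clusters = kmeans_1d(amounts, centers=[0, 100])
print(clusters)
```

The two groups (small and large amounts) emerge from the data alone, since no labels were supplied.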
(diagnostic) similarity matching
measures how alike two items are; can be used to group data into clusters
find things that look alike
for ex: detect suspicious entries or fraud
(diagnostic) co-occurrence grouping
find items that often appear/happen together
“people who buy x, also buy y”
helps reveal recurring transactional patterns
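The "people who buy x, also buy y" idea can be sketched by counting how often pairs of items appear in the same transaction. The baskets below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets (made-up data).
baskets = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"mouse", "keyboard"},
    {"laptop", "bag"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(2))
```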
What is Predictive Analytics?
It uses historical data and models to forecast future outcomes.
procedures used to generate a model that can determine WHAT IS LIKELY TO HAPPEN IN THE FUTURE
What are 3 common predictive methods?
Regression
Classification
Link Prediction
(predictive methods) regression
A: A statistical model used to predict a number (e.g., sales, income) based on other variables.
predicts a number (ex: sales) and shows how variables are related
good when: high R² (close to 1)
statistically significant coefficients (p < 0.05)
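A minimal sketch of simple linear regression and R², computed by hand in plain Python. The spend and sales figures are invented. The sketch does not compute p-values, which would normally come from a statistics package.

```python
# Hypothetical data: advertising spend (x) and sales (y), made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for one predictor.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# R² = 1 - (unexplained variation / total variation).
predicted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```

An R² close to 1 means the fitted line explains most of the variation in y for this toy data.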
(predictive methods) classification
predicts class/category for new observation based on manual identification of classes from previous observations
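A toy sketch of classification using a 1-nearest-neighbour rule, which is only one of many possible classifiers. The labeled observations, the features (amount and hour of day), and the function name `classify` are hypothetical.

```python
# Hypothetical previous observations with manually assigned classes (made-up data).
labeled = [
    ((50, 10), "normal"),
    ((70, 14), "normal"),
    ((5000, 3), "suspicious"),
    ((4200, 2), "suspicious"),
]

def classify(point):
    """1-nearest-neighbour: copy the class of the closest labeled observation."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(labeled, key=lambda item: squared_distance(item[0], point))
    return label

print(classify((4800, 4)), classify((60, 12)))
```

The features are left unscaled to keep the sketch short, so the amount dominates the distance.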
(predictive methods) link prediction
PREDICTS which new connections or relationships are likely to form in network based on existing data
predicts connections between items –> like suggesting a friend on social media
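One simple heuristic for link prediction is counting common neighbours: two unconnected people who share many friends are likely to connect. The network below is invented, and this heuristic is an assumption for illustration rather than the chapter's stated method.

```python
from itertools import combinations

# Hypothetical friendship network (made-up data, symmetric links).
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

# Score every pair that is not yet connected by its number of shared friends.
scores = {}
for a, b in combinations(sorted(friends), 2):
    if b not in friends[a]:
        scores[(a, b)] = len(friends[a] & friends[b])

best = max(scores, key=scores.get)
print(best, scores[best])
```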
Q: What is Prescriptive Analytics?
procedures to identify the best possible options for WHAT SHOULD BE DONE IN THE FUTURE
What tools are used in Prescriptive Analytics?
Decision support systems
= helps users make decisions by combining data and analysis to recommend best action
machine learning and AI
= they recommend a course of action + model adapts to new external data
Decision support systems
rule based systems, that gather data and recommend actions based on input
for ex: cashflow forecasting and management tools
(prescriptive) machine learning and AI
learns from data to improve suggestions
helps by continuously scanning transactions, for example flagging suspicious vendors
Q: (Similarity Matching) What is fuzzy matching?
finds text entries that are similar but not exactly the same
technique for detecting suspicious records in imperfect data
Q: (Similarity Matching) When do we use fuzzy matching?
uses probability to identify data that are likely similar
when data has inconsistencies, with imperfect data
like “123 Main St.” vs. “123 Main Street.”
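A minimal sketch of fuzzy matching using `difflib.SequenceMatcher` from Python's standard library, which returns a similarity ratio between 0 and 1. The helper name `similarity` and the second address are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-1 similarity score, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# The two spellings from the card score high even though they are not identical.
same_address = similarity("123 Main St.", "123 Main Street.")

# An unrelated, made-up address scores low.
different_address = similarity("123 Main St.", "77 Oak Avenue")

print(round(same_address, 2), round(different_address, 2))
```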
Q: (Similarity Matching) What’s the risk of using a high vs. a low fuzzy matching threshold?
A (high threshold): Fewer false positives but more false negatives (you miss real matches).
A (low threshold): More false positives (many things look matched, but aren’t real).
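The trade-off can be sketched by counting errors at two thresholds. The record pairs, their "real match" flags, and the thresholds 0.9 and 0.7 are all made up for illustration. Similarity is measured with `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical pairs: (text 1, text 2, is it really the same entity?)
pairs = [
    ("123 Main St.", "123 Main Street.", True),
    ("Acme Corp", "Acme Corporation", True),
    ("Acme Corp", "Apex Corp", False),
]

def error_counts(threshold):
    """Return (false positives, false negatives) at the given threshold."""
    false_pos = false_neg = 0
    for a, b, is_match in pairs:
        flagged = similarity(a, b) >= threshold
        if flagged and not is_match:
            false_pos += 1
        elif not flagged and is_match:
            false_neg += 1
    return false_pos, false_neg

high = error_counts(0.9)  # strict: few things are flagged
low = error_counts(0.7)   # loose: many things are flagged
print(high, low)
```

With this toy data the strict threshold misses both real matches, while the loose threshold catches them but also flags the unrelated pair.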
Q: (Classification) What is pruning in decision trees?
remove branches from decision tree to avoid overfitting
overfitting: the model works too well on the training data but will not work well on test data