CS165 Flashcards
(79 cards)
What is the PPDAC Model
A structured approach to carrying out investigative research; explorative and inquisitive.
What is the 1st stage of the PPDAC Model
Problem - Requires a preliminary understanding of the area; identify the kind of question to solve.
What is the 2nd stage of the PPDAC Model
Plan - What you need to do to address that problem
What is the 3rd stage of the PPDAC Model
Data - Collection, processing, management
Tabular
A 2D table in which each row is an observation and each column is a measurement.
Structured
Each observation is represented by a dictionary of keys and values.
Semi-structured
Not all records are represented by the same keys
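A minimal Python sketch of the three data shapes, using made-up records. The names `tabular`, `structured`, `semi_structured`, `to_table` and `has_uniform_keys` are hypothetical, chosen for illustration. Flattening semi-structured records into a table needs a policy for missing keys; this sketch assumes they are filled with `None`.

```python
# Tabular: a 2D table; each row is an observation, each column a measurement.
columns = ["name", "height_cm", "weight_kg"]
tabular = [
    ["Ana", 165, 58],
    ["Ben", 180, 75],
]

# Structured: each observation is a dictionary, and all share the same keys.
structured = [
    {"name": "Ana", "height_cm": 165, "weight_kg": 58},
    {"name": "Ben", "height_cm": 180, "weight_kg": 75},
]

# Semi-structured: records do not all have the same keys.
semi_structured = [
    {"name": "Ana", "height_cm": 165},
    {"name": "Ben", "weight_kg": 75, "email": "ben@example.com"},
]


def has_uniform_keys(records):
    """True if every record has exactly the same set of keys."""
    return len({frozenset(record) for record in records}) <= 1


def to_table(records):
    """Flatten dict records into (columns, rows), filling missing keys with None."""
    keys = []
    for record in records:
        for key in record:
            if key not in keys:  # keep columns in first-seen order
                keys.append(key)
    rows = [[record.get(key) for key in keys] for record in records]
    return keys, rows
```

For example, `to_table(semi_structured)` gives the columns `name, height_cm, weight_kg, email`, with `None` wherever a record lacks a key, while `to_table(structured)` reproduces `tabular` exactly.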
What is the 4th stage of the PPDAC Model
Analysis - Visualise the data, develop initial questions, and communicate findings.
Bar charts
Use bars to represent counts of categorical features.
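A small sketch of the counting step behind a bar chart, using Python's `collections.Counter` on a hypothetical list of colours. A plotting library (for example matplotlib's `bar`) would draw one bar per category from these counts; here the bars are printed as text so the example needs no dependencies.

```python
from collections import Counter

# Hypothetical categorical feature: one colour per observation.
colours = ["red", "blue", "red", "green", "blue", "red"]

# One bar per category, with height equal to that category's count.
counts = Counter(colours)

for category, count in counts.most_common():
    # Text-mode bar: one '#' per observation in the category.
    print(f"{category:<6} {'#' * count}")
```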
Histogram
Shows the distribution (frequency of occurrence) across a range, with values binned into brackets.
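A sketch of histogram binning with fixed-width bins, assuming each bin covers [start + k*w, start + (k+1)*w). The helper `histogram_counts` and the `ages` data are hypothetical. Empty bins between the lowest and highest occupied bins are kept as zero so gaps in the distribution stay visible.

```python
def histogram_counts(values, bin_width, start):
    """Count values per fixed-width bin, keyed by each bin's lower edge.

    Assumes `values` is non-empty.
    """
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin containing v
        counts[k] = counts.get(k, 0) + 1
    # Fill in empty bins between the lowest and highest occupied ones.
    first, last = min(counts), max(counts)
    return {start + k * bin_width: counts.get(k, 0) for k in range(first, last + 1)}


ages = [21, 23, 25, 31, 34, 38, 39, 52]
hist = histogram_counts(ages, bin_width=10, start=20)

for lower, count in hist.items():
    print(f"[{lower}, {lower + 10}): {'#' * count}")
```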
Scatter Plot
Shows relationships between two variables within multivariate data.
Top-down
Applies previous knowledge to data, commonly via rules or choices.
Bottom-up
Builds knowledge from the data, allowing a system to learn its own behaviour based on what it observes.
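A toy contrast between the two approaches, assuming a hypothetical dataset of temperatures labelled with whether people went to the beach. The top-down rule encodes a threshold someone chose in advance; the bottom-up version derives its threshold from the observations. The learning step here is deliberately simple (the midpoint between the warmest "no" and the coolest "yes") and only works when the two groups do not overlap.

```python
# Hypothetical labelled observations: (temperature in degrees C, went to the beach?)
observations = [(12, False), (15, False), (19, False), (24, True), (27, True), (31, True)]


# Top-down: prior knowledge written as a fixed rule before looking at the data.
def top_down_rule(temp):
    return temp >= 25  # threshold chosen by a person in advance


# Bottom-up: derive the rule's threshold from the observations themselves.
def learn_threshold(data):
    """Midpoint between the warmest 'no' and the coolest 'yes' example.

    Assumes every 'no' temperature is lower than every 'yes' temperature.
    """
    warmest_no = max(t for t, went in data if not went)
    coolest_yes = min(t for t, went in data if went)
    return (warmest_no + coolest_yes) / 2


threshold = learn_threshold(observations)  # (19 + 24) / 2 = 21.5


def bottom_up_rule(temp):
    return temp >= threshold


print(top_down_rule(24), bottom_up_rule(24))  # False True
```

The hand-written rule misclassifies the 24 degree observation, while the learned rule fits all six examples, which is the point of letting the data shape the behaviour.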
What is the 5th stage of the PPDAC Model
Conclusion - Summarise and communicate; reflect and look forward.
What is the PPDAC model used for in data science?
To structure investigative research: Problem, Plan, Data, Analysis, Conclusion.
How does the CRISP-DM model differ from PPDAC?
CRISP-DM is more business-centric: it includes business understanding and deployment stages.
What is data science?
An interdisciplinary field combining statistics, computing, and domain expertise to extract insights from data.
What are the two main categories of data?
Quantitative (numerical) and Qualitative (categorical).
What is the difference between nominal and ordinal data?
Nominal has no order (e.g. colour); ordinal has an order (e.g. ratings).
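A short Python sketch of why the distinction matters, using hypothetical rating labels mapped to ranks (`RATING_ORDER` is an illustrative name). Ordering and "at least" comparisons are meaningful for ordinal data, while nominal data only supports equality checks and counts.

```python
# Nominal data: labels with no inherent order.
colours = ["red", "green", "blue", "green"]

# Ordinal data: labels with a meaningful order, encoded here as ranks.
RATING_ORDER = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}
ratings = ["good", "poor", "excellent", "fair", "good"]

# Sorting by rank is meaningful for ordinal data...
ratings_sorted = sorted(ratings, key=RATING_ORDER.__getitem__)

# ...and so are threshold comparisons such as "at least good".
at_least_good = [r for r in ratings if RATING_ORDER[r] >= RATING_ORDER["good"]]

# For nominal data only equality and counts are meaningful;
# sorted(colours) would just be alphabetical, not a real ordering.
green_count = colours.count("green")
```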
Define a scalar, vector, matrix, and tensor in terms of data representation.
Scalar: single value, Vector: 1D array, Matrix: 2D array, Tensor: multi-dimensional array.
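A dependency-free Python sketch using nested lists. The helpers `shape` and `ndim` are hypothetical and assume regularly shaped (non-ragged) nesting; in NumPy the same information is available as the `shape` and `ndim` attributes of an array.

```python
scalar = 3.5                 # single value
vector = [1, 2, 3]           # 1D array
matrix = [[1, 2, 3],
          [4, 5, 6]]         # 2D array: 2 rows, 3 columns
tensor = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]  # 3D array


def shape(x):
    """Length along each nesting level, assuming regular (non-ragged) lists."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None  # descend into the first element
    return tuple(dims)


def ndim(x):
    """Number of dimensions = number of nesting levels."""
    return len(shape(x))


for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    print(name, ndim(value), shape(value))
```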
What are the three measures of central tendency?
Mean, Median, and Mode.
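A quick check with Python's standard `statistics` module on a small made-up dataset. Appending one extreme value shows why the median is often preferred for skewed data: the mean jumps, while the median shifts only slightly.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # 30 / 6 = 5
median = statistics.median(data)  # even count: (3 + 5) / 2 = 4.0
mode = statistics.mode(data)      # most frequent value: 3

# One extreme value pulls the mean up sharply but barely moves the median.
with_outlier = data + [100]
mean_with_outlier = statistics.mean(with_outlier)      # 130 / 7, about 18.57
median_with_outlier = statistics.median(with_outlier)  # middle of 7 values: 5
```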
What is standard deviation and why is it useful?
Measures how spread out the data are around the mean (the square root of the variance); useful for understanding variability.
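A from-scratch sketch of the population standard deviation, checked against Python's `statistics.pstdev`. The helper `population_std` and the dataset are illustrative; `statistics.stdev` gives the sample version, which divides by n - 1 instead of n.

```python
import math
import statistics


def population_std(values):
    """Square root of the average squared deviation from the mean."""
    mu = sum(values) / len(values)
    variance = sum((x - mu) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32

print(population_std(data))     # 32 / 8 = 4, sqrt -> 2.0
print(statistics.pstdev(data))  # same population formula
print(statistics.stdev(data))   # sample formula: sqrt(32 / 7)
```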
How do you calculate the IQR?
IQR = Q3 - Q1, where Q1 and Q3 are the first and third quartiles.
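Quartiles have several textbook definitions, so different tools can report slightly different IQRs for the same data. The sketch below uses a hypothetical helper, `iqr_halves`, that takes Q1 and Q3 as the medians of the lower and upper halves, and compares it with Python's `statistics.quantiles`, whose default `method="exclusive"` interpolates differently.

```python
import statistics


def iqr_halves(values):
    """IQR = Q3 - Q1, taking Q1 and Q3 as medians of the lower and upper halves.

    For an odd number of values the overall median is left out of both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + len(s) % 2:]  # skip the middle value when the count is odd
    return statistics.median(upper) - statistics.median(lower)


data = [1, 2, 3, 4, 5, 6, 7, 8]

print(iqr_halves(data))                      # 6.5 - 2.5 = 4.0
q1, _, q3 = statistics.quantiles(data, n=4)  # exclusive method: 2.25 and 6.75
print(q3 - q1)                               # 4.5
```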
When is a boxplot useful?
For visualising the distribution, spread, and outliers of a dataset.
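A sketch of the numbers a typical boxplot draws: the quartiles, the whisker ends, and the points flagged as outliers. It assumes the widely used convention that whiskers reach the most extreme points within 1.5 times the IQR of the box, and it uses `statistics.quantiles` with `method="inclusive"`; `boxplot_summary` is a hypothetical helper, and other tools may use different quartile rules.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR convention."""
    s = sorted(values)
    q1, median, q3 = statistics.quantiles(s, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in s if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme points still within the fences
        "whisker_high": max(inside),
        "outliers": [x for x in s if x < low_fence or x > high_fence],
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = boxplot_summary(data)
print(summary)
```

Here the box spans Q1 = 3 to Q3 = 7, the fences sit at -3 and 13, so the upper whisker stops at 8 and the value 30 is drawn as an outlier.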