Common Questions Flashcards by Beccy Smedley

What is data analysis?

The practice of gathering, cleaning, transforming, and interpreting data to extract meaningful insights and use them to make decisions.

Explain the main aspects of data analysis.

Data collection - collecting raw data from numerous sources e.g. external datasets, surveys, internal databases etc.
Data transformation - cleaning, standardisation (data type alignment), data enrichment (adding additional data, e.g. historical/ geographical, to existing records), data aggregation (from different sources), data mapping (matching fields/ cols from different sources), data partitioning (splitting one dataset into smaller ones).
EDA - Exploratory Data Analysis, aimed at studying and summarising the characteristics of data. The main methods are statistics and data visualisation. Statistics provide brief coefficients that summarise data, e.g. mean, median, standard deviation, and correlation coefficients. Data visualisation is the graphical representation of data; some graphs are more useful than others, e.g. a boxplot is a great way to visualise the distribution of data and spot extreme values.
Analysis - applying mathematical/ statistical techniques to data to draw conclusions.
Showcasing results - e.g. using Tableau, Power BI, Python packages such as Matplotlib, Seaborn etc., R packages such as ggplot2, lattice etc.

How do data analysts differ from data scientists?

Data analyst - responsible for collecting, cleaning, and analysing data to help make better decisions. Visualisation tools are used to identify trends, and reports/ dashboards may be made to communicate findings.

Data scientist - responsible for creating/ implementing machine learning and statistical models on data, which are used to make predictions and enhance business processes.

Give examples of different tools used for DA.

Spreadsheet software e.g. Excel, Google Sheets.
Database Management Systems to store, manage, and organise large datasets e.g. MySQL, SQL Server, PostgreSQL.
Programming languages e.g. Python, R.

What is data wrangling?

AKA data munging.
Involves cleaning, transforming, and organising raw, unstructured data into a usable format, to improve the dataset structure and quality.
Can involve data cleaning, data transformation, data integration, data restructuring, data enrichment, and quality assurance.

What is data cleaning?

Identifying and correcting or removing errors and inconsistencies, and handling missing values in datasets.
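
A minimal sketch of data cleaning in plain Python. The records and the `clean` helper are made up for illustration, and dropping rows with a bad age is only one possible choice (filling in a default value is another).

```python
# Hypothetical raw records; names and ages are invented for this example.
raw_rows = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": ""},       # missing age
    {"name": "alice", "age": "34"},   # duplicate of the first row once tidied
    {"name": "Cara", "age": "abc"},   # invalid age
]


def clean(rows):
    """Tidy names, parse ages, and drop invalid rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().lower()   # fix inconsistent spacing/case
        age_text = row["age"].strip()
        if not age_text.isdigit():
            continue                         # drop missing or non-numeric ages
        record = (name, int(age_text))
        if record in seen:
            continue                         # drop exact duplicates
        seen.add(record)
        cleaned.append({"name": name, "age": record[1]})
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```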

What is data transformation?

Transforming the structure, format, or values of data as per analysis requirements, which may include normalisation, scaling, and encoding categorical values.
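
A rough sketch of two common transformations, min-max scaling and one-hot encoding, in plain Python. The function names and sample values are assumptions for illustration; in practice a library such as pandas or scikit-learn would usually do this.

```python
def min_max_scale(values):
    """Rescale numbers to the range 0..1 (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode each categorical label as a 0/1 vector, one slot per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical sample data.
heights_cm = [150, 160, 170, 200]
colours = ["red", "blue", "red"]

scaled = min_max_scale(heights_cm)
encoded = one_hot(colours)  # column order is alphabetical: blue, red
print(scaled)
print(encoded)
```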

What is data restructuring?

Reorganising data to make it more suitable for analysis, e.g. reshaping into different formats or creating new variables by aggregating features at different levels.

What is data enrichment?

Data is enriched by adding additional relevant info e.g. combined aggregation of numerous features, external data etc.

What is quality assurance?

Ensuring data meets quality standards and is fit for analysis.

What is descriptive analysis?

Used to answer questions such as: what’s happened previously, and what are the key characteristics of the data?
Identifies patterns, trends, and relationships within data.
Uses statistical measures, visualisations, and exploratory DA techniques to gain insight.
Concerned with historical perspective, summary statistics (mode, mean, median), visualisations, trends, and exploration.

What is predictive analysis?

Uses past data and applies statistical and ML models to identify patterns and make future predictions.
Concerned with future projection, model building (using historical data), validation and testing (using unseen data to assess accuracy), feature selection (identifying variables that influence the predicted outcome), and decision making (by providing insights).

What is univariate, bivariate, and multivariate analysis?

Univariate - analyses one variable at a time using measures of central tendency (mean, median, mode etc.), measures of dispersion (range, variance), and graphical methods e.g. histograms.
Bivariate - analyses the relationship between 2 variables to understand how one variable is related to another, how strong the correlation is etc., using scatter plots, contingency tables etc.
Multivariate - analyses relationship between 3 or more vars to identify patterns/ clusters/ dependencies, using cluster analysis, regression analysis etc.
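
As an illustrative sketch, univariate summaries and a bivariate correlation can be computed with Python's standard library. The `hours` and `scores` values are invented, and `pearson` is a hand-written helper rather than a library call.

```python
import math
import statistics

# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

# Univariate: summarise one variable at a time.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
spread = statistics.stdev(scores)  # sample standard deviation


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Bivariate: how strongly are the two variables related?
r = pearson(hours, scores)
print(mean_score, median_score, round(spread, 2), round(r, 3))
```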

Give examples of some of the most popular DA tools.

Tableau - visualisation and dashboard creation from numerous data sources.
Power BI - data visualisation.
Qlik Sense - data visualisation.
SAS - advanced analytics, multivariate analysis, and business intelligence.

What are the steps taken to analyse a dataset (generic)?

Ensure problem/ objective is clear.
Collect data from various sources e.g. surveys, tests, databases, web scraping, and ensure it’s representative/ accurate.
Data preprocessing/ data cleaning - fixing missing values, removing blanks/ duplicates/ extreme outliers, formatting, redefining columns etc.
EDA (Exploratory Data Analysis) - apply different graphical/ statistical approaches to analyse data and discover trends/ patterns, identify outliers, and gain initial insights.
Data visualisations - provide visual representation of complex info and patterns, which enhances understanding and allows communication to stakeholders.

Why is exploratory data analysis important?

EDA - investigating and understanding data through graphical and statistical techniques.
Helps identify trends and understand relationships between variables.
Non-parametric (doesn’t make assumptions about the dataset).
Can get deep understanding of variable relationships, patterns, and nature of data.
Can analyse quality of the dataset through univariate analysis, e.g. mean, mode, median, interquartile range, and identify patterns in individual variables of the dataset.
Can find the most influential feature of the dataset via correlations, covariance, and bivariate/ multivariate plotting.
Can identify outliers using box plots.
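
A small sketch of the rule a box plot conventionally uses to flag outliers: points more than 1.5 x IQR beyond the quartiles. The sample values and the `iqr_outliers` helper are assumptions for illustration; exact quartile values depend on the quantile method used.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one obviously extreme value.
readings = [10, 12, 12, 13, 14, 15, 16, 18, 95]
outliers = iqr_outliers(readings)
print(outliers)
```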

What are key considerations when undertaking data transformation?

Data Profiling: Understanding the characteristics of the data before transformation.
Mapping: Defining how to map data from different sources.
Transformation Rules: Implementing rules to transform data into the desired format.
Testing and Validation: Ensuring the accuracy of the transformed data.
Iteration and Refinement: Continuously improving the transformation process based on feedback.

What are the four types of DA?

Descriptive analysis - what happened? Summarises historical data to understand what has happened previously.

Diagnostic analysis - why did it happen? Comparing different data sets to understand an outcome.

Predictive analysis - what will happen? Can use statistical models and forecasting techniques to understand the future, and involves using data from the past to predict what might happen in the future.

Prescriptive analysis - how can we make it happen? Builds on predictions of future outcomes to suggest which actions to take, e.g. using machine learning.

What is exploratory analysis?

Used to understand the main characteristics of a dataset.

Often used at the beginning of the DA process to summarise the main aspects of the data/ check for missing data/ test assumptions.

Can involve visual methods such as scatter plots, histograms, and box plots.

What is regression analysis?

Statistical method used to understand the relationship between a dependent variable and one or more independent variables.

Commonly used for forecasting, time series modelling, and investigating cause-and-effect relationships between variables.

E.g. linear regression.
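
A minimal sketch of simple linear regression (one independent variable) fitted by ordinary least squares. The `fit_line` helper and the ad spend/ sales figures are hypothetical; real work would normally use a statistics library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows sales = 2 * ad_spend + 1 exactly.
ad_spend = [1, 2, 3, 4]   # independent variable
sales = [3, 5, 7, 9]      # dependent variable

slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 5 + intercept  # predict sales for an unseen ad spend of 5
print(slope, intercept, forecast)
```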

What is factor analysis?

Technique used to reduce a large number of variables into fewer factors.

The factors are constructed in such a way that they capture the maximum possible info from the original variables.

Often used in market research, customer segmentation, and image recognition.

What is cluster analysis?

Technique used to group a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups.

Often used in market segmentation, image segmentation, and recommendation systems.

E.g. hierarchical clustering and k-means clustering.
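
A toy sketch of k-means on one-dimensional data, to show the assign-then-update loop. The points, the starting centroids, and the fixed iteration count are assumptions chosen for illustration; library implementations handle initialisation and convergence more carefully.

```python
def k_means_1d(points, initial_centroids, iterations=10):
    """Basic k-means for 1-D points with caller-supplied starting centroids."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data with two obvious groups.
values = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(values, initial_centroids=[1, 12])
print(centroids, clusters)
```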

What is cohort analysis?

Subset of behavioural analytics that takes data from a given dataset and groups it into related groups for analysis. These related groups, or cohorts, usually share common characteristics within a defined time span. Often used in marketing, user engagement, and customer lifecycle analysis.
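
An illustrative sketch of cohort analysis: users are grouped by signup month (the cohort) and counted by the month in which they were active. The event tuples and the `cohort_activity` helper are invented for this example.

```python
from collections import defaultdict

# Hypothetical events: (user_id, signup_month, active_month).
events = [
    ("u1", "2024-01", "2024-01"),
    ("u1", "2024-01", "2024-02"),
    ("u2", "2024-01", "2024-01"),
    ("u3", "2024-02", "2024-02"),
    ("u3", "2024-02", "2024-03"),
]


def cohort_activity(rows):
    """Count distinct active users per signup cohort and activity month."""
    table = defaultdict(lambda: defaultdict(set))
    for user, signup_month, active_month in rows:
        table[signup_month][active_month].add(user)
    return {
        cohort: {month: len(users) for month, users in sorted(months.items())}
        for cohort, months in sorted(table.items())
    }


activity = cohort_activity(events)
print(activity)
```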

What is time series analysis?

Statistical technique that deals with time series data, or trend analysis. Used to analyse the sequence of data points to extract meaningful statistics and other characteristics of the data. Often used in sales forecasting, economic forecasting, and weather forecasting.
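
One simple time series technique is a moving average, which smooths short-term noise so the trend is easier to see. A minimal sketch, with made-up monthly sales figures and a hypothetical `moving_average` helper:

```python
def moving_average(series, window):
    """Average each run of `window` consecutive points in the series."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly sales figures.
monthly_sales = [10, 12, 14, 13, 15, 20]
smoothed = moving_average(monthly_sales, window=3)
print(smoothed)
```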

What is sentiment analysis?

AKA opinion mining. Uses natural language processing, text analysis, and computational linguistics to identify and extract subjective information from source materials. Often used in social media monitoring, brand monitoring, and understanding customer feedback.

What’s the purpose of normalisation in a database?

Helps organise data efficiently by ensuring consistency/ reducing redundancy. E.g. instead of storing a customer’s name multiple times, it could be stored once in a table and linked using primary and foreign keys, which makes querying more straightforward.
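
A small sketch of the customer example using Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions for illustration: the customer's name is stored once and each order links back to it through a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada Lovelace');
INSERT INTO orders VALUES (101, 1, 20.0), (102, 1, 35.5);
""")

# The name is stored once; a join brings it back alongside each order.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
conn.close()
```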

Common Questions Flashcards

(28 cards)