DATA ANALYTICS Flashcards

(39 cards)

1
Q

What is data analytics?

A

The process of examining raw data to extract meaningful insights and draw conclusions

Involves collecting, cleaning, transforming, and analyzing data.

2
Q

What is descriptive analytics?

A

Descriptive analytics answers ‘what happened?’ and is used to describe outcomes to stakeholders

Focuses on summarizing past data.

3
Q

What does diagnostic analytics investigate?

A

Diagnostic analytics answers ‘why did it happen?’ and digs deeper to find the causes behind trends and outcomes

Involves identifying anomalies and collecting related data.

4
Q

What is the focus of predictive analytics?

A

Predictive analytics answers ‘what will happen in the future?’ using historical data to identify trends

Involves statistical and machine learning techniques.

5
Q

What does prescriptive analytics suggest?

A

Prescriptive analytics answers ‘what should be done?’ and recommends data-driven decisions

Often relies on optimization, simulation, and machine learning techniques.

6
Q

What is data?

A

Data refers to raw, unprocessed facts, figures, or values that have no inherent meaning until analyzed

Exists in various forms such as numbers, text, images, or symbols.

7
Q

Define a dataset.

A

A structured collection of related data points, typically stored in databases, spreadsheets, or tables

Arranged in rows and columns for easier analysis.

8
Q

What is information?

A

Information is data that has been processed, analyzed, or organized in a meaningful way for decision-making

Provides insights and context.

9
Q

Name a tool for data visualization.

A

Microsoft Power BI

A business intelligence tool for data visualization.

10
Q

What does the Excel function SUM() do?

A

Adds a range of numbers

Example: =SUM(A1:A5)

11
Q

What is a variable in data analysis?

A

A specific characteristic or attribute of an observation, represented as a column in a dataset

Can be numerical or categorical.

12
Q

What is an observation?

A

An observation represents a single unit of analysis, such as a person, transaction, or event

Corresponds to a row in a dataset.
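
As a rough illustration of how variables and observations map onto a dataset, the sketch below uses a made-up three-row table. The column names and values are hypothetical and chosen only for illustration.

```python
# Hypothetical dataset: each dict is one observation (row),
# and each key is one variable (column).
dataset = [
    {"customer_id": 1, "payment_method": "Card", "amount": 19.99},
    {"customer_id": 2, "payment_method": "Cash", "amount": 5.00},
    {"customer_id": 3, "payment_method": "Card", "amount": 42.50},
]

# Variables are the column names shared by every observation.
variables = list(dataset[0].keys())

# One observation is a single row.
first_observation = dataset[0]

# The values of one variable across all observations form a column.
amounts = [row["amount"] for row in dataset]

print(variables)
print(first_observation)
print(amounts)
```

Here `payment_method` is a categorical variable and `amount` is a numerical one.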

13
Q

Define a categorical variable.

A

A categorical variable represents qualitative data consisting of distinct categories or groups without numerical meaning

Examples include ‘Gender’ or ‘Payment Method’.

14
Q

What is a nominal variable?

A

A nominal variable has categories with no meaningful order or ranking

Examples: ‘Eye Color’ or ‘Country of Residence’.

15
Q

What is an ordinal variable?

A

An ordinal variable has categories with a meaningful order or ranking, but the intervals between categories are not necessarily equal

Example: ‘Customer Satisfaction’ (for instance, rated from ‘Very Unsatisfied’ to ‘Very Satisfied’).
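
To make the role of order concrete, here is a minimal sketch. The satisfaction labels and their ranks are assumptions made up for illustration; the point is only that ordinal categories can be sorted by rank, while the gaps between ranks carry no numerical meaning.

```python
# Hypothetical ordinal scale: the order is meaningful, the spacing is not.
SATISFACTION_ORDER = [
    "Very Unsatisfied",
    "Unsatisfied",
    "Neutral",
    "Satisfied",
    "Very Satisfied",
]
rank = {label: position for position, label in enumerate(SATISFACTION_ORDER)}

responses = ["Satisfied", "Neutral", "Very Unsatisfied", "Very Satisfied", "Neutral"]

# Sorting by rank respects the category order (alphabetical sorting would not).
sorted_responses = sorted(responses, key=lambda label: rank[label])

# The median category is a valid summary for ordinal data; a mean of ranks
# would assume equal spacing between categories, which is not guaranteed.
median_response = sorted_responses[len(sorted_responses) // 2]

print(sorted_responses)
print(median_response)
```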

16
Q

What defines numerical variables?

A

Numerical variables represent measurable quantities and can be expressed in numbers

Allow mathematical operations.

17
Q

Define a continuous variable.

A

A continuous variable can take an infinite number of values within a range

Often represents measurements with decimal components.

18
Q

What is a discrete variable?

A

A discrete variable represents countable, distinct values without fractional components

Example: the number of students in a classroom.

19
Q

What is data collection?

A

The process of gathering and evaluating information from multiple sources to address research problems

Essential for research and decision-making.

20
Q

What is primary data collection?

A

Primary data collection involves gathering original data directly from the source

Methods include surveys, interviews, observations, experiments, and focus groups.

21
Q

What are published sources in secondary data collection?

A

Data that has already been collected and made publicly available by authors or organizations

Includes books, articles, and industry reports.

22
Q

What is random sampling?

A

A technique where every individual in the population has an equal chance of being selected

Aims to reduce selection bias.
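
A simple random sample can be sketched with Python's standard library. The population of 100 numbered individuals and the seed value are assumptions chosen for illustration.

```python
import random

# Hypothetical population of 100 individuals, identified by number.
population = list(range(1, 101))

# A fixed seed makes the draw reproducible; omit it for a fresh draw each run.
rng = random.Random(42)

# random.sample draws without replacement, so every individual has the same
# chance of ending up in the sample.
sample = rng.sample(population, k=10)

print(sample)
```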

23
Q

What is stratified sampling?

A

Stratified sampling divides the population into distinct groups and selects a random sample from each group

Helps ensure every group is represented, in proportion to its size when proportional allocation is used.
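
The idea can be sketched as follows, assuming proportional allocation (the same fraction drawn from every group). The `stratified_sample` helper, the `region` field, and the 80/20 population are hypothetical and exist only for this example.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, fraction, seed=0):
    """Draw the same fraction at random from each stratum."""
    rng = random.Random(seed)

    # Divide the population into groups (strata).
    strata = defaultdict(list)
    for record in records:
        strata[record[stratum_key]].append(record)

    # Take a simple random sample within each group.
    sample = []
    for members in strata.values():
        # Keep at least one record per stratum so small groups are not dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 80 urban and 20 rural respondents.
population = (
    [{"id": i, "region": "urban"} for i in range(80)]
    + [{"id": i, "region": "rural"} for i in range(80, 100)]
)

sample = stratified_sample(population, stratum_key="region", fraction=0.1)
counts = {
    region: sum(1 for record in sample if record["region"] == region)
    for region in ("urban", "rural")
}
print(counts)
```

With a 10% fraction the sample keeps the 80/20 split of the population: 8 urban and 2 rural records.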

24
Q

What is structured data?

A

Structured data refers to organized information stored in a predefined format, typically in rows and columns

Commonly found in relational databases.

25
Q

Define unstructured data.

A

Unstructured data lacks a predefined format and includes text, images, and videos

Requires advanced techniques for analysis.

26
Q

What is ETL?

A

ETL stands for Extract, Transform, Load, a process for collecting data from various sources, transforming it, and loading it into a database

Used when data must be cleaned before storage.

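A minimal sketch of the three ETL stages, using only the standard library. The CSV text, the `orders` table, and the cleaning rules are assumptions invented for illustration; a real pipeline would read from actual sources and write to a real target database.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,,5.00
"""


def extract(text):
    """Extract: read rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a customer, normalise names, cast types."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip()
        if not name:
            continue
        cleaned.append((int(row["order_id"]), name.title(), float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Only cleaned rows reach the database: the row with no customer is dropped before loading.
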
27
Q

What is ELT?

A

ELT stands for Extract, Load, Transform, storing raw data first and transforming it later on demand

More flexible for data processing.

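A rough sketch of the ELT ordering, with SQLite's in-memory database standing in for a cloud data store. The `raw_orders` table, the `clean_orders` view, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: store the extracted values untouched in a staging table (all text).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
raw_rows = [("1", " alice ", "10.50"), ("2", "BOB", "20.00"), ("3", "", "5.00")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: done later, inside the database, when a cleaned view is needed.
conn.execute("""
    CREATE VIEW clean_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    WHERE TRIM(customer) <> ''
""")

clean = conn.execute("SELECT * FROM clean_orders ORDER BY order_id").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(clean)
print(raw_count)
```

All three raw rows remain available in storage, so a different transformation can be applied later without re-extracting the data.
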
28
Q

What does ETL stand for?

A

Extract, Transform, Load

ETL is a process for collecting data from various sources, transforming it, and loading it into a database.

29
Q

What is the primary use of ETL?

A

When data must be cleaned and standardized before storage

ETL is suited for preparing data for analysis and storage in databases.

30
Q

What does ELT stand for?

A

Extract, Load, Transform

ELT is a process where raw data is stored first and transformed later on demand.

31
Q

When is ELT preferred over ETL?

A

When working with modern cloud systems and wanting to store raw data first

ELT is more flexible and efficient for handling big, complex, or unstructured data.

32
Q

What is the first step in the ETL process?

A

Extract

In this step, data is located and gathered from various sources.

33
Q

What types of sources can data be extracted from?

A

* Relational databases
* NoSQL databases
* Flat files
* XML files
* Other formats

Data can come from a wide variety of sources for analysis.

34
Q

What is the purpose of the Transform step in ETL?

A

To convert source data into the format required by the target database

This includes cleaning, joining, aggregating, sorting, and applying validation rules.

35
Q

What tasks are involved in the Transform step?

A

* Joining data from several sources
* Aggregating data
* Sorting data
* Calculating new values
* Applying validation rules

These tasks ensure that the data is properly formatted and ready for analysis.

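The tasks listed above can be sketched in a few lines of plain Python. The `customers` and `orders` inputs, the field names, and the "quantity must be positive" rule are all hypothetical, chosen only to show each task once.

```python
# Hypothetical inputs from two sources.
customers = {1: "Alice", 2: "Bob"}
orders = [
    {"customer_id": 1, "quantity": 2, "unit_price": 5.0},
    {"customer_id": 2, "quantity": 1, "unit_price": 20.0},
    {"customer_id": 1, "quantity": 3, "unit_price": 4.0},
    {"customer_id": 2, "quantity": -1, "unit_price": 20.0},  # fails validation
]

totals = {}
for order in orders:
    # Apply a validation rule: quantity must be positive.
    if order["quantity"] <= 0:
        continue
    # Calculate a new value from existing fields.
    line_total = order["quantity"] * order["unit_price"]
    # Join with the customer source to look up the name.
    name = customers[order["customer_id"]]
    # Aggregate the line totals per customer.
    totals[name] = totals.get(name, 0.0) + line_total

# Sort by total, largest first.
report = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(report)
```
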
36
Q

What is data cleaning in the Transform step?

A

Removing blank records and standardizing formats

Data cleaning ensures the consistency and accuracy of the data.

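A small sketch of both cleaning tasks. The records, the field names, and the two accepted date formats are assumptions for illustration; in particular, reading `05/02/2024` as day/month/year is a choice made here, not something the data itself guarantees.

```python
from datetime import datetime

# Hypothetical raw records with a blank row and inconsistent formats.
raw_records = [
    {"name": "  alice SMITH ", "signup": "2024-01-05"},
    {"name": "", "signup": ""},
    {"name": "Bob Jones", "signup": "05/02/2024"},
]

# Assumed source date formats; the second is read as day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardize_date(value):
    """Return the date in ISO format (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {value!r}")


def clean(records):
    cleaned = []
    for record in records:
        # Remove blank records: skip rows where every field is empty.
        if not any(value.strip() for value in record.values()):
            continue
        cleaned.append({
            # Standardize formats: collapse whitespace, use title case.
            "name": " ".join(record["name"].split()).title(),
            "signup": standardize_date(record["signup"].strip()),
        })
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```
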
37
Q

What is the final step in the ETL process?

A

Load

This step involves loading the transformed data into the database for querying.

38
Q

What factors can affect the Load process?

A

* Types of source data
* Type of target database
* Type of querying

The loading process can vary widely based on these factors.

39
Q

What rules are applied during the Load step?

A

* Uniqueness of data
* Consistency of data
* Mandatory fields not being empty

These rules help ensure successful data loading and querying.
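
One way such rules might be enforced is with constraints on the target table, sketched here with SQLite. The `customers` table, its columns, and the "age must not be negative" consistency rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Uniqueness via PRIMARY KEY, mandatory field via NOT NULL,
# consistency via a CHECK constraint.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL,
        age         INTEGER CHECK (age >= 0)
    )
""")

rows = [
    (1, "alice@example.com", 30),
    (1, "dup@example.com", 41),    # duplicate key: violates uniqueness
    (2, None, 25),                 # mandatory field is empty
    (3, "carol@example.com", -5),  # inconsistent value
    (4, "dave@example.com", 52),
]

loaded, rejected = [], []
for row in rows:
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?, ?)", row)
        loaded.append(row[0])
    except sqlite3.IntegrityError as error:
        # The database refuses rows that break a rule; record why.
        rejected.append((row[0], str(error)))
conn.commit()

print(loaded)
print(rejected)
```

Rows that break a rule are rejected by the database itself, so only valid records are available for querying.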