Unit 3 Flashcards

WOOHOO!!! (16 cards)

1
Q

What is Data Science?

A

A multidisciplinary field that uses scientific methods,
processes, algorithms, and systems to extract knowledge and
insights from structured, semi-structured, and unstructured data.

2
Q

Data Scientist:

A

A professional who collects large amounts of data using
analytical, statistical, and programming skills.

Responsible for using data to develop solutions tailored to meet
the organization’s unique needs.

3
Q

True or false: Data scientists may write programs and develop new algorithms.

A

True

4
Q

Data Scientist skills

A

Programming,

Communication,

Organizational,

Mathematical,

Data analysis,

Problem solving,

Analytical skills.

5
Q

Quantitative and Qualitative data

A

Quantitative: numerical data (quantities that can be measured or counted).
Qualitative: descriptive data.

6
Q

Base of the DIKW pyramid

A

Data

7
Q

What is data quality?

A

A measure of the condition of data.

Measured in: completeness, uniqueness, consistency, timeliness,
validity, and accuracy.
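
A few of these dimensions can be expressed as simple ratios. The following is a minimal sketch, not a standard library or an official definition: the sample records, the function names, and the 0 to 120 age rule are all assumptions made up for illustration.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
    {"id": 3, "email": "c@example.com", "age": -5},  # duplicate row
]

def completeness(rows, field):
    """Share of rows where the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, field):
    """Number of distinct field values divided by the number of rows."""
    return len({r[field] for r in rows}) / len(rows)

def validity(rows, field, is_valid):
    """Share of rows whose field value passes a validity rule."""
    return sum(is_valid(r[field]) for r in rows) / len(rows)

email_completeness = completeness(records, "email")
id_uniqueness = uniqueness(records, "id")
# Assumed rule: an age is valid when it lies between 0 and 120.
age_validity = validity(records, "age", lambda a: a is not None and 0 <= a <= 120)

print(email_completeness, id_uniqueness, age_validity)
```

Consistency, timeliness, and accuracy usually need a second source or a timestamp to compare against, so they are left out of this sketch.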

8
Q

Knowledge and wisdom

A

Knowledge is understanding facts (patterns), while wisdom is the ability to apply that knowledge thoughtfully and effectively.

9
Q

Data processing

A

Converting raw data into information.

10
Q

Data processing cycle

A

Collection: gathering the raw data

Preparation (data cleaning): removing unnecessary and inaccurate data

Input: converting the data into machine-readable form

Processing: the input data is processed, for example with machine learning and AI algorithms, to produce the desired output

Output: presenting the data in a user-readable format

Storage: storing the data and metadata for quick retrieval
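
The six stages above can be sketched as a small pipeline. This is an illustration under stated assumptions: the temperature readings are invented, every function name is hypothetical, and a plain average stands in for the machine learning step mentioned on the card.

```python
import json

def collect():
    """Collection: gather raw data (hard-coded here; normally files, sensors, or APIs)."""
    return ["21.5", "22.1", "", "n/a", "23.4"]

def prepare(raw):
    """Preparation: drop empty and non-numeric entries."""
    cleaned = []
    for item in raw:
        try:
            float(item)
        except ValueError:
            continue  # skip entries that cannot be parsed as a number
        cleaned.append(item)
    return cleaned

def to_input(cleaned):
    """Input: convert text into a machine-readable form (floats)."""
    return [float(item) for item in cleaned]

def process(values):
    """Processing: a plain average stands in for a real model."""
    return sum(values) / len(values)

def to_output(result):
    """Output: present the result in a user-readable format."""
    return f"Average temperature: {result:.1f} C"

def store(values, result):
    """Storage: keep the data plus metadata (serialized as JSON here)."""
    return json.dumps({"values": values, "average": result, "count": len(values)})

raw = collect()
values = to_input(prepare(raw))
result = process(values)
report = to_output(result)
stored = store(values, result)
print(report)
```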

11
Q

Alternative cycle of data processing

A

Input: raw data that is to be processed

Processing: the data is processed by a suitable method

Output: the outcome of the processing, which provides information

12
Q

Types of data processing

A

The type of processing depends on the data source and the steps taken to generate the output.

Batch: data is collected in large amounts and processed in batches; it accumulates and is processed periodically (e.g., weekly or monthly).

Online: data is fed into the CPU as soon as it becomes available, so processing is continuous (e.g., barcode scanning).

Real-time: data is processed within seconds, usually in small amounts (e.g., an ATM).

Multiprocessing: data is broken down into frames and processed in parallel within the same computer system (e.g., weather forecasting).

Time-sharing: allocates computer resources and data in time slots to several users simultaneously.
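
The contrast between batch and real-time processing can be sketched with the same input handled two ways. This is a toy illustration: the transaction amounts and both function names are made up, and a real system would read from a queue or stream rather than a list.

```python
def batch_process(transactions):
    """Batch: accumulate everything first, then process it in one pass."""
    return sum(transactions)

def real_time_process(transactions):
    """Real-time: update the result as each item arrives."""
    balance = 0
    for amount in transactions:
        balance += amount
        yield balance  # a result is available after every single item

transactions = [100, -40, 25]
batch_total = batch_process(transactions)
running_balances = list(real_time_process(transactions))
print(batch_total, running_balances)
```

Both approaches end at the same total; the difference is when results become available.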

13
Q

What is a data lake?

A

Centralized storage system that holds vast amounts of raw, unprocessed data in its native format, allowing it to be structured, semi-structured, or unstructured, and making it accessible for analysis, processing, and other uses when needed.

14
Q

Data warehouse?

A

Centralized storage system designed to store structured and processed data from multiple sources, optimized for querying and analysis to support business intelligence and decision-making.

15
Q

Data warehouse and data lake difference

A

Data Warehouse:

Stores structured data (organized into tables and schemas)

Data is processed and cleaned before storage (ETL: Extract, Transform, Load)

Optimized for business analytics and reporting.

Example: Sales records used for generating reports and dashboards.

Data Lake:

Stores raw data in its original form (structured, semi-structured, or unstructured).

Data is processed on demand, not before storage (ELT: Extract, Load, Transform).

Flexible for big data analysis, AI, and machine learning.

Example: Sensor data, images, or logs waiting for later use.
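
The ETL versus ELT ordering can be sketched in a few lines. This is a simplified illustration, not how any particular warehouse or lake product works: the sample events, the `transform` function, and the use of plain lists as storage are all assumptions.

```python
# Hypothetical raw events; one row has an unparseable price.
raw_events = [
    {"item": "pen", "price": "1.50"},
    {"item": "book", "price": "bad-value"},
    {"item": "lamp", "price": "12.00"},
]

def transform(events):
    """Keep rows with a numeric price and convert that price to float."""
    cleaned = []
    for event in events:
        try:
            cleaned.append({"item": event["item"], "price": float(event["price"])})
        except ValueError:
            continue  # drop rows whose price cannot be parsed
    return cleaned

# ETL (warehouse style): transform first, then load only the cleaned rows.
warehouse = transform(raw_events)

# ELT (lake style): load the raw rows as-is, transform later on demand.
lake = list(raw_events)
lake_view = transform(lake)

print(len(warehouse), len(lake), lake_view == warehouse)
```

The lake still holds the unparseable row, so a later analysis can apply a different cleaning rule; the warehouse only ever sees the cleaned rows.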
