Data Platforms Flashcards

(29 cards)

1
Q

Data-Driven Innovation

A

Refers to the use of analytics to drive innovation and business value from data

2
Q

Analytics

A

In this context, we mean the different types of business intelligence initiatives

3
Q

Advanced Analytics

A

Semi-autonomous examination of data to get deeper insights (Machine Learning)

4
Q

Augmented Analytics

A

Augment how people explore data with the incorporation of AI

5
Q

Database

A

Structured and persistent collection of information with efficient retrieval and modification (relational databases)

6
Q

Data Warehouse

A

Subject-oriented collection of data that supports decision-making processes

7
Q

OLTP

A

Constant queries and updates, short-term data retention (e.g. accounting database, online retail transactions)

8
Q

OLAP

A

Periodic large updates, complex queries for reporting/decision support
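
Example (a minimal sketch, not from the original deck): the OLTP/OLAP contrast can be illustrated with Python's built-in sqlite3 module. The `orders` table, its columns and the values are hypothetical, and a single SQLite file is of course not a real warehouse; the point is only the shape of the workload.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: many small, short transactions that touch single rows.
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("north", 20.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("south", 35.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("north", 15.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (25.0, 1))

# OLAP-style work: one read-heavy aggregate over all rows, for reporting.
report = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

The inserts and the update stand in for the constant small writes of OLTP; the GROUP BY query stands in for the complex reporting queries of OLAP.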

9
Q

Data Lake

A

Central repository where data is kept in its original format (unstructured, semi-structured or structured) and queried only when needed.

Supports storage, processing and analysis

10
Q

What kind of users use Data Warehouses vs Data Lakes?

A

Business analysts

Vs

Data scientists, data developers, and business analysts

11
Q

What kind of users use Data Warehouses vs Data Lakes?

A

Business analysts

Vs

Data scientists, data developers, and business analysts

12
Q

Data Platform

A

Meets end-to-end data needs such as acquisition, storage, preparation, delivery, governance and security, so users can focus ONLY on functional aspects

13
Q

How do we prevent a data platform (DP) from becoming a swamp?

A

We MUST govern data transformations and leverage metadata and maintenance to keep control over data

14
Q

What are the 5 areas of data management? (Mnemonic PCPED: Plankton Chokes Patrick Every Day)

A
  1. Data provenance
  2. Compression
  3. Data profiling
  4. Entity resolution
  5. Data versioning
15
Q

Data Provenance

A

Description of the origins of data and the process by which it arrived

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Data Provenance Granularity

A

Fine-grained (instance level)
Coarse-grained (schema level)

Tracking items vs dataset transformations

17
Q

Three levels (types) of data provenance (EAA)

A

Entity (physical/conceptual thing)
Activity (what generated the thing)
Agent (associated with the activity)
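
Example (a rough sketch under assumptions, not from the original deck): the three types can be modelled as small Python dataclasses. The file names, the `deduplicate` activity, the `etl_service` agent and the `lineage` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str  # physical or conceptual thing, e.g. a file or table


@dataclass(frozen=True)
class Agent:
    name: str  # person or system associated with an activity


@dataclass(frozen=True)
class Activity:
    name: str
    used: tuple        # input entities
    generated: Entity  # entity produced by this activity
    agent: Agent


raw = Entity("raw_sales.csv")
clean = Entity("clean_sales.csv")
job = Activity("deduplicate", used=(raw,), generated=clean, agent=Agent("etl_service"))


def lineage(entity, activities):
    """Walk backwards from an entity to the names of its source entities."""
    sources = []
    for act in activities:
        if act.generated == entity:
            for src in act.used:
                sources.append(src.name)
                sources.extend(lineage(src, activities))
    return sources


print(lineage(clean, [job]))
```

Here the provenance of `clean` is answered by following the activity that generated it back to the entities it used.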

18
Q

Compression

A

Concise representation of a dataset in a comprehensible manner

19
Q

Data profiling

A

Analyzing the structure and quality of a dataset

The data is scanned for metadata such as completeness and uniqueness of columns, keys and foreign keys
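
Example (a minimal sketch, not from the original deck): completeness and uniqueness per column can be computed in a few lines of plain Python. The sample rows and the `profile` helper are hypothetical; missing values are assumed to be represented as `None`.

```python
# Hypothetical sample data; None marks a missing value.
rows = [
    {"id": 1, "email": "a@example.com", "country": "SE"},
    {"id": 2, "email": None, "country": "SE"},
    {"id": 3, "email": "c@example.com", "country": "NO"},
    {"id": 4, "email": "a@example.com", "country": None},
]


def profile(rows):
    """Return completeness, distinct count and uniqueness for each column."""
    stats = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        stats[col] = {
            # Share of rows where the column has a value.
            "completeness": len(present) / len(values),
            "distinct": len(set(present)),
            # Key candidate: no missing values and no repeats.
            "is_unique": len(present) == len(values) == len(set(present)),
        }
    return stats


stats = profile(rows)
for col, s in stats.items():
    print(col, s)
```

In this sample only `id` is complete and unique, so it is the only candidate key.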

20
Q

Two things data profiling can help with

A
  1. Optimizing queries
  2. Cleansing (errors in data)
21
Q

Entity resolution

A

Find records that refer to the same entity
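
Example (a deliberately simple sketch, not from the original deck): records are grouped by a normalized match key. Using the lower-cased, trimmed email as the key is an assumption made for illustration; real entity resolution usually needs fuzzy matching over several fields.

```python
from collections import defaultdict

# Hypothetical customer records; ids 1 and 2 describe the same person.
records = [
    {"id": 1, "name": "Jon Smith", "email": "JON.SMITH@example.com"},
    {"id": 2, "name": "Jonathan Smith", "email": "jon.smith@example.com "},
    {"id": 3, "name": "Ana Lopez", "email": "ana@example.com"},
]


def match_key(record):
    """Normalize the email so trivial formatting differences do not matter."""
    return record["email"].strip().lower()


def resolve(records):
    """Group record ids that share the same match key."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])
    return sorted(groups.values())


clusters = resolve(records)
print(clusters)
```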

22
Q

Version Control

A

Managing changes to computer programs or data collections, with each change identified by a code (the version number).

23
Q

Data versioning

A

Version control that extends to data models, model parameter tracking and performance comparison
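
Example (a rough sketch under assumptions, not from the original deck): a version code can be derived from the dataset content and stored next to the model parameters and a performance score. The `registry` list, the `record_run` helper, the parameter names and the accuracy values are all made up for illustration; no model is actually trained here.

```python
import hashlib
import json


def version_id(rows):
    """Derive a short version code from the dataset content."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


registry = []  # hypothetical in-memory log of runs


def record_run(rows, params, accuracy):
    """Store which data version and parameters produced which score."""
    entry = {"data_version": version_id(rows), "params": params, "accuracy": accuracy}
    registry.append(entry)
    return entry


v1 = [{"x": 1, "y": 0}, {"x": 2, "y": 1}]
v2 = v1 + [{"x": 3, "y": 1}]  # the dataset changed, so its version changes

record_run(v1, {"max_depth": 3}, accuracy=0.81)  # illustrative score
record_run(v2, {"max_depth": 5}, accuracy=0.86)  # illustrative score

# Performance comparison across data versions and parameters.
best = max(registry, key=lambda e: e["accuracy"])
print(best["data_version"], best["params"])
```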

24
Q

Data lakehouse

A

Combines the flexibility of data lakes with the structure of data warehouses (ACID transactions) to support both BI and ML

Vendor lock in…?

25
Q

Data Platform Engineer job description

A

Implements cloud technologies within the data structure of the business; in charge of purchasing decisions for cloud services and approval of data architectures

26
Q

DevOps

A

Enables software DEVelopment and OPerationS teams to accelerate delivery through collaboration and iterative improvement

27
Q

DataOps

A

Uses automation to shorten the data analytics lifecycle

28
Q

Data fabric

A

Seamless data access and sharing in a distributed environment

Fabric is a smooth, unified surface

29
Q

Data mesh

A

Decentralized, distributed governance, with domains owning data products

Mesh is a grid-like surface with interconnected “nodes”/“domains”