Chapter 5: Data Generation in Source Systems Flashcards

1
Q

Draw the data engineering lifecycle

A
2
Q

A file is a …?

A

sequence of bytes stored on a disk

3
Q

Source systems produce …?

A

data in several ways

4
Q

Get familiar with your source system and how …?

A

it generates data

5
Q

Files may store …?

A

local parameters, events, logs, images, and audio.

Elaborate

6
Q

Files are the universal medium of …?

A

data exchange

7
Q

What are the major file formats you will come across?

A

Excel, CSV, TXT, JSON, XML
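
To make the idea of file formats as an exchange medium concrete, here is a minimal sketch that writes the same records as CSV and as JSON and reads them back. The `records` data and field names are made up for illustration; only the standard-library `csv`, `io`, and `json` modules are used.

```python
import csv
import io
import json

# Hypothetical records; values are strings because CSV carries no type information.
records = [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Bo"}]

# Serialize to CSV text (an in-memory buffer stands in for a file on disk).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

# Serialize the same records to JSON text.
json_text = json.dumps(records)

# Read both representations back into Python objects.
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
from_json = json.loads(json_text)
```

Both round trips recover the same records here, but only because every value is a string: CSV would return a numeric `id` as text, whereas JSON preserves numbers, booleans, and nesting.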

8
Q

What are standard ways of exchanging data between systems?

A

APIs

9
Q

Typically an application database is an …?

A

online transaction processing (OLTP) system

10
Q

OLTP systems are often referred to as …?

A

transactional databases. Why?

11
Q

OLTP databases work well as …?

A

application backends when thousands or even millions of users might be interacting. Why?

12
Q

What does ACID stand for?

A

atomicity, consistency, isolation, and durability

13
Q

With respect to ACID what does consistency relate to?

A

Consistency means that any database read will return the last written version of the retrieved item. Why?

14
Q

What is an atomic transaction?

A

it is a set of several changes that are committed as a unit. Why?
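
As a rough illustration of "committed as a unit", the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are hypothetical; the point is only that either both updates take effect or neither does.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic unit (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: both updates above are undone together.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # succeeds: both rows change
failed = transfer(conn, 1, 2, 500)  # fails: neither row changes
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed transfer the balances are the same as after the first one, which shows that the partial debit was never visible as a committed change.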

15
Q

In data engineering fundamentals, what is a data application?

A

applications that hybridize transactional and analytics workloads. Why?

16
Q

What does CDC stand for?

A

Change Data Capture

17
Q

What is CDC?

A

it is a method for extracting each change event (insert, update, delete). Why?

18
Q

CDC is often used to …?

A

replicate between databases in near real time or create an event stream for downstream processing. Why?
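
One way to picture replication from change events is the toy sketch below. The event shape (`op`, `key`, `row`) and the `apply_change` helper are assumptions made for illustration, not the format of any particular CDC tool; a plain dictionary stands in for the replica database.

```python
def apply_change(replica, event):
    """Apply one change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")

# Hypothetical stream of change events captured from a source table, in order.
change_events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Bo", "plan": "free"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in change_events:
    apply_change(replica, event)
```

Replaying the events in order leaves the replica with the same final state as the source, and the same event list could also be handed to a downstream consumer as a stream.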

19
Q

What does OLAP stand for?

A

online analytical processing system. Why?

20
Q

What is the difference between OLAP and OLTP?

A

OLAP is for large-scale analytics; OLTP is for large-scale reads and writes of individual records. Why?
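
The two access patterns can be contrasted with a small sketch. It uses SQLite, which is a row-oriented engine, purely to show the shape of the queries; it says nothing about performance, and the `orders` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 20), (2, "west", 35), (3, "east", 15), (4, "west", 5)],
)

# OLTP-style access: read one record by its key.
point_lookup = conn.execute(
    "SELECT region, amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# OLAP-style access: scan every row and aggregate.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
```

The first query touches a single row through the primary key, while the second must read the whole table, which is the workload analytical systems are designed around.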

21
Q

Typically OLAP systems are …?

A

inefficient at handling lookups of individual records. Why?

23
Q

A log captures …?

A

information about events that occur in systems. Why?

24
Q

Logs are a …?

A

rich data source, potentially valuable for downstream data analysis.

25
Q

What are three common ways logs are encoded?

A
  1. Binary-encoded logs
  2. Semistructured logs
  3. Plain-text (unstructured) logs
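
The three encodings above can be compared by writing one event in each form. This is a sketch under stated assumptions: the event fields, the plain-text layout, and the 9-byte binary layout (two unsigned 32-bit integers plus a one-byte action code) are all hypothetical, chosen only to show the trade-off between readability and compactness.

```python
import json
import struct

# Hypothetical event capturing who, what, and when.
event = {"who": 42, "what": "login", "when": 1700000000}

# Plain-text (unstructured): easy to read, needs custom parsing downstream.
plain_text = f"{event['when']} user={event['who']} action={event['what']}"

# Semistructured: self-describing JSON that machines can parse directly.
semistructured = json.dumps(event)

# Binary-encoded: compact fixed layout; the reader must know the format.
ACTION_CODES = {"login": 1, "logout": 2}  # assumed lookup table
binary = struct.pack("<IIB", event["who"], event["when"], ACTION_CODES[event["what"]])

decoded = struct.unpack("<IIB", binary)
```

The binary record is the smallest of the three, but it is meaningless without the layout string and the action-code table, whereas the JSON form carries its own field names.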
26
Q

Relational databases often store …?

A

an event log directly on the database server, which can be processed to create a stream.

27
Q

All logs track …?

A

events and metadata

28
Q

What is log resolution?

A

it refers to the amount of event data captured in the log.

29
Q

At a minimum a log should capture…?

A

who, what, and when.

30
Q

Describe binary-encoded logs.

A

These logs encode data in a custom compact format for space efficiency and fast I/O. Why?

31
Q

Tables are typically indexed by a …?

A

primary key

32
Q

What is a primary key?

A

a unique field for each row of the table
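
A quick way to see the uniqueness guarantee is to try inserting a duplicate key. The sketch below uses Python's built-in `sqlite3` module; the `users` table and email addresses are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")

try:
    # Reusing user_id 1 violates the primary key constraint.
    conn.execute("INSERT INTO users VALUES (1, 'b@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key still identifies exactly one row, the original one.
row = conn.execute("SELECT email FROM users WHERE user_id = 1").fetchone()
```

The database rejects the second insert, so a lookup by `user_id` can never return more than one row.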

33
Q

What does RDBMS stand for?

A

relational database management system

34
Q

What is the most common type of database for application backends?

A

relational database management system