Data model Flashcards

Question 1

Q

Data Model is most important

Answer

A

How to model data (as Data structures/objects) is important

Question 2

Q

Each “layer” component may model same data differently

each layer hides the complexity of the
layers below it by providing a clean data model

Answer

A

For “address”, at high level model as class/object Address, at DB level model as Table “Address” at HTTP layer model as “JSON/XML” etc

Data Model reflects the problem and its solution

Question 3

Q

Type of Data Models

Answer

A

(SQL) Relational Data Model : good for many to one (upside-down tree)
(noSQL) Document Data Model: good for one to many (tree)
(noSQL) Graph Data Model: good for many to many (DAG)
(noSQL) Full-Text Search Mode: good for full text search (log files)

SQL databases target use cases where there is few one to many relationship (mostly many to one) –> SQL
Document databases target use cases where data comes in self-contained documents and relationships between one document and another are rare. (I.e. such as log files) –> SQL, MapReduce Query
Graph databases go in the opposite direction, targeting use cases where anything is potentially related to everything. (social network) —> Graph QL
Full-Text search –> Indexing

Question 4

Q

NoSQL

Answer

A

Not Only SQL

Question 5

Q

polyglot persistence (similar to Polygot Programming)

Answer

A

Adoption of multiple persistence data models (SQL, noSQL etc) in an application.

Question 6

Q

Why NoSQL (Document/JSON model)
--Better for One-to-Many (tree structure)

1 person has many job categories (1 category with 1 category ID)

Answer

A

For a complicated object, the “Normalized” SQL split into multiple tables, using Foreign ID to connect the tables (relationships).

To reconstruct the object, a “denormalization” (joining by foreign Ids) need to happen.

If save a object as JSON text inside a column, then the SQL cannot be used to query the content inside JSON, leaving the job to application.

Thus we need NoSQL, that stores data as JSON data model and can use SQL to query

Question 7

Q

Document Database (NoSQL)
 vs
SQL tables (SQL)

SQL supports many-to-one, many-to-many (Normalization, JOIN) well, but do not support one-to-many well -> shifting the query of values inside JSON to application

NoSQL supports one-to-many natively (JSON tree) but does not support JOIN (many-to-one) well –> shifting the JOIN operation to user

Answer

A

***PRO
JSON object has better locality. All information about 1 person is in the same row/document.

JSON Document implies one-to-many tree structure

SQL tables: information of 1 person is scattered among multiple tables, needs a JOIN operation

***CON
Normalization avoids duplication, but implies many-to-one relationship.

A JSON document with normalization will have both
One-To-Many and Many-To-One relationships

Question 8

Q

What is Normalization in Database( SQL and NoSQL)

- avoid duplication
- single place change

Answer

A

Normalization in Database is avoid duplication of common constant values:

similar to C/C++ to do:
#define REGION_ID_192 "Philadelphia"
#define REGION_ID_190 "Seattle"

in C where we can
1) avoid duplication
2) change at 1 place will change all references to
e.e.
#define REGION_ID_190 “Greater Seattle”
will change all ppl referencing 190 to from “Greater Seattle” area.

Question 9

Q

To Normalize or not to Normalize?

Answer

A

To #define or not #define

Question 10

Q

Normalization implies Many-to-One

Answer

A

Many ppl reference REGION_ID_192 because many ppl are from Philadelphia

Question 11

Q

A JOIN-free document has no reference to entity IDs (no use of #define values)

Answer

A

A JOIN-free document today may not be JOIN-free tomorrow, if new feature added forcing normalization (using of entity IDs)

Question 12

Q

Since NoSQL weak at JOIN forced by Normalization, the debate of NoSQL vs SQL is also a debate of Normalization vs No Normalization

Answer

A

Normalization is An age old debate

Question 13

Q

Data Locality: Document Model vs Relational Model

Answer

A

Document Model: All data in one (json) document ==> In one storage sector ==> Locality in disk, in memory etc

Relational Model: Data is “shredded” into multiple tables. Need to JOIN to get the full data.
==> Multiple tables ==> Multiple storage sectors ==> Not necessarily in close vicinity in disk or memory

Question 14

Q

Data Locality: Document Model vs Relational Model

Answer

A

Document Model: All data in one (json) document ==> In one storage sector ==> Locality in disk, in memory etc ==> less disk seek, page swap etc

Relational Model: Data is “shredded” into multiple tables. Need to JOIN to get the full data.
==> Multiple tables ==> Multiple storage sectors ==> Not necessarily in close vicinity in disk or memory ==> more disk seek, page swap etc

Question 15

Q

Document Model Locality Cons:

1) Read Whole use Part –> a waste
2) In-place update may require relocation of the document (when it exceeds sizes)

Answer

A

some SQL also support locality

Question 16

Q

SQL is a declarative query language (vs imperative language like C/Java/Python etc)

SELECT (cols) FROM (table) WHERE (cond) {TRANSFORM}

e.g
select * from * where a=b SORT BY a ASEC

Another declaration language is CSS
# highlight background of blue of any paragraph that is direct child of a <li> element whose class is set to "selected" like </li><li class="selected">
li.selected > p {
  background-color: blue;
}</li>

Answer

Study These Flashcards

A

An imperative language tells the computer to perform certain operations in a certain order (exactly as the order of the statements)

In a declarative query language, like SQL or relational algebra, you just specify WHAT not HOW

I.e. specify the pattern of the data you want—what conditions the results must meet, and how you
want the data to be transformed (e.g., sorted, grouped, and aggregated)—but not how to achieve that goal.

The SQL Optimizer will decide HOW, which allows PARALLEL execution (mapreduce for example)

Question 17

Q

SQL is a declarative query language (vs imperative language like C/Java/Python etc)

SELECT (cols) FROM (table) WHERE (cond) {TRANSFORM}

ORDER of execution:
WHERE–> TRANSFORM–>SELECT

e.g
select * from * where a=b SORT BY a ASEC

Another declaration language is CSS
# highlight background of blue of any paragraph that is direct child of a <li> element whose class is set to "selected" like </li><li class="selected">
li.selected > p {
  background-color: blue;
}</li>

Answer

Study These Flashcards

A

An imperative language tells the computer to perform certain operations in a certain order (exactly as the order of the statements)

In a declarative query language, like SQL or relational algebra, you just specify WHAT not HOW

I.e. specify the pattern of the data you want—what conditions the results must meet, and how you
want the data to be transformed (e.g., sorted, grouped, and aggregated)—but not how to achieve that goal.

The SQL Optimizer will decide HOW, which allows PARALLEL execution (mapreduce for example)

Question 18

Q

SQL engine can be implemented on top of MapReduce (for parallel execution), Similarly Pandas/Spark etc call all use MapReduce for parallel execution on data processing.

Answer

Study These Flashcards

A

MapReduce is a fairly low-level programming model for distributed execution on a cluster of machines. Higher-level query languages like SQL can be implemented as a
pipeline of MapReduce operations

Question 19

Q

ACID-compliant database (regardless SQL or noSQL)

Answer

Study These Flashcards

A

(atomicity, consistency, isolation, durability)

Data model Flashcards

(19 cards)