Lecture 12 & 13 Flashcards

1
Q

What is a NoSQL database?

A

Databases that do not follow the traditional Relational Model.

2
Q

Why do we need NoSQL Databases?

A

To handle Big Data: NoSQL databases are designed to scale horizontally across many nodes, whereas relational databases have traditionally relied on vertical scaling.

3
Q

What is Horizontal scaling?

A

Adding more nodes to a system

4
Q

What is Vertical scaling?

A

Adding more resources to a node

5
Q

What is Availability?

A

The database always responds to queries, even when some of its nodes have failed.

6
Q

What is Consistency?

A

The database gives the same response to queries that happen at the same time, whichever node answers them.

7
Q

What is a Distributed Database?

A

A database in which the data and the computational load are split among different nodes.

8
Q

What is a Cluster?

A

The set of computers that co-operate to manage the Database

9
Q

What is Two-phase Commit?

A

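An algorithm used to enforce consistency in distributed transactions. In the first phase every node votes on whether it can commit; in the second phase all nodes commit only if every vote was yes, otherwise all roll back.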
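
A minimal sketch of two-phase commit, assuming the usual split into a voting phase and a commit-or-rollback phase. The `Participant` class and `two_phase_commit` function are hypothetical names for illustration, and real systems also handle timeouts and crashed coordinators, which this ignores.

```python
class Participant:
    """A node taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the change can be applied locally.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit everywhere only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

A single "no" vote is enough to roll back every node, which is how the nodes are kept consistent with each other.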

10
Q

What is a Transaction?

A

A set of changes to a database that is treated as a single change: either all of them are applied or none of them are.

11
Q

What does MVCC stand for?

A

Multi-Version Concurrency Control

12
Q

What is MVCC?

A

A method that stores data in multiple versions (revisions) to ensure availability and to allow recovery from a partition by reconciling the individual databases through their revisions. Data is not replaced; each update is stored under a new revision number.
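
A minimal sketch of the "data isn't replaced" idea, assuming an in-memory store with integer revision numbers. `VersionedStore` is a hypothetical class for illustration, not how any particular DBMS implements MVCC.

```python
class VersionedStore:
    """Keeps every version of a value instead of overwriting it."""

    def __init__(self):
        self._revisions = {}  # key -> list of values, oldest first

    def put(self, key, value):
        # An update appends a new revision; older ones stay readable.
        versions = self._revisions.setdefault(key, [])
        versions.append(value)
        return len(versions)  # revision number, starting from 1

    def get(self, key, rev=None):
        versions = self._revisions[key]
        if rev is None:
            return versions[-1]  # latest revision
        return versions[rev - 1]  # a specific older revision


store = VersionedStore()
first = store.put("doc", "draft")
second = store.put("doc", "final")
```

After the two writes, `store.get("doc")` returns the latest value while `store.get("doc", rev=first)` still returns the older one.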

13
Q

What is Partition/Node Failure?

A

When a node in a cluster fails or is cut off from the other nodes, causing its data to become outdated.

14
Q

What is MVCC used for?

A

It is well suited to coarse-grained data models, such as document-oriented DBMSs (e.g. CouchDB).

15
Q

What is Sharding?

A

The horizontal partitioning of a database, i.e. the rows are partitioned into subsets that are stored on different servers.

16
Q

Why do we use Sharding?

A

It improves performance by distributing the computing load across servers.

17
Q

What kinds of Sharding are there?

A

Hash Sharding and Range Sharding

18
Q

What is Hash sharding?

A

Distributing rows evenly across the cluster by applying a hash function to each row's key.
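
A minimal sketch of hash sharding, assuming string keys and a fixed number of shards. `shard_for` is a hypothetical helper, and MD5 is used here only as a stable hash, not for security.

```python
import hashlib


def shard_for(key, n_shards):
    # Hash the key, then map the hash onto one of the shards.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards


# Assign 1000 hypothetical row keys to 4 shards.
assignments = [shard_for(f"row-{i}", 4) for i in range(1000)]
```

The same key always lands on the same shard, and unrelated keys spread roughly evenly because the hash ignores any ordering among them.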

19
Q

What is Range Sharding?

A

Similar rows (e.g. tweets from the same area) are stored on the same node.
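
A minimal sketch of range sharding, assuming rows are keyed by area name and the key space is cut at fixed split points. The `BOUNDARIES` values and the area names are made up for illustration.

```python
import bisect

# Hypothetical split points: keys before "h" go to node 0,
# keys from "h" up to (not including) "q" to node 1, the rest to node 2.
BOUNDARIES = ["h", "q"]


def node_for(key):
    # Find which key range the key falls into.
    return bisect.bisect_right(BOUNDARIES, key)


nodes = {area: node_for(area) for area in ["carlton", "coburg", "kew", "richmond"]}
```

Keys that sort close together share a node, which makes range queries cheap but can overload one node if most rows fall into the same range.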

20
Q

What is Replication?

A

Storing the same row on several nodes to achieve fault tolerance.
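
A minimal sketch of replica placement, assuming each row is copied to a fixed number of consecutive nodes starting from the one its key hashes to. This placement scheme and the `replica_nodes` helper are assumptions for illustration, not CouchDB's actual behaviour.

```python
import hashlib


def replica_nodes(key, n_nodes, n_replicas):
    # Pick a starting node from the key's hash, then take the next nodes in order.
    first = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % n_nodes
    return [(first + i) % n_nodes for i in range(n_replicas)]


copies = replica_nodes("row-42", n_nodes=5, n_replicas=3)
```

With three copies on three different nodes, the row is still readable after any two of those nodes fail.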

21
Q

What are MapReduce Algorithms?

A

A paradigm suited to parallel computing of the Single-Instruction, Multiple-Data type.

22
Q

How does MapReduce Work?

A

1. Map: distributes the data across machines, each of which processes its share in parallel.
2. Reduce: hierarchically summarizes the data until a result is obtained.
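
A minimal sketch of the two steps as a word count, run on a single machine for illustration. The chunk contents and the function names `map_phase` and `reduce_phase` are assumptions; in a real cluster each chunk would be processed on a different node.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    # Each machine counts the words in its own chunk of the data.
    return Counter(chunk.split())


def reduce_phase(left, right):
    # Partial counts are merged pairwise until one result remains.
    return left + right


chunks = ["big data big", "data big nodes"]
partials = [map_phase(c) for c in chunks]
total = reduce(reduce_phase, partials, Counter())
```

Because each chunk is mapped independently, adding more machines lets more chunks be processed at the same time.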

23
Q

Example of MapReduce?

A

See Billy Boy Diagram in Lecture 12

24
Q

What is a document database? (e.g. CouchDB)

A

A type of database in which data is stored as documents expressed as JSON.

25
Q

How is data queried, added, modified and deleted in CouchDB?

A

Through HTTP requests. HTTP is the protocol every web browser uses to download web pages from a server.
GET method: retrieves data from the server
POST method: creates content on the server
PUT method: updates or creates data on the server when the ID is known
DELETE method: removes data from the server
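
A minimal sketch of the four methods as requests against a CouchDB database, built with Python's standard library. The server address, the database name `tweets`, the document ID `doc1` and the revision strings are all placeholders; the requests are only constructed here, not sent.

```python
import json
from urllib.request import Request

BASE = "http://localhost:5984/tweets"  # assumed server address and database name


def json_request(method, url, body=None):
    # Build (but do not send) an HTTP request with an optional JSON body.
    data = None if body is None else json.dumps(body).encode("utf-8")
    return Request(url, data=data, method=method,
                   headers={"Content-Type": "application/json"})


read = json_request("GET", f"{BASE}/doc1")                # retrieve a document
create = json_request("POST", BASE, {"text": "hello"})    # create, ID chosen by server
update = json_request("PUT", f"{BASE}/doc1",
                      {"text": "hi", "_rev": "1-abc"})    # update a known ID
delete = json_request("DELETE", f"{BASE}/doc1?rev=2-def")  # remove a given revision
```

Passing any of these to `urllib.request.urlopen` would send it, provided a server is actually running at that address.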

26
Q

What is the ID of a document?

A

A unique identifier for the document. It can be specified by the user; by default it is generated by CouchDB, which ensures it is unique.

27
Q

What is a revision number?

A

An identifier that distinguishes the different versions of a document. Every database instance picks the same revision as the “live” version of the document.

28
Q

What does UUID stand for?

A

Universally Unique IDentifier

29
Q

What is UUID?

A

A 128-bit identifier, usually written as a sequence of letters and numbers, that is unique for all practical purposes (a collision is extremely unlikely).
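
A minimal sketch of generating a UUID with Python's standard `uuid` module. Using the 32-character hexadecimal form as a document ID is an assumption for illustration.

```python
import uuid

# A random (version 4) UUID, written as 32 hexadecimal characters.
doc_id = uuid.uuid4().hex
another_id = uuid.uuid4().hex
```

Two calls produce different values without the nodes having to coordinate, which is why UUIDs suit distributed databases.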

30
Q

What happens when conflicts occur in CouchDB?

A

When a document receives conflicting updates, two different revisions are added. However, only one revision is returned. The “winning” revision is guaranteed to be the same on every node of the cluster.
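
A simplified sketch of how every node can pick the same winner without talking to the others: apply one deterministic rule to the set of conflicting revisions. The rule below (higher revision number first, then the higher hash string) and the revision strings are assumptions for illustration; CouchDB's actual rule has more cases.

```python
def winning_revision(revisions):
    # Revisions look like "<number>-<hash>"; compare the number, then the hash.
    def sort_key(rev):
        number, _, digest = rev.partition("-")
        return (int(number), digest)

    return max(revisions, key=sort_key)


# Two nodes see the same conflicting revisions in a different order.
node_a = winning_revision(["2-abc", "2-abd"])
node_b = winning_revision(["2-abd", "2-abc"])
```

Because the rule depends only on the revisions themselves, the order in which a node received them does not change the result.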

31
Q

Why is replication useful?

A

It can be combined with sharding to maximize availability while maintaining a minimum level of data safety.

32
Q

Why is Map Reduce useful?

A

Because it is horizontally scalable, MapReduce is the tool of choice for operations on big datasets.