CIS275 - Chapter 9: NoSQL Databases Flashcards

1
Q

Structured data created within an organization, with sizes ranging from gigabytes to terabytes, is called _____.

A

transactional data

2
Q

Data generated by new internet and multimedia applications is commonly called ____ and differs from transactional data.

A

big data

Big data differs from transactional data in four ways:

Volume. Typical size ranges from terabytes to petabytes (million billion bytes), occasionally reaching exabytes (billion billion bytes).

Velocity. Big data is generated at extremely high rates. Facebook users upload roughly a billion photos per day, or more than 10,000 per second. Twitter generates roughly 6,000 tweets per second. Click rates on popular websites can be significantly higher.

Variety. Variety means both unstructured and rapidly changing data types. Unstructured data refers to information embedded in complex data types like images, video, GPS coordinates, and natural language. Rapidly changing data means the information content of records varies greatly, as in data collected from social media. Both unstructured and rapidly changing data are common in big data.

Veracity. Transactional data is typically created by an organization’s employees or trusted partners. Big data is often generated by the general public. Consequently, the accuracy of big data varies much more than that of transactional data.

  1. Variety refers to unstructured data, such as text files, video, web logs, social media, and sensor data.
  2. Variety also refers to variable data structures. Ex: Facebook, LinkedIn, and Twitter contain different information about people, which might be combined in big data.
3
Q
A
4
Q

_____ increases capacity by increasing speed and size of CPUs and storage devices for a limited number of machines.

A

Vertical scaling

To accommodate increasing database sizes, transactional applications commonly scale vertically, not horizontally.

Vertical scaling increases processing speed, memory, and storage of a limited number of machines.

5
Q

_____ increases capacity by adding large numbers of low-cost components like standard disk drives and CPUs.

A

Horizontal scaling

To accommodate increasing database sizes, transactional applications commonly scale vertically, not horizontally.

Horizontal scaling adds an unlimited number of machines working in parallel.

6
Q

_____ splits large tables into separate physical files on one machine.

A

Partitioning

7
Q

Relational databases were developed prior to big data. Historically, which of the following requirements were prioritized by relational databases?

A
8
Q

_____ splits data sets across multiple machines.

A

Sharding

9
Q
A
10
Q

Data is represented as a single key and an associated value. The key is used to access the value.

A

Key-value database

Key-value systems support a limited set of queries, such as:

put(key, value) - Stores the value in the database, indexed by key.

get(key) - Retrieves the value associated with the key.

multiGet(key1, …, keyn) - Retrieves the values associated with keys 1 through n.

delete(key) - Deletes the value associated with the key.
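
The four operations above can be sketched with an in-memory Python dictionary. This is only an illustration, not how any particular key-value product is implemented. The class name KeyValueStore, the snake_case name multi_get, the choice to return None for a missing key, and the file name dog1.jpg are all assumptions made for this sketch.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value database (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Adds a new key, or replaces the value of an existing key.
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent (a choice made for this sketch).
        return self._data.get(key)

    def multi_get(self, *keys):
        # Retrieves the values for several keys in one call.
        return [self._data.get(key) for key in keys]

    def delete(self, key):
        # Deleting a missing key is silently ignored in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("mike@email.com", "/User/pics/flower3.jpg")
store.put("matt@email.com", "/User/pics/dog1.jpg")    # hypothetical first value
store.put("matt@email.com", "/User/pics/puppy7.jpg")  # replaces the value above
```

Because each key maps to exactly one value, a second put() on the same key replaces the stored value rather than adding another one.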

11
Q

Also called a column-based, column family, or tabular database. Data is represented as a key and multiple values. Since each record has multiple values, a descriptive name is stored with each value.

A

Wide column database

12
Q

Data is represented as a key and a ‘document’. Usually the document is in a structured, human-readable format such as XML or JSON.

A

Document database

13
Q

Data is represented as a graph with nodes and edges.

A

Graph database

14
Q

_____ supports the data models of several categories.

A

multi-model database, also called a hybrid database

15
Q
A
16
Q

Keys are used to identify and locate values. In this example, the key is an email address.

Values are photographs of the person associated with the email address.

Each key is associated with one value.

A

Key-value logical structure

The put() function stores a value in the database.

The get() function retrieves the value associated with a key.

17
Q
A
18
Q

Values are grouped in hash buckets.

Values are replicated on multiple devices for high availability and fast access.

A

Key-value physical structure

Updates to values are applied to one replica. For fast updates and high availability, additional replicas are not updated within a transaction.

If other replicas are accessed before an update is propagated, obsolete values are returned.

Eventually, the update is propagated to all replicas.
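
The update behavior described above can be sketched as a toy simulation. The class ReplicatedValue, the replica count of three, and the file names are hypothetical. A real database propagates updates in the background over a network, which this sketch replaces with an explicit propagate() call.

```python
class ReplicatedValue:
    """One value stored on several replicas and updated lazily (toy model)."""

    def __init__(self, initial, replica_count=3):
        self.replicas = [initial] * replica_count
        self._pending = []  # updates not yet propagated to every replica

    def update(self, value):
        # The write is applied to one replica only (replica 0 in this sketch).
        self.replicas[0] = value
        self._pending.append(value)

    def read(self, replica_index):
        # A read may hit any replica, so it can return an obsolete value.
        return self.replicas[replica_index]

    def propagate(self):
        # Later, outside any transaction, pending updates reach all replicas.
        for value in self._pending:
            self.replicas = [value] * len(self.replicas)
        self._pending.clear()


photo = ReplicatedValue("/User/pics/old.jpg")
photo.update("/User/pics/new.jpg")
stale = photo.read(2)   # replica 2 has not received the update yet
photo.propagate()
fresh = photo.read(2)   # after propagation, replica 2 has the new value
```

The read between update() and propagate() is the window in which an obsolete value is returned.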

19
Q
A
20
Q
A
21
Q
A

Expected: Webpage, User

The key must be unique. A webpage domain name and user email are unique, whereas two students may have the same age.

22
Q
A

Expected: Building, Stock

The key must be unique. A building’s street address and stock symbol are unique, whereas two students may have the same age.

23
Q
A

Expected: Stock, Employee

The key must be unique. A stock symbol and employee email are unique, whereas two patients may have the same full name.

24
Q
A

Expected: Webpage

The key must be unique. A webpage domain name is unique, whereas two students may have the same age, or two employees may have the same title.

25
Q
A

Expected: ‘/User/pics/flower3.jpg’

get(‘mike@email.com’) retrieves the value associated with the key ‘mike@email.com’, which is ‘/User/pics/flower3.jpg’.

26
Q
A

Expected: New value added, Existing value replaced

‘joe@email.com’ is not a key, so put(‘joe@email.com’, ‘/User/pics/cat4.jpg’) adds the new key ‘joe@email.com’ with value ‘/User/pics/cat4.jpg’.

‘matt@email.com’ is already a key, so put(‘matt@email.com’, ‘/User/pics/puppy7.jpg’) replaces the existing value for key ‘matt@email.com’ with ‘/User/pics/puppy7.jpg’.

27
Q

Wide column databases store multiple versions of each value. Each version is marked with the date and time the version is created, called a _____.

A

timestamp

To access older values, the timestamp must be specified in a query. If a query does not specify a timestamp, the database selects the most recent version.

In a wide column database, a specific value is accessed with a combination of table name, key, column family name, column name, and optional timestamp.
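
The access path above (table name, key, column family name, column name, optional timestamp) can be sketched with nested Python dictionaries. The integer timestamps 100 and 200, the values ‘inactive’ and ‘active’, and the function read_cell are hypothetical. Real wide column databases use their own storage formats and query interfaces.

```python
# table -> row key -> column family -> column -> list of (timestamp, value)
tables = {
    "Contact": {
        "ajf@acm.org": {
            "Description": {
                # Two versions of the same value (timestamps are made up).
                "Status": [(100, "inactive"), (200, "active")],
            },
        },
    },
}


def read_cell(table, key, family, column, timestamp=None):
    """Return the version written at `timestamp`, or the newest if omitted."""
    versions = tables[table][key][family][column]
    if timestamp is None:
        # No timestamp in the query: select the most recent version.
        return max(versions, key=lambda version: version[0])[1]
    for version_time, value in versions:
        if version_time == timestamp:
            return value
    return None  # no version was written at exactly that timestamp


latest = read_cell("Contact", "ajf@acm.org", "Description", "Status")
older = read_cell("Contact", "ajf@acm.org", "Description", "Status", timestamp=100)
```

This sketch matches timestamps exactly. A real system might instead return the newest version at or before the requested time, which is a design choice this example does not model.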

28
Q
A
29
Q
A
30
Q

All columns of a family are stored together for fast access via the key.

Different column families are physically separated.

A

Wide column databases are not optimized to access multiple column families within one query.

31
Q
A
32
Q
A
33
Q
A
34
Q
A

Expected:
Table name: Contact
Key: ajf@acm.org
Column family name: Description
Column name: Status

The table ‘Contact’ contains rows indexed by a key. The table has column families ‘Name’, ‘Address’, and ‘Description’. The columns in each column family can vary from one row to another.

So, the ‘Status’ of ‘Arnold J. Fourier’ is found by looking in the ‘Status’ column of the ‘Description’ column family of the row indexed by the key ‘ajf@acm.org’.

35
Q
A

Expected:
Table name: Contact
Key: ajf@acm.org
Column family name: Address
Column name: State

The table ‘Contact’ contains rows indexed by a key. The table has column families ‘Name’, ‘Address’, and ‘Description’. The columns in each column family can vary from one row to another.

So, the ‘State’ of ‘Arnold J. Fourier’ is found by looking in the ‘State’ column of the ‘Address’ column family of the row indexed by the key ‘ajf@acm.org’.

36
Q
A

Expected: manufacturing

First, the row with key ‘sales@corp.com’ is located. Then, the column family ‘Description’ is accessed. Finally, the value of the column ‘Category’ within the column family ‘Description’, which is ‘manufacturing’, is accessed.

37
Q

A _____ stores data as a collection of documents.

A

document database

A document database may contain multiple collections, just as a relational database may contain multiple tables.

38
Q
  1. The Flight collection consists of documents describing scheduled airline flights, in JSON format.
  2. Documents may have a different number of values with different names.
  3. Usually all documents in a collection share common value names, to facilitate queries.
A
  1. The Flight collection consists of documents describing scheduled airline flights, in JSON format.
  2. Documents may have a different number of values with different names.
  3. Usually all documents in a collection share common value names, to facilitate queries.
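
A collection of documents with differing values can be sketched as a Python list of dictionaries. The helper find() and the extra Gate value are hypothetical and do not represent any real document database API.

```python
# A collection is modeled as a list. Each document is a dict.
flights = [
    {"FlightNumber": "8809", "Airline": "United", "DepartureAirportCode": "JFK"},
    {
        "FlightNumber": "44",
        "Airline": "American",
        "DepartureAirportCode": "OAK",
        "Gate": "29",  # hypothetical extra value that other documents lack
    },
]


def find(collection, name, value):
    """Select the documents whose value called `name` equals `value`."""
    return [doc for doc in collection if name in doc and doc[name] == value]


united = find(flights, "Airline", "United")
with_credits = find(flights, "Credits", 3)  # no document has this name
```

Queries on Airline work because every document shares that value name. A query on a name that no document contains selects no documents.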
39
Q

Flight

{
  "identifier": "cb20896a-eea8-b55c-7a22-08d885640c96",
  "FlightNumber": "8809",
  "Airline": "United",
  "DepartureAirportCode": "JFK",
  "ArrivalAirportCode": "ATL"
}

{
  "identifier": "bha5678cdbd9e3a587de9b814578dba1",
  "FlightNumber": "44",
  "Airline": "American",
  "DepartureAirportCode": "OAK",
  "ArrivalAirportCode": "DFW"
}

{
  "identifier": "41b3b38cdbd9e3a587de9b8145111aab",
  "FlightNumber": "239",
  "Airline": "United",
  "DepartureAirportCode": "SFO",
  "ArrivalAirportCode": "ORD"
}

A
40
Q

Documents are assigned to a shard based on a _____.

A

shard key

The shard key is either the document identifier or some other value. If the shard key is a value, an index of shard key values is created so the database can quickly locate documents.

41
Q

With a _____, each shard contains a contiguous range of shard key values.

A

range function

Ex: If the shard key for the Flight collection is Airline, documents for airlines beginning with ‘A’ might be in one shard, ‘B’ in another, and so on.
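
A range function can be sketched with a sorted list of boundaries and a binary search. The boundaries, the three-shard layout, and the name range_shard are assumptions for illustration. A real deployment would choose boundaries from the actual distribution of shard key values.

```python
import bisect

# Hypothetical boundaries: shard 0 holds keys below "B", shard 1 holds keys
# from "B" up to (not including) "C", and shard 2 holds everything else.
BOUNDARIES = ["B", "C"]


def range_shard(shard_key):
    """Return the shard number whose contiguous range contains shard_key."""
    # Uppercasing makes the comparison case-insensitive for this sketch.
    return bisect.bisect_right(BOUNDARIES, shard_key.upper())


american = range_shard("American")  # falls in the first range
united = range_shard("United")      # falls in the last range
```

Because each shard holds a contiguous range, keys that sort near each other land on the same shard.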

42
Q
  1. The database designer selects either the identifier or an indexed value as the shard key. Airline is chosen as the shard key.
  2. Documents can be assigned to a shard based on a hash function on the shard key.
A
  1. Alternatively, documents can be assigned to a shard based on a range function.
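
Hash-based assignment can be sketched as follows. The shard count of four and the name hash_shard are hypothetical. SHA-256 from Python's hashlib is used only because it gives the same result in every process, unlike the built-in hash() for strings. Actual databases choose their own hash functions.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of shards


def hash_shard(shard_key, shard_count=SHARD_COUNT):
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce it to a shard.
    return int.from_bytes(digest[:8], "big") % shard_count


united_shard = hash_shard("United")
american_shard = hash_shard("American")
```

All documents with the same shard key value land on the same shard, but unlike a range function, keys that sort next to each other are not necessarily stored together.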
43
Q
A
44
Q
A
45
Q
A

Expected: No documents are selected

No documents have the key Credits.

46
Q
A
47
Q
A
48
Q

a hub where network lines converge.

A

A vertex, also called a node

49
Q

a connection between two vertices.

A

An edge, also called a link

50
Q

descriptive information associated with vertices and edges.

A

property

51
Q

In a _____, edges have a starting and ending vertex and are depicted as arrows.

A

directed graph

52
Q

In an _____, edges have no direction and are depicted as lines.

A

undirected graph

53
Q
A
54
Q
A
55
Q
  1. Vertex labels are collections of objects, like entity types or tables.
  2. A vertex is an individual object, like an entity instance or table row. An edge is a relationship between individual objects.
  3. Properties are name-value pairs for vertices and edges.
A
  1. Property graphs have flexible schema. Different vertices and edges can have different properties.
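
The property graph model above can be sketched with two small Python classes. The names Vertex, Edge, PropertyGraph, add_vertex, add_edge, and out are hypothetical, loosely mirroring the vocabulary of the cards rather than any real graph database API. The Name and Seat properties are also made up.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str                                       # like an entity type
    properties: dict = field(default_factory=dict)   # name-value pairs


@dataclass
class Edge:
    label: str                                       # like a relationship type
    start: Vertex
    end: Vertex
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices = []
        self.edges = []

    def add_vertex(self, label, **properties):
        vertex = Vertex(label, properties)
        self.vertices.append(vertex)
        return vertex

    def add_edge(self, label, start, end, **properties):
        edge = Edge(label, start, end, properties)
        self.edges.append(edge)
        return edge

    def out(self, start, label):
        # Follow edges with the given label from start vertex to end vertex.
        return [e.end for e in self.edges if e.start is start and e.label == label]


g = PropertyGraph()
passenger = g.add_vertex("Passenger", Name="Gus King")
flight = g.add_vertex("Flight", FlightNumber="3416", AirlineName="Delta")
g.add_edge("Books", passenger, flight, Seat="12A")  # edges can carry properties
```

The schema is flexible because properties are plain dictionaries: the Passenger and Flight vertices carry different property names, and nothing forces two vertices with the same label to match.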
56
Q
A
57
Q
  1. g.addV().property() adds a vertex with label ‘Passenger’ to graph g.
  2. g.addV().property() adds a vertex with label ‘Flight’ to graph g.
A
  1. g.V().addE().to() adds an edge between two vertices.
  2. out() traverses edges from start to end vertex, like a relational join.
58
Q
A
59
Q
  1. In a relational database, a relationship is stored as a foreign key value in an index, along with a pointer to the location of the row containing related data.
  2. With index-free adjacency, a pointer is stored within the start vertex. Queries that traverse edges require fewer reads.
A
  1. A pointer is also stored within the end vertex to enable traversal in any direction.
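
The contrast between index-based lookup and index-free adjacency can be sketched as below. All names and the sample data are hypothetical, and Python object references stand in for the physical pointers a graph database would store on disk.

```python
class Vertex:
    """A vertex that holds direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.out_neighbors = []  # pointers stored within the start vertex
        self.in_neighbors = []   # pointers stored within the end vertex


def add_edge(start, end):
    # Storing the pointer on both sides enables traversal in either direction.
    start.out_neighbors.append(end)
    end.in_neighbors.append(start)


# Index-based lookup, loosely modeled on a foreign key index.
rows = {"p1": "Gus King", "f1": "Flight 3416"}  # row location -> row data
foreign_key_index = {"p1": ["f1"]}              # start row -> related rows


def related_via_index(row_id):
    # Two reads per hop: the index entry, then each row it points to.
    return [rows[r] for r in foreign_key_index.get(row_id, [])]


passenger = Vertex("Gus King")
flight = Vertex("Flight 3416")
add_edge(passenger, flight)
```

With index-free adjacency, following an edge from passenger is a direct read of passenger.out_neighbors, with no separate index lookup in between.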
60
Q
A
61
Q
A
62
Q
A

Expected: Jen Choi, $266

Person is an Entity type, so a vertex label. Same for Payment.
Jen Choi is a Person instance, so a vertex.
$266 is a Payment instance, so a vertex.
Kim Soto-Makes-$66 is a connection between two vertices, so an edge.

63
Q
A

Expected:

Jan West-Makes-$142

Payment is an Entity type, so a vertex label, not an edge.
Fay Choi-Pays-$152 could not be an edge as Pays is not a relationship type shown in the graph.
The graph is directed and the arrow points from Person to Payment, so $227-MadeBy-Pat Reid could not be an edge.
The graph is directed and the arrow points from Person to Payment, so $91-MadeBy-Tia Hale could not be an edge.
The graph is directed and the arrow points from Person to Payment, so $268-MadeBy-Del Hall could not be an edge.
Jan West-Makes-$142 is a connection between two vertices, so could be an edge.

64
Q
A

Expected:
Rob Ross-Teaches-English
Ina West-Teaches-English

Rob Ross-Teaches-English is a connection between two vertices, so could be an edge.
Course is an Entity type, so a vertex label, not an edge.
Instructor is an Entity type, so a vertex label, not an edge.
Ina West-Teaches-English is a connection between two vertices, so could be an edge.
The graph is directed and the arrow points from Instructor to Course, so Databases-TaughtBy-Zoe Rios could not be an edge.
Teaches is a Relationship type, so an edge label.

65
Q
A

Expected:
NumberOfTerminals: 5
FlightNumber: 3572

MealPreference is a property name without a value, so not a property.
5 is a property value without a name, so not a property.
FlightNumber is a property name without a value, so not a property.
NumberOfTerminals: 5 is a name-value pair, so a property.
Rob Wood is a property value without a name, so not a property.
Default is a property value without a name, so not a property.
FlightNumber: 3572 is a name-value pair, so a property.

66
Q
A

Expected:
PhoneNumber: (171) 736-1461
Gate: 29

DateOfBirth is a property name without a value, so not a property.
10 is a property value without a name, so not a property.
First is a property value without a name, so not a property.
PhoneNumber: (171) 736-1461 is a name-value pair, so a property.
Duration is a property name without a value, so not a property.
ArrivalDateTime is a property name without a value, so not a property.
Gate: 29 is a name-value pair, so a property.

67
Q
A

Expected:

g.addV('Flight').property('FlightNumber', '3416').property('AirlineName', 'Delta')
g.V('Gus King').addE('Books').to(g.V('3416'))
g.V('Gus King').out('Books')

g.addV('Flight') adds a vertex with label 'Flight' to graph g.
property('FlightNumber', '3416') adds the name-value pair FlightNumber: 3416 to the new vertex.
property('AirlineName', 'Delta') adds the name-value pair AirlineName: Delta to the new vertex.

g.V('Gus King').addE('Books').to(g.V('3416')) adds an edge with label 'Books' from vertex Gus King to vertex 3416.
g.V('Gus King').out('Books') traverses outgoing 'Books' edges from vertex Gus King and returns the Flight vertex reached, which has properties FlightNumber: 3416 and AirlineName: Delta.

68
Q
  1. A single student is represented as a document with field:value pairs. The name field is assigned a BSON string, gpa is a double, and interests is an array.
  2. Documents may be nested. The student document contains a nested address document.
A
  1. MongoDB organizes documents into collections. A group of students is stored in a single collection.
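
The student document described above can be sketched as a plain Python dictionary. The field values are invented for illustration, and no MongoDB driver is used here. When such a dictionary is stored through a driver, Python str, float, list, and dict values correspond to BSON string, double, array, and embedded document.

```python
import json

# Hypothetical student document: all values are made up.
student = {
    "name": "Ana Ruiz",                 # string
    "gpa": 3.6,                         # double
    "interests": ["chess", "hiking"],   # array
    "address": {                        # nested document
        "city": "Tucson",
        "state": "AZ",
    },
}

# A collection groups documents. A list stands in for it in this sketch.
students = [student]

# The document survives a round trip through JSON text unchanged.
round_tripped = json.loads(json.dumps(student))
```

Nested values are reached by chaining keys, for example students[0]["address"]["state"].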
69
Q
A
70
Q
A
71
Q
A