Chapter Ten Flashcards
(80 cards)
What are the classifications of NoSQL databases?
- Key-value stores
- Document stores
- Wide-column stores
- Graph-oriented databases
These classifications help in understanding the various types of NoSQL databases and their use cases.
What is a wide-column store?
A NoSQL database that distributes data based on both key values and columns, using ‘column groups/families’.
Example: Apache Cassandra.
What is a graph-oriented database?
A database that maintains information about the relationships between data items. Data is modeled as nodes with properties, and the connections between nodes can also have properties.
Example: Neo4j.
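A minimal sketch of this model in plain Python may help; it is not Neo4j's API, and the node names, relationship type, and properties are invented for illustration. Both the nodes and the connection between them carry properties:

```python
# Hypothetical in-memory property graph; all names and data are illustrative.
nodes = {
    "alice": {"label": "Person", "age": 34},
    "bob": {"label": "Person", "age": 29},
}

# Each relationship connects two nodes and has its own properties.
edges = [
    {"from": "alice", "to": "bob", "type": "FRIEND_OF", "since": 2019},
]

def neighbors(node_id, rel_type):
    """Return ids of nodes reachable from node_id via relationships of rel_type."""
    return [e["to"] for e in edges if e["from"] == node_id and e["type"] == rel_type]

print(neighbors("alice", "FRIEND_OF"))  # ['bob']
```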
What does NoSQL stand for?
Not Only SQL.
A category of data storage and retrieval technologies not based on the relational model.
What are the main characteristics of NoSQL databases?
- Performance: Variable
- Scalability: High
- Flexibility: High
- Complexity: Low
- Functionality: Variable
These characteristics differentiate NoSQL databases from traditional relational databases.
What are the Five V’s of Big Data?
- Volume
- Variety
- Velocity
- Veracity
- Value
These characteristics define the nature and challenges of Big Data.
What is meant by ‘Schema on Read’?
An approach in which the data model is determined later, when the data is read, depending on how the data will be used. This allows flexibility in data access.
This contrasts with ‘Schema on Write’, which uses a preexisting data model.
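As a rough illustration, assuming raw records are kept as JSON strings, the sketch below stores data without enforcing any structure and applies a schema only at read time. The function name `read_with_schema` and the sample records are hypothetical:

```python
import json

# Raw records are stored as-is; no schema is enforced when writing.
raw_store = [
    '{"name": "Ana", "age": "41", "city": "Lisbon"}',
    '{"name": "Ben", "signup": "2021-03-01"}',
]

def read_with_schema(raw_records, schema):
    """Apply a schema (field name -> converter) only when the data is read."""
    rows = []
    for raw in raw_records:
        record = json.loads(raw)
        # Missing fields become None; present fields are converted.
        rows.append({
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        })
    return rows

# One possible view of the data, chosen at read time.
people = read_with_schema(raw_store, {"name": str, "age": int})
print(people)
```

A different reader could apply a different schema (for example, only `signup` dates) to the same stored records.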
What is a Data Lake?
A large integrated repository for internal and external data that does not follow a predefined schema.
It allows flexible access and captures everything.
What is Hadoop?
An open-source framework that implements MapReduce and is designed for managing large files in a distributed environment.
Hadoop is widely used for big data management.
What is MapReduce?
An algorithm for massively parallel processing of various types of computing tasks. It divides a computing task so that multiple nodes can work on it simultaneously.
It consists of two stages: Map and Reduce.
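The two stages can be sketched with a word count in plain Python. This is a single-process illustration, not Hadoop's API: in a real cluster each chunk would be mapped on a different node. The function names and the intermediate `shuffle` grouping step are assumptions made for the example:

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Each chunk stands in for the part of the input handled by one node.
chunks = ["big data big", "data lake"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'lake': 1}
```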
What is the function of the Hadoop Distributed File System (HDFS)?
A file system designed for managing large files in a distributed environment, breaking data into blocks and distributing them across nodes.
HDFS allows efficient data storage and processing.
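The idea of breaking data into blocks and spreading them across nodes can be sketched as follows. This is a simplification under stated assumptions: the tiny block size, the node names, and the round-robin placement are all hypothetical, and real HDFS uses far larger blocks and also replicates them.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Break a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def assign_blocks(blocks, node_names):
    """Distribute block indexes across nodes round-robin (a simplification)."""
    placement = {name: [] for name in node_names}
    for index, _ in enumerate(blocks):
        placement[node_names[index % len(node_names)]].append(index)
    return placement

blocks = split_into_blocks(b"x" * 10, block_size=4)  # block sizes: 4, 4, 2
placement = assign_blocks(blocks, ["node1", "node2"])
print([len(b) for b in blocks], placement)
```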
What is the role of Pig in the Hadoop ecosystem?
A tool that integrates a scripting language and an execution environment to simplify the use of MapReduce.
It is useful for developing data processing tasks.
What is Hive in the context of Hadoop?
A tool that supports management and querying of large datasets using a SQL-like language called HiveQL.
It is useful for ETL tasks.
What is the primary difference between relational and NoSQL databases?
Relational databases rely on a predefined schema, while NoSQL databases support schema on read and greater flexibility.
This affects how data is stored and retrieved.
Fill in the blank: A document-store database uses _______ for storage.
A BSON-based storage format (for example, in MongoDB).
BSON stands for Binary JSON.
True or False: NoSQL databases are typically ACID compliant.
False.
NoSQL databases are often not ACID compliant, which distinguishes them from traditional relational databases.
What does the acronym BASE in NoSQL refer to?
Basically Available, Soft state, Eventually consistent.
This is a model used in many NoSQL databases.
What is the purpose of the _id property in MongoDB?
To uniquely identify a document.
It serves as a primary key in the database.
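To make the primary-key role concrete, here is a small in-memory stand-in for a collection. It is not MongoDB's driver API: the class `TinyCollection` is hypothetical, and a random UUID string stands in for the identifier MongoDB generates when a document is inserted without an `_id`.

```python
import uuid

class TinyCollection:
    """Hypothetical in-memory collection; not MongoDB's API."""

    def __init__(self):
        self._docs = {}

    def insert_one(self, doc):
        # Generate an _id when the caller does not supply one.
        doc = dict(doc)
        doc.setdefault("_id", uuid.uuid4().hex)
        # Reject a second document with the same _id, as a primary key would.
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def find_by_id(self, doc_id):
        """Look up a single document by its _id, or return None."""
        return self._docs.get(doc_id)

books = TinyCollection()
books.insert_one({"_id": 1, "title": "Dune"})
auto_id = books.insert_one({"title": "Emma"})
print(books.find_by_id(1)["title"])  # Dune
```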
What does the term ‘move computation to the data’ refer to in Hadoop?
The principle of processing data where it is stored rather than moving it to a centralized location for processing.
This approach enhances efficiency in data processing.
With HDFS it is less expensive to move the execution of computation to data than to move the:
* data to computation. *
data to hardware.
data to systems analysis.
data to processes.
Hive uses ________ to query data.
Honeyquery
* HiveQL *
BeesNest
SQL
Structured Query Language (SQL) is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful information.
False
JSON is commonly used in conjunction with the ‘document store’ NoSQL database model.
True
NoSQL includes data storage and retrieval:
based on normalized tables.
* not based on the relational model. *
based on the relational model.
not based on data.