MongoDB Basics Flashcards

(50 cards)

1
Q

How to list all databases?

A

show dbs

2
Q

Write a query to create a new database named games

A

use games

MongoDB only creates the database once data is first stored in it.

3
Q

Write a command to create a collection named dogBreeds

A

db.createCollection("dogBreeds")

Here db refers to the current database (the one selected with use), so it doesn’t need to be replaced by the actual name of the database.

4
Q

How to list all collections in a database?

A

show collections

4
Q

What is a collection?

A

It is a group of documents, roughly equivalent to a table in a relational database.

For example, students from a school would be grouped into the students collection, with one document holding the information of a student.

4
Q

Write a query that lists all documents in the students collection

A

db.students.find()

5
Q

Do documents in a collection need to share the same structure?

A

No. By default, a schema is not enforced on the documents of a collection; each document can have its own set of fields.

Remember that a flexible schema is one of the characteristics of MongoDB’s document store.

6
Q

What does the acronym CRUD stand for?

A

Create, Read, Update and Delete (operations).

These are the basic operations for manipulating data in a database.

6
Q

Write a query to count the number of documents in a collection.

A

db.collection_name.countDocuments()

6
Q

Write a query to insert the following data in the students collection.

name: Ana Bell
age: 34
email: anabell@uni.com

A
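
The answer is missing from this card. A minimal sketch of what it could look like in mongosh, assuming the three values are stored under the field names given in the question. The `typeof db` guard is an addition that only lets the snippet load outside the shell.

```javascript
// Document built from the values listed in the question.
const student = { name: "Ana Bell", age: 34, email: "anabell@uni.com" };

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  db.students.insertOne(student); // inserts a single document
}
```
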
7
Q

Write a query to insert the following data in the books collection.

title: A Brief History of Time: From the Big Bang to Black Holes
author: Stephen Hawking

title: Sapiens: A Brief History of Humankind
author: Yuval Harari

title: The 4-Hour Workweek
author: Timothy Ferriss

A
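
The answer is missing from this card. A possible answer, sketched under the assumption that each title/author pair becomes one document. `insertMany()` takes an array of documents; the `typeof db` guard is only there so the snippet also loads outside mongosh.

```javascript
// One document per book listed in the question.
const books = [
  {
    title: "A Brief History of Time: From the Big Bang to Black Holes",
    author: "Stephen Hawking"
  },
  { title: "Sapiens: A Brief History of Humankind", author: "Yuval Harari" },
  { title: "The 4-Hour Workweek", author: "Timothy Ferriss" }
];

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  db.books.insertMany(books); // inserts all three documents in one call
}
```
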
8
Q

Write a query to find the document within the books collection that has the title "The 4-Hour Workweek".

A

db.books.find({"title": "The 4-Hour Workweek"})

9
Q

What happens when findOne() is used and there are multiple documents that match the criteria provided (see example below)?

db.students.findOne({"liveInCampus": true})

A

It returns only the first matching document, following the natural order of the documents on disk.

10
Q

Write a query to find the document within the students collection for which the id is equal to 202503456 (string).

A

db.students.find({"id": "202503456"})

11
Q

What happens when deleteMany() is used and there are multiple documents that match the criteria provided (see example below)?

A

All documents that match the criteria will be deleted.
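
The example the question refers to is missing from the card. A hypothetical one, reusing the `liveInCampus` field from an earlier card (the filter value is an assumption for illustration only):

```javascript
// Hypothetical filter: every student who does not live on campus.
const filter = { liveInCampus: false };

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  const result = db.students.deleteMany(filter);
  // deletedCount reports how many documents were removed.
  print(result.deletedCount);
}
```
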

12
Q

What are the two commands for deleting one and many documents from a collection?

A

db.collection_name.deleteOne({"id": "123"})
db.collection_name.deleteMany({"id": "123"})

13
Q

Write a query that updates all documents in the books collection that have genre equal to Drama, adding a field called onSale set to true.

A

db.books.updateMany({"genre": "Drama"}, {$set: {"onSale": true}});

14
Q

Explain why using indexes in a database is beneficial and give a real-world example of an index.

A

Indexes allow us to read data without having to scan the entire set of documents in a collection, similar to how an index works in a book.

Suppose we are searching for a document with id equal to 1534534. Without an index, MongoDB must scan all documents in the collection and check whether id == 1534534.

Alternatively, if we create an index on id, which keeps the id values sorted, the search becomes much faster because MongoDB can go straight to the matching index entry.

Ultimately, the index improves the efficiency of the read operation.

For example, when we want to look up a word in a physical dictionary, there is no need to go through the whole book just to find one word.

The dictionary is already ordered alphabetically, so one just needs to look for the first letter of the word, then the second one, and so on until the word is found.

15
Q

When should I create an index for a MongoDB database?

A

Indexes should be created for the most frequent queries.

One can create a compound index with many fields, but adding too many fields to the index can hurt the efficiency of the read operation.

Specifically, a single compound index can include up to 32 fields, and a collection can include up to 64 indexes.

16
Q

What is a compound index?

A

A compound index includes more than one field in its composition. It allows us to create an index for a frequent query that requires more than one field to return the information needed.
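
A small sketch of a compound index in mongosh. The `lastName` and `age` fields and the sample query are hypothetical, chosen only to illustrate the idea; the `typeof db` guard just lets the snippet load outside the shell.

```javascript
// Hypothetical compound index: lastName ascending (1), then age descending (-1).
const keys = { lastName: 1, age: -1 };

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  db.students.createIndex(keys);
  // A query that filters on lastName and sorts by age can be served by this index.
  db.students.find({ lastName: "Bell" }).sort({ age: -1 });
}
```
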

17
Q

Complete the sentence below

An index in MongoDB is a special data structure that stores _____ and _____

A

An index in MongoDB is a special data structure that stores the field being indexed and the location of the corresponding document on disk.

18
Q

In which format does MongoDB store an index?

A

It is stored as a (balanced) tree, containing the indexed fields and the path to the corresponding documents on disk.

19
Q

What are the downsides of creating too many indexes?

A
  • Too many indexes can slow down write operations (insert, update, delete) because indexes must be updated as well.
  • Indexes consume disk space.
20
Q

Write two queries:
1. Create an index for the bigdata collection using the account_no field in ascending order.
2. Show list of existing indexes
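
The answer is missing from this card. One way to write both queries in mongosh, assuming the collection and field names from the question (the `typeof db` guard only lets the snippet load outside the shell):

```javascript
// Index specification: 1 means ascending order.
const indexKeys = { account_no: 1 };

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  db.bigdata.createIndex(indexKeys);  // 1. create the index
  printjson(db.bigdata.getIndexes()); // 2. list the existing indexes
}
```
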

20
How to get rid of an index?
`db.bigdata.dropIndex({"account_no":1})`
20
How can I understand the performance of a query?
Use `explain()`
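A sketch of how `explain()` might be used in mongosh, reusing the `bigdata` collection from the previous cards. The `account_no` value is hypothetical, and the output fields shown are those of a standalone (non-sharded) deployment.

```javascript
// Hypothetical filter value, for illustration only.
const filter = { account_no: 12345 };

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  const plan = db.bigdata.find(filter).explain("executionStats");
  // IXSCAN in the winning plan means an index was used; COLLSCAN means a full scan.
  printjson(plan.queryPlanner.winningPlan);
  // Number of documents MongoDB had to examine to answer the query.
  print(plan.executionStats.totalDocsExamined);
}
```
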
21
What is the difference between the two queries below?
Both queries produce the **same output**, returning only two documents from the collection. However, the first query is an **aggregation pipeline**, which allows data to be processed in more complex ways, while the second is a simple **read operation** with a limit on the number of results returned.
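The two queries are missing from this card. A hypothetical pair that fits the answer, assuming a collection named `marks` (the real collection name is not recoverable from the card):

```javascript
// Aggregation pipeline with a single $limit stage.
const pipeline = [{ $limit: 2 }];

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  db.marks.aggregate(pipeline); // first form: aggregation pipeline
  db.marks.find().limit(2);     // second form: plain read with a cursor limit
}
```
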
22
In the context of aggregation frameworks, what is a pipeline?
A pipeline is an array of one or more stages, separated by commas; each stage applies one aggregation operator. MongoDB executes the first stage in the pipeline and sends its output to the next stage.
23
Explain what the aggregation pipeline below does
It aggregates documents from the `marks` collection. First, it uses the `subject` field as the grouping key; then it calculates the average of the `marks` field for each key.
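The pipeline itself is missing from this card. A sketch consistent with the answer, where the output field name `average` is an assumption:

```javascript
// Group by subject and compute the average of the marks field per group.
const pipeline = [
  { $group: { _id: "$subject", average: { $avg: "$marks" } } }
];

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  db.marks.aggregate(pipeline);
}
```
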
24
Explain what the aggregation pipeline below does
The pipeline stages are:
1. Finding the average marks per `name`.
2. Sorting the output by average marks in descending order.
3. Limiting the output to two documents.
Therefore, it returns the two documents from the `marks` collection with the highest average value of `marks`, in descending order.
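The pipeline itself is missing from this card. A sketch matching the three stages described, where the output field name `average` is an assumption:

```javascript
const pipeline = [
  // 1. Average marks per name.
  { $group: { _id: "$name", average: { $avg: "$marks" } } },
  // 2. Highest average first.
  { $sort: { average: -1 } },
  // 3. Keep only the top two.
  { $limit: 2 }
];

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  db.marks.aggregate(pipeline);
}
```
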
25
Explain briefly how to interact with MongoDB through Python
`pymongo` is the official MongoDB driver for Python. From this module we can import the `MongoClient` class (`from pymongo import MongoClient`) to connect to a MongoDB server. The collections reached through the client expose methods for performing CRUD operations.
26
What is the aggregation framework, and what is its purpose?
The aggregation framework is a set of tools that allow us to **perform complex analysis** on the data in MongoDB. It is based on a set of aggregation operations that process data records and return computed results. These operations are put together to form a **pipeline**, meaning the output from one operation serves as input to the next one.
27
Explain what a cursor is and how the MongoDB shell and the pymongo driver (Python) handle it.
A cursor is an object that allows you to iterate over the results of a query, one document at a time. When I call `.find()` on a collection, it does **not return the list of documents immediately**, but rather a cursor object that lazily fetches the documents **as needed**. If a collection has millions of documents, you don’t want to load them all into memory at once; a cursor lets MongoDB return results in batches, reducing memory usage. The MongoDB shell iterates the cursor automatically and prints the first batch of results. With `pymongo` in Python, I need to iterate over the cursor myself (for example with a `for` loop or `list()`); `dumps` from the `bson.json_util` module can then serialize the retrieved documents to JSON.
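A sketch of iterating a cursor by hand in mongosh. The helper name `collectNames` and the `name` field are hypothetical; `hasNext()` and `next()` are the cursor methods the shell provides.

```javascript
// Walks any cursor-like object one document at a time and gathers the `name` field.
function collectNames(cursor) {
  const names = [];
  while (cursor.hasNext()) {
    names.push(cursor.next().name); // next() hands over one document per call
  }
  return names;
}

// In mongosh, `db` is the handle of the current database.
if (typeof db !== "undefined") {
  print(collectNames(db.students.find())); // find() returns a cursor, not an array
}
```
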
28
Give an example of an aggregation pipeline
For example, we can build an aggregation pipeline that matches a set of documents based on a set of criteria, groups those documents together, sorts them, then returns that result set to us.
29
Explain what the query below accomplishes.
This is an **aggregation pipeline** written in Python using the `MongoClient` class from the `pymongo` module. The client connects to the `routes` collection from the `sample_training` database. The pipeline does the following:
1. Filters documents where `src_airport` is PDX and `stops` is equal to zero.
2. Groups the results using `airline_name` as key, and increments a counter by 1 for each document with the same key.
3. Sorts the results by that counter in descending order.
4. Returns only the top 3 results.
Translating this to business language, this query returns the *top three airlines that offer the most direct flights out of the airport in Portland, Oregon, USA (PDX)*.
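The query itself is missing from this card. The four stages can be sketched as stage documents in mongosh syntax (the same documents could be passed from Python with quoted keys). The answer calls the grouping key `airline_name`; this sketch assumes the name is stored as the nested field `airline.name`, so adjust the path if your data differs. The output field `count` is also an assumption.

```javascript
const pipeline = [
  // 1. Direct flights leaving PDX.
  { $match: { src_airport: "PDX", stops: 0 } },
  // 2. One group per airline, counting its routes (field path is an assumption).
  { $group: { _id: "$airline.name", count: { $sum: 1 } } },
  // 3. Most routes first.
  { $sort: { count: -1 } },
  // 4. Keep the top three.
  { $limit: 3 }
];

// In mongosh, getSiblingDB() reaches another database without switching to it.
if (typeof db !== "undefined") {
  db.getSiblingDB("sample_training").routes.aggregate(pipeline);
}
```
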
30
What is the aggregation pipeline in MongoDB?
A series of stages for data transformation and processing, including filtering, grouping, sorting, and projecting.
Footnote: The aggregation pipeline is a powerful tool for expressive data manipulation.
31
What is a B+ Tree?
A data structure commonly used in database indexing to efficiently store and retrieve data based on ordered keys.
Footnote: B+ Trees provide efficient searching, sequential access, insertions, and deletions.
32
What is an election in a MongoDB replica set?
The process of selecting a new primary node when the current primary becomes unavailable.
Footnote: Elections ensure that there is always one primary node available to handle write operations.
33
What is horizontal scaling?
Adding more machines or nodes to a NoSQL database to improve performance and capacity.
Footnote: This is typically achieved through techniques like sharding.
34
Define idempotent changes in MongoDB.
Operations that can be safely repeated multiple times without changing the result.
Footnote: MongoDB encourages idempotent operations to ensure data consistency.
35
What is indexing in databases?
The creation of data structures that improve query performance by allowing the database to quickly locate specific records.
Footnote: Indexes can be created on certain fields or columns to optimize search operations.
36
What is the Mongo shell?
An interactive command-line interface for interacting with a MongoDB server using JavaScript-like commands.
Footnote: The mongo shell is a versatile tool for administration and data manipulation.
37
What is MongoClient?
The client class provided by the official MongoDB drivers; it opens a connection to a MongoDB server and allows interaction with its databases.
Footnote: A MongoClient class is available in the drivers for various programming languages.
38
What is the Oplog?
A special capped collection that records all write operations applied on the primary node.
Footnote: It is used to replicate data to secondary nodes and recover from failures.
39
What is a primary node in a MongoDB replica set?
The active, writable node that processes all write operations.
Footnote: There can only be one primary node at a time in a replica set.
40
Define replication in the context of databases.
Creating and maintaining copies of data on multiple nodes to ensure availability and reduce data loss.
Footnote: Replication improves fault tolerance and provides read scalability.
41
What is replication lag?
The delay in data replication from a primary node to its secondary nodes.
Footnote: Replication lag can impact the consistency of secondary data.
42
What are secondary nodes in a MongoDB replica set?
Nodes that replicate data from the primary and can be used for read operations.
Footnote: Secondary nodes help distribute read load.
43
What is sharding?
The practice of partitioning a database into smaller pieces called shards to distribute data across multiple servers.
Footnote: Sharding helps with horizontal scaling.
44
Define vertical scaling.
Upgrading the resources of existing machines to improve performance.
Footnote: This includes increasing CPU and RAM.