MongoDB Basics Flashcards

Question

How to get rid of an index?

Answer 1

`db.bigdata.dropIndex({"account_no":1})`

Answer 2

Use `explain()`

Answer 3

Both queries produce the **same output**, returning only two documents from the collection. However, the first query is actually an **aggregation pipeline**, which allows data to be processed in more complex ways, while the latter is a simple **read operation** with a limit on the number of results to be returned.

Answer 4

A pipeline is an array of one or more operations, separated by commas. Each operation is called a stage. MongoDB executes the first stage in the pipeline and passes its output to the next stage, and so on until the last stage produces the final result.
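The stage-to-stage flow can be imitated in plain Python. This is only a sketch of the idea, not how MongoDB is implemented: `match_stage`, `limit_stage`, and `run_pipeline` are hypothetical helper names, and the sample documents are made up.

```python
from functools import reduce


def match_stage(predicate):
    """Return a stage that keeps only the documents satisfying predicate."""
    return lambda docs: [d for d in docs if predicate(d)]


def limit_stage(n):
    """Return a stage that keeps only the first n documents."""
    return lambda docs: docs[:n]


def run_pipeline(docs, stages):
    """Feed docs through each stage in order; each output is the next input."""
    return reduce(lambda current, stage: stage(current), stages, docs)


# Made-up documents, purely for illustration.
docs = [{"n": 1}, {"n": 2}, {"n": 3}, {"n": 4}]
stages = [match_stage(lambda d: d["n"] % 2 == 0), limit_stage(1)]

result = run_pipeline(docs, stages)
print(result)  # [{'n': 2}]
```

The order of the stages matters: limiting first and matching second would look only at the first document and return nothing here.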

Answer 5

It aggregates documents from the `marks` collection. First, it uses the `subject` field as the key for grouping the data, then it calculates the average value of the `marks` field for each key.
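The query itself is not shown on this card, so the pipeline below is a guess at its shape, written as Python data. The output field name `avg_marks` is an assumption, and `group_average` is a hypothetical helper that imitates what `$group` with `$avg` computes, using made-up sample documents.

```python
from collections import defaultdict

# Assumed shape of the pipeline described above; "avg_marks" is a hypothetical name.
pipeline = [
    {"$group": {"_id": "$subject", "avg_marks": {"$avg": "$marks"}}},
]


def group_average(docs, key_field, value_field):
    """Plain-Python imitation of $group + $avg, for illustration only."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[key_field]].append(doc[value_field])
    return {key: sum(values) / len(values) for key, values in buckets.items()}


# Made-up sample documents.
sample = [
    {"name": "Ana", "subject": "maths", "marks": 80},
    {"name": "Ben", "subject": "maths", "marks": 60},
    {"name": "Ana", "subject": "physics", "marks": 90},
]

averages = group_average(sample, "subject", "marks")
print(averages)  # {'maths': 70.0, 'physics': 90.0}
```

With `pymongo`, a pipeline like this would typically be passed to the collection's `aggregate` method, which returns a cursor over the grouped results.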

Answer 6

The pipeline stages are:

1. Find the average marks per `name`.
2. Sort the output by average marks in descending order.
3. Limit the output to two documents.

It therefore returns two result documents, one per `name`, for the two names with the highest average value of `marks`, in descending order.
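The three stages can be written as Python data and checked against a small in-memory imitation. The card does not show the original query, so the field name `avg_marks`, the helper `top_averages`, and the sample documents are all assumptions made for illustration.

```python
# Assumed shape of the three-stage pipeline; "avg_marks" is a hypothetical name.
pipeline = [
    {"$group": {"_id": "$name", "avg_marks": {"$avg": "$marks"}}},
    {"$sort": {"avg_marks": -1}},
    {"$limit": 2},
]


def top_averages(docs, limit):
    """Imitate group -> sort descending -> limit in plain Python."""
    totals = {}
    for doc in docs:
        total, count = totals.get(doc["name"], (0, 0))
        totals[doc["name"]] = (total + doc["marks"], count + 1)
    grouped = [
        {"_id": name, "avg_marks": total / count}
        for name, (total, count) in totals.items()
    ]
    grouped.sort(key=lambda d: d["avg_marks"], reverse=True)
    return grouped[:limit]


# Made-up sample documents.
sample = [
    {"name": "Ana", "marks": 80},
    {"name": "Ana", "marks": 90},
    {"name": "Ben", "marks": 60},
    {"name": "Cy", "marks": 95},
]

top_two = top_averages(sample, 2)
print(top_two)
# [{'_id': 'Cy', 'avg_marks': 95.0}, {'_id': 'Ana', 'avg_marks': 85.0}]
```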

Answer 7

`pymongo` is the official MongoDB driver for Python. The statement `from pymongo import MongoClient` imports the `MongoClient` class, which we use to connect to MongoDB. From a client we obtain database and collection objects, and the collection objects provide the methods for performing CRUD operations.

Answer 8

The aggregation framework is a set of tools that allow us to **perform complex analysis** on the data in MongoDB. It is based on a set of aggregation operations that process data records and return computed results. These operations are chained together to form a **pipeline**, meaning the output of one operation serves as the input to the next.

Answer 9

A cursor is an object that lets you iterate over the results of a query, one document at a time. Calling `.find()` on a collection does **not return the list of documents immediately**; it returns a cursor object that lazily fetches the documents **as needed**. The MongoDB shell iterates the cursor automatically and prints the first batch of results. With `pymongo` in Python, I have to consume the cursor myself, for example with a `for` loop or `list()`; `dumps` from the `bson.json_util` module can then serialize the documents to JSON for display. If a collection has millions of documents, you don’t want to load them all into memory at once. A cursor allows MongoDB to return results in batches, reducing memory usage.
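The lazy, batched behaviour can be sketched with a Python generator. This is an imitation of the idea rather than the driver's real implementation: `batched_cursor` and `fetch_batch` are hypothetical names, and the list `data` stands in for the server.

```python
def batched_cursor(fetch_batch, batch_size):
    """Yield documents one at a time, fetching a new batch only when needed."""
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            return
        yield from batch
        offset += len(batch)


# Hypothetical stand-in for the server: five made-up documents.
data = [{"_id": i} for i in range(5)]
calls = []  # records the offset of every "round trip"


def fetch_batch(offset, size):
    calls.append(offset)
    return data[offset:offset + size]


cursor = batched_cursor(fetch_batch, batch_size=2)  # nothing fetched yet
first = next(cursor)             # triggers only the first fetch
calls_after_first = list(calls)  # [0]
rest = list(cursor)              # drains the remaining batches
```

Only one batch is held in memory at a time, which is the point of the card. With `pymongo`, the real cursor returned by `find()` is consumed the same way, by iterating over it.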

Answer 10

For example, we can build an aggregation pipeline that matches a set of documents based on a set of criteria, groups those documents together, sorts them, then returns that result set to us.

Answer 11

This is an **aggregation pipeline** written in Python using the `MongoClient` class from the `pymongo` module. The client connects to the `routes` collection in the `sample_training` database. The pipeline does the following:

1. Filters documents where `src_airport` is PDX and `stops` is equal to zero.
2. Groups the results using `airline_name` as the key, and increments a counter by 1 for each document with the same key.
3. Sorts the results by that counter in descending order.
4. Returns only the top 3 results.

Translating this to business language, this query returns the *top three airlines that offer the most direct flights out of the airport in Portland, Oregon, USA (PDX)*.
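The card's original code is not reproduced here, so the following is a reconstruction from the description above, not the author's exact query. The function names, the `uri` parameter, and the output field name `count` are assumptions; the field, collection, and database names are taken from the card.

```python
def direct_flights_pipeline(airport, top_n):
    """Build the four stages described above as plain Python data."""
    return [
        # 1. Keep only direct flights leaving the given airport.
        {"$match": {"src_airport": airport, "stops": 0}},
        # 2. Count documents per airline ("count" is a hypothetical field name).
        {"$group": {"_id": "$airline_name", "count": {"$sum": 1}}},
        # 3. Largest counts first.
        {"$sort": {"count": -1}},
        # 4. Keep the top results only.
        {"$limit": top_n},
    ]


def top_direct_airlines(uri, airport="PDX", top_n=3):
    """Run the pipeline on sample_training.routes (needs pymongo and a server)."""
    # Imported here so the pipeline builder can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        routes = client["sample_training"]["routes"]
        return list(routes.aggregate(direct_flights_pipeline(airport, top_n)))
    finally:
        client.close()


pipeline = direct_flights_pipeline("PDX", 3)
```

Keeping the pipeline in its own function makes it easy to inspect or reuse for another airport before sending it to the server.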

Answer 12

A series of stages for data transformation and processing, including filtering, grouping, sorting, and projecting.

## Footnote

The aggregation pipeline is a powerful tool for expressive data manipulation.

Answer 13

A data structure commonly used in database indexing to efficiently store and retrieve data based on ordered keys.

## Footnote

B+ Trees provide efficient searching, sequential access, insertions, and deletions.

Answer 14

The process of selecting a new primary node when the current primary becomes unavailable.

## Footnote

Elections restore a primary node so that the replica set can resume handling write operations; writes cannot be processed until the election completes.

Answer 15

Adding more machines or nodes to a NoSQL database to improve performance and capacity.

## Footnote

This is typically achieved through techniques like sharding.

Answer 16

Operations that can be safely repeated multiple times without changing the result.

## Footnote

MongoDB encourages idempotent operations to ensure data consistency.
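A small sketch of the difference, using plain dictionaries rather than a real database: setting a field to a fixed value is idempotent, while incrementing it is not. The helpers `apply_set` and `apply_inc` are hypothetical and only imitate the spirit of `$set` and `$inc` style updates.

```python
def apply_set(doc, field, value):
    """Imitate a $set-style update: repeating it leaves the result unchanged."""
    return {**doc, field: value}


def apply_inc(doc, field, amount):
    """Imitate a $inc-style update: every repetition changes the result."""
    return {**doc, field: doc.get(field, 0) + amount}


# Made-up document.
doc = {"_id": 1, "stock": 10}

set_once = apply_set(doc, "stock", 9)
set_twice = apply_set(set_once, "stock", 9)   # same as set_once

inc_once = apply_inc(doc, "stock", -1)
inc_twice = apply_inc(inc_once, "stock", -1)  # differs from inc_once

print(set_once == set_twice)  # True
print(inc_once == inc_twice)  # False
```

If a client retries an update after a network error, the set-style update is safe to resend, whereas the increment could be applied twice.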

Answer 17

The creation of data structures that improve query performance by allowing the database to quickly locate specific records.

## Footnote

Indexes can be created on certain fields or columns to optimize search operations.

Answer 18

An interactive command-line interface for interacting with a MongoDB server using JavaScript-like commands.

## Footnote

The mongo shell is a versatile tool for administration and data manipulation.

Answer 19

The client class exposed by the official MongoDB drivers. It opens a connection to a MongoDB server and allows interaction with its databases.

## Footnote

Official drivers for many programming languages provide a `MongoClient` class.

Answer 20

A special collection that records all write operations in a primary node.

## Footnote

It is used to replicate data to secondary nodes and recover from failures.

Answer 21

The active, writable node that processes all write operations.

## Footnote

There can only be one primary node at a time in a replica set.

Answer 22

Creating and maintaining copies of data on multiple nodes to ensure availability and reduce data loss.

## Footnote

Replication improves fault tolerance and provides read scalability.

Answer 23

The delay in data replication from a primary node to its secondary nodes.

## Footnote

Because of replication lag, reads from a secondary node may return stale data.

Answer 24

Nodes that replicate data from the primary and can be used for read operations.

## Footnote

Secondary nodes help distribute read load.

Answer 25

The practice of partitioning a database into smaller pieces called shards to distribute data across multiple servers.

## Footnote

Sharding helps with horizontal scaling.
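The idea of routing each document to a shard by its key can be sketched as follows. This is a deliberately simplified illustration (hash of the key modulo the number of shards), not MongoDB's actual chunk-based distribution; `shard_for` is a hypothetical helper and the `account_no` values are made up.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


NUM_SHARDS = 3
shards = {i: [] for i in range(NUM_SHARDS)}

# Distribute 100 made-up account numbers across the shards.
for account_no in range(100):
    shards[shard_for(account_no, NUM_SHARDS)].append(account_no)

for index, keys in shards.items():
    print(f"shard {index}: {len(keys)} documents")
```

The same key always maps to the same shard, so a lookup by `account_no` only needs to contact one server in this simplified model.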

Answer 26

Upgrading the resources of existing machines to improve performance.

## Footnote

This includes increasing CPU and RAM.

(50 cards)