DBs and SQL Flashcards

1
Q

What is a SQL database?

A
  • Structured Data: SQL databases, also known as relational databases, store data in structured tables with rows and columns, where each row represents a record and each column represents a field.
  • Schema: SQL databases have a fixed schema, which defines the structure of the tables, including the data types and constraints for each column.
  • ACID Transactions: SQL databases support ACID (Atomicity, Consistency, Isolation, Durability) transactions, ensuring data integrity and reliability.
  • Query Language: SQL databases use the Structured Query Language (SQL) to interact with the data, allowing for complex queries, joins, and aggregations.
  • Examples: Popular SQL databases include MySQL, PostgreSQL, Oracle Database, and Microsoft SQL Server.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a NoSQL database?

A
  • Flexible Schema: NoSQL databases do not require a fixed schema, allowing for more flexibility in data storage and accommodating diverse data structures.
  • Variety of Data Models: NoSQL databases can use various data models, including document, key-value, column-family, and graph.
  • Scalability: NoSQL databases are designed for horizontal scalability, making them suitable for handling large volumes of data across multiple servers.
  • CAP Theorem: NoSQL databases often prioritize availability and partition tolerance over consistency, as described by the CAP theorem.
  • Examples: Popular NoSQL databases include MongoDB (document), Redis (key-value), Cassandra (column-family), and Neo4j (graph).
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

When should you use a SQL vs a NoSQL database?

A
  • Structured Data: Use SQL databases when dealing with structured data that fits well into tables with a fixed schema.
  • Complex Queries: Choose SQL databases when you need to perform complex queries, joins, and aggregations on your data.
  • Data Integrity: Opt for SQL databases when data integrity and ACID transactions are crucial for your application.
  • Flexible Schema: Use NoSQL databases when you have diverse or rapidly changing data structures that don’t fit well into a fixed schema.
  • Scalability: Choose NoSQL databases when you need to handle large volumes of data and require horizontal scalability.

Additional interview questions around these topics:

How do you handle data consistency in a NoSQL database?
What are the trade-offs between consistency, availability, and partition tolerance in the CAP theorem?
How do you design a schema for a relational database?
What are the advantages and disadvantages of using a document-based NoSQL database?
How do you handle data normalization and denormalization in a SQL database?
What are some strategies for optimizing query performance in a SQL database?
How do you handle data migration when transitioning from a SQL to a NoSQL database or vice versa?
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How do you design a schema for a relational database?

A
  • Identify Entities: Determine the main entities (objects or concepts) that need to be represented in the database, such as customers, orders, or products.
  • Define Relationships: Establish the relationships between entities, including one-to-one, one-to-many, and many-to-many relationships.
  • Normalize Data: Organizing data into separate tables to minimize redundancy and ensure data integrity based on logical relationships.
  • Choose Primary and Foreign Keys: Select primary keys to uniquely identify records in each table and use foreign keys to establish relationships between tables.
  • Determine Data Types: Define the appropriate data types for each column, such as integer, varchar, or date, considering the nature of the data and any constraints.
  • Consider Indexes: Create indexes on columns that are frequently used in queries to improve query performance.
  • Implement Constraints: Apply constraints like NOT NULL, UNIQUE, or CHECK to enforce data integrity and business rules.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the advantages and disadvantages of using a document-based NoSQL database?

A

Advantages:
- Flexible Schema: Accommodating diverse data structures and changes over time.
- Scalability: Designed for horizontal scalability. Suitable for handling large volumes of data.
- Ease of Use: Often use JSON-like formats
- Aggregation: Many offer powerful aggregation frameworks for complex data processing and transformation.

Disadvantages:
- Lack of Joins: Typically do not support joins. Harder to query data across multiple collections.
- Data Redundancy: Flexible schema can lead to data redundancy and inconsistency, with the same data may be stored in multiple documents.
- Complex Transactions: Handling ACID transactions can be more complex compared to relational databases.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How do you handle data normalization and denormalization in a SQL database (and what is it)?

A

Normalization:

  • Normalization = organizing data into separate tables to minimize data redundancy and ensure data integrity based on logical relationships.
  • Functional Dependencies: Identify functional dependencies between columns and ensure that each table represents a single concept.
  • Normal Forms: Apply normalization rules to achieve various normal forms (1NF, 2NF, 3NF, etc.), each with specific criteria for data organization.

Denormalization:

  • Denormalization = the process of introducing redundancy into a database by merging tables or adding redundant data to improve query performance
  • Performance Trade-offs: Use denormalization to optimize specific queries, but be aware of the trade-offs, e.g. increased storage and complexity in data maintenance.
  • Materialized Views: Consider using materialized views to store precomputed query results, providing a balance between normalization and denormalization.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are some strategies for optimizing query performance in a SQL database?

A
  • Use Indexes: on columns that are frequently used in queries, especially WHERE clauses and JOIN conditions.
  • Optimize Joins: Minimize the number of joins in queries and avoid using cross joins. Use indexes on join columns to improve performance.
  • Limit Result Sets: Use the LIMIT clause to restrict the number of rows returned, reducing the amount of data transferred and processed.
  • Analyze Query Plans: Use query execution plans to analyze query performance and identify bottlenecks, such as full table scans or inefficient joins.
  • Optimize Data Types: Choose appropriate data types for columns, considering the nature of the data and any constraints. Smaller data types can improve query performance.
  • Denormalize Data: Consider denormalizing data or using materialized views to improve query performance, especially for complex aggregations or calculations.
  • Cache Results: store frequently accessed query results, reducing the need for repeated database queries.
  • Test and Monitor: Test to ensure it meets performance requirements and continue monitoring e.g. with Datadog if query is used by an endpoint to identify any further performance issues.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is indexing, when should you use it and what are the pros and cons?

A

What is Indexing?

  • Creating a data structure (an index) to improve the speed of data retrieval operations
  • Used to quickly locate data without having to search every row in a database table
  • Can be created using one or more columns of a database table, providing a pointer to the data in those columns
  • Balanced-tree indexes (most common - efficient searching, insertion and deletion), or things like hash tables (good for searching for exact matches), bitmap indexes, etc
  • Beneficial for frequent reads, large data sets, WHERE clauses, JOIN operations, and ORDER BY/GROUP BY operations.

When Should You Use It?

  • Frequent Read Operations that require fast data retrieval.
  • Large Data Sets: particularly beneficial for large data sets where scanning the entire table would be time-consuming.
  • WHERE Clauses: columns that are often used in WHERE clauses to filter data
  • JOIN Operations: columns that are used in JOINs to improve join performance
  • Ordering and Grouping: columns that are frequently used for sorting (ORDER BY) or grouping (GROUP BY) data.

Pros of Indexing:

  • Faster Data Retrieval
  • Efficient Data Filtering i.e WHERE clauses
  • Optimized Joins
  • Improved Sorting and Grouping

Cons of Indexing:

  • Slower Write Operations: they can slow down write operations (INSERT, UPDATE, DELETE) because the index needs to be updated
  • Increased Storage Requirements
  • Maintenance Overhead: require maintenance, such as rebuilding or reorganizing, to maintain their performance and minimize space usage
  • Added complexity
  • Over-Indexing: too many indexes can lead to performance issues. Find a balance between the number of indexes and the performance benefits provided
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is database caching? What are the pros and cons?

A
  • Store frequently accessed data in a temporary storage area (cache) so future requests can be served more quickly.
  • For databases, caching can be applied at various levels, including query result caching, object caching, or full-page caching
  • Cache can be implemented in-memory or on disk, and it can be managed by the database system, the application, or a dedicated caching system.

Overview of Database Caching:

  • Query Result Caching: The results of frequently executed queries are stored in the cache
  • Object Caching: Database objects, such as rows or documents, are cached in the application layer
  • Full-Page Caching: Entire pages or views generated from database queries are cached, reducing the need for repeated database queries and page rendering
  • Dedicated Caching Systems: e.g. Redis and Memcached

Pros of Database Caching:

  • Improved Performance: reduces the need for repeated database queries
  • Reduced Database Load: the load on the database is reduced, freeing up resources for other operations
  • Enhanced User Experience: more responsive
  • Cost Savings: Reduced DB load, less DB scaling and additional hardware.

Cons of Database Caching:

  • Data Inconsistency: Cached data can become outdated or inconsistent
  • Cache Management Complexity
  • Memory Usage: In-memory caching consumes additional memory resources
  • Cache Invalidation: Deciding when and how to invalidate or update the cache can be challenging, especially in applications with frequent data updates.
  • Initial Cache Miss Penalty
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are Constraints in SQL?

A

Constraints are used to specify the rules concerning data in the table. It can be applied for single or multiple fields in an SQL table during the creation of the table or after creating using the ALTER TABLE command. The constraints are:

NOT NULL - Restricts NULL value from being inserted into a column.
CHECK - Verifies that all values in a field satisfy a condition.
DEFAULT - Automatically assigns a default value if no value has been specified for the field.
UNIQUE - Ensures unique values to be inserted into the field.
INDEX - Indexes a field providing faster retrieval of records.
PRIMARY KEY - Uniquely identifies each record in a table.
FOREIGN KEY - Ensures referential integrity for a record in another table.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is a Join? List its different types.

A

The SQL Join clause is used to combine records (rows) from two or more tables in a SQL database based on a related column between the two.

There are four different types of JOINs in SQL:

(INNER) JOIN: Retrieves records that have matching values in both tables involved in the join. This is the widely used join for queries.

LEFT (OUTER) JOIN: Retrieves all the records/rows from the left and the matched records/rows from the right table.

RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right and the matched records/rows from the left table.

FULL (OUTER) JOIN: Retrieves all the records where there is a match in either the left or right table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is an Index? Explain its different types.

A

A database index is a data structure that provides a quick lookup of data in a column or columns of a table. It enhances the speed of operations accessing data from a database table at the cost of additional writes and memory to maintain the index data structure.

CREATE INDEX index_name /* Create Index /
ON table_name (column_1, column_2);
DROP INDEX index_name; /
Drop Index */

There are different types of indexes that can be created for different purposes:

Unique and Non-Unique Index:

Unique indexes are indexes that help maintain data integrity by ensuring that no two rows of data in a table have identical key values. Once a unique index has been defined for a table, uniqueness is enforced whenever keys are added or changed within the index.

CREATE UNIQUE INDEX myIndex
ON students (enroll_no);

Non-unique indexes, on the other hand, are not used to enforce constraints on the tables with which they are associated. Instead, non-unique indexes are used solely to improve query performance by maintaining a sorted order of data values that are used frequently.

Clustered and Non-Clustered Index:

Clustered indexes are indexes whose order of the rows in the database corresponds to the order of the rows in the index. This is why only one clustered index can exist in a given table, whereas, multiple non-clustered indexes can exist in the table.

The only difference between clustered and non-clustered indexes is that the database manager attempts to keep the data in the database in the same order as the corresponding keys appear in the clustered index.

Clustering indexes can improve the performance of most query operations because they provide a linear-access path to data stored in the database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is Cursor? How to use a Cursor?

A

A database cursor is a control structure that allows for the traversal of records in a database. Cursors, in addition, facilitates processing after traversal, such as retrieval, addition, and deletion of database records. They can be viewed as a pointer to one row in a set of rows.

Working with SQL Cursor:

DECLARE a cursor after any variable declaration. The cursor declaration must always be associated with a SELECT Statement.
Open cursor to initialize the result set. The OPEN statement must be called before fetching rows from the result set.
FETCH statement to retrieve and move to the next row in the result set.
Call the CLOSE statement to deactivate the cursor.
Finally use the DEALLOCATE statement to delete the cursor definition and release the associated resources.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

List the different types of relationships in SQL.

A

One-to-One - This can be defined as the relationship between two tables where each record in one table is associated with the maximum of one record in the other table.
One-to-Many & Many-to-One - This is the most commonly used relationship where a record in a table is associated with multiple records in the other table.
Many-to-Many - This is used in cases when multiple instances on both sides are needed for defining a relationship.
Self-Referencing Relationships - This is used when a table needs to define a relationship with itself.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Can you explain the architecture of PostgreSQL?

A

The architecture of PostgreSQL follows the client-server model.
The server side comprises of background process manager, query processer, utilities and shared memory space which work together to build PostgreSQL’s instance that has access to the data. The client application does the task of connecting to this instance and requests data processing to the services. The client can either be GUI (Graphical User Interface) or a web application. The most commonly used client for PostgreSQL is pgAdmin.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are steps to take toward zero-downtime schema changes / migrations

A
  • Don’t rename columns/tables which are in use by the app - always copy the data and drop the old one once the app is no longer using it
  • Don’t rewrite a table while you have an exclusive lock on it (e.g. no ALTER TABLE foos ADD COLUMN bar varchar DEFAULT 'baz' NOT NULL)
  • Don’t perform expensive, synchronous actions while holding an exclusive lock (e.g. adding an index without the CONCURRENTLY flag)

Also:
- Set lock_timeout in your migration scripts to a pause your app can tolerate. It’s better to abort a deploy than take your application down



- Keep Postgres up to date. The locking code is improved with every release.

17
Q

What are Data Definition Language (DDL), Data Manipulation Language (DML), and Data Control Language (DCL) commands?

A

Data Definition Language (DDL), is a subset of SQL commands that define the structure or schema of the database. e.g CREATE, ALTER, TRUNCATE, RENAME and DROP

Data Manipulation Language (DML), is an element in SQL language that deals with managing and manipulating data in the database. e.g. SELECT, INSERT, UPDATE, DELETE, MERGE

Data Control Language (DCL), are those commands in SQL that deal with controls, rights, and permission in the database system. e.g. GRANT, REVOKE

18
Q

What are database transactions?

A

A database transaction is a sequence of one or more SQL operations executed as a single unit of work. These operations can include insertions, updates, deletions, or queries.

Characterized by the ACID properties:

  • Atomicity: Ensures all operations within a transaction are completed successfully; if not, the transaction is aborted and all changes are rolled back.
  • Consistency: Guarantees that a transaction brings the database from one valid state to another.
  • Isolation: Ensures that concurrent transactions are executed in a way that the results are the same as if they were executed sequentially.
  • Durability: Once a transaction is committed, its effects are permanent in the database, even in the event of system failures.

Specifics:

  • Begin, Commit, and Rollback: Transactions typically start with a “BEGIN” statement. If everything goes as planned, the transaction is “COMMITTED”, making all changes permanent. If there’s an issue, the transaction can be “ROLLED BACK”, undoing all changes made during that transaction.
  • Concurrent Transactions: Multiple transactions can be executed concurrently. The database management system (DBMS) ensures that concurrent transactions maintain the database’s consistency.
  • Deadlocks: When two or more transactions wait indefinitely for each other to release resources, it’s called a deadlock. Modern DBMSs have mechanisms to detect and resolve deadlocks, often by aborting one of the transactions.
  • Savepoints: Some DBMSs allow for setting savepoints within a transaction. If a part of a transaction fails, you can roll back to a savepoint without affecting the entire transaction.
  • Nested Transactions: Some systems support nested transactions, where a new transaction begins before the previous one ends. If the outer transaction fails but the inner one succeeds, the inner transaction’s changes can still be rolled back.
  • Read Phenomena: Without proper transaction control, various read phenomena can occur, such as dirty reads (reading uncommitted changes), non-repeatable reads (a value read twice within a transaction changes), and phantom reads (new records appear in a re-read).
  • Isolation Levels: To control the aforementioned read phenomena, DBMSs offer different isolation levels, like Read Uncommitted, Read Committed, Repeatable Read, and Serializable. Each level provides a different balance between performance and the strictness of data reading rules.