SQL Flashcards

1
Q

What is a Database?

A

A database is an organized collection of data, stored and retrieved digitally from a remote or local computer system. Databases can be vast and complex, and such databases are developed using formal design and modeling techniques.

2
Q

What is DBMS?

A

DBMS stands for Database Management System. A DBMS is system software responsible for the creation, retrieval, updating and management of the database. It keeps the data consistent, organized and easily accessible by serving as an interface between the database and its end users or applications.

3
Q

What is RDBMS? How is it different from DBMS?

A

RDBMS stands for Relational Database Management System. The key difference, compared to a general DBMS, is that an RDBMS stores data as a collection of tables, and relationships can be defined between the common fields of these tables. Many widely used database management systems, such as MySQL, Microsoft SQL Server, Oracle, IBM DB2 and Amazon Redshift, are relational.

4
Q

What is SQL?

A

SQL stands for Structured Query Language. It is the standard language for relational database management systems. It is especially useful for handling organized data composed of entities (variables) and the relations between those entities.

5
Q

What is the difference between SQL and MySQL?

A

SQL is a standard language for retrieving and manipulating data in structured databases. MySQL, in contrast, is a relational database management system, like SQL Server, Oracle or IBM DB2, that is used to manage SQL databases.

6
Q

What are Tables and Fields?

A

A table is an organized collection of data stored in the form of rows and columns. Columns can be categorized as vertical and rows as horizontal. The columns in a table are called fields while the rows can be referred to as records.

7
Q

What are Constraints in SQL?

A

Constraints specify rules for the data in a table. They can be applied to one or more fields, either when the table is created or afterwards using the ALTER TABLE command. The constraints are:

NOT NULL - Restricts NULL value from being inserted into a column.

CHECK - Verifies that all values in a field satisfy a condition.

DEFAULT - Automatically assigns a default value if no value has been specified for the field.

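UNIQUE - Ensures that all values in the field are unique.

INDEX - Indexes a field, providing faster retrieval of records. Strictly speaking, an index is a separate database object rather than a constraint, although it is often listed alongside them.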

PRIMARY KEY - Uniquely identifies each record in a table.

FOREIGN KEY - Ensures referential integrity for a record in another table.
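
The constraints above can be tried out with a minimal sketch. It assumes SQLite driven from Python's sqlite3 module, and the students table and its columns are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        id     INTEGER PRIMARY KEY,        -- unique row identifier
        email  TEXT NOT NULL UNIQUE,       -- required, no duplicates
        age    INTEGER CHECK (age >= 16),  -- rejects values below 16
        status TEXT DEFAULT 'active'       -- used when no value is given
    )
""")
conn.execute("INSERT INTO students (email, age) VALUES ('a@example.com', 20)")

# Each row below violates one constraint: NOT NULL, UNIQUE, CHECK.
errors = []
for bad_row in [(None, 20), ("a@example.com", 21), ("b@example.com", 12)]:
    try:
        conn.execute("INSERT INTO students (email, age) VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

status = conn.execute("SELECT status FROM students").fetchall()[0][0]
print(status, errors)
```

Only the first insert succeeds, and its status column is filled in by the DEFAULT constraint.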

8
Q

What is a Primary Key?

A

The PRIMARY KEY constraint uniquely identifies each row in a table. It must contain UNIQUE values and has an implicit NOT NULL constraint. A table in SQL can have only one primary key, which may consist of a single field or multiple fields (columns).

9
Q

What is a UNIQUE constraint?

A

A UNIQUE constraint ensures that all values in a column are different. This provides uniqueness for the column(s) and helps identify each row uniquely. Unlike the primary key, multiple UNIQUE constraints can be defined per table. The syntax for UNIQUE is similar to that of PRIMARY KEY, but the two are not interchangeable: a table can have only one primary key, and a UNIQUE column does not carry the primary key's implicit NOT NULL constraint.

10
Q

What is a Foreign Key?

A

A FOREIGN KEY consists of a single field or a collection of fields in a table that refer to the PRIMARY KEY of another table. The foreign key constraint ensures referential integrity in the relationship between the two tables.

The table with the foreign key constraint is called the child table, and the table containing the referenced key is called the referenced or parent table.
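
A minimal sketch of referential integrity, assuming SQLite via Python's sqlite3 module. The departments (parent) and employees (child) tables are hypothetical, and note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES departments(id)  -- the foreign key
    )
""")
conn.execute("INSERT INTO departments VALUES (1, 'Sales')")
conn.execute("INSERT INTO employees VALUES (1, 'Ana', 1)")      # parent row exists

try:
    conn.execute("INSERT INTO employees VALUES (2, 'Bo', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```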

11
Q

What is a Join? List its different types.

A

The SQL Join clause is used to combine records (rows) from two or more tables in a SQL database based on a related column between the two.

(INNER) JOIN: Retrieves records that have matching values in both tables involved in the join. This is the most widely used join.

LEFT (OUTER) JOIN: Retrieves all the records/rows from the left table and the matched records/rows from the right table. Where there is no match, the right-side columns are NULL.

RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right table and the matched records/rows from the left table. Where there is no match, the left-side columns are NULL.

FULL (OUTER) JOIN: Retrieves all the records from both tables, pairing rows where there is a match and filling the other side's columns with NULL where there is not.
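
A minimal sketch contrasting INNER and LEFT joins, assuming SQLite via Python's sqlite3 module and hypothetical customers and orders tables. RIGHT and FULL joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bo');
    INSERT INTO orders VALUES (10, 1, 'pen'), (11, 3, 'ink');
""")

# INNER JOIN keeps only customers that have a matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# LEFT JOIN keeps every customer; unmatched ones get NULL (None) for item.
left_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

print(inner_rows)
print(left_rows)
```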

12
Q

What is a Self-Join?

A

A self JOIN is a case of regular join where a table is joined to itself based on some relation between its own column(s). Self-join uses the INNER JOIN or LEFT JOIN clause and a table alias is used to assign different names to the table within the query.
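
As an illustration, a self-join can pair each employee with their manager. This is a sketch assuming SQLite via Python's sqlite3 module; the employees table and its manager_id column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Ana', NULL), (2, 'Bo', 1), (3, 'Cy', 1);
""")

# The aliases e (employee) and m (manager) let the same table appear twice.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employees AS e
    LEFT JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

print(pairs)
```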

13
Q

What is a Cross-Join?

A

A cross join is the Cartesian product of the two tables included in the join. The result contains every combination of a row from the first table with a row from the second, so two tables with m and n rows produce m × n rows. If a WHERE clause relating the two tables is added to a cross join, the query behaves like an INNER JOIN.
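
A small sketch of the row count, assuming SQLite via Python's sqlite3 module and hypothetical sizes and colors tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sizes (size TEXT);
    CREATE TABLE colors (color TEXT);
    INSERT INTO sizes VALUES ('S'), ('M'), ('L');
    INSERT INTO colors VALUES ('red'), ('blue');
""")

# 3 sizes x 2 colors: every size is paired with every color.
combos = conn.execute("SELECT size, color FROM sizes CROSS JOIN colors").fetchall()
print(len(combos))
```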

14
Q

What is an Index? Explain its different types.

A

A database index is a data structure that provides quick lookup of data in a column or columns of a table. It enhances the speed of operations accessing data from a database table at the cost of additional writes and memory to maintain the index data structure.

There are different types of indexes that can be created for different purposes:

Unique and Non-Unique Index:

Unique indexes are indexes that help maintain data integrity by ensuring that no two rows of data in a table have identical key values. Once a unique index has been defined for a table, uniqueness is enforced whenever keys are added or changed within the index.

Non-unique indexes, on the other hand, are not used to enforce constraints on the tables with which they are associated. Instead, non-unique indexes are used solely to improve query performance by maintaining a sorted order of data values that are used frequently.

Clustered and Non-Clustered Index:

In a clustered index, the order of the rows in the table corresponds to the order of the keys in the index. This is why only one clustered index can exist on a given table, whereas multiple non-clustered indexes can exist on the same table.

The key difference between clustered and non-clustered indexes is that the database manager attempts to keep the data in the database in the same order as the corresponding keys appear in the clustered index.

Clustered indexes can improve the performance of many query operations because they provide a more linear access path to data stored in the database.
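
A minimal sketch of unique and non-unique indexes, assuming SQLite via Python's sqlite3 module; the table and index names are hypothetical. SQLite has no explicit clustered-index syntax, so that distinction is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_students_email ON students (email)")
conn.execute("CREATE INDEX idx_students_city ON students (city)")  # non-unique
conn.execute("INSERT INTO students VALUES (1, 'a@example.com', 'Oslo')")
conn.execute("INSERT INTO students VALUES (2, 'b@example.com', 'Oslo')")  # same city is fine

try:
    conn.execute("INSERT INTO students VALUES (3, 'a@example.com', 'Rome')")
    duplicate_rejected = False
except sqlite3.IntegrityError:  # the unique index enforces uniqueness
    duplicate_rejected = True

# Ask SQLite how it would run a lookup by city; the last field is the plan text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM students WHERE city = 'Oslo'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(duplicate_rejected, plan_text)
```

The plan text is expected to mention idx_students_city, which suggests the lookup goes through the index instead of scanning the whole table.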

15
Q

What is the difference between Clustered and Non-clustered index?

A

The differences can be broken down into three factors:

A clustered index changes the way records are physically stored, ordering them by the indexed column. A non-clustered index is a separate structure that holds references back to the rows of the original table.

Retrieving data through a clustered index is generally faster, because the rows are found in the index itself, whereas a non-clustered index usually needs an extra lookup to fetch the row and is therefore relatively slower.

In SQL, a table can have a single clustered index whereas it can have multiple non-clustered indexes.

16
Q

What is Data Integrity?

A

Data Integrity is the assurance of accuracy and consistency of data over its entire life-cycle, and is a critical aspect to the design, implementation and usage of any system which stores, processes, or retrieves data. It also defines integrity constraints to enforce business rules on the data when it is entered into an application or a database.

17
Q

What is a Query?

A

A query is a request for data or information from a database table or combination of tables. A database query can be either a select query or an action query.

18
Q

What is a Subquery? What are its types?

A

A subquery is a query within another query, also known as a nested query or inner query. It is used to restrict or enhance the data to be queried by the main query, thus restricting or enhancing the output of the main query respectively. For example, a subquery can select the IDs of students enrolled in the maths subject, and the main query can then fetch the contact information for those students.

There are two types of subquery - Correlated and Non-Correlated.

A correlated subquery cannot be run as an independent query, because it refers to a column of a table listed in the FROM clause of the main query.

A non-correlated subquery can be considered as an independent query and the output of subquery is substituted in the main query.
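
Both kinds of subquery can be sketched as follows, assuming SQLite via Python's sqlite3 module. The students and enrollments tables and their data are hypothetical stand-ins for the maths example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE enrollments (student_id INTEGER, subject TEXT);
    INSERT INTO students VALUES (1, 'Ana', '555-0101'), (2, 'Bo', '555-0102');
    INSERT INTO enrollments VALUES (1, 'maths'), (2, 'art'), (2, 'music');
""")

# Non-correlated: the inner query could run on its own.
maths_contacts = conn.execute("""
    SELECT name, phone FROM students
    WHERE id IN (SELECT student_id FROM enrollments WHERE subject = 'maths')
""").fetchall()

# Correlated: the inner query refers to s.id from the outer query.
counts = conn.execute("""
    SELECT s.name,
           (SELECT COUNT(*) FROM enrollments AS e WHERE e.student_id = s.id)
    FROM students AS s
    ORDER BY s.id
""").fetchall()

print(maths_contacts)
print(counts)
```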

19
Q

What is the SELECT statement?

A

The SELECT statement in SQL is used to select data from a database. The data returned is stored in a result table, called the result-set.

20
Q

What are some common clauses used with SELECT query in SQL?

A

Some common SQL clauses used in conjunction with a SELECT query are as follows:

WHERE clause in SQL is used to filter records based on specific conditions.

ORDER BY clause in SQL is used to sort the records based on some field(s) in ascending (ASC) or descending (DESC) order.

GROUP BY clause in SQL is used to group records with identical data and can be used in conjunction with aggregate functions to produce summarized results from the database.

HAVING clause in SQL is used to filter records in combination with the GROUP BY clause. It is different from WHERE, since the WHERE clause cannot filter aggregated records.
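
The four clauses can be combined in one query. The following is a sketch assuming SQLite via Python's sqlite3 module and a hypothetical sales table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 100), ('north', 50), ('south', 30),
        ('south', 20), ('east', 500), ('east', 5);
""")

rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 10          -- filters individual rows before grouping
    GROUP BY region             -- one output row per region
    HAVING SUM(amount) > 60     -- filters the aggregated groups
    ORDER BY total DESC         -- sorts the final result
""").fetchall()

print(rows)
```

Here WHERE drops the single 5-unit sale before grouping, and HAVING then drops the south group, whose total of 50 does not exceed 60.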

21
Q

What are UNION, MINUS and INTERSECT commands?

A

The UNION operator combines the result-sets of two or more SELECT statements and removes duplicate rows; UNION ALL keeps the duplicates.

The MINUS operator returns the rows of the first SELECT query's result-set that do not appear in the result-set of the second SELECT query. MINUS is the Oracle spelling; standard SQL calls this operator EXCEPT.

The INTERSECT operator returns only the rows that appear in the result-sets of both SELECT statements.

Certain conditions need to be met before executing any of the above statements in SQL -

Each SELECT statement within the clause must have the same number of columns

The columns must also have similar data types

The columns in each SELECT statement must be in the same order
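
A minimal sketch of the set operators, assuming SQLite via Python's sqlite3 module. SQLite spells MINUS as EXCEPT; the tables a and b and the helper run are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (3), (4);
""")

def run(op):
    """Combine the two SELECTs with the given set operator."""
    sql = f"SELECT n FROM a {op} SELECT n FROM b ORDER BY n"
    return [row[0] for row in conn.execute(sql)]

union_rows = run("UNION")          # duplicates removed
union_all_rows = run("UNION ALL")  # duplicates kept
except_rows = run("EXCEPT")        # in a but not in b
intersect_rows = run("INTERSECT")  # in both a and b

print(union_rows, union_all_rows, except_rows, intersect_rows)
```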

22
Q

What is a Cursor? How do you use a Cursor?

A

A database cursor is a control structure that allows for traversal of records in a database. Cursors also facilitate processing during traversal, such as retrieval, addition and deletion of database records. They can be viewed as a pointer to one row in a set of rows.

Working with SQL Cursor

  1. DECLARE a cursor after any variable declarations. The cursor declaration must always be associated with a SELECT statement.
  2. OPEN the cursor to initialize the result set. The OPEN statement must be called before fetching rows from the result set.
  3. Use the FETCH statement to retrieve the current row and move to the next row in the result set.
  4. Call the CLOSE statement to deactivate the cursor.
  5. Finally, use the DEALLOCATE statement to delete the cursor definition and release the associated resources.
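
The steps above describe server-side SQL cursors. As a loose analogy only, the sketch below walks a result set row by row with a client-side cursor from Python's sqlite3 module; the mapping to DECLARE, OPEN, FETCH and CLOSE in the comments is an assumption made for illustration, and the students table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
""")

cur = conn.cursor()                                       # roughly DECLARE
cur.execute("SELECT id, name FROM students ORDER BY id")  # roughly OPEN

names = []
row = cur.fetchone()                                      # FETCH the first row
while row is not None:
    names.append(row[1])
    row = cur.fetchone()                                  # FETCH the next row

cur.close()                                               # roughly CLOSE / DEALLOCATE
print(names)
```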
23
Q

What are Entities and Relationships?

A

Entity: An entity is a real-world object, either tangible or intangible, that is easily identifiable. For example, in a college database, students, professors, workers, departments, and projects can be referred to as entities. Each entity has some associated properties that give it an identity.

Relationships: Relations or links between entities that have something to do with each other. For example - The employees table in a company’s database can be associated with the salary table in the same database.

24
Q

List the different types of relationships in SQL.

A

One-to-One - This can be defined as the relationship between two tables where each record in one table is associated with at most one record in the other table.

One-to-Many & Many-to-One - This is the most commonly used relationship where a record in a table is associated with multiple records in the other table.

Many-to-Many - This is used in cases when multiple instances on both sides are needed for defining a relationship.

Self Referencing Relationships - This is used when a table needs to define a relationship with itself.

25
Q

What is an Alias in SQL?

A

An alias is a feature of SQL that is supported by most, if not all, RDBMSs. It is a temporary name assigned to a table or table column for the purpose of a particular SQL query. In addition, aliasing can be used to hide the real names of database fields in query output. A table alias is also called a correlation name.

An alias is represented explicitly by the AS keyword but in some cases the same can be performed without it as well. Nevertheless, using the AS keyword is always a good practice.

26
Q

What is a View?

A

A view in SQL is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
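
A short sketch of a view, assuming SQLite via Python's sqlite3 module; the employees table and the sales_staff view are hypothetical names. Because the view is virtual, it reflects rows added to the underlying table after the view was created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES (1, 'Ana', 'Sales', 50), (2, 'Bo', 'IT', 70);
    CREATE VIEW sales_staff AS
        SELECT id, name FROM employees WHERE dept = 'Sales';
""")

before = conn.execute("SELECT name FROM sales_staff ORDER BY id").fetchall()
conn.execute("INSERT INTO employees VALUES (3, 'Cy', 'Sales', 60)")
after = conn.execute("SELECT name FROM sales_staff ORDER BY id").fetchall()

print(before, after)
```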

27
Q

What is Normalization?

A

Normalization represents the way of organizing structured data in the database efficiently. It includes creation of tables, establishing relationships between them, and defining rules for those relationships. Inconsistency and redundancy can be kept in check based on these rules, hence, adding flexibility to the database.

28
Q

What is Denormalization?

A

Denormalization is the inverse process of normalization, where the normalized schema is converted into a schema that contains redundant information. Read performance is improved by using redundancy, at the cost of keeping the redundant data consistent. Denormalization is performed to reduce the overhead, such as the many joins, that an over-normalized structure imposes on the query processor.

29
Q

What are the various forms of Normalization?

A

Normal Forms are used to eliminate or reduce redundancy in database tables. The different forms are as follows:

First Normal Form

A relation is in first normal form if every attribute in that relation is a single-valued attribute. If a relation contains composite or multi-valued attribute, it violates the first normal form.

Second Normal Form

A relation is in second normal form if it satisfies the conditions for first normal form and does not contain any partial dependency, i.e., it has no non-prime attribute that depends on a proper subset of any candidate key of the table. Often, using a single-column primary key avoids the problem, since partial dependencies arise only with composite keys.

Third Normal Form

A relation is said to be in third normal form if it satisfies the conditions for second normal form and there is no transitive dependency between the non-prime attributes, i.e., all non-prime attributes are determined only by the candidate keys of the relation and not by any other non-prime attribute.

Boyce-Codd Normal Form

A relation is in Boyce-Codd Normal Form if it satisfies the conditions for third normal form and, for every non-trivial functional dependency, the left-hand side is a super key. In other words, for every non-trivial functional dependency X -> Y in a BCNF relation, X is a super key.

30
Q

What are the TRUNCATE, DELETE and DROP statements?

A

DELETE statement is used to delete rows from a table.

TRUNCATE command is used to delete all the rows from the table and free the space containing the table.

DROP command is used to remove an object from the database. If you drop a table, all the rows in the table are deleted and the table structure is removed from the database.
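
A minimal sketch of DELETE and DROP, assuming SQLite via Python's sqlite3 module and a hypothetical logs table. SQLite has no TRUNCATE statement, so a DELETE without a WHERE clause stands in for it here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT);
    INSERT INTO logs VALUES (1, 'info'), (2, 'error'), (3, 'info');
""")

def count_rows():
    return conn.execute("SELECT COUNT(*) FROM logs").fetchall()[0][0]

conn.execute("DELETE FROM logs WHERE level = 'info'")  # removes only matching rows
after_filtered = count_rows()

conn.execute("DELETE FROM logs")                       # removes all rows, table remains
after_all = count_rows()

conn.execute("DROP TABLE logs")                        # removes the table itself
try:
    count_rows()
    table_exists = True
except sqlite3.OperationalError:                       # no such table
    table_exists = False

print(after_filtered, after_all, table_exists)
```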

31
Q

What is the difference between DROP and TRUNCATE statements?

A

If a table is dropped, everything associated with the table is dropped as well. This includes the relationships defined on the table with other tables, the integrity checks and constraints, and the access privileges and other grants that the table has. To create and use the table again in its original form, all these relationships, checks, constraints and privileges need to be redefined. If a table is truncated, however, none of these problems arise and the table retains its original structure.

32
Q

What is the difference between DELETE and TRUNCATE statements?

A

The TRUNCATE command is used to delete all the rows from the table and free the space containing the table. The DELETE command deletes only the rows that match the condition given in the WHERE clause, or all the rows from the table if no condition is specified, but it does not free the space containing the table.

33
Q

What are Aggregate and Scalar functions?

A

An aggregate function performs operations on a collection of values to return a single scalar value. Aggregate functions are often used with the GROUP BY and HAVING clauses of the SELECT statement. Following are the widely used SQL aggregate functions:

AVG() - Calculates the mean of a collection of values.

COUNT() - Counts the number of records in a table or view: COUNT(*) counts every row, while COUNT(column) counts only the rows where that column is not NULL.

MIN() - Calculates the minimum of a collection of values.

MAX() - Calculates the maximum of a collection of values.

SUM() - Calculates the sum of a collection of values.

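FIRST() - Fetches the first element in a collection of values.

LAST() - Fetches the last element in a collection of values.

Note: FIRST() and LAST() are not part of standard SQL and are supported only by some systems, such as MS Access. All aggregate functions described above ignore NULL values, except COUNT(*), which counts every row.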
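
The NULL handling can be checked with a small sketch, assuming SQLite via Python's sqlite3 module and a hypothetical scores table in which one score is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (student TEXT, score INTEGER);
    INSERT INTO scores VALUES ('Ana', 80), ('Bo', NULL), ('Cy', 100);
""")

total_rows, scored_rows, avg_score, min_score, max_score, sum_score = conn.execute("""
    SELECT COUNT(*), COUNT(score), AVG(score), MIN(score), MAX(score), SUM(score)
    FROM scores
""").fetchall()[0]

# COUNT(*) sees all three rows; the other aggregates skip Bo's NULL score.
print(total_rows, scored_rows, avg_score, min_score, max_score, sum_score)
```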

A scalar function returns a single value based on the input value. Following are the widely used SQL scalar functions:

LEN() - Calculates the total length of the given field (column).

UCASE() - Converts a collection of string values to uppercase characters.

LCASE() - Converts a collection of string values to lowercase characters.

MID() - Extracts substrings from a collection of string values in a table.

CONCAT() - Concatenates two or more strings.

RAND() - Generates a random number.

ROUND() - Rounds a numeric value to the specified number of decimal places.

NOW() - Returns the current date & time.

FORMAT() - Sets the format to display a collection of values.

34
Q

What is a User-defined function? What are its various types?

A

The user-defined functions in SQL are like functions in any other programming language that accept parameters, perform complex calculations, and return a value. They are written to use the logic repetitively whenever required. There are two types of SQL user-defined functions:

Scalar Function: As explained earlier, user-defined scalar functions return a single scalar value.

Table-Valued Functions: User-defined table-valued functions return a table as output. They come in two forms:

  • Inline: returns a table data type based on a single SELECT statement.
  • Multi-statement: returns a tabular result-set but, unlike inline, multiple SELECT statements can be used inside the function body.

35
Q

What is OLTP?

A

OLTP stands for Online Transaction Processing. It is a class of software applications capable of supporting transaction-oriented programs. An essential attribute of an OLTP system is its ability to maintain concurrency. To avoid single points of failure, OLTP systems are often decentralized. These systems are usually designed for a large number of users who conduct short transactions. Database queries are usually simple, require sub-second response times and return relatively few records.

36
Q

What are the differences between OLTP and OLAP?

A

OLTP stands for Online Transaction Processing. It is a class of software applications capable of supporting transaction-oriented programs. An important attribute of an OLTP system is its ability to maintain concurrency. OLTP systems often follow a decentralized architecture to avoid single points of failure. These systems are generally designed for a large audience of end users who conduct short transactions. Queries involved in such databases are generally simple, need fast response times and return relatively few records. The number of transactions per second is an effective measure for such systems.

OLAP stands for Online Analytical Processing. It is a class of software programs characterized by a relatively low frequency of online transactions. Queries are often complex and involve many aggregations. For OLAP systems, effectiveness is measured largely by response time. Such systems are widely used for data mining or for maintaining aggregated, historical data, usually in multi-dimensional schemas.

37
Q

What is Collation? What are the different types of Collation Sensitivity?

A

Collation refers to a set of rules that determine how data is sorted and compared. Rules defining the correct character sequence are used to sort the character data. It incorporates options for specifying case-sensitivity, accent marks, kana character types and character width. Below are the different types of collation sensitivity:

Case sensitivity: A and a are treated differently.

Accent sensitivity: a and á are treated differently.

Kana sensitivity: Japanese kana characters Hiragana and Katakana are treated differently.

Width sensitivity: The same character represented in single-byte (half-width) and double-byte (full-width) form is treated differently.

38
Q

What is a Stored Procedure?

A

A stored procedure is a subroutine available to applications that access a relational database management system (RDBMS). Such procedures are stored in the database data dictionary. A disadvantage of stored procedures is that they can be executed only inside the database and occupy additional memory on the database server. On the other hand, they add a layer of security and functionality, since users who cannot access the data directly can be granted access via stored procedures.

39
Q

What is a Recursive Stored Procedure?

A

A stored procedure which calls itself until a boundary condition is reached, is called a recursive stored procedure. This recursive function helps the programmers to deploy the same set of code several times as and when required. Some SQL programming languages limit the recursion depth to prevent an infinite loop of procedure calls from causing a stack overflow, which slows down the system and may lead to system crashes.

40
Q

How to create empty tables with the same structure as another table?

A

An empty table with the same structure can be created by selecting the records of one table into a new table using the INTO clause, with a WHERE clause that is false for all records. SQL prepares the new table with a duplicate structure to accept the fetched records, but since the WHERE clause lets no records through, nothing is inserted into the new table.
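
A sketch of the same trick, assuming SQLite via Python's sqlite3 module. SQLite does not support the INTO form, so CREATE TABLE ... AS SELECT is swapped in here; the always-false WHERE clause is the same idea. The table names are hypothetical, and this approach copies column names but not constraints such as the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Bo');

    -- WHERE 1 = 0 is false for every row, so only the structure is copied.
    CREATE TABLE students_copy AS SELECT * FROM students WHERE 1 = 0;
""")

columns = [row[1] for row in conn.execute("PRAGMA table_info(students_copy)")]
row_count = conn.execute("SELECT COUNT(*) FROM students_copy").fetchall()[0][0]

print(columns, row_count)
```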

41
Q

What is Pattern Matching in SQL?

A

SQL pattern matching lets you search for a pattern in the data when you do not know the exact value. This kind of SQL query uses wildcards to match a string pattern, rather than the exact word. The LIKE operator is used in conjunction with SQL wildcards to fetch the required information.

Using the % wildcard to perform a simple search

The % wildcard matches zero or more characters of any type and can be used to define wildcards both before and after the pattern.

Omitting the patterns using the NOT keyword

Use the NOT keyword to select records that don’t match the pattern.

Matching a pattern anywhere using the % wildcard twice

Search for students whose name contains a 'K' anywhere, using the pattern '%K%'.

Using the _ wildcard to match pattern at a specific position

The _ wildcard matches exactly one character of any type. It can be used in conjunction with % wildcard.

Matching patterns for specific length

Because the _ wildcard matches exactly one character, it can be used to fix both the length and the position of the matched results. For example, a pattern made of three _ wildcards matches only three-character values.
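
The patterns described above can be sketched as follows, assuming SQLite via Python's sqlite3 module, a hypothetical students table and a hypothetical helper called names. Note that LIKE in SQLite is case-insensitive for ASCII letters by default, which differs from some other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (name TEXT);
    INSERT INTO students VALUES ('Kate'), ('Mark'), ('Kim'), ('Anna');
""")

def names(condition):
    """Return the sorted names that satisfy the given WHERE condition."""
    sql = f"SELECT name FROM students WHERE {condition} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

starts_with_k = names("name LIKE 'K%'")       # simple search with %
not_starting_k = names("name NOT LIKE 'K%'")  # omitting matches with NOT
contains_k = names("name LIKE '%K%'")         # pattern anywhere
second_is_a = names("name LIKE '_a%'")        # specific position with _
three_letters = names("name LIKE '___'")      # specific length

print(starts_with_k, not_starting_k, contains_k, second_is_a, three_letters)
```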

42
Q

What are Window Functions?

A

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function.

43
Q

What are the advantages of using Window Functions?

A

The main advantage of using Window functions over regular aggregate functions is: Window functions do not cause rows to become grouped into a single output row, the rows retain their separate identities and an aggregated value will be added to each row.
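
The difference can be sketched by running the same SUM both ways. This assumes SQLite 3.25 or later (the first version with window functions) via Python's sqlite3 module, and a hypothetical sales table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 'Ana', 100), ('north', 'Bo', 50), ('south', 'Cy', 30);
""")

# Regular aggregate: rows collapse into one row per region.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window function: every row survives and carries its region's total.
windowed = conn.execute("""
    SELECT rep, amount, SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY rep
""").fetchall()

print(grouped)
print(windowed)
```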

44
Q

What are the four Fetching functions?

A

Relative

  • LAG(column, n) returns column's value at the row n rows before the current row
  • LEAD(column, n) returns column's value at the row n rows after the current row

Absolute

  • FIRST_VALUE(column) returns the first value in the table or partition
  • LAST_VALUE(column) returns the last value in the table or partition
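
A sketch of the four fetching functions, assuming SQLite 3.25 or later via Python's sqlite3 module and a hypothetical medals table. LAST_VALUE is given an explicit frame here because the default frame ends at the current row, which would make it return the current row's own value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medals (year INTEGER, total INTEGER);
    INSERT INTO medals VALUES (2004, 10), (2008, 14), (2012, 9);
""")

rows = conn.execute("""
    SELECT year,
           LAG(total, 1)      OVER w AS previous_total,
           LEAD(total, 1)     OVER w AS next_total,
           FIRST_VALUE(total) OVER w AS first_total,
           LAST_VALUE(total)  OVER (
               ORDER BY year
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_total
    FROM medals
    WINDOW w AS (ORDER BY year)
    ORDER BY year
""").fetchall()

for row in rows:
    print(row)
```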
45
Q

What does PARTITION BY do?

A
  • PARTITION BY splits the table into partitions based on a column’s unique values
  • Unlike GROUP BY, the rows of a partition aren’t rolled up into a single row
  • Each partition is operated on separately by the window function
  • ROW_NUMBER will reset for each partition
  • LAG will only fetch a row’s previous value if its previous row is in the same partition
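
The reset behaviour can be sketched as follows, assuming SQLite 3.25 or later via Python's sqlite3 module and a hypothetical medals table with two countries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medals (country TEXT, year INTEGER, total INTEGER);
    INSERT INTO medals VALUES
        ('NOR', 2004, 6), ('NOR', 2008, 9), ('USA', 2004, 101), ('USA', 2008, 112);
""")

rows = conn.execute("""
    SELECT country, year,
           ROW_NUMBER() OVER w AS row_n,           -- restarts at 1 per country
           LAG(total)   OVER w AS previous_total   -- NULL on each country's first row
    FROM medals
    WINDOW w AS (PARTITION BY country ORDER BY year)
    ORDER BY country, year
""").fetchall()

for row in rows:
    print(row)
```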
46
Q

What are the three Ranking functions?

A
  1. ROW_NUMBER() always assigns unique numbers, even if two rows’ values are the same
  2. RANK() assigns the same number to rows with identical values, skipping over the next numbers in such cases
  3. DENSE_RANK() also assigns the same number to rows with identical values, but doesn’t skip over the next numbers
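
A side-by-side sketch of the three functions on tied values, assuming SQLite 3.25 or later via Python's sqlite3 module and a hypothetical results table. The athlete column is added to ROW_NUMBER's ordering only to make the tie-break deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (athlete TEXT, points INTEGER);
    INSERT INTO results VALUES ('Ana', 90), ('Bo', 90), ('Cy', 80);
""")

rows = conn.execute("""
    SELECT athlete,
           ROW_NUMBER() OVER (ORDER BY points DESC, athlete) AS row_n,
           RANK()       OVER (ORDER BY points DESC)          AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC)          AS dense_rnk
    FROM results
    ORDER BY row_n
""").fetchall()

for row in rows:
    print(row)
```

Ana and Bo tie on 90 points, so RANK gives Cy a 3 (skipping 2) while DENSE_RANK gives Cy a 2.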
47
Q

What is paging?

A

Paging: Splitting data into (approximately) equal chunks

Uses

  • Many APIs return data in "pages" to reduce the amount of data being sent
  • Separating data into quartiles or thirds (top, middle and bottom thirds) to judge performance

Enter NTILE

  • NTILE(n) splits the data into n approximately equal pages
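
A sketch of NTILE on seven rows, assuming SQLite 3.25 or later via Python's sqlite3 module and a hypothetical scores table. Seven rows do not divide evenly by three, so the pages are only approximately equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (n INTEGER);
    INSERT INTO scores VALUES (1), (2), (3), (4), (5), (6), (7);
""")

rows = conn.execute(
    "SELECT n, NTILE(3) OVER (ORDER BY n) AS page FROM scores ORDER BY n"
).fetchall()

print(rows)
```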

48
Q

What is a Frame?

A

The definition of a window used with a window function can include a frame clause. A frame is a subset of the current partition and the frame clause specifies how to define the subset.

ROWS BETWEEN

  • ROWS BETWEEN [START] AND [FINISH]
  • n PRECEDING: n rows before the current row
  • CURRENT ROW: the current row
  • n FOLLOWING: n rows after the current row

Examples

  • ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
  • ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  • ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING

RANGE BETWEEN

  • RANGE BETWEEN [START] AND [FINISH]
  • Functions much the same as ROWS BETWEEN
  • RANGE treats rows with duplicate values in OVER's ORDER BY subclause as a single entity

ROWS BETWEEN is almost always used over RANGE BETWEEN
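
The ROWS versus RANGE distinction shows up when the ordering column has duplicates. This sketch assumes SQLite 3.25 or later via Python's sqlite3 module and a hypothetical table t in which day 2 appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (day INTEGER, units INTEGER);
    INSERT INTO t VALUES (1, 10), (2, 20), (2, 30), (3, 40);
""")

rows = conn.execute("""
    SELECT day, units,
           SUM(units) OVER (ORDER BY day, units
                            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS rows_sum,
           SUM(units) OVER (ORDER BY day
                            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM t
    ORDER BY day, units
""").fetchall()

for row in rows:
    print(row)
```

Both day-2 rows get the same range_sum, because RANGE treats rows with equal ORDER BY values as one unit, while rows_sum advances one physical row at a time.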

49
Q

What is a moving average (MA)?

A

Moving average (MA): Average of last n periods

Example: 10-day MA of units sold in sales is the average of the last 10 days’ sold units

  • Used to indicate momentum/trends
  • Also useful in eliminating seasonality
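
A moving average can be sketched with a window frame. The example assumes SQLite 3.25 or later via Python's sqlite3 module, a hypothetical sales table, and a 3-day window instead of 10 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (day INTEGER, units INTEGER);
    INSERT INTO sales VALUES (1, 10), (2, 20), (3, 30), (4, 40);
""")

# The frame covers the current day and the two days before it.
rows = conn.execute("""
    SELECT day,
           AVG(units) OVER (ORDER BY day
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ma_3
    FROM sales
    ORDER BY day
""").fetchall()

print(rows)
```

The first two rows average over fewer than three days, because no earlier rows exist to fill the frame.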
50
Q

What is a moving total?

A

Moving total: Sum of last n periods

Example: Sum of the last 3 Olympic games’ medals

  • Used to indicate performance; if the sum is going down, overall performance is going down
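
A moving total over the last three games can be sketched as follows, assuming SQLite 3.25 or later via Python's sqlite3 module; the medals table and its figures are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medals (year INTEGER, total INTEGER);
    INSERT INTO medals VALUES (2000, 5), (2004, 7), (2008, 4), (2012, 6);
""")

# Sum of the current games and the two games before them.
rows = conn.execute("""
    SELECT year,
           SUM(total) OVER (ORDER BY year
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS last_3_total
    FROM medals
    ORDER BY year
""").fetchall()

print(rows)
```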

51
Q

What is pivoting?

A

Transforms a table by making columns out of the unique values of one of its columns.

Easier to scan, especially if pivoted by a chronologically ordered column
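
One portable way to pivot is conditional aggregation with CASE, sketched below. This assumes SQLite via Python's sqlite3 module and a hypothetical medals table; some systems offer dedicated pivot features instead, which are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medals (country TEXT, year INTEGER, total INTEGER);
    INSERT INTO medals VALUES
        ('NOR', 2008, 9), ('NOR', 2012, 4), ('USA', 2008, 112), ('USA', 2012, 104);
""")

# Each distinct year becomes its own column.
rows = conn.execute("""
    SELECT country,
           SUM(CASE WHEN year = 2008 THEN total END) AS "2008",
           SUM(CASE WHEN year = 2012 THEN total END) AS "2012"
    FROM medals
    GROUP BY country
    ORDER BY country
""").fetchall()

print(rows)
```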

52
Q

What is ROLLUP?

A

  • ROLLUP is a GROUP BY subclause that includes extra rows for group-level aggregations
  • GROUP BY Country, ROLLUP(Medal) will count all Country- and Medal-level totals, then count only Country-level totals and fill in Medal with nulls for these rows

  • ROLLUP is hierarchical, de-aggregating from the leftmost provided column to the rightmost
  • ROLLUP(Country, Medal) includes Country-level totals
  • ROLLUP(Medal, Country) includes Medal-level totals
  • Both include grand totals

Use ROLLUP when you have hierarchical data (e.g., date parts) and don’t want all possible group-level aggregations
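
SQLite has no ROLLUP, so the sketch below emulates the rows that GROUP BY ROLLUP(Country, Medal) would produce by stacking three GROUP BY queries with UNION ALL. It assumes Python's sqlite3 module and a hypothetical medals table, and it illustrates the output shape only, not how a database implements ROLLUP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medals (country TEXT, medal TEXT);
    INSERT INTO medals VALUES
        ('NOR', 'Gold'), ('NOR', 'Silver'), ('USA', 'Gold'), ('USA', 'Gold');
""")

rows = conn.execute("""
    SELECT country, medal, COUNT(*) FROM medals GROUP BY country, medal  -- detail rows
    UNION ALL
    SELECT country, NULL, COUNT(*) FROM medals GROUP BY country          -- country totals
    UNION ALL
    SELECT NULL, NULL, COUNT(*) FROM medals                              -- grand total
""").fetchall()

for row in rows:
    print(row)
```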

53
Q

What is CUBE?

A
  • CUBE is a non-hierarchical ROLLUP
  • It generates all possible group-level aggregations
  • CUBE(Country, Medal) counts Country-level, Medal-level, and grand totals

Use CUBE when you want all possible group-level aggregations

54
Q

What is COALESCE? When is it useful?

A
  • COALESCE() takes a list of values and returns the first non-null value, going from left to right
  • COALESCE(null, null, 1, null, 2) → 1
  • Useful when using SQL operations that return nulls
  • ROLLUP and CUBE
  • Pivoting
  • LAG and LEAD
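
A short sketch of COALESCE, first on literal values and then replacing the null that LAG returns on the first row. It assumes SQLite 3.25 or later via Python's sqlite3 module and a hypothetical medals table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medals (year INTEGER, total INTEGER);
    INSERT INTO medals VALUES (2004, 10), (2008, 14);
""")

first_non_null = conn.execute(
    "SELECT COALESCE(NULL, NULL, 1, NULL, 2)"
).fetchall()[0][0]

# LAG has no previous row for 2004, so COALESCE substitutes 0 there.
rows = conn.execute("""
    SELECT year, COALESCE(LAG(total) OVER (ORDER BY year), 0) AS previous_total
    FROM medals
    ORDER BY year
""").fetchall()

print(first_non_null, rows)
```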
55
Q

What is STRING_AGG? When is it useful?

A

STRING_AGG(column, separator) takes all the values of a column and concatenates them, with separator in between each value.

It is useful when you want to reduce the number of rows that are returned.
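
The same idea can be sketched in SQLite, where the long-standing equivalent of STRING_AGG is GROUP_CONCAT (swapped in here as an assumption about the environment). The example uses Python's sqlite3 module and a hypothetical medals table; SQLite does not guarantee the order of the concatenated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medals (country TEXT, athlete TEXT);
    INSERT INTO medals VALUES ('NOR', 'Ana'), ('NOR', 'Bo'), ('USA', 'Cy');
""")

# Three input rows are reduced to one row per country.
rows = conn.execute("""
    SELECT country, GROUP_CONCAT(athlete, ', ') AS athletes
    FROM medals
    GROUP BY country
    ORDER BY country
""").fetchall()

print(rows)
```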

56
Q

What are the common data types?

A

Text data types

  • CHAR, VARCHAR and TEXT

Numeric data types

  • INT and DECIMAL

Date / time data types

  • DATE, TIME, TIMESTAMP, INTERVAL

Arrays