SQL - CL1 Flashcards

1
Q

Tables

A

A table is a collection of data organized into rows and columns. In DBMS terminology a table is known as a relation and a row as a tuple.

Note: A table has a fixed number of columns but can have any number of rows.

A table is the simplest form of data storage and is also a convenient representation of a relation.

Tables are the central and the most important objects in any relational
database. The primary purpose of any database is to hold data that is
logically stored in tables.
One of the relational database design principles is that each table
holds information about one specific type of thing, or entity. For
example, a CUSTOMER table would contain data about customers
only, not about the products they ordered, invoices issued to them, or
salesmen who placed orders for them. The ACME database doesn’t
even have customers’ addresses and phone numbers because those
are separate entities represented by ADDRESS and PHONE tables
respectively.
Rows (sometimes also called records) are horizontal slices of data;
each row contains data about one entity item. A row from the
CUSTOMER table contains information about one single customer, a
row from the ORDER_HEADER, about one single order, and so on.
The vertical cuts of table data are called columns. A column holds a
particular type of information for all entity records. The
CUST_NAME_S column in the CUSTOMER table encloses all
customers’ names; ORDHDR_INVOICENBR_N in ORDER_HEADER
contains all invoice numbers.

CREATE TABLE statement
Even though we can make some generalizations about the database
table creation process, internal table implementations and CREATE
TABLE statement clauses differ from vendor to vendor significantly.
For example, Oracle’s CREATE TABLE syntax diagram is about fifteen
pages long; DB2’s takes roughly seven pages; MS SQL Server has
the shortest definition — only one-and-a-half pages.
CREATE [{GLOBAL | LOCAL} TEMPORARY] TABLE <table_name> (
<column_name> {<data_type> | <domain_name>} [<column_size>]
[<column_constraint> [,…]] [DEFAULT <default_value>]
[COLLATE <collation_name>],…
[<table_constraint>]
) [ON COMMIT {DELETE | PRESERVE} ROWS]

Permanent and temporary tables
Database tables can be permanent or temporary, based upon the
lifespan of table data.
Usually you want tables to be permanent, meaning that inserted data
stays there until somebody explicitly deletes table rows. In some less
common situations, you may want the table data to disappear as soon as one commits changes in other tables or logs off. Typically, that may
be the case when you are issuing SQL statements from other
programs (embedded SQL) or using procedural SQL extensions, such
as PL/SQL (Oracle) or Transact SQL (MS SQL Server) to perform
complex tasks. For example, you might want a program to select
columns from dozens of different tables, apply formulas to perform
calculations on them, store the intermediate results in a temporary
table, and then update another group of tables based on those
results.
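As a rough sketch (the exact clause names vary by vendor; the table and column names here are invented), a temporary table whose rows disappear when the transaction commits could be declared like this:

-- intermediate results live only for the duration of the transaction
CREATE GLOBAL TEMPORARY TABLE temp_order_calc (
    order_id INTEGER,
    subtotal DECIMAL(10,2)
) ON COMMIT DELETE ROWS;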

Column definitions
The table has to have one or more column definitions, which consist of
the column name and the data type.

Column default values
Each column can optionally be given a default value (within the range of its data
type). In this case, if an INSERT statement omits the column, the default
value is stored automatically.
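For illustration (the table and column names here are hypothetical), a default can be declared and then relied upon like this:

CREATE TABLE customer (
    cust_id     INTEGER,
    cust_name   VARCHAR(50),
    cust_status CHAR(1) DEFAULT 'Y'   -- default value within the CHAR(1) range
);

-- cust_status is omitted, so the default 'Y' is stored for this row
INSERT INTO customer (cust_id, cust_name)
VALUES (1, 'ACME Inc.');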

The SQL CREATE TABLE Statement
The CREATE TABLE statement is used to create a new table in a database.

CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
   ....
);

The column parameters specify the names of the columns of the table.

The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer, date, etc.).
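A minimal concrete example (the Persons table and its columns are hypothetical):

CREATE TABLE Persons (
    PersonID  INT,
    LastName  VARCHAR(255),
    FirstName VARCHAR(255),
    City      VARCHAR(255)
);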

Create Table Using Another Table
A copy of an existing table can also be created using CREATE TABLE.

The new table gets the same column definitions. All columns or specific columns can be selected.

If you create a new table using an existing table, the new table will be filled with the existing values from the old table.

Syntax
CREATE TABLE new_table_name AS
    SELECT column1, column2,...
    FROM existing_table_name
    WHERE ....;
The following SQL creates a new table called "TestTable" (which is a copy of the "Customers" table): 

Example
CREATE TABLE TestTable AS
SELECT customername, contactname
FROM customers;

The SQL DROP TABLE Statement
The DROP TABLE statement is used to drop an existing table in a database.

Syntax
DROP TABLE table_name;
Note: Be careful before dropping a table. Dropping a table results in the complete loss of the information stored in it!

SQL DROP TABLE Example
The following SQL statement drops the existing table “Shippers”:

Example
DROP TABLE Shippers;

SQL TRUNCATE TABLE
The TRUNCATE TABLE statement is used to delete the data inside a table, but not the table itself.

Syntax
TRUNCATE TABLE table_name;

SQL ALTER TABLE Statement
The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.

The ALTER TABLE statement is also used to add and drop various constraints on an existing table.

ALTER TABLE - ADD Column
To add a column in a table, use the following syntax:

ALTER TABLE table_name
ADD column_name datatype;
The following SQL adds an “Email” column to the “Customers” table:

Example
ALTER TABLE Customers
ADD Email varchar(255);

ALTER TABLE - DROP COLUMN
To delete a column in a table, use the following syntax (notice that some database systems don’t allow deleting a column):

ALTER TABLE table_name
DROP COLUMN column_name;
The following SQL deletes the “Email” column from the “Customers” table:

Example
ALTER TABLE Customers
DROP COLUMN Email;

2
Q

Relationships

A

Link:
https://code.tutsplus.com/articles/sql-for-beginners-part-3-database-relationships--net-8561

3
Q

Keys

A

Primary keys and foreign keys are two types of constraints that can be used to enforce data integrity in SQL Server tables. These are important database objects.

Primary Key Constraints
A table typically has a column or combination of columns that contain values that uniquely identify each row in the table. This column, or columns, is called the primary key (PK) of the table and enforces the entity integrity of the table. Because primary key constraints guarantee unique data, they are frequently defined on an identity column.

When you specify a primary key constraint for a table, the Database Engine enforces data uniqueness by automatically creating a unique index for the primary key columns. This index also permits fast access to data when the primary key is used in queries. If a primary key constraint is defined on more than one column, values may be duplicated within one column, but each combination of values from all the columns in the primary key constraint definition must be unique.

As shown in the following illustration, the ProductID and VendorID columns in the Purchasing.ProductVendor table form a composite primary key constraint for this table. This makes sure that every row in the ProductVendor table has a unique combination of ProductID and VendorID. This prevents the insertion of duplicate rows.
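A simplified sketch of such a composite key (the data types and constraint name are assumed; the real table has more columns):

CREATE TABLE Purchasing.ProductVendor (
    ProductID INT NOT NULL,
    VendorID  INT NOT NULL,
    -- the combination of the two columns must be unique
    CONSTRAINT PK_ProductVendor PRIMARY KEY (ProductID, VendorID)
);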

  • A table can contain only one primary key constraint.
  • A primary key cannot exceed 16 columns and a total key length of 900 bytes.
  • The index generated by a primary key constraint cannot cause the number of indexes on the table to exceed 999 nonclustered indexes and 1 clustered index.
  • If clustered or nonclustered is not specified for a primary key constraint, clustered is used if there is no clustered index on the table.
  • All columns defined within a primary key constraint must be defined as not null. If nullability is not specified, all columns participating in a primary key constraint have their nullability set to not null.
  • If a primary key is defined on a CLR user-defined type column, the implementation of the type must support binary ordering.

Foreign Key Constraints
A foreign key (FK) is a column or combination of columns that is used to establish and enforce a link between the data in two tables to control the data that can be stored in the foreign key table. In a foreign key reference, a link is created between two tables when the column or columns that hold the primary key value for one table are referenced by the column or columns in another table. This column becomes a foreign key in the second table.

For example, the Sales.SalesOrderHeader table has a foreign key link to the Sales.SalesPerson table because there is a logical relationship between sales orders and salespeople. The SalesPersonID column in the SalesOrderHeader table matches the primary key column of the SalesPerson table. The SalesPersonID column in the SalesOrderHeader table is the foreign key to the SalesPerson table. By creating this foreign key relationship, a value for SalesPersonID cannot be inserted into the SalesOrderHeader table if it does not already exist in the SalesPerson table.
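A simplified sketch of that relationship (data types, constraint name, and the stripped-down table definitions are assumptions for illustration only):

CREATE TABLE Sales.SalesPerson (
    SalesPersonID INT NOT NULL PRIMARY KEY
);

CREATE TABLE Sales.SalesOrderHeader (
    SalesOrderID  INT NOT NULL PRIMARY KEY,
    SalesPersonID INT NULL,
    -- every non-NULL SalesPersonID must already exist in SalesPerson
    CONSTRAINT FK_SalesOrderHeader_SalesPerson
        FOREIGN KEY (SalesPersonID)
        REFERENCES Sales.SalesPerson (SalesPersonID)
);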

A table can reference a maximum of 253 other tables and columns as foreign keys (outgoing references). SQL Server 2016 (13.x) increases the limit for the number of other tables and columns that can reference columns in a single table (incoming references) from 253 to 10,000. (Requires at least 130 compatibility level.) The increase has the following restrictions:

Greater than 253 foreign key references are only supported for DELETE DML operations. UPDATE and MERGE operations are not supported.

A table with a foreign key reference to itself is still limited to 253 foreign key references.

Greater than 253 foreign key references are not currently available for columnstore indexes, memory-optimized tables, Stretch Database, or partitioned foreign key tables.

Indexes on Foreign Key Constraints
Unlike primary key constraints, creating a foreign key constraint does not automatically create a corresponding index. However, manually creating an index on a foreign key is often useful for the following reasons:
- Foreign key columns are frequently used in join criteria when the data from related tables is combined in queries by matching the column or columns in the foreign key constraint of one table with the primary or unique key column or columns in the other table. An index enables the Database Engine to quickly find related data in the foreign key table. However, creating this index is not required. Data from two related tables can be combined even if no primary key or foreign key constraints are defined between the tables, but a foreign key relationship between two tables indicates that the two tables have been optimized to be combined in a query that uses the keys as its criteria.
- Changes to primary key constraints are checked with foreign key constraints in related tables.

Referential Integrity
Although the main purpose of a foreign key constraint is to control the data that can be stored in the foreign key table, it also controls changes to data in the primary key table. For example, if the row for a salesperson is deleted from the Sales.SalesPerson table, and the salesperson’s ID is used for sales orders in the Sales.SalesOrderHeader table, the relational integrity between the two tables is broken; the deleted salesperson’s sales orders are orphaned in the SalesOrderHeader table without a link to the data in the SalesPerson table.

A foreign key constraint prevents this situation. The constraint enforces referential integrity by guaranteeing that changes cannot be made to data in the primary key table if those changes invalidate the link to data in the foreign key table. If an attempt is made to delete the row in a primary key table or to change a primary key value, the action will fail when the deleted or changed primary key value corresponds to a value in the foreign key constraint of another table. To successfully change or delete a row in a foreign key constraint, you must first either delete the foreign key data in the foreign key table or change the foreign key data in the foreign key table, which links the foreign key to different primary key data.

Cascading Referential Integrity
By using cascading referential integrity constraints, you can define the actions that the Database Engine takes when a user tries to delete or update a key to which existing foreign keys point. The following cascading actions can be defined.

NO ACTION
The Database Engine raises an error and the delete or update action on the row in the parent table is rolled back.

CASCADE
Corresponding rows are updated or deleted in the referencing table when that row is updated or deleted in the parent table. CASCADE cannot be specified if a timestamp column is part of either the foreign key or the referenced key. ON DELETE CASCADE cannot be specified for a table that has an INSTEAD OF DELETE trigger. ON UPDATE CASCADE cannot be specified for tables that have INSTEAD OF UPDATE triggers.

SET NULL
All the values that make up the foreign key are set to NULL when the corresponding row in the parent table is updated or deleted. For this constraint to execute, the foreign key columns must be nullable. Cannot be specified for tables that have INSTEAD OF UPDATE triggers.

SET DEFAULT
All the values that make up the foreign key are set to their default values if the corresponding row in the parent table is updated or deleted. For this constraint to execute, all foreign key columns must have default definitions. If a column is nullable, and there is no explicit default value set, NULL becomes the implicit default value of the column. Cannot be specified for tables that have INSTEAD OF UPDATE triggers.

CASCADE, SET NULL, SET DEFAULT and NO ACTION can be combined on tables that have referential relationships with each other. If the Database Engine encounters NO ACTION, it stops and rolls back related CASCADE, SET NULL and SET DEFAULT actions. When a DELETE statement causes a combination of CASCADE, SET NULL, SET DEFAULT and NO ACTION actions, all the CASCADE, SET NULL and SET DEFAULT actions are applied before the Database Engine checks for any NO ACTION.
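A sketch of how such actions are declared (hypothetical customer/orders tables):

CREATE TABLE customer (
    cust_id INTEGER NOT NULL PRIMARY KEY
);

CREATE TABLE orders (
    order_id INTEGER NOT NULL PRIMARY KEY,
    cust_id  INTEGER,
    -- deleting a customer also deletes that customer's orders;
    -- changing a customer's key sets the reference to NULL
    CONSTRAINT fk_orders_customer FOREIGN KEY (cust_id)
        REFERENCES customer (cust_id)
        ON DELETE CASCADE
        ON UPDATE SET NULL
);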

Triggers and Cascading Referential Actions
Cascading referential actions fire the AFTER UPDATE or AFTER DELETE triggers in the following manner:

All the cascading referential actions directly caused by the original DELETE or UPDATE are performed first.

If there are any AFTER triggers defined on the affected tables, these triggers fire after all cascading actions are performed. These triggers fire in opposite order of the cascading action. If there are multiple triggers on a single table, they fire in random order, unless there is a dedicated first or last trigger for the table. This order is as specified by using sp_settriggerorder.

If multiple cascading chains originate from the table that was the direct target of an UPDATE or DELETE action, the order in which these chains fire their respective triggers is unspecified. However, one chain always fires all its triggers before another chain starts firing.

An AFTER trigger on the table that is the direct target of an UPDATE or DELETE action fires regardless of whether any rows are affected. There are no other tables affected by cascading in this case.

If any one of the previous triggers perform UPDATE or DELETE operations on other tables, these actions can start secondary cascading chains. These secondary chains are processed for each UPDATE or DELETE operation at a time after all triggers on all primary chains fire. This process may be recursively repeated for subsequent UPDATE or DELETE operations.

Performing CREATE, ALTER, DELETE, or other data definition language (DDL) operations inside the triggers may cause DDL triggers to fire. This may subsequently perform DELETE or UPDATE operations that start additional cascading chains and triggers.

If an error is generated inside any particular cascading referential action chain, an error is raised, no AFTER triggers are fired in that chain, and the DELETE or UPDATE operation that created the chain is rolled back.

A table that has an INSTEAD OF trigger cannot also have a REFERENCES clause that specifies a cascading action. However, an AFTER trigger on a table targeted by a cascading action can execute an INSERT, UPDATE, or DELETE statement on another table or view that fires an INSTEAD OF trigger defined on that object.

4
Q

Constraints

A

SQL constraints are used to specify rules for data in a table.
Constraints can be specified when the table is created with the CREATE TABLE statement, or after the table is created with the ALTER TABLE statement.

CREATE TABLE table_name (
    column1 datatype constraint,
    column2 datatype constraint,
    column3 datatype constraint,
    ....
);

SQL Constraints
SQL constraints are used to specify rules for the data in a table.

Constraints are used to limit the type of data that can go into a table. This ensures the accuracy and reliability of the data in the table. If there is any violation between the constraint and the data action, the action is aborted.

Constraints can be column level or table level. Column level constraints apply to a column, and table level constraints apply to the whole table.

The following constraints are commonly used in SQL:

NOT NULL - Ensures that a column cannot have a NULL value
UNIQUE - Ensures that all values in a column are different
PRIMARY KEY - A combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table
FOREIGN KEY - References a row/record in another table and preserves the link between the tables
CHECK - Ensures that all values in a column satisfy a specific condition
DEFAULT - Sets a default value for a column when no value is specified
INDEX - Used to create and retrieve data from the database very quickly
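A sketch combining several of these constraints in one table (the table, columns, and constraint names are hypothetical; DEFAULT CURRENT_DATE is standard-style syntax and differs slightly by vendor):

CREATE TABLE employees (
    emp_id    INTEGER      NOT NULL,
    email     VARCHAR(100) UNIQUE,
    salary    DECIMAL(9,2) CHECK (salary > 0),
    hire_date DATE         DEFAULT CURRENT_DATE,
    dept_id   INTEGER,
    CONSTRAINT employees_pk PRIMARY KEY (emp_id),
    -- departments is assumed to already exist
    CONSTRAINT employees_dept_fk FOREIGN KEY (dept_id)
        REFERENCES departments (dept_id)
);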

Dropping Constraints
Any constraint that you have defined can be dropped using the ALTER TABLE command with the DROP CONSTRAINT option.

For example, to drop the primary key constraint in the EMPLOYEES table, you can use the following command.

ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;
Some implementations may provide shortcuts for dropping certain constraints. For example, to drop the primary key constraint for a table in Oracle, you can use the following command.

ALTER TABLE EMPLOYEES DROP PRIMARY KEY;
Some implementations allow you to disable constraints. Instead of permanently dropping a constraint from the database, you may want to temporarily disable the constraint and then enable it later.
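In Oracle, for instance, that might look like the following sketch (constraint name assumed):

ALTER TABLE EMPLOYEES DISABLE CONSTRAINT EMPLOYEES_PK;
-- ... perform a bulk load or other maintenance work ...
ALTER TABLE EMPLOYEES ENABLE CONSTRAINT EMPLOYEES_PK;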

Integrity Constraints
Integrity constraints are used to ensure accuracy and consistency of the data in a relational database. Data integrity is handled in a relational database through the concept of referential integrity.

There are many types of integrity constraints that play a role in Referential Integrity (RI). These constraints include Primary Key, Foreign Key, Unique Constraints and other constraints which are mentioned above.

Link:
https://www.w3schools.com/sql/sql_constraints.asp

5
Q

Indexes

A

Index is another database physical structure that occupies disk space
in a way similar to that of a table. The main difference is that indexes
are hidden from users and are not mentioned in any DML statements,
even though they are often used behind the scenes.
Assume our PHONE table created previously in this chapter has been
populated with records. Assume you issue the following query:
SELECT * FROM phone WHERE PHONE_CUSTID_FN = 152
The RDBMS first checks if the index exists on the condition column
(PHONE_CUSTID_FN). If the answer is yes, the index is used to
determine the physical location of the corresponding rows (i.e., the
two rows with PHONE_CUSTID_FN = 152). If no index is found on the
column, the whole table is scanned to find rows with appropriate
values.
A database index is similar to an index at the end of a book — it
stores pointers to the physical row locations on the disk in the same
way a book’s index points to the page numbers for the appropriate
topics. From another viewpoint, it is similar to a database table with
two or more columns: one for the row’s physical address, and the rest
for the indexed table columns. In other words, index tells the RDBMS
where to look for a specific table row (or a group of rows) on the disk
as illustrated in Figure 4-4.
As you can see, the table column and the index have the same set of
values; the main difference is that in the index, these values are
sorted, so it takes much less time to find what you need.
In most databases indexes are implemented as B-Tree indexes, that
is, they use the so called B-Tree algorithm that minimizes the number
of times the hard disk must be accessed to locate a desired record,
thereby speeding up the process. Because a disk drive has
mechanical parts, which read and write data far more slowly than
purely electronic media, it takes thousands of times longer to access a
data element from a hard disk as compared with accessing it from
RAM.
B-Trees save time by using nodes with many branches (called
children). The simplest version of a B-Tree is called a binary tree
because each node has only two children. Figure 4-5 illustrates a search for the value 100 using a binary tree. The algorithm is very
simple: starting at the top, if the value you are looking for is less than
the current node's value, move to the left; if it is greater, move to the right;
repeat until the value (here, 100) is found.
Indexes can be created to be either unique or nonunique. Unique
indexes are implicitly created on columns for which a PRIMARY KEY
or a UNIQUE constraint is specified. Duplicate values are not
permitted. Nonunique indexes can be created on any column or
combination of columns without any regard to duplicates.
Indexes can be created on one column or on multiple columns. The
latter can be useful if the columns are often used together in WHERE
clauses. For example, if some frequently used query looks for a
certain customer’s orders created on a certain date, you may create a
nonunique index on the ORDHDR_CUSTID_FN and
ORDHDR_CREATEDATE_D columns of the ORDER_HEADER table.
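A sketch of that index, using the column names just mentioned (the index name is invented):

CREATE INDEX idx_ordhdr_cust_date
    ON order_header (ordhdr_custid_fn, ordhdr_createdate_d);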
SQL99 does not specify any standards for indexes (or even require
their existence at all), but practically all database vendors provide
mechanisms to create indexes, because without them any production
database would be unbearably slow.
There is no universal rule on when to create indexes, but some
general recommendations can be given.
- It usually does not make much sense to create indexes on
small tables — they can degrade performance rather than
improve it. If the table is only, say, 50 rows long, it might be
faster to scan it than to use the B-Tree algorithm.
- On large tables, indexes should be created only if the queries
that involve indexed column(s) retrieve a small percentage of
rows (usually under 15 percent).
- Indexes are usually helpful on columns used in table joins.
(Primary keys and unique columns are indexed by default; it is
often not such a bad idea to index foreign key columns also.)
- Indexes slow down DML operations that involve indexed
columns — for example, if you update a value on such
column, the index column has also to be updated; and if you
insert a row, the corresponding index(es) may have to be re-
sorted. So if a table is likely to be subjected to frequent
updates, inserts, and/or deletes, it is recommended to have
fewer indexes.

CREATE INDEX statement
The CREATE INDEX statement differs slightly for different
implementations. The somewhat simplified syntax is below.
CREATE [UNIQUE | BITMAP] INDEX [<schema>.]<index_name>
ON [<schema>.]<table_name>
({<column_name> | <expression>} [ASC | DESC],…)
[<implementation_specific_clauses>];
This statement creates a unique index named IDX_CUST_NAME on
the CUST_NAME_S column of the CUSTOMER table, with column
values stored in descending order:
CREATE UNIQUE INDEX idx_cust_name
ON customer (cust_name_s DESC);

6
Q

Views

A

The most common view definition describes it as a virtual table.
Database users can select rows and columns from a view, join it with
other views and tables, limit, sort, group the query results, and so on.
Actually, in most cases, users wouldn’t even know if they were
selecting values from a view or from a table. The main difference is
that, unlike tables, views do not take physical disk space. View
definitions are stored in RDBMS as compiled queries that dynamically
populate data to be used as virtual tables for users’ requests.
The details are implementation-specific — RDBMS can create a
temporary table behind the scene, populate it with actual rows, and
use it to return results from a user’s query. The database engine may
also combine a user’s query with an internal view definition (which is,
as you already know, also a query) and execute the resulting query to
return data, and so on — from a user’s viewpoint, it does not matter at
all.
Views are used in many different ways. For example, they can
combine data from multiple tables in a more user-friendly form or
enforce security rules by making available for users certain horizontal
or vertical slices of data (more about security in Chapter 12). In this
chapter, you’ll learn how to create different types of views.

CREATE VIEW statement
CREATE VIEW <view_name>
[(<column_name>,…)] AS <select_statement>
[WITH [CASCADED | LOCAL] CHECK OPTION]
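For illustration, a view over the CUSTOMER table used earlier might look like this (the column names and status filter are assumptions):

CREATE VIEW v_active_customers (cust_id, cust_name) AS
SELECT cust_id_n, cust_name_s
FROM customer
WHERE cust_status_s = 'Y'
WITH CHECK OPTION;
-- users can now query v_active_customers as if it were a table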

7
Q

DDL, DML, DCL understanding

A

Link:
https://www.geeksforgeeks.org/sql-ddl-dql-dml-dcl-tcl-commands/

8
Q

SQL data types

A

Strings
Generally all strings could be divided into character strings (to store
plain text) and binary strings which contain either machine code
(computer programs) or special binary instructions for other programs.

Character strings
A character string can be defined simply as a sequence of bytes. The
length of the string is the number of bytes in the sequence. A string of
zero length is called an empty string. It can be an equivalent to NULL
(special concept introduced at the end of this chapter) or not,
depending on implementation. SQL99 specifically differentiates
between empty strings and nulls.
All strings in SQL can be of fixed length or varying length. The
difference is quite simple, but sometimes not very easy to understand
for people with no technical background, so let us explain it in some
greater detail.

Fixed-length character strings
If you define a string to be of a fixed length, the system preallocates the
desired number of bytes in memory and/or on the computer's hard disk. It
does not matter whether the actual value stored in that string is exactly
that many bytes, half that size, or just one character long — it is still
going to occupy the whole allocated space (unused bytes will be
padded with blank characters), so all strings will have exactly the
same length. For example, imagine you defined a column DATABASE
as having a character field of length 13, and now we want to store
three strings in that column: ORACLE, UDB2, and MS SQL SERVER.
Figure 3-1 illustrates the results of that operation. Note that all strings
are exactly 13 bytes long.

Character strings of varying length
If you define DATABASE column as a varying-length string with
maximum 13 characters to store, the picture will be different. The
actual memory or disk space required to hold our values is allocated
dynamically. As a result, all three strings will be of different length and
will require different numbers of bytes to hold them. String ORACLE
occupies 6 bytes, UDB2 — 4 bytes, and MS SQL SERVER takes the
maximum allowed 13 bytes.
Here is the general platform-independent recommendation: use a
fixed-length data type when your values are expected to be of the
same size, and a varying-length one for values when size is expected
to vary considerably. In the example above it rather looks logical to
use the varying-length strings, but if we need a column to store, say,
gender in form M or F, a fixed-length string is more appropriate.
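A quick sketch of that recommendation (hypothetical table; sizes taken from the example above):

CREATE TABLE demo_strings (
    gender  CHAR(1),       -- always exactly one character ('M' or 'F'), so fixed length fits
    db_name VARCHAR(13)    -- values vary in length ('ORACLE' ... 'MS SQL SERVER'), so varying length fits
);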

National character strings
Even though English is a very popular language, it is not the only
language on Earth. And in spite of the fact that practically all major
software companies reside in the United States, the market dictates
its own rules. For example, Oracle is a very popular database vendor
around the world with customers in China, India, Korea, Germany,
France, Israel, Russia, Saudi Arabia, and many other countries.
Now we have a little bit of a problem. So far we’ve used terms
“character” and “byte” as synonyms. Byte is a computer term for a
unit of information storage that consists of 8 bits. Each bit can either
be 1 or 0, and a combination of 8 bits allows us to store 256 (2^8)
distinct values (or 256 different characters represented by numbers
from 0 to 255). That looks like a lot, but… not really. We need
separate holders for uppercase and lowercase letters, punctuation
marks, digits, math symbols, and so on. That barely leaves space for
distinct characters used in other languages that employ the Latin
alphabet. And what about the ones which don’t? There are about
3,000 different languages in the world, dead and living, in addition to
constructed languages like J.R.R. Tolkien’s Quenya, Sindanin, or
Entish, and most of them have their own distinct alphabets!
The solution seems to be rather intuitive — use two bytes per
character instead of one. That allows 65,536 (2^16) distinct
combinations, which is enough to store all existing characters from
every major language on Earth.
SQL has two data types to store strings in national characters —
national character string and national character string of varying
length — that behave in exactly same way as previously described
character string and character string of varying length
correspondingly, but use the two-byte Unicode standard. So, if you
declared your DATABASE column as a national character field of size
13, it would still hold 13 characters, but would reserve 2 bytes for
each letter, for a total of 26 bytes. The difference is, now it can hold
the names from previous examples spelled in practically any
language, for example, in Russian. Figure 3-3 illustrates that.

SQL99 has two major character string types: CHARACTER and CHARACTER VARYING. In addition, there are also NATIONAL CHARACTER and
NATIONAL CHARACTER VARYING.
CHARACTER can also be abbreviated with CHAR. The size can
optionally be specified in the form CHARACTER(n). For example,
CHARACTER(15) can hold character strings up to 15 characters long.
If size is omitted, the default is 1. An error occurs if one tries to store a
string that is bigger than the size declared.
CHARACTER VARYING can be abbreviated with CHAR VARYING or
VARCHAR. You have to specify the maximum size for strings to be
stored, for example, CHARACTER VARYING(15) holds 15-character
strings, or smaller.
NATIONAL CHARACTER (NATIONAL CHAR, NCHAR, CHARACTER
CHARACTER SET <char_set_name>) specifies the default data type for
the country of implementation. This is a fixed-length character string
data type.
NATIONAL CHARACTER VARYING (NATIONAL CHAR VARYING,
NCHAR VARYING, CHARACTER VARYING CHARACTER SET <char_set_name>,
CHAR VARYING CHARACTER SET <char_set_name>)
is a varying-length country-specific character string data type.
CLOB is a new SQL99 data type to store large nondatabase-
structured text objects of varying size and complexity, such as
employees’ resumes, collections of papers, books, and other similar
data.

Binary strings
A binary string is a sequence of bytes in the same way that a
character string is, but unlike character strings that usually contain
information in the form of text, a binary string is used to hold
nontraditional data such as images, audio and video files, program
executables, and so on. Binary strings may be used for purposes
similar to those of character strings (e.g., to store documents in MS
Word format), but the two data types are not compatible; the
difference being like text and a photo of the same text. Binary string
data types are summarized in Table 3-2.

SQL99 has following data types to store binary strings: BIT, BIT
VARYING, and BLOB.
BIT is a fixed-length binary string somewhat similar to CHAR.
If you declare a column to be BIT(100), 100 bytes will be
allocated in memory/disk, and if the object you store is just 60
bytes, it’s still going to occupy all 100 bytes.
BIT VARYING is similar to VARCHAR — even if you specify
BIT VARYING(100) as the data type in the previous
example, it will only take 60 bytes to store the object.
BLOB is a binary equivalent to CLOB.

Numbers
All numeric data could generally be divided into two categories: exact
numbers and approximate numbers.

Exact numbers
Exact numbers can either be whole integers (numeric primary keys,
quantities, such as number of items ordered, age) or have decimal
points (prices, weights, percentages). Numbers can be positive and
negative and have precision and scale. Precision determines the
maximum total number of decimal digits that can be stored (both to
the left and to the right of the decimal point). Scale specifies the
maximum number of decimals allowed. Exact numeric data types are
summarized in Table 3-3.

SQL99 specifies the following data types for exact numbers:
- INTEGER, SMALLINT, NUMERIC, DECIMAL (as well as some
synonyms found in Table 3-3).
- INTEGER represents countable numbers; its precision is
implementation-specific.
- SMALLINT is virtually the same as INTEGER, but its maximum
precision can be smaller than that of INTEGER.
- The NUMERIC data type supports storage of numbers with a
specific decimal component as well as whole numbers.
The optional scale specifies the number of decimal places
supported.
- DECIMAL is very similar to NUMERIC. The only difference is that
the precision (but not the scale) used by a vendor-specific
implementation can be greater than that specified in the declaration.

Approximate numbers
Approximate numbers are numbers that cannot be represented with
absolute precision (or don’t have a precise value). Approximate
numeric data types are summarized in Table 3-4.

Note A classic example is the number π, which is usually
approximated as 3.14. The number was known in ancient
Babylon and Egypt some 4,500 years ago and has been a
matter of interest for mathematicians from Archimedes to
modern scientists. As of today, 206,158,430,208 (3 * 2^36)
decimal digits of π have been calculated. It would take
approximately forty million pages, or fifty thousand volumes
to store it in written form!

SQL99 specifies the following data types for approximate numbers:
- FLOAT, REAL, and DOUBLE PRECISION.
- FLOAT stores floating-point numbers with a precision
optionally specified by the user.
- REAL is similar to FLOAT, but its precision is fixed.
- DOUBLE PRECISION is virtually the same as REAL, but with a
greater precision.

Date and Time Data Types
Handling dates and times is probably one of the most complicated
and inconsistent topics in SQL. According to our personal
experiences, operations with dates often cause confusion and even
frustration not only among database users, but in the developers’
community, too.

Introduction to complex data types
One of the problems is that dates are not actually what they look like. So
far we were talking only about simple data types that store one value
per row. Date and time data types hold a number of elements (year,
day, month, hour, etc.). In programming, such data types are called
complex and are often represented as structures. When returned as a
database query result, date and time fields appear like strings, but in
fact they rather are parts of structures, similar to ones in the example
below. (We don’t use any specific programming language in this
example, but rather some kind of pseudocode.)
STRUCTURE DATE {
    YEAR DECIMAL(4,0), MONTH DECIMAL(2,0), DAY DECIMAL(2,0)
}
STRUCTURE TIME {
    HOUR DECIMAL(2,0), MINUTE DECIMAL(2,0), SECOND DECIMAL(5,2)
}
STRUCTURE DATETIME {
    YEAR DECIMAL(4,0), MONTH DECIMAL(2,0), DAY DECIMAL(2,0),
    HOUR DECIMAL(2,0), MINUTE DECIMAL(2,0), SECOND DECIMAL(5,2)
}

The displayed value just formats and concatenates the fields of this
structure. For example, for the YYYY/DD/MM format, the
pseudocode may look like this:
CONCAT(CAST(DATETIME.YEAR, STRING), '/',
       CAST(DATETIME.DAY, STRING), '/',
       CAST(DATETIME.MONTH, STRING))

These structures should also have some methods to handle situations
when users want to display dates and times in different formats, for
example to display time on the 12- or 24-hour scale, show day of
week for a certain date, display century, convert it into a different time
zone, and so on.
We’ll talk more about complex data types later in this chapter.

Date and time implementations
As we’ve mentioned before, date and time data types are mandated
by SQL99 and handled by different RDBMS implementations quite in
a different way. Date and time data types are summarized in Table 3-
5.

SQL99 supports DATE, TIME, TIMESTAMP, TIME WITH TIME
ZONE, and TIMESTAMP WITH TIME ZONE data types.
- DATE data type is a structure that consists of three elements:
year, month, and day. The year is a four-digit number that
allows values from 0000 through 9999; the month is a two-
digit element with values from 01 through 12; and the day is
another two-digit figure with range from 01 through 31. SQL99
does not have any strict rules on how to implement DATE
internally, so vendors can make their own decisions. One
vendor could choose something similar to the structures
above; others could implement characters, numbers with
different scale, and so on.
- TIME consists of hour, minute, and second components. The
hour is a number from 00 to 23, the minute is a two-digit
number from 00 to 59, and the second is either another
integer from 00 to 61 or a decimal number with a precision of 5 and
a scale of 3 that can hold values from 00.000 to 61.999.
Note The range of values for seconds greater than 59 is to
handle the representation of leap seconds,
occasionally added to Earth’s time. None of our
three major vendors has that feature implemented.
- TIMESTAMP is a combination of DATE and TIME data types
and includes year, month, day, hour, minute, and second.
- TIME WITH TIME ZONE is basically an improvement to the
TIME data type. It stores time zone information in addition to
standard TIME elements.
- TIMESTAMP WITH TIME ZONE is an extension to the
TIMESTAMP with information on time zone.

Object and User-Defined Data Types
You are already familiar with the concept of complex data types (date
and time), but their complexity is hidden from the users. Now, we are
going to talk about some complex data types that do require more
user involvement. Understanding these data types requires some
background in object-oriented programming and design.

In general, SQL99 defines two new groups of complex data types:
abstract data types (ADT) and collections.

ADT
ADT provides means to explicitly define a structural type within the
larger context of the database structure. It is a series of attribute
names paired with system data types plus optional methods to
manipulate those properties. ADT can virtually be anything one can
imagine and describe by the means of the language — a car, a
person, an invoice.

Collections can be of reference type, set type, and list type.

Reference type
Reference type is a special type through which an instance of another
type can be referenced. The reference values are read-only and
unique in database system catalog.

Set and list types
These types extend a field’s capabilities by storing a collection of
homogeneous data. For instance, we could make good use of an array
data type for a phone number field.

Sets and lists can consist of built-in data types (for example, array of
strings), abstract data types, reference data types, or named row data
types. The only difference between the two collection types is that the
list has to be ordered.

Other Data Types
We covered all the major SQL data types in detail. Now let’s briefly
describe a couple more that either are not implemented by any of the
major vendors or are implemented in such a way that the data type
name would not match SQL99 standards.

BOOLEAN
SQL99 has a special BOOLEAN data type with a range that includes
only two values: TRUE and FALSE. Oracle, DB2, and Microsoft SQL
Server don’t have a BOOLEAN data type. (Or, to be more precise,
DB2 has it, but for internal use only, i.e., you cannot declare a column
of type BOOLEAN.) But the BOOLEAN data type can be easily
simulated, for example by using a user-defined data type of type
VARCHAR that only allows FALSE and TRUE for its values.
This example illustrates how to do it in MS SQL Server:
CREATE RULE bool_rule AS @list IN ('TRUE', 'FALSE')
GO
EXEC sp_addtype 'BOOLEAN', 'VARCHAR(5)', 'NULL'
EXEC sp_bindrule 'bool_rule', 'BOOLEAN'
Now you can use it just as another data type in your instance of MS
SQL Server.

ROWID
ROWID is a special Oracle data type to store unique addresses for
each row in the database. Tables can be created with ROWID
columns, but that’s not recommended.

UROWID
UROWID is similar to ROWID but used for index-organized tables.

BFILE
BFILE Oracle data type enables read-only access to binary files stored outside the Oracle database.

DATALINK
DATALINK is a DB2 data type used to manage large objects in the form of
external files. The files can reside in a file system on the same server
or on a remote server. Internal database functions are used to
manipulate DATALINK columns.

BIT
BIT data type in MS SQL Server stores a bit of data (0 or 1) and does
not correspond to previously described SQL99 BIT. The literal value
for bit is a single character from its range, optionally enclosed in
single quotes.

TIMESTAMP
TIMESTAMP data type in MS SQL Server is not the same as
TIMESTAMP SQL99 data type. You can only have one column per
table of type TIMESTAMP. It exposes automatically generated binary
numbers (unique within a database) and is basically used to uniquely
identify a database row in a manner similar to (but not identical to)
Oracle’s ROWID and primarily serves version control purposes. The
main difference between TIMESTAMP and ROWID is that the value of
a TIMESTAMP column gets updated every time the row is changed,
whereas Oracle’s ROWID is assigned to a row for as long as the row
exists.
Microsoft is planning to replace this data type with a ROWVERSION
data type in future releases for SQL99 compliance. Currently
ROWVERSION is a synonym for TIMESTAMP. The storage size of the
TIMESTAMP data type is 8 bytes; it is semantically identical to binary(8).

NULL
NULL is a special database concept to represent the absence of
value. It is neither a zero nor an empty string, but a special marker
that can be used with any data type. Nulls are usually used when
the value is unknown or meaningless. NULL columns can later be
updated with some real data.
For example, when a new employee is hired, he/she might neither yet
have the phone number nor be assigned to a department. In such
situation the NULL values are appropriate for PHONE and
DEPARTMENT columns.
Another situation is when a value is nonapplicable; like STATE field in
a European address. It can also be set to NULL.
The NULL value requires special handling and has to be dealt with
carefully. In fact, any expression that involves NULL as an operand also
evaluates to NULL.

Caution NULL can cause you serious trouble if not used
properly. For example, imagine you have two columns in
your table and you want to calculate the difference
between them. If one of your columns has NULL values,
the result of your calculation is undefined (NULL), i.e.,
100 – 0 = 100, but 100 – NULL = NULL. That means you
have to use special mechanisms (discussed in Chapters
10 and 11) to handle this and similar situations.
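One such mechanism is the standard COALESCE function, which substitutes a value for NULL; a rough sketch (hypothetical product table and columns):

SELECT list_price - discount              AS naive_diff,  -- NULL whenever discount is NULL
       list_price - COALESCE(discount, 0) AS safe_diff    -- a missing discount is treated as 0
FROM product;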

9
Q

SQL functions

A

SQL functions exist to make your life easier when you need to
manipulate data retrieved from a table. While the SQL query is busy
retrieving data for you, the functions used within that query are
validating, converting, calculating, getting system information, and much more.
Think of the SQL functions as tools designed to accomplish a single
well-defined task, for example, calculating square root or converting
lowercase letters into uppercase. You invoke a function within an SQL
query by name (usually a single keyword). Some functions accept
arguments and some do not, but what differentiates a function from
every other executable module in an RDBMS is that it always returns a
value.
While SQL itself is not a procedural language — that is, it lacks
procedural features such as flow control structures and loops — using
functions allows you, to a certain extent, to alleviate problems
stemming from this deficiency.
All functions could be divided into two broad categories: deterministic
functions and nondeterministic functions. Deterministic functions
always return the same result if you pass in the same arguments;
nondeterministic functions might return different results, even if they
are called with exactly the same arguments. For example, the function
ABS, which returns the absolute value of a number passed to it as an
argument, is deterministic — no matter how many times
you call it with, say, the argument -5, it will always return 5. The
Microsoft SQL Server function GETDATE(), which
accepts no arguments and returns the current date and time on
the RDBMS server, is an example of a nondeterministic function:
each time you call it a new date and time is returned, even if the
calls are only a second apart.
Some RDBMS restrict the use of nondeterministic functions in database objects such as indexes or views. For example, MS SQL
Server disallows the use of such functions in indexed computed columns
and indexed views; IBM DB2 UDB does not allow nondeterministic
functions in join condition expressions; and you cannot use such
functions in Oracle's function-based indexes.
Note Virtually every database vendor provides procedural
extensions for use with their database products. Oracle has
built-in PL/SQL, Microsoft uses its own dialect of Transact-
SQL, and DB2 UDB uses IBM SQL (which is similar to
Transact-SQL). Unlike SQL, these procedural extensions
allow for creating full-fledged programs within their
respective host environments. User-defined custom
functions are usually created using one of their procedural
languages.
The list of SQL functions available for use within a particular RDBMS
implementation grows with every new release, and some vendors are
allowing users to define their own custom functions to perform
nonstandard tasks. In this chapter we provide only a short list of the
most helpful functions and their uses. Differences between vendor-
specific implementations are highlighted.
Note The portability problem with using functions in a query is the
possibility that the query might not run properly with
competitors’ products. Some functions are identical in name
and usage, some have only a different name, and some
exist only within a particular RDBMS implementation.

Not all of these functions are part of the SQL standard (some would
say most of them are not) — be it SQL89 (SQL1), SQL92 (SQL2), or
even SQL99 (SQL3). In fact, all of these standards specify only a
handful of functions as a requirement for conformance to a specific
level (entry, intermediate, or full). The old saying that you cannot
program a standard still holds true. The list of the functions specified
in the SQL2 standard is given in Table 10-1.

BIT_LENGTH (expression): Returns the length of the expression, usually a string, in bits.
CAST (value AS data type): Converts the supplied value from one data type into another compatible data type.
CHAR_LENGTH (expression): Returns the length of the expression, usually a string, in characters.
CONVERT (expression USING conversion): Returns the string converted according to the rules specified in the conversion parameter.
CURRENT_DATE: Returns the current date of the system.
CURRENT_TIME (precision): Returns the current time of the system, of the specified precision.
CURRENT_TIMESTAMP (precision): Returns the current date and time of the system, of the specified precision.
EXTRACT (part FROM expression): Extracts the specified named part of the expression.
LOWER (expression): Converts a character string from uppercase (or mixed case) into lowercase letters.
OCTET_LENGTH (expression): Returns the length of the expression in bytes (each byte containing 8 bits).
POSITION (char expression IN source): Returns the position of the char expression in the source.
SUBSTRING (string expression, start, length): Returns the part of a string expression, from the start position up to the specified length.
TRANSLATE (string expression USING translation rule): Returns the string translated into another string according to the specified rules.
TRIM (LEADING | TRAILING | BOTH char expression FROM string expression): Returns a string from the string expression with leading, trailing, or both char expression characters removed.
UPPER (expression): Converts a character string from lowercase (or mixed case) into uppercase letters.

Every vendor has its own classifications of the functions supported in
its database product. IBM groups its DB2 UDB functions into column
functions, scalar functions, row functions and table functions; whereas
Oracle uses terms like single-row functions, aggregate functions,
analytic functions, and object-reference functions; and Microsoft
sports the most detailed classifications of configuration functions,
cursor, date and time, mathematical functions, aggregate functions,
metadata, security, string, system functions, and system statistical
functions, as well as text and image functions, and rowset functions.
Note IBM makes a distinction between SYSIBM and SYSFUN
schema functions; they differ in the way they handle data; for
example, sometimes one handles numeric input while the
other handles character input. Consider it an IBM
implementation of function overloading where a different task is expected of a function based on the argument data
types passed into the function.

Numeric functions
We grouped some functions into numeric functions because they
operate on numbers — both input and output parameters are
usually numeric: INTEGER, DECIMAL, and so on. The list of most
common numeric functions is given in Table 10-2.
CEIL
By definition, the CEIL function returns the smallest integer greater than
or equal to the argument passed; that is, the function rounds values up.
The syntax and usage are identical in all three databases.
The argument can be a number or a column value.
ROUND
One might wonder how that is different from the TRUNC function. This
function rounds a number to a specific length or precision, and works
almost identically in all three RDBMS implementations.
TRUNC
Function TRUNC returns its argument truncated to the number of
decimal places specified with the second argument.
RAND
The RAND function is used to generate some random numbers at
runtime.
SIGN
The SIGN function works exactly the same way in all three
implementations. It is used to determine the sign of the numeric
expression argument: if the number is positive, the function returns 1;
if the number is negative, the result is -1; and if the argument is zero,
0 is returned.
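A quick sketch of these numeric functions (results shown as comments; the Oracle-style FROM dual is used here, other RDBMS allow a bare SELECT or a query against any table):

SELECT CEIL(11.3)       AS ceil_val,   -- 12
       ROUND(11.347, 2) AS round_val,  -- 11.35
       TRUNC(11.347, 2) AS trunc_val,  -- 11.34
       SIGN(-5)         AS sign_val    -- -1
FROM dual;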

String functions
String functions are grouped together because they perform some
operations specifically pertaining to strings and characters — that is,
manipulation of strings: changing the letter case, changing
alignment, finding ASCII codes, and so on. Usually, but not always, the
output of such functions is a string. Some of the most common string
functions are listed in Table 10-3.
CONCAT
The CONCAT function simply concatenates two strings. This function
could be replaced with an operator — + for SQL Server, and || for
Oracle and DB2 UDB.
CHARINDEX, INSTR, LOCATE, and POSSTR
SQL is a language specifically designed to handle information. As
such it has a rich toolset for manipulating strings and characters. The
three functions INSTR, LOCATE, and CHARINDEX are used to
determine the position of a specific character (or combination of
characters) within a string; based on this information, you can slice
and dice text information in a number of ways.
For example, to locate the position of the end of the first word, use a
blank space to separate the words in the description (assuming that
every value in the column PROD_DESCRIPTION_S would have at
least one blank space).
SUBSTR and SUBSTRING
The SUBSTR (SUBSTRING on MS SQL Server) function returns part
of an argument string, which is designated by starting position and
required character length.
LENGTH
The function LENGTH (LEN for MS SQL Server) returns a number of
characters (not a number of bytes! — see Chapter 3 for more details)
in the argument. If an argument is not of a character type, it will be
implicitly converted into string, and its length will be returned.
LOWER and UPPER
The functions LOWER and UPPER are the rare examples of functions
mandated by the SQL92/99 standard and implemented across all
three RDBMS without modifications. These functions are simple and
intuitive to use. They convert string expressions into lowercase or
uppercase, respectively
TO_CHAR, CHAR, and STR
These functions fall into the broader range of Conversion Functions,
which are discussed later in this chapter. They are used to convert one data type into a character data type, for example, a number into a
string or date/time data into a string (this might be needed to produce
a report that accepts character data only). In addition, these functions
allow you to format output when used to convert, for instance, date
and time data types.
The usage examples are shown in the CONCAT function above. Since
this function accepts only strings, it is necessary to convert all the
numbers into strings to avoid an error in IBM DB2 UDB; Oracle 9i
implicitly converts all concatenated values into strings.
REPLACE
The REPLACE function found in IBM DB2 UDB, Oracle, and MS SQL
Server returns a string (CHAR data type) in which every occurrence of
expression2 is replaced with expression3.
REPLICATE and REPEAT
To replicate a character or sequence of characters you may use the
REPLICATE and REPEAT functions. These functions pertain to IBM
DB2 UDB and the Microsoft SQL Server; in Oracle similar functionality
is achieved with a combination of functions LPAD/RPAD.
TRANSLATE
This is a smart version of the REPLACE function. It uses pattern
matching to find and replace characters within a string; a query could,
for example, replace all numbers (from 0 through 9) with 0 and all letters —
except K — with an asterisk (*), while replacing the letter K with X.
TRIM, LTRIM, and RTRIM
Although it might not be apparent, blank spaces could be a major
concern. Usually, blank spaces are not shown in the user interface
when typing in some character value, and that could easily cause
mistakes. RDBMS requires absolute precision — the string ‘user’ and
the string ‘user ‘ (with a trailing blank space) are never the same.
These functions act similarly in all three RDBMS: they remove leading
and/or trailing characters from a string expression. The main
difference is that Oracle 9i supports more of the SQL standard syntax
(TRIM in addition to LTRIM and RTRIM), and allows for trimming
characters other than blank spaces; while IBM DB2 UDB and the
Microsoft SQL Server use this function for blank spaces only.
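An Oracle-flavored sketch of several string functions (results shown as comments; the names differ slightly elsewhere, e.g. SUBSTRING and LEN on MS SQL Server):

SELECT UPPER('acme')                   AS up_val,    -- 'ACME'
       SUBSTR('DATABASE', 5, 4)        AS sub_val,   -- 'BASE'
       REPLACE('2024-01-01', '-', '/') AS rep_val,   -- '2024/01/01'
       TRIM('  user  ')                AS trim_val,  -- 'user'
       LENGTH('user')                  AS len_val    -- 4
FROM dual;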

Date and time functions
The functions grouped in Table 10-4 deal with date and time; they
accept a wide range of parameter data types and produce output
strings (characters), date/times, and numbers.
GETDATE, SYSDATE, and CURRENT DATE
Keeping track of the time of changes in the database requires access
to the system's date and time settings. Oracle implements the SYSDATE
pseudocolumn (which can be considered a function for our purposes)
that returns the system's current date and time, Microsoft SQL
Server has the function GETDATE(), and IBM DB2 UDB uses the
CURRENT DATE special register in the SELECT part of the query.
Time zone functions
These functions deal with the Earth’s different time zones. The
“standard” functions always return the time of the zone in which the machine is located, and sometimes — especially when telecommuting to a
central location from a different time zone — it is not what is needed.
The MS SQL server function GETUTCDATE returns current UTC time
(Coordinated Universal Time, or Greenwich Mean Time); a query can,
for example, select both GETDATE() and GETUTCDATE() side by side
to compare local and UTC time.
ADD_MONTHS and DATEADD
The MS SQL Server DATEADD function returns a new datetime value
calculated by adding an interval to the specified date part of the date
argument. It is not as straightforward as it may seem.
EXTRACT and DATEPART
Oracle’s EXTRACT function returns the value of a specified part of a
date/time expression. The acceptable parts are YEAR, MONTH, DAY,
HOUR, MINUTE, and SECOND. It also allows you to specify
TIMEZONE_HOUR, TIMEZONE_MINUTE, TIMEZONE_REGION, and
TIMEZONE_ABBR (for ‘abbreviation’); these are added to
accommodate time zone differences.
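A minimal sketch extracting the year from the current date:
SELECT EXTRACT(YEAR FROM SYSDATE) FROM dual       -- Oracle
SELECT DATEPART(year, GETDATE())                  -- MS SQL Server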
DAYNAME, MONTHNAME, and DATENAME
DB2 UDB and the SQL Server have special functions to return the
name of the part of the date.
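A minimal sketch (the output naturally depends on the current date):
SELECT DATENAME(month, GETDATE())                        -- MS SQL Server
SELECT MONTHNAME(CURRENT DATE) FROM sysibm.sysdummy1     -- IBM DB2 UDB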
MONTHS_BETWEEN and DATEDIFF
The Oracle function MONTHS_BETWEEN returns the number of the
months between two dates: if the first date is later than the second,
the result is a positive number; if the first date is earlier than the
second, the returned number will be negative.
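A hedged sketch (the date literal is arbitrary):
SELECT MONTHS_BETWEEN(SYSDATE, DATE '2002-01-01') FROM dual   -- Oracle
SELECT DATEDIFF(month, '2002-01-01', GETDATE())               -- MS SQL Server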

Aggregate functions
While aggregate functions (listed in Table 10-7) logically belong with
the SELECT statement discussed in Chapter 8 and are thoroughly
covered there, it is appropriate to include a brief description in this
chapter for reference.
Aggregate functions return a single value based on a specific
calculation within a set (group) of values; usually they are tied to the
GROUP BY clause of the SELECT statement, though it is not a
requirement for some of them. When used with a GROUP BY clause,
each aggregate function produces a single value for each group rather
than for the whole table.
SUM
The SUM function sums up all the values in the specified column. If
you, for example, needed to know the total amount of all your sales,
this query would bring the answer:
SELECT SUM(total_price) total_sale
FROM v_customer_totals;

total_sale
----------
457000.40
COUNT
This function returns the total number of records in the table/view.
AVG
This function produces average value of a group of numeric values.
MIN and MAX
These functions select minimum and maximum values from the list of
values in the column.
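A combined sketch against the ACME PRODUCT table (the column aliases are arbitrary):
SELECT COUNT(*) products,
       AVG(prod_price_n) avg_price,
       MIN(prod_price_n) min_price,
       MAX(prod_price_n) max_price
FROM product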

Conversion functions
Sometimes it is necessary to convert one data type into another. In
the examples with the CONCAT function, we had to convert numbers into
strings before being able to concatenate them with other string values.
As it becomes clear that English is not the only language on Earth,
there is an ever-increasing demand for national-character databases:
conversion functions translate data so it can be correctly represented
in the character set of the native alphabets.
Some of the most common conversion functions are listed in Table
10-8.
Conversion between different data types
There are two general functions that perform this type of conversion:
CAST and CONVERT. These functions convert one data type into
another. The function CAST is used almost identically across all three
RDBMS.
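A minimal sketch in the spirit of the CONCAT examples mentioned above (the target length is arbitrary):
SELECT CAST(prod_price_n AS VARCHAR(10)) FROM product    -- DB2 UDB / MS SQL Server
SELECT CAST(prod_price_n AS VARCHAR2(10)) FROM product   -- Oracle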

Miscellaneous functions
With every classification there are always some functions that do not
fit into a single well-defined category.
DECODE and CASE
CASE is an SQL99 keyword that is implemented as a DECODE
function in Oracle. Microsoft and DB2 UDB allow the CASE statement
to be used with a standard SELECT statement, and DB2 UDB also
uses it as a part of its procedural extension to SQL.
Oracle’s DECODE function allows you to modify the output of the
SELECT statement depending on certain conditions. It compares an
expression (usually a column value) to each search value one by one.
If a match is found, the function returns the corresponding result,
otherwise it returns the default value; if no match is found and no
default specified, the function returns NULL. In addition to DECODE,
Oracle 9i also has a CASE statement that is identical in usage to that
of the other RDBMS discussed.
The CASE statement produces similar results though using somewhat
different syntax — and no function is involved.
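A side-by-side sketch, assuming CUST_STATUS_S holds Y/N flags:
SELECT cust_name_s,
       DECODE(cust_status_s, 'Y', 'ACTIVE', 'N', 'INACTIVE', 'UNKNOWN')
FROM customer                                              -- Oracle DECODE
SELECT cust_name_s,
       CASE cust_status_s WHEN 'Y' THEN 'ACTIVE'
                          WHEN 'N' THEN 'INACTIVE'
                          ELSE 'UNKNOWN' END
FROM customer                                              -- CASE, all three RDBMS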
COALESCE and NULLIF
These two functions are special cases of the Oracle DECODE function
and the CASE expression, dealing with NULL values exclusively; they
are found in all three databases.
Note IBM DB2 UDB also has a VALUE function that operates in
exactly the same way as COALESCE; Oracle’s function
VALUE has nothing to do with NULLs and belongs to the
object-oriented features domain.
The function NULLIF compares two expressions; if they are equal, it
returns NULL, otherwise it returns the first expression. For example,
in our CUSTOMER table we not only have CUST_CREDHOLD_S
information but also a CUST_STATUS_S column that tells us whether
or not this customer is active. If CUST_CREDHOLD_S and
CUST_STATUS_S contain the same values, we return NULL,
otherwise the CUST_CREDHOLD_S value. Such a technique might be
useful for discovering discrepancies in the company rules. If customer
credit is on hold (Y), then its status has to be N; if NULLs are
detected, the conflict must be resolved manually by one of the
managers.
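The query itself is not shown above; a sketch of it (the column alias is arbitrary) could be:
SELECT cust_name_s,
       NULLIF(cust_credhold_s, cust_status_s) check_flag
FROM customer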
The COALESCE function takes the principle of NULLIF a step further
— it returns the very first argument on the list that is not NULL. For
example, you’ve devised several methods for your customers to pay
their bills — credit account, direct deposit, and mail-in check. If you
are about to send a customer a bill but do not remember which
method was approved for this particular customer, this query might
help
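The query is not reproduced above; a purely illustrative sketch with a hypothetical CUSTOMER_PAYMENT table and made-up column names could be:
SELECT COALESCE(credit_account, direct_deposit, mailin_check,
                'NO METHOD ON FILE') pay_method
FROM customer_payment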
NVL, NVL2, and ISNULL
These functions are used to detect NULLs in the fetched values and
take action by replacing NULL values with non-NULL values. NVL is
Oracle’s equivalent of the SQL Server’s ISNULL function; DB2 UDB
does not have NULL detection functions, but you may use the
COALESCE function to achieve similar results.
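A minimal sketch substituting zero for missing prices:
SELECT prod_description_s, NVL(prod_price_n, 0) FROM product        -- Oracle
SELECT prod_description_s, ISNULL(prod_price_n, 0) FROM product     -- MS SQL Server
SELECT prod_description_s, COALESCE(prod_price_n, 0) FROM product   -- IBM DB2 UDB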

User-Defined Functions
User-defined functions extend the built-in functionality provided with
RDBMS products. They are not a part of the SQL standard, and as
such are out of the scope of this book. The syntax for creating a
function within RDBMS (the CREATE FUNCTION) is similar across all
three vendors; it is created just as any other RDBMS object.
Although not every vendor provides an ability to add custom functions,
all “big three” RDBMS vendors provide this capability with their
RDBMS, thus enabling code reuse (the same functions could be
called by many users/programs).
The user-defined functions are usually implemented in a procedural
language, which includes Java and PL/SQL for Oracle, Transact-SQL
and C (for extended stored procedures) for the Microsoft SQL Server
2000, and Java and IBM SQL for IBM DB2 UDB 8.1. The syntax
vastly differs among these RDBMS.
In addition to the functions created and maintained within RDBMS, all
three vendors provide scripts and executables capabilities for calling
external objects maintained by the operating system. The
mechanisms for invoking external programs are different, but the
principle is the same — provide an ability to communicate with outside
OS, without restrictions imposed by RDBMS framework. Such
programs could be written in any language supported by OS.
The MS SQL Server 2000 also has the ability to invoke and execute
OLE (ActiveX) objects within SQL code inside stored procedures via a
special set of system-stored procedures.

10
Q

SQL Operators

A

Operators in SQL are defined as symbols and keywords that are used
to specify an action to be performed on one or more expressions called
operands or arguments.
There are two general types of operators
- Unary Operators. Applied to only one operand at a time;
the typical format is <operator> <operand>.
- Binary Operators. Applied to two operands at a time; they
usually appear in the format <operand1> <operator> <operand2>.

Arithmetic Operators
These operators, just as the name implies, are used for arithmetic
computations. The use of the arithmetic operators is very intuitive
(assuming that one did not flunk elementary school), and they can be
used in virtually every clause of the SQL statement. The full list of
arithmetic operators is given in Table 11-1.
1. +
Addition; adds two numbers or — in the case of
MS SQL Server — also concatenates strings.
With this exception, the usage is identical across
all three databases. Only MS SQL Server
overloads the operator, using it both for
concatenation and addition.
2. -
Subtraction; subtracts one numeric value from
another. The usage is identical across all three
databases.
It is also used as a sign identity or unary negation
operator.
3. *
Multiplication; multiplies one number by another.
The usage is identical across all three databases.
4. /
Division; divides one number by another. The
usage is identical across all three databases.
5. ||
Concatenation operator; concatenates character
strings; valid for Oracle and IBM DB2 UDB only.
6. %
Modulo; calculates integer remainder of a division.
This is an MS SQL Server-only operator. The
functionality of this operator is represented by the
MOD function in both Oracle and IBM DB2 UDB.

Here are several examples of arithmetic operator usage. To add two
values in Oracle, the following query could be used:
SELECT 5 + 5 total_value FROM dual;

total_value
-----------
10
The resulting TOTAL_VALUE is of a numeric data type; if, instead of
the addition operator, the concatenation operator is used, the result
would be quite different:
SELECT 5 || 5 total_value FROM dual;

total_value
-----------
55
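Similarly, the modulo operator (or the MOD function) returns the integer remainder of a division:
SELECT 17 % 5 remainder                     -- MS SQL Server, returns 2
SELECT MOD(17, 5) remainder FROM dual       -- Oracle, returns 2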

Logical Operators
These operators are used to evaluate some set of conditions, and the
returned result is always TRUE, FALSE, or “unknown.”
1. ALL
Evaluates to TRUE if all of a set of comparisons
are TRUE.
2. AND
Evaluates to TRUE if both Boolean expressions
are TRUE.
3. ANY
Evaluates to TRUE if any one of a set of
comparisons are TRUE.
4. BETWEEN
Evaluates to TRUE if the operand is within a
range.
5. EXISTS
Evaluates to TRUE if a subquery contains any
rows.
6. IN
Evaluates to TRUE if the operand is equal to
one of a list of expressions.
7. LIKE
Evaluates to TRUE if the operand matches a
pattern.
8. NOT
Reverses the value of any other Boolean
operator.
9. OR
Evaluates to TRUE if either Boolean expression
is TRUE.
10. SOME
Evaluates to TRUE if some of a set of
comparisons are TRUE.

Operator Precedence
Precedence represents the order in which operators from the same
expression are being evaluated. When several operators are used
together, the operators with higher precedence are evaluated before
those with the lower precedence. In general, the operators’
precedence follows the same rules as in the high school math, which
might be somewhat counterintuitive. The order of the precedence is
indicated in Table 11-4.
1. Unary operators, bitwise NOT (MS
SQL Server only)
2. Multiplication and division
3. Addition, subtraction, and
concatenation
4. SQL conditions
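For example, in the following Oracle sketch multiplication is evaluated before addition, so the result is 14 rather than 20:
SELECT 2 + 3 * 4 total_value FROM dual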

Assignment Operator
The assignment operator is one of the most intuitive to use. It assigns
a value to a variable. The only confusion in using this operator could
stem from its overloading. All RDBMS overload this operator with an
additional function — comparison — in the SQL.
The equals operator (=) is used as an assignment in the following
SQL query that updates the price (PROD_PRICE_N) column in the
PRODUCT table, raising the existing prices by 2 percent:
UPDATE product
SET prod_price_n = prod_price_n * 1.02

(10 row(s) affected)

Comparison Operators
Comparison operators are used to compare two or more values. They
are usually found in the WHERE clause of a SELECT statement,
though you may use them in any valid SQL expression.
The usage is identical across all three databases except for the
nonstandard operators !< and !> — they are recognized by IBM DB2
UDB 8.1 and MS SQL Server 2000, but are excluded from Oracle 9i.
The nonstandard not equal to operator, !=, could be used in all three
dialects. Table 11-5 lists all comparison operators.
= Equals
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
<> Not equal to
!= Not equal to (nonstandard)
!< Not less than (nonstandard)
!> Not greater than (nonstandard)

Bitwise Operators
Bitwise operators perform bit operations on integer data types; all
bitwise operators introduced into SQL by RDBMS vendors are listed in
Table 11-6. To understand the results of the bitwise operations one
must understand the basics of Boolean algebra.
& Bitwise AND
| Bitwise OR
^ Bitwise exclusive OR
~ Bitwise NOT
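A minimal MS SQL Server sketch (12 is binary 1100 and 10 is binary 1010):
SELECT 12 & 10 and_result,      -- 8  (1000)
       12 | 10 or_result,       -- 14 (1110)
       12 ^ 10 xor_result,      -- 6  (0110)
       ~12 not_result           -- -13 (two's complement)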

User-defined Operators
Oracle 9i allows you to create user-defined operators. They are
created using the CREATE OPERATOR statement and are identified
by names rather than by single characters. Once a user-defined
operator is created it can be used just like any of the built-in operators
with the notable exception that one has to have EXECUTE privilege to
this object.
User-defined operators in Oracle 9i behave very similarly to user-
defined functions, because they take a set of operands as input
and return a value.

11
Q

Data manipulation (insert, update, delete)

A

INSERT: Populating Tables with Data
As you know, tables in a relational database denote entities — or at
least they should. For example, each row in the CUSTOMER table
holds information about a specific customer; a row in
ORDER_HEADER represents a definite order, and so on. Usually, the
appearance of a new “real-life” entity calls for inserting a new row. For
example, you would need a new row in CUSTOMER table if ACME,
Inc. obtained a new customer; you need to insert a row into
ORDER_HEADER table when a customer makes an order; a new row
has to be added to the PRODUCT table if ACME starts selling a new
product, and so on.
The INSERT statement is used to add rows to a table, either directly or
through an updateable view. The syntax differs slightly among SQL99,
Oracle 9i, DB2 UDB 8.1, and MS SQL Server 2000, but it is possible
to come up with some kind of a generic INSERT syntax that works
with all our “big three” databases:
INSERT INTO <table_or_view_name>
[(<column_name>,...)]
{VALUES (<literal> | <expression> | NULL | DEFAULT,...) |
<select_statement>}
We are going to concentrate on the generic INSERT functionality first,
and then describe some of its SQL99 and vendor-specific only
features.

Common INSERT statement clauses
One would typically insert values into one table at a time within one
INSERT statement. (Oracle 9i allows you to perform multitable inserts,
but that is rather advanced functionality not covered in this book.) The
name of the table is provided in the table_or_view_name clause. (An
updateable view name can be given instead of a table name.) The
column_name clause is optional; it determines the order in which
column values of the row to be inserted are populated. It also allows
you to skip values for columns you don’t want to populate at that
particular moment; such columns would either have NULL values or
would be populated with column defaults specified with a CREATE
(ALTER) TABLE statement.
Caution You cannot skip any NOT NULL columns in your
column_name clause; otherwise the RDBMS will give you
an error.
The values to insert can either be specified using the VALUES clause
or the resulting set from the select_statement clause (also called
subquery). In the first case, usually only one row is inserted
(exception is DB2 where you can specify multiple VALUES clauses
within one INSERT statement; more about that in this chapter’s section
about DB2); in the second case, the RDBMS inserts as many rows as
were returned by the select_statement — it can be zero, one, ten, or
one thousand.
You have to list values for all table columns in the VALUES clause in
the exact same order they are specified in the table definition if the
column_name list was omitted.
If the column_name list is present, you have to specify a corresponding
value for each column listed.

Inserting values for specified columns
The situation when you want to insert a row with NULL values for
certain columns is not unusual. As you know, NULL is used when
value is unknown or nonapplicable. For example, suppose you know
ACME starts selling a new product SPRUCE LUMBER 30X40X50,
but certain properties of this product (price and weight) are still
unknown. We can add a record to the PRODUCT table using the
following INSERT statement:
INSERT INTO product
  (prod_id_n, prod_num_s, prod_description_s, prod_status_s,
   prod_brand_s, prod_pltwid_n, prod_pltlen_n)
VALUES
  (990, ‘990’, ‘SPRUCE LUMBER 30X40X50’, ‘N’, ‘SPRUCE LUMBER’, 4, 6)

Inserting values for all columns
Assuming you have all the necessary values and want to populate all
columns by the time you are ready to insert a row, you have a choice.
You can either still list all column names with corresponding values, or
the column_name clause can be completely skipped.

Inserting NULL and default values explicitly
In the examples above we insert literals; when a column name is
skipped, the column is populated with NULL or with the column
default value implicitly. But sometimes you may want to explicitly
insert a NULL value into a column or make the RDBMS use the
column’s default value. That can be accomplished by using keywords
NULL or DEFAULT, correspondingly.
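A hedged sketch (the product ID 991 and the literal values are made up, and it assumes PROD_STATUS_S has a column default defined):
INSERT INTO product
  (prod_id_n, prod_num_s, prod_description_s, prod_status_s,
   prod_brand_s, prod_price_n, prod_pltwid_n, prod_pltlen_n)
VALUES
  (991, '991', 'PINE LUMBER 20X30X40', DEFAULT, 'PINE LUMBER', NULL, 4, 6)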

Inserting values selected from other tables
A typical (but not the only) situation when you may want to insert
values selected from other tables is when archiving on a periodic
basis. For example, a table that holds shipments for a large company
can grow dramatically over long periods. Assume a company that has
a business rule stating it usually does not need any information about
shipments that are older than 180 days on a regular basis, but still
may need the old shipment data occasionally for special cases like
auditing, etc. To improve performance, we can create the table
SHIPMENT_ARCHIVE with exactly the same columns as in
SHIPMENT (see Chapter 4 for details), and then, say once a month,
insert into SHIPMENT_ARCHIVE all rows from SHIPMENT that are
older than 180 days. The old records can then be removed from the
SHIPMENT table using the DELETE statement (discussed later in this
chapter). This statement archives shipment records older than 180
days using Oracle syntax:
INSERT INTO shipment_archive
SELECT *
FROM shipment
WHERE TRUNC(shipment_createdate_d) < TRUNC(SYSDATE) - 180;

INSERT statement and integrity constraints
Inserting rows into a table obeys certain rules and restrictions. For
example, all column values have to be of the same or at least compatible
data types and sizes with corresponding column definitions. There are
some implementation-specific variations — for example, Oracle
performs implicit conversions whenever possible (from character
string data types to numeric, from dates to strings, etc.), and in DB2
you always have to explicitly convert values to a compatible data type
— but in general there is no RDBMS that would allow you to insert a
customer name into a numeric or date column. An error will be
generated and the whole row (or even multiple rows) is rejected.

UPDATE: Modifying Table Data
The UPDATE statement serves the purpose of modifying existing
database information. We can emphasize two general situations when
we need to change data.
Sometimes when you insert rows into a table, you don’t know all the
information yet (that’s where NULL values come in handy); later on,
when the information becomes available, you can update the
appropriate row(s). For example, you may want to create a new
customer before you know who the customer’s salesperson is going
to be, or generate a new order line entry with an unknown shipped
quantity. (There is no way to know what this quantity is before the
order is actually shipped.) Another case when you may need to
change database information is when you have to reflect some
changes in the “real world.” For example, a customer could cancel an
order, so you would have to change the order status from COMPLETE
to CANCELLED; a customer might accumulate “bad debt,” so you
would want to put his credit on hold. (In the ACME database that
would mean to change CUST_CREDHOLD_S field to Y.) The UPDATE
statement is used to modify table data; again, as with the INSERT
statement discussed earlier, either directly or through an updateable
view. Here is the generic syntax for our “big three” databases:
UPDATE <table_or_view_name>
SET {<column_name> = <literal> | <expression> |
(<single_row_select_statement>) | NULL | DEFAULT,...}
[WHERE <predicate>]
The UPDATE statement allows you to update one table at a time.
Other than that, it provides great flexibility on what set of values you
are updating. You could update single or multiple columns of a single
row or multiple rows, or (though it is rarely employed) you could
update each and every column of all table rows. The granularity is
determined by different clauses of the UPDATE statement.
Note The situation when no rows were updated because there
were no rows in the table that satisfied the WHERE clause
condition is not considered an error by RDBMS. The same
stands for the DELETE statement (discussed in the next
section). When no rows satisfy the WHERE clause, no rows
are updated and no error is generated.
We are now going to discuss the most common clauses of the generic
UPDATE.

Common UPDATE statement clauses
The name of the table (or an updateable view) to be updated is
provided in the table_or_view_name clause.
In the SET clause, you specify the name of the column to update and
the new value to be assigned to it. You can specify multiple
corresponding column/value pairs separated by commas. The
assignment values could themselves be the same as in the VALUES
clause of the INSERT statement (i.e., literals, expressions, nulls,
defaults, etc.).
The WHERE clause sets your “horizontal” limits — if in the SET clause
you specified what columns to update, now you have to define a
condition upon which some rows need to be updated.

Updating a single column of a single row
One of the most common update situations is when you need to
modify just one column of a single row. Assigning a salesperson to a
recently created customer, canceling an order, changing a product
price — these are all examples of such a procedure. The following
example assigns a price to product 990 that we previously created
when we discussed INSERT in this chapter:
UPDATE product
SET prod_price_n = 18.24
WHERE prod_id_n = 990

Updating multiple columns
Sometimes you might want to update more than one column within
one UPDATE statement. For example, imagine the manufacturer has
changed the packaging for its product 990, so the dimensions are now
5×7 instead of 4×6. This update statement synchronizes the
database information with the real-world change:
UPDATE product
SET prod_pltwid_n = 5, prod_pltlen_n = 7
WHERE prod_id_n = 990

Updating a column in all rows
Even though updating all table rows is not very typical (and often
undesirable), sometimes you might want to perform such an
operation. Giving all employees a 5 percent raise, inactivating all
customers, setting all column values to NULL — these are a few
common examples. As you could have noticed, the keyword here is
“all.” In other words, we would only want to omit the WHERE clause
intentionally if we wanted to update each and every single row in the
target table.
The UPDATE statement below increases all product prices by 10
percent (ACME, Inc. struggles with the increased operation costs):
UPDATE product SET prod_price_n = prod_price_n * 1.1

Updating column using a single-row subquery
You can use the result of a SELECT statement (subquery) as an
assignment value in an UPDATE statement. The main thing to
remember is that your subquery must return no more than one row. (If
no rows are returned, the NULL value will be assigned to the target
column.) Also, according to SQL99 standards, only one expression
can be specified in the select list.

Deriving the assignment value from another value
There are many situations when using a subquery as an assignment
value is beneficial. For example, you want to change the payment
terms for order 30670 to be N21531 in the ACME database. The
problem is, in our relational database we do not store the actual value
N21531 in an ORDER_HEADER table column; instead, we use the
foreign key, which is a meaningless integer, from the
PAYMENT_TERMS table. Using a subquery helps us to accomplish
the task:
UPDATE order_header
SET ordhdr_payterms_fn =
    (SELECT payterms_id_n
     FROM payment_terms
     WHERE payterms_code_s = ‘N21531’)
WHERE ordhdr_id_n = 30670

Update with correlated subquery
The previous example was relatively straightforward — you derived
the value you needed for the update from another given value. But
sometimes conditions are more complicated. For example, imagine
that ACME’s business rules have changed and no longer allow orders
to have payment terms different from the default payment terms of a
customer who placed the order. You can relate (join) a table from the
UPDATE clause with tables specified in the assignment subquery —
that pretty much overrides the “single-row” rule because the
assignment will be done on a row-by-row basis:
UPDATE order_header
SET ordhdr_payterms_fn =
    (SELECT payterms_id_n
     FROM payment_terms, customer
     WHERE payterms_id_n = cust_paytermsid_fn
       AND ordhdr_custid_fn = cust_id_n)
The very last line of this subquery joins the ORDHDR_CUSTID_FN column
of the ORDER_HEADER table (UPDATE statement) with the
CUST_ID_N column of the CUSTOMER table (nested subquery); in other
words, the customer ID column is the link between the UPDATE
statement and the subquery that correlates them.

UPDATE statement and integrity constraints
Updating table rows obeys rules and restrictions similar to ones with
INSERT statement. All column values have to be of the same or
compatible data types and sizes with corresponding column
definitions and no integrity constraints should be violated. There is a
slight difference in behavior with the referential integrity constraints —
when constraint is specified with ON UPDATE CASCADE or ON
UPDATE SET NULL, RDBMS successfully performs an update of the
target table; child tables’ columns are also updated with the
corresponding values.

DELETE: Removing Data from Table
Believe it or not, there is such a thing as too much data. Since
computerized databases were introduced, humankind has
accumulated petabytes of data. We are drowning in it, and DELETE
provides a way to get rid of the information that is no longer needed.
Deleting database rows is usually necessary when the entity these
rows correspond to becomes irrelevant or completely disappears from
the real world. For example, an employee quits, a customer does not
make orders any more, or shipment information is no longer needed.
Note
Quite often the entity information is not removed from the
database right away. When an employee quits, the HR
department still keeps the historical information about this
employee; when a customer is inactive for a long time, it is
sometimes more logical to “inactivate” him rather than delete
— in the relational database deleting a customer usually
involves much more than just removing one CUSTOMER
table record — it would normally have referential integrity
constraints from other tables, the other tables in their turn
would be referenced by more tables, and so on. There is a
way to simplify the process by using the ON DELETE
CASCADE clause, but that’s not always exactly what you
want. Even when you don’t need, say, information for a
certain customer any more, you still may want to be able to
access the data about orders and invoices for this customer,
and so on.
You may also want to delete rows when they were inserted by mistake
— for example, an order was taken twice, or a duplicate customer
record was created. Situations like those are not atypical at all,
especially for large companies where dozens of clerks take orders
and create new customers.
Tip
Good database design can help to reduce the number of
human errors. For example, putting unique constraint on the
customer name field could help in preventing duplicate
customers (not completely foolproof, though, because
RDBMS would still treat “ACME, INC.” and “ACME INC.” as
two distinct strings).
The DELETE statement removes rows from a single table (either
directly or through an updateable view). The generalized syntax is
DELETE FROM <table_or_view_name> [WHERE <predicate>]
DELETE removes rows from one table at a time. You can delete one
or many rows using a single DELETE statement; when no rows in the
table satisfy the condition specified in the WHERE clause, no rows are
deleted, and no error message is generated.

Common DELETE statement clauses
The DELETE statement is probably the simplest out of all DML
statements. All you need to specify is the table you want to remove
rows from and (optionally) upon what criteria the rows are to be
deleted. The syntax simplicity should not mislead you — DELETE
statements can be quite complicated and require caution. If the
WHERE clause is omitted or malformed, valuable information could be
deleted from the target table. Quite often the results are not exactly
what you wanted, and the data restoration process could be painful
and time consuming.
The statement below deletes a salesman record from the SALESMAN
table:
DELETE FROM salesman WHERE salesman_code_s = ‘02’
This statement deletes all records from the PHONE table:
DELETE FROM phone

DELETE statement and integrity constraints
The DELETE statement is not as restrictive as INSERT and UPDATE
in terms of integrity constraints. PRIMARY KEY, UNIQUE, NOT
NULL, or CHECK constraints would not prevent you from deleting a
row. The referential integrity constraints are a different story — you
would not be able to delete a row that contains a column referenced
by another column unless the referential integrity constraint has the
ON DELETE CASCADE option (SQL99 standard implemented by all
“big three” RDBMS vendors). In that case DELETE would succeed; all
rows in any tables that referenced the constrained column would also
be deleted. This behavior can be extremely dangerous, especially in
combination with a badly constructed WHERE clause, so it is
considered to be a good practice to use ON DELETE CASCADE with
care. Imagine the situation where you have a table CUSTOMER
referenced by a table ORDER, which in its turn is referenced by a table
SHIPMENT. If ON DELETE CASCADE is used in both relationships,
when you delete a customer record, all related order and shipment
records are also gone. Just imagine what would happen if you also
skipped the WHERE clause! Figure 6-3 illustrates this example.

Using subqueries in DELETE statement WHERE
clause
Similarly to UPDATE statement, in addition to comparison
operators, literals, and expressions, the WHERE clause in DELETE
statements can contain a subquery to allow the selection of rows to
be deleted based on data from other tables. The idea is very similar
to the one explained in the section about the SET clause of the UPDATE
statement — using a subquery you derive value(s) based on some known
value(s). For example, you want to delete all orders for customer
WILE SEAL CORP., but you don’t know the value of its primary key
(which is a foreign key in the ORDER_HEADER). You can
accomplish the task using the appropriate subquery in the WHERE
clause:
DELETE FROM order_header
WHERE ordhdr_custid_fn =
(SELECT cust_id_n
FROM customer
WHERE cust_name_s = ‘WILE SEAL CORP.’)

12
Q

Retrieving data (simple select statement)

A

This chapter covers selecting data from the RDBMS tables using the
SELECT statement. As the name implies it deals with selecting data
from the RDBMS objects — tables or views — either to be presented
to the users, or for some internal purpose. This is the only statement
of the data query language (DQL) group.
The use of this statement within a SELECT query is relatively simple,
but the SELECT statement rarely executes without clauses, and that’s
where the fun begins. The select query clauses are probably the most
confusing in the SQL and have to be dealt with accordingly. This
chapter introduces the topic, covering use of subqueries, compound
operators, and aggregate function clauses.

Single Table SELECT Statement Syntax
Here is the generic SELECT statement, as it is defined by the SQL99
standard, for selecting data from a single table. The query includes
the SELECT command, followed by the list of identifiers (table or view
columns); then comes the mandatory FROM clause that contains
names of the tables, from which these columns are selected. The rest
of the clause is optional, used to increase selectiveness of the query,
as well as add some ordering capabilities. All of these pieces make up
the complete SELECT statement.
SELECT [DISTINCT] [<qualifier>.]<column_name> |
       * |
       <expression>
       [AS <column_alias>],...
FROM <table_or_view_name> |
     <inline_view>
     [[AS] <alias>]
[WHERE <predicate>]
[GROUP BY [<qualifier>.]<column_name>,...
   [HAVING <predicate>]
]
[ORDER BY <column_name> |
          <column_number>
          [ASC | DESC],...
];

SELECT Clause: What Do We Select?
In the relational databases the SELECT statement selects values in
the columns, literal values, or expressions. The returned values
themselves could be of any valid data types. These values can be
displayed in the client application, or written into a file, used in the
intermediate calculations, or entered into database tables.
Earlier in this chapter, we’ve mentioned that the FROM clause of the
SELECT statement is mandatory, whereas all other clauses are
optional. This still holds true for the SQL99 standard, though the rules
are somewhat more relaxed with the vendor-specific implementations.

Single-column select
You can select as many or as few columns as you wish from a table
(or a view) for which you have SELECT privileges (see Chapter 12 on
SQL security). The following example selects only a customer name
(column CUST_NAME_S from the table CUSTOMER in the ACME
database).
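The query itself could be as simple as:
SELECT cust_name_s
FROM customer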

Multicolumn SELECT
A single column SELECT, while useful, is certainly not the limit of the
SQL capabilities. It’s very likely that you’ll be selecting more than one
column at a time in your queries.
Selecting several columns
The following query selects three columns at the same time from the
same CUSTOMER table in the ACME database:
SELECT cust_id_n,
cust_status_s,
cust_name_s
FROM customer

As in the case with a single column SELECT, the result returned is a
set of values, with a distinction that it is rather a set of sets — one set
for each column mentioned in the SELECT statement. The sequence
in which these sets appear directly corresponds to the sequence in
which the column names were mentioned in the SELECT clause.
Note
It is possible to select a column more than once within a
single query. The result will simply be duplicate sets of
values.

Selecting all columns
Selecting all columns in the table could be achieved by listing every
single column from the table in the SELECT clause of the query, or
using the convenient shortcut — asterisk (*) — provided by the
SQL99 standard and implemented by virtually every RDBMS on the
planet.
SELECT *
FROM status

Selecting all columns plus an extra column
In a relatively rare case where you need to select all the columns from
the table and a duplicate of a column(s), you may do so in all three
RDBMS, but this is where vendors’ implementations differ from the
SQL99 standard as well as from each other.
This may seem a superfluous feature, but imagine a situation when
you need to see all 200 rows in the table and wish to change the
default sequence in which the columns appear in the final resultset.
You have an option to list all the columns in the desired order (i.e.,
typing in all the columns in the SELECT clause of your query) or to do
it the easier way at the expense of having a duplicate set of values —
but in the place where you would rather see it. Of course, you may
include any or all columns of the table in a single resultset more than
once.

Selecting distinct values
As you become more selective in terms of what data is expected to be
returned by a query, the need may arise to eliminate duplicates. The
SQL99 standard provides an easy and elegant way to eliminate any
duplicate values from the final resultset.
The table PAYMENT_TERMS in the ACME database contains data
about discounts, in terms of percentages, given to customers. While
the particulars of each discount might differ, it is quite conceivable that
the actual percentage might be the same.
The following example selects all the rows for the
PAYTERMS_DISCPCT_N column from that table:
SELECT payterms_discpct_n
FROM payment_terms
PAYTERMS_DISCPCT_N
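The query above returns every row, duplicates included; adding the DISTINCT keyword collapses them:
SELECT DISTINCT payterms_discpct_n
FROM payment_terms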

Using subqueries in a SELECT clause
The concept of a subquery is simple — it is a query within a query that
supplies necessary values for the first query. A SELECT query could
have an embedded subquery as a way to retrieve unknown values,
and the nesting level (how many subqueries you could have within
each other) is limited only by the RDBMS capability.
To illustrate the concept, some preliminary work is required (since the
required table is not within the ACME database). Let’s say that in full
accordance with database design guidelines you have created a table
that contains state tax amounts for each state, as well as each state’s
full name and two-letter postal abbreviation.
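Assuming such a table is named STATE with columns STATE_ABBREV and STATE_TAX (both names are hypothetical), a scalar subquery could be embedded directly in the SELECT list:
SELECT cust_name_s,
       (SELECT state_tax
        FROM state
        WHERE state_abbrev = 'NY') ny_tax_rate
FROM customer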

FROM Clause: Select from What?
The FROM clause is mandatory for every SELECT statement — with
the exception of the MS SQL Server case discussed earlier in this
chapter. The database objects you should be able to select from are
tables and views. These come in many flavors — temporary tables,
inline views, materialized views, to name just a few — but the truth is
that there is nothing else in the RDBMS world to select from.

Selecting from tables and views
We’ve used tables and views as “selectable from” objects in the
examples given earlier in this chapter (and all previous chapters). In
fact, the only truly selectable object in RDBMS is a table, while view is
a query that is based on some table (or tables). Unlike a table, view,
by definition, does not contain data but rather collects it from the base
tables whenever a SELECT query is executed against the view.

Using aliases in a FROM clause
It is possible to alias the table names listed in the FROM clause of a
SELECT query to shorten notation and make it more visual by
prefixing the columns in a different clause with the table alias. Here is
an example of selecting three columns from a STATUS table in the
ACME database where the table is aliased with s:
SELECT status_id_n, s.status_code_s, s.status_desc_s FROM status s
The columns in the SELECT clause may or may not be prefixed with
the table’s alias (or table name); moreover, the columns themselves
could be aliased for readability purposes, replacing somewhat cryptic
column names with more meaningful ones (see the paragraph earlier in
this chapter). If such prefixes are used, they follow the
<alias>.<column_name> notation, shown in the example above.
While not being very useful in the case of a single table selection, it
simplifies queries when more than one table is involved, and helps to
remove confusion should two or more tables have identically named
columns. Please refer to Chapter 9 for more information on multitable
queries.

Using subqueries in a FROM clause (inline views)
We have discussed the VIEW database object in Chapter 4 and
Chapter 5. Here we are going to introduce so-called inline views.
Unlike the VIEW object, the inline views do not exist outside the query
that contains them, and may or may not have a proper name for
themselves. Consider the following statement that selects customer’s
ID, name, and status from an inline view CUST:
SELECT cust.id, cust.cust_name_s, cust.active
FROM (SELECT cust_id_n AS id,
             cust_status_s AS active,
             cust_name_s,
             cust_alias_s AS alias,
             cust_credhold_s AS hold
      FROM customer) cust

ID  CUST_NAME_S                   ACTIVE
--- ----------------------------- ------
51  DLH INDUSTRIES                Y
5   FAIR AND SONS AIR CONDTNG     Y
12  KILBURN GLASS INDUSTRIES      Y
61  BOSWELL DESIGNS CORP.         Y
55  WILE ELECTROMATIC INC.        Y
6   FABRITEK INC.                 Y
...
16  DALCOMP INC.                  Y
89  INTEGRATED POWER DESIGNS      Y
85  GUARDIAN MANUFACTURING INC.   Y
152 WILE BESS COMPANY             Y

37 record(s) selected.
Note that the outer SELECT clause refers to the columns selected
from the inline view by their alias — because of the way these
columns are exposed to it; replacing CUST.ID with CUST.CUST_ID_N
would generate an Invalid column name error, since that is not the
name that outer query could reference. At the same time, the column
CUST_NAME_S could be used the way it is mentioned in the
subquery because it was not aliased.

WHERE Clause: Setting Horizontal Limits
While selecting everything a table or view could contain might be of
value for some operations, most of the time you will be looking for
specific information — a person with a particular phone number, data
falling into a certain date range, and so on. The table might contain
several million rows, and you simply have no time to search for the
information all by yourself. The SQL WHERE clause provides a
mechanism for setting horizontal limits; specifically, it allows you to
limit the number of rows in resultsets returned by a query through
specifying some condition or set of conditions. Depending on what
conditions you have specified with your query, there might be zero,
one, or more records (rows) returned. The search criteria specified in
the WHERE clause evaluate to TRUE or FALSE, and all the rules of
Boolean algebra are fully applicable there.

Using comparison operators
To specify conditions in the WHERE clause, SQL employs a number of
operators. These are discussed in detail in Chapter 11. Here, we are
going to touch them only briefly.
Consider the following query run against the ACME database in
Microsoft SQL Server (the syntax and results would be identical in all
“big three” databases). It returns some information about a particular
customer, uniquely identified by the customer ID field CUST_ID_N.
The uniqueness of the customer ID (not that of the address record)
value is enforced by the primary key constraint as well as a UNIQUE
constraint placed onto the column in the table CUSTOMER. (There
could be only one customer associated with any given ID.)
SELECT cust_id_n,
cust_name_s,
cust_status_s
FROM customer
WHERE cust_id_n = 7

Compound operators: Using AND and OR
There could be more than one criterion specified with the query. For
example, you may want to retrieve all the phone salespersons in your
company that are not assigned to a customer. In the table, there are
phone numbers and fax numbers; to eliminate the latter, the following
query could be used:
SQL> SELECT phone_salesmanid_fn,
phone_phonenum_s,
phone_type_s
FROM
phone
WHERE
phone_custid_fn IS NULL
AND
phone_type_s = ‘PHONE’

Using the BETWEEN operator
While it is possible to use a combination of >= (greater than or equal
to) and <= (less than or equal to) operators to achieve exactly the
same results, the BETWEEN operator provides a more convenient
(and often more efficient) way for selecting a range of values.
SELECT prod_description_s,
prod_price_n
FROM product
WHERE prod_price_n BETWEEN 23.10 AND 30

Using the IN operator: Set membership test
When there is more than one exact criterion for the WHERE clause,
and these criteria do not fit any range of values, you may use an OR
statement. Consider the following query:
SELECT cust_name_s,
cust_credhold_s
FROM customer
WHERE cust_alias_s = ‘MNGA71396’ OR
cust_alias_s = ‘MNGA71398’ OR
cust_alias_s = ‘MNGA71400’
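The same condition can be written more compactly with the IN operator:
SELECT cust_name_s,
       cust_credhold_s
FROM customer
WHERE cust_alias_s IN ('MNGA71396', 'MNGA71398', 'MNGA71400')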

The NOT operator
The NOT operator negates the result of another operator, making it
perform a search for results exactly opposite to those specified.
Any of the operators and queries discussed to this point could have
produced opposite results if NOT was used. The following example
returns all the results that do not match the specified criteria — having
the name with the second letter I, third L, and fourth E; only records
that do not have such a sequence starting from the second position
within the company name are selected:
SQL> SELECT cust_name_s
FROM customer
WHERE cust_name_s NOT LIKE ‘_ILE%’

Using the IS NULL operator: Special test for NULLS
We have mentioned before that relational databases are using a
special value to signify the absence of the data in the database table
column — NULL. Since this value does not comply with the rules that
all the other values follow (e.g., comparison, operations, etc.), they
cannot be detected with the equation/comparison operator =; i.e., the
syntax WHERE <column_name> = NULL, while being technically valid
in Oracle or DB2 UDB (and valid in Microsoft SQL Server 2000 under
certain circumstances), would never yield any data because the
condition never evaluates to TRUE.
The test for NULL is performed with the IS keyword, as in the
example below, which retrieves salesmen’s phone records that are not
assigned to any customer (their PHONE_CUSTID_FN is NULL).
SELECT phone_salesmanid_fn,
phone_phonenum_s,
phone_type_s
FROM phone
WHERE phone_custid_fn IS NULL

Using subqueries in a WHERE clause
As in the SELECT clause, the subqueries could be used with the
WHERE clause to provide missing values (or a set of values). For
example, you cannot find information from the ORDER_HEADER table
using a customer’s name only, because the ORDER_HEADER table
contains customer IDs, not the names; thus, the customer ID could be
found in the table CUSTOMER using the customer name as a
criterion, and then used to select values from the ORDER_HEADER
table:
SELECT ordhdr_nbr_s,
ordhdr_orderdate_d
FROM order_header
WHERE ordhdr_custid_fn =
(SELECT cust_id_n
FROM customer
WHERE cust_name_s = ‘WILE ELECTRONICS INC.’)

Nested subqueries
A subquery could host a subquery in its turn. These are called nested
subqueries. There is no theoretical limit on the nesting level — i.e.,
how many times there could be a query within a query — though
some vendors limit it. A subquery is an expensive way (in terms of
computer resources) to find out information and should be used
judiciously.
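A hedged sketch nesting one subquery inside another (it finds order lines for a customer known only by name):
SELECT ordline_ordhdrid_fn,
       ordline_ordqty_n
FROM order_line
WHERE ordline_ordhdrid_fn IN
      (SELECT ordhdr_id_n
       FROM order_header
       WHERE ordhdr_custid_fn =
             (SELECT cust_id_n
              FROM customer
              WHERE cust_name_s = 'WILE ELECTRONICS INC.'))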

GROUP BY and HAVING Clauses:
Summarizing Results
Grouping records in the resultset based on some criteria could
provide a valuable insight into data that has accumulated in the table.
For example, you would like to see the final resultset of your orders
(where there could be one or more order items per order) not in the
random order they were entered in, but rather in groups of items that
belong to the same order:
SELECT ordline_ordhdrid_fn,
ordline_ordqty_n AS QTY_PER_ITEM
FROM order_line
GROUP BY ordline_ordhdrid_fn,
ordline_ordqty_n;
Note the repeating values (groups) in the ORDLINE_ORDHDRID_FN
field, representing the order header ID for which there could be one or
more order items and for which there are different quantities. This
information might become more concise with the use of aggregate
functions that could sum the quantity for the order or calculate the
average, and so on (see the example in this paragraph using SUM
function).
The GROUP BY clause is mostly (but not always) used in conjunction
with aggregate functions, which are introduced in Chapter 10. The
aggregate functions return a single value as a result of an operation
conducted on a set of values. The set is grouped to provide a series
of sets for use with the aggregate functions.

The HAVING clause used exclusively with the GROUP BY clause
provides a means of additional selectivity. Imagine that you need to
select not all records in your GROUP BY query but only those that
would have their grouped value greater than 750. Adding additional
criterion to the WHERE clause would not help, as the value by which
we could limit the records is calculated using GROUP BY and is
unavailable outside it before the query has completed execution. The
HAVING clause used within the GROUP BY clause allows us to add
this additional criterion to the results of the GROUP BY operation. For
example, to display orders with a total quantity greater than 750, the
following query could be used:
SELECT ordline_ordhdrid_fn,
SUM(ordline_ordqty_n) TOT_QTY_PER_ORDER
FROM
order_line
GROUP BY ordline_ordhdrid_fn
HAVING SUM(ordline_ordqty_n) > 750

ORDER BY Clause: Sorting Query Output
The query returns results matching the criteria unsorted — i.e., in the
order they’ve been found in the table. To produce sorted output —
alphabetically or numerically — you would use an ORDER BY clause.
The functionality of this clause is identical across all “big-three”
databases.
The following query sorts the output by the customer name
alphabetically in ascending order:
SQL> SELECT cust_name_s,
cust_alias_s,
cust_status_s
FROM
customer
ORDER BY cust_name_s;
The results could be sorted in either ascending or descending order.
To sort in descending order, you must specify keyword DESC after the
column name; to sort in ascending order you may use ASC keyword
(or omit it altogether, as it is done in the above query, since ascending
is the default sorting order).
It is possible to sort by more than one column at the same time,
though results might not be as obvious. The precedence of the
columns in the ORDER BY clause is of importance here: First results
are sorted by the values of the first column, then — within the order
established by the first column — the results will be sorted by the
second column values.
It is even possible to specify different orders — ascending or
descending for the different columns listed in the ORDER BY clause.
The following example orders the records from the PRODUCT table
first by the price in ascending order, then by the net weight — in
descending order for each price value:
SELECT prod_id_n,
prod_price_n,
prod_netwght_n
FROM product
ORDER BY prod_price_n ASC,
prod_netwght_n DESC

Combining the Results of Multiple Queries
It is possible to produce a single result combining the results of two or
more queries. The combined resultset might be a simple aggregation
of all records from the queries; or some operation related to the theory
of sets (see Appendix L) could be performed before the final resultset
was returned.
The SQL99 standard supports UNION, INTERSECT, and EXCEPT
clauses that could be used to combine the results of two or more
queries.

UNION
The following query returns all the records containing some
information about customers that do not yet have an assigned
salesman:
SELECT phone_custid_fn OWNER_ID,
‘CUSTOMER PHONE’ PHONE_TYPE,
phone_phonenum_s
FROM phone
WHERE phone_type_s = ‘PHONE’
AND phone_salesmanid_fn IS NULL
This query returns a total of 37 records. Now, assume that you also
would like to include in the resultset the phones of salesmen who
do not have a customer assigned to them yet. Here is the query to
find these salesmen; it returns six records:
SELECT phone_salesmanid_fn,
‘SALESMAN PHONE’,
phone_phonenum_s
FROM
phone
WHERE
phone_type_s = ‘PHONE’
AND
phone_custid_fn IS NULL
To combine these records into a single resultset, you would use the
UNION statement
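Combining the two queries shown above (with the optional ORDER BY applied to the final resultset) could look like this:
SELECT phone_custid_fn OWNER_ID,
       'CUSTOMER PHONE' PHONE_TYPE,
       phone_phonenum_s
FROM phone
WHERE phone_type_s = 'PHONE'
  AND phone_salesmanid_fn IS NULL
UNION
SELECT phone_salesmanid_fn,
       'SALESMAN PHONE',
       phone_phonenum_s
FROM phone
WHERE phone_type_s = 'PHONE'
  AND phone_custid_fn IS NULL
ORDER BY 3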
Now you have a full list that includes all records from the query about
customers, combined with the results brought by the query about
salesmen. You may visualize this as two resultsets glued together. All
queries in an SQL statement containing a UNION operator must have
an equal number of expressions in their lists. In addition, these
expressions (which could be columns, literals, results of functions,
etc.) must be of compatible data types: For example, if the expression
evaluates to a character string in one query, it cannot be a number in
the second query that is joined to the first by the UNION operator.
The results of UNION could be ordered (as we can see in the UNION
query above) but the ORDER BY clause could be used only with the
final resultset — that is, it can refer to the result of the UNION, not to
particular queries used in it.
If the queries potentially could bring duplicate records, you may want
to filter the duplicates, or, conversely, make sure that they all are
present. By default, the UNION operator excludes duplicate records;
specifying UNION ALL makes sure that your final resultset has all the
records returned by all the queries participating in the UNION.

INTERSECT
The INTERSECT operator is used to evaluate results returned by two
queries but includes only the records produced by the first query that
have matching ones in the second. This operator is implemented in
Oracle and IBM DB2 UDB but not in Microsoft SQL Server 2000,
which uses the EXISTS operator for this purpose.
Consider the query that selects customer IDs (field CUST_ID_N) from
the CUSTOMER table of the ACME database and intersects them with
results returned by a second query, producing a resultset of
customer’s IDs who placed an order:
SELECT cust_id_n
FROM customer
INTERSECT
SELECT ordhdr_custid_fn
FROM order_header

EXCEPT (MINUS)
When combining the results of two or more queries into a single
resultset, you may want to exclude some records from the first query
based on what was returned by the second. This keyword is
implemented in IBM DB2 UDB only, whereas Oracle uses the MINUS
keyword and Microsoft SQL Server 2000 uses EXISTS for the same
purpose.
The EXCEPT result will be all the records from the first minus those
returned by the second.
SELECT cust_id_n
FROM customer
MINUS
SELECT ordhdr_custid_fn
FROM order_header;

13
Q

Joins understanding

A

In the previous chapter we introduced the simple, or single-table
query. However, you can hardly find a relational database with just
one table — that contradicts the whole idea of RDBMS and
normalization rules. To achieve meaningful results you usually have to
retrieve information from multiple tables within a single query. All
RDBMS allow you to join data from two or more tables based on a
common column (or multiple columns), i.e., when this column(s)
appears in both tables — under the same or a different name; for
example (having ACME database in mind), ORDER_HEADER table
could be joined with CUSTOMER using the ORDHDR_CUSTID_FN
and CUST_ID_N columns.
All joins can generally be divided into two large groups — inner joins
and outer joins.

Inner Joins
In this section we’ll be talking about inner joins, which only return rows
with matching values from both joined tables excluding all other rows.
Inner join options
With an inner join, you have these options:
- Keyword INNER is optional; it could be used for clarity to distinguish between inner and outer joins.
- Keyword NATURAL is used to specify a natural join between
two tables, i.e., join them by column(s) with identical names.
- You cannot invoke either the ON or USING clause along with
the NATURAL keyword. Out of all our “big three” RDBMS, it is
available only in Oracle 9i. The natural join is discussed in
more detail later in this chapter.
- Keyword CROSS is used to produce a cross join, as discussed
later in this chapter. The keyword is valid for Oracle 9i and MS
SQL Server 2000 syntax but is not recognized by DB2 UDB
8.1.

ON and USING clauses
The ON clause is used to specify the join condition (equijoin or nonequijoin,
explained later in the chapter); all our “big three” databases have it in
their syntax.
When you are specifying an equijoin of columns that have the same
name in both tables, a USING clause can indicate the column(s) to be
used. You can use this clause only if the join columns in both tables
have the same name. The USING clause could be used when the
NATURAL join would not work properly, i.e., tables have more
identically named columns than you would actually want to use in your
join. For example, if hypothetical tables A and B have common fields
CUST_ID, PROD_ID, and ORDER_ID, but you only want to join them
by CUST_ID and PROD_ID, you could specify … FROM A JOIN B
USING (CUST_ID, PROD_ID) …
Note The column names in the USING clause should not be qualified
by table names.
The USING clause is only supported by Oracle 9i out of our three
vendors, but the same results could easily be achieved with ON
clause: … FROM A JOIN B ON A.CUST_ID = B.CUST_ID AND
A.PROD_ID = B.PROD_ID …

Equijoin
Equijoin, which is the most popular type of table join, uses exact
matching between two columns. For example, in the ACME database
the CUSTOMER table does not contain any information about
customer phone numbers; the PHONE table has the field
PHONE_CUSTID_FN, which is the foreign key to CUSTOMER table.
The SQL99-compliant syntax to produce the output shown on the
bottom of Figure 9-1 is given here:
SELECT cust_id_n,
       cust_name_s,
       phone_phonenum_s,
       phone_type_s
FROM customer JOIN phone
     ON cust_id_n = phone_custid_fn

Natural join
Natural join is a special case of equijoin performed by RDBMS on
the following assumption: “Always join tables using a column (or
multiple columns) if they share the same name(s).” In other words,
you don’t have to specify the columns explicitly in the ON
subclause of the SELECT statement’s FROM clause. The ON
subclause is omitted completely.
The natural join can make the query writing process easier, but
only assuming the database structure and the query itself are
simple enough; otherwise undesirable joins and incorrect query
output could result.
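A hedged Oracle 9i sketch with two hypothetical tables EMPLOYEE and DEPARTMENT that share only a DEPT_ID column:
SELECT emp_name, dept_name
FROM employee NATURAL JOIN department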

Nonequijoin
Sometimes you need to join tables based on criteria other than
equality. While the most typical use of equijoin deals with
primary/foreign key relationships, that is not usually the case with
nonequijoins — such a join would usually populate the resulting set in
a way that does not make much sense at all. For example, if you
replace the = (equals) operator in the query from the last section with
<> (not equal), the resulting query will return every possible
combination of customers and phone numbers except the ones that
actually represent the valid customer/phone listings (Figure 9-2); in
other words, the result will be somewhat similar to the Cartesian
product (CROSS JOIN) discussed later in this chapter.
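Here is that nonequijoin spelled out; it is the equijoin query from the previous section with the operator replaced, and, as noted, its output is rarely meaningful:

SELECT cust_id_n,
       cust_name_s,
       phone_phonenum_s,
       phone_type_s
FROM customer
JOIN phone
  ON cust_id_n <> phone_custid_fn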

Self-join
The idea of self-join is probably one of the most unintuitive SQL
concepts. Even though it sounds very simple — a table is joined with
itself rather than with another table — sometimes it causes lots of
confusion.
One important thing to understand here is that despite the fact that
you are joining the table with itself, you are still dealing with two
instances of the same table, or with two identical tables rather than
with just one table, so the self-join should be considered as a special
case of the multitable join.
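For instance, a minimal sketch of an inner self-join on the RESELLER table used later in this chapter (the aliases r and s are arbitrary) pairs each reseller with its supplier:

SELECT r.reseller_name_s AS res_name,
       s.reseller_name_s AS sup_name
FROM reseller r
JOIN reseller s
  ON r.reseller_supplier_id = s.reseller_id_n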

Cross join (Cartesian product)
Cross join, or the Cartesian product of two tables, can be defined as
another (rather virtual) table that consists of all possible pairs of rows
from the two source tables. Returning to our customer phone example
in the nonequijoin section of this chapter, the cross join of the
CUSTOMER and PHONE tables returns results very similar to what
we got in Figure 9-2, except that it also returns the valid
customer/phone combinations excluded from the previously
mentioned nonequijoin.
This query will return all possible combinations of customer names
and phone numbers by performing a cross join of the CUSTOMER and
PHONE tables:
SELECT cust_name_s,
       phone_phonenum_s
FROM customer
CROSS JOIN phone

Joining more than two tables
In a relational database quite often you need to retrieve data from
many tables simultaneously within a single query to get all necessary
information. Thus, in real life a multitable query could easily mean a
dozen-table query or even a hundred-table query. Probably 90
percent of the SQL programming art is the talent to properly join
multiple tables based on the knowledge of the internal database
objects’ structure plus the ability to apply this knowledge.
The concept is simple. The result of joining two tables could be
considered as a new virtual table, which, in turn, could be joined
with the next table in the query, producing yet another virtual table, and so on.

Number of joins
One important rule to remember is that there should be at least (n –
1) joins in an n-table query; thus, at least two joins for a three-table
query, at least three joins for a query that involves four tables, and so
on. The words “at least” are important — there could be more than
(n – 1) joins in a nonequijoin query or in a query that joins tables
using composite primary/foreign keys, but if your multitable query
has fewer than (n – 1) joins, the result will be a Cartesian product.
The query in the previous example uses SQL99-compliant syntax.
The query that follows also uses SQL99-compliant syntax to join
the CUSTOMER, ORDER_HEADER, and STATUS tables and return
customer name, order number, and order status:
SELECT cust_name_s,
       ordhdr_nbr_s,
       status_desc_s
FROM customer
JOIN order_header
  ON cust_id_n = ordhdr_custid_fn
JOIN status
  ON status_id_n = ordhdr_statusid_fn

Outer Joins: Joining Tables on Columns
Containing NULL Values
You probably noticed that the self-join query on the RESELLER table
presented earlier in this chapter returns all table records except the
one for ACME, INC. This is because the ACME, INC. record in the
RESELLER table has NULL in the RESELLER_SUPPLIER_ID column, so
the RDBMS cannot find the corresponding value in the table you are
trying to join (in this case, the other instance of the RESELLER table).
As a result, the query returns nine rows even though the table
contains ten records.
That’s just the way the standard (inner) join works. Sometimes,
however, you want a query to return all rows from table A and the
corresponding rows from table B — if they exist. That’s where you use
outer joins.
The SQL99-compliant syntax indicates outer join in the FROM clause
of the SELECT statement:

FROM <table_name>
{{LEFT | RIGHT | FULL} [OUTER] | UNION} JOIN <table_name>
[ON <join_condition>] | [USING (<column_name>,...)] ,...

The syntax is generally supported by all our three RDBMS vendors.
The exceptions are the USING clause that is implemented only by
Oracle and the UNION JOIN (discussed later in this chapter) that is
not a part of any of the “big three” databases’ syntax.

Left outer join
In fact, the term “left outer join” is just a convention used by SQL
programmers. You can achieve identical results using left or right
outer joins as we will demonstrate later in this chapter. The whole idea
behind an outer join is to retrieve all rows from table A (left) or table B
(right), even when there is no matching row in the counterpart table
(for example, because the join column is NULL). For such unmatched
rows, the outer join returns NULLs in all columns that come from the
counterpart table.
The following query illustrates how to produce the resulting set
containing all ten rows from the RESELLER table using a
SQL99-compliant left outer join:
SELECT r.reseller_id_n AS res_id,
       r.reseller_name_s AS res_name,
       s.reseller_id_n AS sup_id,
       s.reseller_name_s AS sup_name
FROM reseller r
LEFT OUTER JOIN reseller s
  ON r.reseller_supplier_id = s.reseller_id_n

Right outer join
As we mentioned before, the only difference between left and right
outer joins is the order in which the tables are joined in the query. To
demonstrate that, we’ll use a query that produces exactly the same
output as in the previous section.
As you can see, the resulting set of the inner join of ORDER_HEADER
and CUSTOMER is on the right-hand side of the PAYMENT_TERMS
table:
SELECT cust_name_s,
       ordhdr_nbr_s,
       payterms_desc_s
FROM payment_terms
RIGHT OUTER JOIN order_header
  ON payterms_id_n = ordhdr_payterms_fn
JOIN customer
  ON cust_id_n = ordhdr_custid_fn
WHERE cust_id_n = 152

Full outer join
Full outer join is the combination of left and right outer joins. It returns
all rows from both the “left” and “right” tables, whether or not the
counterpart table has matching rows. For example, in the ACME
database there are some customers that have not placed any orders
yet — as well as some orders with no customers assigned to them.
The query below returns all matched customer/order pairs plus all
customers without orders and all orders with no customer assigned to
them:
SELECT customer.cust_name_s,
       order_header.ordhdr_nbr_s
FROM customer
FULL OUTER JOIN order_header
  ON customer.cust_id_n = order_header.ordhdr_custid_fn

Union join
The UNION join (not to be confused with the UNION operator) could
be thought of as the opposite of an inner join: its resulting set
includes only those rows from both joined tables for which no matches
were found, and for each such row the columns that come from the
other table are populated with nulls.
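Since none of the “big three” recognize the UNION JOIN keyword (as noted above), similar output could be approximated with a full outer join restricted to unmatched rows. The sketch below reuses the customer/order example and is illustrative only:

SELECT customer.cust_name_s,
       order_header.ordhdr_nbr_s
FROM customer
FULL OUTER JOIN order_header
  ON customer.cust_id_n = order_header.ordhdr_custid_fn
-- keep only the rows that found no match on the other side
WHERE customer.cust_id_n IS NULL
   OR order_header.ordhdr_custid_fn IS NULL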

Joins Involving Inline Views
As we already mentioned in this chapter, a query that involves table
joins could be quite complicated. You can join tables with other tables,
views, or any other RDBMS-specific objects you can select rows from
to produce the resulting set.
Another type of object that can participate in a join is an inline view
(which is simply a nested query in the FROM clause).
Note
An inline view could be useful to produce a resulting set that
is used in only one single query as an alternative to creating
a regular view. Unlike regular views, inline view definitions
do not exist in the database information schema and require
no maintenance at all. You could consider using inline views
in your queries if you know for sure you (or somebody else)
are not going to use its results anywhere else.
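As an illustration only, this sketch joins the CUSTOMER table to an inline view built over ORDER_HEADER; the status filter value used here is hypothetical:

SELECT c.cust_name_s,
       o.ordhdr_nbr_s
FROM customer c
JOIN (SELECT ordhdr_nbr_s,
             ordhdr_custid_fn
      FROM order_header
      WHERE ordhdr_statusid_fn = 1) o   -- status value 1 is hypothetical
  ON c.cust_id_n = o.ordhdr_custid_fn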

Multitable Joins with Correlated Queries
One important rule to remember is never to combine the new syntax
with the old one within a single query. First, such syntax may not work
properly; second, it would definitely look confusing.
However, if you need to create a correlated query, the SQL99 syntax
simply will not work, so the general recommendation is to either use
the old syntax or replace the correlated query with something else.
The following statement uses a correlated query to retrieve all
customer names and phone numbers for customers who have orders:
SELECT DISTINCT c.cust_name_s,
       p.phone_phonenum_s
FROM customer c,
     phone p
WHERE c.cust_id_n = p.phone_custid_fn
  AND EXISTS (SELECT *
              FROM order_header oh
              WHERE oh.ordhdr_custid_fn = c.cust_id_n)
  AND p.phone_type_s = 'PHONE'
