Relational Databases ~ Introduction to SQL I

There are a number of database query languages in use, either commercially or experimentally. In this series we study the most widely used query language, SQL. Although we refer to the SQL language as a “query language”, it can do much more than just query a database. It can define the structure of the data, modify data in the database, and specify security constraints.

It is not our intention to provide a complete users’ guide for SQL. Rather, we present SQL’s fundamental constructs and concepts. Individual implementations of SQL may differ in details, or may support only a subset of the full language.

1、Overview of the SQL Query Language

IBM developed the original version of SQL, originally called Sequel, as part of the System R project in the early 1970s. The Sequel  language has evolved since then, and its name has changed to SQL (Structured Query Language). Many products now support the SQL language. SQL has clearly established itself as the standard relational database language.

In 1986, the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) published an SQL standard, called SQL-86. ANSI published an extended standard for SQL, SQL-89, IN 1989. The next version of the standard was SQL-92 standard, followed by SQL:1999, SQL:2003, SQL:2006, and most recently SQL:2008. The bibliographic notes provide references to these standards.

The SQL language has several parts:

  • Data-definition language (DDL). The SQL DDL provides commands for defining relation schemas, deleting relations, and modifying relation schemas.
  • Data-manipulation language (DML). The SQL DML provides the ability to query information from the database and to insert tuples into, delete tuples from, and modify tuples in the database.
  • Integrity. The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed.
  • View definition. The SQL DDL includes commands for defining views.
  • Transaction control. SQL includes commands for specifying the beginning and ending of transactions.
  • Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements can be embedded within general-purpose programming language, such as C, C++, and Java.
  • Authorization. The SQL DDL includes commands for specifying access rights to relations and views.

Although most SQL implementations support the standard features we describe here, you should be aware that there are differences between implementations. Most implementations support some nonstandard features, while omitting support for some of the more advanced features. In case you find that some language features described here do not work on the database system that you use, consult the user manuals for your database system to find exactly what features it supports.

2、SQL Data Definition

The set of relations in a database must be specified to the system by means of data-definition language (DDL). The SQL DDL allows specification of not only a set of relations, but also information about each relation, including:

  • The schema for each relation.
  • The types of values associated with each attribute.
  • The integrity constraints.
  • The set of indices to be maintained for each relation.
  • The security and authorization information for each relation.
  • The physical storage structure of each relation on disk.

2.1 Basic Types

The SQL standard supports a variety of built-in types, including:

  • char(n): A fixed-length character string with user-specified length n. The full form, character, can be used instead.
  • varchar(n): A variable-length character string with user-specified maximum length n. The full form, character varying, is equivalent.
  • int: An integer (a finite subset of the integers that is machine dependent). The full form, integer, is equivalent.
  • smallint: A small integer (a machine-dependent subset of the integer type).
  • numeric(p, d): A fixed-point number with user-specified precision. The number consists of p digits (plus a sign), and d of the p digits are to the right of the decimal point. Thus, numeric(3, 1) allows 44.5 to be stored exactly, but neither 444.5 or 0.32 can be stored exactly in a field of this type.
  • real, double precision: Floating-point and double-precision floating-point numbers with machine-dependent precision.
  • float(n): A floating-point number, with precision of at least n digits.

Each type may include a special value called the null value. A null value indicates an absent value that may exist but be unknown or that may not exist at all. In certain cases, we may wish to prohibit null values from being entered, as we shall see shortly.

The char data type stores fixed length strings. Consider, for example, an attribute A of type char(10). If we store a string “Avi” in this attribute, 7 spaces are appended to the string to make it 10 characters long. In contrast, if attribute B were of type varchar(10), and we store “Avi” in attribue B, no spaces would be added. When comparing two values of type char, if they are of different lengths extra spaces are automatically added to the shorter one to make them the same size, before comparison.

When comparing a char type with a varchar type, one may expect extra spaces to be added to the varchar type to make the lengths equal, before comparison; however, this may or may not be done, depending on the database system. As a result, even if the same value “Avi” is stored in the attributes A and B above, a comparison A = B may return false. We recommend you always use the varchar type instead of the char type to avoid these problems.

SQL also provides the nvarchar type to store multilingual data using the Unicode representation. However, many databases allow Unicode (in the UTF-8 representation) to be stored even in varchar types.

2.2 Basic Schema Definition

We define an SQL relation by using the create table command. The following command creates a relation department in the database.

The relation created above has three attributes, dept_name, which is a character string of maximum length 20, building, which is a character string of maximum length 15, and budget, which is a number with 12 digits in total, 2 of which are after the decimal point. The create table command also specifies that the dept_name attribute is the primary key of the department relation.

The general form of the create table command is :

where r is the name of the relation, each Ai is the name of an attribute in the schema of relation r, and Di is the domain of attribute Ai; that is, Di specifies the type of attribute Ai along with optional constraints that restrict the set of allowed values for Ai.

The semicolon shown at the end of the create table statements, as well as at the end of other SQL statements later in this article, is optional in many SQL implementations.

SQL supports a number of different integrity constraints. In this section, we discuss only a few of them:

  • primary key (Aj1, Aj2, … , Ajm): The primary-key specification says that attributes Aj1, Aj2, … , Ajm form the primary key for the relation. The primary-key attributes are required to be nonnull and unique; that is, no tuple can have a null value for a primary-key attribute, and no two tuples in the relation can be equal on all the primary-key attributes. Although the primary-key specification is optional, it is generally a good idea to specify a primary key for each relation.
  • foreign key (Ak1, Ak2, … Akn) references s: The foreign key specification says that the values of attributes (Ak1, Ak2, … , Akn) for any tuple in the relation must correspond to values of the primary key attributes of some tuple in relation s.   The relations created above present a partial SQL DDL definition of the university database we use in the text. The definition of the course table has a declaration “foreign key (dept_name) references department”. This foreign-key declaration specifies that for each course tuple, the department name specified in the tuple must exist in the primary key attribute (dept_name) of the department relation. Without this constraint, it is possible for a course to specify a nonexistent department name. What above also shows foreign key constraints on tables section, instructor and teaches.
  • not null: The not null constraint on an attribute specifies that the null value is not allowed for that attribute; in other words, the constraint excludes the null values from the domain of that attribute. For example, the not null constraint on the name attribute of the instructor relation ensures that the name of an instructor cannot be null.

More details on the foreign-key constraint, as well as on other integrity constraints that the create table command may include, are provided later.

SQL prevents any update to the database that violates an integrity constraint. For example, if a newly inserted or modified tuple in a relation has null values for any primary-key attribute, or if the tuple has the same value on the primary-key attributes as does another tuple in the relation, SQL flags an error and prevents the update. Similarly, an insertion of a course tuple with a dept_name value that does not appear in the department relation would violate the foreign-key constraint on course, and SQL prevents such an insertion from taking place.

A newly created relation is empty initially. We can use the insert command to load data into the relation. For example, if we wish to insert the fact that there is an instructor named Smith in the Biology department with instructor_id 10211 and a salary of $66000, we write:

The values are specified in the order in which the corresponding attributes are listed in the relation schema. The insert command has a number of useful features.

We can use the delete command to delete tuples from a relation. The command

would delete all tuples from the student relation. Other forms of the delete command allow specific tuples to be deleted; the delete command is covered in more detail later.

To remove a relation from an SQL database, we use the drop table command. The drop table command deletes all information about the dropped relation from the database. The command

is a more drastic action than

The latter retains relation r, but deletes all tuples in r. The former deletes not only all tuples of r, but also the schema for r. After r is dropped, no tuples can be inserted into r unless it is re-created with the create table command.

We use the alter table command to add attributes to an existing relation. All tuples in the relation are assigned null as the value for the new attribute. The form of the alter table command is 

where r is the name of an existing relation, A is the name of the attribute to be added, and D is the type of the added attribute. We can drop attributes from a relation by the command

where r is the name of an existing relation, and A is the name of an attribute of the relation. Many database systems do not support dropping of attributes, although they will allow an entire table to be dropped.

发表评论

电子邮件地址不会被公开。 必填项已用*标注