SQL for SQLite
Querying the Database
SQLite supports all the relational operations in ANSI SQL with the exception of right and full outer joins. These joins can be constructed from combinations of the other fundamental relational operations, so their absence poses no practical limitation.
A very general form of select in SQLite (without too much distracting syntax) can be represented as follows:
select [distinct] heading from tables where predicate group by columns having predicate order by columns limit count[,offset];
Each keyword (from, where, having, and so on) is a separate clause. All clauses are optional with the exception of select; you must always provide at least this clause to make a valid select command. By far the most common invocation of the select command consists of three clauses: select, from, and where. This basic syntax and its associated clauses are as follows:
select heading from tables where predicate;
You can limit the size and particular range of the result using the limit and offset keywords. limit specifies the maximum number of records to return; offset specifies the number of records to skip. If you use both limit and offset together, you can use a comma notation in place of the offset keyword. Note that when using this shorthand notation, the offset precedes the limit.
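The following sketch demonstrates both notations using Python's built-in sqlite3 module; the foods table and its rows here are invented sample data for illustration:

```python
import sqlite3

# In-memory database with a small stand-in foods table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table foods (id integer primary key, name text);
    insert into foods (name) values
        ('Bagels'), ('Cinnamon Bobka'), ('JujyFruit'), ('Pez'), ('Pie');
""")

# limit 2 offset 1: skip one record, then return at most two.
longhand = conn.execute(
    "select name from foods order by name limit 2 offset 1").fetchall()

# Comma shorthand: the offset precedes the limit, so
# limit 1,2 means skip 1 row and return at most 2 rows.
shorthand = conn.execute(
    "select name from foods order by name limit 1,2").fetchall()

print(longhand == shorthand)  # True: both forms yield the same two rows
```

Both queries return the same two rows, which shows that `limit 1,2` is simply shorthand for `limit 2 offset 1`.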
Notice that limit and offset are placed last in the operational pipeline. One common misconception of limit/offset is that it speeds up a query by limiting the number of rows that must be collected by the where clause. This is not true. If it were, then order by would not work properly. For order by to do its job, it must have the entire result in hand to provide the correct order. There is a small performance boost, in that SQLite only needs to keep track of the ordering of the limit-sized set of top values at any point. This has some benefit, but not nearly as much as the sort-free limiting that some expect.
The GLOB Operator
The glob operator is very similar in behavior to the like operator. Its key difference is that it follows Unix file-globbing semantics: it uses the wildcards associated with file globbing, such as * and ?, and matching is case sensitive. The following example demonstrates glob in action:
sqlite> select id, name from foods
   ...> where name glob 'Pine*';
SQLite also recognizes the match and regexp predicates but currently does not provide a native implementation.
Joining Tables*
Creating a Database
Creating Tables
You create a table with the create table command, which is defined as follows:
create [temp] table table_name (column_definitions [, constraints]);
The temp or temporary keyword creates a temporary table. This kind of table is, well, temporary: it will last only as long as your session. As soon as you disconnect, it will be destroyed (if you haven't already destroyed it manually). The brackets around temp denote that it is an optional part of the command.
Whenever you see any syntax in brackets, it means that the contents within them are optional. Furthermore, the pipe symbol (|) denotes an alternative (think of the word or). Take, for example, the following syntax:
create [temp|temporary] table ... ;
This means that either the temp or temporary keyword may optionally be used. You could say create temp table foo..., or you could say create temporary table foo.... In this case, they mean the same thing.
If you don't create a temporary table, then create table creates a base table. The term base table refers to a table that is a named, persistent table in the database. This is the most common kind of table. In general, the term base table is used to differentiate tables created by create table from system tables and other table-like objects such as views.
The minimum required information for create table is a table name and a column name. The name of the table, given by table_name, must be unique among all other identifiers. In the body, column_definitions consists of a comma-separated list of column definitions, each composed of a name, a domain, and a comma-separated list of column constraints. A type, sometimes referred to as a domain or a storage class, is synonymous with a data type in a programming language. It denotes the type of information that is stored in the column.
There are five native types in SQLite: integer, real, text, blob, and null. All of these types are covered in Storage Classes.
Constraints are constructs that control what kind of values can be placed in the table or in individual columns. For instance, you can ensure that only unique values are placed in a column by using a unique constraint. Constraints are covered in the section on Data Integrity.
The create table command allows you to include a list of additional column constraints at the end of the command, denoted by constraints in the syntax outlined earlier. Consider the following example:
create table contacts (
    id integer primary key,
    name text not null collate nocase,
    phone text not null default 'UNKNOWN',
    unique (name, phone)
);
Column id is declared to have type integer and constraint primary key. As it turns out, the combination of this type and constraint has a special meaning in SQLite: integer primary key basically turns the column into an autoincrement column. Column name is declared to be of type text and has two constraints: not null and collate nocase. Column phone is of type text and has two constraints defined as well. After that, there is a table-level constraint of unique, which is defined for columns name and phone together.
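These declarations can be exercised directly. The following sketch, using Python's sqlite3 module, shows the autogenerated id, the default value for phone, and the table-level unique constraint rejecting a duplicate pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table contacts (
        id integer primary key,
        name text not null collate nocase,
        phone text not null default 'UNKNOWN',
        unique (name, phone)
    )""")

# integer primary key autogenerates an id; phone falls back to its default.
conn.execute("insert into contacts (name) values ('Jerry')")
rows = conn.execute("select id, phone from contacts").fetchall()
print(rows)  # [(1, 'UNKNOWN')]

# The table-level unique constraint rejects a duplicate (name, phone) pair.
err = None
try:
    conn.execute("insert into contacts (name) values ('Jerry')")
except sqlite3.IntegrityError as e:
    err = e
print("constraint violation:", err)
```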
Altering Tables
You can change parts of a table with the alter table command. SQLite's version of alter table can either rename a table or add columns. The general form of the command is as follows:
alter table table { rename to name | add column column_def }
If you add a column, the column definition, denoted by column_def, follows the form in the create table statement. It is a name, followed by an optional domain and list of constraints. Here's an example:
sqlite> alter table contacts
   ...> add column email text not null default '' collate nocase;
sqlite> .schema contacts
create table contacts (
    id integer primary key,
    name text not null collate nocase,
    phone text not null default 'UNKNOWN',
    email text not null default '' collate nocase,
    unique (name, phone)
);
To view the table definition from within the SQLite command-line program, use the .schema shell command followed by the table name. It will print the current table definition. If you don't provide a table name, then .schema will print the entire database schema.
Tables can also be created from the results of select statements, allowing you to create not only the structure but also the data at the same time. This particular use of the create table statement is covered later in the section Inserting Records.
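As a quick preview, here is a sketch of that form via Python's sqlite3 module (the foods table and its rows are invented sample data). Note that a table created this way copies structure and data, but not constraints or indexes, from its source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table foods (id integer primary key, name text);
    insert into foods (name) values ('Bagels'), ('Pez');
""")

# create table ... as select copies both structure and data in one step.
conn.execute("create table test as select * from foods")
count = conn.execute("select count(*) from test").fetchone()[0]
print(count)  # 2
```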
Data Integrity*
Grouping
An essential part of aggregation is grouping. That is, in addition to computing aggregates over an entire result, you can also split that result into groups of rows with like values and compute aggregates on each group—all in one step. This is the job of the group by clause. Here's an example:
sqlite> select eyecolour from people group by eyecolour;
Operationally, group by takes the output of where and splits it into groups of rows that share a common value (or values) for a specific column (or columns). These groups are then passed to the select clause.
In the example above, there are several different eye colours, so group by organizes all rows in people into as many groups as there are distinct eyecolour values. select takes each group, extracts its common eyecolour value, and puts it into a separate row.
When group by is used, the select clause applies aggregates to each group separately, rather than to the entire result as a whole. Since aggregates produce a single value from a group of values, they collapse these groups of rows into single rows. For example, consider applying the count aggregate to the preceding example to get the number of records in each eyecolour group:
sqlite> select eyecolour, count(*) from people group by eyecolour;
eyecolour   count(*)
----------- --------
lightbrown  22
darkbrown   7
blue        20
bluegreen   14
green       2
Since group by has to do all this work to create groups with like values, it seems a pity not to let you filter these groups before handing them off to the select clause. That is the purpose of having, a predicate that you apply to the result of group by. It filters the groups from group by in the same way that the where clause filters rows from the from clause. The only difference is that the where clause's predicate is expressed in terms of individual row values, while having's predicate is expressed in terms of aggregate values.
Take the previous example, but this time say you are interested only in the eyecolour groups that have fewer than 20 people in them:
sqlite> select eyecolour, count(*) from people
   ...> group by eyecolour having count(*) < 20;
eyecolour   count(*)
----------- --------
darkbrown   7
bluegreen   14
green       2
Notice that blue and lightbrown have been filtered out for not satisfying the having clause.
So, group by and having work as additional restriction phases. group by takes the restriction produced by the where clause and breaks it into groups of rows that share a common value for a given column. having then applies a filter to each of these groups. The groups that make it through are passed on to the select clause for aggregation and projection.
Removing Duplicates
distinct takes the result of the select clause and filters out duplicate rows. For example, you'd use this to get all distinct eyecolour values from people:
sqlite> select distinct eyecolour from people;
which yields
eyecolour
-----------
lightbrown
darkbrown
blue
bluegreen
green
This statement works as follows: the where clause returns the entire people table (dozens of records, say). The select clause pulls out just the eyecolour column, and finally distinct removes duplicate rows, reducing the number to just 5 rows, all unique.
Triggers
See also the section SQL Triggers.
Update Triggers
Update triggers, unlike insert and delete triggers, may be defined for specific columns in a table. The general form of this kind of trigger is as follows:
create trigger name [before|after] update of column on table action
The following is a SQL script that shows an UPDATE trigger in action:
create temp table log(x);

create temp trigger foods_update_log update of name on foods
begin
    insert into log values('updated foods: new name=' || new.name);
end;

begin;
update foods set name='JUJYFRUIT' where name='JujyFruit';
select * from log;
rollback;
This script creates a temporary table called log, as well as a temporary update trigger on foods.name that inserts a message into log when it fires. The action takes place inside the transaction that follows. The first step of the transaction updates the name column of the row whose name is 'JujyFruit'. This causes the update trigger to fire. When it fires, it inserts a record into the log. Next, the transaction reads the log, which shows that the trigger did indeed fire. The transaction then rolls back the change, and when the session ends, the log table and the update trigger are destroyed.
SQLite provides access to both the old (original) row and the new (updated) row in update triggers. The old row is referred to as old and the new row as new. Notice in the script how the trigger refers to new.name. All attributes of both rows are available in old and new using the dot notation. You could have just as easily recorded new.type_id or old.id.
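A small sketch, driven from Python's sqlite3 module, records both the old and the new value in one log message (the message format here is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table foods (id integer primary key, name text);
    insert into foods (name) values ('JujyFruit');

    create table log (msg text);

    -- old is the row before the update, new is the row after.
    create trigger foods_update_log update of name on foods
    begin
        insert into log values ('old=' || old.name || ' new=' || new.name);
    end;

    update foods set name = 'JUJYFRUIT' where name = 'JujyFruit';
""")
msg = conn.execute("select msg from log").fetchone()[0]
print(msg)  # old=JujyFruit new=JUJYFRUIT
```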
Error Handling
Defining a trigger before an event takes place gives you the opportunity to stop the event from happening, while examining the event afterward allows you to have second thoughts. before and after triggers thus enable you to implement new integrity constraints. SQLite provides a special SQL function for triggers called raise(), which allows them to raise an error within the trigger body. raise is defined as follows:
raise(resolution, error_message);
The first argument is a conflict resolution policy (abort, fail, ignore, rollback, and so on). The second argument is an error message. If you use ignore, the remainder of the current trigger, the SQL statement that caused the trigger to fire, and any subsequent triggers that would have been fired are all terminated. If the SQL statement that caused the trigger to fire is itself part of another trigger, then that trigger resumes execution at the beginning of the next SQL command in the trigger action.
Updatable Views
Triggers make it possible to create something like updatable views. The idea here is that you create a view and then create a trigger that handles update events on that view.
SQLite supports triggers on views using the instead of keywords in the trigger definition. To illustrate this, let's create a view that combines foods with food_types:
create view foods_view as
select f.id fid, f.name fname, t.id tid, t.name tname
from foods f, food_types t;
This view joins the two tables according to their foreign key relationship. Notice that you have created aliases for all column names in the view. This allows you to differentiate the respective id and name columns in each table when you reference them from inside the trigger. Now, let's make the view updatable by creating an UPDATE trigger on it:
create trigger on_update_foods_view
instead of update on foods_view
for each row
begin
    update foods set name=new.fname where id=new.fid;
    update food_types set name=new.tname where id=new.tid;
end;
Now if you try to update the foods_view, this trigger gets called. The trigger simply takes the values provided in the UPDATE statement and uses them to update the underlying base tables foods and food_types.
You can just as easily add insert and delete triggers to complete the trigger-based manipulation of data via views.
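For instance, an instead of insert trigger on the same kind of view might be sketched as follows (the trigger name and sample data are invented; the view here joins on a foreign key, as in the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table food_types (id integer primary key, name text);
    create table foods (id integer primary key, type_id integer, name text);
    insert into food_types (name) values ('candy');

    create view foods_view as
    select f.id fid, f.name fname, t.id tid, t.name tname
    from foods f, food_types t where f.type_id = t.id;

    -- Route inserts on the view into the underlying foods table.
    create trigger on_insert_foods_view instead of insert on foods_view
    for each row
    begin
        insert into foods (type_id, name) values (new.tid, new.fname);
    end;

    insert into foods_view (fname, tid) values ('Pez', 1);
""")
rows = conn.execute("select name, type_id from foods").fetchall()
print(rows)  # [('Pez', 1)]
```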
Transactions
By default, every SQL command in SQLite is run under its own transaction. That is, if you do not define a transaction scope with begin...commit/rollback, SQLite will implicitly wrap every individual SQL command in one. In that case, every command that completes successfully is committed; likewise, every command that encounters an error is rolled back. This mode of operation (implicit transactions) is referred to as autocommit mode: SQLite automatically runs each command in its own transaction, and if the command does not fail, its changes are automatically committed.
SQLite also supports the savepoint and release commands. These commands extend the flexibility of transactions so that a body of work that incorporates multiple statements can set a savepoint, which SQLite can then revert to in the event of a rollback. Creating a savepoint is as simple as issuing the savepoint command with a name of your choice, just as in this next example:
savepoint sv2;
Later, if we realize our processing needs to be reverted, instead of rolling all the way back to the start of the transaction, we can use a named rollback as follows:
rollback [transaction] to sv2;
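The whole cycle can be sketched from Python's sqlite3 module (the log table and its rows are invented sample data; releasing the savepoint instead of rolling back to it would fold its work into the enclosing transaction):

```python
import sqlite3

# isolation_level=None gives us manual transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table log (x text)")

conn.execute("begin")
conn.execute("insert into log values ('step 1')")
conn.execute("savepoint sv2")
conn.execute("insert into log values ('step 2')")

# Revert to the savepoint: only the work after sv2 is undone.
conn.execute("rollback to sv2")
conn.execute("commit")

rows = conn.execute("select x from log").fetchall()
print(rows)  # [('step 1',)]
```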
Conflict Resolution
Constraint violations cause the command that committed the violation to terminate. What exactly happens when a command terminates in the middle of making a bunch of changes to the database? In most databases, all of the changes are undone. That is the way the database is programmed to handle a constraint violation—end of story.
SQLite, however, has an uncommon feature that allows you to specify different ways to handle (or recover from) constraint violations. It is called conflict resolution. Take, for example, the following UPDATE:
sqlite> update foods set id=800 - id;
SQL error: PRIMARY KEY must be unique
This results in a unique constraint violation because once the update statement reaches the 388th record, it attempts to update its id value to 800 - 388 = 412. But a row with an id of 412 already exists, so the command aborts. However, SQLite had already updated the first 387 rows before it reached this constraint violation. What happens to them? The default behavior is to terminate the command and reverse all the changes it made, while leaving the transaction intact.
But what if you wanted these 387 changes to stick despite the constraint violation? Well, believe it or not, you can have it that way too, if you want. You just need to use the appropriate conflict resolution. There are five possible resolutions, or policies, that can be applied to address a conflict (constraint violation): replace, ignore, fail, abort, and rollback. These five resolutions define a spectrum of error tolerance or sensitivity: from replace, the most forgiving, to rollback, the most strict. The resolutions are defined as follows in order of their severity:
replace: When a unique constraint violation is encountered, SQLite removes the row (or rows) that caused the violation and replaces it (them) with the new row from the insert or update. The SQL operation continues without error. If a not null constraint violation occurs, the null value is replaced by the default value for that column. If the column has no default value, then SQLite applies the abort policy. It is important to note that when this conflict resolution strategy deletes rows in order to satisfy a constraint, it does not invoke delete triggers on those rows. This behavior, however, is subject to change in a future release.
ignore: When a constraint violation is encountered, SQLite allows the command to continue and leaves the row that triggered the violation unchanged. Other rows before and after the row in question continue to be modified by the command. Thus, all rows in the operation that trigger constraint violations are simply left unchanged, and the command proceeds without error.
fail: When a constraint violation is encountered, SQLite terminates the command but does not restore the changes it made prior to encountering the violation. That is, all changes within the SQL command up to the violation are preserved. For example, if an update statement encounters a constraint violation on the 100th row it attempts to update, then the changes to the first 99 rows already modified remain intact, but changes to rows 100 and beyond never occur, as the command is terminated.
abort: When a constraint violation is encountered, SQLite restores all changes the command made and terminates it. abort is the default resolution for all operations in SQLite. It is also the behavior defined in the SQL standard. As a side note, abort is also the most expensive conflict resolution policy, requiring extra work even if no conflicts ever occur.
rollback: When a constraint violation is encountered, SQLite performs a rollback, aborting the current command along with the entire transaction. The net result is that all changes made by the current command and all previous commands in the transaction are rolled back. This is the most drastic level of conflict resolution, where a single violation results in a complete reversal of everything performed in a transaction.
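The contrast between the two most forgiving policies can be sketched quickly via Python's sqlite3 module (the cast table and its row are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table cast (name text unique)")
conn.execute("insert into cast values ('Jerry')")

# ignore: the duplicate is silently skipped; no error is raised.
conn.execute("insert or ignore into cast values ('Jerry')")

# replace: the conflicting row is deleted and the new row takes its place.
conn.execute("insert or replace into cast values ('Jerry')")

count = conn.execute("select count(*) from cast").fetchone()[0]
print(count)  # 1 -- neither policy produced a second row or an error
```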
Conflict resolution can be specified within SQL commands as well as within table and index definitions. Specifically, conflict resolution can be specified in insert, update, create table, and create index. Furthermore, it has specific implications within triggers. The syntax for conflict resolution in insert and update is as follows:
insert or resolution into table (column_list) values (value_list);
update or resolution table set (value_list) where predicate;
The conflict resolution policy comes right after the insert or update command and is prefixed with or. Also, the insert or replace expression can be abbreviated as just replace. This is similar to the merge or upsert behavior of other databases.
In the preceding update example, the updates made to the 387 records were rolled back because the default resolution is abort. If you wanted the updates to stick, you could use the fail resolution. To illustrate this, in the following example you copy foods into a new table test and use it as the guinea pig. You add an additional column to test called modified, the default value of which is 'no'. In the update, you change this value to 'yes' to track which records are updated before the constraint violation occurs. Using the fail resolution, these updates will remain intact, and you can track afterward how many records were updated.
create table test as select * from foods;
create unique index test_idx on test(id);
alter table test add column modified text not null default 'no';
update or fail test set id=800-id, modified='yes';
select count(*) from test where modified='no';
When defined within tables, conflict resolution is specified for individual columns. Here's an example:
sqlite> create temp table cast (name text unique on conflict rollback);
sqlite> insert into cast values ('Jerry');
sqlite> insert into cast values ('Elaine');
sqlite> insert into cast values ('Kramer');
Any insert or update that triggers a constraint violation on name will be arbitrated by the rollback resolution rather than the default abort. The result will abort not only the statement but the entire transaction as well.
Conflict resolution within tables and indices changes the default behavior of the operation from abort to that defined for the specific columns when those columns are the source of the constraint violation.
Conflict resolution at the statement level (DML) overrides that defined at the object level (DDL). Working from the previous example:
sqlite> begin;
sqlite> insert or replace into cast values ('Jerry');
sqlite> commit;
the replace resolution in the insert overrides the rollback resolution defined on cast.name.
Database Locks
Locking is closely associated with transactions in SQLite. To use transactions effectively, you need to know a little something about how SQLite does locking.
SQLite has coarse-grained locking: when a session is writing to the database, all other sessions are locked out until the writing session completes its transaction. To mitigate this, SQLite uses a locking scheme that defers writer locks until the last possible moment in order to maximize concurrency.
SQLite uses a lock escalation policy whereby a connection gradually obtains exclusive access to a database in order to write to it. There are five different locking states in SQLite: unlocked, shared, reserved, pending, and exclusive. Each database session (or connection) can be in only one of these states at any given time. Furthermore, there is a corresponding lock for each state, except for unlocked—there is no lock required to be in the unlocked state.
To begin with, the most basic state is unlocked. In this state, no session is accessing data from the database. When you connect to a database, or even initiate a transaction with begin, your connection is in the unlocked state.
The next state beyond unlocked is shared. For a session to read from the database (not write), it must first enter the shared state and must therefore acquire a shared lock. Multiple sessions can simultaneously acquire and hold shared locks at any given time. Therefore, multiple sessions can read from a common database at any given time. However, no session can write to the database during this time—while any shared locks are active.
If a session wants to write to the database, it must first acquire a reserved lock. Only one reserved lock may be held at one time for a given database. Shared locks can coexist with a reserved lock. A reserved lock is the first phase of writing to a database. It does not block sessions with shared locks from reading, and it does not prevent sessions from acquiring new shared locks.
Once a session has a reserved lock, it can begin the process of making modifications; however, these modifications are cached in memory and not actually written to disk.
When the session wants to commit the changes (or transaction) to the database, it begins the process of promoting its reserved lock to an exclusive lock. To get an exclusive lock, it must first promote its reserved lock to a pending lock. A pending lock starts a process of attrition whereby no new shared locks can be obtained. That is, other sessions with existing shared locks are allowed to continue as normal, but other sessions cannot acquire new shared locks. At this point, the session with the pending lock is waiting for the other sessions with shared locks to finish what they are doing and release them. Once all shared locks are released, the session with the pending lock can promote it to an exclusive lock. It is then free to make changes to the database. All of the previously cached changes are written to the database file.
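The coexistence of a reserved lock with shared locks, and the one-writer-at-a-time rule, can be observed directly. This sketch uses Python's sqlite3 module with two connections to the same (invented) database file; timeout=0 makes lock failures report immediately instead of waiting:

```python
import os
import sqlite3
import tempfile

# Two sessions on the same database file (rollback-journal mode).
path = os.path.join(tempfile.mkdtemp(), "locks.db")
a = sqlite3.connect(path, isolation_level=None, timeout=0)
b = sqlite3.connect(path, isolation_level=None, timeout=0)
a.execute("create table foods (name text)")

# Session A takes a reserved lock: it intends to write.
a.execute("begin immediate")
a.execute("insert into foods values ('Pez')")

# Readers are unaffected: a reserved lock coexists with shared locks,
# and B sees the pre-change state because A's insert is still cached.
seen = b.execute("select count(*) from foods").fetchone()[0]
print(seen)  # 0

# But a second writer cannot get its own reserved lock.
err = None
try:
    b.execute("begin immediate")
except sqlite3.OperationalError as e:
    err = e
print(err)  # database is locked

a.execute("commit")
```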
Deadlocks
Why do you need to know about locking? Well, if you don't know what you are doing, you can end up in a deadlock.
Consider the following scenario. Two sessions, A and B—completely oblivious to one another—are working on the same database at the same time. Session A begins a transaction and reads from the database, acquiring a shared lock. Session B then begins a transaction of its own, writes some rows, and attempts to commit.
Both sessions wind up in a deadlock. Session B was the first to try to write to the database: its commit requires an exclusive lock, which A's shared lock blocks, so B is left holding a pending lock. A then attempts to write but fails when its insert tries to promote its shared lock to a reserved lock, which B already holds.
For the sake of argument, let's say that A decides to just wait around for the database to become writable. So does B. Then at this point, everyone else is effectively locked out too. If you try to open a third session, it won't even be able to read from the database. The reason is that B has a pending lock, which prevents any sessions from acquiring shared locks. So, not only are A and B deadlocked, they have locked everyone else out of the database as well. Basically, you have a shared lock and one pending lock that don't want to relinquish control, and until one does, nobody can do anything.
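This failure mode can be reproduced with two connections from Python's sqlite3 module (the episodes table and its rows are invented sample data; timeout=0 makes each lock failure surface immediately):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "deadlock.db")
a = sqlite3.connect(path, isolation_level=None, timeout=0)
b = sqlite3.connect(path, isolation_level=None, timeout=0)
a.execute("create table episodes (name text)")

# A starts a deferred transaction and reads: A now holds a shared lock.
a.execute("begin")
a.execute("select count(*) from episodes").fetchone()

# B starts writing (reserved lock), then tries to commit.
b.execute("begin")
b.execute("insert into episodes values ('one')")
b_err = None
try:
    b.execute("commit")  # needs exclusive; A's shared lock blocks it
except sqlite3.OperationalError as e:
    b_err = e
print("B:", b_err)  # database is locked -- B is stuck with a pending lock

# A now tries to write: it needs a reserved lock, which B already holds.
a_err = None
try:
    a.execute("insert into episodes values ('two')")
except sqlite3.OperationalError as e:
    a_err = e
print("A:", a_err)  # database is locked -- deadlock
```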
How do you avoid a deadlock? It's not like A and B can sit down in a meeting and work it out with their lawyers. A and B don't even know each other exists. The answer is to pick the right transaction type for the job.
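A sketch of that fix: if both sessions declare their intent to write up front with begin immediate, the second writer fails at the start of its transaction, before doing any work, rather than deadlocking mid-transaction (table and rows again invented for illustration):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "types.db")
a = sqlite3.connect(path, isolation_level=None, timeout=0)
b = sqlite3.connect(path, isolation_level=None, timeout=0)
a.execute("create table episodes (name text)")

a.execute("begin immediate")      # A gets the reserved lock up front
err = None
try:
    b.execute("begin immediate")  # B fails immediately -- no deadlock
except sqlite3.OperationalError as e:
    err = e
print(err)  # database is locked

a.execute("insert into episodes values ('one')")
a.execute("commit")               # A finishes; B can simply retry
```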