(SQL) Indexes

Indexes are a construct designed to speed up queries under certain conditions. Consider the following query:

SELECT * FROM people WHERE city='Oxford';

When a database searches for matching rows, the default method it uses to perform this is called a sequential scan. That is, it literally searches (or scans) every row in the table to see whether its city attribute matches Oxford.

However, if this query is used frequently and the people table is very large, it makes far more sense to use an index approach to finding the data.

Indexes also increase the size of the database. They literally keep a copy of all columns they index. If you index every column in a table, you effectively double the size of the table. Another consideration is that indexes must be maintained. When you insert, update, or delete records, in addition to modifying the table, the database must modify each and every index on that table as well. So although indices can make queries run much faster, they can slow down inserts, updates, and similar operations.

The command to create an index is as follows:

create index [unique] index_name on table_name (columns)

The variable index_name is the name of the index, and table_name is the name of the table containing the column(s) to index. The variable columns is either a single column or a comma-separated list of columns.

If you use the unique keyword, then the index will have the added constraint that all values in the index must be unique. This applies to both the index and, by extension, to the column or columns it indexes. The unique constraint covers all columns defined in the index, and it is their combined values (not individual values) that must be unique. Here's an example:

sqlite> create table foo(a text, b text);
sqlite> create unique index foo_idx on foo(a,b);
sqlite> insert into foo values ('unique', 'value');
sqlite> insert into foo values ('unique', 'value2');
sqlite> insert into foo values ('unique', 'value');
SQL error: columns a, b are not unique

You can see here that uniqueness is defined by both columns collectively, not individually. Notice that collation plays an important role here as well.

To remove an index, use the drop index command, which is defined as follows:

drop index index_name;

When you create an index, have a reason for creating it. Make sure there is a specific performance gain you are getting before you take on the overhead that comes with it. Well-chosen indexes are a wonderful thing. Indexes that are thoughtlessly scattered here and there in the vain hope of performance are of dubious value.

Collations

Each column in the index can have a collation associated with it. For example, to create a case-insensitive index on foods.name, you'd use the following:

create index foods_name_idx on foods (name collate nocase);

This means that values in the name column will sort without respect to case.