SQLite: A Light OpenSource Relational Database (RDBMS and Library)

SQLite is a software library written in C. It was developed by D. Richard Hipp in 2000 originally as part of a contract with the U.S. Navy that was implemented by General Dynamics. Today, it consists of some 184,000 lines of code.

SQLite is in the public domain, so it can be used by anyone. Further details and links to download the source code are available at sqlite.org.

You can compile the source code into a library that, in turn, you can use in an application program. Although you may find references to SQLite as a DBMS, it is, strictly speaking, just this library. The container in which that library is compiled (a class, a framework, or a full-fledged DBMS) provides the larger DBMS functionalities.

Is sqlite a DBMS?

In some ways, this only matters if that question can be answered in a way that influences the way you design or implement your SQLite-based project. However, if you track down references to SQLite, you'll see that it is usually referred to (correctly) as a library or as a database engine.

The code has been designed from the beginning to be compact and reusable. (That's in keeping with its origin with the U.S. Navy is relevant here—when you're on a ship at sea, every resource is limited, including power, space, and weight. Furthermore, in modern ships, technological components must function together even if their origins are with multiple vendors. That's the SQLite environment.)

Whether you are worried about the limited resources on a ship, on a mobile device running an operating system such as Android or iOS, or on a small mobile device such as a programmable beacon . . . or even the limited resources on a supercomputer (limited is always relative), SQLite is a good choice in many cases.

The following subsections highlight some of the major features of SQLite that implement this mandate and that are important to you as you use it in whatever container you choose (DBMS, class, framework, or basic library). These features are important to you as you undertake a SQLite-based project for a mobile device.

The features discussed actually all revolve around the fact that SQLite is a library that is designed to be used by a single user to handle database functionality. (Before you throw up your hands, read on to see how SQLite can function very well in a multiuser and multiprocess environment.) The following are the features to consider:

SQLite is designed for a single user
SQLite is self-contained
SQLite supports transactions and is ACID-compliant

Lastly, I discuss SQLite on Linux.

SQLite Is Designed for a Single User

One of the biggest differences between SQLite and most DBMSs is that SQLite is designed for a single user. Most DBMSs manage multiple users including various security features that allow or block access to specific SQL commands and features. Even more important to many people, a DBMS manages contention for resources so that several users can apparently use the same data concurrently.

Apparently is the key word here because although each user typically thinks that he or she has unique access to the database, in fact, behind the scenes the DBMS manages concurrency so that in some cases, it has locked a record to prevent access to it by a second user while another user is updating it.

Single User Doesn't Mean Single-Thread

SQLite manages concurrency within its own environment. This means that it may have multiple threads running at the same time to perform its own tasks, but those threads are managed within the SQLite environment itself. They do not represent separate users.

Using SQLite with Multiple Users

How can you have multiple users when SQLite is designed for a single user? The answer is simple: you manage multiple users yourself. There are a number of ways of doing this, but in general, what you do is to push the multiuser management onto the app, class, or language into which SQLite is embedded. Apps typically have the ability to communicate with one another (subject to security and platform constraints). Thus, although SQLite is not going to manage the case of User A and User B attempting to modify the same data at the same time, your app can do so.

Typical concurrency strategies involve either having a master process that manages the concurrency or having a mechanism whereby multiple independent processes communicate without a master process. You find many examples of multiple independent processes in apps such as Dropbox and in many cloud-based apps. (In the case of Dropbox and cloud-based apps, there may be a process continually running somewhere in the cloud and, perhaps separate processes running on active clients. Whether or not control resides in a central process or is distributed among the clients (and master) varies depending on the specific implementation.)

Thus, SQLite is perfectly capable of functioning in a multiuser world; it just needs to be runninginside apps or other processes that themselves implement the multiuser features.

SQLite Is Self-Contained

SQLite is self-contained in two ways.

The code itself is self-contained ANSI-C code. It makes minimal use of C libraries. In fact, the only ones it uses are:
- memset()
- memcpy()
- memcmp()
- strcmp()
- malloc()
- free()
- realloc()
The data store itself is self-contained, portable, and platform-agnostic.

Self-Contained Code

Self-contained code means that when you include the SQLite library in your project (either directly or through a language, class, or framework), you have everything you need. You don't need to include additional libraries.

You don't have to worry about versions, and, once you have a compiled SQLite library, you can generally reuse it without being dependent on changes in components. (Remember, though, that in most cases SQLite is embedded in a language, class, framework, or DBMS so the container is what you will need to update from time to time.)

Because SQLite is in the public domain, you don't have to worry about licenses or license fees. (You can, however, obtain a license as described on sqlite.org. That option is provided for some users who need to demonstrate to their management that they actually have the right to use SQLite.)

Remember that SQLite is often contained within a class, framework, or language that you are using so this feature (which to a large extent makes such containability possible) may not be visible to you.

Note: If you go to sqlite.org, you'll find the various download options for the SQLite source code. There are options to download it in sections that are combined (amalgamated) in the final build. This is done primarily to accommodate development environments that have trouble handling the full build at one time.

Self-Contained Data

The data store that SQLite uses for a database is designed to be self-contained and cross-platform. This means that you can transfer a SQLite file from one environment to another without problems in most cases. (That caveat is necessary because you may encounter issues in specific configurations, but for many if not most SQLite users, the first step in verifying that a database can be moved is to actually move it—this usually works.)

In its most basic (and original) form, each SQLite database is stored in a single file. The file is readable and writable on any platform

With the database in a single file, users can see it and move it around if they want. (If they do so, the app that you are building must be able to find the file—perhaps with the user's help.) Users can also delete the database file. Because it is self-contained, that's very simple, and it doesn't corruptother SQLite databases. Of course, the fact that a user may be able to delete the database means that in your implementation, you may want to hide it or place it in protected places. If you do that, you often take advantage of platform-specific features such as hidden folders, but the SQLite file itself remains an ordinary file that can be moved around and copied to other devices.

Because the database file is self-contained, it contains both the database data and its schema (structure if you're not from the database world). In part because each database is self-contained, SQLite is referred to as serverless (there is no separate server process).

The fact that a SQLite database can be stored in a single file leads to a common use of SQLite to implement structure and database functionality inside what appears to the user to be a simple file.

Note: Although SQLite began with the one-database/one-file structure, it now supports write-ahead logging (WAL) as an option. WAL is a technique that optimizes database performance using multiple files.

Architecture

SQLite has an elegant, modular architecture that takes some unique approaches to relational database management. It consists of eight separate modules grouped within three major subsystems. These modules divide query processing into discrete tasks that work like an assembly line. The top of the stack compiles the query, the middle executes it, and the bottom handles storage and interfacing with the operating system.

The Interface

The interface is the top of the stack and consists of the SQLite C API. It is the means through which programs, scripting languages, and libraries alike interact with SQLite. Literally, this is where you as developer, administrator, student, or mad scientist talk to SQLite.

The Compiler

The compilation process starts with the tokenizer and parser. They work together to take a Structured Query Language (SQL) statement in text form, validate its syntax, and then convert it to a hierarchical data structure that the lower layers can more easily manipulate. SQLite's tokenizer is hand-coded. Its parser is generated by SQLite's custom parser generator, which is called Lemon. The Lemon parser generator is designed for high performance and takes special precautions to guard against memory leaks. Once the statement has been broken into tokens, evaluated, and recast in the form of a parse tree, the parser passes the tree down to the code generator.

The code generator translates the parse tree into a kind of assembly language specific to SQLite. This assembly language consists of instructions that are executable by its virtual machine. The code generator's sole job is to convert the parse tree into a complete mini-program written in this assembly language and to hand it off to the virtual machine for processing.

The Virtual Machine

At the center of the stack is the virtual machine, also called the virtual database engine (VDBE). The VDBE is a register-based virtual machine that works on byte code, making it independent of the underlying operating system, CPU, or system architecture. The VDBE's byte code (or virtual machine language) consists of more than 100 possible tasks known as opcodes, which are all centered on database operations. The VDBE is designed specifically for data processing. Every instruction in its instruction set either accomplishes a specific database operation (such as opening a cursor on a table, making a record, extracting a column, or beginning a transaction) or performs manipulations to prepare for such an operation. Altogether and in the right order, the VDBE's instruction set can satisfy any SQL command, however complex. Every SQL statement in SQLite—from selecting and updating rows to creating tables, views, and indexes—is first compiled into this virtual machine language, forming a stand-alone instruction set that defines how to perform the given command. For example, take the following statement:

SELECT name FROM episodes LIMIT 10;

This compiles into a VDBE program comprising 15 instructions. These instructions, performed in this particular order with the given operands, will return the name field of the first ten records in the episodes table (which is part of the example database included with this book).

In many ways, the VDBE is the heart of SQLite. All of the modules before it work to create a VDBE program, while all modules after it exist to execute that program, one instruction at a time.

The Back End

The back end consists of the B-tree, page cache, and OS interface. The B-tree and page cache (pager) work together as information brokers. Their currency is database pages, which are uniformly sized blocks of data that, like shipping containers, are made for transportation. Inside the pages are the goods: more interesting bits of information such as records and columns and index entries. Neither the B-tree nor the pager has any knowledge of the contents. They only move and order pages; they don't care what's inside.

The B-tree's job is order. It maintains many complex and intricate relationships between pages, which keeps everything connected and easy to locate. It organizes pages into tree-like structures (which is the reason for the name), which are highly optimized for searching. The pager serves the B-tree, feeding it pages. Its job is transportation and efficiency. The pager transfers pages to and from disk at the B-tree's behest. Disk operations are still some of the slowest things a computer has to do, even with today's solid-state disks. Therefore, the pager tries to speed this up by keeping frequently used pages cached in memory and thus minimizes the number of times it has to deal directly with the hard drive. It uses special techniques to predict which pages will be needed in the future and thus anticipate the needs of the B-tree, keeping pages flying as fast as possible. Also in the pager's job description are transaction management, database locking, and crash recovery. Many of these jobs are mediated by the OS interface.

Things such as file locking are often implemented differently in different operating systems. The OS interface provides an abstraction layer that hides these differences from the other SQLite modules. The end result is that the other modules see a single consistent interface with which to do things like file locking. So, the pager, for example, doesn't have to worry about doing file locking one way on Windows and doing it another way on different operating systems such as Unix. It lets the OS interface worry about this. It just says to the OS interface, Lock this file, and the OS interface figures out how to do that based on the operating system on which it happens to be running. Not only does the OS interface keep code simple and tidy in the other modules, but it also keeps the messy issues cleanly organized and at arm's length in one place. This makes it easier to port SQLite to different operating systems—all of the OS issues that must be addressed are clearly identified and documented in the OS interface's API.

Utilities and Test Code

Miscellaneous utilities and common services such as memory allocation, string comparison, and Unicode conversion routines are kept in the utilities module. This is basically a catchall module for services that multiple modules need to use or share. The testing module contains a myriad of regression tests designed to examine every little corner of the database code. This module is one of the reasons SQLite is so reliable: it performs a lot of regression testing and makes those tests available for anyone to run and improve.

SQLite on Linux

First I install the following packages:

sqlite3: provides same-named executable sqlite3
sqlite3-doc: found at /usr/share/doc/sqlite3/
sqlite-utils, documented at /usr/share/doc/sqlite-utils/ and below, provides features like:
- Pipe JSON (or CSV or TSV) directly into a new SQLite database file, automatically creating a table with the appropriate schema
- Run in-memory SQL queries, including joins, directly against data in CSV, TSV or JSON files and view the results.
- Configure SQLite full-text search against your database tables and run search queries against them, ordered by relevance
- Run transformations against your tables to make schema changes that SQLite `ALTER TABLE` does not directly support, such as changing the type of a column
- Extract columns into separate tables to better normalize your existing data

I check up on JSON support...