JSON Schema for JSON Validation, Formatting, etc.
A JSON Schema is a JSON object that defines the structure of a set of JSON objects.
An Example
{
"type": "object",
"properties": {
"first_name": { "type": "string" },
"last_name": { "type": "string" },
"birthday": { "type": "string", "format": "date" },
"address": {
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" },
"country": { "type" : "string" }
}
}
}
}
Declaring a JSON Schema
It's not always easy to tell which draft a JSON Schema is using. You can use the $schema keyword to declare which version of the JSON Schema specification the schema is written to. See $schema for more information. It's generally good practice to include it, though it is not required.
{ "$schema": "https://json-schema.org/draft/2020-12/schema" }
The type Keyword
The most common thing to do in a JSON Schema is to restrict to a specific type. The type keyword is used for that.
The values allowed for type are: array, boolean, integer, number, null, object, and string. (Regular expressions are not a type; they are written as strings and used by keywords such as pattern.)
The type keyword can take two forms:
- A single string
When it is a single string, it must be one of the types mentioned above (array, boolean, integer, number, null, object, or string). This specifies that the instance data is only valid when it matches that specific type.
Here is an example of using the type keyword as a single string:
{ "type": "number" }
- An array of strings
When type is used as an array, it contains more than one string specifying the types mentioned above. In this case, the instance data is valid if it matches any of the given types.
Here is an example using the type keyword as an array of strings, where instance data of type string or number is valid but an array isn't:
{ "type": ["number", "string"] }
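The two forms of type can be illustrated with a small Python sketch. This is only an approximation of what a validator does, and the function name matches_type is hypothetical (it is not part of any library). It assumes the instance has already been parsed with Python's json module.

```python
def matches_type(instance, type_keyword):
    """Return True if instance matches the "type" keyword.

    type_keyword is either a single type name or a list of type names.
    Hypothetical helper for illustration only.
    """
    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    checks = {
        "array": lambda v: isinstance(v, list),
        "boolean": lambda v: isinstance(v, bool),
        # Recent drafts treat 1.0 as an integer because its fractional part is zero.
        "integer": lambda v: is_number(v) and (isinstance(v, int) or v.is_integer()),
        "number": is_number,
        "null": lambda v: v is None,
        "object": lambda v: isinstance(v, dict),
        "string": lambda v: isinstance(v, str),
    }
    names = [type_keyword] if isinstance(type_keyword, str) else type_keyword
    # With a list of names, matching any one of them is enough.
    return any(checks[name](instance) for name in names)
```

Under these assumptions, matches_type(42, "number") holds, while matches_type([1], ["number", "string"]) does not.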
JSON Data Types
(Drawn heavily from https://json-schema.org/understanding-json-schema/reference/type)
Arrays
There are two ways in which arrays are generally used in JSON Schema:
- List validation: a sequence of arbitrary length where each item matches the same schema.
- Tuple validation: a sequence of fixed length where each item may have a different schema. In this usage, the index (or location) of each item is meaningful as to how the value is interpreted. (This usage is often given a whole separate type in some programming languages, such as std::tuple in C++).
List Validation with items
List validation is useful for arrays of arbitrary length where each item matches the same schema. For this kind of array, set the items keyword to a single schema that will be used to validate all of the items in the array.
In the following example, we define that each item in an array is a number:
{
"type": "array",
"items": {
"type": "number"
}
}
Tuple Validation with prefixItems
Tuple validation is useful when the array is a collection of items where each has a different schema and the ordinal index of each item is meaningful.
For example, you may represent a street address such as 1600 Pennsylvania Avenue NW as a 4-tuple of the form:
[number, street_name, street_type, direction]
where each field will have a different schema.
We enforce this with the prefixItems keyword. prefixItems is an array, where each item is a schema object that corresponds to each index of the document's array. That is, an array where the first element validates the first element of the input array, the second element validates the second element of the input array, etc.
Here's the example schema:
{
"type": "array",
"prefixItems": [
{ "type": "number" },
{ "type": "string" },
{ "enum": ["Street", "Avenue", "Boulevard"] },
{ "enum": ["NW", "NE", "SW", "SE"] }
]
}
Additional Items
The items keyword can be used to control whether it's valid to have additional items in a tuple beyond what is defined in prefixItems. The value of the items keyword is a schema that all additional items must pass in order for the keyword to validate.
Here, we'll reuse the example schema above, but set items to false, which has the effect of disallowing extra items in the tuple.
{
"type": "array",
"prefixItems": [
{ "type": "number" },
{ "type": "string" },
{ "enum": ["Street", "Avenue", "Boulevard"] },
{ "enum": ["NW", "NE", "SW", "SE"] }
],
"items": false
}
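The tuple rules in this section can be sketched in Python. To keep the example short, plain predicates stand in for subschemas; check_tuple, is_number, is_string, and ADDRESS are hypothetical names chosen for illustration, not part of any validator. The rest_check argument plays the role of items: False forbids extra items, a predicate constrains them, and None leaves them unconstrained.

```python
def check_tuple(instance, prefix_checks, rest_check=None):
    """Sketch of prefixItems plus items, with predicates standing in for schemas.

    prefix_checks: one predicate per leading position (prefixItems).
    rest_check: predicate for the remaining items, False to forbid them,
                or None to accept anything (items absent).
    """
    # zip stops at the shorter sequence, so an array shorter than
    # prefix_checks is not an error.
    for value, check in zip(instance, prefix_checks):
        if not check(value):
            return False
    extra = instance[len(prefix_checks):]
    if rest_check is None:
        return True
    if rest_check is False:
        return len(extra) == 0
    return all(rest_check(value) for value in extra)


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_string(value):
    return isinstance(value, str)


# Predicates mirroring the four-element address tuple from the example schema.
ADDRESS = [
    is_number,
    is_string,
    lambda v: v in ("Street", "Avenue", "Boulevard"),
    lambda v: v in ("NW", "NE", "SW", "SE"),
]
```

Note that an array with fewer items than the tuple, such as [10, "Downing", "Street"], still passes: prefixItems only constrains the positions that are present.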
You can express more complex constraints by using a non-boolean schema to constrain what value additional items can have. In that case, we could say that additional items are allowed, as long as they are all strings:
{
"type": "array",
"prefixItems": [
{ "type": "number" },
{ "type": "string" },
{ "enum": ["Street", "Avenue", "Boulevard"] },
{ "enum": ["NW", "NE", "SW", "SE"] }
],
"items": { "type": "string" }
}
Unevaluated Items
The unevaluatedItems keyword is useful mainly when you want to add or disallow extra items to an array.
unevaluatedItems applies to any values not evaluated by an items, prefixItems, or contains keyword. Just as unevaluatedProperties affects only properties in an object, unevaluatedItems affects only items in an array.
As with items, if you set unevaluatedItems to false, you can disallow extra items in the array.
{
"prefixItems": [
{ "type": "string" },
{ "type": "number" }
],
"unevaluatedItems": false
}
Contains
While the items schema must be valid for every item in the array, the contains schema only needs to validate against one or more items in the array.
{
"type": "array",
"contains": { "type": "number" }
}
A single number is enough to make this pass:
["life", "universe", "everything", 42]
minContains and maxContains
minContains and maxContains can be used with contains to further specify how many items in the array must match the contains schema. The value of each keyword can be any non-negative integer, including zero.
{
"type": "array",
"contains": { "type": "number" },
"minContains": 2,
"maxContains": 3
}
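The counting behind contains, minContains, and maxContains can be sketched as follows. The function check_contains and its matches predicate (a stand-in for the contains subschema) are hypothetical names used only for this illustration.

```python
def check_contains(instance, matches, min_contains=1, max_contains=None):
    """Count the items accepted by `matches` and compare against the bounds.

    min_contains defaults to 1, which is the behaviour of contains on its own.
    max_contains of None means there is no upper bound.
    """
    count = sum(1 for item in instance if matches(item))
    if count < min_contains:
        return False
    return max_contains is None or count <= max_contains


def is_number(value):
    # bool is excluded because it is not a JSON number.
    return isinstance(value, (int, float)) and not isinstance(value, bool)
```

With the bounds from the schema above (2 and 3), ["apple", 1] fails, ["apple", 1, 2] passes, and [1, 2, 3, 4] fails.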
Array Size
The length of the array can be specified using the minItems and maxItems keywords. The value of each keyword must be a non-negative integer. These keywords work whether doing list validation or tuple validation.
{
"type": "array",
"minItems": 2,
"maxItems": 3
}
Uniqueness
A schema can ensure that each of the items in an array is unique. Simply set the uniqueItems keyword to true.
{
"type": "array",
"uniqueItems": true
}
Objects
The properties (key-value pairs) on an object are defined using the properties keyword. The value of properties is an object, where each key is the name of a property and each value is a schema used to validate that property. Any property that doesn't match any of the property names in the properties keyword is ignored by this keyword.
For example, let's say we want to define a simple schema for an address made up of a number, street name and street type:
{
"type": "object",
"properties": {
"number": { "type": "number" },
"street_name": { "type": "string" },
"street_type": { "enum": ["Street", "Avenue", "Boulevard"]}
}
}
which would match an object such as:
{ "number": 1600, "street_name": "Pennsylvania", "street_type": "Avenue" }
Pattern Properties
Sometimes you want to say that, given a particular kind of property name, the value should match a particular schema. That's where patternProperties comes in: it maps regular expressions to schemas. If a property name matches the given regular expression, the property value must validate against the corresponding schema.
In this example, any properties whose names start with the prefix S_ must be strings, and any with the prefix I_ must be integers. Any properties that do not match either regular expression are ignored.
{
"type": "object",
"patternProperties": {
"^S_": { "type": "string" },
"^I_": { "type": "integer" }
}
}
which would match:
{ "S_25": "This is a string" }
or
{ "I_0": 42 }
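A rough Python sketch of how patternProperties is applied follows. It assumes Python's re module is close enough to the regular expression dialect used by JSON Schema for simple patterns like these, and it again uses predicates in place of subschemas. The names check_pattern_properties and PATTERNS are hypothetical.

```python
import re


def check_pattern_properties(instance, pattern_checks):
    """Apply each predicate to the values of properties whose names match its pattern.

    pattern_checks maps a regular expression to a predicate (a stand-in
    for the subschema). Names matching no pattern are ignored.
    """
    for name, value in instance.items():
        for pattern, check in pattern_checks.items():
            # re.search is unanchored: the pattern may match anywhere in the
            # name unless it uses ^ or $ itself.
            if re.search(pattern, name) and not check(value):
                return False
    return True


PATTERNS = {
    "^S_": lambda v: isinstance(v, str),
    "^I_": lambda v: isinstance(v, int) and not isinstance(v, bool),
}
```

Under this sketch, { "S_0": 42 } and { "I_42": "This is a string" } are rejected, while { "keyword": "value" } is accepted because its name matches neither pattern.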
Additional Properties
The additionalProperties keyword is used to control the handling of extra properties, that is, properties whose names are not listed in the properties keyword and do not match any of the regular expressions in the patternProperties keyword. By default, any additional properties are allowed (the default is true).
The value of the additionalProperties keyword is a schema that will be used to validate any properties in the instance that are not matched by properties or patternProperties. Setting the additionalProperties schema to false means no additional properties will be allowed.
Reusing the example from Properties, but this time setting additionalProperties to false:
{
"type": "object",
"properties": {
"number": { "type": "number" },
"street_name": { "type": "string" },
"street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
},
"additionalProperties": false
}
You can use non-boolean schemas to put more complex constraints on the additional properties of an instance. For example, one can allow additional properties, but only if their values are each a string:
{
"type": "object",
"properties": {
"number": { "type": "number" },
"street_name": { "type": "string" },
"street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
},
"additionalProperties": {"type": "string"}
}
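Which properties count as "additional" can be made concrete with a short Python sketch. The helper additional_names is hypothetical; it takes the property names from properties and the regular expressions from patternProperties and returns the names that neither covers.

```python
import re


def additional_names(instance, properties=(), pattern_properties=()):
    """Return the property names that additionalProperties would apply to."""
    return [
        name
        for name in instance
        if name not in properties
        and not any(re.search(pattern, name) for pattern in pattern_properties)
    ]


# Property names taken from the address schema above.
ADDRESS_PROPERTIES = ("number", "street_name", "street_type")

instance = {
    "number": 1600,
    "street_name": "Pennsylvania",
    "street_type": "Avenue",
    "direction": "NW",
}
extra = additional_names(instance, ADDRESS_PROPERTIES)
```

Here extra contains only "direction". With additionalProperties set to false, any non-empty result makes the instance invalid; with { "type": "string" }, the instance is valid as long as every extra value is a string, which "NW" is.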
Unevaluated Properties
The unevaluatedProperties keyword is similar to additionalProperties except that it can recognize properties declared in subschemas.
[...]
Required Properties
By default, the properties defined by the properties keyword are not required. However, one can provide a list of required properties using the required keyword.
The required keyword takes an array of zero or more strings. Each of these strings must be unique.
In the following example schema defining a user record, we require that each user has a name and e-mail address, but we don't mind if they don't provide their address or telephone number:
{
"type": "object",
"properties": {
"name": { "type": "string" },
"email": { "type": "string" },
"address": { "type": "string" },
"telephone": { "type": "string" }
},
"required": ["name", "email"]
}
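The required check only looks at whether a property name is present, not at its value. A minimal Python sketch, with the hypothetical helper missing_required:

```python
def missing_required(instance, required):
    """Return the names listed in "required" that are absent from the instance."""
    return [name for name in required if name not in instance]


REQUIRED = ["name", "email"]

complete = {"name": "William Shakespeare", "email": "bill@stratford-upon-avon.co.uk"}
incomplete = {"name": "William Shakespeare", "address": "Henley Street"}
# A property whose value is null is still present, so required is satisfied
# here; it is the "type": "string" constraint that would reject this record.
null_email = {"name": "William Shakespeare", "email": None}
```

An empty result from missing_required means the required constraint holds; for the incomplete record it reports "email".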
Property Names
The names of properties can be validated against a schema, irrespective of their values. This can be useful if you don't want to enforce specific properties, but you want to make sure that the names of those properties follow a specific convention. You might, for example, want to enforce that all names are valid ASCII tokens so they can be used as attributes in a particular programming language.
{
"type": "object",
"propertyNames": {
"pattern": "^[A-Za-z_][A-Za-z0-9_]*$"
}
}
Size
The number of properties on an object can be restricted using the minProperties and maxProperties keywords. Each of these must be a non-negative integer.
{
"type": "object",
"minProperties": 2,
"maxProperties": 3
}
Strings
String Length
The length of a string can be constrained using the minLength and maxLength keywords. For both keywords, the value must be a non-negative integer.
{
"type": "string",
"minLength": 2,
"maxLength": 3
}
Strings that are Regular Expressions
The pattern and patternProperties keywords use regular expressions to express constraints.
The following example matches a simple North American telephone number with an optional area code:
{
"type": "string",
"pattern": "^(\\([0-9]{3}\\))?[0-9]{3}-[0-9]{4}$"
}
so that both "555-1212" and "(888)555-1212" would be matched.
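The telephone pattern can be tried out with Python's re module, assuming that Python's regular expression dialect behaves the same as the one used by a JSON Schema validator for this simple pattern. In JSON the backslashes must be doubled; in a Python raw string they are written once. The names PHONE and is_phone are hypothetical.

```python
import re

# Same pattern as in the schema, written as a raw string so that each
# backslash appears once instead of twice.
PHONE = r"^(\([0-9]{3}\))?[0-9]{3}-[0-9]{4}$"


def is_phone(value):
    """Return True if value matches the telephone pattern."""
    # re.search mirrors the unanchored matching of the pattern keyword;
    # the ^ and $ in the pattern itself anchor it to the whole string.
    return re.search(PHONE, value) is not None
```

With this sketch, is_phone("(888)555-1212 ext. 532") and is_phone("(800)FLOWERS") are both false.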
Enumerated Values
The enum keyword is used to restrict a value to a fixed set of values. It must be an array with at least one element, where each element is unique.
Below are several examples demonstrating its usage.
This example demonstrates how to validate that the color property of a street light is either "red", "amber", or "green":
{
"properties": {
"color": {
"enum": ["red", "amber", "green"]
}
}
}
In the following example, the schema is extended to include null (to represent an "off" state) and the number 42:
{
"properties": {
"color": {
"enum": ["red", "amber", "green", null, 42]
}
}
}
Constant values
The const keyword is used to restrict a value to a single value.
For example, if you only support shipping to the United States for export reasons:
{
"properties": {
"country": {
"const": "United States of America"
}
}
}
The format Keyword
The format keyword conveys semantic information for values that may be difficult or impossible to describe using JSON Schema. Typically, this semantic information is described by other documents. The JSON Schema Validation specification defines several formats, but this keyword also allows schema authors to define their own formats.
For example, because JSON doesn't have a DateTime type, dates need to be encoded as strings. format allows the schema author to indicate that the string value should be interpreted as a date. By default, format is just an annotation and does not affect validation.
Optionally, validator implementations can provide a configuration option to enable format to function as an assertion rather than just an annotation. That means that validation fails when, for example, a value with a date format isn't in a form that can be parsed as a date. This allows values to be constrained beyond what other tools in JSON Schema, including Regular Expressions, can do.
The JSON Schema specification has a bias toward networking-related formats due to its roots in web technologies. However, custom formats may also be used if the parties exchanging the JSON documents share information about the custom format types. A JSON Schema validator will ignore any format type it does not understand.
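The difference between format as an annotation and format as an assertion can be sketched in Python for the date format. This is an illustration only: check_date_format and its assert_format flag are hypothetical and do not correspond to the option of any particular validator. The sketch assumes date means a YYYY-MM-DD string.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
FULL_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_date_format(value, assert_format=False):
    """Return False only when format is asserted and value is not a real date."""
    # As an annotation, format never causes a failure. Non-strings are also
    # ignored, since format applies only to strings.
    if not assert_format or not isinstance(value, str):
        return True
    if not FULL_DATE.fullmatch(value):
        return False
    year, month, day = (int(part) for part in value.split("-"))
    try:
        date(year, month, day)  # raises ValueError for dates like 2018-02-30
    except ValueError:
        return False
    return True
```

In annotation mode every value passes; in assertion mode "2018-11-13" passes while "2018-02-30" fails, a distinction a regular expression alone cannot easily express.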
Open-Source Command Line (CLI) jsonschema
(Visit this project at https://github.com/sourcemeta/jsonschema)
jsonschema is a comprehensive solution for maintaining repositories of schemas and ensuring their quality, both during local development and when running in CI/CD pipelines. For example, you can use it to:
- Ensure your schemas are valid
- Debug unexpected schema validation results
- Unit test your schemas against valid and invalid instances
- Enforce consistent indentation and keyword ordering in your schema files
- Detect and fix common JSON Schema anti-patterns
- Inline external references for conveniently distributing your schemas
A Simple Validation Example
jsonschema validate schema-for-person.json a-person.json
where schema-for-person.json is:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"first_name": { "type": "string" },
"last_name": { "type": "string" },
"born": { "type": "number" }
}
}
and a-person.json is:
{
"first_name": "Joanne",
"last_name": "Stein",
"born": 1999
}
Note that our schema would also allow an object lacking some of the properties, or all of them, because none of them is listed under required:
{
"first_name": "Joanne",
"born": 1999
}
We could add a property outside the schema:
{
"first_name": "Joanne",
"last_name": "Stein",
"born": 1999,
"married": false
}
and so on.
jsonschema Usage
The functionality provided by the JSON Schema CLI is divided into commands. The following sections describe each feature in detail. Additionally, running the JSON Schema CLI without passing a command will print convenient reference documentation:
- jsonschema version: print the current version of the JSON Schema CLI to standard output, without the v prefix. Example: jsonschema version
- jsonschema validate
- jsonschema metaschema (ensure a schema is valid)
- jsonschema compile (for pre-compiling schemas)
- jsonschema test (write unit tests for your schemas)
- jsonschema fmt
- jsonschema lint
- jsonschema bundle (for inlining remote references in a schema). A schema may contain references to remote schemas outside the scope of the given schema. These remote schemas may live in other files, or may be served by others over the Internet. JSON Schema supports a standardized process, referred to as bundling, to resolve remote references in advance and inline them into the given schema for local consumption or further distribution. The JSON Schema CLI supports this functionality through the bundle command.
- jsonschema inspect (for debugging references)
- jsonschema encode (for binary compression)
- jsonschema decode
jsonschema validate for Validation
jsonschema metaschema for ***
One usually validates schemas against their metaschemas like so:
jsonschema metaschema [schemas-or-directories...] [--http/-h] [--verbose/-v] [--extension/-e <extension>] [--resolve/-r <schemas-or-directories> ...] [--ignore/-i <schemas-or-directories>] [--trace/-t] [--default-dialect/-d <uri>] [--json/-j]
This ensures that a schema or a set of schemas is valid with regard to its metaschema. The --json/-j option outputs the evaluation result using the JSON Schema Basic standard format.
The --resolve/-r option is crucial to import custom meta-schemas into the resolution context, otherwise the validator won't know where to look for them.
Common usage:
jsonschema metaschema mySchema.json
For example, consider this fictitious JSON Schema that follows the Draft 4 dialect but sets the type property to an invalid value:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": 1
}
Running the metaschema command on it will print an error.
jsonschema lint for Linting
Like with traditional programming languages, writing efficient and maintainable schemas takes experience, and there are lots of common pitfalls. The JSON Schema CLI provides a lint command that can check your schemas against various common anti-patterns and automatically fix many of them.
Use --list/-l to print all the available rules and brief descriptions about them.
The --strict/-s option enables additional opinionated strict rules with a focus on preventing mistakes and promoting correctness.
jsonschema fmt for Formatting Schemas
By default, a schema is formatted in place, that is, the contents of the file are replaced by the formatted version.
jsonschema fmt [schemas-or-directories...] [--check/-c] [--verbose/-v] [--resolve/-r <schemas-or-directories> ...] [--extension/-e <extension>] [--ignore/-i <schemas-or-directories>] [--keep-ordering/-k] [--indentation/-n <spaces>] [--default-dialect/-d <uri>]
Typical invocation:
jsonschema fmt mySchema.json --keep-ordering
You can change the default indentation (2 spaces) with the --indentation switch, as in --indentation 4 (4 spaces).
You can check that a single JSON Schema is properly formatted with the --check switch.