Writing JSON Documents
This section covers both the structure of individual JSON documents and the content of a collection of documents, such as one held in a database.
There are three main areas to consider: the basic document structure (fields and field types), the use of document types to track different document structures, and the structure and consistency of the documents that you store.
Document Structure
The main consideration is that because you can put all of the information about a single item into one document, you need a structure that is capable of defining and displaying that information. JSON is very flexible, but keep in mind that you also want to easily process the information.
Just like in a database that requires a schema, there are some conventions and field types that you should consider including in your documents.
Some good habits to get into:
- Don't store a separate ID or reference; use the document ID as the unique identifier. The exception to this rule, of course, is if you have a separate identifier (such as a product code, ISBN, or other information) that you want to use in addition to your main document ID because it provides reference or connectivity information.
- Consider using a type field to identify different document types. Some likely values for type are quote, comment, summary, discussion, anecdote, and so on.
- Use fields to store the document data, and use compound fields to store related information. For example, if you are modeling a blog post system, store the blog data as top-level fields in the document and the comments as a compound element, such as an array of comment objects.
- Remember to include standard fields such as timestamps (created, updated) and status, so that you can organize and identify the content.
- Use arrays for lists of values. It sounds obvious, but within a document database the temptation can be to create a completely flat structure. A typical example is tags, often used to classify data:
{ "tags" : ["blog", "article", "computing"] }
- Don't concatenate values just because you can; use a compound value to hold the information instead. For example, when listing ingredients within a recipe, the temptation is to put the full ingredient (2 cups carrots) into a single field within your document. When you come to extract or search on that information, you will probably want the carrots more than the measurement, so store it as a compound value:
[ { "ingredient": "butter", "measure": "50 g" }, { "ingredient": "onion", "measure": "1" }, ... ]
- Don't rely on the implied field sequence of the document. For example, if you create a document with three fields (title, author, and description), don't assume that the fields will be stored or returned in that order.
This doesn't affect top-level fields so much, since you access them by name, but it does affect compound values. If you want to retain a specific order, use an array of compound values, as in the previous ingredient example.
- When thinking about your data structure, decide whether you want to use one document that contains all of the information or multiple documents that you will later combine with a suitable view.
Using the blog post as an example, you can put the blog post and its comments into one document, or put the blog content in one document and each comment in a document of its own.
The main consideration is how frequently you expect to update the information. If the blog post and comments are one document, the entire document must be retrieved, updated with the new comment, and then saved back. If they are separate documents, all you need to do is create another document with the comment content.
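To make the habits above concrete, here is a minimal Python sketch using plain dictionaries, which serialize directly to JSON. It is an illustration under assumptions, not a database API: the status value, the recipe title, and the helper names `add_embedded_comment`, `make_comment_doc`, and `uses_ingredient` are all hypothetical. It contrasts the two ways of storing comments: appending to an embedded array means producing a new version of the whole post, while the separate approach only creates a small new document.

```python
import copy
import json
from datetime import datetime, timezone


def now_iso() -> str:
    """Current UTC time as an ISO 8601 string, e.g. 2011-11-27T14:34:00Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# One document holding the post, its tags, and its comments (embedded approach).
post = {
    "_id": "myblogpost",          # the document ID is the unique identifier
    "type": "blogpost",           # identifies the kind of document
    "title": "My Blog Post",
    "author": "MC Brown",
    "status": "published",        # hypothetical status value
    "created": "2011-11-27T14:34:00Z",
    "updated": "2011-11-27T14:34:00Z",
    "tags": ["blog", "article", "computing"],  # a list, not one flat string
    "comments": [],               # compound element: array of comment objects
}


def add_embedded_comment(doc: dict, author: str, text: str) -> dict:
    """Embedded approach: the whole post is copied, changed, and must be saved back."""
    updated = copy.deepcopy(doc)
    stamp = now_iso()
    updated["comments"].append({"from": author, "comment": text, "created": stamp})
    updated["updated"] = stamp
    return updated


def make_comment_doc(post_id: str, author: str, text: str) -> dict:
    """Separate approach: a new, small document; the post itself is untouched."""
    return {
        "type": "comment",
        "blogdocid": post_id,     # reference back to the post's document ID
        "from": author,
        "comment": text,
        "created": now_iso(),
    }


# Ingredients as compound values in an array: order is kept, and each part is searchable.
recipe = {
    "type": "recipe",
    "title": "Carrot soup",       # hypothetical example data
    "ingredients": [
        {"ingredient": "butter", "measure": "50 g"},
        {"ingredient": "onion", "measure": "1"},
        {"ingredient": "carrots", "measure": "2 cups"},
    ],
}


def uses_ingredient(doc: dict, name: str) -> bool:
    """Match on the ingredient name alone, ignoring the measurement."""
    return any(item["ingredient"] == name for item in doc.get("ingredients", []))


if __name__ == "__main__":
    print(json.dumps(add_embedded_comment(post, "Joe Blog", "Good post!"), indent=2))
    print(json.dumps(make_comment_doc("myblogpost", "Joe Blog", "Good post!"), indent=2))
    print(uses_ingredient(recipe, "carrots"))  # True
```

With the embedded version, every new comment means writing the complete post again; with separate comment documents the post is never rewritten, at the cost of needing a view to bring the post and its comments back together.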
Document Types
If you are constrained to store all your documents in one database, you should include a field called type or schema that identifies what the document contains. For example, in a blog environment you might store the blog posts and comments separately, so a blog post would look like this:
{ "title": "My Blog Post", "created_at": "2011-11-27T14:34", "content": "My first blog post", "author": "MC Brown", "schema": "blogpost" }
A comment should be identified (and formatted) differently:
{ "schema": "comment", "blogdocid": "myblogpost", "from": "Joe Blog", "comment": "Good post!" }
When writing views, searching, and referring to the content, you can use the type or schema field to identify the different document types and to create different views and representations of the information accordingly.
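In CouchDB, views are map functions that the server runs (in JavaScript by default), so the following is not that API. It is a plain Python stand-in over an in-memory list, sketching the filtering that a view keyed on the schema field would perform. The document IDs and the helper names `group_by_schema` and `comments_for` are hypothetical.

```python
from collections import defaultdict

# The two example documents above, with hypothetical document IDs added.
docs = [
    {"_id": "myblogpost", "schema": "blogpost", "title": "My Blog Post",
     "created_at": "2011-11-27T14:34", "content": "My first blog post",
     "author": "MC Brown"},
    {"_id": "comment-0001", "schema": "comment", "blogdocid": "myblogpost",
     "from": "Joe Blog", "comment": "Good post!"},
]


def group_by_schema(documents: list[dict]) -> dict:
    """Bucket documents by their schema field; documents without one go under None."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc.get("schema")].append(doc)
    return dict(groups)


def comments_for(documents: list[dict], post_id: str) -> list[dict]:
    """All comment documents that point at the given blog post ID."""
    return [
        doc for doc in documents
        if doc.get("schema") == "comment" and doc.get("blogdocid") == post_id
    ]


if __name__ == "__main__":
    print(sorted(group_by_schema(docs)))  # ['blogpost', 'comment']
    for c in comments_for(docs, "myblogpost"):
        print(f'{c["from"]}: {c["comment"]}')  # Joe Blog: Good post!
```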
Structure and Consistency
There are no constraints or schemas for a document database, but that doesn't mean that you can ignore aspects like the simplicity and consistency of the format. For example, there's no point in using a field for the title of a recipe if the field name is title in one document, recipetitle in another, and recipe_name in yet others.
It is a good idea to employ some basic consistency in your field names and contents to ensure you can cope with future changes and updates. I recommend reusing the same field names (for example, title or created) across all your different document types and databases.
It is advisable to store and use a sample document that contains the basic document structure, which you can then use as a template or reference for your other documents.
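One way to put a sample document to work is to check other documents against it. The sketch below assumes a hypothetical recipe template and a helper named `check_against_template`; it reports fields that are missing, fields that are not in the template (such as recipetitle instead of title), and fields whose value type differs from the template's.

```python
# A hypothetical template document for recipes; the values only show the expected types.
RECIPE_TEMPLATE = {
    "type": "recipe",
    "title": "",
    "created": "",
    "updated": "",
    "status": "",
    "tags": [],
    "ingredients": [],
}


def check_against_template(doc: dict, template: dict = RECIPE_TEMPLATE) -> dict:
    """Compare a document's field names and value types with the template."""
    # Skip database-managed fields such as _id, which start with an underscore.
    fields = {name for name in doc if not name.startswith("_")}
    return {
        "missing": sorted(set(template) - fields),
        "unexpected": sorted(fields - set(template)),
        "wrong_type": sorted(
            name for name in fields & set(template)
            if not isinstance(doc[name], type(template[name]))
        ),
    }


if __name__ == "__main__":
    inconsistent = {
        "_id": "carrot-soup",
        "type": "recipe",
        "recipetitle": "Carrot soup",   # should have been "title"
        "created": "2011-11-27T14:34",
        "updated": "2011-11-27T14:34",
        "status": "draft",
        "tags": "soup, vegetarian",     # should have been a list
        "ingredients": [],
    }
    print(check_against_template(inconsistent))
    # {'missing': ['title'], 'unexpected': ['recipetitle'], 'wrong_type': ['tags']}
```

Running a check like this before saving, or periodically over a whole database, catches drift in field names early, while the template itself stays an ordinary document you can keep alongside the rest.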
IDs with _id
Both MongoDB and CouchDB add an _id field to the documents that they handle.