Using Apache CouchDB

`/{db}/_find`

(From https://docs.couchdb.org/en/stable/api/database/find.html)

Find documents using a declarative JSON querying syntax. Queries will use custom indexes, specified using the _index endpoint, if available. Otherwise, when allowed, they use the built-in _all_docs index, which can be arbitrarily slow.

Structure of Request JSON Object

A Request JSON Object holds the following fields:

selector (object) – JSON object describing criteria used to select documents. More information provided in the section on selector syntax. Required
limit (number) – Maximum number of results returned. Default is 25. Optional
skip (number) – Skip the first ‘n’ results, where ‘n’ is the value specified. Optional
sort (array) – JSON array following sort syntax. Optional
fields (array) – JSON array specifying which fields of each object should be returned. If it is omitted, the entire object is returned. More information provided in the section on filtering fields. Optional
use_index (string|array) – Request that the query use a specific index, specified either as "<design_document>" or ["<design_document>", "<index_name>"]. This is a hint rather than a guarantee: if the index is not valid for the selector, a fallback to another valid index is attempted. When fallback occurs, the details are given in the warning field of the response. Optional
allow_fallback (boolean) – Whether the query is allowed to fall back to another valid index. Fallback can happen when the index specified by use_index is not deemed usable, or when no index supports the query and only the built-in _all_docs index could be picked. Disabling fallback causes the endpoint to return an error immediately in such cases. Default is true. Optional
conflicts (boolean) – Include conflicted documents if true. Intended use is to easily find conflicted documents, without an index or view. Default is false. Optional
r (number) – Read quorum needed for the result. This defaults to 1, in which case the document found in the index is returned. If set to a higher value, each document is read from at least that many replicas before it is returned in the results. This is likely to take more time than using only the document stored locally with the index. Optional, default: 1
bookmark (string) – A string that enables you to specify which page of results you require. Used for paging through result sets. Every query returns an opaque string under the bookmark key that can then be passed back in a query to get the next page of results. If any part of the selector query changes between requests, the results are undefined. Optional, default: null
update (boolean) – Whether to update the index prior to returning the result. Default is true. Optional
stable (boolean) – Whether or not the view results should be returned from a “stable” set of shards. Optional
stale (string) – Combination of update=false and stable=true options. Possible options: "ok", false (default). Optional. Note that this parameter is deprecated; use stable and update instead. See Views Generation for more details.
execution_stats (boolean) – Include execution statistics in the query response. Optional, default: false
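As a minimal sketch of how these fields fit together, the following JavaScript builds a request body and pages through results with the bookmark field. The helper names buildFindBody and fetchAllPages are hypothetical, and postFind is an assumed async function that POSTs a body to /{db}/_find and resolves to the parsed JSON response. The stop condition (a page shorter than limit) is also an assumption about how the end of the result set is detected.

```javascript
// Build a _find request body from the documented fields.
// buildFindBody and fetchAllPages are hypothetical helper names.
function buildFindBody(selector, options = {}) {
  if (selector === null || typeof selector !== "object" || Array.isArray(selector)) {
    throw new TypeError("selector is required and must be a JSON object");
  }
  // Copy only the fields the caller set, so server defaults apply otherwise.
  const body = { selector };
  for (const key of ["limit", "skip", "sort", "fields", "use_index", "bookmark"]) {
    if (options[key] !== undefined) body[key] = options[key];
  }
  return body;
}

// Page through results by passing each response's bookmark back in.
// `postFind` is an assumed async function: (body) => parsed JSON response.
async function fetchAllPages(postFind, selector, limit) {
  const docs = [];
  let bookmark;
  while (true) {
    const page = await postFind(buildFindBody(selector, { limit, bookmark }));
    docs.push(...page.docs);
    // Assumption: a page shorter than `limit` means nothing is left to fetch.
    if (page.docs.length < limit) break;
    bookmark = page.bookmark;
  }
  return docs;
}
```

The selector is kept identical across pages, since the results are undefined if it changes between requests.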

Using an Index

Example request body for finding documents using an index:

POST /movies/_find HTTP/1.1
Accept: application/json
Content-Type: application/json
Content-Length: 168
Host: localhost:5984

{
    "selector": {
        "year": {"$gt": 2010}
    },
    "fields": ["_id", "_rev", "year", "title"],
    "sort": [{"year": "asc"}],
    "limit": 2,
    "skip": 0,
    "execution_stats": true
}

Example response when finding documents using an index:

HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Type: application/json
Date: Thu, 01 Sep 2016 15:41:53 GMT
Server: CouchDB (Erlang OTP)
Transfer-Encoding: chunked

{
    "docs": [
        {
            "_id": "176694",
            "_rev": "1-54f8e950cc338d2385d9b0cda2fd918e",
            "year": 2011,
            "title": "The Tragedy of Man"
        },
        {
            "_id": "780504",
            "_rev": "1-5f14bab1a1e9ac3ebdf85905f47fb084",
            "year": 2011,
            "title": "Drive"
        }
    ],
    "execution_stats": {
        "total_keys_examined": 200,
        "total_docs_examined": 200,
        "total_quorum_docs_examined": 0,
        "results_returned": 2,
        "execution_time_ms": 5.52
    }
}
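The execution statistics can be read programmatically. The sketch below uses a hypothetical helper, summarizeStats, to compute how many documents were examined per result returned, using the figures from the response above. The ratio is only a rough signal and not a CouchDB-defined metric.

```javascript
// Summarize the execution_stats object from a _find response.
// summarizeStats is a hypothetical helper, not part of CouchDB.
function summarizeStats(response) {
  const stats = response.execution_stats;
  // Stats are only present when the request set execution_stats to true.
  if (!stats) return null;
  const returned = stats.results_returned;
  return {
    returned,
    docsExamined: stats.total_docs_examined,
    // Documents read per result returned; null when nothing was returned.
    docsPerResult: returned > 0 ? stats.total_docs_examined / returned : null,
  };
}

// Figures copied from the example response.
const exampleResponse = {
  docs: [{ _id: "176694" }, { _id: "780504" }],
  execution_stats: {
    total_keys_examined: 200,
    total_docs_examined: 200,
    total_quorum_docs_examined: 0,
    results_returned: 2,
    execution_time_ms: 5.52,
  },
};

console.log(summarizeStats(exampleResponse));
```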

Selector Syntax

Selectors are expressed as a JSON object describing documents of interest. Within this structure, you can apply conditional logic using specially named fields.

Whilst selectors have some similarities with MongoDB query documents, these arise from a similarity of purpose and do not necessarily extend to commonality of function or result.

Elementary selector syntax requires you to specify one or more fields, and the corresponding values required for those fields. This selector matches all documents whose "director" field has the value "Lars von Trier".

{ "director": "Lars von Trier" }

A simple selector, inspecting specific fields:

"selector": {
    "title": "Live And Let Die"
},
"fields": [
    "title",
    "cast"
]

You can create more complex selector expressions by combining operators. For best performance, combine ‘combination’ or ‘array logical’ operators, such as $regex, with an operator that defines a contiguous range of keys, such as $eq, $gt, $gte, $lt, $lte, or $beginsWith (but not $ne). For more information about creating complex selector expressions, see creating selector expressions.
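The following sketch pairs a range condition with $regex. The index name "year-index" and the specific year and title values are illustrative assumptions; the index definition is the kind of body sent to the /{db}/_index endpoint, and the query is a body for /{db}/_find.

```javascript
// Index definition body for POST /{db}/_index, covering the "year" field.
// The name "year-index" is an illustrative choice.
const indexDefinition = {
  index: { fields: ["year"] },
  name: "year-index",
  type: "json",
};

// Request body for POST /{db}/_find.
// Fields listed side by side in a selector are combined with an implicit AND.
const query = {
  selector: {
    // Range operators define a contiguous span of keys that an index can serve.
    year: { $gte: 2000, $lt: 2010 },
    // $regex cannot narrow the key range, so it is paired with the range above.
    title: { $regex: "^The " },
  },
  fields: ["title", "year"],
  sort: [{ year: "asc" }],
};

console.log(JSON.stringify(query, null, 2));
```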

Views and Map-Reduce

CouchDB uses views, built with map-reduce, to query the documents of your database. Each view has a map function and, optionally, a reduce function. Doctrine CouchDB ODM allows you to create and query views in your application.

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of a Map() procedure (method) that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() method that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
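The student example can be sketched in plain JavaScript, outside any cluster. The function names and the sample data are illustrative assumptions; the point is only to show the map, grouping, and reduce stages separately.

```javascript
// Map step: emit one [key, value] pair per student, keyed by first name.
function mapStudent(student) {
  return [student.firstName, 1];
}

// Grouping step: sort pairs by key and collect them into one queue per name.
function groupByKey(pairs) {
  const groups = new Map();
  const sorted = [...pairs].sort((a, b) => a[0].localeCompare(b[0]));
  for (const [key, value] of sorted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce step: summarize each queue, here by counting its entries.
function reduceCounts(groups) {
  const result = {};
  for (const [key, values] of groups) {
    result[key] = values.reduce((a, b) => a + b, 0);
  }
  return result;
}

const students = [{ firstName: "Ana" }, { firstName: "Ben" }, { firstName: "Ana" }];
const frequencies = reduceCounts(groupByKey(students.map(mapStudent)));
console.log(frequencies); // { Ana: 2, Ben: 1 }
```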

A MapReduce view is created by adding it to a design document for the database (which we will discuss later). You can also use Futon, CouchDB's web interface (replaced by Fauxton from CouchDB 2.0 onward), which is accessible at the following URL once you have CouchDB set up:

http://127.0.0.1:5984/_utils/

Futon lets you create a Temporary View directly through the interface by setting up a map and a reduce function. The result displayed at the bottom of the page is the output of the map step: rows of data containing, say, the document's name as the key and the age as the value. We can then enable the reduce step by ticking the reduce box just above the result set.

An example:

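function(doc) {
    // Emit only people older than 29 and younger than 36,
    // keyed by age with income as the value.
    if (doc.age > 29 && doc.age < 36) {
        emit(doc.age, doc.income);
    }
}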

whose result would be fed to a reduce function:

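function(keys, values, rereduce) {
  // Carry sum and count so partial results combine correctly on rereduce;
  // averaging already-averaged values would give the wrong answer.
  var total = rereduce ? sum(values.map(function(v) { return v.sum; })) : sum(values);
  var count = rereduce ? sum(values.map(function(v) { return v.count; })) : values.length;
  return { sum: total, count: count, average: total / count };
}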
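The map and reduce steps can be tried outside CouchDB with a small simulation. In this sketch, runMap and the local sum are stand-ins for CouchDB's emit() and sum() helpers, the sample people are made up, and passing null for keys is a simplification. The reduce carries the sum and count alongside the average so that partial results, which CouchDB may combine in a rereduce pass, still yield the right figure.

```javascript
// Stand-ins for CouchDB's emit() and sum(), so the logic can run anywhere.
function runMap(docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) {
    if (doc.age > 29 && doc.age < 36) emit(doc.age, doc.income);
  }
  return rows;
}
const sum = (numbers) => numbers.reduce((a, b) => a + b, 0);

// Reduce that stays correct when partial results are combined (rereduce).
function reduceAverage(keys, values, rereduce) {
  const total = rereduce ? sum(values.map((v) => v.sum)) : sum(values);
  const count = rereduce ? sum(values.map((v) => v.count)) : values.length;
  return { sum: total, count, average: total / count };
}

// Made-up sample data.
const people = [
  { age: 30, income: 1000 },
  { age: 35, income: 2000 },
  { age: 31, income: 6000 },
  { age: 40, income: 9000 }, // filtered out by the map step
];
const rows = runMap(people);

// Reduce in two batches, then combine the partial results.
const partA = reduceAverage(null, rows.slice(0, 2).map((r) => r.value), false);
const partB = reduceAverage(null, rows.slice(2).map((r) => r.value), false);
const combined = reduceAverage(null, [partA, partB], true);
console.log(combined.average); // 3000
```

Averaging the two partial averages directly (1500 and 6000) would give 3750, which is why the sum and count are kept.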

In practice, we would create more realistic MapReduce views and add them to the design doc, rather than just creating temporary views.
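A design document holding such a view might look like the sketch below. The names "people" and "income_by_age" are illustrative assumptions, and the built-in "_stats" reduce is used here in place of a custom reduce function. View functions are stored in the design document as strings.

```javascript
// Map function for the view, written as a normal function for readability.
// `emit` is supplied by CouchDB when the view runs; it is not defined here,
// and this function is never called locally.
const mapByAge = function (doc) {
  if (doc.age > 29 && doc.age < 36) {
    emit(doc.age, doc.income);
  }
};

// "people" and "income_by_age" are illustrative names.
const designDoc = {
  _id: "_design/people",
  language: "javascript",
  views: {
    income_by_age: {
      map: mapByAge.toString(), // stored as source text
      reduce: "_stats", // built-in reduce: sum, count, min, max, sumsqr
    },
  },
};

// This JSON would be the body of a PUT to /{db}/_design/people.
console.log(JSON.stringify(designDoc, null, 2));
```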

When you first query a view, CouchDB runs the map function against every document in the database. On the surface, that sounds like a bad idea, especially if you've got millions of documents. However, it only performs this full pass once to build the view initially; when the data is later updated, it only needs to update the view entries for the documents that changed (it doesn't need to regenerate the entire view again).

To query a view, all you need to do is access its URL, which will look something like this (once you have added it to a design doc):

mydb/_design/nameOfDesigndoc/_view/nameOfView

You can also supply parameters in the URL to restrict the returned dataset, for example by querying a view such as by_date with a start_key and an end_key. CouchDB starts at the first row whose key equals the start_key and keeps returning rows until it reaches a key that matches the end_key.
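A query URL with key parameters can be assembled as in the sketch below. viewUrl is a hypothetical helper and the date values are made up. Key parameters are JSON-encoded, so a string key is sent with its surrounding double quotes.

```javascript
// Build a view query URL with JSON-encoded parameters.
// viewUrl is a hypothetical helper; "by_date" follows the example in the text.
function viewUrl(db, designDoc, view, params = {}) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    // JSON.stringify adds the quotes that string keys require.
    query.set(name, JSON.stringify(value));
  }
  const path = `/${db}/_design/${designDoc}/_view/${view}`;
  const qs = query.toString();
  return qs ? `${path}?${qs}` : path;
}

// Made-up date range for illustration.
console.log(
  viewUrl("mydb", "nameOfDesigndoc", "by_date", {
    start_key: "2020-01-01",
    end_key: "2020-12-31",
  })
);
```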