Querying and Indexing a MongoDB Database
Finding Data
Querying returns a subset of documents in a collection, from no documents at all to the entire collection. Which documents get returned is determined by the first argument to find, which is a document specifying the query criteria.
An empty query document (i.e., {}) matches everything in the collection. If find
isn't given a query document, it defaults to {}. For example, the following:
<db.c.find()
matches every document in the collection c (and returns these documents in batches).
There are two methods to and select data from a MongoDB collection, find()
and findOne()
.
Also, if you want to see this data in a cleaner, way just add .pretty() to the end of it. This will display document in pretty-printed JSON format: db.myCollection.find().pretty()
find(QUERY_OBJECT)
This method accepts a query object, actually a query object. If left empty, all documents will be returned.
findOne(QUERY_OBJECT)
findOne()
is used to select only one document.
Filtering Data
To query, or filter, data we can include a query object in our find()
or findOne()
methods.
When we start adding key/value pairs to the query document, we begin restricting our search. This works in a straightforward way for most types: numbers match numbers, booleans match booleans, and strings match strings. Querying for a simple type is as easy as specifying the value that you are looking for. For example, to find all documents where the value for age is 27, we can add that key/value pair to the query document:
> db.users.find({"age":27})
If we have a string we want to match, such as a "username" key with the value "joe", we use that key/value pair instead:
> db.users.find({"username":"joe"})
Multiple conditions can be strung together by adding more key/value pairs to the query document, which gets interpreted as "condition1 AND condition2 AND … AND conditionN." For instance, to get all users who are 27-year-olds with the username "joe," we can query for the following:
> db.users.find({"username":"joe","age":27})
Example:
db.posts.find( {"category": "News"} )
Projection
Both find methods accept a second parameter called projection.
This parameter is an object that describes which fields to include in the results.
Note: This parameter is optional. If omitted, all fields will be included in the results.
Example:
This example will only display the title and date fields in the results.
db.posts.find({}, {title: 1, date: 1})
Notice that the _id
field is also included. This field is always included unless specifically excluded.
We use a 1
to include a field and 0
to exclude a field.
Example:
This time, let's exclude the _id field.
db.posts.find({}, {_id: 0, title: 1, date: 1})
Note: You cannot use both 0 and 1 in the same object. The only exception is the _id field. You should either specify the fields you would like to include or the fields you would like to exclude.
Example:
Let's exclude the category field. All other fields will be included in the results.
db.posts.find({}, {category: 0})
We will get an error if we try to specify both 0 and 1 in the same object.
Limitations on MongoDB Queries
There are some restrictions on queries. The value of a query document must be a constant as far as the database is concerned. (It can be a normal variable in your own code.) That is, it cannot refer to the value of another key in the document. For example, if we were keeping inventory and we had both in_stock and num_sold keys, we couldn't compare their values by querying the following:
> db.stock.find({"in_stock":"this.num_sold"}) // doesn't work
There are ways to do this (see
> db.stock.find({"in_stock": 0})
Query Criteria
Queries can go beyond the exact matching described in the previous section; they can match more complex criteria, such as ranges, OR-clauses, and negation.
Query: Comparisons
"$lt", "$lte", "$gt", and "$gte" are all comparison operators, corresponding to <, <=, >, and >=, respectively. They can be combined to look for a range of values. For example, to look for users who are between the ages of 18 and 30, we can do this:
> db.users.find({"age":{"$gte":18,"$lte":30}})
This would find all documents where the age field was greater than or equal to 18 AND less than or equal to 30.
To query for documents where a key's value is not equal to a certain value, you must use another conditional operator, "$ne", which stands for "not equal." If you want to find all users who do not have the username joe, you can query for them using this:
> db.users.find({"username":{"$ne":"joe"}})
Actually, "$ne" can be used with any type.
Match by Date
These types of range queries are often useful for dates. For example, to find people who registered before January 1, 2007, we can do this:
> start = new Date("01/01/2007") > db.users.find({"registered":{"$lt":start}})
Depending on how you create and store dates, an exact match might be less useful, since dates are stored with millisecond precision. Often you want a whole day, week, or month, making a range query necessary.
The following query filter uses the $gt
operator and
{ dateCreated: { $gt: new Date('2000-06-22') } }
Match by Multiple Conditions ($and
)
The following query filter finds all documents where scores array contains the value 75, and the name is Greg Powell:
{ $and: [ { scores: 75, name: "Greg Powell" } ] }
Match by Multiple Possible Conditions ($in
and $or
)
There are two ways to do an OR query in MongoDB. "$in" can be used to query for a variety of values for a single key. "$or" is more general; it can be used to query for any of the given values across multiple keys.
If you have more than one possible value to match for a single key, use an array of criteria with "$in". For instance, suppose we're running a raffle and the winning ticket numbers are 725, 542, and 390. To find all three of these documents, we can construct the following query:
> db.raffle.find({"ticket_no":{"$in":[725,542,390]}})
"$in" is very flexible and allows you to specify criteria of different types as well as values. For example, if we are gradually migrating our schema to use usernames instead of user ID numbers, we can query for either by using this:
> db.users.find({"user_id":{"$in":[12345,"joe"]}})
This matches documents with a user_id equal to 12345 and documents with a user_id equal to joe
.
If "$in" is given an array with a single value, it behaves the same as directly matching the value. For instance, {ticket_no : {$in : [725]}} matches the same documents as {ticket_no : 725}.
The opposite of "$in" is "$nin", which returns documents that don't match any of the criteria in the array. If we want to return all of the people who didn't win anything in the raffle, we can query for them with this:
> db.raffle.find({"ticket_no":{"$nin":[725,542,390]}})
This query returns everyone who did not have tickets with those numbers.
"$in" gives you an OR query for a single key, but what if we need to find documents where ticket_no is 725 or winner is true? For this type of query, we'll need to use the "$or" conditional. "$or" takes an array of possible criteria. In the raffle case, using "$or" would look like this:
> db.raffle.find({"$or":[{"ticket_no": 725},{"winner": true}]})
"$or" can contain other conditionals. If, for example, we want to match any of the three ticket_no values or the winner key, we can use this:
> db.raffle.find({"$or":[{"ticket_no":{"$in":[725,542,390]}}, ... {"winner":true}]})
With a normal AND-type query, you want to narrow down your results as far as possible in as few arguments as possible. OR-type queries are the opposite: they are most efficient if the first arguments match as many documents as possible.
While "$or" will always work, use "$in" whenever possible as the query optimizer handles it more efficiently.
The following query filter uses the $or
operator to find documents where version is 4, or name is Andrea Le:
{ $or: [ { version: 4 }, { name: "Andrea Le" } ] }
Match by Exclusion ($not
)
"$not" is a metaconditional: it can be applied on top of any other criteria. As an example, let's consider the modulus operator, "$mod". "$mod" queries for keys whose values, when divided by the first value given, have a remainder of the second value:
> db.users.find({"id_num":{"$mod":[5,1]}})
The previous query returns users with id_nums of 1, 6, 11, 16, and so on. If we want, instead, to return users with id_nums of 2, 3, 4, 5, 7, 8, 9, 10, 12, etc., we can use "$not":
> db.users.find({"id_num":{"$not":{"$mod":[5,1]}}})
"$not" can be particularly useful in conjunction with regular expressions to find all documents that don't match a given pattern (regular expression usage is described in the section
The following query filter uses the $not
operator to find all documents where the value of the name field is not equal to "Andrea Le", or the name field does not exist:
{ name: { $not: { $eq: "Andrea Le" } } }
Type-Specific Queries
MongoDB has a wide variety of types that can be used in a document. Some of these types have special behavior when querying.
null
null
behaves a bit strangely. It does match itself, so if we have a collection with the following documents:
> db.c.find() {"_id":ObjectId("4ba0f0dfd22aa494fd523621"),"y":null} {"_id":ObjectId("4ba0f0dfd22aa494fd523622"),"y":1} {"_id":ObjectId("4ba0f148d22aa494fd523623"),"y":2}
we can query for documents whose y key is null
in the expected way:
> db.c.find({"y":null}) {"_id":ObjectId("4ba0f0dfd22aa494fd523621"),"y":null}
However, null
also matches does not exist
. Thus, querying for a key with the value null
will return all documents lacking that key:
> db.c.find({"z":null}) {"_id":ObjectId("4ba0f0dfd22aa494fd523621"),"y":null} {"_id":ObjectId("4ba0f0dfd22aa494fd523622"),"y":1} {"_id":ObjectId("4ba0f148d22aa494fd523623"),"y":2}
If we only want to find keys whose value is null
, we can check that the key is null
and exists using the "$exists" conditional:
> db.c.find({"z":{"$eq":null,"$exists":true}})
Match by Substring
The following query filter uses the $regex
operator to find all documents where the value of email includes the term /andrea_le/ (where the underscores stands for any character):
{ email: { $regex: "andrea_le" } }
Regular Expression
"$regex" provides regular expression capabilities for pattern matching strings in queries. Regular expressions are useful for flexible string matching. For example, if we want to find all users with the name "Joe" or "joe," we can use a regular expression to do case-insensitive matching:
> db.users.find({"name":{"$regex":/joe/i}})
Regular expression flags (e.g., i) are allowed but not required. If we want to match not only various capitalizations of "joe," but also "joey," we can continue to improve our regular expression:
> db.users.find({"name":/joey?/i})
MongoDB uses the Perl Compatible Regular Expression (PCRE) library to match regular expressions; any regular expression syntax allowed by PCRE is allowed in MongoDB. It is a good idea to check your syntax with the JavaScript shell before using it in a query to make sure it matches what you think it matches.
Regular expressions can also match themselves. Very few people insert regular expressions into the database, but if you insert one, you can match it with itself:
> db.foo.insertOne({"bar":/baz/}) > db.foo.find({"bar":/baz/}) { "_id":ObjectId("4b23c3ca7525f35f94b60a2d"), "bar":/baz/ }
Querying Array
The following query filter uses the $elemMatch
operator to find all documents where at least one value in the scores array is greater than 80 and less than 90:
{ scores: { $elemMatch: { $gt: 80, $lt: 90 } } }
Querying on Embedded Document
The following query filter finds the document with the school.name subfield of Northwestern
:
{ "school.name": "Northwestern" }
$where Queries*
*
*
Cursors*
Indexing/Search
To create an index, use the createIndex() shell method or equivalent method for your driver.
When you run a create index command in the MongoDB Shell or a driver, MongoDB only creates the index if an index of the same specification does not exist.
Creating an Index using the node.js driver (createIndex()
)
o create an index using the Node.JS driver, use createIndex()
.
collection.createIndex( { <key and index type specification> }, function(<err>, <result>) { console.log(<result>); callback(<result>); } )
This example creates a single key ascending index on the name field:
collection.createIndex( { name : 1 }, function(err, result) { console.log(result); callback(result); } )
Seeing/Monitoring Indexes
To see what indexes exist on your collection, including indexes that are currently being built, run the db.collection.getIndexes()
method:
db.collection.getIndexes()
To check if your index is being built, use the $currentOp
aggregation stage to return information about active operations on your database. To run $currentOp
in mongosh, use the db.aggregate()
method on the admin database.
The following aggregation pipeline uses the $match
stage to return information about an active operation that builds a descending index on the name field:
db.getSiblingDB("admin").aggregate( [ { $currentOp : { idleConnections: true } }, { $match : {"command.createIndexes": { $exists: true } } } ] )
MongoDB marks index builds in various stages, including waiting on commit quorum, as an idle connection by setting the active field to false
. The idleConnections: true
setting includes these idle connections in the $currentOp
output.
Specifying an Index Name
When you create an index, you can give the index a custom name. Giving your index a name helps distinguish different indexes on your collection. For example, you can more easily identify the indexes used by a query in the query plan's explain results if your indexes have distinct names.
To specify the index name, include the name option when you create the index:
db.<collection>.createIndex( { <field>: <value> }, { name: "<indexName>" } )
Before you specify an index name, consider the following:
- Index names must be unique. Creating an index with the name of an existing index returns an error.
- You can't rename an existing index. Instead, you must drop and recreate the index with a new name.
Default Index Names
If you don't specify a name during index creation, the system generates the name by concatenating each index key field and value with underscores. For example: { score : 1 }
will produce an index named score_1, while { content : "text", "description.tags": "text" }
will produce an index named content_text_description.tags_text.
Procedure
A blog collection contains data about blog posts and user interactions.
Create a text index on the content, users.comments, and users.profiles fields. Set the index name to InteractionsTextIndex:
db.blog.createIndex( { content: "text", "users.comments": "text", "users.profiles": "text" }, { name: "InteractionsTextIndex" } )
Creating an Index using the *** driver (???
)
Creating an Index using the *** driver (???
)
Running a Query with the $search
Operator
To use our search index, we will use the $search
operator in our aggregation pipeline.
Example:
db.movies.aggregate([ { $search: { index: "default", // optional unless you named your index something other than "default" text: { query: "star wars", path: "title" }, }, }, { $project: { title: 1, year: 1, } } ])
The first stage of this aggregation pipeline will return all documents in the movies collection that contain the word star
or wars
in the title field.
The second stage will project the title and year fields from each document.
MongoDB Query Operators
There are many query operators that can be used to compare and reference document fields.
Comparison
The following operators can be used in queries to compare values:
$eq
: Values are equal$ne
: Values are not equal$gt
: Value is greater than another value$gte
: Value is greater than or equal to another value$lt
: Value is less than another value$lte
: Value is less than or equal to another value$in
: Value is matched within an array
Logical
The following operators can logically compare multiple queries.
$and
: Returns documents where both queries match$or
: Returns documents where either query matches$nor
: Returns documents where both queries fail to match$not
: Returns documents where the query does not match
Evaluation
The following operators assist in evaluating documents.
$regex
: Allows the use of regular expressions when evaluating field values$text
: Performs a text search$where
: Uses a JavaScript expression to match documents
Update
Fields
The following operators can be used to update fields:
$currentDate
: Sets the field value to the current date$inc
: Increments the field value$rename
: Renames the field$set
: Sets the value of a field$unset
: Removes the field from the document
Array
The following operators assist with updating arrays.
$addToSet
: Adds distinct elements to an array$pop
: Removes the first or last element of an array$pull
: Removes all elements from an array that match the query$push
: Adds an element to an array