Enhancing MongoDB Query Performance with Indexing

Introduction

Indexes in MongoDB improve query performance by reducing the need for full collection scans. Without an index on the queried field, MongoDB must examine every document in a collection to find the ones matching a query, so query time grows with the size of the collection. This guide shows how to create an index on a single field, compare execution statistics before and after indexing, and interpret the difference.
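
To make the difference concrete, here is a toy model in plain JavaScript (it should run in Node.js or mongosh). It is only a sketch: the names collectionScan, buildAgeIndex, and indexScan are made up for illustration, and a sorted array with binary search stands in for MongoDB's real B-tree indexes.

```javascript
// Toy model only: a sorted array stands in for a B-tree index.
const docs = [
  { name: "Alice", age: 28 },
  { name: "Bob", age: 34 },
  { name: "Charlie", age: 22 },
  { name: "Diana", age: 29 },
  { name: "Ethan", age: 42 },
];

// Collection scan: every document is examined, whether it matches or not.
function collectionScan(docs, minAge) {
  let examined = 0;
  const results = [];
  for (const doc of docs) {
    examined += 1;
    if (doc.age >= minAge) results.push(doc);
  }
  return { results, examined };
}

// Build a hypothetical "index": (age, position) entries sorted by age.
function buildAgeIndex(docs) {
  return docs
    .map((doc, position) => ({ age: doc.age, position }))
    .sort((a, b) => a.age - b.age);
}

// Index scan: binary-search for the first entry with age >= minAge, then
// fetch only the documents that the remaining entries point to.
function indexScan(docs, index, minAge) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].age < minAge) lo = mid + 1;
    else hi = mid;
  }
  const results = index.slice(lo).map((entry) => docs[entry.position]);
  return { results, examined: results.length };
}

const scan = collectionScan(docs, 30);
const viaIndex = indexScan(docs, buildAgeIndex(docs), 30);
console.log(scan.examined, viaIndex.examined); // prints: 5 2
```

Both approaches return the same two documents, but the scan touches all five while the index lookup touches only the two that match.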


Prerequisites

  • Basic knowledge of MongoDB: Familiarity with MongoDB commands, collections, and indexing concepts.

  • MongoDB installation: MongoDB should be installed locally, or you should have access to a MongoDB cloud service (e.g., MongoDB Atlas).


Setup

Step 1: Create a Collection and Insert Documents

To get started, create a users collection and insert sample documents into MongoDB.

  1. Open the MongoDB shell or connect to your MongoDB instance.

  2. Insert sample documents as shown below.

     db.users.insertMany([
         { name: "Alice", age: 28, city: "New York" },
         { name: "Bob", age: 34, city: "Los Angeles" },
         { name: "Charlie", age: 22, city: "Chicago" },
         { name: "Diana", age: 29, city: "New York" },
         { name: "Ethan", age: 42, city: "Los Angeles" },
         { name: "Frank", age: 30, city: "San Francisco" },
         { name: "George", age: 38, city: "Austin" }
     ]);
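
Seven documents are enough to follow the steps, but too few for timing differences to show up reliably. If you want a larger collection to experiment with, a sketch like the following can generate extra sample documents. The helper name makeUsers, the city list, and the age range of 18 to 65 are assumptions chosen for illustration.

```javascript
// Hypothetical helper: builds n synthetic user documents.
const CITIES = ["New York", "Los Angeles", "Chicago", "San Francisco", "Austin"];

function makeUsers(n) {
  const users = [];
  for (let i = 0; i < n; i++) {
    users.push({
      name: `User${i}`,
      age: 18 + (i % 48), // ages cycle through 18..65
      city: CITIES[i % CITIES.length],
    });
  }
  return users;
}

// In mongosh, the result can be passed straight to insertMany, for example:
// db.users.insertMany(makeUsers(100000));
```

Keep in mind that the example outputs later in this guide assume only the seven documents inserted above; with extra documents the counts will differ.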
    

Performance Testing

Step 2: Measure Query Performance Before Indexing

Before creating an index, measure the query performance to establish a baseline.

  1. Run the following query to retrieve users with age 30 or more, using the explain() method to examine the query’s execution statistics.

     db.users.find({ age: { $gte: 30 } }).explain("executionStats")
    
  2. Note the following in the output:

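    • nReturned: Number of documents that matched the query.

    • totalDocsExamined: Number of documents MongoDB read while executing the query.

    • executionTimeMillis: Total time, in milliseconds, for query plan selection and execution.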

Example Output (abridged):

    {
      "executionStats": {
        "nReturned": 4,
        "totalDocsExamined": 7,
        "executionTimeMillis": 2
      }
    }

In this example, MongoDB examined all 7 documents, even though only 4 (Bob, Ethan, Frank, and George) matched the criteria. The query took 2 milliseconds; with so few documents, the exact time will vary from run to run.
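
The real explain() document is much longer than the excerpt shown here, so it can help to pull out just the headline numbers. The following is a minimal sketch, assuming the usual queryPlanner.winningPlan and executionStats layout; the helper name summarizeExplain is hypothetical, and the exact plan shape can differ between server versions.

```javascript
// Hypothetical helper: extracts the headline numbers from an explain document.
function summarizeExplain(explain) {
  const stats = explain.executionStats || {};
  const winning = (explain.queryPlanner || {}).winningPlan || {};
  // Assumption: some server versions nest the plan under `queryPlan`.
  let stage = winning.queryPlan || winning;
  const stages = [];
  // Walk down the chain of single-input stages, e.g. FETCH, then IXSCAN.
  while (stage && stage.stage) {
    stages.push(stage.stage);
    stage = stage.inputStage;
  }
  return {
    stages: stages.join(" <- "),
    nReturned: stats.nReturned,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    millis: stats.executionTimeMillis,
  };
}

// In mongosh:
// summarizeExplain(db.users.find({ age: { $gte: 30 } }).explain("executionStats"))
```

A stages value of COLLSCAN indicates a full collection scan, while a chain containing IXSCAN indicates that an index was used.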


Creating an Index

Step 3: Create an Index on the Age Field

To optimize performance, create an index on the age field.

  1. Use the following command to create an ascending index on the age field:

     db.users.createIndex({ age: 1 })
    

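    The 1 specifies ascending order. You could also use -1 to create a descending index. For a single-field index like this one, the direction makes no practical difference, because MongoDB can traverse the index in either direction.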

Step 4: Measure Query Performance After Indexing

Run the same query again to observe the performance improvement after indexing.

  1. Execute the query with explain() to view the updated execution statistics.

     db.users.find({ age: { $gte: 30 } }).explain("executionStats")
    
  2. Check the output, focusing on totalDocsExamined and executionTimeMillis to observe changes.

    Example Output:

     {
       "executionStats": {
         "nReturned": 4,
         "totalDocsExamined": 4,
         "executionTimeMillis": 0
       }
     }
    

After indexing, MongoDB examined only the 4 documents that match the age >= 30 condition instead of all 7. On a collection this small the time difference is negligible; the drop in totalDocsExamined is the meaningful signal, and the time savings grow with the size of the collection.
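
One way to compare two runs is the ratio of documents examined to documents returned: a value near 1 means little wasted work. The sketch below computes that ratio from an executionStats object. The function name examinedPerReturned is hypothetical, and the sample numbers are invented to illustrate a larger collection; they are not measurements.

```javascript
// Hypothetical helper: documents examined per document returned.
function examinedPerReturned(stats) {
  if (stats.nReturned === 0) {
    // Nothing returned: any examined document was wasted work.
    return stats.totalDocsExamined === 0 ? 0 : Infinity;
  }
  return stats.totalDocsExamined / stats.nReturned;
}

// Invented sample numbers, for illustration only.
const beforeIndex = { nReturned: 250, totalDocsExamined: 100000 };
const afterIndex = { nReturned: 250, totalDocsExamined: 250 };

console.log(examinedPerReturned(beforeIndex), examinedPerReturned(afterIndex)); // prints: 400 1
```

In mongosh, the same function can be applied to the executionStats field of a real explain("executionStats") result.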


Verifying the Index

To confirm the index has been created on the age field, use the getIndexes() method:

    db.users.getIndexes()

Expected Output (the ns field appears only on versions before MongoDB 4.4):

    [
      {
        "v": 2,
        "key": { "_id": 1 },
        "name": "_id_",
        "ns": "my_database.users"
      },
      {
        "v": 2,
        "key": { "age": 1 },
        "name": "age_1",
        "ns": "my_database.users"
      }
    ]

You should see an index named "age_1" created on the age field.
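
If you prefer a programmatic check over reading the output by eye, a small helper can search the array that getIndexes() returns. This is a sketch under assumptions: the name hasIndexOn is hypothetical, and the comparison assumes index directions come back as plain numbers such as 1 and -1.

```javascript
// Hypothetical helper: true if `indexes` (the array returned by getIndexes())
// contains an index whose key matches `keySpec` exactly, field order included.
// Assumes directions are plain numbers (1 or -1).
function hasIndexOn(indexes, keySpec) {
  const wanted = JSON.stringify(keySpec);
  return indexes.some((index) => JSON.stringify(index.key) === wanted);
}

// In mongosh:
// hasIndexOn(db.users.getIndexes(), { age: 1 })
```

The check is deliberately strict: an index on { age: -1 } or a compound index that merely starts with age would not count as a match.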


Conclusion

In this guide, we explored how to create an index in MongoDB to enhance query performance. By comparing execution statistics before and after indexing, we observed that the index reduced the number of documents MongoDB needed to examine, which is what makes the query faster. The effect is small on a handful of documents but grows with the size of the collection, so indexes on frequently queried fields can make a significant difference to performance and scalability in applications that handle large datasets.