
Improving Elasticsearch Indexing in the Rails Model using Searchkick

Nikhil Pathak

Full-stack Development

Search has become a prominent feature of most web applications, and a relevant search feature requires a robust search engine. The engine should support full-text search, autocompletion, suggestions, spelling correction, fuzzy search, and analytics.

Elasticsearch, a distributed, fast, and scalable search and analytics engine, takes care of all these basic search requirements.

This post focuses on a few approaches to using Elasticsearch in a Rails application that reduce the latency of web requests. Let’s review one of the best ways to improve Elasticsearch indexing in Rails models: moving it to background jobs.

In a Rails application, Elasticsearch can be integrated with any of the following popular gems:

We could use any of these gems, but for this post we will use the Searchkick gem, which is the more Rails-friendly option.

By default, Searchkick uses model callbacks to sync data to the corresponding Elasticsearch index. Because the sync runs inside those callbacks, every web request that creates or updates a resource takes additional time to complete.

The image below shows logs from a Rails application, captured for an update request on a user record. We added a print statement in the Rails model just before the Elasticsearch sync starts, which makes it easy to see in the logs where indexing begins. The logs show that the last two queries were executed to index the data in Elasticsearch.

Since the Elasticsearch sync happens while the user record is being updated, the user update request takes additional time to cover the sync.

Below is the request flow diagram:

From the request flow diagram, we can see that the end user must wait for steps 3 and 4 to complete. Step 3 fetches the child object details from the database.

To tackle the problem, we can move the Elasticsearch indexing to the background jobs. Usually, for Rails apps in production, there are separate app servers, database servers, background job processing servers, and Elasticsearch servers (in this scenario).

This is how the request flow looks when we move Elasticsearch indexing:

Let’s get to coding!

For demo purposes, we will use a Rails app with two models, `User` and `Blogpost`. The stack used here:

  • Rails 5.2
  • Elasticsearch 6.6.7
  • MySQL 5.6
  • Searchkick (gem for writing Elasticsearch queries in Ruby)
  • Sidekiq (gem for background processing)

This approach does not require any specific version of Rails, Elasticsearch, or MySQL. Moreover, it is database agnostic. You can go through the code from this GitHub repo for reference.

Let’s take a look at the user model with an Elasticsearch index.

CODE: https://gist.github.com/velotiotech/9cd362c30215d4e2b334076912056b1e.js

Anytime a user object is inserted, updated, or deleted, Searchkick reindexes the data in the Elasticsearch user index synchronously.

Searchkick already provides four ways to sync the Elasticsearch index:

  • Inline (default)
  • Asynchronous
  • Queuing
  • Manual

For more detailed information on this, refer to this page. In this post, we are looking at the manual approach to reindexing the model data.
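To make the difference between the four strategies concrete, here is a minimal plain-Ruby sketch of what each one does at the moment a record is saved. It does not call Searchkick: `sync_on_save`, `index`, `jobs`, and `pending_ids` are hypothetical stand-ins for the sync logic, the Elasticsearch index, the background job queue, and the list of IDs waiting for a batch reindex. In Searchkick itself, the strategy is selected with the `callbacks` option of the `searchkick` declaration (for example `callbacks: :async`, `callbacks: :queue`, or `callbacks: false`).

```ruby
# Plain-Ruby model of the four sync strategies. Nothing here talks to
# Searchkick or Elasticsearch; the arrays are hypothetical stand-ins.
def sync_on_save(strategy, record_id, index:, jobs:, pending_ids:)
  case strategy
  when :inline
    # The web request itself pays for the index update.
    index << record_id
  when :async
    # One background job is created per saved record.
    jobs << [:reindex, record_id]
  when :queue
    # Only the ID is stored; a later batch job reindexes many at once.
    pending_ids << record_id
  when :manual
    # Nothing happens automatically; the application reindexes on its own.
    nil
  else
    raise ArgumentError, "unknown strategy: #{strategy.inspect}"
  end
end
```

With the manual strategy the save does no indexing work at all, which is why the rest of this post has to add its own callback and worker.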

To reindex manually, the user model will look like this:

CODE: https://gist.github.com/velotiotech/19f301d5ec37aae3142ff1eca1c48f69.js

Now, we need to define a callback that syncs the data to the Elasticsearch index. Typically, this callback would have to be written in every model that has an Elasticsearch index. Instead, we can write a common concern and include it in the required models.

Here is what our concern will look like:

CODE: https://gist.github.com/velotiotech/a5b0663f26128dcd00233781ccc275fa.js

In the above ActiveSupport concern, we call the Sidekiq worker named ElasticsearchWorker. After adding this concern, don’t forget to include the Elasticsearch indexer concern in the user model, like so:

include ElasticsearchIndexer

Now, let’s see the Elasticsearch Sidekiq worker:

CODE: https://gist.github.com/velotiotech/ea2fc2f70438aa567525af7f23791abe.js

That’s it, we’ve done it. Cool, huh? Now, whenever a web request creates, updates, or deletes a user, a background job will be created. The background job can be seen in the Sidekiq web UI at localhost:3000/sidekiq.
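The division of work between the model and the worker can be modelled in plain Ruby. The sketch below is not the code from the post: `JOB_QUEUE`, `SEARCH_INDEX`, `DemoUser`, `DemoIndexWorker`, and `run_pending_jobs` are hypothetical stand-ins for the Sidekiq queue, the Elasticsearch index, an ActiveRecord model, the Sidekiq worker, and the Sidekiq process, so that the flow can run without Rails, Redis, or Elasticsearch.

```ruby
# Hypothetical stand-ins: JOB_QUEUE plays the role of the Sidekiq queue and
# SEARCH_INDEX plays the role of the Elasticsearch index.
JOB_QUEUE = []
SEARCH_INDEX = {}

# Stand-in for a database-backed model.
class DemoUser
  RECORDS = {}
  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  def self.find_by_id(id)
    RECORDS[id]
  end

  # Persist the record, then hand indexing over to the background queue.
  def save
    RECORDS[id] = self
    DemoIndexWorker.perform_async(self.class.name, id)
  end

  def destroy
    RECORDS.delete(id)
    DemoIndexWorker.perform_async(self.class.name, id)
  end
end

# Stand-in for the Sidekiq worker.
class DemoIndexWorker
  # Enqueue only plain values (class name and id), never the object itself.
  def self.perform_async(class_name, id)
    JOB_QUEUE << [class_name, id]
  end

  # Runs later on the job server: reload the record and sync the index.
  def perform(class_name, id)
    record = Object.const_get(class_name).find_by_id(id)
    key = "#{class_name}/#{id}"
    if record
      SEARCH_INDEX[key] = { name: record.name }
    else
      # The record no longer exists, so drop it from the index.
      SEARCH_INDEX.delete(key)
    end
  end
end

# Drain the queue, as a background job process would.
def run_pending_jobs
  DemoIndexWorker.new.perform(*JOB_QUEUE.shift) until JOB_QUEUE.empty?
end
```

Calling `DemoUser.new(1, "Asha").save` returns as soon as the job is queued, and the index only changes once `run_pending_jobs` executes. That gap is exactly the time the web request no longer has to wait for.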

Now, there is a small problem in the Elasticsearch indexer concern: a job is queued on every save, even when nothing has changed. To reproduce this, go to your user edit page, click save without changing anything, and look at localhost:3000/sidekiq. A job will be queued.

We can handle this case by tracking the dirty attributes. 

CODE: https://gist.github.com/velotiotech/81f5f3274999a2058fc73c6339f9c53c.js
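The idea behind dirty tracking can be shown with a minimal plain-Ruby sketch. `TrackedRecord` and `enqueued_jobs` are hypothetical stand-ins for an ActiveRecord model and the Sidekiq queue; in a Rails 5.1+ model, the equivalent information is available after a save through `saved_changes?` and `saved_changes`.

```ruby
# Plain-Ruby model of dirty tracking. TrackedRecord is a hypothetical
# stand-in for an ActiveRecord model; enqueued_jobs stands in for Sidekiq.
class TrackedRecord
  attr_reader :attributes, :enqueued_jobs

  def initialize(attributes)
    @attributes = attributes.dup
    @enqueued_jobs = []
  end

  # Apply new values, work out which ones really changed, and enqueue an
  # indexing job only when that set is non-empty.
  def update(new_attributes)
    changes = new_attributes.reject { |key, value| @attributes[key] == value }
    @attributes.merge!(changes)
    @enqueued_jobs << changes.keys unless changes.empty?
    changes
  end
end
```

Saving the form with unchanged values produces an empty set of changes, so no job is queued.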

Furthermore, there are a few more areas for improvement. Suppose you update a field of the user model that is not part of the Elasticsearch index: the Elasticsearch worker Sidekiq job will still be created and will reindex the associated model object. The concern can be modified to create the indexing job only if fields that are part of the Elasticsearch index were updated.

CODE: https://gist.github.com/velotiotech/7d7d5de185ae7a03baa0cf643df9c6f0.js
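The check itself boils down to intersecting the changed attribute names with the indexed ones. The following is a minimal sketch under stated assumptions: `INDEXED_FIELDS` and `reindex_needed?` are hypothetical names, and the field list is made up for illustration. In a Rails model, the changed names would typically come from `saved_changes.keys`.

```ruby
# Hypothetical list of the attributes that are sent to the search index.
INDEXED_FIELDS = %w[name email].freeze

# True when at least one changed attribute is part of the search index.
# Accepts attribute names as strings or symbols.
def reindex_needed?(changed_attributes, indexed_fields = INDEXED_FIELDS)
  (changed_attributes.map(&:to_s) & indexed_fields).any?
end
```

A callback would then enqueue the worker only when `reindex_needed?` returns true, so updates that touch only non-indexed columns create no job.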

Conclusion

Moving Elasticsearch indexing to background jobs is a great way to boost the performance of a web app, because it reduces the response time of every request that writes indexed data. Implementing this approach for every model would not be ideal, though. I would recommend it only if the Elasticsearch index data is not needed in real time.

Since background jobs run in the order the queue allows, changes may take a while to show up in the Elasticsearch index when many jobs are queued. To reduce this delay to some extent, the Elasticsearch indexing jobs can be added to a high-priority queue. Also, this approach works best when the app server is separate from the background job processing server, so make sure you run them on different machines.
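The effect of a high-priority queue can be sketched in plain Ruby. `QUEUE_PRIORITY`, `next_job`, and the queue and job names are hypothetical; with Sidekiq, a worker is usually assigned to a queue with `sidekiq_options queue: ...`, and a process that lists its queues in order without weights checks them in that order.

```ruby
# Queue names in priority order; "elasticsearch" is a hypothetical name.
QUEUE_PRIORITY = %w[elasticsearch default].freeze

# Return the next job, always preferring the highest-priority queue that
# still has work. Returns nil when every queue is empty.
def next_job(queues, priority = QUEUE_PRIORITY)
  name = priority.find { |queue_name| !queues.fetch(queue_name, []).empty? }
  name && queues[name].shift
end
```

With this ordering, an indexing job is picked up before any job already waiting in the default queue.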

