Improving Elasticsearch Indexing in the Rails Model using Searchkick

Nikhil Pathak

Full-stack Development

Tags:

Elasticsearch

Rails

Ruby

Searchkick

Search has become a prominent feature of most web applications, and relevant search requires a robust search engine. The engine should support full-text search, autocompletion, suggestions, spelling correction, fuzzy search, and analytics.

Elasticsearch, a distributed, fast, and scalable search and analytics engine, covers all of these basic search requirements.

This post focuses on reducing web request latency in a Rails application that uses Elasticsearch. Let’s review one of the best ways to improve Elasticsearch indexing in Rails models: moving it to background jobs.

In a Rails application, Elasticsearch can be integrated with any of the following popular gems:

Any of these gems would work, but in this post we will use Searchkick, which is particularly Rails-friendly.

By default, Searchkick uses model callbacks to sync data to the corresponding Elasticsearch index. Because the sync runs inside these callbacks, every request that creates or updates a resource takes additional time to complete.

The image below shows logs from a Rails application, captured during an update request for a user record. We added a print statement in the Rails model just before Searchkick syncs with Elasticsearch, which makes it easy to spot in the logs where indexing starts. The logs show that the last two queries were executed to index the data in Elasticsearch.

Because the Elasticsearch sync happens while the user record is being updated, the user update request takes additional time to complete the sync.

Below is the request flow diagram:

From the request flow diagram, we can see that the end user must wait for steps 3 and 4 to complete. Step 3 fetches the child object details from the database.

To tackle this problem, we can move Elasticsearch indexing to background jobs. Production Rails apps usually run separate app servers, database servers, and background job processing servers, plus (in this scenario) Elasticsearch servers.

This is how the request flow looks after moving Elasticsearch indexing to background jobs:
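To make the difference between the two flows concrete, here is a toy, in-process sketch in plain Ruby (no Rails, Searchkick, or Sidekiq). The names `slow_index`, `update_inline`, `update_in_background`, and the 0.2-second delay are hypothetical: `slow_index` stands in for the Elasticsearch sync, and a `Thread::Queue` drained by one worker thread stands in for Sidekiq. It only illustrates why the request returns sooner; it is not how Sidekiq works internally.

```ruby
# Toy comparison of inline vs. background indexing (plain Ruby, no gems).
INDEX_DELAY = 0.2 # hypothetical cost of one Elasticsearch sync, in seconds

INDEXED_IDS = Thread::Queue.new # ids the toy "index" has received

# Stand-in for the Elasticsearch sync Searchkick performs.
def slow_index(id)
  sleep INDEX_DELAY
  INDEXED_IDS << id
end

# Inline: the request waits for the sync (like Searchkick's default callbacks).
def update_inline(id)
  slow_index(id)
end

# Background: the request only enqueues; a worker thread syncs later.
JOBS = Thread::Queue.new
WORKER = Thread.new do
  while (id = JOBS.pop) # pop returns nil once the queue is closed and empty
    slow_index(id)
  end
end

def update_in_background(id)
  JOBS << id
end

def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

inline_time = elapsed { update_inline(1) }
background_time = elapsed { update_in_background(2) }

JOBS.close  # no more jobs will be enqueued
WORKER.join # wait until the worker has drained the queue

puts format("inline: %.2fs, background: %.2fs", inline_time, background_time)
```

Both records end up indexed; the difference is only in who waits for the sync, the request or the worker.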

Let’s get to coding!

For demo purposes, we will use a Rails app with two models, `User` and `Blogpost`. The stack used here:

Rails 5.2
Elasticsearch 6.6.7
MySQL 5.6
Searchkick (gem for writing Elasticsearch queries in Ruby)
Sidekiq (gem for background processing)

This approach does not require any specific version of Rails, Elasticsearch, or MySQL, and it is database agnostic. You can find the code for reference in this GitHub repo.

Let’s take a look at the user model with its Elasticsearch index:

	# == Schema Information
	#
	# Table name: users
	#
	# id :bigint not null, primary key
	# name :string(255)
	# email :string(255)
	# mobile_number :string(255)
	# created_at :datetime not null
	# updated_at :datetime not null
	#
	class User < ApplicationRecord
	  searchkick

	  has_many :blogposts

	  def search_data
	    {
	      name: name,
	      email: email,
	      total_blogposts: blogposts.count,
	      last_published_blogpost_date: last_published_blogpost_date
	    }
	  end
	  ...
	end


Anytime a user object is inserted, updated, or deleted, Searchkick reindexes the data in the Elasticsearch user index synchronously.

Searchkick provides four ways to keep the Elasticsearch index in sync:

Inline (default)
Asynchronous
Queuing
Manual

For more detailed information, refer to this page. In this post, we will use the manual approach to reindex the model data.

To reindex manually, we turn off Searchkick’s callbacks in the user model:

	class User < ApplicationRecord
	  searchkick callbacks: false

	  def search_data
	    ...
	  end
	end


Next, we need to define a callback that syncs the data to the Elasticsearch index. This callback would otherwise have to be written in every model that has an Elasticsearch index, so instead we can write a common concern and include it in the models that need it.

Here is what our concern will look like:

	module ElasticsearchIndexer
	  extend ActiveSupport::Concern

	  included do
	    after_commit :reindex_model

	    def reindex_model
	      ElasticsearchWorker.perform_async(self.id, self.class.name)
	    end
	  end
	end


The concern above enqueues a Sidekiq worker named `ElasticsearchWorker` after each commit. Don’t forget to include the concern in the user model, like so:

	include ElasticsearchIndexer
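Outside Rails, the idea behind the concern can be approximated with a plain Ruby module: one module, included in every indexed model, that enqueues the record’s id and class name after a commit. In this sketch, `IndexerSketch`, `ENQUEUED`, `ToyModel`, and `commit!` are all hypothetical: an array stands in for the Sidekiq queue, `commit!` stands in for Rails’ `after_commit`, and `self.included` plays the role of ActiveSupport::Concern’s `included do` block.

```ruby
# Plain-Ruby approximation of the ElasticsearchIndexer concern (no Rails).
ENQUEUED = [] # hypothetical stand-in for the Sidekiq queue

module IndexerSketch
  # Runs when the module is included, like ActiveSupport::Concern's `included do`.
  def self.included(base)
    base.extend(ClassMethods)
    base.after_commit_callbacks << :reindex_model
  end

  module ClassMethods
    def after_commit_callbacks
      @after_commit_callbacks ||= []
    end
  end

  # Enqueue only simple values: the id and the class name.
  def reindex_model
    ENQUEUED << [id, self.class.name]
  end
end

# Hypothetical minimal "model" base class with an after-commit hook.
class ToyModel
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def commit!
    self.class.after_commit_callbacks.each { |callback| send(callback) }
  end
end

class User < ToyModel
  include IndexerSketch
end

class Blogpost < ToyModel
  include IndexerSketch
end

User.new(1).commit!
Blogpost.new(7).commit!
p ENQUEUED # => [[1, "User"], [7, "Blogpost"]]
```

The point is the reuse: both models get the same callback from one module, and the queue only ever sees an id and a class name.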

Now, let’s see the Elasticsearch Sidekiq worker:

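	class ElasticsearchWorker
	  include Sidekiq::Worker

	  # id and klass are the record's id and class name, e.g. (1, "User").
	  def perform(id, klass)
	    klass.constantize.find(id).reindex
	  rescue => e
	    # Handle exception, e.g. ActiveRecord::RecordNotFound when the
	    # record was deleted before the job ran.
	  end
	end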


That’s it, we’ve done it. Cool, huh? Now, whenever a request creates, updates, or deletes a user, a background job is enqueued. You can see these jobs in the Sidekiq web UI at localhost:3000/sidekiq (this assumes `Sidekiq::Web` is mounted at `/sidekiq` in your routes).
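Sidekiq stores job arguments as JSON, which is why the concern passes `self.id` and `self.class.name` rather than the record itself; the worker then turns the class name back into a class (`constantize` in Rails) and loads the record. The sketch below mimics that round trip in plain Ruby, using `JSON` and `Object.const_get` in place of `constantize`. `Article`, `STORE`, `enqueue`, and `perform` here are hypothetical stand-ins, not the Sidekiq API.

```ruby
require "json"

# Hypothetical in-memory model; STORE plays the role of the database table.
class Article
  STORE = {}

  attr_reader :id, :reindexed

  def initialize(id)
    @id = id
    @reindexed = false
  end

  # Like ActiveRecord's find: raises when the record does not exist.
  def self.find(id)
    STORE.fetch(id)
  end

  # Stand-in for Searchkick's reindex.
  def reindex
    @reindexed = true
  end
end

# Enqueue side: only JSON-friendly values (id and class name) leave the request.
def enqueue(record)
  JSON.generate([record.id, record.class.name])
end

# Worker side: parse the arguments, resolve the class by name, load, reindex.
def perform(payload)
  id, klass = JSON.parse(payload)
  Object.const_get(klass).find(id).reindex
rescue KeyError => e
  warn "skipping job: #{e.message}" # e.g. the record was deleted before the job ran
end

article = Article.new(42)
Article::STORE[42] = article
payload = enqueue(article) # the JSON string [42,"Article"]
perform(payload)
p article.reindexed # => true
```

Because the worker reloads the record, it indexes whatever is in the database when the job runs, not a snapshot taken at enqueue time.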

There is still a small problem with the concern. To reproduce it, open a user’s edit page, click Save without changing anything, and check localhost:3000/sidekiq: a job is queued even though no data changed.

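We can handle this case by tracking dirty attributes: after a save, `previous_changes` lists the attributes that changed, so we skip enqueuing when it is empty.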

	module ElasticsearchIndexer
	  extend ActiveSupport::Concern

	  included do
	    after_commit :reindex_model

	    def reindex_model
	      # Skip when the save did not change any attribute.
	      return if self.previous_changes.keys.blank?

	      ElasticsearchWorker.perform_async(self.id, self.class.name)
	    end
	  end
	end

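To see why this guard works, recall how ActiveRecord tracks changes: assigning an attribute its current value does not mark it as changed, and after a save the recorded changes are exposed through `previous_changes`. The toy class below imitates just enough of that behavior in plain Ruby to exercise the same guard. `ToyUser`, `write`, and `JOBS` are hypothetical and far simpler than ActiveModel::Dirty.

```ruby
# Toy imitation of ActiveRecord dirty tracking (plain Ruby, not ActiveModel::Dirty).
class ToyUser
  JOBS = [] # stand-in for the Sidekiq queue

  attr_reader :id, :previous_changes

  def initialize(id, attrs)
    @id = id
    @attrs = attrs.transform_keys(&:to_s)
    @pending = {}
    @previous_changes = {}
  end

  # Record a change only when the new value differs from the current one.
  def write(name, value)
    name = name.to_s
    old = @attrs[name]
    return if old == value

    @pending[name] = [old, value]
    @attrs[name] = value
  end

  # "Save": pending changes become previous_changes, then the commit hook runs.
  def save
    @previous_changes = @pending
    @pending = {}
    reindex_model
    true
  end

  # Same guard as the concern: skip when the save changed nothing.
  def reindex_model
    return if previous_changes.keys.empty?

    JOBS << [id, self.class.name]
  end
end

user = ToyUser.new(1, name: "Nikhil")
user.write(:name, "Nikhil") # same value, so nothing is recorded
user.save
p ToyUser::JOBS # => []
user.write(:name, "Nikhil Pathak")
user.save
p ToyUser::JOBS # => [[1, "ToyUser"]]
```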

There is still room for improvement. If you update a user field that is not part of the Elasticsearch index (such as `mobile_number`), the Sidekiq job is still created and the record is reindexed unnecessarily. We can change the concern so that it enqueues the indexing job only when an indexed field changes.

	module ElasticsearchIndexer
	  extend ActiveSupport::Concern

	  included do
	    after_commit :reindex_model

	    def reindex_model
	      updated_fields = self.previous_changes.keys

	      # Indexed field names. You can also keep them in a model-level
	      # constant instead of deriving them from search_data, which runs
	      # its queries (e.g. blogposts.count) on every commit.
	      es_index_fields = self.search_data.stringify_keys.keys

	      # Skip unless at least one indexed field changed.
	      return if (updated_fields & es_index_fields).blank?

	      ElasticsearchWorker.perform_async(self.id, self.class.name)
	    end
	  end
	end

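The filtering rule itself is just a set intersection between the changed attribute names and the keys of `search_data`. Here it is as a standalone plain-Ruby function (the name `should_reindex?` and the sample values are hypothetical), using the keys of the `User#search_data` example. It also exposes a caveat worth knowing: derived fields such as `total_blogposts` are not columns of the `users` table, and creating a blogpost does not save its user at all, so the user’s document keeps its old `total_blogposts` until something else reindexes it.

```ruby
# Plain-Ruby sketch of the "only reindex when an indexed field changed" check.
def should_reindex?(previous_changes, search_data)
  changed = previous_changes.keys.map(&:to_s)
  indexed = search_data.keys.map(&:to_s)
  !(changed & indexed).empty?
end

# Keys mirror User#search_data from the post; the values are placeholders.
search_data = {
  name: "Nikhil",
  email: "nikhil@example.com",
  total_blogposts: 3,
  last_published_blogpost_date: nil
}

# previous_changes maps attribute names to [old, new] pairs.
name_change   = { "name" => ["Nikhil", "Nikhil Pathak"], "updated_at" => [Time.at(0), Time.at(1)] }
mobile_change = { "mobile_number" => ["111", "222"], "updated_at" => [Time.at(0), Time.at(1)] }

p should_reindex?(name_change, search_data)   # => true
p should_reindex?(mobile_change, search_data) # => false
```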

Conclusion

Moving Elasticsearch indexing to background jobs is a great way to improve a web app’s performance, because requests that create, update, or delete indexed records no longer wait for the sync. However, it is not ideal for every model: I would recommend this approach only when the indexed data does not need to be available in real time.

Because background jobs are processed from a queue, changes can take a while to show up in the Elasticsearch index when many jobs are queued up. To mitigate this to some extent, put the Elasticsearch indexing jobs in a high-priority queue. This approach also works best when the app server and the background job processing server are separate machines.
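With Sidekiq, a worker can be routed to its own queue via `sidekiq_options queue: ...`, and Sidekiq can be configured to check queues in a fixed order or by weight (see Sidekiq’s documentation on queues). The plain-Ruby toy below only illustrates the strict-priority idea: the worker always drains the hypothetical `elasticsearch` queue before it looks at `default`. The queue names and job labels are made up, and this is not Sidekiq’s implementation.

```ruby
# Toy strict-priority polling over named queues (not Sidekiq's implementation).
QUEUE_ORDER = %w[elasticsearch default].freeze # hypothetical queue names

queues = Hash.new { |hash, name| hash[name] = [] }
queues["default"] << "send_welcome_email"
queues["elasticsearch"] << "reindex User 1"
queues["default"] << "generate_report"
queues["elasticsearch"] << "reindex Blogpost 7"

# Take the next job from the first non-empty queue in priority order.
def next_job(queues)
  QUEUE_ORDER.each do |name|
    job = queues[name].shift
    return job if job
  end
  nil
end

processed = []
while (job = next_job(queues))
  processed << job
end

p processed
# => ["reindex User 1", "reindex Blogpost 7", "send_welcome_email", "generate_report"]
```

Strict ordering keeps index updates fresh, at the cost that a steady stream of indexing jobs can delay everything in lower-priority queues.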


About the Author

Nikhil is a Software Engineer at Velotio. He is a full-stack developer with hands-on experience in building products using Ruby/Rails. He is fond of watching anime series in his free time.

Did you like the blog? If yes, we're sure you'll also like to work with the people who write them - our best-in-class engineering team.

We're looking for talented developers who are passionate about new emerging technologies. If that's you, get in touch with us.
