search

How to Index and Query Data With Haystack and Elasticsearch in Python

Haystack

Haystack is a Python library that provides modular search for Django. It features an API that provides support for different search back ends such as Elasticsearch, Whoosh, Xapian, and Solr.

Elasticsearch

Elasticsearch is a popular Lucene search engine capable of full-text search, and it’s developed in Java.

Google search uses the same approach of indexing their data, and that’s why it’s very easy to retrieve any information with just a few keywords, as shown below.

Elastic Search and Google

Install Django Haystack and Elasticsearch

The first step is to get Elasticsearch up and running locally on your machine. Elasticsearch requires Java, so you need to have Java installed on your machine.

We are going to follow the instructions from the Elasticsearch site.

Download the Elasticsearch 1.4.5 tar as follows:

Extract it as follows:

It will then create a batch of files and folders in your current directory. We then go into the bin directory as follows:

Start Elasticsearch as follows.

To confirm if it has installed successfully, go to http://127.0.0.1:9200/, and you should see something like this.

Ensure you also have haystack installed.

Let’s create our Django project. Our project will be able to index all the customers in a bank, making it easy to search and retrieve data using just a few search terms.

This command creates files that provide configurations for Django projects.

Let’s create an app for customers.

settings.py Configurations

In order to use Elasticsearch to index our searchable content, we’ll need to define a back-end setting for haystack in our project’s settings.py file. We are going to use Elasticsearch as our back end.

HAYSTACK_CONNECTIONS is a required setting and should look like this:

Within the settings.py, we are also going to add haystack and customers to the list of installed apps.

Create Models

Let’s create a model for Customers. In customers/models.py, add the following code.

Register your Customer model in admin.py like this:

Create Database and Super User

Apply your migrations and create an admin account.

Run your server and navigate to http://localhost:8000/admin/. You should now be able to see your Customer model there. Go ahead and add new customers in the admin.

Indexing Data

To index our models, we begin by creating a SearchIndex. SearchIndex objects determine what data should be placed in the search index. Each type of model must have a unique searchIndex.

SearchIndex objects are the way haystack determines what data should be placed in the search index and handles the flow of data in. To build a SearchIndex, we are going to inherit from the indexes.SearchIndex and indexes.Indexable, define the fields we want to store our data with, and define a get_model method.

Let’s create the CustomerIndex to correspond to our Customer modeling. Create a file search_indexes.py in the customers app directory, and add the following code.

The EdgeNgramField is a field in the haystack SearchIndex that prevents incorrect matches when parts of two different words are mashed together.

It allows us to use the autocomplete feature to conduct queries. We will use autocomplete when we start querying our data.

document=True indicates the primary field for searching within. Additionally, the  use_template=True in the text field allows us to use a data template to build the document that will be indexed.

Let’s create the template inside our customers template directory. Inside   search/indexes/customers/customers_text.txt, add the following:

Reindex Data

Now that our data is in the database, it’s time to put it in our search index. To do this, simply run ./manage.py rebuild_index. You’ll get totals of how many models were processed and placed in the index.

Alternatively, you can use RealtimeSignalProcessor, which automatically handles updates/deletes for you. To use it, add the following in the settings.py file.

Querying Data

We are going to use a search template and the Haystack API to query data.

Search Template

Add the haystack urls to your URLconf.

Let’s create our search template. In templates/search.html, add the following code.

The page.object_list is a list of SearchResult objects that allows us to get the individual model objects, for example, result.first_name.

Your complete project structure should look something like this:

The project directory structure

Now run server, go to 127.0.0.1:8000/search/, and do a search as shown below.

Running a search on a local server

A search of Albert will give results of all customers with the name Albert. If no customer has the name Albert, then the query will give empty results. Feel free to play around with your own data.

Haystack API

Haystack has a SearchQuerySet class that is designed to make it easy and consistent to perform searches and iterate results. Much of the SearchQuerySet API is familiar with Django’s ORM QuerySet.

In customers/views.py, add the following code:

autocomplete is a shortcut method to perform an autocomplete search. It must be run against fields that are either EdgeNgramField or NgramField.

In the above Queryset, we are using the contains method to filter our search to retrieve only the results that contain our defined characters. For example, Al will only retrieve the details of the customers which contain Al. Note that the results will only come from fields that have been defined in the customer_text.txt file.

The results of a query

Apart from the contains Field Lookup, there are other fields available for performing queries, including:

  • content
  • contains
  • exact
  • gt
  • gte
  • lt
  • lte
  • in
  • startswith
  • endswith
  • range
  • fuzzy

Conclusion

A huge amount of data is produced at any given moment in social media, health, shopping, and other sectors. Much of this data is unstructured and scattered. Elasticsearch can be used to process and analyze this data into a form that can be understood and consumed.

Elasticsearch has also been used extensively for content search, data analysis, and queries. For more information, visit the Haystack and Elasticsearch sites.

Powered by WPeMatico

Leave a Comment

Scroll to Top