Elasticsearch SQL

The Elasticsearch engine

Elasticsearch is one of the most widely search engines being used in a number of production deployments today. It is based on the Lucene search library and one of the key features it provides is a JSON-based query DSL on top of Lucene that provides an easier to use mechanism for interacting with the search engine. However the query DSL is very specific to Elasticsearch. The SQL support introduced in Elasticsearch 6.3 has brought a standard mechanism for running queries against the search engine and has been one step further towards easier adoption by developers already familiar with SQL. Although SQL has been initially designed for use with relational database management systems it has been implemented in a wide range of other systems (such as NoSQL databases). Take for example SQL supported provided in a distributed data processing engine like Apache Spark or a distributed cache-based computation system like Apache Ignite where SQL is one of the core query facilities provided. In this article we will explore how Elasticsearch SQL works.

Preliminary setup

In order to try the examples in the article you need to have a local Elasticsearch (at least 6.3) instance started. In this article we are going to use latest Elasticsearch 7.5. We will create a posts index that holds posts from a forum. We will be using the Elasticsearch Java client to feed data into the index and we are not going to provide explicit mapping for the fields of the index (for the purpose of simplicity we will let Elasticsearch automatically create it for us). First we will create a Maven project with a dependency to the Elasticsearch Java high level client (the old Elasticsearch HTTP client is deprecated and planned for removal in Elasticsearch 8.0):

&lt;dependency>
    &lt;groupId>org.elasticsearch.client&lt;/groupId>
    &lt;artifactId>elasticsearch-rest-high-level-client&lt;/artifactId>
    &lt;version>7.5.0&lt;/version>
&lt;/dependency>

We will create 10000 generated post documents in the posts index using the following piece of code:

	RestHighLevelClient client = new RestHighLevelClient(
		        RestClient.builder(
		                new HttpHost("localhost", 9200, "http")));
		
		String[] possibleUsers = new String[] {"Martin", "Jim", "John"};
		String[] possibleDates = new String[] {"2019-12-15", "2019-12-16", "2019-12-17"};
		String[] possibleMessages = new String[] {"Hello, Javaadvent !",
				"Cool set of blog posts. We want more !",
				"Elasticsearch SQL is great."};
		
		for(int i = 1; i <= 10000; i++) {
			Map<String, Object> jsonMap = new HashMap<>();
			jsonMap.put("user", possibleUsers[ThreadLocalRandom.current().nextInt(0, 3)]);
			jsonMap.put("date", possibleDates[ThreadLocalRandom.current().nextInt(0, 3)]);
			jsonMap.put("message", possibleMessages[ThreadLocalRandom.current().nextInt(0, 3)]);
			IndexRequest request = new IndexRequest("posts")
			    .id(String.valueOf(i)).source(jsonMap); 
			client.index(request, RequestOptions.DEFAULT);
		}
		
		client.close();

Running the SQL queries

We can use Kibana to query all the documents where the username is Martin as follows:

POST /_sql?format=txt
{
    "query": "SELECT * FROM posts where user = 'Martin'"
}

Another example would be to count all the documents that contain the word Javaadvent in the message field:

POST /_sql?format=txt
{
    "query": "SELECT count(*) FROM posts where message like '%Javaadvent%'"
}

Now if you want to run the above queries in your Java application you have a few options:

run it using Elasticsearch JDBC driver. This option however is available only with platinum and enterprise subscriptions;

REST client calling the Elasticsearch SQL endpoint. This option is the one to choose if you only have the basic (free) Elasticsearch option.

You can use pretty much any REST client for Java in order to use the second option but we will use the low-level Elasticsearch REST client:

&lt;dependency>
    &lt;groupId>org.elasticsearch.client&lt;/groupId>
    &lt;artifactId>elasticsearch-rest-client&lt;/artifactId>
    &lt;version>7.5.0&lt;/version>
&lt;/dependency>

The following block of code returns only 10 documents from the posts index:

		RestClient restClient = RestClient.builder(
			    new HttpHost("localhost", 9200, "http")).build();
		
		Request request = new Request("POST",  "/_sql");
		request.setJsonEntity("{\"query\":\"SELECT * FROM posts limit 10\"}");
		Response response = restClient.performRequest(request);
		String responseBody = EntityUtils.toString(response.getEntity()); 
		System.out.println(responseBody);
		restClient.close();

To see how is the SQL query executed behind the scenes you can use the translate API provided under the /_sql/translate endpoint. We can run the following in Kibana If we want to see what is the query DSL generated for the previous SQL query:

POST /_sql/translate
{
    "query": "SELECT * FROM posts limit 10",
    "fetch_size": 10
}

And we should get a result similar to the following:

{
  "size" : 10,
  "_source" : {
    "includes" : [
      "message",
      "user"
    ],
    "excludes" : [ ]
  },
  "docvalue_fields" : [
    {
      "field" : "date",
      "format" : "epoch_millis"
    }
  ],
  "sort" : [
    {
      "_doc" : {
        "order" : "asc"
      }
    }
  ]
}