ElasticsearchBook is crafted by Jozef Sorocin.

Use Case

I don't have direct control over the structure of my incoming documents, but before they are inserted into my index I want to:

  1. skip a document if it comes from a test environment ("env": "test")
  2. drop all empty fields (empty strings, empty arrays, null)
  3. remove leading underscores from attribute names (_color → color)
  4. concatenate two string fields together (color + category)

Can I do it in one go?

My documents will look like this:

{
  "env": "staging",
  "_tags": [],
  "null": null,
  "_category": "jackets",
  "_color": "white"
}

and I'd like to end up with:

{
  "env" : "staging",
  "color" : "white",
  "category" : "jackets",
  "seo_category" : "white jackets"
}
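
To make the target behaviour concrete, here is a minimal sketch of the four requirements in plain Python. It is not Elasticsearch code and not the solution this chapter builds; the names `is_empty` and `transform` are hypothetical, and joining the two values with a single space is an assumption read off the expected output above.

```python
def is_empty(value):
    """Return True for null, empty strings and empty arrays."""
    return value is None or value == "" or value == []


def transform(doc):
    """Apply the four use-case rules; return None when the doc is skipped."""
    # 1. skip documents coming from the test environment
    if doc.get("env") == "test":
        return None

    cleaned = {}
    for key, value in doc.items():
        # 2. drop empty fields
        if is_empty(value):
            continue
        # 3. remove leading underscores from attribute names
        cleaned[key.lstrip("_")] = value

    # 4. concatenate color + category (assumed separator: one space)
    if "color" in cleaned and "category" in cleaned:
        cleaned["seo_category"] = f'{cleaned["color"]} {cleaned["category"]}'
    return cleaned


incoming = {
    "env": "staging",
    "_tags": [],
    "null": None,
    "_category": "jackets",
    "_color": "white",
}
result = transform(incoming)
print(result)
```

Running it on the sample document yields the same four fields as the desired result, while a document with "env": "test" returns None, which stands in for "skipped".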

Approach

Every time a document is about to be inserted, you can run it through a pipeline*. A pipeline is composed of blocks called processors, which are executed in the order in which they are declared. There are dozens of built-in processors available, but you can also write your own logic in a scripting language called Painless (documented here and here).
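
To illustrate how processors are declared in order, the following sketch builds a pipeline definition as a Python dictionary. It uses only the built-in drop, rename and set processors and hardcodes the field names, so it is an illustration under those assumptions rather than the pipeline this chapter arrives at. The variable names and the description text are hypothetical.

```python
import json

# Hypothetical pipeline body: processors run top to bottom,
# so the renames must come before the set that reads the renamed fields.
pipeline_body = {
    "description": "Illustrative only: skip test docs, then build seo_category",
    "processors": [
        # Stop processing (and do not index) documents from the test environment
        {"drop": {"if": "ctx.env == 'test'"}},
        # Strip the leading underscore from two known fields
        {"rename": {"field": "_color", "target_field": "color", "ignore_missing": True}},
        {"rename": {"field": "_category", "target_field": "category", "ignore_missing": True}},
        # Combine the two renamed fields via a mustache template
        {"set": {"field": "seo_category", "value": "{{color}} {{category}}"}},
    ],
}

# Each processor is a single-key object; collect the keys to show the order
processor_names = [next(iter(p)) for p in pipeline_body["processors"]]
print(json.dumps(pipeline_body, indent=2))
```

A body like this would be sent to the ingest pipeline API (PUT _ingest/pipeline/<name>). Dropping empty fields and stripping underscores from arbitrary field names is harder to express with fixed built-in processors, which is where a Painless script processor becomes useful.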

Since we're about to insert a document, we'll specify the ?pipeline query parameter (java docs, python docs):