⚡ ElasticsearchBook is crafted by Jozef Sorocin.
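I don't have direct control over my incoming documents' structure, but before they get inserted into my index, I want to:
- drop the documents I don't want indexed at all (e.g. those containing `"env": "test"`)
- remove empty and null values (like `[]` or `null`)
- strip the leading underscore from field names (`_color` → `color`)
- combine existing fields into a new one, `seo_category` (`color` + `category`)
Can I do it in one go?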
My documents will look like this:
{
"env": "staging",
"_tags": [],
"null": null,
"_category": "jackets",
"_color": "white"
}
and I'd like to end up with:
{
"env": "staging",
"color": "white",
"category": "jackets",
"seo_category": "white jackets"
}
Every time a document is about to be inserted, you can have it run through a pipeline*. A pipeline is composed of blocks called processors, which are executed in the order in which they've been declared. There are dozens of built-in processors available, but you can also write your own logic in a scripting language called Painless (documented here and here).
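To make the "executed in the order they've been declared" part concrete, here's a minimal pure-Python emulation of the transformation asked for above, applied to the sample document. This is not Elasticsearch code: the function names are hypothetical stand-ins for processors, and returning `None` plays the role of a dropped document.

```python
from typing import Callable, List, Optional

Doc = dict
# A "processor" here is just a function: it returns a (new) document, or None to drop it.
Processor = Callable[[Doc], Optional[Doc]]


def drop_test_env(doc: Doc) -> Optional[Doc]:
    # Stand-in for a conditional drop: test documents never reach the index.
    return None if doc.get("env") == "test" else doc


def remove_empty(doc: Doc) -> Doc:
    # Removes keys whose value is None or an empty list.
    return {k: v for k, v in doc.items() if v is not None and v != []}


def strip_underscores(doc: Doc) -> Doc:
    # Renames e.g. "_color" to "color"; other keys are kept as-is.
    return {(k[1:] if k.startswith("_") else k): v for k, v in doc.items()}


def add_seo_category(doc: Doc) -> Doc:
    # Concatenates color + category, but only if both fields exist at this point.
    if "color" in doc and "category" in doc:
        return {**doc, "seo_category": f"{doc['color']} {doc['category']}"}
    return doc


def run_pipeline(doc: Doc, processors: List[Processor]) -> Optional[Doc]:
    # Processors run strictly in list order; a dropped document stops the chain.
    for processor in processors:
        doc = processor(doc)
        if doc is None:
            return None
    return doc


PIPELINE: List[Processor] = [drop_test_env, remove_empty, strip_underscores, add_seo_category]
```

Order matters: if `add_seo_category` ran before `strip_underscores`, there would be no `color` or `category` yet and no `seo_category` would be produced.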
We'll name our pipeline `my_data_cleanser`, define the processor blocks, and save it inside `_ingest/pipeline`.
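Here's a minimal sketch of what such a pipeline could look like, written as a Python dict and wrapped in a `PUT _ingest/pipeline/my_data_cleanser` request built with the standard library. Assumptions: the cluster address `http://localhost:9200` is hypothetical, and the field names are hardcoded from the sample document rather than handled generically (a Painless `script` processor would be the way to generalize). The processors used (`drop`, `remove`, `rename`, `set`) are built-in ones; the request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical cluster address; adjust to your setup.
ES_URL = "http://localhost:9200"

# Processors run top to bottom, in the order they're declared.
pipeline = {
    "description": "Sketch: clean up incoming documents",
    "processors": [
        # Discard test documents entirely; they never reach the index.
        {"drop": {"if": "ctx.env == 'test'"}},
        # Remove the null-valued key (field name taken from the sample document).
        {"remove": {"field": "null", "ignore_missing": True}},
        # Remove _tags only when it is an empty array.
        {"remove": {"field": "_tags", "ignore_missing": True,
                    "if": "ctx._tags instanceof List && ctx._tags.isEmpty()"}},
        # Strip the leading underscores.
        {"rename": {"field": "_color", "target_field": "color", "ignore_missing": True}},
        {"rename": {"field": "_category", "target_field": "category", "ignore_missing": True}},
        # Concatenate color + category with a mustache template, once both exist.
        {"set": {"field": "seo_category", "value": "{{{color}}} {{{category}}}",
                 "if": "ctx.color != null && ctx.category != null"}},
    ],
}

request = urllib.request.Request(
    f"{ES_URL}/_ingest/pipeline/my_data_cleanser",
    data=json.dumps(pipeline).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
# urllib.request.urlopen(request)  # uncomment to send it to a running cluster
```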
When we're about to insert a document, we'll specify the `?pipeline` query parameter (java docs, python docs):
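As a sketch of the raw REST call, assuming a hypothetical index called `my_index` and the same hypothetical cluster address, indexing the sample document through `my_data_cleanser` boils down to a `POST /my_index/_doc?pipeline=my_data_cleanser` request. Again, it's only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical cluster address and index name; adjust to your setup.
ES_URL = "http://localhost:9200"
INDEX = "my_index"

doc = {
    "env": "staging",
    "_tags": [],
    "null": None,
    "_category": "jackets",
    "_color": "white",
}

# The ?pipeline query parameter routes the document through my_data_cleanser
# before it gets indexed.
query = urllib.parse.urlencode({"pipeline": "my_data_cleanser"})
request = urllib.request.Request(
    f"{ES_URL}/{INDEX}/_doc?{query}",
    data=json.dumps(doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request)  # uncomment to send it to a running cluster
```

The official clients expose the same setting; in the Python client, for instance, it's the `pipeline` argument of `index()`.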