Better stemming for the | Feature Improvements & Ideas for Arc XP

Better stemming for the Content API Elasticsearch endpoints

The usual Elasticsearch English text analyzer features are not available. This substantially complicates query generation code and lowers result quality. Example: possessives are not handled, so "Amber's" searches differently from "Amber". We'd also like to have diacritics stripped, etc. The default "English" analyzer in Elasticsearch would be perfect (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#english-analyzer), but even just possessive and diacritic handling would be a huge help.
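
To make the request concrete, here is a rough sketch of the kind of index settings that would cover it, written as a Python dict so it can be dumped to JSON. It follows the filter chain Elasticsearch documents for rebuilding the built-in English analyzer as a custom analyzer, with `asciifolding` added to strip diacritics. The analyzer name `english_folded` and the helper `build_index_settings` are made up for illustration; how Arc XP actually configures its indices is not known here, so treat this as an assumption rather than a description of the platform.

```python
import json

# Hypothetical analyzer name, chosen only for this example.
ANALYZER_NAME = "english_folded"


def build_index_settings() -> dict:
    """Return index settings for an English-style analyzer with diacritic folding."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Removes a trailing 's so "Amber's" and "Amber" match.
                    "english_possessive_stemmer": {
                        "type": "stemmer",
                        "language": "possessive_english",
                    },
                    "english_stop": {"type": "stop", "stopwords": "_english_"},
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                },
                "analyzer": {
                    ANALYZER_NAME: {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "english_possessive_stemmer",
                            "lowercase",
                            # Built-in filter that folds accented characters to ASCII.
                            "asciifolding",
                            "english_stop",
                            "english_stemmer",
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(), indent=2))
```

Even a reduced chain with only the possessive stemmer, `lowercase`, and `asciifolding` would address the two specific problems named above.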

Note that while DMN has Spanish-language content, the English analyzer would work much better than the current configuration.

Allowing more control over Elasticsearch ingestion would be extremely helpful, but just having a better set of defaults would solve most of the immediate problems.

Note the above refers to the Content API endpoints, *not* the Site Search endpoint.

Christopher St. John
Dec 12 2019
Will not implement

Publishing Platform / Content API

Comments (2)
Votes (11)

Lucas Kerdo commented

February 19, 2020 15:06

We are having the same problem with French-language content (IPM Group). Could we get an explanation or an update?

Claire Campbell commented

December 23, 2019 18:27

Since this just got marked as "Will not implement" could we get an explanation of why?
