Ideas for Arc XP

Better stemming for the Content API Elasticsearch endpoints

The usual Elasticsearch English text analyzer features are not available on these endpoints. This substantially increases the complexity of our query generation code and lowers result quality. Example: possessives. "Amber's" searches differently from "Amber". We'd also like to have diacritics stripped, etc. The default "English" analyzer in Elasticsearch would be perfect (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#english-analyzer), but even just possessive and diacritic handling would be a huge help.
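To make the two requested behaviors concrete, here is a rough pure-Python approximation of possessive stripping and diacritic folding. This is only an illustration: `normalize_term` is a hypothetical helper, not part of Arc XP or Elasticsearch, and the real Elasticsearch token filters handle more cases than this sketch does.

```python
import unicodedata


def normalize_term(term: str) -> str:
    """Approximate possessive stripping and diacritic folding for one term.

    Hypothetical helper for illustration only; it is not an Arc XP or
    Elasticsearch API.
    """
    # Strip a trailing English possessive, with a straight or curly apostrophe.
    for suffix in ("'s", "\u2019s"):
        if term.lower().endswith(suffix):
            term = term[: -len(suffix)]
            break
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", term)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.lower()


if __name__ == "__main__":
    for raw in ("Amber's", "Amber", "José"):
        print(raw, "->", normalize_term(raw))
```

With normalization like this applied at both index and query time, "Amber's" and "Amber" would reduce to the same term. Applying it only on the query side does not help, which is why this needs to happen in the index analyzer.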

Note that while DMN has Spanish-language content, the English analyzer would work much better than the current configuration.

Allowing more control over Elasticsearch ingestion would be extremely helpful, but just having a better set of defaults would solve most of the immediate problems.
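As a sketch of what a better default might look like, the snippet below builds Elasticsearch index settings for a custom analyzer that handles possessives and diacritics. We have no visibility into how Arc XP actually configures its indices, so the analyzer name `content_text` and the function `build_index_settings` are assumptions made for illustration. The `stemmer` filter with `possessive_english` and the `asciifolding` filter are standard Elasticsearch token filters; `asciifolding` is listed explicitly because, as far as we can tell, the stock `english` analyzer does not fold diacritics on its own.

```python
import json


def build_index_settings() -> dict:
    """Return hypothetical index settings with a possessive-aware,
    diacritic-folding analyzer. Names here are illustrative assumptions."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Removes a trailing 's from tokens.
                    "english_possessive_stemmer": {
                        "type": "stemmer",
                        "language": "possessive_english",
                    }
                },
                "analyzer": {
                    # "content_text" is a made-up analyzer name.
                    "content_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "english_possessive_stemmer",
                            "lowercase",
                            "asciifolding",
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(), indent=2))
```

This deliberately leaves out stop words and full stemming, which matches the "just possessives and diacritics" minimum described above rather than the full English analyzer.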

Note the above refers to the Content API endpoints, *not* the Site Search endpoint.

  • Christopher St. John
  • Dec 12 2019
  • Will not implement
  • Guest commented
    February 09, 2021 02:57
  • Lucas Kerdo commented
    February 19, 2020 15:06

    We are having the same problem with French-language content (IPM Group). Could we get an explanation or an update?

  • Claire Campbell commented
    December 23, 2019 18:27

    Since this just got marked as "Will not implement", could we get an explanation of why?