Stemming, thesaurus and prefix searches

Build 1501 on 14/Nov/2017  This topic last edited on: 19/Jun/2014, at 10:04

To search for morphological variants of a word (stemming), use the suffix ‘^’, e.g.:

president^

matches all texts containing ‘president’, ‘presidential’, ‘presidentially’ etc.

To search for synonyms of a word, use the suffix ‘+’, e.g.:

Obama+

matches all texts containing any of the synonyms of ‘Obama’ defined in the thesaurus of the full-text search engine.

To search for words starting with a prefix, use the suffix ‘*’, e.g.:

Obama*

matches all texts containing a word starting with ‘Obama’; for example, it matches a text containing ‘Obamacare’.
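
The three suffixes above can be summarised in a short Python sketch. It is an illustration only, not the parser used by the product: the function name classify_term, the mode labels, and the "plain" label for a term without a suffix are all assumptions.

```python
# Hypothetical helper: maps a query term to (word, search mode) by its suffix.
# The mode names are illustrative labels, not identifiers from the product.
SUFFIX_MODES = {"^": "stemming", "+": "thesaurus", "*": "prefix"}


def classify_term(term):
    """Return (word, mode) for a single query term."""
    # A lone suffix character is treated as a plain term, not as an empty word.
    if len(term) > 1 and term[-1] in SUFFIX_MODES:
        return term[:-1], SUFFIX_MODES[term[-1]]
    return term, "plain"


for term in ("president^", "Obama+", "Obama*", "Obama"):
    print(term, "->", classify_term(term))
```

A search engine without a thesaurus could simply treat the "thesaurus" mode as "plain", which mirrors the Exalead caveat below.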

Caveats:

Exalead does not support thesaurus searches, so the suffix ‘+’ is ignored when Exalead is used as the full-text search engine.

Full-text search engines implement prefix searches in two steps: they first look up the words in the index that start with the specified prefix, and then run a full-text search using the resulting list of words. The list generated in the first step is usually limited in length for performance reasons, which means that a search with a prefix that matches many words won’t return all the possible results. E.g. a search for ‘a*’ typically won’t return all texts containing a word starting with ‘a’; it returns the texts containing the first N words in the index that start with ‘a’, where N depends on the full-text search engine used and on its setup.
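
The two-step behaviour and its effect on the results can be illustrated with a minimal Python sketch over a toy inverted index. This is not how any particular engine is implemented: the names expand_prefix and prefix_search, the limit parameter (standing in for N) and the sample data are all assumptions made for the example.

```python
import bisect


def expand_prefix(sorted_terms, prefix, limit):
    """Step 1: collect at most `limit` index terms starting with `prefix`."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        # Stop at the first non-matching term or once the cap is reached.
        if not term.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(term)
    return matches


def prefix_search(index, prefix, limit):
    """Step 2: return the ids of texts containing any of the expanded terms."""
    terms = expand_prefix(sorted(index), prefix, limit)
    results = set()
    for term in terms:
        results |= index[term]
    return results


# Toy inverted index: term -> set of text ids (hypothetical data).
index = {
    "apple": {1},
    "arrow": {2},
    "atlas": {3},
    "obama": {4},
    "obamacare": {5},
}

# Both 'obama' and 'obamacare' fit within the limit.
print(prefix_search(index, "obama", limit=10))
# Only 'apple' and 'arrow' are expanded, so the text containing 'atlas' is missed.
print(prefix_search(index, "a", limit=2))
```

With a limit of 2, the search for the prefix ‘a’ silently drops the text that contains only ‘atlas’, which is the truncation described above.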