About noise words and natural language search

Build 1501 on 14/Nov/2017  This topic last edited on: 21/Mar/2016, at 18:52

If you often do searches for headlines, so a headline (title) something like this:

Learn how to eat European-style in new cookbook

the SQL search may not find it, as the search contains noise words (highlighted).

Such behavior is the default one for the SQL server

If you remove the noise words from the query and you search again for something like this:

Learn eat European-style cookbook

Then it will find the story you want.

The solution is described in details in the http://technet.microsoft.com/en-US/library/ms187914.aspx topic. Here we give just a basic hints.

In the SQL Server Management studio, run this query:

sp_configure 'show advanced options', 1;

RECONFIGURE;

GO

sp_configure 'transform noise words', 1;

RECONFIGURE;

GO

The SQL search will then start to behave in a better way: noise words (or stopwords) will be transformed in the query, in other words ignored, and the rest of the query is evaluated.

Examples

This section illustrates the behavior of queries containing a noise word, "the", under the alternate settings of transform noise words. The sample full-text query strings are assumed to be run against a table row containing the following data: [1, "The black cat"].

With transform noise words set to 0:

Query string

Result

"cat" AND "the"

No results (The behavior is the same for "the" AND "cat".)

"cat" NEAR "the"

No results (The behavior is the same for "the" NEAR "cat".)

"the" AND NOT "black"

No results

"black" AND NOT "the"

No results

With transform noise words set to 1:

Query string

Result

"cat" AND "the"

Hit for row with ID 1

"cat" NEAR "the"

Hit for row with ID 1

"the" AND NOT "black"

No results

"black" AND NOT "the"

Hit for row with ID 1

Note

If noise words are specified in a proximity term, SQL Server removes them. For example, the noise word is is removed from CONTAINS(<column_name>, 'NEAR (hello,is,goodbye)'), transforming the search query into CONTAINS(<column_name>, 'NEAR(hello,goodbye)'). Notice that CONTAINS(<column_name>, 'NEAR(hello,is)') would be transformed into simply CONTAINS(<column_name>, hello) because there is only one valid search term.