Multi token synonyms appear broken by synonyms upgrade #456

@orangejulius

Since #453, it looks like any Pelias users who wish to define custom multi-token synonyms are out of luck.

Here's a small test script to demonstrate the change in behavior:

git checkout v5.6.0 # after major synonyms upgrade (https://github.com/pelias/schema/pull/453)

# clear pelias index for clean slate
node scripts/drop_index.js --force-yes &> /dev/null

# set up a custom multi token synonym
echo "aaaa bbbb cccc dddd, abcd" >> synonyms/custom_street.txt

node scripts/create_index.js &> /dev/null

curl -s "localhost:9200/pelias/_analyze" -H 'Content-Type: application/json' \
    -d '{ "text": "aaaa bbbb cccc dddd", "analyzer": "peliasStreet" }' | jq '.tokens[] | {token}'

git checkout v5.5.1 # before major synonyms upgrade (https://github.com/pelias/schema/pull/453)

# clear pelias index for clean slate
node scripts/drop_index.js --force-yes &> /dev/null

# set up a custom multi token synonym
echo "aaaa bbbb cccc dddd, abcd" >> synonyms/custom_street.txt

node scripts/create_index.js &> /dev/null

curl -s "localhost:9200/pelias/_analyze" -H 'Content-Type: application/json' \
    -d '{ "text": "aaaa bbbb cccc dddd", "analyzer": "peliasStreet" }' | jq '.tokens[] | {token}'

On my machine, this prints only the 4 input tokens on the latest version of schema:

{
  "token": "aaaa"
}
{
  "token": "bbbb"
}
{
  "token": "cccc"
}
{
  "token": "dddd"
}

But on the version prior to the synonyms upgrade, it prints 5 tokens, including the extra synonym term "abcd":

{
  "token": "aaaa"
}
{
  "token": "abcd"
}
{
  "token": "bbbb"
}
{
  "token": "cccc"
}
{
  "token": "dddd"
}
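
To turn the manual comparison above into a pass/fail check, something like the following might help. It is only a sketch: `check_synonym` is a hypothetical helper (not part of pelias/schema) that reads the `jq '.tokens[] | {token}'` output on stdin and reports whether the expected synonym token is there. It assumes jq's default pretty-printed format, as shown in the outputs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pelias/schema): reads the output of
# `jq '.tokens[] | {token}'` on stdin and reports whether a token is present.
# Assumes jq's default pretty-printed format: "token": "<value>"
check_synonym() {
  if grep -Fq "\"token\": \"$1\""; then
    echo "present"
  else
    echo "missing"
  fi
}

# Offline demo using a fragment of the pre-upgrade output shown above
printf '{\n  "token": "aaaa"\n}\n{\n  "token": "abcd"\n}\n' | check_synonym abcd

# Against a live index, assuming the same endpoint and analyzer as the script above:
# curl -s "localhost:9200/pelias/_analyze" -H 'Content-Type: application/json' \
#     -d '{ "text": "aaaa bbbb cccc dddd", "analyzer": "peliasStreet" }' \
#   | jq '.tokens[] | {token}' | check_synonym abcd
```

Given the outputs above, the live check would be expected to report "missing" on v5.6.0 and "present" on v5.5.1.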
