Strange Loop

2009 - 2023

St. Louis, MO

Tuning Elasticsearch for English-Language Precision

Everyone knows that English is weird, but just how weird it is becomes glaringly apparent once you try to build word-driven search. English is full of edge cases, irregularities, and just plain head-scratchers. Is "911" a word? How about "B2B"? How about "one-and-done", "look (something) up", or "AWOL"? How much do you have to know about language and linguistics to build something that covers the full range of possible English words? With Elasticsearch, you can create custom analyzers, tokenizers, and mappings that help you find "the right word", no matter how weird that word might be, and it's easier than you might expect. Using real-world examples from a large online dictionary project, you'll see how to juggle the tradeoffs between precision and recall, how to rank and score results, and how to push Elasticsearch to handle the full panoply of the English language. (And there might even be emoji!)
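To make the idea of custom analyzers and mappings concrete, here is a minimal sketch of what such a configuration might look like, written as plain Python dictionaries that could be sent to Elasticsearch as JSON. It is not the configuration used in the dictionary project described above. The field name `headword`, the analyzer name `exact_english_word`, and the boost value are all hypothetical, chosen only for illustration. The sketch indexes each word twice: once with the built-in `standard` analyzer (which would split "one-and-done" into separate tokens, favoring recall) and once with a custom analyzer built from the `keyword` tokenizer and `lowercase` filter (which keeps the whole entry as a single token, favoring precision).

```python
import json


def build_index_body():
    """Index settings and mappings for a hypothetical dictionary index."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name. The keyword tokenizer emits
                    # the entire input as one token, so "one-and-done" and
                    # "B2B" are not broken apart; lowercase makes matching
                    # case-insensitive.
                    "exact_english_word": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                # Hypothetical field name for the dictionary entry.
                "headword": {
                    "type": "text",
                    "analyzer": "standard",  # looser matching, better recall
                    "fields": {
                        # Sub-field analyzed as a single exact token.
                        "exact": {
                            "type": "text",
                            "analyzer": "exact_english_word",
                        }
                    },
                }
            }
        },
    }


def build_query(term, exact_boost=5.0):
    """Search both fields, scoring whole-word matches higher.

    The boost value is an arbitrary assumption, not a recommendation.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"headword.exact": {"query": term, "boost": exact_boost}}},
                    {"match": {"headword": {"query": term}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_query("one-and-done"), indent=2))
```

With a setup along these lines, a search for "one-and-done" can still surface entries that merely contain "one" or "done", but an entry that matches the whole string should rank first, which is one way to trade off precision against recall.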

Erin McKean

IBM/Wordnik

@emckean

emckean

Erin McKean loves talking about dictionaries and databases (and how dictionaries are actually databases) to anyone who will stand still long enough. Before Node.js, she dabbled in Ruby, HyperCard, Perl, and OmniMark, and still finds herself writing bash scripts on a regular basis. She is a developer evangelist for LoopBack at IBM, helping people create simple CRUD APIs quickly, and the founder of Wordnik.com, which has a lot of fun APIs! In her spare time she sews clothes and makes Twitterbots.