MMS • Sergio De Simone
BERT, Google’s latest NLP algorithm, will power Google Search and make it better at understanding user queries in a way closer to how humans understand them, writes Pandu Nayak, Google fellow and vice president of Search. According to Nayak, roughly one in ten queries will return a different set of results.
BERT, short for Bidirectional Encoder Representations from Transformers, is an NLP algorithm that uses neural networks to create pre-trained models, i.e., models that have been trained on the huge amount of unannotated text available on the Web. Pre-trained models can be seen as general-purpose NLP models that can be further refined for specific NLP tasks. Google open-sourced BERT last year, claiming it provided state-of-the-art results on 11 NLP tasks, including the Stanford question answering dataset, which tests not only a system’s ability to provide a correct answer to a question, but also its ability to abstain when it cannot answer that question.
What sets BERT apart from other algorithms is its bidirectionality. This means BERT derives the context that defines a word’s meaning not only from the parts of the sentence that precede the word, but also from the parts that follow it. Bidirectionality makes it possible to understand that the word “bank” in “bank account” has a completely different meaning than it has in “river bank”, for example.
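To make the intuition concrete, here is a deliberately simplified sketch (not BERT itself, and with a made-up cue-word inventory) showing why looking at context on both sides of a word can disambiguate it when left-only context cannot:

```python
# Toy illustration of bidirectional context (NOT how BERT works internally):
# we disambiguate "bank" by counting hypothetical cue words around it.

# Hypothetical sense inventory, invented for this sketch.
SENSE_CUES = {
    "finance": {"account", "deposit", "loan", "money"},
    "geography": {"river", "shore", "water", "erosion"},
}

def disambiguate(tokens, target_index, bidirectional=True):
    """Pick a sense for tokens[target_index] by counting cue words
    in the surrounding context; left-only context models a purely
    left-to-right reading."""
    left = tokens[:target_index]
    right = tokens[target_index + 1:] if bidirectional else []
    context = set(left) | set(right)
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

tokens = ["open", "a", "bank", "account", "to", "deposit", "money"]
i = tokens.index("bank")
print(disambiguate(tokens, i, bidirectional=True))   # -> finance
print(disambiguate(tokens, i, bidirectional=False))  # -> unknown
```

With both sides visible, the cues “account”, “deposit”, and “money” pin down the financial sense; reading only the words to the left of “bank” (“open a”), the toy model has nothing to go on. BERT achieves the real version of this effect with learned contextual representations rather than hand-written cue lists.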
When applied to search, BERT will enable Google to understand important details of a query, particularly in complex, conversational queries and in queries using prepositions such as “to” and “for”. For example, says Nayak, in the query “2019 Brazil traveler to USA need a visa”, the word “to” qualifies the travellers as people travelling from Brazil to the USA, not the opposite. Similarly, in the query “can you get medicine for someone pharmacy”, the phrase “for someone” radically changes the meaning, and BERT is able to capture that.
Nayak also hints at the dichotomy between conversational and keyword-based searches as one of the driving factors behind the use of BERT for search. For each query, Google has to determine whether it is a list of keywords or a natural-language question with a definite meaning. When Google interprets a list of keywords as a natural-language question, the top results it returns may omit one or more of the required keywords, making them of little use.
This issue is echoed in a number of comments on Hacker News complaining about the need to quote each word in a query to have it treated as a keyword. This also raises the question of whether a human can be better at translating a question into a set of keywords than machine learning is at extracting the question’s real meaning. Google is clearly betting on the latter, but search is admittedly still not a solved problem, and it will likely remain unsolved for some time.