Furhat Dialogue System
This summer, Lars Ådne Heimdal and Christian Riksvold have been developing a dialogue system for the Furhat robot. The system mainly revolves around answering questions about NTNU, SINTEF and Trondheim.

The first stage in the process is to create a database with knowledge about NTNU, SINTEF and Trondheim. This was achieved by developing a web crawler which crawled each institution's website and traversed the HTML to extract information. Each paragraph, list and table on every webpage was then converted into an embedding (a vector) using a transformer model that has been trained so that texts with similar meanings get similar vectors. Lastly, each piece of text and its associated vector are stored in an Elasticsearch database for later retrieval.
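As a rough sketch of this indexing step, the snippet below embeds one extracted text fragment and stores it together with its vector in Elasticsearch. It assumes the sentence-transformers and elasticsearch Python packages (8.x-style keyword arguments); the model name, index name and field names are illustrative, not necessarily the ones used in the project.

```python
# Sketch of the indexing step: embed an extracted text fragment and store it
# together with its vector. Index, field and model names are assumptions.
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
es = Elasticsearch("http://localhost:9200")

# A dense_vector field stores the embedding next to the raw text.
if not es.indices.exists(index="knowledge"):
    es.indices.create(
        index="knowledge",
        mappings={
            "properties": {
                "text": {"type": "text"},
                "url": {"type": "keyword"},
                "embedding": {"type": "dense_vector", "dims": 384},
            }
        },
    )

def index_fragment(text: str, url: str) -> None:
    """Embed one extracted paragraph/list/table and store it with its vector."""
    vector = model.encode(text).tolist()
    es.index(index="knowledge", document={"text": text, "url": url, "embedding": vector})

index_fragment("NTNU is a university in Trondheim, Norway.", "https://www.ntnu.no/")
```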
When given a question from the user, the most relevant documents in the database have to be retrieved. We use two different search strategies simultaneously to make sure we capture all the relevant data. The first method is semantic search using the computed embeddings: the question is first converted into an embedding, called the query vector, which is then compared with every vector in the database using cosine similarity, and the documents whose vectors are most similar to the query vector are returned. The second method uses Elasticsearch's built-in text search capabilities, which rely on more traditional methods for retrieving relevant documents.
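The sketch below illustrates both strategies against the illustrative index from the previous snippet: semantic search is expressed as an Elasticsearch script_score query that ranks documents by cosine similarity to the query vector, while the text search is a plain match query. The index and field names are again assumptions.

```python
# Sketch of the two retrieval strategies over the same "knowledge" index.
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
es = Elasticsearch("http://localhost:9200")

def semantic_search(question: str, k: int = 5):
    """Embed the question and rank documents by cosine similarity to it."""
    query_vector = model.encode(question).tolist()
    response = es.search(
        index="knowledge",
        size=k,
        query={
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps scores non-negative, as Elasticsearch requires.
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    )
    return [hit["_source"]["text"] for hit in response["hits"]["hits"]]

def text_search(question: str, k: int = 5):
    """Traditional full-text (BM25) search over the same documents."""
    response = es.search(index="knowledge", size=k, query={"match": {"text": question}})
    return [hit["_source"]["text"] for hit in response["hits"]["hits"]]

candidates = semantic_search("When was NTNU founded?") + text_search("When was NTNU founded?")
```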
If neither the answer from Wolfram Alpha nor the document search yields sufficient results, we generate an answer using a conversational model from Hugging Face. This model keeps track of the conversation for context and generates a fitting answer in return.
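As an illustration of this fallback, the sketch below uses DialoGPT as a stand-in conversational model (the exact model used in the system may differ); the growing token history is what gives the model context about earlier turns.

```python
# Fallback answer generation, sketched with DialoGPT as a stand-in model.
# The chat history is kept as a growing sequence of token ids so the model
# sees earlier turns as context when generating the next reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

chat_history_ids = None

def generate_reply(user_utterance: str) -> str:
    global chat_history_ids
    new_ids = tokenizer.encode(user_utterance + tokenizer.eos_token, return_tensors="pt")
    # Append the new user turn to the running conversation history.
    input_ids = new_ids if chat_history_ids is None else torch.cat([chat_history_ids, new_ids], dim=-1)
    chat_history_ids = model.generate(
        input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id
    )
    # Decode only the tokens generated after the prompt, i.e. the reply itself.
    return tokenizer.decode(chat_history_ids[:, input_ids.shape[-1]:][0], skip_special_tokens=True)

print(generate_reply("What do you like about Trondheim?"))
```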
The Furhat platform has built-in automatic speech recognition (ASR) with support for several languages, including Norwegian and English. The robot automatically detects which language is being spoken, and we have programmed it to respond in that language. It also offers a range of mechanical head movements and facial expressions, which allows the robot to look at the user while they speak and, for example, respond with a smile if the user smiles.
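The sketch below hints at what this can look like from code, using Furhat's Python remote API (the furhat-remote-api package) rather than the on-robot skill API; the keyword-based language guess and the canned answers are only placeholders for the platform's built-in language detection and the answer pipeline described above.

```python
# Hedged sketch of the robot side using furhat-remote-api. The keyword-based
# language guess stands in for the built-in ASR language detection, and the
# canned answers stand in for the full question-answering pipeline.
from furhat_remote_api import FurhatRemoteAPI

furhat = FurhatRemoteAPI("localhost")
furhat.attend(user="CLOSEST")      # keep looking at the nearest user

result = furhat.listen()           # built-in ASR; the recognized utterance is in result.message
utterance = (result.message or "").lower()

# Stand-in for the platform's automatic language detection.
norwegian = any(word in utterance for word in ("hva", "hvor", "hvem"))

furhat.gesture(name="Smile")       # e.g. smile back at the user
if norwegian:
    furhat.say(text="Trondheim ble grunnlagt i år 997.")
else:
    furhat.say(text="Trondheim was founded in the year 997.")
```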