A question-answering system aims at answering questions written in natural language with a precise answer.
Web question answering is an end-user application. FIDJI is an open-domain QA system for French and English.
Information retrieval on the Web or in document collections
Document retrieval systems such as search engines provide the user with a large set of URL/snippet pairs containing information relevant to a query. To obtain a precise answer, the user must then locate the relevant information within the documents and possibly combine pieces of information coming from one or several documents.
To avoid these problems, focused retrieval aims at identifying relevant documents and locating the precise answer to a user question within a document. Question answering (QA) is a type of focused retrieval: its goal is to provide the user with a precise answer to a natural language question. While information retrieval (IR) methods are mostly numerical and use little linguistic knowledge, QA often involves deep linguistic processing, large resources and expert rule-based modules.
Most question-answering systems can extract the answer to a factoid question when it is explicitly present in texts, but cannot combine different pieces of information to produce an answer. FIDJI (Finding In Documents Justifications and Inferences), an open-domain QA system for French and English, aims to overcome this limitation and focuses on introducing text understanding mechanisms.
The objective is to produce answers which are fully validated by a supporting text (or passage) with respect to a given question. The main difficulty is that an answer (or some pieces of information composing an answer) may be validated by several documents. For example:
In this example, the pieces of information “French Prime Minister” and “committed suicide” are validated by two different, complementary passages. Indeed, the question may be decomposed into two sub-questions, e.g. “Who committed suicide?” and “Is this person a French Prime Minister?”.
FIDJI uses syntactic information, especially dependency relations, which allow question decomposition. The goal is to match the dependency relations derived from the question against those of a passage, and to validate the type of the potential answer either in this passage or in another document.
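The matching idea described above can be illustrated with a minimal sketch. It is not FIDJI's actual implementation: the `Dep` triple format, the relation labels (`subj`, `obj`, `attr`), the `?ANSWER` placeholder, the function names and the example data (including the name "Dupont") are all assumptions made up for illustration. Each question dependency is matched against passage dependencies, and a candidate is accepted only when every question dependency is validated by at least one passage.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """A dependency triple (hypothetical format, not FIDJI's own)."""
    relation: str
    head: str
    dependent: str


ANSWER = "?ANSWER"  # placeholder standing for the unknown answer


def match_dependencies(question_deps, passage_deps):
    """Map each candidate answer found in one passage to the set of
    question dependencies that this passage validates for it."""
    passage_set = set(passage_deps)
    candidates = {}
    # Step 1: collect candidates by unifying the placeholder with passage words.
    for q in question_deps:
        for p in passage_deps:
            if p.relation != q.relation:
                continue
            if q.dependent == ANSWER and p.head == q.head:
                candidates.setdefault(p.dependent, set())
            elif q.head == ANSWER and p.dependent == q.dependent:
                candidates.setdefault(p.head, set())
    # Step 2: for each candidate, record which question dependencies hold here.
    for cand, validated in candidates.items():
        for q in question_deps:
            bound = Dep(
                q.relation,
                cand if q.head == ANSWER else q.head,
                cand if q.dependent == ANSWER else q.dependent,
            )
            if bound in passage_set:
                validated.add(q)
    return candidates


def validate_answer(question_deps, passages):
    """Return candidates whose question dependencies are all validated,
    possibly by different passages."""
    support = {}
    for passage in passages:
        for cand, validated in match_dependencies(question_deps, passage).items():
            support.setdefault(cand, set()).update(validated)
    return [c for c, v in support.items() if v == set(question_deps)]


# Invented example data: "Which prime minister committed suicide?"
question_deps = [
    Dep("subj", "commit", ANSWER),
    Dep("obj", "commit", "suicide"),
    Dep("attr", ANSWER, "prime_minister"),  # expected answer type
]
passage_1 = [Dep("subj", "commit", "Dupont"), Dep("obj", "commit", "suicide")]
passage_2 = [Dep("attr", "Dupont", "prime_minister"), Dep("subj", "resign", "Martin")]

print(validate_answer(question_deps, [passage_1, passage_2]))
```

In this sketch, neither passage alone validates the candidate: the first supports the event, the second supports the answer type. Only their combination covers all question dependencies, which mirrors the multi-document validation described above.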
Another important aim of FIDJI is to answer new categories of questions, called complex questions, typically “how” and “why” questions. Complex questions are absent from traditional evaluation campaigns but have been introduced within the Quaero framework. Answers to these questions are no longer short and precise, but rather parts of documents or even full documents. In this case, the linguistic analysis of the question provides considerable information about the possible form of the answer and the keywords that should be sought in candidate passages.
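As a rough illustration of how question analysis can suggest both the form of the answer and the keywords to look for, here is a deliberately simple sketch. It relies only on the interrogative word and a small stopword list, whereas the analysis described above is linguistic and far richer. The function name `analyse_question`, the labels for the answer form and both word lists are assumptions introduced for this example.

```python
import re

# Hypothetical word lists, kept tiny for the sake of the example.
QUANTITY_WORDS = {"many", "much", "long", "old", "far", "tall"}
STOPWORDS = {
    "how", "why", "what", "who", "which", "when", "where",
    "is", "are", "was", "were", "do", "does", "did",
    "the", "a", "an", "of", "to", "in", "on", "you",
    "many", "much",
}


def analyse_question(question: str) -> dict:
    """Guess the expected answer form and extract search keywords."""
    tokens = re.findall(r"[a-z]+", question.lower())
    first = tokens[0] if tokens else ""
    second = tokens[1] if len(tokens) > 1 else ""
    if first == "why":
        answer_form = "explanation passage"
    elif first == "how" and second not in QUANTITY_WORDS:
        # "how many", "how long", etc. expect a short factoid answer instead
        answer_form = "procedure passage"
    else:
        answer_form = "short answer"
    keywords = [t for t in tokens if t not in STOPWORDS]
    return {"answer_form": answer_form, "keywords": keywords}


print(analyse_question("Why is the sky blue?"))
print(analyse_question("How do you bake bread?"))
print(analyse_question("How many people live in Paris?"))
```

The distinction between "how" and "how many" shows why the interrogative word alone is not enough: the same word can introduce a complex question or a factoid one.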
PC with Linux platform
Available for licensing on a case-by-case basis