Question Answering System using PLSA

Introduction

Question answering is a wide and active area, with much scope for further research. In our project we look into the semantics of web documents (taken from Wikipedia) to answer questions asked by the user. The principal algorithm is PLSA, which we use to find the sentences whose topic is similar to the query. We then use cosine similarity to rank the candidate answers.
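The ranking step can be illustrated with a minimal sketch. It assumes plain bag-of-words term counts as the vectors; the project itself may compare other representations (for example, PLSA topic distributions), and the names `tokenize`, `cosine_similarity` and `rank_sentences` are hypothetical helpers, not taken from the project code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank_sentences(question, sentences):
    """Return (score, sentence) pairs, most similar to the question first."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(q_vec, Counter(tokenize(s))), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy candidates (assumed for illustration, not project data).
candidates = [
    "The Nile is a river in Africa.",
    "Paris is the capital of France.",
    "Bananas are yellow.",
]
for score, sentence in rank_sentences("What is the capital of France?", candidates):
    print(round(score, 3), sentence)
```

Raw term counts reward sentences that share many common words with the question, so a real system would likely weight terms or compare topic-level vectors instead.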

What is a Knowledge-Based Question Answering System?

For computers to interact with users more naturally, the computer must understand and infer what the user means from what they actually say (the latent meaning). A Question Answering System (referred to as QAS henceforth) is essentially a search program that strives to give the best possible answers, drawn from its knowledge base, to the questions users ask it. The knowledge base can be local or stored on a remote machine.

Probabilistic Latent Semantic Analysis (PLSA) is a statistical technique for the analysis of two-mode and co-occurrence data. PLSA evolved from latent semantic analysis, adding a sounder probabilistic model. It has applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas.

Two difficulties must be addressed before answering a question:

Acquire the user's true intentions from the question
Make the answer as complete as possible and at the same time non-redundant

We use Wikipedia as the web knowledge base because it usually contains all the information needed to answer a complex question.
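As a rough illustration of the PLSA model described in this section, the sketch below fits the standard asymmetric formulation, P(w|d) = sum over z of P(w|z) P(z|d), with plain EM on a small count matrix. It is not the project's implementation: the function name `plsa`, its parameters, and the toy counts are all assumptions made for the example.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-word count matrix (list of lists).

    Returns (p_w_z, p_z_d): P(w|z) with shape n_topics x n_words and
    P(z|d) with shape n_docs x n_topics.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        # Fall back to a uniform distribution if the row has no mass.
        if total <= 0:
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random initialisation of both conditional distributions.
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]

    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w), proportional to P(w|z) * P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy data (assumed): documents 0-1 use words 0-1, documents 2-3 use words 2-3.
toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
topic_words, doc_topics = plsa(toy_counts, n_topics=2)
for d, dist in enumerate(doc_topics):
    print(d, [round(p, 3) for p in dist])
```

Each row of `doc_topics` is a topic distribution for one document (or sentence), which is the kind of vector that could then be compared with the topic distribution of a query.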

Our Approach

There can be many types of question answering systems. In our project we focus on descriptive question answering, where the system returns the most probable sentence from its knowledge base (KB) as the answer to the question asked. For implementation details, see the project report, which covers them in depth.