AI: Question Answering System Using a Semantic Thesaurus – WordNet and Wikipedia

[This is a brief review of the research in this area, and specifically of [1].]

In this article we discuss the paper by Ray et al. [1], “A semantic approach for question classification using WordNet and Wikipedia”, which applies Artificial Intelligence techniques from Natural Language Processing, semantics, and search.

First, let us consider the workflow of a question answering system. We all put questions to search engines. How does the AI engine process them?

According to the authors of [1], and building on the contributions of several other researchers, the task of question answering consists of the following three major parts:

1. Question Processing

In this part the question is understood. Researchers tackle this part in various ways. The authors of [1] use Natural Language Processing techniques: they parse the question to determine its type and then process it to obtain the answer. The aim is to identify exactly what the user is asking, resolve ambiguity (such as whether "bank" means a financial bank or a river bank), and expand the query if required.

2. Document Processing

In this part, once the keywords relevant to the user's search string have been extracted, the task is to retrieve the documents relevant to them. The keywords are fed to search engines, and the documents most related to them are retrieved.

3. Answer Processing

This part uses the documents retrieved for the question to form the answer to the user's query. The answer should be correct, concise, and comprehensive. It is typically extracted as is from the top-ranked documents retrieved in Step 2 (Document Processing).
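To make the three parts concrete, here is a minimal sketch of how they could be chained together. It is not the system from [1]: the stop-word list, the two-sentence corpus, and all function names are hypothetical, and simple keyword overlap stands in for a real search engine.

```python
import re

# Hypothetical stop-word list and toy corpus, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "in", "who", "what",
              "when", "where", "which", "why", "did", "do", "does"}

DOCUMENTS = [
    "Alexander Graham Bell invented the telephone in 1876.",
    "The Eiffel Tower is located in Paris.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def process_question(question):
    """Part 1: reduce the question to its content keywords."""
    return {tok for tok in tokenize(question) if tok not in STOP_WORDS}


def retrieve_documents(keywords, documents):
    """Part 2: rank documents by how many keywords they contain."""
    scored = [(len(keywords & set(tokenize(doc))), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0]


def extract_answer(ranked_documents):
    """Part 3: return the top-ranked passage as the answer, if any."""
    return ranked_documents[0] if ranked_documents else None


def answer(question, documents=DOCUMENTS):
    """Run the three parts in order on a single question."""
    keywords = process_question(question)
    return extract_answer(retrieve_documents(keywords, documents))


print(answer("Who invented the telephone?"))
# prints: Alexander Graham Bell invented the telephone in 1876.
```

A real system would replace each function with something far richer, but the flow from question to keywords to documents to answer stays the same.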

These are the three basic steps followed by researchers and AI tools in question answering tasks. Now I briefly describe how the authors of [1] have approached the problem of question answering using WordNet and Wikipedia.

Their research covers all three aspects, but it is pivotal to classify the question accurately. Word senses have to be disambiguated, named entities have to be determined, and, most importantly, the type of question has to be found. All of this falls under question classification, which in many other question answering frameworks is followed by query expansion.
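As an illustration of word sense disambiguation for the "bank" example, here is a small sketch in the spirit of the simplified Lesk algorithm: pick the sense whose gloss shares the most words with the question. This is an assumption for illustration, not necessarily the method used in [1]. The glosses in `SENSES` are hand-written and hypothetical; a real system would read them from WordNet.

```python
import re

# Hypothetical, hand-written glosses; a real system would take them from WordNet.
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "river bank": "sloping land beside a river or other body of water",
    }
}

# Hypothetical stop-word list used to ignore function words.
STOP_WORDS = {"a", "an", "the", "of", "or", "and", "that", "is", "to", "in", "on"}


def content_words(text):
    """Return the set of lowercase words in the text, minus stop words."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS}


def disambiguate(word, question):
    """Choose the sense whose gloss overlaps most with the question's words."""
    context = content_words(question)
    overlaps = {
        sense: len(context & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    # On a tie, max() keeps the first sense listed in SENSES.
    return max(overlaps, key=overlaps.get)


print(disambiguate("bank", "Which bank lends money at the lowest rate?"))
# prints: financial institution
```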

The authors [1] describe the following kinds of questions:

  1. Questions starting with non-significant verb phrases
  2. Who question.
  3. When question.
  4. Where question.
  5. Who question.
  6. Which question.
  7. Why question.

The authors specify parsing rules for all these types of questions. For example, a question can be of the form: When <fillers> <Noun Phrase> <Verb Phrase>. For more details, see the research paper [1].
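A much simplified sketch of this idea is to look at the first significant word of the question. The paper's rules work on parsed phrases; the version below uses plain regular expressions instead. The list of leading phrases in `LEADING_FILLER` is my own guess at what a "non-significant" opening might look like, not something taken from [1].

```python
import re

# Question words taken from the list above.
QUESTION_WORDS = ("who", "when", "where", "which", "why")

# Hypothetical examples of leading phrases that carry no information
# about the question type and are skipped before classification.
LEADING_FILLER = re.compile(
    r"^(?:can you tell me|do you know|please tell me|tell me)\s+",
    re.IGNORECASE,
)


def question_type(question):
    """Return the leading question word, or "other" if none is recognised."""
    text = LEADING_FILLER.sub("", question.strip())
    match = re.match(r"([A-Za-z]+)", text)
    first_word = match.group(1).lower() if match else ""
    return first_word if first_word in QUESTION_WORDS else "other"


print(question_type("When did the Berlin Wall fall?"))
# prints: when
print(question_type("Can you tell me who wrote Hamlet?"))
# prints: who
```

A rule such as When <fillers> <Noun Phrase> <Verb Phrase> would additionally need a parser to find the noun and verb phrases, which this sketch leaves out.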

Once the question type, the entities, and the entity type tree are determined, WordNet and Wikipedia are used to determine the main words, synonyms, hypernyms, …; the intersection of the words in two sets is then taken. The authors [1] note that Wikipedia articles, especially their first paragraphs, follow specific patterns, and that words are semantically related through WordNet relations.
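The set intersection can be sketched as follows. Everything here is a stand-in: `THESAURUS` is a hypothetical mini-dictionary used in place of WordNet lookups, and the sample paragraph is made up rather than copied from Wikipedia. Which two sets the paper actually intersects is described in [1]; this example simply intersects the expanded question word with the words of a first paragraph.

```python
# Hypothetical mini-thesaurus standing in for WordNet synonym and hypernym lookups.
THESAURUS = {
    "author": {"synonyms": {"writer"}, "hypernyms": {"person", "communicator"}},
    "city": {"synonyms": {"metropolis"}, "hypernyms": {"municipality", "place"}},
}


def expand(word):
    """Return the word together with its synonyms and hypernyms."""
    entry = THESAURUS.get(word, {})
    return {word} | entry.get("synonyms", set()) | entry.get("hypernyms", set())


def shared_terms(question_word, first_paragraph):
    """Intersect the expanded question word with the words of a paragraph."""
    paragraph_words = {w.strip(".,;:()").lower() for w in first_paragraph.split()}
    return expand(question_word) & paragraph_words


# Made-up sentence in the style of an encyclopedia article's opening paragraph.
paragraph = ("William Shakespeare was an English playwright, poet and writer, "
             "widely regarded as an influential person in literature.")

print(sorted(shared_terms("author", paragraph)))
# prints: ['person', 'writer']
```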

The evaluation in [1] is reported in terms of precision, using:

  • Training: Li and Roth’s 5500 questions
  • Testing: TREC 1999–2003 question sets
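For readers unfamiliar with the metric, precision for a question class is the fraction of questions assigned to that class that really belong to it. The sketch below computes it on a tiny made-up example; the labels are illustrative and are not taken from the datasets or results in [1].

```python
def precision(predicted, gold, label):
    """Fraction of questions assigned `label` whose gold class is also `label`."""
    assigned = [g for p, g in zip(predicted, gold) if p == label]
    if not assigned:
        return 0.0  # nothing was assigned this label
    return sum(1 for g in assigned if g == label) / len(assigned)


# Made-up classifier output and gold labels for four questions.
predicted = ["who", "when", "who", "where"]
gold = ["who", "when", "which", "where"]

print(precision(predicted, gold, "who"))
# prints: 0.5
```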

Finally, the approach has also been evaluated on answer generation, in which the relevant answers are generated from the retrieved passage extracts using the entities produced by the question classification module described above. In this phase, similarity is computed between the expected results and the results generated by keyword search over the extracted information. This module aims to find relevant documents similar to the query entities, and the generated answer should also match the results of search engines.

For details, refer to [1].
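This review does not name the similarity measure used in [1], so the sketch below uses Jaccard similarity purely as an assumed example of comparing two result sets. The document identifiers are hypothetical.

```python
def jaccard(expected, generated):
    """Size of the overlap between two result sets divided by the size of their union."""
    expected, generated = set(expected), set(generated)
    union = expected | generated
    # Two empty result sets are treated as identical.
    return len(expected & generated) / len(union) if union else 1.0


# Hypothetical document identifiers for the two result lists.
expected_results = ["doc1", "doc2", "doc3"]
generated_results = ["doc2", "doc3", "doc4"]

print(jaccard(expected_results, generated_results))
# prints: 0.5
```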

References

[1] Santosh Kumar Ray, Shailendra Singh, Joshi. “A semantic approach for question classification using WordNet and Wikipedia.” Pattern Recognition Letters 31 (2010), pp. 1935-1943.

Published by nidhk

I take an eager, research-based approach to solving problems in the domain of Artificial Intelligence and Computer Applications. I find solutions based on my strong knowledge of and foundations in subjects such as Artificial Intelligence, Machine Learning, Data Mining, Optimization Techniques, and Linear Algebra, to mention a few. This is augmented by a high standard of coding skills, ranging from C++, Java, and Perl to Data Science languages such as Python, R, and MATLAB. In support of this, many of my works have already been published online as research papers in well-reputed journals. I have extensive experience in Natural Language Processing applications such as summarization, search, retrieval, sentiment analysis, WordNet, and deep learning. I have completed a PhD specializing in Artificial Intelligence. I have worked on real-time implementations of various applications of Computer Science. The domains I have worked on include Health Care Systems, Electronic Document Management Systems, Natural Text Mining, EDA, and Web Development. Apart from my profession, I have an inherent interest in writing, especially poems and stories, and in painting, cooking, photography, and music, to mention a few!
