The amount of information in the world is increasing at a very high rate. With this growth comes the problem of using the abundant information efficiently and effectively. Rough Set Theory is a mathematical tool for discovering patterns hidden in data. It deals with the "decision" of "selecting" suitable attributes and objects. Some collections of objects cannot be defined exactly using the given knowledge; Rough Sets allow us to describe them approximately. We discuss the problem of selecting suitable attributes and objects in text data using Rough Set Theory.
Rough Sets are widely used in domains such as medical data analysis and disease diagnosis, information retrieval, text mining, and economic and financial prediction. The advantage of using Rough Sets is that, given a knowledge base, the facts hidden in the data can be extracted. No additional information such as "cut-offs (thresholds)" or "interval lengths" is needed. Further, once the knowledge base is formed, the problem becomes domain independent. Thus, this is a closed-world problem.
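To make the idea of viewing a collection of objects "approximately" concrete, the following is a minimal sketch of the standard lower and upper approximations from Rough Set Theory. The toy table, the document ids, the attribute names, and the function names are all hypothetical and chosen only for illustration; they are not taken from the data or method discussed here. Objects that agree on every chosen attribute are indistinguishable under the given knowledge, so they are grouped into blocks. The lower approximation collects the blocks that lie entirely inside the target set, and the upper approximation collects the blocks that overlap it.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group object ids that have identical values on all given attributes."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        blocks[key].add(obj)
    return list(blocks.values())


def approximations(table, attributes, target):
    """Return (lower, upper) approximations of `target` under `attributes`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block shares at least one object with the target
            upper |= block
    return lower, upper


# Hypothetical toy data: four documents described by the presence (1)
# or absence (0) of two terms.
table = {
    "d1": {"rough": 1, "set": 1},
    "d2": {"rough": 1, "set": 1},
    "d3": {"rough": 1, "set": 0},
    "d4": {"rough": 0, "set": 0},
}
relevant = {"d1", "d3"}  # assumed target concept, e.g. "relevant documents"

lower, upper = approximations(table, ["rough", "set"], relevant)
print(sorted(lower))  # objects that certainly belong to the target
print(sorted(upper))  # objects that possibly belong to the target
```

In this toy table, d1 and d2 cannot be told apart by the two attributes, so d1 appears only in the upper approximation even though it is in the target set. No threshold or interval length is supplied anywhere; the approximations follow from the table alone.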