Saturday, March 14, 2009

Thesauri for Information Retrieval, Part 1

Thesauri have been widely used in information retrieval in recent years. They are built into software to help users retrieve information from websites and content management systems. Law firms and consulting companies, for instance, have integrated thesauri into their websites. Thesauri can also be used to index database content automatically. Most commercial databases, such as STN and WilsonWeb, provide thesauri to help users search more effectively. However, there is no standard that thesaurus developers can follow to compose concepts consistently. The same concept can be represented differently in different thesauri built for different purposes. For example, "knowledge management software" could be split into "knowledge", "management", and "software"; it could also be broken down into "knowledge management" and "software". The first case could occur in a general thesaurus, while the second is more likely in an information science thesaurus. Anyone who wants to integrate these thesauri into their software will ask: which one is more appropriate for my system?
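To make the inconsistency concrete, here is a minimal Python sketch of how the same phrase might decompose against two different vocabularies. The two term sets and the `segment` function are hypothetical, made up for illustration. The greedy longest-match strategy is likewise an assumption, not the method of any particular thesaurus or database.

```python
def segment(phrase, vocabulary):
    """Greedily split a phrase into the longest terms found in the vocabulary."""
    words = phrase.lower().split()
    terms = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, shrinking until a known term matches.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in vocabulary:
                terms.append(candidate)
                i = j
                break
        else:
            # Unknown word: keep it as its own term so nothing is lost.
            terms.append(words[i])
            i += 1
    return terms


# Hypothetical vocabularies, invented for this example.
general_thesaurus = {"knowledge", "management", "software"}
info_science_thesaurus = {"knowledge", "management", "software",
                          "knowledge management"}

phrase = "knowledge management software"
general_terms = segment(phrase, general_thesaurus)
info_science_terms = segment(phrase, info_science_thesaurus)

print(general_terms)        # ['knowledge', 'management', 'software']
print(info_science_terms)   # ['knowledge management', 'software']
```

A system that indexes content with one vocabulary and searches with the other would see three concepts on one side and two on the other, which is exactly the mismatch an integrator has to resolve.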

Unless a thesaurus is built for a specific purpose, it should be interoperable with software across domains, so that its benefits can be realized as widely as possible. Sometimes it even takes longer to customize a thesaurus for a local system than to develop one from scratch. In other words, how can we let current content management systems easily adopt available thesauri? We need a standard for the way concepts are created in thesauri.