The Promise of UIMA

A very large part of the information available to and generated by enterprises is in loosely structured text. Searching for useful, concise and relevant information buried deep within the multitude of memos, emails, web pages, documents and databases can be a nightmare, if not entirely fruitless, when one relies on the keyword search capability offered by many search engines. OmniFind and UIMA bring to the table the ability to search for relevant information based on “key facts” rather than on a handful of words showing up in a document.

The Unstructured Information Management Architecture (UIMA) framework is an open, scalable and extensible platform that can be used for enhancing and customizing search solutions. Solutions built on it can process unstructured information such as free-form text (emails, web pages, documents and so on) to find relevant facts. This empowers users to find useful information based on declarative facts and on implicit knowledge embedded in the search engine, both of which are more relevant in the context of a specific domain. UIMA enables developers to build analytic modules and to compose analytic applications from multiple analytic providers, encouraging collaboration and facilitating value extraction from unstructured information.
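The idea of composing an application from independent analytic modules can be sketched in a few lines. This is a minimal illustration only, not the UIMA API: the `Document`, `Annotation`, `asset_annotator`, `person_annotator` and `run_pipeline` names are hypothetical, and the regular expressions stand in for real analysis logic. Each module reads a shared document and adds its own annotations, so modules from different providers can be chained without knowing about one another.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str    # annotation type, e.g. "Asset" or "Person"
    begin: int   # start offset in the text
    end: int     # end offset (exclusive)
    value: str   # normalized value for the covered span


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def asset_annotator(doc):
    # Hypothetical module: marks mentions of hardware and software licenses.
    pattern = r"\b(laptops?|blades?|software licenses?)\b"
    for m in re.finditer(pattern, doc.text, re.IGNORECASE):
        doc.annotations.append(
            Annotation("Asset", m.start(), m.end(), m.group(0).lower())
        )


def person_annotator(doc):
    # Hypothetical module, possibly from another provider: marks one known name.
    for m in re.finditer(r"\bJohn Smith\b", doc.text):
        doc.annotations.append(
            Annotation("Person", m.start(), m.end(), m.group(0))
        )


def run_pipeline(doc, annotators):
    # Run each module in order; every module adds to the same document.
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(
    Document("John Smith approved the purchase of 40 laptops."),
    [asset_annotator, person_annotator],
)
kinds = sorted(a.kind for a in doc.annotations)
```

Adding a new kind of fact to the analysis then means adding one more function to the list passed to `run_pipeline`, leaving the existing modules untouched.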

Say, for instance, you wanted to find all the high-value computing assets that were procured before the previous CIO, John Smith, was fired. The relevant information may be buried in emails, product literature, procedures, release notes, proposals, approval documents, memos, press releases and so on. It would be nearly impossible to find via keywords. The emails exchanged may use words such as software licenses, PCs, blades, workstations, laptops, consulting, and outsourced assets like data centers, and most probably will not use the exact words “computing assets”. Further, the emails, memos and other documents exchanged at the relevant time most probably will not contain a phrase like “the time before changes in top management took place”.

Semantic search is well suited to this problem. Important concepts relevant to the domain of interest can be captured in the form of analysis artifacts using UIMA. This extensibility in OmniFind allows the search engine to answer queries expressed as related, higher-level facts. Implicit knowledge that software licenses, PCs and blades are all “computing assets”, and that Mr. John Smith was the CIO at the time of their purchase, allows for better-quality information retrieval. The alternative, a keyword search based on queries like “PC software license CIO before firing John Smith”, would in all probability return meaningless results. Thus, OmniFind along with UIMA allows existing implicit knowledge to be embedded in the parsing and indexing process, which in turn allows for more efficient information management.
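The difference between the two approaches can be shown with a small sketch. It is an illustration under stated assumptions, not OmniFind or UIMA code: the `CONCEPTS` mapping, the sample documents, the function names and the `CIO_TENURE_END` date are all invented for the example. Domain knowledge is applied at indexing time, so a query for the concept “computing asset” with a date constraint finds a document that never uses those words.

```python
import re
from datetime import date

# Hypothetical domain knowledge: surface terms mapped to a higher-level concept.
CONCEPTS = {
    "software license": "computing asset",
    "pc": "computing asset",
    "blade": "computing asset",
    "laptop": "computing asset",
}

# Hypothetical fact: the day John Smith's tenure as CIO ended (invented date).
CIO_TENURE_END = date(2006, 6, 30)

# Invented sample documents.
DOCS = [
    {"id": "email-1", "date": date(2005, 3, 14),
     "text": "Approved purchase of 200 blade servers."},
    {"id": "memo-7", "date": date(2007, 1, 9),
     "text": "Ordered one laptop for the new hire."},
    {"id": "note-3", "date": date(2005, 8, 2),
     "text": "Cafeteria menu for August."},
]


def concepts_in(text):
    # Return every concept whose surface term (singular or plural) occurs as a word.
    lowered = text.lower()
    return {
        concept
        for term, concept in CONCEPTS.items()
        if re.search(r"\b" + re.escape(term) + r"s?\b", lowered)
    }


def build_index(docs):
    # Attach the detected concepts to each document at indexing time.
    return [{**d, "concepts": concepts_in(d["text"])} for d in docs]


def semantic_search(index, concept, before):
    # Match on the concept and on the document date, not on literal words.
    return [d["id"] for d in index
            if concept in d["concepts"] and d["date"] < before]


def keyword_search(docs, query):
    # Plain keyword search: every query word must appear in the text.
    words = query.lower().split()
    return [d["id"] for d in docs
            if all(w in d["text"].lower() for w in words)]


index = build_index(DOCS)
semantic_hits = semantic_search(index, "computing asset", CIO_TENURE_END)
keyword_hits = keyword_search(DOCS, "computing assets")
```

Here the concept query selects only the 2005 email about blade servers, while the keyword query for “computing assets” matches nothing, because no document contains those literal words.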