SEMANTEX™ CAN HANDLE DATA FROM ANY TEXT SOURCE
The flexible nature of the Semantex text extraction system allows input from a variety of source types, including news, blogs, e-mail, translated text (FBIS), technical documents, transcribed audio, and classified HUMINT documents. Semantex also includes information extraction pre-processing technology that can be applied to normalize inputs and streamline the extraction process.
Janya’s tokenlist results structure contains all the output for a single document, including any location or time normalization and the relationships between entities. This XML structure can be presented in several different formats, or converted to meet the specifications of the user or a third-party application.
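As a rough illustration of converting a per-document XML result into another format, the sketch below reads a small XML sample and re-emits it as JSON. The element and attribute names (tokenlist, entity, time, relation, normalized) and the sample values are assumptions invented for this example and do not reflect Janya’s actual tokenlist schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical tokenlist-style XML. Every element and attribute name here
# is an assumption for illustration, not the real Semantex schema.
SAMPLE_XML = """
<tokenlist doc="doc-001">
  <entity id="e1" type="PERSON" text="John Smith"/>
  <entity id="e2" type="LOCATION" text="Buffalo" normalized="Buffalo, NY, USA"/>
  <time id="t1" text="last Tuesday" normalized="2008-03-04"/>
  <relation type="LOCATED_IN" source="e1" target="e2"/>
</tokenlist>
"""


def tokenlist_to_dict(xml_text):
    """Convert the hypothetical XML structure into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "doc": root.get("doc"),
        # Each child element's attributes become one flat record.
        "entities": [dict(e.attrib) for e in root.findall("entity")],
        "times": [dict(t.attrib) for t in root.findall("time")],
        "relations": [dict(r.attrib) for r in root.findall("relation")],
    }


result = tokenlist_to_dict(SAMPLE_XML)
print(json.dumps(result, indent=2))
```

The same dictionary could just as easily be written out as CSV rows or loaded into a database, which is the kind of conversion a third-party application might require.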
The outputs of individual documents can then be combined, creating richer entity profiles and event scenarios across a document set. The resulting output set allows analysis across documents, uncovering details and relationships that are not evident within a single document. See Cross Document Fusion.
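To make the idea of combining per-document output more concrete, here is a minimal sketch that merges entity records from several documents into a single profile per entity. It matches entities by type and case-insensitive name only, which is a deliberately naive stand-in and not a description of how Semantex performs fusion. The record layout, the fuse_profiles function, and the sample data are all hypothetical.

```python
# Hypothetical per-document outputs. The field names and values are
# assumptions for illustration, not actual Semantex output.
doc_outputs = [
    {
        "doc": "doc-001",
        "entities": [
            {"name": "John Smith", "type": "PERSON",
             "attributes": {"employer": "Acme Corp"}},
        ],
    },
    {
        "doc": "doc-002",
        "entities": [
            {"name": "john smith", "type": "PERSON",
             "attributes": {"location": "Buffalo"}},
        ],
    },
]


def fuse_profiles(outputs):
    """Merge entity records across documents into one profile per entity.

    Entities are matched naively on (type, lowercased name); a real system
    would need far more robust cross-document co-reference.
    """
    profiles = {}
    for out in outputs:
        for ent in out["entities"]:
            key = (ent["type"], ent["name"].strip().lower())
            profile = profiles.setdefault(key, {
                "name": ent["name"],
                "type": ent["type"],
                "attributes": {},
                "sources": [],
            })
            # Collect every distinct value seen for each attribute.
            for attr, value in ent["attributes"].items():
                profile["attributes"].setdefault(attr, set()).add(value)
            # Record which documents contributed to this profile.
            if out["doc"] not in profile["sources"]:
                profile["sources"].append(out["doc"])
    return profiles


profiles = fuse_profiles(doc_outputs)
for key, profile in profiles.items():
    print(key, profile["attributes"], profile["sources"])
```

In this toy data, neither document alone links the person to both an employer and a location; the merged profile does, which is the kind of detail that only becomes visible across a document set.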