Janya has developed stand-alone tools that facilitate the handling of a wide range of unstructured text formats (HUMINT and IMINT message traffic, open-source material, newswires, etc.). Semantex™ includes interfaces specifically adapted to process the output of these tools.
CASE RESTORATION
Case Restoration uses machine learning algorithms to automatically convert documents that lack case information (commonly intelligence reports written entirely in uppercase) into standard mixed-case text, which enables more accurate entity identification and parsing.
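The algorithms behind Case Restoration are not described here. Purely as an illustration of the task, the following Python sketch implements a much simpler technique, a frequency-based unigram truecaser. It learns the most frequent casing of each word from mixed-case training text, then applies that casing to uppercase input and capitalizes sentence-initial words. The function names `train_truecaser` and `restore_case` are hypothetical and do not reflect the tool's actual interface.

```python
import re
import string
from collections import Counter, defaultdict

# Illustrative unigram truecaser (hypothetical; not Janya's method).
# Word tokens start with an ASCII letter or digit and may contain apostrophes
# (e.g. O'BRIEN); every other run of characters is kept verbatim.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9']*|[^A-Za-z0-9]+")
WORD_START = set(string.ascii_letters + string.digits)


def is_word(tok):
    return tok[0] in WORD_START


def ends_sentence(tok):
    return any(ch in ".!?" for ch in tok)


def train_truecaser(mixed_case_text):
    """Learn the most frequent casing of each word from mixed-case text."""
    counts = defaultdict(Counter)
    sentence_start = True
    for tok in TOKEN_RE.findall(mixed_case_text):
        if is_word(tok):
            # Sentence-initial words are capitalized regardless of their usual
            # form, so they are not counted as casing evidence.
            if not sentence_start:
                counts[tok.lower()][tok] += 1
            sentence_start = False
        elif ends_sentence(tok):
            sentence_start = True
    return {word: forms.most_common(1)[0][0] for word, forms in counts.items()}


def restore_case(upper_text, model):
    """Rewrite an all-uppercase text using the learned casing model."""
    out = []
    sentence_start = True
    for tok in TOKEN_RE.findall(upper_text):
        if is_word(tok):
            # Words never seen in training default to lowercase.
            cased = model.get(tok.lower(), tok.lower())
            if sentence_start:
                cased = cased[0].upper() + cased[1:]
            out.append(cased)
            sentence_start = False
        else:
            out.append(tok)
            if ends_sentence(tok):
                sentence_start = True
    return "".join(out)
```

For example, a model trained on "The analyst met John Smith in Paris. Smith works for NASIC. The report was filed by John." restores "JOHN SMITH MET THE ANALYST IN PARIS." to "John Smith met the analyst in Paris." A unigram model cannot resolve context-dependent casing (such as "US" versus "us"), which is why statistical truecasers typically use n-gram or sequence models instead.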
TEXT ZONING
TextZoner takes English documents and generates markup that reflects document layout; in particular, it identifies section headers, page breaks, tables, and similar structures. A purpose-built rule-specification language and control structure enable users to define their own zoning rules. TextZoner has been in use for over two years at both the National Air and Space Intelligence Center (NASIC) and the Joint Warfare Analysis Center (JWAC).
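TextZoner's rule-specification language is not documented here, so the following Python sketch is purely hypothetical. It models user-defined zoning rules as an ordered list of (label, regular expression) pairs. The first rule that matches a line assigns that line's zone, and consecutive text or table lines are merged into a single zone. The rule patterns, zone labels, and markup format are all illustrative assumptions.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rule set (not TextZoner's rule language): (zone label, pattern)
# pairs tried in order; the first pattern that matches a line labels it.
RULES = [
    ("page_break", re.compile(r"^\s*-*\s*page\s+\d+(\s+of\s+\d+)?\s*-*\s*$", re.IGNORECASE)),
    # Three or more cells separated by 2+ spaces, a tab, or a pipe.
    ("table", re.compile(r"\S+(\s{2,}|\t|\s*\|\s*)\S+(\s{2,}|\t|\s*\|\s*)\S+")),
    # Optionally numbered line made up only of capital letters and spaces.
    ("header", re.compile(r"^\s*(\d+(\.\d+)*\.?\s+)?[A-Z][A-Z ]{2,}$")),
]
DEFAULT_ZONE = "text"
MERGEABLE = {"text", "table"}  # labels whose consecutive lines form one zone


def classify(line, rules=RULES):
    for label, pattern in rules:
        if pattern.search(line):
            return label
    return DEFAULT_ZONE


def zone_lines(document, rules=RULES):
    """Return a list of (label, lines) zones for a plain-text document."""
    zones = []
    current = None
    for line in document.split("\n"):
        if not line.strip():
            current = None  # a blank line closes the open zone
            continue
        label = classify(line, rules)
        if current is not None and current[0] == label and label in MERGEABLE:
            current[1].append(line)
        else:
            current = (label, [line])
            zones.append(current)
    return zones


def to_markup(zones):
    """Serialize zones as simple XML-style elements, escaping the text."""
    parts = []
    for label, lines in zones:
        body = escape("\n".join(lines))
        parts.append(f'<zone type="{label}">{body}</zone>')
    return "\n".join(parts)
```

Because users can edit or reorder the rules, precedence matters. In this sketch, the table rule comes before the header rule so that an all-caps column-heading row is zoned as part of a table rather than as a section header.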