Multilingual Extraction


Semantex™ now supports Unicode (UTF-8) text and can process multilingual documents. A solution for Simplified Chinese is available, and other languages are in the pipeline. Unicode support also helps when processing English text that contains foreign words with special characters, such as umlauts in German and accents in French. The same functionality that Semantex provides for English text can now be applied to text in other languages. Porting to a new language involves changing only language resources, not code.


Semantex can be used to augment machine translation when converting multilingual documents into English. It is especially useful for translating proper names. Applying entity tagging in the native language enables context-aware translation, which can produce more accurate translations of names. This in turn leads to more accurate cross-lingual search applications.
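
The idea of tagging names before translating can be illustrated with a minimal sketch. It is not Semantex code and does not use any Semantex API: the two lexicons, the function names, and the character-by-character "translator" are all hypothetical stand-ins chosen for illustration. It assumes names are found by simple dictionary lookup and never overlap; a real system would use a trained entity tagger and a full machine translation engine.

```python
import re

# Hypothetical toy resources, for illustration only.
# Literal glosses of individual characters (what a naive translator might emit).
WORD_LEXICON = {"李": "plum", "明": "bright", "在": "is in", "北": "north", "京": "capital"}
# Names the (toy) entity tagger recognizes, with their English renderings.
NAME_LEXICON = {"李明": "Li Ming", "北京": "Beijing"}


def naive_translate(text):
    """Translate character by character, with no knowledge of names."""
    return " ".join(WORD_LEXICON.get(ch, ch) for ch in text)


def tag_entities(text):
    """Return sorted (start, end, surface) spans for known names.

    Assumes the names in NAME_LEXICON do not overlap in the input.
    """
    spans = []
    for name in NAME_LEXICON:
        for match in re.finditer(re.escape(name), text):
            spans.append((match.start(), match.end(), name))
    return sorted(spans)


def tagged_translate(text):
    """Tag names first, then translate only the text between them."""
    out = []
    pos = 0
    for start, end, name in tag_entities(text):
        # Translate the untagged stretch before this name.
        out.extend(WORD_LEXICON.get(ch, ch) for ch in text[pos:start])
        # Render the name as a unit instead of glossing its characters.
        out.append(NAME_LEXICON[name])
        pos = end
    out.extend(WORD_LEXICON.get(ch, ch) for ch in text[pos:])
    return " ".join(out)


if __name__ == "__main__":
    sentence = "李明在北京"
    print(naive_translate(sentence))
    print(tagged_translate(sentence))
```

In this toy setup the untagged path glosses each character of a name literally, while the tagged path keeps each name intact, which is the kind of error that entity tagging in the source language is meant to avoid.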


See the Multilingual Solutions page for more information on using Semantex with other languages.


News & Events

Meet with Janya representatives at the EUCOM Intelligence Summit in Heidelberg, Germany!



Janya launches Semantex™ 4.5.



Janya joins partners to create Savanna Solution.



Mark Logic's Open Enrichment Framework features Semantex.