Semantex’s modular design and inherent flexibility lend themselves readily to multilingual information extraction. Language packs for Chinese (simplified) and Urdu are available. Supporting a new language requires changes only to the lexicons, grammars, and language models (the customizable portions of the platform), with no changes to the core Semantex platform. Janya is currently working on support for more languages, including Arabic and Russian.
Machine translation is the process of translating text from one language to another. Although many vendors provide machine translation solutions, their accuracy leaves a lot to be desired. In many languages, proper nouns (names) are often translated incorrectly because a name has multiple meanings or the name lexicon is incomplete. Semantex is used to first identify entities and provide context-aware name translation/transliteration; this output can then be fed to machine translation systems to dramatically improve translation performance.
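The idea of resolving names before translation can be illustrated with a minimal sketch. It is not Semantex's API: the lexicon, the function names, and the placeholder format are all hypothetical, and a plain lexicon lookup stands in for real context-aware entity identification. Known names are swapped for placeholders, the remaining text goes to whatever machine translation function is supplied, and the preferred renderings are put back afterwards.

```python
# Hypothetical name lexicon: source-language name -> preferred English rendering.
# A real system would identify entities in context rather than by lookup.
NAME_LEXICON = {
    "北京": "Beijing",
    "李明": "Li Ming",
}


def protect_names(text, lexicon):
    """Replace each known name with a placeholder and record its rendering."""
    slots = {}
    # Longest names first, so a short name never splits a longer one.
    for name in sorted(lexicon, key=len, reverse=True):
        if name in text:
            token = f"__NAME{len(slots)}__"
            text = text.replace(name, token)
            slots[token] = lexicon[name]
    return text, slots


def restore_names(text, slots):
    """Put the recorded renderings back in place of the placeholders."""
    for token, rendering in slots.items():
        text = text.replace(token, rendering)
    return text


def translate_with_names(text, lexicon, translate):
    """Mask names, run the supplied translate function, then restore names."""
    masked, slots = protect_names(text, lexicon)
    return restore_names(translate(masked), slots)


def toy_translate(text):
    """Stand-in for a machine translation system (assumption: it leaves
    the placeholders untouched)."""
    return text.replace("去了", " went to ")


print(translate_with_names("李明去了北京", NAME_LEXICON, toy_translate))
```

The sketch relies on the translation step passing placeholders through unchanged; a system that rewrites or drops them would need a different hand-off, such as inline markup that it is known to preserve.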
By properly recognizing and translating or transliterating names, Semantex provides more accurate search results for queries on foreign names.