clemTEXT by clemVOICE

By Clementine

Hungarian text processing solution.

Delivery method

Download

With clemTEXT, the information content of large collections of electronically stored Hungarian-language documents (articles, publications, business and scientific reports, electronic messages, customer service entries, etc.) can be analysed, taking into account words, phrases and their context. Text mining products exist internationally, but they primarily target English and, to a lesser extent, other widely spoken languages.

Tokenization and stemming

Enables shallow linguistic analysis of the processed text, such as word tokenization and stemming, which suffices for most text mining tasks. Stemming is especially important for Hungarian because of its rich morphology.
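To illustrate what tokenization and stemming mean for Hungarian, here is a minimal sketch. It does not use the clemTEXT API, which this listing does not describe. The suffix list and the `tokenize` and `naive_stem` helpers are hypothetical and deliberately tiny; a production stemmer would need a full morphological model.

```python
import re

# Hypothetical, deliberately tiny suffix inventory for illustration only.
# Real Hungarian stemming needs many more suffixes and morphological rules.
SUFFIXES = ["akban", "okban", "ekben", "ban", "ben", "nak", "nek", "ak", "ok", "ek"]


def tokenize(text):
    """Split text into lowercase word tokens (Unicode-aware, so accents survive)."""
    return re.findall(r"\w+", text.lower())


def naive_stem(token, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


# "In the houses and in the gardens"
tokens = tokenize("A házakban és a kertekben")
stems = [naive_stem(t) for t in tokens]
print(list(zip(tokens, stems)))
```

The example shows why stemming matters here: "házakban" and "kertekben" each carry a plural marker and a case ending, so without stemming they would not match the base words "ház" and "kert".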

POS tagging

Part-of-speech tagging of words, encoded as MSD (Morpho-Syntactic Description) tags.
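As a rough illustration of how a positional MSD tag might be read, the sketch below decodes a tag character by character. It assumes a MULTEXT-East-style layout in which the first character is the word category and later positions carry attributes, with "-" marking a position that does not apply. The listing does not state the exact tagset clemTEXT emits, so the tables and the `decode_msd` helper are assumptions and cover only a small subset.

```python
# Assumed, partial category codes in a MULTEXT-East-style tagset.
CATEGORIES = {
    "N": "noun", "V": "verb", "A": "adjective", "P": "pronoun",
    "R": "adverb", "C": "conjunction", "M": "numeral",
}

# Assumed positional attributes for nouns: type, gender, number, case.
# Only a few values are listed; unknown codes are passed through unchanged.
NOUN_POSITIONS = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {"n": "nominative", "a": "accusative"}),
]


def decode_msd(tag):
    """Decode the category of any tag, plus the leading attributes of noun tags."""
    if not tag:
        raise ValueError("empty MSD tag")
    result = {"category": CATEGORIES.get(tag[0], "unknown")}
    if tag[0] == "N":
        for (name, values), code in zip(NOUN_POSITIONS, tag[1:]):
            if code == "-":  # position not applicable
                continue
            result[name] = values.get(code, code)
    return result


decoded = decode_msd("Nc-sn")
print(decoded)
```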

Dependency analysis

Enables deep linguistic analysis of the processed text, identifying structural linguistic patterns and their relations. This is often used in relation extraction tasks.
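The following sketch shows how a dependency parse can feed relation extraction. It does not call clemTEXT; the `Token` structure, the relation labels ("SUBJ", "OBJ", "ROOT", "DET") and the hand-written parse are hypothetical stand-ins for whatever a parser would output.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    head: int     # index of the governing token, 0 for the root
    deprel: str   # dependency label (hypothetical label set)


def extract_svo(tokens):
    """Return (subject, verb, object) triples for heads that govern both."""
    children_of = {}
    for tok in tokens:
        children_of.setdefault(tok.head, []).append(tok)

    triples = []
    for tok in tokens:
        children = children_of.get(tok.idx, [])
        subjects = [c.form for c in children if c.deprel == "SUBJ"]
        objects = [c.form for c in children if c.deprel == "OBJ"]
        for subj in subjects:
            for obj in objects:
                triples.append((subj, tok.form, obj))
    return triples


# Hand-written parse of "Péter olvas egy könyvet" ("Péter reads a book").
sentence = [
    Token(1, "Péter", 2, "SUBJ"),
    Token(2, "olvas", 0, "ROOT"),
    Token(3, "egy", 4, "DET"),
    Token(4, "könyvet", 2, "OBJ"),
]
triples = extract_svo(sentence)
print(triples)
```

Because the relation is read off the tree rather than from word order, the same pattern would still match if the sentence were reordered, which is common in Hungarian.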