clemTEXT by clemVOICE
With clemTEXT, the information content of large numbers of electronically stored Hungarian-language documents (articles, publications, business and scientific reports, electronic messages, customer service entries, etc.) can be analysed, taking into account words, phrases and their context. Text mining products exist internationally, but they primarily target English and, to a lesser extent, other widely spoken languages.
Tokenization and stemming
Enables shallow linguistic analysis of the processed text, such as word tokenization and word stemming, which suffices for most text mining tasks. Stemming is particularly important for Hungarian because of its rich morphology.
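To illustrate what tokenization and stemming mean for Hungarian text, here is a minimal Python sketch. It is not clemTEXT's implementation or API: the function names `tokenize` and `stem` and the tiny suffix list are hypothetical, chosen only for this example, and a realistic Hungarian stemmer would need a far larger rule set or a morphological lexicon.

```python
import re

# Hypothetical, deliberately tiny suffix list for illustration only.
# Real Hungarian morphology has many more suffixes and stem alternations.
SUFFIXES = ("okban", "ekben", "ban", "ben", "nak", "nek", "ok", "ek")


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    In Python 3, \\w is Unicode-aware, so accented Hungarian letters
    (á, é, ö, ő, ü, ű, ...) stay inside the tokens.
    """
    return re.findall(r"\w+", text.lower())


def stem(token, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


if __name__ == "__main__":
    tokens = tokenize("A könyvekben és a házban.")
    print([stem(t) for t in tokens])
    # prints ['a', 'könyv', 'és', 'a', 'ház']
```

Both "könyvekben" (in the books) and "házban" (in the house) collapse to their stems, so inflected forms of the same word can be counted together. This is the reason stemming matters more for a morphologically rich language than for English.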
Part-of-speech tagging with MSD (Morpho-Syntactic Description) codes
Enables deep linguistic analysis of the processed text: it identifies structural linguistic patterns and the relations between them, which is often needed in relation extraction tasks.
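As a rough illustration of how MSD-coded tags can be used, the Python sketch below decodes a positional tag and looks for a simple adjective + noun pattern. It assumes MULTEXT-East-style positional codes (for example `Nc-sn` for a common noun, singular, nominative) and covers only a small subset of attributes. The names `decode_msd` and `adjective_noun_pairs` are hypothetical, and the sketch does not show how clemTEXT itself represents or processes tags.

```python
# Assumed, simplified subset of a MULTEXT-East-style MSD scheme.
# The first character is the word class; later positions are attributes.
CATEGORIES = {"N": "noun", "V": "verb", "A": "adjective"}

# Positional attributes for nouns (positions 2..5 of the tag).
# The gender position is unused for Hungarian and appears as "-".
NOUN_ATTRS = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {"n": "nominative", "a": "accusative", "d": "dative"}),
]


def decode_msd(tag):
    """Turn a positional MSD tag into a dict of readable attributes."""
    result = {"category": CATEGORIES.get(tag[:1], "unknown")}
    if tag[:1] == "N":
        for (name, values), code in zip(NOUN_ATTRS, tag[1:]):
            if code != "-":  # "-" marks a position that does not apply
                result[name] = values.get(code, code)
    return result


def adjective_noun_pairs(tagged):
    """Return (adjective, noun) pairs of adjacent tokens in a tagged sentence."""
    return [
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1.startswith("A") and t2.startswith("N")
    ]


if __name__ == "__main__":
    tagged = [("piros", "Afp-sn"), ("alma", "Nc-sn")]  # "red apple"
    print(decode_msd("Nc-sn"))
    print(adjective_noun_pairs(tagged))
```

Matching on tag sequences like this is one simple way to find structural patterns; relation extraction typically builds on longer patterns of the same kind.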