Text Mining Software

The Text Mining Software (TMS) research lab specializes in computational linguistics and text mining. The lab develops syntactic-semantic methods to extract information from digitized running text. Depending on the application-specific questions, all relevant text statements are automatically identified, visualized, and transferred to structured knowledge bases.

The focus is on answering the questions "Who?-When?-Where?-What?" in order to generate complete and succinct descriptions of events. Our research and development in natural language processing (NLP) is expressly application-oriented: we develop marketable solutions in close cooperation with our partners.
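As an illustration of what a "Who?-When?-Where?-What?" description could look like once it has been extracted, here is a minimal Python sketch. The class name EventRecord, its field names, and the sample event are assumptions made for this example only; they are not taken from the lab's software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventRecord:
    """Hypothetical container for one extracted event.

    Every field is optional because a text may leave a question unanswered.
    """
    who: Optional[str] = None
    when: Optional[str] = None
    where: Optional[str] = None
    what: Optional[str] = None

    def is_complete(self) -> bool:
        # An event counts as complete only if all four questions are answered.
        return all(v is not None for v in (self.who, self.when, self.where, self.what))

    def describe(self) -> str:
        # Build a one-sentence description from whichever fields are present.
        parts = [self.who or "Someone", self.what or "did something"]
        if self.where:
            parts.append(f"in {self.where}")
        if self.when:
            parts.append(f"on {self.when}")
        return " ".join(parts) + "."


# Made-up sample data, for illustration only.
event = EventRecord(
    who="The city council",
    when="12 May 1925",
    where="Jena",
    what="opened a new library",
)
print(event.describe())
```

A record like this maps directly onto a row in a relational table or a node with four properties in a graph database, which is one reason a flat four-field structure is a convenient starting point.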


  • Development of basic NLP procedures: part-of-speech (POS) tagger, named entity (NE) recognizer, dependency parser, phrase chunking methods
  • Supervised machine learning to train NLP tools
  • Development of user interfaces to visualize and edit NLP results (front-end)
  • Database design (relational and graph databases) and implementation to persist results from text mining (back-end)
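To make the idea of training an NLP tool with supervised learning concrete, the following sketch trains a most-frequent-tag baseline for part-of-speech tagging from a handful of hand-labeled sentences. This is a deliberately simple stand-in and not the lab's tagger; the function names, the tag set, and the toy training data are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_tagger(tagged_sentences):
    """Learn, for each word, the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    # Unknown words fall back to the overall most frequent tag.
    default_tag = all_tags.most_common(1)[0][0]
    model = {word: tags.most_common(1)[0][0] for word, tags in counts.items()}
    return model, default_tag


def tag(tokens, model, default_tag):
    """Assign each token its learned tag, or the default tag if it was never seen."""
    return [(token, model.get(token.lower(), default_tag)) for token in tokens]


# Toy training data (hypothetical, hand-labeled for illustration).
training = [
    [("The", "DET"), ("lab", "NOUN"), ("develops", "VERB"), ("software", "NOUN")],
    [("The", "DET"), ("software", "NOUN"), ("extracts", "VERB"), ("events", "NOUN")],
]
model, default_tag = train_tagger(training)
print(tag(["The", "lab", "extracts", "texts"], model, default_tag))
```

Real taggers also use context (neighboring words and tags) rather than the word alone, but the train-then-apply split shown here is the same.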

Research Projects


Development of solutions to facilitate the evaluation of large citizen science projects

Development of procedures to automatically analyze contributions from large-scale projects of the citizen science (CS) movement. The goal of the CS.RECANA project was to identify information relevant to the questions "who - where - when - what". The results are prepared for scientific use so that citizen science initiators can easily integrate them into their professional research.

Funded by: Federal Ministry for Economic Affairs and Energy / INNO-KOM-Ost; funding code: MF 150125


Method to automatically improve the accessibility of archived print media

Development of procedures to analyze and index articles in historical newspaper archives. A further focus is the development of methods to automatically clip digitized newspaper articles and compile them into press folders on specific topics.
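The project description does not say how indexing or clipping is implemented. As a rough illustration of the general idea, the sketch below builds a keyword-based inverted index over already-separated article texts and collects the articles that match every term of a topic. The function names and sample articles are hypothetical, and plain keyword matching is a simplification standing in for whatever methods the project actually uses.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_index(articles):
    """Map each token to the set of article ids in which it occurs."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for token in tokenize(text):
            index[token].add(article_id)
    return index


def press_folder(index, topic_terms):
    """Return the sorted ids of articles that contain every topic term."""
    matches = [index.get(term.lower(), set()) for term in topic_terms]
    if not matches:
        return []
    return sorted(set.intersection(*matches))


# Made-up sample articles, for illustration only.
articles = {
    "a1": "Flood damages the old bridge",
    "a2": "New bridge opens after the flood",
    "a3": "Harvest festival draws crowds",
}
index = build_index(articles)
print(press_folder(index, ["flood", "bridge"]))
```

Requiring every term to match keeps a folder narrow; a looser variant could take the union of the matching sets and rank articles by how many terms they contain.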

Funded by: Federal Ministry for Economic Affairs and Energy / INNO-KOM-Ost; funding code: MF 140008