SKIMSON® Information Extraction

Automated screening, analysis, and utilization of information from technical documentation

SKIMSON® is a family of analysis tools that enables automated information retrieval from heterogeneous digital documents.

The focus is on technical documents, in particular collections of detailed product descriptions and technical data sheets. Such documents typically have an individual, distinctive layout and use company-specific terms for products and their parameters. They contain a multitude of physical quantities and units of measurement, a varied mix of continuous text, lists, and sometimes highly condensed tables, as well as footnotes and cross-references. The scope of a statement also varies within technical documents: a comment may refer to an individual component or to an entire device group.

SKIMSON® examines technical documents as a whole. The software combines system-internal AI-based layout analysis methods with various semantic NLP methods for the different information blocks and content.

Users can analyze individual documents with natural language search queries. The result is a structured representation of the desired target parameters and a hit list from the data sheet collection.

In addition, batch-processing functions for entire data sheet collections support the automated creation of digital, structured product libraries.

In both application scenarios, the traceability of the automatically generated information is guaranteed. The corresponding sources within the documents can be displayed precisely at any time.
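Traceability of this kind can be pictured as every extracted value carrying a pointer back to its place in the source. The following is a minimal sketch under stated assumptions, not SKIMSON®'s actual method or data model: the `ExtractedParameter` record, its field names, and the simple regular expression are all hypothetical and stand in for the AI-based layout and semantic analysis described above.

```python
import re
from dataclasses import dataclass


# Hypothetical record type; the field names are assumptions for illustration.
@dataclass(frozen=True)
class ExtractedParameter:
    name: str
    value: float
    unit: str
    page: int   # 1-based page on which the value was found
    start: int  # character offset where the match begins in the page text
    end: int    # character offset just past the end of the match


# Toy pattern for lines such as "Rated voltage: 24 V".
# A real system would need far more than a regular expression.
PATTERN = re.compile(
    r"(?P<name>[A-Za-z][A-Za-z ]*?)\s*:\s*"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[A-Za-zµ°%]+)"
)


def extract_parameters(pages: list[str]) -> list[ExtractedParameter]:
    """Collect parameters from page texts, keeping the source location of each."""
    results = []
    for page_number, text in enumerate(pages, start=1):
        for match in PATTERN.finditer(text):
            results.append(
                ExtractedParameter(
                    name=match.group("name").strip(),
                    value=float(match.group("value")),
                    unit=match.group("unit"),
                    page=page_number,
                    start=match.start(),
                    end=match.end(),
                )
            )
    return results


def show_source(pages: list[str], param: ExtractedParameter) -> str:
    """Return the exact source passage a parameter was extracted from."""
    return pages[param.page - 1][param.start:param.end]


pages = ["Rated voltage: 24 V\nMax current: 1.5 A", "Weight: 320 g"]
results = extract_parameters(pages)
```

Because each record stores the page and character span, a viewer could highlight the original passage for any value, which is the idea behind displaying sources precisely.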

The configuration of the processes and their optimization for various technical application areas are carried out using automated ontology generation methods.

SKIMSON® tools can be used on-premises or as a cloud application. The interaction of the individual system components can be adapted to the needs of the users.

What does SKIMSON® do?

SKIMSON® is used to extract information from heterogeneous documents and provides service modules that significantly support research tasks for technical applications.

Advantages of SKIMSON®

»  Automatic extraction of information from a large number of detailed technical product descriptions from various publishers
»  Automated creation of cross-manufacturer product catalogs
»  Analysis of individual documents based on natural language queries
»  Clear presentation of search results based on a uniform, manufacturer-neutral terminology
»  Transparency and traceability through automatic referencing and highlighting of the locations in the source data
»  Can be used on-premises or collaboratively in cloud environments

How does it work?

The user starts the process by importing the e-mail inbox. SKIMSON® performs a semantic analysis, and the user checks the result. SKIMSON® automatically identifies key messages and sorts them into categories (e.g. where, when, what). E-mail content that makes no sense is recognized as such.

The user can easily correct and complete the system's results with virtual markers. As the expert, the user trains SKIMSON® to improve its results.

All results found by SKIMSON® are stored in structured databases for further research. Personal data as defined by the General Data Protection Regulation (GDPR) can be encoded and secured separately.
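One possible reading of "encoded and secured separately" is pseudonymization: personal values are replaced by tokens in the stored result and kept in a separate table. The sketch below illustrates that reading with Python's standard `sqlite3` and `secrets` modules. The table names, the `PD_` token format, and the function names are assumptions for illustration and do not describe how SKIMSON® stores data.

```python
import secrets
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create an in-memory database with separate tables for findings and personal data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE findings (
            id INTEGER PRIMARY KEY,
            category TEXT NOT NULL,   -- e.g. where, when, what
            content TEXT NOT NULL     -- text with personal values replaced by tokens
        );
        CREATE TABLE personal_data (
            token TEXT PRIMARY KEY,
            value TEXT NOT NULL
        );
        """
    )
    return conn


def store_finding(conn, category, content, personal_values=()):
    """Store a finding; each listed personal value is swapped for a random token."""
    for value in personal_values:
        token = "PD_" + secrets.token_hex(4)  # hypothetical token format
        conn.execute(
            "INSERT INTO personal_data (token, value) VALUES (?, ?)",
            (token, value),
        )
        content = content.replace(value, token)
    cursor = conn.execute(
        "INSERT INTO findings (category, content) VALUES (?, ?)",
        (category, content),
    )
    conn.commit()
    return cursor.lastrowid


conn = create_store()
finding_id = store_finding(
    conn, "what", "Delivery to Jane Doe is delayed", ["Jane Doe"]
)
```

In a real deployment the personal data table would more plausibly live in a separate, access-controlled or encrypted store. The single in-memory database here only keeps the example self-contained.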