Chair for Technologies and Management of Digital Transformation
Univ. Prof. Dr. Ing. Tobias Meisen
Industrial Natural Language Processing and Information Extraction
Today, companies generate huge amounts of heterogeneous structured, semi-structured and unstructured data. While structured and semi-structured data can be analyzed and processed well thanks to their given structure, unstructured data continues to pose new challenges to companies. Yet unstructured data makes up about 80% of all company-wide data. Among other things, this data includes:
- Text documents
- Speech data (audio recordings, chats, etc.)
With our research focus "Industrial Natural Language Processing & Information Extraction", we pursue the goal of making this data findable and usable and of subsequently analyzing it. In addition to applying a variety of machine learning and artificial intelligence algorithms and methods to extract and analyze existing data, we focus on developing and enhancing current methods in the following two areas.
In the context of information extraction, we focus our research especially on the development of methods for the targeted analysis of text and image documents. This enables us to recognize and extract tables, diagrams and other elements from PDF documents.
In the context of Industrial Natural Language Processing, we focus on the development and design of effective microservice-based NLP architectures for the efficient analysis of large amounts of data, and on the anonymization and pseudonymization of personal information within text documents.
Selected relevant publications
Data-Driven Recognition and Extraction of PDF Document Elements
Technologies, 7(3):65
Understanding Vocabulary Growth Through An Adaptive Language Learning System
Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning, pages 65-78.
Publisher: LiU Electronic Press, Turku, Finland
Evaluation and Comparison of Cross-lingual Text Processing Pipelines
2019 IEEE Symposium Series on Computational Intelligence (SSCI), pages 417-425.