Robustness Issues in Information Extraction
Date:
Information extraction mainly comprises two major tasks, named entity recognition and relation extraction. It aims to automatically extract key information from massive amounts of unstructured text, thereby supporting downstream tasks such as knowledge graph construction and intelligent question answering. In the deep learning era, because neural networks, and pre-trained models in particular, can automatically extract high-level semantic features, research attention has shifted to how to design pre-training tasks that embed semantic knowledge more comprehensively and how to use such models efficiently. However, automatic feature extraction in deep learning models inevitably leads to shortcut learning, which causes robustness deficiencies in real-world application scenarios and poses hidden risks for downstream applications of information extraction; the problem is especially severe in low-resource settings. This talk analyzes robustness issues in information extraction in depth, explores the underlying factors that affect model robustness, and introduces our research on improving the robustness of information extraction models in scenarios involving weak samples, few-shot learning, unlabeled data, and cross-domain settings.
