Identification and Visualization of Legal Definitions and their Relations Based on European Regulatory Documents

Anastasiya Damaratskaya


Analyzing regulatory documents is a continuous challenge for numerous companies, especially when it is done manually. Considering the exponential growth in legal acts, legal practitioners must invest vast amounts of time examining the legal text for relevant information. Even so, manual analysis remains susceptible to errors and misinterpretation. This thesis concentrates on semi-automating this procedure and presents an approach for extracting legal definitions and their semantic relations from European regulatory documents using natural language processing techniques. We further visualize the obtained data in a web service that we implemented, which serves as a practical application of the approach. Since the existing methodologies addressing legal information retrieval tasks struggle with interpreting legal text and lack semantic analysis and visualization, our method aims to close this research gap and deepen the understanding of regulatory documents. In order to identify legal definitions, we first investigated the structure of legal acts that regulatory documents attempt to follow. After recognizing similar formats, we focused on a single article specifying legal terms, extracted the definitions, and analyzed all semantic relations that occur, such as hyponymy, meronymy, and synonymy. For this purpose, depending on the type of semantic relationship, we applied pattern matching and natural language processing techniques, emphasizing dependency parsing and noun phrase chunking. For visualization, the prototype collected the data into separate files and extracted sentences mentioning legal definitions for each related term. To rapidly discover these sentences in the text and obtain an overview of each term’s frequency, the prototype listed the articles where the definitions occur and counted the number of retrieved sentences.
Additionally, it assigned annotations to the regulatory documents, explaining the legal definitions in each paragraph to facilitate comprehension of the regulatory documents. The evaluation outcomes demonstrated that the prototype could detect 99.9% of legal definitions and 96.7% of their semantic relations correctly, thereby delivering accurate results for the introduced approach. The study further fulfilled the established requirements intended to simplify the platform’s usage. Consequently, these results demonstrate that natural language processing techniques perform well in the classification phase and are suitable for definition and relation extraction.
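To make the pattern-matching step described in the abstract more concrete, the following is a minimal sketch and not the thesis's actual implementation. It assumes that the definitions article lists entries in the form "(n) 'term' means explanation;", which is a simplification. The sample text, the regular expression, and the function names `extract_definitions` and `find_term_mentions` are all hypothetical. The second function only checks which defined terms are mentioned in other definitions; it does not classify hyponymy, meronymy, or synonymy, which the thesis approaches with dependency parsing and noun phrase chunking.

```python
import re

# Invented sample text in the style of a definitions article (not a quotation).
SAMPLE_ARTICLE = (
    "(1) 'controller' means the body which determines the purposes of processing; "
    "(2) 'processor' means a body which processes data on behalf of the controller."
)

# Assumed entry format: (n) 'term' means explanation; followed by the next entry or the end.
DEFINITION_PATTERN = re.compile(
    r"\(\d+\)\s*[‘'](?P<term>[^’']+)[’']\s+means\s+"
    r"(?P<explanation>.+?)[;.]\s*(?=\(\d+\)|$)",
    re.DOTALL,
)


def extract_definitions(article_text):
    """Return a dict mapping each defined term to its explanation."""
    definitions = {}
    for match in DEFINITION_PATTERN.finditer(article_text):
        term = match.group("term").strip()
        # Collapse line breaks and repeated spaces inside the explanation.
        explanation = " ".join(match.group("explanation").split())
        definitions[term] = explanation
    return definitions


def find_term_mentions(definitions):
    """For each term, list the other defined terms mentioned in its explanation."""
    mentions = {}
    for term, explanation in definitions.items():
        mentions[term] = [
            other
            for other in definitions
            if other != term
            and re.search(rf"\b{re.escape(other)}\b", explanation, re.IGNORECASE)
        ]
    return mentions


if __name__ == "__main__":
    found = extract_definitions(SAMPLE_ARTICLE)
    for term, related in find_term_mentions(found).items():
        print(term, "->", related)
```

A purely regular-expression approach like this one only handles entries that follow the assumed format; definitions phrased differently would need additional patterns or the linguistic analysis named in the abstract.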

legal definitions, legal information extraction, natural language processing
Research Methods

Publication Data

Author: Anastasiya Damaratskaya
Thesis Type: Bachelor's Thesis
Pages: 156
Language: English
About the Author:
Major / Study Program: Informatics
License: CC BY-NC-ND 4.0
Date of Publication: 11/10/23
Status: Available
Date of Grading: 05/16/23
Institution: Technical University of Munich, Germany


# Name Details Endorsement
Catherine Sai

Thesis Documents and Supplemental Materials

05/27/24 06:48:52 PM
# Description Type Upload Date Location
1 Thesis Document PDF (33.42MB) 11/06/23 12:00:00 AM IPFS