Augmenting Linguistic Semi-Structured Data for Machine Learning - A Case Study using Framenet

Breno W. S. R. Carvalho; Aline Paes; Bernardo Gonçalves

doi:10.5121/csit.2020.101201

Volume 10, Number 12, October 2020

Authors

Breno W. S. R. Carvalho¹, Aline Paes² and Bernardo Gonçalves³, ¹IBM Research, Universidade Federal Fluminense (UFF), Brazil, ²Universidade Federal Fluminense (UFF), Brazil, ³IBM Research, Brazil

Abstract

Semantic Role Labelling (SRL) is the process of automatically finding the semantic roles of terms in a sentence. It is an essential step towards creating a machine-meaningful representation of textual information. One public linguistic resource commonly used for this task is the FrameNet Project. FrameNet is a human- and machine-readable lexical database containing a considerable number of annotated sentences, whose annotations link sentence fragments to semantic frames. However, although the annotations across the documents in the dataset link to most of the frames, a large group of frames has no annotations pointing to them. In this paper, we present a data augmentation method for FrameNet documents that increases the total number of annotations by over 13%. Our approach relies on lexical, syntactic, and semantic aspects of the sentences to provide additional annotations. We evaluate the proposed augmentation method by comparing the performance of a state-of-the-art semantic role labelling system trained on the dataset with and without augmentation.

Keywords

FrameNet, Frame Semantic Parsing, Semantic Role Labelling, Data Augmentation.
