Volume 10, Number 12, October 2020
Augmenting Linguistic Semi-Structured Data for Machine Learning - A Case Study using Framenet
Authors
Breno W. S. R. Carvalho (1), Aline Paes (2) and Bernardo Gonçalves (3); (1) IBM Research, Universidade Federal Fluminense (UFF), Brazil; (2) Universidade Federal Fluminense (UFF), Brazil; (3) IBM Research, Brazil
Abstract
Semantic Role Labelling (SRL) is the process of automatically finding the semantic roles of terms in a sentence. It is an essential step towards creating a machine-meaningful representation of textual information. One public linguistic resource commonly used for this task is the FrameNet Project. FrameNet is a human- and machine-readable lexical database containing a considerable number of annotated sentences, whose annotations link sentence fragments to semantic frames. However, while the annotations across all the documents in the dataset cover most of the frames, a large group of frames has no annotations in the documents pointing to them. In this paper, we present a data augmentation method for FrameNet documents that increases the total number of annotations by over 13%. Our approach relies on lexical, syntactic, and semantic aspects of the sentences to provide additional annotations. We evaluate the proposed augmentation method by comparing the performance of a state-of-the-art semantic role labelling system trained on the dataset with and without augmentation.
Keywords
FrameNet, Frame Semantic Parsing, Semantic Role Labelling, Data Augmentation.