Volume 9, Number 13, November 2019
SFERANET: Automatic Generation of Football Highlights
Authors
Vincenzo Scotti, Licia Sbattella and Roberto Tedesco, DEIB, Politecnico di Milano, Italy
Abstract
We present a methodology for the automatic generation of football match “highlights”, relying on
the commentators' voices and leveraging two multimodal neural networks (NNs).
The first model (M1) classifies sequences and provides a representation of them to be
processed by the second model (M2). M2 exploits M1 to decode unbounded streams of information,
generating the final set of scenes to include in the match summary.
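As a rough illustration of this two-stage setup (not the paper's actual architecture: the layer types, feature dimensions, and class counts below are assumptions), the pipeline could be sketched as follows:

```python
# Illustrative sketch only: layer choices and dimensions are assumptions,
# not the architecture described in the paper.
import torch
import torch.nn as nn

class M1(nn.Module):
    """Classifies a sentence-level segment and exposes its hidden representation."""
    def __init__(self, audio_dim=128, text_dim=300, hidden_dim=256, n_classes=2):
        super().__init__()
        self.encoder = nn.GRU(audio_dim + text_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, segment):                  # segment: (batch, time, audio_dim + text_dim)
        _, h = self.encoder(segment)             # h: (1, batch, hidden_dim)
        repr_ = h[-1]                            # segment-level representation
        return self.classifier(repr_), repr_

class M2(nn.Module):
    """Scores a stream of M1 representations to select scenes for the summary."""
    def __init__(self, hidden_dim=256):
        super().__init__()
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, segment_reprs):            # (batch, n_segments, hidden_dim)
        out, _ = self.decoder(segment_reprs)
        return torch.sigmoid(self.scorer(out))   # per-segment inclusion score
```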
Raw audio extracted from 369 football matches, along with transcriptions generated by an ASR
system, provided the source for feature extraction. We employed these features to train M1 and
M2: for M1, the feature streams were split into sequences at (nearly) sentence granularity, while
for M2 the entire streams were used. The final results were promising, especially when adopted
in a semi-automatic, real-world video pipeline.
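A minimal sketch of the sentence-level split used to prepare M1's training sequences; the data layout and field names here are assumptions, not the paper's actual preprocessing code:

```python
# Illustrative sketch only: the feature layout and ASR output format are assumptions.
def split_into_segments(features, asr_sentences):
    """Split a match-long feature stream at (nearly) sentence granularity,
    using the start/end frames of the ASR-transcribed sentences."""
    segments = []
    for sentence in asr_sentences:
        start, end = sentence["start_frame"], sentence["end_frame"]
        segments.append(features[start:end])
    return segments

# M1 would be trained on these sentence-level segments,
# while M2 would consume the entire feature stream of each match.
```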
Keywords
Neural Networks, NLP, Voice, Text, Summarisation