Volume 13, Number 2

A Statistical Model for Morphology Inspired by the Amis Language


Isabelle Bril¹, Achraf Lassoued² and Michel de Rougemont³; ¹Lacito-CNRS, ²University Paris II, ³University of Paris II and IRIF-CNRS


We introduce a statistical model for analysing the morphology of natural languages based on their affixes. The model was inspired by the analysis of Amis, an Austronesian language with a rich morphology. Since a word consists of a root and optional affixes, we associate three vectors with each word: one for the root, one for the prefixes, and one for the suffixes. Morphology captures semantic notions, and we show how to approximately predict some of them, for example the type of a simple sentence, using prefixes and suffixes only. We then define a sentence vector s associated with each sentence, built from the prefixes and suffixes of its words, and show how to approximately predict a derivation tree in a grammar.
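As a rough illustration of the representation described above, the following Python sketch attaches three vectors to each word and accumulates a sentence vector from prefixes and suffixes only. The abstract does not specify how the vectors are constructed, so everything concrete here is an assumption: the affix inventories `PREFIXES` and `SUFFIXES` hold made-up placeholder labels (not actual Amis affixes), the root vector is taken to be one-hot, the affix vectors are plain counts, and the sentence vector is the sum of the per-word affix counts. The names `Word`, `word_vectors`, and `sentence_vector` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical affix inventories: placeholder labels, not real Amis affixes.
PREFIXES = ["pa", "pb", "pc"]
SUFFIXES = ["sx", "sy"]


@dataclass
class Word:
    """A segmented word: one root plus optional prefixes and suffixes."""
    root: str
    prefixes: list = field(default_factory=list)
    suffixes: list = field(default_factory=list)


def count_vector(items, inventory):
    """Count how often each inventory entry occurs in items."""
    counts = Counter(items)
    return [counts[entry] for entry in inventory]


def word_vectors(word, roots):
    """Return (root, prefix, suffix) vectors for one word.

    Assumption: the root vector is one-hot over a known root list,
    and the affix vectors are occurrence counts.
    """
    root_vec = [1 if r == word.root else 0 for r in roots]
    prefix_vec = count_vector(word.prefixes, PREFIXES)
    suffix_vec = count_vector(word.suffixes, SUFFIXES)
    return root_vec, prefix_vec, suffix_vec


def sentence_vector(words):
    """Sum prefix and suffix counts over all words; roots are ignored.

    Assumption: summed counts stand in for the paper's sentence vector s.
    """
    total = [0] * (len(PREFIXES) + len(SUFFIXES))
    for w in words:
        vec = count_vector(w.prefixes, PREFIXES) + count_vector(w.suffixes, SUFFIXES)
        total = [t + v for t, v in zip(total, vec)]
    return total


if __name__ == "__main__":
    sentence = [Word("r1", ["pa"], ["sx"]), Word("r2", ["pa", "pc"], [])]
    print(word_vectors(sentence[0], ["r1", "r2"]))
    print(sentence_vector(sentence))
```

A vector of this kind could then serve as input to any standard classifier for the sentence-type prediction mentioned above; whether the counts are normalised into frequencies is a design choice this sketch leaves open.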