Brigitte Bigi


I am Brigitte Bigi, working in Aix-en-Provence, France. I'm a CNRS researcher at the Laboratoire Parole et Langage.

Interested in the software I've worked on? Do check out my software :)

For a list of publications, see the publications page.

If you would like to contact me, please email me. You can also reach me by phone: (+33/0) 413 552709, or follow me on social media.


My research topics are related to multimodal corpora.

Since 2011, all my research has been implemented, tested, documented and freely distributed, resulting in a software tool named SPPAS. It is under continuous development, with the aim of providing robust and reliable software for the automatic annotation of recordings and for the analysis of annotated data. As its primary functionality, SPPAS offers a set of automatic or semi-automatic annotations of recordings.

SPPAS also offers features for managing corpora of annotated files; in particular, it includes a tool to filter multi-level annotations. Other tools are dedicated to the analysis of time-aligned data, for example to estimate descriptive statistics, as well as a version of the Time Group Analyzer (Gibbon 2013).



I graduated from Avignon University with a PhD in Computer Science. From 1997 to 2000, I worked with Professor Renato De Mori at LIA, France, on statistical language modelling for automatic speech recognition and information retrieval. I introduced a new, effective model for topic identification.

From 2000 to 2002, I worked with Professor Jean-Paul Haton and Professor Kamel Smaïli at LORIA, Nancy, France. My work focused on topic identification in newspaper articles and e-mails.

From 2002 to 2009, I worked at LIG on statistical language modelling for automatic speech recognition and statistical machine translation.

Since 2009, at LPL (Laboratoire Parole et Langage, Aix-en-Provence, France), my research has focused on corpus creation and the annotation of speech recordings. The main problem I am interested in is how to automatically time-align speech data with textual data and how to exploit the time-aligned results. My research focuses on language-independent approaches to tool and system development, so that they can be used either for languages with few available data resources or for languages with an unexpected amount of (unnecessary) data. I am the author and developer of SPPAS: Automatic Annotation of Speech, which includes 7 automatic annotation components (Momel and INTSINT, IPUs-segmentation, Tokenization, Phonetization, Forced-Alignment, Syllabification, and Repetitions detection), and 6 components for the analysis of annotated data.


Past research topics are related to text corpora

Professional Experiences