Topic segmentation of automatic speech transcripts
Target users and customers are multimedia industry players and any content or service provider that handles speech data.
IRINTS (Irisa News Topic Segmenter) was designed for topic segmentation of broadcast news transcripts.
As shown in Figure 1 below, the input to IRINTS is an automatic transcript (in Vecsys’s VOX format or IRISA’s SSD format). The output is an XML file in SSD format that specifies the topic segments.
[1] http://gforge.inria.fr/projects/topic-segmenter/
[2] Masao Utiyama and Hitoshi Isahara, «A Statistical Model for Domain-Independent Text Segmentation», ACL, 491–498, 2001
[3] S. Huet, G. Gravier and P. Sébillot, «Un modèle multisources pour la segmentation en sujets de journaux radiophoniques», in Proc. Traitement Automatique des Langues Naturelles, 2008.
IRINTS was developed at Irisa in Rennes by the Texmex and Metiss teams.
The IRINTS authors are Guillaume Gravier and Camille Guinaudeau.
[1] http://www.perl.org/
[2] http://xmlsoft.org/
[3] http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
IRINTS is software developed at Irisa in Rennes and is the property of CNRS (DI 03033-01) and Inria. Registration with the Agency for Program Protection (APP) in France is currently in progress.
A license can be supplied on request, on a case-by-case basis.
General issues:
Patrick GROS
patrick.gros@irisa.fr
Technical issues:
Sébastien CAMPION
scampion@irisa.fr
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex
France