Automatic Speech Transcription

Vocapia Research develops core multilingual large vocabulary speech recognition technologies* for voice interfaces and automatic audio indexing applications. This speech-to-text technology is available for multiple languages. (* Under license from LIMSI-CNRS)

Target users and customers

The target users and customers of speech-to-text transcription technologies are organizations in the multimedia and call center sectors, including academic and industrial organizations interested in the automatic processing and mining of audio or audiovisual documents.

Application sectors

This core technology can serve as the basis for a variety of applications: multilingual audio indexing, teleconference transcription, telephone speech analytics, transcription of speeches, subtitling…

Large vocabulary continuous speech recognition is the key technology for enabling content-based information access in audio and audiovisual documents. Most of the linguistic information in audiovisual data is encoded in the audio channel, which, once transcribed, can be accessed using text-based tools.

Via speech recognition, spoken document retrieval can support random access to relevant portions of audio documents using specific criteria, reducing the time needed to identify recordings in large multimedia databases. Some applications are data mining, news on demand, and media monitoring.
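To illustrate how a transcript enables random access, the following minimal sketch searches a list of time-stamped words for a query term and returns the offsets at which playback could start. The TimedWord class, the find_term function, and the toy transcript are all hypothetical names and data chosen for this illustration; they do not reflect the actual output format or API of any Vocapia product.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One recognized word with its position in the recording (hypothetical layout)."""
    text: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


def find_term(words, term):
    """Return the start time (in seconds) of every occurrence of `term`."""
    needle = term.lower()
    return [w.start for w in words if w.text.lower() == needle]


# Toy transcript; the words and times are invented for illustration.
transcript = [
    TimedWord("the", 0.00, 0.12),
    TimedWord("election", 0.12, 0.61),
    TimedWord("results", 0.61, 1.10),
    TimedWord("follow", 1.10, 1.48),
    TimedWord("the", 1.48, 1.58),
    TimedWord("election", 1.58, 2.05),
]

offsets = find_term(transcript, "Election")
print(offsets)  # [0.12, 1.58]
```

A real retrieval system would typically index many documents and handle multi-word queries; this sketch only shows why word-level timing makes it possible to jump straight to the relevant portion of a recording.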


The Vocapia Research speech transcription system transcribes the speech segments located in an audio file. Currently, systems for 17 language varieties are available for broadcast and web data, and conversational speech transcription systems are available for 7 languages.

The transcription system has two main components: an audio partitioner and a word recognizer.

The audio partitioner divides the acoustic signal into homogeneous segments and associates appropriate (document-internal) speaker labels with the segments.

For each speech segment, the word recognizer determines the sequence of words and associates start and end times and a confidence measure with each word.
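The two-stage output described above can be pictured as a list of speaker-labelled segments, each holding time-stamped words with confidence measures. The sketch below is one possible in-memory representation, written under stated assumptions: the Segment and Word classes, the low_confidence_words helper, the [0, 1] confidence range, and the sample data are hypothetical and are not taken from the VoxSigma software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    text: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # assumed here to lie in [0, 1]


@dataclass
class Segment:
    speaker: str       # document-internal label, e.g. "spk1"
    start: float       # seconds
    end: float         # seconds
    words: List[Word] = field(default_factory=list)


def low_confidence_words(segments, threshold=0.5):
    """Collect (speaker, word) pairs whose confidence is below `threshold`."""
    return [
        (seg.speaker, w.text)
        for seg in segments
        for w in seg.words
        if w.confidence < threshold
    ]


# Invented sample data: two segments as a partitioner might delimit them.
segments = [
    Segment("spk1", 0.0, 1.2, [Word("good", 0.0, 0.4, 0.95),
                               Word("morning", 0.4, 1.2, 0.91)]),
    Segment("spk2", 1.5, 2.6, [Word("hello", 1.5, 2.0, 0.42),
                               Word("everyone", 2.0, 2.6, 0.88)]),
]

flagged = low_confidence_words(segments)
print(flagged)  # [('spk2', 'hello')]
```

Keeping the confidence measure alongside each word lets downstream applications, such as subtitling or indexing, flag uncertain words for manual review instead of treating the whole transcript as equally reliable.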

Technical requirements:

PC running Linux (when the software is used under license).

Conditions for access and use:

The VoxSigma software is available both via licensing and via our web service.




Contact details:

Bernard PROUTS
+33 (0)1 84 17 01 14

Vocapia Research
28, rue Jean Rostand
Parc Orsay Université
91400 Orsay