Crowd Sourced Metadata

Automatically tags media content according to what the crowd says about it on the Web.

Target users and customers

  • Professional customers:
    • Content owners
    • Content providers
    • Service providers
  • Consumers

Application sectors

  • Content targeting
  • Content recommending
  • Content retrieving
  • Content discovering
  • Content browsing
  • Content replaying

Reviews and comments posted by Web contributors on dedicated websites have proven to contain insights from which valuable descriptive metadata can be extracted. This descriptive metadata, whether or not it is synchronized with the content timeline, supports all of the content usages and services listed above. Furthermore, the associated metadata raises the value of the related content over time, which is of interest to content owners as well as content providers.


The “Content tagging according to crowd sourced metadata” module automatically extracts metadata from what the crowd says about media content on the Web. It currently extracts named entities from subtitles, comments and reviews. From posted comments, it also extracts quotes of movie dialogue and quotes of other comments. Furthermore, it characterizes forum contributors according to their connections to other contributors and their behaviour over time on these forums.
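As an illustration only, the detection of movie dialogue quoted in a comment could be sketched as a fuzzy match between quoted spans and subtitle cues, which also gives a position on the media timeline. This is a minimal sketch and not the module's actual implementation: the sample data, the names `SUBTITLES`, `normalize` and `find_dialog_quotes`, and the 0.8 similarity threshold are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical subtitle cues: (start_seconds, end_seconds, text).
SUBTITLES = [
    (61.0, 63.5, "Frankly, my dear, I don't give a damn."),
    (120.0, 123.0, "After all, tomorrow is another day."),
]

# Quoted spans of at least 10 characters between straight double quotes.
QUOTE_RE = re.compile(r'"([^"]{10,})"')


def normalize(text):
    """Lowercase and drop punctuation so small typing differences do not matter."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


def find_dialog_quotes(comment, subtitles, threshold=0.8):
    """Return (quote, start_seconds, similarity) for quoted spans that resemble a cue."""
    matches = []
    for quote in QUOTE_RE.findall(comment):
        # Keep the best-scoring cue for this quoted span.
        score, start = max(
            (
                (SequenceMatcher(None, normalize(quote), normalize(text)).ratio(), start)
                for start, _end, text in subtitles
            ),
            default=(0.0, None),
        )
        if score >= threshold:
            matches.append((quote, start, round(score, 2)))
    return matches


comment = 'He says "Frankly my dear, I dont give a damn" and leaves.'
print(find_dialog_quotes(comment, SUBTITLES))
```

A character-level similarity ratio is used here because commenters rarely reproduce dialogue exactly; a real system would likely need a more robust matcher and a proper subtitle parser.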

The characterization of contributors is also expected to yield a simple characterization of their comments, indicating which ones should be analysed first. This matters because the analysed corpus is an unbounded, constantly growing stream of text, and not all posted texts are necessarily of great interest.
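One way to picture this prioritization is sketched below, under stated assumptions. Each contributor receives a score that combines the number of distinct contributors who replied to them (their connections) with the number of distinct days on which they posted (their behaviour over time), and comments are then ordered by their author's score. The scoring formula, the sample data and the names `COMMENTS`, `contributor_scores` and `prioritize` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical comments: (comment_id, author, replied_to_author or None, day_posted).
COMMENTS = [
    ("c1", "alice", None, 1),
    ("c2", "bob", "alice", 1),
    ("c3", "carol", "alice", 2),
    ("c4", "alice", "bob", 3),
    ("c5", "dave", None, 9),
]


def contributor_scores(comments):
    """Score each author: distinct repliers attracted plus distinct active days."""
    repliers = defaultdict(set)     # author -> contributors who replied to them
    active_days = defaultdict(set)  # author -> days on which they posted
    for _cid, author, target, day in comments:
        active_days[author].add(day)
        if target is not None and target != author:
            repliers[target].add(author)
    return {a: len(repliers[a]) + len(active_days[a]) for a in active_days}


def prioritize(comments):
    """Order comments so that those from higher-scoring contributors come first."""
    scores = contributor_scores(comments)
    # sorted() is stable, so comments with equal scores keep their original order.
    return sorted(comments, key=lambda c: scores[c[1]], reverse=True)


for comment_id, author, _target, _day in prioritize(COMMENTS):
    print(comment_id, author)
```

Because the corpus keeps growing, such scores would have to be updated incrementally as new comments arrive; the sketch recomputes them from scratch for simplicity.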

Dedicated natural language processing and temporal graph analysis modules have been developed for the specific purpose of extracting descriptive metadata and, as far as possible, synchronizing it with the media timeline, with the aim of enriching the description of media content and increasing its value over time.

Technical requirements:

  • The “Content tagging according to crowd sourced metadata” module analyses large sets of comments and reviews, i.e. free text, posted by contributors on websites dedicated to cinema and TV.
  • It currently runs as Python modules.

Conditions for access and use:

Corresponding deliverables are all designated QL, i.e. these modules are only available to a subset of PVAA partners, upon justified request.

Related IPL is proprietary to Technicolor.



  • Technicolor

Contact details:

Philippe Schmouker

Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs
CS 176 16
35 576 Cesson-Sévigné France