Automatically tags media content according to what the crowd says about it on the Web
Reviews and comments posted by Web contributors on dedicated web sites have proven to contain insight from which valuable descriptive metadata can be extracted. Whether or not they are synchronized with the content timeline, these descriptive metadata support any of the content-related usages and services described above. Furthermore, the associated metadata raise the value of the related content over time, which is of interest to content owners as well as to content providers.
The “Content tagging according to crowd sourced metadata” module automatically extracts metadata from what the crowd says about media content on the Web. It currently extracts named entities from subtitles, comments and reviews. From posted comments it also extracts quotes of movie dialogs and quotes of other comments. Furthermore, it characterizes forum contributors according to their connections to other contributors and according to their behaviour on these forums over time.
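As an illustration of how a quote of movie dialog found in a comment could be tied back to the media timeline, the following is a minimal sketch only, not the module's actual method. It assumes subtitles are available as (start time in seconds, text) pairs and treats any run of four consecutive shared words as a quote. The function names, the threshold `n` and the sample dialog are all hypothetical.

```python
import re

def tokens(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(words, n):
    """Return the set of all runs of n consecutive words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def find_dialog_quotes(comment, subtitles, n=4):
    """Return the (start_seconds, text) subtitle cues that share at least
    one run of n consecutive words with the comment (assumed heuristic)."""
    comment_ngrams = ngrams(tokens(comment), n)
    return [
        (start_seconds, text)
        for start_seconds, text in subtitles
        if comment_ngrams & ngrams(tokens(text), n)
    ]

# Hypothetical subtitle cues and comment, invented for illustration.
subtitles = [
    (754.0, "We will never find the island before sunrise."),
    (1290.5, "Then we sail at midnight."),
]
comment = "Best line ever: we will never find the island before sunrise. So true!"
quotes = find_dialog_quotes(comment, subtitles)
```

Each match carries the start time of the subtitle cue, which is what would allow the comment to be attached to a position on the content timeline. Exact n-gram matching misses paraphrased or misremembered quotes; a real system would likely need fuzzier matching.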
The characterization of contributors is also expected to yield a simple characterization of their comments, indicating which comments should be analysed first. Such prioritization is needed because the analysed corpus is an unbounded, constantly growing stream of text, and because not all posted texts are likely to be of great interest.
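The idea of ordering comments by a characterization of their authors can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's actual algorithm: it scores a contributor only by the number of distinct contributors they have exchanged replies with, and it ignores the temporal dimension entirely. The function names and the sample data are hypothetical.

```python
from collections import defaultdict

def contributor_scores(replies):
    """Score each contributor by the number of distinct contributors they
    exchanged replies with (undirected degree in the reply graph)."""
    neighbours = defaultdict(set)
    for author, replied_to in replies:
        if author != replied_to:  # ignore self-replies
            neighbours[author].add(replied_to)
            neighbours[replied_to].add(author)
    return {name: len(others) for name, others in neighbours.items()}

def prioritise(comments, scores):
    """Order (author, text) comments so that those written by the
    best-connected contributors are analysed first."""
    return sorted(comments, key=lambda c: scores.get(c[0], 0), reverse=True)

# Hypothetical reply pairs (author, replied_to) and comments.
replies = [("alice", "bob"), ("carol", "alice"), ("dave", "alice"), ("bob", "carol")]
comments = [
    ("dave", "first!"),
    ("alice", "The score in the final scene is remarkable."),
    ("erin", "meh"),
]
scores = contributor_scores(replies)
ordered = prioritise(comments, scores)
```

Contributors who never appear in the reply graph receive a score of zero and their comments are processed last. Degree is only one possible measure; behaviour over time, as mentioned above, would require timestamps on the reply pairs.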
Dedicated natural language processing and temporal graph analysis modules have been developed for the specific purpose of extracting descriptive metadata and, as far as possible, synchronizing them with the media timeline, with the aim of enriching the description of media content and increasing its value over time.
The corresponding deliverables are all classified QL, i.e. these modules are available only to a subset of PVAA partners, upon justified request.
The related IPL is proprietary to Technicolor.