
Video real-time insights

Video has become ubiquitous on the Internet, on broadcast channels, and on personal devices.


This has encouraged the development of advanced techniques for analyzing the semantic content of video across a wide variety of applications, such as video representation learning, video highlight detection, video summarization, object detection, visual recognition, sentiment analysis, semantic segmentation, and situational awareness.


Microsoft is developing several solutions for analyzing video in real time. Here is a Microsoft Video Indexer demo.

Video Indexer is a cloud service that enables you to extract the following insights from your videos using artificial intelligence technologies:

  • Audio Transcription: Video Indexer has speech-to-text functionality, which enables customers to get a transcript of the spoken words. Supported languages include English, Spanish, French, German, Italian, Chinese (Simplified), Portuguese (Brazilian), Japanese and Russian (with many more to come in the future).
  • Face tracking and identification: Face technologies enable detection of faces in a video. The detected faces are matched against a celebrity database to evaluate which celebrities are present in the video. Customers can also label faces that do not match a celebrity. Video Indexer builds a face model based on those labels and can recognize those faces in videos submitted in the future.
  • Speaker indexing: Video Indexer has the ability to map and understand which speaker spoke which words and when.
  • Visual text recognition: With this technology, the Video Indexer service extracts text that is displayed in the videos.
  • Voice activity detection: This enables Video Indexer to separate background noise and voice activity.
  • Scene detection: Video Indexer has the ability to perform visual analysis on the video to determine when a scene changes in a video.
  • Keyframe extraction: Video Indexer automatically detects keyframes in a video.
  • Sentiment analysis: Video Indexer performs sentiment analysis on the text extracted using speech-to-text and optical character recognition, and provides that information in the form of positive, negative, or neutral sentiments, along with timecodes.
  • Translation: Video Indexer has the ability to translate the audio transcript from one language to another. The following languages are supported: English, Spanish, French, German, Italian, Chinese-Simplified, Portuguese-Brazilian, Japanese, and Russian. Once translated, the user can even get captioning in the video player in other languages.
  • Visual content moderation: This technology enables detection of adult and/or racy material present in the video and can be used for content filtering.
  • Keywords extraction: Video Indexer extracts keywords based on the transcript of the spoken words and the text recognized by the visual text recognizer.
  • Annotation: Video Indexer annotates the video based on a pre-defined model of 2000 objects.
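Several of the insights above, sentiment in particular, come back attached to timecodes. To make that idea concrete, here is a minimal Python sketch of how timecoded sentiment spans could be represented and tidied up on the client side. The `SentimentSpan` record, its field names, and the `merge_adjacent` and `to_timecode` helpers are hypothetical names chosen for illustration. They are not the Video Indexer output schema or part of its API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SentimentSpan:
    """Hypothetical record: a stretch of video with one sentiment label."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    label: str    # "positive", "negative", or "neutral"


def merge_adjacent(spans: Iterable[SentimentSpan], gap: float = 0.5) -> List[SentimentSpan]:
    """Merge consecutive spans that share a label and are at most `gap` seconds apart."""
    merged: List[SentimentSpan] = []
    for span in sorted(spans, key=lambda s: s.start):
        last = merged[-1] if merged else None
        if last and last.label == span.label and span.start - last.end <= gap:
            # Extend the previous span instead of starting a new one.
            merged[-1] = SentimentSpan(last.start, max(last.end, span.end), last.label)
        else:
            merged.append(span)
    return merged


def to_timecode(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


if __name__ == "__main__":
    raw = [
        SentimentSpan(0.0, 2.0, "positive"),
        SentimentSpan(2.2, 4.0, "positive"),
        SentimentSpan(4.0, 6.0, "negative"),
    ]
    for span in merge_adjacent(raw):
        print(to_timecode(span.start), to_timecode(span.end), span.label)
```

Merging neighbouring spans with the same label keeps a timeline view readable when the underlying analysis reports sentiment sentence by sentence. The 0.5 second gap is an arbitrary default for this sketch.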

Once Video Indexer is done processing and analyzing, you can review, curate, and publish the video insights.

Whether you are a content manager or a developer, the Video Indexer service is able to address your needs.

This demo shows a real-time workflow that processes frames from a video file to derive real-time insights such as objects, faces, demographics, emotion, unique face counts, face identification, and celebrity detection.
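The general shape of such a frame-processing workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the demo's actual implementation: `sample_frames`, `count_unique_faces`, and the `detect_faces` callback are hypothetical names, and the detector is assumed to return a stable identifier for each face it sees in a frame. In a real pipeline that callback would wrap whichever face service is in use.

```python
from typing import Callable, Hashable, Iterable, Iterator, Set, Tuple


def sample_frames(frames: Iterable[object], fps: float,
                  every_seconds: float) -> Iterator[Tuple[float, object]]:
    """Yield (timestamp_in_seconds, frame) for roughly one frame per interval."""
    step = max(1, round(fps * every_seconds))  # analyze every `step`-th frame
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index / fps, frame


def count_unique_faces(frames: Iterable[object], fps: float, every_seconds: float,
                       detect_faces: Callable[[object], Iterable[Hashable]]) -> int:
    """Count distinct face identifiers seen across the sampled frames.

    `detect_faces` is a hypothetical callback that returns one identifier
    per face found in a frame.
    """
    seen: Set[Hashable] = set()
    for _timestamp, frame in sample_frames(frames, fps, every_seconds):
        seen.update(detect_faces(frame))
    return len(seen)


if __name__ == "__main__":
    # Stand-in "frames": each one is simply the set of face ids visible in it.
    fake_video = [{"a"}] * 30 + [{"a", "b"}] * 30 + [{"c"}] * 30
    print(count_unique_faces(fake_video, fps=30, every_seconds=1.0,
                             detect_faces=lambda frame: frame))
```

Sampling one frame per interval rather than analyzing every frame is a common way to keep per-frame analysis affordable in real time. The trade-off is that a face visible only between two sampled frames is missed.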



Business scenarios where real-time video insights can be applied include:

  • Public transit (train stations, airports, etc.)
  • Retail store experiences
  • Hotel, resort, and casino environments
  • Waiting rooms
  • Gas pumps
  • Digital ads and directories in public spaces like shopping malls
  • ATMs
