Use cases for including AI/ML in your OTT workflows

Matthijs Langendijk
4 min read · Nov 7, 2022

If there’s one hot topic these days, it’s Artificial Intelligence (AI) and Machine Learning (ML). With the recent rise of text-to-image tools like Dall-E, Midjourney, and Stable Diffusion, everyone is looking at using AI and ML techniques. So how does that translate to the world of OTT? In which areas could we use AI/ML to make our applications better, and how can we improve or even fully replace manual processes?

Opening and end credit detection

Replacing labour-intensive manual work is where AI and ML are incredibly useful. With the amount of content finding its way to streaming platforms, the work required to maintain that content keeps growing. One of those areas is definitely the opening and end credits. While having the opening and end credits defined in a CMS may sound simple, manual labour is required to get them there. For each and every episode of a series, someone has to watch the content, find and note down the start and end times of both credit sequences, and then enter all that information into a CMS. It’s a lengthy process that has to be repeated for every single piece of content.

Wouldn’t it be great if that could be automated? That’s exactly where AI and Machine Learning come into play. It’s entirely possible to train a machine learning model to detect opening and end credits. That data can then simply be retrieved and placed into a CMS. The manual labour is, in this case, entirely replaced by technology.

Even better is the fact that this technology is readily available on the market. You don’t have to reinvent the wheel and can rely on the likes of AWS, for example. With Amazon Rekognition, you can have all your content processed and easily find the opening and end credits, among other segment types.
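As a sketch of what that looks like in practice: the snippet below extracts credit timings from a Rekognition-style segment-detection result. The sample payload is invented, but it mirrors the response shape of Rekognition’s `get_segment_detection` API; in production you would first start an asynchronous job on your video with `start_segment_detection(SegmentTypes=["TECHNICAL_CUE"])` and poll for the result.

```python
# Sketch: extract opening/end credit timestamps from an Amazon Rekognition
# segment-detection result. The sample payload below is invented, trimmed to
# the fields we actually read.

def extract_credits(response):
    """Return {cue_type: (start_ms, end_ms)} for credit technical cues."""
    credits = {}
    for segment in response.get("Segments", []):
        if segment.get("Type") != "TECHNICAL_CUE":
            continue
        cue = segment["TechnicalCueSegment"]["Type"]
        if cue in ("OpeningCredits", "EndCredits"):
            credits[cue] = (
                segment["StartTimestampMillis"],
                segment["EndTimestampMillis"],
            )
    return credits

# Illustrative response for a ~44-minute episode.
sample_response = {
    "Segments": [
        {"Type": "TECHNICAL_CUE",
         "TechnicalCueSegment": {"Type": "OpeningCredits", "Confidence": 98.2},
         "StartTimestampMillis": 0, "EndTimestampMillis": 45000},
        {"Type": "TECHNICAL_CUE",
         "TechnicalCueSegment": {"Type": "EndCredits", "Confidence": 99.1},
         "StartTimestampMillis": 2520000, "EndTimestampMillis": 2640000},
    ]
}

print(extract_credits(sample_response))
# {'OpeningCredits': (0, 45000), 'EndCredits': (2520000, 2640000)}
```

The extracted start/end pairs are exactly what you’d push into the CMS fields that today get filled in by hand.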

Subtitles via Speech-to-text, and translations

While the tech world has been focused on text-to-image generators in recent weeks, OpenAI actually released a different tool a few weeks ago. This tool, called OpenAI Whisper, is able to transcribe spoken sentences into text. Imagine not having to manually write subtitles, but instead using technology to fully generate them for you!

Another important aspect is different languages. Especially if you’re operating in many countries across the world, you’ll need your subtitles translated into many languages. Having AI provide the translations for you removes almost all need for manual labour. This is also where OpenAI’s Whisper can come into play again: Whisper can recognise spoken words and translate them (into English) at the same time. Crucially, the translated sentences follow the natural flow of the target language. This is incredibly powerful, as some other translation tools translate sentences word-for-word, creating really odd results.
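To make this concrete, here’s a small sketch that turns Whisper-style transcription segments into an SRT subtitle file. Whisper’s `model.transcribe()` returns a dict whose `"segments"` carry start and end times in seconds plus the recognised text; the segments below are invented for illustration, standing in for a real call like `whisper.load_model("base").transcribe(path, task="translate")`.

```python
# Sketch: convert Whisper-style transcription segments into SRT subtitles.
# Each segment has "start"/"end" in seconds and the recognised "text".

def to_srt(segments):
    def stamp(seconds):
        # SRT timestamps look like HH:MM:SS,mmm
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{stamp(seg['start'])} --> {stamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Invented segments, mimicking Whisper's output shape.
segments = [
    {"start": 0.0, "end": 2.5, "text": " Welcome back to the show."},
    {"start": 2.5, "end": 5.0, "text": " Tonight we have a special guest."},
]

print(to_srt(segments))
```

The resulting string can be written straight to an `.srt` file and attached to the stream, or fed into a human review step first.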

Metadata generation

Another area where AI is incredibly useful is generating metadata. Rather than having people manually watch the content and provide the metadata for it, AI can be used to generate it instead. By analysing the content, it’s possible for AI to recognise what is actually happening on screen. Whether it’s understanding that there is a fight scene, recognising the scenery a movie or episode takes place in, or even detecting which actors and actresses are present at a certain time: AI can interpret the video (and audio) content and write the metadata based on what it has seen.
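As an illustration, the sketch below condenses per-frame label detections into content-level metadata. The input mimics the shape of Amazon Rekognition’s `get_label_detection` response, with one timestamped label per detection; the label names themselves are invented.

```python
# Sketch: condense timestamped label detections into content-level metadata.
# The sample mimics Amazon Rekognition's get_label_detection response shape.

def build_metadata(label_response, min_confidence=80.0):
    """Map each label name to the timestamps (ms) where it was seen."""
    tags = {}
    for item in label_response.get("Labels", []):
        label = item["Label"]
        if label["Confidence"] < min_confidence:
            continue  # drop low-confidence detections
        tags.setdefault(label["Name"], []).append(item["Timestamp"])
    return tags

sample = {"Labels": [
    {"Timestamp": 1000, "Label": {"Name": "Sword Fighting", "Confidence": 93.4}},
    {"Timestamp": 1500, "Label": {"Name": "Castle", "Confidence": 88.7}},
    {"Timestamp": 2000, "Label": {"Name": "Sword Fighting", "Confidence": 91.0}},
    {"Timestamp": 2000, "Label": {"Name": "Horse", "Confidence": 62.3}},
]}

print(build_metadata(sample))
# {'Sword Fighting': [1000, 2000], 'Castle': [1500]}
```

The per-label timestamps are a bonus: they tell you not just *that* there’s a fight scene, but *when*, which is useful for chaptering and content warnings as well.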

So what would you use this big bag of generated metadata for? If anything, it’s incredibly useful for recommendations! While in many cases recommendations are still generated based on coarse keywords like genres, leveraging the full range of metadata is increasingly common across streaming companies. More detailed, fine-grained metadata naturally makes for better-suited recommendations.
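One hedged sketch of what those metadata-driven recommendations could look like: a minimal content-based recommender that ranks titles by Jaccard similarity between their metadata tag sets. The catalogue, titles, and tags are invented examples; a production recommender would of course be far more sophisticated.

```python
# Sketch: content-based recommendations over generated metadata tags,
# ranked by Jaccard similarity. Catalogue and tags are invented examples.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

catalogue = {
    "Castle Siege":  {"medieval", "battle", "drama"},
    "Space Runner":  {"sci-fi", "chase", "thriller"},
    "Knight's Oath": {"medieval", "drama", "romance"},
}

def recommend(watched_tags, catalogue, top_n=2):
    """Return the top_n titles whose tags best overlap what was watched."""
    ranked = sorted(
        catalogue.items(),
        key=lambda kv: jaccard(watched_tags, kv[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:top_n]]

print(recommend({"medieval", "battle"}, catalogue))
# ['Castle Siege', "Knight's Oath"]
```

The point of the sketch is the input, not the algorithm: the richer the generated tag sets, the more signal even a simple similarity measure has to work with.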

Thumbnails and Posters

What you might have noticed already while browsing through the likes of Netflix is that suddenly a popular actress or actor shows up in the majority of posters. Netflix, in this case, is banking on the fact that you really enjoy watching a certain person, and will try to persuade you to watch other content this person appears in. But how labour-intensive must it be to generate thumbnails for a series featuring many different actors and actresses?

This is another perfect example of where we can use AI/ML: generating thumbnails. In this case, you can use video content recognition to find actors and actresses. When the AI recognises the specific actor or actress you’re looking for, it can screengrab a matching scene and turn it into a nice-looking poster. Yes, I’ll always recommend reviewing the generated results, or at least doing a sampling test. But it will definitely allow you to reduce the manual work by quite some margin.
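A hedged sketch of the timestamp-picking step: given a list of appearance spans for a target actor (as a face-recognition service might return them; the spans below are invented), pick the midpoint of the longest confident appearance as the screengrab moment.

```python
# Sketch: choose a thumbnail timestamp from an actor's appearance spans.
# Each span is (start_ms, end_ms, confidence); the data below is invented.
# Heuristic: midpoint of the longest span above a confidence threshold.

def thumbnail_timestamp(appearances, min_confidence=90.0):
    confident = [a for a in appearances if a[2] >= min_confidence]
    if not confident:
        return None  # no usable appearance found
    start, end, _ = max(confident, key=lambda a: a[1] - a[0])
    return (start + end) // 2

appearances = [
    (10_000, 12_000, 95.0),  # short appearance
    (40_000, 55_000, 97.5),  # longest confident span -> use its midpoint
    (70_000, 90_000, 80.0),  # long, but low confidence: skipped
]

print(thumbnail_timestamp(appearances))
# 47500
```

The resulting timestamp can then be handed to a frame extractor, e.g. `ffmpeg -ss`, to produce the actual image that a designer (or another model) turns into the poster.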


AI and Machine Learning technologies are here to help. That should, in my opinion, be the key takeaway from this blog. There are many aspects of current manual-labour workflows in the OTT space that, in one way or another, can be improved or even fully replaced by technology. Whether it’s generating metadata or thumbnails, or recognising parts of the video and audio for credits and subtitles: AI and ML are here to help make your OTT applications better. It would be a shame not to use this technology to save time and money, and improve your applications.