The high level on low latency streaming

Matthijs Langendijk
6 min read · Jun 21, 2024


With the European Football Championships in full force, nobody likes hearing their neighbours erupt in a massive celebration a few seconds before seeing the goal themselves. The good thing is that we app and streaming developers have some influence over that! In this blog I'm taking a look at low latency streaming, and at some of the techniques that come into play to get goals to your screen as soon as possible.

Measuring latency

Sure, knowing that your neighbours see the goal earlier than you do is a good example of the effect latency has. But is there actually a way to measure it effectively? Often called 'glass to glass' latency (from the glass of the camera lens making the recording, to the glass of the TV displaying the video), it can be difficult to measure in full.

You will likely know that a lot happens between the camera and the TV. In fact, there are often multiple cameras recording at the same time, and someone behind the buttons decides which signal eventually gets sent. That signal then has to be encoded and possibly transcoded, DRM has to be applied, and the result finally transferred to the TV, which does the decoding. All of that might take place in different locations, different datacentres and possibly even different countries. You can imagine that with all these steps, the latency adds up quickly.

So what is the easiest way to actually measure glass to glass latency? Surprisingly, rather than some fancy tool instrumented into every step in between, the most effective method is still to record a running timer with the camera. Then watch that video, after it has gone through the entire processing workflow, on a TV; subtract the number shown on the TV from the number on the timer, and there you have it: the glass to glass latency. Was that about what you expected? ;)
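The subtraction itself is trivial, but for completeness here is a small sketch, assuming you read the on-screen timer as 'MM:SS.mmm' strings:

```python
def parse_timer(reading: str) -> float:
    """Convert an 'MM:SS.mmm' timer reading to seconds."""
    minutes, seconds = reading.split(":")
    return int(minutes) * 60 + float(seconds)

def glass_to_glass(camera_timer: str, tv_timer: str) -> float:
    """Latency = what the camera's timer shows right now,
    minus what the TV is currently showing of that same timer."""
    return parse_timer(camera_timer) - parse_timer(tv_timer)

# The camera timer reads 01:30.000 while the TV still shows 01:12.500
print(glass_to_glass("01:30.000", "01:12.500"))  # 17.5 seconds
```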

Different types of latency

Now that you know how to measure latency, you can also classify which category your use case falls into. Because, yes, latency is split into a few different categories. Given that we're all into video, a picture is always worth a thousand words, so the image below should give you a good idea of what type of latency your case belongs under.

In the image you can see the type of latency, along with some additional information. At the top it shows the timing. A latency anywhere between 60 and 20 seconds is considered 'just normal' (as is anything above 60, to be fair). If you're anywhere between 20 and 5 seconds, you're already doing great, faster than most of the industry, with reduced or optimised latency. Getting into the lower territories is when we really start talking about low latency and even ultra low latency. Especially for esports, gaming, and sometimes sports, low latency is that sweet spot where you can use normal low latency technologies while still getting most of the latency benefits. Finally, at the very lowest end we're talking sub-second latency, which really is for specific use cases like web conferencing or even betting.
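The buckets from the image can be captured in a few lines of code (the bucket names here are my own shorthand, not official terminology):

```python
def classify_latency(seconds: float) -> str:
    """Rough latency buckets as commonly drawn in the industry."""
    if seconds >= 20:
        return "normal"
    if seconds >= 5:
        return "reduced / optimised"
    if seconds >= 1:
        return "low / ultra low"
    return "sub-second (real-time)"

for s in (45, 8, 3, 0.4):
    print(f"{s}s -> {classify_latency(s)}")
```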

Selecting the right tech

What the image also showcases is the types of technologies you might be able to use. Whereas for normal and reduced latency you would stick with regular HLS/Dash, potentially with certain optimisations, for low latency and especially ultra low latency we're talking a whole different ball game.

LL-Dash/LL-HLS (CMAF)

As you might have guessed, the LL stands for low latency. Achieving low latency is all about generating smaller segments, so the player can start playing parts of the video earlier. Imagine your typical segment is 6 seconds long. That already means the player needs to wait those 6 seconds before the segment is even available. Add to that the small buffer you want in your player for a seamless playback experience, say 3, maybe 4 segments, and before you know it you have a latency of 18 to 24 seconds.

Now, let's do that same calculation with a segment size of 2 seconds: 2 seconds times a buffer of 3 or 4 segments equals 6 or 8 seconds of latency. That's a massive improvement! And all of that just by decreasing the segment size. That, in essence, is what LL-HLS and LL-Dash are for. But, you might ask, we're not at the 'low latency' stage yet. Correct you are.
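The back-of-the-envelope maths above boils down to one multiplication:

```python
def startup_latency(segment_seconds: float, buffered_segments: int) -> float:
    """Rough live latency: the player buffers a few whole
    segments before it starts rendering."""
    return segment_seconds * buffered_segments

# Classic 6 second segments, 3-4 segment buffer
print(startup_latency(6, 3), startup_latency(6, 4))  # 18 24

# Shrink the segments to 2 seconds, same buffer depth
print(startup_latency(2, 3), startup_latency(2, 4))  # 6 8
```

Real players add network and decode overhead on top, so treat these as lower bounds rather than exact figures.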

This is where CMAF (Common Media Application Format) comes into play, specifically CMAF-CTE, or Chunked Transfer Encoding. This part of the spec explains how segments can be cut into even smaller pieces, called chunks. The same principle as before applies: now that we can retrieve even smaller parts of the video, we can shrink the buffer even further, using the exact same calculation.
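Running the same arithmetic with chunks instead of full segments shows why this matters, assuming 500 ms chunks as in the example below:

```python
def chunked_latency(chunk_seconds: float, buffered_chunks: int) -> float:
    """Same multiplication as before, but the buffer is now
    counted in CMAF chunks rather than whole segments."""
    return chunk_seconds * buffered_chunks

# 500 ms chunks with a 3-4 chunk buffer: 1.5 to 2 seconds of latency
print(chunked_latency(0.5, 3), chunked_latency(0.5, 4))
```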

Again, a picture is worth a thousand words, so here is an example from the friendly folks over at THEOplayer, demonstrating this exact principle by comparing regular 2 second Dash segments with CMAF segments split into 500ms chunks:

Source: https://www.theoplayer.com/blog/low-latency-dash

WebRTC (Web Real-Time Communications)

Getting into the real ultra low latency, where you can basically see any recording appear on your screen almost immediately: two applications of this are found in WebRTC and HESP.

The former, WebRTC, is a completely different protocol from typical video streaming. WebRTC creates a peer to peer connection between two parties, in the case of streaming usually a client and a (streaming) server. For this it uses UDP rather than the TCP that most streaming applications are built on. That makes the communication extremely fast, but also a bit unreliable, which is the nature of UDP. By making use of that direct communication over UDP, it allows you to reach sub-second latencies.
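The fire-and-forget nature of UDP is easy to see with plain sockets. This is a toy localhost sketch, not actual WebRTC (which adds ICE, DTLS and RTP on top), but it shows the key property: datagrams are sent immediately, with no handshake and no acknowledgement:

```python
import socket

# A toy UDP exchange on localhost: no handshake, no delivery guarantee.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))           # let the OS pick a free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"video-frame-0001", addr)  # fired off instantly, no ACK expected

data, _ = receiver.recvfrom(1024)
print(data)
receiver.close()
sender.close()
```

On a real network that datagram may simply be lost, which is exactly the trade-off WebRTC accepts in exchange for speed.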

Important to note, though, is that because of the direct peer to peer connection, you can't replicate streams over a CDN as you normally would. You'll need a good number of servers if you want to run WebRTC at scale. You can of course also use a provider to manage that for you. I personally have had a great experience with the folks over at Dolby Millicast, with easy SDKs and more capacity for concurrency than I could ever need.

HESP (High Efficiency Stream Protocol)

On the other hand there is the HESP protocol, coming from the HESP Alliance. Unlike WebRTC, which applies a completely different model, HESP is actually CMAF-compliant. The effect of this is simple: you can achieve the same sub-second latency (albeit WebRTC can go a bit lower than HESP) while still leveraging a CDN to manage the amount of traffic, as it's an HTTP-based protocol.

HESP behaves in a slightly different way than Dash or HLS. Most importantly, it uses two different types of streams to manage playback. It starts off with an initialisation stream. This stream only contains key frames and is, as the name suggests, only retrieved on start-up of the video. Then the second stream kicks in: the continuation stream, a typical CMAF-CTE-type stream with very small chunks. In fact, chunks can be as small as a single frame, which is the biggest difference compared to other typical streaming protocols. If you're interested in finding out more about how HESP works, I'd recommend reading their technical whitepaper.
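Conceptually, start-up looks something like the sketch below. This is a toy model of the two-stream idea, not the actual HESP wire format; all names are my own:

```python
# Toy model of HESP start-up: grab the newest key frame from the
# initialisation stream, then follow the live edge frame by frame
# via the continuation stream's tiny chunks.
init_stream = ["key@0s", "key@2s", "key@4s"]                  # key frames only
continuation = ["frame@4.04s", "frame@4.08s", "frame@4.12s"]  # ~1 frame per chunk

def join_stream(init, cont):
    playback = [init[-1]]   # start instantly from the latest key frame
    playback.extend(cont)   # then keep appending single-frame chunks
    return playback

print(join_stream(init_stream, continuation))
```

Because the player can always start from the most recent key frame rather than waiting for the next segment boundary, start-up time and latency both stay very low.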

The downside of HESP is that it requires a specific player, as it's a relatively new protocol that isn't yet widespread. So if you do want to do playback with it, you'll have to be very careful picking the player you use.

High level low latency

It's about time to get started watching the next match of the European Football Championship; the Dutchies are playing today, so I'd better stop writing and check my own low latency setup. I do hope this high level overview of low latency has given you some pointers: not only on how to recognise and quantify the latency you achieve, but also on how you can potentially make some improvements, some taking a lot of effort, and some maybe a little less.

As always, if you're in need of more information or just need help getting things going (for example with those pesky encoders that I haven't even touched upon in this blog!) you know where to find me! Have a good one, and enjoy the football!
