Scaling Beyond the CDN – Jake Holland, Principal Architect @ Akamai
How to scale beyond the CDN with 8K video, millions of simultaneous downloads and streams, local caches, and multicast. This episode is the last in a series of three in which we discuss scaling the internet.
The main links discussed in this episode are:
https://github.com/GrumpyOldTroll/multicast-ingest-platform
https://github.com/GrumpyOldTroll/wicg-multicast-receiver-api/blob/master/explainer.md
Other main things we referenced:
https://blog.apnic.net/2020/07/28/why-inter-domain-multicast-now-makes-sense/
https://tools.ietf.org/html/rfc6726 (FLUTE)
https://tools.ietf.org/html/rfc8777 (DRIAD)
https://datatracker.ietf.org/doc/draft-ietf-mboned-dorms/
https://datatracker.ietf.org/doc/draft-ietf-mboned-cbacc/
https://datatracker.ietf.org/doc/draft-ietf-mboned-ambi/
https://github.com/GrumpyOldTroll/chromium/tree/multicast_new

In what seems to be a regular feature for me, a few counter-arguments to the points made in this show:
1. It took nearly 20 years for roughly a third of all bits on an access network to move to IPv6, so I think it's incredibly unlikely that multicast will be supported there. Where multicast has seen usage is in networks operated by those bundling services: delivering the ISP's own content to the set-top box they provide. Expanding that (for video) to other devices is laudable, but fraught with issues as previously discussed (with respect to Wi-Fi etc.).
2. Just because 20 Mbit/s is recommended for 4K streams doesn't mean that's the average bitrate; that's the peak. Realistically, the average bitrate is lower than this, but even if we use the 20 Mbit/s figure, you can be assured that the vast majority of streaming clients are not subscribed to the highest-bitrate streams, and that HD (or even SD) streams make up the vast majority of viewing.
As for 100M concurrent viewers of the Super Bowl, that's almost certainly counting viewers as people, not devices, and the device count is probably much less than half of that figure… in a non-COVID year, anyway!
Regardless, when the Super Bowl is not on, you need the capacity to stream whatever viewers want to watch on demand; so even if on this one day per year you find most people are watching the same content, you still need the network capacity for all the other days of the year.
3. Streaming OS and game updates via multicast is a terrible idea outside of the enterprise setting. You have to rely on all of the users' devices being turned on (which is possible if you schedule the download for an overnight window in which the device wakes itself up, but then the traffic demands on the last-mile access network are considerably smaller anyway), and you can only send data at the rate of the lowest common denominator.
Taking the 150 GB update as a figure: if you send that multicast stream at 10 Mbit/s, you need roughly 33 hours of continuous streaming to deliver everything. If someone uses their internet connection during that time, you are almost certainly going to lose packets and therefore have to "fill in the gaps" later via a separate unicast catch-up channel. Whereas if you allow users to download via unicast from a local cache node, spread out over a day or so, a user with a 100 Mbit/s connection can download that same amount of data in a little over 3 hours. (The back-of-the-envelope arithmetic is sketched just after this list.)
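For reference, here's a quick back-of-the-envelope check on the figures in point 3, using only the numbers quoted above (this is just the arithmetic in runnable form, not a model of how either delivery method actually behaves):

```python
# Back-of-the-envelope check of the download-time figures quoted above.
# All inputs are the numbers from the text, not measurements.

UPDATE_SIZE_GB = 150       # size of the update discussed above
MULTICAST_RATE_MBPS = 10   # multicast stream rate assumed above
UNICAST_RATE_MBPS = 100    # home connection speed assumed above

update_size_mbit = UPDATE_SIZE_GB * 8 * 1000   # GB -> Gbit -> Mbit (decimal units)

multicast_hours = update_size_mbit / MULTICAST_RATE_MBPS / 3600
unicast_hours = update_size_mbit / UNICAST_RATE_MBPS / 3600

print(f"{UPDATE_SIZE_GB} GB at {MULTICAST_RATE_MBPS} Mbit/s: {multicast_hours:.1f} hours")
print(f"{UPDATE_SIZE_GB} GB at {UNICAST_RATE_MBPS} Mbit/s: {unicast_hours:.1f} hours")
# -> about 33.3 hours and 3.3 hours respectively (roughly 7% more if the
#    150 GB is really 150 GiB)
```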
Multicast is a great tool, for sure, but almost all of the use cases shared here are solutions looking for a problem. I don't for a moment doubt that there are good cases for multicast, but on a sliding scale of effort versus benefit, unicast almost always wins out, especially considering that you need a well-built unicast network to start with anyway.
Responses:
1. Most access networks already have hardware support for multicast that requires only a config change to enable. This is one of the key reasons the idea is viable. (That said, "most" means you can't just take it for granted if you haven't checked your gear or already run TV service over it.)
2. The days with big video events actually see a difference in the amount of traffic, not just in which specific things people are watching. It's this change in traffic volume that causes the network disruptions. Yes, for day-to-day non-peak usage you'll still need to grow capacity over time, but getting slammed by the difference-in-kind demand on those big days can stop being a recurring issue.
3. The premise of this counter-argument rests on incorrect assumptions about how the transport protocols for file transfer operate.
This is not a theoretical claim: we have a demo downloader running that starts on demand, has receiver-driven bandwidth flexibility, is robust to loss, and doesn't require a longer download than unicast delivery. (A toy sketch of the loss-robustness idea appears at the end of these notes.)
While it’s possible to get better network efficiency if you synchronize the scheduling (which might be useful for OS delivery and maybe sometimes for game delivery), even without scheduling there’s so much concurrency on a big release day that multicast can be leveraged very effectively for this traffic if you send it right.
I'll also disagree that anything about this is a solution looking for a problem. This is an operations problem for peak days that's happening frequently today. And it's incredibly expensive to address by digging up streets and laying new wires to cover these peak demand loads well, so it's just going unaddressed. The mismatch between capacity and demand is causing disruptions for millions of end users several times per year, and those disruptions are universally driven by pushing the same bits to more people than usual on the same day.
Those who are willing to actually look at the numbers and experiment quickly discover that the problem is both real and highly tractable via multicast. Yes, there's still work to do before it's in production, but I would urge anyone who encounters hand-wavy counterarguments not to dismiss it out of hand without taking a good hard look first, especially if they're considering upgrading hardware.
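To make the loss-robustness point in response 3 a bit more concrete, here's a toy model of the FEC approach that FLUTE-style transports (RFC 6726) use for file delivery. To be clear, this is only an illustration of the general idea, not the demo downloader mentioned above, and all of the numbers (symbol counts, repair overhead, loss rate) are made up for the example: the file is split into source blocks of K symbols, the sender transmits K + R symbols per block (R of them repair symbols), and an idealized fountain-style code lets the receiver rebuild a block from any K of the symbols it happens to receive.

```python
import random

# Toy model of loss-robust multicast file delivery in the spirit of FLUTE + FEC.
# Idealized fountain-style code: a source block of K symbols can be rebuilt from
# ANY K of the K + R symbols sent for it. All numbers here are illustrative only.

K = 1000          # source symbols per source block
R = 50            # repair symbols per source block (5% send overhead)
NUM_BLOCKS = 150  # source blocks making up the file
LOSS_RATE = 0.02  # random packet loss seen by this receiver

def blocks_decodable_after_one_pass(seed=0):
    rng = random.Random(seed)
    decodable = 0
    for _ in range(NUM_BLOCKS):
        # count how many of this block's K + R symbols survive the loss
        received = sum(1 for _ in range(K + R) if rng.random() > LOSS_RATE)
        if received >= K:          # enough symbols to decode the whole block
            decodable += 1
    return decodable

decodable = blocks_decodable_after_one_pass()
print(f"{decodable}/{NUM_BLOCKS} source blocks decodable after a single pass")
print(f"send overhead vs. a loss-free transfer: {R / K:.0%}")
# With 2% random loss and 5% repair overhead, essentially every block decodes
# on the first pass, so delivery takes about file_size / send_rate * (1 + R/K)
# with no separate unicast "gap filling" pass.
```

Real codes and real loss patterns are messier than this, of course, and carousel-style sending is what lets receivers join whenever they like rather than at a scheduled start, but this is the rough mechanism by which a multicast file transfer avoids the "lose a packet, fill in the gaps over unicast later" failure mode described in counter-argument 3.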