How CDN Architecture Is Evolving to Deliver Sub-3-Second Live Streaming

The gap between what audiences expect from live streaming and what traditional CDN infrastructure can deliver has been narrowing steadily. But closing the final stretch — from five seconds of latency down to two — turns out to be a fundamentally different engineering problem than the one the industry solved a decade ago.

Achieving sub-3-second glass-to-glass latency over standard HTTP delivery requires rethinking how content distribution networks handle file caching, request volume, and the relationship between origin servers and edge nodes. The protocols designed to make this possible, LL-HLS and LL-DASH, introduce architectural demands that break many of the assumptions CDNs were originally built around.

Why Traditional CDN Caching Falls Short

Conventional CDN architecture is optimised for large, complete files. A video segment gets encoded, written to disk, cached at the edge, and served to viewers. The model works well when segments are several seconds long and requests arrive at a predictable pace.

Low-latency streaming inverts this model. Instead of waiting for complete segments, LL-DASH delivers content through chunked transfer encoding — bytes are pushed to viewers as they are generated, before the segment file is fully written. LL-HLS takes a different approach, splitting segments into tiny “parts” that are served individually, with the manifest file itself blocking until the next part becomes available.

Both approaches dramatically increase the number of requests hitting CDN edge servers. A single viewer watching a standard HLS stream might generate one request every six seconds. That same viewer on LL-HLS can generate multiple manifest and part-file requests per second. Multiply that across thousands of concurrent viewers and the request volume becomes an infrastructure problem rather than just a bandwidth problem.
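The scale of that difference is easy to see with back-of-envelope arithmetic. The sketch below is illustrative only — the part count, segment duration, and viewer count are assumptions, not measurements from any particular deployment:

```python
# Rough request-volume comparison between standard HLS and LL-HLS.
# All parameters here are illustrative assumptions, not measured values.

def requests_per_second(segment_duration_s: float,
                        parts_per_segment: int = 1,
                        manifest_per_part: bool = False) -> float:
    """Approximate per-viewer request rate for an HLS-style stream."""
    media_requests = parts_per_segment / segment_duration_s
    # LL-HLS pairs each new part with a (blocking) playlist reload;
    # standard HLS reloads the playlist roughly once per segment.
    manifest_requests = media_requests if manifest_per_part else 1 / segment_duration_s
    return media_requests + manifest_requests

standard = requests_per_second(segment_duration_s=6.0)
low_latency = requests_per_second(segment_duration_s=6.0,
                                  parts_per_segment=18,  # ~333 ms parts
                                  manifest_per_part=True)

viewers = 10_000
print(f"standard HLS: {standard * viewers:,.0f} req/s at {viewers:,} viewers")
print(f"LL-HLS:       {low_latency * viewers:,.0f} req/s at {viewers:,} viewers")
```

With these assumptions, the same audience generates roughly eighteen times the request rate — the bandwidth is similar, but the request count is not.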

The Protocol Split: LL-DASH vs LL-HLS

The two dominant low-latency protocols take meaningfully different approaches to the same problem, and supporting both simultaneously within a single delivery pipeline is one of the harder engineering challenges in modern streaming infrastructure.

LL-DASH uses a single manifest and relies on timing-based chunk requests. The player begins downloading segments while they are still being generated on the origin server. Playback can start from any key-frame, which makes key-frame placement strategy a direct lever for controlling latency. More frequent key-frames mean lower latency, but at the cost of encoding efficiency and potentially higher bandwidth consumption.

LL-HLS, developed by Apple, takes a manifest-blocking approach. The server holds the playlist request open until the next part becomes available, then delivers an updated manifest pointing to the new content. Safari handles this natively, but third-party players like hls.js have had to implement their own handling of the blocking reload mechanism and byte-range requests that LL-HLS relies on to reduce request overhead.
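The blocking mechanism is easiest to see as a producer/consumer handoff. The sketch below models the server side of a blocking playlist reload — the client requests a specific future part via the spec's `_HLS_msn` and `_HLS_part` query parameters, and the server holds the request until the packager publishes it. The class and method names (`PartStore`, `wait_for`) are illustrative, not from any real server:

```python
# Minimal sketch of LL-HLS blocking playlist reload, assuming an in-memory
# record of the newest published part. Names here are illustrative.
import threading

class PartStore:
    """Tracks the newest published (media sequence number, part index)."""
    def __init__(self):
        self._cond = threading.Condition()
        self.latest = (0, -1)          # (msn, part) last published

    def publish(self, msn: int, part: int) -> None:
        with self._cond:
            self.latest = (msn, part)
            self._cond.notify_all()    # wake any blocked playlist requests

    def wait_for(self, msn: int, part: int, timeout: float = 3.0) -> bool:
        """Block until the requested part exists, as a server does when the
        client sends _HLS_msn / _HLS_part query parameters."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self.latest >= (msn, part), timeout=timeout)

store = PartStore()
# A packager thread publishes part 0 of segment 42 shortly after the request.
threading.Timer(0.1, store.publish, args=(42, 0)).start()
# The playlist request for (_HLS_msn=42, _HLS_part=0) blocks until it exists.
ready = store.wait_for(42, 0)
print("part ready:", ready)
```

In a real origin, the woken request would then render and return the updated playlist; the point of the sketch is that the long-held connection is the mechanism, not a fault.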

The practical result is that a CDN serving both protocols simultaneously needs two different data transfer strategies running in parallel — accelerated downloading of small discrete files for LL-HLS, and continuous chunked delivery of incomplete files for LL-DASH.

What This Means for Edge Infrastructure

The shift to sub-3-second delivery has downstream effects on CDN edge architecture that go beyond simply handling more requests.

Connection duration changes fundamentally. In traditional CDN operation, a connection opens, delivers a cached file, and closes within milliseconds. With chunked transfer for LL-DASH, connections stay open for the duration of a segment — potentially several seconds — while bytes trickle through. This means edge servers carry higher concurrent connection counts and sustained CPU load even when aggregate bandwidth remains similar.
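The long-lived connection follows directly from HTTP/1.1 chunked framing: each chunk is flushed as the encoder produces it, and the response only terminates with a zero-length chunk once the segment is complete. A minimal sketch of the wire format (the chunk contents are placeholder CMAF-style labels, not real media):

```python
# Sketch of HTTP/1.1 chunked transfer framing for a segment still being
# encoded: each chunk is sent as soon as it exists, so the connection stays
# open for the full segment duration rather than milliseconds.

def chunked_frames(chunks):
    """Wrap raw byte chunks in HTTP/1.1 chunked-encoding framing."""
    for chunk in chunks:
        # Each frame: hex payload length, CRLF, payload, CRLF.
        yield f"{len(chunk):x}\r\n".encode() + chunk + b"\r\n"
    yield b"0\r\n\r\n"   # zero-length chunk terminates the response

encoder_output = [b"moof...", b"mdat...", b"mdat..."]  # placeholder chunks
wire = b"".join(chunked_frames(encoder_output))
print(wire)
```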

Caching logic needs to be rebuilt. Standard CDN caching mechanisms were not designed for files that do not yet exist in their final form. Serving a partially-written segment from cache while simultaneously appending new bytes from the origin requires purpose-built modules — off-the-shelf mechanisms like Nginx's proxy cache cannot handle it out of the box.
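One way to picture the required behavior is a cache entry that supports one writer and many concurrent readers, where a reader can join mid-segment and then block for bytes that have not arrived yet. This is a simplified sketch under those assumptions, not an implementation of any real CDN module:

```python
# Illustrative sketch of a cache entry that can be read while still being
# written, assuming a single origin writer and many edge readers.
import threading

class GrowingCacheEntry:
    def __init__(self):
        self._buf = bytearray()
        self._done = False
        self._cond = threading.Condition()

    def append(self, data: bytes, final: bool = False) -> None:
        """Origin side: append newly received bytes, wake waiting readers."""
        with self._cond:
            self._buf += data
            self._done = self._done or final
            self._cond.notify_all()

    def read_from(self, offset: int):
        """Edge side: return bytes past `offset`, blocking until more data
        arrives. Returns None once the segment is complete and consumed."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._buf) > offset or self._done)
            if offset >= len(self._buf):
                return None
            return bytes(self._buf[offset:])

entry = GrowingCacheEntry()
entry.append(b"chunk1-")
first = entry.read_from(0)          # a viewer joins mid-segment
entry.append(b"chunk2", final=True)
rest = entry.read_from(len(first))  # picks up the remainder
print(first, rest)
```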

Monitoring changes as well. When response time is measured in segment duration rather than file download speed, traditional performance dashboards become misleading. A 500ms response time on an LL-HLS manifest request is not a performance problem — it is the protocol working as designed, holding the connection until the next part is ready. Operations teams need monitoring that understands the difference.
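In practice that means alerting rules have to be conditioned on request type. The sketch below shows the idea; the thresholds and part duration are illustrative assumptions, and real systems would derive them from the stream's advertised playlist values:

```python
# Sketch of protocol-aware latency classification: a long hold on a blocking
# LL-HLS playlist request is normal, while the same duration on a cached
# part-file request signals a real problem. Thresholds are illustrative.

PART_DURATION_S = 0.333   # advertised part duration (assumption)

def classify(request_kind: str, response_time_s: float) -> str:
    if request_kind == "blocking_playlist":
        # The server may legitimately hold the request for up to roughly one
        # part duration (plus delivery overhead) while awaiting the next part.
        return "ok" if response_time_s <= PART_DURATION_S * 2 else "slow"
    # Part and segment files are cached objects: they should return quickly.
    return "ok" if response_time_s <= 0.05 else "slow"

print(classify("blocking_playlist", 0.5))  # normal protocol behavior
print(classify("part_file", 0.5))          # genuine delivery problem
```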

Key-Frame Placement as a Latency Control

One of the less obvious but most impactful variables in low-latency streaming is key-frame frequency. In both LL-DASH and LL-HLS, playback can only begin from a key-frame. If key-frames are spaced two seconds apart, that sets a floor on achievable latency regardless of how fast the CDN delivers content.

Increasing key-frame frequency reduces latency but increases the bitrate required for the same visual quality, since key-frames are significantly larger than predicted frames. This creates a direct trade-off between latency and bandwidth efficiency that must be calibrated based on the specific use case — a sports betting application has different latency requirements than a concert livestream, and the key-frame strategy should reflect that.
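The trade-off can be made concrete with a simple cost model. The numbers below are assumptions for illustration — real key-frame-to-predicted-frame size ratios depend heavily on codec, content, and encoder settings:

```python
# Back-of-envelope model of the key-frame trade-off: spacing sets the latency
# floor, while frequency drives bitrate overhead. All numbers are
# illustrative assumptions, not encoder measurements.

def latency_floor_s(keyframe_interval_s: float) -> float:
    # A player can only start at a key-frame, so in the worst case it waits
    # a full interval before the first decodable frame.
    return keyframe_interval_s

def bitrate_overhead(keyframe_interval_s: float, fps: float = 30.0,
                     keyframe_ratio: float = 8.0) -> float:
    """Relative bitrate vs. a 2-second key-frame interval, assuming a
    key-frame costs `keyframe_ratio` times a predicted frame."""
    def avg_frame_cost(interval):
        frames = interval * fps
        return (keyframe_ratio + (frames - 1)) / frames
    return avg_frame_cost(keyframe_interval_s) / avg_frame_cost(2.0)

for interval in (2.0, 1.0, 0.5):
    print(f"{interval:>4} s interval: latency floor {latency_floor_s(interval):.1f} s, "
          f"relative bitrate {bitrate_overhead(interval):.2f}x")
```

Under these assumptions, halving the interval from two seconds to one buys a one-second-lower latency floor for roughly a 10% bitrate penalty, while going to half-second intervals costs around 30% — the kind of calibration the use case has to justify.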

Where This Is Heading

The industry consensus is moving toward sub-2-second HTTP delivery as the next target. Achieving this consistently at scale will likely require further innovation in edge processing, more aggressive use of RAM-based caching for micro-segments, and potentially new approaches to manifest generation that reduce the round-trip overhead inherent in the current request-response model.

For broadcast and streaming organisations evaluating their infrastructure roadmap, the immediate takeaway is that low-latency delivery is no longer an optional premium feature. It is becoming the baseline expectation for any live content where audience engagement depends on temporal proximity to the event — sports, news, gaming, interactive entertainment, and increasingly, enterprise applications like remote production and live commerce.

The CDN is no longer just a delivery layer. It is becoming an active participant in the streaming pipeline, and the architectural decisions made at the edge now directly determine the quality of experience at the glass.
