Understanding Video Streaming & AWS Elemental

A visual, beginner-friendly guide to media workflows — from raw video to your screen. Step through each concept at your own pace.

📹 Capture

⚙️ Encode

📦 Package

🌐 Deliver

📺 Play

The Basics: What Is Video?

Before diving into AWS services, let's understand what video actually is at a technical level.

🖼️

Frames

Video is just a series of images (frames) shown rapidly. 30fps = 30 images per second. Your eyes perceive this as motion.

Common frame rates: 24fps (cinema), 30fps (TV/web), 60fps (sports/gaming). Higher = smoother but more data.

📐

Resolution

The number of pixels in each frame. More pixels = sharper image but bigger file.

SD (480p) = 720×480 pixels. HD (1080p) = 1920×1080. 4K = 3840×2160 — that's 4x the pixels of 1080p!

480p: 345,600 px
720p: 921,600 px
1080p: 2,073,600 px
4K: 8,294,400 px

⚡

Bitrate

How much data flows per second. Higher bitrate = better quality but needs more bandwidth. Measured in Mbps (megabits per second).

Think of it like a water pipe — wider pipe (higher bitrate) carries more detail. A 1080p stream typically needs 4-8 Mbps. 4K needs 15-25 Mbps.

2 Mbps

8 Mbps

25 Mbps
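
To make the pipe analogy concrete, here is a minimal sketch that converts a constant bitrate into data transferred per hour. It assumes decimal units (1 GB = 1000 MB) and a perfectly constant bitrate, which real streams only approximate; the function name is illustrative.

```python
def gigabytes_per_hour(bitrate_mbps: float) -> float:
    """Data transferred in one hour at a constant bitrate, in decimal GB."""
    megabits = bitrate_mbps * 3600   # 3600 seconds in an hour
    megabytes = megabits / 8         # 8 bits per byte
    return megabytes / 1000          # decimal gigabytes


for rate in (2, 8, 25):
    print(f"{rate} Mbps -> {gigabytes_per_hour(rate):.2f} GB/hour")
```

At 8 Mbps a viewer pulls about 3.6 GB per hour, which is why bitrate drives both viewer data usage and CDN cost.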

📦

Container vs Codec

A container (.mp4, .ts, .mkv) is the box that holds everything together. A codec (H.264, H.265) is how the video and audio inside are compressed.

One container can hold multiple streams: video, audio (multiple languages), subtitles, and metadata — all in one file.

.MP4 Container

H.264 Video

AAC Audio

Subtitles

🎯

Key takeaway: Raw video = frames per second × pixels per frame × bits per pixel = a LOT of data. The entire media pipeline exists to compress, package, and efficiently deliver this data to viewers.

Codecs: Squeezing Video Down

Raw video is massive — a single minute of uncompressed 1080p is about 10 GB. Codecs compress it by 100-1000x so it can travel over the internet.
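
The "about 10 GB" figure can be checked with simple arithmetic. This sketch assumes 8-bit RGB (3 bytes per pixel) at 30 fps, and compares it against a hypothetical 8 Mbps encode; other pixel formats such as 4:2:0 would give smaller raw sizes.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Size of uncompressed video, assuming a fixed number of bytes per pixel."""
    return width * height * bytes_per_pixel * fps * seconds


def encoded_bytes(bitrate_bps, seconds):
    """Size of an encoded stream at a constant bitrate."""
    return bitrate_bps * seconds // 8


raw = raw_video_bytes(1920, 1080, fps=30, seconds=60)
compressed = encoded_bytes(8_000_000, seconds=60)  # assumed 8 Mbps 1080p encode
ratio = raw / compressed

print(f"raw: {raw / 1e9:.1f} GB, encoded: {compressed / 1e6:.0f} MB, ratio: {ratio:.0f}x")
```

Under these assumptions one minute is roughly 11.2 GB raw versus 60 MB encoded, a ratio inside the 100-1000x range quoted above.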

🧠

How codecs work (simplified): Instead of storing every pixel of every frame, codecs store only what changes between frames. A talking head on a static background? The codec only sends the mouth movements, not the entire background repeatedly. This is called inter-frame compression.

🔑

I-Frames, P-Frames, B-Frames

I-frame (Keyframe): A complete picture. The "reset point." Larger but self-contained.

P-frame: Stores only changes from the previous frame. Much smaller.

B-frame: References both past AND future frames. Smallest but most complex.

I (full image) → P (diff from I) → B (bi-directional) → P (diff from B) → I (full image)

⏱️

GOP (Group of Pictures)

A GOP is the sequence from one I-frame to the next. Typical GOP = 2-4 seconds of video.

Shorter GOP = more I-frames = bigger file but easier to seek/cut. Longer GOP = better compression but harder to edit.

In live streaming, GOP size directly affects latency — the player must wait for the next I-frame to start displaying.
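
A small sketch of the GOP arithmetic: how many frames one GOP contains, and whether a segment duration holds a whole number of GOPs (so every segment can start on an I-frame). The function names are illustrative, not part of any AWS API.

```python
from fractions import Fraction


def gop_frames(fps: int, gop_seconds: float) -> int:
    """Number of frames from one I-frame to the next."""
    return round(fps * gop_seconds)


def segment_is_gop_aligned(segment_seconds, gop_seconds) -> bool:
    """True when a segment contains a whole number of GOPs."""
    # Fractions avoid floating-point surprises with values like 0.5.
    ratio = Fraction(str(segment_seconds)) / Fraction(str(gop_seconds))
    return ratio.denominator == 1


print(gop_frames(30, 2))             # frames per 2-second GOP at 30 fps
print(segment_is_gop_aligned(6, 2))  # 6 s segments, 2 s GOP
print(segment_is_gop_aligned(6, 4))  # 6 s segments, 4 s GOP
```

With a 2-second GOP at 30 fps there is an I-frame every 60 frames, and a player joining mid-GOP may wait up to 2 seconds for the next one.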

Legacy

H.264 / AVC

The workhorse. Supported on literally everything — phones, browsers, TVs, toasters. Good compression but not the best by today's standards.


Current

H.265 / HEVC

Roughly 50% lower bitrate than H.264 at the same quality, depending on content and encoder settings. The go-to for 4K content. Licensing is complicated (multiple patent pools).


Future

AV1

Open-source, royalty-free (backed by Google, Netflix, etc). Even better compression. Growing support, very high CPU cost to encode.


💡

Why does this matter for AWS? When you set up MediaConvert or MediaLive, you choose a codec. Better codec = smaller files = lower CDN/storage costs. But encoding H.265/AV1 costs more compute. It's a trade-off between encoding cost and delivery cost.

Streaming: Getting Video to Viewers

You can't just send one giant file. Modern streaming breaks video into chunks and adapts quality in real-time.

🔀

Renditions & the ABR Ladder

The same video is encoded at multiple quality levels — each one is called a rendition. The full set is your ABR ladder (Adaptive Bitrate ladder).

A viewer on fast wifi gets 1080p; on a congested cellular connection they get 480p. The player switches dynamically.

4K (2160p): 15-20 Mbps • H.265
1080p: 6-8 Mbps • H.264
720p: 3-4 Mbps • H.264
480p: 1.5-2 Mbps • H.264
360p: 0.5-0.8 Mbps • H.264

📊

ABR — Adaptive Bitrate Streaming

The player continuously measures your available bandwidth and buffer level, then switches between renditions seamlessly. This is why Netflix quality fluctuates when your connection dips.

How switching works: Each rendition is split into aligned segments (same duration, same keyframe boundaries). The player can jump between renditions at any segment boundary without glitches.

[Chart: available bandwidth vs. selected quality]
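
The switching decision can be sketched as "pick the highest rendition that fits inside the measured bandwidth, with some headroom." The ladder values and the 0.8 safety factor below are assumptions for illustration; real players also weigh buffer level and recent throughput history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bandwidth_bps: int


# Hypothetical ladder, loosely based on the ranges listed above.
LADDER = [
    Rendition("360p", 800_000),
    Rendition("480p", 2_000_000),
    Rendition("720p", 4_000_000),
    Rendition("1080p", 8_000_000),
]


def pick_rendition(ladder, measured_bps, safety=0.8):
    """Highest rendition whose bitrate fits within a fraction of measured bandwidth."""
    budget = measured_bps * safety
    affordable = [r for r in ladder if r.bandwidth_bps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest rung rather than stopping playback.
        return min(ladder, key=lambda r: r.bandwidth_bps)
    return max(affordable, key=lambda r: r.bandwidth_bps)


print(pick_rendition(LADDER, 12_000_000).name)
print(pick_rendition(LADDER, 3_000_000).name)
print(pick_rendition(LADDER, 500_000).name)
```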

📋

HLS (HTTP Live Streaming)

Apple's protocol, the most widely used. Splits video into small .ts or .m4s segments (2-6 sec each) with a .m3u8 playlist that tells the player what's available.

Uses a two-level manifest structure: master playlist (lists renditions) → child playlists (list segments for each rendition).

.m3u8 Master Playlist

→ 1080p.m3u8

→ 720p.m3u8

→ 480p.m3u8

🎬

DASH (Dynamic Adaptive Streaming over HTTP)

Open standard (MPEG). Uses .mpd manifests (XML) and .m4s segments. Functionally similar to HLS but more flexible and used heavily on Android/smart TVs.

Single manifest file describes all AdaptationSets (video, audio) and Representations (renditions) in XML.

.mpd Manifest (XML)

seg_$Number$.m4s

🔗

CMAF (Common Media Application Format)

A unified format that works with BOTH HLS and DASH. Uses fragmented MP4 (.m4s) segments and can serve both protocols from the same encoded files.

Why it matters: Without CMAF, you'd package and store everything twice, once for HLS (.ts) and once for DASH (.m4s). CMAF = package once, serve both.

🔒

DRM (Digital Rights Management)

Encryption that prevents unauthorized copying. The video segments are encrypted; only authorized players with a license key can decrypt them.

Widevine (Google/Android/Chrome), FairPlay (Apple), PlayReady (Microsoft) — most services use all three to cover every device.

Manifests: The Playlist Files

Manifests are the "table of contents" that tell the video player everything it needs — what qualities exist, where the segments are, and when ads should play.

🗂️ Master Manifest (Multi-Variant Playlist)

The top-level file the player fetches first. It lists all available renditions with their bandwidth and resolution, so the player can pick the right one.

master.m3u8

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=854x480,CODECS="avc1.4d401e,mp4a.40.2"
480p/playlist.m3u8

#EXTM3U File header — identifies this as an HLS playlist

#EXT-X-STREAM-INF Describes a rendition: bandwidth, resolution, codecs used

BANDWIDTH Peak bitrate in bits/sec — player uses this to decide which quality to pick

CODECS RFC 6381 codec string — tells player if it can decode this stream
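
To show how a player might read this file, here is a minimal parsing sketch in Python. It handles only the tags in the example above and is not a full HLS parser; the `parse_master` name and the returned dictionary shape are illustrative choices.

```python
import re

MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=854x480,CODECS="avc1.4d401e,mp4a.40.2"
480p/playlist.m3u8
"""

# Attribute values are either quoted strings (which may contain commas) or bare tokens.
ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master(text):
    """Return one dict per rendition listed in a master playlist."""
    variants = []
    pending = None  # attributes of the most recent #EXT-X-STREAM-INF tag
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("#EXT-X-STREAM-INF:"):
            attr_text = line.split(":", 1)[1]
            pending = {key: value.strip('"') for key, value in ATTR.findall(attr_text)}
        elif line and not line.startswith("#") and pending is not None:
            # The first URI line after the tag belongs to that rendition.
            variants.append({
                "uri": line,
                "bandwidth": int(pending["BANDWIDTH"]),
                "resolution": pending.get("RESOLUTION"),
                "codecs": pending.get("CODECS"),
            })
            pending = None
    return variants


variants = parse_master(MASTER)
best = max(variants, key=lambda v: v["bandwidth"])
print(best["uri"], best["resolution"])
```

Note that the CODECS value contains a comma inside quotes, which is why a plain `split(",")` would break the attribute list.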

📄 Child Manifest (Media Playlist)

One per rendition. Lists the actual video segments in order with their duration. For live streams, this file updates continuously as new segments become available.

720p/playlist.m3u8

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:1
#EXTINF:6.006,
segment_001.ts
#EXTINF:6.006,
segment_002.ts
#EXTINF:5.839,
segment_003.ts
#EXT-X-ENDLIST

#EXT-X-TARGETDURATION Max segment duration in seconds — player uses this for buffering decisions

#EXT-X-MEDIA-SEQUENCE Sequence number of the first segment — critical for live streams to know position

#EXTINF Duration of the next segment in seconds

#EXT-X-ENDLIST Marks VOD (complete). Absent in live streams (playlist keeps growing)
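
The same idea applies to the child manifest. This sketch reads the example playlist above, collects each segment with its duration, and checks the durations against the target duration. It is a simplified reader that covers only the tags shown here.

```python
MEDIA = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:1
#EXTINF:6.006,
segment_001.ts
#EXTINF:6.006,
segment_002.ts
#EXTINF:5.839,
segment_003.ts
#EXT-X-ENDLIST
"""


def parse_media_playlist(text):
    """Collect segments, durations, and basic header values from a media playlist."""
    info = {"target_duration": None, "media_sequence": 0, "segments": [], "ended": False}
    next_duration = None  # duration announced by the most recent #EXTINF
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("#EXT-X-TARGETDURATION:"):
            info["target_duration"] = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            info["media_sequence"] = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            next_duration = float(line.split(":", 1)[1].split(",", 1)[0])
        elif line == "#EXT-X-ENDLIST":
            info["ended"] = True
        elif line and not line.startswith("#") and next_duration is not None:
            info["segments"].append((line, next_duration))
            next_duration = None
    return info


playlist = parse_media_playlist(MEDIA)
total = sum(duration for _, duration in playlist["segments"])
print(f"{len(playlist['segments'])} segments, {total:.3f} s, ended={playlist['ended']}")
```

Each duration, rounded to the nearest integer, stays at or below the target duration of 6, which is what the tag promises the player.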

🚨 SCTE-35: Ad Markers & Signaling

SCTE-35 is the industry standard for signaling ad breaks, program boundaries, and other events in video streams. These markers tell downstream systems "an ad break should go here."

In HLS manifests, SCTE-35 appears as special tags that MediaTailor (or any SSAI system) reads to know where to splice in ads.

Live manifest with SCTE-35 ad markers

#EXTINF:6.006,
segment_044.ts
# Ad break starts here
#EXT-X-CUE-OUT:30.0
#EXTINF:6.006,
segment_045.ts
#EXT-X-CUE-OUT-CONT:ElapsedTime=6.006,Duration=30.0
#EXTINF:6.006,
segment_046.ts
# ...more ad segments...
#EXT-X-CUE-IN
#EXTINF:6.006,
segment_050.ts

#EXT-X-CUE-OUT:30 Start of ad break — "splice out" for 30 seconds

#EXT-X-CUE-OUT-CONT Continuation — tells player we're still in the ad break, how much time has elapsed

#EXT-X-CUE-IN End of ad break — "splice back in" to main content
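
As a rough illustration of what an SSAI system does first, this sketch scans a playlist for cue-out and cue-in tags and reports which segments fall inside each break. It only understands the simple tag forms shown above (other `#EXT-X-CUE-OUT` variants exist), and the sample omits the elided middle segments.

```python
LIVE = """#EXTINF:6.006,
segment_044.ts
#EXT-X-CUE-OUT:30.0
#EXTINF:6.006,
segment_045.ts
#EXT-X-CUE-OUT-CONT:ElapsedTime=6.006,Duration=30.0
#EXTINF:6.006,
segment_046.ts
#EXT-X-CUE-IN
#EXTINF:6.006,
segment_050.ts
"""


def find_ad_breaks(text):
    """Return a list of ad breaks with their planned duration and segment URIs."""
    breaks = []
    current = None  # the break we are currently inside, if any
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("#EXT-X-CUE-OUT-CONT"):
            continue  # continuation marker: still inside the same break
        if line.startswith("#EXT-X-CUE-OUT"):
            _, _, value = line.partition(":")
            try:
                planned = float(value)
            except ValueError:
                planned = None  # duration missing or in a form this sketch ignores
            current = {"planned_duration": planned, "segments": []}
        elif line.startswith("#EXT-X-CUE-IN"):
            if current is not None:
                breaks.append(current)
                current = None
        elif line and not line.startswith("#") and current is not None:
            current["segments"].append(line)
    return breaks


ad_breaks = find_ad_breaks(LIVE)
print(ad_breaks)
```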

🏷️ Other Important HLS Tags

#EXT-X-KEY

DRM encryption info — method (AES-128, SAMPLE-AES), key URI, and IV. Tells the player how to decrypt segments.

#EXT-X-MAP

Points to the initialization segment (fMP4 header). Required for CMAF/fMP4 segments — contains codec config, not video data.

#EXT-X-DISCONTINUITY

Signals a break in encoding parameters (codec, resolution change). Common at ad boundaries where the ad is encoded differently than content.

#EXT-X-PROGRAM-DATE-TIME

Wall-clock timestamp for a segment. Used for DVR/time-shift — lets the player show "10 minutes ago" labels.

#EXT-X-BYTERANGE

Multiple segments packed in one file — this tag says "read bytes X to Y." Reduces HTTP requests.

#EXT-X-DATERANGE

Metadata tied to a time range. Used for SCTE-35 in newer HLS specs, timed metadata events, and interstitials.

🔧

Want to inspect a real manifest? Use the Manifest Viewer tool to paste any HLS manifest and get a detailed breakdown of every tag, segment, and ad marker.

AWS Elemental Services

AWS provides purpose-built services for each step of the video pipeline. Here's what each one does.

MediaLive

Live video encoding in the cloud

Takes a live video input (camera, RTMP/RTP/HLS/MediaConnect) and encodes it in real-time into multiple renditions with ABR. Supports dual-pipeline for redundancy.

Use case: Live sports, live events, 24/7 linear channels

MediaConvert

File-based video transcoding

Serverless batch transcoding. Takes a source file from S3 and outputs multiple renditions in various formats. Pay per minute of output video.

Use case: VOD platforms, converting uploads to HLS/DASH for streaming

MediaPackage

Just-in-time packaging & origination

Receives encoded video and packages it into HLS, DASH, CMAF on-the-fly. Adds DRM, creates manifests, enables DVR/catchup-TV. Acts as the origin for your CDN.

Use case: Multi-format delivery, DRM, time-shift TV, live-to-VOD

MediaConnect

Reliable video transport

Broadcast-quality live video transport. Moves streams between AWS regions, to/from on-prem, with protocols like SRT, Zixi, RIST for error correction over the public internet.

Use case: Contribution feeds, remote production, inter-region transport

MediaTailor

Server-side ad insertion & channel assembly

Reads SCTE-35 markers in manifests, fetches ads from your ad server (VAST/VMAP), and stitches them into the stream server-side. Also builds virtual linear channels from VOD assets.

Use case: Monetization, FAST channels, personalized ad delivery

Workflows: Putting It All Together

Now let's see how these services combine into real workflows, from simple to complex.

Beginner

Simple VOD (Video on Demand)

Upload a file, transcode it, deliver via CDN. The most basic streaming setup.

📁

S3 Upload

Source video file

→

⚙️
MediaConvert
Transcode to HLS

→

📦

S3 Output

HLS segments + manifest

→

🌐

CloudFront

CDN delivery

→

📺

Player

Viewer watches

How it works: A video file lands in S3. An S3 event (for example, an EventBridge rule that invokes a small Lambda function) starts a MediaConvert job that encodes it into multiple renditions as HLS (master + child manifests + segments). Output goes back to S3, and CloudFront distributes it globally. The player fetches the master manifest, picks a rendition, and streams segments.
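
The trigger step can be sketched as a function that turns an upload event into input URIs for a transcode job. This assumes the classic S3 notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys); an event delivered through EventBridge carries the bucket and key under `detail` instead. The extension allow-list and function name are hypothetical, and the actual MediaConvert job submission is left out.

```python
from urllib.parse import unquote_plus

# Hypothetical allow-list of source file types worth transcoding.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".mxf")


def inputs_from_s3_event(event):
    """Extract s3:// URIs for uploaded video files from an S3 notification event."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(VIDEO_EXTENSIONS):
            uris.append(f"s3://{bucket}/{key}")
    return uris


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "media-in"}, "object": {"key": "uploads/my+video.mp4"}}},
        {"s3": {"bucket": {"name": "media-in"}, "object": {"key": "uploads/notes.txt"}}},
    ]
}
print(inputs_from_s3_event(sample_event))
```

Each returned URI would become the input file of one transcode job; filtering by extension avoids starting jobs for non-video uploads.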

Intermediate

Basic Live Streaming

Take a live camera feed and stream it to thousands of viewers in real-time.

📹

Camera / Encoder

RTMP/SRT input

→

🔴
MediaLive
Live encode to ABR

→

📦
MediaPackage
Package HLS/DASH

→

🌐

CloudFront

Global CDN

→

📺

Viewers

Watch live

How it works: A live feed arrives at MediaLive via RTMP or SRT. MediaLive encodes it in real-time into an ABR ladder and pushes to MediaPackage. MediaPackage generates live manifests (constantly updated with new segments), adds DRM if configured, and serves as the origin. CloudFront caches the stream at edge locations close to viewers.

Advanced

Resilient Live with Transport, DRM & Ads

Broadcast-grade live streaming with reliable transport, redundancy, content protection, and monetization.

📹

Remote Venue

Live camera

→

🔗
MediaConnect
SRT transport

→

🔴
MediaLive
Dual pipeline + SCTE-35

↓

📦
MediaPackage
DRM + DVR + manifests

→

🎯
MediaTailor
SSAI (reads SCTE-35)

→

🌐

CloudFront

CDN

→

📺

Viewers

Personalized ads

How it works: The live feed travels through MediaConnect, which provides reliable transport over the public internet (SRT with ARQ retransmission in this example). MediaLive encodes with dual pipelines (automatic failover) and passes through SCTE-35 ad markers. MediaPackage packages with Widevine/FairPlay DRM and enables a DVR window. MediaTailor reads the SCTE-35 cue-out/cue-in markers, fetches personalized ads from your VAST ad server, and stitches them in seamlessly per viewer.

Expert

Full Hybrid: Live + VOD + FAST Channels

Complete media platform with live events, VOD library, and linear FAST channels — all monetized.

How it works: This is a complete OTT platform. Live → MediaConnect → MediaLive. VOD → MediaConvert. Both paths converge at MediaPackage (DRM + unified manifests). MediaTailor inserts ads AND assembles FAST (Free Ad-Supported Streaming Television) channels by stitching VOD assets + live feeds into a 24/7 linear schedule. CloudFront delivers to every device type globally.

Audio: The Other Half of Video

Every video stream has audio — and audio has its own codecs, channel layouts, and standards that matter just as much as the visual side.

🔊

Audio Codecs

AAC — The default. Great quality at low bitrates (128-256 kbps). Universal support. Used in HLS, MP4, DASH.

Dolby Digital (AC-3) — 5.1 surround sound. Standard for broadcast TV and Blu-ray. 384-640 kbps.

Dolby Digital Plus (EC-3) — Improved AC-3 with higher efficiency. Supports 7.1 and object-based audio (Atmos). Used on Netflix, Disney+.

Dolby Atmos — Object-based audio embedded in EC-3. Sound can be placed in 3D space. Requires compatible soundbar/headphones.

Opus — Open-source, royalty-free. Excellent at all bitrates. Used in WebRTC and some DASH streams.

🎚️

Sample Rate & Bit Depth

Sample rate = how many times per second audio is measured. 48 kHz is standard for video (48,000 samples/sec). CD uses 44.1 kHz.

Bit depth = precision of each sample. 16-bit = CD quality (96 dB dynamic range). 24-bit = professional/studio (144 dB range).

More samples × more bits = bigger file. But after encoding with AAC/AC-3, the final bitrate matters more than the source format.
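
The dynamic range figures follow from the bit depth: each extra bit adds about 6 dB. This sketch computes that, plus the uncompressed PCM bitrate for a given sample rate, bit depth, and channel count. The function names are illustrative.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bit_depth)


def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed audio bitrate in bits per second."""
    return sample_rate_hz * bit_depth * channels


print(f"16-bit: {dynamic_range_db(16):.1f} dB")
print(f"24-bit: {dynamic_range_db(24):.1f} dB")
print(f"48 kHz / 16-bit stereo: {pcm_bitrate_bps(48_000, 16, 2) / 1000:.0f} kbps")
```

Uncompressed 48 kHz 16-bit stereo is 1536 kbps, several times the 128-256 kbps typical of an AAC encode.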

📡

Channel Layouts

Mono (1.0) — Single channel. Used for voice/commentary tracks.

Stereo (2.0) — Left + Right. The baseline for streaming.

5.1 Surround — Front L/C/R + Rear L/R + Subwoofer. Standard for cinema/broadcast.

7.1 — Adds side speakers. Premium home theater.

Atmos (object-based) — Not channels but "objects" with 3D positions. Renderer maps to whatever speakers you have.

📏

Loudness Standards

Without standards, ads would blast at max volume while content is quiet. Loudness normalization fixes this.

EBU R128 — European broadcast standard. Target: -23 LUFS (Loudness Units relative to Full Scale).

ATSC A/85 — US broadcast standard. Target: -24 LKFS (same unit, different name).

Why it matters: MediaLive has audio normalization filters. If your output fails loudness specs, broadcasters will reject it.

💡

AWS context: MediaLive and MediaConvert let you select audio codec, channel layout, bitrate, and loudness normalization. MediaPackage passes audio tracks through and can offer multiple audio renditions (languages, accessibility descriptions) via #EXT-X-MEDIA tags in HLS.

Latency: Why Is My Stream Behind?

The delay between something happening live and the viewer seeing it. Shorter latency = harder engineering problem.

⏱️

Latency Breakdown

Total glass-to-glass latency is the sum of every step in the pipeline:

Capture & Encode: 1-4 sec (GOP + encoding buffer)

Ingest/Transport: 0.5-2 sec (network + protocol overhead)

Packaging: 1-6 sec (segment duration)

CDN Propagation: 0.5-2 sec (origin → edge cache)

Player Buffer: 2-10 sec (safety buffer for smooth playback)

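Typical total: Normal HLS = 20-40 sec, because players often hold 3-5 full six-second segments rather than the minimal buffer listed above. Low-Latency HLS = 3-5 sec. WebRTC = <1 sec.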
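
Adding up the stage ranges shows where the time goes. In this sketch the per-stage numbers come from the breakdown above; the second calculation swaps in a player that buffers 3-5 six-second segments, which is the behavior described for standard HLS. The dictionary keys and function names are illustrative.

```python
# (min_seconds, max_seconds) per pipeline stage, taken from the breakdown above.
STAGES = {
    "capture_encode": (1.0, 4.0),
    "ingest_transport": (0.5, 2.0),
    "packaging": (1.0, 6.0),
    "cdn_propagation": (0.5, 2.0),
    "player_buffer": (2.0, 10.0),
}


def total_latency(stages):
    """Sum the best-case and worst-case latency across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def with_segment_buffer(stages, segments_low, segments_high, segment_seconds=6.0):
    """Copy of the stages where the player buffers whole segments before starting."""
    adjusted = dict(stages)
    adjusted["player_buffer"] = (segments_low * segment_seconds,
                                 segments_high * segment_seconds)
    return adjusted


print(total_latency(STAGES))
print(total_latency(with_segment_buffer(STAGES, 3, 5)))
```

The player buffer dominates: with a minimal buffer the pipeline totals 5-24 seconds, while buffering 3-5 full segments pushes it to roughly 21-44 seconds.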

🐢

Normal Latency (20-40s)

Standard HLS/DASH with 6-second segments. The player buffers 3-5 segments before starting.

Pros: Rock-solid reliability, works everywhere, CDN-friendly, tolerates network jitter.

Use case: VOD, linear TV, non-interactive streams where delay doesn't matter.

🐇

Low Latency (3-5s)

LL-HLS and LL-DASH use partial segments (0.3-1 sec chunks) delivered before the full segment is complete. Player starts earlier with less buffer.

Key tech: Chunked Transfer Encoding, #EXT-X-PART tags, blocking playlist reload, preload hints.

Use case: Live sports, auctions, watch parties, where a few seconds of delay is acceptable.

⚡

Ultra-Low / Real-Time (<1s)

WebRTC — Peer-to-peer, sub-second latency. No segments, no manifests. Direct media delivery. Doesn't scale easily via CDN.

SRT/RIST — Used for contribution (camera → cloud) not distribution. UDP-based with error correction.

Use case: Video calls, interactive live (betting, gaming), remote production.

⚖️

The trade-off: Lower latency = smaller buffers = more rebuffering risk. You're trading reliability for speed. Most production systems use normal or low-latency HLS — ultra-low is only for truly interactive use cases.

Transport Protocols: Moving Video Around

Different protocols for different stages. Contribution (getting video to the cloud) uses different tech than distribution (sending to viewers).

🗺️

Contribution vs Distribution

Contribution = getting the raw/mezzanine video from the source (camera, venue) to the encoder/cloud. Needs reliability, not scale. Point-to-point.

Distribution = delivering the final stream from origin/CDN to millions of viewers. Needs scale, not point-to-point reliability. HTTP-based.

📹 Source → Cloud: RTMP, SRT, RIST, RTP (Contribution)

☁️ Cloud → Viewers: HLS, DASH over HTTP/CDN (Distribution)

📺

RTMP (Real-Time Messaging Protocol)

Developed by Macromedia (later acquired by Adobe) in the early 2000s. TCP-based. Still the most common way to push a live stream from an encoder (OBS, Wirecast) to a service.

Pros: Universal encoder support, simple push model.

Cons: TCP = head-of-line blocking under packet loss. Limited to H.264 + AAC. No built-in encryption. Being phased out for ingest but still dominant.

🔐

SRT (Secure Reliable Transport)

Open-source, UDP-based protocol designed by Haivision. Handles packet loss with ARQ (retransmission). AES-128/256 encryption built in.

Pros: Works over unpredictable internet, encrypted, low overhead, supports H.265.

Cons: Newer — not as universally supported as RTMP in legacy gear.

AWS: MediaConnect and MediaLive both accept SRT input. This is the preferred contribution protocol.

🔄

RIST (Reliable Internet Stream Transport)

Industry standard (VSF/SMPTE) competing with SRT. Also UDP + ARQ retransmission. Interoperable between vendors by design.

Pros: Standards-body backed, multi-vendor interop, profile-based (Simple, Main, Advanced).

Cons: Less community adoption than SRT, more complex profiles.

📡

RTP/RTSP

RTP (Real-time Transport Protocol) — bare UDP packets with sequence numbers. Used in professional broadcast (SDI over IP / SMPTE 2110).

RTSP — Control protocol that manages RTP streams (play, pause, seek). Used by IP cameras.

MediaLive accepts RTP input for professional contribution feeds.

🌐

WebRTC

Browser-native real-time protocol. Sub-second latency. Uses SRTP (encrypted RTP) + ICE/STUN/TURN for NAT traversal.

Use case: Video conferencing, interactive live streaming. Not a contribution protocol for broadcast — it's end-to-end.

Limitation: Doesn't scale via CDN easily. Each viewer is a peer connection. Solutions like Amazon IVS use WebRTC for low-latency at scale.

Captions & Subtitles

Text tracks are not optional — they're required by law in many contexts (FCC, ADA, EAA). Here's how they work in streaming.

📝

Captions vs Subtitles

Subtitles = translation of dialogue for viewers who don't speak the language. Assumes you can hear.

Closed Captions (CC) = transcription of ALL audio: dialogue, sound effects, music cues ("[door slams]"). For deaf/hard-of-hearing viewers.

Open captions = burned into the video pixels permanently. Can't be turned off.

Closed captions = separate data track. Viewer toggles on/off. This is what streaming uses.

📋

Caption Formats

CEA-608/708 — US broadcast standard. Embedded in the video stream (in SEI NAL units for H.264). Carried through .ts segments. Legacy but required for US broadcast.

WebVTT — Web standard. Plain text file with timestamps. Used in HLS as sidecar files. Clean, simple, widely supported.

TTML / IMSC — XML-based. Used in DASH and for interchange. Supports rich styling, positioning, regions. IMSC is the profile for streaming.

SRT (SubRip) — Simple text format. Common for file exchange but not used directly in streaming protocols.

🔀

Embedded vs Sidecar

Embedded — Captions inside the video stream itself (CEA-608/708 in .ts segments). Player extracts and renders them. No extra HTTP requests.

Sidecar — Captions in separate files referenced by the manifest. Player fetches alongside video. More flexible (add languages without re-encoding).

In HLS, sidecar captions use #EXT-X-MEDIA:TYPE=SUBTITLES in the master manifest, pointing to a .m3u8 with .vtt segment files.
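
The sidecar .vtt files themselves are plain text. This minimal sketch writes a WebVTT document from a list of cues; it covers only the basic header and cue timing lines, and the helper names are illustrative.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues):
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]  # required header followed by a blank line
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


print(build_webvtt([
    (0, 2.5, "[door slams]"),
    (2.5, 5, "Who's there?"),
]))
```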

⚙️

Captions in the AWS Pipeline

MediaLive — Passthrough embedded 608/708, convert between formats, or burn in for preview outputs.

MediaConvert — Extracts embedded captions, converts to WebVTT/TTML sidecar, or burns in. Supports SCC, SRT, STL input.

MediaPackage — Passes through captions. Sidecar WebVTT tracks appear in HLS manifests as separate renditions.

Encoding Quality & Decisions

Choosing the right encoding settings is the difference between wasting bandwidth and delivering sharp video.

📊

CBR vs VBR vs QVBR

CBR (Constant Bitrate) — Same bitrate every second. Wastes bits on static scenes, starves complex scenes. Predictable file size. Used in broadcast.

VBR (Variable Bitrate) — More bits for complex scenes, fewer for simple ones. Better quality-per-bit but unpredictable size.

QVBR (Quality-Defined VBR) — AWS innovation. Set a quality level (1-10) + max bitrate ceiling. Encoder uses only what's needed. Best of both worlds.

🎯

Quality Metrics

VMAF — Netflix's perceptual quality metric. Score 0-100. 93+ is excellent. Industry standard.

PSNR — Mathematical pixel comparison. Higher = closer to original. Fast but doesn't always match human perception.

SSIM — Measures structural information loss. Better correlation with human eyes than PSNR. Score 0-1.

In practice: Use VMAF for final quality decisions. Target 93+ for premium, 85+ for mobile.
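
Of the three metrics, PSNR is simple enough to compute by hand. This sketch applies the standard formula, 10 × log10(MAX² / MSE), to two flat lists of 8-bit pixel values. Real tools work on full frames and usually per color plane; VMAF and SSIM need dedicated libraries and are not shown.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no error at all
    return 10 * math.log10(max_value ** 2 / mse)


original = [100, 100, 100, 100]
encoded = [110, 90, 110, 90]  # every pixel is off by 10
print(f"{psnr(original, encoded):.2f} dB")
```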

🪜

Per-Title Encoding

Animated shows need less bitrate than live sports at the same resolution. Smart encoding adapts the ladder per content.

Static ladder: Same bitrates for everything. Simple but wasteful.

Per-title: Analyze complexity first, then pick optimal bitrate per rendition. A cartoon might need 3 Mbps at 1080p while sports needs 10 Mbps.

AWS QVBR is content-adaptive — it adjusts bitrate based on scene complexity within your quality target.

🏗️

Building an ABR Ladder

• Each rendition should be perceptually different from adjacent rungs.

• Lowest rung = watchable on a phone (360p @ 0.5 Mbps).

• Highest rung = match source quality — never upscale.

• Spacing = ~1.5-2x bitrate between rungs for smooth switching.

• Include an audio-only fallback for extremely poor connections.
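
The spacing rule can be turned into a quick ladder generator: start at the top bitrate and divide by a fixed factor until the floor is reached. The 8000 kbps top rung and 500 kbps floor are example inputs, and this static approach ignores per-title complexity.

```python
def build_ladder(top_kbps, floor_kbps, factor=2.0):
    """Bitrates (kbps) from the top rung down, each `factor` times smaller than the last."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    rungs = []
    rate = float(top_kbps)
    while rate >= floor_kbps:
        rungs.append(round(rate))
        rate /= factor
    return rungs


print(build_ladder(8000, 500))              # 2x spacing
print(build_ladder(8000, 500, factor=1.5))  # tighter spacing, more rungs
```

Tighter spacing gives smoother quality steps but more renditions to encode and store, so the factor is a cost decision as much as a quality one.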

CDN & Delivery

A CDN caches your video at edge locations worldwide so viewers get content from nearby servers, not your origin.

🌐

How CloudFront Works for Video

CloudFront has 400+ edge locations. Viewers get segments from the nearest edge. On cache miss, it fetches from origin once and caches for subsequent requests.

Key cache behaviors for video:

• Segments (.ts, .m4s) — Cache aggressively (high TTL). Same segment serves millions of viewers.

• Manifests (.m3u8, .mpd) — Brief cache for live (1-3s TTL), long for VOD. Live manifests update every segment duration.

• Personalized manifests (SSAI) — Never cache. Each viewer gets different ad URLs stitched in.

🛡️

Origin Shield

Extra caching layer between edge locations and origin. All edges in a region check the shield first.

Without: 50 edges each ask origin = 50 requests per cache miss.

With: 50 edges ask shield, shield asks origin once = 1 request. Massive origin load reduction.

⏳

TTL Strategies

VOD segments: 1 year. They never change.

Live segments: at least the segment duration (6s). Segments are immutable once created.

Live manifests: 1 second or half segment duration. Must refresh frequently.

VOD manifests: 1 day+. Content is static.
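
These rules map naturally onto a small lookup. The sketch below picks a TTL from the file extension and whether the stream is live, then formats a Cache-Control value. The TTLs mirror the list above; the function names and the "no-store" fallback for unknown file types are assumptions.

```python
ONE_DAY = 86_400
ONE_YEAR = 31_536_000


def cache_ttl_seconds(path: str, is_live: bool, segment_seconds: int = 6) -> int:
    """TTL in seconds for a manifest or segment path, following the strategy above."""
    if path.endswith((".m3u8", ".mpd")):
        return 1 if is_live else ONE_DAY              # manifests
    if path.endswith((".ts", ".m4s")):
        return segment_seconds if is_live else ONE_YEAR  # segments
    return 0  # unknown file type: do not cache


def cache_control_header(path: str, is_live: bool) -> str:
    """Cache-Control header value for the given path."""
    ttl = cache_ttl_seconds(path, is_live)
    return f"public, max-age={ttl}" if ttl > 0 else "no-store"


print(cache_control_header("vod/720p/segment_001.ts", is_live=False))
print(cache_control_header("live/720p/playlist.m3u8", is_live=True))
```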

🔀

Multi-CDN

Large platforms use multiple CDN providers simultaneously:

• Active-active: Route to fastest CDN via DNS or client logic.

• Failover: Shift traffic if one CDN degrades.

• Cost optimization: Route by pricing tier per region.

Players like hls.js support mid-stream CDN switching on segment failures.

Ad Tech & Monetization

How ads get into video streams — the protocols, the players, and the trade-offs between client-side and server-side insertion.

📄

VAST (Video Ad Serving Template)

XML response that describes a single ad: what creative to play, its duration, tracking pixels, click-through URL.

MediaTailor calls the ad server, gets VAST XML back, extracts the video URL, and stitches that video into the stream at the ad break point.

Contains: MediaFile URL, duration, impression trackers, click URL, companion ads.
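
To make the VAST response less abstract, this sketch parses a small hand-written VAST document and pulls out the fields listed above. The sample XML and its example.com URLs are made up for illustration, and real responses often include wrappers, multiple media files, and tracking events that this ignores.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration; not from a real ad server.
SAMPLE_VAST = """<VAST version="3.0">
  <Ad id="demo-ad">
    <InLine>
      <Impression><![CDATA[https://tracker.example.com/impression]]></Impression>
      <Creatives>
        <Creative>
          <Linear>
            <Duration>00:00:30</Duration>
            <MediaFiles>
              <MediaFile delivery="progressive" type="video/mp4" width="1920" height="1080"><![CDATA[https://ads.example.com/creative.mp4]]></MediaFile>
            </MediaFiles>
          </Linear>
        </Creative>
      </Creatives>
    </InLine>
  </Ad>
</VAST>"""


def parse_vast(xml_text):
    """Return duration, first media URL, and impression trackers of the first linear ad."""
    root = ET.fromstring(xml_text)
    linear = root.find("./Ad/InLine/Creatives/Creative/Linear")
    if linear is None:
        return None  # empty response or a wrapper, which this sketch does not follow
    hours, minutes, seconds = linear.findtext("Duration").strip().split(":")
    media = linear.find("./MediaFiles/MediaFile")
    return {
        "duration_seconds": int(hours) * 3600 + int(minutes) * 60 + float(seconds),
        "media_url": media.text.strip(),
        "impressions": [node.text.strip() for node in root.iter("Impression")],
    }


ad = parse_vast(SAMPLE_VAST)
print(ad)
```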

🗓️

VMAP (Video Multiple Ad Playlist)

Wraps around VAST. Defines WHEN ad breaks should happen for VOD content that has no embedded SCTE-35 markers.

timeOffset="start" = pre-roll. timeOffset="00:05:00" = mid-roll at 5 min. timeOffset="end" = post-roll.

Each break points to a VAST URL for the actual ad creative. MediaTailor supports both VAST and VMAP.
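
The timeOffset values can be converted to a position in seconds once the content duration is known. This sketch handles "start", "end", timecodes, and percentages; the positional "#n" form is not handled, and the function name is illustrative.

```python
def time_offset_seconds(value: str, content_duration: float) -> float:
    """Convert a VMAP timeOffset value to seconds from the start of the content."""
    if value == "start":
        return 0.0
    if value == "end":
        return float(content_duration)
    if value.endswith("%"):
        return content_duration * float(value[:-1]) / 100
    # Otherwise expect a timecode such as HH:MM:SS or HH:MM:SS.mmm
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


duration = 1800  # a hypothetical 30-minute episode
for offset in ("start", "00:05:00", "50%", "end"):
    print(offset, "->", time_offset_seconds(offset, duration))
```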

🖥️

CSAI (Client-Side Ad Insertion)

The player itself fetches ads from an ad server and plays them locally. Traditional web/mobile approach.

Pros: Mature ecosystem, interactive overlays, companion ads, viewability measurement.

Cons: Ad blockers defeat it entirely. Quality/resolution mismatch between ad and content. Buffering at ad transitions. Detectable by the client.

☁️

SSAI (Server-Side Ad Insertion)

Ads are stitched into the video stream on the server. The player sees one continuous stream — it can't distinguish ads from content.

Pros: Ad-blocker proof, broadcast-quality transitions, works on all devices (smart TVs, Roku), consistent quality.

Cons: No client-side interactivity, harder viewability measurement, server cost. This is what MediaTailor does.

📦

Ad Pods & Breaks

Ad pod = multiple ads played in sequence during a single break (like a TV commercial break).

Competitive separation — Don't show two competing brands (Coke then Pepsi) in the same pod.

Frequency capping — Limit how many times one viewer sees the same ad per session/day.

Fill rate — What percentage of ad breaks actually get filled with ads (vs showing slate). Target: 90%+.

📺

FAST Channels

Free Ad-Supported Streaming Television — Virtual linear channels assembled from VOD content, monetized with ads. Like traditional TV but over streaming.

How it works: MediaTailor Channel Assembly takes VOD assets, arranges them into a 24/7 schedule, inserts SCTE-35 markers at break points, and serves a live HLS manifest. Viewers tune in and see a "live" channel.

Examples: Pluto TV, Tubi, Samsung TV+, Amazon Freevee channels.

Video Players: The Last Mile

The player is the most complex piece of the puzzle on the client side. It handles manifest parsing, ABR decisions, buffering, DRM, and rendering.

🔄

How a Video Player Works

Every stream playback follows this loop:

1. Fetch master manifest: Discover available renditions

2. Select rendition: Based on bandwidth estimate

3. Fetch child manifest: Get segment URLs for that quality

4. Download segments: Fill buffer ahead of playback position

5. Decode & render: Feed to Media Source Extensions (MSE)

This loop repeats continuously. Every segment download, the player re-evaluates bandwidth and may switch renditions.

📈

Buffer Management

Why buffering happens: The player downloads segments faster than playback consumes them (building a buffer). If download speed drops below playback speed, the buffer drains and playback stalls.

Forward buffer = seconds of video downloaded but not yet played. Typical: 30s for VOD, 10s for live.

ABR logic — If buffer is low, switch DOWN to a lower rendition (downloads faster). If buffer is full, switch UP for better quality.

Rebuffering ratio = time spent buffering / total playback time. Target: <0.5%.
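
The buffer-based ABR logic and the rebuffering ratio can both be sketched in a few lines. The 5-second and 20-second thresholds are assumed values for illustration; production players combine buffer level with throughput estimates and tune these numbers.

```python
def next_rendition_index(current: int, buffer_seconds: float, rung_count: int,
                         low_water: float = 5.0, high_water: float = 20.0) -> int:
    """Step down when the buffer is low, up when it is comfortably full."""
    if buffer_seconds < low_water:
        return max(current - 1, 0)               # lower quality downloads faster
    if buffer_seconds > high_water:
        return min(current + 1, rung_count - 1)  # room to try better quality
    return current


def rebuffering_ratio(stall_seconds: float, playback_seconds: float) -> float:
    """Time spent buffering divided by total playback time."""
    if playback_seconds <= 0:
        return 0.0
    return stall_seconds / playback_seconds


print(next_rendition_index(current=2, buffer_seconds=3.0, rung_count=5))
print(next_rendition_index(current=2, buffer_seconds=25.0, rung_count=5))
print(f"{rebuffering_ratio(3, 1200):.2%}")
```

Three seconds of stalling in a 20-minute session gives 0.25%, inside the <0.5% target.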

📦

Open-Source Players

hls.js — JavaScript HLS player. Runs in any browser via MSE. The most popular choice for web HLS playback. Handles ABR, subtitles, DRM (via EME).

dash.js — Reference DASH player by the DASH Industry Forum. Full DASH spec support including low-latency modes.

Shaka Player: Google's open-source player. Supports BOTH HLS and DASH, plus offline storage and playback.

Video.js: Player UI framework. Ships its own HLS/DASH playback engine (VHS) and can wrap other engines such as dash.js through plugins, with a consistent UI, plugin system, and analytics hooks.

🔐

DRM in the Player

Players use EME (Encrypted Media Extensions) — a browser API that talks to the device's DRM module (CDM).

Flow: Player sees #EXT-X-KEY (HLS) or ContentProtection (DASH) → requests a license from the license server → CDM decrypts segments in a secure sandbox → decoded frames go to the screen.

The player code never sees decrypted content. The CDM handles it in hardware or a trusted execution environment where the device provides one. This is why DRM works even on "open" platforms.

🎯

Testing tip: Open browser DevTools → Network tab while playing a stream. You'll see the manifest requests, segment downloads, and can observe ABR switching in real-time by watching segment URLs change rendition paths.