AVT Workbook Video Streaming Whitepaper
MANAGER'S WORKBOOK
Planning for Video Streaming: What You Really Need to Know
Learning Objectives & Thought Leadership
Learn the key decision points and concepts, and gain enough technical background information to make informed decisions when specifying and integrating streaming media solutions.
»»Audience Requirements
»»Latency Tolerance
»»Video Compression
The content and images in this Workbook are owned by its authors and/or
respective companies and may not be reprinted or reproduced in any way.
THE Workbook You Need for Video Streaming
Streaming audio and video content across a network is a broad topic, ranging from real time connections replacing a traditional AV matrix switch, to conferencing, to cat videos on YouTube, with a huge range of mix and match technologies in any given solution. Choosing the right solution involves considering both the application and content attributes, and ruling out the technologies that do not support those attributes.
The goal of this document is to explain key decision points and concepts, and to give enough technical background information to allow the audience to make informed decisions in specifying and integrating streaming media solutions. This document discusses networks only at a high level, and only where they impact the decision process. A familiarity with IP networks and video technology is helpful but not required.
Application Attributes
Real Time or On Demand . Audience . Latency . Bandwidth . Security . Management . Scalability
Attributes and Descriptions
Real Time, On Demand, or Both: Real time streaming is consumed as it is produced. On demand streaming is recorded, and each user consumes the video at their convenience.
Bandwidth: The number of bits per second available on the network for content distribution.
Audio Reinforcement (echo threshold for intelligibility): Acceptable <18ms; Marginal 18-30ms. References: ITU-T G.131, Talker Echo and Its Control; Helmut Haas, The Influence of a Single Echo on the Audibility of Speech.
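For planning, these thresholds reduce to simple bookkeeping: sum the latency of every stage in the chain and compare against the limits cited above. The Python sketch below is illustrative only; the stage values and function name are hypothetical, not from the whitepaper or any standard.

```python
# Hedged sketch: classify an end-to-end audio latency budget against the
# echo thresholds cited above (ITU-T G.131 / Haas).
ACCEPTABLE_MS = 18   # below this, talker echo is acceptable
MARGINAL_MS = 30     # 18-30ms is marginal for intelligibility

def classify_echo_latency(total_ms: float) -> str:
    if total_ms < ACCEPTABLE_MS:
        return "acceptable"
    if total_ms <= MARGINAL_MS:
        return "marginal"
    return "unacceptable"

# Hypothetical budget: encode + network + decode + DSP (values assumed)
budget_ms = 5 + 2 + 5 + 4
print(budget_ms, classify_echo_latency(budget_ms))  # 16 acceptable
```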
Content Attributes
Content Source . Interface . Resolution and Rate . Bandwidth
Attributes and Descriptions
Content Source: What is generating the content? Is there more than one content source? Do multiple sources need to be streamed simultaneously?
Resolution (Source, Transmitted, and Reproduced):
Video
»»The number of pixels (the smallest element of an image) in the video. Generally specified as Width x Height.
Audio
»»The number of bits of information in each audio sample.
»»Higher bit depth results in a higher maximum signal to noise ratio (S/N).
Rate:
Video
»»Frame rate: the number of frames (complete images) per second. Generally specified as frames per second (fps).
Audio
»»Sample rate: the number of times per second an audio signal is sampled (measured for its instantaneous value).
»»Higher sample rates allow for reproduction of higher frequencies.
Bandwidth: The number of bits per second required to transmit the required AV signal at an acceptable quality.
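Resolution, frame rate, and bit depth fix the raw bandwidth of a video signal before any compression. The sketch below (function and variable names are ours, for illustration) shows why compression is central to any streaming design.

```python
# Illustrative sketch: raw (uncompressed) bandwidth implied by resolution,
# frame rate, and bits per pixel.
def raw_video_bitrate_bps(width: int, height: int, fps: float,
                          bits_per_pixel: int = 24) -> float:
    """Bits per second before any compression is applied."""
    return width * height * bits_per_pixel * fps

# 1080p at 30fps with 24 bit color needs roughly 1.5 Gbps uncompressed.
print(raw_video_bitrate_bps(1920, 1080, 30) / 1e9)  # ~1.49 Gbps
```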
Audio
Audio, which is inherently analog, is converted to a digital representation that can be transmitted through a network by a process with the easy to remember name, analog to digital conversion (ADC). In ADC, the audio signal is sampled (measured) at a regular interval. The resulting set of samples is a Pulse Code Modulated (PCM) signal.
PCM has two attributes which determine the quality of the sampled audio: bit depth and sampling rate. Bit depth, the number of bits used to quantify the signal, determines the resolution of a sample. An 8 bit sample has 256 possible values, a 16 bit sample has 65,536, and a 24 bit sample has 16,777,216.
The sampling rate is the number of times in a second the sample is taken. This determines the highest frequency that can be reproduced, as described by the Nyquist–Shannon sampling theorem, which states: If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.
This means that the highest frequency which can be reproduced is half the sample rate. The full range of human hearing is between 20Hz and 20kHz. Human speech is in the range of 300Hz to 3,500Hz. Common sampling frequencies run from 8,000Hz for telephone calls to 48,000Hz for professional audio recordings. (CD audio reproduces 20Hz to 22,000Hz.) PCM audio can be streamed uncompressed, or it can be further compressed before streaming.
If the audio source is in the form of a digital PCM stream that has a different sample rate from the desired stream, that conversion is typically done as part of the encoding process.

BANDWIDTH
In the context of content, bandwidth is the bitrate of a single AV stream crossing the network. The biggest tradeoffs typically made in a streaming application are balancing quality and bandwidth. Based on the overall bandwidth available, discovered in the application phase, a bandwidth budget for the content must be established and used as a goal when evaluating the technologies presented in the next section.
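The PCM relationships above reduce to simple arithmetic. The following sketch is illustrative only; the function names are ours and not from any standard library.

```python
# Bit depth, sample rate, and channel count determine PCM quality and bitrate.
def pcm_levels(bit_depth: int) -> int:
    """Number of distinct values a sample can take: 2**bit_depth."""
    return 2 ** bit_depth

def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest reproducible frequency is half the sample rate."""
    return sample_rate_hz / 2

def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM bandwidth in bits per second."""
    return sample_rate_hz * bit_depth * channels

print(pcm_levels(16))                  # 65536 possible values
print(nyquist_limit_hz(48_000))        # 24000.0 Hz
print(pcm_bitrate_bps(48_000, 24, 2))  # 2,304,000 bps (~2.3 Mbps, stereo)
```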
Streaming Technology Components
Video Compression . Compression and Codec Terminology . Intra-frame Codecs . Inter-frame Codecs . Containers . Session Controls . Transport Protocols
A Group of Pictures (GOP) is a repeating pattern of how the various types of frames are arranged.
This stream hierarchy makes MPEG codecs particularly suitable for transport over IP networks.
MPEG encoders' capabilities are defined by profiles and levels. The profile defines the subset of features, such as compression algorithm and chroma format. The level defines the subset of quantitative capabilities, such as maximum bit rate and maximum frame size. Decoders often describe their playback specification using the maximum profile and level they are capable of decoding. For example, a decoder with an MPEG specification of MP@ML can play back any stream encoded up to Main Profile, Main Level.
MPEG defines a bitstream and a decoder function. A valid MPEG file or stream is one that can be decoded by the standard decoder. This leaves the encoding methodology up to the manufacturer. While this means that there is room for innovation and growth, the video quality and encoding efficiency are not guaranteed by the standard, so due diligence should be performed when choosing an encoder.

Inter-frame Compression
MPEG codecs use a variety of compression techniques to compress video based around a GOP. A greatly simplified version of the compression is discussed below.
The first frame to be compressed in a GOP is the I-frame.

Macroblock Compression Example

MPEG compression is organized around blocks. The illustration above depicts the data in a 16 pixel x 16 pixel macroblock, the smallest unit of video in MPEG-2, with 4:2:0 chroma subsampling. The Y component has 256 samples; the Cb and Cr components each contain 64 samples. The macroblock is divided into six 8 x 8 blocks of 64 samples each: four Y, one Cb, one Cr. These 8 x 8 blocks are then compressed using the Discrete Cosine Transform (DCT), and the resulting compressed values are combined to form the compressed macroblock. The entire I-frame is compressed as an array of macroblocks. If the height or width in pixels of the frame is not evenly divisible by 16, the leftover area must still be encoded as a macroblock; the unallocated area is simply not displayed. I-frames can be considered effectively identical to baseline JPEG images.
Even if the next frame in display order is a B-frame, the next frame to be encoded is a P-frame. This is because a B-frame is bi-predictive: it requires a compressed anchor frame both before and after it as references for compression.

P-frame Compression Example

P-frames exploit the fact that often much of the video changes little, if at all, from frame to frame. This is called temporal (over time) redundancy. The P-frame is divided into macroblocks, which are checked to see if any of them equal any macroblocks on the anchor frame. If any do, they are recorded as "the same as that other one," which takes much less space than a DCT compression (shown in red on the above illustration). The remaining macroblocks are checked to see if they mostly match any other macroblock on the anchor frame. If they do, the moved macroblock is recorded as a vector (a direction and a distance), as shown in green on the above illustration. Any remaining macroblocks are compressed with DCT. If a video changes drastically from one frame to the next (such as a cut), it is more efficient to encode the new frame as an I-frame.
Next, any B-frames between the two anchor frames are encoded, using both anchor frames as references.
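To make the I-frame and P-frame ideas above concrete, here is a greatly simplified Python sketch. It is illustrative, not a real MPEG encoder: the coefficient-dropping stands in for real quantization, and the macroblock origin and match threshold are assumed values.

```python
# Two MPEG ideas in miniature: (1) an 8x8 block compressed with a 2-D DCT,
# and (2) a P-frame macroblock recorded as a motion vector when it matches
# the anchor frame.
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block: np.ndarray, keep: int = 10) -> np.ndarray:
    """DCT an 8x8 block and zero all but the `keep` largest coefficients."""
    coeffs = dctn(block, norm="ortho")
    threshold = np.sort(np.abs(coeffs), axis=None)[-keep]
    coeffs[np.abs(coeffs) < threshold] = 0  # crude stand-in for quantization
    return coeffs

def decompress_block(coeffs: np.ndarray) -> np.ndarray:
    return idctn(coeffs, norm="ortho")

def find_motion_vector(anchor: np.ndarray, mb: np.ndarray, search: int = 8):
    """Exhaustive search for a 16x16 macroblock match near an assumed
    origin in the anchor frame. Returns (dy, dx) or None."""
    h, w = anchor.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = 32 + dy, 32 + dx          # hypothetical macroblock origin
            if 0 <= y <= h - 16 and 0 <= x <= w - 16:
                candidate = anchor[y:y + 16, x:x + 16]
                if np.abs(candidate - mb).mean() < 1.0:
                    return (dy, dx)          # "same as that other one"
    return None                              # fall back to DCT compression
```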
H.26X standards and their MPEG equivalents:
H.261: MPEG 1
H.262: MPEG 2
H.263: MPEG 4 Part 2
H.264: MPEG 4 Part 10

H.264 is an ITU adoption of MPEG 4 Part 10 (AVC). The ITU and ISO/IEC jointly maintain the standards so they remain identical. It is important to note that not all MPEG 4 video is H.264. Unless it states MPEG 4 Part 10 or AVC, an MPEG 4 encoder may or may not be H.264, depending on the profiles implemented. An H.264 decoder will be backwards compatible across all the MPEG 4 profiles, and will typically be backwards compatible across all the H.26X standards, including H.262 (MPEG-2).

CONTAINERS
A container is a metafile: a file that can store multiple types of data in a structured format. A video container holds all the different files and data the player requires to play the video. This may include one or more audio channels, one or more videos, graphics files, metadata, and data files. Metadata is "data about data": a data structure that organizes information the player may need, including, but not limited to, timing information, window size, which codec(s) are being used, titles, program information, bookmarks, and closed captioning.

Color Model and Color Space
Pixels have only one property: color. The color of a pixel in digital video is represented by a string of bits.
A color model is a mathematical way of representing a color with numbers. The most commonly known color model is RGB, which represents a color by a combination of red, green, and blue values. 24 bit RGB video has 8 bits representing each color. This allows for 256 x 256 x 256 = 16.7 million individual colors. Other bit depths can be used depending on the application. The color model most often used for digital video is Y'Cb'Cr'. The prime ( ' ) is a mathematical symbol showing that the value is non-linear (the sample values are not equally spaced). Y'Cb'Cr' is typically noted as YCbCr, as it will be for the remainder of this document. The term YUV, which is the PAL and SECAM equivalent (and therefore incorrectly used in digital video), is also often used interchangeably. Y is luma (brightness); the simple description of Cb and Cr is Blue minus Y and Red minus Y.
Color space is the real world matchup between the numbers and the colors they represent. Depending on the range of colors needing to be represented, the same color model can represent different color spaces. When video from a computer, which uses a 24 bit RGB model, is compressed with a standard video codec, the color space must be converted to YCbCr before compression. If you compress a video and the colors come out wrong, the likely culprit is color space conversion.
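The conversion described above can be written out directly. This sketch assumes the BT.601 coefficients common in standard definition video and 8 bit (0-255) full-range values; studio-range handling is omitted.

```python
# Sketch of the RGB -> YCbCr conversion: luma as a weighted brightness,
# chroma as scaled blue-difference and red-difference values.
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[float, float, float]:
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma (brightness)
    cb = 128 + 0.564 * (b - y)              # blue-difference chroma (B - Y)
    cr = 128 + 0.713 * (r - y)              # red-difference chroma (R - Y)
    return y, cb, cr

print(rgb_to_ycbcr(255, 255, 255))  # white: (255.0, 128.0, 128.0)
print(rgb_to_ycbcr(255, 0, 0))      # pure red: high Cr, low Cb
```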
Network Transit
TCP and UDP . When to Use Which . Unicast and Multicast
[Figure: Unicast vs. Multicast]
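The unicast/multicast distinction is visible directly at the socket level. Below is a minimal, illustrative Python sender sketch; the addresses and port are example values (192.0.2.x is documentation address space, 239.x.x.x is organization-local multicast space).

```python
# Unicast sends one copy of every packet per receiver; multicast sends one
# copy total and the network replicates it to receivers that joined the group.
import socket

PAYLOAD = b"one packet of AV stream data"

# Unicast: the sender's bandwidth scales with the number of receivers.
uni = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for receiver in ("192.0.2.10", "192.0.2.11"):   # example receivers
    uni.sendto(PAYLOAD, (receiver, 5004))

# Multicast: one transmission to a group address, replicated by the network.
multi = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
multi.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
multi.sendto(PAYLOAD, ("239.1.1.1", 5004))
```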
Each attribute you define will have a different importance based on the application. The charts below offer generalized information on how various technologies compare within chosen attributes. Some of the attributes, such as management, security, and some degree of scalability, will depend more on the specific implementation.
Quality as an attribute is somewhat subjective, since each lossy compression algorithm has different artifacts which affect the quality. Subjective quality is also very dependent on resolution, frame rate, latency, and the bandwidth allowed for the stream. It is up to the reader to provide the priorities and tradeoffs allowed.

Example: A classroom overflow with the ability of the remote classrooms to ask questions verbally.
Within a building or campus:
»»They might prioritize the attributes Latency and Quality, with a very low priority on bandwidth, since the application stays within the local network, which has very high available bandwidth, and choose a proprietary codec using very high bandwidth over multicast.
Between two campuses over the Internet:
»»They might prioritize Bandwidth and Latency and choose a JPEG2000 codec over unicast, since the Internet won't support multicast.
»»They might prioritize Bandwidth and Quality and choose an H.264 codec, since H.264 will give a higher quality at a given bandwidth than JPEG2000, while sacrificing latency.
General Attributes Prioritized (most suitable option first)
Real Time (Audience Local): Transport Stream, Multicast / SDP with RTP, Multicast / Progressive Download, Unicast
Real Time (Audience Local): Multicast, UDP / Unicast, UDP / Unicast, TCP
On Demand: HTTP Live Stream / RTSP with RTP / Progressive Download
High Quality to BYOD: H.264 / JPEG2000 / Proprietary
Diverse Devices: H.264 / JPEG2000 / Proprietary
High Quality Room Overflow: Proprietary / H.264
Set top boxes: Transport Stream
Bandwidth: H.264, other MPEG / JPEG2000 / Proprietary Codec
Latency: Proprietary Codec / JPEG2000 / H.264, other MPEG
Capture for production: JPEG2000 / H.264, other MPEG
Progressive Download: N/A / Fair / RT: poor
»»Requires the full file to be available before serving.
»»Does not require a streaming server, just a file server.
*Dependent on streaming server configuration
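The "just a file server" point above is why progressive download is so simple to deploy: the client fetches an ordinary file over HTTP and can begin playback once enough has arrived. A minimal sketch using the requests library follows; the URL and filename are hypothetical.

```python
# Progressive download: plain HTTP fetch of a complete, pre-recorded file.
import requests

url = "http://example.com/lecture.mp4"   # any plain web/file server works
with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    with open("lecture.mp4", "wb") as f:
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            f.write(chunk)   # a player could begin reading the file here
```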
MPEG codecs (H.264, H.265): Inter-frame compression is most efficient, and later standards are more efficient. Latency: 100ms-600ms or more. At least one frame of latency at encode and at decode; additional fixed latency is added based on Group of Pictures (GOP) size. Each B-frame adds one frame of encode latency, and any number of B-frames adds one frame of decode latency.
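The latency rule in that row can be turned into arithmetic. The formula below is our reading of the text above, not a vendor specification, and ignores network and buffering delay.

```python
# Minimum MPEG codec latency from GOP structure, per the rule above:
# one frame each at encode and decode, plus one encode frame per
# consecutive B-frame, plus one decode frame if any B-frames are used.
def min_codec_latency_ms(fps: float, consecutive_b_frames: int) -> float:
    frame_ms = 1000.0 / fps
    encode_frames = 1 + consecutive_b_frames
    decode_frames = 1 + (1 if consecutive_b_frames > 0 else 0)
    return (encode_frames + decode_frames) * frame_ms

# 30 fps stream with the common IBBP pattern (2 consecutive B-frames):
print(min_codec_latency_ms(30, 2))  # ~166.7 ms before network/buffer delay
```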