US011076153B2

(12) United States Patent
     Pattichis et al.
(10) Patent No.: US 11,076,153 B2
(45) Date of Patent: Jul. 27, 2021

(54) SYSTEM AND METHODS FOR JOINT AND ADAPTIVE CONTROL OF RATE, QUALITY, AND COMPUTATIONAL COMPLEXITY FOR VIDEO CODING AND VIDEO DELIVERY
(71) Applicant: STC.UNM, Albuquerque, NM (US)
(72) Inventors: Marios Stephanou Pattichis, Albuquerque, NM (US); Yuebing Jiang, Santa Clara, CA (US); Cong Zong, Albuquerque, NM (US); Gangadharan Esakki, Albuquerque, NM (US); Venkatesh Jatla, Albuquerque, NM (US); Andreas Panayides, Strovolos (CY)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 15/747,982
(22) PCT Filed: Jul. 31, 2016
(86) PCT No.: PCT/US2016/044942
     § 371 (c)(1), (2) Date: Jan. 26, 2018
(87) PCT Pub. No.: WO2017/023829
     PCT Pub. Date: Feb. 9, 2017
(65) Prior Publication Data
     US 2018/0220133 A1, Aug. 2, 2018
     Related U.S. Application Data
(60) Provisional application No. 62/199,438, filed on Jul. 31, 2015.
(51) Int. Cl.
     H04N 19/127 (2014.01)
     H04N 19/172 (2014.01)
     (Continued)
(52) U.S. Cl.
     CPC: H04N 19/127 (2014.11); H04N 19/119 (2014.11); H04N 19/126 (2014.11); (Continued)
(58) Field of Classification Search
     CPC: G06F 15/177; H04N 19/127; H04N 19/126; H04N 19/149; H04N 19/119; H04N 19/172; H04N 19/147
     (Continued)
(56) References Cited
     U.S. PATENT DOCUMENTS
     6,426,772 B1 * 7/2002 Yoneyama, H04N 19/61, 375/240.02
     8,798,137 B2 * 8/2014 Po, H04N 19/176, 375/240.02
     (Continued)
     OTHER PUBLICATIONS
     Dynamically Reconfigurable Architecture System for Time-varying Image Constraints (DRASTIC) for HEVC Intra Encoding, 2013, IEEE, pp. 1112-1116, by Yuebing Jiang, Gangadharan Esakki, and Marios Pattichis. *
     (Continued)
Primary Examiner: Tung T Vo
(74) Attorney, Agent, or Firm: Valauskas Corder LLC

(57) ABSTRACT
System and methods for the joint control of reconstructed video quality, computational complexity and compression rate for intra-mode and inter-mode video encoding in HEVC. The invention provides effective methods for (i) generating a Pareto front for intra-coding by varying CTU parameters and the QP, (ii) generating a Pareto front for inter-coding by varying GOP configurations and the QP, (iii) real-time and offline Pareto model front estimation using regression methods, (iv) determining the optimal encoding configurations based on the Pareto model by root finding and local search, and (v) robust adaptation of the constraints and model updates at both the CTU and GOP levels.

18 Claims, 14 Drawing Sheets
( 51 ) Int . Cl.
H04N 19/149
H04N 19/119
H04N 19/126
4/2015 Saxena
2015/0172661 A1 *
6/2015 Dong
2015/0215631 A1 *
7/2015 Zhou
2015/0271531 A1 *
9/2015 Wen
2015/0326883 A1 * 11/2015 Rosewarne
( 2014.01 )
( 2014.01 )
( 2014.01 )
( 2014.01 )
2015/0373328 A1 * 12/2015 Yenneti
H04N 19/147
(52) U.S. CI.
CPC
H04N 19/147 ( 2014.11 ) ; H04N 19/149
(2014.11 ) ; H04N 19/172 (2014.11 )
( 58 ) Field of Classification Search
USPC
....... 375 / 240.15 , 240.03
See application file for complete search history.
2016/0050422 A1 *
2/2016 Rosewarne
2016/0088298 Al *
3/2016 Zhang
2016/0094855 A1 *
3/2016 Zhou
2016/0127733 A1 *
5/2016 Wan
References Cited
6/2016 Ugur
2016/0173875 A1 * 6/2016 Zhang
U.S. PATENT DOCUMENTS
2016/0295217 A1 * 10/2016 Suzuki
( 56 )
B2 * 1/2015 Yang
8/2015 Pattichis et al .
B2
1/2017 Pattichis et al .
B2
A1 * 6/2006 Chang
HO4N 19/172
2009/0175330 A1 *
7/2009 Chen
HO4N 19/115
2011/0164677 A1 *
7/2011 Lu
375 / 240.01
HO4N 19/176
2011/0235928 A1 *
9/2011 Strom
HO4N 19/115
382/233
8,934,538
9,111,059
9,542,198
2006/0133480
2015/0110181 A1 *
HO4N 19/15
375 / 240.02
2016/0314603
2016/0316215
2017/0013261
2018/0184089
375 / 240.02
H04B 7/15592
2012/0287987 A1 * 11/2012 Budagavi
2013/0094565 A1 * 4/2013 Yang
2013/0129241 A1 * 5/2013 Wang
HO4N 19/587
370/246
375 / 240.02
HO4N 19/105
375 / 240.02
HO4N 19/19
382/233
HO4N 19/172
375 / 240.03
2014/0016693 A1 *
1/2014 Zhang
2014/0161177 Al *
6/2014 Sim
2014/0192862 A1 *
7/2014 Flynn
2015/0030068 A1 *
1/2015 Sato
2015/0049805 A1 *
2/2015 Zhou
HO4N 19/176
2015/0092840 A1 *
4/2015 Mochizuki
HO4N 19/593
2015/0103892 A1 *
4/2015 Zhou
HO4N 19/117
375 / 240.03
HO4N 21/23439
375 / 240.02
HO4N 19/70
375 / 240.03
H04N 19/117
375 / 240.03
HO4N 19/70
A1 10/2016 Carranza et al .
A1 * 10/2016 Minoo
A1 * 1/2017 Lin
A1 * 6/2018 Zhang
HO4N 19/52
375 /240.16
HO4N 19/119
375 /240.18
HO4N 19/12
375 /240.03
HO4N 19/105
375 /240.12
HO4N 19/146
375 /240.03
HO4N 19/176
375 /240.12
HO4N 19/127
375 /240.02
HO4N 19/33
375 /240.08
HO4N 19/436
375 /240.03
H04N 19/70
HO4N 19/132
H04N 19/176
HO4N 19/146
OTHER PUBLICATIONS
375 / 240.03
2011/0305144 A1 * 12/2011 Sethakaset
2013/0272383 A1 * 10/2013 Xu
2016/0156917 A1 *
HO4N 19/176
375 /240.12
HO4N 19/51
375 /240.03
HO4N 19/436
375 /240.02
375/240.03
375/240.03
375/240.03
Overview of the High Efficiency Video Coding (HEVC) Standard, Gary J. Sullivan, Fellow, IEEE, Jens-Rainer Ohm, Member, IEEE, Woo Jin Han, Member, IEEE, and Thomas Wiegand, Fellow, IEEE, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, No. 12, Dec. 2012. *
Dynamic Switching of GOP Configurations in High Efficiency Video Coding (HEVC) using Relational Databases for Multi-objective Optimization, Gangadharan Esakki, Sep. 12, 2014. *
Gangadharan Esakki et al., "Dynamic Switching of GOP Configurations in High Efficiency Video Coding (HEVC) using Relational Databases for Multi-objective Optimization." The University of New Mexico, 2014. website http://digitalrepository.unm.edu/ece_etds/80.
Jiang et al., "Dynamically reconfigurable architecture system for time-varying image constraints (drastic) for hevc intra encoding", Asilomar Conference on Signals Systems and Computers, pp. 1112-1116, Nov. 2013.
Jiang et al., "Dynamically reconfigurable DCT architectures based on bitrate power and image quality considerations", 19th IEEE International Conference on Image Processing (ICIP), pp. 2465-2468, 2012.
Jiang et al., "Dynamically reconfigurable architecture system for time-varying image constraints (drastic) for motion jpeg", J Real-Time Image Proc (2018) 14: 395. https://doi.org/10.1007/s11554-014-0460-8.
* cited by examiner
U.S. Patent    Jul. 27, 2021    Sheet 1 of 14    US 11,076,153 B2

[Drawing: block diagram. Labeled elements: DRASTIC Controller 102 with Time 102A, Rate 102B, and Quality 102C; Input Frame; CTU Split; CU Split; TU; Intra Prediction; Tran.; Quant.; Entropy coding; inv. Quant.; inv. Tran.; Intra Recon.; DBF; SAO; Decoded Picture Buffer. Reference numerals 100, 104, 106, and 108 also appear.]
FIG. 1
[Drawing: CTU partition diagram. Legible labels: CU Size (64, 32, 16, 8); proc. id (ranges 0-0, 5-20, 21-84, 85-212); points "A" and "B"; region 1; region 2.]
FIG. 2

[Drawing: three-dimensional plot. Axis labels are only partially legible; they include time and bits.]
FIG. 3

[Drawing: model update from three neighboring CTUs labeled (psnr1, bits1, time1), (psnr2, bits2, time2), and (psnr3, bits3, time3).]
FIG. 4
Estimate budgets for T, Q, R for all CTUs.
Estimate QP and Config using initial model.
▷ Encode frame by iterating through the CTUs.
for each CTU in current frame do
    ▷ Robust allocation of T, Q, R within available budgets.
    Allocate T, Q, R based on available budgets.
    Update remaining budgets for T, Q, R.
    if any remaining budget < 0 then
        ▷ Adjust budget to minimize the violation.
        Reallocate CTU budgets using a fraction of the remaining total frame budget.
    end if
    ▷ Robust model update.
    Update model using three neighboring CTUs.
    if model update failed then
        Update model with neighboring CTU model that gave best prediction.
    end if
    cont...
FIG. 5A
    cont...
    ▷ Robust parameter estimation and optimization.
    Estimate QP and Config based on the model.
    Solve optimization problem using local search.
    if either QP or Config is out of range then
        ▷ Update constraints and fix encodings.
        Update constraints and estimate new estimates of QP and Config.
        Constrain QP to be within ±[illegible] of neighboring CTUs.
        Enforce QP and Config within valid ranges.
    end if
    ▷ Encode CTU and store encoding parameters.
    Encode CTU using QP and Config.
    Compute T, Q, R for current CTU.
    Save QP, Config, T, Q, R and CTU location for model updates.
end for
FIG. 5B
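
The QP safeguards named in FIG. 5B (keeping the QP close to the neighboring CTUs and inside the valid range) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `constrain_qp`, the `max_delta` default, and the choice to center the allowed window on the mean neighbor QP are all hypothetical. The 0 to 51 range is the standard HEVC QP range for 8-bit video.

```python
def constrain_qp(qp_estimate, neighbor_qps, max_delta=4, qp_min=0, qp_max=51):
    """Clamp a model-estimated QP (hypothetical helper, not from the patent).

    First keep the QP within +/- max_delta of the mean QP of the neighboring
    CTUs (an assumed interpretation), then enforce the valid QP range.
    """
    qp = float(qp_estimate)
    if neighbor_qps:
        center = sum(neighbor_qps) / len(neighbor_qps)
        qp = min(max(qp, center - max_delta), center + max_delta)
    # Enforce the valid range last so the result is always encodable.
    qp = min(max(qp, qp_min), qp_max)
    return int(round(qp))
```

For instance, an estimate of 40 with neighbor QPs of 30, 32, and 31 is pulled back to the top of the window around 31.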
[Drawing: grid of CTUs with blocks labeled "Appended CTU" along the first row and the first column.]
FIG. 6

[Drawing: model update from neighboring CTUs labeled (psnr2, bits2, time2) and (psnr3, bits3, time3).]
FIG. 7
Mode                  Objective (minimum)
Minimum Rate          norm(abs(MSEest - MSEtarget)) + norm(abs(Timeest - Timetarget))
Minimum Complexity    norm(abs(MSEest - MSEtarget)) + norm(abs(BPSest - BPStarget))
Maximum Quality       norm(abs(Timeest - Timetarget)) + norm(abs(BPSest - BPStarget))
FIG. 8
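
A small sketch of how the constraint-violation objectives in FIG. 8 could be evaluated. It assumes that `norm` means dividing each absolute deviation by its target so that the terms are dimensionless; the figure does not define the normalization, and every name below is illustrative rather than taken from the patent.

```python
def normalized_violation(estimate, target):
    """Absolute deviation from the target, scaled by the target (assumed norm)."""
    return abs(estimate - target) / abs(target)

# Which two quantities are constrained in each mode, following FIG. 8.
OBJECTIVE_TERMS = {
    "minimum_rate": ("mse", "time"),
    "minimum_complexity": ("mse", "bps"),
    "maximum_quality": ("time", "bps"),
}

def violation_objective(mode, estimates, targets):
    """Sum of the normalized deviations of the two constrained quantities."""
    return sum(
        normalized_violation(estimates[key], targets[key])
        for key in OBJECTIVE_TERMS[mode]
    )
```

A candidate configuration with a smaller value of `violation_objective` tracks the two constrained targets more closely.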
▷ Use CTU SSE and times T to estimate a, b.
if (SSEtop != SSEleft) and (Ttop != Tleft) then
    b = log(Ttop / Tleft) / log(SSEtop / SSEleft)
    a = Ttop / SSEtop^b
end if
FIG. 9
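
The update in FIG. 9 amounts to fitting a two-parameter power law through the measurements of the top and left neighboring CTUs. The sketch below assumes the model has the form y = a * x^b (here x is SSE and y is encoding time); the function names are hypothetical, and the positivity of all measurements is an assumption needed for the logarithms.

```python
import math

def fit_power_law(x_top, y_top, x_left, y_left):
    """Fit y = a * x**b through two neighbor measurements.

    Returns (a, b), or None when the two points coincide in x or y, which
    mirrors the guard in the figure. All inputs are assumed positive.
    """
    if x_top == x_left or y_top == y_left:
        return None
    b = math.log(y_top / y_left) / math.log(x_top / x_left)
    a = y_top / x_top ** b
    return a, b

def predict(a, b, x):
    """Evaluate the fitted model: y = a * x**b."""
    return a * x ** b

def invert(a, b, y):
    """Solve the fitted model for x: x = (y / a)**(1 / b)."""
    return (y / a) ** (1.0 / b)
```

The same two helpers, `predict` and `invert`, correspond to the two directions in which a target can be moved onto the fitted curve in the constraint updates.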
▷ Estimate ratios associated with current CTU.
Tused ← T / Ttarget,i
SSEused ← SSE / SSEtarget,i
if (SSEused > 1) and (Tused > 1) then
    ▷ Above the target.
    if (SSEused < Tused) then
        ▷ Reduce time to meet the curve.
        Ttarget,i = a · Qtarget^b
    else
        ▷ Reduce SSE to meet the curve.
        Qtarget = (Ttarget / a)^(1/b)
    end if
else
    ▷ Below the target.
    if (SSEused > Tused) then
        ▷ Increase time to meet the curve.
        Ttarget = a · Qtarget^b
    else
        ▷ Increase SSE to meet the curve.
        Qtarget = (Ttarget / a)^(1/b)
    end if
end if
FIG. 10
▷ Use CTU SSE and bitrates R to estimate a, b.
if (SSEtop != SSEleft) and (Rtop != Rleft) then
    b = log(SSEtop / SSEleft) / log(Rtop / Rleft)
    a = SSEtop / Rtop^b
end if
FIG. 11
Rused ← R / Rtarget,i
SSEused ← SSE / SSEtarget,i
if (Rused > 1) and (SSEused > 1) then
    if (SSEused < Rused) then
        Rtarget = (Qtarget / a)^(1/b)
    else
        Qtarget = a · Rtarget^b
    end if
else
    if (SSEused > Rused) then
        Rtarget = (Qtarget / a)^(1/b)
    else
        Qtarget = a · Rtarget^b
    end if
end if
FIG. 12
▷ Use CTU encoding times T and rates R
▷ to estimate a, b for the model.
if (Ttop != Tleft) and (Rtop != Rleft) then
    b = log(Ttop / Tleft) / log(Rtop / Rleft)
    a = Ttop / Rtop^b
end if
FIG. 13
Tused ← T / Ttarget,i
Rused ← R / Rtarget,i
if (Rused > 1) and (Tused > 1) then
    if (Tused < Rused) then
        Ttarget = a · Rtarget^b
    else
        Rtarget = (Ttarget / a)^(1/b)
    end if
else
    if (Tused > Rused) then
        Rtarget = (Ttarget / a)^(1/b)
    else
        Ttarget = a · Rtarget^b
    end if
end if
FIG. 14
[Drawing: plot. Legible axis labels include BPS (Mbps), time (s), frame index, and config.]
FIG. 15

[Drawing: plot. Legible axis labels include BPS (Mbps); the remaining labels are not recoverable.]
FIG. 16
[Drawing: flow chart. Blocks: Videos; Configurations; Encode Videos; Linear Regression; Forward Models (Quality, Bitrate, Encoding Time). Reference numerals 200, 202, 204, 206, and 208.]
FIG. 17
function VideoEncodingAndForwardModels
    Input: input videos Vd, configuration files Cnf, parameter values Prmval.
    Output: equations of forward models Fwdeq.
    for (each Video in Vd)
        for (each Configuration in Cnf)
            Encode video and extract parameter values in terms of QP, SSIM, Frame Rate and Bitrate and store in Prmval.
        end for
    end for
    for (SSIM / Frame Rate / Bitrate in Prmval)
        Train and validate individual forward regression models for SSIM, Frame Rate and Bitrate. For each model, create an equation and store it in Fwdeq.
    end for
end function
FIG. 18
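
The second loop of FIG. 18 (training one forward model per metric) can be illustrated with ordinary least squares. The sketch assumes each forward model is first order in QP, which the figure does not state; the functions `fit_linear` and `build_forward_models` and the sample layout (a list of dictionaries with `qp`, `ssim`, `frame_rate`, and `bitrate` keys) are hypothetical, and validation is omitted for brevity.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def build_forward_models(samples):
    """Fit one (intercept, slope) model in QP for each measured metric.

    samples: list of dicts with keys 'qp', 'ssim', 'frame_rate', 'bitrate',
    one per encoded (video, configuration) pair.
    """
    qps = [s["qp"] for s in samples]
    return {
        metric: fit_linear(qps, [s[metric] for s in samples])
        for metric in ("ssim", "frame_rate", "bitrate")
    }
```

In practice a separate set of models would be built per configuration, and a higher-order or cross-validated model could replace `fit_linear` without changing the surrounding structure.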
[Drawing: flow chart. Legible labels: Forward Model; Time-Varying (Real-Time) Constraints; GOP Level Adaptation; Select Optimal Configuration. Reference numerals 300, 302, 303, 304, 304A, 304B, 304C, 306, 308, 310, and 312.]
FIG. 19
function AdaptiveEncoding
    Input: equations of forward models Fwdeq, parameter values Prmval, group of pictures GOP, time-varying constraints Tvc, parameter constraints Prmcns, forward model predictions Fwdmdpd.
    Output: new encoding parameters Nencpm.
    # Time-varying constraints initialization
    Initialize SSIM constraint in Tvc
    Initialize Frame Rate constraint in Tvc
    Initialize Bitrate constraint in Tvc
    for (each Gop in GOP)
        if (current Tvc != previous Tvc) then
            Create Prmcns for Tvc
        else
            Use empty current setting
        end if
        for (each ForwardModel in Fwdeq)
            Input Prmcns in each ForwardModel.
            Use Newton's algorithm to predict QP values as Fwdmdpd and create initial candidate configurations.
            for (each ForwardModelPrediction in Fwdmdpd)
                Create inverse model equation and optimal solution and predict quantization parameter value that meets SSIM, Frame Rate and Bitrate constraints. These are the final candidate configurations.
            end for
        end for
        Select optimal configuration and store values to Nencpm.
    end for
end function
FIG. 20
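
The root-finding step named in FIG. 20 (using Newton's algorithm to predict a QP that meets a constraint) can be sketched as follows. The sketch assumes a quadratic forward model in QP with hypothetical coefficients; the function names, the starting point, and the final rounding to an integer QP in the 0 to 51 range are illustrative choices, not details taken from the patent.

```python
def newton_solve(f, df, x0, tol=1e-6, max_iter=50):
    """Newton's method: find x with |f(x)| < tol, starting from x0."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError("zero derivative during Newton iteration")
        x -= fx / slope
    raise RuntimeError("Newton iteration did not converge")

def qp_for_target(coeffs, target, qp0=30.0):
    """Invert metric(qp) = c0 + c1*qp + c2*qp**2 for a target metric value."""
    c0, c1, c2 = coeffs

    def f(qp):
        return c0 + c1 * qp + c2 * qp * qp - target

    def df(qp):
        return c1 + 2.0 * c2 * qp

    return newton_solve(f, df, qp0)

def nearest_valid_qp(qp, qp_min=0, qp_max=51):
    """Relax a real-valued QP to the nearest valid integer QP."""
    return int(min(max(round(qp), qp_min), qp_max))
```

Because the encoder only accepts integer QPs, the real-valued root is used as a starting point and then relaxed to a valid integer, after which the candidate can be checked against the remaining constraints.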
SYSTEM AND METHODS FOR JOINT AND ADAPTIVE CONTROL OF RATE, QUALITY, AND COMPUTATIONAL COMPLEXITY FOR VIDEO CODING AND VIDEO DELIVERY

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application No. 62/199,438 filed Jul. 31, 2015, incorporated by reference.

FEDERALLY-SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under CNS1422031 awarded by the National Science Foundation (NSF). The government has certain rights in the invention.

FIELD OF THE INVENTION

The invention relates generally to computer software for video communications. More specifically, the invention relates to image processing, intra-mode video encoding, and inter-mode video encoding that is compatible with the high-efficiency video coding (HEVC) standard.

The following patent applications are incorporated by reference: U.S. patent application Ser. No. 14/069,822 filed Nov. 1, 2013, now U.S. Pat. No. 9,111,059; U.S. patent application Ser. No. 14/791,627 filed Jul. 6, 2015; and International Patent Application PCT/US14/70371 filed Dec. 15, 2014, now U.S. patent application Ser. No. 15/103,977.

BACKGROUND OF THE INVENTION

Computer systems include hardware and software. Hardware includes the physical components that make up a computer system. Software includes programs and related data that provide the instructions for telling computer hardware what to do and how to do it.

Computer system hardware includes a processor that permits access to a collection of computing resources and components that can be invoked to instantiate a machine, process, or other resource for a limited or defined duration. A processor may be a special-purpose or general-purpose digital signal processor configured to carry out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions. Specifically, a processor, or central processing unit (CPU), includes a processing unit and a control unit (CU). Most modern CPUs are microprocessors contained on a single integrated circuit (IC) chip. A computer system also includes a non-transitory computer-readable storage medium such as a main memory, for example random access memory (RAM).

Computer systems may include any device that implements the methods according to the invention, for example as computer code. Computer systems may include, for example, a traditional computer, portable computer, handheld device, mobile phone, personal digital assistant, smart hand-held computing device, cellular telephone, laptop or netbook computer, hand-held console or MP3 player, tablet, or similar hand-held computer device, such as an iPad® or iPhone®, and embedded devices or those that contain a special-purpose computing system.

The design of most video coding standards is primarily aimed at having the highest compression efficiency, or ability to encode video at the lowest possible bit rate while maintaining a certain level of video quality. High-efficiency video coding (HEVC), also known as H.265, is a video compression standard that has provided substantial improvements to video compression. Compared to H.264, HEVC aims at a 50% bit rate reduction at equivalent video quality levels. Unfortunately, bitrate performance improvements come at a substantial increase in computational complexity.

HEVC benefits from the use of larger coding tree unit (CTU) sizes to increase coding efficiency while also reducing decoding time. HEVC also uses other coding tools. These coding tools include context-adaptive binary arithmetic coding (CABAC) as the only entropy encoder method, transform units (TUs) to code the prediction residual, recursive coding, complex intra-prediction modes and asymmetric inter prediction unit division. In addition, two loop filters are applied sequentially, with the deblocking filter (DBF) applied first and the sample adaptive offset (SAO) filter applied afterwards.

At a higher level, for inter encoding, HEVC relies on the use of Group Of Pictures (GOP) configurations to achieve different levels of performance. Video encoding efficiency depends heavily on the GOP configurations.

There has been strong research interest in reducing HEVC encoding complexity for both inter- and intra-coding. Inter-coding compresses pictures based on their GOP configuration. Intra-coding compresses each picture independent of the other. For reducing the computational complexity for inter-coding, for example, the use of different configuration modes has been introduced. Methods that have been used for reducing the computational complexity for intra-coding include the use of a rough mode set (RMS), gradient based intra-prediction, and coding unit (CU) depth control. Unfortunately, these prior approaches did not take into account that video compression requirements can jointly vary with network conditions, energy/power constraints, or varying expectations of video quality. Thus, it is not sufficient to reduce computational complexity without considering the implications on bitrate and video quality.

Although HEVC is considered a high-efficiency codec, there is a need to jointly control bitrate, video quality, and computational complexity for both intra-coding and inter-coding. The invention satisfies this demand.

SUMMARY OF THE INVENTION

The invention is directed to adaptive methods that can adjust video compression parameters and jointly control computational complexity, image quality, and bandwidth (or bitrate). The system and methods simultaneously minimize computational complexity, maximize image quality, and minimize bandwidth subject to constraints on available energy/power, bandwidth, and the minimum level of acceptable video quality. The proposed system and methods extend the previously filed patent applications that are cited above by providing effective methods for: (i) generating a Pareto front for intra-coding by varying CTU parameters and the QP, (ii) generating a Pareto front for inter-coding by varying GOP configurations and the QP, (iii) real-time and offline Pareto model front estimation using regression methods, (iv) determining the optimal encoding configurations based on the Pareto model by root finding and local search, and (v) robust adaptation of the constraints and model updates at both the CTU and GOP levels. The system and methods
apply to both intra-coding (each picture is compressed independently of the others) and inter-coding (pictures are compressed in groups).

Advantageously, the system and methods of the invention can be applied to both intra-coding and inter-coding for the high-efficiency video coding (HEVC), previous, and future video encoding standards.

The invention designs methods that can solve $\min_{c \in C} (T, R, -Q)$ with T representing encoding time per frame, R representing the number of bits per sample, C representing the set of all possible video encoding configurations, and Q representing a measure of video quality (e.g., PSNR or average SSIM). The negative sign expresses that quality is maximized (and hence $-Q$ is minimized). The multi-objective surface of solutions that satisfy $\min_{c \in C} (T, R, -Q)$ forms the Pareto front. The invention describes optimization methods that select encoding configurations $c \in C$ that produce points on the Pareto front.

The invention uses a controller embedded in software to handle the optimization process. The controller is provided with measurements of encoding time, rate, image quality and constraints (e.g., available network bandwidth, available battery energy, user determined quality). For intra-coding, the controller dynamically adjusts CTU configurations and the quantization parameter (QP). For inter-coding, the controller dynamically adjusts the GOP configurations and the QP. The dynamic control is used to realize the optimization modes listed above in the approved patent application.

The invention provides constraint optimization solutions to the minimum computational complexity mode, the maximum quality mode, and the minimum bitrate mode. For example, video quality may be related to application-modality level adaptation, bitrate demands may be related to wireless network adaptation, and encoding frame rate may relate to device adaptation for real-time operation. For each mode, one of the objectives (e.g., computational complexity, quality, or bitrate) is optimized, while suitable constraints are placed on the other two. For example, for the minimum computational complexity mode, the invention minimizes computational complexity of HEVC subject to constraints in bitrate and reconstruction quality. The constraint-optimization approach provides an extension to the use of bit constrained rate-distortion optimization by also minimizing or constraining computational complexity. Overall, the invention provides joint control of reconstructed video quality, computational complexity, and compression rate.

For intra-mode HEVC encoding, the approach uses a configuration parameter that controls the partitioning of the coding tree unit (CTU) so as to provide for finer control of the encoding process. By jointly sampling the quantization parameter (QP) and the CTU configuration mode, the approach generates a finely-sampled, Pareto-optimal, rate-quality-performance surface.

The quantization parameter (QP) and a quad-tree-depth oriented coding tree unit (CTU) configuration are adaptively controlled to deliver performance that is optimal in the complexity-rate-quality performance space. The invention employs a spatially adaptive model that uses neighboring configurations to estimate optimal values for QP and the coding tree unit (CTU) configuration. More specifically, the invention provides a robust, spatially-adaptive control algorithm for solving the minimum bitrate, maximum quality, and minimum computational complexity optimization problems.

One object of the invention is hierarchical coding unit (CU) partitioning for fine, joint control of rate-quality-performance. Intra-encoding control is achieved by controlling the minimum size of the coding unit (CU). The minimum size encoding parameter is used to ensure hierarchical partitioning. An increase in the minimum code size always results in better coding performance since there are more choices. Thus, increasing the minimum code size increases quality, computational complexity, and bitrate. Similarly, decreasing the minimum code size decreases quality, computational complexity, and bitrate.

Another object of the invention is static and dynamic control of rate-quality-performance. According to the invention, the rate-quality-performance surface depends on the minimum coding size and QP, and the model is used to implement the minimum bitrate, maximum quality, and maximum performance modes. The approach also allows dynamic switching between modes. For example, on an HEVC standard test video, dynamic reconfiguration between low, medium and high profiles met the constraints at rates of 93% (low), 83% (medium), and 93% (high), while delivering encoding time savings of 13%, 49% and 40%, respectively.

The invention uses cross-validated regression to quickly build optimal models since thousands of possibilities do not need to be evaluated. A root finding algorithm is used to solve for the optimal values. These solutions are used by a relaxation procedure to find actual, integer-based, software parameters.

The invention also applies to inter-mode HEVC encoding. For inter-mode HEVC encoding, encoding efficiency depends heavily on the GOP configurations. Initially, for inter-mode HEVC encoding, the approach generates Pareto front models using an offline process. These models are used to adapt to time-varying constraints during real-time operation. Thus, an advantage of the invention is an offline process of video encoding including forward model creation, and another advantage is the real-time adaptation to time-varying constraints (for example, the state of a wireless network) to guarantee acceptable performance throughout a streaming session. Yet another advantage is the adaptation to constraints of modes (maximum video quality, minimum bitrate, maximum frame rate) on a GOP basis.

The invention and its attributes and advantages may be further understood and appreciated with reference to the detailed description below of one contemplated embodiment, taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The preferred embodiments of the invention will be described in conjunction with the appended drawings provided to illustrate and not to limit the invention. FIGS. 1-16 are directed to intra-coding and FIGS. 17-20 are directed to inter-coding, where like designations denote like elements, and in which:
FIG. 1 is a block diagram of the intra-coding system and methods of the invention.
FIG. 2 illustrates the CTU partition control based on the config parameter according to the invention.
FIG. 3 is a plot diagram of a rate-distortion-complexity performance example for intra-coding according to the invention.
FIG. 4 illustrates a model update using 3 neighboring CTUs according to the invention.
FIG. 5A and FIG. 5B illustrate pseudo code of a common framework for intra-coding mode implementation according to the invention.
FIG. 6 illustrates a model update for the first row and the first column according to the invention.
5
US 11,076,153 B2
6
FIG . 7 illustrates a performance constraint model update
important than the other. However, while allocating more
using neighbor CTUs according to the invention .
resources to , for example, performance, the system strives to
FIG . 8 illustrates a table of constraint violation objectives maintain optimal energy , power, and accuracy at the highest
level without taking away from performance resources . As
according to the invention .
example, digital video processing requires significant
FIG . 9 illustrates pseudo code of the time- quality rela- 5 an
hardware
resources to achieve acceptable performance.
tionship model update for minimum bitrate mode for intra
The invention is directed to a system and methods for
coding according to the invention .
of software parameters for various
FIG . 10 illustrates pseudo code of the constraint updates dynamic reconfiguration
such as digital signal, image , and video . For
for minimum bitrate mode for intra -coding according to the 10 applications
applications such as digital signal, image , and video , con
invention .
FIG . 11 illustrates pseudo code of the quality -rate rela
tionship model update for minimum computational com
plexity mode according to the invention .
FIG . 12 illustrates pseudo code of the constraint update 15
for minimum computational complexity mode according to
the invention .
FIG . 13 illustrates pseudo code of the time- rate relationship model update for maximum quality (minimum distortion mode) according to the invention .
20
FIG . 14 illustrates pseudo code of the constraint update
for the minimum distortion mode according to the invention .
FIG . 15 illustrates a graph of the results of current
methods of only using fixed CTU configuration while varying the QP only that cannot be used to achieve real - time 25
control of rate - complexity - quality.
FIG . 16 illustrates a graph of the results using optimal QP
and CTU configuration to achieve optimal and real - time
control of rate -complexity -quality for intra -coding accord
ing to the invention.
FIG . 17 illustrates a flow chart of an offline process of 30
video encoding and forward model creation for inter -coding
according to the invention .
FIG. 18 illustrates pseudo code of the offline process of video encoding and forward model creation for inter-coding according to the invention.

FIG. 19 illustrates a flow chart of a real-time adaptation using time-varying constraints for inter-coding according to the invention.

FIG. 20 illustrates pseudo code of the real-time adaptation using time-varying constraints for inter-coding according to the invention.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

The following patent applications are incorporated by reference: U.S. patent application Ser. No. 14/069,822 filed Nov. 1, 2013, now U.S. Pat. No. 9,111,059; U.S. patent application Ser. No. 14/791,627 filed Jul. 6, 2015; and International Patent Application PCT/US14/70371 filed Dec. 15, 2014, now U.S. patent application Ser. No. 15/103,977.

Dynamically reconfigurable frameworks offer unique advantages over non-dynamic systems. Dynamic adaptation provides the ability to adapt software and hardware resources to meet real-time varying requirements.

Embodiments of the invention include a system and methods for improving resource management in embedded computer systems. The managed resources (or objectives) may be directed to constraints. The term constraint is also referred to as real-time constraint or time-varying constraint. Time-varying constraints include, for example, constraints on the supplied power, required performance, accuracy levels, available bandwidth, and quality of output such as image reconstruction. It is contemplated that constraints can be generated by a user, by the system, or by data inputs.

During operation of a computer system, various states may exist in which one or more of the constraints is more [...] constraints may include, for example, dynamic power/energy consumption, performance, accuracy, bitrate, and quality of output or image reconstruction quality.

An optimal approach for jointly controlling rate-quality-complexity for both intra-mode and inter mode is provided.

According to the invention, an effective control mechanism model dynamically adjusts the quantization parameter (QP) and the coding tree unit (CTU) partition mechanism so as to achieve variable constraints on bitrate and video quality. The model is dynamically updated based on the input video.

More specifically, the invention provides a new, efficient implementation of the minimum computational complexity mode, maximum image quality mode, and the minimum bitrate mode. For all of the modes, video encoding configurations are specified so that they produce $\min_{c \in C} (T, R, -Q)$ with T representing encoding time per frame, R representing the number of bits per sample, C representing the set of all possible video encoding configurations, and Q representing a measure of video quality.

In order to jointly control T, R and Q, bounds can be provided on each one of them. For improving performance and guaranteeing computations within specific time limits, $T_{max}$ denotes an upper bound on the encoding time. Similarly, for communicating within a specific bandwidth, $R_{max}$ denotes an upper bound on the available bits per pixel. Then, to guarantee a minimum level of quality, $Q_{min}$ denotes a lower bound on the encoded video quality. Thus, in general, it is desired to encode configurations that jointly satisfy:

$$(R \le R_{max}) \;\&\; (T \le T_{max}) \;\&\; (Q \ge Q_{min}).$$

The following optimization modes are considered: maximum performance mode, minimum rate mode, maximum quality mode.

The maximum performance mode provides the best computational performance by minimizing encoding time. An acceptable, optimal encoding configuration is obtained by solving:

$$\min_{c \in C} T \quad \text{subject to: } (Q \ge Q_{min}) \;\&\; (R \le R_{max})$$
Equation (1.1)

The minimum rate mode reduces bitrate requirements without sacrificing quality or slowing down encoding time to an unacceptable level. The optimal configuration requires the solution of:

$$\min_{c \in C} R \quad \text{subject to: } (Q \ge Q_{min}) \;\&\; (T \le T_{max})$$
Equation (1.2)

The maximum quality mode provides the best possible quality without exceeding bitrate or computational requirements. The optimal encoding is selected by solving:

$$\max_{c \in C} Q \quad \text{subject to: } (T \le T_{max}) \;\&\; (R \le R_{max})$$
Equation (1.3)
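The three modes of Equations (1.1)-(1.3) can be illustrated with a small sketch that picks one configuration from a finite list of already measured candidates. This is only an illustration under stated assumptions: the `Candidate` record, the `select` function, and the mode names are hypothetical and do not appear in the disclosure, and quality is assumed to be a "higher is better" measure such as PSNR.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of one measured encoding configuration."""
    name: str
    time: float     # T: encoding time per frame
    rate: float     # R: bits per sample
    quality: float  # Q: quality measure, higher is better (assumption)


def select(cands: Iterable[Candidate], mode: str,
           t_max: Optional[float] = None,
           r_max: Optional[float] = None,
           q_min: Optional[float] = None) -> Optional[Candidate]:
    """Return the best feasible candidate for the given mode, or None."""
    def feasible(c: Candidate) -> bool:
        # A bound left as None is simply not enforced.
        return ((t_max is None or c.time <= t_max)
                and (r_max is None or c.rate <= r_max)
                and (q_min is None or c.quality >= q_min))

    ok = [c for c in cands if feasible(c)]
    if not ok:
        return None
    if mode == "max_performance":   # Equation (1.1): minimize T
        return min(ok, key=lambda c: c.time)
    if mode == "min_rate":          # Equation (1.2): minimize R
        return min(ok, key=lambda c: c.rate)
    if mode == "max_quality":       # Equation (1.3): maximize Q
        return max(ok, key=lambda c: c.quality)
    raise ValueError(f"unknown mode: {mode}")


cands = [
    Candidate("a", time=10.0, rate=0.5, quality=38.0),
    Candidate("b", time=6.0, rate=0.8, quality=36.0),
    Candidate("c", time=4.0, rate=1.2, quality=33.0),
]
fastest = select(cands, "max_performance", r_max=1.0, q_min=35.0)
```

An exhaustive scan like this is only practical for small candidate sets; the description relies on models of the time-rate-quality surface instead of measuring every configuration.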
An advantage of the invention is that the modes given by Equations (1.1)-(1.3) can be used to describe a large number of different, practical scenarios. For example, for video streaming applications, $T_{max}$ can be set to $T_{max} = 1/fps$, where fps denotes the number of frames per second at which the video is generated. As another example, adapting to a time-varying communications channel may be achieved by setting $R_{max}$ to the time-varying, available bandwidth.

An advantage of the invention includes the development of a control mechanism that solves the optimization problem given in Equation (1) for HEVC intra-encoding based on the Coding Tree Unit (CTU) level. Another advantage of the invention includes the effective implementation of the control mechanism using CTU performance models.

FIG. 1 is a block diagram of the system and methods of the intra-coding optimization process 100 according to the invention. A Dynamically Reconfigurable Architecture System for Time-varying Image Constraints (DRASTIC) controller or processor 102 is provided with measurements of encoding time 102A, rate 102B, and image quality 102C that the processor 102 uses to select methods for splitting the coding units (CU) 104 and transform units (TU) 106 and to set the quantization parameter (QP) 108 for the next incoming frame.

Optimal configuration management is based on scalable parametrization. The optimal configuration is based on a quantization parameter (QP) and a scalable parametrization of the CU tree based on config. It is noted that QP affects encoding time since larger QP values result in smaller bitrates, lower quality, and lower encoding times since there are fewer coefficients to encode. On the other hand, config is used for controlling the search space for specifying the coding unit sizes.

FIG. 2 illustrates a figure of Scalable Coding Tree Unit (CTU) partitioning following a breadth-first-search splitting pattern. Each block is recursively partitioned into four sub-blocks using a quadtree decomposition. The case of config = 6 is shown in FIG. 2. The labeled partitioned block ids are also shown with the CU partition control based on the config parameter. The config parameter is allowed to vary from 0 to 13. Here, scalability is achieved by making sure that the search space uses a nested subset of the full partition tree. The quad-tree partition process is controlled using a process_id ("proc. id") as shown in FIG. 2, a depth first search (DFS). Here, the config parameter gets mapped to a maximum value of the process_id. Thus, partitioning beyond the maximum value of the process_id is not considered. For example, for config = 0, any splitting is not considered. For config = 1, the original 64x64 coding unit can be split into 4 32x32 regions, but splitting is allowed except for the first 32x32 region. The decision on whether splitting is optimal or not is decided using RD optimization. For config = 6, the search tree is illustrated by "A" in FIG. 2. Tree space search is performed using depth first search (DFS). It is contemplated that the invention may be applied to TU control also, unless a split is needed, i.e. there is no 64x64 TU, a split to 32x32 TU is accepted. As shown by "B" in FIG. 2, any splitting for processes with id > 9 is prohibited.

The proposed scalable approach can be used to generate a Time-Rate-Quality performance space as shown in FIG. 3. FIG. 3 is a plot diagram of a rate-distortion-complexity performance example according to the invention for intra-coding. For each plot the following is measured: (i) time using the number of seconds per sample (SPS), (ii) rate based on the number of bits per sample (BPS), and (iii) quality using PSNR (dB).

The example is based on the first 6 frames of a video (832x480) referred to as the standard RaceHorsesC to produce the median objective surface plot shown in FIG. 3. To generate the space, QP is varied in the range of [6, 51) with a step of 3 and all 14 possible values are considered for config. In total, there are 340 possible combinations that have been verified to be optimal in the multi-objective sense (Pareto optimal). As expected, as config is increased, better Rate-Distortion performance is obtained at the price of increased computational complexity. On the other hand, higher values of QP produce configurations that require lower bitrates with lower quality and reduced computational complexity.

A simple linear model is considered for describing the relationship between the objectives and the parameters.

$$Q = a_1 \cdot QP + b_1 \cdot Config + c_1$$
$$T = a_2 \cdot QP + b_2 \cdot Config + c_2$$
$$R = a_3 \cdot QP + b_3 \cdot Config + c_3$$
Equation (2)

where Q is measured in terms of the mean squared error (MSE), T denotes the time in ns ($10^{-9}$ second) required for processing a single pixel, and R denotes the number of bits per sample.

The linear model of Equation (2) needs to be updated throughout the video frame. This model is dynamical and adjusts to the input sequence. The model may be updated based on local measurements.

The invention allocates time, quality, and rate to each CTU by controlling QP and Config. A feedback loop is used to provide measurements of time, quality, and rate to the control algorithm. The main control algorithm is presented in FIG. 4 and FIG. 5A, FIG. 5B. The basic idea is to encode each CTU independently while staying within the budget allocated to the entire frame.

FIG. 4 illustrates a block diagram of a model update using 3 neighboring CTUs according to the invention. As shown in FIG. 4, the CTU is indexed as $(CTU_y, CTU_x)$, and the 3 neighbor CTUs are indexed as $(CTU_y, CTU_{x-1})$, $(CTU_{y-1}, CTU_{x-1})$ and $(CTU_{y-1}, CTU_x)$. When the neighboring CTUs share encodings, the model is constructed using the best predictions as described below. Thus, it is possible for a model to select model parameters. FIG. 5A, FIG. 5B illustrates a common framework for mode implementation according to the invention.

Budget allocation is now described. Budget allocation refers not only to bit allocation, but also to quality and computational complexity allocation. For target rate, quality and computational complexity, the following are used: $R_{target}$, $Q_{target}$ and $T_{target}$. Bits per sample (a sample is referred to as a pixel in video encoding) is used for the rate; Peak Signal-to-Noise Ratio (PSNR), Mean of Square Error (MSE), and Sum of Square Error (SSE) for image quality; and nanoseconds per sample for computational complexity measurements. Performance budget allocation is based on the pre-computed mean absolute deviation (MAD) computed by the HEVC reference standard.

Bit allocation requires that encoding bits are assigned for each CTU. The bit allocation strategy is not simple average bit allocation for all CTUs. Instead, bit allocation is based on pre-computed MAD that also takes into account uncontrolled, internal factors of the HEVC that are associated with live video streaming.
The required number of bits per pixel $bpp_{target}$ is estimated using:

$$bpp_{target} = \frac{R_{target}/f - HeaderBits}{N_{pixels}}$$
Equation (3.1)

where $R_{target}$ denotes the target number of bits per second for each video frame, f denotes the number of frames per second, $N_{pixels}$ denotes the number of pixels in each frame, and HeaderBits = 25 are used for storing the header for HEVC intra-frame encoding. Each frame gets $R_{target}$ bits using:

$$R_{target} = N_{pixels} \cdot bpp_{target}$$
Equation (3.2)

Using $R_{coded}$, the total number of bits already used in the current frame, the number of bits remaining is estimated for the rest of the image using:

$$R_{left} = R_{target} - R_{coded}$$
Equation (3.3)

where $R_{left}$ denotes the number of bits allocated in the budget that are still available. With $R_{adj}$ referring to the budget correction that needs to be made based on the mean absolute deviation (MAD), $R_{adj}$ is used as given by

$$R_{allocated} = R_{left} - R_{adj}$$
Equation (3.4)

to modify the number of bits that have been allocated for the entire frame. The budget is adjusted using:

$$R_{adj} = R_{coded} - \left(1 - \frac{D_{left}}{D_{total}}\right) \cdot R_{target}$$
Equation (3.5)

where $D_{left}$ refers to the pre-computed MAD sum for the remaining CTUs, and $D_{total}$ refers to the total MAD allocated for the current frame. The goal of Equation 3.5 is to weight bit allocation to be proportional to the remaining MAD that needs to be accounted for. After encoding each CTU using Equation 3.5, $D_{left}$ gets reduced. $D_{left}$ should converge to zero. Thus, effectively, the use of Equation 3.5 is meant to ensure that the remaining CTUs get a number of bits that is proportional to their contribution towards the reduction of $D_{total}$ to zero. After updating $R_{allocated}$ by substituting Equation 3.5 into Equation 3.4, the number of bits is allocated for the current, i-th CTU using:

$$R_{target,i} = \frac{D_i}{D_{remaining}} \cdot R_{allocated}$$
Equation (3.6)

where $D_i$ refers to the MAD reduction associated with the i-th CTU, and $D_{remaining}$ refers to the MAD still left to do for the entire frame.

Similar to bit allocation, the computational complexity budget for each CTU is based on the pre-computed MAD. The encoding time per pixel $time\_per\_pixel_{target}$ is computed using:

$$time\_per\_pixel_{target} = \frac{Time_{target}}{N_{pixels}}$$
Equation (4.1)

where $Time_{target}$ denotes the number of seconds allocated per frame. The total amount of time allocated to the entire frame $T_{target}$ is given by:

$$T_{target} = N_{pixels} \cdot time\_per\_pixel_{target}$$
Equation (4.2)

The amount of time left for encoding the remaining CTUs $T_{left}$ is given by:

$$T_{left} = T_{target} - T_{coded}$$
Equation (4.3)

where $T_{coded}$ refers to the total amount of time already used. The allocated time for each CTU is adjusted using $T_{adj}$ given by:

$$T_{adj} = T_{coded} - \left(1 - \frac{D_{left}}{D_{total}}\right) \cdot T_{target}$$
Equation (4.4)

based on remaining MAD to cover, as done for the rate. The allocated time for the entire CTU is similarly updated using:

$$T_{allocated} = T_{left} - T_{adj}$$
Equation (4.5)

Finally, the amount of time allocated for the CTU is given by its share of the remaining MAD:

$$T_{target,i} = \frac{D_i}{D_{remaining}} \cdot T_{allocated}$$
Equation (4.6)

Image quality is measured using the PSNR. At the CTU level, it is more efficient to work with the sum of squared error (SSE). Thus, there is a need to convert back and forth between PSNR and SSE budget requirements. As for rate and computational complexity, allocation is based on the MAD.

PSNR requirements are converted into SSE requirements using:

$$Q_{target} = SSE_{target} = \frac{2^{2 \cdot bitDepth} \cdot N_{pixels}}{10^{PSNR/10}}$$
Equation (5.1)

where $SSE_{target}$ refers to the allocated SSE for the entire frame, and bitDepth refers to the number of bits used to represent each pixel. After encoding a CTU, the remaining SSE budget is similarly given by:

$$Q_{left} = Q_{target} - Q_{coded}$$
Equation (5.2)

Adjustments are similarly made using:

$$Q_{adj} = Q_{coded} - \left(1 - \frac{D_{left}}{D_{total}}\right) \cdot Q_{target}$$
Equation (5.3)

and

$$Q_{allocated} = Q_{left} - Q_{adj}$$
Equation (5.4)

Also, the CTU SSE is given by:

$$SSE_{target,i} = \frac{D_i}{D_{remaining}} \cdot SSE_{allocated}$$
Equation (5.5)

Significant content variation can lead to mis-prediction of the required budgets for each frame. In such cases, no action is taken if the variations stay within the budgets. However, when mis-prediction results in budget deficits, the remaining budget needs to be reallocated to avoid significant artifacts in the reconstructed video. Thus, after the budget is used up, the remaining budget needs to be adjusted to minimize the budget violation.
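The MAD-weighted budget allocation of Equations 3.1 to 3.6 (and, by the same pattern, Equations 4.3 to 4.6 and 5.1 to 5.5) can be sketched as follows. The function names and argument layout are hypothetical and chosen only for illustration; the remaining MAD sum of Equation 3.5 and the remaining MAD of Equation 3.6 are kept as separate arguments because the text names them separately, and the PSNR conversion uses the peak value $2^{bitDepth}$ exactly as written in Equation 5.1.

```python
def frame_bit_budget(bits_per_second: float, fps: float,
                     n_pixels: int, header_bits: float = 25.0) -> float:
    """Frame-level bit budget following Equations 3.1 and 3.2."""
    bpp_target = (bits_per_second / fps - header_bits) / n_pixels  # Eq. 3.1
    return n_pixels * bpp_target                                   # Eq. 3.2


def ctu_budget(target: float, coded: float,
               d_left: float, d_total: float,
               d_i: float, d_remaining: float) -> float:
    """Per-CTU share of a frame budget (bits, time, or SSE)."""
    left = target - coded                              # Eq. 3.3 / 4.3 / 5.2
    adj = coded - (1.0 - d_left / d_total) * target    # Eq. 3.5 / 4.4 / 5.3
    allocated = left - adj                             # Eq. 3.4 / 4.5 / 5.4
    return (d_i / d_remaining) * allocated             # Eq. 3.6 / 4.6 / 5.5


def psnr_to_sse(psnr_db: float, bit_depth: int, n_pixels: int) -> float:
    """Convert a frame PSNR requirement into an SSE budget (Equation 5.1)."""
    return (2.0 ** (2 * bit_depth)) * n_pixels / (10.0 ** (psnr_db / 10.0))


# Illustrative numbers only: 3 Mbit/s at 30 fps for an 832x480 frame.
frame_bits = frame_bit_budget(3_000_000, 30, 832 * 480)
# First CTU of the frame: nothing coded yet, so it gets its plain MAD share.
first_ctu_bits = ctu_budget(target=1000.0, coded=0.0,
                            d_left=100.0, d_total=100.0,
                            d_i=10.0, d_remaining=100.0)
```

When the bits spent so far match the MAD already covered, the correction term is zero and each CTU simply receives its MAD share of what is left; spending ahead of or behind that pace shifts the allocation for the remaining CTUs.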
Budget violations are reduced by reducing the estimates of the remaining budget using:

$$B_{adj} = \alpha \cdot (D_{i,left} / D_i) \cdot B_{target}$$
Equation (6.1)

$$T_{adj} = \alpha \cdot (D_{i,left} / D_i) \cdot T_{target}$$
Equation (6.2)

$$SSE_{adj} = \alpha \cdot (D_{i,left} / D_i) \cdot SSE_{target}$$
Equation (6.3)

where $\alpha$ was set to 0.15 after experimenting with different videos. Clearly, $\alpha = 0$ would lead to significant artifacts while $\alpha = 1$ would not attempt to minimize budget violations and would thus allow significant changes in video content to violate the constraints.
The rate-quality-complexity model is spatially adapted to the input video content. A linear model is built based on the encoding of three neighboring CTUs as depicted in FIG. 4.

Let i = 1, 2, 3 denote the neighboring CTUs, with each CTU encoded using the pair $(QP_i, Config_i)$ to result in $(SSE_i, T_i, R_i)$. To estimate the linear model, the parameter matrix A is defined using:

$$A = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix}$$
Equation (7.1)

Then the basic linear model is described by:

$$\begin{bmatrix} SSE_i \\ T_i \\ R_i \end{bmatrix} = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix} \begin{bmatrix} QP_i \\ Config_i \\ 1 \end{bmatrix}$$
Equation (7.2)

Suppose that the 3 CTU encodings use 3 different pairs of $(QP_i, Config_i)$. In this case, it is expected that the 3 rows of $[QP_i,\; Config_i,\; 1]$ should also be linearly independent since the ranges of QP and Config are quite different. Thus, when working with three different CTU encodings, the parameters can be estimated using:

$$\begin{bmatrix} a_1 \\ b_1 \\ c_1 \end{bmatrix} = \begin{bmatrix} QP_1 & Config_1 & 1 \\ QP_2 & Config_2 & 1 \\ QP_3 & Config_3 & 1 \end{bmatrix}^{-1} \begin{bmatrix} SSE_1 \\ SSE_2 \\ SSE_3 \end{bmatrix}$$
Equation (7.3)

$$\begin{bmatrix} a_2 \\ b_2 \\ c_2 \end{bmatrix} = \begin{bmatrix} QP_1 & Config_1 & 1 \\ QP_2 & Config_2 & 1 \\ QP_3 & Config_3 & 1 \end{bmatrix}^{-1} \begin{bmatrix} T_1 \\ T_2 \\ T_3 \end{bmatrix}$$
Equation (7.4)

$$\begin{bmatrix} a_3 \\ b_3 \\ c_3 \end{bmatrix} = \begin{bmatrix} QP_1 & Config_1 & 1 \\ QP_2 & Config_2 & 1 \\ QP_3 & Config_3 & 1 \end{bmatrix}^{-1} \begin{bmatrix} R_1 \\ R_2 \\ R_3 \end{bmatrix}$$
Equation (7.5)

For robust model update, the case is also considered when the neighboring CTUs do not use 3 independent encodings. In this case, $[a_i\; b_i\; c_i]$ is selected as associated with the best predictions. To implement this approach, for the i-th CTU, the prediction errors are computed using:

$$SSE_{error,i} = |SSE_i - a_1 \cdot QP_i - b_1 \cdot Config_i - c_1|$$
$$T_{error,i} = |T_i - a_2 \cdot QP_i - b_2 \cdot Config_i - c_2|$$
$$R_{error,i} = |R_i - a_3 \cdot QP_i - b_3 \cdot Config_i - c_3|$$

The model is then built by using the coefficients associated with the minimum prediction errors. For example, for $A_1$, $[a_{1,i}\; b_{1,i}\; c_{1,i}]$, the following is solved:

$$\min_i SSE_{error,i}$$
Equation (7.6)

and $A_{1,j}$ is used to associate with the j-th CTU model that minimizes Equation 7.6 (see also FIG. 4).

Another problem occurs in coming up with an initial model for the first row and first column in each frame. For this case, virtual CTUs are created above the first row and to the left of the first column as shown in FIG. 6. The virtual CTU encodings assume the Pareto front that is initialized from other videos and then updated based on the encodings of the first few frames of the current video.

More specifically, for each virtual CTU, the Pareto front is computed based on the average of the current encodings. According to one embodiment, an initial model trained on other videos may be used. After a few frames, the Pareto front is computed from the current video. Here, it is noted that the Pareto front is obtained through an exhaustive evaluation of all possible Config and QP values. However, the cost of estimating the Pareto front is restricted to CTUs over a few frames and offline computations using other videos.

Updated linear models are used to estimate values for QP and Config that can satisfy the constraints and minimize bitrate, maximize quality, or minimize computational complexity. In addition, the invention provides a robust approach for minimizing constraint violations.

The minimum bitrate mode is used to demonstrate the basic concepts. All other modes are similar. As explained above, the constraints are used to determine target values for Q, T, R as needed. For the minimum bitrate mode, it is desired to match the constraints on quality $Q_{target}$ and time $T_{target}$. The linear model is used to determine the encoding parameters:

$$\begin{bmatrix} Q_{target} \\ T_{target} \end{bmatrix} = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{bmatrix} \begin{bmatrix} QP_{est} \\ Config_{est} \\ 1 \end{bmatrix}$$
Equation (8.1)

Using Equation 8.1, the initial values of the encoding parameters are estimated using:

$$\begin{bmatrix} QP_{est} \\ Config_{est} \end{bmatrix} = \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix}^{-1} \begin{bmatrix} Q_{target} - c_1 \\ T_{target} - c_2 \end{bmatrix}$$
Equation (8.2)

$QP_{est}$ and $Config_{est}$ are rounded to the nearest integer values and the model used as given by:

$$Q = a_1 \cdot QP + b_1 \cdot Config + c_1$$
$$T = a_2 \cdot QP + b_2 \cdot Config + c_2$$
$$R = a_3 \cdot QP + b_3 \cdot Config + c_3$$
Equation (8.3)

to perform a local search with $QP \in [QP_{est} - 2,\; QP_{est} + 2]$ and $Config \in [Config_{est} - 2,\; Config_{est} + 2]$ for the minimum bitrate that also satisfies the constraints. Alternatively, if no parameters can satisfy the constraints, the normalized constraint violation is computed using:
$$norm(X) = \frac{X - X_{min}}{X_{mean}}$$
Equation (8.4)

Then, a (QP, Config) pair is selected that minimizes the total normalized constraint violation as given in FIG. 8 for the minimum bitrate mode.

Similarly, for the maximum quality mode, the target budget values are first used for bitrate and performance to determine initial estimates and select optimal encoding parameters based on local search or minimum constraint violation. Then, for the minimum computational complexity mode, the target bitrate and quality are used for the initial search.

While the linear model is simple and robust, it can fail to produce valid values for QP and Config. This failure occurs because the linear model does not impose any restrictions on the constraints. Thus, the constraints end up being significantly above or below the rate-performance-quality surface. When the constraints are significantly off, they are automatically modified to bring them close to the control surface. For valid encodings, it is required that $QP \in [0, 51]$ and $Config \in [0, 13]$. When either parameter falls out of range, the constraints are modified to produce valid encodings.

In general, rate, quality, and computational complexity are non-linearly related. The linear model according to the invention is excellent for local approximations to the non-linear relationship.

The relationship between any pair of constraints is provided using:

$$T = a_1 \cdot SSE^{b_1}, \quad a_1 > 0,\; b_1 < 0.$$
$$SSE = a_2 \cdot R^{b_2}, \quad a_2 > 0,\; b_2 < 0.$$
$$T = a_3 \cdot R^{b_3}, \quad a_3 > 0,\; b_3 > 0.$$
Equation (8.5)

Following are explanations of how to modify the constraints for the minimum rate algorithm. As for the linear model, the neighboring CTU encodings are used to adaptively estimate the relationships between the constraints as shown in FIG. 7.

The main algorithm for estimating $T = a \cdot SSE^{b}$ is given in FIG. 9. Based on the relationship, either the quality or the computational complexity constraint is moved to lie on the curve as given in FIG. 10. Similarly, for the minimum computational complexity mode, $SSE = a \cdot R^{b}$ is estimated as given in FIG. 11 and the constraints updated as given in FIG. 12. The model update and algorithm for the maximum quality (minimum distortion) model is given in FIG. 13 and FIG. 14.

To account for the case of failing to estimate the model, for example, if the left and top CTUs are encoded in the same way, the configuration from the last CTU is used. Similarly, if the constraint update is excessive, the configuration from the last CTU may also be used.

The updated constraints are used for estimating new, valid values for QP and Config. Large changes are prevented by requiring the QP to remain within ±4 of the average of the neighboring CTUs. Furthermore, the final encoding parameters are forced to stay within the valid ranges.

One embodiment of the invention is applied to a dynamic reconfiguration example referred to above as the standard RaceHorsesC to demonstrate the advantages of the invention. Specifically, the goal of the following example is to demonstrate the ability to switch from a low profile mode to a medium and then back to a high profile mode.

The low, medium, and high profiles are defined by fixing QP to QP = 37, 32 and 27, respectively. Furthermore, for comparing to the proposed approach, for controlling both the bitrate and PSNR, the full range depth configuration (config = 13) is used and the resulting PSNR constraints reduced a little bit to generate the low, medium, and high profiles.

The results are compared for the fixed QP configuration shown in FIG. 15 with the minimum computational complexity mode according to the invention shown in FIG. 16. For constraint satisfaction, mild violations may be allowed in the order of 10% of the constraints. As shown in FIG. 16, it can be seen that DRASTIC control achieves constraint satisfaction at the high rates of 93% for the low, 83% for the medium, and 93% for the high profile. Furthermore, compared to the fixed QP results, the invention achieves savings of 13% for the low, 49% for the medium, and 40% for the high profile. The invention proves not only to meet given constraints, but also to minimize the encoding time.

FIG. 17 illustrates a flow chart of an offline process 200 of video encoding and forward model creation for inter-coding according to the invention. With the objective to determine a suitable model to be used to determine the most relevant encoding configuration parameters that affect video quality, bitrate, and frame rate, videos at step 202 are encoded at step 204.

To determine a suitable model for each of the afore-described encoded video characteristics, a linear regression model is employed at step 206 to identify and select the most important encoding parameters (profile, encoding structure, GOP structure, QP, max intra period) to construct the relevant forward model. Stepwise regression is used to both select important parameters as well as reduce the dimensionality of the encoding parameter vectors to determine at step 208 the following optimal models:

$$\log(SSIM) = a_0 \cdot QP + b_0$$
$$\log(Bitrate) = a_1 \cdot QP + b_1$$
$$\log(FPS) = a_2 \cdot QP + b_2$$
Equation (9.1)

FIG. 18 illustrates pseudo code of the offline process of video encoding and forward model creation for inter-coding according to the invention.

FIG. 19 illustrates a flow chart of a real-time adaptation 300 using time-varying constraints for inter-coding according to the invention. For each of the three forward models shown in Equation 9.1, an inverse process is applied at step 308 to predict the optimal quantization parameter values that meet the input constraints. According to one embodiment, Newton's algorithm may be used to find a solution to the forward model that describes the most dominant constraint. Depending on the employed mode of operation (minimum computational complexity mode, the maximum quality mode, and the minimum bitrate mode), mild violations may be allowed, for example, either in the order of -10% for maximum quality mode and frame rate models or in the order of +0.5% for the minimum bitrate models. When more than one solution in terms of QP is generated, the results are rounded up to the nearest integer QP value since the output is a continuous numerical value as shown by FIG. 19. By adopting this inverse process, some QP predictions may be found outside the range of QP used in the encoding parameters, such that additional configurations may be run in order to complete the missing values of SSIM, bitrate and frame rate for the missing predicted QPs. FIG. 20 illustrates pseudo code of the real-time adaptation using time-varying constraints for inter-coding according to the invention.
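The inverse process applied to the forward models of Equation 9.1 can be sketched with a plain Newton iteration. This is a minimal illustration, not the pseudo code of FIG. 20: the function names, the optional quadratic coefficient `quad`, the starting point `qp0`, and the assumed monotonic trends (quality and bitrate decrease with QP, frame rate increases with QP) are assumptions made for the example.

```python
import math
from typing import Optional, Tuple


def newton_qp(slope: float, intercept: float, target: float,
              quad: float = 0.0, qp0: float = 30.0,
              tol: float = 1e-9, max_iter: int = 50) -> float:
    """Solve quad*QP^2 + slope*QP + intercept = log(target) for QP."""
    goal = math.log(target)
    qp = qp0
    for _ in range(max_iter):
        f = quad * qp * qp + slope * qp + intercept - goal
        df = 2.0 * quad * qp + slope
        if df == 0.0:
            raise ZeroDivisionError("flat model: Newton step undefined")
        step = f / df
        qp -= step
        if abs(step) < tol:
            break
    return qp


def feasible_qp(quality_model: Tuple[float, float],
                bitrate_model: Tuple[float, float],
                fps_model: Tuple[float, float],
                q_min: float, r_max: float, fps_min: float) -> Optional[int]:
    """Smallest integer QP meeting all three constraints, or None.

    Each model is (slope, intercept) of log(value) = slope*QP + intercept.
    """
    qp_quality = newton_qp(*quality_model, target=q_min)  # need QP <= this
    qp_rate = newton_qp(*bitrate_model, target=r_max)     # need QP >= this
    qp_fps = newton_qp(*fps_model, target=fps_min)        # need QP >= this
    lower = max(qp_rate, qp_fps, 0.0)
    upper = min(qp_quality, 51.0)
    qp = math.ceil(lower)  # round up to an integer QP, as in the text
    return qp if qp <= upper else None


# Made-up coefficients, for illustration only.
qp = feasible_qp(quality_model=(-0.01, 0.0),
                 bitrate_model=(-0.1, 10.0),
                 fps_model=(0.02, 3.0),
                 q_min=0.7, r_max=math.exp(7.05), fps_min=math.exp(3.5))
```

For the log-linear form of Equation 9.1 the Newton iteration reaches the root in a single step, so a closed-form inverse would do equally well; the iteration only matters when a higher-order term in QP is present.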
While the disclosure is susceptible to various modifications and alternative forms, specific exemplary embodiments of the invention have been shown by way of example in the drawings and have been described in detail. It should be understood, however, that there is no intent to limit the disclosure to the particular embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure as defined by the appended claims.

The invention claimed is:

1. A method for real-time adaptive encoding digital video signals comprising:
(a) receiving an input video comprising a plurality of video segments;
(b) applying, to a video segment, real-time input constraints on: (1) video quality remaining above a minimum value Q, (2) bandwidth with bitrate remaining below a maximum value representing available bitrate, and (3) encoding frame rate with a number of frames per second (FPS) remaining above a minimum encoding rate value, to select initial candidate encoding configurations, wherein applying further comprises using pre-computed forward regression models, wherein the pre-computed forward regression models can vary based on an encoding scheme, and are given by:

$$\log(Q) = a_0 + b_0 \cdot QP + c_0 \cdot QP^2,$$
$$\log(Bitrate) = a_1 + b_1 \cdot QP + c_1 \cdot QP^2,$$
$$\log(FPS) = a_2 + b_2 \cdot QP + c_2 \cdot QP^2,$$
Equation (9.1)

wherein QP is a quantization parameter and $a_0, b_0, c_0, a_1, b_1, c_1, a_2, b_2, c_2$ represent regression coefficients determined using a training process that uses video segments similar to the video segments of the plurality;
(c) using the pre-computed forward regression models to derive inverse models to determine final candidate encoding configurations from the initial candidate encoding configurations;
(d) selecting an optimal encoding configuration from the final candidate encoding configurations, wherein the optimal encoding configuration satisfies constraints and achieves a maximum video quality, a minimum bandwidth, or a maximum frame rate, wherein the optimal encoding configuration comprises of a Group of Pictures (GOP) configuration and a Coding Tree Unit (CTU) configuration;
(e) encoding the video segment using the optimal encoding configuration, and
(f) repeating (b)-(e) for all video segments of the plurality of video segments.

2. The method of claim 1 further comprising creating off-line the pre-computed forward regression models.

3. The method of claim 2, wherein creating further comprises:
inputting a plurality of videos that is composed of video segments;
encoding each video segment using different video encoding parameters, Coding Tree Unit configurations, and GOP configurations;
evaluating the video quality, required bitrate, and video encoding rate in frames per second for each video segment; and
learning the forward regression models that map the video encoding parameters, Coding Tree Unit configurations, and GOP configurations to the video quality, required bitrate, and video encoding rate over a training set of video segments.

4. The method of claim 1, wherein the inverse models use Newton's algorithm to determine final candidate encoding configurations from the forward regression models and constraints on video quality, maximum bitrate, and minimum video encoding rate.

5. The method of claim 1, wherein the optimal encoding configuration is one selected from the group: a maximum video encoding performance mode, a minimum bitrate mode, and a maximum video quality mode.

6. The method of claim 5, wherein the maximum performance mode is defined according to:

$$\min_{c \in C} T \quad \text{subject to: } (Q \ge Q_{min}) \;\&\; (R \le R_{max})$$

with C representing a set of video encoding configurations, R representing a number of bits per pixel, T representing encoding time per frame, and Q representing a measure of video quality.

7. The method of claim 5, wherein the minimum bitrate mode is defined according to:

$$\min_{c \in C} R \quad \text{subject to: } (Q \ge Q_{min}) \;\&\; (T \le T_{max})$$

with C representing a set of video encoding configurations, R representing a number of bits per pixel, T representing encoding time per frame, and Q representing a measure of video quality.

8. The method of claim 5, wherein the maximum quality mode is defined according to:

$$\min_{c \in C} Q \quad \text{subject to: } (T \le T_{max}) \;\&\; (R \le R_{max})$$

with C representing a set of video encoding configurations, R representing a number of bits per pixel, T representing encoding time per frame, and Q representing a measure of video quality.

9. The method of claim 1, wherein the forward regression model is defined in terms of a quantization parameter (QP), the GOP configuration, and the Coding Tree Unit configuration.

10. The method of claim 1, wherein video constraints and the optimization modes are applied individually in a CTU or a GOP while staying within a budget.

11. The method of claim 10, wherein the budget comprises a target bitrate ($R_{target}$) of a number of bits per second for each video frame according to the equation:

$$R_{target} = N_{pixels} \cdot bpp_{target}$$

wherein $N_{pixels}$ is a number of pixels in each frame and $bpp_{target}$ is a required number of bits per pixel.
12. The method of claim 10, wherein the budget comprises a target frame rate ($T_{target}$) of a total amount of time allocated to an entire frame according to the equation:

$$T_{target} = N_{pixels} \cdot time\_per\_pixel_{target}$$

wherein $N_{pixels}$ is a number of pixels in each frame and $time\_per\_pixel_{target}$ is an encoding time per pixel.

13. The method of claim 10, wherein the budget comprises a target video quality ($Q_{target}$) of a sum of squared error (SSE) for an entire frame according to the equation:

$$Q_{target} = \frac{2^{2 \cdot bitDepth} \cdot N_{pixels}}{10^{PSNR/10}}$$

wherein bitDepth is a number of bits used to represent each pixel, $N_{pixels}$ is a number of pixels in each frame and PSNR is Peak Signal-to-Noise Ratio.

14. The method of claim 1, wherein (a)-(f) are applied to different video segments in a video delivery system, wherein the video delivery system can support live and on demand settings.

15. The method of claim 14, wherein the video delivery system includes adaptive HTTP streaming (e.g., MPEG-DASH protocol) and RTP protocol based systems.

16. The method of claim 1, wherein the encoding scheme comprises a GOP configuration.

17. The method of claim 1, wherein the encoding scheme comprises encoding parameters that do not include QP.

18. The method of claim 1, wherein the video quality is one selected from the group: structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and video multimethod assessment fusion (VMAF).