Failures are a given and
everything will eventually
fail over time.
https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html
https://www.youtube.com/watch?v=zoz0ZjfrQ9s
Amazon 2006
GameDay: Creating
Resiliency Through
Destruction
Jesse Robbins
Netflix 2011
Chaos Monkeys:
Test the resilience
of its Infrastructure
Simian Army – Open Source
https://github.com/Netflix/SimianArmy
(c) Dave Hahn, A Day in the Life of a Netflix Engineer Using 37% of the Internet, re:Invent 2015
https://www.youtube.com/watch?v=OczG5FQIcXw
https://www.youtube.com/watch?v=-mL3zT1iIKw
Edge
ELB
Zuul
NCCP
API
Middle Tier & Platform
Product
• Bucket testing
• Subscriber
• Recommendations
Platform
• Routing
• Configuration
• Crypto
Persistence
• Cache
• Database
(c) Josh Evans, Mastering Chaos A Netflix Guide to Microservices, QCon SF 2016
(c) Ruslan Meshenberg, From Asgard to Zuul, re:Invent 2014
(c) Josh Evans, Mastering Chaos A Netflix Guide to Microservices, QCon SF 2016
Chaos Monkey
https://github.com/netflix/chaosmonkey
Instance Fail?
Chaos Gorilla
Zone Fail?
Chaos Kong
Region Fail?
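The three questions above (instance, zone, region) differ mainly in how much of the fleet a single failure takes down. A minimal sketch of that idea, assuming a purely in-memory fleet: the `Instance` type, the `pick_targets` function, and the sample fleet are hypothetical names for illustration and are not how Chaos Monkey, Chaos Gorilla, or Chaos Kong are actually implemented.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    zone: str      # e.g. "us-east-1a"; zone names are assumed globally unique
    region: str    # e.g. "us-east-1"


def pick_targets(fleet, scope, rng=random):
    """Pick one random victim, then return everything sharing its failure domain."""
    if not fleet:
        return []
    victim = rng.choice(fleet)
    if scope == "instance":   # Chaos Monkey style: a single instance
        return [victim]
    if scope == "zone":       # Chaos Gorilla style: a whole Availability Zone
        return [i for i in fleet if i.zone == victim.zone]
    if scope == "region":     # Chaos Kong style: a whole region
        return [i for i in fleet if i.region == victim.region]
    raise ValueError(f"unknown scope: {scope!r}")


# Hypothetical fleet: 2 regions x 2 zones x 2 instances.
FLEET = [
    Instance(f"i-{region}-{zone}-{n}", f"{region}{zone}", region)
    for region in ("us-east-1", "ap-northeast-2")
    for zone in ("a", "b")
    for n in (1, 2)
]

if __name__ == "__main__":
    rng = random.Random(42)
    for scope in ("instance", "zone", "region"):
        print(scope, len(pick_targets(FLEET, scope, rng)))
```

With this sample fleet, an instance failure removes 1 instance, a zone failure 2, and a region failure 4, which is one way to see why the larger scopes demand more redundancy.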
Chaos Engineering in Action for Operating Highly Reliable Cloud-Based Services (Channy Yun (윤석찬), AWS Tech Evangelist) :: AWS DevDay 2018
Will our boss allow it?
Chaos doesn’t cause problems.
It reveals them.
https://www.oreilly.com/webops-perf/free/chaos-engineering.csp
http://principlesofchaos.org/
http://channy.creation.net/blog/netflix-principles-of-chaos-engineering
Users
99%
users
1%
users
Start with...
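Limiting an experiment to a small share of users, such as the 1% above, needs a stable way to decide which users are included. A minimal sketch, assuming users are identified by a string ID and assigned by hashing: the function names and the experiment name are hypothetical, and the slides do not say how the split was actually made.

```python
import hashlib


def bucket_of(user_id: str, experiment: str, buckets: int = 10_000) -> int:
    """Map a user to a stable bucket in [0, buckets) for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:16], 16) % buckets


def in_experiment(user_id: str, experiment: str, percent: float) -> bool:
    """True if the user falls inside the first `percent` percent of buckets."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    # 10,000 buckets means one bucket is 0.01% of users.
    return bucket_of(user_id, experiment) < round(percent * 100)


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(100_000)]
    exposed = sum(in_experiment(u, "kill-recommendations", 1.0) for u in users)
    print(f"{exposed} of {len(users)} users are in the experiment group")
```

Because the assignment depends only on the user ID and the experiment name, the same user stays in the same group across requests, and the percentage can be raised gradually once the small group shows no harm.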
Failure free operations require
experience with failure.
http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf
No single
point of failure
Replicated
Distributed
Automated Cloud
App
Chaos
Engineering
Team
Chaos
Engineering
Team
ChaosMonkey
GameDay
Failure Injection
ChAP
Gremlin
Chaos
Engineering
Team
GameDay
Failure Injection
ChAP
Gremlin
https://github.com/netflix/chaosmonkey
Spinnaker is an open source, multi-cloud continuous delivery platform for releasing software
changes with high velocity and confidence. http://www.spinnaker.io/
AWS managed
Customer account
Auto-scaling
Auto-scaling Group
HA/Backup
mycluster.eks.amazonaws.com
Availability
Zone 1
Availability
Zone 2
Availability
Zone 3
Kubectl
Istio
Chaos Toolkit
Kube Monkey
PowerfulSeal
Gremlin
Simian Army
https://github.com/asobti/kube-monkey
mycluster.eks.amazonaws.com
Availability
Zone 1
Availability
Zone 2
Availability
Zone 3
Kubectl
x
x
Health check?
Dead node?
x
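One way to answer the "Health check?" question after a node is terminated is to poll the service until it reports healthy or a deadline passes. A minimal sketch, assuming the health check is any callable that returns True or False: `wait_until_healthy` and its parameters are hypothetical names, and the probe here is a stand-in rather than a real Kubernetes or EKS call.

```python
import time


def wait_until_healthy(probe, timeout_s=30.0, interval_s=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `probe()` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True          # steady state restored
        if clock() >= deadline:
            return False         # still unhealthy: the experiment found a weakness
        sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical probe: fails twice (the node just died), then recovers.
    answers = iter([False, False, True])
    recovered = wait_until_healthy(lambda: next(answers),
                                   timeout_s=5.0, interval_s=0.01)
    print("service recovered:", recovered)
```

In a real experiment the probe would wrap whatever check the service already exposes, and a False result is the signal to stop the experiment and investigate.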
Amazon EKS
Chaos
Engineering
Team
ChaosMonkey
Failure Injection
ChAP
Gremlin
Bug
Integration
Distributed
Engineering
Team
Operation/Security
Team
Customer Center
Team
PR
Team
© DIUS https://dius.com.au/resources/game-day/
https://www.infoq.com/presentations/gameday-chaos-engineering
https://dius.com.au/resources/game-day
Microservices (applications)
DevOps (Culture)
Chaos
Engineering
Cloud (Scale)
https://bit.ly/2uKOJMQ
https://github.com/chaoseng/wg-chaoseng
References
https://www.facebook.com/groups/chaosengkorea/
https://www.meetup.com/Korea-Chaos-Engineering-Community/
