Channel: Dev Notes (개발 노트)

[Kubernetes] A Roundup of Amazon EKS Case Studies

<h3 data-ke-size="size23">EKS + NLB + Istio</h3> <p data-ke-size="size16"><a href="https://youtu.be/_7boiqxyzzQ">Kakao Pay case study</a></p> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(13).png" data-origin-width="1385" data-origin-height="669"><span data-url="https://blog.kakaocdn.net/dn/bu84Gn/btsAw4UlG3t/ye4SPVSckxlLRqXNFD2TD0/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/bu84Gn/btsAw4UlG3t/ye4SPVSckxlLRqXNFD2TD0/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbu84Gn%2FbtsAw4UlG3t%2Fye4SPVSckxlLRqXNFD2TD0%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(13).png" data-origin-width="1385" data-origin-height="669"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>Kakao Pay previously used nginx as its gateway and decided to switch to the Istio Gateway <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>As traffic grew, more nginx servers had to be added, which created a management burden</li> <li>With Istio, Envoy handles the traffic and scales flexibly, which reduces the operational burden</li> </ul> </li> <li>Kakao Pay had three years of experience operating Istio, so it adopted Istio selectively, only in the areas the team was confident it could handle</li> <li>NLB was chosen because static IPs were required to comply with financial-industry security policies</li> </ul> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(14).png" data-origin-width="1682" data-origin-height="777"><span data-url="https://blog.kakaocdn.net/dn/oH07F/btsAueDD4Nv/JpzkpMXqNO7ZHsTQcnYJD1/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/oH07F/btsAueDD4Nv/JpzkpMXqNO7ZHsTQcnYJD1/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FoH07F%2FbtsAueDD4Nv%2FJpzkpMXqNO7ZHsTQcnYJD1%2Fimg.png" onerror="this.onerror=null; 
this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(14).png" data-origin-width="1682" data-origin-height="777"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>Terraform is used to manage the AWS infrastructure</li> <li>Terraform changes are applied and state is managed with Atlantis, a GitOps tool that runs Terraform from pull requests</li> <li>Deployment history is tracked through the tags on Git PRs</li> </ul> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(15).png" data-origin-width="1678" data-origin-height="786"><span data-url="https://blog.kakaocdn.net/dn/pqU0C/btsAxX8CHF4/esCcMXxwiPKiN0xogCFgmk/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/pqU0C/btsAxX8CHF4/esCcMXxwiPKiN0xogCFgmk/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FpqU0C%2FbtsAxX8CHF4%2FesCcMXxwiPKiN0xogCFgmk%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(15).png" data-origin-width="1678" data-origin-height="786"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>Applications are deployed with Jenkins and Spinnaker</li> <li>Deployment strategies such as rolling updates and canary releases are carried out through Istio</li> <li>Autoscaling is handled by node groups (ASGs) and Cluster Autoscaler</li> <li>Node groups are separated per service, and Spot Instances are used</li> </ul> <h3 data-ke-size="size23">HAProxy + NLB, ALB Target Group Binding</h3> <p data-ke-size="size16"><a href="https://youtu.be/BM8otOWxLO8">Daangn Market case study</a></p> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(16).png" data-origin-width="864" data-origin-height="344"><span data-url="https://blog.kakaocdn.net/dn/JZSlH/btsACnE1MTm/u87S922Tz2T0cjl1BOmdi0/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/JZSlH/btsACnE1MTm/u87S922Tz2T0cjl1BOmdi0/img.png" 
srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FJZSlH%2FbtsACnE1MTm%2Fu87S922Tz2T0cjl1BOmdi0%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(16).png" data-origin-width="864" data-origin-height="344"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>Cluster versions are managed with a multi-cluster approach <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>The NLB carries gRPC and HTTP traffic</li> <li>The ALB is kept to serve existing services</li> </ul> </li> <li>Shifting traffic with Route 53 weighted records was slowed by DNS caching, so traffic is switched with HAProxy instead <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>The company already had in-house experience operating HAProxy</li> </ul> </li> <li>ALBs are provisioned in advance with Terraform, and the multiple clusters are attached to them through TargetGroupBinding</li> <li>Only Istio's VirtualService and Gateway are used → with a single cluster, upgrading Istio during a cluster upgrade had been difficult</li> </ul> <h3 data-ke-size="size23">Large-scale game case study</h3> <p data-ke-size="size16"><a href="https://youtu.be/78EjMQYt0O0">KRAFTON PUBG case study</a> (Battlegrounds on AWS)</p> <h4 data-ke-size="size20">AS-IS architecture</h4> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(17).png" data-origin-width="1423" data-origin-height="679"><span data-url="https://blog.kakaocdn.net/dn/zgsaK/btsAuNlF6jm/l4PZk2Ck1t7nZpAtlqhuHk/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/zgsaK/btsAuNlF6jm/l4PZk2Ck1t7nZpAtlqhuHk/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FzgsaK%2FbtsAuNlF6jm%2Fl4PZk2Ck1t7nZpAtlqhuHk%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(17).png" data-origin-width="1423" data-origin-height="679"/></span></figure> </p> <ul 
style="list-style-type: disc;" data-ke-list-type="disc"> <li>Growing burden of creating and managing infrastructure <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>Every new service required more infrastructure work</li> <li>Configuring an ASG and CodeDeploy for each service was tedious</li> </ul> </li> <li>More and more requests for new QA environments <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>Each environment setup needed support from the DevOps team</li> </ul> </li> <li>More small servers to deploy <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>The number of servers to deploy and manage grew, including server operations tools, customer support (CS) tools, and monitoring tools</li> <li>ECS was in use, and each case was handled individually</li> </ul> </li> <li>The system consisted of more than 30 services</li> </ul> <h4 data-ke-size="size20">TO-BE architecture</h4> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(18).png" data-origin-width="1417" data-origin-height="676"><span data-url="https://blog.kakaocdn.net/dn/cJLGZF/btsAuNsrrco/cOIkkoRGd6GfeXC5S0X8S0/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/cJLGZF/btsAuNsrrco/cOIkkoRGd6GfeXC5S0X8S0/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcJLGZF%2FbtsAuNsrrco%2FcOIkkoRGd6GfeXC5S0X8S0%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(18).png" data-origin-width="1417" data-origin-height="676"/></span></figure> <figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(19).png" data-origin-width="1289" data-origin-height="535"><span data-url="https://blog.kakaocdn.net/dn/ZAtAd/btsAxIcPjQ2/cZkz7ByjyC027804K8IaA0/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/ZAtAd/btsAxIcPjQ2/cZkz7ByjyC027804K8IaA0/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FZAtAd%2FbtsAxIcPjQ2%2FcZkz7ByjyC027804K8IaA0%2Fimg.png" onerror="this.onerror=null; 
this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(19).png" data-origin-width="1289" data-origin-height="535"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>GameShards run on Agones</li> </ul> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(20).png" data-origin-width="1359" data-origin-height="352"><span data-url="https://blog.kakaocdn.net/dn/vfTXh/btsAwrh8N9T/zGD8wrauUfHQFUlsGypVr1/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/vfTXh/btsAwrh8N9T/zGD8wrauUfHQFUlsGypVr1/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FvfTXh%2FbtsAwrh8N9T%2FzGD8wrauUfHQFUlsGypVr1%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(20).png" data-origin-width="1359" data-origin-height="352"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>After adopting EKS, the team built an automation platform that deploys QA environments onto Kubernetes, so related teams can set up environments themselves through a web UI</li> <li>There are about 100 QA environments, yet they add almost no operational burden</li> </ul> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(21).png" data-origin-width="1136" data-origin-height="454"><span data-url="https://blog.kakaocdn.net/dn/DOD7E/btsAxdDJS1s/ky2tXGKjIj9AoFUc7epyTk/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/DOD7E/btsAxdDJS1s/ky2tXGKjIj9AoFUc7epyTk/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FDOD7E%2FbtsAxdDJS1s%2Fky2tXGKjIj9AoFUc7epyTk%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; 
this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(21).png" data-origin-width="1136" data-origin-height="454"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>As a global service it runs in multiple Regions, and because the ECR repository domain differs per Region, the team uses the open-source registry Harbor instead</li> <li>CF (CloudFront) caching cut image pull time from 15 minutes to 7 minutes</li> <li>Setting the kubelet flag <code>--serialize-image-pulls=false</code> cut it further, from 7 minutes to 3 minutes</li> <li>A DaemonSet that pulls images in advance, before Pods are scheduled, shortens Pod startup time even more</li> <li>Node bootstrap is also configured to pre-pull images</li> </ul> <h3 data-ke-size="size23">ALB-based multi-cluster</h3> <p data-ke-size="size16"><a href="https://youtu.be/PIg-apdilRk">Toss Payments case study</a></p> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(22).png" data-origin-width="1269" data-origin-height="679"><span data-url="https://blog.kakaocdn.net/dn/dm7M5z/btsAxJQmQSf/K25gV95Ir2R7Jia738pNhk/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/dm7M5z/btsAxJQmQSf/K25gV95Ir2R7Jia738pNhk/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fdm7M5z%2FbtsAxJQmQSf%2FK25gV95Ir2R7Jia738pNhk%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(22).png" data-origin-width="1269" data-origin-height="679"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>EKS clusters run as a redundant pair so that failures can be handled flexibly</li> <li>A monitoring system was built to detect AZ failures</li> <li>When a problem occurs, traffic is immediately rerouted to the healthy cluster <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>This eases engineers' anxiety about making changes and increases development agility</li> </ul> </li> </ul> <h3 data-ke-size="size23">Cost savings and failure handling with Istio Locality LB</h3> <p data-ke-size="size16"><a href="https://youtu.be/ea8E74wvj3M">Devsisters Cookie Run: Kingdom case study</a></p> <p><figure class="imageblock alignCenter" 
data-ke-mobileStyle="widthOrigin" data-filename="image(23).png" data-origin-width="1259" data-origin-height="445"><span data-url="https://blog.kakaocdn.net/dn/bBlRse/btsAvVJ7TG0/ecl3nLqg5ENBa5OqRnV7r0/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/bBlRse/btsAvVJ7TG0/ecl3nLqg5ENBa5OqRnV7r0/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbBlRse%2FbtsAvVJ7TG0%2Fecl3nLqg5ENBa5OqRnV7r0%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(23).png" data-origin-width="1259" data-origin-height="445"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>Global Accelerator reduces response times for the globally distributed game servers</li> <li>The architecture fails over to the NLB if Global Accelerator has an outage</li> </ul> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(24).png" data-origin-width="752" data-origin-height="443"><span data-url="https://blog.kakaocdn.net/dn/tFKt9/btsAzbZtEm3/PH6tIEBxG089vymG3cP9lk/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/tFKt9/btsAzbZtEm3/PH6tIEBxG089vymG3cP9lk/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FtFKt9%2FbtsAzbZtEm3%2FPH6tIEBxG089vymG3cP9lk%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(24).png" data-origin-width="752" data-origin-height="443"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>All game servers run Multi-AZ</li> <li>Heavy server-to-server communication drove up data transfer (DTO) costs</li> <li>Istio's Locality LB keeps traffic within the same AZ, which improved performance and reduced DTO costs <ul style="list-style-type: disc;" 
data-ke-list-type="disc"> <li><figure class="imageblock alignLeft" data-ke-mobileStyle="widthOrigin" data-filename="image(27).png" data-origin-width="964" data-origin-height="354"><span data-url="https://blog.kakaocdn.net/dn/tu6SH/btsAw4z3ggv/J3UayNUFjDAw2993MpuHO0/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/tu6SH/btsAw4z3ggv/J3UayNUFjDAw2993MpuHO0/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Ftu6SH%2FbtsAw4z3ggv%2FJ3UayNUFjDAw2993MpuHO0%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" width="493" data-filename="image(27).png" data-origin-width="964" data-origin-height="354"/></span></figure> </li> </ul>  </li> </ul> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(25).png" data-origin-width="1202" data-origin-height="482"><span data-url="https://blog.kakaocdn.net/dn/bWe2I7/btsAyL7Zr4i/nbVHAuiKOX6ikEvLJMhBhK/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/bWe2I7/btsAyL7Zr4i/nbVHAuiKOX6ikEvLJMhBhK/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbWe2I7%2FbtsAyL7Zr4i%2FnbVHAuiKOX6ikEvLJMhBhK%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(25).png" data-origin-width="1202" data-origin-height="482"/></span></figure> </p> <p data-ke-size="size16"> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>A separate CNI was set up to address the operational burden and performance degradation caused by iptables</li> <li>It uses DSR (Direct Server Return) and eBPF</li> <li>Not yet applied in production; still under evaluation</li> </ul> <p><figure class="imageblock alignCenter" data-ke-mobileStyle="widthOrigin" data-filename="image(26).png" 
data-origin-width="1088" data-origin-height="329"><span data-url="https://blog.kakaocdn.net/dn/qVwvY/btsACmlOJdM/vsZAnOjeQxxtCRn84ntx6k/img.png" data-lightbox="lightbox"><img src="https://blog.kakaocdn.net/dn/qVwvY/btsACmlOJdM/vsZAnOjeQxxtCRn84ntx6k/img.png" srcset="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FqVwvY%2FbtsACmlOJdM%2FvsZAnOjeQxxtCRn84ntx6k%2Fimg.png" onerror="this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';" data-filename="image(26).png" data-origin-width="1088" data-origin-height="329"/></span></figure> </p> <ul style="list-style-type: disc;" data-ke-list-type="disc"> <li>Because Pods were not rescheduled for 5 minutes after a node failure, the team developed an open-source project called Shardcake</li> <li>It continuously monitors Pod status through the kube-apiserver and redeploys immediately when it detects an abnormal state</li> </ul> <p data-ke-size="size16"> </p>
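<p data-ke-size="size16">In the Kakao Pay case, an NLB with fixed public IPs sits in front of the Istio ingress gateway. The sketch below shows one way to declare that. It is an illustration under stated assumptions, not Kakao Pay's actual manifest. It assumes the AWS Load Balancer Controller is installed and the gateway Pods carry Istio's default <code>istio: ingressgateway</code> label. The subnet and Elastic IP allocation IDs are hypothetical placeholders. The controller expects one Elastic IP allocation per subnet, listed in the same order.</p>
<pre><code class="language-yaml"># Hypothetical sketch: expose the Istio ingress gateway through an NLB with fixed public IPs.
# Assumes the AWS Load Balancer Controller is installed; all IDs below are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  annotations:
    # Have the AWS Load Balancer Controller (not the legacy in-tree provider) create an NLB
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    # One public subnet per AZ, and one pre-allocated Elastic IP per subnet, in the same order
    service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-0aaaaaaaaaaaaaaaa, subnet-0bbbbbbbbbbbbbbbb
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0aaaaaaaaaaaaaaaa, eipalloc-0bbbbbbbbbbbbbbbb
spec:
  type: LoadBalancer
  selector:
    istio: ingressgateway
  ports:
    - name: https
      port: 443
      targetPort: 8443   # port the Istio gateway container listens on for HTTPS
</code></pre>
<p data-ke-size="size16">In practice these annotations are usually supplied through the Istio Helm chart values or an IstioOperator resource, rather than by editing the generated Service directly.</p>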
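<p data-ke-size="size16">Kakao Pay runs rolling and canary deployments through Istio. That approach typically relies on weighted routing between two subsets of Pods. Here is a minimal sketch, assuming a hypothetical <code>payment-api</code> Service whose stable and canary Pods are labeled <code>version: v1</code> and <code>version: v2</code>.</p>
<pre><code class="language-yaml"># Hypothetical canary split for a service named "payment-api" (name and labels are illustrative).
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-api
spec:
  host: payment-api
  subsets:                 # group Pods into named subsets by label
    - name: stable
      labels:
        version: v1
    - name: canary
      labels:
        version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-api
spec:
  hosts:
    - payment-api
  http:
    - route:               # 90% of requests to stable, 10% to canary
        - destination:
            host: payment-api
            subset: stable
          weight: 90
        - destination:
            host: payment-api
            subset: canary
          weight: 10
</code></pre>
<p data-ke-size="size16">A deployment pipeline, such as a Spinnaker stage, could then shift traffic by raising the <code>canary</code> weight step by step. Whether Kakao Pay automates it this way is an assumption.</p>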
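<p data-ke-size="size16">In the Daangn Market case, Terraform creates the ALB ahead of time. Each cluster then registers its Pods with it through the AWS Load Balancer Controller's <code>TargetGroupBinding</code> resource. Below is a minimal sketch. The Service name, port, and target group ARN are hypothetical placeholders.</p>
<pre><code class="language-yaml"># Hypothetical: attach this cluster's Service to an ALB target group pre-created by Terraform.
# Requires the AWS Load Balancer Controller; the ARN and names are placeholders.
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: web-tgb
  namespace: default
spec:
  serviceRef:
    name: web          # existing Service in this cluster
    port: 80
  targetType: ip       # register Pod IPs directly as targets
  targetGroupARN: arn:aws:elasticloadbalancing:ap-northeast-2:111122223333:targetgroup/web-tg/0123456789abcdef
</code></pre>
<p data-ke-size="size16">Terraform, not a cluster, owns the load balancer and target groups. A newly built cluster can therefore bind its Service to a pre-created target group without recreating the ALB.</p>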
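<p data-ke-size="size16">KRAFTON runs its GameShards on Agones. In Agones, a game server is usually declared as a <code>Fleet</code> of <code>GameServer</code>s. This minimal sketch uses a hypothetical image and port. The server process itself must integrate the Agones SDK and report Ready before Agones will allocate it.</p>
<pre><code class="language-yaml"># Hypothetical Fleet of game-shard servers; image and port are placeholders.
apiVersion: agones.dev/v1
kind: Fleet
metadata:
  name: game-shard
spec:
  replicas: 3                  # number of GameServers to keep running
  template:
    spec:
      ports:
        - name: default
          containerPort: 7777  # port the game server listens on inside the container
      template:                # ordinary Pod template for the game server
        spec:
          containers:
            - name: game-shard
              image: harbor.example.com/game/game-shard:1.0.0
</code></pre>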
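<p data-ke-size="size16">The kubelet change in the PUBG case can also be made in the kubelet configuration file instead of on the command line. <code>serializeImagePulls</code> defaults to <code>true</code>, which makes the kubelet pull one image at a time. How the setting reaches each node is an assumption here and depends on the AMI, for example via <code>--kubelet-extra-args</code> in the EKS-optimized AMI bootstrap script. A minimal sketch:</p>
<pre><code class="language-yaml"># Sketch: kubelet config-file equivalent of the --serialize-image-pulls=false flag.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Pull several images in parallel instead of one at a time (default: true)
serializeImagePulls: false
</code></pre>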
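<p data-ke-size="size16">The PUBG case also mentions a DaemonSet that pre-pulls images. A common way to build one is to list each image as an init container that exits immediately, so every node pulls the image into its local cache. This is a generic pattern, not necessarily KRAFTON's implementation. The Harbor hostname and image tag are hypothetical, and the image is assumed to contain <code>/bin/sh</code>.</p>
<pre><code class="language-yaml"># Hypothetical image pre-pull DaemonSet: one init container per image warms each node's cache.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepuller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: image-prepuller
  template:
    metadata:
      labels:
        app: image-prepuller
    spec:
      initContainers:
        # Starting the container forces the kubelet to pull the image; the command exits at once
        - name: prepull-game-server
          image: harbor.example.com/game/game-server:1.0.0
          command: ["sh", "-c", "true"]
      containers:
        # Tiny long-running container so the DaemonSet Pod stays Running instead of restarting
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 1m
              memory: 8Mi
</code></pre>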
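<p data-ke-size="size16">In the Devsisters case, Istio's locality load balancing is what keeps traffic within an AZ. It can be enabled per destination with a <code>DestinationRule</code>. Here is a minimal sketch with a hypothetical host name. Istio derives each Pod's locality from the node's <code>topology.kubernetes.io/region</code> and <code>topology.kubernetes.io/zone</code> labels. It applies locality-aware routing only when outlier detection is configured. The thresholds below are illustrative, not Devsisters' values.</p>
<pre><code class="language-yaml"># Sketch: prefer endpoints in the caller's own AZ; fail over to other AZs when they are ejected.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: game-server
spec:
  host: game-server.game.svc.cluster.local   # placeholder host
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
    # Required for locality-aware routing: unhealthy endpoints are ejected, triggering failover
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
</code></pre>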
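<p data-ke-size="size16">The 5-minute delay in the Shardcake item matches a Kubernetes default. The <code>DefaultTolerationSeconds</code> admission plugin gives Pods 300-second tolerations for the <code>node.kubernetes.io/not-ready</code> and <code>node.kubernetes.io/unreachable</code> taints. A lighter-weight alternative to what Devsisters built, and not their approach, is to shorten those tolerations per Pod. This sketch uses a hypothetical Pod name and image.</p>
<pre><code class="language-yaml"># Sketch (an alternative, not Devsisters' approach): evict after 30 s instead of the default 300 s.
apiVersion: v1
kind: Pod
metadata:
  name: shard-worker
spec:
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30
  containers:
    - name: app
      image: example.com/shard-worker:1.0.0
</code></pre>
<p data-ke-size="size16">This setting shortens only the wait after the node controller has tainted the node. How long it takes to notice the failure in the first place is governed separately, by the node controller's grace period.</p>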
