5.5 EXPLAIN ANALYZE 읽는 법

5.4에서 EXPLAIN의 기본 문법을 봤다면, 본 절에서는 EXPLAIN (ANALYZE, BUFFERS) 출력을 실제로 병목을 찾는 도구로 쓰는 법을 봅니다. 큰 plan을 어디부터 봐야 하는지, 흔한 사고 패턴이 무엇인지가 핵심입니다.

우선 봐야 할 4가지

긴 plan 앞에서 운영자가 매번 보는 곳:

actual time: 어느 노드가 실제로 느린가
actual rows vs rows: 추정 오차가 큰 노드
Buffers: shared read: cache miss가 큰 노드
loops: nested loop의 inner가 몇 번 호출됐는가

이 4개로 90% 진단 가능합니다.

시간 분석

 Sort  (actual time=120.5..120.8 rows=50 loops=1)
   ->  Hash Join  (actual time=10.2..118.7 rows=5000 loops=1)
         ->  Seq Scan on a  (actual time=0.1..50.0 rows=100000 loops=1)
         ->  Hash  (actual time=8.0..8.0 rows=200 loops=1)
               ->  Index Scan on b  (actual time=0.05..7.8 rows=200 loops=1)

읽기:

Sort 노드의 마지막 row 시간 120.8ms = 쿼리 전체 시간 (Execution Time와 거의 같음)
Hash Join 시작이 10.2ms = 자식 노드 누적
가장 시간 차이가 큰 구간은 Seq Scan on a의 50ms — 100000 row를 다 읽음

병목: a 테이블 풀스캔. 인덱스 또는 파티션 검토 대상입니다.

`loops > 1` 핵심

 Nested Loop  (actual time=0.1..3500 rows=10000 loops=1)
   ->  Seq Scan on a  (actual time=0.05..2.0 rows=10000 loops=1)
   ->  Index Scan on b  (actual time=0.1..0.3 rows=1 loops=10000)
         Index Cond: (a_id = a.id)

내부 노드 Index Scan on b:

actual time=0.1..0.3 — 한 번 실행에 약 0.3ms
loops=10000 — 10000번 호출됨
실질 누적 시간 = 0.3 × 10000 = 3000ms

Nested Loop이 느린 이유입니다. inner 호출이 너무 많을 때는 Hash Join이 더 빠를 수 있다 — 통계 또는 enable_nestloop을 의심.

`actual rows` vs `rows` 격차

추정 오차는 plan 미스의 1순위 원인입니다.

 Hash Join  (cost=100..200 rows=10) (actual time=5..3500 rows=1000000 loops=1)

추정 10, 실제 1M. 10만 배 오차. 이 노드 위의 옵티마이저 선택은 거의 확실히 잘못됐습니다.

원인 후보:

통계 오래됨 → ANALYZE
조건이 옵티마이저가 추정 못 하는 형태 (예: WHERE col LIKE 'a%b%c%')
컬럼 간 상관관계 → CREATE STATISTICS (5.3)
함수 호출 결과 (WHERE func(col) = X) → 함수 statistics 없음

Buffer 분석

Buffers: shared hit=14000 read=8500 dirtied=12 written=8

항목	의미
`hit`	shared_buffers에서 찾음
`read`	디스크에서 읽음 (OS cache 또는 진짜 디스크)
`dirtied`	페이지를 dirty로 만듦 (UPDATE 등)
`written`	backend가 직접 write — bgwriter 부족 신호

read가 큰 노드는 cold cache 또는 큰 스캔. 같은 쿼리 두 번 실행해 read가 줄어드는지로 hot/cold 판정.

메모리 spill 진단

work_mem(1.4 참고)이 부족하면 정렬·해시가 디스크로 spill됩니다.

 Sort  (actual time=200..210 rows=50000)
   Sort Method: external merge  Disk: 32MB

Sort Method	의미
`quicksort`	메모리 안
`top-N heapsort`	LIMIT 절과 결합돼 메모리 안
`external merge`	디스크 spill — 느림
`external sort`	같음 (옛 표현)

해시도 비슷:

 Hash  (actual time=...)
   Buckets: 16384 (originally 1024)  Batches: 8 (originally 1)  Memory Usage: 24MB

Batches > 1은 메모리 부족으로 분할 처리합니다. 일시적으로 SET work_mem = '256MB'로 늘리면 빨라지는지 확인합니다.

Parallel 분석

 Gather  (actual time=0.5..200 rows=10000 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on big  (actual time=0.1..50 rows=2500 loops=5)

loops=5 — 워커 4 + leader 1
각 워커가 약 2500 row씩 처리 → 총 12500 row

Workers Planned보다 Launched가 적으면 max_parallel_workers 또는 max_parallel_workers_per_gather 부족.

JIT 영향

 JIT:
   Functions: 12
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 5.2 ms, Inlining 12.0 ms, Optimization 45.0 ms, Emission 30.0 ms, Total 92.2 ms

JIT는 cost가 jit_above_cost(기본 100000)를 넘으면 활성화. 작은 쿼리에 JIT가 켜지면 컴파일 비용이 실행 시간을 압도하는 사고가 발생합니다. PG 16+는 default가 좋아졌지만 운영에서 항상 점검:

잦은 짧은 쿼리에 JIT 시간이 들어가면 → SET jit = off 또는 jit_above_cost를 100M으로 올림
무거운 분석 쿼리에만 JIT 활용

Rows Removed by …

 Index Scan
   Index Cond: (user_id = 42)
   Filter: (status = 'paid')
   Rows Removed by Filter: 9000
   Rows Removed by Index Recheck: 0

Removed by Filter: 인덱스로 가져온 row를 SQL의 다른 조건으로 거른 양 — 크면 인덱스가 부족 (multi-column 인덱스 검토)
Removed by Index Recheck: Bitmap Scan에서 lossy bitmap이라 재검사로 거른 양 — work_mem 부족 신호

actual ms가 줄어드는 패턴

같은 쿼리를 여러 번 EXPLAIN ANALYZE해 보면 보통 두 번째부터 빨라집니다. 이유:

Buffers: shared read가 줄어 cache hit으로
backend의 syscache/relcache가 워밍

진짜 운영 환경에서 cold·warm 둘 다 확인하는 게 좋습니다.

통합 워크플로

    flowchart TD
  Q["느린 쿼리"] --> E["EXPLAIN (ANALYZE, BUFFERS)"]
  E --> CHECK1{"actual time 가장<br/>큰 노드?"}
  CHECK1 --> ROWS{"rows estimate vs<br/>actual 차이 큼?"}
  ROWS -- "yes" --> STAT["ANALYZE,<br/>CREATE STATISTICS"]
  ROWS -- "no" --> BUF{"shared read 큼?"}
  BUF -- "yes" --> CACHE["shared_buffers,<br/>인덱스 검토"]
  BUF -- "no" --> MEM{"Sort Method:<br/>external merge?"}
  MEM -- "yes" --> WORK["work_mem 증액"]
  MEM -- "no" --> JOIN{"loops 매우 큼?"}
  JOIN -- "yes" --> HASH["Hash Join 유도,<br/>또는 인덱스 검토"]
  JOIN -- "no" --> OTHER["JIT, parallel,<br/>filter 컬럼 검토"]

  classDef diag fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a
  classDef fix fill:#d1fae5,stroke:#047857,color:#064e3b
  class Q,E,CHECK1,ROWS,BUF,MEM,JOIN diag
  class STAT,CACHE,WORK,HASH,OTHER fix

정리

4가지 우선 신호: actual time, rows 격차, Buffers read, loops
추정 오차 10배 이상이면 통계 갱신 또는 extended statistics
external merge·Batches > 1은 work_mem 부족
nested loop의 loops가 매우 크면 Hash Join 유도 검토
JIT는 짧은 쿼리에서 오히려 손해 — cost threshold 점검
시각화는 depesz·dalibo로 큰 plan에서 병목 빠르게 발견

다음 절(5.6)에서는 plan 노드의 핵심인 조인 알고리즘(nested loop, hash, merge)을 더 깊게 봅니다.

5.4 EXPLAIN 읽는 법 5.6 조인 알고리즘