Garm and Claude Opus 4.7 (1M context) committed on
Commit e5edc40 · 1 parent: 9a9a136

test(proxy): warm up TestClient before measuring /livez backpressure latency

The test drained the anthropic pre-upstream semaphore and asserted that 20
subsequent /livez calls stayed under 100 ms. With only 20 samples, the p99
computation falls through to max(latencies) — one cold-start outlier was
enough to fail the test. Observed on CI py3.10 runners where the first
TestClient request paid ~330 ms of one-time ASGI lifespan / import /
route-resolution cost while every subsequent request was sub-ms.
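
To see why one outlier is enough, here is a minimal sketch of a nearest-rank p99. The helper name `p99_nearest_rank` and the nearest-rank definition are assumptions for illustration, because the test's own p99 fallback is not shown in this diff. With N = 20 the nearest rank is ceil(0.99 × 20) = 20, which is the largest sample.

```python
import math


def p99_nearest_rank(samples: list[float]) -> float:
    """Hypothetical nearest-rank p99: the ceil(0.99 * N)-th smallest sample."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# 19 sub-millisecond samples plus one ~330 ms cold-start sample (seconds).
latencies = [0.0005] * 19 + [0.330]

# For N = 20 the rank is ceil(19.8) = 20, so p99 collapses to max(latencies).
print(p99_nearest_rank(latencies) == max(latencies))  # True
print(p99_nearest_rank(latencies) < 0.100)  # False: the single outlier breaks a 100 ms bound
```

Under this definition the p99 only stops coinciding with the maximum once ceil(0.99 × N) ≤ N − 1, i.e. N ≥ 100, so for 20 samples any single slow request decides the result.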

Add 3 warm-up requests before timing starts. Preserves the test's
signal (if /livez were actually blocked on the drained semaphore, all
post-warmup samples would still be slow) while removing the
runner-speed flake.
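
The warm-up pattern does not depend on TestClient. A minimal sketch, assuming a zero-argument callable stands in for `client.get("/livez")`; `measure_latencies` and `make_cold_start_endpoint` are hypothetical names, and the 0.2 s cold-start cost is a simulated value, not the ~330 ms observed on CI. The last case shows the preserved signal: an endpoint that is genuinely blocked stays slow after warm-up.

```python
import time
from typing import Callable


def measure_latencies(call: Callable[[], object], n: int = 20, warmup: int = 3) -> list[float]:
    """Invoke `call` `warmup` times untimed, then return `n` timed latencies in seconds."""
    for _ in range(warmup):
        call()  # absorb one-time costs; these results are discarded
    latencies: list[float] = []
    for _ in range(n):
        t0 = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - t0)
    return latencies


def make_cold_start_endpoint(cold_cost_s: float = 0.2) -> Callable[[], str]:
    """Hypothetical stand-in for /livez that is slow on its first call only."""
    state = {"warm": False}

    def endpoint() -> str:
        if not state["warm"]:
            time.sleep(cold_cost_s)  # simulated lifespan/import/route-resolution cost
            state["warm"] = True
        return "ok"

    return endpoint


cold = measure_latencies(make_cold_start_endpoint(), warmup=0)
warm = measure_latencies(make_cold_start_endpoint(), warmup=3)
# A permanently blocked endpoint: every post-warm-up sample is still slow.
blocked = measure_latencies(lambda: time.sleep(0.15), n=3, warmup=1)

print(max(cold) > 0.1)     # the cold-start sample dominates the maximum
print(max(warm) < 0.1)     # post-warm-up samples are near-instant
print(min(blocked) > 0.1)  # warm-up does not hide real blocking
```

Keeping the warm-up count small (3 here, matching the commit) limits the extra test time while still absorbing costs that are paid only once.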

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tests/test_anthropic_pre_upstream_backpressure.py CHANGED
@@ -564,6 +564,14 @@ def test_livez_unaffected_under_anthropic_backpressure():
 
     latencies: list[float] = []
     with TestClient(app) as client:
+        # Warm up: the first few requests pay one-time costs (TestClient
+        # ASGI lifespan, route resolution, import side effects) that are
+        # unrelated to what this test measures. Without warm-up, the
+        # single cold-start sample dominates `max(latencies)` (which is
+        # what the p99 fallback below reduces to for small N) and causes
+        # flakes on slow CI runners.
+        for _ in range(3):
+            client.get("/livez")
         for _ in range(20):
             t0 = time.perf_counter()
             resp = client.get("/livez")