Work

Production systems, open-source contributions, and project notes. The work entries keep the measurable results but omit internal implementation details.

Systems

Zero-Downtime Migration Across 60 OpenSearch Clusters

Amazon, 2025

Led a staged migration of 1PB+ of search data while maintaining availability and improving latency.

Scope
56B seller listings, 8.7M partners, multi-region traffic; sustained approximately 5K read TPS and approximately 200K write TPS during a 2-day migration window.
Result
Completed the migration with zero client downtime. Reduced p90 API latency by 15% through Graviton-based OpenSearch capacity optimization.
Work
Phased cutover with guardrails and rollback criteria. Capacity planning tied to read/write traffic envelopes. Real-time validation and operational runbooks.

Traffic Prioritization Reduced Seller-Visible Ingestion Delay by 80%

Amazon, 2025

Designed a schema-driven prioritization layer so seller-facing updates remained fast during heavy backfills.

Scope
Ingestion pipelines supporting seller contribution updates and backfill workloads.
Result
Reduced p99 seller-facing ingestion delay from 90 minutes to 18 minutes. Kept listing visibility timely during periods of high system load.
Work
Schema-based sidelining and prioritization controls. Explicit SLO-oriented queue and throughput policies. Operational metrics and alerting tied to customer-visible latency.

Amazon Haul: Seller Management and Buyability Visibility

Amazon, 2024

Delivered platform capabilities that let sellers manage Haul listings and see accurate buyability status from the moment an offer is created.

Scope
Seller onboarding and listing lifecycle flows for a new marketplace program.
Result
Enabled launch-critical seller workflows for Haul. Introduced an extensible data model for future marketplace programs.
Work
Modeled program-aware data contracts and lifecycle states. Built extensibility points to avoid one-off launch logic. Integrated with seller-facing management surfaces.

Cluster Stability Remediation Eliminated Repeated SEV2s

Amazon, 2025

Diagnosed infrastructure bottlenecks and implemented scaling and rate-limiting controls that stopped recurring customer-impacting incidents.

Scope
OpenSearch clusters backing high-throughput marketplace indexing workloads.
Result
Eliminated 12+ SEV2 incidents across a 6+ week window. Stabilized service behavior under sustained ingestion pressure.
Work
Root-cause analysis that isolated EBS throughput throttling. Node scaling and write-rate limiting strategy. Joint execution with AWS TAM and OpenSearch stakeholders.

Tools

ch

Docker config and shell manager for using containers as ad-hoc dev environments

CS350 Docker

Containerized development environment for USC CS104

CFS-RS

Completely Fair Scheduler implementation in Rust

Runner

Code execution engine built on Docker
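CFS-RS implements the Completely Fair Scheduler in Rust. The sketch below illustrates the core CFS idea and is not code from that project. Each task accumulates a weighted virtual runtime (`vruntime`), and the scheduler always runs the task with the smallest one. The Linux kernel keeps runnable tasks in a red-black tree. This sketch substitutes Rust's `BTreeSet`, which provides the same ordering. The names `CfsRunQueue`, `Task`, and `tick` are hypothetical. The sketch also leaves out sleeper credit, preemption granularity, and the full nice-to-weight table. The only kernel value it uses is 1024, the weight for nice 0.

```rust
use std::collections::{BTreeSet, HashMap};

/// Load weight the kernel's `sched_prio_to_weight` table assigns to nice 0.
const NICE_0_WEIGHT: u64 = 1024;

struct Task {
    vruntime: u64, // weighted virtual runtime in nanoseconds
    weight: u64,   // load weight; a higher weight makes vruntime grow more slowly
}

/// Hypothetical run queue illustrating CFS task selection.
struct CfsRunQueue {
    // Ordered by (vruntime, pid): the first entry is the next task to run.
    // The kernel uses a red-black tree; a BTreeSet gives the same ordering here.
    timeline: BTreeSet<(u64, u32)>,
    tasks: HashMap<u32, Task>,
    min_vruntime: u64,
}

impl CfsRunQueue {
    fn new() -> Self {
        Self { timeline: BTreeSet::new(), tasks: HashMap::new(), min_vruntime: 0 }
    }

    /// Enqueue a task at the current minimum vruntime so it neither starves
    /// existing tasks nor gets starved by them.
    fn add(&mut self, pid: u32, weight: u64) {
        let task = Task { vruntime: self.min_vruntime, weight };
        self.timeline.insert((task.vruntime, pid));
        self.tasks.insert(pid, task);
    }

    /// Run the task with the smallest vruntime for `ran_ns` nanoseconds,
    /// charge it weighted runtime, and reinsert it. Returns the pid that ran.
    fn tick(&mut self, ran_ns: u64) -> Option<u32> {
        let (vruntime, pid) = self.timeline.pop_first()?;
        let task = self.tasks.get_mut(&pid)?;
        task.vruntime = vruntime + ran_ns * NICE_0_WEIGHT / task.weight;
        self.timeline.insert((task.vruntime, pid));
        // min_vruntime never moves backward; here it follows the leftmost entry.
        if let Some(&(leftmost, _)) = self.timeline.first() {
            self.min_vruntime = self.min_vruntime.max(leftmost);
        }
        Some(pid)
    }
}

fn main() {
    let mut rq = CfsRunQueue::new();
    rq.add(1, 1024); // nice-0 task
    rq.add(2, 2048); // heavier task, expected to get twice the CPU time
    let mut counts: HashMap<u32, u32> = HashMap::new();
    for _ in 0..300 {
        if let Some(pid) = rq.tick(1_000_000) {
            *counts.entry(pid).or_insert(0) += 1;
        }
    }
    println!("pid 1: {} ticks, pid 2: {} ticks", counts[&1], counts[&2]);
}
```

With one nice-0 task and one task of weight 2048, the heavier task's vruntime grows half as fast, so it is selected for 200 of the 300 ticks and the nice-0 task for 100.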