Zero-Downtime Migration Across 60 OpenSearch Clusters
Amazon, 2025
Led a staged migration of more than 1 PB of search data while maintaining availability and improving latency.
- Scope
- 56B seller listings, 8.7M partners, multi-region traffic; sustained roughly 5K read TPS and 200K write TPS during a 2-day migration window.
- Result
- Completed migration with zero client downtime. Reduced p90 API latency by 15% through Graviton-based OpenSearch capacity optimization.
- Work
- Phased cutover with guardrails and rollback criteria. Capacity planning tied to read/write traffic envelopes. Real-time validation and operational runbooks.
LinkedIn profile · Public OpenSearch contribution index
Traffic Prioritization Reduced Seller-Visible Ingestion Delay by 80%
Amazon, 2025
Designed a schema-driven prioritization layer so seller-facing updates remained fast during heavy backfills.
- Scope
- Ingestion pipelines supporting seller contribution updates and backfill workloads.
- Result
- Reduced p99 seller-facing delay from 90 minutes to 18 minutes. Improved listing visibility timeliness during periods of high system load.
- Work
- Schema-based sidelining and prioritization controls. Explicit SLO-oriented queue and throughput policies. Operational metrics and alerting tied to customer-visible latency.
LinkedIn profile
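As an illustration of the prioritization idea in the card above, here is a minimal sketch of a two-class ingest queue in which seller-facing updates are always dequeued ahead of backfill work. It is not the production design: the `source` field, the `classify` rule, and the `PriorityIngestQueue` name are assumptions made for this example.

```python
import heapq
import itertools

# Hypothetical priority classes: the lower number is served first.
SELLER_FACING = 0
BACKFILL = 1


def classify(update: dict) -> int:
    """Assumed rule: the update's schema carries a 'source' field."""
    return BACKFILL if update.get("source") == "backfill" else SELLER_FACING


class PriorityIngestQueue:
    def __init__(self) -> None:
        self._heap = []
        # Monotonic counter keeps FIFO order within a priority class.
        self._seq = itertools.count()

    def put(self, update: dict) -> None:
        heapq.heappush(self._heap, (classify(update), next(self._seq), update))

    def get(self) -> dict:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityIngestQueue()
queue.put({"id": "b1", "source": "backfill"})
queue.put({"id": "s1", "source": "seller"})
queue.put({"id": "b2", "source": "backfill"})
queue.put({"id": "s2", "source": "seller"})

# Seller-facing updates drain first, then backfill, each in arrival order.
drained = [queue.get()["id"] for _ in range(len(queue))]
```

Strict priority like this can starve backfill traffic under sustained seller load, so a real system would likely pair it with a throughput floor or weighted scheduling for the lower class.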
Amazon Haul: Seller Management and Buyability Visibility
Amazon, 2024
Delivered platform capabilities that let sellers manage Haul listings and see accurate buyability status as soon as an offer is created.
- Scope
- Seller onboarding and listing lifecycle flows for a new marketplace program.
- Result
- Enabled launch-critical seller workflows for Haul. Introduced an extensible data model for future marketplace programs.
- Work
- Modeled program-aware data contracts and lifecycle states. Built extensibility points to avoid one-off launch logic. Integrated with seller-facing management surfaces.
Amazon Haul announcement context · LinkedIn profile
Cluster Stability Remediation Eliminated Repeated SEV2s
Amazon, 2025
Diagnosed infrastructure bottlenecks and implemented scaling and rate controls that stopped recurring customer-impacting incidents.
- Scope
- OpenSearch clusters backing high-throughput marketplace indexing workloads.
- Result
- Eliminated 12+ SEV2 incidents across a 6+ week window. Stabilized service behavior under sustained ingestion pressure.
- Work
- Root-cause analysis that isolated EBS throughput throttling. Node scaling and write-rate limiting strategy. Joint execution with AWS TAM and OpenSearch stakeholders.
LinkedIn profile · Public OpenSearch contribution index
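The card above mentions write-rate limiting but does not say which algorithm was used. One common choice is a token bucket, sketched below under that assumption. The `TokenBucket` class, its parameters, and the numbers in the usage lines are all hypothetical; time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.last = 0.0         # timestamp of the previous call, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should delay or shed this write


# Illustrative numbers only: 2 writes/sec sustained, burst of 2.
bucket = TokenBucket(rate_per_sec=2.0, capacity=2.0)
results = [bucket.allow(now=t) for t in (0.0, 0.0, 0.0, 0.5, 0.5)]
```

Here the first two writes at `t=0.0` consume the burst, the third is rejected, and half a second later one more token has been refilled, so one further write is admitted.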