We are looking for an outstanding DevOps and Site Reliability Engineer to join the NVIDIA e-commerce team. You will be a key architect of our NVIDIA Marketplace platform, ensuring that our systems are scalable, resilient, and automated. The ideal candidate is a Terraform expert who views infrastructure as code (IaC) not just as a tool, but as a philosophy. You will bridge the gap between development and operations, focusing on system reliability, high availability, and the performance of our global e-commerce platform.
What you’ll be doing:
Architect and refine automated Jenkins deployment pipelines to ensure seamless, zero-downtime releases.
Design, build, and maintain enterprise-scale infrastructure using Terraform. Establish modular, reusable patterns for AWS resources.
Optimize and manage sophisticated AWS environments with a focus on cost-efficiency and security.
Transition our monitoring from reactive to proactive using AI-powered observability tools (e.g., Datadog Watchdog) for automated root cause analysis (RCA) and anomaly detection.
Define and monitor SLOs and SLAs. Lead incident response and conduct thorough post-mortems to improve system resilience.
What we need to see:
8+ years of industry experience, or equivalent.
Bachelor's/Master's Degree in Computer Science, Software Engineering, or equivalent experience.
Exceptionally strong background in developing CI/CD processes and deployment pipelines using Jenkins.
Extensive experience architecting on AWS Cloud and running services such as API Gateway, Lambda, EKS/ECS, RDS, S3, and SQS.
Expert-level knowledge of Terraform (including state management, workspaces, and complex module development).
Advanced experience with Kubernetes (EKS) and Docker, including orchestration, service meshes, and Helm.
Strong proficiency in a scripting language, such as Python, for automation and custom tooling.
Strong communication skills.
Ways to stand out from the crowd:
Deep understanding of DNS and CDNs (e.g., Akamai, CloudFront).
Demonstrated use of AI tools to improve productivity and the quality of releases.
Demonstrated application of secure-by-design principles across infrastructure, deployment automation, and operational processes.
AWS certifications are preferred.
#LI-Hybrid
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 176,000 USD - 276,000 USD for Level 4, and 208,000 USD - 333,500 USD for Level 5.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until January 27, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.