Overview
We are seeking a skilled Data Engineer to design, build, and maintain large-scale data pipelines that process and transform vast datasets for machine learning (ML) applications. You will play a critical role in optimizing data workflows, ensuring data quality, and deploying ML models in a production-grade cloud environment.
The ideal candidate has strong expertise in ETL/ELT pipelines and cloud data services, with hands-on experience in ML model deployment (MLOps). You will collaborate closely with Data Scientists, ML Engineers, and DevOps teams to ensure scalable, reliable, and efficient data infrastructure.
Responsibilities
1. Data Pipeline Development & Maintenance
- Design, build, and optimize scalable data pipelines to ingest, process, and transform large volumes of structured and unstructured data (a minimal sketch follows this list).
- Implement batch and real-time (streaming) data processing using cloud services.
- Maintain and improve data warehousing for efficient storage and retrieval.
- Ensure data quality, reliability, and performance through monitoring, logging, and automated testing.
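For illustration, here is a minimal PySpark sketch of the kind of batch pipeline this role covers: read raw JSON events from S3, clean and normalize them, and write partitioned Parquet for downstream querying. The bucket paths, column names, and schema are hypothetical placeholders, not part of this posting's actual stack.

```python
# Minimal batch ETL sketch in PySpark. Bucket names, paths, and the
# event schema are hypothetical; a real Glue/EMR job would also handle
# job bookmarks, retries, and schema evolution.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-batch-etl").getOrCreate()

# Ingest: raw JSON events landed in S3 (hypothetical path).
raw = spark.read.json("s3://example-raw-bucket/events/")

# Transform: drop malformed rows, normalize the timestamp, and derive a
# date column to partition on.
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Load: write partitioned Parquet for downstream Athena/Redshift access.
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-curated-bucket/events/"))
```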
2. ML Deployment & MLOps in the Cloud
- Collaborate with ML teams to productionize models using cloud services (see the deployment sketch after this list).
- Automate CI/CD pipelines for ML models.
- Implement monitoring and logging for ML models in production (e.g., CloudWatch, SageMaker Model Monitor).
- Optimize model inference performance (scaling, latency, cost-efficiency).
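As a rough illustration of the deployment work above, the following boto3 sketch promotes a trained model artifact to a real-time SageMaker endpoint. The model name, image URI, IAM role, and instance type are hypothetical; a production rollout would add autoscaling, data capture for Model Monitor, and CI/CD gating.

```python
# Hedged sketch: promoting a trained model artifact to a real-time
# SageMaker endpoint with boto3. All names, the image URI, and the IAM
# role are hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker")

# Register the model artifact and its inference container.
sm.create_model(
    ModelName="churn-model-v3",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.us-west-2.amazonaws.com/churn:latest",
        "ModelDataUrl": "s3://example-models/churn/v3/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::<account>:role/SageMakerExecutionRole",
)

# Endpoint config with a single variant; autoscaling and data capture
# for Model Monitor would be layered on in a real deployment.
sm.create_endpoint_config(
    EndpointConfigName="churn-model-v3-config",
    ProductionVariants=[{
        "VariantName": "primary",
        "ModelName": "churn-model-v3",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
)

sm.create_endpoint(
    EndpointName="churn-endpoint",
    EndpointConfigName="churn-model-v3-config",
)
```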
3. Infrastructure & Cloud Optimization
- Manage infrastructure as code (IaC) using Terraform, CloudFormation, or CDK (see the CDK sketch after this list).
- Optimize AWS resource usage (cost monitoring, auto-scaling, spot instances).
- Ensure security and compliance (IAM roles, encryption, VPC configurations).
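As one hedged example of the IaC work above, this AWS CDK v2 sketch in Python defines a stack with an encrypted, SSL-enforcing S3 bucket, mirroring the encryption requirement in the list. The stack and bucket names are hypothetical; Terraform or CloudFormation could express the same resources.

```python
# Hedged IaC sketch using AWS CDK v2 in Python. Construct names are
# hypothetical placeholders.
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Curated-zone bucket with server-side encryption and TLS-only
        # access enforced.
        s3.Bucket(
            self,
            "CuratedBucket",
            encryption=s3.BucketEncryption.S3_MANAGED,
            enforce_ssl=True,
            versioned=True,
        )

app = cdk.App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```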
Required Skills & Qualifications
- 3+ years of experience in Data Engineering, with a focus on large-scale data pipelines.
- Strong proficiency in Python, SQL, PySpark, and distributed computing frameworks.
- Hands-on experience with AWS data stack:
- Storage & Processing: S3, Glue, EMR, Redshift, Athena
- Streaming: Kinesis, MSK, Lambda
- ML Deployment: SageMaker, Lambda, Step Functions, ECR/EKS
- Experience with workflow orchestration (Airflow, Step Functions, MWAA); a minimal DAG sketch follows this list.
- Familiarity with MLOps practices (model versioning, A/B testing, monitoring).
- Knowledge of IaC (Terraform, CloudFormation) and DevOps best practices.
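To illustrate the orchestration requirement above, here is a minimal Airflow DAG sketch chaining an ingest step and a validation step. The DAG id, schedule, and task bodies are hypothetical placeholders for real Glue, EMR, or SageMaker triggers.

```python
# Hedged orchestration sketch: a tiny Airflow DAG (Airflow 2.4+
# `schedule` argument) chaining ingest and validation. Task bodies are
# hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    # Placeholder: kick off a Glue job or EMR step here.
    print("ingesting raw events")

def validate():
    # Placeholder: run data-quality checks before downstream use.
    print("validating curated tables")

with DAG(
    dag_id="events_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    validate_task = PythonOperator(task_id="validate", python_callable=validate)
    # Validation runs only after ingestion succeeds.
    ingest_task >> validate_task
```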
Nice-to-Have Skills
- Experience with feature stores (SageMaker Feature Store, Feast).
- Knowledge of real-time ML inference (API deployments, SageMaker endpoints); see the invocation sketch after this list.
- Familiarity with data observability tools (DataDog, Monte Carlo, Great Expectations).
- AWS certifications (e.g., AWS Certified Data Analytics - Specialty, AWS Certified Machine Learning - Specialty).
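As a brief sketch of real-time inference from the list above, the following boto3 call invokes a deployed SageMaker endpoint. The endpoint name and payload shape are hypothetical.

```python
# Hedged sketch of calling a real-time SageMaker endpoint with boto3;
# the endpoint name and payload shape are hypothetical.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="churn-endpoint",  # hypothetical endpoint
    ContentType="application/json",
    Body=json.dumps({"features": [0.2, 1.7, 3.4]}),
)
prediction = json.loads(response["Body"].read())
print(prediction)
```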
Job Type: Full-time
Pay: $120,000.00 - $170,000.00 per year
Benefits:
- Flexible schedule
- Paid time off
Ability to Commute:
- Irvine, CA 92614 (Required)
Ability to Relocate:
- Irvine, CA 92614: Relocate before starting work (Required)
Work Location: In person