Role Summary:
We are seeking a Senior Databricks Data Engineer (QA Testing) with strong hands-on experience building and optimizing data pipelines on Databricks. This role focuses on implementation, performance, and reliability while working within a well-defined architectural framework.
Key Responsibilities:
- Develop and maintain data pipelines on Databricks using medallion architecture principles
- Build, optimize, and maintain Databricks notebooks for ingestion, transformation, and analytics
- Implement data flows across boundary systems ensuring reliability, scalability, and data quality
- Apply performance tuning techniques including query optimization, cluster sizing, and Delta Lake optimization
- Work with Unity Catalog to manage schemas, permissions, and data access controls
- Collaborate closely with architects to implement approved data designs and standards
- Troubleshoot pipeline failures, data issues, and performance bottlenecks
- Ensure code quality, documentation, and operational readiness of data solutions
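To illustrate the medallion-architecture convention referenced above, here is a minimal sketch of layer-qualified table naming. The catalog, schema, and function names are hypothetical, not from any specific framework:

```python
# Medallion layers, ordered from raw ingestion to curated output.
LAYERS = ("bronze", "silver", "gold")

def layer_table(catalog: str, source: str, layer: str) -> str:
    """Return a fully qualified table name for a given medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{catalog}.{layer}.{source}"

# A pipeline promotes each source through the layers in order.
print(layer_table("lakehouse", "orders", "bronze"))  # lakehouse.bronze.orders
```

Keeping layer names in the fully qualified table path makes Unity Catalog permissioning straightforward, since access can be granted per schema rather than per table.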
Required Qualifications:
- Strong hands-on experience with Databricks and Delta Lake
- Strong hands-on experience with AWS services such as S3, EC2, Lambda, Glue, and Redshift
- Working experience with IBM DataStage for ETL development, data integration, and data transformation
- Proven ability to design metadata-driven test harnesses that scale: parameterized test configs that can run against 500 jobs today and 80K tomorrow without rearchitecting
- Experience with data validation tooling (Great Expectations, Deequ, or custom PySpark validation libraries) for row count, schema, transformation logic, and output comparison checks
- Experience with performance and throughput benchmarking: rows/sec and latency comparisons between legacy (DataStage) and target (Databricks) pipelines
- Experience with Apache Iceberg tables
- Proven experience implementing medallion architecture in real-world use cases
- Experience building notebooks using Python, SQL, or Scala
- Solid understanding of data integration across multiple systems
- Strong performance tuning and debugging skills in Databricks environments
- Experience working in enterprise scale data platforms
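The metadata-driven validation requirement above can be sketched in plain Python: each converted job is described by a config record, and one generic check function runs against every record, so the harness scales from 500 jobs to 80K without rearchitecting. The `JobConfig` and `validate` names are illustrative assumptions, not from Great Expectations or Deequ:

```python
from dataclasses import dataclass

@dataclass
class JobConfig:
    job_name: str
    source_count: int        # rows in the legacy (DataStage) output
    target_count: int        # rows in the converted Databricks output
    expected_schema: tuple   # column names from the legacy job
    actual_schema: tuple     # column names from the converted job

def validate(cfg: JobConfig) -> list:
    """Return a list of failure messages; an empty list means the job passes."""
    failures = []
    if cfg.source_count != cfg.target_count:
        failures.append(
            f"{cfg.job_name}: row count mismatch "
            f"({cfg.source_count} vs {cfg.target_count})"
        )
    if cfg.expected_schema != cfg.actual_schema:
        failures.append(f"{cfg.job_name}: schema drift")
    return failures

# The same validate() runs unchanged over a config list of any size.
cfg = JobConfig("load_orders", 1000, 1000, ("id", "amount"), ("id", "amount"))
print(validate(cfg))  # []
```

In practice the config records would be generated from job metadata (e.g. a migration inventory table) rather than written by hand, and the count/schema values would come from PySpark reads of the two outputs.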
Scale & Automation Mindset:
- Experience designing test coverage strategies for large-scale migrations (thousands of jobs), including prioritization by complexity tier (Low/Medium/High) and risk scoring
- Ability to build reusable test templates aligned with ingestion patterns (Full Load, Incremental/CDC, File Ingest, API) so each converted job inherits its test suite from the framework
- Comfort with GraphFrames or lineage-based dependency analysis to sequence integration tests based on upstream/downstream job relationships
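The lineage-based sequencing described above amounts to a topological sort over the job dependency graph. A minimal sketch using the standard library's `graphlib`, assuming upstream relationships have already been extracted (e.g. from GraphFrames edges) into a predecessor map; the job names are hypothetical:

```python
from graphlib import TopologicalSorter

# Each key maps a job to the set of jobs that must run (and be tested) first.
upstream = {
    "silver_orders": {"bronze_orders"},
    "silver_customers": {"bronze_customers"},
    "gold_revenue": {"silver_orders", "silver_customers"},
}

# static_order() yields jobs so that every upstream job appears before its
# dependents, giving a valid integration-test execution sequence.
order = list(TopologicalSorter(upstream).static_order())
print(order)
```

Jobs with no shared lineage can also be tested in parallel; `TopologicalSorter`'s incremental `get_ready()`/`done()` API supports that directly.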
Nice to Have:
- Databricks certification
- Experience working with governed data platforms using Unity Catalog
Pay: $60.00 - $70.00 per hour
Experience:
- Databricks: 5 years (Required)
- DataStage: 5 years (Preferred)
- Metadata: 5 years (Preferred)
- AWS: 5 years (Preferred)
- Data Validation: 5 years (Preferred)
- Deequ: 1 year (Preferred)
- Great Expectations: 1 year (Preferred)
- PySpark: 5 years (Required)
- Iceberg: 3 years (Preferred)
Work Location: Remote