Job Overview
We are seeking a dynamic and innovative Data Engineer to join our team and drive the development of scalable, high-performance data solutions. In this role, you will be at the forefront of designing, building, and maintaining robust data pipelines and architectures that empower data-driven decision-making across the organization. Your expertise will enable us to harness the full potential of big data technologies and cloud platforms, ensuring our data infrastructure is efficient, reliable, and secure. If you thrive in a fast-paced environment where your technical skills can make a tangible impact, this opportunity is perfect for you!
Duties
- Design, develop, and optimize scalable data pipelines using ETL (Extract, Transform, Load) processes to move data reliably from diverse sources, including AWS services, Azure Data Lake, Hadoop, and on-premises systems
- Build and maintain data warehouses and data lakes utilizing tools like Apache Hive, Spark, Talend, Informatica, and Microsoft SQL Server to support advanced analytics and reporting
- Collaborate with cross-functional teams to understand data requirements and translate them into efficient database designs and models
- Implement RESTful APIs for secure and efficient data access across various applications
- Conduct data analysis to identify trends, anomalies, and opportunities for process improvements
- Develop scripts in Python, shell languages such as Bash, VBA, or other languages to automate workflows and streamline operations
- Support model training efforts by preparing datasets and ensuring data quality for machine learning initiatives
- Maintain comprehensive documentation of data architecture, pipelines, and processes in accordance with best practices
- Participate actively in Agile development cycles to deliver iterative improvements rapidly and efficiently
- Monitor system performance and troubleshoot issues proactively to ensure high availability of data services
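The ETL work described above can be sketched in miniature. The toy Python example below (function names, CSV fields, and the in-memory "warehouse" are illustrative, not part of this posting) shows the three stages a pipeline of this kind chains together:

```python
import csv
import io

def extract(raw_csv: str) -> list[dict]:
    """Parse CSV text into row dicts (the 'extract' step)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Normalize fields and drop incomplete rows (the 'transform' step)."""
    cleaned = []
    for row in rows:
        if not row.get("amount"):
            continue  # skip rows missing a required field
        cleaned.append({"region": row["region"].strip().lower(),
                        "amount": float(row["amount"])})
    return cleaned

def load(rows: list[dict], target: list) -> None:
    """Append cleaned rows to a destination (stand-in for a warehouse write)."""
    target.extend(rows)

raw = "region,amount\nEast ,10.5\nWest,\nnorth,3.0\n"
warehouse: list[dict] = []
load(transform(extract(raw)), warehouse)
print(warehouse)
# → [{'region': 'east', 'amount': 10.5}, {'region': 'north', 'amount': 3.0}]
```

In a production pipeline the extract and load steps would talk to the real sources and warehouse named above rather than in-memory strings and lists, but the shape of the work is the same.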
Skills
- Extensive experience with cloud platforms such as AWS and Azure, including Azure Data Lake for scalable storage solutions
- Strong proficiency in programming languages including Java and Python for building robust data applications
- Deep understanding of Big Data technologies like Hadoop ecosystem components (HDFS, MapReduce) and Apache Spark for large-scale data processing
- Expertise in SQL databases including Microsoft SQL Server and Oracle; skilled in database design, optimization, and management
- Familiarity with Data Warehouse concepts and tools such as Informatica, Talend, or similar ETL platforms
- Knowledge of Linked Data principles for connecting disparate datasets across platforms
- Experience working with Apache Hive for querying large datasets efficiently
- Hands-on experience with model training workflows for machine learning projects
- Strong analysis skills to interpret complex datasets and generate actionable insights
- Ability to design scalable database schemas that support business needs effectively
- Proficiency in Shell Scripting (Bash) for automation tasks in Unix/Linux environments
- Familiarity with RESTful API development for integrating various systems securely
- Experience working within Agile methodologies to promote collaborative project delivery
Join us if you’re passionate about transforming raw data into strategic assets! We value innovative thinkers who are eager to leverage cutting-edge technologies like Spark, Hadoop, Informatica, analytics tools such as Looker, and more. Your expertise will directly influence our ability to deliver insightful analytics that drive business success. This is an exciting opportunity to grow your career while working on impactful projects in a vibrant team environment!
Pay: $58.39 - $70.32 per hour
Expected hours: 40 per week
Benefits:
Work Location: In person