Data Engineer – AWS-Based Data Lake and Analytics Platform
About Distillery
Distillery Tech Inc accelerates innovation through an unyielding approach to nearshore software development. The world’s most innovative technology teams choose Distillery to help accelerate strategic innovation, fill pressing technology gaps, and hit mission-critical deadlines. We support essential applications, mobile apps, websites, and eCommerce platforms through the placement of senior, strategic technical leaders and by deploying fully managed technology teams that work intimately alongside our clients’ in-house development teams.
At Distillery Tech Inc, we’re not here to reinvent nearshore software development—we’re on a mission to perfect it. We are committed to diversity and inclusion, cultivating a workforce that reflects the rich tapestry of perspectives, backgrounds, and experiences present in our society.
About the Position
We are seeking an experienced Data Engineer to support the re-implementation of a large-scale AWS-based data lake and analytics platform. This lift-and-shift initiative involves recreating ingestion pipelines, EMR/Glue workflows, Redshift loading logic, Databricks integration, and SageMaker ML orchestration in a new AWS account, with infrastructure provisioned through Terraform.
You’ll play a key role in rebuilding core data workflows, validating source integrations, and ensuring data accuracy across the stack.
Responsibilities
Rebuild and validate data ingestion pipelines using AWS services (Lambda, Kinesis Firehose, MSK, S3).
Migrate and reconfigure processing jobs in Glue, EMR, and Amazon Managed Workflows for Apache Airflow (MWAA).
Recreate and validate table definitions in Glue Data Catalog to support Athena queries.
Ingest data from third-party APIs (e.g., Revature API, eCommerce Affiliates) via Lambda functions or Airflow DAGs.
Collaborate with ML engineers to ensure the successful reconstruction and deployment of SageMaker/Personalize workflows.
Partner with DevOps to align Terraform-managed infrastructure with data engineering needs.
Perform rigorous data validation during migration: object counts, schema consistency, and source-to-target QA (see the sketch after this list).
Document data flows and maintain data lineage across ingestion, transformation, and analytics layers.
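To give a concrete sense of the source-to-target QA mentioned above, here is a minimal validation sketch in Python using boto3. The bucket and prefix names are hypothetical placeholders; a real migration check would also reconcile Glue Catalog schemas and sample row contents.

```python
import boto3

s3 = boto3.client("s3")

def count_objects(bucket: str, prefix: str) -> tuple[int, int]:
    """Return (object count, total bytes) for everything under a prefix."""
    count, size = 0, 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            count += 1
            size += obj["Size"]
    return count, size

# Hypothetical bucket names for the legacy and re-created data lakes.
src = count_objects("legacy-data-lake", "raw/events/")
dst = count_objects("new-data-lake", "raw/events/")

assert src == dst, f"source/target mismatch: {src} vs {dst}"
print(f"validated {src[0]} objects, {src[1]} bytes")
```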
Technical Expertise
5+ years in data engineering or analytics engineering roles.
Deep hands-on experience with core AWS services: Lambda, Kinesis (Streams & Firehose), MSK, S3, Glue, Athena, EMR, and Redshift.
Proficiency with Apache Airflow (self-managed or MWAA).
Strong Python skills, especially for ETL and Lambda development.
Experience with Glue Data Catalog schema design and partitioning strategies (see the sketch after this list).
Knowledge of data serialization formats such as JSON, Parquet, and Avro.
Skilled in external API integration, authentication, and secure secrets management.
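As a rough illustration of the Glue Data Catalog and partitioning work listed above, the sketch below registers a date-partitioned Parquet table so it can be queried from Athena. The database, table, columns, and S3 location are assumptions for the example; a real definition would mirror the catalog exported from the legacy account.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical database, table, and location names.
glue.create_table(
    DatabaseName="analytics",
    TableInput={
        "Name": "events",
        "TableType": "EXTERNAL_TABLE",
        # Partitioning by ingestion date keeps Athena scans cheap.
        "PartitionKeys": [{"Name": "dt", "Type": "string"}],
        "StorageDescriptor": {
            "Columns": [
                {"Name": "event_id", "Type": "string"},
                {"Name": "payload", "Type": "string"},
            ],
            "Location": "s3://new-data-lake/curated/events/",
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    },
)
```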
Nice to Have
Familiarity with Martech tools such as Sailthru and Zephr, and with Databricks.
Experience with SageMaker pipelines, endpoint deployment, or feature store management.
Understanding of cross-account data ingestion in AWS (see the sketch after this list).
Hands-on experience using Terraform for data infrastructure.
Knowledge of Redshift Spectrum and federated query strategies.
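For the cross-account ingestion item, one common pattern is assuming an IAM role in the source account and reading its buckets directly. This is only a sketch under assumed names: the role ARN and bucket are placeholders, and the role must grant the relevant s3:GetObject permissions.

```python
import boto3

# Hypothetical role in the source (legacy) account.
creds = boto3.client("sts").assume_role(
    RoleArn="arn:aws:iam::111122223333:role/source-lake-read",
    RoleSessionName="migration-ingest",
)["Credentials"]

# S3 client scoped to the temporary cross-account credentials.
src_s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

obj = src_s3.get_object(Bucket="partner-data-lake", Key="exports/latest.json")
print(obj["Body"].read()[:200])
```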
Why You’ll Like Working Here
Collaborate with multi-national teams aligned with our core values: Unyielding Commitment, Relentless Pursuit, Courageous Ambition, and Authentic Connection.
Enjoy a competitive compensation package, generous vacation, and comprehensive benefits.
Work remotely in a flexible, supportive environment.
Access professional and personal development opportunities to advance your career.