Data Engineering on AWS

Purpose: To equip data professionals with the architectural and technical skills required to design, implement, and secure modern data solutions (data lakes, warehouses, and complex pipelines) at scale on AWS.

Audience: Professionals interested in the end-to-end lifecycle of data, from ingestion and transformation to storage and consumption.

Role: Data Engineers, Data Architects, Backend Developers, and Data Scientists looking to operationalize data workflows.

Domain: Data Engineering / Big Data / Analytics.

Skill Level: Intermediate.

Style: A balanced mix of theory and hands-on labs covering batch and streaming architectures, orchestration, and performance tuning.

Duration: 3 Days.

Related Technologies: Amazon S3 (Data Lakes), Amazon Redshift Serverless, AWS Glue, Amazon Kinesis, Open Table Formats, and SQL.

Course Description

Data Engineering on AWS is a comprehensive 3-day deep dive into the practices and solutions required to manage data at scale. Participants explore the foundational roles of data engineering and learn to build production-ready environments. The course covers the implementation of data lakes and Amazon Redshift Serverless warehouses, as well as the creation of both batch and streaming data pipelines. Beyond just building, the curriculum emphasizes optimization and security, ensuring that data solutions are cost-effective, compliant, and performant.

Who is this course for

This course is intended for technical practitioners who build and maintain the data infrastructure of data-driven organizations. It is ideal for:

  • Data Engineers who need to move beyond simple ETL to complex cloud-native architectures.

  • Data Architects designing scalable data environments for analytics and machine learning.

  • Software Developers tasked with integrating application data into centralized data lakes or warehouses.

Course Objectives

  • Foundational Strategy: Understand data personas, discovery, and the orchestration of AWS services for data movement.

  • Data Lake Implementation: Design and secure data lakes using S3, incorporating open table formats and transformation workflows.

  • Data Warehousing: Set up and optimize Amazon Redshift Serverless, including query tuning and automated orchestration.

  • Batch Pipelines: Build comprehensive batch processing pipelines that cover cataloging, integration, and secure serving.

  • Streaming Solutions: Architect real-time streaming pipelines, focusing on ingestion, storage, and live analysis with security and compliance.

  • Operational Excellence: Apply CI/CD, Infrastructure as Code (IaC), and cost optimization practices to data engineering projects.

Prerequisites

  • Programming: Working knowledge of Python and libraries like NumPy and Pandas.

  • AI/ML Basics: Familiarity with supervised/unsupervised learning and basic algorithms (regression, classification).

  • Cloud & Data: A basic understanding of cloud computing and the AWS platform; familiarity with SQL and relational databases is highly recommended.

Section 1: Data Engineering Roles and Key Concepts

  • Role of a Data Engineer

  • Key functions of a Data Engineer

  • Data Personas

  • Data Discovery

  • AWS Data Services

Section 2: AWS Data Engineering Tools and Services

  • Orchestration and Automation

  • Data Engineering Security

  • Monitoring

  • Continuous Integration and Continuous Delivery

  • Infrastructure as Code

  • AWS Serverless Application Model

  • Networking Considerations

  • Cost Optimization Tools
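To give a feel for the Infrastructure as Code topic above, here is a minimal sketch of a CloudFormation template for a data-lake bucket built as a Python dictionary. The bucket name and the logical resource ID are illustrative; in practice such a template would be deployed with `aws cloudformation deploy` or packaged with AWS SAM.

```python
import json

def data_lake_bucket_template(bucket_name: str) -> dict:
    """Minimal CloudFormation template for a versioned, encrypted
    S3 data-lake bucket. Resource names are illustrative."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "DataLakeBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "BucketName": bucket_name,
                    # Versioning protects against accidental overwrites
                    "VersioningConfiguration": {"Status": "Enabled"},
                    # Encrypt objects at rest with a KMS-managed key
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {
                                "SSEAlgorithm": "aws:kms"}}
                        ]
                    },
                },
            }
        },
    }

template_json = json.dumps(data_lake_bucket_template("example-raw-zone"),
                           indent=2)
```

Expressing infrastructure this way (rather than clicking through the console) is what makes environments reproducible and reviewable, which is the point of the CI/CD and IaC topics in this section.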

Section 3: Designing and Implementing Data Lakes

  • Hands-on lab: Setting up a Data Lake on AWS

  • Data lake introduction

  • Data lake storage

  • Ingest data into a data lake

  • Catalog data

  • Transform data

  • Serve data for consumption
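The ingestion and cataloging steps listed above typically rely on Hive-style partition paths in S3, which AWS Glue crawlers can interpret as table partitions. The sketch below (with a made-up table and file name) shows the key layout; the actual upload would be a boto3 call, shown only as a comment.

```python
from datetime import date

def partitioned_key(table: str, dt: date, filename: str) -> str:
    """Hive-style partition path commonly used in S3 data lakes,
    so a Glue crawler can infer year/month/day partitions."""
    return (f"{table}/year={dt.year}/month={dt.month:02d}/"
            f"day={dt.day:02d}/{filename}")

key = partitioned_key("sales", date(2026, 1, 5), "orders.parquet")
# → "sales/year=2026/month=01/day=05/orders.parquet"
# The upload itself would use boto3:
#   boto3.client("s3").upload_file(local_path, bucket, key)
```

Keeping partition columns in the object key means query engines such as Athena or Redshift Spectrum can prune whole partitions instead of scanning the full table.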

Section 4: Optimizing and Securing a Data Lake Solution

  • Open Table Formats

  • Security using AWS Lake Formation

  • Setting permissions with Lake Formation

  • Security and governance

  • Troubleshooting

  • Hands-on lab: Automating Data Lake Creation using AWS Lake Formation Blueprints
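Setting permissions with Lake Formation, as covered above, boils down to granting a principal specific permissions on a catalog resource. The helper below builds the keyword arguments for boto3's `lakeformation.grant_permissions` call (the request structure follows the Lake Formation API; the role, database, and table names are placeholders).

```python
def lf_grant_request(principal_arn: str, database: str, table: str,
                     permissions: list[str]) -> dict:
    """Keyword arguments for lakeformation.grant_permissions.
    Names here are illustrative placeholders."""
    return {
        # The IAM role or user receiving access
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # The catalog resource the grant applies to
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        # e.g. "SELECT", "DESCRIBE", "INSERT"
        "Permissions": permissions,
    }

req = lf_grant_request("arn:aws:iam::111122223333:role/AnalystRole",
                       "sales_db", "orders", ["SELECT"])
# boto3.client("lakeformation").grant_permissions(**req)
```

Centralizing grants in Lake Formation, rather than in per-bucket S3 policies, is what enables the fine-grained, table-level governance discussed in this section.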

Section 5: Data Warehouse Architecture and Design Principles

  • Hands-on Lab: Setting up a Data Warehouse using Amazon Redshift Serverless

  • Introduction to data warehouses

  • Amazon Redshift Overview

  • Ingesting data into Redshift

  • Processing data

  • Serving data for consumption
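Ingesting data into Redshift, one of the topics above, is most commonly done with the COPY command loading files from S3 in parallel. The builder below emits such a statement for Parquet input; the table, bucket, and IAM role names are placeholders.

```python
def redshift_copy(table: str, s3_uri: str, iam_role: str) -> str:
    """Redshift COPY statement for Parquet files in S3.
    Table, URI, and role are placeholder values."""
    return (f"COPY {table}\n"
            f"FROM '{s3_uri}'\n"
            f"IAM_ROLE '{iam_role}'\n"
            f"FORMAT AS PARQUET;")

sql = redshift_copy(
    "staging.orders",
    "s3://example-raw-zone/sales/",
    "arn:aws:iam::111122223333:role/RedshiftCopyRole",
)
```

COPY scales out across the files under the S3 prefix, which is why splitting large datasets into many similarly sized files is a standard loading recommendation.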

Section 6: Performance Optimization Techniques for Data Warehouses

  • Monitoring and optimization options

  • Data optimization in Amazon Redshift

  • Query optimization in Amazon Redshift

  • Orchestration options
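Data optimization in Redshift, as listed above, centers on how tables are distributed and sorted. The DDL below is an illustrative sketch (the table and column names are hypothetical) showing a distribution key chosen for join locality and a sort key chosen for range pruning.

```python
# Illustrative DDL: distribution and sort keys shape how Redshift
# places and scans data. Table and column names are hypothetical.
CREATE_ORDERS = """
CREATE TABLE analytics.orders (
    order_id    BIGINT,
    customer_id BIGINT,
    order_date  DATE,
    amount      DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id)   -- co-locate rows joined on customer_id
SORTKEY (order_date);   -- prune blocks on date-range predicates
"""
```

Matching the DISTKEY to the most frequent join column avoids data redistribution at query time, and a date SORTKEY lets range-restricted queries skip most blocks, which are the two levers behind the query-tuning topics in this section.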

Section 7: Security and Access Control for Data Warehouses

  • Hands-on lab: Managing Access Control in Redshift

  • Authentication and access control in Amazon Redshift

  • Data security in Amazon Redshift

  • Auditing and compliance in Amazon Redshift
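Access control in Redshift, covered above, is commonly expressed as SQL grants to user groups. A minimal read-only pattern, with placeholder schema and group names, looks like this:

```python
def grant_select(schema: str, group: str) -> list[str]:
    """SQL granting a Redshift user group read-only access to a
    schema. Schema and group names are placeholders."""
    return [
        # USAGE lets the group resolve objects in the schema
        f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};",
        # SELECT on existing tables; new tables need a fresh grant
        # (or ALTER DEFAULT PRIVILEGES, not shown here)
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};",
    ]

stmts = grant_select("analytics", "readers")
```

Granting to groups rather than individual users keeps audits tractable: membership changes in one place instead of per-table grants per user.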

Section 8: Designing Batch Data Pipelines

  • Introduction to batch data pipelines

  • Designing a batch data pipeline

  • AWS services for batch data processing

Section 9: Implementing Strategies for Batch Data Pipelines

  • Hands-on lab: A Data Engineer

  • Elements of a batch data pipeline

  • Processing and transforming data

  • Integrating and cataloging your data

  • Serving data for consumption
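The processing-and-transforming step above, which in the course is done with managed services such as AWS Glue, reduces to the same pattern at any scale: parse raw records, reject malformed ones, and normalize types. This stdlib-only sketch stands in for a Glue/Spark job; the column names are made up.

```python
import csv
import io

def clean_orders(raw_csv: str) -> list[dict]:
    """Tiny batch transform step: parse, drop malformed rows,
    normalize types. A stand-in for a Glue/Spark job."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            rows.append({"order_id": int(row["order_id"]),
                         "amount": round(float(row["amount"]), 2)})
        except (KeyError, ValueError):
            continue  # quarantine/skip malformed records
    return rows

sample = "order_id,amount\n1,19.99\nbad,oops\n2,5"
cleaned = clean_orders(sample)
# → [{'order_id': 1, 'amount': 19.99}, {'order_id': 2, 'amount': 5.0}]
```

In a production pipeline the skipped records would be written to a quarantine location rather than silently dropped, so data-quality issues surface in monitoring.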

Section 10: Optimizing, Orchestrating, and Securing Batch Data Pipelines

  • Hands-on lab: Orchestrating Data Processing in Spark using AWS Step Functions

  • Optimizing the batch data pipeline

  • Orchestrating the batch data pipeline

  • Securing the batch data pipeline
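Orchestrating the batch pipeline, as in the Step Functions lab above, means expressing the workflow in Amazon States Language. The minimal definition below runs a Glue job and then re-crawls the catalog; the job and crawler names are made up for illustration.

```python
import json

# A minimal Amazon States Language sketch: run a Glue job, then
# start a crawler via the AWS SDK integration. The JobName and
# crawler Name are hypothetical.
state_machine = {
    "StartAt": "TransformWithGlue",
    "States": {
        "TransformWithGlue": {
            "Type": "Task",
            # .sync waits for the Glue job run to finish
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "nightly-orders-transform"},
            "Next": "RecrawlCatalog",
        },
        "RecrawlCatalog": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
            "Parameters": {"Name": "orders-crawler"},
            "End": True,
        },
    },
}
definition = json.dumps(state_machine)
```

The `.sync` integration pattern is what turns fire-and-forget service calls into ordered pipeline stages, with retries and error handling configurable per state.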

Section 11: Streaming Data Architecture Patterns

  • Hands-on lab: Streaming Analytics with Amazon Managed Service for Apache Flink

  • Introduction to streaming data pipelines

  • Ingesting data from stream sources

  • Streaming data ingestion services

  • Storing streaming data

  • Processing streaming data

  • Analyzing streaming data with AWS services
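The processing and analysis topics above usually come down to windowed aggregation, the core operation an Apache Flink job performs on a Kinesis or Kafka stream. This pure-Python sketch shows a tumbling-window count; event timestamps are epoch seconds and the event keys are made up.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds=60):
    """Count events per tumbling window, keyed by
    (window_start, event_key) — the kind of aggregation a Flink
    job runs continuously over a stream."""
    counts = Counter()
    for ts, key in events:  # ts = epoch seconds
        # Align each event to the start of its window
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "click"), (30, "click"), (65, "click"), (70, "view")]
result = tumbling_window_counts(events)
# → {(0, 'click'): 2, (60, 'click'): 1, (60, 'view'): 1}
```

A real streaming job adds what this sketch omits: event-time watermarks for late data, state checkpointing, and emitting results as each window closes rather than at end of input.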

Section 12: Optimizing and Securing Streaming Solutions

  • Hands-on lab: Access Control with Amazon Managed Streaming for Apache Kafka

  • Optimizing a streaming data solution

  • Securing a streaming data pipeline

  • Compliance considerations
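For the MSK access-control lab above: when an MSK cluster uses IAM authentication, topic access is governed by IAM policies using `kafka-cluster:*` actions. The sketch below builds a read-only policy document; both ARNs are placeholders.

```python
import json

def msk_read_policy(cluster_arn: str, topic_arn: str) -> str:
    """IAM policy sketch granting read-only access to one MSK topic
    under IAM authentication. ARNs are placeholder values."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            # Connecting to the cluster is a separate permission
            {"Effect": "Allow",
             "Action": ["kafka-cluster:Connect"],
             "Resource": cluster_arn},
            # Read-only access to the one topic
            {"Effect": "Allow",
             "Action": ["kafka-cluster:DescribeTopic",
                        "kafka-cluster:ReadData"],
             "Resource": topic_arn},
        ],
    }, indent=2)
```

Scoping the second statement to a single topic ARN, rather than the whole cluster, is what gives per-topic least-privilege access, a recurring theme in the compliance considerations above.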

Copyright © 2026 microskill.ai