End-to-end serverless ETL pipeline demo handling the Redshift TIMESTAMP WITH TIME ZONE (timestamptz) data type, which AWS Glue and Spark do not handle directly. In this video, we show a couple of ways to handle that use case. We read data from a Redshift database and write the result back to a Redshift table. Once the data is loaded, we set up an ETL pipeline with AWS Glue that reads the data from Redshift, applies a SQL transformation, and persists the result back to a Redshift table.
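As a rough sketch of the staging-table approach walked through in the video (roughly chapters 07:48 to 12:17): pre-create the target/staging table with an explicit TIMESTAMPTZ column via "preactions", so the Glue writer only appends into it and never has to create that column type itself. All catalog, database, connection, table, and column names below (demo_db, source_table, public.target_table_stg, redshift-connection, event_ts) are placeholders, not the exact names used in the demo.

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the source table from Redshift via the Glue Data Catalog.
# "demo_db" / "source_table" are placeholder catalog names.
source_dyf = glueContext.create_dynamic_frame.from_catalog(
    database="demo_db",
    table_name="source_table",
    redshift_tmp_dir=args["TempDir"],
)

# Glue's writer cannot create a TIMESTAMPTZ column on its own, so map the
# column to a plain timestamp for the write; the table created below is
# declared with TIMESTAMPTZ so Redshift keeps the time zone semantics.
mapped_dyf = ApplyMapping.apply(
    frame=source_dyf,
    mappings=[
        ("id", "long", "id", "long"),
        ("event_ts", "timestamp", "event_ts", "timestamp"),
    ],
)

# Pre-create the staging table with an explicit TIMESTAMPTZ column
# via "preactions", then let Glue append into it instead of creating it.
preactions = """
    CREATE TABLE IF NOT EXISTS public.target_table_stg (
        id BIGINT,
        event_ts TIMESTAMPTZ
    );
"""

glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=mapped_dyf,
    catalog_connection="redshift-connection",  # placeholder Glue connection name
    connection_options={
        "dbtable": "public.target_table_stg",
        "database": "dev",
        "preactions": preactions,
    },
    redshift_tmp_dir=args["TempDir"],
)

job.commit()
```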
Chapters
00:00 Introduction
00:30 AWS Glue job setup for the POC
02:35 Crawler Setup for the table
04:35 Changing the Target to a Redshift Table
06:16 Exception: Unable to Write to a Redshift Table with a Timestamp-with-Timezone Column
07:28 Converting AWS Glue Studio Job to Script
07:48 Staging Table Creation
09:40 AWS Glue Script Changes to Support Timestamp with Time Zone Handling
12:17 Result Verification
13:48 Spark DataFrame Writer with the Glue Writer (sketched below, after the chapter list)
18:23 Final Run and Verification Post Changes
19:05 Thank You
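For the Spark DataFrame writer alternative mentioned at 13:48, here is a minimal standalone sketch using the plain Spark JDBC reader and writer, appending into an existing Redshift table whose timestamp column is already defined as TIMESTAMPTZ. The JDBC URL, credentials, table, and column names are placeholders, and the transformation shown is only illustrative, not the one from the video.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("redshift-timestamptz-demo").getOrCreate()

# Placeholder connection details; in a Glue job these would come from the
# Glue connection or job parameters rather than being hard-coded.
jdbc_url = "jdbc:redshift://my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com:5439/dev"
props = {
    "user": "awsuser",
    "password": "********",
    "driver": "com.amazon.redshift.jdbc42.Driver",
}

# Read the source table directly over JDBC.
source_df = spark.read.jdbc(url=jdbc_url, table="public.source_table", properties=props)

# Apply a placeholder transformation and make sure the timezone-aware
# column is a Spark timestamp before the write.
result_df = (
    source_df
    .withColumn("event_ts", F.col("event_ts").cast("timestamp"))
    .filter(F.col("event_ts").isNotNull())
)

# Append into an existing Redshift table whose event_ts column is TIMESTAMPTZ,
# so the writer never has to create that column type itself.
result_df.write.jdbc(url=jdbc_url, table="public.target_table", mode="append", properties=props)
```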
References:
AWS Main Page: https://aws.amazon.com/
AWS Free Tier Details: https://aws.amazon.com/free/
Redshift Cluster Setup:
• Amazon Redshift Setup Demo and JDBC C...
Redshift AWS Glue Demo:
• Serverless ETL Pipeline Demo with Ama...
AWS Glue Studio Serverless Pipeline:
• Serverless ETL Pipeline Demo with Ama...
AWS S3 Bucket Name Guidelines: https://docs.aws.amazon.com/AmazonS3/...
#ETL #Glue #ApacheSpark #Redshift #Serverless #JDBC #AWS #GlueStudio #S3 #IAM #Amazon #DataWareHouse