A Detailed Walkthrough: Running SQL Queries on Data Lake CSV Files with AWS Athena

Published: 08 November 2024
on channel: Analytica Learning

AWS Athena is a serverless query service that allows you to analyze data stored in Amazon S3 using SQL queries. It is particularly useful for ad-hoc querying and interactive analysis of data in a data lake or data warehouse on S3.
Athena does not require you to load or transform data before querying. Instead, it works directly on the data stored in S3; queries run on demand, and pricing is based on the amount of data each query scans.
You can use standard SQL to query structured or semi-structured data in formats such as CSV, JSON, Parquet, and ORC.
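
To make the pay-per-scan pricing concrete, here is a rough cost estimate in Python. It is only a sketch: the 5 USD per TB rate, the 10 MB minimum per query, and the binary definition of a terabyte are assumptions that are not stated in the video, and the function name is hypothetical. Check the current Athena pricing page for your region before relying on the numbers.

```python
# Assumed list price and billing minimum (not taken from the video).
PRICE_PER_TB_USD = 5.0
MIN_BYTES_PER_QUERY = 10 * 1024**2  # assumed 10 MB minimum per query
BYTES_PER_TB = 1024**4              # assumed binary terabyte


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return the approximate cost in USD of one query (hypothetical helper)."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    # Simplification: the assumed minimum is applied to every query.
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD
```

Under these assumptions, a query that scans 1 TB costs about 5 USD, which is why scanning fewer bytes per query lowers the bill.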


In this video, I demonstrate how to correct erroneous data stored in CSV files on S3 and introduce the concept of a landing zone for cleansing it. I then run several SQL queries against multiple CSV files as a practical demonstration.
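
The landing zone idea can be illustrated with a small Python sketch. It assumes one common reading of the term: raw files arrive in a staging area, and each row is validated before it reaches the data that gets queried. The column names `order_id` and `amount`, the validation rules, and the function name are all hypothetical and are not taken from the video.

```python
import csv
import io


def split_rows(raw_csv: str):
    """Split raw CSV text into typed clean rows and rejected rows."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    clean, rejected = [], []
    for row in reader:
        try:
            # Hypothetical rules: order_id must be an integer,
            # amount must parse as a number.
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            # Keep the bad row as-is so it can be inspected and fixed later.
            rejected.append(row)
            continue
        clean.append({"order_id": order_id, "amount": amount})
    return clean, rejected
```

In a real pipeline the clean rows would be written to the S3 location that the table points at, while the rejected rows would stay in the landing zone until they are corrected.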