SageMaker Data Wrangler Hands-On Tutorial

Published: 03 March 2025
Channel: ML Workbench

This hands-on tutorial processes a stroke prediction dataset from Kaggle with Amazon SageMaker Data Wrangler, Amazon’s low-code/no-code solution for pre-processing datasets for machine learning tasks. In this video, we will:
1. Create a Data Wrangler Flow
2. Load a dataset into Data Wrangler
3. Perform data exploration
4. Use Data Wrangler’s built-in feature transformations
5. Create custom feature transformations
6. Generate data quality and insight reports
7. Export the processed data for future use
If there are any other tutorials that would be helpful, please comment below (we read them all).

Links:
The flow file I used in this tutorial is available here:
https://github.com/chaeAclark/literat...

The dataset used for this tutorial is available here:
https://www.kaggle.com/datasets/fedes...

Good Data Wrangler Blogs:
https://aws.amazon.com/blogs/machine-...

The Data Wrangler Developer's Guide:
https://docs.aws.amazon.com/sagemaker...

Thanks for viewing!!

P.S. The “Other” label in the “gender” column corresponds to a single data point. My recommendation in this case is still to remove that point and, if possible, collect more data. Even if it were used for training, there would be no way to tell how the model would behave on future points with that value at inference time. I would not trust the model on such inputs, and no metric computed from a single data point could convince me otherwise. Once a few more such points have been collected, the “Other” value can be added to training.
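
The idea of dropping a category that is represented by a single point can be sketched in plain Python. This is only an illustration and not the transform used in the video: the helper name `drop_rare_categories`, the `min_count` threshold, and the sample rows are all hypothetical, and the rows are made up rather than taken from the Kaggle dataset.

```python
from collections import Counter


def drop_rare_categories(rows, column, min_count=2):
    """Return only the rows whose value in `column` occurs at least `min_count` times.

    Hypothetical helper for illustration; not part of Data Wrangler.
    """
    counts = Counter(row[column] for row in rows)
    return [row for row in rows if counts[row[column]] >= min_count]


# Made-up sample rows (assumption: column names mirror the tutorial's dataset).
sample = [
    {"gender": "Male", "age": 67, "stroke": 1},
    {"gender": "Female", "age": 61, "stroke": 1},
    {"gender": "Female", "age": 49, "stroke": 0},
    {"gender": "Other", "age": 26, "stroke": 0},
    {"gender": "Male", "age": 80, "stroke": 1},
]

# "Other" appears once, so it falls below the threshold and is removed.
cleaned = drop_rare_categories(sample, "gender", min_count=2)
print(Counter(row["gender"] for row in cleaned))
```

Filtering by a count threshold instead of hard-coding the “Other” label means the same step keeps that category automatically once enough examples of it have been collected.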