RSVP Webinar: https://www.eventbrite.com/e/webinark...
Talk #0: Introductions and Meetup Announcements by Chris Fregly and Antje Barth
Talk #1: Messy Code to Clean Pipelines with LineaPy by Doris Xin, Founder, and Andrew Cui, Software Engineer @ Linea (https://linea.ai)
Going from data science development to production is full of friction: data scientists either spend weeks to months manually refactoring and rewriting their development workflows, or enlist a data engineering team to translate the research code into production pipelines.
This talk introduces LineaPy - an open source library that automatically captures data science development workflows without user annotation and transforms them into data pipelines to be run on industry-leading data platforms. LineaPy works by tracing a Python program’s execution and using a mix of dynamic and static program analysis to build an intermediate graph representation. This graph representation can then be transformed into different MLOps toolchain outputs, which in turn can automatically be run in production.
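The core idea behind this kind of tracing is program slicing: record each operation as a node in a dependency graph, then walk backwards from a chosen output to keep only the operations it actually depends on. The sketch below is a toy illustration of that idea in plain Python, not LineaPy's real API; the `Tracer` class and variable names are invented for this example.

```python
# Toy sketch of dependency tracing and slicing (illustrative only, not LineaPy).
class Tracer:
    def __init__(self):
        self.deps = {}  # variable -> set of variables it was computed from

    def record(self, output, *inputs):
        self.deps[output] = set(inputs)

    def slice_for(self, artifact):
        """Return the set of variables needed to compute `artifact` (reverse walk)."""
        needed, stack = set(), [artifact]
        while stack:
            var = stack.pop()
            if var in needed:
                continue
            needed.add(var)
            stack.extend(self.deps.get(var, ()))
        return needed

# A mock "messy notebook": only some steps matter for the final model.
t = Tracer()
t.record("raw")                   # raw = load_csv(...)
t.record("clean", "raw")          # clean = dropna(raw)
t.record("plot", "raw")           # exploratory plot, not needed in production
t.record("features", "clean")     # feature engineering
t.record("model", "features")     # model = fit(features)

print(sorted(t.slice_for("model")))  # the exploratory 'plot' step is sliced away
```

A real tracer also records the code and runtime values at each node, so the sliced graph can be emitted as a runnable pipeline rather than just a set of names.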
Talk #2: Run distributed JAX training with Ray on NVIDIA GPUs with Amazon SageMaker by Neal Vaidya, Technical Marketing Engineer @ NVIDIA
In this talk, we show how to set up a Ray cluster on SageMaker and run a multi-node, multi-GPU training job with JAX for a GPT-2 model from Hugging Face Model Hub. The scripts and notebooks used in this post are available here: https://github.com/aws-samples/aws-sa...
JAX is an open source library for high-performance numerical computing and machine learning (ML) research. JAX includes NumPy-like APIs, automatic differentiation, XLA compilation and acceleration, and simple primitives for multi-node, multi-GPU scaling. These features make JAX an increasingly popular choice for training large language models such as GPT and T5 across multiple GPUs and nodes.

Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at any scale. Typically, you can use the pre-built training and inference containers that have been optimized for AWS hardware. Although those containers cover many deep learning workloads, you may have training use cases where you want to scale across multiple GPUs and nodes. To accommodate this, SageMaker provides integration with Ray. Ray programs can parallelize and distribute work by leveraging an underlying Ray runtime. The Ray runtime can be started with the SageMaker Python SDK on one or more nodes, forming a Ray cluster. This lets you use existing SageMaker training capabilities to run distributed training.
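The pattern underneath this kind of multi-node, multi-GPU training is data parallelism: each worker computes gradients on its own shard of the data, the gradients are all-reduced (averaged) across workers, and every worker applies the identical update. The sketch below shows that pattern in plain Python as a stand-in for `jax.grad`/`jax.pmap` and a real Ray cluster; the function names and the toy linear model are assumptions made for illustration.

```python
# Hedged sketch of data-parallel training (stand-in for JAX + Ray, not their APIs).

def grad_on_shard(w, shard):
    # d/dw of mean squared error for the toy model y = w * x on this worker's shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    # Stand-in for the collective (e.g. NCCL all-reduce) performed across GPUs
    return sum(grads) / len(grads)

# Two "workers", each holding a shard of (x, y) pairs drawn from y = 3x
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]

w = 0.0
for step in range(200):
    local_grads = [grad_on_shard(w, s) for s in shards]  # runs in parallel on a cluster
    g = all_reduce_mean(local_grads)                     # synchronize across workers
    w -= 0.01 * g                                        # identical update everywhere

print(round(w, 3))  # converges toward the true weight 3.0
```

On a real Ray cluster the per-shard gradient computations would be remote tasks on separate nodes, and JAX's `pmap`/collective primitives would perform the averaging on-device instead of in Python.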
Talk #3: Integrate Amazon SageMaker Data Wrangler with MLOps workflows by Ganapathi Krishnamoorthi, Senior Machine Learning SA @ AWS
In this talk, we demonstrate how users can integrate data preparation using Data Wrangler with Amazon SageMaker Pipelines, AWS Step Functions, and Apache Airflow with Amazon Managed Workflow for Apache Airflow (Amazon MWAA).
SageMaker Pipelines is a purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for ML. Step Functions is a serverless, low-code visual workflow service used to orchestrate AWS services and automate business processes. Amazon MWAA is a managed orchestration service for Apache Airflow that makes it easier to operate end-to-end data and ML pipelines.
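All three orchestrators share the same underlying shape: a DAG of steps executed in dependency order, with a Data Wrangler preparation step feeding the downstream ML stages. The sketch below reduces that pattern to a toy DAG runner using the standard-library `graphlib`; the step names are illustrative, not a real SageMaker, Step Functions, or Airflow API.

```python
# Toy DAG runner illustrating the orchestration pattern (not a real service API).
from graphlib import TopologicalSorter

log = []

def data_wrangler_prepare():  log.append("prepare")
def train_model():            log.append("train")
def evaluate_model():         log.append("evaluate")
def deploy_model():           log.append("deploy")

# step -> set of upstream steps it depends on
dag = {
    data_wrangler_prepare: set(),
    train_model: {data_wrangler_prepare},
    evaluate_model: {train_model},
    deploy_model: {evaluate_model},
}

# Execute steps in an order that respects every dependency
for step in TopologicalSorter(dag).static_order():
    step()

print(log)  # ['prepare', 'train', 'evaluate', 'deploy']
```

In the managed services, each node would be a SageMaker processing/training job, a Step Functions state, or an Airflow task rather than a local function call, but the dependency-ordered execution is the same.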
Blog: https://aws.amazon.com/blogs/machine-...
GitHub: https://github.com/aws-samples/sm-dat...
Zoom link: https://us02web.zoom.us/j/82308186562
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com