Control a web browser from R to scrape static and dynamic websites using {chromote}

Published: 31 October 2024
Channel: TheCoatlessProfessor

Learn to scrape static and dynamic, JavaScript-driven websites using R and chromote! Perfect for data scientists working with modern web applications.

In this tutorial, you'll learn:
How to use the chromote R package with the Chrome DevTools Protocol
Scraping JavaScript-generated content
Handling dynamic page elements
Converting web data to R data frames
Cleaning and analyzing scraped data

Projects Covered:
1. Basic scraping with R-project.org
2. Advanced weather data scraping from Windy.com

Required R Packages 📦 :
chromote
rvest
lubridate
dplyr

Required Software:
Chrome or a Chromium-based browser

Timeline:
0:00 - Introduction to Chromote and dynamic web scraping
0:42 - Overview of the tutorial structure
1:31 - Initial setup and browser demonstration
2:16 - Installing required R packages (chromote, rvest, lubridate)
2:47 - Launching a Chrome browser session from R
3:37 - Getting Chrome version information
4:15 - Navigating to r-project.org websites programmatically
4:48 - Using system sleep for page loading
5:28 - Introduction to Chrome DevTools for element inspection
6:22 - Using CSS selectors to identify page elements
7:35 - Highlighting and capturing page elements
8:42 - Extracting HTML content with Chromote
10:04 - Processing HTML with rvest
11:22 - Advanced example: Scraping Windy.com
12:27 - Handling dynamic search functionality
13:44 - Programmatically entering search queries
14:35 - Interacting with search results
15:23 - Extracting weather forecast data
16:28 - Converting HTML tables to R data frames
19:13 - Cleaning and processing weather data
20:29 - Analyzing the extracted weather data
21:19 - Cleaning up browser sessions
21:49 - Conclusion
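
For readers who want a feel for the first project before watching, here is a minimal sketch of the launch, navigate, wait, extract, and clean-up steps listed in the timeline. It is not the code from the video: the helper names `fetch_rendered_html()` and `extract_text()`, the two-second pause, and the "h1" selector are all assumptions made for illustration, and it assumes Chrome or a Chromium-based browser is installed.

```r
library(chromote)
library(rvest)

# Hypothetical helper: parse an HTML string and return the text of every
# element matching a CSS selector.
extract_text <- function(html, selector) {
  html |>
    read_html() |>
    html_elements(selector) |>
    html_text2()
}

# Hypothetical helper: open a page in a Chrome session, pause, and return
# the rendered HTML of the whole document as a single string.
fetch_rendered_html <- function(url, wait_seconds = 2) {
  b <- ChromoteSession$new()
  on.exit(b$close(), add = TRUE)   # always close the session when done

  b$Page$navigate(url)
  Sys.sleep(wait_seconds)          # crude fixed wait for the page to load (assumed value)

  doc <- b$DOM$getDocument()
  b$DOM$getOuterHTML(nodeId = doc$root$nodeId)$outerHTML
}

# Only touch the network and the browser when run interactively.
if (interactive()) {
  html <- fetch_rendered_html("https://www.r-project.org/")
  print(extract_text(html, "h1"))  # "h1" is an assumed selector
}
```

A fixed `Sys.sleep()` is the simplest way to wait for a page, but it can be too short on slow connections; treat the pause length as something to tune for your own setup.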

💻 Code & Resources:
Blog post: https://blog.thecoatlessprofessor.com...
GitHub Repository: https://github.com/coatless-videos/ch...

🔗 Connect with me:
GitHub: https://github.com/coatless
Website: https://thecoatlessprofessor.com
LinkedIn:   / jamesbalamuta  
BlueSky: https://bsky.app/profile/coatless.bsk...
Mastodon: https://mastodon.social/@coatless
Twitter/X:   / axiomsofxyz  

#Rstats #DataScience #WebScraping #Programming #DataAnalysis #Tutorial

❓ Have questions? Leave them in the comments below!