I recently had to build a web crawler to scrape some statistics from a blog website. I checked a few options and ended up using Python and Scrapy for the job. I wanted to record some useful tips and show how easy it is to get started.
Timecodes:
0:00 - Intro
0:26 - Scrapy library for web crawling
1:10 - Getting started with Scrapy tutorial
1:55 - Run your web crawler
2:40 - Modify crawler to crawl a developer blog site
3:40 - How does Scrapy work?
4:40 - Scrape page titles
6:15 - Find navigation links to follow
9:02 - Run Scrapy from Python code
11:38 - How to gather the results of the crawl
14:13 - Control Scrapy logging
15:25 - Conclusion
The place to start is the official Scrapy website: https://www.scrapy.org/
As always, if you like the content, I'd appreciate it if you showed it. If you have questions, comments, or suggestions, please drop a line in the comments section.