As web pages become more sophisticated with CSS and JavaScript functionality, we need to use CSS selectors to isolate the areas we want to scrape data from.
Many of these CSS selectors rely on metadata, such as id and class attributes, which is additional information that tells you more about the HTML tags. That metadata can help you automate your web scraping processes.
Hello HTML
Let’s open up the file(s) in the 01-Ins_CSS_Identifiers folder to get started.
The goal of using CSS identifiers is to easily identify the data you need and map it to a variable for manipulation and/or storage.
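To make that goal concrete, here is a minimal sketch of pulling data out of a page by id and by class, then storing it in variables. It uses only Python's built-in html.parser so it runs without extra installs; in class you would more likely reach for a scraping library such as Beautiful Soup. The sample HTML, the id and class names, and the IdentifierScraper and select_text helpers are all made up for illustration and are not taken from the activity files.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the id and class names are invented for this sketch.
SAMPLE_HTML = """
<html>
  <body>
    <h1 id="page-title">Top Questions</h1>
    <ul>
      <li class="question">How do I parse HTML?</li>
      <li class="question">What is a CSS selector?</li>
      <li class="ad">Sponsored link</li>
    </ul>
  </body>
</html>
"""

# Void elements never get a closing tag, so they are ignored when tracking depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IdentifierScraper(HTMLParser):
    """Collect the text of elements matching "#some-id" or ".some-class"."""

    def __init__(self, selector):
        super().__init__()
        self.kind, self.name = selector[0], selector[1:]
        self.results = []
        self._depth = 0    # greater than 0 while inside a matching element
        self._buffer = []  # text pieces of the element being captured

    def _matches(self, attrs):
        attrs = dict(attrs)
        if self.kind == "#":
            return attrs.get("id") == self.name
        if self.kind == ".":
            # An element can carry several space-separated classes.
            return self.name in (attrs.get("class") or "").split()
        return False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested tag inside the element being captured
        elif self._matches(attrs):
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def select_text(html, selector):
    """Return the text of every element matching a simple id or class selector."""
    parser = IdentifierScraper(selector)
    parser.feed(html)
    parser.close()
    return parser.results


# Map the scraped data into variables for later manipulation or storage.
title = select_text(SAMPLE_HTML, "#page-title")[0]
questions = select_text(SAMPLE_HTML, ".question")
print(title)
print(questions)
```

Notice that the "ad" list item is skipped because its class does not match. This sketch only understands a single "#id" or ".class" selector and assumes well-formed HTML; a real CSS selector engine handles far more.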
Students Do: CSS Case Study
Let’s open up the file(s) in the 02-Stu_CSS_Case_Study folder to get started.
Students Do: Pandas Scrape
Let’s open up the file(s) in the 03-Stu_Pandas_CSS_Scrape folder to get started.
DevTools
Let’s open up the file in the 04-Ins_DevTools folder to get started.
DevTools is a debugging console built into Chrome (or whichever other major browser you might be using).
Keyboard shortcuts
- Mac: Option + Command ⌘ + I
- Windows: Ctrl + Shift + I
I’ll walk you through what some of these tools can do, and how you can use them to find and extract data from a page.
We’ll be scraping from this website: https://stackoverflow.com/questions/tagged/python?sort=MostVotes&edited=true
Students Do: Stack Scrape
Let’s open up the file in the 05-Stu_StackOverflow folder to get started.
Styling HTML Elements with CSS
Let’s open up the file(s) in the 06-Stu_Scraping_Mars_News folder to get started.