This guide is for people new to collecting data through Python web scraping who are still getting used to installing development tools on their laptops.
You can find demo code to follow along in my public GitHub repository: https://github.com/jonathan-moo/python-webscraping-demo.
Why are we using ChromeDriver?
To scrape a website, we first load the page in a browser.
As the browser renders the content (HTML tags), we can extract information from it programmatically and load it into Python variables.
That raises a question: which browser should we use to render the content?
As the name suggests, ChromeDriver works with Chrome. Google provides it as a tool for starting and controlling a Chrome browser programmatically.
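To make the idea of turning rendered HTML into Python variables concrete, here is a minimal sketch. It assumes a small hard-coded HTML string (SAMPLE_HTML, made up for illustration) instead of a live browser, and it uses Python's built-in html.parser rather than BeautifulSoup so it runs without extra packages. The LinkCollector class is a hypothetical helper and not part of the demo repository.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Made-up HTML standing in for what a browser would render
SAMPLE_HTML = """
<html><body>
  <h1>Demo page</h1>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)

# The extracted data is now an ordinary Python list
links = collector.links
print(links)
```

In a real scraping session the HTML would come from the browser that ChromeDriver controls, and the parsing step would work on that string in the same way.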
Prerequisites
You’ll need to download the ChromeDriver executable that matches your Chrome version from https://googlechromelabs.github.io/chrome-for-testing/.
You can check your Chrome browser’s version under Help > About Google Chrome, in the menu behind the three dots at the top right-hand side of your window.
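ChromeDriver generally needs to share its major version (the first number) with the installed Chrome browser. The sketch below shows one way to compare the two once you have both version strings. The function names and the example version strings are hypothetical, chosen only for illustration.

```python
def major_version(version: str) -> int:
    """Return the leading (major) number of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Hypothetical version strings, for illustration only
print(versions_match("120.0.6099.109", "120.0.6099.71"))
print(versions_match("120.0.6099.109", "119.0.6045.105"))
```

Paste in the version from About Google Chrome and the version of the ChromeDriver you downloaded to see whether they line up.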
There are a couple of ways to run ChromeDriver for web scraping:
- Reference the chromedriver.exe within your PATH environment variable.
- Reference the chromedriver.exe in your code.
Install Selenium Binding With Splinter
When installing splinter as a Python package, you’ll need to add the bindings for Selenium as well.
pip install splinter[selenium]
You can check the official docs here: https://splinter.readthedocs.io/en/0.21.0/install/driver_install.html
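If you want to confirm that the install worked before going further, a quick check like the one below may help. It is only a sketch: installed_version is a hypothetical helper name, and it relies on importlib.metadata from the Python standard library (Python 3.8 or newer).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report on the two packages this guide depends on
for name in ("splinter", "selenium"):
    found = installed_version(name)
    print(name, found if found else "not installed")
```

If either package shows as not installed, rerun the pip command in the same Python environment that your Jupyter notebook uses.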
Reference ChromeDriver within your PATH environment variable
In Windows 10/11, use the search bar to locate the option to edit system environment variables.
In the System Properties panel, click Environment Variables.
Select Path and click Edit... to add the reference to the chromedriver.exe.
Click New on the Edit environment variable panel and reference the folder where your chromedriver.exe is stored.
Important notes:
- Please don’t follow the path I set up, as it is unique to my Windows laptop. Instead, locate where you downloaded your chromedriver.exe and use that folder path.
- It is best practice to create a new folder, bin, in your user folder and place all your executable or binary files there. This way, you won’t need to keep adding new paths to your system variables, because Windows will already search the folder you defined.
- Executable or binary (bin) files are applications or scripts that run commands.
Once you are done, run /notebooks/test_your_chromedriver_setup.ipynb in Jupyter to test your setup.
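Before opening the notebook, you can also check from Python whether the operating system can find ChromeDriver on PATH. This is a minimal sketch using shutil.which from the standard library. The find_on_path helper is a hypothetical name and is not taken from the demo repository.

```python
import shutil


def find_on_path(executable: str = "chromedriver"):
    """Return the full path of `executable` if it is found on PATH, else None."""
    return shutil.which(executable)


location = find_on_path()
if location:
    print(f"chromedriver found at: {location}")
else:
    # PATH changes only reach programs started after the change
    print("chromedriver is not on PATH yet; try restarting your terminal or Jupyter")
```

On Windows, shutil.which also tries the usual executable extensions, so searching for chromedriver should locate chromedriver.exe.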
Reference ChromeDriver in your code
If you can’t get the PATH environment variable to work, consider referencing ChromeDriver directly in your code.
The only caveat is that every script or Jupyter Notebook that uses ChromeDriver must reference the file itself.
My GitHub repo has an easy reference for you to use.
- Download the chromedriver.exe into the /drivers folder.
- Run /notebooks/test_executable_setup.ipynb.
In short, use the following code to reference the ChromeDriver file.
# Import the relevant dependencies
from splinter import Browser
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.service import Service
# Check if ChromeDriver exists in the `drivers` folder
import os
path = "../drivers/chromedriver.exe"
os.path.isfile(path)
# Point the service to use the ChromeDriver within your project folder
my_service = Service(executable_path=path)
my_service
# Open a Chrome window using Splinter
browser = Browser('chrome', service=my_service)
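Since every script has to repeat these steps, one option is to wrap them in a small helper that you can copy between notebooks. The sketch below is an assumption about how that could look and is not code from the demo repository. open_chrome is a hypothetical name, and the default path simply mirrors the one used above.

```python
from pathlib import Path


def open_chrome(driver_path: str = "../drivers/chromedriver.exe"):
    """Open a Chrome window through Splinter using a local ChromeDriver file."""
    driver = Path(driver_path)
    if not driver.is_file():
        # Fail early with a readable message instead of a Selenium stack trace
        raise FileNotFoundError(f"ChromeDriver not found at {driver.resolve()}")

    # Imported here so the path check works even before the packages are installed
    from selenium.webdriver.chrome.service import Service
    from splinter import Browser

    return Browser("chrome", service=Service(executable_path=str(driver)))


# Usage (opens a real Chrome window, so it is left commented out):
# browser = open_chrome()
```

Checking the file first means a wrong path shows up as a clear FileNotFoundError before Chrome is ever launched.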
How do you know if you’re successful?
In both cases, the code starts a Chrome browser showing the banner “Chrome is being controlled by automated test software.”
From here, you can continue to build on your code to scrape data from websites.
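As a possible next step, the sketch below visits a page and pulls its title and rendered HTML into Python variables. It assumes you already have a Splinter browser object from either setup. fetch_page is a hypothetical helper name, and the URL in the usage comment is only a placeholder.

```python
def fetch_page(browser, url: str) -> dict:
    """Visit `url` and return the page title and rendered HTML."""
    browser.visit(url)
    return {"title": browser.title, "html": browser.html}


# Usage with a real Splinter browser (left commented out because it opens Chrome):
# page = fetch_page(browser, "https://example.com")
# print(page["title"])
# browser.quit()  # close the Chrome window when you are done
```

The returned HTML string can then be passed to BeautifulSoup for parsing, and calling browser.quit() at the end keeps stray Chrome windows from piling up.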