This guide is for folks who are new to data collection through Python web scraping and still find installing development tools on a laptop to be a learning curve.

I provide some demo code for you to use in my public GitHub repository: https://github.com/jonathan-moo/python-webscraping-demo.

Why are we using ChromeDriver?

To scrape a website, we first load a page that is accessible via the Internet in a browser.

As the browser renders the content (the HTML tags), we can programmatically extract information through the browser and load it into our Python development environment as variables.

This Chrome window is being controlled by my Python script or Jupyter Notebook.

That raises a question: which browser should we use to render the content?

As the name suggests, it's Chrome, and Google provides ChromeDriver as a tool to spin up and control a Chrome browser programmatically.
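
To make the idea concrete, here is a minimal sketch of the workflow, assuming the setup described in the rest of this guide is already in place (the https://example.com URL is only a placeholder):

# A minimal sketch: render a page in Chrome, then pull the HTML into Python
from splinter import Browser
from bs4 import BeautifulSoup

browser = Browser('chrome')                        # starts Chrome through ChromeDriver
browser.visit("https://example.com")               # load a page accessible via the Internet
soup = BeautifulSoup(browser.html, "html.parser")  # the rendered HTML is now a Python variable
print(soup.title.string)                           # print the page title
browser.quit()                                     # close the automated browser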

Prerequisites

You’ll need to download a ChromeDriver executable file from https://googlechromelabs.github.io/chrome-for-testing/ as shown:

Most modern systems are 64-bit, so pick the 64-bit build.

You can check your Chrome browser’s version from the three-dot menu at the top right of your window, under Help > About Google Chrome.

The ChromeDriver version you download should match your Chrome browser’s version.
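
If you want to confirm the match without clicking through menus, the sketch below asks the downloaded ChromeDriver for its version; the path is a placeholder, so point it at wherever you saved chromedriver.exe:

# Ask the downloaded ChromeDriver for its version and compare it with Help > About Google Chrome
import subprocess

chromedriver_path = r"C:\path\to\chromedriver.exe"  # placeholder: use your own download location
result = subprocess.run([chromedriver_path, "--version"], capture_output=True, text=True)
print(result.stdout)  # prints "ChromeDriver <version> ...", whose major version should match your browser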

There are a couple of ways to run ChromeDriver for web scraping:

  • Reference chromedriver.exe within your PATH environment variable.
  • Reference chromedriver.exe directly in your code.

Install Selenium Bindings With Splinter

When installing splinter as a Python package, you’ll need to add the bindings for Selenium:

pip install splinter[selenium]

You can check the official docs here: https://splinter.readthedocs.io/en/0.21.0/install/driver_install.html
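
To double-check that both packages landed in your current environment, you can query their versions from Python (this only assumes Python 3.8+ for importlib.metadata):

# Confirm that splinter and selenium are installed in the current environment
from importlib.metadata import version

print(version("splinter"))
print(version("selenium"))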

Reference ChromeDriver within your PATH environment variable

In Windows 10/11, use the search bar to locate the option to edit system environment variables.

In the System Properties panel, click Environment Variables.

Select Path and click Edit... to add the reference to the chromedriver.exe.

Click New in the Edit environment variable panel and add the path of the folder where your chromedriver.exe is stored.

Important notes:

  • Please don’t follow the path I set up, as it is unique to my Windows laptop. Instead, locate where you downloaded your chromedriver.exe and use that folder path.
  • It is best practice to create a new folder, bin, in your user folder and place all your executable or binary files there. This way, you won’t need to keep adding new paths to your system variables, because Windows will search the folder you already defined.
  • Executable or binary (bin) files are applications or scripts that run commands.

Once you are done, run /notebooks/test_your_chromedriver_setup.ipynb in Jupyter Notebook to test your setup.
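
If you’d rather check directly from a Python prompt, here is a rough sketch of the same kind of test (not the contents of the notebook itself): confirm Windows can find chromedriver on your PATH, then launch Chrome without passing any explicit path.

# Verify that chromedriver is discoverable on your PATH, then launch Chrome via Splinter
import shutil
from splinter import Browser

print(shutil.which("chromedriver"))  # should print the folder you added; None means PATH isn't set up yet

browser = Browser('chrome')           # no explicit path needed when chromedriver is on PATH
browser.visit("https://example.com")  # placeholder URL
browser.quit()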

Reference ChromeDriver in your code

If you can’t get the path environments to work, consider referencing ChromeDriver in your code.

The only caveat is that you must reference the ChromeDriver file in every script or Jupyter Notebook that uses it.

My GitHub repo has an easy reference for you to use.

  • Download the chromedriver.exe into the /drivers folder.
  • Run the /notebooks/test_executable_setup.ipynb.

In short, use the following code to reference the ChromeDriver file.

# Install and import the relevant dependencies
from splinter import Browser
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.service import Service

# Check that ChromeDriver exists in the `drivers` folder
import os
path = "../drivers/chromedriver.exe"
os.path.isfile(path)

# Point the Service to the ChromeDriver within your project folder
my_service = Service(executable_path=path)
my_service

# Open a Chrome window using Splinter
browser = Browser('chrome', service=my_service)

How do you know if you’re successful?

In both instances, the code will start a Chrome browser with the banner, “Chrome is being controlled by automated test software.”

This is where you can continue to build on your code to scrape data from websites.
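
As a starting point, here is one way to continue from the browser object created above; the URL is a placeholder you’d replace with the site you want to scrape:

# Build on the `browser` opened above: visit a page and extract data from the rendered HTML
from bs4 import BeautifulSoup

browser.visit("https://example.com")               # placeholder URL: swap in your target site
soup = BeautifulSoup(browser.html, "html.parser")  # parse the rendered page source

for link in soup.find_all("a"):                    # for example, collect every hyperlink on the page
    print(link.get_text(strip=True), link.get("href"))

browser.quit()                                     # close the automated Chrome window when done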