This guide is for folks who are new to collecting data through Python web scraping and for whom installing development tools on a laptop is still a learning curve.

You can find demo code to use alongside this guide in my public GitHub repository: https://github.com/jonathan-moo/python-webscraping-demo.

Why are we using Firefox? 

To scrape a website, we first load it in a web browser, just as we would when visiting it over the Internet.

As the browser renders the page content (the HTML tags), we can extract information from it programmatically and load that information into our Python environment as variables.

In this setup, Firefox is driven by my Python script or Jupyter Notebook instead of by hand.

This raises a question: which browser should we use to render the content?

As the title of this guide suggests, we’ll use Firefox.

Prerequisites

You’ll need to download and install Mozilla Firefox: https://www.mozilla.org/en-US/firefox/windows/

Splinter, the Python library we’ll use to control the browser, uses the Firefox app already installed on your Windows machine as its default browser.

Install GeckoDriver

Download the GeckoDriver release that matches your machine from https://github.com/mozilla/geckodriver/releases:

  • If you’re using an Intel- or AMD-based (x86/x64) Windows machine, select the *-win32.zip file (on 64-bit Windows, the *-win64.zip file also works).
  • If you’re using an ARM-based Windows machine, select the *-win-aarch64.zip file.
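If you’re not sure which of these files your machine needs, the small sketch below maps the value reported by Python’s standard-library platform.machine() to the suffixes listed above. The lookup table is an assumption based on the common Windows values ("x86", "AMD64", "ARM64") and follows the choices in this guide, and geckodriver_asset_suffix is a hypothetical helper name. If you run an emulated x64 Python on an ARM machine, the reported value may not match your hardware, so double-check under Settings > System > About.

```python
import platform

# Assumed lookup table: common Windows platform.machine() values mapped to the
# geckodriver zip suffix recommended in this guide.
SUFFIX_BY_MACHINE = {
    "x86": "win32",          # 32-bit Intel/AMD
    "amd64": "win32",        # 64-bit Intel/AMD (the win64 build also works)
    "arm64": "win-aarch64",  # ARM-based Windows
}


def geckodriver_asset_suffix(machine=None):
    """Return the suffix of the geckodriver zip to download, e.g. "win32".

    `machine` defaults to platform.machine(); passing a string lets you test it.
    Raises ValueError for machine types that are not in the table above.
    """
    key = (machine if machine is not None else platform.machine()).lower()
    if key not in SUFFIX_BY_MACHINE:
        raise ValueError(f"Unrecognised machine type: {key!r}")
    return SUFFIX_BY_MACHINE[key]
```

On your own machine, print(geckodriver_asset_suffix()) shows which file ending to look for on the releases page.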

Each zip file contains geckodriver.exe. Extract it, then add the folder that contains it to your PATH environment variable so that Selenium (which Splinter uses under the hood) can find it.

Install Splinter With Selenium Bindings

When you install Splinter as a Python package, include the Selenium extra so that the Selenium bindings Splinter uses to drive Firefox are installed too:

pip install splinter[selenium]

You can check the official docs here: https://splinter.readthedocs.io/en/0.21.0/install/driver_install.html
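To confirm the install worked, a quick check like the sketch below uses the standard-library importlib.metadata (Python 3.8+) to look up the installed versions of splinter and selenium. The helper names are hypothetical; a None value means the package isn’t installed in the environment your notebook is using.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_scraping_packages(packages=("splinter", "selenium")):
    """Map each package name to its installed version (None means not installed)."""
    return {name: installed_version(name) for name in packages}


for name, found in check_scraping_packages().items():
    print(f"{name}: {found or 'not installed, run: pip install splinter[selenium]'}")
```

Run it from the same Jupyter kernel you will scrape with, since a package installed into a different Python environment won’t show up here.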

Add GeckoDriver to your PATH environment variable

On Windows 10/11, type “environment variables” into the search bar and open Edit the system environment variables.

In the System Properties panel, click Environment Variables.

Select the Path variable and click Edit... to add the location of geckodriver.exe.

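In the Edit environment variable panel, click New and enter the path of the folder where your geckodriver.exe is stored (the folder itself, not the path to the .exe file). Then click OK on each open panel to save the change.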

Important notes:

  • Don’t copy the path I set up; it is unique to my Windows laptop. Instead, use the path of the folder where you saved your geckodriver.exe.
  • A good practice is to create a new folder named bin in your user folder (for example, C:\Users\<your-username>\bin), add it to PATH once, and place all your executable or binary files there. That way, you won’t need to add a new PATH entry for every tool, because Windows already searches that folder whenever you run a command.
  • Executable or binary (“bin”) files are programs or scripts that you can run as commands.
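Before moving on, you can check your PATH from Python. This sketch assumes the bin folder from the note above sits directly inside your user folder; shutil.which searches PATH the way a command prompt does (on Windows it also tries extensions such as .exe). The helper names are hypothetical, and keep in mind that an edited PATH is only seen by programs started after the change.

```python
import os
import shutil
from pathlib import Path


def path_entries():
    """Return the non-empty folders listed in the PATH environment variable."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]


def folder_on_path(folder):
    """Return True if `folder` is listed in PATH.

    normpath drops trailing slashes; normcase makes the match case-insensitive
    on Windows (it leaves paths unchanged on macOS/Linux).
    """
    target = os.path.normcase(os.path.normpath(str(folder)))
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in path_entries()
    )


# Assumption: the suggested bin folder sits directly inside your user folder.
bin_folder = Path.home() / "bin"

# shutil.which returns the full path of the executable, or None if not found.
geckodriver_path = shutil.which("geckodriver")

print("bin folder on PATH:", folder_on_path(bin_folder))
print("geckodriver found at:", geckodriver_path or "not found, check your PATH entry")
```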

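Once you are done, close and reopen Jupyter (programs that were already running don’t pick up the updated PATH), then run /notebooks/test_firefox_setup.ipynb from the demo repository to test your setup.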

How do you know if you’re successful?

Because Splinter uses Firefox as its default browser, calling Browser() with no arguments opens a Firefox window:

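from splinter import Browser

# Open a Firefox window using Splinter (Firefox is Splinter's default driver,
# so this is equivalent to Browser("firefox"))
browser = Browser()

# Load the Google home page in that window
browser.visit("https://google.com")

# When you are finished, close Firefox and end the session:
# browser.quit()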

If running the code above launches Firefox and loads the Google home page, GeckoDriver is working on your system.
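If you’d rather not leave a Firefox window open after the check, here is a sketch that wraps the same steps in a with block so the browser is quit automatically. It assumes Splinter browsers can be used as context managers and that Browser("firefox", headless=True) runs Firefox without a visible window. firefox_smoke_test is a hypothetical helper that takes a browser factory, which also lets you try it with a stand-in object.

```python
def firefox_smoke_test(make_browser, url="https://google.com"):
    """Open a browser, visit `url`, and return (current URL, page title).

    `make_browser` is any zero-argument callable that returns a Splinter-style
    browser. Leaving the `with` block quits the browser, even if visit() fails.
    """
    with make_browser() as browser:
        browser.visit(url)
        return browser.url, browser.title


# Usage with a real Firefox (needs splinter[selenium] and GeckoDriver on PATH):
#     from splinter import Browser
#     print(firefox_smoke_test(lambda: Browser("firefox", headless=True)))
```

The returned URL may differ from the one you passed in if the site redirects you, so compare the title rather than the exact URL when deciding whether the check passed.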

You can then build on this code to scrape data from websites.
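As a starting point, the sketch below visits a page and collects its title and the text of its headings into ordinary Python variables. It assumes Splinter’s find_by_tag method and each element’s text attribute; extract_headings and the example URL are hypothetical choices for illustration, so adjust the tag to whatever elements hold the data you need.

```python
def extract_headings(browser, url, tag="h1"):
    """Visit `url` and return the page title and the text of each `tag` element.

    `browser` is assumed to be a Splinter browser such as Browser("firefox").
    """
    browser.visit(url)
    # find_by_tag returns a list-like ElementList; .text is each element's visible text
    headings = [element.text for element in browser.find_by_tag(tag)]
    return {"title": browser.title, "headings": headings}


# Usage (needs splinter[selenium], Firefox, and GeckoDriver on PATH):
#     from splinter import Browser
#     with Browser("firefox") as browser:
#         page_data = extract_headings(browser, "https://example.com")
#     print(page_data["title"], page_data["headings"])
```

Because the function receives the browser instead of creating one, you can reuse a single Firefox session across many pages, which is much faster than launching a new window for every URL.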