Web Automation with Splinter
Have you ever had to click through a clunky, hard-to-navigate website? Maybe it was a school web portal or a local government website. If you often have to click through a frustrating web page, I have good news for you: we can automate that with Python.
The library that we will be using to spin up and control a web browser is called Splinter. There is some prep work before we get started.
First, if you are on a Mac, make sure that you have Homebrew installed. Then run the following in your terminal:
# run this in terminal
brew install --cask chromedriver
If you are on a Windows machine, I recommend navigating to the Chromedriver download page and downloading the version of Chromedriver that matches the version of Google Chrome you have installed. After downloading, move the Chromedriver executable file to your project directory.
Now, let’s get Splinter installed.
# run this in terminal or (Windows) powershell
pip3 install splinter
Awesome! If all has gone smoothly up to this point, fire up a Jupyter notebook or run python3 in terminal and follow along.
from splinter import Browser
import time

# we are going to use NYC Open Data as an example
URL = 'https://opendata.cityofnewyork.us/data/'

# un-comment this if you are using Windows!
# executable_path = {'executable_path': 'chromedriver.exe'}
# browser = Browser('chrome', **executable_path)

# this will open up a new browser window and navigate to our URL
browser = Browser('chrome')
browser.visit(URL)

# tell python to pause for 10s so we can observe the browser
time.sleep(10)
browser.quit()
So far we have started up a new automated browser window and visited the website we specified. Try replacing the URL variable with another website and watch as the browser navigates to it.
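One way to make sure the browser always closes, even if something goes wrong mid-script, is to use the browser object as a context manager. The following is a minimal sketch rather than the only way to do it: visit_and_pause is a hypothetical helper name of my own, and it assumes the browser quits itself when the with block exits, which Splinter's browser objects are documented to do.

```python
import time


def visit_and_pause(browser, url, seconds=10):
    """Visit url, pause so we can watch, then close the browser.

    visit_and_pause is a hypothetical helper, not part of Splinter.
    """
    # leaving the with block quits the browser, even if visit() raises
    with browser:
        browser.visit(url)
        time.sleep(seconds)


# Example usage (needs Chrome and chromedriver, so it is left commented out):
# from splinter import Browser
# visit_and_pause(Browser('chrome'), 'https://opendata.cityofnewyork.us/data/')
#
# to run without a visible window, pass headless=True when creating the browser:
# visit_and_pause(Browser('chrome', headless=True),
#                 'https://opendata.cityofnewyork.us/data/', seconds=0)
```

Passing the browser in as an argument keeps the helper easy to reuse with whichever browser setup you need, including the Windows executable_path variant.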
Interacting with the web page
Now that we are on the NYC Open Data website, let’s try to automate placing text in the search bar and clicking the Search button.
By right-clicking on the search bar and clicking Inspect, we can quickly pull up the HTML tag for the element that we want to interact with. According to the Splinter docs, there are a few ways to find HTML elements on a page:
# finding HTML elements with Splinter
# ref: https://splinter.readthedocs.io/en/latest/finding.html
browser.find_by_css('h1')
browser.find_by_xpath('//h1')
browser.find_by_tag('h1')
browser.find_by_name('name')
browser.find_by_text('Hello World!')
browser.find_by_id('firstheader')
browser.find_by_value('query')
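Each of these calls returns a list-like collection of matches rather than a single element, and asking an empty collection for its first item raises an error. As a small sketch of how you might guard against that, the hypothetical helper below (first_match is my own name, not a Splinter function) returns the first match or None. It relies only on len() and indexing, which Splinter's result lists support.

```python
def first_match(elements):
    """Return the first item of a find_by_* result, or None if nothing matched."""
    if len(elements) == 0:
        return None
    return elements[0]


# Example usage with a live browser (left commented out):
# heading = first_match(browser.find_by_tag('h1'))
# if heading is not None:
#     print(heading.text)
```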
By inspecting the HTML of the NYC Open Data search box, we can see that the full element for the search bar is the following:
<input id="search-catalog-input" class="highlighted-search search-input form-control form-control-lg space-top" autocomplete="off" name="q" type="text" placeholder="Search NYC Open Data">
Using the same process on the Search button shows the following HTML tag:
<button class="btn btn-primary btn-lg" style="width: 100%;" type="submit">Search</button>
Looks like we can use browser.find_by_id('search-catalog-input') to interact with the search box and browser.find_by_text('Search') to interact with the search button! Let’s alter our code from earlier to show this functionality.
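Here is a minimal sketch of what that altered code could look like. It assumes the page still uses the id and button text shown above. The function names search_open_data and main, and the example query 'parks', are hypothetical choices of mine for illustration.

```python
import time

# we are still using NYC Open Data as an example
URL = 'https://opendata.cityofnewyork.us/data/'


def search_open_data(browser, query):
    """Visit the NYC Open Data page, type query into the search box, click Search."""
    browser.visit(URL)
    # the search box is the <input> with id="search-catalog-input"
    browser.find_by_id('search-catalog-input').fill(query)
    # the button is matched by its visible text
    browser.find_by_text('Search').click()


def main():
    # imported here so the helper above can be used without Splinter installed
    from splinter import Browser

    # on Windows, pass the executable_path dict from earlier instead:
    # browser = Browser('chrome', **{'executable_path': 'chromedriver.exe'})
    browser = Browser('chrome')
    try:
        search_open_data(browser, 'parks')  # 'parks' is just an example query
        # pause for 10s so we can observe the search results
        time.sleep(10)
    finally:
        browser.quit()
```

Call main() from your notebook or Python prompt to watch the browser type the query and click the button.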
We have successfully navigated to a website, filled a search input with custom text, and clicked a button with an automated browser! These are the basics needed to get up and running with any web automation task you have in mind. I challenge you to implement this library at work or in school to automate a web task!