More recently, however, advanced technologies in web development have made the task a bit more difficult.

But as I got further into it, I found obstacles that could not be overcome with traditional methods.

Web scraping with Python: common roadblocks and solutions

In my case, this seemed like it could be useful.

And sure enough, a Selenium library exists for Python.

This would allow me to instantiate a browser window (Chrome, Firefox, IE, etc.).
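As a rough illustration of what instantiating a browser looks like, here is a minimal sketch using the Selenium Python bindings. The helper name `open_browser` and the choice of Chrome are assumptions made for this example, not details from the project described here.

```python
def open_browser(url):
    """Start a Chrome session, load `url`, and return the driver.

    Hypothetical helper: assumes the third-party `selenium` package is
    installed and that Chrome plus a matching driver are available.
    """
    # Imported here so the sketch can be loaded without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # webdriver.Firefox() would work similarly
    driver.get(url)              # navigate to the page to be scraped
    return driver
```

When finished, calling `driver.quit()` on the returned object closes the browser window.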

Data Scraping

Those included Customer ID, from-month/year, and to-month/year.

It immediately asked me to select a certificate (which I had installed earlier).

The first problem to tackle was the certificate.

How could I choose the proper one and accept it so that I could get into the website?

In my first test of the script, I got this prompt:

This wasn't good.

I did not want to manually poke the OK button each time I ran my script.

As it turns out, I was able to find a workaround for this without programming.

Since I only had one certificate loaded, I used the generic format.

Armed with this information, I found the element on the page and clicked it.
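Finding an element and clicking it can be sketched roughly as follows. The function name `click_by_id` and the idea that the element is located by its `id` attribute are assumptions for illustration; the actual page may need a different locator.

```python
def click_by_id(driver, element_id):
    """Locate an element by its id attribute and click it.

    `driver` is expected to be a Selenium WebDriver. The string "id" is
    the value of Selenium's `By.ID` constant, so `By.ID` could be passed
    instead after importing it from selenium.webdriver.common.by.
    """
    element = driver.find_element("id", element_id)
    element.click()
    return element
```

A call such as `click_by_id(driver, "search-button")` would then submit the form, where `"search-button"` is a made-up id standing in for whatever the page inspector reveals.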

And voila, the form was submitted and the data appeared!

Now, I could just scrape all of the data on the result page and save it as required.

Getting the data

First, I had to handle the case where the search found nothing.

That was pretty straightforward.
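One simple way to detect an empty search is to look for the "nothing found" message in the page source. The sketch below assumes the site prints some fixed text in that case; the default text `"No records found"` and the function name are placeholders, not the site's actual wording.

```python
def search_found_nothing(driver, no_results_text="No records found"):
    """Return True if the result page reports an empty search.

    `driver.page_source` is the Selenium attribute holding the current
    page's HTML. The marker text is an assumption and must be replaced
    with whatever the target site actually displays.
    """
    return no_results_text in driver.page_source
```

The script could call this right after submitting the form and return early, with a count of zero, when it evaluates to `True`.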

An opened transaction showed a minus sign (-), which, when clicked, would close the div.

Clicking a plus sign would call a URL to open its div and close any open one.

For this project, the count was returned to a calling utility.

Here are a few:

For simple prompts (like "what's 2 + 3?"), these can generally be read and figured out easily.
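To illustrate how a simple arithmetic prompt might be read and answered in code, here is a small sketch in plain Python. The function name `solve_simple_prompt` and the assumption that the prompt contains exactly two whole numbers joined by `+`, `-`, or `*` are mine; real prompts vary widely.

```python
import operator
import re

# Supported operators mapped to the functions that evaluate them.
_OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Two non-negative integers separated by one supported operator.
_PATTERN = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)")


def solve_simple_prompt(prompt):
    """Return the answer to a prompt like "what's 2 + 3?", or None."""
    match = _PATTERN.search(prompt)
    if match is None:
        return None  # not a prompt this sketch understands
    left, symbol, right = match.groups()
    return _OPERATIONS[symbol](int(left), int(right))


print(solve_simple_prompt("what's 2 + 3?"))  # prints 5
```

Anything the pattern does not recognize returns `None`, so the caller can fall back to another approach.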

However, for more advanced barriers, there are libraries that can help try to crack them.

Some examples are 2Captcha, Death by Captcha, and Bypass Captcha.

Now, as a caveat, it does not mean that every website should be scraped.

Some have legitimate restrictions in place, and there have been numerous court cases deciding the legality of scraping certain sites.

Either way, it's best to check the terms and conditions before starting any project.

But if you do go ahead, be assured that you can get the job done.
