More recently, however, advanced technologies in web development have made the task a bit more difficult.

But as I got further into it, I found obstacles that could not be overcome with traditional methods.

In my case, this seemed like it could be useful.

How to use Python and Selenium to scrape websites

And sure enough, a Selenium library exists for Python.

This would allow me to instantiate a web browser (Chrome, Firefox, IE, etc.), then pretend I was using the browser myself to gain access to the data I was looking for.
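
As a minimal sketch of that idea (not the author's code), the function below starts a browser session through Selenium 4's webdriver module. The make_driver helper and its name-to-class mapping are illustrative assumptions. Selenium is imported only inside the function, so an unsupported browser name is rejected before anything is launched.

```python
# Browser names mapped to WebDriver classes in selenium.webdriver (Selenium 4).
SUPPORTED_BROWSERS = {
    "chrome": "Chrome",
    "firefox": "Firefox",
    "ie": "Ie",
}


def make_driver(browser: str = "chrome"):
    """Launch a browser that the script can then drive like a user.

    The name is checked before Selenium is imported, so a typo fails
    immediately with ValueError instead of starting a browser.
    """
    key = browser.lower()
    if key not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")
    from selenium import webdriver  # imported lazily; requires the selenium package
    driver_class = getattr(webdriver, SUPPORTED_BROWSERS[key])
    return driver_class()
```

Calling make_driver("chrome") returns a WebDriver; driver.get(url) then loads a page, and driver.quit() closes the browser when the script is done.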

Data Scraping

The form's inputs included Customer ID, from month/year, and to month/year.
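
A sketch of filling those three inputs might look like the following. The element IDs, the MM/YYYY date format, and the helper names are all hypothetical; the real values would come from inspecting the site's form. The string "id" is the same value as Selenium's By.ID constant, which keeps the helpers testable without a browser.

```python
from typing import Dict, Tuple

BY_ID = "id"  # same string value as selenium.webdriver.common.by.By.ID

# Hypothetical element IDs; the real ones come from inspecting the form.
FIELD_IDS: Dict[str, str] = {
    "customer_id": "customerId",
    "from_month_year": "fromDate",
    "to_month_year": "toDate",
}


def month_year(month: int, year: int) -> str:
    """Format a month/year pair as MM/YYYY (an assumed input format)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return f"{month:02d}/{year:04d}"


def fill_search_form(driver, customer_id: str,
                     start: Tuple[int, int], end: Tuple[int, int]) -> None:
    """Type the search inputs into the form the way a user would.

    start and end are (month, year) tuples.
    """
    # Compare as (year, month) so the range check orders dates correctly.
    if (start[1], start[0]) > (end[1], end[0]):
        raise ValueError("the from date must not be after the to date")
    values = {
        "customer_id": customer_id,
        "from_month_year": month_year(*start),
        "to_month_year": month_year(*end),
    }
    for name, value in values.items():
        field = driver.find_element(BY_ID, FIELD_IDS[name])
        field.clear()           # drop any pre-filled text
        field.send_keys(value)  # simulate typing
```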

The website immediately asked me to select a certificate (which I had installed earlier).

The first problem to tackle was the certificate.

How could I choose the proper one and accept it so that I could get into the website?

In my first test of the script, I got a prompt asking me to choose the certificate.

This wasn't good.

I did not want to manually click the OK button each time I ran my script.

As it turns out, I was able to find a workaround for this without programming.

Since I only had one certificate loaded, I used the generic format.
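
The excerpt doesn't name the mechanism. One Chrome feature that fits the description is the AutoSelectCertificateForUrls policy, which on Windows is read from the registry: each numbered string value under the policy key holds a JSON rule with a URL pattern and a certificate filter, and an empty filter ({}) does not further restrict which certificate is chosen. The sketch below assumes that policy, uses example.com as a placeholder, and renders the rules as a .reg file that could be imported with administrator rights; cert_rule and reg_file are hypothetical helper names.

```python
import json
from typing import List, Optional

# Registry key Chrome reads the policy from; each value "1", "2", ... is one rule.
POLICY_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Google\Chrome\AutoSelectCertificateForUrls"


def cert_rule(url_pattern: str, issuer_cn: Optional[str] = None) -> str:
    """Build one rule as compact JSON.

    With no issuer, the filter is empty (the generic form), which is enough
    when only one certificate is installed.
    """
    cert_filter = {"ISSUER": {"CN": issuer_cn}} if issuer_cn else {}
    return json.dumps({"pattern": url_pattern, "filter": cert_filter},
                      separators=(",", ":"))


def reg_file(rules: List[str]) -> str:
    """Render rules as .reg text, escaping backslashes and quotes for REG_SZ values."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{POLICY_KEY}]"]
    for number, rule in enumerate(rules, start=1):
        escaped = rule.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'"{number}"="{escaped}"')
    return "\r\n".join(lines) + "\r\n"
```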

Armed with this information, I then found the element on the page and clicked it.
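
Finding and clicking an element takes only a couple of calls. This sketch assumes a CSS selector for the submit button (hypothetical; the real one would come from the browser's developer tools) and uses the string "css selector", which is the value of Selenium's By.CSS_SELECTOR.

```python
BY_CSS = "css selector"  # same string value as Selenium's By.CSS_SELECTOR

# Hypothetical selector for the search form's submit button.
SUBMIT_SELECTOR = "input[type='submit']"


def submit_form(driver, selector: str = SUBMIT_SELECTOR) -> None:
    """Find the submit control and click it, as a user would."""
    button = driver.find_element(BY_CSS, selector)
    if not button.is_enabled():
        # Clicking a disabled control would do nothing, so fail loudly instead.
        raise RuntimeError(f"submit control {selector!r} is disabled")
    button.click()
```

If nothing matches the selector, Selenium's find_element raises NoSuchElementException, which is usually the first sign that the selector needs adjusting.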

And voila, the form was submitted and the data appeared!

Now, I could just scrape all of the data on the result page and save it as required.
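
Scraping a simple results table and saving it to CSV could be sketched like this. It assumes the results sit in a plain table with one row per record and no nested tables; table_rows and save_csv are hypothetical helper names, and the locator strings match Selenium's By.TAG_NAME and By.CSS_SELECTOR values.

```python
import csv
from typing import Iterable, List

BY_TAG = "tag name"      # same string value as Selenium's By.TAG_NAME
BY_CSS = "css selector"  # same string value as Selenium's By.CSS_SELECTOR


def table_rows(table) -> List[List[str]]:
    """Return the text of every header and data cell, row by row.

    table is the WebElement for a <table>; nested tables are not handled.
    """
    rows = []
    for tr in table.find_elements(BY_TAG, "tr"):
        cells = tr.find_elements(BY_CSS, "th, td")
        rows.append([cell.text.strip() for cell in cells])
    return rows


def save_csv(rows: Iterable[List[str]], out) -> int:
    """Write rows to out (any object with a write() method); return the row count."""
    writer = csv.writer(out)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

In a real run, the file would be opened with open("results.csv", "w", newline="") as the csv module's documentation recommends, and the table element located with driver.find_element(BY_TAG, "table").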

Getting the data

First, I had to handle the case where the search found nothing.

That was pretty straightforward.
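
One way to detect an empty search is Selenium's find_elements, which returns an empty list rather than raising NoSuchElementException when nothing matches. Both selectors below are hypothetical stand-ins for the site's "no results" message and its result rows.

```python
BY_CSS = "css selector"  # same string value as Selenium's By.CSS_SELECTOR

# Hypothetical selectors for the empty-search message and the result rows.
NO_RESULTS_SELECTOR = ".no-results"
RESULT_ROW_SELECTOR = "table.results tr"


def has_results(driver) -> bool:
    """Return False when the search came back empty.

    find_elements (plural) returns [] when nothing matches, so it works
    as an existence check without try/except.
    """
    if driver.find_elements(BY_CSS, NO_RESULTS_SELECTOR):
        return False
    return bool(driver.find_elements(BY_CSS, RESULT_ROW_SELECTOR))
```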

An open transaction showed a minus sign (-), which, when clicked, would close its div.

Clicking a plus sign (+) would call a URL to open its div and close any open one.

For this project, the count was returned to a calling program.
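
The open-one-at-a-time behavior suggests a loop like the following sketch, which opens each transaction, hands its detail text to a callback, and returns the number processed, the kind of count a calling program could receive. The selectors, and the assumption that each row's toggle shows "+" or "-" as its text, are hypothetical. Because opening a transaction loads a URL, element handles found earlier can go stale, so the rows are looked up again on every pass.

```python
from typing import Callable

BY_CSS = "css selector"  # same string value as Selenium's By.CSS_SELECTOR

# Hypothetical selectors: one container per transaction, its +/- toggle,
# and the detail div that the toggle opens.
ROW_SELECTOR = "div.transaction"
TOGGLE_SELECTOR = "a.toggle"
DETAIL_SELECTOR = "div.detail"


def scrape_transactions(driver, handle_detail: Callable[[str], None]) -> int:
    """Open each transaction in turn, pass its detail text on, return the count."""
    count = len(driver.find_elements(BY_CSS, ROW_SELECTOR))
    for index in range(count):
        # Re-find rows on every pass: opening a transaction loads a URL,
        # so handles located before the click may be stale afterwards.
        row = driver.find_elements(BY_CSS, ROW_SELECTOR)[index]
        toggle = row.find_element(BY_CSS, TOGGLE_SELECTOR)
        if toggle.text.strip() == "+":
            toggle.click()  # opens this row's div and closes any other open one
            row = driver.find_elements(BY_CSS, ROW_SELECTOR)[index]
        detail = row.find_element(BY_CSS, DETAIL_SELECTOR)
        handle_detail(detail.text)
    return count
```

Iterating over rows rather than over the "+" links matters here: once a transaction is open its "+" becomes "-", so a list of plus signs would shift between passes and entries would be skipped.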

Here are a few ways to handle CAPTCHAs:

For simple prompts (like "What's 2 + 3?"), these can generally be read and figured out easily.
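
A prompt like that can be solved with a regular expression and a small operator table. This sketch only handles a single binary operation on whole numbers, which is an assumption about how simple the prompt is; image-based CAPTCHAs need a different approach. The function name is hypothetical.

```python
import operator
import re

# Operators a simple text CAPTCHA might use ("x" and "×" both mean multiply).
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "x": operator.mul,
    "×": operator.mul,
}

# Two whole numbers around one operator, e.g. "2 + 3" inside "What's 2 + 3?".
PROMPT_RE = re.compile(r"(\d+)\s*([-+*x×])\s*(\d+)", re.IGNORECASE)


def solve_arithmetic_prompt(prompt: str) -> int:
    """Answer a prompt such as "What's 2 + 3?" by evaluating its one operation."""
    match = PROMPT_RE.search(prompt)
    if match is None:
        raise ValueError(f"no arithmetic found in {prompt!r}")
    left, op, right = match.groups()
    return OPERATIONS[op.lower()](int(left), int(right))
```

The answer would then be typed into the CAPTCHA's input field with send_keys(str(answer)).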

However, for more advanced barriers, there are libraries that can help you try to crack them.

Some examples are 2Captcha, Death by Captcha, and Bypass Captcha.

Basically, if you’re free to browse the site yourself, it generally can be scraped.

Now, as a caveat, this does not mean that every website should be scraped.

Some have legitimate restrictions in place, and there have been numerous court cases deciding the legality of scraping certain sites.

Either way, it's best to check the terms and conditions before starting any project.

But if you do go ahead, rest assured that you can get the job done.
