More recently, however, advanced technologies in web development have made the task a bit more difficult.
But as I got further into it, I found obstacles that could not be overcome with traditional methods.
In my case, this seemed like it could be useful.

And sure enough, a Selenium library exists for Python.
This would allow me to instantiate a web browser (Chrome, Firefox, IE, etc.),
then pretend I was using the browser myself to gain access to the data I was looking for.
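
As a rough sketch of that idea, assuming the `selenium` package and a matching browser driver are installed: the helper name `open_browser` and the `SUPPORTED_BROWSERS` table are illustrative and not taken from the original script.

```python
# Maps a friendly name to the class name exposed by selenium.webdriver.
SUPPORTED_BROWSERS = {"chrome": "Chrome", "firefox": "Firefox"}


def open_browser(url, browser="chrome"):
    """Start a real browser under Selenium's control and load `url`."""
    try:
        class_name = SUPPORTED_BROWSERS[browser.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None

    # Imported here so the check above works even without Selenium installed.
    from selenium import webdriver

    driver = getattr(webdriver, class_name)()  # e.g. webdriver.Chrome()
    driver.get(url)
    return driver
```

The returned `driver` is the object every later step works through; call `driver.quit()` when the script is done so the browser window closes.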

Those included Customer ID, from-month/year, and to-month/year.
It immediately asked me to select a certificate (which I had installed earlier).
The first problem to tackle was the certificate.

How could I choose the proper one and accept it in order to get into the website?
In my first test of the script, I got this prompt:
This wasn't good.
I did not want to manually click the OK button each time I ran my script.

As it turns out, I was able to find a workaround for this without programming.
Since I only had one certificate loaded, I used the generic format.
Then, armed with this information, I found the element on the page, then clicked it.
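
Filling in a search form and clicking its button might look roughly like this. The element ids (`customer_id`, `from_date`, `to_date`, `submit`) and the function name are assumptions made for illustration; the real ids have to be read from the page's HTML, and the date inputs may well be dropdowns rather than text boxes.

```python
def submit_search(driver, customer_id, from_date, to_date):
    """Fill in the three search fields, then click the submit button."""
    # Hypothetical element ids; inspect the real page to find the actual ones.
    fields = {
        "customer_id": customer_id,
        "from_date": from_date,
        "to_date": to_date,
    }
    for element_id, value in fields.items():
        # "id" is the locator strategy string that Selenium's By.ID stands for.
        box = driver.find_element("id", element_id)
        box.clear()
        box.send_keys(value)

    driver.find_element("id", "submit").click()
```

Passing the driver in as a parameter keeps the function easy to try out against a stand-in object before pointing it at the live site.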

And voila, the form was submitted and the data appeared!
Now, I could just scrape all of the data on the result page and save it as required.

Getting the data

First, I had to handle the case where the search found nothing.
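
One simple way to detect an empty result is to look for the site's "nothing found" message in the page source. This is only a sketch: the marker text `"No records found"` is a made-up placeholder, since the actual wording depends on the site.

```python
def search_found_nothing(page_source, marker="No records found"):
    """Return True if the result page contains the 'nothing found' message."""
    # Case-insensitive match, because sites are not always consistent.
    return marker.lower() in page_source.lower()
```

With Selenium, the text to check would come from `driver.page_source`, so the call would be `search_found_nothing(driver.page_source)`.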

That was pretty straightforward.
An opened transaction showed a minus sign (-), which when clicked would close the div.
Clicking a plus sign would call a URL to open its div and close any open one.

For this project, the count was returned to the calling program.
Here are a few:
For simple prompts (like "what's 2 + 3?"), these can generally be read and figured out easily.
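
For instance, an arithmetic prompt of that kind could be answered with a few lines of standard-library Python. This is a minimal sketch that only covers two whole numbers joined by `+`, `-`, or `*`; the function name is my own.

```python
import operator
import re

_OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
_PROMPT_PATTERN = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)")


def solve_arithmetic_prompt(prompt):
    """Return the answer to a prompt like "what's 2 + 3?", or None."""
    match = _PROMPT_PATTERN.search(prompt)
    if match is None:
        return None  # Not a simple arithmetic question.
    left, symbol, right = match.groups()
    return _OPERATIONS[symbol](int(left), int(right))


print(solve_arithmetic_prompt("what's 2 + 3?"))  # prints 5
```

Anything the pattern does not recognize comes back as `None`, which lets the script stop rather than submit a wrong answer.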

However, for more advanced barriers, there are libraries that can help try to crack them.
Some examples are 2Captcha, Death by Captcha, and Bypass Captcha.
Basically, if you’re free to browse the site yourself, it generally can be scraped.

Now, as a caveat, it does not mean that every website should be scraped.
Some have legitimate restrictions in place, and there have been numerous court cases deciding the legality of scraping certain sites.
Either way, it's best to check the terms and conditions before starting any project.

But if you do go ahead, rest assured that you can get the job done.



