Solving Captchas with Puppeteer
This tutorial covers text captchas, not Google’s reCAPTCHA (see the end of this post).
Requirements:
- Anticaptcha or any other captcha solving service
- async-captcha NPM module (I’m the author 🤞)
These services are paid; current prices are around $0.60–0.80 per 1,000 captchas. I’ll use Anti-captcha in this tutorial.
The async-captcha module currently supports Anti-captcha only.
How Captcha Solving Works:
You send the captcha image as a base64 string to the solving service and get a task ID in return. With that task ID, you keep checking the captcha’s status until it is solved (yes, you have to poll with your own requests, because the service does not notify you when the captcha is solved).
The diagrams below show every step in detail, but they are only there to help you understand what happens behind the scenes. We will simplify the whole process with the async-captcha module.
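To make the submit-then-poll flow concrete, here is a minimal sketch of what a wrapper has to do on your behalf. It is not the async-captcha source: the `api` object and its `createTask` / `getTaskResult` methods, the shape of the result, and the default delay are all assumptions made for illustration.

```javascript
// Small helper that resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `api` is a hypothetical client with two methods:
//   createTask(imageBase64) -> resolves to a task id
//   getTaskResult(taskId)   -> resolves to { status: 'processing' }
//                              or { status: 'ready', text: '...' }
async function solveByPolling(api, imageBase64, { delayMs = 2000, maxRetries = 10 } = {}) {
  // Step 1: submit the image and remember the task id.
  const taskId = await api.createTask(imageBase64);

  // Step 2: keep asking until the task is ready or we run out of retries.
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    await sleep(delayMs);
    const result = await api.getTaskResult(taskId);
    if (result.status === 'ready') {
      return result.text;
    }
  }

  // Step 3: give up and signal failure with null.
  return null;
}
```

Returning null after the last retry mirrors the behaviour described later for getResult, so the caller only has one failure case to check.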
Puppeteer
Converting an image to base64 inside a browser is pretty tedious work, but luckily we are using Puppeteer!
There is a .screenshot method that takes a screenshot of a page or an element and can return it as a base64-encoded string.
The code for this is very straightforward. We load a website and first get a reference to the captcha image. Then we get the base64 string of that image using the elementHandle.screenshot method. Don’t forget to set encoding: "base64", otherwise it returns binary data by default.
So now we have the base64 string of our captcha. All we need to do is send that string, along with our API key, to a captcha solving service using the async-captcha wrapper.
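The steps just described can be sketched roughly as follows. The URL and the `#captcha-image` selector are placeholders, not taken from a real site, and the `getCaptchaBase64` helper name is only for illustration.

```javascript
// Returns the captcha image as a base64 string.
// `page` is a Puppeteer Page; `selector` is a CSS selector for the captcha image.
async function getCaptchaBase64(page, selector) {
  // page.$ resolves to an ElementHandle, or null when nothing matches.
  const captchaElement = await page.$(selector);
  if (!captchaElement) {
    throw new Error(`No element matches ${selector}`);
  }
  // Without encoding: 'base64', screenshot() returns binary data instead.
  return captchaElement.screenshot({ encoding: 'base64' });
}

// Example wiring. The URL and selector below are placeholders.
async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('https://example.com/login');
    const base64 = await getCaptchaBase64(page, '#captcha-image');
    console.log(`Captured ${base64.length} base64 characters`);
  } finally {
    // Always close the browser, even if something above throws.
    await browser.close();
  }
}
```

Call main() after installing puppeteer to try it against your own page.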
Install the async-captcha module and include it in your code. Then initialize it with your API key.
captchaCode is your captcha solution as text. Remember that if something goes wrong or you reach the retry limit (default is 10), the getResult function will return null.
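Because getResult can return null, it is worth checking the result before typing it into the page. The sketch below assumes `solver` is your already initialized async-captcha instance and that getResult takes the base64 string as its only argument; the helper names and the input selector are hypothetical.

```javascript
// Asks the solver for the captcha text and fails loudly on null.
// Assumption: solver.getResult(imageBase64) resolves to a string or null.
async function solveCaptcha(solver, imageBase64) {
  const captchaCode = await solver.getResult(imageBase64);
  if (captchaCode === null) {
    throw new Error('Captcha was not solved (error or retry limit reached)');
  }
  return captchaCode;
}

// Solves the captcha and types the answer into the form field.
// `inputSelector` is a placeholder for the captcha text input on your page.
async function fillCaptcha(page, solver, imageBase64, inputSelector) {
  const captchaCode = await solveCaptcha(solver, imageBase64);
  await page.type(inputSelector, captchaCode);
  return captchaCode;
}
```

Throwing instead of passing null along keeps a failed solve from being typed into the form as an empty answer.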
That’s all 🤞 Happy scraping!
Follow my GitHub
Follow me on Medium for more Puppeteer tutorials.
reCAPTCHA
The process is nearly the same, although solving reCAPTCHA requires a little more work. Instead of sending the base64 string of an image, you send the site’s reCAPTCHA key, and after getting the result you attach the response payload to the proper DOM element. I’ll make another tutorial for reCAPTCHA later.
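As a rough preview of the last step only, attaching the response could look like the sketch below. It assumes the page uses the standard reCAPTCHA v2 hidden textarea with the id g-recaptcha-response and that `token` is the response string you got back from the solving service; the helper name is hypothetical, and some sites need extra steps that are not shown here.

```javascript
// Writes the solved reCAPTCHA token into the page's response field.
// Assumption: the page contains the usual #g-recaptcha-response textarea.
async function attachRecaptchaResponse(page, token) {
  // page.$eval runs the callback in the browser with the matched element
  // and throws if no element matches the selector.
  await page.$eval(
    '#g-recaptcha-response',
    (element, value) => {
      element.value = value;
    },
    token
  );
}
```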