Developing automatic website scrapers can be challenging because of how modern websites are built. Common obstacles include frequent layout changes, CAPTCHAs, bot-detection patterns, pagination, staying anonymous, logins, and websites that load new content as the user keeps scrolling.
I was tasked with developing a Google Chrome extension to speed up manual scraping. The biggest benefit of a plugin is that the user loads a website as if they were browsing it normally, which reduces the challenges that automated scraping faces.
The plugin lets the user click anywhere on the page and automatically extracts the text of the clicked element. The extracted text is then passed to a system where it is translated, cleaned, and mapped to specific fields. A summary showing the scraping progress is also visible.
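To make the click, clean, and map flow more concrete, here is a minimal TypeScript sketch of how it could work. It is not the actual plugin code: the field names, `cleanText`, `ScrapeSession`, and `activeField` are all hypothetical, and the translation step is left out because the post does not describe which service handles it.

```typescript
// Hypothetical field names; the real system's fields are not described in the post.
type FieldName = "title" | "price" | "description";

// Minimal shape of a clicked DOM node, so the logic can also run outside a browser.
interface TextSource {
  textContent: string | null;
}

// Collapse whitespace runs (newlines, tabs, non-breaking spaces) and trim the ends.
function cleanText(raw: string): string {
  return raw.replace(/\s+/g, " ").trim();
}

class ScrapeSession {
  private readonly fields: readonly FieldName[];
  private readonly values = new Map<FieldName, string>();

  constructor(fields: readonly FieldName[]) {
    this.fields = fields;
  }

  // Store the cleaned text of the clicked node under the given field.
  capture(field: FieldName, source: TextSource): string {
    const text = cleanText(source.textContent ?? "");
    this.values.set(field, text);
    return text;
  }

  // Progress summary: how many fields are filled and which ones remain.
  summary(): { filled: number; total: number; remaining: FieldName[] } {
    const remaining = this.fields.filter((f) => !this.values.get(f));
    return {
      filled: this.fields.length - remaining.length,
      total: this.fields.length,
      remaining,
    };
  }
}

// In a content script, the wiring could look like this (browser only,
// with `activeField` being whichever field the user is currently filling):
//
// document.addEventListener("click", (event) => {
//   if (event.target instanceof HTMLElement) {
//     session.capture(activeField, event.target);
//   }
// });
```

Keeping the cleaning and mapping logic separate from the DOM event handler means it can be tested without a browser, and the click listener stays a thin layer on top.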