GPT Auto Scraper is a project that uses AI to carry out web scraping tasks. It extracts information from HTML sources according to a specification set by the user: it generates the necessary scraping code in Python and then runs that code to fetch the data of interest.
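As a rough illustration of the generate-then-run idea, the sketch below pulls the first fenced code block out of a model reply and executes it. The names `extract_code` and `run_generated`, the reply format, and the convention that the generated code stores its output in a variable called `result` are assumptions made for this example, not taken from the project's source.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built at runtime so this example can sit inside a code fence itself

def extract_code(reply: str) -> Optional[str]:
    """Return the body of the first fenced code block in the reply, or None."""
    pattern = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(reply)
    return match.group(1).strip() if match else None

def run_generated(code: str, html: str) -> object:
    """Execute generated code with `html` in scope and return its `result` variable."""
    namespace = {"html": html}
    exec(code, namespace)  # no sandboxing: only run code you have reviewed
    return namespace.get("result")

# A made-up reply standing in for what the model might send back.
reply = (
    "Here is the scraper:\n"
    + FENCE + "python\n"
    + "import re\n"
    + "result = re.findall(r'<h1>(.*?)</h1>', html)\n"
    + FENCE + "\n"
)
code = extract_code(reply)
titles = run_generated(code, "<h1>Hello</h1><p>text</p><h1>World</h1>")
print(titles)
```

Running model-generated code with `exec` is convenient but unsafe for untrusted output, so it is worth reading the generated code before executing it.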
Yes. Because scraping HTML pages uses thousands of tokens, it is not feasible to use the API version of GPT-3.5. Instead, we use the free version of ChatGPT via the browser automation tool Selenium.
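To get a feel for how quickly HTML adds up, here is a minimal sketch that estimates the token count of a page. It assumes the common rule of thumb of about 4 characters per token, which is only an approximation (markup often tokenizes less efficiently than prose). The function names, the 4,096-token limit, and the 500-token reserve for the reply are illustrative assumptions.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; a real tokenizer will give different numbers."""
    return int(len(text) / chars_per_token)

def fits_in_context(text: str, limit: int = 4096, reserve: int = 500) -> bool:
    """Check whether the text plus a reserve for the reply fits in `limit` tokens."""
    return estimate_tokens(text) + reserve <= limit

# A synthetic page: 500 repeated product rows.
page = "<div class='item'><span>Price</span></div>" * 500
tokens = estimate_tokens(page)
print(f"{len(page)} characters, roughly {tokens} tokens")
print("fits in context:", fits_in_context(page))
```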
Download Chrome. I strongly recommend that you update to the latest version of Chrome.
No Chrome? We need a Chromium-based browser for undetected_chromedriver to remain undetected. You can download Microsoft Edge or Chromium and change line 28 in server.py to the path of the browser you downloaded.
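If you are not sure where the browser is installed, a small helper like the one below can look it up on your PATH. This is only a sketch: `find_browser` is a hypothetical helper, and the executable names in `CANDIDATES` are typical ones that may differ on your system (on Windows and macOS the browser is often not on PATH at all, so you may need to enter the full path by hand).

```python
import shutil
from typing import Iterable, Optional

# Typical executable names for Chromium-based browsers (assumed, not exhaustive).
CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "microsoft-edge", "msedge")

def find_browser(names: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Return the full path of the first browser found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None

browser_path = find_browser()
print(browser_path or "No Chromium-based browser found on PATH")
```

The printed path is the value you would put in server.py.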
Install the required packages, Selenium and undetected-chromedriver:
pip install -r requirements.txt
Run the ChatGPT browser server as a mock API. Run python server.py in the terminal and log in to your OpenAI account. Then press Enter in the terminal to start the server.
Now open scraper.py and replace the data format in the prompt, as well as the URL, with the format you need and the website you want to scrape. Then run python scraper.py in the terminal.
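The two things you edit, the data format and the URL, could be combined into a prompt along the following lines. This is a minimal sketch, not the actual contents of scraper.py: `fetch_html`, `build_prompt`, the prompt wording, and the `max_chars` truncation are all assumptions for illustration.

```python
import json
from urllib.request import urlopen

def fetch_html(url: str) -> str:
    """Download a page and decode it as text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def build_prompt(html: str, data_format: dict, max_chars: int = 8000) -> str:
    """Combine the wanted data format and the page HTML into one prompt."""
    schema = json.dumps(data_format, indent=2)
    snippet = html[:max_chars]  # truncate very long pages to keep the prompt small
    return (
        "Write Python code that extracts data from the HTML below.\n"
        f"Return the data in this JSON format:\n{schema}\n\n"
        f"HTML:\n{snippet}"
    )

# The data format is what you would swap out for your own use case.
data_format = {"title": "string", "price": "number"}
prompt = build_prompt("<html><body><h1>Widget</h1></body></html>", data_format)
print(prompt)
```

For a real page you would pass `fetch_html(url)` as the first argument instead of the inline HTML string.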
- Add websites.txt, a list of websites to scrape
- Add settings.json, settings for the scraper
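Neither file exists yet, so their formats are undecided. One possible shape, sketched below, is a websites.txt with one URL per line (blank lines and lines starting with # ignored) and a settings.json whose keys override a set of defaults. The function names and the `data_format` and `delay_seconds` keys are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical defaults; the real settings are not defined yet.
DEFAULT_SETTINGS = {"data_format": {}, "delay_seconds": 1.0}

def load_websites(path) -> list:
    """Read one URL per line, skipping blank lines and # comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]

def load_settings(path) -> dict:
    """Return the defaults, overridden by whatever settings.json contains."""
    settings = dict(DEFAULT_SETTINGS)
    p = Path(path)
    if p.exists():
        settings.update(json.loads(p.read_text(encoding="utf-8")))
    return settings

# Demo with temporary files so the example does not touch your working directory.
with tempfile.TemporaryDirectory() as tmp:
    sites_file = Path(tmp) / "websites.txt"
    sites_file.write_text("# shops\nhttps://example.com/a\n\nhttps://example.com/b\n", encoding="utf-8")
    settings_file = Path(tmp) / "settings.json"
    settings_file.write_text(json.dumps({"delay_seconds": 2.5}), encoding="utf-8")

    websites = load_websites(sites_file)
    settings = load_settings(settings_file)
    missing = load_settings(Path(tmp) / "does_not_exist.json")

print(websites)
print(settings)
```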