Build scraper to continuously update the unstructured data folder with the latest Lucknow data #38
Comments
I can do it, but I will need a list of websites to fetch the data from. For example, if one of them is a blogging site, then each time we run our scraper, the new blog posts will be added to the unstructured data. |
How about we build this scraper in parts, like someone takes the tourism part, someone takes the hospitals part, and later on, we can combine them to make a fully automated raw data scraper? |
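One way the "build it in parts, combine later" idea could look is a small registry where each contributor adds a function for their topic. This is only a sketch: the names `SCRAPERS`, `register`, and `run_all` are hypothetical, and the topic functions return placeholder strings instead of fetching real pages.

```python
from typing import Callable, Dict, List

# Hypothetical registry: topic name -> function returning a list of text records.
SCRAPERS: Dict[str, Callable[[], List[str]]] = {}


def register(topic: str):
    """Decorator that adds a topic scraper to the shared registry."""
    def wrap(func: Callable[[], List[str]]) -> Callable[[], List[str]]:
        SCRAPERS[topic] = func
        return func
    return wrap


@register("tourism")
def scrape_tourism() -> List[str]:
    # Placeholder: a real version would fetch and parse tourism pages.
    return ["placeholder tourism record"]


@register("hospitals")
def scrape_hospitals() -> List[str]:
    # Placeholder: a real version would fetch and parse hospital listings.
    return ["placeholder hospital record"]


def run_all() -> Dict[str, List[str]]:
    """Run every registered scraper; a failing topic does not stop the others."""
    results: Dict[str, List[str]] = {}
    for topic, scraper in SCRAPERS.items():
        try:
            results[topic] = scraper()
        except Exception as exc:
            print(f"{topic} scraper failed: {exc}")
            results[topic] = []
    return results
```

With this layout each person only touches their own function, and `run_all()` is the single entry point for the combined scraper.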
That would be nice, but we will still need a list of sites (ones that regularly update data on a specific topic) to target for the latest data. Alternatively, we can add a subfolder called scraped inside the unstructured data folder and have our program save any Lucknow-related data it scrapes there, in separate files named by date or some other scheme. |
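A minimal sketch of the date-named file idea, assuming a folder path of `Unstructured_data/scraped` and a `<topic>_<date>.txt` naming scheme. Both are guesses for illustration, not the repository's actual layout, and `save_scraped` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def save_scraped(
    topic: str,
    text: str,
    base_dir: str = "Unstructured_data/scraped",  # assumed location
    day: Optional[date] = None,
) -> Path:
    """Write scraped text to <base_dir>/<topic>_<YYYY-MM-DD>.txt and return the path."""
    day = day or date.today()
    out_dir = Path(base_dir)
    # Create the folder on first run; do nothing if it already exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{topic}_{day.isoformat()}.txt"
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Running the scraper twice on the same day overwrites that day's file, while runs on different days keep separate files, so older snapshots stay available.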
@pratyakshSoni1 @AayushSharma-1 That's a great idea: each person takes one topic and we build the scraper step by step. |
Yes, sure! |
Right now the unstructured data folder contains limited data. We need scrapers that collect data from different Lucknow websites, so that when we want to add more data or update the Lucknow database in the future, we can simply run those scraper agents.