- Flask
- BeautifulSoup
- MongoDB (currently not in use because the hosted clusters are no longer free; I plan to move to another NoSQL database)
- NLTK, Sumy (for news summarization)
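Sumy provides several extractive summarizers (for example LSA, LexRank and Luhn). The sketch below is not Sumy's API. It is a dependency-free, Luhn-style word-frequency summarizer that illustrates the basic idea of extractive summarization: score each sentence by how often its words appear across the article, then keep the highest-scoring sentences in their original order. The stop-word list and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (NLTK ships a much fuller one).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "it", "its", "that", "this", "with", "as", "by",
}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Return the `max_sentences` highest-scoring sentences in original order.

    A sentence's score is the sum of the article-wide frequencies of its words.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    scores = [sum(freq[w] for w in tokenize(s)) for s in sentences]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:max_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Summing raw frequencies favours long sentences; dividing each score by the sentence's token count is a common adjustment.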
I built my own web scraper because third-party web-scraping tools charge for their news APIs and still provide only limited content.
- Collects news from Forbes in the following categories: Leadership, Business, World Billionaires, Money, Lifestyle
- For AI-specific news, it collects articles from wired.com
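The scraper parses category pages with BeautifulSoup. To keep this sketch dependency-free it uses Python's standard-library `html.parser` instead; with BeautifulSoup the equivalent lookup would be `soup.find_all("a", class_="headline")`. It assumes, hypothetically, that each headline is an `<a>` tag carrying a `headline` CSS class. The Forbes URL pattern and category slugs are likewise assumptions, not the real site layout.

```python
from html.parser import HTMLParser

# Hypothetical URL pattern and slugs; the real Forbes section URLs may differ.
BASE_URL = "https://www.forbes.com/{category}/"
CATEGORIES = ["leadership", "business", "world-billionaires", "money", "lifestyle"]


class HeadlineParser(HTMLParser):
    """Collect (title, href) pairs from <a> tags that carry a given CSS class.

    The class name is an assumption about the page markup.
    """

    def __init__(self, link_class: str = "headline"):
        super().__init__()
        self.link_class = link_class
        self.headlines: list[tuple[str, str]] = []
        self._href = None  # href of the matching <a> currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.link_class in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside a matching link (nested tags included).
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # collapse whitespace
            self.headlines.append((title, self._href))
            self._href = None


def extract_headlines(html: str, link_class: str = "headline") -> list[tuple[str, str]]:
    parser = HeadlineParser(link_class)
    parser.feed(html)
    parser.close()
    return parser.headlines
```

In the real scraper, the HTML would come from fetching `BASE_URL.format(category=...)` for each category.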
- Collect news from multiple sources (Wired, Moneycontrol, Forbes, Yahoo Finance, Bloomberg)
- Integrate authentication to store user activity
- Bookmarks
- Personalized news recommendations based on past reads
- Sentiment analysis of article comments
- Sharing articles on social platforms
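Personalized recommendations are still planned. One common starting point, assumed here rather than taken from the project, is content-based filtering: merge the user's past reads into a word-count profile and rank new articles by cosine similarity to that profile. This is a minimal bag-of-words sketch; all function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts (no stop-word removal, for brevity)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(past_reads: list[str], candidates: list[str], k: int = 3) -> list[str]:
    """Return the k candidate articles most similar to the user's reading history."""
    profile = Counter()
    for text in past_reads:
        profile.update(bag_of_words(text))
    ranked = sorted(candidates, key=lambda c: cosine(profile, bag_of_words(c)), reverse=True)
    return ranked[:k]
```

Weighting words with TF-IDF instead of raw counts would stop very common words from dominating the profile.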
This scraper API is deployed on pythonanywhere.com (currently down due to dependency issues; it will be back soon).