News-Scraper-Web

This is a scraper API that collects news across several categories from the Forbes website.

Flask
BeautifulSoup
MongoDB (currently not in use since the clusters became paid; the plan is to move to another NoSQL database)
NLTK, Sumy (for news summarization)
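
As a rough illustration of how Flask could expose scraped results as a JSON API, here is a minimal sketch. The route `/news/<category>` and the helper `scrape_category` are hypothetical names chosen for this example, not necessarily what `app.py` uses, and the helper returns placeholder data instead of scraping a live site.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Forbes categories listed in this README.
CATEGORIES = {"leadership", "businesses", "world-billionaires", "money", "lifestyle"}


def scrape_category(category):
    """Hypothetical stand-in for the real scraping logic (returns placeholder data)."""
    return [
        {
            "category": category,
            "title": "Example headline",
            "url": "https://example.com/story",
        }
    ]


@app.route("/news/<category>")  # assumed route shape, for illustration only
def news(category):
    # Reject categories the scraper does not know about.
    if category not in CATEGORIES:
        return jsonify({"error": f"unknown category: {category}"}), 404
    return jsonify(scrape_category(category))


if __name__ == "__main__":
    app.run(debug=True)
```

With the development server running, a request such as `GET /news/money` would return a JSON list of articles, and an unknown category would return a 404 with an error message.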

I built my own web scraper because third-party web-scraping tools charge for their news APIs while providing only limited content.

Collects news in the following categories from Forbes:
- Leadership, Businesses, world-billionaires, money, lifestyle
For AI-specific news, it collects articles from wired.com.
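
The extraction step for a category page might look roughly like the sketch below, which uses BeautifulSoup on an inline HTML sample instead of a live request. The markup, the `headline` class, and the function name `extract_headlines` are assumptions made for illustration; the real Forbes and Wired pages have their own structure, and `newscraper.py` may work differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a category listing page.
SAMPLE_HTML = """
<div class="stream">
  <article><a class="headline" href="https://example.com/a1"> First story </a></article>
  <article><a class="headline" href="https://example.com/a2">Second story</a></article>
  <article><span>Teaser without a link</span></article>
</div>
"""


def extract_headlines(html, category):
    """Return one dict per article that has a headline link."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for article in soup.find_all("article"):
        link = article.find("a", class_="headline")
        if link is None or not link.get("href"):
            continue  # skip entries without a usable link
        items.append(
            {
                "category": category,
                "title": link.get_text(strip=True),
                "url": link["href"],
            }
        )
    return items


if __name__ == "__main__":
    for item in extract_headlines(SAMPLE_HTML, "leadership"):
        print(item)
```

In practice the HTML would come from an HTTP response for the category page, and the selectors would need to be adjusted whenever the site changes its layout.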

Collect news from different sources (Wired, Moneycontrol, Forbes, Yahoo Finance, Bloomberg)
Integrate authentication to store user activity:
- bookmarks
- personalized news recommendation based on past reads
- sentiment analysis based on the comments
- share the news article on any social platform

This scraper API is deployed on pythonanywhere.com (currently down due to dependency issues; we'll be back soon).
