Web scraping is an automated method for obtaining large amounts of data from websites. Most of this data is unstructured HTML, which is then converted into structured data stored in a spreadsheet or a database.
- Import the requests library, which allows us to send HTTP requests using Python; each request returns a Response object containing all the response data (content, encoding, status code, etc.).
- requests.get('https://github.com') sends a GET request to the GitHub URL; the page content can then be printed as HTML via the response.text attribute (see the request sketch after this list).
- We use bs4, the BeautifulSoup Python library, for pulling data out of HTML and XML files.
- Parse the content with BeautifulSoup and extract the topic titles by class name using the doc.find_all() method; we did the same for their respective descriptions and topic page links (see the parsing sketch below).
- Created lists of the extracted topic titles, their descriptions, and links.
- Converted the lists into a DataFrame using pandas.
- Created a CSV file from the extracted information (see the DataFrame/CSV sketch after this list).
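
A minimal sketch of the request step, assuming https://github.com/topics as the target page; the URL, variable names, and printed fields are illustrative:

```python
import requests

# Target page; https://github.com/topics (the GitHub topics listing) is assumed here.
url = 'https://github.com/topics'
response = requests.get(url)

# The Response object carries the data mentioned above.
print(response.status_code)    # e.g. 200 on success
print(response.encoding)       # e.g. 'utf-8'
page_contents = response.text  # the page content as HTML text
```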
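
A sketch of the parsing step, assuming the html.parser backend and CSS class names observed on the GitHub topics page; these class names change over time and should be verified in the browser's developer tools before use:

```python
from bs4 import BeautifulSoup

# Parse the HTML text obtained in the request sketch above.
doc = BeautifulSoup(page_contents, 'html.parser')

# Assumed class names for topic titles, descriptions, and card links.
title_tags = doc.find_all('p', class_='f3 lh-condensed mb-0 mt-1 Link--primary mx-auto')
desc_tags = doc.find_all('p', class_='f5 color-fg-muted mb-0 mt-1')
link_tags = doc.find_all('a', class_='no-underline flex-1 d-flex flex-column')
```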
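
A sketch of the final steps: collecting the extracted values into lists, converting them into a pandas DataFrame, and writing a CSV file. The tag variables come from the parsing sketch above, and the output file name topics.csv is illustrative:

```python
import pandas as pd

base_url = 'https://github.com'

# Build plain Python lists from the extracted tags.
topic_titles = [tag.text.strip() for tag in title_tags]
topic_descriptions = [tag.text.strip() for tag in desc_tags]
topic_urls = [base_url + tag['href'] for tag in link_tags]

# Convert the lists into a DataFrame and save it as CSV.
topics_df = pd.DataFrame({
    'title': topic_titles,
    'description': topic_descriptions,
    'url': topic_urls,
})
topics_df.to_csv('topics.csv', index=False)
```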