Skip to content

akanshajais/web_scraping-github.topics.and.description-

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Web_scraping - Extracting Information of Github Topics

web scraping is a automatic method to obtain large amounts of data from websites. Most of this data is unstructured data in an HTML format which is then converted into structured data in a spreadsheet or a database.

Steps Followed

  1. import requests which allows to send HTTP requests using python and then the HTTP request returns a Response Object with all the response data (content, encoding, status, etc).
  2. requests.get('https://github.com') sends a GET method request to the github url and print the content of page in the HTML form using requests.text method.
  3. We use bs4, BeautifulSoup Python library for pulling data out of HTML and XML files.
  4. Parse the content into BeautifulSoup and iterate the topics titles by class name using doc.findall() method similary, we did for their respective descriptions and topics page link.
  5. Created a list of extracted topics titles, their description and links.
  6. Converted the lists in a dataframe using pandas.
  7. Created a CSV files from extracted information.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published