salmazz/web-scrapping-python

Web Scraping with Python

Web scraping is the process of extracting data from websites and storing it in a desired file format such as CSV or Excel. Several Python modules are available for web scraping.

The following modules are mainly used for web scraping:

  • Requests
  • BeautifulSoup
  • csv (part of the Python standard library)

Details to be observed :

Step 0 : Setting up the Environment

  1. To scrape websites with Python, we can rely on existing libraries to get the job done.
  2. We will install the following third-party libraries using pip:
    pip install requests
    
    pip install beautifulsoup4
    
  3. The csv module ships with Python's standard library, so it needs no separate installation.
    

Step 1 : Fetching the HTML content

  1. To work with the HTML, we first have to get it as a string.
  2. We will use Python's requests module to fetch it.
  3. The next step is to parse the HTML content into a tree-like structure so that it can be traversed.
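The fetching step above can be sketched as follows. This is a minimal example, not the repository's own code: the function name `fetch_html`, its `session` parameter, and the 10-second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the body of `url` as a string, raising on HTTP errors.

    `fetch_html` and its parameters are hypothetical names for this sketch.
    """
    # Use a caller-supplied requests.Session if given, else the module-level API.
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into exceptions instead of parsing an error page.
    response.raise_for_status()
    # .text is the response body decoded to str.
    return response.text
```

A call such as `html = fetch_html("https://example.com")` would then return the page source as one string, ready to be parsed.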

Step 2 : Parse the HTML

  1. Once the HTML has been fetched as a string using requests, we need to parse it.
  2. For parsing, we will use Python's BeautifulSoup module, which creates a tree-like structure for our DOM.
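As a small illustration of the parsing step, the sketch below builds a tree from an inline HTML string. The sample markup is made up for this example, and `"html.parser"` (the parser bundled with Python) is one possible choice of parser.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; in practice this would be the string fetched in Step 1.
html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1 id="main">Products</h1>
    <p class="intro">Two items are listed below.</p>
  </body>
</html>
"""

# Build the parse tree using Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Tags are reachable as attributes or through find().
page_title = soup.title.string
heading_text = soup.find("h1").get_text()

print(page_title)
print(heading_text)
```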

Step 3 : HTML tree traversal

  1. Once the HTML is fetched and parsed, the next step is to traverse the tree using BeautifulSoup's functions and pull out the data we need.
  2. This tutorial will teach you how to get started and traverse the tree.
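A few common ways to traverse the tree are sketched below. The markup, the class names (`item`, `price`), and the field names in `rows` are assumptions invented for this example; a real page will need its own selectors.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a real page.
html = """
<ul id="products">
  <li class="item"><a href="/p/1">Keyboard</a><span class="price">25</span></li>
  <li class="item"><a href="/p/2">Mouse</a><span class="price">10</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching tag; find() returns the first match.
rows = []
for item in soup.find_all("li", class_="item"):
    link = item.find("a")
    price = item.find("span", class_="price")
    rows.append({
        "name": link.get_text(strip=True),
        "url": link["href"],  # attributes are read like dictionary keys
        "price": price.get_text(strip=True),
    })

# select() accepts CSS selectors as an alternative to find_all().
names = [a.get_text() for a in soup.select("#products li.item > a")]

# .parent moves one level up the tree.
parent_tag = soup.find("a").parent.name
```

Each dictionary in `rows` holds one scraped record, which makes it straightforward to write the results out as CSV rows afterwards.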

To open the result in an Excel sheet with Arabic text

  1. Open Excel with a blank workbook
  2. On the Data tab, click the From Text button (if it is not active, make sure an empty cell is selected)
  3. Browse to and select the CSV file
  4. In the Text Import Wizard, change File origin to "Unicode (UTF-8)"
  5. Click Next and, under Delimiters, select the delimiter used in your file, e.g. comma
  6. Finish and select where to import the data

The Arabic characters should show correctly.
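On the Python side, the CSV file can be written so that it is explicitly UTF-8. The sketch below is one way to do it with the standard csv module; the function name `write_rows` and the sample data are assumptions for illustration. It uses the `utf-8-sig` encoding, which prepends a byte order mark that Excel generally recognizes as UTF-8, so the import steps above may not even be needed in some Excel versions.

```python
import csv

# Hypothetical sample data containing Arabic text.
header = ["title", "city"]
rows = [
    ["مهندس برمجيات", "القاهرة"],
    ["محلل بيانات", "الرياض"],
]


def write_rows(path, header, rows):
    """Write a header and data rows to `path` as UTF-8 CSV with a BOM."""
    # newline="" lets the csv module control line endings itself.
    # "utf-8-sig" writes a byte order mark before the first character.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

Calling `write_rows("results.csv", header, rows)` would produce a comma-delimited file that matches the import settings described above.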
