Web Scraping with Python

Web scraping is the process of extracting data from websites and storing it in a desired file format such as CSV or Excel. Several Python modules are available for web scraping.

The main modules used for web scraping are:

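Requests (downloads web pages over HTTP)
BeautifulSoup (parses HTML into a navigable tree)
csv (Python's built-in module for reading and writing CSV files)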

Steps to follow:

Step 0: Setting Up the Environment

To scrape websites with Python, we can rely on existing libraries to get the job done.

Install the third-party libraries using pip:

pip install requests

pip install beautifulsoup4

The csv module is part of Python's standard library, so it does not need to be installed.
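A quick way to confirm the setup is to import each module and print the installed versions. This is a minimal sketch; the version numbers printed will depend on your environment.

```python
import csv  # standard library: ships with Python, nothing to install

import bs4  # provided by the beautifulsoup4 package
import requests

# Print installed versions to confirm both third-party packages are importable.
print("requests:", requests.__version__)
print("beautifulsoup4:", bs4.__version__)
print("csv module available:", hasattr(csv, "DictWriter"))
```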

Step 1: Fetching the HTML Content

To work with a page's HTML, we first need to download it as a string.
We will use the Python requests module to do this.
The next step is to parse the HTML into a tree-like structure so that it can be traversed.
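The fetching step can be sketched as a small helper around `requests.get`. This is a minimal sketch, not the repository's own script: the function name `fetch_html`, the `User-Agent` value, and the 10-second timeout are illustrative assumptions. It also handles a common pitfall for Arabic pages: when the server sends no charset, requests assumes ISO-8859-1 for `text/*` responses, so the sketch switches to the encoding requests detects from the content.

```python
from typing import Optional

import requests

# Hypothetical header: some sites reject requests without a browser-like User-Agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)"}


def fetch_html(url: str, session: Optional[requests.Session] = None, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as a string."""
    # Reuse a Session if one is given (keeps connections open across many pages).
    client = session if session is not None else requests
    response = client.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of silently parsing an error page.
    response.raise_for_status()
    # With no charset in Content-Type, requests falls back to ISO-8859-1 for text/*,
    # which garbles Arabic text; use the encoding detected from the body instead.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text
```

Usage: `html = fetch_html("https://example.com")` returns the page source, ready to be parsed.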

Step 2: Parsing the HTML

Once requests has fetched the HTML as a string, we need to parse it.
For parsing, we will use BeautifulSoup (from the beautifulsoup4 package), which builds a tree-like structure from the page's DOM.
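Parsing is a single constructor call. This minimal sketch uses a small inline HTML string in place of a real page and the built-in `"html.parser"` backend, which needs no extra installation; the helper name `parse_html` is illustrative.

```python
from bs4 import BeautifulSoup


def parse_html(html: str) -> BeautifulSoup:
    """Turn an HTML string into a BeautifulSoup tree."""
    # "html.parser" is Python's built-in parser; lxml is a faster optional alternative.
    return BeautifulSoup(html, "html.parser")


sample = "<html><head><title>Matches</title></head><body><p>Hello</p></body></html>"
soup = parse_html(sample)
print(soup.title.string)  # Matches
```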

Step 3: Traversing the HTML Tree

With the HTML fetched and parsed, the next step is to traverse the tree with BeautifulSoup's methods (such as find, find_all, and select) and extract the data we need.
This tutorial shows how to get started and traverse the tree.
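A traversal typically loops over repeated elements and pulls text out of their children. This is a minimal sketch under an assumption: the markup, the `match`, `team-a`, `score`, and `team-b` class names, and the helper names are hypothetical, so inspect the real page in your browser's developer tools and substitute its actual tags and classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a real page.
SAMPLE_HTML = """
<div class="match">
  <span class="team-a">Team A</span>
  <span class="score">2 - 1</span>
  <span class="team-b">Team B</span>
</div>
<div class="match">
  <span class="team-a">Team C</span>
  <span class="score">0 - 0</span>
  <span class="team-b">Team D</span>
</div>
"""


def text_of(parent, class_name):
    """Return the stripped text of the first <span> with class_name, or "" if absent."""
    tag = parent.find("span", class_=class_name)
    return tag.get_text(strip=True) if tag is not None else ""


def extract_matches(soup):
    """Collect one dict per match card."""
    matches = []
    # find_all walks the whole tree; soup.select("div.match") is the CSS-selector equivalent.
    for card in soup.find_all("div", class_="match"):
        matches.append({
            "team_a": text_of(card, "team-a"),
            "score": text_of(card, "score"),
            "team_b": text_of(card, "team-b"),
        })
    return matches


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for match in extract_matches(soup):
    print(match)
```

Returning a list of dicts keeps the extraction separate from storage, so the same rows can later be written to CSV.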

Opening the Result in Excel with Arabic Text

Open a blank workbook in Excel.
On the Data tab, click From Text (if the button is disabled, make sure an empty cell is selected).
Browse to and select the CSV file.
In the Text Import Wizard, set File origin to "Unicode (UTF-8)".
Click Next and, under Delimiters, select the delimiter used in your file (e.g., Comma).
Click Finish and choose where to import the data.

The Arabic characters should show correctly.
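Alternatively, the import wizard can often be skipped by writing the CSV with the `utf-8-sig` encoding, which puts a UTF-8 byte order mark (BOM) at the start of the file; Excel generally uses it to recognise UTF-8 when the file is opened directly. This is a minimal sketch using the standard `csv` module; the helper name `save_to_csv`, the file name `matches.csv`, the column names, and the Arabic sample row are made-up placeholders.

```python
import csv


def save_to_csv(rows, path, fieldnames):
    """Write a list of dicts to a CSV file that Excel can open with Arabic text intact."""
    # newline="" lets the csv module control line endings (avoids blank rows on Windows).
    # "utf-8-sig" writes a UTF-8 byte order mark (BOM) before the first row.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Made-up sample row: "فريق أ" / "فريق ب" mean "Team A" / "Team B".
sample_rows = [{"team_a": "فريق أ", "score": "2 - 1", "team_b": "فريق ب"}]
save_to_csv(sample_rows, "matches.csv", ["team_a", "score", "team_b"])
```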

salmazz/web-scrapping-python

Folders and files

Latest commit

History

Repository files navigation

Web Scrapping with python

Following are the modules used mainly for web scraping :

Details to be observed :

Step 0 : Setting up the Environment

Step 1 : Fetching the HTML content

Step 2 : Parse the HTML

Step 3 : HTML tree traversal

To Open the result in excel sheet with arabic words

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages