This repository contains a Python-based web crawler designed to extract URLs based on specified parameters. The primary purpose is to provide a convenient tool for gathering URLs related to medical content, with the ability to filter by categories, geography, and date range.
- Search for relevant URLs on Google.
- Filter URLs based on primary category, secondary category, geography, and date range.
- Output the results to a CSV file.
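As a sketch of how the filters above could be combined into a single search query (the function name and query format are illustrative assumptions, not the crawler's actual implementation):

```python
def build_query(params: dict) -> str:
    """Combine the filter parameters into one Google search string.

    The keys mirror the JSON example in this README; the flat
    space-joined query format is an assumption for illustration.
    """
    parts = [
        params.get("primary_category", ""),
        params.get("secondary_category", ""),
        params.get("geography", ""),
        params.get("date_range", ""),
    ]
    # Skip any filters that were left empty.
    return " ".join(p for p in parts if p)

# build_query({"primary_category": "Medical Journal", "geography": "India"})
# returns "Medical Journal India"
```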
Make sure you have the following installed:
- Python (version X.X.X)
- Any additional dependencies listed in `requirements.txt`
- Clone the repository:

  ```shell
  git clone https://github.com/your-username/web-crawler.git
  ```

- Navigate to the project directory:

  ```shell
  cd web-crawler
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```
To run the web crawler, execute the following command:

```shell
python crawler.py --parameters parameters.json
```
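A minimal sketch of how `crawler.py` might handle the `--parameters` flag shown above, using the standard `argparse` and `json` modules (the function name and help text are assumptions; the actual script may differ):

```python
import argparse
import json


def load_parameters() -> dict:
    """Parse the --parameters flag and load the JSON file it points to."""
    parser = argparse.ArgumentParser(description="URL crawler")
    parser.add_argument(
        "--parameters",
        required=True,
        help="Path to a JSON file of search filters",
    )
    args = parser.parse_args()
    with open(args.parameters, encoding="utf-8") as f:
        return json.load(f)
```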
Provide the input parameters as a JSON object. Here's an example:

```json
{
  "primary_category": "Medical Journal",
  "secondary_category": "Orthopedic",
  "geography": "India",
  "date_range": "2022"
}
```
Adjust the parameters according to your specific requirements.
The crawler will generate a CSV file containing the extracted URLs; additional relevant fields may be included alongside each URL.
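The CSV output described above could be written with Python's standard `csv` module along these lines (the row structure and file name are assumptions; check the generated file for the actual layout):

```python
import csv


def write_results(rows: list[dict], path: str = "results.csv") -> None:
    """Write extracted URLs (plus any extra fields) to a CSV file.

    Each row is a dict such as {"url": ..., "title": ...}; the header
    is derived from the keys of the first row.
    """
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```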