docs: Update installation and usage instructions for uv
neiromaster committed Jan 15, 2025
1 parent 543cc7f commit bb95b78
Showing 1 changed file with 20 additions and 20 deletions.
README.md: 40 changes (20 additions & 20 deletions)
@@ -4,55 +4,55 @@ This script allows you to extract unique domains visited when browsing a given U

## Features

-- Opens a URL in a browser using Playwright.
+- Opens a URL in a browser.
- Allows user interaction with the browser.
- Extracts all unique domains visited during the browsing session.
- Saves the list of unique domains to a text file.
- Appends new domains to an existing file, avoiding duplicates.
- Generates a filename based on the URL.
- Uses `argparse` to accept the URL as a command-line argument.
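
The feature list above is essentially the whole program: open the page, let the user browse, and record the host of every request the page makes. A minimal sketch of that loop, assuming Playwright's sync API and its `request` event (the function name and structure are illustrative, not the repository's actual code):

```python
from urllib.parse import urlparse

from playwright.sync_api import sync_playwright


def collect_domains(url: str) -> set[str]:
    """Open `url` in a visible Chromium window and return every host contacted."""
    domains: set[str] = set()
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        # Every request the page makes (documents, scripts, images, XHR) passes through here.
        page.on("request", lambda request: domains.add(urlparse(request.url).hostname or ""))
        page.goto(url)
        input("Press Enter to close the browser and save the domains... ")
        browser.close()
    domains.discard("")
    return domains
```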

## Usage

-To use the script, you need to have Python and Playwright installed.
+To use the script, you need to have `uv` installed.

-1. **Install Playwright:**
+1. **Install uv:**

+Follow the instructions for your system at: https://docs.astral.sh/uv/getting-started/installation/

+2. **Install Playwright and Chromium:**

```bash
-pip install playwright
-playwright install
+uvx playwright install chromium
```

-2. **Run the script:**

+3. **Run the script:**
```bash
-python -m src.domain_collector <URL>
+uvx domain-collector <URL>
```

Replace `<URL>` with the URL you want to open in the browser. For example:

```bash
-python -m src.domain_collector https://www.example.com
+uvx domain-collector https://ya.ru
```

-You can also provide a URL without the scheme (e.g., `www.example.com`), and the script will automatically add `https://`.

-3. **Interact with the browser:**
+You can also provide a URL without the scheme (e.g., `ya.ru`), and the script will automatically add `https://`.
+4. **Interact with the browser:**

The script will open a browser window. You can interact with the page as you normally would.

-4. **Close the browser:**
+5. **Close the browser:**

-After you are done browsing, press Enter in the terminal to close the browser and save the domains.
+After you are done browsing, press `Enter` in the terminal to close the browser and save the domains.

-5. **Output:**
+6. **Output:**

-The script will save the unique domains to a file named `<domain>_domains.txt` (e.g., `example_com_domains.txt`) in the same directory where you ran the script. If the file already exists, new domains will be added to the existing list, avoiding duplicates.
+The script will save the unique domains to a file named `<domain>_domains.txt` (e.g., `ya_ru_domains.txt`) in the same directory where you ran the script. If the file already exists, new domains will be added to the existing list, avoiding duplicates.
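
For steps 3 and 6, one way the scheme handling and the output-file behaviour described above could be implemented is sketched below; the helper names are assumptions made for illustration, not code taken from the repository:

```python
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    """Prepend https:// when the scheme is missing, e.g. 'ya.ru' -> 'https://ya.ru'."""
    return url if url.startswith(("http://", "https://")) else f"https://{url}"


def output_filename(url: str) -> str:
    """Derive '<domain>_domains.txt', e.g. 'https://ya.ru' -> 'ya_ru_domains.txt'."""
    host = urlparse(normalize_url(url)).hostname or "unknown"
    return host.removeprefix("www.").replace(".", "_") + "_domains.txt"


def append_unique(path: str, new_domains: set[str]) -> None:
    """Merge new domains into the file, keeping each domain exactly once."""
    try:
        with open(path, encoding="utf-8") as f:
            existing = {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        existing = set()
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(existing | new_domains)) + "\n")
```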

## Example

```bash
-python -m src.domain_collector https://www.wikipedia.org
+uvx domain-collector https://www.wikipedia.org
```

This will open the Wikipedia homepage in a browser. After you interact with the page and close the browser, the script will save the visited domains to a file named `wikipedia_org_domains.txt`.
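
The feature list also mentions `argparse` for the command-line argument. Building on the two sketches above (they are needed for this to run), the entry-point wiring might look roughly like this; again, it is illustrative rather than the repository's actual code:

```python
import argparse

# Assumes collect_domains, normalize_url, output_filename and append_unique
# from the previous sketches are defined in the same module.


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Collect unique domains visited while browsing a URL."
    )
    parser.add_argument("url", help="URL to open (the scheme may be omitted)")
    args = parser.parse_args()

    url = normalize_url(args.url)
    domains = collect_domains(url)
    path = output_filename(url)
    append_unique(path, domains)
    print(f"Saved {len(domains)} unique domains to {path}")


if __name__ == "__main__":
    main()
```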