tor-browser-crawler

experimental - PLEASE BE CAREFUL. Intended for research purposes.

This is a fork of the tor-browser-crawler, updated to run correctly with newer library versions and a newer Tor Browser Bundle (TBB) version.

Usage

This project can be run natively on the host system or in a Docker container. If running natively, the required libraries and Python 3.x modules must be installed; see the Dockerfile and requirements.txt for the list of requirements. Running through Docker is easier and more reproducible, so this section focuses on the Docker container setup.

Steps

Install Docker
- follow the official Docker installation documentation
- don't forget to add your user to the docker group after installing
Build the Docker image
- install the make utility if it is not already available on your system
- run make build to build the Docker image
Set up your crawl configuration files
- replace sites.txt with the list of websites you wish to crawl
- edit the Makefile to use the correct network interface of your host
- increase the --timeout value in the Makefile if needed
- make any desired changes to config.ini
Start the crawl
- run make run to launch a container
- the logs and packet captures should appear in the newly created results directory
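
Two of the configuration steps above, preparing sites.txt and picking the network interface for the Makefile, can be sanity-checked with a short script before starting a crawl. The following is a minimal sketch and not part of the crawler: the helper names load_sites and list_interfaces are hypothetical, and it assumes sites.txt holds one site per line, which is not specified here.

```python
import socket
from pathlib import Path


def load_sites(path):
    """Return the non-empty lines of a site list, in order, without duplicates.

    Assumption: the file holds one site per line.
    """
    seen = set()
    sites = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line not in seen:
            seen.add(line)
            sites.append(line)
    return sites


def list_interfaces():
    """Return the names of the network interfaces visible on this host."""
    # socket.if_nameindex() yields (index, name) pairs.
    return [name for _index, name in socket.if_nameindex()]
```

Calling list_interfaces() on the host shows the interface names to choose from when editing the Makefile, and len(load_sites("sites.txt")) gives a quick count of the sites that will be visited.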

Notes

Library Versions
- the versions of some components matter, as some version combinations are incompatible
- this project has been frozen to v10.0.10 of the TBB
- to use the latest TBB version, remove the version number from the Dockerfile
- newer versions of TBB may, however, require different versions of selenium and geckodriver
The crawler has been modified to use the tcpdump utility in place of dumpcap to capture traffic.
- This avoids runtime issues that exist when using dumpcap on some system configurations.
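
For orientation, a tcpdump capture can be driven from Python roughly as shown below. This is a sketch under stated assumptions, not the crawler's actual code: the function names are hypothetical, and the interface name and output path are supplied by the caller. Only two standard tcpdump options are used: -i selects the interface and -w writes the raw packets to a file.

```python
import subprocess


def tcpdump_command(interface, pcap_path, capture_filter=""):
    """Build a tcpdump argument list that writes raw packets to pcap_path."""
    cmd = ["tcpdump", "-i", interface, "-w", str(pcap_path)]
    if capture_filter:
        # An optional pcap filter expression (for example "tcp") goes last.
        cmd.append(capture_filter)
    return cmd


def start_capture(interface, pcap_path, capture_filter=""):
    """Start tcpdump in the background.

    Capturing usually requires root or equivalent capture privileges.
    """
    return subprocess.Popen(tcpdump_command(interface, pcap_path, capture_filter))


def stop_capture(process, timeout=5):
    """Ask tcpdump to stop so that it closes the capture file cleanly."""
    process.terminate()  # sends SIGTERM
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Fall back to a hard kill if tcpdump does not exit in time.
        process.kill()
        process.wait()
```

A crawl loop would call start_capture before loading a page and stop_capture afterwards, giving one capture file per visit. Building the argument list in a separate function keeps the command easy to inspect without starting a capture.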
