TumblrCollector

Description

A tool for downloading content from Tumblr sites, built as an improved version of tumblr-crawler.

Setup

$ git clone https://github.com/webignorant/tumblr_collector.git TumblrCollector
$ cd TumblrCollector
$ pip install -r requirements.txt

Config

conf_sites.txt

List the names of the Tumblr sites to collect, one per line.
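A minimal sketch of how conf_sites.txt could be read. It assumes surrounding whitespace is stripped and blank lines are ignored; load_sites is a hypothetical helper, not a function from tumblr-collector.py.

```python
from pathlib import Path


def load_sites(path="conf_sites.txt"):
    """Return the Tumblr site names listed in `path`, one per line.

    Surrounding whitespace is stripped and blank lines are skipped
    (an assumption about how the collector treats them).
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```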

conf.json

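Custom collector settings. THREADS sets the number of worker threads. The REQUEST block holds the request settings (TIMEOUT, RETRY, OFFSET, LIMIT) and the switches for downloading images, videos and text. LOG.FORCE_POSTS_LOG toggles forced post logging. JSON does not allow comments, so the file must contain plain JSON only.

{
    "THREADS": 10,
    "REQUEST": {
        "TIMEOUT": 10,
        "RETRY": 5,
        "OFFSET": 0,
        "LIMIT": 50,
        "IS_DOWNLOAD_IMG": true,
        "IS_DOWNLOAD_VIDEO": true,
        "IS_DOWNLOAD_TEXT": true
    },
    "LOG": {
        "FORCE_POSTS_LOG": false
    }
}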
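Python's json module rejects comments, which is why conf.json has to be plain JSON. The sketch below uses hypothetical load_conf and merge helpers, with defaults copied from the example configuration, to fill in any keys missing from a partial conf.json. It is an illustration, not the collector's actual loading code.

```python
import copy
import json

# Defaults mirroring the example conf.json (assumed, not taken from the script).
DEFAULT_CONF = {
    "THREADS": 10,
    "REQUEST": {
        "TIMEOUT": 10,
        "RETRY": 5,
        "OFFSET": 0,
        "LIMIT": 50,
        "IS_DOWNLOAD_IMG": True,
        "IS_DOWNLOAD_VIDEO": True,
        "IS_DOWNLOAD_TEXT": True,
    },
    "LOG": {"FORCE_POSTS_LOG": False},
}


def merge(defaults, overrides):
    """Recursively overlay `overrides` onto a deep copy of `defaults`."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def load_conf(path="conf.json"):
    """Load conf.json and fill in missing keys from DEFAULT_CONF."""
    with open(path, encoding="utf-8") as f:
        return merge(DEFAULT_CONF, json.load(f))
```

With this approach a conf.json that only sets "THREADS" and "REQUEST": {"RETRY": ...} still gets every other value from the defaults, and DEFAULT_CONF itself is never modified.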

conf_proxies.json

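Proxy settings, passed to requests as its proxies mapping. Each key ("http", "https") may appear only once in the file. The example uses a local HTTP proxy; a SOCKS5 proxy such as "socks5://127.0.0.1:1080" or "socks5://user:pass@host:port" can be used instead, which requires the SOCKS extra (pip install requests[socks]).

{
    "http": "http://127.0.0.1:8787",
    "https": "http://127.0.0.1:8787"
}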
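The proxies file has the same shape as the proxies argument of requests.get, so it can be passed straight through. The sketch below is hypothetical: load_proxies uses json's object_pairs_hook to reject a repeated key (plain json.load silently keeps only the last value), and fetch_with_retry shows one possible way TIMEOUT and RETRY could be applied, treating RETRY as the total number of attempts.

```python
import json
import time


def _reject_duplicates(pairs):
    """object_pairs_hook that fails on repeated keys instead of keeping the last one."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key in proxy config: {key!r}")
        result[key] = value
    return result


def load_proxies(path="conf_proxies.json"):
    """Load a requests-style proxies mapping, e.g. {"https": "socks5://127.0.0.1:1080"}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f, object_pairs_hook=_reject_duplicates)


def fetch_with_retry(get, url, proxies, timeout, retries, delay=1.0):
    """Call get(url, proxies=..., timeout=...) up to `retries` times (at least once).

    `get` is normally requests.get; it is injected so the sketch can be
    exercised without network access. The last error is re-raised.
    """
    attempts = max(1, retries)
    for attempt in range(attempts):
        try:
            return get(url, proxies=proxies, timeout=timeout)
        except OSError:  # requests.RequestException is a subclass of OSError
            if attempt + 1 == attempts:
                raise
            time.sleep(delay)
```

With requests installed, a call might look like fetch_with_retry(requests.get, url, load_proxies(), timeout, retries), using the TIMEOUT and RETRY values from the REQUEST block of conf.json.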

Run

$ python tumblr-collector.py
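How tumblr-collector.py schedules its work is not documented here. As a hypothetical outline only, the sites from conf_sites.txt could be spread across THREADS worker threads with concurrent.futures, recording per-site failures instead of aborting the whole run. collect_all and collect_site are illustrative names, not the script's API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_all(sites, collect_site, threads=10):
    """Run collect_site(site) for every site on a pool of `threads` workers.

    Returns (results, errors): {site: result} for successes and
    {site: exception} for failures, so one broken site does not stop the rest.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(collect_site, site): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                results[site] = future.result()
            except Exception as exc:
                errors[site] = exc
    return results, errors
```

Threads suit this kind of workload because most of the time is spent waiting on network I/O rather than on the CPU.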
