Predator

Lightweight crawler framework based on template analysis.

Dependencies

APScheduler==3.1.0
beautifulsoup4==4.4.1
dulwich==0.12.0
PyYAML==3.11
selenium==2.53.2

CSafeDumper/CSafeLoader require PyYAML built with the LibYAML C bindings (cyaml).

Downloader

Supports a urllib downloader and a webdriver downloader.
Set cookies to log in when using the urllib downloader.
Set the profile path to log in when using the webdriver downloader.
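
As a rough illustration of cookie-based login with urllib, the sketch below builds a request that carries a `Cookie` header, using Python 3's standard `urllib.request`. The function name `build_request`, the URL, and the cookie name are hypothetical and are not taken from Predator's own downloader.

```python
import urllib.request


def build_request(url, cookies):
    """Return a Request whose Cookie header carries the given cookies."""
    # Serialize the dict as "name=value; name2=value2" (Cookie header format).
    cookie_header = "; ".join(
        "{}={}".format(name, value) for name, value in cookies.items()
    )
    request = urllib.request.Request(url)
    request.add_header("Cookie", cookie_header)
    return request


# Hypothetical values: real ones would be copied from a logged-in browser session.
request = build_request("http://example.com/profile", {"sessionid": "abc123"})
# Passing `request` to urllib.request.urlopen() would send the cookie with the fetch.
```

The request is only constructed here, not sent, so the snippet runs without network access.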

Parser

Uses BeautifulSoup to parse the downloaded HTML.

Storage

Uses the filesystem or a git interface to store data.
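
To make the filesystem option concrete, here is a minimal sketch of a storage backend that keeps one file per record under a root directory. The class name `FilesystemStorage` and its `add`/`get` methods are assumptions for illustration; the project's actual storage interface may look different.

```python
import os
import tempfile


class FilesystemStorage:
    """Hypothetical backend: one file per record under a root directory."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def add(self, name, content):
        """Write content to root/name and return the file path."""
        path = os.path.join(self.root, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(content)
        return path

    def get(self, name):
        """Read back a previously stored record."""
        with open(os.path.join(self.root, name), encoding="utf-8") as handle:
            return handle.read()


# Store a sample page in a throwaway directory.
storage = FilesystemStorage(tempfile.mkdtemp())
storage.add("cv_001.html", "<html>example</html>")
```

A git-backed variant could expose the same two methods and commit after each write, which would let callers switch backends without other changes.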

Develop

Downloader/Parser/Storage are simple to customize.
Add-on functions are developed by using class inheritance.
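
The inheritance pattern can be sketched as follows: shared behavior lives in a base class, and an add-on overrides a single hook method. `BaseDownloader`, `StaticDownloader`, and the `fetch` hook are hypothetical names chosen for this example, not Predator's real classes.

```python
class BaseDownloader:
    """Hypothetical base class; the project's real class names may differ."""

    def fetch(self, url):
        # Hook method: each subclass decides how a page is obtained.
        raise NotImplementedError("subclasses implement fetch()")

    def download(self, url):
        # Shared post-processing applied to whatever fetch() returns.
        return self.fetch(url).strip()


class StaticDownloader(BaseDownloader):
    """Add-on that serves pages from a dict, e.g. for offline testing."""

    def __init__(self, pages):
        self.pages = pages

    def fetch(self, url):
        return self.pages[url]


downloader = StaticDownloader({"http://example.com": "  <html>hi</html>\n"})
page = downloader.download("http://example.com")
```

Only `fetch` changes between subclasses, so a parser or storage add-on could follow the same shape.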

DEMO

The project includes several examples that crawl CVs;
you can run them with these commands:

Download CV:

    python tools/plan.py jobs.cv.youprojectname sources.industry_needed

Download Classify:

    python tools/plan.py jobs.classify.youprojectname sources.industry_needed

You can pass several source files, separated by ',', such as:

    python tools/plan.py jobs.classify.youprojectname sources.industry1,sources.industry2,sources.industry3

You can resume a download by adding '-r' or '--resume', like this:

    python tools/plan.py jobs.classify.youprojectname sources.industry_needed -r
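
The command-line shape used above (a job name, a comma-separated source list, and an optional resume flag) could be parsed roughly like this with the standard `argparse` module. This is a sketch under assumptions: the argument handling inside `tools/plan.py` is not shown in this README, and `parse_args` is a hypothetical helper.

```python
import argparse


def parse_args(argv):
    """Parse a plan.py-style command line (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of a plan.py-style CLI")
    parser.add_argument("job", help="dotted job name, e.g. jobs.cv.youprojectname")
    # Split the single comma-separated argument into a list of source names.
    parser.add_argument(
        "sources",
        type=lambda value: value.split(","),
        help="one or more sources separated by ','",
    )
    parser.add_argument(
        "-r", "--resume", action="store_true", help="resume a previous download"
    )
    return parser.parse_args(argv)


args = parse_args(
    ["jobs.classify.youprojectname", "sources.industry1,sources.industry2", "-r"]
)
```

Here `args.sources` is a list with one entry per source, and `args.resume` is a boolean.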
