Welcome to the Git repository for the SME crawler project. For historical reasons, it is called Kemenperin crawler rather than SME crawler, although it can now crawl multiple websites. The supported websites are listed in this Google Sheet, which also shows the statistics for each site.
The results of crawling the Kemenperin site are stored in the crawled-data directory, while the results of crawling the other sites are stored in crawled-data-1.
Exporter Crawler for Kemenperin Site. Built with Node.js using axios and cheerio.
- Node.js version >= 8
$ npm install
$ npm start
The crawled data is written to data.csv.
These two processes generate the data used to produce the heatmap.
$ node geocoder.js
$ node transformer.js
The source code of the Indonetwork crawler is in the scrapy_indonetwork directory.
The Telpon info crawler is contained in telponinfo.js. To run it, use:
$ npm run telpon
To get more visibility into the crawling results, you can use analytics.js to analyze the CSV files. It simply counts the number of records in each CSV file produced by the crawl. To run it, use the following command:
$ npm run analytics
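For illustration, the kind of counting described above could be sketched as follows. This is a minimal sketch and not the actual contents of analytics.js: the function names countCsvRows and summarize are hypothetical, and it assumes that each CSV file starts with one header line and holds one record per line (no quoted fields spanning several lines). It uses only the Node.js standard library.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: count the data rows in one CSV file, assuming the
// first non-empty line is a header and each record sits on its own line.
function countCsvRows(filePath) {
  const lines = fs
    .readFileSync(filePath, 'utf8')
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '');
  // Subtract the header line; an empty file counts as zero rows.
  return Math.max(lines.length - 1, 0);
}

// Hypothetical helper: map each .csv file name in a directory to its row count.
// Files with other extensions are skipped.
function summarize(dir) {
  const counts = {};
  for (const name of fs.readdirSync(dir)) {
    if (path.extname(name).toLowerCase() === '.csv') {
      counts[name] = countCsvRows(path.join(dir, name));
    }
  }
  return counts;
}
```

With this sketch, calling summarize('crawled-data') returns an object keyed by file name, which can be printed with console.table for a quick overview of how many records each crawl produced.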