Other libraries are not well suited to large datasets containing complex properties (such as country-sized polygons), which take some time to process on the Java side. As a result, naive indexers cause elasticsearch to fill up its bulk indexing thread pool, which leads to batches being rejected and data being lost.
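To make the failure mode concrete, here is a minimal sketch of how rejected documents can be picked out of a bulk response so they can be resubmitted instead of silently dropped. It assumes one action per document, and relies on the bulk API returning an `items` array in request order with a per-item `status` (429 when the queue is full). The function name `selectRejected` is hypothetical and is not part of this module.

```javascript
// Hypothetical helper: returns the documents whose bulk action was
// rejected with HTTP 429 (bulk queue full), so they can be retried.
// Assumes docs[i] corresponds to bulkResponse.items[i].
function selectRejected(docs, bulkResponse) {
  // 'errors' is false when every action in the batch succeeded.
  if (!bulkResponse.errors) return [];
  return docs.filter((_doc, i) => {
    const item = bulkResponse.items[i];
    // Each item has a single key naming the action:
    // 'index', 'create', 'update' or 'delete'.
    const result = item[Object.keys(item)[0]];
    return result.status === 429;
  });
}

// Example input shaped like a bulk response with one rejected item.
const docs = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const response = {
  errors: true,
  items: [
    { index: { _id: 'a', status: 201 } },
    { index: { _id: 'b', status: 429, error: { type: 'es_rejected_execution_exception' } } },
    { index: { _id: 'c', status: 201 } }
  ]
};

console.log(selectRejected(docs, response));
```

A naive indexer that only checks the HTTP status of the bulk request itself would treat this batch as successful, because rejections are reported per item.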
What's left to do:
Write a README explaining how concurrency, retries and the CLI work
Rethink and test the concurrency control mechanism to achieve optimum load
Refactor some of the code to emit events
Write a stats module which captures Transaction events and emits stat digests
Module Goals:
☑ batched writes
☑ adjustable batch size
☑ partially retry failed batches
☑ backpressure (flood control)
☑ concurrency setting, better highwatermark
☐ actionable error reporting
☑ elasticsearch client injectable
☑ well tested via unit tests & in production
☑ bin file, input streams from cli with id, type mapper
☑ minimal dependencies, dependency injection
☑ usable outside pelias project & not strictly tied to pelias config
☑ ensure no data loss due to ES errors or failure to flush batches
☐ healthcheck via threadpool status
☐ compatibility with different nodejs stream versions
☑ better logging - via winston
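The batching and backpressure goals above can be sketched with Node's built-in `stream.Transform`. This is an illustration under stated assumptions, not the module's actual code: the names `Batch` and `createBatchStream` are hypothetical, and the real implementation may differ.

```javascript
const { Transform } = require('stream');

// Hypothetical helper: buffers documents and hands back a full
// batch once batchSize documents have been collected.
class Batch {
  constructor(batchSize) {
    this.batchSize = batchSize;
    this.docs = [];
  }
  // Returns the completed batch, or null while it is still filling.
  add(doc) {
    this.docs.push(doc);
    return this.docs.length >= this.batchSize ? this.drain() : null;
  }
  // Returns whatever is buffered (possibly empty) and resets the buffer.
  drain() {
    const out = this.docs;
    this.docs = [];
    return out;
  }
}

// Object-mode transform: documents in, arrays of documents out.
function createBatchStream(batchSize) {
  const batch = new Batch(batchSize);
  return new Transform({
    objectMode: true,
    transform(doc, _encoding, done) {
      const full = batch.add(doc);
      if (full) this.push(full);
      done();
    },
    // Emit the final partial batch so no buffered documents are lost.
    flush(done) {
      const rest = batch.drain();
      if (rest.length > 0) this.push(rest);
      done();
    }
  });
}
```

Piping this into a writable stream that only calls its write callback once elasticsearch has answered gives flood control for free: when the sink stalls, the stream buffers fill up to their `highWaterMark` and the source is paused.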
Issues with dbclient:
☑ badly named, doesn't describe purpose
☑ not abstracted from pelias
☑ strict dependency on other pelias modules
☑ not generally useful to 3rd parties
☑ difficult for 3rd party developers to contribute
☑ untidy code
☑ not fully unit tested
☐ not well documented
Duplication across modules (causing confusion):
- https://github.com/geopipes/elasticsearch-backend
- https://github.com/pelias/esclient
- https://github.com/pelias/dbclient
Dependants:
- dat-elasticsearch-upload
- pelias-geonames
- pelias-openaddresses
- pelias-openstreetmap
Similar projects / implementations:
- https://github.com/hmalphettes/elasticsearch-streams
- https://www.npmjs.com/package/elasticstream
- https://github.com/simianhacker/bunyan-elasticsearch/blob/master/index.js
running unit tests
$> npm test
running integration tests
$> npm run integration
This lib is currently incomplete, although it is not far off being worthy of publishing.
It stands to replace both the pelias/dbclient and the older pelias/esclient modules. It differs from other streaming elasticsearch indexers mainly in how it copes with bulk indexing load on large datasets.