A schema describes the fields of a document type and how each of those fields should be handled.
Elasticsearch has the ability to be schema-less, which means that documents can be indexed without explicitly providing a schema.
If you do not specify a mapping, Elasticsearch will by default generate one dynamically when detecting new fields in documents during indexing. However, this dynamic mapping generation comes with a few caveats:
- Detected types might not be correct.
- Default analyzers and settings are used for indexing and searching.
By explicitly specifying the schema, we can avoid these problems.
- scripts/mapping-content.json: content index schema
- scripts/mapping-twitter.json: twitter index schema
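To see which field types dynamic mapping has actually detected for an index, the current mapping can be fetched from Elasticsearch's get-mapping endpoint (`GET <index>/_mapping`). The sketch below is only an illustration and is not one of the project's scripts: `mapping_url` and `show_mapping` are hypothetical helper names, and the URL and index values passed to them are assumed to match your setup.

```shell
# Hypothetical helpers (not part of the scripts directory).

# Build the URL of the get-mapping endpoint for one index.
# $1 = Elasticsearch base URL, $2 = index name.
mapping_url() {
  # ${1%/} drops a single trailing slash from the base URL, if present.
  printf '%s/%s/_mapping?pretty' "${1%/}" "$2"
}

# Print the current mapping of the index as JSON.
show_mapping() {
  curl -s "$(mapping_url "$1" "$2")"
}
```

For a local setup this could be called as `show_mapping localhost:9200 content`; comparing the output with the mapping file shows where the detected types differ from the intended ones.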
To provide a mapping to Elasticsearch, run the commands described below.
If you have just created a database and there are no data or indices yet, run:
# local setup
cd scripts
env "ES_URL=localhost:9200" "ES_INDEX=content" "MAPPING_FILE=mapping-content.json" ./mapping-create.sh
# remote setup
cd scripts
env "ES_URL=https://xxx:[email protected]" "ES_INDEX=content" "MAPPING_FILE=mapping-content.json" ./mapping-create.sh
If the index has already been created, run:
# local setup
cd scripts
env "ES_URL=localhost:9200" "ES_INDEX=content" "MAPPING_FILE=mapping-content.json" ./mapping-update.sh
# remote setup
cd scripts
env "ES_URL=https://xxx:[email protected]" "ES_INDEX=content" "MAPPING_FILE=mapping-content.json" ./mapping-update.sh
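Which of the two scripts to run depends on whether the index already exists. One way to check this is an HTTP `HEAD` request against the index, which Elasticsearch answers with status 200 when the index exists and 404 when it does not. The following is a minimal sketch under that assumption and is not part of the repository; `index_status` and `pick_mapping_script` are hypothetical helper names.

```shell
# Hypothetical helpers (not part of the scripts directory).

# Print the HTTP status code of a HEAD request against the index.
# $1 = Elasticsearch base URL, $2 = index name.
# 200 means the index exists, 404 means it does not.
index_status() {
  curl -s -o /dev/null -I -w '%{http_code}' "${1%/}/$2"
}

# Map that status code to the script that should be run.
pick_mapping_script() {
  case "$1" in
    200) echo "./mapping-update.sh" ;;
    404) echo "./mapping-create.sh" ;;
    *)   echo "unexpected HTTP status: $1" >&2; return 1 ;;
  esac
}
```

From the `scripts` directory, `pick_mapping_script "$(index_status localhost:9200 content)"` prints the name of the script to run with the same `env` settings as in the commands above. Any other status, including a failed connection, is reported as an error rather than guessed at.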
Jenkins is an open source automation server. Jenkins can be used as a simple CI server or turned into the continuous delivery hub for any project. Jenkins can be easily set up and configured via its web interface, which includes on-the-fly error checks and built-in help.
Jenkins Pipeline is a suite of plugins which supports implementing and integrating continuous delivery pipelines into Jenkins. Pipeline provides an extensible set of tools for modeling simple-to-complex delivery pipelines "as code" via the Pipeline domain-specific language (DSL) syntax. The definition of a Jenkins Pipeline is written into a text file (called a Jenkinsfile) which in turn can be committed to a project’s source control repository.
For this project we use Jenkins and Pipelines to automate web scraping and data ingestion. Currently it is not used as a CI tool, but its functionality can be extended later.
- WEB Scrapers Jenkinsfile
- RSS Scrapers Jenkinsfile
- Twitter Scrapers Jenkinsfile
- Topics Jenkinsfile
Create new pipeline jobs in Jenkins and copy-paste the required Jenkinsfile into the Script field. All other fields will be filled in automatically. Keep in mind that only the script should be modified, as other updates might be overridden by the script.
Kibana lets you visualize your Elasticsearch data. Kibana core ships with the classics: histograms, line graphs, pie charts, sunbursts, and more.
Kibana Tutorial – Part 1: Introduction
Kibana Tutorial – Part 2: Discover
Kibana Tutorial – Part 3: Visualize
Kibana Tutorial – Part 4: Dashboard
Kibana dashboards are stored in kibana/export.json. Unfortunately, there is no way to automate the dashboard import, so it has to be done manually. Go to Kibana -> Management -> Saved Objects -> Import and point it to the JSON file. Assign the indexes carefully.
If you update the dashboards, it is a good idea to commit the changes to git.