Spark Accelerator framework; it enables secondary indices to remote data stores.

OpenSearch Flint

OpenSearch Flint is ... It consists of four modules:

  • flint-core: a module that contains Flint specification and client.
  • flint-commons: a module that provides a shared library of utilities and common functionalities, designed to easily extend Flint's capabilities.
  • flint-spark-integration: a module that provides Spark integration for Flint and derived dataset based on it.
  • ppl-spark-integration: a module that provides PPL query execution on top of Spark. See the PPL repository.

Documentation

Please refer to the Flint Index Reference Manual for more information.

PPL-Language

Prerequisites

Version compatibility:

Flint version  JDK version  Spark version  Scala version  OpenSearch
0.1.0          11+          3.3.1          2.12.14        2.6+
0.2.0          11+          3.3.1          2.12.14        2.6+
0.3.0          11+          3.3.2          2.12.14        2.13+
0.4.0          11+          3.3.2          2.12.14        2.13+
0.5.0          11+          3.5.1          2.12.14        2.17+
0.6.0          11+          3.5.1          2.12.14        2.17+
0.7.0          11+          3.5.1          2.12.14        2.17+
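
Every Flint version listed above requires JDK 11 or later, so a quick preflight check can save a failed build. The following is a minimal sketch, not part of the project: the helper names java_major and meets_jdk_requirement are hypothetical, and the parsing assumes the usual `java -version` output where the version appears in double quotes.

```shell
# Extract the major version from a Java version string.
# Handles legacy "1.8.0_392" as well as modern "11.0.21" or "17-ea" styles.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
  esac
}

# Succeed only if the given version string is JDK 11 or newer.
meets_jdk_requirement() {
  [ "$(java_major "$1")" -ge 11 ]
}

# If a JDK is on the PATH, report whether it satisfies the table above.
if command -v java >/dev/null 2>&1; then
  installed="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  if [ -n "$installed" ] && meets_jdk_requirement "$installed"; then
    echo "JDK $installed is new enough for Flint"
  else
    echo "JDK 11+ is required; found: ${installed:-unknown}"
  fi
fi
```

The check only covers the JDK column; Spark, Scala, and OpenSearch versions still need to be matched against the table by hand.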

Flint Extension Usage

To use this application, run Spark with the Flint extension:

spark-sql --conf "spark.sql.extensions=org.opensearch.flint.spark.FlintSparkExtensions"

PPL Extension Usage

To use PPL-to-Spark translation, run Spark with the PPL extension:

spark-sql --conf "spark.sql.extensions=org.opensearch.flint.spark.FlintPPLSparkExtensions"

Running With Both Extensions

spark-sql --conf "spark.sql.extensions=org.opensearch.flint.spark.FlintPPLSparkExtensions,org.opensearch.flint.spark.FlintSparkExtensions"
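
The spark.sql.extensions setting takes a comma-separated list of class names, which is how both extensions are enabled in the command above. As a small sketch of assembling that value in a script, the snippet below joins the two class names from this README; the join_extensions helper and the variable names are hypothetical, and the command is printed rather than executed so it can be tried without Spark installed.

```shell
# Join all arguments with commas, the format spark.sql.extensions expects.
# The function body runs in a subshell so the IFS change does not leak out.
join_extensions() (
  IFS=','
  printf '%s\n' "$*"
)

FLINT_EXT="org.opensearch.flint.spark.FlintSparkExtensions"
PPL_EXT="org.opensearch.flint.spark.FlintPPLSparkExtensions"

EXTENSIONS="$(join_extensions "$PPL_EXT" "$FLINT_EXT")"

# Print the resulting command instead of running it.
echo "spark-sql --conf \"spark.sql.extensions=${EXTENSIONS}\""
```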

Build

To build and run this application with Spark, you can run (requires Java 11):

sbt clean standaloneCosmetic/publishM2

Then add org.opensearch:opensearch-spark-standalone_2.12 when running a Spark application, for example:

bin/spark-shell --packages "org.opensearch:opensearch-spark-standalone_2.12:0.7.0-SNAPSHOT" \
                --conf "spark.sql.extensions=org.opensearch.flint.spark.FlintSparkExtensions" \
                --conf "spark.sql.catalog.dev=org.apache.spark.opensearch.catalog.OpenSearchCatalog"

PPL Build & Run

To build and run PPL in Spark, you can run (requires Java 11):

sbt clean sparkPPLCosmetic/publishM2

Then add org.opensearch:opensearch-spark-ppl_2.12 when running a Spark application, for example:

bin/spark-shell --packages "org.opensearch:opensearch-spark-ppl_2.12:0.7.0-SNAPSHOT" \
                --conf "spark.sql.extensions=org.opensearch.flint.spark.FlintPPLSparkExtensions" \
                --conf "spark.sql.catalog.dev=org.apache.spark.opensearch.catalog.OpenSearchCatalog"
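
The two --packages values above differ only in the artifact name, so a script that switches between them can build the coordinates from one version variable. This is an illustrative sketch only: flint_package and FLINT_VERSION are hypothetical names, and the coordinate pattern is simply the one shown in the two examples in this README. The commands are printed, not run.

```shell
# Build the Maven coordinates for a locally published Flint artifact.
# $1: artifact flavor as used above ("standalone" or "ppl")
# $2: version, e.g. "0.7.0-SNAPSHOT"
flint_package() {
  echo "org.opensearch:opensearch-spark-${1}_2.12:${2}"
}

FLINT_VERSION="0.7.0-SNAPSHOT"

# Print the --packages argument for each flavor.
echo "bin/spark-shell --packages \"$(flint_package standalone "$FLINT_VERSION")\""
echo "bin/spark-shell --packages \"$(flint_package ppl "$FLINT_VERSION")\""
```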

Running PPL queries on a local Spark cluster

For a PPL usage sample on a local Spark cluster, see PPL on local spark.

Running integration tests on a local Spark cluster

For the integration test documentation, see Docker Integration Tests.

Code of Conduct

This project has adopted an Open Source Code of Conduct.

Security

If you discover a potential security issue in this project we ask that you notify OpenSearch Security directly via email to [email protected]. Please do not create a public GitHub issue.

License

See the LICENSE file for our project's licensing. We will ask you to confirm the licensing of your contribution.

Copyright

Copyright OpenSearch Contributors. See NOTICE for details.
