Implement Cryptographic hash functions (opensearch-project#788)
* Implement Cryptographic hash functions

Signed-off-by: Gokul R <[email protected]>

* update documentation

Signed-off-by: Gokul R <[email protected]>

* added integration tests and updated readme file

Signed-off-by: Gokul R <[email protected]>

* format the code

Signed-off-by: Gokul R <[email protected]>

* fix integration tests

Signed-off-by: Gokul R <[email protected]>

---------

Signed-off-by: Gokul R <[email protected]>
Signed-off-by: Gokul-Radhakrishnan <[email protected]>
Gokul-Radhakrishnan authored Oct 22, 2024
1 parent b09a6e3 commit 0b6da30
Showing 9 changed files with 211 additions and 24 deletions.
4 changes: 4 additions & 0 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -97,6 +97,10 @@ Assumptions: `a`, `b`, `c` are existing fields in `table`
- `source = table | eval f = case(a = 0, 'zero', a = 1, 'one', a = 2, 'two', a = 3, 'three', a = 4, 'four', a = 5, 'five', a = 6, 'six', a = 7, 'se7en', a = 8, 'eight', a = 9, 'nine')`
- `source = table | eval f = case(a = 0, 'zero', a = 1, 'one' else 'unknown')`
- `source = table | eval f = case(a = 0, 'zero', a = 1, 'one' else concat(a, ' is an incorrect binary digit'))`
- `source = table | eval digest = md5(fieldName) | fields digest`
- `source = table | eval digest = sha1(fieldName) | fields digest`
- `source = table | eval digest = sha2(fieldName,256) | fields digest`
- `source = table | eval digest = sha2(fieldName,512) | fields digest`

#### Fillnull
Assumptions: `a`, `b`, `c`, `d`, `e` are existing fields in `table`
2 changes: 2 additions & 0 deletions docs/ppl-lang/README.md
@@ -81,6 +81,8 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).

- [`Type Conversion Functions`](functions/ppl-conversion.md)

- [`Cryptographic Functions`](functions/ppl-cryptographic.md)


---
### PPL On Spark
77 changes: 77 additions & 0 deletions docs/ppl-lang/functions/ppl-cryptographic.md
@@ -0,0 +1,77 @@
## PPL Cryptographic Functions

### `MD5`

**Description**

Calculates the MD5 digest and returns the value as a 32-character hex string.

Usage: `md5('hello')`

**Argument type:** STRING

**Return type:** STRING

Example:

    os> source=people | eval `MD5('hello')` = MD5('hello') | fields `MD5('hello')`
    fetched rows / total rows = 1/1
    +----------------------------------+
    | MD5('hello')                     |
    |----------------------------------|
    | 5d41402abc4b2a76b9719d911017c592 |
    +----------------------------------+
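
As a quick cross-check of the example above outside PPL, the following minimal Java sketch computes the same digest with the JDK's `java.security.MessageDigest`. It assumes the string is hashed as its UTF-8 bytes and that a null input yields null (the behavior of Spark's `md5`, which PPL `md5` is translated to in this commit). `Md5Sketch` and `md5Hex` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper that reproduces the MD5 example outside of PPL. */
final class Md5Sketch {

    /** MD5 of the UTF-8 bytes of the input as lowercase hex; null in, null out (assumption). */
    static String md5Hex(String input) {
        if (input == null) {
            return null; // assumption: mirrors Spark's null propagation for md5
        }
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex characters per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to provide MD5
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello")); // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```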

### `SHA1`

**Description**

Calculates the SHA-1 digest and returns the value as a 40-character hex string.

Usage: `sha1('hello')`

**Argument type:** STRING

**Return type:** STRING

Example:

    os> source=people | eval `SHA1('hello')` = SHA1('hello') | fields `SHA1('hello')`
    fetched rows / total rows = 1/1
    +------------------------------------------+
    | SHA1('hello')                            |
    |------------------------------------------|
    | aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d |
    +------------------------------------------+
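
SHA-1 produces a 160-bit (20-byte) digest, so its hex form is 40 characters, compared with 32 characters for MD5's 128-bit digest. The small Java sketch below (hypothetical class `DigestLengthSketch`) derives those lengths from `MessageDigest.getDigestLength()` instead of hard-coding them; it assumes a standard JDK, which is required to provide both MD5 and SHA-1.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical check of how many hex characters a digest algorithm produces. */
final class DigestLengthSketch {

    /** Hex string length for an algorithm: two characters per digest byte. */
    static int hexLength(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).getDigestLength() * 2;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("MD5 -> " + hexLength("MD5"));     // prints MD5 -> 32
        System.out.println("SHA-1 -> " + hexLength("SHA-1")); // prints SHA-1 -> 40
    }
}
```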

### `SHA2`

**Description**

Calculates a digest with one of the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, or SHA-512) and returns the value as a hex string. The `numBits` argument specifies the desired bit length of the result and must be 224, 256, 384, or 512.

Usage: `sha2('hello',256)`, `sha2('hello',512)`

**Argument type:** STRING, INTEGER

**Return type:** STRING

Example:

    os> source=people | eval `SHA2('hello',256)` = SHA2('hello',256) | fields `SHA2('hello',256)`
    fetched rows / total rows = 1/1
    +------------------------------------------------------------------+
    | SHA2('hello',256)                                                |
    |------------------------------------------------------------------|
    | 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 |
    +------------------------------------------------------------------+

    os> source=people | eval `SHA2('hello',512)` = SHA2('hello',512) | fields `SHA2('hello',512)`
    fetched rows / total rows = 1/1
    +----------------------------------------------------------------------------------------------------------------------------------+
    | SHA2('hello',512)                                                                                                                |
    |----------------------------------------------------------------------------------------------------------------------------------|
    | 9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043 |
    +----------------------------------------------------------------------------------------------------------------------------------+
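
Because PPL `sha2` is translated to Spark's `sha2` function, the accepted `numBits` values follow Spark's behavior. The Java sketch below models that mapping under stated assumptions: Spark documents that a bit length of 0 is treated as 256, and the sketch assumes any other unsupported value yields null rather than an error. `Sha2Sketch` is a hypothetical name, and the code is an illustration, not the Spark implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical model of sha2(str, numBits), assuming Spark-like semantics. */
final class Sha2Sketch {

    /** Maps numBits to a JDK algorithm name, or null when unsupported. */
    static String algorithmFor(int numBits) {
        switch (numBits) {
            case 224:
                return "SHA-224";
            case 0: // assumption: Spark treats a bit length of 0 as 256
            case 256:
                return "SHA-256";
            case 384:
                return "SHA-384";
            case 512:
                return "SHA-512";
            default:
                return null; // assumption: unsupported bit lengths yield null
        }
    }

    /** Hex digest of the UTF-8 bytes, or null for a null input or unsupported numBits. */
    static String sha2(String input, int numBits) {
        String algorithm = algorithmFor(numBits);
        if (input == null || algorithm == null) {
            return null;
        }
        try {
            byte[] digest = MessageDigest.getInstance(algorithm)
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available in this JDK", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha2("hello", 256));
        System.out.println(sha2("hello", 128)); // prints null
    }
}
```

The hex length is always `numBits / 4`, for example 64 characters for SHA-256 and 128 for SHA-512, which matches the column widths in the examples above.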
@@ -785,6 +785,42 @@ class FlintSparkPPLBuiltinFunctionITSuite
    assert(results.sameElements(expectedResults))
  }

  test("test cryptographic hash functions - md5") {
    val frame = sql(s"""
                       | source = $testTable | eval a = md5('Spark') = '8cde774d6f7333752ed72cacddb05126' | fields age, a
                       | """.stripMargin)

    val results: Array[Row] = frame.collect()
    val expectedResults: Array[Row] =
      Array(Row(70, true), Row(30, true), Row(25, true), Row(20, true))
    implicit val rowOrdering: Ordering[Row] = Ordering.by[Row, Integer](_.getAs[Integer](0))
    assert(results.sorted.sameElements(expectedResults.sorted))
  }

  test("test cryptographic hash functions - sha1") {
    val frame = sql(s"""
                       | source = $testTable | eval a = sha1('Spark') = '85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c' | fields age, a
                       | """.stripMargin)

    val results: Array[Row] = frame.collect()
    val expectedResults: Array[Row] =
      Array(Row(70, true), Row(30, true), Row(25, true), Row(20, true))
    implicit val rowOrdering: Ordering[Row] = Ordering.by[Row, Integer](_.getAs[Integer](0))
    assert(results.sorted.sameElements(expectedResults.sorted))
  }

  test("test cryptographic hash functions - sha2") {
    val frame = sql(s"""
                       | source = $testTable | eval a = sha2('Spark',256) = '529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b' | fields age, a
                       | """.stripMargin)

    val results: Array[Row] = frame.collect()
    val expectedResults: Array[Row] =
      Array(Row(70, true), Row(30, true), Row(25, true), Row(20, true))
    implicit val rowOrdering: Ordering[Row] = Ordering.by[Row, Integer](_.getAs[Integer](0))
    assert(results.sorted.sameElements(expectedResults.sorted))
  }

  // Todo
  // +---------------------------------------+
  // | Below tests are not supported (cast) |
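
The expected constants in the integration tests above are the MD5, SHA-1 and SHA-256 digests of the string `Spark`. A minimal Java sketch like the following (hypothetical class `ItFixtureCheck`) can recompute them independently of Spark, assuming UTF-8 encoding, so that a typo in a fixture is caught without starting a Spark session.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical standalone check of the expected constants used in the integration tests. */
final class ItFixtureCheck {

    /** Lowercase hex digest of the UTF-8 bytes of the input for a JDK algorithm name. */
    static String hexDigest(String algorithm, String input) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm)
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    /** JDK algorithm name -> expected hex digest of 'Spark', copied from the test fixtures. */
    static Map<String, String> expectedForSpark() {
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("MD5", "8cde774d6f7333752ed72cacddb05126");
        expected.put("SHA-1", "85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c");
        expected.put("SHA-256", "529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b");
        return expected;
    }

    /** True when every fixture constant matches a freshly computed digest. */
    static boolean allMatch() {
        for (Map.Entry<String, String> e : expectedForSpark().entrySet()) {
            if (!e.getValue().equals(hexDigest(e.getKey(), "Spark"))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("fixtures match: " + allMatch());
    }
}
```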
5 changes: 5 additions & 0 deletions ppl-spark-integration/src/main/antlr4/OpenSearchPPLLexer.g4
@@ -280,6 +280,11 @@ RADIANS: 'RADIANS';
SIN: 'SIN';
TAN: 'TAN';

// CRYPTOGRAPHIC FUNCTIONS
MD5: 'MD5';
SHA1: 'SHA1';
SHA2: 'SHA2';

// DATE AND TIME FUNCTIONS
ADDDATE: 'ADDDATE';
ADDTIME: 'ADDTIME';
8 changes: 8 additions & 0 deletions ppl-spark-integration/src/main/antlr4/OpenSearchPPLParser.g4
@@ -508,6 +508,7 @@ evalFunctionName
| systemFunctionName
| positionFunctionName
| coalesceFunctionName
| cryptographicFunctionName
;

functionArgs
@@ -623,6 +624,12 @@ trigonometricFunctionName
| TAN
;

cryptographicFunctionName
: MD5
| SHA1
| SHA2
;

dateTimeFunctionName
: ADDDATE
| ADDTIME
@@ -954,6 +961,7 @@ keywordsCanBeId
| textFunctionName
| mathematicalFunctionName
| positionFunctionName
| cryptographicFunctionName
// commands
| SEARCH
| DESCRIBE
@@ -52,6 +52,11 @@ public enum BuiltinFunctionName {
SIN(FunctionName.of("sin")),
TAN(FunctionName.of("tan")),

/** Cryptographic Functions. */
MD5(FunctionName.of("md5")),
SHA1(FunctionName.of("sha1")),
SHA2(FunctionName.of("sha2")),

/** Date and Time Functions. */
ADDDATE(FunctionName.of("adddate")),
// ADDTIME(FunctionName.of("addtime")),
@@ -13,30 +13,7 @@
import java.util.List;
import java.util.Map;

import static org.opensearch.sql.expression.function.BuiltinFunctionName.ADD;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.ADDDATE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DATEDIFF;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DAY_OF_MONTH;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.COALESCE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SUBTRACT;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MULTIPLY;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DIVIDE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MODULUS;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DAY_OF_WEEK;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DAY_OF_YEAR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.HOUR_OF_DAY;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.IS_NOT_NULL;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.IS_NULL;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.LENGTH;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.LOCALTIME;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MINUTE_OF_HOUR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MONTH_OF_YEAR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SECOND_OF_MINUTE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SUBDATE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SYSDATE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.TRIM;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.WEEK;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.WEEK_OF_YEAR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.*;
import static org.opensearch.sql.ppl.utils.DataTypeTransformer.seq;
import static scala.Option.empty;

@@ -68,6 +45,10 @@ public interface BuiltinFunctionTranslator {
.put(DATEDIFF, "datediff")
.put(LOCALTIME, "localtimestamp")
.put(SYSDATE, "now")
// Cryptographic functions
.put(MD5, "md5")
.put(SHA1, "sha1")
.put(SHA2, "sha2")
// condition functions
.put(IS_NULL, "isnull")
.put(IS_NOT_NULL, "isnotnull")
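
The translator entries above only rename PPL functions to their Spark SQL counterparts (for example `SYSDATE` to `now`); for the cryptographic functions the names are identical. The following Java sketch is a hypothetical, trimmed-down model of such a lookup: it uses a stand-in enum and a plain `EnumMap` instead of the real `BuiltinFunctionName` enum and the builder-style `.put(...)` chain shown in the diff, and returning `Optional.empty()` for unmapped functions is an assumption of the sketch, not documented behavior of `BuiltinFunctionTranslator`.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, trimmed-down model of a PPL-to-Spark function-name lookup. */
final class FunctionNameMappingSketch {

    /** Stand-in for a few BuiltinFunctionName constants (not the real enum). */
    enum PplFunction { MD5, SHA1, SHA2, SYSDATE, TAN }

    private static final Map<PplFunction, String> SPARK_NAMES = new EnumMap<>(PplFunction.class);

    static {
        SPARK_NAMES.put(PplFunction.SYSDATE, "now"); // renamed for Spark
        // Cryptographic functions keep their names in Spark SQL.
        SPARK_NAMES.put(PplFunction.MD5, "md5");
        SPARK_NAMES.put(PplFunction.SHA1, "sha1");
        SPARK_NAMES.put(PplFunction.SHA2, "sha2");
        // TAN is deliberately left unmapped in this sketch.
    }

    /** Resolves the Spark SQL function name for a PPL function, if a mapping exists. */
    static Optional<String> sparkName(PplFunction function) {
        return Optional.ofNullable(SPARK_NAMES.get(function));
    }

    public static void main(String[] args) {
        System.out.println(sparkName(PplFunction.SHA2).orElse("<unmapped>")); // prints sha2
    }
}
```

An `EnumMap` is a natural fit for this kind of table because its keys are a closed set of enum constants and lookups are array-indexed.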
@@ -0,0 +1,69 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.flint.spark.ppl

import org.opensearch.flint.spark.ppl.PlaneUtils.plan
import org.opensearch.sql.ppl.{CatalystPlanContext, CatalystQueryPlanVisitor}
import org.opensearch.sql.ppl.utils.DataTypeTransformer.seq
import org.scalatest.matchers.should.Matchers

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedFunction, UnresolvedRelation, UnresolvedStar}
import org.apache.spark.sql.catalyst.expressions.{Alias, EqualTo, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, Literal, Not}
import org.apache.spark.sql.catalyst.plans.PlanTest
import org.apache.spark.sql.catalyst.plans.logical.{Filter, Project}

class PPLLogicalPlanCryptographicFunctionsTranslatorTestSuite
    extends SparkFunSuite
    with PlanTest
    with LogicalPlanTestUtils
    with Matchers {

  private val planTransformer = new CatalystQueryPlanVisitor()
  private val pplParser = new PPLSyntaxParser()

  test("test md5") {
    val context = new CatalystPlanContext
    val logPlan = planTransformer.visit(plan(pplParser, "source=t a = md5(b)"), context)

    val table = UnresolvedRelation(Seq("t"))
    val filterExpr = EqualTo(
      UnresolvedAttribute("a"),
      UnresolvedFunction("md5", seq(UnresolvedAttribute("b")), isDistinct = false))
    val filterPlan = Filter(filterExpr, table)
    val projectList = Seq(UnresolvedStar(None))
    val expectedPlan = Project(projectList, filterPlan)
    comparePlans(expectedPlan, logPlan, false)
  }

  test("test sha1") {
    val context = new CatalystPlanContext
    val logPlan = planTransformer.visit(plan(pplParser, "source=t a = sha1(b)"), context)

    val table = UnresolvedRelation(Seq("t"))
    val filterExpr = EqualTo(
      UnresolvedAttribute("a"),
      UnresolvedFunction("sha1", seq(UnresolvedAttribute("b")), isDistinct = false))
    val filterPlan = Filter(filterExpr, table)
    val projectList = Seq(UnresolvedStar(None))
    val expectedPlan = Project(projectList, filterPlan)
    comparePlans(expectedPlan, logPlan, false)
  }

  test("test sha2") {
    val context = new CatalystPlanContext
    val logPlan = planTransformer.visit(plan(pplParser, "source=t a = sha2(b,256)"), context)

    val table = UnresolvedRelation(Seq("t"))
    val filterExpr = EqualTo(
      UnresolvedAttribute("a"),
      UnresolvedFunction("sha2", seq(UnresolvedAttribute("b"), Literal(256)), isDistinct = false))
    val filterPlan = Filter(filterExpr, table)
    val projectList = Seq(UnresolvedStar(None))
    val expectedPlan = Project(projectList, filterPlan)
    comparePlans(expectedPlan, logPlan, false)
  }
}
