1.0.6.1-GA (#18)
* feat: add connector config and connector stats update functions

* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs

* feat: added descriptions for default configurations

* feat: added descriptions for default configurations

* feat: modified kafka connector input topic

* feat: obsrv setup instructions

* feat: revisiting open source features

* feat: masterdata processor job config

* Build deploy v2 (#19)

* #0 - Refactor Dockerfile and Github actions workflow
---------

Co-authored-by: Santhosh Vasabhaktula <[email protected]>
Co-authored-by: ManojCKrishna <[email protected]>

* Update DatasetModels.scala

* Release 1.3.0 into Main branch (#34)

* testing new images

* testing new images

* testing new images

* testing new images

* testing new images

* build new image with bug fixes

* update dockerfile

* update dockerfile

* #0 fix: upgrade packages

* #0 feat: add flink dockerfiles

* #0 fix: add individual extraction

* Issue #0 fix: upgrade ubuntu packages for vulnerabilities

* #0 fix: update github actions release condition

---------

Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>

* Update DatasetModels.scala

* Issue #2 feat: Remove kafka connector code

* feat: add function to get all datasets

* Release 1.3.1 into Main (#43)

* testing new images

* testing new images

* testing new images

* testing new images

* testing new images

* build new image with bug fixes

* update dockerfile

* update dockerfile

* #0 fix: upgrade packages

* #0 feat: add flink dockerfiles

* feat: update all failed, invalid and duplicate topic names

* feat: update kafka topic names in test cases

* #0 fix: add individual extraction

* feat: update failed event

* Update ErrorConstants.scala

* feat: update failed event

* Issue #0 fix: upgrade ubuntu packages for vulnerabilities

* feat: add exception handling for json deserialization

* Update BaseProcessFunction.scala

* Update BaseProcessFunction.scala

* feat: update batch failed event generation

* Update ExtractionFunction.scala

* feat: update invalid json exception handling

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 fix: remove cloning object

* Issue #46 feat: update batch failed event

* #0 fix: update github actions release condition

* Issue #46 feat: add error reasons

* Issue #46 feat: add exception stack trace

* Issue #46 feat: add exception stack trace

* Release 1.3.1 Changes (#42)

* Dataset enhancements (#38)

* feat: add connector config and connector stats update functions
* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs
* Update DatasetModels.scala
* #0 fix: upgrade packages
* #0 feat: add flink dockerfiles
* #0 fix: add individual extraction

---------

Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>

* #0000 [SV] - Fall back to a local redis instance if embedded redis does not start

* Update DatasetModels.scala

* #0000 - refactor the denormalization logic
1. Do not fail the denormalization if the denorm key is missing
2. Add a clear message stating whether the denorm was successful, failed or partially successful
3. Handle denorm for both text and number fields
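
The three denormalization rules above can be illustrated with a minimal Scala sketch. It is not the project's code: `DenormSketch`, `DenormField`, `DenormResult`, the status names and the `lookup` function (standing in for the master-data cache) are all hypothetical, and the event is modelled as a plain `Map[String, Any]`.

```scala
// Hypothetical names throughout; a sketch of the rules, not the project's API.
object DenormSketch {

  sealed trait DenormStatus
  case object Success extends DenormStatus
  case object Partial extends DenormStatus
  case object Failed  extends DenormStatus

  // One rule: read `keyField` from the event, look it up, store the result under `outField`.
  final case class DenormField(keyField: String, outField: String)

  final case class DenormResult(event: Map[String, Any], status: DenormStatus, message: String)

  // Accept both text and numeric keys; anything else (or a blank string) counts as missing.
  private def lookupKey(value: Any): Option[String] = value match {
    case s: String if s.trim.nonEmpty => Some(s.trim)
    case n: Number                    => Some(n.toString)
    case _                            => None
  }

  def denormalize(
      event: Map[String, Any],
      fields: Seq[DenormField],
      lookup: String => Option[Map[String, Any]]
  ): DenormResult = {
    // A missing key never throws: the field is recorded as unresolved and processing continues.
    val (enriched, unresolved) = fields.foldLeft((event, Vector.empty[String])) {
      case ((acc, failed), field) =>
        event.get(field.keyField).flatMap(lookupKey).flatMap(lookup) match {
          case Some(data) => (acc + (field.outField -> data), failed)
          case None       => (acc, failed :+ field.keyField)
        }
    }
    val status: DenormStatus =
      if (unresolved.isEmpty) Success
      else if (unresolved.size == fields.size) Failed
      else Partial
    val message = status match {
      case Success => "denorm successful"
      case Partial => "denorm partially successful; unresolved keys: " + unresolved.mkString(", ")
      case Failed  => "denorm failed; unresolved keys: " + unresolved.mkString(", ")
    }
    DenormResult(enriched, status, message)
  }
}
```

In this sketch an event with one resolvable key out of two comes back enriched with status `Partial` rather than being rejected, which is one way to read "do not fail the denormalization".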

* #0000 - refactor:
1. Created an enum for dataset status; events are ignored if the dataset is not in Live status
2. Created an OutputTag for denorm failed stats
3. Parse event validation failure messages into a case class
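
As a rough illustration of the status check in item 1, the sketch below filters events by dataset status. Only `Live` is named in the commit message; the other enum values, the `Dataset` case class and `shouldProcess` are assumptions made for the example.

```scala
// Hypothetical sketch: only `Live` comes from the commit message.
object DatasetStatusSketch {

  object DatasetStatus extends Enumeration {
    // `Draft` and `Retired` are placeholder values for illustration.
    val Draft, Live, Retired = Value
  }

  final case class Dataset(id: String, status: DatasetStatus.Value)

  // An event is processed only when its dataset is known and currently Live.
  def shouldProcess(datasetId: String, registry: Map[String, Dataset]): Boolean =
    registry.get(datasetId).exists(_.status == DatasetStatus.Live)
}
```

Treating an unknown dataset the same as a non-Live one is a design choice of this sketch; the commit message does not say how unknown datasets are handled.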

* #0000 - refactor:
1. Updated the DruidRouter job to publish data to router topics dynamically
2. Updated the framework to create a dynamicKafkaSink object

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Added validation to check that the event has a timestamp key and that its value is neither blank nor invalid
2. Added timezone handling to store the data in druid in the time zone specified by the dataset
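
The timestamp checks and time zone conversion described above might look roughly like the following. This is a sketch under stated assumptions: `TimestampSketch` and `resolveTimestamp` are hypothetical names, the event is a plain `Map[String, Any]`, and timestamps are assumed to arrive either as epoch milliseconds or as ISO-8601 instants. Only `java.time` from the standard library is used.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical sketch; the accepted input formats are an assumption.
object TimestampSketch {

  // Numbers are read as epoch milliseconds, strings as ISO-8601 instants.
  private def parseInstant(raw: Any): Option[Instant] = raw match {
    case n: Number => Some(Instant.ofEpochMilli(n.longValue()))
    case s: String => Try(Instant.parse(s.trim)).toOption
    case _         => None
  }

  // Left carries the rejection reason; Right carries the timestamp in the dataset's zone.
  def resolveTimestamp(event: Map[String, Any], tsKey: String, datasetTz: String): Either[String, String] =
    event.get(tsKey) match {
      case None =>
        Left(s"timestamp key '$tsKey' is missing")
      case Some(s: String) if s.trim.isEmpty =>
        Left(s"timestamp key '$tsKey' is blank")
      case Some(raw) =>
        for {
          instant <- parseInstant(raw).toRight(s"timestamp value under '$tsKey' is invalid")
          zone    <- Try(ZoneId.of(datasetTz)).toOption.toRight(s"unknown time zone '$datasetTz'")
        } yield DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(instant.atZone(zone))
    }
}
```

Returning an `Either` keeps the three rejection cases (missing, blank, invalid) distinguishable, so a caller could map each to its own failure reason.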


* #0000 - minor refactoring: Updated DatasetRegistry.getDatasetSourceConfig to getAllDatasetSourceConfig

* #0000 - mega refactoring: Refactored logs, error messages and metrics

* #0000 - mega refactoring: Fix unit tests

* #0000 - refactoring:
1. Introduced transformation mode to enable lenient transformations
2. Proper exception handling for transformer job

* #0000 - refactoring: Fix test cases and code

* #0000 - refactoring: upgrade embedded redis to work with macos sonoma m2

* #0000 - refactoring: Denormalizer test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Router test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Validator test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Framework test cases and bug fixes

* #0000 - refactoring: kafka connector test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: improve code coverage and fix bugs

* #0000 - refactoring: improve code coverage and fix bugs --- Now the code coverage is 100%

* #0000 - refactoring: organize imports

* #0000 - refactoring:
1. transformer test cases and bug fixes - code coverage is 100%

* #0000 - refactoring: test cases and bug fixes

---------

Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: Manjunath Davanam <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>

* #000:feat: Removed the provided scope of the kafka-client in the framework (#40)

* #0000 - feat: Add dataset-type to system events (#41)

* #0000 - feat: Add dataset-type to system events

* #0000 - feat: Modify tests for dataset-type in system events

* #0000 - feat: Remove unused getDatasetType function

* #0000 - feat: Remove unused pom test dependencies

* #0000 - feat: Remove unused pom test dependencies

---------

Co-authored-by: Santhosh <[email protected]>
Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>

* Main conflicts fixes (#44)

* feat: add connector config and connector stats update functions

* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs

* Update DatasetModels.scala

* Release 1.3.0 into Main branch (#34)

* testing new images

* testing new images

* testing new images

* testing new images

* testing new images

* build new image with bug fixes

* update dockerfile

* update dockerfile

* #0 fix: upgrade packages

* #0 feat: add flink dockerfiles

* #0 fix: add individual extraction

* Issue #0 fix: upgrade ubuntu packages for vulnerabilities

* #0 fix: update github actions release condition

---------

Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>

* Update DatasetModels.scala

* Issue #2 feat: Remove kafka connector code

* feat: add function to get all datasets

* #000:feat: Resolve conflicts

---------

Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Santhosh <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>
Co-authored-by: Ravi Mula <[email protected]>

---------

Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Manjunath Davanam <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Santhosh <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>
Co-authored-by: Ravi Mula <[email protected]>

* update workflow file to skip tests (#45)

* Release 1.3.1 into Main (#49)

* testing new images

* testing new images

* testing new images

* testing new images

* testing new images

* build new image with bug fixes

* update dockerfile

* update dockerfile

* #0 fix: upgrade packages

* #0 feat: add flink dockerfiles

* feat: update all failed, invalid and duplicate topic names

* feat: update kafka topic names in test cases

* #0 fix: add individual extraction

* feat: update failed event

* Update ErrorConstants.scala

* feat: update failed event

* Issue #0 fix: upgrade ubuntu packages for vulnerabilities

* feat: add exception handling for json deserialization

* Update BaseProcessFunction.scala

* Update BaseProcessFunction.scala

* feat: update batch failed event generation

* Update ExtractionFunction.scala

* feat: update invalid json exception handling

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 fix: remove cloning object

* Issue #46 feat: update batch failed event

* #0 fix: update github actions release condition

* Issue #46 feat: add error reasons

* Issue #46 feat: add exception stack trace

* Issue #46 feat: add exception stack trace

* Release 1.3.1 Changes (#42)

* Dataset enhancements (#38)

* feat: add connector config and connector stats update functions
* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs
* Update DatasetModels.scala
* #0 fix: upgrade packages
* #0 feat: add flink dockerfiles
* #0 fix: add individual extraction

---------

Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>

* #0000 [SV] - Fall back to a local redis instance if embedded redis does not start

* Update DatasetModels.scala

* #0000 - refactor the denormalization logic
1. Do not fail the denormalization if the denorm key is missing
2. Add a clear message stating whether the denorm was successful, failed or partially successful
3. Handle denorm for both text and number fields

* #0000 - refactor:
1. Created an enum for dataset status; events are ignored if the dataset is not in Live status
2. Created an OutputTag for denorm failed stats
3. Parse event validation failure messages into a case class

* #0000 - refactor:
1. Updated the DruidRouter job to publish data to router topics dynamically
2. Updated the framework to create a dynamicKafkaSink object

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Added validation to check that the event has a timestamp key and that its value is neither blank nor invalid
2. Added timezone handling to store the data in druid in the time zone specified by the dataset


* #0000 - minor refactoring: Updated DatasetRegistry.getDatasetSourceConfig to getAllDatasetSourceConfig

* #0000 - mega refactoring: Refactored logs, error messages and metrics

* #0000 - mega refactoring: Fix unit tests

* #0000 - refactoring:
1. Introduced transformation mode to enable lenient transformations
2. Proper exception handling for transformer job

* #0000 - refactoring: Fix test cases and code

* #0000 - refactoring: upgrade embedded redis to work with macos sonoma m2

* #0000 - refactoring: Denormalizer test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Router test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Validator test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Framework test cases and bug fixes

* #0000 - refactoring: kafka connector test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: improve code coverage and fix bugs

* #0000 - refactoring: improve code coverage and fix bugs --- Now the code coverage is 100%

* #0000 - refactoring: organize imports

* #0000 - refactoring:
1. transformer test cases and bug fixes - code coverage is 100%

* #0000 - refactoring: test cases and bug fixes

---------

Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: Manjunath Davanam <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>

* #000:feat: Removed the provided scope of the kafka-client in the framework (#40)

* #0000 - feat: Add dataset-type to system events (#41)

* #0000 - feat: Add dataset-type to system events

* #0000 - feat: Modify tests for dataset-type in system events

* #0000 - feat: Remove unused getDatasetType function

* #0000 - feat: Remove unused pom test dependencies

* #0000 - feat: Remove unused pom test dependencies

---------

Co-authored-by: Santhosh <[email protected]>
Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>

* Main conflicts fixes (#44)

* feat: add connector config and connector stats update functions

* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs

* Update DatasetModels.scala

* Release 1.3.0 into Main branch (#34)

* testing new images

* testing new images

* testing new images

* testing new images

* testing new images

* build new image with bug fixes

* update dockerfile

* update dockerfile

* #0 fix: upgrade packages

* #0 feat: add flink dockerfiles

* #0 fix: add individual extraction

* Issue #0 fix: upgrade ubuntu packages for vulnerabilities

* #0 fix: update github actions release condition

---------

Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>

* Update DatasetModels.scala

* Issue #2 feat: Remove kafka connector code

* feat: add function to get all datasets

* #000:feat: Resolve conflicts

---------

Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Santhosh <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>
Co-authored-by: Ravi Mula <[email protected]>

* #0000 - fix: Fix null dataset_type in DruidRouterFunction (#48)

---------

Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Santhosh <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>
Co-authored-by: Ravi Mula <[email protected]>

* Develop to Release-1.0.0-GA (#52) (#53)

* testing new images

* testing new images

* testing new images

* testing new images

* testing new images

* build new image with bug fixes

* update dockerfile

* update dockerfile

* #0 fix: upgrade packages

* #0 feat: add flink dockerfiles

* feat: update all failed, invalid and duplicate topic names

* feat: update kafka topic names in test cases

* #0 fix: add individual extraction

* feat: update failed event

* Update ErrorConstants.scala

* feat: update failed event

* Issue #0 fix: upgrade ubuntu packages for vulnerabilities

* feat: add exception handling for json deserialization

* Update BaseProcessFunction.scala

* Update BaseProcessFunction.scala

* feat: update batch failed event generation

* Update ExtractionFunction.scala

* feat: update invalid json exception handling

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 fix: remove cloning object

* Issue #46 feat: update batch failed event

* #0 fix: update github actions release condition

* Issue #46 feat: add error reasons

* Issue #46 feat: add exception stack trace

* Issue #46 feat: add exception stack trace

* Dataset enhancements (#38)

* feat: add connector config and connector stats update functions
* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs
* Update DatasetModels.scala
* #0 fix: upgrade packages
* #0 feat: add flink dockerfiles
* #0 fix: add individual extraction

---------

* #0000 [SV] - Fall back to a local redis instance if embedded redis does not start

* Update DatasetModels.scala

* #0000 - refactor the denormalization logic
1. Do not fail the denormalization if the denorm key is missing
2. Add a clear message stating whether the denorm was successful, failed or partially successful
3. Handle denorm for both text and number fields

* #0000 - refactor:
1. Created an enum for dataset status; events are ignored if the dataset is not in Live status
2. Created an OutputTag for denorm failed stats
3. Parse event validation failure messages into a case class

* #0000 - refactor:
1. Updated the DruidRouter job to publish data to router topics dynamically
2. Updated the framework to create a dynamicKafkaSink object

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Added validation to check that the event has a timestamp key and that its value is neither blank nor invalid
2. Added timezone handling to store the data in druid in the time zone specified by the dataset


* #0000 - minor refactoring: Updated DatasetRegistry.getDatasetSourceConfig to getAllDatasetSourceConfig

* #0000 - mega refactoring: Refactored logs, error messages and metrics

* #0000 - mega refactoring: Fix unit tests

* #0000 - refactoring:
1. Introduced transformation mode to enable lenient transformations
2. Proper exception handling for transformer job

* #0000 - refactoring: Fix test cases and code

* #0000 - refactoring: upgrade embedded redis to work with macos sonoma m2

* #0000 - refactoring: Denormalizer test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Router test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Validator test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Framework test cases and bug fixes

* #0000 - refactoring: kafka connector test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: improve code coverage and fix bugs

* #0000 - refactoring: improve code coverage and fix bugs --- Now the code coverage is 100%

* #0000 - refactoring: organize imports

* #0000 - refactoring:
1. transformer test cases and bug fixes - code coverage is 100%

* #0000 - refactoring: test cases and bug fixes

---------

* #000:feat: Removed the provided scope of the kafka-client in the framework (#40)

* #0000 - feat: Add dataset-type to system events (#41)

* #0000 - feat: Add dataset-type to system events

* #0000 - feat: Modify tests for dataset-type in system events

* #0000 - feat: Remove unused getDatasetType function

* #0000 - feat: Remove unused pom test dependencies

* #0000 - feat: Remove unused pom test dependencies

* #67 feat: query system configurations from meta store

* #67 fix: Refactor system configuration retrieval and update dynamic router function

* #67 fix: update system config according to review

* #67 fix: update test cases for system config

* #67 fix: update default values in test cases

* #67 fix: add get all system settings method and update test cases

* #67 fix: add test case for covering exception case

* #67 fix: fix data types in test cases

* #67 fix: Refactor event indexing in DynamicRouterFunction

* Issue #67 refactor: SystemConfig read from DB implementation

* #226 fix: update test cases according to the refactor

---------

Co-authored-by: Manjunath Davanam <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Santhosh <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>

* Develop to 1.0.1-GA (#59) (#60)

* testing new images

* testing new images

* testing new images

* testing new images

* testing new images

* build new image with bug fixes

* update dockerfile

* update dockerfile

* #0 fix: upgrade packages

* #0 feat: add flink dockerfiles

* feat: update all failed, invalid and duplicate topic names

* feat: update kafka topic names in test cases

* #0 fix: add individual extraction

* feat: update failed event

* Update ErrorConstants.scala

* feat: update failed event

* Issue #0 fix: upgrade ubuntu packages for vulnerabilities

* feat: add exception handling for json deserialization

* Update BaseProcessFunction.scala

* Update BaseProcessFunction.scala

* feat: update batch failed event generation

* Update ExtractionFunction.scala

* feat: update invalid json exception handling

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 fix: remove cloning object

* Issue #46 feat: update batch failed event

* #0 fix: update github actions release condition

* Issue #46 feat: add error reasons

* Issue #46 feat: add exception stack trace

* Issue #46 feat: add exception stack trace

* Dataset enhancements (#38)

* feat: add connector config and connector stats update functions
* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs
* Update DatasetModels.scala
* #0 fix: upgrade packages
* #0 feat: add flink dockerfiles
* #0 fix: add individual extraction

---------

* #0000 [SV] - Fall back to a local redis instance if embedded redis does not start

* Update DatasetModels.scala

* #0000 - refactor the denormalization logic
1. Do not fail the denormalization if the denorm key is missing
2. Add a clear message stating whether the denorm was successful, failed or partially successful
3. Handle denorm for both text and number fields

* #0000 - refactor:
1. Created an enum for dataset status; events are ignored if the dataset is not in Live status
2. Created an OutputTag for denorm failed stats
3. Parse event validation failure messages into a case class

* #0000 - refactor:
1. Updated the DruidRouter job to publish data to router topics dynamically
2. Updated the framework to create a dynamicKafkaSink object

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Added validation to check that the event has a timestamp key and that its value is neither blank nor invalid
2. Added timezone handling to store the data in druid in the time zone specified by the dataset


* #0000 - minor refactoring: Updated DatasetRegistry.getDatasetSourceConfig to getAllDatasetSourceConfig

* #0000 - mega refactoring: Refactored logs, error messages and metrics

* #0000 - mega refactoring: Fix unit tests

* #0000 - refactoring:
1. Introduced transformation mode to enable lenient transformations
2. Proper exception handling for transformer job

* #0000 - refactoring: Fix test cases and code

* #0000 - refactoring: upgrade embedded redis to work with macos sonoma m2

* #0000 - refactoring: Denormalizer test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Router test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Validator test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Framework test cases and bug fixes

* #0000 - refactoring: kafka connector test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: improve code coverage and fix bugs

* #0000 - refactoring: improve code coverage and fix bugs --- Now the code coverage is 100%

* #0000 - refactoring: organize imports

* #0000 - refactoring:
1. transformer test cases and bug fixes - code coverage is 100%

* #0000 - refactoring: test cases and bug fixes

---------

* #000:feat: Removed the provided scope of the kafka-client in the framework (#40)

* #0000 - feat: Add dataset-type to system events (#41)

* #0000 - feat: Add dataset-type to system events

* #0000 - feat: Modify tests for dataset-type in system events

* #0000 - feat: Remove unused getDatasetType function

* #0000 - feat: Remove unused pom test dependencies

* #0000 - feat: Remove unused pom test dependencies

* #67 feat: query system configurations from meta store

* #67 fix: Refactor system configuration retrieval and update dynamic router function

* #67 fix: update system config according to review

* #67 fix: update test cases for system config

* #67 fix: update default values in test cases

* #67 fix: add get all system settings method and update test cases

* #67 fix: add test case for covering exception case

* #67 fix: fix data types in test cases

* #67 fix: Refactor event indexing in DynamicRouterFunction

* Issue #67 refactor: SystemConfig read from DB implementation

* #226 fix: update test cases according to the refactor

* Dataset Registry Update (#57)

* Issue #0000: feat: updateConnectorStats method includes last run timestamp

* Issue #0000: fix: updateConnectorStats sql query updated

* Issue #0000: fix: updateConnectorStats sql query updated

---------

Co-authored-by: Manjunath Davanam <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Santhosh <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>
Co-authored-by: Shreyas Bhaktharam <[email protected]>

* Develop to 1.0.2-GA (#65) (#66)

* testing new images

* testing new images

* testing new images

* testing new images

* testing new images

* build new image with bug fixes

* update dockerfile

* update dockerfile

* #0 fix: upgrade packages

* #0 feat: add flink dockerfiles

* feat: update all failed, invalid and duplicate topic names

* feat: update kafka topic names in test cases

* #0 fix: add individual extraction

* feat: update failed event

* Update ErrorConstants.scala

* feat: update failed event

* Issue #0 fix: upgrade ubuntu packages for vulnerabilities

* feat: add exception handling for json deserialization

* Update BaseProcessFunction.scala

* Update BaseProcessFunction.scala

* feat: update batch failed event generation

* Update ExtractionFunction.scala

* feat: update invalid json exception handling

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 feat: update batch failed event

* Issue #46 fix: remove cloning object

* Issue #46 feat: update batch failed event

* #0 fix: update github actions release condition

* Issue #46 feat: add error reasons

* Issue #46 feat: add exception stack trace

* Issue #46 feat: add exception stack trace

* Dataset enhancements (#38)

* feat: add connector config and connector stats update functions
* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs
* Update DatasetModels.scala
* #0 fix: upgrade packages
* #0 feat: add flink dockerfiles
* #0 fix: add individual extraction

---------

* #0000 [SV] - Fallback to local redis instance if embedded redis is not starting

* Update DatasetModels.scala

* #0000 - refactor the denormalization logic
1. Do not fail the denormalization if the denorm key is missing
2. Add clear message whether the denorm is successful or failed or partially successful
3. Handle denorm for both text and number fields

* #0000 - refactor:
1. Created an enum for dataset status and ignore events if the dataset is not in Live status
2. Created an outputtag for denorm failed stats
3. Parse event validation failed messages into a case class

* #0000 - refactor:
1. Updated the DruidRouter job to publish data to router topics dynamically
2. Updated framework to create dynamicKafkaSink object

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
3. Refactored serde - merged map and string serialization into one function and parameterized the function
4. Moved failed events sinking into a common base class
5. Master dataset processor can now do denormalization with another master dataset as well

* #0000 - mega refactoring:
1. Added validation to check if the event has a timestamp key and it is not blank nor invalid
2. Added timezone handling to store the data in druid in the TZ specified by the dataset


* #0000 - minor refactoring: Updated DatasetRegistry.getDatasetSourceConfig to getAllDatasetSourceConfig

* #0000 - mega refactoring: Refactored logs, error messages and metrics

* #0000 - mega refactoring: Fix unit tests

* #0000 - refactoring:
1. Introduced transformation mode to enable lenient transformations
2. Proper exception handling for transformer job

* #0000 - refactoring: Fix test cases and code

* #0000 - refactoring: upgrade embedded redis to work with macos sonoma m2

* #0000 - refactoring: Denormalizer test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Router test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Validator test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: Framework test cases and bug fixes

* #0000 - refactoring: kafka connector test cases and bug fixes. Code coverage is 100% now

* #0000 - refactoring: improve code coverage and fix bugs

* #0000 - refactoring: improve code coverage and fix bugs --- Now the code coverage is 100%

* #0000 - refactoring: organize imports

* #0000 - refactoring:
1. transformer test cases and bug fixes - code coverage is 100%

* #0000 - refactoring: test cases and bug fixes

---------

* #000:feat: Removed the provided scope of the kafka-client in the framework (#40)

* #0000 - feat: Add dataset-type to system events (#41)

* #0000 - feat: Add dataset-type to system events

* #0000 - feat: Modify tests for dataset-type in system events

* #0000 - feat: Remove unused getDatasetType function

* #0000 - feat: Remove unused pom test dependencies

* #0000 - feat: Remove unused pom test dependencies

* #67 feat: query system configurations from meta store

* #67 fix: Refactor system configuration retrieval and update dynamic router function

* #67 fix: update system config according to review

* #67 fix: update test cases for system config

* #67 fix: update default values in test cases

* #67 fix: add get all system settings method and update test cases

* #67 fix: add test case for covering exception case

* #67 fix: fix data types in test cases

* #67 fix: Refactor event indexing in DynamicRouterFunction

* Issue #67 refactor: SystemConfig read from DB implementation

* #226 fix: update test cases according to the refactor

* Dataset Registry Update (#57)

* Issue #0000: feat: updateConnectorStats method includes last run timestamp

* Issue #0000: fix: updateConnectorStats sql query updated

* Issue #0000: fix: updateConnectorStats sql query updated

* #0000 - fix: Fix Postgres connection issue with defaultDatasetID (#64)

* Metrics implementation for MasterDataIndexerJob (#55)

* Issue #50 fix: Kafka Metrics implementation for MasterDataIndexerJob

* Issue #50 fix: Changed 'ets' to UTC

* Issue #50 feat: added log statements

* Issue #50 fix: Fixed issue related to update query

* Issue #50 fix: Code refactoring

* Issue #50 fix: updated implementation of 'createDataFile' method

* Issue #50 fix: code refactoring

* Issue #50 test: Test cases for MasterDataIndexer

* Issue #50 test: test cases implementation

* Issue #50 test: Test case implementation for data-products

* Issue #50 test: Test cases

* Issue #50 test: test cases

* Issue #50 test: test cases for data-products

* Issue #50-fix: fixed jackson-databind issue

* Issue-#50-fix: code structure modifications

* Issue #50-fix: code refactoring

* Issue #50-fix: code refactoring

* Issue-#50-Fix: test case fixes

* Issue #50-fix: code formatting and code fixes

* feat #50 - refactor the implementation

* Issue-#50-fix: test cases fix

* modified README file

* revert readme file changes

* revert dataset-registry

* Issue-#50-fix: test cases fix

* Issue-#50-fix: adding missing tests

* Issue-#50-fix: refactoring code

* Issue-#50-fix: code fixes and code formatting

* fix #50: modified class declaration

* fix #50: code refactor

* fix #50: code refactor

* fix #50: test cases fixes

---------

* Remove kafka connector as it is moved to an independent repository

---------

Signed-off-by: SurabhiAngadi <[email protected]>
Co-authored-by: Manjunath Davanam <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Santhosh <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>
Co-authored-by: Shreyas Bhaktharam <[email protected]>
Co-authored-by: SurabhiAngadi <[email protected]>

* Release 1.0.3-GA (#72)

* Pipeline Bug fixes (#74)

* Sanketika-Obsrv/issue-tracker#106:fix: Fix postgres connection issue with dataset read and handle errors while parsing the message

* Sanketika-Obsrv/issue-tracker#107:fix: Denorm job fix to handle error when denorm field node contains an empty value

* Sanketika-Obsrv/issue-tracker#106:fix: Review comments fix - Changed the generic exception to actual exception (NullPointer)

* Pipeline Bug fixes (#74) (#77)

* fix: #0000: update datasourceRef only if dataset has records

* Sanketika-Obsrv/issue-tracker#180 fix: Datasource DB schema changes to include type. (#79)

Co-authored-by: sowmya-dixit <[email protected]>

* Hudi connector flink job implementation (#80)

* feat: Hudi Flink Implementation.
* feat: local working with metastore and localstack.
* #0000 - feat: Hudi Sink implementation
* #0000 - feat: Hudi Sink implementation
* #0000 - feat: Initialize dataset RowType during job startup
* refactor: Integrate hudi connector with dataset registry.
* refactor: Integrate hudi connector with dataset registry.
* Sanketika-Obsrv/issue-tracker#141 refactor: Enable timestamp based partition
* Sanketika-Obsrv/issue-tracker#141 refactor: Fix Hudi connector job to handle empty datasets list for lakehouse.
* Sanketika-Obsrv/issue-tracker#141 fix: Set Timestamp based partition configurations only if partition key is of timestamp type.
* Sanketika-Obsrv/issue-tracker#170 fix: Resolve timestamp based partition without using TimestampBasedAvroKeyGenerator.
* Sanketika-Obsrv/issue-tracker#177 fix: Lakehouse connector flink job fixes.
* Sanketika-Obsrv/issue-tracker#177 fix: Dockerfile changes for hudi-connector
* Sanketika-Obsrv/issue-tracker#177 fix: Lakehouse connector flink job fixes.
* Sanketika-Obsrv/issue-tracker#177 fix: remove unused code
* Sanketika-Obsrv/issue-tracker#177 fix: remove unused code
* Sanketika-Obsrv/issue-tracker#177 fix: remove unused code
* Sanketika-Obsrv/issue-tracker#177 fix: remove commented code

* Release 1.0.6-GA (#81)

* Pipeline Bug fixes (#74)

* Sanketika-Obsrv/issue-tracker#106:fix: Fix postgres connection issue with dataset read and handle errors while parsing the message

* Sanketika-Obsrv/issue-tracker#107:fix: Denorm job fix to handle error when denorm field node contains an empty value

* Sanketika-Obsrv/issue-tracker#106:fix: Review comments fix - Changed the generic exception to actual exception (NullPointer)

* fix: #0000: update datasourceRef only if dataset has records

* Sanketika-Obsrv/issue-tracker#180 fix: Datasource DB schema changes to include type. (#79)

Co-authored-by: sowmya-dixit <[email protected]>

* Hudi connector flink job implementation (#80)

* feat: Hudi Flink Implementation.
* feat: local working with metastore and localstack.
* #0000 - feat: Hudi Sink implementation
* #0000 - feat: Hudi Sink implementation
* #0000 - feat: Initialize dataset RowType during job startup
* refactor: Integrate hudi connector with dataset registry.
* refactor: Integrate hudi connector with dataset registry.
* Sanketika-Obsrv/issue-tracker#141 refactor: Enable timestamp based partition
* Sanketika-Obsrv/issue-tracker#141 refactor: Fix Hudi connector job to handle empty datasets list for lakehouse.
* Sanketika-Obsrv/issue-tracker#141 fix: Set Timestamp based partition configurations only if partition key is of timestamp type.
* Sanketika-Obsrv/issue-tracker#170 fix: Resolve timestamp based partition without using TimestampBasedAvroKeyGenerator.
* Sanketika-Obsrv/issue-tracker#177 fix: Lakehouse connector flink job fixes.
* Sanketika-Obsrv/issue-tracker#177 fix: Dockerfile changes for hudi-connector
* Sanketika-Obsrv/issue-tracker#177 fix: Lakehouse connector flink job fixes.
* Sanketika-Obsrv/issue-tracker#177 fix: remove unused code
* Sanketika-Obsrv/issue-tracker#177 fix: remove unused code
* Sanketika-Obsrv/issue-tracker#177 fix: remove unused code
* Sanketika-Obsrv/issue-tracker#177 fix: remove commented code

---------

Co-authored-by: Manjunath Davanam <[email protected]>
Co-authored-by: SurabhiAngadi <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: sowmya-dixit <[email protected]>

* Sanketika-Obsrv/issue-tracker#228 feat: updated github actions for lakehouse job

* # Issue:52655194 Feat - Add processingStartTime if missing during extractor job (#76)

Co-authored-by: Santhosh Vasabhaktula <[email protected]>

* Sanketika-Obsrv/issue-tracker#240 feat: lakehouse job changes to support retire workflow

* Sanketika-Obsrv/issue-tracker#240 feat: master data enhancements for lakehouse

* Migrating Raw SQL Statements to Prepared Statements (#87) (#88)

* #OBS-I148: Migration of SQL queries to prepared statement to avoid the SQL injection

* #OBS-I148: Migration of SQL queries to prepared statement to avoid the SQL injection

* #OBS-I148: Removed the unwanted imports

* #OBS-I148: System Config Changes - Converted from raw query to prepared statements

Co-authored-by: Ravi Mula <[email protected]>

---------

Signed-off-by: SurabhiAngadi <[email protected]>
Co-authored-by: shiva-rakshith <[email protected]>
Co-authored-by: Aniket Sakinala <[email protected]>
Co-authored-by: GayathriSrividya <[email protected]>
Co-authored-by: Manoj Krishna <[email protected]>
Co-authored-by: Santhosh Vasabhaktula <[email protected]>
Co-authored-by: ManojCKrishna <[email protected]>
Co-authored-by: ManojKrishnaChintaluri <[email protected]>
Co-authored-by: Praveen <[email protected]>
Co-authored-by: Sowmya N Dixit <[email protected]>
Co-authored-by: Anand Parthasarathy <[email protected]>
Co-authored-by: Ravi Mula <[email protected]>
Co-authored-by: Manoj Krishna <[email protected]>
Co-authored-by: Shreyas Bhaktharam <[email protected]>
Co-authored-by: SurabhiAngadi <[email protected]>
Co-authored-by: SurabhiAngadi <[email protected]>
Co-authored-by: sowmya-dixit <[email protected]>
Co-authored-by: GayathriSrividya <[email protected]>
Co-authored-by: GayathriSrividya <[email protected]>
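
The denormalization refactor described in the commit message (do not fail on a missing denorm key, report whether the denorm succeeded, failed, or partially succeeded, and accept both text and number keys) could look roughly like the sketch below. Every name in it (`DenormStatusSketch`, `denormKey`, `summarise`) is hypothetical and chosen for illustration; it is not the repository's implementation.

```scala
// Hypothetical sketch: none of these names come from the repository.
object DenormStatusSketch {

  sealed trait DenormStatus
  case object Success extends DenormStatus
  case object PartialSuccess extends DenormStatus
  case object Failed extends DenormStatus

  // Normalise a denorm key that may arrive as text or as a number.
  // A missing, null, or blank key yields None instead of throwing.
  def denormKey(event: Map[String, Any], field: String): Option[String] =
    event.get(field).flatMap {
      case s: String if s.trim.nonEmpty => Some(s.trim)
      case n: Number                    => Some(n.toString)
      case _                            => None
    }

  // Collapse per-field lookup results into one status.
  // An empty list counts as Success because there was nothing to denormalize.
  def summarise(lookups: Seq[Boolean]): DenormStatus =
    if (lookups.forall(identity)) Success
    else if (lookups.exists(identity)) PartialSuccess
    else Failed
}
```

With this shape, an event whose key is absent simply contributes a `false` lookup and the job can continue, rather than failing the whole denormalization.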
19 people authored Aug 20, 2024
1 parent fae940e commit 67fd310
Showing 31 changed files with 1,051 additions and 55 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/build_and_deploy.yaml
@@ -31,6 +31,8 @@ jobs:
target: "merged-image"
- image: "master-data-processor"
target: "master-data-processor-image"
- image: "lakehouse-connector"
target: "lakehouse-connector-image"
steps:
- uses: actions/checkout@v4
with:
19 changes: 10 additions & 9 deletions Dockerfile
@@ -8,34 +8,35 @@ COPY --from=build-core /root/.m2 /root/.m2
COPY . /app
RUN mvn clean package -DskipTests -f /app/pipeline/pom.xml

FROM --platform=linux/x86_64 sunbird/flink:1.15.2-scala_2.12-jdk-11 as extractor-image
FROM --platform=linux/x86_64 sanketikahub/flink:1.15.2-scala_2.12-jdk-11 as extractor-image
USER flink
COPY --from=build-pipeline /app/pipeline/extractor/target/extractor-1.0.0.jar $FLINK_HOME/lib/

FROM --platform=linux/x86_64 sunbird/flink:1.15.2-scala_2.12-jdk-11 as preprocessor-image
FROM --platform=linux/x86_64 sanketikahub/flink:1.15.2-scala_2.12-jdk-11 as preprocessor-image
USER flink
COPY --from=build-pipeline /app/pipeline/preprocessor/target/preprocessor-1.0.0.jar $FLINK_HOME/lib/

FROM --platform=linux/x86_64 sunbird/flink:1.15.2-scala_2.12-jdk-11 as denormalizer-image
FROM --platform=linux/x86_64 sanketikahub/flink:1.15.2-scala_2.12-jdk-11 as denormalizer-image
USER flink
COPY --from=build-pipeline /app/pipeline/denormalizer/target/denormalizer-1.0.0.jar $FLINK_HOME/lib/

FROM --platform=linux/x86_64 sunbird/flink:1.15.2-scala_2.12-jdk-11 as transformer-image
FROM --platform=linux/x86_64 sanketikahub/flink:1.15.2-scala_2.12-jdk-11 as transformer-image
USER flink
COPY --from=build-pipeline /app/pipeline/transformer/target/transformer-1.0.0.jar $FLINK_HOME/lib/

FROM --platform=linux/x86_64 sunbird/flink:1.15.2-scala_2.12-jdk-11 as router-image
FROM --platform=linux/x86_64 sanketikahub/flink:1.15.2-scala_2.12-jdk-11 as router-image
USER flink
COPY --from=build-pipeline /app/pipeline/druid-router/target/druid-router-1.0.0.jar $FLINK_HOME/lib/

FROM --platform=linux/x86_64 sunbird/flink:1.15.2-scala_2.12-jdk-11 as merged-image
FROM --platform=linux/x86_64 sanketikahub/flink:1.15.2-scala_2.12-jdk-11 as merged-image
USER flink
COPY --from=build-pipeline /app/pipeline/pipeline-merged/target/pipeline-merged-1.0.0.jar $FLINK_HOME/lib/

FROM --platform=linux/x86_64 sunbird/flink:1.15.2-scala_2.12-jdk-11 as master-data-processor-image
FROM --platform=linux/x86_64 sanketikahub/flink:1.15.2-scala_2.12-jdk-11 as master-data-processor-image
USER flink
COPY --from=build-pipeline /app/pipeline/master-data-processor/target/master-data-processor-1.0.0.jar $FLINK_HOME/lib

FROM --platform=linux/x86_64 sunbird/flink:1.15.2-scala_2.12-jdk-11 as kafka-connector-image
FROM --platform=linux/x86_64 sanketikahub/flink:1.15.0-scala_2.12-lakehouse as lakehouse-connector-image
USER flink
COPY --from=build-pipeline /app/pipeline/kafka-connector/target/kafka-connector-1.0.0.jar $FLINK_HOME/lib
RUN mkdir $FLINK_HOME/custom-lib
COPY ./pipeline/hudi-connector/target/hudi-connector-1.0.0.jar $FLINK_HOME/custom-lib
@@ -27,8 +27,8 @@ object MasterDataProcessorIndexer {
val ingestionSpec: String = updateIngestionSpec(datasource, paths.datasourceRef, paths.ingestionPath, config)
if (eventsCount > 0L) {
submitIngestionTask(dataset.id, ingestionSpec, config)
DatasetRegistry.updateDatasourceRef(datasource, paths.datasourceRef)
}
DatasetRegistry.updateDatasourceRef(datasource, paths.datasourceRef)
if (!datasource.datasourceRef.equals(paths.datasourceRef)) {
deleteDataSource(dataset.id, datasource.datasourceRef, config)
}
1 change: 1 addition & 0 deletions dataset-registry/src/main/resources/dataset-registry.sql
@@ -22,6 +22,7 @@ CREATE INDEX IF NOT EXISTS datasets_status ON datasets(status);
CREATE TABLE IF NOT EXISTS datasources (
datasource text PRIMARY KEY,
dataset_id text REFERENCES datasets (id),
type text NOT NULL,
ingestion_spec json NOT NULL,
datasource_ref text NOT NULL,
retention_period json,
@@ -71,7 +71,7 @@ object DatasetModels {
@JsonProperty("status") status: String, @JsonProperty("connector_stats") connectorStats: Option[ConnectorStats] = None)

case class DataSource(@JsonProperty("id") id: String, @JsonProperty("datasource") datasource: String, @JsonProperty("dataset_id") datasetId: String,
@JsonProperty("ingestion_spec") ingestionSpec: String, @JsonProperty("datasource_ref") datasourceRef: String)
@JsonProperty("type") `type`: String, @JsonProperty("status") status: String, @JsonProperty("ingestion_spec") ingestionSpec: String, @JsonProperty("datasource_ref") datasourceRef: String)

}

@@ -8,9 +8,9 @@ import scala.collection.mutable

object DatasetRegistry {

private val datasets: mutable.Map[String, Dataset] = mutable.Map[String, Dataset]()
lazy private val datasets: mutable.Map[String, Dataset] = mutable.Map[String, Dataset]()
datasets ++= DatasetRegistryService.readAllDatasets()
private val datasetTransformations: Map[String, List[DatasetTransformation]] = DatasetRegistryService.readAllDatasetTransformations()
lazy private val datasetTransformations: Map[String, List[DatasetTransformation]] = DatasetRegistryService.readAllDatasetTransformations()

def getAllDatasets(datasetType: String): List[Dataset] = {
val datasetList = DatasetRegistryService.readAllDatasets()
@@ -42,6 +42,11 @@ object DatasetRegistry {
DatasetRegistryService.readDatasources(datasetId)
}

def getAllDatasources(): List[DataSource] = {
val datasourceList = DatasetRegistryService.readAllDatasources()
datasourceList.getOrElse(List())
}

def getDataSetIds(datasetType: String): List[String] = {
datasets.filter(f => f._2.datasetType.equals(datasetType)).keySet.toList
}
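
The `lazy` modifier added to `datasets` and `datasetTransformations` defers each initializer until first access. One detail worth keeping in mind is that a statement in the enclosing body, such as `datasets ++= ...`, still runs at construction time and therefore forces the lazy value immediately. The sketch below illustrates both behaviours with a hypothetical `Registry` class and a `log` buffer that exist only for this example.

```scala
import scala.collection.mutable

// Hypothetical sketch: Registry and log are illustrative stand-ins.
object LazyValSketch {
  // Records the order in which the initializers actually run.
  val log: mutable.ListBuffer[String] = mutable.ListBuffer[String]()

  class Registry {
    lazy val transformations: Map[String, List[String]] = {
      log += "transformations loaded"
      Map("d1" -> List("mask"))
    }

    lazy val datasets: mutable.Map[String, String] = {
      log += "datasets created"
      mutable.Map[String, String]()
    }

    // Runs while the instance is being constructed, so `datasets`
    // is initialized here even though it is declared lazy.
    datasets ++= Map("d1" -> "Live")
  }
}
```

Constructing a `Registry` logs only "datasets created"; "transformations loaded" appears on the first read of `transformations` and is not repeated on later reads.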
@@ -6,7 +6,7 @@ import org.sunbird.obsrv.model.DatasetModels._
import org.sunbird.obsrv.model.{DatasetStatus, TransformMode}

import java.io.File
import java.sql.{ResultSet, Timestamp}
import java.sql.{PreparedStatement, ResultSet, Timestamp}

object DatasetRegistryService {
private val configFile = new File("/data/flink/conf/baseconfig.conf")
@@ -42,22 +42,28 @@ object DatasetRegistryService {
}

def readDataset(id: String): Option[Dataset] = {

val postgresConnect = new PostgresConnect(postgresConfig)
var preparedStatement: PreparedStatement = null
var resultSet: ResultSet = null
try {
val rs = postgresConnect.executeQuery(s"SELECT * FROM datasets where id='$id'")
if (rs.next()) {
Some(parseDataset(rs))
val query = "SELECT * FROM datasets WHERE id = ?"
preparedStatement = postgresConnect.prepareStatement(query)
preparedStatement.setString(1, id)
resultSet = postgresConnect.executeQuery(preparedStatement = preparedStatement)
if (resultSet.next()) {
Some(parseDataset(resultSet))
} else {
None
}
} finally {
if (resultSet != null) resultSet.close()
if (preparedStatement != null) preparedStatement.close()
postgresConnect.closeConnection()
}
}

def readAllDatasetSourceConfig(): Option[List[DatasetSourceConfig]] = {

def readAllDatasetSourceConfig(): Option[List[DatasetSourceConfig]] = {
val postgresConnect = new PostgresConnect(postgresConfig)
try {
val rs = postgresConnect.executeQuery("SELECT * FROM dataset_source_config")
@@ -70,16 +76,23 @@ object DatasetRegistryService {
}
}

def readDatasetSourceConfig(datasetId: String): Option[List[DatasetSourceConfig]] = {

def readDatasetSourceConfig(datasetId: String): Option[List[DatasetSourceConfig]] = {
val postgresConnect = new PostgresConnect(postgresConfig)
var preparedStatement: PreparedStatement = null
var resultSet: ResultSet = null
try {
val rs = postgresConnect.executeQuery(s"SELECT * FROM dataset_source_config where dataset_id='$datasetId'")
Option(Iterator.continually((rs, rs.next)).takeWhile(f => f._2).map(f => f._1).map(result => {
val query = "SELECT * FROM dataset_source_config WHERE dataset_id = ?"
preparedStatement = postgresConnect.prepareStatement(query)
preparedStatement.setString(1, datasetId)
resultSet = postgresConnect.executeQuery(preparedStatement = preparedStatement)
Option(Iterator.continually((resultSet, resultSet.next)).takeWhile(f => f._2).map(f => f._1).map(result => {
val datasetSourceConfig = parseDatasetSourceConfig(result)
datasetSourceConfig
}).toList)
} finally {
if (resultSet != null) resultSet.close()
if (preparedStatement != null) preparedStatement.close()
postgresConnect.closeConnection()
}
}
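
The motivation for replacing interpolated SQL with `?` placeholders can be shown without a database. In the sketch below, `InjectionSketch` and its members are hypothetical names; the two query strings mirror the old and new forms of the `dataset_source_config` lookup in this diff.

```scala
// Hypothetical sketch: builds query text only, no database is contacted.
object InjectionSketch {

  // Old style: the caller's value becomes part of the SQL text itself.
  def interpolated(datasetId: String): String =
    s"SELECT * FROM dataset_source_config where dataset_id='$datasetId'"

  // New style: the SQL text is constant and the value travels separately
  // as a bind parameter (set via PreparedStatement.setString in the diff).
  val parameterised: String =
    "SELECT * FROM dataset_source_config WHERE dataset_id = ?"

  def bindings(datasetId: String): Seq[String] = Seq(datasetId)
}
```

Passing `d1' OR '1'='1` to `interpolated` produces a `WHERE` clause that ends in `dataset_id='d1' OR '1'='1'`, which changes the meaning of the query. With the parameterised form the same input is carried as data and the SQL text does not change.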
@@ -99,46 +112,95 @@ object DatasetRegistryService {
}

def readDatasources(datasetId: String): Option[List[DataSource]] = {

val postgresConnect = new PostgresConnect(postgresConfig)
var preparedStatement: PreparedStatement = null
var resultSet: ResultSet = null
try {
val rs = postgresConnect.executeQuery(s"SELECT * FROM datasources where dataset_id='$datasetId'")
Option(Iterator.continually((rs, rs.next)).takeWhile(f => f._2).map(f => f._1).map(result => {
val query = "SELECT * FROM datasources WHERE dataset_id = ?"
preparedStatement = postgresConnect.prepareStatement(query)
preparedStatement.setString(1, datasetId)
resultSet = postgresConnect.executeQuery(preparedStatement = preparedStatement)
Option(Iterator.continually((resultSet, resultSet.next)).takeWhile(f => f._2).map(f => f._1).map(result => {
parseDatasource(result)
}).toList)
} finally {
if (resultSet != null) resultSet.close()
if (preparedStatement != null) preparedStatement.close()
postgresConnect.closeConnection()
}
}

def updateDatasourceRef(datasource: DataSource, datasourceRef: String): Int = {
val query = s"UPDATE datasources set datasource_ref = '$datasourceRef' where datasource='${datasource.datasource}' and dataset_id='${datasource.datasetId}'"
updateRegistry(query)
def readAllDatasources(): Option[List[DataSource]] = {

val postgresConnect = new PostgresConnect(postgresConfig)
try {
val rs = postgresConnect.executeQuery(s"SELECT * FROM datasources")
Option(Iterator.continually((rs, rs.next)).takeWhile(f => f._2).map(f => f._1).map(result => {
parseDatasource(result)
}).toList)
}
}

def updateDatasourceRef(datasource: DataSource, datasourceRef: String): Int = {
val postgresConnect = new PostgresConnect(postgresConfig)
var preparedStatement: PreparedStatement = null
val query = "UPDATE datasources SET datasource_ref = ? WHERE datasource = ? AND dataset_id = ?"
try {
preparedStatement = postgresConnect.prepareStatement(query)
preparedStatement.setString(1, datasourceRef)
preparedStatement.setString(2, datasource.datasource)
preparedStatement.setString(3, datasource.datasetId)
postgresConnect.executeUpdate(preparedStatement)
} finally {
if (preparedStatement != null) preparedStatement.close()
postgresConnect.closeConnection()
}
}

def updateConnectorStats(id: String, lastFetchTimestamp: Timestamp, records: Long): Int = {
val query = s"UPDATE dataset_source_config SET connector_stats = coalesce(connector_stats, '{}')::jsonb || " +
s"jsonb_build_object('records', COALESCE(connector_stats->>'records', '0')::int + '$records'::int) || " +
s"jsonb_build_object('last_fetch_timestamp', '${lastFetchTimestamp}'::timestamp) || " +
s"jsonb_build_object('last_run_timestamp', '${new Timestamp(System.currentTimeMillis())}'::timestamp) WHERE id = '$id';"
updateRegistry(query)
val postgresConnect = new PostgresConnect(postgresConfig)
var preparedStatement: PreparedStatement = null
val query = "UPDATE dataset_source_config SET connector_stats = COALESCE(connector_stats, '{}')::jsonb || jsonb_build_object('records', COALESCE(connector_stats->>'records', '0')::int + ? ::int) || jsonb_build_object('last_fetch_timestamp', ? ::timestamp) || jsonb_build_object('last_run_timestamp', ? ::timestamp) WHERE id = ?;"
try {
preparedStatement = postgresConnect.prepareStatement(query)
preparedStatement.setString(1, records.toString)
preparedStatement.setTimestamp(2, lastFetchTimestamp)
preparedStatement.setTimestamp(3, new Timestamp(System.currentTimeMillis()))
preparedStatement.setString(4, id)
postgresConnect.executeUpdate(preparedStatement)
} finally {
if (preparedStatement != null) preparedStatement.close()
postgresConnect.closeConnection()
}
}


def updateConnectorDisconnections(id: String, disconnections: Int): Int = {
val query = s"UPDATE dataset_source_config SET connector_stats = jsonb_set(coalesce(connector_stats, '{}')::jsonb, '{disconnections}','$disconnections') WHERE id = '$id'"
updateRegistry(query)
val postgresConnect = new PostgresConnect(postgresConfig)
var preparedStatement: PreparedStatement = null
val query = "UPDATE dataset_source_config SET connector_stats = jsonb_set(coalesce(connector_stats, '{}')::jsonb, '{disconnections}', to_jsonb(?)) WHERE id = ?"
try {
preparedStatement = postgresConnect.prepareStatement(query)
preparedStatement.setInt(1, disconnections)
preparedStatement.setString(2, id)
postgresConnect.executeUpdate(preparedStatement)
} finally {
if (preparedStatement != null) preparedStatement.close()
postgresConnect.closeConnection()
}
}

def updateConnectorAvgBatchReadTime(id: String, avgReadTime: Long): Int = {
val query = s"UPDATE dataset_source_config SET connector_stats = jsonb_set(coalesce(connector_stats, '{}')::jsonb, '{avg_batch_read_time}','$avgReadTime') WHERE id = '$id'"
updateRegistry(query)
}

private def updateRegistry(query: String): Int = {
val postgresConnect = new PostgresConnect(postgresConfig)
var preparedStatement: PreparedStatement = null
val query = "UPDATE dataset_source_config SET connector_stats = jsonb_set(coalesce(connector_stats, '{}')::jsonb, '{avg_batch_read_time}', to_jsonb(?)) WHERE id = ?"
try {
postgresConnect.executeUpdate(query)
preparedStatement = postgresConnect.prepareStatement(query)
preparedStatement.setLong(1, avgReadTime)
preparedStatement.setString(2, id)
postgresConnect.executeUpdate(preparedStatement)
} finally {
if (preparedStatement != null) preparedStatement.close()
postgresConnect.closeConnection()
}
}
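
Each rewritten method in this hunk repeats the same `try`/`finally` block to close the `PreparedStatement` and the connection. A small loan-pattern helper is one way to factor that out. The sketch below is an assumption about how this might be done, not code from the repository: `withResource` and `FakeStatement` are hypothetical, and `FakeStatement` stands in for a JDBC statement so the example runs without Postgres. Scala 2.13 ships `scala.util.Using` for the same purpose; the image tags in the Dockerfile suggest Scala 2.12, so a hand-written helper is used here.

```scala
// Hypothetical sketch: a loan-pattern helper that works on Scala 2.12.
object ResourceSketch {

  // Runs `body` with the resource and closes it afterwards,
  // whether `body` returns normally or throws.
  def withResource[R <: AutoCloseable, A](resource: R)(body: R => A): A =
    try {
      body(resource)
    } finally {
      if (resource != null) resource.close()
    }

  // Stand-in for a JDBC statement; records whether close() was called.
  final class FakeStatement extends AutoCloseable {
    var closed: Boolean = false
    def executeUpdate(): Int = 1
    override def close(): Unit = { closed = true }
  }
}
```

A helper like this also guards against a `try` block that has no `finally`, as in `readAllDatasources` above, where the connection is never closed.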
@@ -190,10 +252,12 @@ object DatasetRegistryService {
val id = rs.getString("id")
val datasource = rs.getString("datasource")
val datasetId = rs.getString("dataset_id")
val datasourceType = rs.getString("type")
val datasourceStatus = rs.getString("status")
val ingestionSpec = rs.getString("ingestion_spec")
val datasourceRef = rs.getString("datasource_ref")

DataSource(id, datasource, datasetId, ingestionSpec, datasourceRef)
DataSource(id, datasource, datasetId, datasourceType, datasourceStatus, ingestionSpec, datasourceRef)
}

private def parseDatasetTransformation(rs: ResultSet): DatasetTransformation = {
@@ -36,7 +36,7 @@ class BaseSpecWithDatasetRegistry extends BaseSpecWithPostgres {
private def createSchema(postgresConnect: PostgresConnect): Unit = {

postgresConnect.execute("CREATE TABLE IF NOT EXISTS datasets ( id text PRIMARY KEY, type text NOT NULL, validation_config json, extraction_config json, dedup_config json, data_schema json, denorm_config json, router_config json NOT NULL, dataset_config json NOT NULL, status text NOT NULL, tags text[], data_version INT, created_by text NOT NULL, updated_by text NOT NULL, created_date timestamp NOT NULL, updated_date timestamp NOT NULL );")
postgresConnect.execute("CREATE TABLE IF NOT EXISTS datasources ( id text PRIMARY KEY, dataset_id text REFERENCES datasets (id), ingestion_spec json NOT NULL, datasource text NOT NULL, datasource_ref text NOT NULL, retention_period json, archival_policy json, purge_policy json, backup_config json NOT NULL, status text NOT NULL, created_by text NOT NULL, updated_by text NOT NULL, created_date Date NOT NULL, updated_date Date NOT NULL );")
postgresConnect.execute("CREATE TABLE IF NOT EXISTS datasources ( id text PRIMARY KEY, dataset_id text REFERENCES datasets (id), type text NOT NULL, ingestion_spec json NOT NULL, datasource text NOT NULL, datasource_ref text NOT NULL, retention_period json, archival_policy json, purge_policy json, backup_config json NOT NULL, status text NOT NULL, created_by text NOT NULL, updated_by text NOT NULL, created_date Date NOT NULL, updated_date Date NOT NULL );")
postgresConnect.execute("CREATE TABLE IF NOT EXISTS dataset_transformations ( id text PRIMARY KEY, dataset_id text REFERENCES datasets (id), field_key text NOT NULL, transformation_function json NOT NULL, status text NOT NULL, mode text, created_by text NOT NULL, updated_by text NOT NULL, created_date Date NOT NULL, updated_date Date NOT NULL, UNIQUE(field_key, dataset_id) );")
postgresConnect.execute("CREATE TABLE IF NOT EXISTS dataset_source_config ( id text PRIMARY KEY, dataset_id text NOT NULL REFERENCES datasets (id), connector_type text NOT NULL, connector_config json NOT NULL, status text NOT NULL, connector_stats json, created_by text NOT NULL, updated_by text NOT NULL, created_date Date NOT NULL, updated_date Date NOT NULL, UNIQUE(connector_type, dataset_id) );")
}
@@ -92,8 +92,8 @@ class TestDatasetRegistrySpec extends BaseSpecWithDatasetRegistry with Matchers
postgresConnect.execute("insert into dataset_source_config values('sc1', 'd1', 'kafka', '{\"kafkaBrokers\":\"localhost:9090\",\"topic\":\"test-topic\"}', 'Live', null, 'System', 'System', now(), now());")
postgresConnect.execute("insert into dataset_source_config values('sc2', 'd1', 'rdbms', '{\"type\":\"postgres\",\"tableName\":\"test-table\"}', 'Live', null, 'System', 'System', now(), now());")

//postgresConnect.execute("CREATE TABLE IF NOT EXISTS datasources ( id text PRIMARY KEY, dataset_id text REFERENCES datasets (id), ingestion_spec json NOT NULL, datasource text NOT NULL, datasource_ref text NOT NULL, retention_period json, archival_policy json, purge_policy json, backup_config json NOT NULL, status text NOT NULL, created_by text NOT NULL, updated_by text NOT NULL, created_date Date NOT NULL, updated_date Date NOT NULL );")
postgresConnect.execute("insert into datasources values('ds1', 'd1', '{}', 'd1-datasource', 'd1-datasource-1', null, null, null, '{}', 'Live', 'System', 'System', now(), now());")
//postgresConnect.execute("CREATE TABLE IF NOT EXISTS datasources ( id text PRIMARY KEY, dataset_id text REFERENCES datasets (id), type text NOT NULL, ingestion_spec json NOT NULL, datasource text NOT NULL, datasource_ref text NOT NULL, retention_period json, archival_policy json, purge_policy json, backup_config json NOT NULL, status text NOT NULL, created_by text NOT NULL, updated_by text NOT NULL, created_date Date NOT NULL, updated_date Date NOT NULL );")
postgresConnect.execute("insert into datasources values('ds1', 'd1', 'druid', '{}', 'd1-datasource', 'd1-datasource-1', null, null, null, '{}', 'Live', 'System', 'System', now(), now());")
postgresConnect.closeConnection()
}
}
@@ -5,6 +5,7 @@ object Constants {
val EVENT = "event"
val INVALID_JSON = "invalid_json"
val OBSRV_META = "obsrv_meta"
val PROCESSING_START_TIME = "processingStartTime"
val SRC = "src"
val ERROR_CODE = "error_code"
val ERROR_MSG = "error_msg"
@@ -14,5 +15,6 @@ object Constants {
val LEVEL = "level"
val TOPIC = "topic"
val MESSAGE = "message"

val DATALAKE_TYPE = "datalake"
val MASTER_DATASET_TYPE = "master-dataset"
}
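
The new `PROCESSING_START_TIME` constant goes with the commit message entry "Add processingStartTime if missing during extractor job". The extractor change itself is not visible in this part of the diff, so the sketch below is only a guess at the idea: it assumes the value lives under the `obsrv_meta` map of the event, and `ProcessingStartTimeSketch` and `ensureStartTime` are hypothetical names.

```scala
// Hypothetical sketch: assumes processingStartTime is stored under obsrv_meta.
object ProcessingStartTimeSketch {
  val OBSRV_META = "obsrv_meta"
  val PROCESSING_START_TIME = "processingStartTime"

  // Returns the event with obsrv_meta.processingStartTime set to `now`
  // when it is absent; an existing value is left untouched.
  def ensureStartTime(event: Map[String, Any], now: Long): Map[String, Any] = {
    val meta: Map[String, Any] = event.get(OBSRV_META) match {
      case Some(m: Map[_, _]) => m.asInstanceOf[Map[String, Any]]
      case _                  => Map.empty[String, Any]
    }
    if (meta.contains(PROCESSING_START_TIME)) event
    else event + (OBSRV_META -> (meta + (PROCESSING_START_TIME -> now)))
  }
}
```

Taking `now` as a parameter rather than calling `System.currentTimeMillis()` inside the function keeps the behaviour easy to check in a unit test.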