This is an EPIC card. As items from this list are addressed, their active cards will be linked.
Overall Design / Tasks
Source Data -> Process & Prepare Data -> Publish Services
Learn from previous Ras2Fim processing
Phase 1: Start with example dataset
Step 3: Source Code Analysis
Step 4: Process; data integration and loading for Viz.
Step 5: Publish; various tests; not public, internal use only.
Feedback from leadership
Feedback from end-users
After Phase 1
Scale the amount of data (1/3 of total)
Fine-tune hardware and software
Increase the amount of data (2/3 of total)
Fine-tune hardware and software
Full scale testing
DEV(TI) -> UAT -> PRD
[1] Sensitive Information Locations [WILL NOT be visible in GitHub]
Secret manager
Google Drive
[2] Infrastructure Setup for Development Environment
S3 Bucket(s)
S3 bucket created for development
Folder naming conventions
Optimizing Performance
Measuring Performance
EC2 Instance(s) for design evaluation and testing
Windows
[In progress] Linux for Viz and FIM dev. Multiple EC2s of different sizes for benchmark analysis.
[In progress] Optimizing Performance (low volume)
[In progress] Measuring Performance (low volume)
Folder structure conventions
Data
Measuring
Optimizing
Software
Software conventions
Install Locations
Code version control (likely in the HydroVIS GitHub; code versioning is not in use for HydroVIS)
[3] Source Data
Heidi Safa has been evaluating and analyzing Ripple data.
Heidi copying example datasets into S3 bucket
[in progress] Deciphering which data we need (can we pre-filter some of it? Maybe a subfolder for HV consumption, with the rest kept for debugging?)
Download all Dewberry FIM_30 Ripple model files and folders to the FIM/HV S3 buckets. Note: 485 models are available; total size could be over 1 TiB, but numbers are unconfirmed (i.e., determine actual volumes).
Build a script to pull all or filtered data from the RTX to the HV S3 buckets (see the transfer-script sketch at the end of this section). While it is possible to read their buckets remotely, Rob strongly advises against it for multiple reasons, including permissions, moving to other environments (UAT, PROD across multiple regions), and pre-processing if required.
[in progress] Investigate gaps in data reaches. Note: a full replacement dataset from RTX is coming, including re-adding FIM_10 data that was missed in current releases.
Evaluate an internal versioning system to reflect which version of a dataset we receive. We may not want to tie it automatically to the Ripple public release name of "Ripple 3.0". Maybe use subfolders in our S3 to distinguish deliveries, e.g., the current FIM_30 version, the replacement FIM_30 dataset coming, and the FIM_60 dataset coming. Internal dataset name/number convention TBD (an illustrative prefix layout is sketched at the end of this section).
Example to start with:
ble_12030106_EastForkTrinity
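A minimal sketch of the RTX-to-HV transfer script mentioned above, assuming boto3 credentials with read access to the source bucket and write access to ours; the bucket names, prefix, and HUC filter are placeholders, not the real values.

```python
# Sketch only: bucket names, prefixes, and the HUC filter below are placeholders.
from typing import Optional

import boto3

s3 = boto3.client("s3")

SRC_BUCKET = "rtx-ripple-source"   # hypothetical source (RTX) bucket name
DEST_BUCKET = "hv-fim-dev"         # hypothetical HV development bucket name


def pull_models(prefix: str = "", huc_filter: Optional[str] = None) -> int:
    """Copy objects under `prefix` from the source bucket into ours,
    optionally keeping only keys that contain a given HUC number."""
    copied = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SRC_BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if huc_filter and huc_filter not in key:
                continue
            # Managed copy; keeps the same key layout on our side.
            s3.copy({"Bucket": SRC_BUCKET, "Key": key}, DEST_BUCKET, key)
            copied += 1
    return copied


if __name__ == "__main__":
    # e.g. pull only the East Fork Trinity example HUC
    print(pull_models(prefix="ripple/fim_30/", huc_filter="12030106"))
```

Copying into our own bucket up front also sidesteps the cross-account permission and multi-region issues noted above.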
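For the internal versioning question, one possible convention (illustrative only; every name here is hypothetical) is to key S3 prefixes by an internal dataset identifier rather than the public Ripple release name:

```python
# Hypothetical internal-version-to-prefix mapping; names and layout are not decided.
DATASET_PREFIXES = {
    "fim_30_v1": "ripple/fim_30/v1/",  # current FIM_30 delivery
    "fim_30_v2": "ripple/fim_30/v2/",  # replacement FIM_30 delivery (coming)
    "fim_60_v1": "ripple/fim_60/v1/",  # FIM_60 delivery (coming)
}
```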
[4A] Process Data
Part One: Data flow with small sample (i.e. one or two HUCs)
[in progress] Access to flows2fim.exe software in Windows/Linux environments. Maybe run it in the Lambdas? (A minimal invocation sketch follows at the end of Part One.)
[in progress] Develop Workflow (linked card: Isolated workflow test #1058)
Strategy to choose between mip and ble datasets when both are available.
Lookup strategy for picking stage, extent, depths.
flows2fim.exe controls
flows2fim.exe fim -lib EXTENT
flows2fim.exe fim -lib DEPTH (HOLD: Scope for this project does not include depth tasks).
Evaluate Cross Walk strategies (this is an idea)
Geometry processing and partitioning.
Benchmark (a measurement sketch follows at the end of Part One)
Disk Speeds
Memory Usage
Network Usage
Disk Size
Develop Misc tools
"search by HUC" for HydroVIS - Create a HUC S3 search code (not a tool) that can take a HUC number in, and pull down from an S3 bucket, just the files and folders required for processing or paths. Can optionally get just s3 paths or download files or both. Needed for HV code, but a variant of it for FIM. Basic py code already exists in ras2fim and can be ported and adjusted. It has a S3 wildcard system in it.
"search by HUC" for FIM: tool for standard command line use for FIM debugging / Testing. Same as HV system in logic, using the ras2fim S3 wildcard search system.
Part Two: Data flow upscaling
Lessons learned from Part One
Dynamic Cross Walk to handle the ~75x increase over the previous ras2fim data volume? TBD.
Apply upscaling to resources
Performance and processing analysis, and scale with more HUCs.
Windows
Relying on ESRI tools for processing (TBD)
Ubuntu
Using QGIS and open source for processing (TBD)
[In Discussion] Convert Raster datasets to Polygon datasets (a conversion sketch follows at the end of Part Two)
Benchmark
Disk Speeds
Memory Usage
Network Usage
Disk Size
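For the raster-to-polygon item under discussion, a sketch using the open-source stack (rasterio + shapely) rather than ESRI tools; the path and the wet-cell test are placeholders.

```python
# Sketch only: polygonize the wet cells of an extent raster.
import rasterio
from rasterio import features
from shapely.geometry import shape


def extent_raster_to_polygons(raster_path: str):
    """Yield shapely polygons for contiguous wet cells (value > 0) in an extent raster."""
    with rasterio.open(raster_path) as src:
        wet = (src.read(1) > 0).astype("uint8")  # binary wet/dry mask
        for geom, _value in features.shapes(wet, mask=wet.astype(bool), transform=src.transform):
            yield shape(geom)
```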
[4B] Loading Static data
Ripple dataset at 1-3 releases per year.
Get performance ready for large volumes of data to be uploaded and made available for dynamic processing by HV interacting with HAND data. FIM_60 next fall?
Make a FIM-to-HV deployment tool. It could look through multiple source Ripple model folders and pull out just the folders/files that need to be sent to the HV deployment bucket for automated processing (sketched below). TBD: it might not be needed, depending on whether pre-filtering or additional processing happens between Ripple and HV integration.
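If the deployment tool above turns out to be needed, it could take roughly this shape; the bucket arguments and the "needed file" suffix filter are assumptions for illustration.

```python
# Sketch only: copy just the files HV consumes from Ripple model prefixes
# into the deployment bucket.
import boto3

s3 = boto3.client("s3")

NEEDED_SUFFIXES = (".gpkg", ".csv", ".json")  # placeholder filter


def deploy_models(src_bucket: str, dest_bucket: str, model_prefixes: list) -> int:
    copied = 0
    paginator = s3.get_paginator("list_objects_v2")
    for prefix in model_prefixes:
        for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if not key.endswith(NEEDED_SUFFIXES):
                    continue
                s3.copy({"Bucket": src_bucket, "Key": key}, dest_bucket, key)
                copied += 1
    return copied
```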
[5] Publish Data
Part One: Testing workflows to process data for publication
- [ ] 1065: Ripple Task Details: HydroVIS Changes / Integration
(Note: the items below are TBD, pending the integration design process evaluations.)
Part Two: Testing with small samples (i.e. one or two HUCs)
Part Three: Scaling all available Ripple data.