[EPIC] Ripple Integration Project #1042

Open · 5 of 55 tasks
LorneLeonard-NOAA opened this issue Jan 15, 2025 · 0 comments
LorneLeonard-NOAA commented Jan 15, 2025

This is an EPIC card. As items from this list are addressed, their active cards will be linked.

Overall Design / Tasks

Source Data -> Process & Prepare Data -> Publish Services

  • Learn from the previous Ras2Fim processing

Phase 1: Start with example dataset

  • Step 3: Source Code Analysis
  • Step 4: Processing, data integration, and loading for Viz.
  • Step 5: Publish and run various tests; not public, internal use only.
  • Feedback from leadership
  • Feedback from end-users

After Phase 1

  • Scale up the data volume (1/3 of total)
  • Fine-tune hardware and software
  • Increase the data volume (2/3 of total)
  • Fine-tune hardware and software
  • Full-scale testing

DEV(TI) -> UAT -> PRD

[1] Sensitive Information Locations [WILL NOT be visible in GitHub]

  • Secret manager
  • Google Drive

[2] Infrastructure Setup for Development Environment

S3 Bucket(s)

  • S3 bucket created for development
  • Folder naming conventions
  • Optimizing Performance
  • Measuring Performance

EC2 Instance(s) for design evaluation and testing

  • Windows
  • [In progress] Linux for Viz and FIM dev. Multiple EC2s of different sizes for benchmark analysis.
  • [In progress] Optimizing Performance (low volume)
  • [In progress] Measuring Performance (low volume); see the throughput sketch after this list.
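
As a starting point for the low-volume performance measurement above, a minimal throughput sketch is below. The bucket and key names are placeholders rather than actual HydroVIS resources, and it only times a single object download.

```python
import time

import boto3


def measure_download_throughput(bucket: str, key: str, dest: str) -> float:
    """Time one S3 download and return an approximate MiB/s rate."""
    s3 = boto3.client("s3")
    size_bytes = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    start = time.perf_counter()
    s3.download_file(bucket, key, dest)
    elapsed = time.perf_counter() - start
    return (size_bytes / (1024 * 1024)) / elapsed


# Placeholder bucket/key -- substitute a real dev bucket and a representative model file.
rate = measure_download_throughput("hv-ripple-dev", "models/example.tif", "/tmp/example.tif")
print(f"~{rate:.1f} MiB/s")
```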

Folder structure conventions

  • Data
  • Measuring
  • Optimizing 
  • Software

Software conventions

  • Install Locations
  • Code version control (likely in the HydroVIS GitHub; code versioning not currently in use for HydroVIS)

[3] Source Data

Heidi Safa has been evaluating and analyzing Ripple data.

  • Heidi is copying example datasets into the S3 bucket
  • [In progress] Decipher which data we need (can we pre-filter some of it? Maybe a subfolder for HV consumption, with the rest kept for debugging?)
  • Download all Dewberry FIM_30 Ripple model files and folders to the FIM/HV S3 buckets. Note: 485 models are available, and total size could be over 1 TiB, but the numbers are unconfirmed, i.e. determine volumes first (see the sizing sketch after the example below).
  • Build a script to pull all or filtered data from RTX to the HV S3 buckets. While it is possible to read their buckets remotely, Rob strongly advises against it for multiple reasons, including permissions, moving to other environments (UAT, PROD across multiple regions), and pre-processing if required.
  • [In progress] Investigate gaps in data reaches. Note: a full replacement dataset from RTX is coming, including re-adding FIM_10 data missed in the current releases.
  • Evaluate an internal versioning system to reflect which version of a dataset we receive. We may not want to automatically tie it to the Ripple public release name of "Ripple 3.0". Maybe use subfolders in our S3 to distinguish differences, e.g. the current FIM_30 version, the coming replacement FIM_30 dataset, and the coming FIM_60 dataset. Internal dataset name/number convention TBD.

Example to start with:

  • ble_12030106_EastForkTrinity
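
To firm up the unconfirmed volume numbers in the download task above, something like the following could total object counts and sizes under a prefix. It is a minimal sketch; the bucket name and prefix layout are assumptions, with the example model used only as an illustrative prefix.

```python
import boto3


def prefix_size(bucket: str, prefix: str) -> tuple[int, float]:
    """Return (object count, total GiB) under an S3 prefix."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    count, total_bytes = 0, 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            count += 1
            total_bytes += obj["Size"]
    return count, total_bytes / 1024**3


# Hypothetical bucket name and prefix layout; adjust to the actual dev bucket structure.
n, gib = prefix_size("hv-ripple-dev", "ripple/fim_30/ble_12030106_EastForkTrinity/")
print(f"{n} objects, {gib:.2f} GiB")
```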

[4A] Process Data

Part One: Data flow with a small sample (i.e. one or two HUCs)

  • [In progress] Access to the flows2fim.exe software, Windows/Linux environments. Maybe in the Lambdas?

  • [in progress] Develop Workflow

  • Isolated workflow test #1058

    • Strategy to choose between mip and ble datasets when both are available (a selection sketch appears after this task list).
    • Lookup strategy for picking stage, extent, depths.
    • flows2fim.exe controls
    • flows2fim.exe fim -lib EXTENT
    • flows2fim.exe fim -lib DEPTH (HOLD: Scope for this project does not include depth tasks).
    • Evaluate Cross Walk strategies (this is an idea)
    • Geometry processing and partitioning.
    • Benchmark
        • Disk Speeds
        • Memory Usage
        • Network Usage
        • Disk Size
  • Develop Misc tools

    • "search by HUC" for HydroVIS - Create a HUC S3 search code (not a tool) that can take a HUC number in, and pull down from an S3 bucket, just the files and folders required for processing or paths. Can optionally get just s3 paths or download files or both. Needed for HV code, but a variant of it for FIM. Basic py code already exists in ras2fim and can be ported and adjusted. It has a S3 wildcard system in it.
    • "search by HUC" for FIM: tool for standard command line use for FIM debugging / Testing. Same as HV system in logic, using the ras2fim S3 wildcard search system.

Part Two: Data flow upscaling

  • Lessons learnt from part one
  • Dynamic Cross Walk to handle the ~75x increase over the previous ras2fim data volume? TBD
  • Apply upscaling to resources
  • Performance and processing analysis, and scale with more HUCs.
  • Windows
    • Relying on ESRI tools for processing (TBD)
  • Ubuntu
    • Using QGIS and open source for processing (TBD)
  • [In Discussion] Convert raster datasets to polygon datasets (see the polygonization sketch after this list)
    • Benchmark
        • Disk Speeds
        • Memory Usage
        • Network Usage
        • Disk Size
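
For the raster-to-polygon conversion under discussion above, a minimal polygonization sketch using rasterio and shapely is below. It assumes a single-band extent TIF where wet cells are nonzero, which may not match the actual Ripple outputs, and it ignores reprojection and tiling concerns.

```python
import rasterio
from rasterio.features import shapes
from shapely.geometry import shape
from shapely.ops import unary_union


def extent_raster_to_polygon(tif_path: str):
    """Dissolve the nonzero (wet) cells of an extent raster into one polygon geometry."""
    with rasterio.open(tif_path) as src:
        band = src.read(1)
        mask = band != src.nodata if src.nodata is not None else band != 0
        polys = [
            shape(geom)
            for geom, value in shapes(band, mask=mask, transform=src.transform)
            if value != 0
        ]
    return unary_union(polys)


# Placeholder file name; worth benchmarking per tile before committing to vector outputs.
# poly = extent_raster_to_polygon("ble_12030106_example_extent.tif")
```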

[4B] Loading Static data

  • Ripple datasets arrive at 1-3 releases per year.
  • Get performance ready: large volumes of data uploaded and available for dynamic processing by HV, interacting with HAND data. FIM_60 next fall?
  • Make a FIM-to-HV deployment tool. It can look through multiple source Ripple model folders and pull out just the folders/files that need to be sent to the HV deployment bucket for automated processing (see the filtered-copy sketch after this list). TBD: it might not be needed, depending on whether pre-filtering or additional processing happens between Ripple and HV integration.
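
If the deployment tool above does turn out to be needed, a minimal sketch of a filtered bucket-to-bucket copy is below. The bucket names, prefix layout, and the suffix filter are placeholders, not decided conventions.

```python
import boto3

# Placeholder filter: keep only the artifact types HV processing actually reads.
KEEP_SUFFIXES = (".gpkg", ".csv", ".tif")


def copy_filtered(src_bucket: str, src_prefix: str, dst_bucket: str, dst_prefix: str) -> int:
    """Server-side copy of matching objects from a Ripple model folder to the HV deployment bucket."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    copied = 0
    for page in paginator.paginate(Bucket=src_bucket, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith(KEEP_SUFFIXES):
                continue
            new_key = dst_prefix + obj["Key"][len(src_prefix):]
            # copy_object is limited to 5 GB per object; larger files need multipart copy.
            s3.copy_object(
                Bucket=dst_bucket,
                Key=new_key,
                CopySource={"Bucket": src_bucket, "Key": obj["Key"]},
            )
            copied += 1
    return copied
```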

[5] Publish Data

Part One: Testing workflows to process data for publication

  • #1051
    • #1065: Ripple Task Details: HydroVIS Changes / Integration

(Note: the items below are TBD, pending the integration design process evaluations.)

  • Windows
    • Relying on ESRI tools for processing
  • Ubuntu
    • Using QGIS and open source for processing
  • Convert Raster datasets to Polygon datasets 
    • Benchmark
        • Disk Speeds
        • Memory Usage
        • Network Usage
        • Disk Size
      • Note: We are talking to RTX about pre-building CSVs with vectors in them so we don't have to convert the TIFs to extents. TBD

Part Two: Testing with small samples (i.e. one or two HUCs)

  • Lessons learnt from part one

Part Three: Scaling all available Ripple data.

  • Lessons learnt from part two
@LorneLeonard-NOAA LorneLeonard-NOAA changed the title [EPIC] Ripple Project [EPIC] Ripple Integration Project Jan 16, 2025
@nickchadwick-noaa nickchadwick-noaa added this to the V2.1.x milestone Jan 17, 2025