Forest Tasks are not making All Output Data available #415

biblicabeebli · 2025-02-06T20:34:15Z

There is code that should zip all data placed in the task's output folder, upload the zip file to S3 for storage, and provide a link on the task history page.

This feature cannot be disabled, but for some reason it is not working.

beiwe-backend/services/celery_forest.py

Lines 446 to 467 in e880bc0

    def compress_and_upload_raw_output(forest_task: ForestTask):
        """ Compresses raw output files and uploads them to S3. """
        # I think it is correct that the file path is present twice.
        base_file_path = f"{forest_task.id}_{timezone.now().strftime(API_TIME_FORMAT)}_output"
        s3_path = f"{forest_task.forest_tree}_" + base_file_path + ".zip"
        file_path = path_join(forest_task.root_path_for_task, base_file_path)
        filename = shutil.make_archive(
            base_name=file_path,  # base_name is the zip file path minus the extension
            format="zip",  # its a zip
            root_dir=forest_task.data_output_path,  # the root directory of the zip file
        )
        # (this only ever runs on *nix, path_join is always correct)
        forest_task.update(
            output_zip_s3_path=path_join(
                forest_task.participant.study.object_id, forest_task.participant.patient_id, s3_path
            )
        )
        with open(filename, "rb") as f:
            # TODO: someday, optimize s3 stuff so we don't have this hanging out in-memory...
            save_output_file(forest_task, f.read())

On our production server there are functional historical forest tasks from 2023-12-1 to 2024-4-9, and then they stop.

Maybe there were changes to the file output in forest?
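
One way to test that hypothesis is to check what actually ends up inside the archive. The following is a minimal sketch, not the backend's code: `archive_and_list` and the `summary.csv` file are hypothetical names used only for illustration. It uses the same `shutil.make_archive` call shape as the snippet above and lists the files in the resulting zip, which shows that an empty output folder still produces a valid but empty archive.

```python
import os
import shutil
import tempfile
import zipfile


def archive_and_list(output_dir: str, base_name: str) -> tuple[str, list[str]]:
    """Zip output_dir with shutil.make_archive; return (zip_path, file names in the zip)."""
    zip_path = shutil.make_archive(base_name=base_name, format="zip", root_dir=output_dir)
    with zipfile.ZipFile(zip_path) as zf:
        # Directory entries end with "/"; keep only real files.
        file_names = sorted(n for n in zf.namelist() if not n.endswith("/"))
    return zip_path, file_names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        # Hypothetical stand-in for the task's output folder.
        output_dir = os.path.join(work_dir, "output")
        os.makedirs(output_dir)

        # Nothing has been written to the output folder yet.
        _, names = archive_and_list(output_dir, os.path.join(work_dir, "empty"))
        print("empty output folder ->", names)

        # Write one file where the archive step expects to find it.
        with open(os.path.join(output_dir, "summary.csv"), "w") as f:
            f.write("a,b\n1,2\n")
        _, names = archive_and_list(output_dir, os.path.join(work_dir, "full"))
        print("populated output folder ->", names)
```

If the tree writes its files somewhere other than the folder passed as `root_dir`, the archive step would still succeed and the zip would simply contain no files, so comparing the file list against the tree's documented outputs is a cheap first check.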

@hydawo This is from the meeting earlier today: the raw data output from sycamore (and all forest task runs) should be available via a separate link, but that link is not present.

@hackdna @MMel099 Can you point me to the code or documentation on where and what output files are supposed to exist when running a forest tree, and how this differs between trees? (I'm not ruling out errors in my code.)

hackdna · 2025-02-10T19:12:21Z

All trees have different outputs with descriptions in the corresponding Forest docs. For example, Jasmine has a good description: https://forest.beiwe.org/en/latest/jasmine.html#output

biblicabeebli · 2025-02-10T21:03:45Z

It could be this: cannot find task report (log of forest task runner output)...
Error from staging.

hydawo · 2025-02-21T16:24:16Z

Tying in this issue: https://github.com/onnela-lab/beiwe-discussions/issues/290, specifically the task list here: https://github.com/onnela-lab/beiwe-discussions/issues/290#issuecomment-2672588675

Forest on Server
[] Update Forest
[] Modify headers on server to match local forest script headers (specifically Jasmine & Oak) - @biblicabeebli is this difficult? If so, we can consider modifying the headers in local forest script to match server headers
[] Remove "GPS Data Missing Duration" column entirely for Jasmine output
[] Modify order of columns in large Forest output CSV
[] Remove spaces in column header titles

Local Forest Script
[] Modify date output from 3 columns (y, m, d) to a single column (mm/dd/yyyy)
[] Add Physical Circadian Rhythm code to Jasmine
[] Add Oak concatenate function

Final Tasks
[] Update Forest on server (after all modifications above)
[] Make any necessary modifications to Forest Wiki

biblicabeebli assigned biblicabeebli, MMel099 and hydawo Feb 6, 2025

biblicabeebli added the Bug Sounds like a bug! label Feb 6, 2025
