Forest Tasks are not making All Output Data available #415

Open
biblicabeebli opened this issue Feb 6, 2025 · 3 comments

@biblicabeebli
Member

There is code that should wrap all data placed into the task's output folder into a zip file, upload it to S3 for storage, and provide a link on task history.

This feature cannot be disabled, but for some reason it is not working.

def compress_and_upload_raw_output(forest_task: ForestTask):
    """ Compresses raw output files and uploads them to S3. """
    # I think it is correct that the file path is present twice.
    base_file_path = f"{forest_task.id}_{timezone.now().strftime(API_TIME_FORMAT)}_output"
    s3_path = f"{forest_task.forest_tree}_" + base_file_path + ".zip"
    file_path = path_join(forest_task.root_path_for_task, base_file_path)
    filename = shutil.make_archive(
        base_name=file_path,  # base_name is the zip file path minus the extension
        format="zip",  # it's a zip
        root_dir=forest_task.data_output_path,  # the root directory of the zip file
    )
    # (this only ever runs on *nix, path_join is always correct)
    forest_task.update(
        output_zip_s3_path=path_join(
            forest_task.participant.study.object_id, forest_task.participant.patient_id, s3_path
        )
    )
    with open(filename, "rb") as f:
        # TODO: someday, optimize s3 stuff so we don't have this hanging out in-memory...
        save_output_file(forest_task, f.read())
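
One way to narrow this down might be to check what shutil.make_archive actually captures from the output folder at the moment it runs. The following is a minimal sketch, not Beiwe code: summarize_output_archive is a hypothetical helper, and the temporary directories only stand in for forest_task.data_output_path and forest_task.root_path_for_task.

```python
import os
import shutil
import tempfile
import zipfile


def summarize_output_archive(output_dir: str, archive_base: str) -> list[str]:
    """Zip output_dir the same way as above and return the archived file names.

    Hypothetical helper for debugging; not part of the Beiwe codebase.
    """
    zip_path = shutil.make_archive(base_name=archive_base, format="zip", root_dir=output_dir)
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries so a folder with no files yields an empty list.
        return sorted(name for name in zf.namelist() if not name.endswith("/"))


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for the task's output folder (assumption: it may contain subfolders).
    output_dir = os.path.join(workdir, "output")
    os.makedirs(os.path.join(output_dir, "daily"))

    # Case 1: the tree wrote nothing before the archive step ran.
    empty_names = summarize_output_archive(output_dir, os.path.join(workdir, "empty_output"))

    # Case 2: the tree wrote a file into the output folder first.
    with open(os.path.join(output_dir, "daily", "summary.csv"), "w") as f:
        f.write("a,b\n1,2\n")
    filled_names = summarize_output_archive(output_dir, os.path.join(workdir, "filled_output"))

    print("files archived from empty folder:", empty_names)
    print("files archived after writing output:", filled_names)
```

If a tree started writing its files somewhere other than the folder passed as root_dir, make_archive would still succeed and produce a zip with no files in it, so logging this list next to the upload could help distinguish "nothing was archived" from "the upload or link failed".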

On our production server there are functional historical forest tasks from 2023-12-1 to 2024-4-9, and then they stop.

Maybe there were changes to the file output in forest?

@hydawo This was from the meeting earlier today: the raw data output from sycamore (and all forest task runs) should be available via a separate link, but that link is not present.

@hackdna @MMel099 Can you point me to the code or documentation on where and what output files are supposed to exist when running a forest tree, and how this differs between trees? (I'm not ruling out errors in my code.)

@biblicabeebli biblicabeebli added the Bug Sounds like a bug! label Feb 6, 2025
@hackdna
Member

hackdna commented Feb 10, 2025

All trees have different outputs with descriptions in the corresponding Forest docs. For example, Jasmine has a good description: https://forest.beiwe.org/en/latest/jasmine.html#output

@biblicabeebli
Member Author

It could be this, cannot find task report (log of forest task runner output)....
error from staging

@hydawo
Collaborator

hydawo commented Feb 21, 2025

Tying in this issue https://github.com/onnela-lab/beiwe-discussions/issues/290, specifically the task list here - https://github.com/onnela-lab/beiwe-discussions/issues/290#issuecomment-2672588675

Forest on Server
[] Update Forest
[] Modify headers on server to match local forest script headers (specifically Jasmine & Oak) - @biblicabeebli is this difficult? If so, we can consider modifying the headers in local forest script to match server headers
[] Remove "GPS Data Missing Duration" column entirely for Jasmine output
[] Modify order of columns in large Forest output CSV
[] Remove spaces in column header titles

Local Forest Script
[] Modify date output from 3 columns (y, m, d) to single column (mm/dd/yyyy)
[] Add Physical Circadian Rhythm code to Jasmine
[] Add Oak concatenate function
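
For the date item in the list above, a minimal sketch of the conversion could look like the following. The column names year, month, day, date, and steps are assumptions for illustration, not the actual Forest headers, and merge_date_columns is a hypothetical helper.

```python
import csv
import io
from datetime import date


def merge_date_columns(rows, year_col="year", month_col="month", day_col="day", out_col="date"):
    """Replace three date-part columns with a single mm/dd/yyyy column.

    Column names are placeholders; adjust them to the real CSV headers.
    """
    merged = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        d = date(int(row.pop(year_col)), int(row.pop(month_col)), int(row.pop(day_col)))
        # Put the merged date first, followed by the remaining columns in order.
        merged.append({out_col: d.strftime("%m/%d/%Y"), **row})
    return merged


# Made-up sample data standing in for a Forest output CSV.
sample = io.StringIO("year,month,day,steps\n2025,2,6,1200\n2024,12,31,800\n")
converted = merge_date_columns(csv.DictReader(sample))
for row in converted:
    print(row)
```

Building a real date object before formatting also rejects impossible dates (for example month 13) instead of silently writing them out.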

Final Tasks
[] Update Forest on server (after all modifications above)
[] Make any necessary modifications to Forest Wiki
