
remote Jupyter Lab on Frontera #40

Merged
merged 22 commits into main from zihua/testing on May 22, 2024

Conversation

Collaborator

@Aangniu Aangniu commented May 10, 2024

1. It seems that the current tacc-apptainer module on Frontera does not support 'singularity' as the command, but 'apptainer' serves the purpose.
2. It seems that pulling the current docker container automatically builds a singularity container. I tested that it can also be used, so I added the following suggestion to frontera.md:
You can also use the automatically generated container after pulling the docker container:

module load tacc-apptainer
apptainer pull -F docker://seissol/training:latest
apptainer run training_latest.sif

3. I changed the name of 'northridge/northridge_resampled.srf' to 'northridge/northridge_resampled.nrf'.

@Aangniu Aangniu requested a review from Thomas-Ulrich May 10, 2024 14:50
@Aangniu
Collaborator Author

Aangniu commented May 10, 2024

I will test the generated docker container and get back to confirm that the changes are valid.

@Thomas-Ulrich Thomas-Ulrich requested a review from wangyinz May 10, 2024 14:54
Collaborator

@sebwolf-de sebwolf-de May 10, 2024

Please don't rename this. An srf file is an ASCII file defined here: http://equake-rc.info/static/paper/SRF-Description-Graves_2.0.pdf.
An nrf file is our SeisSol specific netcdf (binary) converted version of an srf file.

Collaborator Author

Uh, OK. I will then just change the file name in the parameter file:

Collaborator Author

Before, it was still asking for the .nrf file:

&SourceType
Type = 42   ! 42: finite source in netcdf format
FileName = './northridge_resampled.srf' ! input file.
/

Contributor

There is an intermediate step that converts the srf to an nrf (see rconv in the readthedocs).
So you do not need to change the parameter file.

Collaborator Author

Oh, sorry. Maybe that means I just need to convert the srf to an nrf after pulling the files.

Collaborator Author

Found it in the python notebook. I will test it on Frontera again then.

!rconv -i northridge_resampled.srf -o northridge_resampled.nrf -x visualization.xdmf -m "+proj=tmerc +datum=WGS84 +k=0.9996 +lon_0=-118.5150 +lat_0=34.3440 +axis=enu"

Collaborator Author

I checked that I can use the above command to properly generate the .nrf file.

1. I removed my previous rename of the .srf file, so with the latest commit, the only difference from the main branch is in the README file.
2. I added the following instructions on how to run the Northridge scenario to the README:

To run the Northridge scenario, you should:

cd seissol-training/northridge
mpirun apptainer run ~/my-training.sif pumgen -s msh2 mesh_northridge.msh
mpirun apptainer run ~/my-training.sif rconv -i northridge_resampled.srf -o northridge_resampled.nrf -x visualization.xdmf -m "+proj=tmerc +datum=WGS84 +k=0.9996 +lon_0=-118.5150 +lat_0=34.3440 +axis=enu"
OMP_NUM_THREADS=28 mpirun -n 2 apptainer run ~/my-training.sif seissol parameters.par

You can change seissol to SeisSol_Release_dhsw_4_viscoelastic2 if you want to run a visco-elastic simulation instead of the default elastic one.
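
For example, a hedged variant of the last command above with the visco-elastic binary (thread count and container path just mirror the sketch; adjust as needed):

OMP_NUM_THREADS=28 mpirun -n 2 apptainer run ~/my-training.sif SeisSol_Release_dhsw_4_viscoelastic2 parameters.par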

Collaborator

But rconv is not MPI parallel, so you can omit the mpirun there.
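That is, something like (the README line above with mpirun dropped):

apptainer run ~/my-training.sif rconv -i northridge_resampled.srf -o northridge_resampled.nrf -x visualization.xdmf -m "+proj=tmerc +datum=WGS84 +k=0.9996 +lon_0=-118.5150 +lat_0=34.3440 +axis=enu"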

Collaborator Author

Thanks for pointing that out. I removed the mpirun for rconv.

@wangyinz
Collaborator

The apptainer module still has the singularity command and it is simply a symbolic link to the apptainer binary, so it is fine to call any of those singularity commands. Still, I think it is less confusing to update that and use apptainer directly.

$ ls -l /opt/apps/tacc-apptainer/1.1.8/bin
total 50892
-rwxr-xr-x 1 root root 52107416 May  2  2023 apptainer
-rwxr-xr-x 1 root root     1455 May  2  2023 run-singularity
lrwxrwxrwx 1 root root        9 May 10  2023 singularity -> apptainer

@Aangniu
Collaborator Author

Aangniu commented May 10, 2024

Thanks for commenting on this @wangyinz !
I checked again. You are right, the 'singularity' command also works.

@Aangniu Aangniu requested a review from sebwolf-de May 10, 2024 21:09
@sebwolf-de
Collaborator

Apptainer is the new name, so I'd like to use apptainer wherever possible, such that we have forward compatibility.

@Aangniu
Collaborator Author

Aangniu commented May 20, 2024

I added job.jupyter for accessing a Jupyter notebook running on Frontera from a local machine.

@Aangniu
Collaborator Author

Aangniu commented May 20, 2024

The environment for visualization is installed while creating the jupyter environment in job.jupyter with

pip install vtk pyvista
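
For context, a minimal sketch of how that step could look inside job.jupyter (the environment path and package list here are assumptions, not the exact script):

# sketch: create a user environment for JupyterLab and add the visualization packages
python3 -m venv $SCRATCH/jupyter-env
source $SCRATCH/jupyter-env/bin/activate
pip install jupyterlab vtk pyvista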

frontera.md Outdated

Step 6: Paste the link into your local browser; you will then have access to the Frontera environment on your local machine.

https://frontera.tacc.utexas.edu:60320/?token=2e0fade1f8b1ce00b303a7e97dd962c5cd10c17f03a245e8c761ca7e1d5e1597
Contributor

Works, but then I get:
(screenshot)

Contributor

also note that OMP_NUM_THREADS=4 could be increased
(could add a note on that)
(screenshot)
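
For instance (an illustrative value only; Frontera CLX nodes have 56 cores, so considerably more than 4 threads are available):

OMP_NUM_THREADS=28 seissol parameters.par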

Contributor

OK, I found from Slack that you have to use, e.g.
!mpirun singularity run /work2/09160/ulrich/frontera/Training/my_training.sif -3 tpv13_training.geo

Contributor

Here is my solution (and to be honest meta-llama's):

# Define variables depending on the environment
is_remote = True  # Set to True if running remotely on Frontera, False otherwise
singularity_image = '/work2/09160/ulrich/frontera/Training/training_pr-40.sif'
# Define a function to run a command with an optional Singularity wrapper
def run_command(cmd, singularity=False):
    if singularity:
        return f"mpirun singularity run {singularity_image} {cmd}"
    else:
        return f"{cmd}"

and then:

gmsh_cmd = run_command("gmsh -3 tpv13_training.geo", is_remote)
!{gmsh_cmd}

and also:

pumgen_cmd = run_command("pumgen -s msh2 tpv13_training.msh", is_remote)
seissol_cmd = run_command("OMP_NUM_THREADS=28 seissol parameters.par", is_remote)

Collaborator Author

Yep, you are right. That was because it is not running in the container environment, but just on Frontera. So we will need to run with something like mpirun apptainer run <container> pumgen ...
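
For instance (reusing the container path from the README sketch earlier in this thread, with the tpv13 file names from the notebook):

mpirun apptainer run ~/my-training.sif pumgen -s msh2 tpv13_training.msh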

Collaborator Author

Alice suggested in Slack that we'd better have an additional section in the notebook for running remotely on Frontera.

Contributor

My option avoids duplicating the notebook.
@davschneller @sebwolf-de what do you think? How should we proceed?

Contributor

@Thomas-Ulrich Thomas-Ulrich May 21, 2024

The other option is:

!pumgen -s msh2 tpv13_training.msh
# on frontera with singularity
# !mpirun singularity run $singularity_image pumgen -s msh2 tpv13_training.msh

Contributor

Yes, I think this option is clear for both remote and Frontera users.

frontera.md Outdated
You can change `seissol` to `SeisSol_Release_dhsw_4_viscoelastic2` if you want to run visco-elastic simulation instead of the default elastic one.
Contributor

if you want to account for attenuation (https://seissol.readthedocs.io/en/latest/attenuation.html) instead of assuming a fully elastic rheology?

Collaborator Author

Thanks, updated in the latest commit.

frontera.md Outdated

You can abort the jupyter lab with Ctrl-C, confirm with `y`.
Now you should see a directory `seissol-training`.
This folder should contain four directories for different scenarios.

You can also open the jupyter notebook that runs on Frontera on your local machine with the following steps:

Step 1: change `SHARED_PATH="/your/path/to/container/"` in line 75 of `job.jupyter` to the path where your singularity container is built.
Contributor

could we have SHARED_PATH as an argument and run e.g.
sbatch job.jupyter /work2/09160/ulrich/frontera/Training/my_training.sif

and then in job.jupyter
my_container=$1?
(I mean I don't know if that works with sbatch)
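
A minimal sketch of that idea (untested here, but sbatch does pass trailing arguments through to the job script as $1, $2, ...):

# in job.jupyter: take the container path from the first argument,
# keeping the current hard-coded path as a fallback
SHARED_PATH="${1:-/your/path/to/container/}"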

@Thomas-Ulrich Thomas-Ulrich changed the title update frontera.md change file name in northridge/ remote Jupyter Lab on Frontera May 21, 2024
@AliceGabriel
Collaborator

@sebwolf-de we would like to merge this one before the training tomorrow

@Thomas-Ulrich Thomas-Ulrich merged commit e31b2b2 into main May 22, 2024
1 check passed
@Thomas-Ulrich Thomas-Ulrich deleted the zihua/testing branch May 22, 2024 09:08