mongodb version skew error #477
There shouldn't be a problem, as @Aristoeu has been using the container on an HPC system recently. @Aristoeu, could you please confirm that the latest container works? Also, maybe provide some details on how you've been running it. By the way, I wonder if this error is due to using a newer MongoDB with an old database set up by an older version.
Having just done this, there is a strong possibility @Aristoeu is using an older version of the container. With Apptainer (aka Singularity) you normally build a file that apptainer uses for a run, rather than pulling by name as is done with Docker, so he could easily be using a version he had built earlier. That is a warning to anyone who wants to be using the latest version of any container.
I actually just tested pulling the latest mspass container on Frontera with:
Yes, I was using an older version. I just pulled the latest container and ran it on distributed nodes using a script.
In an email on this topic @wangyinz noted there were tricky issues with running containers on HPC that could be a factor here. I am suspicious there is some file system issue. I found mongod dies when I run the container this way:
where the two shell variables point to what their names imply. However, if I run this without the --home argument like this:
and then connect to the jupyter server, I see that mongod is running and will accept a connection with mongosh. That doubly confirms the problem is not with the container, but it strongly suggests something about the system configuration on this new cluster. Note that the same script used to run on this cluster's predecessor. It is not something as simple as a write permission error: with the terminal in jupyter I can create and delete files in $WORK_DIR. It seems I need to get the cluster sysadmins involved. @wangyinz, any ideas I could give them to help sort this out?
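For reference, a minimal sketch of the two launch styles described above. The image file name and directory paths here are assumptions, not the actual script, and the commands are echoed rather than executed so the sketch is runnable anywhere:

```shell
# Sketch of the two launch styles described above. The image name and
# paths are hypothetical stand-ins; echo prints the commands instead of
# running them, since apptainer is only available on the cluster.
MSPASS_IMAGE=mspass_latest.sif        # hypothetical image file
WORK_DIR=$HOME/mspass_work            # hypothetical work directory

# Style 1: with --home; in this mode mongod was observed to die.
echo apptainer run --home "$WORK_DIR" "$MSPASS_IMAGE"

# Style 2: without --home; mongod starts and accepts mongosh connections.
echo apptainer run "$MSPASS_IMAGE"
```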
The first thing to look at is what directories are mounted into the container by default. At TACC, the default behavior is to mount $HOME, $WORK, and $SCRATCH, so that inside the container the user has access to all the files just as on the host. However, IU might have that set up differently. A simple test you can do is to run a bash shell within the container and see what directories are mounted. Then, you may use the bind option (see the Apptainer documentation) to explicitly add those that aren't there by default.
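A sketch of that test, assuming a hypothetical image name (commands are echoed rather than executed so the sketch is runnable anywhere):

```shell
# Check what the host mounts into the container. Image name is a
# hypothetical stand-in; echo prints the commands instead of running them.
MSPASS_IMAGE=mspass_latest.sif

# Open a bash shell inside the container, then run `df -h` or
# `ls /N/slate` there to see which host directories are visible:
echo apptainer shell "$MSPASS_IMAGE"

# If a file system is missing, bind it explicitly with -B:
echo apptainer exec -B /N/slate "$MSPASS_IMAGE" df -h
```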
It seems pretty clear you are right: the issue is that I need to find the right combination of the bind option to mount the right file systems along with the --home option. The only combination I've found that works is to not define --home at all, in which case home is literally my home directory on the cluster (~ in shell lingo), and to use -B to mount the file system where I have data. Notebooks and a terminal running in the container can access files in the file system defined by -B, but if I try anything with --home, mongod doesn't run. Note I've even tried reversing the order of -B and --home, and that makes no difference. It may be telling that I have an older singularity container I built a few months ago that runs fine on this same cluster with apptainer, even though it was created with singularity on a different machine. That makes me think we need to heed the error message about a version problem that seems to always be posted in the mongod log file when it crashes.
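If the log message really does point at a data-file version mismatch, one thing worth checking is the featureCompatibilityVersion recorded in the database: this is MongoDB's standard mechanism for handling version skew between releases, and data files left at a value too old for the new binary will stop a newer mongod. A sketch, with the query echoed rather than run (connection details are assumptions):

```shell
# Query the featureCompatibilityVersion of a running mongod via mongosh.
# A database created by an older MongoDB release keeps the old value
# until it is explicitly raised with setFeatureCompatibilityVersion.
# The command is echoed here instead of executed.
FCV_QUERY='db.adminCommand({getParameter: 1, featureCompatibilityVersion: 1})'
echo mongosh --quiet --eval "$FCV_QUERY"
```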
I'm not yet sure why, but I have a solid workaround for the problem I encountered on this new IU cluster. It turned out that, for some reason, running the container with the --home option caused problems. If I omit --home, use the -B option to mount the lustre file system (here called /N/slate), AND make sure the container is launched following a cd to the directory passed as APPTAINER_MSPASS_DBPATH, things work. I'm guessing --home defaults to the current directory. Why using --home aborts mongod remains a mystery to me, but this is a simple fix that actually simplified the run line from the script I used before anyway. For the record, here is the job script that ran (without the SBATCH commands, which are a side issue for this discussion):
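The actual job script is not reproduced in this extract; the following is a minimal sketch of the workaround just described, with hypothetical paths and image name, and with the apptainer command echoed rather than executed:

```shell
# Workaround sketch: no --home, bind the lustre file system with -B,
# and cd to the database directory before launching. The image name
# and default path below are hypothetical stand-ins.
MSPASS_IMAGE=mspass_latest.sif
export APPTAINER_MSPASS_DBPATH=${APPTAINER_MSPASS_DBPATH:-$PWD/db}

mkdir -p "$APPTAINER_MSPASS_DBPATH"
cd "$APPTAINER_MSPASS_DBPATH"

# echo prints the launch command instead of running it.
echo apptainer run -B /N/slate "$MSPASS_IMAGE"
```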
I think you can mark this issue closed for now.
I was trying to get mspass running on a new cluster at Indiana. As usual I had to do some customization to launch the mspass container. That actually only involved two things:
If we need to dig into my changes later we can do so, but I don't think that is the issue I'm facing here. For the record, I found any attempt to do anything with MongoDB in the container would fail with a timeout error. Running a jupyter-lab terminal, I quickly learned that mongod was not running in the container. I found the mongod log file (in the usual logs directory). There is the usual pile of stuff, but I cut out and pretty-formatted the line that shows the error that caused mongod to exit:
Is there a problem with the current container? I just pulled this version yesterday.