Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gl 1358 migrate #2001

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Gl 1358 migrate #2001

wants to merge 7 commits into from

Conversation

kinow
Copy link
Member

@kinow kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 12, 2024, 09:22

  • Fixes Include 3.15.19 fixes in the migrate command of >4.1.X #1358

  • Refactored a bit the code, added typing

  • Added rsync offer in case FTP channel fails ( mn5 doesn't work in some cases with only FTP channel )

  • Pytests working on local, but as it needs a ssh connection, it also doesn't work in the pipeline ( waiting for github)

  • Draft: ( I have to check the cov )

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 12, 2024, 12:55

added 1 commit

Compare with previous version

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 12, 2024, 13:06

added 1 commit

Compare with previous version

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 12, 2024, 13:07

marked this merge request as ready

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 12, 2024, 13:08

ready to review,

Some codes are reported as not covered, like the pickup function, but they are actually being called, so something weird is happening with the coverage.

Anyway, I need to address two TODOs added when we move to github (like enabling the tests in the pipeline) so I can test again there.

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @kinow on Nov 12, 2024, 16:09

Commented on docs/source/troubleshooting/error-codes.rst line 211

s/Comprobe/Check?

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @kinow on Nov 12, 2024, 16:09

Commented on autosubmit/platforms/paramiko_platform.py line 1497

As e is never used, I think this is not needed?

image

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @kinow on Nov 12, 2024, 16:09

Commented on autosubmit/platforms/paramiko_platform.py line 520

Maybe it'd be a good time to also change this default 150 times to a configurable value (#986)?

If we let it retry rsync 150 times, without any sleep period we could have the same issue @utigerstedt & @Nortamo found in the VM that caused performance issues (although here it would be only when autosubmit migrate is used, so the risk is much lower).

Probably better to add a configuration with a default of... 2? 3?, and add a short interval time to sleep before retrying (can be configurable too, although we didn't do that in the other issue, so we can use the same timeout here, I think).

Or, if adding the configuration requires too much effort, maybe we can just lower from 150 here, and add the sleep in-between each retry.

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @kinow on Nov 12, 2024, 16:09

Commented on autosubmit/migrate/migrate.py line 114

Wouldn't a set or list be faster then constructing the string here? https://stackoverflow.com/a/16147471

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @kinow on Nov 12, 2024, 16:09

Refactored code and docs are looking great. Did a partial review, but I am have to go home 🏃 ☔

Also, I uncommented the test/unit/test_migrate.py and even though I have passwordless SSH local connection, the test failed. I didn't have time to look more into that, and it may be that the test was already broken. But just something else that we can take a look during the review too.

Thanks Dani!

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @kinow on Nov 13, 2024, 10:13

Commented on test/unit/test_migrate.py line 58

On master, with this change, test_type='test', and removing the pytest.mark.skip decorator the tests pass for me.

But on this branch, even with test_type='test' and removing the decorator, the tests are not passing for me.

FAILED      [100%]
test/unit/test_migrate.py:181 (TestMigrate.test_migrate_remote_rsync)
self = <test.unit.test_migrate.TestMigrate object at 0x7c21cec88dc0>
migrate_remote_only = <autosubmit.migrate.migrate.Migrate object at 0x7c21f9f69760>
migrate_tmpdir = local('/tmp/pytest-of-kinow/pytest-3/migrate_tests0')
mocker = <pytest_mock.plugin.MockerFixture object at 0x7c21f9f697c0>

    def test_migrate_remote_rsync(self, migrate_remote_only, migrate_tmpdir, mocker):
        # Expected behavior: migrate everything from scratch/whatever to scratch/whatever_new
>       assert migrate_tmpdir.join(f'scratch/whatever/{migrate_tmpdir.owner}/t000').check(dir=True)
E       AssertionError: assert False
E        +  where False = check(dir=True)
E        +    where check = local('/tmp/pytest-of-kinow/pytest-3/migrate_tests0/scratch/whatever/kinow/t000').check
E        +      where local('/tmp/pytest-of-kinow/pytest-3/migrate_tests0/scratch/whatever/kinow/t000') = join('scratch/whatever/kinow/t000')
E        +        where join = local('/tmp/pytest-of-kinow/pytest-3/migrate_tests0').join

unit/test_migrate.py:184: AssertionError

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @kinow on Nov 13, 2024, 10:39

Finished reviewing it this morning. There were parts of the code that I didn't understand if they were added here, or if they had been simply refactored (the latter is the case here). Great job splitting the job into smaller functions! Maybe in the future we can discuss using exceptions to control-flow too, and not only boolean flags (I think that could simplify our code a bit, without performance issues).

👍

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 14, 2024, 08:35

Commented on autosubmit/platforms/paramiko_platform.py line 520

found in the VM that caused performance issues

In that case, the problem with the performance was that the command was infinite until the end of execution and always doing stuff

Or, if adding the configuration requires...

The issue with the rsync was that sometimes the command ended abruptly without sending all the data.

The number of retries needed is unknown because there could be Terabytes of data or just megas. From people's experience, the crash is not related to the size, and just prompting the command again solves the issue. So it is hard to set a number. I guess I can lower the rsync to 30 times?

Sleep time is not a connection issue, so I'm unsure if it is useful. Thirty times should be fast to process anyway ( if there are connection issues* ), and if there is no data left, the loop ends. The idea is to end the transfer as fast as possible, and sleep would delay it without much benefit ( I believe ). Also, the mayor CPU/io time is done in the remote platform while in local, it is blocked without using resources

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @kinow on Nov 14, 2024, 11:26

Commented on autosubmit/platforms/paramiko_platform.py line 520

Thanks Dani! I thought that'd be running locally. Let's fix with a) choose either lower number, or add timeouts, or leave as-is (either way is fine by me) and b) add a small note to https://autosubmit.readthedocs.io/en/master/userguide/manage/index.html#how-to-migrate-an-experiment

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 28, 2024, 15:33

Commented on docs/source/troubleshooting/error-codes.rst line 211

changed this line in version 4 of the diff

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 28, 2024, 15:33

Commented on autosubmit/platforms/paramiko_platform.py line 1497

changed this line in version 4 of the diff

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 28, 2024, 15:33

Commented on autosubmit/platforms/paramiko_platform.py line 520

changed this line in version 4 of the diff

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 28, 2024, 15:33

Commented on autosubmit/migrate/migrate.py line 114

changed this line in version 4 of the diff

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 28, 2024, 15:33

added 1 commit

Compare with previous version

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 28, 2024, 15:34

Commented on autosubmit/platforms/paramiko_platform.py line 520

  • Reduced to 25 tries
  • Added some warnings in rst

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 28, 2024, 15:35

Commented on test/unit/test_migrate.py line 58

You're right, I also have this issue, but at some point, it was working ( perhaps some rebase?)

It seems that assert happens because the original folder is not removed.

  • Fix the rsync function

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 29, 2024, 12:07

added 1 commit

Compare with previous version

@kinow
Copy link
Member Author

kinow commented Dec 13, 2024

In GitLab by @dbeltrankyl on Nov 29, 2024, 12:10

added 1 commit

  • d5377ed - wrong syntax for skip fix

Compare with previous version

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include 3.15.19 fixes in the migrate command of >4.1.X
2 participants