Replies: 19 comments 2 replies
-
Hi, thanks for the report.
-
Thank you. I had already updated it, but the error still occurs.
-
Hi @holiday01, thanks for the reply. Can you try whether "poc" mode works for you? https://nvidia.github.io/NVFlare/quickstart.html#setting-up-the-application-environment-in-poc-mode Try to run an example app using the POC setup. If that works, we can narrow the problem down to the provisioned setup.
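For reference, a minimal sketch of the POC setup from the linked quickstart; the `poc` command, client count, and startup paths follow the 2.0-era docs and may differ in other versions:

```
pip install nvflare
poc -n 2                          # generate a POC workspace with 2 clients
./poc/server/startup/start.sh     # on the server machine
./poc/site-1/startup/start.sh     # on client machine 1
./poc/site-2/startup/start.sh     # on client machine 2
./poc/admin/startup/fl_admin.sh   # admin console, connects to the server
```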
-
The server worked using POC mode; the client and admin connect to it from different machines. When I run the "brats18" example, the clients produce the model training results (e.g., model.pt), but I get the error and model.pt is not sent to the server and admin. Hence, I guess I should run it using provision, which has the secure settings.
-
The error message shows that the client could not connect to the server at "103.124.23.130:8002". Please change the server to use a hostname instead of an IP address.
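One way to locate and change this, assuming a 2.x POC workspace layout (the JSON key name may differ in your version, so check your generated files):

```
# Find where the server address appears in the generated startup kits:
grep -rn "8002" poc/*/startup/
# In each client's fed_client.json, replace the raw IP with the server hostname,
# e.g. "target": "103.124.23.130:8002" -> "target": "my-fl-server:8002"
# ("target" is the 2.0-era key; verify against your own file).
```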
-
Actually, the clients had connected to the server. After the clients got local_model.pt, the error was displayed.
-
Are you using the POC mode or the provisioned secure mode? The provisioned secure mode can only work with hostnames. Also make sure both the server and client sides use the same hostname.
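A quick way to verify the hostname on both sides (the server name below is a placeholder for your own):

```
# On the server machine, get its hostname:
hostname -f
# On each client machine, confirm that name resolves to the server's IP:
getent hosts nvflare-server.example.com
# If it does not resolve, add an entry to /etc/hosts on the client machines.
```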
-
I used the POC mode and got the gRPC error, following this: https://nvidia.github.io/NVFlare/quickstart.html#quickstart
-
Can you try using POC mode: https://nvidia.github.io/NVFlare/quickstart.html#quickstart and follow the hello-numpy-sag example first: https://nvidia.github.io/NVFlare/examples/hello_numpy.html See if you can get this to finish with 1 server and 2 clients (on different machines).
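For context, the 2.0-era admin console flow to run an app looked roughly like this (newer releases replace these commands with submit_job, so check your version's docs):

```
upload_app hello-numpy-sag
set_run_number 1
deploy_app hello-numpy-sag all
start_app all
check_status server
check_status client
```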
-
I tried it.
-
Thanks, it looks like hello-numpy-sag is running correctly in POC mode on your systems. This is a good start.

If you want to try the SECURE mode, you will need to follow this: https://nvidia.github.io/NVFlare/user_guide/overview.html#provisioned-setup Please also try the hello-numpy-sag example first and check the results. If things are good, then try hello-pt and see if it completes. I am sorry this is not a short process; we are working on improving the user experience in the next release.

If you want to try the BRATS example, you will need to install the required packages on all the machines. The required packages are listed here: https://github.com/NVIDIA/NVFlare/blob/main/examples/brats18/virtualenv/min-requirements.txt And you need to split the BRATS dataset into the number of clients you have. @holgerroth or @ZiyueXu77 can help from there if you have any questions.
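A rough outline of those steps in shell form; the paths are illustrative, and `provision -p project.yml` is the command used elsewhere in this thread:

```
# Secure (provisioned) setup, per the linked user guide:
provision -p project.yml                 # generates per-site startup kits
# Copy each site's kit to its machine, then on each machine:
bash <site>/startup/start.sh
# For BRATS, install the requirements on every machine:
pip install -r examples/brats18/virtualenv/min-requirements.txt
```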
-
Thank you for your response and help. I know the debugging will take a lot of effort. This issue occurred when I ran "provision -p project.yml" with a project.yml I had edited, hence I am asking here. I ran hello-pt and got the cross_site_val folder on the server (cross_site_val: cross_val_results.json, model_shareables, result_shareables). I had finished the BRATS example, but I could not get the model on the server, whereas I got local_model.pt and best_local_model.pt on the clients.
-
Hi @holiday01, thanks for the additional information and your experiment. From "I ran the hello-pt. And got the cross_site_val folder in the server (cross_site_val - cross_val_results.json, model_shareables, result_shareables)", it seems that your system finishes the hello-pt example correctly, which is a good sign. Can you elaborate on what steps you took to run the BRATS example?

You said "I got local_model.pt and best_local_model.pt in the clients." => this means each client is doing its work. Note that the config provided in the BRATS example folder, https://github.com/NVIDIA/NVFlare/blob/main/examples/brats18/configs/brats18_fedavg/config/config_fed_server.json#L3, specifies min_clients as 13. So if you have fewer client sites than this number, the server side will not proceed (it is waiting for all clients to submit results back). You should change this number to the number of clients you actually run.
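For example, with two clients, a one-line edit along these lines (the sed pattern assumes the key appears as "min_clients": 13 in that file; adjust the path to your checkout):

```
sed -i 's/"min_clients": *13/"min_clients": 2/' \
    examples/brats18/configs/brats18_fedavg/config/config_fed_server.json
```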
-
I used only two clients. I edited the data paths in the clients' configs and the image paths in the image JSON files, then ran: set_run_number 1
-
Hi @holiday01, thanks for the additional information. Can you change this line to False? I think if it is False, the server-side model will not be deleted.
-
Thank you, I tried it. The clients still got model.pt, whereas the server did not.
-
Hi @holiday01, thanks for getting back. You said that in the end "check_status client" shows stopped. What about "check_status server": what did it show in the end? If possible, can you attach your server and client logs? They can be found in the startup folder as log.txt at each site.
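Concretely, something like this (the log location follows the note above; it can vary with version and workspace layout):

```
# In the admin console:
check_status server
check_status client
# On each machine, grab the site's log:
cat server/startup/log.txt
cat site-1/startup/log.txt
```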
-
I just ran it with 1 server and 2 clients, and it is working on my machine. Looking at your "check_status server" output at the end, I see "Run number has not been set." Did your server die? Can you remove your old workspace, start fresh with 1 server and 2 clients, and share the whole log file from both the server and the clients here?
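To start fresh in POC mode, roughly (assuming the quickstart's poc workspace layout):

```
rm -rf poc                        # remove the old workspace
poc -n 2                          # regenerate with 2 clients
./poc/server/startup/start.sh     # server machine
./poc/site-1/startup/start.sh     # client machine 1
./poc/site-2/startup/start.sh     # client machine 2
```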
-
I got a new error while running bash server/startup/start.sh:
E0602 10:38:02.562101740 608351 ssl_transport_security.cc:1495] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
Originally posted by @holiday01 in #631 (comment)
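For anyone hitting this: WRONG_VERSION_NUMBER during the TLS handshake generally means one side is speaking plaintext while the other expects TLS, e.g. mixing POC (non-secure) startup kits with provisioned (secure) ones, or a proxy in the path. A quick probe from a client machine (hostname and port are placeholders):

```
# Check whether the server port actually speaks TLS:
openssl s_client -connect nvflare-server.example.com:8002
# A certificate dump means TLS is up; an immediate handshake error suggests
# a plaintext listener or an intermediary terminating the connection.
```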