Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing context causes clients to disconnect #393

Open
ottj opened this issue Apr 3, 2023 · 6 comments
Open

Changing context causes clients to disconnect #393

ottj opened this issue Apr 3, 2023 · 6 comments

Comments

@ottj
Copy link

ottj commented Apr 3, 2023

(I use the newest version with the changes of the fork from amandaJonesAway)

If I connect with an existing or new user they spawn in the immigration context. Upon exit the client gets disconnected:

`neohabitat-neohabitat-1 | - 2023/04/03 18:43:57.417 EVN cont (User.connectionDied:User.java:159) User 'user-testuser-7751387903278591649' connection died: TCP(2)

neohabitat-neohabitat-1 | debug: Elko port disconnected...
neohabitat-neohabitat-1 | debug: {Bridge being shutdown...}
neohabitat-neohabitat-1 | - 2023/04/03 18:43:57.441 EVN comm-cli : TCP(3) new connection from 172.18.0.4:32896
neohabitat-neohabitat-1 | debug: Server connection established on: 172.18.0.4:60228 (unknown)
neohabitat-neohabitat-1 | debug: Connecting: 172.18.0.4:1337 <-> 172.18.0.4:60228
neohabitat-neohabitat-1 | debug: Habitat client disconnected.
neohabitat-neohabitat-1 | debug: Client disconnected for Client 172.18.0.4:60228 (unknown), moving session to ASLEEP
neohabitat-neohabitat-1 | debug: Disconnecting Habiproxy connection on: 172.18.0.4:60228 (unknown)
neohabitat-neohabitat-1 | - 2023/04/03 18:43:57.456 EVN comm-cli : TCP(3) remote disconnect
neohabitat-neohabitat-1 | - 2023/04/03 18:43:57.459 EVN comm-cli : TCP(3) died: java.io.EOFException
neohabitat-neohabitat-1 | debug: Elko port disconnected...
neohabitat-neohabitat-1 | debug: {Bridge being shutdown...}
neohabitat-neohabitat-1 | warn: Error [ERR_STREAM_WRITE_AFTER_END]: write after end
neohabitat-neohabitat-1 | warn: Error [ERR_STREAM_WRITE_AFTER_END]: write after end
`

If I comment these 4 lines starting at 478 in Habitat2ElkoBridge and prevent any disconnect I can change contexts just fine:
` server.on('end', function() {

        Trace.debug('Elko port disconnected...');
        if (client) {
            Trace.debug("{Bridge being shutdown...}");
            // if (client.userName) {
            //     Users[client.userName].online = false;
            // }
            // client.end();
        }
    });

`
This is more of a hack than a proper fix, hence I did not create a PR. I was not able to trace the source problem yet.

@StuBlad
Copy link
Contributor

StuBlad commented Apr 3, 2023

I'm able to stand up a NeoHabitat service using Amanda's fork as long as I manually run npm install inside the pushserver directory of the container and this issue doesn't occur for me for some reason.

Amanda has the same problem as you though @ottj .

What OS are you using? Does it happen for you if you use an existing user in the database? (Try logging in as Chip, Randy or Stu).

We had suspicions that there was API drift in our version of MongoDB regarding writing to the database, but because I was able to make things work with no changes besides what I mentioned above, I'm not sure.

Unless somehow our versions of Mongo are different, then I'm not sure what's up.

@ottj
Copy link
Author

ottj commented Apr 4, 2023

OS: I have it running in docker on macOS (intel) and in docker on a Synology NAS.
Users: It happens both with existing users like Chip and new users.
The pushserver npm errors can be semi-fixed by forcing install of serve-favicon on startup in the run script:

function start_pushserver() { if [[ "${NODE_ENV}" == "production" ]]; then (cd "${GIT_BASE_DIR}/pushserver" && npm install serve-favicon && npm run start 2>&1 | tee ../pushserver.log) else (cd "${GIT_BASE_DIR}/pushserver" && npm install serve-favicon && npm run debug 2>&1 | tee ../pushserver.log) fi }

I will check which mongo versions are running in the containers.

@ottj
Copy link
Author

ottj commented Apr 7, 2023

This is the output (with Mongo version 4.0.22) of the mongoDB container at startup:
`

2023-03-27T10:20:06.127034160Z stdout 2023-03-27T10:20:06.126+0000 I COMMAND  [initandlisten] setting featureCompatibilityVersion to 4.0
2023-03-27T10:20:06.126425735Z stdout 2023-03-27T10:20:06.126+0000 I SHARDING [initandlisten] Marking collection admin.system.version as collection version: 
2023-03-27T10:20:05.081521959Z stdout 2023-03-27T10:20:05.081+0000 I STORAGE  [initandlisten] createCollection: admin.system.version with provided UUID: 852bc2f2-f4aa-4aca-9c00-6b04b34b076b
2023-03-27T10:20:05.076772803Z stdout 2023-03-27T10:20:05.076+0000 I CONTROL  [initandlisten]
2023-03-27T10:20:05.076702367Z stdout 2023-03-27T10:20:05.076+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2023-03-27T10:20:05.076642649Z stdout 2023-03-27T10:20:05.076+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2023-03-27T10:20:05.076570063Z stdout 2023-03-27T10:20:05.076+0000 I CONTROL  [initandlisten]
2023-03-27T10:20:05.076344663Z stdout 2023-03-27T10:20:05.075+0000 W STORAGE  [initandlisten] Detected configuration for non-active storage engine mmapv1 when current storage engine is wiredTiger
2023-03-27T10:20:03.444543333Z stdout 2023-03-27T10:20:03.444+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2023-03-27T10:20:02.843229219Z stdout 2023-03-27T10:20:02.842+0000 I STORAGE  [initandlisten] WiredTiger message [1679912402:842923][1:0x7f1751b31a80], txn-recover: Set global recovery timestamp: 0
2023-03-27T10:20:00.271403075Z stdout 2023-03-27T10:20:00.269+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1376M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2023-03-27T10:20:00.271351035Z stdout 2023-03-27T10:20:00.257+0000 I CONTROL  [initandlisten] options: { net: { bindIpAll: true }, storage: { mmapv1: { smallFiles: true } } }
2023-03-27T10:20:00.271307062Z stdout 2023-03-27T10:20:00.257+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2023-03-27T10:20:00.271258788Z stdout 2023-03-27T10:20:00.257+0000 I CONTROL  [initandlisten]     distarch: x86_64
2023-03-27T10:20:00.271213261Z stdout 2023-03-27T10:20:00.257+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
2023-03-27T10:20:00.271167175Z stdout 2023-03-27T10:20:00.256+0000 I CONTROL  [initandlisten] build environment:
2023-03-27T10:20:00.271122374Z stdout 2023-03-27T10:20:00.256+0000 I CONTROL  [initandlisten] modules: none
2023-03-27T10:20:00.271077304Z stdout 2023-03-27T10:20:00.256+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2023-03-27T10:20:00.271024777Z stdout 2023-03-27T10:20:00.256+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2023-03-27T10:20:00.270962252Z stdout 2023-03-27T10:20:00.256+0000 I CONTROL  [initandlisten] git version: 1741806fb46c161a1d42870f6e98f5100d196315
2023-03-27T10:20:00.270903936Z stdout 2023-03-27T10:20:00.256+0000 I CONTROL  [initandlisten] db version v4.0.22
2023-03-27T10:20:00.270250004Z stdout 2023-03-27T10:20:00.256+0000 I CONTROL  [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=neohabitatmongo
2023-03-27T10:20:00.243263159Z stdout 2023-03-27T10:20:00.242+0000 I CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'

`

@kevzettler
Copy link

Further context from Randy on slack messages:

If you'd like to start triggering this problem again, remove the comments from this block of code in the bridge:
10:15
// TODO! 5/19/2023 The 'end' handler below was removed because, when used with the PushServer/HabiProxy
// it would get called inappropriately [region transition!]. NOTE: This does NOT happen when connected
// directly to the elko/game server.
// This should be debugged by someone who knows how to use the appropriate tools...
// What if the Elko server breaks the connection? For now we tear the bridge down also.
// If we ever bring up a "director" based service, this will need to change on context changes.
// server.on('end', function() {
// Trace.debug('Elko port disconnected...');
// if (client) {
// Trace.debug("{Bridge being shutdown...}");
// if (client.userName) {
// Users[client.userName].online = false;
// }
// client.end();
// }
// });
We really need that code to handle disconnects elegantly. Everything I read says it should work - it should NOT shut down the bridge when there is a disconnect on the port connected to the habiproxy, but it does.
As the comment says, the habiproxy is not between the bridge and the game, it does the right thing.
That is all I know.

This appears to be from:

// TODO! 5/19/2023 The 'end' handler below was removed because, when used with the PushServer/HabiProxy
// it would get called inappropriately [region transition!]. NOTE: This does NOT happen when connected
// directly to the elko/game server.
// This should be debugged by someone who knows how to use the appropriate tools...
// What if the Elko server breaks the connection? For now we tear the bridge down also.
// If we ever bring up a "director" based service, this will need to change on context changes.
// server.on('end', function() {
// Trace.debug('Elko port disconnected...');
// if (client) {
// Trace.debug("{Bridge being shutdown...}");
// if (client.userName) {
// Users[client.userName].online = false;
// }
// client.end();
// }
// });

@kevzettler
Copy link

kevzettler commented Jul 25, 2023

From the Keith on the MADE Slack

The above needs a process.nextTick

You want the client socket end to happen the next time I/O pumps, rather than now, because otherwise nothing has the opportunity to process anything before the socket is closed. You may also want to see if there is an earlier socket callback which better reflects what you want to happen other than to end the connection immediately. I can’t recall.

https://nodejs.org/docs/latest-v18.x/api/process.html#processnexttickcallback-args

This is in reference to the code at:

// TODO! 5/19/2023 The 'end' handler below was removed because, when used with the PushServer/HabiProxy
// it would get called inappropriately [region transition!]. NOTE: This does NOT happen when connected
// directly to the elko/game server.
// This should be debugged by someone who knows how to use the appropriate tools...
// What if the Elko server breaks the connection? For now we tear the bridge down also.
// If we ever bring up a "director" based service, this will need to change on context changes.
// server.on('end', function() {
// Trace.debug('Elko port disconnected...');
// if (client) {
// Trace.debug("{Bridge being shutdown...}");
// if (client.userName) {
// Users[client.userName].online = false;
// }
// client.end();
// }
// });

kevzettler added a commit to kevzettler/neohabitat that referenced this issue Jul 25, 2023
This adds a process.nextTick handler to the server.on end event in
Habitat2Elkobridge

This was a suggested fix from Keith at The MADE
frandallfarmer#393 (comment)

This should address frandallfarmer#393 frandallfarmer#431 and others
@kevzettler
Copy link

further context on a test case from: #433 (comment)

 The test that breaks it is to use the web-client (port 1701) and try to change regions.
It still breaks the wrong connection from the habiproxy... Message trace here:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants