feat: experiment to measure blocking of looks-like-random traffic #271

Status: Open. Wants to merge 14 commits into base: master.
104 changes: 104 additions & 0 deletions nettests/ts-039-randomtraffic.md
@@ -0,0 +1,104 @@
# Specification version number
Contributor

Please rename this file to ts-040-randomtraffic.md. In the meantime, we merged ts-039-echcheck.md, so we need to bump the nettest number used by this nettest.

Contributor

Please make sure you wrap long lines at around column 80 to facilitate reading the spec from the terminal. Most users will read on the web, but it does not cost us much to help people using the terminal. Shorter lines also help with reviewing the spec in GitHub and providing suggestions.

Author

Updated the specification accordingly! 👍


2023-01-13-000

# Specification name

Random Traffic

# Test preconditions

An internet connection

# Expected impact

Ability to detect the censorship of fully-encrypted protocols which encrypt every byte of traffic in an attempt to appear completely random.

```
Note: This does not include TLS as TLS has a standard handshake to begin with.
```
Contributor

I suggest expanding upon this section to explain what fully-encrypted protocols are. For example, you can mention ShadowProxy, VMess, and OBFS4. You should integrate the remark that TLS is not a fully encrypted protocol into the whole discussion on fully-encrypted protocols, probably as the last sentence.

This section should also explain that this experiment is based on a paper. Even though the paper is not publicly available, I think you should mention the paper title and its primary author.

I think it's also important for you to summarize the findings of the paper in a very brief way. Basically, I would recommend mentioning the following points:

  • the paper investigated passive blocking of fully-encrypted traffic by the GFW

  • the paper characterized the rules used by the GFW to block such traffic

  • the nettest produces random traffic that should be blocked

  • blocking in this context means that, once the GFW has observed an offending payload, it installs rules that null-route traffic to the server endpoint for a given amount of time; this blocking is nondeterministic, and it sometimes takes several connections with an offending payload to the same destination endpoint to trigger it

  • the nettest records the characteristics of the generated traffic, whether it was blocked, and the characteristics of the payload that eventually triggered blocking

Author

Updated the specification accordingly


# Expected inputs

None

# Test description

The main goal of the test is to inform the user whether or not they are experiencing censorship on connections that send fully encrypted packets that appear random, as well as to record information about censored packets in order to better understand the censorship algorithm. The test seeks to accomplish these goals by doing the following:

1. If no IP address is given by the user, select an IP address from the list of IP addresses in the affected range
Contributor (@bassosimone, Apr 5, 2023)

You mention that the nettest does not take any input above. But here the first point of the algorithm mentions that the user can provide an IP address. I find this a bit confusing, and we should address this.

You chose to write an experiment that provides itself with input as a static list, using the InputNone input policy. This choice is perfectly in line with what I would have done considering the constraints imposed on you by the OONI engine. You then added the possibility for users to specify a target from the command line, which it seems to me you did mostly for testing purposes. However, it may also be useful for testing a given endpoint from a possibly censored location, bypassing the default set of IP addresses.

Upcoming changes to OONI Probe will eventually allow us to provision this kind of input to your experiment in a smoother way (the high-level activity is ooni/ooni.org#1291, which I am cross referencing here to make sure I reference use cases made possible by this improvement).

Until these changes are ready, I do not think it makes sense for the spec to advertise the possibility of providing targets for this experiment using miniooni -O Target=1.2.3.4:5678. Therefore, I suggest you restructure this sentence to say that:

  1. this experiment contains a set of TCP endpoints known to possibly host circumvention servers (e.g., an Outline server) and,

  2. when started, this experiment will randomize this list and operate on the randomized permutation, starting with the first endpoint and then moving on to subsequent endpoints

I think this is also a good place to point out a metonymy issue that runs through the whole specification and implementation: you refer to "IP addresses" (e.g., 1.2.3.4) while what you are actually dealing with are TCP endpoints (e.g., 1.2.3.4:80/tcp). I think the wording should be more precise and explicitly say that the whole experiment only deals with TCP endpoints.

Author

Ok all sounds good. Changed the specification but left the functionality in there just for testing purposes.

2. Complete a TCP handshake with the IP address and send a stream of null bytes as a control test. If this control test succeeds then proceed with the experiment, otherwise attempt the control test with a new IP address two more times or until the control test is successful. If no control test succeeds end the test and return the error.
Contributor (@bassosimone, Apr 5, 2023)

(Please, remember to wrap long sentences for readability.)

I think here you should say that you try with the first three TCP endpoints in the random permutation. If none of them works the test fails. In this case, the test would return an error to signal to the OONI Engine that you do not want to submit a measurement (<- is this the intended behavior?).

Then, you should explain that "success" in this preliminary check consists of performing a TCP connect (aka TCP handshake) and then sending a string of zero bytes with a random length. I think it may be useful here to explain why using all zeroes is considered safe with respect to the GFW. (Would it work in, say, Iran, which is known to have a much more restrictive set of filters with respect to "unknown" traffic?)

I also have a methodological question here. You are sending a string of zero bytes, but your code is not checking the result of "send". Additionally, even if you were checking for errors, I am not sure the error would be informative in most cases, because you are supposed to be able to enqueue on the socket buffer. Checking for an error here could still be interesting in case you received an ICMP message or other interference right after establishing the connection, but the window of opportunity seems very small to me.

That said, I am missing the real point of sending a string of zero bytes here. I suppose you are sending it to trigger some side effect, but I cannot fully see what the side effect is. Maybe your concern is that you want to know you can use a TCP endpoint before actually using it for the test. In that case, what is the gain in sending the zero bytes, given that after TCP connect succeeds you are not checking any other error? What would change methodologically if you avoided sending the zero-byte sequence and limited the control check to ensuring that you can connect to the given IP address (to rule out it being already blocked, I suppose)?

Author

Made all of the desired changes. I have one question, however: if the test returns an error, are the results not recorded by OONI? That is still the desired functionality, but it may lead to some test keys becoming irrelevant. Also, great point about the string of zero bytes: you are 100% correct that it is unnecessary in this case. We have removed them for now but may reintroduce them in the future if we decide to generalize the test. The specification was updated accordingly!

3. Complete a TCP handshake with the IP address and send a stream of random bytes. If this connection times out, we attempt to connect once more to check for residual censorship. If the residual censorship test results in a timeout, we end the test, record information about the blocked packet, and inform the user they are experiencing censorship. Otherwise we continue with the test
Contributor

Suggested change
- 3. Complete a TCP handshake with the IP address and send a stream of random bytes. If this connection times out, we attempt to connect once more to check for residual censorship. If the residual censorship test results in a timeout, we end the test, record information about the blocked packet, and inform the user they are experiencing censorship. Otherwise we continue with the test
+ 3. Complete a TCP handshake with the IP address and send a stream of random bytes. If this connection times out, we attempt to connect once more to check for residual censorship. If the residual censorship test results in a timeout, we end the test, record information about the blocked packet, and inform the user they are experiencing censorship. Otherwise we continue with the test.

Contributor

In addition to the suggestion, I think it would be useful to specify what should happen in terms of submitting the measurement when you get an error that is not a timeout. Should the engine submit the measurement also in that case, or do you think we should not submit when we get, say, ENETUNREACH?

Author

In this case we are looking for one of three results. First, the user is not experiencing censorship, in which case we expect no errors. Second, the user is experiencing censorship, in which case we expect a timeout error and only a timeout error. Third, some other unexpected network error occurs, in which case the test simply returns the error and records the test as failed. The specification was updated to explain this.

4. Step 3 is repeated 19 more times to account for the blocking rate
Contributor

I think you should restructure the algorithm to say you repeat for 20 times and then you should have a nested list containing what is currently the content of step 3.

Author

Updated the specification accordingly! 👍

5. If no errors occurred and the test was completed, all connections are then closed and the test informs the user they are not experiencing censorship.

# Expected output

## Required output data

* The result of the test, 'success' or failure type
* Whether or not the censorship was detected

## Semantics
Contributor

Please, restructure this section to read like this:

this experiment generates a "test keys" result object containing the following keys:

Additionally, please, use the name in the JSON output for each key rather than the name inside the Go implementation, which is just an implementation detail. (The data consumer sees the JSON file.)

Author

Updated the specification accordingly! 👍


* Success: True if no errors occurred
* ConnectionCount: Number of successful connections
* FinalPopcount: The popcount of the triggering packet
* FirstSix: True if first six bytes of the final payload are printable
* TwentyContig: True if there exist twenty contiguous bytes of printable ASCII in the final payload
* HalfPrintable: True if at least half of the final payload is made up of printable ASCII
* PopcountRange: True if final popcount is less than 3.4 or greater than 4.6
* MatchesHTTP: True if fingerprinted as HTTP
* MatchesTLS: True if fingerprinted as TLS
* Payload: Payload of final packet
Contributor

I think this definition should be improved. IIUC, this is the packet that triggered blocking in case there is censorship and the last packet that was generated otherwise.

Additionally, some broader design questions for you: is there value in uploading the final packet to the OONI backend in case of success? Could we be missing information by not submitting all the packets that did not trigger censorship?

* Censorship: False if all 20 connections succeeded
* Error: String of error
Contributor

Please rename Error to failure. Most OONI experiments use failure rather than error.

Also, it seems to me that censorship, success and error could all be derived from the value of error. If that is the case, then I would recommend keeping only error around. We don't need redundant information (some OONI experiments have it, but that is no excuse for not being tidier with new experiments).

Author

I understand that it seems both success and censorship could be derived from error. However, in our test we do not consider a timeout to be an error, because it is expected when a user is experiencing censorship. Error is used to record the type of any unexpected error a user may have experienced while running the experiment. It is true, however, that success can be derived from error, since a test is deemed successful if there were no unexpected errors.
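
The payload-related test keys listed under Semantics (popcount, printable prefix, contiguous printable run, printable fraction) could be computed along these lines. This is a hedged Go sketch and not the PR's implementation: all function names are hypothetical, "printable" is assumed to mean the ASCII range 0x20 to 0x7e, and the popcount is assumed to be the average number of set bits per byte, which is consistent with the value of about 4.07 in the example output.

```go
package main

import "math/bits"

// isPrintable assumes "printable ASCII" means bytes 0x20 through 0x7e.
func isPrintable(b byte) bool { return b >= 0x20 && b <= 0x7e }

// AveragePopcount returns the mean number of set bits per byte.
// Uniformly random data is expected to average close to 4.
func AveragePopcount(p []byte) float64 {
	if len(p) == 0 {
		return 0
	}
	total := 0
	for _, b := range p {
		total += bits.OnesCount8(b)
	}
	return float64(total) / float64(len(p))
}

// PopcountOutOfRange reports whether avg is below 3.4 or above 4.6.
func PopcountOutOfRange(avg float64) bool { return avg < 3.4 || avg > 4.6 }

// FirstSixPrintable reports whether the first six bytes are all printable.
func FirstSixPrintable(p []byte) bool {
	if len(p) < 6 {
		return false
	}
	for _, b := range p[:6] {
		if !isPrintable(b) {
			return false
		}
	}
	return true
}

// TwentyContiguousPrintable reports whether p contains a run of at
// least twenty consecutive printable bytes.
func TwentyContiguousPrintable(p []byte) bool {
	run := 0
	for _, b := range p {
		if !isPrintable(b) {
			run = 0
			continue
		}
		run++
		if run >= 20 {
			return true
		}
	}
	return false
}

// HalfPrintable reports whether at least half of the bytes are printable.
func HalfPrintable(p []byte) bool {
	count := 0
	for _, b := range p {
		if isPrintable(b) {
			count++
		}
	}
	return len(p) > 0 && 2*count >= len(p)
}
```

For a payload drawn from a good random source, all four boolean checks are expected to be false almost always, as in the example output further down.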


## Possible conclusions

Ability to determine if the user is experiencing censorship on fully-encrypted traffic and what packet triggered the censorship.

## Example output sample

Contributor

Please make sure you update the JSON to the latest version of the experiment.

Author

Updated the specification accordingly! 👍

```JSON
{
"annotations":{
"architecture":"amd64",
"engine_name":"ooniprobe-engine",
"engine_version":"3.16.0-alpha",
"platform":"macos"
},
"data_format_version":"0.2.0",
"input":null,
"measurement_start_time":"2023-01-03 06:53:40",
"probe_asn":"AS6128",
"probe_cc":"US",
"probe_ip":"127.0.0.1",
"probe_network_name":"Cablevision Systems Corp.",
"report_id":"",
"resolver_asn":"AS6128",
"resolver_ip":"167.206.251.142",
"resolver_network_name":"Cablevision Systems Corp.",
"software_name":"miniooni",
"software_version":"3.16.0-alpha",
"test_keys":{
"success":true,
"connection_count":19,
"final_popcount":4.074525745257453,
"first_six":false,
"twenty_contig":false,
"half_printable":false,
"popcount_range":false,
"matches_http":false,
"matches_tls":false,
"payload":"KLpodhNrDfHPs6cEYBe096yVZdxqZ3udlhcs/ziiC11KHXcs2LUfa/CpiiLyo2NfguJ99k+k23XWE59+lw723HpsGJUKJnHop2BLXUCVUJDektT6Hm9rYTeBtAvqPZP+LVQ+WmqpoU7OFpeM3m7mVTut2AfSaH8TPhaDG377uYXz2tvZy+Oa7d/AsLzl4DKc707x+tITtFj4V/Gg2RfaHZe4C9tH9Wujw/62PiM6IgT3IK9fXT2QB0O9ZinY9+KxwVs7AYbXhoYdMoF9+s1wIL1f1NNx/Khgx6eYovROsj4768niLIPy6ketR0jZAA1CLidDAaWOvEDc/Tgv5vHcenUR0VawQFhGTfu+J6z4GEoQoi6e+N1HqvRoLXCd/OWdgybHVBlpPc8Wr7K8xrvdMwGIGKN+rpClGiFwxLJQkptr5kr9oZmM3T9cBy2ViZjdRM7HW3c8YmrGmw0jyVDszHcl4kBHeANgOEGtAudqvoxKPbLZYxvke64wu5RGr3CUEpwAnJW4GgPvl1KSWt9n5HSC0+Lhtbrcd7iUtlufoRjHrw3IGDt+n+S4F1tvV+4cslBRcv+wlJx4zFL+We+gJSg2CUFVLqOdRgpB73lBTe1Sb2vBB1RSZ3Cn0WTwhpbFVASpDS8nnJsD+CSVmXVpOy0PxvrYLA/UY6mE0kFBfqH9oVC8A+TN0IA3/vkzwZ/P9Xs8HRP5xm6shPvpy19MD9YWSK0Co3EXUpQrt4TW4kPeMbt/Dgpxe72zcuh6N9pjp3oR1fz1ioMOIp+1yalhB3XqgYAALUzpYI1Ya2A4if9qQq9nvVdLqDKFTehxKW1+mgJ+3/I7EG+6yprd7UGuQSpc49Yg/LhBchiXhIqTcgpNNNNClnjh31UTQwYT2NjYWuWK0ijGQfDjwP9bgYOPGaUOyzjkZTnWL1ejAaa5saA3q9TzKdZoY5Pw3BbO0WXP6SH2H1hhS/dB8XQPPLnq9jHj",
"censorship":false,
"error":null
},
"test_name":"shadowsocks",
"test_runtime":6.178643611,
"test_start_time":"2023-01-03 06:53:34",
"test_version":"0.1.0"
}

```