feat: experiment to measure blocking of looks-like-random traffic #271
base: master
Changes from 6 commits
e8ffc2e
3f1e20d
7815821
c409704
39201f2
7ebd29e
0ebaad2
1c6b91e
625f5a7
d6a810e
52648aa
d1ef89f
2ba4b25
1d275cc
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,104 @@ | ||||||
# Specification version number | ||||||
|
||||||
2023-01-13-000 | ||||||
|
||||||
# Specification name | ||||||
|
||||||
Random Traffic | ||||||
|
||||||
# Test preconditions | ||||||
|
||||||
An internet connection | ||||||
|
||||||
# Expected impact | ||||||
|
||||||
Ability to detect the censorship of fully-encrypted protocols which encrypt every byte of traffic in an attempt to appear completely random. | ||||||
|
||||||
``` | ||||||
Note: This does not include TLS as TLS has a standard handshake to begin with. | ||||||
``` | ||||||
I suggest expanding upon this section to explain what fully-encrypted protocols are. For example, you can mention ShadowProxy, VMess, and OBFS4. You should integrate the remark that TLS is not a fully encrypted protocol into the whole discussion on fully-encrypted protocols, probably as the last sentence.

This section should also explain that this experiment is based on a paper. Even though the paper is not publicly available, I think you should mention the paper title and its primary author. I think it's also important for you to summarize the findings of the paper in a very brief way. Basically, I would recommend mentioning the following points:

Updated the specification accordingly.
||||||
|
||||||
# Expected inputs | ||||||
|
||||||
None | ||||||
|
||||||
# Test description | ||||||
|
||||||
The main goal of the test is to inform the user whether or not they are experiencing censorship on connections that send fully encrypted packets that appear random, as well as to record information about censored packets in order to better understand the censorship algorithm. The test seeks to accomplish these goals by doing the following: | ||||||
|
||||||
1. If no IP address is given by the user, select an IP address from the list of IP addresses in the affected range | ||||||
You mention that the nettest does not take any input above. But here the first point of the algorithm mentions that the user can provide an IP address. I find this a bit confusing, and we should address this.

You chose to write an experiment that provides itself with input as a static list. You are using the

Upcoming changes to OONI Probe will eventually allow us to provision this kind of input to your experiment in a smoother way (the high-level activity is ooni/ooni.org#1291, which I am cross referencing here to make sure I reference use cases made possible by this improvement). Until these changes are ready, I think it does not make sense in the spec to advertise the possibility of providing targets for this experiment using

I think this is also a good place to point out some metonymy issue across the whole specification and implementation: You refer to "IP addresses" (e.g.,

Ok, all sounds good. Changed the specification but left the functionality in there just for testing purposes.
||||||
2. Complete a TCP handshake with the IP address and send a stream of null bytes as a control test. If this control test succeeds then proceed with the experiment, otherwise attempt the control test with a new IP address two more times or until the control test is successful. If no control test succeeds end the test and return the error. | ||||||
(Please, remember to wrap long sentences for readability.)

I think here you should say that you try with the first three TCP endpoints in the random permutation. If none of them works the test fails. In this case, the test would return an error to signal to the OONI Engine that you do not want to submit a measurement (<- is this the intended behavior?).

Then, you should explain that "success" in this preliminary check consists of performing a TCP connect (aka TCP handshake) and then sending a string of zero bytes with a random length. I think it may be useful here to explain why using all zeroes is considered safe with respect to the GFW. (Would it work in, say, Iran, which is known to have a much more restrictive set of filters with respect to "unknown" traffic?)

I also have a methodological question here. You are sending a string of zero bytes but your code is not checking for the result of "send". Additionally, even if you were checking for errors, I am not sure whether the error would be informative in most cases, because you're supposed to be able to enqueue on the socket buffer. Yet, checking for an error here would possibly be interesting in case you received an ICMP or other interference right after establishing the connection, but the opportunity window seems very small to me.

That said, I am missing the real point of sending a string of zero bytes here. I suppose you are sending this to trigger some side effect, but I cannot fully see what the side effect is. Maybe your concern is that you want to know you can use a TCP endpoint before actually using it for the test, but, in such a case, what is the gain in sending the zero bytes given that after TCP connect succeeds you are not checking any other error?

What would change methodologically if you avoid sending the zero-byte sequence and limit the control check to ensure that you can connect to the given IP address (to rule out it being already blocked, I suppose)?

Made all of the desired changes. I have one question, however. If the test returns an error, are the results not recorded by OONI? That is still the desired functionality; however, it may lead to some test keys becoming irrelevant. Also a great point about the string of zero bytes. You are 100% correct about it being unnecessary in this case. We decided to remove them now but may choose to reimplement them in the future in case we decide to generalize the test. The specification was updated accordingly!
||||||
3. Complete a TCP handshake with the IP address and send a stream of random bytes. If this connection times out, we attempt to connect once more to check for residual censorship. If the residual censorship test results in a timeout, we end the test, record information about the blocked packet, and inform the user they are experiencing censorship. Otherwise we continue with the test | ||||||
Suggested change
In addition to the suggestion, I think it would be useful to specify what should happen in terms of submitting the measurement when you get an error that is not a timeout. Should the engine submit the measurement also in that case, or do you think we should not submit when we get, say,

In this case we are looking for one of three different results. First, there is the case where the user is not experiencing censorship, in which we expect no errors. Then there is the case where the user is indeed experiencing censorship, in which we expect a timeout error and only a timeout error. Finally, there is the case where any other unexpected network error occurs, in which the test simply returns the error and records the test as failed. The specification was updated to explain this.
||||||
4. Step 3 is repeated 19 more times to account for the blocking rate | ||||||
I think you should restructure the algorithm to say you repeat for 20 times and then you should have a nested list containing what is currently the content of step 3.

Updated the specification accordingly! 👍
||||||
5. If no errors occurred and the test was completed, all connections are then closed and the test informs the user they are not experiencing censorship. | ||||||
|
||||||
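The algorithm discussed in this thread can be sketched in Python as follows. This is only an illustration under stated assumptions, not the experiment's Go implementation: `run_measurement`, `connect`, `send_random`, and `make_payload` are hypothetical names, the network operations are injected as callables so the control flow can be exercised without a network, and the control check is a plain TCP connect to at most three endpoints from a random permutation, as suggested in the review. The result keys mirror the JSON names used in the example output.

```python
import random
from typing import Callable, Dict, List


def run_measurement(
    endpoints: List[str],
    connect: Callable[[str], None],
    send_random: Callable[[str, bytes], None],
    make_payload: Callable[[], bytes],
    repetitions: int = 20,
) -> Dict[str, object]:
    """Sketch of the control check plus the repeated random-payload probe.

    Assumption: `connect` and `send_random` raise TimeoutError on a timeout
    and another OSError subclass on any other network failure.
    """
    result: Dict[str, object] = {
        "success": False,
        "censorship": False,
        "connection_count": 0,
        "payload": None,
        "error": None,
    }

    # Control check: try up to three endpoints from a random permutation.
    target = None
    for endpoint in random.sample(endpoints, k=min(3, len(endpoints))):
        try:
            connect(endpoint)
        except OSError as exc:
            result["error"] = str(exc)
            continue
        target = endpoint
        result["error"] = None
        break
    if target is None:
        return result  # no usable endpoint: the test fails

    for _ in range(repetitions):
        payload = make_payload()
        result["payload"] = payload
        try:
            send_random(target, payload)
        except TimeoutError:
            # Residual censorship check: connect once more.
            try:
                connect(target)
            except TimeoutError:
                # A timeout is the expected signal, not an unexpected error.
                result["censorship"] = True
                result["success"] = True
                return result
            except OSError as exc:
                result["error"] = str(exc)
                return result
            continue
        except OSError as exc:
            # Any other network error marks the test as failed.
            result["error"] = str(exc)
            return result
        result["connection_count"] += 1

    result["success"] = True
    return result
```

Injecting the callables keeps the sketch independent of payload length and timeout values, which this thread does not specify; a caller could pass something like `lambda: os.urandom(n)` for `make_payload` with an `n` of their choosing.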
# Expected output | ||||||
|
||||||
## Required output data | ||||||
|
||||||
* The result of the test, 'success' or failure type | ||||||
* Whether or not the censorship was detected | ||||||
|
||||||
## Semantics | ||||||
Please, restructure this section to read like this:

Additionally, please, use the name in the JSON output for each key rather than the name inside the Go implementation, which is just an implementation detail. (The data consumer sees the JSON file.)

Updated the specification accordingly! 👍
||||||
|
||||||
* Success: True if no errors occurred | ||||||
* ConnectionCount: Number of successful connections | ||||||
* FinalPopcount: The popcount of the triggering packet | ||||||
* FirstSix: True if first six bytes of the final payload are printable | ||||||
* TwentyContig: True if there exist twenty contiguous bytes of printable ASCII in the final payload | ||||||
* HalfPrintable: True if at least half of the final payload is made up of printable ASCII | ||||||
* PopcountRange: True if final popcount is less than 3.4 or greater than 4.6 | ||||||
* MatchesHTTP: True if fingerprinted as HTTP | ||||||
* MatchesTLS: True if fingerprinted as TLS | ||||||
* Payload: Payload of final packet | ||||||
I think this definition should be improved. IIUC, this is the packet that triggered blocking in case there is censorship and the last packet that was generated otherwise.

Additional, broader design questions for you: Is there value in uploading to the OONI backend the final packet in case of success? Could it be that we're missing information by avoiding to submit all the packets that did not generate censorship?
||||||
* Censorship: False if all 20 connections succeeded | ||||||
* Error: String of error | ||||||
Please, rename

Also, it seems to me

I understand that it seems both success and censorship could be derived from error; however, in our test we do not consider a timeout error to be an error because it is expected in the case where a user is experiencing censorship. Error is used to record the type of any unexpected errors a user may have experienced while running the experiment. It is true, however, that success can be derived from error, as a test is deemed successful if there were no unexpected errors.
||||||
|
||||||
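The payload-related keys in the Semantics list can be illustrated with a short Python sketch. It follows the definitions as written in the list, with two assumptions that the spec does not state: "printable ASCII" is taken to mean bytes 0x20 through 0x7E, and "popcount" is taken to mean the average number of set bits per byte (which is consistent with the thresholds 3.4 and 4.6). The function names are hypothetical and simply mirror the JSON keys.

```python
# Assumption: printable ASCII means the byte range 0x20..0x7E inclusive.
PRINTABLE = range(0x20, 0x7F)


def is_printable(byte: int) -> bool:
    return byte in PRINTABLE


def popcount_per_byte(payload: bytes) -> float:
    """Average number of set bits per byte (assumed meaning of 'popcount')."""
    if not payload:
        return 0.0
    return sum(bin(b).count("1") for b in payload) / len(payload)


def first_six(payload: bytes) -> bool:
    """True if the first six bytes are all printable."""
    return len(payload) >= 6 and all(is_printable(b) for b in payload[:6])


def twenty_contig(payload: bytes) -> bool:
    """True if there is a run of at least twenty contiguous printable bytes."""
    run = 0
    for b in payload:
        run = run + 1 if is_printable(b) else 0
        if run >= 20:
            return True
    return False


def half_printable(payload: bytes) -> bool:
    """True if at least half of the bytes are printable."""
    if not payload:
        return False
    return 2 * sum(is_printable(b) for b in payload) >= len(payload)


def popcount_range(payload: bytes) -> bool:
    """True if the average popcount is below 3.4 or above 4.6."""
    pc = popcount_per_byte(payload)
    return pc < 3.4 or pc > 4.6
```

For a uniformly random payload the average popcount is close to 4 bits per byte, so `popcount_range` is expected to be false for such payloads, as it is in the example output.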
## Possible conclusions | ||||||
|
||||||
Ability to determine if the user is experiencing censorship on fully-encrypted traffic and what packet triggered the censorship. | ||||||
|
||||||
## Example output sample | ||||||
|
||||||
```JSON | ||||||
{ | ||||||
Please, make sure you update the JSON to the latest version of the experiment.
Updated the specification accordingly! 👍
||||||
"annotations":{ | ||||||
"architecture":"amd64", | ||||||
"engine_name":"ooniprobe-engine", | ||||||
"engine_version":"3.16.0-alpha", | ||||||
"platform":"macos" | ||||||
}, | ||||||
"data_format_version":"0.2.0", | ||||||
"input":null, | ||||||
"measurement_start_time":"2023-01-03 06:53:40", | ||||||
"probe_asn":"AS6128", | ||||||
"probe_cc":"US", | ||||||
"probe_ip":"127.0.0.1", | ||||||
"probe_network_name":"Cablevision Systems Corp.", | ||||||
"report_id":"", | ||||||
"resolver_asn":"AS6128", | ||||||
"resolver_ip":"167.206.251.142", | ||||||
"resolver_network_name":"Cablevision Systems Corp.", | ||||||
"software_name":"miniooni", | ||||||
"software_version":"3.16.0-alpha", | ||||||
"test_keys":{ | ||||||
"success":true, | ||||||
"connection_count":19, | ||||||
"final_popcount":4.074525745257453, | ||||||
"first_six":false, | ||||||
"twenty_contig":false, | ||||||
"half_printable":false, | ||||||
"popcount_range":false, | ||||||
"matches_http":false, | ||||||
"matches_tls":false, | ||||||
"payload":"KLpodhNrDfHPs6cEYBe096yVZdxqZ3udlhcs/ziiC11KHXcs2LUfa/CpiiLyo2NfguJ99k+k23XWE59+lw723HpsGJUKJnHop2BLXUCVUJDektT6Hm9rYTeBtAvqPZP+LVQ+WmqpoU7OFpeM3m7mVTut2AfSaH8TPhaDG377uYXz2tvZy+Oa7d/AsLzl4DKc707x+tITtFj4V/Gg2RfaHZe4C9tH9Wujw/62PiM6IgT3IK9fXT2QB0O9ZinY9+KxwVs7AYbXhoYdMoF9+s1wIL1f1NNx/Khgx6eYovROsj4768niLIPy6ketR0jZAA1CLidDAaWOvEDc/Tgv5vHcenUR0VawQFhGTfu+J6z4GEoQoi6e+N1HqvRoLXCd/OWdgybHVBlpPc8Wr7K8xrvdMwGIGKN+rpClGiFwxLJQkptr5kr9oZmM3T9cBy2ViZjdRM7HW3c8YmrGmw0jyVDszHcl4kBHeANgOEGtAudqvoxKPbLZYxvke64wu5RGr3CUEpwAnJW4GgPvl1KSWt9n5HSC0+Lhtbrcd7iUtlufoRjHrw3IGDt+n+S4F1tvV+4cslBRcv+wlJx4zFL+We+gJSg2CUFVLqOdRgpB73lBTe1Sb2vBB1RSZ3Cn0WTwhpbFVASpDS8nnJsD+CSVmXVpOy0PxvrYLA/UY6mE0kFBfqH9oVC8A+TN0IA3/vkzwZ/P9Xs8HRP5xm6shPvpy19MD9YWSK0Co3EXUpQrt4TW4kPeMbt/Dgpxe72zcuh6N9pjp3oR1fz1ioMOIp+1yalhB3XqgYAALUzpYI1Ya2A4if9qQq9nvVdLqDKFTehxKW1+mgJ+3/I7EG+6yprd7UGuQSpc49Yg/LhBchiXhIqTcgpNNNNClnjh31UTQwYT2NjYWuWK0ijGQfDjwP9bgYOPGaUOyzjkZTnWL1ejAaa5saA3q9TzKdZoY5Pw3BbO0WXP6SH2H1hhS/dB8XQPPLnq9jHj", | ||||||
"censorship":false, | ||||||
"error":null | ||||||
}, | ||||||
"test_name":"shadowsocks", | ||||||
"test_runtime":6.178643611, | ||||||
"test_start_time":"2023-01-03 06:53:34", | ||||||
"test_version":"0.1.0" | ||||||
} | ||||||
|
||||||
``` |
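A data consumer could map the `censorship` and `error` test keys onto the three outcomes described in the review thread (no censorship, censorship signalled by a timeout, or a failed test due to an unexpected error). The following Python sketch shows one way to do that; the function name `classify` and the returned labels are hypothetical and are not part of the specification.

```python
import json


def classify(measurement_json: str) -> str:
    """Map a measurement's test keys to one of three hypothetical labels."""
    test_keys = json.loads(measurement_json)["test_keys"]
    if test_keys.get("error") is not None:
        # An unexpected network error: the test is recorded as failed.
        return "failed"
    if test_keys.get("censorship"):
        # A timeout on the probe and on the residual check was observed.
        return "censorship"
    return "no_censorship"
```

Applied to the example measurement above, where `censorship` is `false` and `error` is `null`, this returns `"no_censorship"`.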
Please, rename this file as `ts-040-randomtraffic.md`. In the meanwhile, we merged `ts-039-echcheck.md`, therefore, we need to bump the nettest number used by this nettest.

Please, make sure you wrap long lines around ~line 80 to facilitate reading the spec from the terminal. Most users will read on the web, but it does not cost us that much to help people using the terminal. Also, having shorter lines helps with reviewing the spec in GitHub and providing suggestions.
Updated the specification accordingly! 👍