Skip to content
This repository has been archived by the owner on Sep 1, 2022. It is now read-only.

AddressBook: discuss/implement multiple subscription files #337

Closed
1 task done
anonimal opened this issue Sep 8, 2016 · 6 comments
Closed
1 task done

AddressBook: discuss/implement multiple subscription files #337

anonimal opened this issue Sep 8, 2016 · 6 comments

Comments

@anonimal
Copy link
Collaborator

anonimal commented Sep 8, 2016

By submitting this issue, I confirm the following:

  • I have read and understood the contributor guide.
  • I have checked that the issue I am reporting can be replicated or that the feature I am suggesting is not present.
  • I have checked opened or recently closed pull requests for existing solutions/implementations to my issue/suggestion.

Place an X inside the bracket to confirm

  • I confirm.

First, see #336. Then, consider java i2p's naming spec.

For this ticket, we should discuss if we're to have separate subscription files because we currently only use hosts.txt. Also, if we hand edit the file, it will be overridden upon next fetch from a any publisher.

With a little design work, we can easily implement other files that won't be overridden. There's also the question of whether we want to have separate files for separate publishers.

Referencing #305 and #296

@anonimal
Copy link
Collaborator Author

anonimal commented Oct 2, 2016

In today's meeting we discussed an interesting solution:

  1. hosts.txt will continue to be a static file updated upstream and clobbered every 12 hours. Users do not edit this file
  2. uri.host()_filename.txt will be a unique subscription from unique publisher; re-named accordingly with the hostname of publisher and filename replaced with the filename provided by publisher (in case publisher publishes various subscriptions)
  3. private_hosts.txt will be private subscriptions: never clobbered, never shared

This doesn't address where we save inserted addresses that are saved from an address helper since they currently are simply inserted into ./addressbook though we could simply throw them into private_hosts.txt.

Issues that arise from this design:

  • We lack an effective de-duplication impl
    • multiple subscriptions containing the same address(s) will simply take up space
    • If I load only one subscription and it doesn't have an address that another subscription has, after using a jumpservice, why would I save said address to private_hosts.txt when it already exists in another subscription?

Temporary solution:

Really all of this seems messy so I think we should push hard for #385.

anonimal added a commit to anonimal/kovri that referenced this issue Nov 9, 2016
Noticed upon reviewing CID 151160/2/6, control was never allowed to
reach the loading of publishers if a subscription was already loaded at
startup - thus publishers were never loaded.

In addition to publishers not being loaded, the subscriber
would always be in a "downloading" state (though it was never
downloading) - thus allowing the thread to sleep unnecessarily upon
shutdown.

Fixes monero-project#445 monero-project#446

References monero-project#305 monero-project#337 monero-project#263
anonimal added a commit to anonimal/kovri that referenced this issue Nov 9, 2016
- Adds simple debug logging
- Reflects on the fact that update timer is called before loading
  publishers

References monero-project#447 monero-project#305 monero-project#337
@anonimal
Copy link
Collaborator Author

Note: #453 means that, on Win32/64 (only), the currently packaged downloads.getmonero.org subscription can never have ETag checked nor be downloaded/updated. This is an upstream issue being discussed in cpp-netlib/cpp-netlib#696. Until that issue is resolved, we may be able to work with a server-side patch if the issue ends up being cipher-related (or similar).

anonimal added a commit to anonimal/kovri that referenced this issue Nov 20, 2016
Ensures publishers are available to subscriber if client tunnels attempt
to load a subscription before the address book implementation "starts".

Note: current implementation will download/save subscription file if
it's ever missing (even if the file is removed during router run-time)

References monero-project#305 monero-project#337
anonimal added a commit to anonimal/kovri that referenced this issue Nov 20, 2016
@coneiric
Copy link
Contributor

coneiric commented Mar 1, 2018

In today's meeting we discussed an interesting solution

I really like the ideas from 1. 2. and 3, and I've been thinking of ways to implement de-duplication for addresses.

What do you think of:

  • loading all entries from hosts.txt and private_hosts.txt on startup
  • checking local subscription files (host.uri()-filename.txt) exist for all publishers
    • loaded from publishers.txt
  • store (append) unique subscription entries in private_hosts.txt
    • entry not already loaded from hosts.txt or private_hosts.txt
  • writing blank host.uri()-filename.txt to disk to signal the subscription has been processed
    • could also write a single line with timestamp to compare with SubscriberUpdateTimer
    • saves space for multiple subscriptions

Then if we wanted to mirror Java I2P's republish feature, the user could opt to save full subscription files in a separate folder.

Unique entries from republished subscriptions would still be appended to private_hosts.txt, but republished subscriptions wouldn't be processed on startup.

@anonimal
Copy link
Collaborator Author

anonimal commented Mar 2, 2018

loading all entries from hosts.txt and private_hosts.txt on startup

You'll have to watch for the following:

  1. Filesize. A trivial DoS will be to load a large filesize in the gigabytes range into memory without adequate checks or processing. This is one of several reasons why we need a database. Otherwise, we can simply do more disk access with the text file instead of loading everything into memory.
  2. Load hosts.txt first then scan for unique hosts in private_hosts.txt. There's no point in loading both files into memory if there are duplicates.

store (append) unique subscription entries in private_hosts.txt

Concatenate all non-host.txt subscriptions into private_hosts.txt? That defeats the purpose my point 2 in previous comment.

writing blank host.uri()-filename.txt to disk to signal the subscription has been processed
could also write a single line with timestamp to compare with SubscriberUpdateTimer

For signaling that a file's been written? This is wholly unnecessary if that's the case. I can comment further once I know what you are trying to accomplish.

Unique entries from republished subscriptions would still be appended to private_hosts.txt, but republished subscriptions wouldn't be processed on startup.

What do you mean by "republished"? Startup will fetch the latest subscriptions from publishers anyway or else the subscriptions will grow stale. Again, appending to private_hosts.txt defeats the purpose of private_hosts. Private means don't touch unless user explicitly touches it or wants us to touch it (yes, we're still talking about files 🤣).

@coneiric
Copy link
Contributor

coneiric commented Apr 3, 2018

Here is my current design plan, after digesting your previous comment, and working on the code:

Update:

  • Split up subscriptions by type, and add the type to the internal address map entries
  • Load/save the local default hosts.txt addresses into the map as Subscription::Default type
    • Do not overwrite the local default hosts.txt file until PGP verification is implemented
  • Downloaded publisher subscription addresses are loaded into the map as Subscription::User type
    • Discard the publisher stream, I can't think of a good reason to save it
    • Store the Subscription::User addresses into a local user_hosts.txt file
  • Load/Save manually added private subscriptions into the map as Subscription::Private type
    • Store the Subscription::Private addresses into a local private_hosts.txt file
  • If necessary, save Subscription::User and Subscription::Private addresses into {user_, private_}addresses.csv catalogs
    • Otherwise, discard saving loaded addresses into addresses.csv catalogs

@anonimal
Copy link
Collaborator Author

anonimal commented Sep 7, 2018

NOTICE: THIS ISSUE HAS BEEN MOVED TO GitLab. Please continue the discussion there. See #1013 for details.

@anonimal anonimal closed this as completed Sep 7, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants