RFD 163 Cloud Firewall Logging discussion #125
The ipmon(1M) command will produce output iff a given netstack's ipfilter logging is enabled. Sample output from ipmon, where I enabled things with
The ipmon(1M) man page describes this output. Note the high-ish resolution timestamp, the rule (in this case -1, which I believe means "no rule"), the ports & protocol, as well as a bit of protocol-specific information. If we do not use ipmon directly, we should use what it employs: messages from
I have a couple questions:
@askfongjojo said:
In the initial discussion it was proposed that a new TCP connection would be identified by having the SYN flag, which is only present as a TCP connection is being opened. If we only check for the SYN flag (ignoring ACK), each connection open will result in two packets being logged, since in the three-way handshake both participants send a packet with SYN. Since a SYN on its own does not mean that the connection will be established, it probably makes more sense to log when a SYN-ACK is seen, as that means both sides are at least half-open.

How this will work for UDP is a bit of a mystery to me. It seems we will be forced to keep some state in ipf or cfwlogd so that we know who initiated a conversation. Otherwise, we will consider both ends to be initiating conversations with each request and reply. Since there is no actual connection to tear down, there will be no teardown phase, and the only way to free up the associated memory will be as a result of inactivity.
We probably need something like that. It's not clear whether this is needed for MVP.
For MVP, I'm hoping to avoid this. It is quite likely that it will be needed, and the architecture should accommodate it.
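To make the TCP side concrete, here is a minimal sketch of the flag test discussed above (Go is used purely for illustration; the real check would live in ipf or cfwlogd, and all names here are hypothetical):

```go
package main

import "fmt"

// TCP header flag bits, per RFC 793.
const (
	tcpFIN = 0x01
	tcpSYN = 0x02
	tcpRST = 0x04
	tcpACK = 0x10
)

// logNewConnection reports whether this packet is the point at which a
// new TCP connection should be logged. Testing for SYN alone would log
// both halves of the three-way handshake; testing for SYN+ACK logs one
// event, once both sides are at least half-open.
func logNewConnection(flags uint8) bool {
	return flags&(tcpSYN|tcpACK) == (tcpSYN | tcpACK)
}

func main() {
	fmt.Println(logNewConnection(tcpSYN))          // false: bare SYN
	fmt.Println(logNewConnection(tcpSYN | tcpACK)) // true: SYN-ACK
}
```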
RFC 1122 Section 4.2.2.13 discusses closing a connection with a 4-way handshake or by sending a RST.
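Extending the illustrative sketch above, teardown events could be detected the same way (again, hypothetical names, not the RFD's design):

```go
// logTeardown reports whether this packet marks a teardown worth logging:
// part of the FIN close sequence from RFC 1122 Section 4.2.2.13, or an
// abortive close via RST.
func logTeardown(flags uint8) bool {
	const tcpFIN, tcpRST = 0x01, 0x04 // RFC 793 flag bits
	return flags&(tcpFIN|tcpRST) != 0
}
```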
All for multiple: it will definitely be easier to address the problem in the earlier design phase than to try to apply a patch later, which may never happen.
Just using DNS should do the trick here, as long as we add the Audit API to Binder the same way we do with every core service.
It's a good question: I'd say millisecond precision, together with rule uuid, vm uuid, and account uuid, are enough elements to provide some uniqueness. I'd suggest using an approach similar to the one used by workflow to avoid, for example, two different runners trying to execute two different machine jobs on the same machine at once: lock by target vm uuid and fw rule. Another, possibly better, option would be to put together a general-purpose change feed consumer which could be used by every service going HA. Of course, that one might be a little bit out of scope.
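As a rough illustration of that uniqueness tuple (Go; all names are hypothetical and not from the RFD):

```go
package main

import (
	"fmt"
	"time"
)

// recordKey combines the elements suggested above -- a millisecond
// timestamp plus the rule, VM, and account UUIDs -- into a single
// deduplication/locking key. Purely illustrative.
func recordKey(t time.Time, ruleUUID, vmUUID, accountUUID string) string {
	return fmt.Sprintf("%d:%s:%s:%s", t.UnixMilli(), ruleUUID, vmUUID, accountUUID)
}

func main() {
	fmt.Println(recordKey(time.Now(),
		"3f8e4f40-0000-4000-8000-000000000001",  // fw rule uuid
		"9a2d1c80-0000-4000-8000-000000000002",  // vm uuid
		"c0ffee00-0000-4000-8000-000000000003")) // account uuid
}
```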
Reviewing
nit:
naming nit: All the other agents have this FMRI:
So let's call this one:
s/Hermes/Logarchiver-agent/
Perhaps it is worth noting in the docs/RFD that this is separate from
Perhaps it is discussed somewhere else, but is there a reason for
Some thoughts:
Is there potential contention with VM migration, i.e. if one gets firewall logs
Why the ".json" instead of ".log"? It isn't strictly valid JSON. I understand
I took a look at the Log Archiver Service section. The UUID translation facility as described seems idiomatic with the date translation stuff that's already there, which is great.

Overall, I don't think you need as many phases in the project. The scaling problem is relatively simple to solve: the only bottleneck of which I'm aware is the proxy service, which is totally stateless. You can just spin up more processes; we've even done this before as a kind of hotpatch. Even if you decide to go as far as replacing the entire proxy component with something else like Squid, which might allow better vertical scaling (though that would be a lot more work to do properly), you won't need to add a mechanism to the master to sign URLs. The proxy is a straight TCP forwarder; the actors themselves are full Manta clients which are already handed appropriate credentials by the master.

I'd be inclined to just do the scaling (and customer UUID mapping) work for Hermes straight up, and add the firewall logging service to the existing logset configuration using the existing Hermes instance. It seems like the shortest path to both solving the existing Hermes scale problems and meeting the new firewall logging needs.
Replying to @trentm: All changes accepted as suggested unless noted below.
I think this would trigger a "hot shard" problem that @dekobon was concerned about. I'm not sure how many VMs per customer it would take to make that a real concern.
As currently specified, a restart of

Maybe the compression plans should be re-examined. One approach would be to have the /var/logs/firewall filesystem have
True enough. We need someone to chime in on the impacts on Manta.
Indeed. I don't have a good answer for this and eliminating the need to use
Replying to @jclulow
ok, we'll take a closer look at this.
Good to know, updated accordingly.
I was under the impression that there was a desire to split hermes off from the sdc zone, but I can't articulate the motivation. @trentm or @kusor - is there anything to this?
One argument for separating the handling of cfwlog archiving and sdc log archiving was separation of concerns. They have two different audiences (one for customers, one for operators) and might have separate reasonable SLOs. Having them both handled by one service doesn't otherwise sound too bad to me. The "sdc" zone is a grab-bag dumping ground zone that for some reason needs to be a singleton. If we think we want to horizontally scale this log archiving service, then I think it should move out to a separate zone to avoid the conflict between "log archiving wants multiple zones" and "sdc0 zone cron jobs and napi-ufds-watcher don't know how to coordinate between multiple zones".
@mgerdts, thank you for the recent update of the RFD. I have two additional questions:
Yes. The dependencies are set on the services such that the rewrite of rules completes before the
That is part of the longer-term plan for hermes, yes. In discussions with the Manta team, it seemed their biggest concern was avoidance of hot shards, which are particularly perturbed by
I didn't see a mention of it in the RFD, but you folks might already be aware of it, so just FYI: there is a standard for logging IP flows called IP Flow Information Export (IPFIX). This standard is basically the IETF version of Cisco's NetFlow. For example, OpenBSD has an implementation which it exposes via pflow. The advantage is that there is a whole slew of tools which support IPFIX for visualizing, reporting, etc.
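For a sense of what IPFIX looks like on the wire, here is a sketch of just the RFC 7011 message header (a real exporter would append template and data sets after it; this is not a full implementation):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"time"
)

// ipfixHeader is the 16-octet IPFIX message header from RFC 7011,
// Section 3.1.
type ipfixHeader struct {
	Version             uint16 // always 10 for IPFIX
	Length              uint16 // total message length in octets
	ExportTime          uint32 // Unix seconds when the message left the exporter
	SequenceNumber      uint32 // running count of exported data records
	ObservationDomainID uint32 // e.g. one ID per netstack or per VM
}

func main() {
	h := ipfixHeader{
		Version:    10,
		Length:     16, // header only in this sketch
		ExportTime: uint32(time.Now().Unix()),
	}
	var buf bytes.Buffer
	// IPFIX fields are big-endian on the wire.
	if err := binary.Write(&buf, binary.BigEndian, h); err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", buf.Bytes())
}
```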
This is a generic issue for RFD 163 discussion.