collaboration guide #163

Byron · 2021-08-16T02:32:59Z

@avoidscorn @edward-shen What do you think about it? Is anything missing, should something be changed?

Please feel free to comment here or use the 'suggestion' feature of github reviews.

Apologies if the guide sounds a bit robotic or unfriendly but I feel pressed for time and rushed it a bit. However, I am happy to spend some time to make collaborating on the project more straightforward and would love changes to the guide that make it feel more approachable/polished :D.

Thanks

edward-shen

What's the recommendation on other crates you don't own mysteriously breaking? e.g. with the mysterious commit size change.

File an issue and don't try and fix it? Try and fix the issue?

COLLABORATING.md

avoidscorn · 2021-08-16T02:48:46Z

Some things I'd like to see in COLLABORATING.md:

I assume we should always use PRs when introducing new dependencies, even if they are feature gated? For my work, it would a bloom filter library, and possibly a library that provides pread.
General guidelines on when to use feature gates. So far I have: optional dependencies, pretty vs. lean, async vs sync, parallel support, and remote repository support. What about unsafe vs. 100% safe?

Byron · 2021-08-16T03:32:49Z

File an issue and don't try and fix it? Try and fix the issue?

Thanks for bringing this up and make me think about this more. I am suggesting to either fix it immediately if the cause is known or to open a PR if it's unclear what's going on. Differentiating this sometimes isn't easy, but I will try my best to do this from now on.

I assume we should always use PRs when introducing new dependencies, even if they are feature gated? For my work, it would a bloom filter library, and possibly a library that provides pread.

On crates you own, I'd think it's still up to you. I usually try to minimize the dependency by deactivating its default features, if possible, and generally try to keep builds fast. This might be for the DEVELOPMENT.md guide.

General guidelines on when to use feature gates. So far I have: optional dependencies, pretty vs. lean, async vs sync, parallel support, and remote repository support. What about unsafe vs. 100% safe?

I hoped that the collaborating guide could be focussed on how to work together, with details on how to setup crates being placed in DEVELOPMENT.md, which might warrant a separate PR and some cleanup. Quick thoughts though:

General guidelines on when to use feature gates…

Additionally I use it to have 'pure' Rust vs bindings with C and assembly.

What about unsafe vs. 100% safe?

I use unsafe only if there is no other way or if it enables far greater performance. So far I only recall one usage of unsafe, and it's to greatly accelerate pack resolution with a tree-like index. Is there any other reason to use unsafe on the level gitoxide is operating?

avoidscorn · 2021-08-16T03:53:14Z

I use unsafe only if there is no other way or if it enables far greater performance. So far I only recall one usage of unsafe, and it's to greatly accelerate pack resolution with a tree-like index. Is there any other reason to use unsafe on the level gitoxide is operating?

I have no use case for it. I just saw a dependency on sha-1 with the asm feature enabled, so the possibility of unsafe stuck in my mind.

Byron · 2021-08-16T05:21:46Z

I have no use case for it. I just saw a dependency on sha-1 with the asm feature enabled, so the possibility of unsafe stuck in my mind.

cargo geiger nicely shows unsafe within the dependency tree as well, and…there is a lot. sha1-asm greatly accelerates fundamental git operations, so it's kind of a requirement for most usage. It is turned off though in the pure-rust builds.

Maybe for everything else you can consult and adapt the development guide.

Could @avoidscorn and @edward-shen approve this PR if it's alright to merge for now? We can also adjust it if it turns out to be lacking somewhere.

Thanks.

kim · 2021-08-16T08:30:34Z

I'm not sure where this initiative is coming from, but I certainly appreciate it. Having been involved with gitoxide for the past several months, I get the impression that the project would benefit a lot from engaging more contributors. @Byron is doing great work, but he's also taken on a monumental task trying to become bug-for-bug compatible with git.git, while at the same time leveraging the implementation language to not only make things faster, but also harder to fuck up (tm). The reason for it to be so monumental is because git's surface API is extremely large -- git is not just the porcelain commands, it's a ~~zoo~~ swiss army knife of ways to manipulate a content-addressable storage. Because this is so, I have a couple of suggestions: The git history of this repository is basically impenetrable for anyone not actively involved in gitoxide development. Commit messages don't carry much information about how a particular change is motivated, what might have been alternative designs, or how the change might be an intermediate step towards a bigger goal, and what that goal might be. That might be okay, if maintainers agree that they don't ever look at the git history anyways without online access to GitHub -- but the context one can extract from GH collaboration facilities doesn't really carry this information either, and neither does the CHANGELOG. I'd would thus appreciate if describing a change in prose would be part of the contribution conventions. Another observation is that code review is not at all part of the development process. Exactly because the API surface extends to the lowest levels, breakage for downstream consumers needs to be considered at virtually every step. For example, f15f1e8 renamed a trait method -- yes it was tagged as a breaking change semver-wise, but it will still cause churn for consumers. And very unnecessarily so: the code is not improved in any way by this change, on the contrary: turning a method name into a sentence because that seems subjectively clearer very strongly smells like this abstraction needs some more fundamental redesign. This could have been caught by asking others for feedback before committing the change to main. So I'd strongly suggest to make PR'ing changes mandatory, even for maintainers. Lastly, and somewhat relatedly to the above, I think it's a misunderstanding to interpret trunk-based development as pushing out every edit (even mistakes) as they are made. The goal of this philosophy is to minimise breakage by not allowing the result of a merge commit to escape CI. It's not about bypassing code review, or discouraging refactoring because every single commit must be self-contained (as opposed to patch series). Please also consider that it comes from environments where all downstream consumers are tracked, and thus breaking changes need to be addressed by the author immediately. I am painfully aware of the fact that GH PRs are utterly unsuitable for a workflow which inherently requires (interactive) rebase. Perhaps worth considering would be to experiment with bors / homu, which can automate away a great deal of maintenance chore when many concurrent changes are in-flight.

Byron · 2021-08-18T00:28:47Z

tl;dr: Thanks for your feedback, it's much appreciated. I will make adjustments here soon to address transparency and motivation of changes, the ability to comment on them, as well as being less surprising for downstream consumers. Bear with me though, it's happening in parallel of feature development. gitoxide will have to grow out of being my playground into software you can depend on, despite it being very much WIP.

I'm not sure where this initiative is coming from, but I certainly appreciate it.

I am glad to hear that, it was postponed forever in the name of simplicity, even though in truth I just didn't want to deal with it.

However, with gitoxide now being used in radicle-link it's time to transition into a way of working which reduces surprises downstream.

That might be okay, if maintainers
agree that they don't ever look at the git history anyways without online access
to GitHub -- but the context one can extract from GH collaboration facilities
doesn't really carry this information either, and neither does the CHANGELOG.
I'd would thus appreciate if describing a change in prose would be part of the
contribution conventions.

To my mind it's certainly a good idea to try to describe in the commit body what the motivation is.
The workflow around a Kanban board, cards, issues with checkboxes definitely helps me to keep track of things and for now there seems to be no way to do it without GitHub. I did hear about some tool to handle/compile/manage features using text-only and it might even have been a Rust project. Something like that might be usable on top of GitHub facilities to add some transparency in offline scenarios. A pointer at this tool would be appreciated, of course I couldn't find it anymore even though I have probably starred it.

Another observation is that code review is not at all part of the development
process. Exactly because the API surface extends to the lowest levels, breakage
for downstream consumers needs to be considered at virtually every step. For
example, f15f1e8 renamed a trait method -- yes
it was tagged as a breaking change semver-wise, but it will still cause churn
for consumers. And very unnecessarily so: the code is not improved in any way by
this change, on the contrary: turning a method name into a sentence because that
seems subjectively clearer very strongly smells like this abstraction needs some
more fundamental redesign. This could have been caught by asking others for
feedback before committing the change to main.

I agree, breaking changes in plumbing should not be done as light-heartedly as that usually was the case, so I am happy to make PRs for those 'strongly encouraged'. Generally I prefer more semver compatible releases with few breaking changes than few releases with a lot of them, especially in plumbing crates. This makes the inevitable upgrade process easier. In any case, these breaking changes should add value and should be discussed. I will write down my thoughts in something like 'stability.md' and post it as separate PR for discussion. Generally I want to keep process around commits and changes minimal and thus not enforce code reviews, but provide some time to comment on breaking changes affecting downstream.

So I'd strongly suggest to make PR'ing changes mandatory, even for maintainers.

My biggest gripe with this is that I don't want them to be too long lived, so I will experiment with creating PRs instead of the 'small issues that are created from a checkbox in an issue' more, which should make the changes more easily digestible.

Lastly, and somewhat relatedly to the above, I think it's a misunderstanding to
interpret trunk-based development as pushing out every edit (even mistakes) as
they are made. The goal of this philosophy is to minimise breakage by not
allowing the result of a merge commit to escape CI. It's not about bypassing
code review, or discouraging refactoring because every single commit must be
self-contained (as opposed to patch series). Please also consider that it comes
from environments where all downstream consumers are tracked, and thus breaking
changes need to be addressed by the author immediately.

Maybe my love for trunk-based also stems from the lack of process and the simplicity of it - thus far my experience with it is very positive and I could go a month without main being temporarily broken. The main benefit for PRs I see are 'a grouping of commits by topic' and the added transparency with the option to comment and suggest changes. This is also why I want to try a workflow that involves them often, which may as well lead to a PR based workflow. PRs or not, what's merged should not break CI (or the software in case CI thinks it's good 😅), so I am probably missing the point about all downstream customers being tracked.

I am painfully aware of the fact that GH PRs are utterly unsuitable for a
workflow which inherently requires (interactive) rebase. Perhaps worth
considering would be to experiment with bors / homu, which can automate away a
great deal of maintenance chore when many concurrent changes are in-flight.

If fast-forward isn't possible when merging a PR, living with merge-commits seems alright to me, so I'd avoid rebasing branches. With the current amount of concurrent changes that should be quite alright. Using a bot is definitely something that should happen once the PR maintenance burden is too high.

kim · 2021-08-18T09:02:52Z

The workflow around a Kanban board, cards, issues with checkboxes definitely helps me to keep track of things and for now there seems to be no way to do it without GitHub.

Organising one's work is a rather ephemeral thing -- nobody will ask you in 5 years what you delivered during Aug 2021. It may however happen that someone is looking at code that's been written back then and ask themselves why it's been written like that. It will help them to be able to extract additional context from the data persisted in the version control system, or at least by following a link.

Generally I prefer more semver compatible releases with few breaking changes than few releases with a lot of them, especially in plumbing crates. This makes the inevitable upgrade process easier.

I don't agree that this is true in general: every change that breaks my build as a downstream user consumes my time, and it's not less annoying to spend smaller amounts of time more frequently, than a larger amount less frequently. Semver 0. releases are a bit of a lie tbh -- many de-facto stable projects never reach 1. because, I don't know, they don't have to commit? I think a good approach are time-based releases, which many larger projects do. You then have a summary of what happened (or didn't), and there's a committment to supporting API contracts for N releases.

So I'd strongly suggest to make PR'ing changes mandatory, even for maintainers. My biggest gripe with this is that I don't want them to be too long lived, so I will experiment with creating PRs instead of the 'small issues that are created from a checkbox in an issue' more, which should make the changes more easily digestible.

Patches tend to become long-lived when they try to do too much (and then, that's also fair). When they do too less, they tend to become only trees, no forest. As far as a checkbox-oriented workflow is concerned, I think Rust's "Tracking issue for RFC xyz" seems to hit a sweet spot in utilising the GH-imposed tooling. There, I can rather easily tell what both the context and the status is, and I can subscribe to notifications or not (well, that works to some extent).

PRs or not, what's merged should not break CI (or the software in case CI thinks it's good ð), so I am probably missing the point about all downstream customers being tracked.

Proponents of trunk-based development typically work with large monorepos, so CI covers all downstream users. That's quite an advantage, especially for things which can't easily be caught by the compiler. I'm not saying that code review is a substitute for this. Code review is about validating that your solution actually solves the problem, and does so in the best possible way given your current knowledge and understanding. That's also on the reviewer, of course: obsessing over whitespace doesn't improve things, but someone might bring something up you haven't thought about. There's a difference between churn, which everyone downstream has to keep up with, and agility as interpreted as getting many small changes in quickly, forward-fixing mistakes as you go. Striking the right balance is obviously not that easy.

If fast-forward isn't possible when merging a PR, living with merge-commits seems alright to me, so I'd avoid rebasing branches.

The result of a merge is not covered by CI in the typical GH flow, that's the thing.

Byron · 2021-08-19T02:23:32Z

Organising one's work is a rather ephemeral thing -- nobody will ask you in 5
years what you delivered during Aug 2021. It may however happen that someone is
looking at code that's been written back then and ask themselves why it's been
written like that. It will help them to be able to extract additional context
from the data persisted in the version control system, or at least by following
a link.

I understand that 'keeping track of what happened' isn't necessarily valuable as most people will want to know why something happened or the motivation behind that, the problem to be solved. The latter may be a bit implicit right now yet I think that raw task tracking can be combined an exploration of the problem to solve. There I want to find a lean process that isn't full-blown RFCs yet while yielding more useful information than now may be the case.

I don't agree that this is true in general: every change that breaks my build as
a downstream user consumes my time, and it's not less annoying to spend smaller
amounts of time more frequently, than a larger amount less frequently.

To elaborate, I think it's valuable to have a finer granularity under the assumptions that breaking changes are usually additive (as opposed to undoing a change done earlier). This would allow those who want to get the upgrade on the way but want to minimize time spent as well to upgrade to the next breaking release only, of which they may be more. The work needed there is small though, possibly only a single breaking change. Others might choose to go all in, upgrading to the latest version with more breaking changes to deal with. To me this sounds like win-win, and at least with pre-release software I consider this beneficial. With release software I would avoid that and rather collect breaking changes while still providing granular .alpha releases to allow testing, similar to how it was with pre-release software.

Semver 0. releases are a bit of a lie tbh -- many de-facto stable projects never
reach 1. because, I don't know, they don't have to commit? I think a good
approach are time-based releases, which many larger projects do. You then have a
summary of what happened (or didn't), and there's a committment to supporting
API contracts for N releases.

I agree, and having an understanding of what warrants a major bump from 0 to 1 is important - it's something I would like to have in stability.md as well. From there I like the idea of making time-based major releases (if there are breaking changes) and more regular feature releases. There it's interesting to think about the stabilization process of new features.

When they do too less, they tend to become only trees, no forest.

This one I don't understand, but I would like to. Tiny PRs might certainly be unnecessary overhead, and it certainly takes some experience to estimate how much work it will take to finish one of these alluringly simple checkboxes.

As far as a checkbox-oriented workflow is concerned, I think Rust's "Tracking
issue for RFC xyz" seems to hit a sweet spot in utilising the GH-imposed
tooling. There, I can rather easily tell what both the context and the status
is, and I can subscribe to notifications or not (well, that works to some
extent).

I think this is also what I ended up describing. An issue for the motivation of some feature along with a list of features/tasks that need implementation. These are then links to their respective PRs once one was created, with the issue being closed once all PRs/tasks are done.

Code review is about validating that your solution actually solves the problem, and does so in the best possible way given your current knowledge and understanding.

I absolutely agree, it's where I see untapped potential in gitoxide's development. When moving to PRs, which will happen very soon for me, there will be some time to see what's going on for those who are inclined to get involved.

There's a difference between churn, which everyone downstream has to keep up with, and agility as interpreted as getting many small changes in quickly, forward-fixing mistakes as you go. Striking the right balance is obviously not that easy.

Indeed, and I feel that with the workflow changes manifesting here we will have a handle for tuning this balance. I also don't mind at all to create PRs that fix breaking changes that have been deemed necessary after some PR based collaboration took place.

The result of a merge is not covered by CI in the typical GH flow, that's the thing.

Thanks for pointing this out - I am easily tricked into thinking that a green PR will remain green after merging even though this isn't a given at all. I'd be watching this and as error rates go up a bot probably is the solution in order to facilitate testing the would-be merge commits as well and maintaining merge queues.

And thanks a lot for chiming in, I find your thoughts very helpful and believe there will be a lasting positive effect on the project.

kim · 2021-08-19T07:50:56Z

There I want to find a lean process that isn't full-blown RFCs yet while yielding more useful information than now may be the case.

I realise that you often leave research notes in source code comments. That's actually quite useful, GHC has employed such a thing for many years [0].

To elaborate, I think it's valuable to have a finer granularity under the assumptions that breaking changes are usually additive (as opposed to undoing a change done earlier).

I mean, yeah, if you can maintain the discipline that the Semver-constraints are never broken, so the user can pick the point in time to bump, all should be fine in theory. Perhaps I was too disgruntled temporarily by a lapsus which can just happen :)

When they do too less, they tend to become only trees, no forest. This one I don't understand, but I would like to. Tiny PRs might certainly be unnecessary overhead, and it certainly takes some experience to estimate how much work it will take to finish one of these alluringly simple checkboxes.

This is most likely a shortcoming of git itself. If you look at projects which employ the traditional patch workflow (for example the epic reftable series [1]), you'll notice that they very diligently massage their series into small commits, including ones which are refactorings. That's a very laborious process, and has an important drawback: what if a subset of the series is good to go, but another subset would need another round? Because git is decidedly bad at reordering patches, this requires some serious interactive rebase skills. It's obviously easier when you have already split your commits into small and ideally non-conflicting changes. After having read a few articles about trunk-based workflows using systems like Phabricator and Gerrit, I have actually mostly given up on topic branches, and am rebasing everything on a single patch queue branch locally. The remaining nuisance is that GitHub PRs still don't like history rewrites very much, but well... [0]: https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Tc/TyCl.hs#L4726 [1]: https://github.com/gitster/git/commits/hn/reftable

Byron · 2021-08-19T08:16:20Z

I realise that you often leave research notes in source code comments. That's
actually quite useful, GHC has employed such a thing for many years [0].

A good point. I am putting these into doc comments usually and maybe they make it into the final documentation in some shape or form. One day all the other meta can be done in repo and will just be presented, instead of being owned by GitHub, I am sure!

I mean, yeah, if you can maintain the discipline that the Semver-constraints are never broken, so the user can pick the point in time to bump, all should be fine in theory.

I solemnly pinky swear that these constraints are to be maintained. It shouldn't even be that hard and I put a lot of faith into cargo smart-release to help with that.

After having read a few articles about trunk-based workflows using systems like Phabricator and Gerrit, I have actually mostly given up on topic branches, and am rebasing everything on a single patch queue branch locally. The remaining nuisance is that GitHub PRs still don't like history rewrites very much, but well...

While there isn't too much going on in main using PRs fortunately won't be much different even without rebasing them. I must admit though that it already feels strange to even have two branches/PRs going right now, definitely some cognitive load, but no good thing is free.

This is an outcome of the discussion in the PR and should help to minimize surprises while increasing quality.

Byron · 2021-08-21T11:40:48Z

Please take a look at another set of changes which aim to incorporate suggestions from the conversation in this PR. This also triggered another note to use the project board to signal planned features and generally what's going on.

My hope is that we can wrap this PR up till Tuesday 2021-08-24.
Thanks

Co-authored-by: Edward Shen <[email protected]>

…eakage

This is an outcome of the discussion in the PR and should help to minimize surprises while increasing quality.

Byron · 2021-08-24T00:03:01Z

Thanks for all your input and involvement. Welcome, collaboration guide :).

Byron added 2 commits August 16, 2021 10:14

Adjust contribution recommendation

0ba47e1

First draft of collaboration guide

a9f45fb

Byron referenced this pull request Aug 16, 2021

More git-config docs

f05a669

edward-shen reviewed Aug 16, 2021

View reviewed changes

COLLABORATING.md Outdated Show resolved Hide resolved

COLLABORATING.md Outdated Show resolved Hide resolved

COLLABORATING.md Show resolved Hide resolved

edward-shen approved these changes Aug 16, 2021

View reviewed changes

Byron mentioned this pull request Aug 16, 2021

git-protocol: fix bad gitoxide release radicle-dev/radicle-link#752

Merged

Byron added a commit that referenced this pull request Aug 21, 2021

Adjust collaboration guidelines to allow and even require PRs (#163)

bd25829

This is an outcome of the discussion in the PR and should help to minimize surprises while increasing quality.

Byron and others added 4 commits August 21, 2021 19:44

Update COLLABORATING.md

9501092

Co-authored-by: Edward Shen <[email protected]>

COLLABORATING: some helpful remarks; be more specific about fixing br…

b65521c

…eakage

Adjust collaboration guidelines to allow and even require PRs (#163)

199b941

This is an outcome of the discussion in the PR and should help to minimize surprises while increasing quality.

A note about the project board to help with transparency

6281729

Byron force-pushed the collaboration-guide branch from 4f5d789 to 6281729 Compare August 21, 2021 11:44

Byron merged commit d850004 into main Aug 24, 2021

Byron added a commit that referenced this pull request Aug 24, 2021

Adjust collaboration guidelines to allow and even require PRs (#163)

998ae6b

This is an outcome of the discussion in the PR and should help to minimize surprises while increasing quality.

Byron deleted the collaboration-guide branch August 24, 2021 06:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

collaboration guide #163

collaboration guide #163

Byron commented Aug 16, 2021

edward-shen left a comment

avoidscorn commented Aug 16, 2021

Byron commented Aug 16, 2021

avoidscorn commented Aug 16, 2021

Byron commented Aug 16, 2021

kim commented Aug 16, 2021 via email

Byron commented Aug 18, 2021

kim commented Aug 18, 2021 via email

Byron commented Aug 19, 2021

kim commented Aug 19, 2021 via email

Byron commented Aug 19, 2021

Byron commented Aug 21, 2021

Byron commented Aug 24, 2021

collaboration guide #163

collaboration guide #163

Conversation

Byron commented Aug 16, 2021

edward-shen left a comment

Choose a reason for hiding this comment

avoidscorn commented Aug 16, 2021

Byron commented Aug 16, 2021

avoidscorn commented Aug 16, 2021

Byron commented Aug 16, 2021

kim commented Aug 16, 2021 via email

Byron commented Aug 18, 2021

kim commented Aug 18, 2021 via email

Byron commented Aug 19, 2021

kim commented Aug 19, 2021 via email

Byron commented Aug 19, 2021

Byron commented Aug 21, 2021

Byron commented Aug 24, 2021