Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we suggest installing git hooks that automatically rerun checkout_externals when needed? #124

Open
billsacks opened this issue Aug 6, 2019 · 5 comments
Labels

Comments

@billsacks
Copy link
Member

billsacks commented Aug 6, 2019

I'm curious what others think about this idea (@mvertens @jedwards4b @gold2718 @mnlevy1981 @ekluzek @bertinia @rsdunlapiv and anyone else with opinions on this):

Background

Currently, it is important for users and developers to remember to rerun checkout_externals when:

  • Checking out a new branch/tag/hash that may have changed the Externals.cfg file
  • Merging changes from one branch into your current branch, if these changes may have changed the Externals.cfg file

I've gotten pretty good at remembering to do this myself (though not 100% of the time), but I've been at least mildly concerned about requiring all users and developers to remember to do this step. We have talked about putting checks in cime that would abort if your current sandbox is out of sync with your externals listed in Externals.cfg, but I just started thinking about the following alternative:

Possible solution

We could solve this problem (at least in most common cases, if not all cases) via the use of git hooks. However, this would have some downsides, as described below, so I'm not sure if it's worth doing: I'm curious what others think.

Projects that use manage_externals could have a top-level script like initialize_repo. Rather than having people run checkout_externals, we'd instruct people to run this initialize_repo script just once, after initially cloning a repository. This script would (a) do the initial run of checkout_externals for you, and (b) install some git hooks. (For (b), I envision that the git hooks would be stored either in the project repository or in the manage_externals repository; these would then be symlinked or copied into .git/hooks/.)

The git hooks would be post-checkout, post-merge and post-rewrite (the latter handles changes after a rebase, among other things). Each of these hooks would rerun checkout_externals for you. Thus, whenever you check out a new branch, merge or rebase in your local repository, manage_externals would automatically be rerun.

I have tested this with this simple script, .git/hooks/post-checkout (which needs to be made executable):

#!/bin/bash

./manage_externals/checkout_externals

and this works as intended. (The scripts could be given some additional logic, if needed.)

Downsides:

  1. checkout_externals can be a little slow to run, at least for someone impatient like myself, so having it run automatically when it doesn't need to may be a pain for some people. There are a few possible ways to mitigate this problem:

    a. We could work on speeding up checkout_externals. One obvious way to do this would be to have it avoid doing any work if it finds that all repos are already in sync. (In a current test of CTSM, checkout_externals took 13 sec, even though I was already in-sync, and checkout_externals -S only took 1.7 sec. This problem may be partly related to Minor issue: Status of sub-externals shown incorrectly during checkout #87 , which seems to result in sub-externals being re-checked out even if they are already in sync.) (The exception here would be if an external points to a branch rather than a stable hash or tag: in this case I think we need to do a fetch to make sure we have the latest version of the branch.)

    b. We could add some intelligence to the hooks so that they only run if the externals file has changed. This seems feasible for the post-checkout hook, with which you can check the hashes before and after the checkout, and examine the diffs in Externals.cfg between those hashes (though things are complicated a bit by repos that have multiple top-level Externals files, such as CTSM's Externals.cfg and Externals_CLM.cfg: we'd need to be sure to check for diffs in either of those). However, I don't see a way to do this for post-merge or post-rewrite. Edit: Actually, this may be possible for the post-merge by examining the previous head (e.g., with git reflog?) and then diffing between the current and previous head; however, I'm not sure if that would be completely robust, opening the door for more possibilities for downside (2) below.

    c. We could recommend this for more casual users / developers, for whom switching branches, merging, etc. is rare; power users/developers could continue to use the existing workflow.

  2. There may be some rare cases where the hooks don't kick in. Because people won't be in the habit of running checkout_externals manually (instead relying on the hooks to do so automatically), it's very likely that checkout_externals won't get run at all in these rare cases (in contrast to the status quo, where people need to be in the habit of running checkout_externals manually). Off-hand, I can't think of situations that wouldn't be covered by these hooks, but there may be some.

What do people think: Is this worth doing? Or am I trying to solve a problem that doesn't really exist (i.e., it's not a big deal that people need to remember to run checkout_externals manually), or do the downsides outweigh the upsides of this proposed solution? If people like this idea, which of the suggested mitigations for downside (1) (if any) should we do?

@jedwards4b
Copy link
Collaborator

How about if we develop these hooks and use them ourselves for a while and then we can decide if they are generally desirable?

@cacraigucar
Copy link

My MICM workflow does not handle running checkout_externals during it and I personally do NOT want checkout_externals being run for me without my knowledge. For MICM, I sometimes need to update a couple of external repos at the same time. I run checkout_externals initially to get all the repos and then I blow away the ones I am going to edit (so I don't have a detached head and am working on my development branch) and manually do a git clone. The few times, I've run checkout_externals during my code modifications, bad things happen. I've learned to run checkout_externals sparingly during my development process.

Sometimes I update the Externals.cfg file with my next tag when I am getting ready to make my commit, but then realize I forgot to run a crucial test. I will run that test. At other times, I need to address an issue that is raised during my pull request and I need to make an edit and rerun my tests. This is all while the Externals.cfg is pointing to a not-yet-created tag because I'm waiting for the pull requests to be accepted.

Of course, my workflow may not be optimal, but its the way I'm working for now.

@gold2718
Copy link
Collaborator

gold2718 commented Aug 6, 2019

What happens if the hook kicks in and one of the repos is in an unclean state? Will the user get an error message when they have done nothing wrong?

@ekluzek
Copy link

ekluzek commented Aug 6, 2019

OK, so this is interesting. This is a problem that even though I'm aware of it, I still miss it on occasion. And I know it messes up external people. So I have to tell them about and make sure they are running manage_externals. So I do think it's an issue.

I also understand the perspective of not wanting to run manage_externals -- unless you do it yourself. Although your proposal allows either workflow. If you don't want the automatic magic to kick in for a clone, you just don't run "initialize_repo". Or if you did, and you can run it with an option to turn itself off.

I think I'd also prefer the practice of just having it run manage_externals with the "-S" option and just report on a problem. Then it would let the user actually run it, or fix the issue in the way they see fit. And one option would be to run "initialize_repo --turn-the-ding-dang-thing-off". So it brings it to your attention -- but doesn't run it without you doing it yourself. This is like the option with the OS of "let me know about updates -- but don't install them automatically". A lot of us have been bitten by automatic updates that then cause trouble -- so we'd rather do run it ourselves when we figure we are up to doing it. It sounds like always running with "-S" would also be better in terms of the time issue as well.

So you could have a couple levels of options. Level 0, is it's turned off. Level 1, it just runs "-S" to check for issues. Level 2, it runs manage_externals when it thinks it needs too. If the user can choose between these levels, they can get whatever behavior they want.

@billsacks
Copy link
Member Author

Thanks for these thoughts. A few replies:

@jedwards4b -

How about if we develop these hooks and use them ourselves for a while and then we can decide if they are generally desirable?

It makes sense for us to try out any tools ourselves to make sure they are robust enough, but I don't feel that I'd be able to tell if someone else would want these for their workflow based on my personal experience: I don't feel that my experience would be representative of a scientist's: I do multiple merges / checkouts per day, and the consequences of not doing checkout_externals is often small or zero (e.g., if I'm just doing branch management, merging to master, etc.). At the other end of the spectrum are the users I'm really thinking about here, who probably just do an update every few months (if that), and it's really important that their final sandbox has all of the appropriate externals. I think I'd personally get annoyed by these hooks very quickly, at least unless we did one of the mitigating things I mentioned in my initial post, whereas a more casual user would (I think) find that the benefits more significantly outweigh the drawbacks. Maybe something slightly more useful would be to enlist a few scientists (e.g., the component liaisons) to give us feedback on this.

@cacraigucar - my initial thought is that these hooks wouldn't be appropriate to your use case, and as @ekluzek mentions, the best thing for you would be to avoid installing these hooks (by not running initialize_repo). If your experience / workflow is common, then I'd say let's just drop this idea, but my guess is that your experience isn't representative of most scientists' / users' experiences (but I'm really not sure). A bigger question is what aspect of your workflow is problematic for the use of manage_externals, and what we can do to help with that. My guess is that this is at least partly related to #34 .

@gold2718 -

What happens if the hook kicks in and one of the repos is in an unclean state? Will the user get an error message when they have done nothing wrong?

Yes, that would certainly be an issue with this proposal, and maybe is a good reason to drop it, or perhaps go with @ekluzek 's suggestion of just printing some information rather than actually doing anything.

@ekluzek - I like your idea of just running checkout_externals with the -S option. This feels like a good compromise. I feel that, for this to be most useful, we'd want to significantly cut the verbosity of the output when your sandbox is fully clean and in-sync... something like:

Checking status of externals...
All externals are clean and in-sync with Externals.cfg; no further action is needed

(where the first line would print immediately to let you know why you're getting a few-second delay). If you are out-of-sync, I'd like to see output like that from checkout_externals -S -v, including a message instructing the user on what to do to bring externals in-sync. (This message would differ depending on whether there are any modified externals, since the action needed differs in that case.)

I'm not sure about the different option levels, though: I feel that might be adding unnecessary complexity for us to maintain, and potentially would make things more, rather than less, difficult for users - particularly since this is something you would need to decide on up-front, when you first clone a repository.

What do others think of @ekluzek 's idea of having hooks that print the external status whenever you do a git checkout, merge or rebase? Is this useful?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants