Integration test submodule/reset.go often fails #2974
Comments
I read up on concurrency in Go a bit more, and I'm getting dizzy. 😄 I just discovered the "-race" command line option of go, which seems very useful. For example, running lazygit as
Curious about your thoughts on this @jesseduffield.
Yep, our concurrency model is shocking haha. You can use a read-write mutex, which allows multiple reads at once but exclusive writes, but I don't like the idea of adding a bunch of locks and unlocks everywhere, nor do I have a good idea of what a more callback-based approach would look like (where the mutex stuff is encapsulated). I do try to make it so that model objects are replaced rather than mutated when we refresh, and I try to assign a model object to a variable and then use that variable in a keybinding handler, so that if a refresh happens we might be working with stale data but we won't get a slice bounds error. But it would be good to structure things in a way where we guarantee that while you're in a handler, the model won't change. All model changes should happen in the refresh helper, so that sounds doable to me. For other forms of mutation, like gui state and context state, I'm less clear on how to fix that. But I would love to solve this problem and have the race detector find no issues so we can add it as a CI step.
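A minimal sketch of the "replace rather than mutate, and capture the model in a variable" habit described above. `Commit`, `Model`, and `selectedHash` are hypothetical stand-ins, not lazygit's actual types, and the refresh is simulated on the same goroutine. Under those assumptions it shows why the handler may see stale data but stays in bounds:

```go
package main

import "fmt"

// Commit and Model are hypothetical stand-ins for the real model types.
type Commit struct{ Hash string }

type Model struct {
	Commits []Commit
}

// Refresh replaces the slice wholesale; the old backing array is never mutated.
func (m *Model) Refresh(cs []Commit) {
	m.Commits = cs
}

// selectedHash captures the slice once, then only uses that captured value.
// A refresh that happens in between cannot shrink the captured slice.
func selectedHash(m *Model, idx int, refresh func()) string {
	commits := m.Commits // snapshot taken at the start of the handler
	refresh()            // simulates a refresh happening mid-handler
	if idx < 0 || idx >= len(commits) {
		return ""
	}
	return commits[idx].Hash // possibly stale, but within bounds
}

func main() {
	m := &Model{}
	m.Refresh([]Commit{{"a1"}, {"b2"}, {"c3"}})
	h := selectedHash(m, 2, func() { m.Refresh([]Commit{{"d4"}}) })
	fmt.Println(h)
}
```

Note that with a real concurrent refresh, the unsynchronized read of `m.Commits` would still be reported by the race detector; the snapshot only avoids the bounds error.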
Looking closer at the code, I think the invariant that a given model slice won't be mutated does hold in this case, but we can't make use of that fact because in getDisplayStrings we use
diff --git a/pkg/gui/presentation/commits.go b/pkg/gui/presentation/commits.go
index b4297f6ed..1ecc413e8 100644
--- a/pkg/gui/presentation/commits.go
+++ b/pkg/gui/presentation/commits.go
@@ -71,6 +71,10 @@ func GetCommitListDisplayStrings(
// this is where my non-TODO commits begin
rebaseOffset := utils.Min(indexOfFirstNonTODOCommit(commits), endIdx)
+ // endIdx may have been based on a prior instance of the commits slice so we
+ // clamp it here to ensure it's within bounds
+ endIdx = utils.Clamp(endIdx, 0, len(commits))
+
filteredCommits := commits[startIdx:endIdx]
bisectBounds := getbisectBounds(commits, bisectInfo)
That's good, but both of these only reduce the likelihood of it being a problem. The race is still there. I really see only two ways to solve this:
I really don't think this is an appropriate fix. It only works around the panic, but it doesn't really fix the behavior. What if the slice gets longer rather than shorter? We're only drawing half of the view in that case. It's just not right. |
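To make the objection concrete, here is a small sketch of what clamping a stale end index does. `clamp` and `visible` are hypothetical helpers written for this illustration (they are not the `utils.Clamp` from the patch), and the index is assumed to have been computed when the list had three items:

```go
package main

import "fmt"

// clamp restricts x to the range [lo, hi]. Hypothetical helper for this sketch.
func clamp(x, lo, hi int) int {
	if x < lo {
		return lo
	}
	if x > hi {
		return hi
	}
	return x
}

// visible returns items[startIdx:endIdx] after clamping both bounds,
// so a stale index can never cause a slice bounds panic.
func visible(items []string, startIdx, endIdx int) []string {
	endIdx = clamp(endIdx, 0, len(items))
	startIdx = clamp(startIdx, 0, endIdx)
	return items[startIdx:endIdx]
}

func main() {
	staleEnd := 3 // computed against an earlier list of three items

	shorter := []string{"a"}
	longer := []string{"a", "b", "c", "d", "e"}

	// The list shrank: clamping avoids the panic.
	fmt.Println(len(visible(shorter, 0, staleEnd)))
	// The list grew: no panic, but only the first three items are drawn.
	fmt.Println(len(visible(longer, 0, staleEnd)))
}
```

The second case is the one raised above: the clamp hides the panic, but the view is still rendered from an index that no longer matches the data.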
Fair point: that would indeed only fix panics. As for the bigger fix, perhaps we should do some research into how other GUI frameworks (e.g. tview) handle concurrency and see how easily we could adapt the same approach.
Tview has a documentation section about this, but it doesn't say much except that most of tview is not thread-safe, and doesn't have to be because you typically interact with it on the main thread only. It doesn't say anything about model data; it seems it's simply the client's responsibility to deal with this in whatever way they want.
I'm coming back to this now. I made a PR (#3019) that allows running integration tests with the
I spent a bit of time experimenting with more locking, for example to extend gocui.View's writeMutex to a general mutex that protects all its fields, not just the lines buffer. And also to put more locks around reading model data, as discussed above. I quickly got the feeling that this is not a viable approach; it's just too messy to get all the locking right.
So next I'm planning to experiment with the opposite approach: make it so all model data (and maybe even views?) is read and written only from the main thread. Concretely this means that refresh can still happen in a goroutine (e.g. calculate a new slice of model commits), but then the final assignment to the Model struct needs to be done with an
As I said above, this makes it impossible to do a sync refresh and repaint to ensure you can do multiple ctrl-j in quick succession without glitches; but I suppose this could be solved by introducing a new "UI is blocked" mode where we don't dispatch commands but buffer them until the mode is over. This would allow ctrl-j to use a normal WithWaitingStatus instead of the hacked WithWaitingStatusSync that I added in #2966; I think that would be cleaner anyway.
I'd like to hear your thoughts though before I start, as I expect it to be a pretty huge amount of work. Do you think it's worth trying? Can you foresee any problems that I might be missing?
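A rough sketch of the "compute in a goroutine, assign on the main thread" idea from this comment. Everything here is an assumption for illustration: `UILoop`, `OnUIThread`, and `refreshCommits` are hypothetical names, and a plain channel of closures stands in for whatever event loop the real GUI uses.

```go
package main

import "fmt"

// Commit and Model are hypothetical stand-ins for the real model types.
type Commit struct{ Hash string }

type Model struct{ Commits []Commit }

// UILoop owns the model. Only the goroutine running Run may touch it.
type UILoop struct {
	model *Model
	tasks chan func(*Model)
}

func NewUILoop() *UILoop {
	return &UILoop{model: &Model{}, tasks: make(chan func(*Model))}
}

// Run executes queued tasks one at a time, so model access is serialized.
func (l *UILoop) Run() {
	for task := range l.tasks {
		task(l.model)
	}
}

// OnUIThread queues f to run on the loop goroutine.
func (l *UILoop) OnUIThread(f func(*Model)) {
	l.tasks <- f
}

// refreshCommits does the slow loading in a background goroutine and hands
// only the final assignment to the loop goroutine.
func refreshCommits(l *UILoop, load func() []Commit, done chan<- struct{}) {
	go func() {
		commits := load() // slow work, off the UI goroutine
		l.OnUIThread(func(m *Model) {
			m.Commits = commits // the only write, on the UI goroutine
			close(done)
		})
	}()
}

func main() {
	loop := NewUILoop()
	go loop.Run()

	done := make(chan struct{})
	refreshCommits(loop, func() []Commit {
		return []Commit{{"a1"}, {"b2"}}
	}, done)
	<-done

	// Reads also go through the loop, so no mutex is needed.
	n := make(chan int)
	loop.OnUIThread(func(m *Model) { n <- len(m.Commits) })
	fmt.Println(<-n)
}
```

In this sketch no model mutex is needed because nothing outside the loop goroutine ever dereferences the model; the cost is that every reader has to go through `OnUIThread` or be handed a copy.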
Centralizing the writing of model data to one place is easy, but centralizing the reading will be very complex. It would be worth just testing it out for a single model, e.g. commits, to see how many touchpoints there are. I'd also use a read-write mutex, which allows multiple concurrent reads so long as there are no writes.
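For comparison, a minimal sketch of the read-write mutex suggestion, with the locking hidden behind a callback so call sites don't lock and unlock by hand. `BranchModel`, `Set`, `Read`, and `readAll` are hypothetical names for this illustration; `sync.RWMutex` is the standard library type.

```go
package main

import (
	"fmt"
	"sync"
)

// Branch and BranchModel are hypothetical stand-ins for the real model types.
type Branch struct{ Name string }

type BranchModel struct {
	mu       sync.RWMutex
	branches []Branch
}

// Set takes the exclusive lock, blocking until all readers are done.
func (m *BranchModel) Set(bs []Branch) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.branches = bs
}

// Read runs f under the shared lock; many readers may run at once,
// but none can overlap with Set.
func (m *BranchModel) Read(f func(bs []Branch)) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	f(m.branches)
}

// readAll starts n reader goroutines and returns the length each one saw.
func readAll(m *BranchModel, n int) []int {
	seen := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Read(func(bs []Branch) { seen[i] = len(bs) })
		}(i)
	}
	wg.Wait()
	return seen
}

func main() {
	m := &BranchModel{}
	m.Set([]Branch{{"main"}, {"dev"}})
	fmt.Println(readAll(m, 4))
}
```

The callback keeps the lock scope explicit, but every place that reads the model still has to be routed through `Read`, which is the "touchpoints" cost mentioned above.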
Why do you think so? Maybe I'm naïve, but my thinking was that whenever a goroutine needs to look at model data, it needs to be passed to it in a variable. My hope would be that this allows us to get rid of the model mutexes altogether.
Sorry, forgot to respond to this. My assumption is that lots of code gets run in goroutines and there's lots of model data that needs to be read, meaning it would be hard to pass it all in (and to pass it around to helper functions). I suspect that using a read-write mutex would end up being less complicated, though still noisy. But like I said, it's hard to know without just trying something and seeing how the result looks.
When running this test in a loop, it fails roughly 1 out of 5 times with the following stack:
I think this is a concurrency issue: the commits list is being re-rendered at the end of refreshBranches, but refreshBranches only locks the RefreshingBranchesMutex, but not the LocalCommitsMutex. The following patch seems to fix the problem:
However, I'm not sure this is the right fix. To be honest, I'm very confused at the use of model mutexes in lazygit; it seems that only mutation of model data is protected by mutexes, but not use of that data. For example, RefreshingBranchesMutex is only locked in refreshBranches and nowhere else, so it only protects the mutation of the branches model (and the one re-render that happens inside that function). However, the branches list can be rendered from other places, and there's a lot of other code that uses the branches model; shouldn't all these places lock the mutex as well?