-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add real world concurrent editing traces #20
Comments
Hi @josephg, Thanks for offering to share! How do you currently store the editing trace? For a linear editing trace (no concurrency), I suggest expressing it as an array of replacement operations (as I did in For expressing concurrency, we need to extend the vocabulary. Expressing all kinds of conflicts is very complicated and also not necessary for benchmarking. For the sake of benchmarking conflicts, it would suffice to introduce a const ops = [
[0, 0, 'Hello world'], // A replacement operation inserting the initial text
{ // fork the document into two different branches
main: [[6, 1, 'W']], // Make "world" uppercase
branchXYZ: [[11, 0, '!']] // Append "!"
},src
// We merge the forks into the main branch and discard the other branches
[0, 0, 'this is applied to the previous main branch']
] My idea would be to introduce a "fork operation" that is expressed as an object with named forks. The naming might be useful for future usage. For now, they could be random, or the branch names. We can only continue with a single fork, so there must always be a "main branch". If there is no main branch, we use the first named fork. A fork accepts an array of operations. Once all operations are applied to the fork, the fork is merged with the Once the forks are merged into the main branch, we continue with the main branch.
|
The old paper-editing trace we've all been using for benchmarking actually originally had a bit of concurrency in it as well – I only flattened out that concurrency in the edit-by-index version to make it easier to use. If we can agree on a format for representing concurrency, I can try pulling the concurrent trace out of the original files. Moreover, a collaborator and I have also been writing a new paper with a VSCode tracker to generate a more realistic editing trace. That also has some concurrency, I think. The fork operation you propose would work for simple branching patterns, but not more complex ones (e.g. one user first merges branch A and then branch B, while another user concurrently merges those two branches in the other order). But if we don't actually have complex merging patterns in our benchmark data, then that may not matter. I would also suggest that we make a difference between characters that are typed one-by-one, and character sequences that are inserted all at once (e.g. as a result of a paste). Moreover, what do you think about including a wall-clock timestamp on each operation? Automerge stores such timestamps (with 1-second resolution) for history visualisation purposes; do Yjs/Diamond also have timestamps? The timestamps have no effect on the CRDT logic, but they affect the size of the final file. |
Hey guys! ( @dmonad )
Right now I'm immediately converting it into a diamond types file and then loading & replaying that; but thats an unnecessarily complex file format. The format I'm using at the moment explicitly describes the graph. It looks sort of like this: Causal graph:
And so on. The file format for diamond types has no concept of peers or nodes. (Then I also store the operations themselves in the file in a big list). But the logic for traversing this graph might be tricky with yjs or automerge. How would you implement a I agree with @ept 's comment about forks not necessarily being able to capture all the complex paths. But we could do something like this: const ops = {
'xxx': {
parents: [], // ROOT
ops: [...], // A list of replace operations. (See below)
},
'yyy': { parents: ['xxx'], ops: [...] },
'zzz': { parents: ['xxx'], ops: [...] },
'qqq': { parents: ['yyy', 'zzz'], ops: [...] },
}
const finalState = ['qqq'] That would express the graph structure, and I could generate that directly (and pretty easily) from the traces I have. How hard would it be to replay something like that in yjs? EDIT: ... Or maybe a list would be better? The goal is to recreate / replay the state at the bottom of the file. const ops = [
{ id: 'xxx', parents: [], ops: [...]},
{ id: 'yyy', parents: ['xxx'], ops: [...] },
{ id: 'zzz', { parents: ['xxx'], ops: [...] },
{ id: 'qqq', parents: ['yyy', 'zzz'], ops: [...] },
}
Yeah thats what I'm doing for my other linear traces in my crdt-benchmarks repository. There's some fluff like timestamps, but the heart of those files looks like this: > require('./seph-blog1.json').txns.map(t => t.patches[0]).slice(5000)
[
[ 2572, 0, 'n' ], [ 2573, 0, 'g' ], [ 2574, 0, ' ' ], [ 2566, 9, 'r' ],
[ 2567, 0, 'e' ], [ 2568, 0, 'a' ], [ 2569, 0, 'd' ], [ 2570, 0, 'i' ],
[ 2571, 0, 'n' ], [ 2572, 0, 'g' ], [ 2573, 0, ' ' ], [ 2574, 0, 'p' ],
[ 2575, 0, 'a' ], [ 2576, 0, 'p' ], [ 2577, 0, 'e' ], [ 2578, 0, 'r' ],
... (They're recorded from real editing sessions. They aren't all single character edits even though it looks like it from this sample). ( @ept )
The data I'm scraping from git certainly has cases like that. Here's a simple editing history from a single file in nodejs. I'd want to capture and replay this trace accurately.
For what use case? I can't think of any time I would have the system treat a paste event or a typing event differently. Maybe there would be some cases where you'd want to have special optimizations for large inserts - but in that case, surely a heuristic approach based on the insert length would be what you'd want anyway. A single character paste event seems functionally indistinuishable from a typing event to me. And obviously, large insert events can always be broken up into a series of single character inserts if thats what your system expects.
I'd be open to this. I have timestamps in the data sets I've recorded myself, with 1 second granularity. But diamond types doesn't currently store timestamps at all. I just can't think of many uses for them, outside of "what did the document look like at (time)?". And for that I'd consider storing timestamps with like, 10 minute granularity or something. I think it makes sense to have a conservative stance wrt. the data we store about user's habits. |
Looping back on this, I've made the first pass of an export tool. Love some thoughts.
{
"startContent": "",
"endContent": "yoooooooo\n",
"txns": [
{
"id": 0,
"parents": [],
"numChildren": 1,
"agent": "GdLDwvf0MIfN",
"ops": [
[ 0, 0, "hi there\n" ]
]
},
{
"id": 1,
"parents": [ 0 ],
"numChildren": 1,
"agent": "r2GLJfHRMNbx",
"ops": [
[ 0, 8, "" ],
[ 0, 0, "yoooo" ]
]
},
{
"id": 2,
"parents": [ 1 ],
"numChildren": 0,
"agent": "gsWTmOk9JJOI",
"ops": [
[ 5, 0, "oooo" ]
]
}
]
} Does that make sense?
I'm a bit worried that in complex examples, the operation order will be dependent on the algorithm. It might be best to just stick to examples where there aren't any concurrent inserts at the same place. (So the resulting document order is well defined). I have a script to extract editing histories from any file in any git repository. Its not neat, but it works. If anyone has better sources of data, happy to use that instead. Some traces:
I'll put these files (and some others??) on my crdt-benchmarks github repository. But yeah, I'd really appreciate a glance before I do, to make sure the format doesn't look too crazy. |
Ping @alexjg |
I've spent the last couple of hours writing a little script to run the benchmarks against automerge, directly in rust (to make it a fair comparison). I assume I'm using automerge correctly since I get the right output after merging all the changes. I was worried - for some data sets, the resulting document order will be dependent on the CRDT algorithm so I checked if its ever the case that multiple inserts land at the same place in the document. The Here's my automerge generation & benchmarking script: I'd love to do the same treatment for yjs/yrs too, and get some more traces from different git repositories so we have a representative sample of data. @dmonad - do you have any thoughts about running benchmarks directly in rust for these algorithms? Hows yrs going? |
@josephg it makes sense, but only for However, a list CRDT like RGA (Automerge) or YATA (Y.js) usually is only concerned with keeping the latest version up to date, so they have no ability to replay a trace like that, essentially, for RGA or YATA CRDT to replay a trace like that, they would need to re-implement It looks to me for the trace to be shareable and reusable by different libraries the patches should be in a form to serve well all algorithms. Also, we may want to ask a question: what are we benchmarking here? It looks to me that:
So, the concurrent trace, IMHO, stresses the remote insert/delete code path of CRDTs. But the remote operations are different in all CRDTs. So, maybe the trace patches should contain all possible metadata needed for all different CRDTs. When I say "all", obviously that is not feasible, but RGA is the most popular algorithm, so that could be included. Also, Y.js is the most popular library, so maybe YATA can be included. Effectively, what operations need to contain is what the remote site/server would expect to receive:
In all three approaches maybe the IDs can be normalized to a 2-tuple: [agent: number | string, sequence: number] However, I'm not sure how much they can be reused across approaches. As they all semantically differ. For example, "agent" is a number in Y.js but a string in Automerge and Diamond Types. Also, the "sequence" part of the ID is incremented differently in all three approaches. So, it looks like even the operation ID cannot be shared across the approaches. Really the only part that can be shared is the "text" content of the operation. Which leads me to thinking: in that case is it even worth to have a common format. Maybe better to transform the trace into the native format of specific library, and see how that library performs on its native format. But if there was a common format, maybe it could look like something below: {
"startContent": "",
"endContent": "yoooooooo\n",
"txns": [
{
type: "insert",
content": "hi there\n",
ct: {
id: ["GdLDwvf0MIfN", 0],
parents: [],
position: 0,
},
rga: {
id: [123, 1],
ref: [123, 0],
},
yata: {
id: [123, 1],
leftRef: [123, 0],
rightRef: [123, 0],
},
},
{
type: "delete",
length: 8,
ct: {
id: ["r2GLJfHRMNbx", 1],
parents: [
["GdLDwvf0MIfN", 0]
],
position: 0,
},
rga: {
id: [456, 2],
ref: [123, 1],
},
yata: {
id: [456, 0], // 0 here, as in YATA, unlike RGA, you don't need to take max() of all seen ops
// Actually, Y.js does not even implement operations for deletes.
// It acts as state-based-CRDT for deletes, so not sure what to put here.
ref: [123, 1]
},
},
// and so on ....
]
} Another way to think about this concurrent trace is that: it is just like the non-concurrent trace, but with the exception that the {
"startContent": "",
"endContent": "yoooooooo\n",
"txns": [
{
type: "insert",
content": "hi there\n",
position: { // <--- In non-concurrent trace this would be just a number
ot: {
// the correct position as from perspective of every agent in the trace
"GdLDwvf0MIfN": { lastKnownOp: 0, position: 0 },
"r2GLJfHRMNbx": { lastKnownOp: 0, position: 0 },
// all agents ...
},
ct: {
// refs and position...
},
rga: {
// refs...
},
yata: {
// refs...
},
},
}
// and so on ....
]
} The above also incorporates OT approach: basically the correct OT position is pre-computed for every agent's perspective. This also highlights the similarity that Diamond Types has with OT: (1) both approaches need to encode somehow the "last know op" or "parents"; and then (2) both approaches also provide the An then, the RGA could actually be thought as being equivalent to CT/OT, with the difference that in RGA there are no absolute positions in a string, i.e. strings don't start from 0, but are referenced relative to some character. So, RGA also provides a "guess" of the {
"txns": [
{
type: "insert",
content": "foo",
position: {
rga: {
parent: [123, 1], // = ref
// Implicit "guess" of RELATIVE position in relation to "parent"
// position: 0,
},
}
}
}
} To sum up, OT and CT do not assign a unique ID to every character, so On the other hand, RGA and YATA assign a unique ID to every character, so So, if RGA/YATA would not store their characters as a linked list, but instead just stored the operations like OT/CT; for example RGA operation could look like: Then recursively resolving all the RGA operations in the form |
Yeah looks like we totally agree here - thats absolutely the goal! I don't want other algorithms to need to implement diamond types - but as I said above, I've managed to write a simple ~100 line script to convert data from this format into automerge or yjs's native formats anyway. The goal isn't to benchmark that conversion process. But once the data is converted, we can use the converted trace to benchmark how well automerge / yjs / sharejs / whatever else can load, save and merge the operations it contains. Re: What are we trying to benchmark, the idea is to compare how well the different collaborative editing systems would handle the same collaborative editing session. Ideally I want the following metrics:
Comparing remote operations seems easy to compare to me. Obviously each system will do different work internally, but it should be pretty easy to arrange data like this into a series of "network messages" (patches) received by a passive peer that merges them all together. Measuring how fast a given system can ingest a series of remote messages and then output the resulting document state seems like a reasonable, useful and fair benchmark. Comparing local operations seems like a trickier case to me, but we can probably take an editing trace and run a simulation of one of the peers. The simulation could run a carefully prepared trace which includes the following actions:
Total size on disk and memory usage are self explanatory. Given that we can convert any of these traces to any of those formats (yjs, rga, yata, etc) using a simple script, I don't think its worth adding the additional ID fields to these editing traces. - Since as you said, those IDs are quite implementation-specific and idiosyncratic, and we would need to pick ahead of time which algorithms to support. For an example of idiosyncracies, yjs's sequence numbers aren't consumed by deletes. Whereas automerge does consume sequence numbers in deletes. Yjs uses an integer for agent ID. Diamond types uses a string and automerge uses a fixed sized array of bytes. I'd much rather leave all of these details to the algorithms we're testing. Re: agent IDs, I'm not sure that the format should include agent names. If we only share traces which merge the same regardless of algorithm (in practice, this means no concurrent inserts at the same location) then the agent ID stops being useful. But it would come in handy if we want to run simulations from the point of view of a single peer (since we can use the agent IDs to filter the trace). |
Generating CRDT-specific traces out of Diamond Types trace sounds good, if indeed those scripts are simple. (Also, would be nice to have a generic RGA converter script, not specific to Automerge format, as RGA is the most popular algorithm.) I'm not sure I understand why you think that local operations are trickier to compare. I thought the existing editing traces are already good enough to compare the local operations. BTW, some CRDT implementations might not even have a dedicated local operation path. One way to think about local insert/delete operations is that they are just performance optimizations over the remote insert/delete. For example, in my implementation I don't have a dedicated local delete operation, instead local delete immediately creates the "parents" and executes locally the remote delete. Also, I didn't have a local insert for a long time, only added it later as an optimization. |
See for yourself!. Its ~100 lines of code, not including JSON parsing.
What has convinced you that RGA is the most popular algorithm? If you think a generic RGA converter is useful, you're welcome to write a tool to do this conversion yourself. The above algorithm can be used to generate the data for any sequence CRDT a positional editing trace, so long as that algorithm implements a Converting from RGA to other formats sounds harder to me than simply replaying a positional editing trace, but I haven't thought about it. Do you think that would be easier? How would you convert RGA to yjs's format, for example?
My partner and I recorded an editing trace the other night. I edited my little wiki to add 1s of latency, and then both of us typed up notes about a tv show we watched at the same time, into a shared document. The resulting document is 4000 words long, totalling just 21kb of text. But the causal graph (even run-length encoded) is this absolute unit of a graph: (NOTE: this SVG is 4mb in size. You will need to zoom out and/or scroll right to see the graph): This data set is very different from single user editing traces. I expect those differences will manifest in a lot of weird ways for different CRDTs. As an example, the optimizations I've made to the diamond types file format work much less well on this editing trace than on the single user traces I have. I suspect yjs will store this data set more compactly than I am managing to do. Another reason to simulate an editing history using interleaved local and remote operations is that it seems more "honest". I can imagine some CRDT implementations simply buffering local changes and applying / merging them all at once. A test which mixes remote changes and local changes seems like it would even the playing field.
I think about local insert / deletes as being different because the local peer first needs to convert the edit from a numeric position (from the editor) into the CRDT's internal data format. Eg, the operation I suspect this conversion may be fast / trivial for some CRDT implementations, and slow for others depending on the data structures used. As you said, some implementations will optimize this and some will not. |
It is the most cited and the best know/researched algorithm that provides good (
I didn't mean to convert from RGA to Y.js; rather was just saying that a converter from positional trace to RGA would be really nice to have. Would be happy to write it, the only problem is I don't know Rust, so not sure how well that will go.
That is true, I was just saying that that conversion of position to ID can be done in 2 different ways:
Specifically which optimizations do you have in mind?
Why? Storage is one thing, I'm wondering how will Y.js handle performance-wise for a concurrent trace. Y.js uses a cache of 80 items to effectively search in its linked list, as far as I understand. I'm wondering if given some number of concurrent users and some syncrhonization latency, where many cache entries are consumed by each user, how would it impact that caching optimization. Lets simplify, say each caching entry is used only by one user, and there are 81 concurrent users, but only 80 cache slots; how would it impact Y.js performance? But then Y.js has a benchmarking scenario for random inserts in its bench suite, so I guess it is not going to be slower than in that benchmark. And it looks like random inserts and sequential ones are performed at about the same speed (from the benchmark results).
Agreed, I think it is not just more "honest", but more relevant. Because:
BTW, I really appreciate that you have collected those traces and created that |
RGA implemented as it’s written in the paper (as a tree) isn’t O(log n) because the tree is unbalanced. Inserts approximate O(n) time in practice. Re: optimisations, the DT file format stores the distance travelled from each edit to the next edit. (Positive or negative). When two users are typing at the same time at different locations, this optimisation doesn’t work as well. DT and automerge also store the full causal graph - which is usually very small and simple. But in a trace like this, it’s quite large. If you’re explicitly looking for unusually pathological traces which will make systems perform badly, traces where millions of edits concurrently type at the same location will also perform very badly with all of these systems. I’m not that interested in traces like that because it’s unrealistic that it shows up by chance. And we have bigger problems at the moment if users are malicious. (See Martin’s BFT paper). Anyway, this is all a big sidebar for the issue here. Sounds like you’re more or less happy with the format. That’s good news. |
Alright, final proposal after a lot of unrelated conversation: For multi user editing traces, the dataset is a JSON file containing all the operations made by users. The dataset has the following fields:
txns have the following fields:
Misc rules, some repeated from above:
Open questions:
Example data set:{
"startContent": "",
"endContent": "yoooooooo\n",
"txns": [
{
"id": 0,
"parents": [],
"numChildren": 1,
"agent": 0,
"patches": [
[ 0, 0, "hi there\n" ]
]
},
{
"id": 1,
"parents": [ 0 ],
"numChildren": 1,
"agent": 0,
"patches": [
[ 0, 8, "" ],
[ 0, 0, "yoooo" ]
]
},
{
"id": 2,
"parents": [ 1 ],
"numChildren": 0,
"agent": 1,
"patches": [
[ 5, 0, "oooo" ]
]
}
]
} Unless there's any comments, I'll start publishing some data sets in this format. |
Lets go! I think |
I agree. I'll:
|
The first concurrent editing trace is published here, along with a spec: https://github.com/josephg/editing-traces/tree/master/concurrent_traces I've also uploaded a linearized (flattened) copy of this editing trace into the The next step is integrating this stuff into yjs. And I'd love to make some more traces! |
Sorry for the delayed reply, I just caught up on this thread and realised I never answered this question:
Sorry, I was unclear. I don't care either whether a particular insertion op was the result of typing a character or performing a paste. But if some characters were typed one by one, I would like them to be listed as individual ops in the trace (not aggregated into a longer string), ideally with a wall-clock timestamp on every op (or patch, I don't mind what we call it). I wasn't sure whether this is what you're already planning anyway. There are several reasons for this request:
Otherwise I'm fine with the format you propose. Thanks for doing this, and I will also aim to contribute some traces! |
@ept Thanks for the feedback - that makes a lot of sense and I agree. I'll add a timestamp field to transaction objects in the spec, and a dummy timestamp to the dataset I've recorded already. Getting some more multi user traces would be great if you can think of a reasonable way to record them. And @streamich is in the process of preparing & contributing some traces too. Unfortunately recording editing traces which accurately split each editing event using diamond types is a bit tricky because I actively merge consecutive edits together everywhere I can. I might need to rethink my recording strategy before I record anything else. (Or, maybe add a sidechannel to store edit groupings & timestamps for trace recording and reconstruct that information later.) I'd also like to use my git extractor tool to get some traces from git logs. But unfortunately, it turns out to be extremely common for traces extracted from git to contain concurrent inserts at the same location in the file. (And I'm trying to avoid that here so we don't get problems due to algorithms making different ordering choices). Unless we get lucky, my git extracting tool will probably not generate a lot of useful traces for cross-system benchmarking. |
I've found real-world editing traces to have very different performance profiles than simulated (random) editing traces. For diamond types, I've ended up writing a script to reverse engineer editing histories out of git repositories, which has given me some great & terrible editing traces.
(That last link is a graph of the history from Makefile from the git repository for Git itself).
These traces aren't perfect - I'm using
diff
to figure out what changes in each commit. But they're good enough for benchmarks, and I think a lot better than random traces.Anyway, I'd love to share them and add them here but I'm not sure what format would suit! How can I contribute them? How do you think we could express concurrent traces in a way that we can apply to multiple algorithms?
The text was updated successfully, but these errors were encountered: