Diff multiple GEDCOMs #323
Comments
I'm wondering if the best you can do is remove definite duplicate fields or individuals.
There are two things to explore here:
Going with option 2, you may be able to test how that might work with something like:
There might need to be some special options in this case, where the primary is always included and it just shows the closest match (if any) from each respective right GEDCOM.
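To make the "primary plus closest match from each right GEDCOM" idea concrete, here is a minimal Go sketch. It does not use the gedcom library: `Individual`, `similarity` and `closestMatches` are hypothetical names invented for illustration, and the scoring is a toy stand-in for whatever similarity measure the real comparison uses.

```go
package main

import (
	"fmt"
	"strings"
)

// Individual is a hypothetical, simplified stand-in for a GEDCOM individual.
type Individual struct {
	Name      string
	BirthYear int
}

// similarity is a toy score in [0, 1]: 0.5 for a case-insensitive name match
// plus 0.5 for a matching (non-zero) birth year. This is an assumption made
// for the sketch, not the library's real algorithm.
func similarity(a, b Individual) float64 {
	score := 0.0
	if strings.EqualFold(a.Name, b.Name) {
		score += 0.5
	}
	if a.BirthYear != 0 && a.BirthYear == b.BirthYear {
		score += 0.5
	}
	return score
}

// closestMatches returns one row per primary individual. Column j holds the
// index of the best match in others[j], or -1 when no candidate reaches
// minSimilarity. The primary individual is always present, as a row.
func closestMatches(primary []Individual, others [][]Individual, minSimilarity float64) [][]int {
	rows := make([][]int, len(primary))
	for i, p := range primary {
		rows[i] = make([]int, len(others))
		for j, tree := range others {
			best, bestScore := -1, 0.0
			for k, candidate := range tree {
				if s := similarity(p, candidate); s >= minSimilarity && s > bestScore {
					best, bestScore = k, s
				}
			}
			rows[i][j] = best
		}
	}
	return rows
}

func main() {
	primary := []Individual{{"Jane Doe", 1850}, {"John Smith", 1820}}
	others := [][]Individual{
		{{"jane doe", 1850}},
		{{"John Smith", 1821}, {"Mary Jones", 1900}},
	}
	for i, row := range closestMatches(primary, others, 0.5) {
		fmt.Println(primary[i].Name, row)
	}
}
```

Each row could then become one line in the HTML output: the primary individual on the left, followed by one column per right GEDCOM that is either the closest match or empty.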
In my case, I have many trees from Geni.com and Ancestry, and I don't know which one is the most comprehensive. Some have a lot of information (not necessarily in the overlapping parts of the tree) that the others don't, so I don't have a primary. By comparing them all I could probably identify (or create) a primary, but it would take work.
Unless similar to the
The GEDCOM comparison can already make full use of multiple cores to speed it up, see https://github.com/elliotchance/gedcom/blob/master/cmd/gedcom/diff.go#L84-L86. And, perhaps even better, if it knows two individuals are the same (by an identifier), it can avoid the expensive comparison altogether. However, consider the numbers: comparing two small trees of 1000 individuals takes 1 million comparisons (that's fine), but three trees of 1000 individuals require 1 billion comparisons to be exhaustive (not fine). Trying to compare many trees (even if they are quite small) makes the processing time grow exponentially with the number of trees. Depending on what your goal is, it probably makes more sense to just choose a primary file and have everything work against that.
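To put rough numbers on that, here is a small Go sketch of the arithmetic. It assumes every tree has the same number of individuals, that "exhaustive" means considering every combination of one individual from each tree, and that the primary approach compares the primary against each other tree pairwise. The function names are made up for this example.

```go
package main

import "fmt"

// exhaustiveComparisons counts every combination of one individual from each
// tree: n^trees.
func exhaustiveComparisons(n, trees int) int64 {
	total := int64(1)
	for i := 0; i < trees; i++ {
		total *= int64(n)
	}
	return total
}

// primaryComparisons counts the work when one tree is the primary and it is
// compared pairwise against each of the other trees: (trees-1) * n^2.
func primaryComparisons(n, trees int) int64 {
	if trees < 2 {
		return 0
	}
	return int64(trees-1) * int64(n) * int64(n)
}

func main() {
	const n = 1000
	for trees := 2; trees <= 4; trees++ {
		fmt.Printf("%d trees: exhaustive=%d primary=%d\n",
			trees, exhaustiveComparisons(n, trees), primaryComparisons(n, trees))
	}
}
```

Under these assumptions, three trees of 1000 individuals need 1,000,000,000 comparisons exhaustively but only 2,000,000 against a primary, which is why picking a primary keeps the problem tractable.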
Over time, I have accumulated multiple trees across multiple platforms, and they have regrettably become out of sync. I want to condense them all into one tree, but that means diffing multiple trees, something gedcomdiff doesn't currently support.
It would be incredible if I could diff multiple GEDCOMs and display the diff in a single HTML page. The coloring would show whether a field is missing from only one file or is present in one file and missing from the rest. It would be cool to see, for a given field, which files have it and which are missing it. Is this feasible?
Sample code (not a Go expert, but I think this works):