Mapping GEDCOMX to the process model #141
Comments
I think GEDCOMX should also support all aspects of the Genealogical Workflow as presented by Ron Tanner at RootsTech 2012: There's probably a lot in common between the Genealogy Research Process and this, but I'm sure the two nicely augment each other. Louis |
@lkessler - both links are based on the GRP so yes I agree should be supported but ... @stoicflame - could we have some clarity on whether GEDCOMX is intending to provide a minimal or best practice model? This was touched on in #138 - the two goals are quite different I think - either a minimum which must be adhered to (tho' without a regulatory body I'm not sure how the 'must' is ever enforced so I guess it has to be 'should') or the "best in class" which applications should strive to achieve. Also, you said in #138 that the goal was:
The proof standard is not quite the same thing as the process model (tho' the two are obviously complementary). The GPS consists of five elements:
Sorry if I'm sounding picky here but the Proof Standard in its simplest form could just be represented by tacking a ProofStatement and a Bibliography onto the Record Model, whilst the Process Model is more specific and covers the whole range of research activities not just the proof at the end. To put it another way, should we focus on exchanging/standardising the publication of genealogical data (conclusions at the end of the process) or should we focus on exchanging/standardising the sharing/transfer of genealogical data (all data during and throughout the process). I'm just trying to understand the scope - sorry if it's tedious. |
Some use cases may help illustrate:
Which of the above can/should GEDCOMX Conclusion Model be trying to address? |
An excellent set of use cases. Good work! |
Many thanks :) I wondered if I was just getting too tedious! |
OK here's my take on answering my own questions:
I hasten to add that this is my head speaking ... my heart longs for a way to miraculously pump my cherished data through a magic machine and get it into whatever software I fancy with all the data, links and context intact (and preferably transformed in the "better" way supported by the new software). Sadly this pipe dream just leads to disappointment when I wake up in the real world. To summarise, I would conclude that the Conclusion Model should focus on a clear and simple data structure which can be interpreted either end of the transfer as unambiguously as possible. (In concluding this I should go back and retract many of my posts since I have tended to focus on the 'best practice' rather than minimalistic! Yes, I'm shooting myself in the foot here!) |
PS: Just to shoot myself in the other foot ... I suspect the "minimalist" model = the Record Model and hence reverses my vote for #138 |
@EssyGreen you've done a great job putting together these thoughts and use cases. I don't think you're being tedious at all. The issues you bring up are really tough to answer, but in the end I think I arrive at the same place that you articulated:
Which seems to imply a "minimalist" approach for this first version. But it still needs to be flexible enough to provide for future standards that will fill in more aspects of that "magic machine" with "all the data, links and context intact". In addition to addressing extensibility concerns, we know that the "minimal" standard needs to address more than what legacy GEDCOM does today. Our task is to identify and address what else is minimally needed and provide for it "as unambiguously as possible".
Actually, I sincerely think the conclusion model is a better fit for this. The record model as it's defined today attempts to deal with some very narrowly-focused subtleties of field-based record extraction and hence has a bunch of stuff that doesn't really fit in this "minimalist" model. Date parts (see issue #130) are a great example of that. |
@stoicflame - many thanks for the positive feedback :) I have a couple of points related to your reply:
I'm not sure I agree with you there (tho' some examples may make me change my mind!) ... I think in some ways old GEDCOM attempted to achieve too much and hence ended up with aspects that applications wanted to treat differently but felt they couldn't because of the GEDCOM structure. A clear example of this I think is the PLACe structure ... by making it an embedded structure and including sub-elements it was difficult to convert this to/from a high level Place object without added complexity on import and data loss on export. We've solved this one in GEDCOMX (I think) by making it a record-level element but could fall into the same trap elsewhere. A similar problem happened with the little-used ROMN and FONE sub-elements which were quickly outdated by more advanced phonetic techniques and yet hung around in the sub-structures making the GEDCOM PLACe and NAME structures unnecessarily unwieldy. Conversely I would argue that over-use of the NOTE record links (e.g. alongside CALlNumbers) created an unnecessarily "stringy" structure. In summary, I think that the flatter the structure (within reason) the more flexible it is ... long trails of sub-elements are more likely to be problematic, especially in relational data scenarios.
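The point about promoting PLACe from an embedded sub-structure to a record-level element can be illustrated with a small sketch. This is purely illustrative (the function and field names are hypothetical, not GEDCOM X classes): it shows embedded place strings being deduplicated into shared top-level place records that events then reference.

```python
# Hypothetical sketch: promoting embedded place strings (as in legacy
# GEDCOM's PLAC sub-structure) to shared top-level place records, as
# GEDCOM X does. All names here are illustrative, not the actual spec.

def normalize_places(events):
    """Replace each event's embedded place string with a reference to a
    shared top-level place record, deduplicating identical names."""
    places = {}  # place name -> generated record id
    for event in events:
        name = event.pop("place", None)
        if name is None:
            continue
        if name not in places:
            places[name] = "place-%d" % (len(places) + 1)
        event["placeRef"] = places[name]
    place_records = [{"id": pid, "name": name} for name, pid in places.items()]
    return events, place_records

events = [
    {"type": "Birth", "place": "Leeds, Yorkshire, England"},
    {"type": "Death", "place": "Leeds, Yorkshire, England"},
]
events, place_records = normalize_places(events)
```

Both events end up pointing at one shared place record, which is what makes the flat, record-level structure easier to map to and from a relational store than the embedded sub-element trail.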
You may be right ... to be honest my .Net version of the model is a bit of a mess so it's really hard to see what's in what. I've been hoping for a pull request to get a clearer/new model? Have I missed one or is it still in limbo (or should I go back to using eclipse/java)? |
EssyGreen said:
Sounds like GEDCOM with a few tweaks. :-) stoicflame said:
That works for me too. Louis |
Maybe ... @stoicflame - do you have a list of the good and the bad things about old GEDCOM so we can retain the good and get rid of the bad? If not, is it worth brainstorming? |
It does kind of sound like that, huh? I guess it kind of depends on what you think legacy GEDCOM primarily was. If you think it was a definition of a model for evidence information and a way to encode it, then I agree that this project sounds a lot like GEDCOM with a few tweaks. But if you consider the syntax of a GEDCOM file as being a major part of the spec, then this project doesn't sound like "GEDCOM with a few tweaks". In other words, I think one of the primary goals of this project is to overhaul the foundational technologies of GEnealogical Data COMmunications. This will enable the genealogical IT community to collaboratively, iteratively, and cleanly integrate the latest trends in application development. So even though the conceptual scope of GEDCOM X 1.0 won't be a huge revolution, the remodel of the infrastructure will be a big step forward for the community. In response to the original purpose of this thread, I think the initial scope of this project needs to be limited to the "Cite" and "Analyze" sections of the genealogy research map that @EssyGreen referenced. These are the sections that we're most familiar with sharing and exchanging via legacy GEDCOM, so the focus there has the biggest chance of success. As much as possible, the standard needs to supply well-defined integration points for the other sections of the process model that will be addressed by future efforts. Right now, we're working on refactoring the project so that these concepts are clearly articulated at the project site. This effort includes the proposal outlined at #138. We hope this will be a big improvement to the project and we're anxious to get these changes applied for everybody to see. |
I don't have a definitive list, no. We should probably pull together that list from a lot of different sources, including this issue forum, the BetterGEDCOM wiki, etc. We should also proactively request community help to pull together that list. I think a brainstorm is a good idea, but I'm struggling with the best way to facilitate that. I worry that creating a new thread would get too noisy with everybody commenting on everybody else's comments. And that would stifle those who have something to say but don't want to be subject to community scrutiny. What if I created a web form that people could fill out and submit? I'd broadcast its availability, gather all the comments, and post them somewhere so everybody could see the results without knowing who submitted them. There are some people that I consider legacy GEDCOM experts that I'd be especially anxious to see contribute.... Thoughts? |
Sounds like an excellent plan! |
Initial scope maybe but I think the whole process needs to be covered albeit in a simple form. For example, a simplistic inclusion of "Goals" could be a "ToDo" (=Research Goal) object (top level entity) with:
Plus an (optional) "ToDo" list of links included in each Person (representing the subject of the goal) A listing of all ToDos in CreationDate order represents the ResearchLog. This seems pretty simple to me but maybe I'm falling back into the "best practice" rather than the "simplistic" approach again. |
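The "ToDo" (Research Goal) proposal above can be sketched in a few lines. This is a hedged illustration only: the entity, its fields, and the idea that a date-ordered listing of ToDos constitutes the research log all come from the comment above, but the names are hypothetical, not anything defined in GEDCOM X.

```python
# Sketch of the hypothetical "ToDo" (Research Goal) top-level entity
# proposed above. Field names are illustrative assumptions only.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ToDo:
    goal: str                                        # what we are trying to establish
    created: date                                    # creation date
    person_ids: list = field(default_factory=list)   # subjects of the goal
    status: str = "open"

def research_log(todos):
    """A listing of all ToDos in creation-date order is the research log."""
    return sorted(todos, key=lambda t: t.created)

todos = [
    ToDo("Find marriage record", date(2012, 3, 1), ["p1", "p2"]),
    ToDo("Check 1881 census", date(2012, 1, 15), ["p1"]),
]
log = research_log(todos)
```

Each Person would then carry an optional list of links back to its ToDos, so the same object serves both the per-person view and the chronological research log.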
Re the other end of the process (Proof/Resolve/Conclude) ... In my experience there has been a growing awareness of the need for evidence-based genealogy rather than just "citing" sources and I think some form of inclusion would add credibility to the model and get a greater chance of GEDCOMX's acceptance. But it's a complex area so needs to be shredded down to a simple form. |
Current GEDCOM is a way to store and transfer genealogical conclusions. It also includes sources and source detail data, but only when used as evidence from the point of view of the conclusions.
No, I don't see the syntax being a major part of the spec. We could take the existing GEDCOM and transfer it mechanically into XML, JSON, or whatever. We could also take the GEDCOM X spec and translate it into the GEDCOM syntax. The content is all important. The syntax is not. Using a standard syntax potentially gives programmers and users more tools to use. Simple translators would be easy to write to convert GEDCOM X in one syntax to another. But simple translators to convert to and from GEDCOM 5.5.1 will be essential. If the conclusion data model of GEDCOM X is only "tweaked" from GEDCOM 5.5.1, then the transfer of the data that GEDCOM 5.5.1 can accept will be possible. However, if the conclusion data model of GEDCOM X is rebuilt, then the transfer will not be possible and the genealogical community will have a problem. Louis |
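Louis's claim that the existing syntax could be transferred mechanically into XML or JSON is easy to demonstrate with a toy parser. This is a sketch only: real GEDCOM handling must also deal with CONC/CONT continuations, pointers, and escapes, which are deliberately ignored here.

```python
# Toy illustration of the point above: GEDCOM's level-numbered,
# line-based syntax can be mechanically turned into a nested tree
# (and from there serialised as XML, JSON, or whatever). This sketch
# ignores CONC/CONT continuations, cross-reference pointers, and escapes.

def gedcom_to_tree(lines):
    root = {"tag": "ROOT", "value": None, "children": []}
    stack = [root]  # stack[n] holds the current node at level n-1
    for line in lines:
        parts = line.strip().split(" ", 2)
        level = int(parts[0])
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else None
        node = {"tag": tag, "value": value, "children": []}
        del stack[level + 1:]              # pop back to the parent level
        stack[level]["children"].append(node)
        stack.append(node)
    return root

lines = [
    "0 INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1850",
]
tree = gedcom_to_tree(lines)
```

The level numbers are just a serialisation of nesting depth, which is why "the content is all important, the syntax is not": the same tree can round-trip through any of the candidate syntaxes.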
That's rather overstating the case. If the conclusion data model is substantially different from GEDCOM's, the translation may be more complicated and lossy, particularly going from GedcomX to GEDCOM. It won't be impossible. The genealogy software community (not the genealogy community, most of which doesn't actually care about the details but is utterly frustrated with the present lack of interoperability between mainstream programs) already has this problem: Few mainstream programs have internal data models that map well to GEDCOM, and their inadequate translation efforts are one of the main sources of that user frustration. The greater problem for GedcomX isn't what should or shouldn't be in its data model, it's that none of the mainstream program vendors are participating. |
@jralls - excellent points!
I have to say this is something that's bothered me ... hands up anyone here from Family Tree Maker, RootsMagic, Master Genealogist, ReUnion, FamilyHistorian etc etc? Are you lurking or absent? |
I totally agree with you on this point ...
... however, here I disagree ... to limit the scope of GEDCOMX to GEDCOM 5 with a few tweaks would be worthless. The problem with GEDCOM has never been the syntax (it's about as simple as you can get), it's the content (as you say above). Yes, we will need to provide a migration path from 5 to X but this should not be the goal of GEDCOMX. The goal should be to improve the data content and structure to be more in-line with the needs of the user community (which in turn should be more in-line with the needs of the software industry). Ergo map the process model but do it in a simple way that can be implemented in different ways by different software vendors. |
@EssyGreen Some are lurking, some are absent, some just flat don't care. |
@stoicflame - I noticed you put up that web-link for peeps to comment on GEDCOM strong/weak points .... any feedback yet? |
Yes, thanks for reminding me. I need to get that posted. |
@EssyGreen I finally got around to compiling the responses we got from the little poll we took: |
Brilliant! So now we have something to judge GEDCOM X against ... has it resolved these problems/addressed the deficiencies? Which areas do we need to tweak/adjust? |
A lot of them, yes.
Maybe that's the next step here? How do you think we should publish that information? Maybe add to that page a table with notes on how (or whether) GEDCOM X intends to address those issues? |
Using a field for something it wasn't intended to be used for e.g.
|
As people are pointing out you can't force developers to understand specifications or to properly implement them. You can supply tools and example files and suggest how they use them. You can implement a certification program where you run tests on their software before you issue the certification. One useful approach is to create a GEDCOM-X test file that includes examples of every possible type of content, and require developers to run a "round trip" test where they import that file into an empty database and then immediately export the database created during import to another GEDCOM-X file. There would have to be a tool to test whether the exported GEDCOM-X file were functionally equivalent to the test file. This won't prove that the developers understand what everything in the specifications means, but it does prove their software can handle all the little details of the specifications. |
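The round-trip certification check described above could be automated along these lines. This is a sketch under stated assumptions: the record dictionaries stand in for whatever the vendor's import/export actually produces, and "functional equivalence" is approximated here by comparing canonicalised record sets while ignoring ordering and internal ids.

```python
# Sketch of the round-trip certification test described above.
# Assumption: "functionally equivalent" means the same content after
# stripping internal ids and ignoring record order. The record dicts
# stand in for the output of a vendor's import/export cycle.

def canonicalise(records):
    """Reduce a record set to a comparable canonical form: drop
    internal ids and sort everything deterministically."""
    stripped = []
    for rec in records:
        rec = {k: v for k, v in rec.items() if k != "id"}
        stripped.append(tuple(sorted(rec.items())))
    return sorted(stripped)

def round_trip_ok(original_records, exported_records):
    """True if export-after-import preserved the test file's content."""
    return canonicalise(original_records) == canonicalise(exported_records)

original = [{"id": "X1", "name": "John /Smith/", "birth": "1850"}]
exported = [{"id": "7", "name": "John /Smith/", "birth": "1850"}]
```

The hard part in practice is exactly the canonicalisation step: deciding which differences (id renumbering, reordering, whitespace) are harmless and which constitute data loss.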
Given several mentions of Tamura Jones' blog post about gedcom-x file size, it seems helpful to provide a link to the post. This seems to be it: GEDCOM X Converter The use of a DTD to help out seems potentially useful and I'd love to see an example of something like that. But it raises the question of how much xml machinery one might expect to need to use to read or write the files. Might people produce gedcom-x files with XML Schema or XSLT or other related technologies, which readers would have to be able to interpret? |
But it would still need some 'overseer', to maintain the standard. |
That's a different sort of maintenance ie updating as/when necessary to accommodate new features etc. Policing the standard is not a viable option. |
Rephrasing then: who will look after the extensions and contributions in the longer term, past release? |
I believe Family Search are the owners so it would be them. One for Ryan to confirm how this will be done. |
GEDCOM-X is a FamilySearch product. FS is listening to opinions from interested persons outside FS, but they will decide exactly what GEDCOM-X is to be. FS's primary goals in developing GEDCOM-X are to support their own internal data archiving and software processing needs. Fortunately those needs are relatively in sync with the needs of the overall genealogical community. It will be up to FS to decide how much support GEDCOM-X gets with regard to updates and future evolution. You could look back at how they supported the evolution of GEDCOM to get a possible window on the level of support GX may get. |
Please excuse my ignorance, but surely if GEDCOM-X is to be a 'real open standard which developers can extend and contribute to', would not this make FS a caretaker, rather than an owner?? |
This seems to indicate a degree of scepticism that FS will maintain ongoing support. |
Indeed. But at the moment it is still owned/controlled by FS. Whether they choose to make it an open standard or retain ownership is one for Ryan/FS. |
From FAQ:
Does this mean FS may not always be the admin for the project? |
Alex, There is no way to answer your questions. Maybe Family Search will open GEDCOMX, maybe they won't. Maybe GEDCOMX will be a flash in the pan, maybe it will come to fruition and be actively used. Maybe Family Search will support its evolution into the future, maybe they won't. Maybe the FHISO will want to get involved and apply pressure to open up GEDCOMX. Maybe FHISO will decide to roll their own and pick up work on BGEDCOM and try for their own standard. Maybe the GEDCOMX and BGEDCOM efforts will merge. Maybe Ancestry.com will throw a big monkey wrench into the works and announce a data format ANCECOM of their own. Maybe Louis will propose GEDCOM 7.0 in such a convincing manner that Winnipeg becomes the genealogical capital of the western hemisphere. Maybe the scales will simultaneously fall from everyone's eyes and DeadEnds become the genealogical standard for the future. There are no simple answers to comfort you. GEDCOMX is currently the most serious, visible work being done on genealogical data models, so it is the place to be if you are interested in such things and want to make any kind of contribution. |
Tom said: "Maybe Louis will propose GEDCOM 7.0 in such a convincing manner that Winnipeg becomes the genealogical capital of the western hemisphere." :-) Louis |
From reading Louis' comments here, that is an option I could not only support, but actually understand. |
Not sure it is a place to be, if a genealogist and not a developer. Appears very technical. |
That may be true, but the Title from gedcom.org states: Either they mean OPEN or they don't. |
This is a very good point ... I think most of us here are both but the documentation, jargon and issues discussed are (in my opinion) tremendously biased towards the technicalities rather than the genealogical complexities. @alex-anders - don't go away! Use your voice to help us get back on track! |
I am new to GitHub and have difficulty in finding 'stuff' I understand. Even the "GEDCOM X is the technological standard whereby genealogical data is stored, shared, searched, and secured across all phases of the genealogical research process" statement from the front page confuses me. How can GEDCOM X be a standard of any sort if it is yet to be developed/released? Or am I missing something obvious? |
@alex-anders - you are not missing something obvious. You are highlighting a problem which desperately needs to be addressed ie the usability of the documentation (and at least parts of the underlying model/code)
Try reading this as if the whole thing was already finished and released ... think of it as a work-in-progress ... we are developing the documentation and the code simultaneously as we go along. Theoretically we could write "will be a standard" but then when it was released we'd have to go back and re-write it all. |
It's true, and it's important to get the genealogical complexities right. OTOH, we are developing a specification for computer-to-computer communication, so the "technical" aspects are just as important. Github itself is designed as a programmer's tool, and communicating via bug report is a programmer's habit. That likely increases the intimidation factor for non-programmers. |
True on both counts but any decent development project needs to be able to communicate with users in a non-techie way. |
The users of a genealogical data model are not the same as the users of genealogical desktop or on-line systems. GEDCOMX users will be the implementors of import and export modules, implementors of database schemas, algorithm developers and so forth. One must assume they have the ability to understand a model specification without hand holding. On the other hand there will be curious users of the desktop and on-line systems who would like more detail on this GEDCOMX thing they see listed on their program's feature list. Any curious user of this type will be able to take a reasonably prepared specification and get what they need from it. My point, if there is one, is that GEDCOMX, certainly at this point in time, does not need to spend any time worrying about writing something up so the ultimate users of systems can understand what GEDCOMX is. |
True but the end-users are the ones whose requirements are paramount. Also there are numerous non-techie or partial-techie genealogists out there who currently have an understanding of GEDCOM 5 (because it was so simple and easy to understand) and I think we would be missing a trick if we were to alienate these users. |
Hope you don't think I want to alienate users. And you are certainly correct that the requirements flow back from users. But enough requirements have flowed back from enough users for enough years, that those of us who have been working on data models for enough years are rather fully versed in them, and do not require a whole lot of additional time to gather more. Take a look at the requirements list on the Better GEDCOM wiki for example. GEDCOMX seems to be working as if the requirements are community knowledge that does not need to be written down. Has advantages and disadvantages. For example the two top level objects in the conclusion model are persons and relationships. As far as I know there are no requirements anywhere that would have led to the decision to have those two as the only top-level objects. There must be some collective wisdom within FamilySearch that indicates that these are the right ones to end up with. |
Of course not! :) But I do think that the current documentation does.
Excellent point. I personally think this is a mistake. |
As a sanity check it is worth checking that GEDCOMX fulfills the needs for the research process. In an attempt to prevent too much debate on the definition of the research process I am citing the model certified by BCG & ESM (see this link http://www.olliatauburn.org/download/genealogy_research_map.pdf)
How well does GEDCOMX support the data described?