Describo Newsletter 3 #15
marcolarosa
announced in
Announcements
Replies: 1 comment
Marco, Great newsletter - very informative. This looks like really impressive work. Gavan
Welcome to the third edition of Describo News. A lot has happened since the last letter in May!
Background
To recap: Describo is a user application for working with data and producing RO-Crate metadata.
Within that, there is an embeddable VueJS (and React) component that handles the RO-Crate management. The component is open source and permissively licensed under the MIT licence, so if you are developing an RO-Crate based platform and wish to embed a user interface to the crate file, this is where to look.
But the application is where the bulk of development has occurred.
Describo was initially developed as a tool to create and manage RO-Crate metadata. The free desktop application has since evolved with tools to work with data. Specifically, you can access text extraction, named entity recognition and a generative AI assistant via the extended capabilities.
Documentation
The first point to make is that Describo is comprehensively documented. An important part of any application is having good documentation that clearly, yet concisely, explains each of the capabilities. And a significant amount of effort has been spent on that task! In addition, the website has feature articles that explore the tools from a particular viewpoint.
Start with why: https://describo.github.io/docs/articles/why-use-it
Read more:
Metadata creation
The capabilities of the metadata editor have been progressively improved over this period, with many bugfixes and optimisations. In addition, a new bulk data entry mode has been added to the component, which makes it easy to create many entries of the same type.
In addition, our Hungarian friends (@beepsoft, @csontosreka) have implemented constraint handling for text and number fields. For example, you can specify that a text field has a min / max length and matches a specific regular expression. Or maybe you want a text field to enable a user to enter dates as YYYY and YYYY-MM (strings not ISO dates) - well, you can do that too!
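As a rough illustration of what such constraints amount to, here is a minimal Python sketch of a length and pattern check for a text field that accepts YYYY or YYYY-MM strings. The constraint dictionary, its key names and the `validate_text` helper are hypothetical and are not Describo's actual profile syntax.

```python
import re

# Hypothetical constraint definition; the key names are illustrative only
# and do not reflect Describo's profile format.
YEAR_OR_YEAR_MONTH = {
    "min_length": 4,
    "max_length": 7,
    # Four digits, optionally followed by a hyphen and a month 01-12.
    "pattern": r"[0-9]{4}(-(0[1-9]|1[0-2]))?",
}


def validate_text(value: str, constraints: dict) -> list[str]:
    """Return a list of problems found; an empty list means the value is valid."""
    problems = []
    if len(value) < constraints.get("min_length", 0):
        problems.append("too short")
    max_length = constraints.get("max_length")
    if max_length is not None and len(value) > max_length:
        problems.append("too long")
    pattern = constraints.get("pattern")
    # fullmatch requires the whole string to match, not just a prefix.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        problems.append("does not match pattern")
    return problems
```

With these assumed constraints, "1987" and "1987-06" pass, while "1987-13" and a full ISO date such as "1987-06-01" are rejected.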
Read more:
Personal Knowledgebase
Being a personal, desktop application, Describo can maintain a personal knowledgebase. Consider the process of defining yourself as an author. Why recreate that entity for each RO-Crate when you can do it once, save it as a template and then have Describo offer it to you as a suggestion every time you create an entity of type Person?
Further, you can ingest entities from an RO-Crate into your knowledgebase for reuse. Think of this as creating your own dataset of definitions (in the link below I describe creating a dataset of TKLabels) - or data pack - that is available every time you work on a crate.
Read more:
Generative AI verification
Within Describo you can start up the AI assistant and get it to verify your metadata against the specification! This makes checking your work significantly easier and helps ensure that your metadata is spec compliant.
Read more:
Licensing your work
Licensing in Describo is a first class action. That is, there is an easy-to-use dialog to apply both a data licence and a metadata licence to your crate, as per the requirements of the spec. Choose from CC licences or mark your work as "CONFIDENTIAL" and / or "RESTRICTED" and / or "SENSITIVE".
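To make the two licences concrete, here is a minimal Python sketch of what the resulting metadata might look like. It assumes the RO-Crate 1.1 layout, where the data licence sits on the root dataset and a separate metadata licence sits on the `ro-crate-metadata.json` descriptor. The helper names are hypothetical and this is not Describo's implementation.

```python
def find_entity(crate: dict, entity_id: str) -> dict:
    """Return the entity in the crate's @graph with the given @id."""
    for entity in crate["@graph"]:
        if entity.get("@id") == entity_id:
            return entity
    raise KeyError(entity_id)


def apply_licences(crate: dict, data_licence: dict, metadata_licence: dict) -> dict:
    """Attach a data licence to the root dataset and a metadata licence to the descriptor."""
    descriptor = find_entity(crate, "ro-crate-metadata.json")
    root = find_entity(crate, descriptor["about"]["@id"])
    # The property is spelled "license", following schema.org.
    root["license"] = {"@id": data_licence["@id"]}
    descriptor["license"] = {"@id": metadata_licence["@id"]}
    # Add each licence as a contextual entity, but only once.
    known_ids = {entity["@id"] for entity in crate["@graph"]}
    for licence in (data_licence, metadata_licence):
        if licence["@id"] not in known_ids:
            crate["@graph"].append(licence)
            known_ids.add(licence["@id"])
    return crate


# Illustrative crate and licence entities.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
        },
        {"@id": "./", "@type": "Dataset", "name": "Example crate"},
    ],
}
CC_BY = {
    "@id": "https://creativecommons.org/licenses/by/4.0/",
    "@type": "CreativeWork",
    "name": "Creative Commons Attribution 4.0 International",
}
CC0 = {
    "@id": "https://creativecommons.org/publicdomain/zero/1.0/",
    "@type": "CreativeWork",
    "name": "CC0 1.0 Universal",
}
apply_licences(crate, CC_BY, CC0)
```

Keeping the licences as separate entities in the graph means each one is described once and only referenced by `@id` from the root dataset and the descriptor.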
Browse mode
Describo has a browse mode which lists the entities in a more familiar tabular format. This makes it easy to display a set of entities and specific properties to find out what needs more work.
Read more:
Image manipulation, text extraction and named entity recognition
For people working with textual content (e.g. manuscript images), Describo can help you perform the drudge work of processing those images and running them through Optical Character Recognition. In addition, it can perform named entity recognition, with the markup created in an associated HTML file (using data attributes) and in the crate metadata.
You can process content in bulk (like a workflow application might do) or focus on page by page for a detailed exploration of your content.
Read more:
Generative AI e-Discovery
Generative AI is the big thing at the moment and whilst it does have its limitations, it can significantly ease certain tasks like comprehending large swathes of content. So, if you are dealing with textual content (PDFs, Word documents, plain text, HTML etc), the AI Assistant can help you comb through that data and come to grips with it.
In my work I've used it to comb through the text extracted from digitised manuscript images, pull out the themes, subjects and topics, and write them into the crate metadata. That provides three separate axes from which to discover the content (this is available in the bulk transform tools).
And if you need it, you can download your conversation with the Assistant as a PDF (with a licence applied to boot!).
Read more:
Visualising the network
Since we're building a linked data structure, it seems obvious to visualise it as a network. In terms of data discovery and comprehension, this format can help uncover the less obvious connections in a dataset.
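To show why a crate maps naturally onto a network, here is a minimal Python sketch that turns the `@graph` of a crate into a list of edges by following `{"@id": ...}` references between entities. The `crate_edges` helper and the example entities are hypothetical, and this says nothing about how Describo builds its own visualisation.

```python
def crate_edges(crate: dict) -> list[tuple[str, str, str]]:
    """Return (source id, property, target id) for each link between entities in the crate."""
    ids = {entity["@id"] for entity in crate["@graph"]}
    edges = []
    for entity in crate["@graph"]:
        for prop, value in entity.items():
            if prop.startswith("@"):
                continue  # skip JSON-LD keywords such as @id and @type
            values = value if isinstance(value, list) else [value]
            for item in values:
                # Only keep references that point at another entity in this crate.
                if isinstance(item, dict) and item.get("@id") in ids:
                    edges.append((entity["@id"], prop, item["@id"]))
    return edges


# Illustrative crate: a dataset with one file and one author.
crate = {
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": "page-1.jpg"}],
            "author": {"@id": "#alice"},
            "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
        },
        {"@id": "page-1.jpg", "@type": "File", "name": "Page 1", "author": {"@id": "#alice"}},
        {"@id": "#alice", "@type": "Person", "name": "Alice"},
    ]
}
edges = crate_edges(crate)
```

Each edge is a labelled link between two entities, so the resulting list can be handed to any graph library or drawing tool. The licence reference is skipped here because the licence is not described as an entity in this example crate.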
Read more:
Vocabulary creation
Many ontologies preceded schema.org. I'm not sufficiently educated to debate the merits of one over another but it's a fact that different domains use different vocabs (especially in the HASS / GLAM space) and schema.org doesn't cover everything that everyone needs. So, with Describo's vocab tool you can create your own vocab drawing from MODS, Premis-3, Records in Context - Ontology, SKOS and schema.org in addition to creating your own definitions.
Furthermore, it's architected such that you end up with a spec compliant RO-Crate - just one with extensions. And the @context is guaranteed to be correct for the Vocab.
Basically, you can define your domain and then use Describo to describe it. Without having to run a script to process files. In addition, Describo creates the Vocabulary inside the folder so the Vocab travels with the crate. Anyone else that uses your work, assuming they have permission, can carry on from where you left off.
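For readers unfamiliar with what an extended @context looks like, here is a minimal Python sketch of the general JSON-LD convention: the base RO-Crate context followed by a mapping of extra terms to their IRIs. The `extend_context` helper, the `dialect` term and the example.org IRI are hypothetical. Describo generates the real context for you, and its output may differ.

```python
BASE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"


def extend_context(crate: dict, terms: dict) -> dict:
    """Add extra term definitions to a crate's @context, keeping what is already there."""
    context = crate.get("@context", BASE_CONTEXT)
    parts = list(context) if isinstance(context, list) else [context]
    for part in parts:
        if isinstance(part, dict):
            # Reuse an existing inline term mapping if the crate already has one.
            part.update(terms)
            break
    else:
        parts.append(dict(terms))
    crate["@context"] = parts
    return crate


# Illustrative crate with one custom term added.
crate = {"@context": BASE_CONTEXT, "@graph": []}
extend_context(crate, {"dialect": "https://example.org/vocab#dialect"})
```

After the call, the crate's @context is a list holding the base context URL and a mapping for `dialect`, so entities in the graph can use that property and still resolve it to an IRI.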
Consider this: you are the Ontological expert in your area and you're working with users to define a vocabulary for some project. You can create an RO-Crate with a Vocab and no data. Share that RO-Crate and have everyone working in the same ontological domain.
Read more:
Data Processing
As Describo uses cloud services for data processing, you can read about how your data is handled in the following documents. Basically, your data is yours and it's never used for training. Not even of the AI. And, it's processed geographically close to you wherever possible.
Read more:
How much does it cost? How is it funded?
Describo has not received any funding since mid 2022 when the component was developed as part of the Nyingarn Project.
As to cost:
The pricing page on the website details how you can support continued development if you use the product. Basically, you can buy credits for the services if you need those, or access to the Vocab tool if that's your thing (you get 2 months of free access upon registration, which is also free). If you are feeling especially generous, you can become a supporter.
Apart from helping to subsidise continued product development, your support covers the extra services, which are provided via the AWS cloud and are not free.
Support Describo
That's a long newsletter and if you've made it this far - thank you!
Before I sign off I must ask for your help.
If there are features that could help you in your work or if you are in a position to provide some financial support, reach out to say hi: [email protected].
Thank you
And last but not least, a very big thank you to @beepsoft and @csontosreka for their continued help with the component.
Thank you!
Marco