pymods is a utility module for working with the Library of Congress's Metadata Object Description Schema (MODS) XML standard. It is a wrapper around the lxml module, specific to deserializing data out of MODSXML into Python data types.
If you need a module to serialize data into MODSXML, see the other pymods by Matt Cordial.
pip install pymods
XML is parsed using the MODSReader class:
import pymods
mods_records = pymods.MODSReader('some_file.xml')
MODSReader is an iterator that yields each record as a MODSRecord object:
In [5]: for record in mods_records:
   ....:     print(record)
<Element {}mods at 0x47a69f8>
<Element {}mods at 0x47fd908>
<Element {}mods at 0x47fda48>
MODSReader will work with mods:modsCollection documents, outputs from OAI-PMH feeds, or individual MODSXML documents with mods:mods as the root element.
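The three input shapes differ only in where the mods:mods elements sit in the tree. The following is a minimal sketch of that idea using only the standard library, not pymods itself: iterating over every element with the MODS tag finds the records whether the root is a collection, an OAI-PMH response, or a single record. The helper name `iter_mods_elements` is hypothetical, and this is not a description of how MODSReader is implemented.

```python
import xml.etree.ElementTree as ET

# MODS version 3 namespace, as published by the Library of Congress.
MODS_NS = "http://www.loc.gov/mods/v3"


def iter_mods_elements(xml_text):
    """Yield every mods:mods element in an XML string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Element.iter() walks the whole tree starting with the root itself,
    # so a bare mods:mods document yields exactly one element.
    yield from root.iter(f"{{{MODS_NS}}}mods")


# Small in-memory documents so the sketch runs without any input file.
collection = (
    f'<modsCollection xmlns="{MODS_NS}">'
    "<mods><titleInfo><title>A</title></titleInfo></mods>"
    "<mods><titleInfo><title>B</title></titleInfo></mods>"
    "</modsCollection>"
)
single = f'<mods xmlns="{MODS_NS}"><titleInfo><title>C</title></titleInfo></mods>'

print(len(list(iter_mods_elements(collection))))  # 2
print(len(list(iter_mods_elements(single))))      # 1
```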
The MODSReader class parses each mods:mods element into a pymods.MODSRecord object. pymods.MODSRecord is a custom wrapper class for the lxml.ElementBase class. All children of pymods.Record inherit the lxml._Element and lxml.ElementBase methods.
In [6]: record = next(pymods.MODSReader('example.xml'))
In [7]: print(record.nsmap)
{'dcterms': '', 'xsi': '', None: '', 'flvc': 'info:flvc/manifest/v1', 'xlink': '', 'mods': ''}
In [8]: for child in record.iterdescendants():
   ....:     print(child.tag)
All functions return data as a string, a list, or a list of named tuples. See the API documentation or the appropriate docstring for details.
>>> record.genre?
Type: property
String form: <property object at 0x0000000004812C78>
Accesses mods:genre element.
:return: A list containing Genre elements with term, authority,
authorityURI, and valueURI attributes.
from pymods import MODSReader, MODSRecord
Parsing a file
In [10]: mods = MODSReader('example.xml')
In [11]: for record in mods:
   ....:     print(record.dates)
[Date(text='1966-12-08', type='{}dateCreated')]
[Date(text='1987-02', type='{}dateIssued')]
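Because each date comes back as a named tuple with `text` and `type` fields (as in the output above), the results are easy to regroup. The sketch below collects date strings by type across records. The helper `dates_by_type` and the stand-in `Date` and `FakeRecord` tuples are hypothetical and exist only so the example runs without pymods or an input file; with real data you would pass a MODSReader instead.

```python
from collections import defaultdict, namedtuple


def dates_by_type(records):
    """Map each date type to the list of date strings seen across records."""
    grouped = defaultdict(list)
    for record in records:
        for date in record.dates:
            # Drop any '{namespace}' prefix so keys read 'dateCreated', etc.
            local_name = date.type.rpartition("}")[2]
            grouped[local_name].append(date.text)
    return dict(grouped)


# Stand-ins that mimic only the attributes used above (assumption).
Date = namedtuple("Date", "text type")
FakeRecord = namedtuple("FakeRecord", "dates")
demo = [
    FakeRecord([Date("1966-12-08", "{}dateCreated")]),
    FakeRecord([Date("1987-02", "{}dateIssued")]),
]

print(dates_by_type(demo))
# {'dateCreated': ['1966-12-08'], 'dateIssued': ['1987-02']}
```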
Generating a title list
In [14]: for record in mods:
   ....:     print(record.titles)
['Fire Line System']
['$93,668.90. One Mill Tax Apportioned by Various Ways Proposed']
['Broward NOW News: National Organization for Women, February 1987']
Creating a subject list
In [17]: for record in mods:
   ....:     for subject in record.subjects:
   ....:         print(subject.text)
Concert halls
Architectural drawings
Structural systems
Structural systems drawings
Structural drawings
Safety equipment
Structural optimization
Architectural design
Fire prevention--Safety measures
Tax payers
Tax collection
Sex discrimination against women
Women's rights
Equal rights amendments
Women--Societies and clubs
National Organization for Women
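A flat printout like the one above can hide how often a heading recurs across a collection. As a rough sketch, the same loop can feed a `collections.Counter` to tally subject headings. The function name `subject_counts` is hypothetical, and the `SimpleNamespace` objects are stand-ins that mimic only the `subjects` and `text` attributes used in the session above.

```python
from collections import Counter
from types import SimpleNamespace


def subject_counts(records):
    """Count how many times each subject heading appears across records."""
    counts = Counter()
    for record in records:
        counts.update(subject.text for subject in record.subjects)
    return counts


# Stand-in records (assumption); replace with MODSReader('example.xml').
demo = [
    SimpleNamespace(subjects=[
        SimpleNamespace(text="Concert halls"),
        SimpleNamespace(text="Structural systems"),
    ]),
    SimpleNamespace(subjects=[
        SimpleNamespace(text="Structural systems"),
        SimpleNamespace(text="Tax payers"),
    ]),
]

print(subject_counts(demo).most_common(1))  # [('Structural systems', 2)]
```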
Creating a list of subject URIs only for LCSH subjects
In [18]: for record in mods:
   ....:     for subject in record.subjects:
   ....:         if 'lcsh' == subject.authority:
   ....:             print(subject.uri)
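The loop above prints a URI for every matching subject, including repeats and subjects with no URI at all. One possible refinement, sketched here with a hypothetical helper `subject_uris`, returns each URI once in first-seen order and skips empty values. The `example.org` URIs and the `SimpleNamespace` stand-ins are placeholders for illustration, not real authority data.

```python
from types import SimpleNamespace


def subject_uris(records, authority="lcsh"):
    """Return unique, non-empty subject URIs for one authority, in order."""
    seen = set()
    uris = []
    for record in records:
        for subject in record.subjects:
            if subject.authority != authority:
                continue
            # Skip subjects without a URI and ones already collected.
            if subject.uri and subject.uri not in seen:
                seen.add(subject.uri)
                uris.append(subject.uri)
    return uris


# Stand-in records (assumption); replace with MODSReader('example.xml').
demo = [
    SimpleNamespace(subjects=[
        SimpleNamespace(authority="lcsh", uri="https://example.org/subjects/1"),
        SimpleNamespace(authority="local", uri="https://example.org/local/9"),
    ]),
    SimpleNamespace(subjects=[
        SimpleNamespace(authority="lcsh", uri="https://example.org/subjects/1"),
        SimpleNamespace(authority="lcsh", uri=None),
    ]),
]

print(subject_uris(demo))  # ['https://example.org/subjects/1']
```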
Get URLs for objects using a No Copyright US URI
In [23]: for record in mods:
   ....:     for rights_elem in record.rights:
   ....:         if rights_elem.uri == '':
   ....:             print(record.purl)
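The same filter can be wrapped in a small function so the rights URI becomes a parameter. This is only a sketch: `purls_for_rights` is a hypothetical helper, the rights URI below is a placeholder (the session above does not show the value to match), and the stand-in records mimic only the `rights`, `uri`, and `purl` attributes used there. Unlike the loop above, it reports each record at most once even if several rights elements match.

```python
from types import SimpleNamespace


def purls_for_rights(records, rights_uri):
    """Collect record.purl for every record carrying the given rights URI."""
    matches = []
    for record in records:
        # any() stops at the first matching rights element in the record.
        if any(rights.uri == rights_uri for rights in record.rights):
            matches.append(record.purl)
    return matches


# Placeholder value (assumption); substitute the rights statement URI you need.
NO_COPYRIGHT_US = "https://example.org/rights/no-copyright-us"

# Stand-in records (assumption); replace with MODSReader('example.xml').
demo = [
    SimpleNamespace(
        rights=[SimpleNamespace(uri=NO_COPYRIGHT_US)],
        purl="https://example.org/purl/1",
    ),
    SimpleNamespace(
        rights=[SimpleNamespace(uri="https://example.org/rights/other")],
        purl="https://example.org/purl/2",
    ),
]

print(purls_for_rights(demo, NO_COPYRIGHT_US))  # ['https://example.org/purl/1']
```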