- extract supports attachments with any of the built-in email/appointment types from extract-msg and does not raise a TypeError exception when encountering them
  - Attachment data types that are not emails, appointments, or bytes still raise an exception
- extract "unpacked" log output no longer contains a duplicate "data" field containing the number of files extracted
  - The data field is still filled out when saved to the log table
- Fix occasional error occurring when unpacking an MSG file in which one of the attachments was smaller than 1536 bytes and greater than 255 bytes
  - extract-msg tried to interpret it as a file name, causing an OSError(36, 'File name too long') exception to be raised
- Fix occasional error occurring when unpacking an MSG file with no HTML body and a slightly malformed RTF body
  - extract-msg would try to use the RTF body to generate the HTML, causing a RTFDE.exceptions.MalformedEncapsulatedRtf exception to be raised
- Fix extract sometimes repeating files in log with the message "Does not have extract action" when given a query
- Ignore meeting-related attachments when extracting MSG files instead of raising a TypeError
- finalize doc-collections command to rearrange files in Documents into docCollection directories
- info command to print general information about the database
- Improve log output of identify commands
- Fix MSG extraction failing sometimes when attachments were incorrectly parsed as MSG files
- edit original puid and edit master puid commands to set the PUID of original and master files respectively
- manual extract command to add manually extracted files
- manual convert command to add manually converted files
- Fix files being overwritten causing an error when MSG and TNEF files contained more than one attachment with the same name.
- When using the @file operator in queries, the values are matched exactly with the SQL IN operator
  - The @like operator no longer has any effect when used in conjunction with @file
- Use acacore 4.1.1
- edit master processed can now set processed status of access and statutory targets separately
- Fix issue with rollback getting interrupted before finishing if a run contained unhandled events
Complete overhaul of digiarch to work with the entire AVID folder and handle files across document types (original, master, access, and statutory).
The general structure is the same, but some commands have been updated to support the different document types.
- Handle original, master, access, and statutory documents
- Automatically detect root AVID folder
- Import a database created with v4 of digiarch
- New edit master commands to handle master files
- Removed duplicated functions like reidentify; query arguments are used instead to run/rerun a command on specific files
- Improved file identification and extraction to be faster and more resilient
- Overhauled and safer rollback command
- Improved filename sanitization when extracting files
- Fix file name length error when running extract on some MSG files whose attached files had extra-long names #748
- Use acacore 3.3.3
- Use acacore 3.3.2
- Fix doctor command adding underscores to files that required no fixing when sanitizing paths
- Use acacore 3.3.1
- Use acacore 3.3.0
- Use acacore 3.2.0
- Use acacore 3.1.1
- extract for TNEF files does not create an HTML/RTF/TXT file for the body
- Fix reidentify using the wrong UUID when an error occurred during identification, causing it to not update the file
- Fix parent column being sometimes reset when running reidentify on files extracted from archives
- When running reidentify, the lock and processed values of files are preserved
- When running reidentify without a query, both locked and processed files are ignored, and files without an action are selected
- extract command removes empty folders when it is done
- When extract encounters an error and sets the file to "manual", the file is also locked
- Add webarchive extractor
- Use acacore 3.1.0
- Use acacore 3.0.11
- New edit processed command to set files' processed status
- Use acacore 3.0.10
- Fix PIL.UnidentifiedImageError causing identify and reidentify to stop
- Use acacore 3.0.9
- Fix #723
- Fix BMP images not being properly checked
- Caused by imagesize library not supporting them
- Overhauled query syntax for edit, reidentify, and search commands
  - @<field> will match a specific field, the following are supported: uuid, checksum, puid, relative_path, action, warning, processed, lock.
  - @null and @notnull will match columns with null and not null values respectively.
  - @true and @false will match columns with true and false values respectively.
  - @like toggles LIKE syntax for the values following it in the same column.
  - @file toggles file reading for the values following it in the same column: each value will be considered as a file path and values will be read from the lines in the given file (@null, @notnull, @true, and @false in files are not supported).
  - Changing to a new @<field> resets like and file toggles. Values for the same column will be matched with OR logic, while values from different columns will be matched with AND logic.
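
The query rules above can be illustrated with a minimal sketch. It is not digiarch's parser: the function name build_where, the generated SQL, and the reading of "toggles" as a flip are all assumptions made for the example.

```python
from pathlib import Path

# Field names listed in the changelog entry
FIELDS = {"uuid", "checksum", "puid", "relative_path", "action", "warning", "processed", "lock"}
# Assumed SQL rendering of the constant selectors
CONSTANTS = {"@null": "IS NULL", "@notnull": "IS NOT NULL", "@true": "IS TRUE", "@false": "IS FALSE"}


def build_where(tokens: list[str]) -> tuple[str, list[str]]:
    """Turn query tokens into a parameterised WHERE clause (hypothetical helper)."""
    # column -> (OR-ed conditions, parameters for those conditions)
    groups: dict[str, tuple[list[str], list[str]]] = {}
    field, like, from_file = None, False, False
    for token in tokens:
        if token in CONSTANTS:
            if field is None:
                raise ValueError("selector given before any @<field>")
            groups[field][0].append(f"{field} {CONSTANTS[token]}")
        elif token == "@like":
            like = not like
        elif token == "@file":
            from_file = not from_file
        elif token.startswith("@"):
            if token[1:] not in FIELDS:
                raise ValueError(f"unknown field {token}")
            # A new field resets the like and file toggles
            field, like, from_file = token[1:], False, False
            groups.setdefault(field, ([], []))
        else:
            if field is None:
                raise ValueError("value given before any @<field>")
            values = Path(token).read_text().splitlines() if from_file else [token]
            for value in values:
                if value:
                    groups[field][0].append(f"{field} LIKE ?" if like else f"{field} = ?")
                    groups[field][1].append(value)
    clauses, params = [], []
    for conditions, values in groups.values():
        if conditions:
            # OR within a column, AND between columns
            clauses.append("(" + " OR ".join(conditions) + ")")
            params.extend(values)
    return " AND ".join(clauses), params
```

With this sketch, the tokens @puid fmt/1 fmt/2 @action @null would yield a clause selecting files whose puid is either value and whose action is null.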
- extract sanitizes paths of extracted files and saves the originals to history with operation "digiarch.extract: rename"
- edit remove deletes empty parent directories if the --delete option is used
- search command to search files and display them
  - Displays results in YAML format
  - Uses the same selectors as the edit commands
  - Supports sorting by relative path, size, and action (both ascending and descending)
- edit action commands have a new --lock option that locks the files after editing them
  - The default behaviour is to not lock the files
- history command has a new --limit option
- When running extract, unknown errors are logged with the file's uuid
  - The event is not logged to the database, as it is already done with the end operation
- extract does not automatically add the .msg extension to extracted message attachments, as they could be EML as well
- Improved list of invalid characters for filenames:
  - \#%&{}[]<>*?/$!'`":@+|=
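
As an illustration of how the character list above might be applied, here is a minimal sketch. The function name and the choice of an underscore as the replacement character are assumptions for the example, not a description of digiarch's actual sanitizer.

```python
# Invalid characters as listed in the changelog entry
INVALID_CHARACTERS = "\\#%&{}[]<>*?/$!'`\":@+|="


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace every invalid character with the replacement (hypothetical helper)."""
    return "".join(replacement if char in INVALID_CHARACTERS else char for char in name)


print(sanitize_filename('report: "final"?.pdf'))
```

Characters outside the list, such as spaces, hyphens, and dots, are left untouched in this sketch.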
- Fix extract failing on MSG/EML files that contained a message attachment without a filename
  - The subject is used, if available, otherwise attachment-{n} is used instead, where n is the index of said attachment
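
The fallback order described above could be sketched as follows. The function name is hypothetical, and whether the index starts at 0 or 1 is not stated in the changelog, so the index is used exactly as supplied.

```python
from typing import Optional


def attachment_filename(filename: Optional[str], subject: Optional[str], index: int) -> str:
    """Pick a name for a message attachment (hypothetical helper)."""
    # Prefer the attachment's own file name, then its subject
    for candidate in (filename, subject):
        if candidate and candidate.strip():
            return candidate.strip()
    # Fall back to the attachment-{n} pattern from the changelog
    return f"attachment-{index}"
```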
- Fix issues when extracting attachments from MSG/EML files with forward slashes in the attachment file name
- Fix issue with extract when file is already found
- Fix doctor command --fix option not allowing "files" value
- Add files fix to doctor command
  - Ensures that all files in the database exist, if not they are removed
- edit rollback supports doctor commands
- doctor command events that signal a rename have .rename in their operation name
- completions command generates completions scripts for Zsh, Bash, and Fish shells
- edit commands (and others using identifiers) now accept @null as a valid value to match NULL fields
- Use acacore 3.0.8
  - v3.0.7 contained a critical error in the database upgrade function causing Files.action_data to be set to NULL
- Use acacore 3.0.7
- Folders for extracted files created by extract use the UUID of the archive file
  - Uses format _{uuid}
  - Reduces length of nested file paths
  - Is still unique to that folder
- Use Acacore 3.0.6
- Show start event for upgrade immediately
- Fix MSG attachments of MSG files not using the proper extension
- Fix upgrade adding an end event to the history table only when no update occurred
- Fix issue with MSG empty attachments
- Fix issue with MSG attachment bytes data sometimes being interpreted as a string by extract_msg, causing it to throw a FileNotFoundError
- Fix incorrect handling of MSG attachments in MSG files
- Use acacore 3.0.5
- Fix some edge cases with edit rename and doctor failing when a file had multiple extensions
- Fix doctor extension deduplication not working on some systems where the SQLite reverse function was not available
- Fix error when extracting HTML and RTF body where they were sometimes None
- Support signed MSG and signed MSG attachments for extraction
- edit rollback supports extract events
  - Archive files are reset to the extract action
  - Extracted files are removed from the database and the file system
- Fix extract events not being saved in History table
- Reidentify resets processed column to False
- Use acacore 3.0.4
- Added "msg" tool to extract MSG files
- Support extract.on_success
- --exclude option for identify to exclude files or folders with globbing patterns
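
A minimal sketch of glob-based exclusion, assuming patterns are matched against the relative path of a file and of each of its parent folders. The function name and the matching rules are assumptions for the example; the changelog does not describe how digiarch evaluates the patterns.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(relative_path: str, patterns: list[str]) -> bool:
    """Return True if the file or one of its parent folders matches a pattern."""
    path = PurePosixPath(relative_path)
    # Drop the last parent, which is "." for relative paths
    candidates = [path, *list(path.parents)[:-1]]
    return any(fnmatchcase(str(candidate), pattern) for candidate in candidates for pattern in patterns)
```

Note that in fnmatch-style patterns the * wildcard also matches path separators, so *.log would match log files at any depth in this sketch.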
- Extracted files now use the "extracted-archive" template when they are set to "ignore"
- extract command to extract archives
  - If the extract tool can't be found, the file is skipped and a warning message is displayed
  - If the file is encrypted, then the file is set to "ignore" action with template "password-protected"
    - Detection of encrypted archives with Patool is experimental and needs testing
  - If a file should not be preserved, it is set to ignore
  - If other errors occur during extraction, the file is set to manual with reason set to the exception's message
- Improved docstrings and help messages
- Fix missing help from history command when running it without arguments
- Improved error messages when downloading actions and custom signatures
- edit remove command uses a different sub-operation when deleting files so that they can be automatically ignored by rollback
- edit action copy command to copy an action from an existing format
- Use a modular structure for commands and subcommands
- Overhauled edit action using subcommands and named options for each field
- Added extensions deduplication to doctor command
- Improved rollback
- Simplified history events
- Improved handling of exceptions and argument errors
- --data-puid option in edit action command allows copying data from an existing identifier in the reference files
  - Fix issue #692
  - If the identifier is not found or the action argument is not found in the data, a KeyError exception is raised
- Fix --id-files option not working with edit lock command
- Add edit lock command to lock specific files
  - Can be rolled back to the previous value with edit rollback
- The upgrade command backs up the database file before performing the upgrade
  - Can be ignored with the --no-backup option
- The edit remove command can delete files from the disk as well with the --delete option
- Use acacore 2.0.1
- Fix upgrade issues
- Update to acacore 2.0.0
- Python 3.11
- Simpler logging of events
- Database version checks
- Siegfried batching
- Files are identified in batches
- Defaults to 100 files per batch
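
Batching of this kind can be sketched with a small generator. The helper name batched_paths is hypothetical, and only the default size of 100 comes from the changelog; how digiarch passes each batch to Siegfried is not shown here.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched_paths(paths: Iterable[str], size: int = 100) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` paths (hypothetical helper)."""
    iterator = iter(paths)
    # islice consumes up to `size` items per call; an empty list ends the loop
    while batch := list(islice(iterator, size)):
        yield batch
```

Each yielded list could then be handed to a single identification call instead of invoking the tool once per file.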
- doctor command to fix common database issues
- upgrade command to upgrade the database to the latest version
- reidentify command
  - Allows running identification process again on specific files
  - Files are selected with the same system as the edit commands
- history command
  - Allows viewing and searching the events log
  - Can search by:
    - time (from and/or to)
    - uuid (allows multiple)
    - operation (LIKE with % only, allows multiple)
    - reason (LIKE, allows multiple)
- edit rename accepts an empty extension
  - To set an empty extension, spaces must be used (e.g., " ")
  - When used with the --replace and --replace-all options, existing extensions are removed
- Stricter extension patterns in edit rename
  - Only allowed characters are a-z, A-Z, and 0-9
- edit rollback command
  - Undo other edit operations
  - Must select a start and end time for history events
- --dry-run option for edit rename
  - Show changes without committing them
- edit rename uses replace mode options instead of an f-string
  - --replace replaces the last suffix with the new extension (default)
  - --replace-all replaces all valid suffixes (matching the expression \.[^/<>:"\\|?*\x7F\x00-\x20]+) with the new extension
  - --append appends the new extension if it is not already there
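
The three modes can be illustrated with a short sketch built on pathlib. The function name new_name and the way trailing suffixes are checked one by one against the expression are assumptions for the example, not digiarch's implementation.

```python
import re
from pathlib import PurePosixPath

# Expression quoted in the changelog for a valid suffix
VALID_SUFFIX = re.compile(r'\.[^/<>:"\\|?*\x7F\x00-\x20]+')


def new_name(path: str, extension: str, mode: str = "replace") -> str:
    """Compute the renamed path for one of the three modes (hypothetical helper)."""
    p = PurePosixPath(path)
    if mode == "append":
        # Leave the name alone if it already ends with the extension
        return path if p.name.endswith(extension) else str(p.with_name(p.name + extension))
    if mode == "replace":
        return str(p.with_suffix(extension))
    if mode == "replace-all":
        name = p.name
        # Strip trailing suffixes for as long as each one is valid
        for suffix in reversed(p.suffixes):
            if not VALID_SUFFIX.fullmatch(suffix):
                break
            name = name[: -len(suffix)]
        return str(p.with_name(name + extension))
    raise ValueError(f"unknown mode {mode!r}")
```

For file.tar.gz and the extension .zip, this sketch gives file.tar.zip in replace mode and file.zip in replace-all mode.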
- edit rename command
  - Change the extension of files
  - Uses the same selector options as the other edit commands
  - Ignores changes that would duplicate existing extensions or not alter them
  - New extensions can be formatted with:
    - suffix, the last extension of the file
    - suffixes, all the extensions of the file, used for append mode (e.g., {suffixes}.ext will change "file.tar.gz" to "file.tar.gz.ext")
- Added docstring to edit action and edit remove commands
- --siegfried-path can be set with SIEGFRIED_PATH environment variable
- --siegfried-home can be set with SIEGFRIED_HOME environment variable
- Add --id-files option to edit commands
  - Interpret IDs as files from which to read the IDs
  - Each line is considered a separate ID
  - Blank lines are ignored
  - All IDs are stripped of newlines, carriage return, and tab characters, but not spaces
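
The reading rules for ID files can be sketched as below. The function name read_ids and the UTF-8 encoding are assumptions; a line counts as blank here when nothing is left after stripping newlines, carriage returns, and tabs.

```python
from pathlib import Path


def read_ids(path: str) -> list[str]:
    """Read one ID per line from a file (hypothetical helper)."""
    ids = []
    for line in Path(path).read_text(encoding="utf-8").split("\n"):
        # Strip newlines, carriage returns, and tabs, but keep spaces
        value = line.strip("\n\r\t")
        if value:
            ids.append(value)
    return ids
```

Because spaces are preserved, an ID such as a relative path with leading or trailing spaces is kept exactly as written in the file.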
- --no-update-siegfried-signature option is now the default
- Fix error in edit remove command when using --path-like
  - Was using the like statement to delete files instead of their UUID
- Fix error in edit action command
  - SQLite cursor was rewinding to start because INSERT statements were executed in-between iteration steps
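
One common way to avoid this class of problem is to fetch all selected rows before issuing any writes, so that no SELECT is still being iterated when INSERT statements run. The sketch below uses Python's sqlite3 module with hypothetical table, column, and operation names; it is not the fix applied in digiarch.

```python
import sqlite3

# In-memory database with made-up tables for the example
conn = sqlite3.connect(":memory:")
conn.execute("create table files (uuid text primary key, action text)")
conn.execute("create table history (uuid text, operation text)")
conn.executemany(
    "insert into files values (?, ?)",
    [("a", "convert"), ("b", "convert"), ("c", "ignore")],
)


def set_action(connection: sqlite3.Connection, old: str, new: str) -> int:
    """Change the action of matching files and log one event per file."""
    # Materialise the selection first so later writes cannot disturb the iteration
    rows = connection.execute("select uuid from files where action = ?", (old,)).fetchall()
    for (uuid,) in rows:
        connection.execute("update files set action = ? where uuid = ?", (new, uuid))
        connection.execute("insert into history values (?, ?)", (uuid, "edit.action"))
    connection.commit()
    return len(rows)
```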
- acacore 1.2.0
- LIKE matches for paths
  - Added new --path-like option to match edit IDs with LIKE statements
- All files matching the given IDs are edited, not just the first one
- Removed Siegfried signature update from test workflow
- Is already present in test folder
- Updated PRONOM signature file for Siegfried
- Added new edit remove command to remove files by UUID, path, checksum, PUID, or warning
  - Can be used to re-identify files
- Both digiarch's and acacore's versions are saved with the "start" event
- Fix non-matching history events for edit action
- Fix traceback of identification errors not being saved in History table
- Use acacore 1.1.4
- Use acacore 1.1.3
- Allow to use different identifiers than UUID
- uuid
- puid
- relative path
- checksum
- warnings
- The history event contains both the previous and new action
- Improve handling of exceptions
  - OSError and IOError are always raised
  - Exception, UnidentifiedImageError, DecompressionBombError are always caught
- Increase maximum size of images before Pillow raises a DecompressionBombError
- Improve end events in history by using the exception repr value in the data column, or None if the program ended with no errors
- Fix incorrect handling of action data when an UnidentifiedImageError exception was caught
- UnidentifiedImageError exceptions are logged
- Add corrupt GIF to test files
- Handle UnidentifiedImageError exception by setting the file's action to "manual"
- Add --siegfried-home option to set the folder that contains the signature files
- Use acacore 1.1.1
- Automatically build necessary wheel files and save them as a release on new pushed tags
- New command to change an action
- Can optionally specify new data to be used in action data column
- Use acacore 1.0.2
- Use acacore to handle database and file identification
- Remove all unnecessary files and dependencies
- Use acacore 1.0.1
- added the version of DROID we use to the log
- makes sure we use most / all of the available info given by sf
- added ability to get reference files version and is printing it to stdout
- added check to ensure updates of the changelog
- fixed missing identification of aca-fmt/17 (MapInfo Map Files)
- added x-fmt/111 to signatures that we re-identify, as Mapinfo TAB files are identified as such
- added aca-fmt/19 (MapInfo TAB files) to list of custom signatures
- added list of puids that we have to identify with our custom signatures even though Siegfried identified them. Currently "fmt/111".
- added aca-fmt/18 (Lotus Approach View File) to custom_signatures.json
- added signature for 5 versions of Microsoft Access Database