Schema Version: 3.0.0
Stone Version: ^11.2.0
This document defines the schema and semantics of the shiragame games database.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Defined terms will be indicated in italics throughout the document. Their definition can be found at the end of this document, in the section headered Definitions.
shiragame is a games database similar to OpenVGDB, created primarily for use with Snowflake, but is also intended to be widely applicable to many use cases involving emulation, game preservation, archival, and verification. shiragame uses Stone platform IDs and mimetypes to facilitate precise identification of the gaming platform a dump belongs to, as well as the format of a dump.
For definitions of "platform" and "format", please refer to the Stone specification document.
Unlike OpenVGDB, shiragame does not aim to be an all-in-one database. Its primary purpose is to provide an efficient method to identify and verify that a given file is a known dump that is part of a game distribution, and to identify the game such a dump is part of. shiragame does not aim to catalogue information outside of what can be ascertained from the game entry's canonical name. Once a searchable title is obtained from a dump, other tools may be used to scrape more information, such as cover arts and descriptions. shiragame however, is only meant for the first step of identification.
In addition, while shiragame was created primarily for use with Snowflake, a conscious effort is taken to maintain a stable, documented API in the form of this document.
Instead of using a markup format like JSON or custom binary formats, a shiragame database is always distributed as a SQLite database, which carries a few advantages. Most importantly, a SQLite database is stable and portable across many platforms. It is quickly and easily queryable without needing to stream or load the entire file into memory from many programming languages, and without the need for custom or one-off binary parsers. For niche purposes such as low-memory environments or platforms without ergonomic bindings to SQLite, it can easily be transformed to another transport format via the schema as described in this document.
shiragame is REQUIRED to be distributed as an SQLite database with the following tables.
Each row of the game
table is REQUIRED to describe a single game entry.
Column | Description | Status |
---|---|---|
game_id |
An internal ID used to refer to the serial and rom rows related to this game row. This ID is unstable and MUST NOT be persisted. |
REQUIRED |
platform_id |
The Stone platform ID of the platform this game entry was intended for. | REQUIRED |
entry_name |
The canonical name of the game entry | REQUIRED |
entry_title |
The canonical name of the game entry, with any metadata flags removed. | REQUIRED |
release_title |
The distribution or release name of the game entry that is is known as.†* This is usable as a search term for scraping purposes. | REQUIRED |
region |
The region the game was released under.† | REQUIRED |
part_number |
If this game entry is multi-part, or is one part of multiple discs or tapes, the part number thereof.† | OPTIONAL |
is_unlicensed |
If this game entry is of an unlicensed release.† | REQUIRED |
is_demo |
If this game entry is a demonstration or sample release.† | REQUIRED |
is_system |
If this game entry is a system update or BIOS file.† | REQUIRED |
version |
If this game entry has a revision to an earlier released game entry, the revision or version number thereof.†‡ | OPTIONAL |
status |
The development status of this game.† | REQUIRED |
naming_convention |
The naming convention used the entry_name conforms to. |
REQUIRED |
source |
The name of the cataloguing organization that provided the source data. | REQUIRED |
†as ascertained from the entry_title
, in accordance with the naming convention used by the source data.
* See release title derivation on how this name is derived.
‡ If multiple versions are listed in the filename, then version
lists only the first version.
The game_id
value MUST NOT be saved or persisted anywhere outside of a query. It SHOULD NOT be used for anything except to refer to other tables that relate
to a game entry, and SHALL NOT carry any meaning across different releases of the shiragame database. It MUST NOT be used as a canonical identifier for a
game entry, and MAY change without incurring API breakage across different releases of the shiragame database. They MUST only be treated as opaque cursors by the
client consumer.
The release title is the name the game entry was sold or distributed under, without any copyright or trademark markers. This is derived from the game entry name with the following rules.
- The first article following a comma is moved to the beginning of the string. For example, "Legend of Shiratsu, The - Wind of the City" becomes "The Legend of Shiratsu - Wind of the City".
The following strings are considered 'articles'.
- Eine
- The
- Der
- Die
- Das
- Ein
- Les
- Los
- Las
- An
- De
- La
- Le
- El
- A
- All hyphens with spaces on both sides (matching the pattern
-
) is removed, and at the index where the pattern begins, is replaces with a colon followed by a space (the pattern:
). For example, "The Legend of Shiratsu - Wind of the City" becomes "The Legend of Shiratsu: Wind of the City".
A game entry MAY have one or more dump entries.
Each row of the rom
table describes a single dump entry with the following schema.
Column | Description | Status |
---|---|---|
file_name |
The canonical filename assigned to this dump by the cataloguing organization. | REQUIRED |
mimetype |
The Stone mimetype of the format of this file the dump entry refers to. | REQUIRED |
md5 |
The MD5 hash of the file this dump entry refers to. | RECOMMENDED |
crc |
The CRC32 hash of the file this dump entry refers to. | RECOMMENDED |
sha1 |
The SHA1 hash of the file this dump entry refers to. | RECOMMENDED |
size |
The size of the file this dump entry refers to in bytes, as a signed 64 bit integer. | REQUIRED |
game_id |
Refers to the game entry this dump entry belongs to. There MUST be a row in game with the same value. |
REQUIRED |
One or more of md5
, crc
, sha1
MUST be populated. It is RECOMMENDED, but not REQUIRED, for all three to be populated.
A game entry MAY have zero or more serial numbers that describe it.
Each row of the serial
table describes a serial number with the following schema.
Column | Description | Status |
---|---|---|
serial |
The serial number as it was published by the data source. | REQUIRED |
normalized |
The serial number, normalized by the normalization rules described below. | REQUIRED |
game_id |
Refers to the game entry this serial number belongs to. There MUST be a row in game with the same value. |
REQUIRED |
Serial numbers are normalized according to the following normalization rules, which are defined by a combination of a platform ID, a verification pattern that matches the serial, and the rewrite rule that is used to normalize the serial using the verification rule. If no normalization rule matches, the normalized serial MUST be identical to the serial number as it was published by the data source. Platforms can have multiple rules applied, but rules for the same system must be mutually exclusive.
Platform ID | Verification Pattern | Rewrite Rule | Example |
---|---|---|---|
SONY_PSX , SONY_PS2 , SONY_PS3 , SONY_PSP , SONY_PSV |
^(?P<code>[a-zA-Z]+)[-_ ](?P<number>\d+)([#-_ /]*(\w?|$))*$ |
$code-$number |
SLUS 1234-GE to SLUS-1234 |
NINTENDO_GCN |
^DL-DOL-(?P<code>[\w]{4})-[-\w\(\)]+$ |
$code |
DL-DOL-GC3E-0-USA to GC3E |
NINTENDO_WII |
^RVL-(?P<code>[\w]{4})-[-\w\(\)]+$ |
$code |
RVL-R4QP-EUR to R4QP |
NINTENDO_WIIU |
^WUP-(P|N|M|T|U|B)-(?P<code>[\w]{4})-[-\w\(\)]+$ |
$code |
WUP-P-AMKP-EUR-0 to AMKP |
NINTENDO_3DS |
^CTR-(P|M|N|\T|U|B)-(?P<code>[\w]{4})(-[-\w\(\)]+)*$ |
$code |
CTR-P-BSGJ to BSGJ |
NINTENDO_NSW |
^LA-H-(?P<code>[\w]{5})(-[-\w\(\)]+)*$ |
$code |
LA-H-BABBD to BABBD |
SEGA_GEN , SEGA_CD , SEGA_DC , SEGA_GG , SEGA_SAT , SEGA_32X , SEGA_32X_CD |
^(?P<pre>[\d\w]+)-(?P<code>[\d\w]+)(-[\d.]+)$ |
$pre-$number |
T-114033-00 to T-114033 |
SEGA_GEN , SEGA_CD , SEGA_DC , SEGA_GG , SEGA_SAT , SEGA_32X , SEGA_32X_CD |
^(?P<pre>MK|T|GS)(?P<code>[\d\w]+)(-[\d.]+)?$ |
$pre-$number |
MK81086-50 to MK-81086 |
SEGA_GEN , SEGA_CD , SEGA_DC , SEGA_GG , SEGA_SAT , SEGA_32X , SEGA_32X_CD |
^(?P<pre>0{2,3})(?P<code>[\d]+)(-\d{2}\w?)?$ |
$pre$number |
00054503-00 to 00054503 |
NEC_TGCD |
^(?P<code>[\d\w]{4,5})[ -](?P<number>[\d\w]+)$ |
$code$number |
NSCD 2011 to NSCD2011 |
Since there may be exceptions that these rewrite rules do not cover, the original serial number is always available in the serial
column. Also note that the internal serial may not be consistent with the media serial, so querying by serial is not necessarily consistent.
Describes this release of the shiragame database. This table MUST only contain one row, with the following schema.
Column | Description | Status |
---|---|---|
shiragame |
The string shiragame . |
REQUIRED |
schema_version |
The version of the schema used by this database. | REQUIRED |
stone_version |
The version of the Stone definitions file used by this database, for platform IDs and mimetypes. | REQUIRED |
generated |
The time this release was created, expressed as a Unix timestamp (seconds since epoch). | REQUIRED |
release |
A version 4 UUID that identifies this shiragame database. | REQUIRED |
aggregator |
The aggregator that generated this shiragame database. In shiratsu's case, the string shiratsu . |
REQUIRED |
Both the schema and the releases of the shiragame database itself are versioned. As of schema version ^2.0.0
, The shiragame schema is versioned with Semantic Versioning.
The following changes to the schema incur an API breakage, and the MAJOR version number MUST be incremented.
- Dropping or renaming a table or column.
- The MAJOR version of the Stone definitions file used is increased.
- Any sufficiently major change to the schema as decided by the project maintainers.
The following changes to the schema do not incur an API breakage, and the MINOR version number MUST be incremented.
- Adding a new column or table.
- Changing an existing normalization rule for serial numbers.
- Any sufficiently small change to the schema as decided by the project maintainers.
The following changes to the schema do not incur an API breakage, and the PATCH version number MUST be incremented.
- Adding a new normalization rule for serial numbers.
- Any sufficiently small change to the schema as decided by the project maintainers.
As of schema 2, releases of the shiragame database are versioned by the schema version of the released database, and the Unix timestamp of its generation, in the form X.XXXXXX
. A release is also identified by its unique UUIDv4.
shiratsu's constituent libraries (such as shiratsu-naming
) MAY be versioned independently of this specification, however the shiratsu
crate MUST BE versioned according to this specification.
Each MAJOR.MINOR version of the shiratsu application MUST generate a shiragame database with the same MAJOR.MINOR schema version. Effectively, this means that shiragame schema versions are tied to the shiratsu implementation. In the case where this document and shiratsu differ, it should be considered a bug in the specification.
Currently, shiragame sources data from Standard DATs published by cataloguing organizations. shiragame does not process Parent-Clone relationships.
This MAY change in future releases of shiragame.
The re-distribution of a release of the shiragame database MUST be consistent with the licensing terms of the data sources used in the release. For all three data sources in use, re-distribution is common across the community and is not discouraged by the cataloguing organizations that publish the source DATs. However, since no license is clearly delimited for the re-distribution of said data, shiragame is unable to provide a clear license for its distribution.
However, this specification document, and the shiratsu application are distributed under the terms of the MIT license.
- platform ID The Stone specified ID for a platform. See the list of defined Stone platforms.
- platform Refer to the Stone specification on Platforms.
- format Refer to the Stone specification on Platforms.
- canonical filename The file name given to a dump by a cataloguing organization.
- canonical name The name given to a game entry by a cataloguing organization, following a naming convention.
- cataloguing organization A group that maintains databases of game entries and the dump entries belonging to the distribution thereof.
- naming convention
The method of assigning names to game entries followed by the cataloguing organization that catalogues such games. The shiragame schema does not restrict the naming convention of files. shiratsu uses the
shiratsu-naming
library to parse file names of known naming conventions. - dump Any file that is part of a game distribution that identifies such a file by its dump entry.
- game distribution The set of dumps that are required to execute the game as it was intended on the original platform it was intended for, and the title of the game thereof, as it was known or distributed under.
- game entry
A game distribution that has been verified to exist, and published by a cataloguing organization through a DAT. This corresponds to a
game
element in a DAT. - dump entry/ROM entry
The hashes (one or more of CRC32, MD5, or SHA1) of a dump that belongs to a game entry. This corresponds to a
rom
element in a DAT, and is referred to in shiratsu source code as aRomEntry
. - DAT An XML file with through which game entries are published by cataloguing organizations. A valid DAT MUST have the following DOCTYPE
<!DOCTYPE datafile PUBLIC "-//Logiqx//DTD ROM Management Datafile//EN" "http://www.logiqx.com/Dats/datafile.dtd">
- development status
One of
release
,prerelease
, orprototype
.release
indicates that the software was made commercially, whether distributed gratis or not, in an official capacity by the publisher or developer.prerelease
indicates that the software is in an unfinished, but mostly feature complete product, that may or may not have been intentionally released.prototype
indicates the software is an unreleased, unfinished product that was not intentionally released in any official capacity.