gem5 Resources: How do we ensure data integrity? #201
-
Not sure if I agree with the following:
I think that we should just have
I think that compatibility should be more about schema compatibility than about being tested with gem5. In theory, gem5 should be perfectly forward compatible in terms of ISA/devices/etc., and the only time it's not compatible is because a schema changes. This could be the JSON schema, the checkpoint "schema," or the stdlib API "schema."
I think that
I'm not sure I totally understand the open question.
-
[this is a writeup of a document hand-written by @Harshil2107 on the schema. His work is quoted, my response to his document is not]
No, they are just strings; we cannot cast them as floats. For example, the current release of gem5 is v23.0.1.0. That's not a float. This doesn't change your coming points, so it's a bit of a moot point. They are still sortable as strings though.
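A minimal sketch of this point, assuming version strings follow the dotted form of v23.0.1.0 with an optional leading "v". The helpers `version_key` and `can_cast_to_float` are hypothetical names for illustration, not part of gem5. Plain string sorting works while every component has the same number of digits; comparing integer tuples does not rely on that.

```python
def version_key(version: str) -> tuple:
    """Turn a string such as 'v23.0.1.0' into the tuple (23, 0, 1, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def can_cast_to_float(version: str) -> bool:
    """Report whether the version string can be read as a float."""
    try:
        float(version.lstrip("v"))
    except ValueError:
        return False
    return True


# Hypothetical list of releases, used only for illustration.
versions = ["v22.1.0.0", "v23.0.1.0", "v23.0.0.1"]

# Newest release first, compared component by component as integers.
newest_first = sorted(versions, key=version_key, reverse=True)

# String comparison and numeric comparison disagree once the number of
# digits in a component differs (e.g. a hypothetical "v9.0" vs "v23.0").
string_order = sorted(["v9.0", "v23.0"])
numeric_order = sorted(["v9.0", "v23.0"], key=version_key)
```

Using a tuple as the sort key keeps the versions stored as strings while still giving a numeric ordering.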
This assumption isn't necessarily true. We may have resources that are compatible only with the upcoming release. This may be a nit-pick but just something to remember.
No, we can't say this at all. If we have made a resource that is only compatible with version 1.1 of gem5, even if it has the same schema as that used in 1.0, we can't say it is compatible with 1.0. Being incompatible with the schema isn't the only way a resource may not work with a given version of gem5. For example, if we introduce a new RISC-V ISA instruction in gem5 v1.1, and we create a binary resource which uses it, that binary will not run in gem5 v1.0.
Your assumption is true: only the categories that have changed need to worry about compatibility problems in the new version. It might be hard to automate though, due to resource specialization. E.g., if the schema for a File resource changes then that also impacts the schema for the Binary resource.
This assumption may lead to a design that's more efficient in some ways but too much of a pain to engineer and maintain. If we update the schema, why not just assume anything that touched the old schema needs updating and create new versions? Does it make much of a difference? We're just wasting database space, but it's cheap.
You're not wrong here, but we may want/need a way to automatically update all the resources to this new schema version (e.g., create a new version of each workload which conforms to the new schema). We don't want to lose, say, every workload for the next version of gem5 because we update its schema. Don't prioritize this problem; it's definitely something we can work on later, but we will likely need to do this.
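As a rough sketch of what such an automatic update could look like, assume each database entry is a dictionary with `id`, `version`, and `schema-version` keys. These key names, the integer versions, and the `migrate` callback are all hypothetical and are not the actual gem5 Resources schema. The latest entry of each resource on the old schema gets a new entry with an incremented version, and existing entries are left untouched so nothing is lost.

```python
import copy
from typing import Any, Callable, Dict, List

Entry = Dict[str, Any]


def upgrade_to_schema(
    entries: List[Entry],
    old_schema: int,
    new_schema: int,
    migrate: Callable[[Entry], Entry],
) -> List[Entry]:
    """Return the entries plus one new entry per resource on the old schema."""
    # Find the latest version of each resource ID.
    latest: Dict[str, Entry] = {}
    for entry in entries:
        current = latest.get(entry["id"])
        if current is None or entry["version"] > current["version"]:
            latest[entry["id"]] = entry

    new_entries: List[Entry] = []
    for entry in latest.values():
        if entry["schema-version"] != old_schema:
            continue  # Already on another schema; nothing to do.
        # Work on a copy so the old entry is preserved as-is.
        upgraded = migrate(copy.deepcopy(entry))
        upgraded["version"] = entry["version"] + 1
        upgraded["schema-version"] = new_schema
        new_entries.append(upgraded)

    return entries + new_entries


# Hypothetical data: two versions of one workload on schema 1.
database = [
    {"id": "boot-workload", "version": 1, "schema-version": 1},
    {"id": "boot-workload", "version": 2, "schema-version": 1},
]
# The identity function stands in for a real schema migration.
updated = upgrade_to_schema(database, 1, 2, lambda entry: entry)
```

Because old entries are never modified, rerunning the job after a failed migration only risks duplicate new entries, not lost data.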
Wouldn't it be easier to just create a schema for each major release and avoid this logic? It's not saving us very much by being clever for gem5 versions that don't change the schema. An alternative approach to all this is just to have a CSV file somewhere accessible (like the resources.gem5.org website) with a format like:
In this approach we don't need to worry about what the file is called and match on its name; we just look up our current version in the CSV, download the schema, then use it. It also makes it quick to get the most up-to-date schema: you just sort the CSV on the first column and select the first entry.
There is an assumption in this work that minor and hotfix releases don't exist. I'm fine with not changing the schema between major releases of gem5, but it should be explicitly noted if so.
Another approach to this: I think we need to step back and ask what it means for a resource to be compatible with a given version of gem5. It means two things:
Given this, wouldn't it be good to have the following fields:
We could easily write scripts to automatically go through the database to update the
I think this solves the problem I outlined before, but I'll have to think about it a little more. If I'm right in my thinking here we could regularly run a 'job' on the database to update the
-
We require tooling to update our gem5 Resources data, but there are many things to consider when doing so, most of which concern maintaining the integrity of the data. Here I outline my thoughts on this.
Terminology
To start, I wish to clarify the terminology I'll be using.
The `DiskImageResource` object in the gem5 standard library contains a link/path to a disk image and data regarding it (source, description, partitioning information, etc.). The `set_disk_image_workload` function in gem5 accepts a `DiskImageResource` in order to set up the disk image for a simulation. The gem5 stdlib accepts manual construction of this, or may do so automatically by querying the gem5 Resources data sources for a resource, in which case the gem5 Resource is constructed based on the outcome of that query.
Things to know and current caveats
`DiskImageResource`, `KernelResource`, and specify the `readfile_contents` to pass to gem5 once the Kernel and Disk Image have been added to the system. Note: At the time of writing, Workloads are not technically a type of resource. This is highlighted in Issue "Add workload to resource specialization of resources.py and deprecate workload.py" #193. The remainder of this discussion will assume Workloads are a specialization of a Resource.
`Suites` are collections of Workloads. These are useful for defining benchmark suites, for example. I will continue this discussion assuming these have already been implemented.
`compatible-versions` field, which is a list highlighting which versions of gem5 the resource is compatible with.
Goals
In maintaining this data over time we have the following goals:
1. If Resource `X` is to be updated then a new entry of that resource is created in the data source. Its new version will be `Y+1`, where `Y` was the value of the current latest version of `X`.
2. `compatible-versions` field.
3. If Resource `X` of version `Y` is updated (new entry with version `Y+1`), any Workloads referencing `X` of version `Y` must also be updated to include this new version (again, a new entry is created with the same ID but an incremented version) with this new Resource. This ensures that A) we preserve the older workload, and B) the updated workload contains the newer resource.
4. `X` version `Y` to `Y+1` if `Y+2` already exists.
5. A Workload/Suite's `compatible-versions` field must be a subset of the intersection of all `compatible-versions` fields of its constituent resources.
Pain Points
`compatible-versions` field. A suite/workload with compatibility to version `Z` of gem5 must not contain any resources which do not have compatibility to version `Z`.
Open Questions
The `compatible_versions` of the Suite/Workloads is always the intersection of all the constituent Resources, but what if the schema of the Suite/Workload changes? That means it can only be compatible with gem5 versions compatible with that schema, even if the intersection of the constituent resources suggests more. How to handle this in an automated fashion?
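One way the subset rule could be checked by an automated job is sketched below. It assumes each resource and workload is a dictionary with a `compatible-versions` list, and that the gem5 versions understood by the workload's schema are available as a separate list. The `schema_supported` argument and both function names are hypothetical. The allowed versions are the intersection over the constituent resources, optionally narrowed by the schema; this is one possible treatment of the open question, not a settled design.

```python
from typing import Any, Dict, Iterable, List, Optional, Set

Entry = Dict[str, Any]


def allowed_versions(
    resources: List[Entry],
    schema_supported: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Intersect the resources' versions, then narrow by the schema."""
    version_sets = [set(r["compatible-versions"]) for r in resources]
    allowed = set.intersection(*version_sets) if version_sets else set()
    if schema_supported is not None:
        allowed &= set(schema_supported)
    return allowed


def is_consistent(
    workload: Entry,
    resources: List[Entry],
    schema_supported: Optional[Iterable[str]] = None,
) -> bool:
    """Check the workload only claims versions all its parts support."""
    claimed = set(workload["compatible-versions"])
    return claimed <= allowed_versions(resources, schema_supported)


# Hypothetical resources and version strings, for illustration only.
kernel = {"compatible-versions": ["22.1", "23.0"]}
disk_image = {"compatible-versions": ["23.0", "23.1"]}

common = allowed_versions([kernel, disk_image])
good = is_consistent({"compatible-versions": ["23.0"]}, [kernel, disk_image])
bad = is_consistent({"compatible-versions": ["22.1"]}, [kernel, disk_image])
# A schema only understood by a later release leaves no allowed versions.
narrowed = allowed_versions([kernel, disk_image], schema_supported=["23.1"])
```

Treating the schema as one more set in the intersection keeps the check a single subset test, so the same job can flag both resource and schema mismatches.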