-
Notifications
You must be signed in to change notification settings - Fork 28
Metadata Database Schema
Amir M. Mir edited this page Apr 1, 2020
·
41 revisions
This page will explain the schema that is used to store the metadata of the callgraphs.
We decided to use PostgreSQL version 12 or higher because it has some very useful features that we utilize (like extensive JSON data type support).
Depicted below is the E/R diagram of the metadata database schema we have devised:
- The
packages
table contains basic information about the packages/products like name of the package, forge and etc. - Each package can have multiple versions, thus the table
package_versions
referencespackages
and stores information like version of the package. - Each package version can have multiple dependencies to other packages. Therefore table
dependencies
referencespackage_versions.id
as source package andpackages.id
as dependent package and has aversion_range
which specifies acceptable versions of dependency. - Each
package_version
can have multiple modules in them. For some languages, these modules can be viewed as classes. Thus the tablemodules
referencespackage_versions
and stores such attributes as the namespace of the module, SHA256 hash of the module, timestamp and metadata about the module in form of JSON. Currently, in a module's metadata, the following attributes are stored:superClasses
,superInterfaces
andsourceFile
. - Each module can consist of multiple methods - callables. Therefore, table
callables
stores reference tomodules
, FASTEN URI of the method, boolean value if this callable is resolved/internal and timestamp with metadata. However, a reference tomodules
can beNULL
, because, in the graph, there is no information about the module and package of the external call. Thus, when saving the graph, all external callables are inserted in thecallables
table without reference tomodules
andis_resolved_call
set to false. - The table
edges
stores the graph's edges. Each edge has a source and target which both referencecallables
table allowing to store both internal and external calls. Each edge also stores some metadata about the call (like if the call invokes a virtual method and etc).
Here is the SQL code that is used to create all tables described above in the diagram with all required keys and constraints.
CREATE TABLE packages
(
id BIGSERIAL PRIMARY KEY,
package_name TEXT NOT NULL,
forge TEXT NOT NULL,
project_name TEXT,
repository TEXT,
created_at TIMESTAMP
);
CREATE TABLE package_versions
(
id BIGSERIAL PRIMARY KEY,
package_id BIGINT NOT NULL REFERENCES packages (id),
version TEXT NOT NULL,
cg_generator TEXT NOT NULL,
created_at TIMESTAMP,
metadata JSONB
);
CREATE TABLE dependencies
(
package_version_id BIGINT NOT NULL REFERENCES package_versions (id),
dependency_id BIGINT NOT NULL REFERENCES packages (id),
version_range TEXT[] NOT NULL
);
CREATE TABLE modules
(
id BIGSERIAL PRIMARY KEY,
package_version_id BIGINT NOT NULL REFERENCES package_versions (id),
namespaces TEXT NOT NULL,
sha256 BYTEA,
created_at TIMESTAMP,
metadata JSONB
);
CREATE TABLE callables
(
id BIGSERIAL PRIMARY KEY,
module_id BIGINT REFERENCES modules (id),
fasten_uri TEXT NOT NULL,
is_resolved_call BOOLEAN NOT NULL,
created_at TIMESTAMP,
metadata JSONB
);
CREATE TABLE edges
(
source_id BIGINT NOT NULL REFERENCES callables (id),
target_id BIGINT NOT NULL REFERENCES callables (id),
metadata JSONB
);
CREATE INDEX packages_compound_index ON packages (package_name, forge);
CREATE INDEX package_versions_compound_index ON package_versions (package_id, version, cg_generator);
CREATE INDEX dependencies_compound_index ON dependencies (package_version_id, dependency_id, version_range);
CREATE INDEX modules_compound_index ON modules (package_version_id, namespaces);
CREATE INDEX callables_compound_index ON callables (fasten_uri, is_resolved_call);
CREATE INDEX edges_compound_index ON edges (source_id, target_id);