-
Notifications
You must be signed in to change notification settings - Fork 28
Metadata Database Schema
This page will explain the schema that is used to store the metadata of the callgraphs.
We decided to use PostgreSQL version 12 or higher because it has some very useful features that we utilize (like extensive JSON data type support).
Depicted below is the E/R diagram of the metadata database schema we have devised:
- The
packages
table contains basic information about the packages/products like the name of the package, forge, repository URL and etc. - Each package can have multiple versions, thus the table
package_versions
referencespackages
and stores information like version of the package. - Each package version can have multiple dependencies to other packages. Therefore table
dependencies
referencespackage_versions.id
as source package andpackages.id
as a dependent package and has aversion_range
which specifies acceptable versions of dependency. - Each
package_version
can have multiple modules in them. For some languages, these modules can be viewed as classes. Thus the tablemodules
referencespackage_versions
and stores such attributes as the namespace of the module, SHA256 hash of the module, timestamp, and metadata about the module in form of JSON. Currently, in a module's metadata, the following attributes are stored:superClasses
andsuperInterfaces
. - Each module can consist of multiple methods - callables. Therefore, table
callables
stores reference tomodules
, FASTEN URI of the method, boolean value if this callable is internal or external, line numbers of the start and the end of the method, timestamp and metadata. However, for external callable, a reference tomodules
is -1 which is a default module for all external callables. Thus, when saving the graph, all external callables are inserted in thecallables
table with -1 as a reference tomodules
andis_internal_call
set to false. - The table
edges
stores the graph's edges. Each edge has a source and target which both referencecallables
table allowing to store both internal and external calls. Each edge also stores receiver information (line number, type, and receiver URI) for each call-site in array and metadata. - Each module can also contain multiple
files
and one file can contain multiple modules. Therefore this many-to-many relationship is realized by means ofmodule_contents
table. - Each package version also has binary modules for some of the languages.
binary_modules
are very similar tomodule
, but they are supposed to store data for binary modules and link to binary files. - Similarly to modules,
binary_modules
has many-to-many relationship realized by thebinary_module_contents
table.
From POM Analyzer
POM Analyzer extracts a list of dependencies of an artifact, dependencyManagement of that artifact, repository URL, and commit tag, packaging type, project name if they are present, and generates sources URL.
Each dependency is put into the dependencies
table with detailed metadata. Here is an example of dependency's metadata:
{
"type":"",
"scope":"compile",
"groupId":"com.yahoo.athenz",
"optional":false,
"artifactId":"athenz-zms-java-client",
"classifier":"",
"exclusions":[
{
"groupId":"org.slf4j",
"artifactId":"slf4j-api"
},
{
"groupId":"org.glassfish.jersey.core",
"artifactId":"jersey-client"
},
{
"groupId":"org.glassfish.jersey.media",
"artifactId":"jersey-media-json-jackson"
},
{
"groupId":"com.fasterxml.jackson.core",
"artifactId":"jackson-core"
},
{
"groupId":"com.fasterxml.jackson.core",
"artifactId":"jackson-databind"
},
{
"groupId":"com.fasterxml.jackson.core",
"artifactId":"jackson-annotations"
}
],
"versionConstraints":[
{
"lowerBound":"1.7.28",
"upperBound":"1.7.28",
"isLowerHardRequirement":false,
"isUpperHardRequirement":false
}
]
}
The dependencyManagement
information is put into the package_versions.metadata
along with commit tag (a tag in the repository which should signify certain version), link to Maven sources JAR file, and packaging type of the artifact. Here is an example of a package version's metadata:
{
"commitTag":"2.5.0",
"sourcesUrl":"https://repo.maven.apache.org/maven2/junit/junit/4.12/junit-4.12-sources.jar",
"packagingType":"jar",
"parentCoordinate": "",
"dependencyManagement":{
"dependencies":[
{
"type":"",
"scope":"",
"groupId":"com.yahoo.elide",
"optional":false,
"artifactId":"elide-core",
"classifier":"",
"exclusions":[
],
"versionConstraints":[
{
"lowerBound":"2.5.0",
"upperBound":"2.5.0",
"isLowerHardRequirement":false,
"isUpperHardRequirement":false
}
]
}
]
}
}
From Metadata Plugin
Metadata plugin inserts metadata produced by the call graph generator and inserts this metadata into the callables
, and modules
tables.
Here is an example of module's metadata:
{
"access": "public",
"final": false,
"superInterfaces": [],
"superClasses": ["/java.lang/Object"]
}
Here is an example of callable's metadata:
{
"access": "public",
"defined": true
}
More detailed information about this metadata can be found here
Here is the SQL code that is used to create all tables described above in the diagram with all required keys, indexes, and constraints.
CREATE TABLE packages
(
id BIGSERIAL PRIMARY KEY,
package_name TEXT NOT NULL,
forge TEXT NOT NULL,
project_name TEXT,
repository TEXT,
created_at TIMESTAMP
);
CREATE TABLE package_versions
(
id BIGSERIAL PRIMARY KEY,
package_id BIGINT NOT NULL REFERENCES packages (id),
version TEXT NOT NULL,
cg_generator TEXT NOT NULL,
created_at TIMESTAMP,
metadata JSONB
);
CREATE TABLE dependencies
(
package_version_id BIGINT NOT NULL REFERENCES package_versions (id),
dependency_id BIGINT NOT NULL REFERENCES packages (id),
version_range TEXT[] NOT NULL,
metadata JSONB
);
CREATE TABLE modules
(
id BIGSERIAL PRIMARY KEY,
package_version_id BIGINT NOT NULL REFERENCES package_versions (id),
namespace TEXT NOT NULL,
created_at TIMESTAMP,
metadata JSONB
);
CREATE TABLE files
(
id BIGSERIAL PRIMARY KEY,
package_version_id BIGINT NOT NULL REFERENCES package_versions (id),
path TEXT NOT NULL,
checksum BYTEA,
created_at TIMESTAMP,
metadata JSONB
);
CREATE TABLE module_contents
(
module_id BIGINT NOT NULL REFERENCES modules (id),
file_id BIGINT NOT NULL REFERENCES files (id)
);
CREATE TABLE binary_modules
(
id BIGSERIAL PRIMARY KEY,
package_version_id BIGINT NOT NULL REFERENCES package_versions (id),
name TEXT NOT NULL,
created_at TIMESTAMP,
metadata JSONB
);
CREATE TABLE binary_module_contents
(
binary_module_id BIGINT NOT NULL REFERENCES binary_modules (id),
file_id BIGINT NOT NULL REFERENCES files (id)
);
CREATE TABLE callables
(
id BIGSERIAL PRIMARY KEY,
module_id BIGINT NOT NULL REFERENCES modules (id),
fasten_uri TEXT NOT NULL,
is_internal_call BOOLEAN NOT NULL,
created_at TIMESTAMP,
line_start INTEGER,
line_end INTEGER,
metadata JSONB
);
CREATE TYPE RECEIVER_TYPE AS ENUM ('static', 'dynamic', 'virtual', 'interface', 'special');
CREATE TYPE RECEIVER AS
(
line INTEGER,
type RECEIVER_TYPE,
receiver_uri TEXT
);
CREATE TABLE edges
(
source_id BIGINT NOT NULL REFERENCES callables (id),
target_id BIGINT NOT NULL REFERENCES callables (id),
receivers RECEIVER[] NOT NULL,
metadata JSONB
);
-- CREATE INDEX CONCURRENTLY package_versions_package_id ON package_versions USING btree (package_id);
-- CREATE INDEX CONCURRENTLY dependencies_package_version_id ON dependencies USING btree (package_version_id);
-- CREATE INDEX CONCURRENTLY dependencies_dependency_id ON dependencies USING btree (dependency_id);
-- CREATE INDEX CONCURRENTLY files_package_version_id ON files USING btree (package_version_id);
-- CREATE INDEX CONCURRENTLY modules_package_version_id ON modules USING btree (package_version_id);
-- CREATE INDEX CONCURRENTLY module_contents_module_id ON module_contents USING btree (module_id);
-- CREATE INDEX CONCURRENTLY module_contents_file_id ON module_contents USING btree (file_id);
-- CREATE INDEX CONCURRENTLY binary_modules_package_version_id ON binary_modules USING btree (package_version_id);
-- CREATE INDEX CONCURRENTLY binary_module_contents_binary_module_id ON binary_module_contents USING btree (binary_module_id);
-- CREATE INDEX CONCURRENTLY binary_module_contents_file_id ON binary_module_contents USING btree (file_id);
-- CREATE INDEX CONCURRENTLY callables_module_id ON callables USING btree (module_id);
-- CREATE INDEX CONCURRENTLY edges_source_id ON edges USING btree (source_id);
-- CREATE INDEX CONCURRENTLY edges_target_id ON edges USING btree (target_id);
CREATE UNIQUE INDEX CONCURRENTLY unique_package_forge ON packages USING btree (package_name, forge);
ALTER TABLE packages
ADD CONSTRAINT unique_package_forge UNIQUE USING INDEX unique_package_forge;
CREATE UNIQUE INDEX CONCURRENTLY unique_package_version_generator ON package_versions USING btree (package_id, version, cg_generator);
ALTER TABLE package_versions
ADD CONSTRAINT unique_package_version_generator UNIQUE USING INDEX unique_package_version_generator;
CREATE UNIQUE INDEX CONCURRENTLY unique_version_dependency_range ON dependencies USING btree (package_version_id, dependency_id, version_range);
ALTER TABLE dependencies
ADD CONSTRAINT unique_version_dependency_range UNIQUE USING INDEX unique_version_dependency_range;
CREATE UNIQUE INDEX CONCURRENTLY unique_version_namespace ON modules USING btree (package_version_id, namespace);
ALTER TABLE modules
ADD CONSTRAINT unique_version_namespace UNIQUE USING INDEX unique_version_namespace;
CREATE UNIQUE INDEX CONCURRENTLY unique_module_file ON module_contents USING btree (module_id, file_id);
ALTER TABLE module_contents
ADD CONSTRAINT unique_module_file UNIQUE USING INDEX unique_module_file;
CREATE UNIQUE INDEX CONCURRENTLY unique_version_name ON binary_modules USING btree (package_version_id, name);
ALTER TABLE binary_modules
ADD CONSTRAINT unique_version_name UNIQUE USING INDEX unique_version_name;
CREATE UNIQUE INDEX CONCURRENTLY unique_binary_module_file ON binary_module_contents USING btree (binary_module_id, file_id);
ALTER TABLE binary_module_contents
ADD CONSTRAINT unique_binary_module_file UNIQUE USING INDEX unique_binary_module_file;
CREATE UNIQUE INDEX CONCURRENTLY unique_version_path ON files USING btree (package_version_id, path);
ALTER TABLE files
ADD CONSTRAINT unique_version_path UNIQUE USING INDEX unique_version_path;
CREATE UNIQUE INDEX CONCURRENTLY unique_uri_call ON callables USING btree (module_id, fasten_uri, is_internal_call);
ALTER TABLE callables
ADD CONSTRAINT unique_uri_call UNIQUE USING INDEX unique_uri_call;
CREATE UNIQUE INDEX CONCURRENTLY unique_source_target ON edges USING btree (source_id, target_id);
ALTER TABLE edges
ADD CONSTRAINT unique_source_target UNIQUE USING INDEX unique_source_target;
ALTER TABLE callables
ADD CONSTRAINT check_module_id CHECK ((module_id = -1 AND is_internal_call IS false) OR
(module_id IS NOT NULL AND is_internal_call IS true));
INSERT INTO packages (id, package_name, forge)
VALUES (-1, 'external_callables_library', 'mvn')
ON CONFLICT DO NOTHING;
INSERT INTO package_versions (id, package_id, version, cg_generator)
VALUES (-1, -1, '0.0.1', 'OPAL')
ON CONFLICT DO NOTHING;
INSERT INTO modules (id, package_version_id, namespace)
VALUES (-1, -1, 'global_external_callables')
ON CONFLICT DO NOTHING;
The following need to be executed once on the master using the Postgres superuser (usually, postgres
) to initialise users and databases.
-- Create default read/write user and databases
create user fasten with encrypted password PASSWD; -- change me
ALTER USER fasten with CREATEDB;
create database fasten_java ENCODING 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8' TEMPLATE template0;
create database fasten_rust ENCODING 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8' TEMPLATE template0;
create database fasten_c ENCODING 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8' TEMPLATE template0;
create database fasten_python ENCODING 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8' TEMPLATE template0;
grant all privileges on database fasten_java to fasten;
grant all privileges on database fasten_rust to fasten;
grant all privileges on database fasten_c to fasten;
grant all privileges on database fasten_python to fasten;
-- Create a readonly user for querying
CREATE USER fastenro WITH ENCRYPTED PASSWORD 'fasten';
GRANT CONNECT ON DATABASE fasten_java TO fastenro;
GRANT CONNECT ON DATABASE fasten_rust TO fastenro;
GRANT CONNECT ON DATABASE fasten_c TO fastenro;
GRANT CONNECT ON DATABASE fasten_python TO fastenro;
ALTER DATABASE fasten_java OWNER to fasten;
ALTER DATABASE fasten_rust OWNER to fasten;
ALTER DATABASE fasten_c OWNER to fasten;
ALTER DATABASE fasten_python OWNER to fasten;
\c fasten_java
GRANT USAGE ON SCHEMA public TO fastenro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO fastenro;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO fastenro;
\c fasten_rust
GRANT USAGE ON SCHEMA public TO fastenro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO fastenro;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO fastenro;
\c fasten_c
GRANT USAGE ON SCHEMA public TO fastenro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO fastenro;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO fastenro;
\c fasten_python
GRANT USAGE ON SCHEMA public TO fastenro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO fastenro;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO fastenro;