Skip to content

Metadata Database Schema

Amir M. Mir edited this page Apr 1, 2020 · 41 revisions

This page will explain the schema that is used to store the metadata of the callgraphs.

Database

We decided to use PostgreSQL version 12 or higher because it has some very useful features that we utilize (like extensive JSON data type support).

Schema

Depicted below is the E/R diagram of the metadata database schema we have devised: FASTEN Metadata Database Schema

Explanation

  • The packages table contains basic information about the packages/products like name of the package, forge and etc.
  • Each package can have multiple versions, thus the table package_versions references packages and stores information like version of the package.
  • Each package version can have multiple dependencies to other packages. Therefore table dependencies references package_versions.id as source package and packages.id as dependent package and has a version_range which specifies acceptable versions of dependency.
  • Each package_version can have multiple modules in them. For some languages, these modules can be viewed as classes. Thus the table modules references package_versions and stores such attributes as the namespace of the module, SHA256 hash of the module, timestamp and metadata about the module in form of JSON. Currently, in a module's metadata, the following attributes are stored: superClasses, superInterfaces and sourceFile.
  • Each module can consist of multiple methods - callables. Therefore, table callables stores reference to modules, FASTEN URI of the method, boolean value if this callable is resolved/internal and timestamp with metadata. However, a reference to modules can be NULL, because, in the graph, there is no information about the module and package of the external call. Thus, when saving the graph, all external callables are inserted in the callables table without reference to modules and is_resolved_call set to false.
  • The table edges stores the graph's edges. Each edge has a source and target which both reference callables table allowing to store both internal and external calls. Each edge also stores some metadata about the call (like if the call invokes a virtual method and etc).

SQL to create database tables

Here is the SQL code that is used to create all tables described above in the diagram with all required keys and constraints.

CREATE TABLE packages
(
    id           BIGSERIAL PRIMARY KEY,
    package_name TEXT NOT NULL,
    forge        TEXT NOT NULL,
    project_name TEXT,
    repository   TEXT,
    created_at   TIMESTAMP
);

CREATE TABLE package_versions
(
    id           BIGSERIAL PRIMARY KEY,
    package_id   BIGINT NOT NULL REFERENCES packages (id),
    version      TEXT   NOT NULL,
    cg_generator TEXT   NOT NULL,
    created_at   TIMESTAMP,
    metadata     JSONB
);

CREATE TABLE dependencies
(
    package_version_id BIGINT NOT NULL REFERENCES package_versions (id),
    dependency_id      BIGINT NOT NULL REFERENCES packages (id),
    version_range      TEXT[] NOT NULL
);

CREATE TABLE modules
(
    id                 BIGSERIAL PRIMARY KEY,
    package_version_id BIGINT NOT NULL REFERENCES package_versions (id),
    namespaces         TEXT   NOT NULL,
    sha256             BYTEA,
    created_at         TIMESTAMP,
    metadata           JSONB
);

CREATE TABLE callables
(
    id               BIGSERIAL PRIMARY KEY,
    module_id        BIGINT REFERENCES modules (id),
    fasten_uri       TEXT    NOT NULL,
    is_resolved_call BOOLEAN NOT NULL,
    created_at       TIMESTAMP,
    metadata         JSONB
);

CREATE TABLE edges
(
    source_id BIGINT NOT NULL REFERENCES callables (id),
    target_id BIGINT NOT NULL REFERENCES callables (id),
    metadata  JSONB
);

CREATE INDEX packages_compound_index ON packages (package_name, forge);
CREATE INDEX package_versions_compound_index ON package_versions (package_id, version, cg_generator);
CREATE INDEX dependencies_compound_index ON dependencies (package_version_id, dependency_id, version_range);
CREATE INDEX modules_compound_index ON modules (package_version_id, namespaces);
CREATE INDEX callables_compound_index ON callables (fasten_uri, is_resolved_call);
CREATE INDEX edges_compound_index ON edges (source_id, target_id);