Add detailed repo docs and download instructions (#25)
* Add detailed repo docs and download instructions

* Fix timestamp

* Name the fields inside JSON objects of the DB

* Incorporate PR comments

* Fix typo

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
kuzdogan and coderabbitai[bot] authored Sep 16, 2024
1 parent bcb8b12 commit 34d39a1
Showing 9 changed files with 356 additions and 197 deletions.
111 changes: 111 additions & 0 deletions docs/4. repository/1. sourcify-database.mdx
@@ -0,0 +1,111 @@
# Sourcify Database

Sourcify Database is the main storage backend for Sourcify. It is a PostgreSQL database that follows the [Verifier Alliance Schema](https://github.com/verifier-alliance/database-specs) as its base, with a few modifications.

At a high level, these modifications are:
- Sourcify DB accepts contracts without deployment details such as `block_number` and `transaction_hash`, as well as contracts without an onchain creation bytecode (`contracts.creation_code_hash`).
- It stores the Solidity metadata separately in the `sourcify_matches` table.
- It introduces tables for other purposes.

You can follow the [`services/database/migrations`](https://github.com/ethereum/sourcify/tree/staging/services/database/migrations) folder for the initial schema and the changes made to it. These are not necessarily the differences between Sourcify DB and the Verifier Alliance Schema, but the changes made to the schema over time.

## Schema

You can access the live schema of the database [here](https://dbdiagram.io/d/Sourcify-DB-66e1a0076dde7f4149c77e3a) or in the embedded frame below.

<iframe src='https://dbdiagram.io/e/66e1a0076dde7f4149c77e3a/66e1a0196dde7f4149c78072' style={{width: "100%", height: "500px"}}> </iframe>

In short:
- Every verified contract is a coupling between a deployed contract (`contract_deployments`) and a compilation (`compiled_contracts`).
- "Transformations" are applied to the bytecode produced by a compilation to reach the final matching onchain bytecode.
- Contract bytecodes are "normalized" for deduplication. The bytecode of a popular contract like `ERC20.sol` will only be stored once.

For more information about the schemas of the JSON fields below, check the [Verifier Alliance repo](https://github.com/verifier-alliance/database-specs/tree/master/json-schemas).

JSON fields of `verified_contracts` table:
- `creation_values`
- `creation_transformations`
- `runtime_values`
- `runtime_transformations`

The transformations and values describe the operations applied to the bytecode from a compilation to reach the final matching onchain bytecode.

JSON fields of `compiled_contracts` table:
- `sources`: Source code files of a contract
- `compiler_settings`
- `compilation_artifacts`: Fields from the compilation output JSON. Fields: `abi`, `userdoc`, `devdoc`, `sources` (AST identifiers), `storageLayout`
- `creation_code_artifacts`: Fields under `evm.bytecode` field. Fields: `sourceMap`, `linkReferences`, `cborAuxdata`
- `runtime_code_artifacts`: Fields under `evm.deployedBytecode` field. Fields: `sourceMap`, `linkReferences`, `cborAuxdata`, `immutableReferences`

## Download

We dump the whole database daily in [Parquet](https://en.wikipedia.org/wiki/Apache_Parquet) format and upload it to Cloudflare R2 storage. You can access the manifest file at https://export.sourcify.dev (`.dev` redirects to the `.app` domain, which also belongs to Sourcify). The script that does the dump is at [sourcifyeth/parquet-export](https://github.com/sourcifyeth/parquet-export).


[export.sourcify.dev](https://export.sourcify.dev) will redirect to a `manifest.json` file:

<details>
<summary>manifest.json</summary>

```json
{
  "timestamp": 1726030203254,
  "dateStr": "2024-09-11T04:50:03.254904Z",
  "files": {
    "code": [
      "code/code_0_100000.parquet",
      "code/code_100000_200000.parquet",
      ...
      "code/code_2700000_2800000.parquet"
    ],
    "contracts": [
      "contracts/contracts_0_1000000.parquet",
      ...
      "contracts/contracts_4000000_5000000.parquet"
    ],
    "contract_deployments": [
      "contract_deployments/contract_deployments_0_1000000.parquet",
      ...
      "contract_deployments/contract_deployments_5000000_6000000.parquet"
    ],
    "compiled_contracts": [
      "compiled_contracts/compiled_contracts_0_5000.parquet",
      ...
      "compiled_contracts/compiled_contracts_815000_820000.parquet"
    ],
    "verified_contracts": [
      "verified_contracts/verified_contracts_0_1000000.parquet",
      ...
      "verified_contracts/verified_contracts_5000000_6000000.parquet"
    ],
    "sourcify_matches": [
      "sourcify_matches/sourcify_matches_0_100000.parquet",
      ...
      "sourcify_matches/sourcify_matches_5300000_5400000.parquet"
    ]
  }
}
```
</details>

You can download all the files and use a parquet client to query, inspect, or process the data.

1. Download the manifest file (`-L` to follow redirects):
```bash
curl -L -O https://export.sourcify.dev/manifest.json
```

2. Download all the tables listed in the manifest:
```bash
jq -r '.files | keys[] as $k | .[$k][]' manifest.json | xargs -I {} curl -L -O https://export.sourcify.dev/{}
```
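
The `jq` filter in step 2 flattens the file list of every table into a single list of paths. The same flattening can be sketched in JavaScript, for example as the first step of a custom downloader. This is a minimal sketch and not part of the Sourcify tooling: the function name `listExportUrls` and its `baseUrl` parameter are chosen here for illustration, and the manifest is assumed to have the shape shown in the example above.

```javascript
// Hypothetical helper: turn a parsed export manifest into download URLs.
// Assumes `manifest.files` maps table names to arrays of relative paths,
// as in the manifest.json example above.
function listExportUrls(manifest, baseUrl = "https://export.sourcify.dev") {
  const urls = [];
  for (const [table, paths] of Object.entries(manifest.files)) {
    for (const path of paths) {
      urls.push({ table, url: `${baseUrl}/${path}` });
    }
  }
  return urls;
}

// Usage with a trimmed-down manifest.
const sampleManifest = {
  timestamp: 1726030203254,
  files: {
    code: ["code/code_0_100000.parquet"],
    sourcify_matches: ["sourcify_matches/sourcify_matches_0_100000.parquet"],
  },
};
console.log(listExportUrls(sampleManifest));
```

Keeping the table name next to each URL makes it easy to download only the tables you need instead of the whole export.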

For example, you can install [`parquet-cli`](https://github.com/apache/parquet-java/blob/master/parquet-cli/README.md) for basic inspection:

```bash
brew install parquet-cli

parquet meta compiled_contracts_0_5000.parquet
```

Alternatively, use your favorite data processing tool or import the data into a database.
131 changes: 131 additions & 0 deletions docs/4. repository/2. file-repositories.mdx
@@ -0,0 +1,131 @@
import TotalRepoSize from "./TotalRepoSize"

# File Repositories

This page describes `RepositoryV1` and `RepositoryV2`, the file system based repositories. See [All Repositories](/docs/repository) for details.

## Table of Contents

- [RepositoryV1 vs RepositoryV2](#repositoryv1-vs-repositoryv2)
- [RepositoryV1](#repositoryv1)
- [RepositoryV2](#repositoryv2)
- [Download](#download)


## RepositoryV1 vs RepositoryV2

### RepositoryV1
RepositoryV1 is the legacy storage backend for files. It is simply a file system laid out according to the file paths given in the [Solidity metadata](/docs/metadata) file.

In an [example metadata](https://repo.sourcify.dev/contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/metadata.json) file, the source file paths look like this for the "full_match" contract `0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D` on Ethereum Mainnet (chain ID 1):
```json
{
  "sources": {
    "erc20/IERC20.sol": {
      "keccak256": "0xa38ec4e151e4d397d05bdfb94e6e4eb91e57a9fca3bc1c655289a4adf31a58fa",
      "license": "MIT",
      "urls": [
        "bzz-raw://312e850e36efbf0f2450896c213b23dc0a28150e051bcbf933a8b9211627c44b",
        "dweb:/ipfs/QmWsyisPjDwTJrTMhsGZa4JHiCS63mWfsyVQKbaijWGdmK"
      ]
    },
    "erc20/airdrop.sol": {
      "keccak256": "0xea27a3e2c4179a064caf9fe9a198addd526fd1d1ea467ea474a0c069e6eac957",
      "urls": [
        "bzz-raw://6a86bc69b99876768bdbddba504410cf60b33681e1203a36d98840bf2ab8a42b",
        "dweb:/ipfs/QmRZSqNfAPduoPoUJ6BM4NpBTbTKBqg5Mz5YBNpaUz4TfQ"
      ]
    }
  }
}
```

These files will be like below ([see in repo](https://repo.sourcify.dev/contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/)):
```
contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/
├── metadata.json
└── sources/
    └── erc20/
        ├── IERC20.sol
        └── airdrop.sol
```
The problem with this layout is that `"erc20/airdrop.sol"` is not necessarily a valid file path. It is a ["source unit name"](https://docs.soliditylang.org/en/v0.8.27/path-resolution.html#virtual-filesystem:~:text=assigned%20a%20unique-,source%20unit%20name,-which%20is%20an) in Solidity, which can be an arbitrary string. This may cause issues on file systems as well as when pinning to IPFS.

### RepositoryV2

RepositoryV2 is the format in which we replace the file names with the keccak256 hashes of the files (source files must have a `keccak256` field in the metadata). The example above would look like this:

```
contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/
├── metadata.json
└── sources/
    ├── 0xa38ec4e151e4d397d05bdfb94e6e4eb91e57a9fca3bc1c655289a4adf31a58fa.sol
    └── 0xea27a3e2c4179a064caf9fe9a198addd526fd1d1ea467ea474a0c069e6eac957.sol
```

The file contents are exactly the same, so their IPFS hashes do not change, and you can look up the metadata file to find the original path-like source unit names.
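
Looking up the original names can be sketched as a reverse mapping from the hashed file names to the source unit names in `metadata.json`. This is a minimal illustration, not Sourcify code: `mapHashedFilesToSourceNames` is a name made up here, and the `<keccak256>.sol` file naming is taken from the example tree above, so treat the extension as an assumption for anything beyond that example.

```javascript
// Hypothetical helper: map RepositoryV2 file names back to source unit names.
// Assumes every entry in `metadata.sources` has a `keccak256` field and that
// files are stored as `<keccak256>.sol`, as in the example tree above.
function mapHashedFilesToSourceNames(metadata) {
  const byFileName = {};
  for (const [sourceUnitName, info] of Object.entries(metadata.sources)) {
    byFileName[`${info.keccak256}.sol`] = sourceUnitName;
  }
  return byFileName;
}

// Usage with the `sources` section of the example metadata above.
const exampleMetadata = {
  sources: {
    "erc20/IERC20.sol": {
      keccak256: "0xa38ec4e151e4d397d05bdfb94e6e4eb91e57a9fca3bc1c655289a4adf31a58fa",
    },
    "erc20/airdrop.sol": {
      keccak256: "0xea27a3e2c4179a064caf9fe9a198addd526fd1d1ea467ea474a0c069e6eac957",
    },
  },
};
console.log(mapHashedFilesToSourceNames(exampleMetadata));
```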

## IPFS

Unfortunately, publishing under IPNS is temporarily disabled. This is because of the difficulty of managing the whole filesystem over IPFS (with MFS etc.) and updating the IPNS regularly.

We still pin all the files on IPFS so you can access them over their individual CIDs (e.g. [`QmVij3h9z536ZG5cRpUmTfdoN9KR1Xp4ix2P7to9dPHgE5`](https://ipfs.io/ipfs/QmVij3h9z536ZG5cRpUmTfdoN9KR1Xp4ix2P7to9dPHgE5)).

Look at the [Download section](#download) to learn how to download the whole repository.

## Web

Moved to [repo.sourcify.dev](/docs/repository/repo.sourcify.dev).

## Download

We compress **RepositoryV2** weekly and publish it on Cloudflare R2 under https://repo-backup.sourcify.dev (`.dev` redirects to the `.app` domain, which also belongs to Sourcify).

<TotalRepoSize/>

[repo-backup.sourcify.dev](https://repo-backup.sourcify.dev) will redirect to a `manifest.json` file:

<details>
<summary>manifest.json</summary>

```json
{
  "description": "Manifest file for when the Sourcify file repository was uploaded",
  "timestamp": 1726030203254,
  "dateStr": "2024-09-11T04:50:03.254904Z",
  "files": [
    {
      "path": "sourcify-repository-2024-09-10T13-36-47/sourcify-repository-2024-09-10T13-36-47.part.gz.aa",
      "sizeInBytes": 2097152000
    },
    ...
    {
      "path": "sourcify-repository-2024-09-10T13-36-47/sourcify-repository-2024-09-10T13-36-47.part.gz.ap",
      "sizeInBytes": 800472503
    }
  ]
}
```
</details>

You can download all files in the `files` array and unzip them:

1. Download the manifest file (`-L` to follow redirects):
```bash
curl -L -O https://repo-backup.sourcify.dev/manifest.json
```

2. Extract file paths and download each file:
```bash
jq -r '.files[].path' manifest.json | xargs -I {} curl -L -O https://repo-backup.sourcify.dev/{}
```

3. Concatenate the downloaded parts:
```bash
cat sourcify-repository-*.part.gz.* > sourcify-repository.gz
```

4. Unzip the concatenated file:
```bash
tar -xvzf sourcify-repository.gz
```
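
Because the archive is split into parts, an incomplete download may only show up when `tar` fails at the end. One way to catch it earlier is to compare the size of each downloaded part with the `sizeInBytes` value from the manifest before concatenating. The sketch below is illustrative only: `findSizeMismatches` and the `actualSizes` argument are names made up here, and the sample sizes are invented.

```javascript
// Hypothetical helper: report manifest entries whose downloaded part is
// missing or has an unexpected size. `actualSizes` maps a local file name
// (the last segment of `path`, which is what `curl -O` saves) to its size
// in bytes.
function findSizeMismatches(manifest, actualSizes) {
  const mismatches = [];
  for (const file of manifest.files) {
    const fileName = file.path.split("/").pop();
    const actual = actualSizes[fileName] ?? null; // null means not downloaded
    if (actual !== file.sizeInBytes) {
      mismatches.push({ fileName, expected: file.sizeInBytes, actual });
    }
  }
  return mismatches;
}

// Usage with invented sizes: the second part is truncated.
const backupManifest = {
  files: [
    { path: "backup/backup.part.gz.aa", sizeInBytes: 2097152000 },
    { path: "backup/backup.part.gz.ab", sizeInBytes: 800472503 },
  ],
};
console.log(
  findSizeMismatches(backupManifest, {
    "backup.part.gz.aa": 2097152000,
    "backup.part.gz.ab": 1024,
  })
);
```

In Node.js the sizes on disk can be collected with `fs.statSync(fileName).size`. An empty result means every part matches the manifest.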
46 changes: 46 additions & 0 deletions docs/4. repository/3. repo.sourcify.dev.mdx
@@ -0,0 +1,46 @@

# repo.sourcify.dev

[repo.sourcify.dev](https://repo.sourcify.dev) is an interface to the Sourcify contract file repository `RepositoryV1`.

The code is available at [sourcifyeth/h5ai-nginx](https://github.com/sourcifyeth/h5ai-nginx). For performance reasons, it is not possible to navigate the folders above the contract level. You need to know in advance which contract you are looking for.

:::tip Lookup

Instead of entering the chain, you can check an address across all chains at https://sourcify.dev/#/lookup
:::

The contracts are accessible under the following path format:

```
https://repo.sourcify.dev/contracts/:match/:chainId/:contractAddress
```

- `:match`: either `full_match` or `partial_match`
- `:chainId`: EVM chain ID, e.g. `1` for Ethereum Mainnet, `5` for the Ethereum testnet Görli, etc. See [chainlist.org](https://chainlist.org)
- `:contractAddress`: e.g. `0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be`
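
The path format can also be assembled programmatically. The following is a small sketch under the assumptions stated in its comments: `buildRepoUrl` is a hypothetical helper name, and the only validation it performs is on the two match types listed above.

```javascript
// Hypothetical helper: build a repo.sourcify.dev URL from the path format
// /contracts/:match/:chainId/:contractAddress described above.
function buildRepoUrl(match, chainId, contractAddress, baseUrl = "https://repo.sourcify.dev") {
  if (match !== "full_match" && match !== "partial_match") {
    throw new Error(`Unknown match type: ${match}`);
  }
  return `${baseUrl}/contracts/${match}/${chainId}/${contractAddress}`;
}

console.log(buildRepoUrl("full_match", 1, "0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be"));
// prints https://repo.sourcify.dev/contracts/full_match/1/0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be
```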

### Examples

Here are some example contracts:

- https://repo.sourcify.dev/contracts/full_match/1/0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be
- https://repo.sourcify.dev/contracts/full_match/1/0xca2ad74003502af6B727e846Fab40D6cb8Da0035
- https://repo.sourcify.dev/contracts/full_match/100/0x4f15a6e74CFC2F80D5967a8aB75F3c83D8043cF4
- https://repo.sourcify.dev/contracts/partial_match/1/0xb857F1f4014A0C45C287667148417b6799Fe594E/
- (staging) https://repo.staging.sourcify.dev/contracts/partial_match/69/0xb50cBeeFBCE78cDe83F184B275b5E80c4f01006A/sources/

### View Source Code in Remix IDE

It is possible to view the contract folder in the Remix IDE by clicking "View in Remix".

Allow the Sourcify plugin on the next screen in Remix IDE (might take a while to load). The contract folder will be available under `verified-sources/<contract-address>` in the Remix file explorer.

![Sourcify repository screenshot](/img/sourcify-repo.png)

### Download folders

You can download the whole folder by clicking the download icon at the top left.

Alternatively, you can select which files or folders to download by ticking their checkmarks and then clicking the download icon.

![Sourcify repository screenshot](/img/sourcify-repo-download.png)
52 changes: 52 additions & 0 deletions docs/4. repository/TotalRepoSize.jsx
@@ -0,0 +1,52 @@
import React, { useState, useEffect } from "react";
import LoadingOverlay from "../../src/components/LoadingOverlay";

const RepositoryStats = () => {
  const [isLoading, setIsLoading] = useState(true);
  const [totalSize, setTotalSize] = useState(0);
  const [timestamp, setTimestamp] = useState("");

  useEffect(() => {
    const fetchStats = async () => {
      const manifestUrl = "https://repo-backup.sourcify.app/manifest.json";

      try {
        const manifestResponse = await fetch(manifestUrl);
        const manifestData = await manifestResponse.json();

        const totalSizeBytes = manifestData.files.reduce((acc, file) => acc + file.sizeInBytes, 0);
        const totalSizeGB = totalSizeBytes / (1024 * 1024 * 1024); // Convert to GB

        setTotalSize(totalSizeGB);

        const date = new Date(manifestData.timestamp);
        const formattedDate = date
          .toUTCString()
          .replace(/^[A-Za-z]+, /, "")
          .replace(/:\d{2} /, " ");
        setTimestamp(formattedDate);
      } catch (error) {
        console.error("Error fetching manifest:", error);
      } finally {
        setIsLoading(false);
      }
    };

    fetchStats();
  }, []);

  if (isLoading) {
    return <LoadingOverlay message="Calculating the repository size..." />;
  }

  return (
    <div>
      <p>
        As of {timestamp} the <strong>compressed</strong> size of the repository files is:{" "}
        <strong>{totalSize.toFixed(2)} GB</strong>
      </p>
    </div>
  );
};

export default RepositoryStats;
16 changes: 16 additions & 0 deletions docs/4. repository/index.mdx
@@ -0,0 +1,16 @@
# Contract Repository

Sourcify stores contracts in multiple storage backends and lets you choose which ones to use. In short, the options are:

- `RepositoryV1`
- `RepositoryV2`
- `SourcifyDatabase`
- `AllianceDatabase`

For details see [Choosing the storage backend](https://github.com/ethereum/sourcify/tree/staging/services/server#choosing-the-storage-backend).

## Download

You can download the whole contract file repository as compressed archives, or the Sourcify database in Parquet format. Follow the guide on each page:
- [Download RepositoryV2](/docs/repository/file-repositories/#download)
- [Download SourcifyDatabase](/docs/repository/sourcify-database/#download)
File renamed without changes.