Add detailed repo docs and download instructions (#25)
* Add detailed repo docs and download instructions
* Fix timestamp
* Name the fields inside JSON objects of the DB
* Incorporate PR comments
* Fix typo

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent bcb8b12, commit 34d39a1

Showing 9 changed files with 356 additions and 197 deletions.
@@ -0,0 +1,111 @@ | ||
# Sourcify Database | ||
|
||
Sourcify Database is the main storage backend for Sourcify. It is a PostgreSQL database based on the [Verifier Alliance Schema](https://github.com/verifier-alliance/database-specs), with a few modifications.
|
||
At a high level, these modifications are:
- Sourcify DB accepts contracts without deployment details such as `block_number` and `transaction_hash`, as well as contracts without an onchain creation bytecode (`contracts.creation_code_hash`).
- Sourcify DB stores the Solidity metadata separately in the `sourcify_matches` table.
- Sourcify DB introduces additional tables for other purposes.
|
||
See the [`services/database/migrations`](https://github.com/ethereum/sourcify/tree/staging/services/database/migrations) folder for the initial schema and the changes made to it. These migrations are not necessarily the differences between Sourcify DB and the Verifier Alliance Schema; they record the changes made to the schema over time.
|
||
## Schema | ||
|
||
You can access the live schema of the database [here](https://dbdiagram.io/d/Sourcify-DB-66e1a0076dde7f4149c77e3a) or in the embedded frame below. | ||
|
||
<iframe src='https://dbdiagram.io/e/66e1a0076dde7f4149c77e3a/66e1a0196dde7f4149c78072' style={{width: "100%", height: "500px"}}> </iframe> | ||
|
||
In short: | ||
- Every verified contract is a coupling between a deployed contract (`contract_deployments`) and a compilation (`compiled_contracts`).
- "Transformations" are applied to the bytecode produced by a compilation to reach the final matching onchain bytecode.
- Contract bytecodes are "normalized" for deduplication. The bytecode of a popular contract like `ERC20.sol` is stored only once.
|
||
For more information about the schemas of the JSON fields below, check the [Verifier Alliance repo](https://github.com/verifier-alliance/database-specs/tree/master/json-schemas).
|
||
JSON fields of `verified_contracts` table: | ||
- `creation_values` | ||
- `creation_transformations` | ||
- `runtime_values` | ||
- `runtime_transformations` | ||
|
||
The transformations and values describe the operations applied to the bytecode from a compilation to reach the final matching onchain bytecode.
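To make the idea concrete, here is a minimal sketch of applying transformations to a recompiled bytecode. The exact formats are defined by the Verifier Alliance JSON schemas linked above; the `{ type, offset, value }` shape and the `applyTransformations` helper below are simplified, hypothetical stand-ins used only for illustration, not the schema itself.

```javascript
// Hypothetical, simplified transformation shape (NOT the real schema):
//   { type: "replace" | "insert", offset: <byte offset>, value: <hex string> }
function applyTransformations(bytecodeHex, transformations) {
  const strip = (h) => (h.startsWith("0x") ? h.slice(2) : h);
  let hex = strip(bytecodeHex);
  for (const t of transformations) {
    const value = strip(t.value);
    const start = t.offset * 2; // two hex characters per byte
    if (t.type === "replace") {
      // Overwrite the bytes at the offset (e.g. a placeholder left by the compiler)
      hex = hex.slice(0, start) + value + hex.slice(start + value.length);
    } else if (t.type === "insert") {
      // Splice new bytes in at the offset (e.g. data appended after the code)
      hex = hex.slice(0, start) + value + hex.slice(start);
    } else {
      throw new Error(`Unknown transformation type: ${t.type}`);
    }
  }
  return "0x" + hex;
}

// Toy example: overwrite two placeholder bytes starting at byte offset 1.
const transformed = applyTransformations("0x6000000000", [
  { type: "replace", offset: 1, value: "0xaabb" },
]);
console.log(transformed); // 0x60aabb0000
```

If the transformed bytecode equals the onchain bytecode, the compilation and the deployment match.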
|
||
JSON fields of `compiled_contracts` table: | ||
- `sources`: Source code files of a contract | ||
- `compiler_settings` | ||
- `compilation_artifacts`: Fields from the compilation output JSON. Fields: `abi`, `userdoc`, `devdoc`, `sources` (AST identifiers), `storageLayout` | ||
- `creation_code_artifacts`: Fields under `evm.bytecode` field. Fields: `sourceMap`, `linkReferences`, `cborAuxdata` | ||
- `runtime_code_artifacts`: Fields under `evm.deployedBytecode` field. Fields: `sourceMap`, `linkReferences`, `cborAuxdata`, `immutableReferences` | ||
|
||
## Download | ||
|
||
We dump the whole database daily in [Parquet](https://en.wikipedia.org/wiki/Apache_Parquet) format and upload it to Cloudflare R2 storage. You can access the manifest file at https://export.sourcify.dev (`.dev` redirects to the `.app` domain, which also belongs to Sourcify). The script that performs the dump is at [sourcifyeth/parquet-export](https://github.com/sourcifyeth/parquet-export).
|
||
|
||
[export.sourcify.dev](https://export.sourcify.dev) will redirect to a `manifest.json` file: | ||
|
||
<details> | ||
<summary>manifest.json</summary> | ||
|
||
```json | ||
{ | ||
"timestamp": 1726030203254, | ||
"dateStr": "2024-09-11T04:50:03.254904Z", | ||
"files": { | ||
"code": [ | ||
"code/code_0_100000.parquet", | ||
"code/code_100000_200000.parquet", | ||
... | ||
"code/code_2700000_2800000.parquet" | ||
], | ||
"contracts": [ | ||
"contracts/contracts_0_1000000.parquet", | ||
... | ||
"contracts/contracts_4000000_5000000.parquet" | ||
], | ||
"contract_deployments": [ | ||
"contract_deployments/contract_deployments_0_1000000.parquet", | ||
... | ||
"contract_deployments/contract_deployments_5000000_6000000.parquet" | ||
], | ||
"compiled_contracts": [ | ||
"compiled_contracts/compiled_contracts_0_5000.parquet", | ||
... | ||
"compiled_contracts/compiled_contracts_815000_820000.parquet" | ||
], | ||
"verified_contracts": [ | ||
"verified_contracts/verified_contracts_0_1000000.parquet", | ||
... | ||
"verified_contracts/verified_contracts_5000000_6000000.parquet" | ||
], | ||
"sourcify_matches": [ | ||
"sourcify_matches/sourcify_matches_0_100000.parquet", | ||
... | ||
"sourcify_matches/sourcify_matches_5300000_5400000.parquet" | ||
] | ||
} | ||
} | ||
``` | ||
</details> | ||
|
||
You can download all the files and use a parquet client to query, inspect, or process the data. | ||
|
||
1. Download the manifest file (`-L` to follow redirects): | ||
```bash | ||
curl -L -O https://export.sourcify.dev/manifest.json | ||
``` | ||
|
||
2. Download all the tables listed in the manifest: | ||
```bash | ||
jq -r '.files | keys[] as $k | .[$k][]' manifest.json | xargs -I {} curl -L -O https://export.sourcify.dev/{} | ||
``` | ||
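The same manifest traversal can be sketched in JavaScript, which may be handy if you want to drive the downloads from a script. This is only an illustration: `listExportUrls` and `sampleManifest` are hypothetical names, and the sample is an abbreviated manifest in the shape shown above.

```javascript
// Flatten the manifest's per-table file lists into absolute download URLs.
// `baseUrl` defaults to the export host named above.
function listExportUrls(manifest, baseUrl = "https://export.sourcify.dev") {
  const urls = [];
  for (const [table, paths] of Object.entries(manifest.files)) {
    for (const path of paths) {
      urls.push({ table, url: `${baseUrl}/${path}` });
    }
  }
  return urls;
}

// Abbreviated sample manifest (illustrative, not the full file list).
const sampleManifest = {
  timestamp: 1726030203254,
  files: {
    code: ["code/code_0_100000.parquet", "code/code_100000_200000.parquet"],
    contracts: ["contracts/contracts_0_1000000.parquet"],
  },
};

const exportUrls = listExportUrls(sampleManifest);
for (const { table, url } of exportUrls) {
  console.log(`${table}: ${url}`);
}
```

Keeping the table name next to each URL makes it easy to download only the tables you need.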
|
||
For example, you can install [`parquet-cli`](https://github.com/apache/parquet-java/blob/master/parquet-cli/README.md) for basic inspection:
|
||
```bash | ||
brew install parquet-cli | ||
|
||
parquet meta compiled_contracts_0_5000.parquet | ||
``` | ||
|
||
Alternatively, use your favorite data processing tool or import the data into a database.
@@ -0,0 +1,131 @@ | ||
import TotalRepoSize from "./TotalRepoSize" | ||
|
||
# File Repositories | ||
|
||
This page describes `RepositoryV1` and `RepositoryV2`, the two file-system-based repositories. See [All Repositories](/docs/repository) for details.
|
||
## Table of Contents | ||
|
||
- [RepositoryV1 vs RepositoryV2](#repositoryv1-vs-repositoryv2)
  - [RepositoryV1](#repositoryv1)
  - [RepositoryV2](#repositoryv2)
- [IPFS](#ipfs)
- [Web](#web)
- [Download](#download)
|
||
|
||
## RepositoryV1 vs RepositoryV2 | ||
|
||
### RepositoryV1 | ||
RepositoryV1 is the legacy storage backend for files. It is simply a file system laid out according to the file paths given in the [Solidity metadata](/docs/metadata) file.
|
||
The source file paths in an [example metadata](https://repo.sourcify.dev/contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/metadata.json) file look like this for the "full_match" contract `0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D` on Ethereum Mainnet (chain ID 1):
```json
{
  "sources": {
    "erc20/IERC20.sol": {
      "keccak256": "0xa38ec4e151e4d397d05bdfb94e6e4eb91e57a9fca3bc1c655289a4adf31a58fa",
      "license": "MIT",
      "urls": [
        "bzz-raw://312e850e36efbf0f2450896c213b23dc0a28150e051bcbf933a8b9211627c44b",
        "dweb:/ipfs/QmWsyisPjDwTJrTMhsGZa4JHiCS63mWfsyVQKbaijWGdmK"
      ]
    },
    "erc20/airdrop.sol": {
      "keccak256": "0xea27a3e2c4179a064caf9fe9a198addd526fd1d1ea467ea474a0c069e6eac957",
      "urls": [
        "bzz-raw://6a86bc69b99876768bdbddba504410cf60b33681e1203a36d98840bf2ab8a42b",
        "dweb:/ipfs/QmRZSqNfAPduoPoUJ6BM4NpBTbTKBqg5Mz5YBNpaUz4TfQ"
      ]
    }
  }
}
```
|
||
These files are stored as follows ([see in repo](https://repo.sourcify.dev/contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/)):
```
contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/
├── metadata.json
└── sources/
    └── erc20/
        ├── IERC20.sol
        └── airdrop.sol
```
The problem is that `"erc20/airdrop.sol"` is not necessarily a valid file path. It is a ["source unit name"](https://docs.soliditylang.org/en/v0.8.27/path-resolution.html#virtual-filesystem:~:text=assigned%20a%20unique-,source%20unit%20name,-which%20is%20an) in Solidity, which can be an arbitrary string. This may cause issues on file systems as well as when pinning to IPFS.
|
||
### RepositoryV2 | ||
|
||
RepositoryV2 is the format in which we normalize file names by replacing them with the files' keccak256 hashes (source files must have a `keccak256` field in the metadata). The example above would look like this:
|
||
```
contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/
├── metadata.json
└── sources/
    ├── 0xa38ec4e151e4d397d05bdfb94e6e4eb91e57a9fca3bc1c655289a4adf31a58fa.sol
    └── 0xea27a3e2c4179a064caf9fe9a198addd526fd1d1ea467ea474a0c069e6eac957.sol
```
|
||
The file contents are exactly the same, so their IPFS hashes do not change, and you can look up the metadata file to find the original path-like source unit names.
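The lookup from source unit names to RepositoryV2 file names can be sketched as follows. This is a minimal illustration that assumes the `sources/<keccak256>.sol` naming shown in the tree above; `mapSourcesToV2Paths` is a hypothetical helper, not part of Sourcify.

```javascript
// Map each source unit name in a metadata object to its RepositoryV2 file path,
// assuming the `sources/<keccak256>.sol` naming shown in the example tree.
function mapSourcesToV2Paths(metadata) {
  const mapping = {};
  for (const [unitName, info] of Object.entries(metadata.sources)) {
    if (!info.keccak256) {
      throw new Error(`Source "${unitName}" has no keccak256 field`);
    }
    mapping[unitName] = `sources/${info.keccak256}.sol`;
  }
  return mapping;
}

// Metadata excerpt from the example contract above.
const exampleMetadata = {
  sources: {
    "erc20/IERC20.sol": {
      keccak256: "0xa38ec4e151e4d397d05bdfb94e6e4eb91e57a9fca3bc1c655289a4adf31a58fa",
    },
    "erc20/airdrop.sol": {
      keccak256: "0xea27a3e2c4179a064caf9fe9a198addd526fd1d1ea467ea474a0c069e6eac957",
    },
  },
};

const v2Paths = mapSourcesToV2Paths(exampleMetadata);
console.log(v2Paths["erc20/airdrop.sol"]);
```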
|
||
## IPFS | ||
|
||
Unfortunately, publishing under IPNS is temporarily disabled because of the difficulty of managing the whole file system over IPFS (with MFS etc.) and updating IPNS regularly.
|
||
We still pin all the files on IPFS, so you can access them via their individual CIDs (e.g. [`QmVij3h9z536ZG5cRpUmTfdoN9KR1Xp4ix2P7to9dPHgE5`](https://ipfs.io/ipfs/QmVij3h9z536ZG5cRpUmTfdoN9KR1Xp4ix2P7to9dPHgE5)).
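For source files, the CIDs can be read from the `dweb:/ipfs/...` entries in the metadata's `urls` arrays. The sketch below turns them into gateway links; `ipfsGatewayUrls` is a hypothetical helper, and the `ipfs.io` gateway is just the one used in the link above (any public gateway should work the same way).

```javascript
// Build a gateway URL for each source that lists a `dweb:/ipfs/<CID>` entry.
function ipfsGatewayUrls(metadata, gateway = "https://ipfs.io/ipfs/") {
  const prefix = "dweb:/ipfs/";
  const result = {};
  for (const [unitName, info] of Object.entries(metadata.sources)) {
    const entry = (info.urls || []).find((u) => u.startsWith(prefix));
    if (entry) {
      result[unitName] = gateway + entry.slice(prefix.length);
    }
  }
  return result;
}

// Excerpt of the example metadata shown earlier on this page.
const metadataExcerpt = {
  sources: {
    "erc20/IERC20.sol": {
      urls: [
        "bzz-raw://312e850e36efbf0f2450896c213b23dc0a28150e051bcbf933a8b9211627c44b",
        "dweb:/ipfs/QmWsyisPjDwTJrTMhsGZa4JHiCS63mWfsyVQKbaijWGdmK",
      ],
    },
  },
};

const gatewayLinks = ipfsGatewayUrls(metadataExcerpt);
console.log(gatewayLinks["erc20/IERC20.sol"]);
```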
|
||
Look at the [Download section](#download) to learn how to download the whole repository. | ||
|
||
## Web | ||
|
||
Moved to [repo.sourcify.dev](/docs/repository/repo.sourcify.dev). | ||
|
||
## Download | ||
|
||
We compress **RepositoryV2** weekly and publish it on Cloudflare R2 under https://repo-backup.sourcify.dev (`.dev` redirects to the `.app` domain, which also belongs to Sourcify).
|
||
<TotalRepoSize/> | ||
|
||
[repo-backup.sourcify.dev](https://repo-backup.sourcify.dev) will redirect to a `manifest.json` file: | ||
|
||
<details> | ||
<summary>manifest.json</summary> | ||
|
||
```json | ||
{ | ||
"description": "Manifest file for when the Sourcify file repository was uploaded", | ||
"timestamp": 1726030203254, | ||
"dateStr": "2024-09-11T04:50:03.254904Z", | ||
"files": [ | ||
{ | ||
"path": "sourcify-repository-2024-09-10T13-36-47/sourcify-repository-2024-09-10T13-36-47.part.gz.aa", | ||
"sizeInBytes": 2097152000 | ||
}, | ||
... | ||
{ | ||
"path": "sourcify-repository-2024-09-10T13-36-47/sourcify-repository-2024-09-10T13-36-47.part.gz.ap", | ||
"sizeInBytes": 800472503 | ||
} | ||
] | ||
} | ||
``` | ||
</details> | ||
|
||
You can download all files in the `files` array and extract them:
|
||
1. Download the manifest file (`-L` to follow redirects): | ||
```bash | ||
curl -L -O https://repo-backup.sourcify.dev/manifest.json | ||
``` | ||
|
||
2. Extract file paths and download each file: | ||
```bash | ||
jq -r '.files[].path' manifest.json | xargs -I {} curl -L -O https://repo-backup.sourcify.dev/{} | ||
``` | ||
|
||
3. Concatenate the downloaded parts: | ||
```bash | ||
cat sourcify-repository-*.part.gz.* > sourcify-repository.gz | ||
``` | ||
|
||
4. Extract the concatenated archive:
```bash | ||
tar -xvzf sourcify-repository.gz | ||
``` |
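Because the parts are large, it can be worth checking that every part was downloaded completely before concatenating. The sketch below compares local file sizes against the `sizeInBytes` values in the manifest. `checkParts` and the `localSizes` map (file name to size in bytes, which you could gather with `fs.statSync`) are hypothetical names for illustration, and the sizes in the example are made up.

```javascript
// Compare the manifest's expected part sizes with the sizes found on disk.
// Returns a list of human-readable problems; an empty list means all parts match.
function checkParts(manifest, localSizes) {
  const problems = [];
  for (const { path, sizeInBytes } of manifest.files) {
    // `curl -O` saves each part under the last path segment
    const name = path.split("/").pop();
    if (!(name in localSizes)) {
      problems.push(`${name}: missing`);
    } else if (localSizes[name] !== sizeInBytes) {
      problems.push(`${name}: expected ${sizeInBytes} bytes, found ${localSizes[name]}`);
    }
  }
  return problems;
}

// Illustrative manifest with two parts; the second part was not downloaded.
const backupManifest = {
  files: [
    { path: "backup/backup.part.gz.aa", sizeInBytes: 2097152000 },
    { path: "backup/backup.part.gz.ab", sizeInBytes: 800472503 },
  ],
};
const partProblems = checkParts(backupManifest, { "backup.part.gz.aa": 2097152000 });
console.log(partProblems);
```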
@@ -0,0 +1,46 @@ | ||
|
||
# repo.sourcify.dev | ||
|
||
[repo.sourcify.dev](https://repo.sourcify.dev) is an interface to the Sourcify contract file repository `RepositoryV1`. | ||
|
||
The code is available at [sourcifyeth/h5ai-nginx](https://github.com/sourcifyeth/h5ai-nginx). For performance reasons, it is not possible to navigate the folders above the contract level. You need to know in advance which contract you are looking for.
:::tip Lookup | ||
|
||
Instead of entering the chain, you can check an address over all chains at https://sourcify.dev/#/lookup | ||
::: | ||
|
||
The contracts are accessible under the following path format: | ||
|
||
``` | ||
https://repo.sourcify.dev/contracts/:match/:chainId/:contractAddress | ||
``` | ||
|
||
- `:match`: either `full_match` or `partial_match` | ||
- `:chainId`: EVM chain ID, e.g. `1` for Ethereum Mainnet or `5` for the Ethereum testnet Görli. See [chainlist.org](https://chainlist.org)
- `:contractAddress`: e.g. `0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be` | ||
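The path format above can be assembled programmatically. The following is a minimal sketch; `repoUrl` is a hypothetical helper, and the address check only validates the general `0x` plus 40 hex characters shape.

```javascript
// Build a repo.sourcify.dev URL from the three path parameters described above.
function repoUrl(match, chainId, contractAddress) {
  if (match !== "full_match" && match !== "partial_match") {
    throw new Error(`match must be "full_match" or "partial_match", got "${match}"`);
  }
  if (!/^0x[0-9a-fA-F]{40}$/.test(contractAddress)) {
    throw new Error(`Not a valid contract address: ${contractAddress}`);
  }
  return `https://repo.sourcify.dev/contracts/${match}/${chainId}/${contractAddress}`;
}

const exampleRepoUrl = repoUrl("full_match", 1, "0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be");
console.log(exampleRepoUrl);
```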
|
||
### Examples | ||
|
||
Here are some example contracts: | ||
|
||
- https://repo.sourcify.dev/contracts/full_match/1/0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be | ||
- https://repo.sourcify.dev/contracts/full_match/1/0xca2ad74003502af6B727e846Fab40D6cb8Da0035 | ||
- https://repo.sourcify.dev/contracts/full_match/100/0x4f15a6e74CFC2F80D5967a8aB75F3c83D8043cF4 | ||
- https://repo.sourcify.dev/contracts/partial_match/1/0xb857F1f4014A0C45C287667148417b6799Fe594E/ | ||
- (staging) https://repo.staging.sourcify.dev/contracts/partial_match/69/0xb50cBeeFBCE78cDe83F184B275b5E80c4f01006A/sources/ | ||
|
||
### View Source Code in Remix IDE | ||
|
||
It is possible to view the contract folder in the Remix IDE by clicking "View in Remix". | ||
|
||
Allow the Sourcify plugin on the next screen in Remix IDE (might take a while to load). The contract folder will be available under `verified-sources/<contract-address>` in the Remix file explorer. | ||
|
||
![Sourcify repository screenshot](/img/sourcify-repo.png) | ||
|
||
### Download folders | ||
|
||
You can download the whole folder by clicking the download icon at the top left.
|
||
Alternatively, you can select which files/folders to download by ticking their checkboxes and then clicking the download icon.
|
||
![Sourcify repository screenshot](/img/sourcify-repo-download.png) |
@@ -0,0 +1,52 @@ | ||
import React, { useState, useEffect } from "react"; | ||
import LoadingOverlay from "../../src/components/LoadingOverlay"; | ||
|
||
const RepositoryStats = () => { | ||
const [isLoading, setIsLoading] = useState(true); | ||
const [totalSize, setTotalSize] = useState(0); | ||
const [timestamp, setTimestamp] = useState(""); | ||
|
||
useEffect(() => { | ||
const fetchStats = async () => { | ||
const manifestUrl = "https://repo-backup.sourcify.app/manifest.json"; | ||
|
||
try { | ||
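const manifestResponse = await fetch(manifestUrl);
// Treat non-2xx responses as errors instead of trying to parse them as JSON
if (!manifestResponse.ok) {
throw new Error(`Failed to fetch manifest: HTTP ${manifestResponse.status}`);
}
const manifestData = await manifestResponse.json();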
|
||
const totalSizeBytes = manifestData.files.reduce((acc, file) => acc + file.sizeInBytes, 0); | ||
const totalSizeGB = totalSizeBytes / (1024 * 1024 * 1024); // Convert to GB | ||
|
||
setTotalSize(totalSizeGB); | ||
|
||
const date = new Date(manifestData.timestamp); | ||
const formattedDate = date | ||
.toUTCString() | ||
.replace(/^[A-Za-z]+, /, "") | ||
.replace(/:\d{2} /, " "); | ||
setTimestamp(formattedDate); | ||
} catch (error) { | ||
console.error("Error fetching manifest:", error); | ||
} finally { | ||
setIsLoading(false); | ||
} | ||
}; | ||
|
||
fetchStats(); | ||
}, []); | ||
|
||
if (isLoading) { | ||
return <LoadingOverlay message="Calculating the repository size..." />; | ||
} | ||
|
||
return ( | ||
<div> | ||
<p> | ||
As of {timestamp} the <strong>compressed</strong> size of the repository files is:{" "} | ||
<strong>{totalSize.toFixed(2)} GB</strong> | ||
</p> | ||
</div> | ||
); | ||
}; | ||
|
||
export default RepositoryStats; |
@@ -0,0 +1,16 @@ | ||
# Contract Repository | ||
|
||
Sourcify stores contracts in multiple storage backends and lets you choose which one to use. In short, the options are:
|
||
- `RepositoryV1` | ||
- `RepositoryV2` | ||
- `SourcifyDatabase` | ||
- `AllianceDatabase` | ||
|
||
For details see [Choosing the storage backend](https://github.com/ethereum/sourcify/tree/staging/services/server#choosing-the-storage-backend). | ||
|
||
## Download | ||
|
||
You can download the whole contract file repository as compressed archives or the Sourcify database in Parquet format. Follow the guide on each page:
- [Download RepositoryV2](/docs/repository/file-repositories/#download) | ||
- [Download SourcifyDatabase](/docs/repository/sourcify-database/#download) |
File renamed without changes.