[Improve Documentation] HCS-1 files and hashinals are not stored on-chain #12

Open
5 tasks
regcs opened this issue Jul 6, 2024 · 2 comments


regcs commented Jul 6, 2024

TL;DR

HCS-1 aims to be a new decentralized file storage protocol on Hedera, promising, for example, the ability to create "truly on-chain NFTs" such as Hashinals. In practice, it currently relies on centralized cloud providers and Hedera's mirror node network. Hedera’s consensus nodes do not store historical hashgraph data (including HCS-1 files), so HCS-1 files are stored only at two centralized cloud providers and on mirror nodes. Neither offers guarantees of file permanence or storage decentralization. Such guarantees are available on truly decentralized storage networks like Filecoin, Arweave, Storj, Sia, etc. The improvements suggested here include clearer documentation and extending mirror nodes with proof-of-storage and proof-of-replication mechanisms.

Detailed improvement proposal

Motivation

HCS-1 is currently advertised as a decentralized file storage protocol on Hedera. The motivation was to create Hedera-native file storage that keeps files on Hedera, improving file permanence and reducing reliance on third-party storage providers. This is meant to enable use cases like Hashinals, which are advertised as "entirely on-chain NFTs" inscribed to the hashgraph. While this vision sounds promising, the current network architecture of Hedera does not live up to these promises and leaves users with a wrong impression. Neither the social media posts of the vocal figures promoting HCS-1 and Hashinals nor the documentation currently reflects the technical fundamentals of Hedera and their impact on HCS-1 and Hashinals.

Hedera network

The Hedera hashgraph consists of two node types: consensus nodes and mirror nodes.

Consensus nodes run the consensus mechanism and store the current state of all entities (i.e., accounts, Hedera File Service files, HCS topics, smart contracts, etc.). They do not store historical states of the hashgraph. In particular, consensus nodes do not store HCS messages older than 3 minutes. Instead, consensus nodes push historical transactions to centralized cloud providers (AWS and Google Cloud) and then discard the data. Consequently, HCS-1 files do not live on the consensus nodes.

Mirror nodes may or may not query this data, and they may store the full hashgraph history or any arbitrary subset of it. There is no obligation or guarantee to store any particular file. Currently, the full history of the hashgraph (mainnet) has a size of ~50 TB, most of which has accumulated over the past 1.5 years, since atma.io went live on mainnet. Most mirror nodes currently store the full history. It is foreseeable, however, that with a growing (most likely exponentially growing) history size and a growing number of use cases, full mirror nodes might become rare, and new cost structures will form around specialized mirror node types (e.g., mirror nodes only providing account balances, specific HCS topics, etc.). This raises the question of what impact this will have on the storage cost, accessibility, and decentralization of files stored via HCS-1.

File Storage on Hedera

The standard way to store files on Hedera is the Hedera File Service, which stores files directly on consensus nodes. They are thus part of the current state of the hashgraph and, in that sense, "truly on-chain files". These are not HCS-1 files.

The proposed way to store files is HCS-1, which stores files by chunking them and submitting the chunks to HCS topics as separate, ordered messages. This data is not stored on consensus nodes. It is stored only in the hashgraph history, which means AWS and Google Cloud. From there it may be replicated to mirror nodes, which are free to decide whether or not they store a particular file. As outlined above, this will become increasingly important as the hashgraph history grows.
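To make the chunking idea concrete, here is a minimal sketch of splitting a file into ordered chunk records and rebuilding it from them. It is an illustration only and not the HCS-1 wire format: the record fields `order` and `content`, the hex encoding, the `MAX_CHUNK_BYTES` value, and both function names are assumptions made for this example. No Hedera SDK calls are involved.

```python
import hashlib

# Assumed per-message payload budget for this sketch; not taken from the HCS-1 spec.
MAX_CHUNK_BYTES = 1024


def chunk_file(data: bytes, chunk_size: int = MAX_CHUNK_BYTES) -> list[dict]:
    """Split data into ordered chunk records, one per hypothetical topic message."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        # Hex keeps the record text-safe; it doubles the size of the payload.
        {"order": i, "content": data[start:start + chunk_size].hex()}
        for i, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(messages: list[dict], expected_sha256: str) -> bytes:
    """Rebuild the file from chunk records in any arrival order.

    The digest check catches any missing, duplicated, or altered chunk.
    """
    ordered = sorted(messages, key=lambda m: m["order"])
    data = b"".join(bytes.fromhex(m["content"]) for m in ordered)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("reassembled file does not match the expected digest")
    return data
```

The sketch shows why retention matters: a reader can only rebuild the file if whoever serves the topic history still holds every single chunk.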

Missing proof-of-storage & proof-of-replication for Hashinals and HCS-1 files

NFTs on Hedera were traditionally created by minting an NFT via the Hedera Token Service (HTS), which linked to the associated file data on the InterPlanetary File System (IPFS). IPFS is a free, decentralized storage network without a consensus mechanism. Nodes on IPFS are free to decide which data they store and which data they prune from their disks. Usually, frequently accessed files are more likely to be stored on multiple nodes and for the long term. However, because there is no consensus mechanism, there is no obligation and no guarantee to store any particular file. In that sense, the IPFS network and the Hedera mirror node network provide the same (low) storage and decentralization guarantees for files.

Decentralized storage blockchains like Filecoin, Arweave, Storj, Sia, etc. emerged over the past 10 years to solve this issue. They add a consensus mechanism to decentralized file storage, which usually guarantees that multiple nodes:

  • store a specific file by proving to each other that they know what's in the file
  • store the same file by proving its data integrity (file hashes)

This guarantees the decentralization, instant accessibility, and data integrity of files. Mirror nodes on Hedera currently do not provide these guarantees.
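As a rough illustration of what "proving that a node knows what's in the file" can mean, the following toy challenge-response check has a verifier send a fresh random nonce and a storing node answer with a hash over the nonce and the file. This is a simplified sketch under stated assumptions, not the scheme used by any of the networks named above: the function names are hypothetical, and the verifier is assumed to hold its own reference copy of the file, which production networks avoid with considerably more elaborate proofs.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Verifier side: a fresh random nonce, so old responses cannot be replayed."""
    return secrets.token_bytes(32)


def respond(file_bytes: bytes, nonce: bytes) -> str:
    """Prover side: hash the nonce followed by the full file contents.

    Because the nonce comes first and is unpredictable, the answer cannot
    be precomputed and stored in place of the file.
    """
    return hashlib.sha256(nonce + file_bytes).hexdigest()


def verify(reference_copy: bytes, nonce: bytes, response: str) -> bool:
    """Verifier side: recompute the expected answer from a reference copy."""
    expected = hashlib.sha256(nonce + reference_copy).hexdigest()
    return hmac.compare_digest(expected, response)
```

A node that pruned even a single byte fails the check, which is the kind of property mirror nodes are not currently required to demonstrate.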

Hedera network issues

Despite the fundamental advantages of Hedera, the network architecture has some shortcomings compared to other DLTs. For most use cases, these are rational compromises accepted to keep Hedera's exceptional performance. Furthermore, these compromises do not have a significant practical impact for most use cases. What are these shortcomings?

  1. Hedera has no guaranteed decentralization of the complete hashgraph history. At the moment, we have to trust the centralized cloud providers AWS and Google Cloud to store the hashgraph history (including HCS-1 files and Hashinals). This adds a layer of centralization to the network history and, in particular, to HCS-1 files and Hashinals.
  2. Hedera mirror nodes have no proof-of-storage, which would guarantee that a particular mirror node stores a specific file version over a guaranteed time period.
  3. Hedera mirror nodes have no proof-of-replication, which would guarantee that multiple mirror nodes store the same file.

Note: Point 1 seems to be currently addressed by Swirlds Labs, which is developing a new Hedera node type called the block node. Based on the scarce publicly available information, these nodes will guarantee decentralized storage of the complete hashgraph history. However, it seems this will have no impact on points 2 and 3.

Actionable points to improve HCS

regcs changed the title from "[Improve Documentation] HCS-1 files are not on-chain and not safer than IPFS" to "[Improve Documentation] HCS-1 files are not on-chain and provide no better storage guarentees than IPFS" on Jul 6, 2024
regcs changed the title from "[Improve Documentation] HCS-1 files are not on-chain and provide no better storage guarentees than IPFS" to "[Improve Documentation] HCS-1 files and hashinals are not stored on-chain" on Jul 6, 2024

teacoat commented Jul 6, 2024

This PR seems to operate under the assumption that mirrornodes are where data lives on chain - this is not the case, nor is anyone involved operating as if that were the case. Mirrornodes do not need a consensus mechanism, as they are read-only portals to on chain data that anyone can set up. This has been explained multiple times but continues to be ignored for some reason.

The concerns around where the event stream is stored is a separate issue that should be brought up with swirlds regarding decentralization of chain data, and is not an action item relating to this project, though as already explained they are working on a solution for that already.


regcs commented Jul 6, 2024

This PR seems to operate under the assumption that mirrornodes are where data lives on chain ...

What? This draft states the exact opposite. True on-chain data in a strict definition (required for files) lives on consensus nodes. If I take the time to write this piece, please take the time to thoroughly read and understand it. Thank you!

... this is not the case, nor is anyone involved operating as if that were the case.

Logic 1-on-1:

  • We know that HCS-1 files and Hashinals are stored in the record stream and may be copied by mirror nodes

  • The HCS-1 and Hashinals standard documents claim that HCS-1 files and Hashinals are "truly on-chain"

  • That is, these standard documents state that mirror node data is truly on-chain; in addition, some of the people involved stated this directly, and we discussed it a lot

Mirrornodes do not need a consensus mechanism, as they are read-only portals to on chain data that anyone can set up. This has been explained multiple times but continues to be ignored for some reason.

It is dishonest to state that this has been ignored. We discussed this multiple times. But let's discuss it again, maybe in a different mode. Please answer:

  1. Would you trust IPFS with storing your files decentralized and long-term no matter what file it is?

  2. If I store a file today on Filecoin/Arweave/Storj/etc. and on HCS-1, what exact guarantees do I have in both cases that my file will be immediately retrievable in 1 year from at least 5 independent nodes in each case? (Assuming both networks still exist of course, which is likely)

The concerns around where the event stream is stored is a separate issue that should be brought up with swirlds regarding decentralization of chain data, and is not an action item relating to this project, though as already explained they are working on a solution for that already.

As highlighted in the document, we agree here. Therefore it is not an action item in this document, in case you thought it would be one for some reason.

kantorcodes added and then removed the "invalid" label on Jul 6, 2024