Transactions are at the heart of Bitcoin. Transactions, simply put, are value transfers from one entity to another. We’ll see in Chapter 6 how "entities" in this case are really smart contracts. But we’re getting ahead of ourselves. Let’s first look at what transactions in Bitcoin are, what they look like and how they are parsed.
At a high level, a transaction really only has 4 components. They are:
- Version
- Inputs
- Outputs
- Locktime
At this point a general overview of these fields might be helpful. Version defines what the format of the transaction is supposed to be, Inputs define what bitcoins are being spent, Outputs define where the bitcoins are going and Locktime defines what time this transaction starts being valid. We’ll go through each component in depth.
Here is a hexadecimal dump of a typical transaction that shows which parts are which.
The differently highlighted parts represent Version, Inputs, Outputs and Locktime respectively.
With this in mind, we can start constructing the Transaction class, which we’ll call Tx:
class Tx:

    def __init__(self, version, tx_ins, tx_outs, locktime):
        self.version = version
        self.tx_ins = tx_ins  # (1)
        self.tx_outs = tx_outs
        self.locktime = locktime

    def __repr__(self):
        tx_ins = ''
        for tx_in in self.tx_ins:
            tx_ins += tx_in.__repr__() + '\n'
        tx_outs = ''
        for tx_out in self.tx_outs:
            tx_outs += tx_out.__repr__() + '\n'
        return 'version: {}\ntx_ins:\n{}\ntx_outs:\n{}\nlocktime: {}\n'.format(
            self.version,
            tx_ins,
            tx_outs,
            self.locktime,
        )
(1) Input and Output are very generic terms, so we name the fields tx_ins and tx_outs to make clear that they are transaction inputs and outputs. We’ll define the specific object types they are later.
The rest of this chapter will be concerned with parsing a transaction. This is very useful to know as this teaches you about the actual serialized format. You may at this point write some code like this:
class Tx:
    ...

    @classmethod  # (1)
    def parse(cls, serialization):
        version = serialization[0:4]  # (2)
        ...

(1) This method would have to be a class method as the serialization would have to be converted to a Tx object.
(2) We assume here that the variable serialization is a byte array.
This could definitely work, but practically speaking, the byte size of the transaction may be very large. Ideally, we want to be able to parse from a stream instead. This way we don’t need the entire serialized transaction before we start parsing, and from an engineering standpoint, that allows us to fail early and be more efficient. Thus, the code for parsing a transaction will look more like this:
class Tx:
    ...

    @classmethod
    def parse(cls, stream):
        serialized_version = stream.read(4)  # (1)
        ...

(1) The read method will allow us to parse on the fly as we won’t have to wait on I/O.

This is advantageous from an engineering perspective as the stream can be a socket connection on the network or a file handle. Our method will be able to handle any sort of stream and return us the Tx object that we need.
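As a minimal sketch of the stream-based approach, the snippet below wraps raw bytes in io.BytesIO, which exposes the same read method as a file handle. The helper name read_exact is hypothetical and is not part of the chapter’s code; it is only here to illustrate failing early when a stream ends before we have the bytes we need.

```python
from io import BytesIO


def read_exact(stream, n):
    '''Read exactly n bytes from a stream, or raise if it ends early.

    Hypothetical helper for illustration; not part of the chapter's code.
    '''
    data = stream.read(n)
    if len(data) != n:
        raise ValueError('expected {} bytes, got {}'.format(n, len(data)))
    return data


# Any object with a read method works here; BytesIO stands in for a file handle.
stream = BytesIO(bytes.fromhex('0100000001'))
serialized_version = read_exact(stream, 4)  # consumes only the first 4 bytes
rest = stream.read()                        # whatever follows is still unread
```

Because only the requested bytes are consumed, the remainder of the stream is left in place for whatever field is parsed next.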
When you see a version number in something, it’s meant to give the receiver information about what the versioned thing is supposed to represent. If, for example, you are running Windows 3.1, that’s a version number that’s very different from Windows 8 or Windows 10. You could specify just "Windows", but specifying the version number after the operating system helps you know what features it has and what APIs you can program against.
In a similar way, Bitcoin transactions have version numbers. In Bitcoin’s case, the transaction version is generally 1, but there are cases where it can be 2 (transactions utilizing an opcode called OP_CHECKSEQUENCEVERIFY, as defined in BIP0112, which builds on the relative lock-time rules of BIP0068).
You may notice here that the actual value in hexadecimal is 01000000, which doesn’t look like 1. It actually is 1 when interpreted as a little-endian integer (recall the discussion from chapter 4).
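A quick way to convince ourselves of this is to decode the four bytes both ways. This sketch uses Python’s built-in int.from_bytes rather than the helper from chapter 4.

```python
serialized_version = bytes.fromhex('01000000')

# Little-endian: the least significant byte comes first.
version = int.from_bytes(serialized_version, 'little')

# Read the other way around (big-endian), the same bytes give a very different number.
as_big_endian = int.from_bytes(serialized_version, 'big')
```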
Each input is an output of a previous transaction. This last sentence requires more explanation as it’s not intuitively obvious at first.
Bitcoin’s inputs are outputs of another transaction. That is, there is some entity that needs to have sent you the Bitcoins for you to be able to spend them. This makes intuitive sense. You cannot spend money unless someone gave you some first. The inputs are there to show which bitcoins have been given to you. There are two things that each input needs to have:
- A reference to the previous Bitcoins you own
- Proof that these are yours to spend
It is the second part that ends up needing ECDSA. You don’t want people to be able to forge this, so inputs contain signatures which only the owner of the private key can produce.
The inputs field can contain more than one input. This is analogous to paying for a 70 dollar meal with either a single 100 dollar bill, or a 50 and a 20. The former only requires one input ("bill"); the latter requires 2. There are situations where there could be even more. In our analogy, we can pay for a 70 dollar meal with 14 5-dollar bills or even 7000 pennies. This would be analogous to 14 inputs or 7000 inputs.
The number of inputs is the next part of the transaction:
We can see that the byte is actually 01, which means that this transaction has 1 input. It may be tempting here to assume that it’s always a single byte, but it’s not. A single byte has 8 bits, so this means that anything over 255 inputs would not be expressible in a single byte.
This is where varint comes in. Varint is shorthand for variable integer, which is a way to encode an integer in the range 0 to 2^64 - 1 into bytes. We could, of course, always reserve 8 bytes for the number of inputs, but that would be a lot of wasted space if we expect the number of inputs to be a relatively small number (say under 200). This is the case with the number of inputs in a normal transaction, so utilizing a varint helps to save space. You can see how they work in the sidebar.
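Note: Varint
Variable integers work by these rules:
- If the number is below 253 (0xfd), encode it as a single byte.
- If the number is between 253 and 2^16 - 1, start with the byte 0xfd and then encode the number in 2 bytes, little-endian.
- If the number is between 2^16 and 2^32 - 1, start with the byte 0xfe and then encode the number in 4 bytes, little-endian.
- If the number is between 2^32 and 2^64 - 1, start with the byte 0xff and then encode the number in 8 bytes, little-endian.
Two functions are very helpful here, as we’ll be using them more as we keep parsing different fields in Bitcoin:
def read_varint(s):
    '''read_varint reads a variable integer from a stream'''
    i = s.read(1)[0]
    if i == 0xfd:
        # 0xfd means the next two bytes are the number
        return little_endian_to_int(s.read(2))
    elif i == 0xfe:
        # 0xfe means the next four bytes are the number
        return little_endian_to_int(s.read(4))
    elif i == 0xff:
        # 0xff means the next eight bytes are the number
        return little_endian_to_int(s.read(8))
    else:
        # anything else is just the integer
        return i

def encode_varint(i):
    '''encodes an integer as a varint'''
    if i < 0xfd:
        return bytes([i])
    elif i < 0x10000:
        return b'\xfd' + int_to_little_endian(i, 2)
    elif i < 0x100000000:
        return b'\xfe' + int_to_little_endian(i, 4)
    elif i < 0x10000000000000000:
        return b'\xff' + int_to_little_endian(i, 8)
    else:
        raise RuntimeError('integer too large: {}'.format(i))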
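To see the encoding in action, here is a self-contained round trip. The helpers little_endian_to_int and int_to_little_endian are assumed to behave like the ones from chapter 4; they are re-implemented here with int.from_bytes and int.to_bytes so the example runs on its own, and the sample values are arbitrary.

```python
from io import BytesIO


def little_endian_to_int(b):
    # Assumed equivalent of the chapter 4 helper
    return int.from_bytes(b, 'little')


def int_to_little_endian(n, length):
    # Assumed equivalent of the chapter 4 helper
    return n.to_bytes(length, 'little')


def read_varint(s):
    i = s.read(1)[0]
    if i == 0xfd:
        return little_endian_to_int(s.read(2))
    elif i == 0xfe:
        return little_endian_to_int(s.read(4))
    elif i == 0xff:
        return little_endian_to_int(s.read(8))
    return i


def encode_varint(i):
    if i < 0xfd:
        return bytes([i])
    elif i < 0x10000:
        return b'\xfd' + int_to_little_endian(i, 2)
    elif i < 0x100000000:
        return b'\xfe' + int_to_little_endian(i, 4)
    elif i < 0x10000000000000000:
        return b'\xff' + int_to_little_endian(i, 8)
    raise ValueError('integer too large: {}'.format(i))


# One value from each size class; each must survive a round trip.
for n in (100, 255, 555, 70015, 2**32):
    encoded = encode_varint(n)
    assert read_varint(BytesIO(encoded)) == n
    print(n, encoded.hex())
```

Small numbers cost a single byte, and the prefix byte is only paid once the value no longer fits.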
Each input contains 4 fields: the first two point to the previous transaction output, and the other two define how it can be spent. These are as follows:
- Previous Transaction ID
- Previous Transaction Index
- ScriptSig
- Sequence
As explained above, each input is actually a previous transaction’s output. The previous transaction id is the double_sha256 of the previous transaction’s contents in little-endian order. This uniquely defines the previous transaction, as the probability of a hash collision is very, very small. As we’ll see below, each transaction has to have at least 1 output, but may have many. Thus, we need to define exactly which output within a transaction we’re spending; that is the job of the previous transaction index.
Note that the transaction id is 32 bytes and the transaction index is 4 bytes. Both are in little-endian order.
ScriptSig has to do with Bitcoin’s smart contract language SCRIPT, and will be discussed more thoroughly in chapter 6. For now, think of ScriptSig as opening a lock box, something that can only be done by the owner of the transaction output. The ScriptSig field is a variable-length field, not a fixed-length field like most of what we’ve seen so far. A variable-length field requires us to define exactly how long it will be, which is why the field is preceded by a varint telling us how long it is.
Sequence was originally intended as a way to do payment channels (see sidebar), but is currently used with Replace-By-Fee and Check Sequence Verify. This field is also in little-endian and takes up 4 bytes. The resulting transaction looks something like this:
Originally, Satoshi wanted sequence and locktime to be used for something called payment channels. A payment channel is a way to do payments back and forth with another party without making lots and lots of on-chain transactions. For example, if Alice pays Bob x Bitcoins for something and then Bob pays Alice y Bitcoins for something else (say x > y), then we can have Alice just pay Bob x-y, instead of making two separate transactions on-chain. We could do the same thing if Alice and Bob had 100 transactions between them. We can essentially compress a bunch of transactions into a single transaction.
That’s the idea of a payment channel. It’s a continuously updating mini-ledger between the two parties involved that gets settled on-chain. Satoshi’s idea was to utilize sequence and locktime to update the payment channel transaction every time there is a new payment between the two parties. The payment channel transaction would have two inputs, one from Alice and one from Bob, and two outputs, one to Alice and one to Bob. The payment channel transaction would start with sequence at 0 and a far away locktime (say 500 blocks from now), so it would become valid in 500 blocks. This would be the base transaction where Alice and Bob get the same amounts as they put in.
After the first transaction where Alice pays Bob x Bitcoins, the sequence of each input would be 1 and the locktime earlier (say 499 blocks from now). After the second transaction where Bob pays Alice y Bitcoins, the sequence of each input would be 2 and the locktime even earlier (say 498 blocks from now). Using this method, we could have up to 500 payments compressed into a single on-chain transaction.
Unfortunately, as clever as this is, it turns out that it’s quite easy for a miner to cheat. In our example, Bob could be a miner and ignore the updated payment channel transaction with sequence number 2 and mine the payment channel transaction with sequence number 1 and cheat Alice out of y Bitcoins.
Now that we know what the fields are, we can start creating a TxIn class in Python:
class TxIn:

    def __init__(self, prev_tx, prev_index, script_sig, sequence):
        self.prev_tx = prev_tx
        self.prev_index = prev_index
        self.script_sig = script_sig
        self.sequence = sequence

    def __repr__(self):  # (1)
        return '{}:{}'.format(
            self.prev_tx.hex(),
            self.prev_index,
        )

(1) Generally, we’re going to want to know the previous input when we print the TxIn.
A couple things to note here. The amount of each input is actually not specified. We have no idea how much is being spent unless we actually look up the transaction. Furthermore, we don’t even know if the transaction is unlocking the right box, so to speak, without knowing about the previous transaction. Every node must verify that this transaction is actually unlocking the right box and that it’s not spending Bitcoins that don’t exist.
We’ll delve more deeply into how Script is parsed in the next chapter, but for now, here’s how you get a Script object from hexadecimal in Python:
>>> from io import BytesIO
>>> from script import Script
>>> script_hex = '6b483045..8a'
>>> stream = BytesIO(bytes.fromhex(script_hex))
>>> script_raw = Script.parse(stream)
>>> print(script_raw)
3045...01 0349...8a
The details of how Script is parsed are covered in Chapter 6.
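Putting the four input fields together, reading a single input from a stream might look roughly like the sketch below. The function name parse_tx_in_fields is hypothetical, the ScriptSig is kept as raw bytes instead of being handed to Script.parse, and the sample bytes are made up rather than taken from a real transaction.

```python
from io import BytesIO


def read_varint(s):
    '''Same rules as the varint sidebar.'''
    i = s.read(1)[0]
    if i == 0xfd:
        return int.from_bytes(s.read(2), 'little')
    elif i == 0xfe:
        return int.from_bytes(s.read(4), 'little')
    elif i == 0xff:
        return int.from_bytes(s.read(8), 'little')
    return i


def parse_tx_in_fields(s):
    '''Hypothetical helper: read one input's four fields from a stream.'''
    prev_tx = s.read(32)[::-1]                        # 32 bytes, little-endian on the wire
    prev_index = int.from_bytes(s.read(4), 'little')  # 4 bytes, little-endian
    script_sig_raw = s.read(read_varint(s))           # varint length, then that many bytes
    sequence = int.from_bytes(s.read(4), 'little')    # 4 bytes, little-endian
    return prev_tx, prev_index, script_sig_raw, sequence


# Made-up input: txid bytes 00..1f, index 1, a 2-byte placeholder script, sequence 0xfffffffe.
raw = (bytes(range(32)) + bytes.fromhex('01000000')
       + bytes.fromhex('02abcd') + bytes.fromhex('feffffff'))
prev_tx, prev_index, script_sig_raw, sequence = parse_tx_in_fields(BytesIO(raw))
```

The four values line up with the arguments of the TxIn constructor, except that a real parser would turn the raw ScriptSig bytes into a Script object.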
As hinted in the previous section, outputs define where the bitcoins are actually going. We must have at least one output and can have lots of outputs. An exchange may batch transactions, for example, and pay out a lot of people at once instead of generating a single transaction for every single person that requests Bitcoins.
Like inputs, the transaction serialization starts with how many outputs there are as a varint.
Outputs each have two fields: amount and ScriptPubKey. Amount is the amount of bitcoin being assigned and is specified in satoshis, or 1/100,000,000th of a Bitcoin. This allows us to divide Bitcoin very finely, down to 1/100th of a penny in USD terms as of this writing. The absolute maximum for the amount is the asymptotic limit of Bitcoin (21 million bitcoins) in satoshis, which is 2,100,000,000,000,000 (2.1 quadrillion) satoshis. This number is greater than 2^32 (4.3 billion or so) and thus has to be stored in more than 4 bytes. This is why amount takes up 8 bytes and is serialized in little-endian.
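The arithmetic behind the 8-byte amount can be checked directly. The constant names below are ours, chosen for illustration, and the 1.5 bitcoin amount is an arbitrary example.

```python
SATOSHIS_PER_BITCOIN = 100_000_000
MAX_BITCOINS = 21_000_000

max_satoshis = MAX_BITCOINS * SATOSHIS_PER_BITCOIN

fits_in_4_bytes = max_satoshis < 2**32  # too small: 2**32 is only about 4.3 billion
fits_in_8_bytes = max_satoshis < 2**64  # plenty of room

# Serialize 1.5 bitcoins (150,000,000 satoshis) as an 8-byte little-endian amount.
amount = 150_000_000
serialized = amount.to_bytes(8, 'little')
```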
ScriptPubKey is much like ScriptSig in that it has to do with Bitcoin’s smart contract language SCRIPT. Think of ScriptPubKey as the lock box that can only be opened by the holder of the key. ScriptPubKey is essentially a one-way safe that can receive deposits from anyone, but can only be opened by the owner of the safe. We’ll explore what this is in more detail in chapter 6. Like ScriptSig, ScriptPubKey is a variable-length field and is thus preceded by the length of the field as a varint.
The actual output fields look like this
UTXO stands for Unspent Transaction Output. The entire set of unspent transaction outputs at any given moment is called the UTXO Set. The reason why UTXOs are important is because they represent all the actual Bitcoins that are available to spend. In other words, these are the Bitcoins that are in circulation. Full nodes on the network keep track of the UTXO set because this makes validating new transactions much, much easier.
For example, it’s very easy to detect a double-spend simply by looking up the previous transaction output in the UTXO set. If the input of a new transaction is using a transaction output that’s been spent already, that’s an attempt at a double-spend and thus invalid. Keeping the UTXO set handy is also very useful for validating transactions. We need to look up the amounts and the ScriptPubKey from the previous transaction output, so having these UTXOs handy will greatly speed up transaction validation.
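A toy version of this lookup is sketched below, assuming the UTXO set is an in-memory dictionary keyed by previous transaction id and output index. Real nodes use a database and far more careful bookkeeping; the function name spend and all the data here are made up for illustration.

```python
# Hypothetical in-memory UTXO set keyed by (previous tx id, output index).
# Values are (amount in satoshis, ScriptPubKey bytes); all data below is made up.
utxo_set = {
    ('aa' * 32, 0): (50_000, b'\x51'),
    ('aa' * 32, 1): (20_000, b'\x51'),
}


def spend(utxo_set, prev_tx, prev_index):
    '''Remove and return an unspent output, or raise if it is not in the set.'''
    key = (prev_tx, prev_index)
    if key not in utxo_set:
        raise ValueError('output missing or already spent: {}:{}'.format(prev_tx, prev_index))
    return utxo_set.pop(key)


amount, script_pubkey = spend(utxo_set, 'aa' * 32, 0)  # first spend succeeds
try:
    spend(utxo_set, 'aa' * 32, 0)                       # second spend of the same output
    double_spend_detected = False
except ValueError:
    double_spend_detected = True
```

The same lookup that rejects the double-spend also hands back the amount and ScriptPubKey needed for the rest of validation.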
We can now start coding the TxOut class given what we know.
class TxOut:

    def __init__(self, amount, script_pubkey):
        self.amount = amount
        self.script_pubkey = script_pubkey

    def __repr__(self):
        return '{}:{}'.format(self.amount, self.script_pubkey)
Locktime is a way to time-delay a transaction. A transaction with a locktime of 600,000 cannot go into the blockchain until block 600,001. This was originally conceived as a way to do payment channels (see sidebar). The rule with locktime is that if the locktime is greater than or equal to 500,000,000, locktime is a Unix timestamp. If locktime is less than 500,000,000, it is a block number. This way, transactions can be signed but remain unspendable until a certain point in time or block height.
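The block-number-versus-timestamp rule can be sketched in a few lines. The names LOCKTIME_THRESHOLD and locktime_kind are ours, and this sketch assumes that a value of exactly 500,000,000 counts as a timestamp. The sample bytes are the last four bytes of the exercise transaction later in this chapter.

```python
LOCKTIME_THRESHOLD = 500_000_000


def locktime_kind(locktime):
    '''Return 'block' if locktime is a block number, 'timestamp' otherwise.

    Assumption: a value exactly at the threshold is treated as a timestamp.
    '''
    return 'block' if locktime < LOCKTIME_THRESHOLD else 'timestamp'


# Locktime is serialized as 4 bytes, little-endian.
locktime = int.from_bytes(bytes.fromhex('46430600'), 'little')
kind = locktime_kind(locktime)
```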
The serialization is little-endian and 4 bytes, like so:
Locktime by itself turns out not to be that useful, as the recipient of the transaction has no certainty that the transaction will be good when the locktime comes. The sender can spend the inputs before the locktime transaction gets into the blockchain, invalidating the transaction at locktime.
That said, BIP0065 introduced something called OP_CHECKLOCKTIMEVERIFY which makes use of locktime in a more useful way by making an output unspendable until a certain locktime.
What are the ScriptSig of the second input, the ScriptPubKey of the first output and the amount of the second output for this transaction?
010000000456919960ac691763688d3d3bcea9ad6ecaf875df5339e148a1fc61c6ed7a069e010000006a47304402204585bcdef85e6b1c6af5c2669d4830ff86e42dd205c0e089bc2a821657e951c002201024a10366077f87d6bce1f7100ad8cfa8a064b39d4e8fe4ea13a7b71aa8180f012102f0da57e85eec2934a82a585ea337ce2f4998b50ae699dd79f5880e253dafafb7feffffffeb8f51f4038dc17e6313cf831d4f02281c2a468bde0fafd37f1bf882729e7fd3000000006a47304402207899531a52d59a6de200179928ca900254a36b8dff8bb75f5f5d71b1cdc26125022008b422690b8461cb52c3cc30330b23d574351872b7c361e9aae3649071c1a7160121035d5c93d9ac96881f19ba1f686f15f009ded7c62efe85a872e6a19b43c15a2937feffffff567bf40595119d1bb8a3037c356efd56170b64cbcc160fb028fa10704b45d775000000006a47304402204c7c7818424c7f7911da6cddc59655a70af1cb5eaf17c69dadbfc74ffa0b662f02207599e08bc8023693ad4e9527dc42c34210f7a7d1d1ddfc8492b654a11e7620a0012102158b46fbdff65d0172b7989aec8850aa0dae49abfb84c81ae6e5b251a58ace5cfeffffffd63a5e6c16e620f86f375925b21cabaf736c779f88fd04dcad51d26690f7f345010000006a47304402200633ea0d3314bea0d95b3cd8dadb2ef79ea8331ffe1e61f762c0f6daea0fabde022029f23b3e9c30f080446150b23852028751635dcee2be669c2a1686a4b5edf304012103ffd6f4a67e94aba353a00882e563ff2722eb4cff0ad6006e86ee20dfe7520d55feffffff0251430f00000000001976a914ab0c0b2e98b1ab6dbf67d4750b0a56244948a87988ac005a6202000000001976a9143c82d7df364eb6c75be8c80df2b3eda8db57397088ac46430600
We’ve already parsed transactions. Now we want to do the opposite, which is serializing them.
class TxOut:
    ...

    def serialize(self):  # (1)
        '''Returns the byte serialization of the transaction output'''
        result = int_to_little_endian(self.amount, 8)
        result += self.script_pubkey.serialize()
        return result

(1) We’re going to serialize the TxOut object to a bunch of bytes.
We can proceed to the serialize method for the TxIn class, which will be somewhat similar.
class TxIn:
    ...

    def serialize(self):
        '''Returns the byte serialization of the transaction input'''
        # prev_tx is stored in the usual display order, so reverse it for the wire
        result = self.prev_tx[::-1]
        result += int_to_little_endian(self.prev_index, 4)
        result += self.script_sig.serialize()
        result += int_to_little_endian(self.sequence, 4)
        return result
Lastly, we can put together the transaction object this way:
class Tx:
    ...

    def serialize(self):
        '''Returns the byte serialization of the transaction'''
        result = int_to_little_endian(self.version, 4)
        result += encode_varint(len(self.tx_ins))
        for tx_in in self.tx_ins:
            result += tx_in.serialize()
        result += encode_varint(len(self.tx_outs))
        for tx_out in self.tx_outs:
            result += tx_out.serialize()
        result += int_to_little_endian(self.locktime, 4)
        return result
We end up utilizing the serialize methods of both TxIn and TxOut to make everything work.
One thing that might be interesting to note is that the transaction fee is not specified anywhere! This is because it’s an implied amount. It’s the total of the input amounts minus the total of the output amounts.
One of the consensus rules of Bitcoin is that for any non-coinbase transaction (more on Coinbase transactions in Chapter 8), the sum of the input amounts has to be greater than or equal to the sum of the output amounts. You may be wondering why the inputs and outputs can’t just be forced to be equal. This is because if every transaction had zero cost, there wouldn’t be any incentive for miners to include transactions in blocks. Fees are a way to incentivize miners to include transactions in blocks. Transactions that are not in blocks are not part of the blockchain and cannot be counted on as being valid.
The transaction fee is simply the input sum minus the output sum. The difference is what the miner gets to keep. As inputs don’t have an amount field, we have to look up the actual amount. This requires access to the blockchain, specifically the UTXO set. If you are not running a full node, this can be tricky, as you now need to trust some other entity to provide you with this information. Here is how we can get the previous transaction, and thus the amount, for an input:
class TxIn:
    ...
    # Assumes `import requests`, `from io import BytesIO` and a class-level
    # dict named `cache` are defined elsewhere (not shown in this excerpt).

    @classmethod
    def get_url(cls, testnet=False):
        if testnet:
            return 'http://tbtc.programmingblockchain.com:18332'
        else:
            return 'http://btc.programmingblockchain.com:8332'

    def fetch_tx(self, testnet=False):
        # Only go to the network if this transaction has not been fetched before
        if self.prev_tx not in self.cache:
            url = '{}/rest/tx/{}.hex'.format(
                self.get_url(testnet), self.prev_tx.hex())
            response = requests.get(url)
            stream = BytesIO(bytes.fromhex(response.text.strip()))
            tx = Tx.parse(stream)
            self.cache[self.prev_tx] = tx
        return self.cache[self.prev_tx]
We can get the previous transaction using the fetch_tx method. You may be wondering why we don’t get the specific output for the transaction and instead get the entire transaction. This is because we don’t want to be trusting a third party! By getting the entire transaction, we can verify the transaction id (double_sha256 of its contents) and be sure that we are indeed getting the transaction we asked for. This is impossible unless we get the entire transaction.
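The check described here might be sketched as follows, assuming the legacy serialization covered in this chapter. The function names tx_id_hex and verify_fetched_tx are hypothetical, and the bytes used are a placeholder rather than a real transaction.

```python
import hashlib


def double_sha256(b):
    '''Two rounds of SHA-256.'''
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()


def tx_id_hex(raw_tx):
    '''Transaction id as usually displayed: double SHA-256, byte-reversed, hex.'''
    return double_sha256(raw_tx)[::-1].hex()


def verify_fetched_tx(expected_id_hex, raw_tx):
    '''Hypothetical check: raise if the bytes do not hash to the id we asked for.'''
    actual = tx_id_hex(raw_tx)
    if actual != expected_id_hex:
        raise ValueError('not the same id: {} vs {}'.format(actual, expected_id_hex))
    return raw_tx


# Placeholder bytes standing in for a serialized transaction (not a real one).
raw = b'placeholder bytes, not a real transaction'
expected = tx_id_hex(raw)
verified = verify_fetched_tx(expected, raw)

# A third party that alters even one byte is caught by the same check.
try:
    verify_fetched_tx(expected, raw + b'\x00')
    tampering_detected = False
except ValueError:
    tampering_detected = True
```

A check like this only works on the full serialized transaction, which is why fetching a single output would not be enough.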
Warning: Why we minimize trusting third parties
As Nick Szabo so clearly wrote in his seminal essay "Trusted Third Parties are Security Holes" (https://nakamotoinstitute.org/trusted-third-parties/), trusting third parties to provide correct data is not a good security practice. The third party may be behaving well now, but you never know when they may get hacked, may have an employee go rogue or simply start implementing policies that are against your interests. Part of what makes Bitcoin secure is not trusting, but actually verifying, the data that we’re given.
We can now write methods to get the previous transaction output’s amount and script_pubkey (the latter to be used in the next chapter):
class TxIn:
    ...

    def value(self, testnet=False):
        '''Get the outpoint value by fetching the previous transaction
        Returns the amount in satoshi
        '''
        tx = self.fetch_tx(testnet=testnet)
        return tx.tx_outs[self.prev_index].amount

    def script_pubkey(self, testnet=False):
        '''Get the ScriptPubKey by fetching the previous transaction
        Returns the ScriptPubKey of the output being spent
        '''
        tx = self.fetch_tx(testnet=testnet)
        return tx.tx_outs[self.prev_index].script_pubkey
Now that we have the value method in TxIn, which lets us access how many satoshis are in each transaction input, we can calculate the fee for a transaction.
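One way the fee calculation could look is sketched below. To keep the example runnable without a network connection, FakeTxIn and FakeTxOut are stand-ins invented for this sketch; FakeTxIn.value returns a fixed amount where the real TxIn.value would look up the previous transaction.

```python
class FakeTxIn:
    '''Stand-in for TxIn whose value() returns a fixed amount instead of doing a lookup.'''

    def __init__(self, amount):
        self._amount = amount

    def value(self, testnet=False):
        return self._amount


class FakeTxOut:
    '''Stand-in for TxOut carrying only the amount.'''

    def __init__(self, amount):
        self.amount = amount


def fee(tx_ins, tx_outs, testnet=False):
    '''Input sum minus output sum, in satoshis.'''
    input_sum = sum(tx_in.value(testnet=testnet) for tx_in in tx_ins)
    output_sum = sum(tx_out.amount for tx_out in tx_outs)
    return input_sum - output_sum


# Made-up amounts: 105,000 satoshis in, 104,000 satoshis out.
tx_ins = [FakeTxIn(60_000), FakeTxIn(45_000)]
tx_outs = [FakeTxOut(100_000), FakeTxOut(4_000)]
tx_fee = fee(tx_ins, tx_outs)
```

A negative result would mean the outputs exceed the inputs, which the consensus rule above forbids for non-coinbase transactions.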