The keys I understand: 't' + 32-byte hash.
But my problem is the values. I understand from sources such as "What are the keys used in the blockchain levelDB (ie what are the key:value pairs)?" that each value should encode three fields: dat file number, block offset, and tx offset within the block.
But I’ve noticed that the values have different sizes, between 5 and 10 bytes over the first thousand entries, so I’m not sure how to decode them into those three fields. Are those fields simply 3 varint values?
Here’s my Plyvel code that prints out the lengths using plyvel==1.5.1, Bitcoin Core v26.0.0 on Ubuntu 23.10:
#!/usr/bin/env python3
import struct
import plyvel

def decode_varint(data):
    """
    https://github.com/alecalve/python-bitcoin-blockchain-parser/blob/c06f420995b345c9a193c8be6e0916eb70335863/blockchain_parser/utils.py#L41
    """
    assert(len(data) > 0)
    size = int(data[0])
    assert(size <= 255)
    if size < 253:
        return size, 1
    if size == 253:
        format_ = '<H'
    elif size == 254:
        format_ = '<I'
    elif size == 255:
        format_ = '<Q'
    else:
        # Should never be reached
        assert 0, "unknown format_ for size : %s" % size
    size = struct.calcsize(format_)
    return struct.unpack(format_, data[1:size+1])[0], size + 1

ldb = plyvel.DB('/home/ciro/snap/bitcoin-core/common/.bitcoin/indexes/txindex/', compression=None)
i = 0
for key, value in ldb:
    if key[0:1] == b't':
        txid = bytes(reversed(key[1:])).hex()
        print(i)
        print(txid)
        print(len(value))
        print(value.hex(' '))
        value = bytes(reversed(value))
        file, off = decode_varint(value)
        blk_off, off = decode_varint(value[off:])
        tx_off, off = decode_varint(value[off:])
        print((txid, file, blk_off, tx_off))
        print()
        i += 1
but it eventually blows up at:
131344
ec4de461b0dd1350b7596f95c0d7576aa825214d9af0e8c54de567ab0ce70800
7
42 ff c0 43 8b 94 35
Traceback (most recent call last):
  File "/home/ciro/bak/git/bitcoin-strings-with-txids/./tmp.py", line 39, in <module>
    blk_off, off = decode_varint(value[off:])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ciro/bak/git/bitcoin-strings-with-txids/./tmp.py", line 29, in decode_varint
    return struct.unpack(format_, data[1:size+1])[0], size + 1
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: unpack requires a buffer of 8 bytes
So I wonder if I guessed the format wrong, or if it’s just a bug in my code.
Comparing to: https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer I would decode:
42 ff c0 43 8b 94 35
manually as:
- 42
- ff: expect 8 bytes next
- c0 43 8b 94 35: only 5 bytes left, blowup
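The manual walk above can be reproduced with a small check that only looks at the prefix byte and compares the bytes a CompactSize integer would need against what is left. This is a minimal sketch; the helper name `compactsize_needs` is hypothetical and not part of any library.

```python
def compactsize_needs(data: bytes, pos: int = 0):
    """Return (bytes_needed, bytes_available) for a CompactSize integer at pos.

    bytes_needed counts the prefix byte itself plus any payload bytes.
    """
    first = data[pos]
    # Prefixes 0xfd, 0xfe and 0xff announce 2, 4 and 8 payload bytes.
    extra = {0xFD: 2, 0xFE: 4, 0xFF: 8}.get(first, 0)
    return 1 + extra, len(data) - pos


value = bytes.fromhex("42ffc0438b9435")
print(compactsize_needs(value, 0))  # field starting at 0x42
print(compactsize_needs(value, 1))  # field starting at 0xff
```

For the second field the number of bytes needed exceeds the number available, which matches the struct.error in the traceback.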
I also tried reversing the value:
value = bytes(reversed(value))
but then it blows up very early, so that is definitely wrong.
I also tried ignoring the error to see if there are others, but there were hundreds of them, so something is definitely wrong with my method.
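One alternative worth testing is that the fields are not CompactSize integers at all, but use the other variable-length integer that Bitcoin Core defines in serialize.h (VARINT): 7 bits per byte, most significant group first, with the high bit as a continuation flag and 1 added after each continued byte. Whether txindex values are really stored this way is an assumption in this sketch, not something established above, and the names `read_core_varint` and `decode_three` are hypothetical.

```python
def read_core_varint(data: bytes, pos: int = 0):
    """Decode one Bitcoin Core style VARINT starting at pos.

    Returns (value, next_pos). Raises IndexError if the data ends early.
    """
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            # Continuation bit set: add 1 so every value has one encoding.
            n += 1
        else:
            return n, pos


def decode_three(value: bytes):
    """Assumed layout: file number, block offset, tx offset, all VARINTs."""
    file_no, pos = read_core_varint(value, 0)
    blk_off, pos = read_core_varint(value, pos)
    tx_off, pos = read_core_varint(value, pos)
    return file_no, blk_off, tx_off, pos


print(decode_three(bytes.fromhex("42ffc0438b9435")))
```

On the 7-byte value that triggered the error, this reads three integers (66, 2105539, 199349) and stops at position 7, consuming the value exactly. That is consistent with the assumption but does not prove it; checking the decoded offsets against the actual block file would be the real test. Note that the position is carried forward across all three reads, instead of slicing with only the length of the previous field.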