#bitcoin-wizards

/

      • Noldorin has quit
      • deusexbeer has quit
      • priidu has quit
      • rmwb has quit
      • rmwb joined the channel
      • jtimon joined the channel
      • dabura667 joined the channel
      • chjj has quit
      • itsme joined the channel
      • DougieBot5000 has quit
      • DougieBot5000 joined the channel
      • talmai joined the channel
      • chjj joined the channel
      • Dyaheon joined the channel
      • itsme has quit
      • Samchun joined the channel
      • Ylbam has quit
      • thrmo has quit
      • Samchun has quit
      • Noldorin joined the channel
      • Noldorin has quit
      • d9b4bef9 has quit
      • d9b4bef9 joined the channel
      • anon616 joined the channel
      • bedeho joined the channel
      • Chris_Stewart_5 has quit
      • rmwb has quit
      • str4d has quit
      • chjj has quit
      • PaulCape_ joined the channel
      • chjj joined the channel
      • deusexbeer joined the channel
      • kanzure
        high-dimensional stochastic gradient descent stuff https://cbmm.mit.edu/sites/default/files/public...
      • legogris has quit
      • legogris joined the channel
      • coinsmurf has quit
      • MaxSan has quit
      • rmwb joined the channel
      • coinsmurf joined the channel
      • DigiByteDev joined the channel
      • coinsmurf has quit
      • mn3monic joined the channel
      • mn3monic has quit
      • mn3monic joined the channel
      • bedeho joined the channel
      • [7] has quit
      • TheSeven joined the channel
      • Dyaheon joined the channel
      • afk11 joined the channel
      • matozoid joined the channel
      • mn3monic joined the channel
      • runeks
        sipa: that makes sense, since the further we get into a chain, the larger the UTXO set, so I would assume that the lookup/update operations become increasingly expensive. Is this the primary performance bottleneck of UTXO DB management -- that the UTXO set grows, and thus the duration of the operations on it? Or is it just because it's updated sequentially, i.e. one transaction at a time is "folded" into the UTXO set?
      • sipa
        larger utxo set = slower operations on it
      • we heavily cache the active parts of the set in memore, but for optimal speed you need several GB cache already
      • there are some proposals that remove or reduce the load of utxo management from full nodes
      • and replace them with wallet generated proofs of unspentness
      • for example petertodd's txo mmr, or bram cohen's bitfield
      • runeks
        I'm wondering if we could gain a speedup in this area if we used a temporary data structure for the UTXO set which allows negative balances, meaning that if an input redeems an output not in the UTXO set, it's added to this temporary UTXO set with a negative balance. Then we chop up the chain in <num_cores> pieces, have each core produce a temporary UTXO set of this nature, and finish off by merging these temporary UTXO
      • sets, which should end up with all the negative balances being cancelled out by a positive one (or the chain is invalid).
      • sipa
        you can't validate a script without looking up the output
      • but perhaps the whole thing could be delayed
      • runeks
        sipa: everything we can't validate right now would be deferred until we merge temporary UTXO sets
      • sipa
        that sounds like ot could theoretically give some improvement
      • but not so much... you can get the same effect by using a db engine that supports multiple asynchronous queries
      • the idea of removing the concern of utxo sets from full nodes sounds far more appealing to mr
      • coinsmurf joined the channel
      • runeks
        I'll have to read up on that
      • matozoid has quit
      • Would that require a change in the protocol?
      • sipa
        yes
      • proofs need to be sent along with transactions
      • and blocks
      • runeks
        Ok, I see. I guess I'm mostly interested in speed-ups not requiring protocol changes.
      • matozoid joined the channel
      • sipa
        haha, those are certainly less invasive
      • but the potential impact is also much smaller
      • protocol change here btw does not mean consensus change
      • runeks
        I guess the speedup would depend on how many cores you have available. Although, the temporary UTXO sets, which would need to be merged, would keep increasing in size, until you have a merge operation with roughly two halves of the UTXO set. So here, too, performance degrades as we get closer to a final UTXO set.
      • So are we walking a change to the P2P protocol only?
      • *talking
      • sipa
        yes
      • an invasive one, though
      • and a change to wallet software, eventually
      • runeks
        Oh ok. That's much less of a problem.
      • sipa
        and they come with big tradeoffs
      • str4d joined the channel
      • transactions (on P2P) would become significantly bigger
      • runeks
        Hmm, I see. That's unfortunate. As far as I can see, the broadcast-all-to-all nature of the P2P network is Bitcoin's primary bottleneck.
      • sipa
        bandwidth usage is really not that high
      • a non-serving bitcoin full node just needs a few kb/s
      • (after IBD)
      • runeks
        I mean, if 1KB/s of transaction data needs to be spread out between 10,000 nodes, that's 10MB/s for
      • all nodes cumulatively.
      • sipa
        yeah
      • rmwb has quit
      • runeks
        Also, with regards to "you can get the same effect by using a db engine that supports multiple asynchronous queries", as far as I can see, this isn't possible unless each entry in the UTXO set has a monotonically increasing counter (e.g. <32_bit_block_num+32_bit_tx_num_in_block>), such that an output can only cancel out an input if it appears in an earlier tx (an output can only cancel out a negative-balance entry in
      • the UTXO set if the output's counter is less than that of the negative-balance UTXO entry).
      • sipa
        instead of merging the sets, you just issue an asynchronous query in a single set
      • and continue prcocessing when it comes back
      • that should be just as fast, with lower latency
      • the advantage of merging sets is that it's linear in time
      • but the disadvantage is that most of the time you're not using your hardware at all
      • runeks
        sipa: but queries need to be handled in-order, such that only an output that was added earlier can be cancelled out by an input
      • sipa
        yes?
      • the spend only happen after the send
      • runeks
        sipa: How can we make sure queries are handled in-order unless each UTXO entry contains a counter?
      • sipa
        i don't understand
      • the blockchai is the order
      • spending a utxo before it is created is illegal
      • runeks
        Right, but the supposed speedup is from parallelizing UTXO operations. And then we lose the ordering.
      • sipa
        do a quick pass for finding utxos spent within the block itself
      • anything else should also be created by an earlier block
      • also, parallellizing utxo sets is not that hard
      • just keep 256 parallel caches based on the first 8 bits of the txid
      • the likelyhood of needing to modify two of them concurrently is low
      • even when proxessing on multiple threads
      • ryan-c joined the channel
      • sipa zzz
      • Ylbam joined the channel
      • runeks
        You can't modify the UTXO set in parallel unless each entry in it describes when the entry in question appeared in the blockchain.
      • It's the same with all parallel databases, I believe: each operation needs a counter/timestamp such that only an "update(x) *after* create(x)" works, as opposed to "create(x) after update(x)".
      • I believe most parallel DBs use a monotonically increasing counter, while a DB like Google Spanner uses an absolute timestamp from an atomic clock which is synchronized between all nodes. For Bitcoin this counter could be a 64 bit word, where the most significant 32 bits is the block count, and least significant 32 bits is tx_number_within_block.
      • bitbit has quit
      • Using a strategy like this we can, essentially, separate each tx from its containing block, and fold them in parallel into the UTXO set, since the counter (for each tx) now makes sure that only earlier outputs can be redeemed by later inputs, and not the other way around.
      • sipa
        well, i'm still talking about a model where you fully process one block before moiving on to the next
      • and utxos are incredibly easy structures
      • they're created once, typically looked up once or twice, and deleted
      • phayd joined the channel
      • there's no sequencing of operations needed apart from making sure the spend hapoens after the creation
      • the in-memory cache in bitcoin core takes advantage of that: utxos that are created and spent before a disk flush happens are just deleted and never hit disk at all
      • thrmo joined the channel
      • and parallellizing is not hard with parallel caches
      • just not implememted
      • oh... you really need to make sure a block is valid as soon as possible in steady state
      • otherwise you can't processing new mempool txn, which is essential for mini g
      • in IBD some parallel processing with higher throughput but worse latency may make sense
      • Dyaheon joined the channel
      • matozoid_ joined the channel
      • matozoid has quit
      • Dyaheon joined the channel
      • paveljanik has quit
      • jaromil has quit
      • jaromil joined the channel
      • jaromil has quit
      • jaromil joined the channel
      • runeks
        To be clear, I'm only talking about IBD, where we have a huge, sequential list of transactions that need to be processed in-order. But it may not make sense, performance-wise, unless we have much more than 4-8 cores. So that would entail either waiting for many-core CPUs, or needing the ability to schedule it onto multiple VMs.
      • chjj has quit
      • matozoid_ has quit
      • matozoid joined the channel
      • rmwb joined the channel