i keep dreaming about breaking down the code/executable/data divisions
pfraze joined the channel
mafintosh
ogd: see stackvm - mikolalysenko has an interesting idea about generic hash set replication :)
karissa: o/
pfraze joined the channel
pfraze has quit
pfraze joined the channel
-- BotBot disconnected, possible missing messages --
[o__o] joined the channel
todrobbins joined the channel
AndreasMadsen joined the channel
xat- is now known as xAt
AndreasMadsen has quit
floppy joined the channel
AndreasMadsen joined the channel
sethvincent joined the channel
floppy has quit
floppy joined the channel
flyingzumwalt joined the channel
xAt is now known as xat-
flyingzumwalt
I'm wondering about dat hooks. I want to know when someone has pushed new changes to a dat repository so I can do things in response to those changes. Example: index the updates into ElasticSearch. Has anyone done any work on this? The git equivalent is git hooks https://git-scm.com/book/en/v2/Customizing-Git-...
karissa
oh rad
flyingzumwalt: you can do a createChangesStream({live: true}) in javascript to get a change and then do something with it
flyingzumwalt: we don't have the git-hooks interface in the config file (package.json) but it could be done
flyingzumwalt
thanks karissa so with the createXXStream approach I would need to run some sort of node daemon that watches for changes? (as opposed to hooks, which would be triggered by dat when events occur)
karissa
flyingzumwalt: yeah, that's right.
flyingzumwalt
karissa++
I'd like to look into writing a hooks implementation. I'll poke through the code and see what I can figure out.
karissa
flyingzumwalt: wow thanks!
flyingzumwalt: it'd be good to implement it in 'dat' not 'dat-core' because we are changing 'dat-core'
flyingzumwalt
are there instructions anywhere on how to set up the dev environment and run the test suite?
"In this case, Cassandra and Riak implement a Dynamo-inspired process called anti-entropy. In anti-entropy, replicas exchange Merkle trees to identify parts of their replicated key ranges which are out of sync. A Merkle tree is a hierarchical hash verification: if the hash over the entire keyspace is not the same between two replicas, they will exchange hashes of smaller and smaller portions of the replicated keyspace until the out-of-sync keys are identified."
mafintosh: we should see how o/ is implemented
mafintosh
ogd: unrelated i just read how that works two hours ago
ogd
lol
mafintosh: you gotta post more links to irc :)
then we can use it as a metric in our next biannual grant report: 'Posted 3288 links to IRC'
mafintosh: but apparently that proposes something called a 'search DAG'
mafintosh: (i think its a newer version of a paper i linked earlier)
Guest14553 is now known as JSON_voorhees
JSON_voorhees is now known as Guest87244
mafintosh
ogd: reading it
ogd: the dynamo merkle tree replication just sorts the entire dataset then builds a merkle tree
ogd: it then sends the root hash to the other peer (who also sorts his dataset and generates a merkle tree)
ogd
these are all about security and don't really talk about append-only, distributed incremental use cases. but maybe we can get some ideas
mafintosh
ogd: if the hashes match they have the same data - if not peer one will send the next two hashes
ogd: and it then applies the same algorithm to those two hashes
ogd: so in log(n) roundtrips they'll know the diff
ogd
mafintosh: isn't that a lot of traversals
mafintosh: cause they have to sort it all on every round
mafintosh
ogd: they have to sort the dataset once at the beginning of the replication plus generate the merkle tree
ogd: but that might also take a while if the dataset is big
mikolalysenko came up with a very clever way of maintaining that merkle tree by using a merkle trie
because its "easy" to add/remove things from a trie
in practice its still a bit tricky to calculate how big the diff is gonna be using these approaches without buffering the diff to disk before sending it (i.e. a progress bar is difficult unless you buffer the diff to disk first)
ogd
how is sql COUNT implemented? must be some indexing so they can quickly estimate without doing a full scan of the data
mafintosh
and i think if you'd only wanna generate a single branch you'd have to generate a new trie for that
ogd: they probably just index the number of nodes in the index
ogd: like we do in the new dat-graph impl
ogd
what if we just store the node count in each node of the graph?
its like a merkle tree but instead of a hash its just the sum
mafintosh
thats what i do now in dat-graph
ogd
ah ok. so can't you get quick counts that way?
shama joined the channel
mafintosh
yep
WaldoJ has quit
but i cannot combine that with the merkle trie/tree approach
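the count-in-every-node idea ogd and mafintosh settle on above could be sketched like this: each node stores the number of nodes beneath it (a sum instead of a hash), so a COUNT is a single lookup at the head rather than a full scan. this mirrors the description of the new dat-graph impl only loosely; the names here are illustrative, not dat-graph's actual API, and it assumes the links form a tree so no node is counted twice.

```javascript
// Sketch: like a Merkle tree, but each node stores a subtree count
// instead of a hash, making COUNT an O(1) read at the head.
function makeNode(key, links) {
  // A node's count is itself plus everything reachable through its links.
  return { key, links, count: 1 + links.reduce((s, l) => s + l.count, 0) };
}

const a = makeNode('a', []);
const b = makeNode('b', [a]);
const c = makeNode('c', []);
const head = makeNode('d', [b, c]);

// COUNT without scanning: just read the head's stored count.
console.log(head.count); // -> 4
```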