#dat

      • ogd
        ajbouh: right i see. i could imagine a few different ways to implement that with dat, but should be doable. we have support for multiple writer keys in a single dat coming to hyperdrive/dat soon (currently part of a module called hyperdb)
      • ajbouh
        got it
      • for now i think the single writer case is just fine
      • ogd
        ajbouh: the hyperdrive api is actually the exact same as the node fs api, so you can use it as a drop in replacement for fs code
      • ajbouh
        right
      • ogd
        ajbouh: also you should check out https://stenci.la/
      • ajbouh: they have some basic dat support as well
      • ajbouh
        cool, thanks for the pointer
      • ogd
        ajbouh: and are open to collaboration etc
      • ajbouh
        neat
      • ogd
        ajbouh: weve prototyped python and R apis but theyve fallen out of date as dat itself has updated
      • ajbouh
        the big thing i’m looking for is to find people that care about having a humane way to create reproducible breadcrumbs by default
      • @ogd, yeah multi-lang support is hard when a project is moving quickly
      • ogd
        ajbouh: yea i like the idea of a workflow that forces you to produce a versioned immutable blob of research that you can build on with future blobs
      • jhand and karissa might have feedback too :D
      • ajbouh
        with support from notebook environments you can also put the notebook data itself in one of these
      • ogd
        ajbouh: oh yea i was gonna mention we have done a bit on putting docker containers into dat
      • ajbouh
        docker is such a beast
      • ogd
        ajbouh: so you can version them, and in the future with multiwriter support, fork and modify vms
      • ajbouh: yea and it gets bigger and more complicated as they get more funding :d
      • ajbouh
        indeed
      • ogd
        ajbouh: but mafintosh was just playing with nspawn, and we've messed with xhyve and chroots too as a lighter weight alternative
      • ajbouh
        yes, xhyve is a nice abstraction
      • so how much overlap is there between what i’m trying to do with workspaces and the goal of the dat project?
      • ogd
        ajbouh: well we have been too busy to take on a python api for example, so im sure we'd be supportive if you wanted to take a stab
      • bret
        @ogd have you ever written or talked about whats involved in running a nonprofit open source project?
      • (like as in the legal entity aspect of it)
      • ogd
        ajbouh: (a stab at your workspaces concept as a way ppl can use dat, not at a 'dat python api')
      • ajbouh
        yes, that’s what i mean… i don’t think the python api is important
      • ogd
        ajbouh: but in general the goal of dat is to provide an api for working with versioned sets of files, allowing ppl to securely transfer the files over a p2p network, and allowing for partial and/or live incremental synchronization of exact versions
      • ajbouh
        if there’s a way to think about working with data that leaves reproducible breadcrumbs, i think that’s relevant across languages, apis, etc
      • ogd
        ajbouh: yea i see your idea as something you use dat to build, not a competitor to dat at all. its a goal for us to enable a community to experiment with different approaches
      • ajbouh
        ogd: yes, exactly!
      • ogd
        ajbouh: if your thing found an abstraction that works with any fs-style api then its even less married to dat also
      • ajbouh: but those kinds of abstractions are hard to design without prototyping first as im sure you know
      • ajbouh
        yes
      • ogd
        bret: nope but i have a post brewing in my head about it
      • ajbouh
        my ideal here is i find a way to make this approach to using data *more* productive than using a standard fs-style api to read/write data
      • so that it makes sense to use for individual projects and for the community as a whole
      • ogd
        ajbouh: yea all we add on top of fs right now is version numbers
      • ajbouh
        (this is true for version control)
      • right, i want to find a minimum enabling workflow
      • does dat have folks working with data that it gets input/feedback from?
      • ogd
        ajbouh: if i were doing a dat flavor of your workflow, you would do something like Workspace("dat://my-dat-hash@5") for version 5
      • ajbouh: yea we're partnered with the university of california and are working with a couple labs over the next year
      • ajbouh
        you could, i would expect that to make a clone of whatever full directory structure is rooted at dat://my-dat-hash@5
      • but part of the idea of workspaces is that you can save and commit a pointer file to git
      • (or email, dropbox, etc)
      • so you’d also do: ws.save("myproject.ws")
      • ogd
        ajbouh: oh right, so in the .ws file you'd put dat://my-dat-hash@5 as the 'remote' or something
      • ajbouh
        right
      • that’s what would happen under the hood
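
[editor's sketch] The pointer-file idea above could look roughly like this in Python. Everything here is an assumption made for illustration: the `Workspace` class, its `save`/`load` methods, and the JSON layout with a `remote` key are hypothetical and are not part of dat or of any published workspaces API. The sketch only records which dat URL a project points at; it does not fetch anything.

```python
import json
from pathlib import Path


class Workspace:
    """Hypothetical workspace handle that only remembers its remote URL."""

    def __init__(self, remote):
        # e.g. "dat://my-dat-hash@5" (placeholder key taken from the chat)
        self.remote = remote

    def save(self, path):
        # Write a small pointer file that can be committed to git,
        # emailed, or dropped into a shared folder next to the code.
        pointer = {"remote": self.remote}
        Path(path).write_text(json.dumps(pointer, indent=2) + "\n")

    @classmethod
    def load(cls, path):
        # Rebuild the handle from a previously saved pointer file.
        pointer = json.loads(Path(path).read_text())
        return cls(pointer["remote"])
```

Usage would be `Workspace("dat://my-dat-hash@5").save("myproject.ws")`, and later `Workspace.load("myproject.ws").remote` to find out which dat and version the code was run against.
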
      • ogd
        ajbouh: but then the dat protocol could handle resolving the request for all the bytes from ws.import("daily_prices.txt") at version 5, plus the p2p transport, the encryption, the verification of the received data, etc
      • ajbouh
        right, though .import is supposed to import data from the local fs into it
      • ogd
        ajbouh: oh heh
      • ajbouh
        .read, .cp, .mv, .rm would actually interact with the current state of the ws
      • also .write
      • and .ls
      • ogd
        ajbouh: so if it were dat powered, all those commands would just modify the dat, which would be stored i guess in ./.dat, and every time the dat version changes it could just update the integer in dat://my-dat-hash@5 in the .ws config to point at the latest version (if you wanted to auto-peg versions for maximum reproducibility). though you can also leave the integer off and clients will get whatever latest version is available at the time they pull. tradeoffs for each
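
[editor's sketch] The two behaviors described above (pegging the pointer to an exact version, or leaving the integer off to follow the latest) can be illustrated with plain string handling. The helper names `split_version`, `peg`, and `unpeg` are hypothetical, and the sketch assumes the `dat://<key>@<version>` form used in this conversation; it does not talk to dat at all.

```python
def split_version(url):
    """Split 'dat://key@5' into ('dat://key', 5).

    The version is None when the URL carries no '@<integer>' suffix.
    """
    base, sep, version = url.rpartition("@")
    if not sep:
        # No '@' present: the URL floats on whatever version is latest.
        return url, None
    return base, int(version)


def peg(url, version):
    """Return the URL pinned to an exact version (auto-peg behavior)."""
    base, _ = split_version(url)
    return f"{base}@{version}"


def unpeg(url):
    """Return the URL with the version dropped, so clients sync forward."""
    return split_version(url)[0]
```

After a write bumps the dat from version 5 to 6, a tool that auto-pegs would store `peg(url, 6)` in the .ws file, while one that syncs forward would store `unpeg(url)`.
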
      • ajbouh
        yes, that’s the idea
      • so one question to answer is which of those behaviors should be default
      • probably auto-peg
      • ogd
        yea then its basically immutable
      • ajbouh
        sorry, probably leave integer off
      • but no idea
      • might do both
      • store both in there and you can decide when you open
      • ogd
        yea
      • ajbouh
        whether to sync forward or work from checkout
      • i see people working in data science that sort of have a big mess of data and files
      • and they manually track which code was run against which data
      • so the idea here is to make it possible to commit a pointer to the data that was used right along side the code that uses it
      • and since it’s content addressable, there’s the benefit of being able to store the same huge dataset alongside every bit of code that uses it
      • so when you see a project with a .ws file in it, you can be confident that you’ll actually get everything you need to reproduce what’s in there and carry things forward
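
[editor's sketch] The content-addressing point above can be shown with a toy in-memory store: identical bytes always hash to the same key, so a dataset referenced by many projects is kept once. This is a generic illustration using SHA-256 from Python's standard library, not a description of how dat hashes or stores data; `content_address` and `BlobStore` are hypothetical names.

```python
import hashlib


def content_address(data: bytes) -> str:
    # The address is derived purely from the bytes themselves.
    return hashlib.sha256(data).hexdigest()


class BlobStore:
    """Toy content-addressed store keyed by the hash of each blob."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = content_address(data)
        # Storing the same bytes again is a no-op: the key already exists.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)
```

Two projects that both `put` the same prices file get back the same key, and the store still holds a single copy.
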
      • ogd
        yea thats cool, we definitely see the same problem
      • gotta run, can chat later
      • ajbouh
        cool, looking forward to hearing your thoughts on what’s wrong with that gist :)
      • mafintosh: would love your thoughts here too ^
      • karissa: also ^
      • bedeho joined the channel
      • bedeho has quit
      • domanic joined the channel
      • bedeho joined the channel
      • bedeho has quit
      • barry_bluejorts joined the channel
      • barry_bluejorts
        hey all, interested in running a silo on my unraid server at home, is there a docker container for hypercored?
      • barry_bluejorts has quit
      • larpanet joined the channel
      • iml_ has quit
      • domanic joined the channel
      • ajbouh has quit
      • bnewbold_
        ogd: we punched a hole for .dat directories in archive.org: https://ia600805.us.archive.org/16/items/densho...
      • so the s3.us.archive.org trick isn't necessary
      • might not roll out everywhere for a day or two