ajbouh: right i see. i could imagine a few different ways to implement that with dat, but should be doable. we have support for multiple writer keys in a single dat coming to hyperdrive/dat soon (currently part of a module called hyperdb)
ajbouh
got it
for now i think the single writer case is just fine
ogd
ajbouh: the hyperdrive api is actually the exact same as the node fs api, so you can use it as a drop in replacement for fs code
ajbouh: we've prototyped python and R apis but they've fallen out of date as dat itself has updated
ajbouh
the big thing i’m looking for is to find people that care about having a humane way to create reproducible breadcrumbs by default
@ogd, yeah multi-lang support is hard when a project is moving quickly
ogd
ajbouh: yea i like the idea of a workflow that forces you to produce a versioned immutable blob of research that you can build on with future blobs
jhand and karissa might have feedback too :D
ajbouh
with support from notebook environments you can also put the notebook data itself in one of these
ogd
ajbouh: oh yea i was gonna mention we have done a bit on putting docker containers into dat
ajbouh
docker is such a beast
ogd
ajbouh: so you can version them, and in the future with multiwriter support, fork and modify vms
ajbouh: yea and it gets bigger and more complicated as they get more funding :d
ajbouh
indeed
ogd
ajbouh: but mafintosh was just playing with nspawn, and we've messed with xhyve and chroots too as a lighter weight alternative
ajbouh
yes, xhyve is a nice abstraction
so how much overlap is there between what i’m trying to do with workspaces and the goal of the dat project?
ogd
ajbouh: well we have been too busy to take on a python api for example, so im sure we'd be supportive if you wanted to take a stab
bret
@ogd have you ever written or talked about whats involved in running a nonprofit open source project?
(like as in the legal entity aspect of it)
ogd
ajbouh: (a stab at your workspaces concept as a way ppl can use dat, not at a 'dat python api')
ajbouh
yes, that’s what i mean… i don’t think the python api is important
ogd
ajbouh: but in general the goal of dat is to provide an api for working with versioned sets of files, allowing ppl to securely transfer the files over a p2p network, and allowing for partial and/or live incremental synchronization of exact versions
ajbouh
if there’s a way to think about working with data that leaves reproducible breadcrumbs, i think that’s relevant across languages, apis, etc
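the versioned-file-set model ogd describes could be toy-modeled like this (a minimal sketch in python with made-up names, nothing to do with dat's actual wire format or API):

```python
class VersionedFiles:
    """Toy model of dat's core idea: an append-only log of file
    snapshots, so any past version can be fetched exactly."""
    def __init__(self):
        self.versions = []  # list of {path: bytes} snapshots

    def commit(self, files):
        # each commit produces a new immutable version number (1-based)
        self.versions.append(dict(files))
        return len(self.versions)

    def checkout(self, version, paths=None):
        # partial sync: fetch only the requested paths at an exact version
        snap = self.versions[version - 1]
        if paths is None:
            return dict(snap)
        return {p: snap[p] for p in paths}

repo = VersionedFiles()
v1 = repo.commit({"data.csv": b"a,b\n1,2\n", "notes.txt": b"first pass"})
v2 = repo.commit({"data.csv": b"a,b\n1,2\n3,4\n", "notes.txt": b"first pass"})
# exact, partial checkout of an old version
old = repo.checkout(v1, paths=["data.csv"])  # {'data.csv': b'a,b\n1,2\n'}
```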
ogd
ajbouh: yea i see your idea as something you use dat to build, not a competitor to dat at all. its a goal for us to enable a community to experiment with different approaches
ajbouh
ogd: yes, exactly!
ogd
ajbouh: if your thing found an abstraction that works with any fs-style api then its even less married to dat also
ajbouh: but those kinds of abstractions are hard to design without prototyping first as im sure you know
ajbouh
yes
ogd
bret: nope but i have a post brewing in my head about it
ajbouh
my ideal here is i find a way to make this approach to using data *more* productive than using a standard fs-style api to read/write data
so that it makes sense to use for individual projects and for the community as a whole
ogd
ajbouh: yea all we add on top of fs right now is version numbers
ajbouh
(this is true for version control)
right, i want to find a minimum enabling workflow
does dat have folks working with data that it gets input/feedback from?
ogd
ajbouh: if i were doing a dat flavor of your workflow, you would do something like Workspace("dat://my-dat-hash@5") for version 5
ajbouh: yea we're partnered with the university of california and are working with a couple labs over the next year
ajbouh
you could, i would expect that to make a clone of whatever full directory structure is rooted at dat://my-dat-hash@5
but part of the idea of workspaces is that you can save and commit a pointer file to git
(or email, dropbox, etc)
so you’d also do: ws.save(“myproject.ws”)
ogd
ajbouh: oh right, so in the .ws file you'd put dat://my-dat-hash@5 as the 'remote' or something
ajbouh
right
that’s what would happen under the hood
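a sketch of what "under the hood" might look like, assuming a hypothetical .ws format: a tiny JSON file whose "remote" is a dat URL, optionally pegged with "@&lt;n&gt;" (not a real spec, just an illustration):

```python
import json
import re
from pathlib import Path

def save_ws(path, remote):
    # the pointer file is small enough to commit to git, email, dropbox, etc
    Path(path).write_text(json.dumps({"remote": remote}))

def load_ws(path):
    remote = json.loads(Path(path).read_text())["remote"]
    # split "dat://<key>@<version>" into key and optional pegged version
    m = re.fullmatch(r"dat://([^@]+)(?:@(\d+))?", remote)
    key, version = m.group(1), m.group(2)
    return key, int(version) if version else None

save_ws("myproject.ws", "dat://my-dat-hash@5")
key, version = load_ws("myproject.ws")  # ('my-dat-hash', 5)
```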
ogd
ajbouh: but then the dat protocol could resolve the request for all the bytes from ws.import("daily_prices.txt") at version 5, the p2p transport, the encryption, the verification of the received data, etc
ajbouh
right, though .import is supposed to import data from the local fs into it
ogd
ajbouh: oh heh
ajbouh
.read, .cp, .mv, .rm would actually interact with the current state of the ws
also .write
and .ls
ogd
ajbouh: so if it were dat powered, all those commands would just modify the dat, which would be stored i guess in ./.dat, and every time the dat version changes it could just update the integer in dat://my-dat-hash@5 in the .ws config to point at the latest version (if you wanted to auto-peg versions for maximum reproducibility). though you can also leave the integer off and clients will get whatever latest version is
available at the time they pull. tradeoffs for each
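the fs-style ops and auto-peg behavior above could be sketched like this (all names hypothetical; a real version would delegate storage to dat rather than an in-memory dict):

```python
class Workspace:
    """Toy workspace: every mutation bumps the version, and the remote
    pointer is either pegged to that version or left floating."""
    def __init__(self, key, auto_peg=True):
        self.key, self.version, self.auto_peg = key, 0, auto_peg
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        self.version += 1

    def read(self, path):
        return self.files[path]

    def cp(self, src, dst):
        self.files[dst] = self.files[src]
        self.version += 1

    def mv(self, src, dst):
        self.files[dst] = self.files.pop(src)
        self.version += 1

    def rm(self, path):
        del self.files[path]
        self.version += 1

    def ls(self):
        return sorted(self.files)

    def remote(self):
        # auto-peg: include the integer for maximum reproducibility;
        # without it, clients resolve to whatever the latest version is
        if self.auto_peg:
            return f"dat://{self.key}@{self.version}"
        return f"dat://{self.key}"

ws = Workspace("my-dat-hash")
ws.write("daily_prices.txt", b"2024-01-01,100\n")
ws.cp("daily_prices.txt", "backup.txt")
ws.remote()  # 'dat://my-dat-hash@2'
```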
ajbouh
yes, that’s the idea
so one question to answer is which of those behaviors should be default
probably auto-peg
ogd
yea then its basically immutable
ajbouh
sorry, probably leave integer off
but no idea
might do both
store both in there and you can decide when you open
ogd
yea
ajbouh
whether to sync forward or work from checkout
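the "store both, decide at open" idea could look something like this (hypothetical config keys and API):

```python
def open_workspace(config, mode="checkout"):
    """Resolve a workspace config to (key, version) at open time."""
    if mode == "checkout":       # work from the exact pegged version
        return (config["key"], config["pegged"])
    if mode == "sync-forward":   # follow whatever the latest version is
        return (config["key"], None)
    raise ValueError(f"unknown mode: {mode}")

config = {"key": "my-dat-hash", "pegged": 5}
open_workspace(config)                       # ('my-dat-hash', 5)
open_workspace(config, mode="sync-forward")  # ('my-dat-hash', None)
```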
i see people working in data science that sort of have a big mess of data and files
and they manually track which code was run against which data
so the idea here is to make it possible to commit a pointer to the data that was used right alongside the code that uses it
and since it’s content addressable, there’s the benefit of being able to store the same huge dataset alongside every bit of code that uses it
so when you see a project with a .ws file in it, you can be confident that you’ll actually get everything you need to reproduce what’s in there and carry things forward
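the content-addressing benefit is just deduplication by hash; a minimal sketch (toy in-memory store, not dat's actual addressing scheme):

```python
import hashlib

# hash -> bytes, shared across projects
store = {}

def put(data):
    """Store bytes once, keyed by content hash; return the tiny pointer."""
    h = hashlib.sha256(data).hexdigest()
    store[h] = data  # deduplicated: same content, same key
    return h

dataset = b"huge dataset bytes..."
pointer_a = put(dataset)  # committed alongside project A's code
pointer_b = put(dataset)  # project B points at the same content
```

both projects commit only the small pointer, while the dataset bytes are stored a single time.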
ogd
yea thats cool, we definitely see the same problem
gotta run, can chat later
ajbouh
cool, looking forward to hearing your thoughts on what’s wrong with that gist :)