12:46 AM
leow joined the channel
0:51 AM
leow has quit
1:24 AM
leow joined the channel
1:38 AM
leow has quit
3:17 AM
georgianab joined the channel
4:50 AM
leow joined the channel
6:17 AM
leow has quit
7:25 AM
leow joined the channel
7:43 AM
pudo
georgianab: hey there!
7:43 AM
:)
7:43 AM
georgianab
pudo: hello!
7:43 AM
pudo
how are you doing?
7:44 AM
georgianab
quite well, i'm at my day job
7:44 AM
pudo
cool. what is that?
7:44 AM
georgianab
backend developer in ruby
7:45 AM
pudo
coool
7:45 AM
georgianab
indeed
7:45 AM
pudo
so adriana has now formed a little secret huddle? that's really brilliant.
7:45 AM
georgianab
anyway, victor showed me aleph
7:45 AM
yes she did
7:46 AM
and the funny thing is that apparently both projects bring me to you :))
7:46 AM
pudo
on aleph: I'm currently working on something related at my day job and hope to convince them to let me develop it a bit further
7:47 AM
georgianab
it has much potential
7:47 AM
this one is shiny!
7:47 AM
pudo
Yes, designers rock :)
7:48 AM
I've already got a new search backend up for them for entities/networks, once that is stable we'll do more on documents
7:48 AM
(btw really need to import TED into that network thingie :)
7:53 AM
georgianab
would be nice
7:53 AM
so, getting back to adriana's project
7:54 AM
which now has the codename datavis
7:54 AM
pudo
yeah, tell me more about that if you don't mind!
7:54 AM
georgianab
i have the structure and some data
7:54 AM
pudo
I just saw her mid-August but we didn't chat about it enough
7:54 AM
i.e. the TED DB?
7:54 AM
georgianab
yes
7:54 AM
sort of
7:55 AM
we are only interested in the fields that are included in the Open Contracting Data Standard (OCDS)
7:55 AM
i also plan to use elasticsearch for that one
7:56 AM
pudo
7:58 AM
georgianab
no, but seems like we could be good friends
7:58 AM
pudo
so it takes tables (db or spreadsheet) and maps them to JSON Schema
7:58 AM
which I think is what OCDS uses
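A minimal sketch of the table-to-nested-JSON mapping pudo describes, assuming a simple column-to-dotted-path mapping. This is not the API of the tool mentioned in the chat: the mapping format, the column names, and the sample row are hypothetical, loosely modeled on OCDS field names. Unmapped columns are dropped, which matches keeping only the OCDS fields.

```python
import csv
import io
import json

# Hypothetical mapping: CSV column -> dotted path in the nested JSON output.
MAPPING = {
    "notice_id": "ocid",
    "buyer_name": "buyer.name",
    "title": "tender.title",
    "value": "tender.value.amount",
    "currency": "tender.value.currency",
}

def set_path(obj, dotted, value):
    """Assign value at a dotted path, creating nested dicts as needed."""
    *parents, leaf = dotted.split(".")
    for key in parents:
        obj = obj.setdefault(key, {})
    obj[leaf] = value

def map_row(row, mapping=MAPPING):
    """Turn one flat table row into a nested record; unmapped columns are dropped."""
    record = {}
    for column, path in mapping.items():
        if row.get(column) not in (None, ""):
            set_path(record, path, row[column])
    return record

# Illustrative sample data only.
SAMPLE = """notice_id,buyer_name,title,value,currency,internal_note
2015-123,City of Bucharest,Road repairs,150000,RON,ignore me
"""

records = [map_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(json.dumps(records[0], indent=2))
```

Values stay strings here; a real mapper would also need type conversion (e.g. amounts to numbers) to validate against a JSON Schema.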
7:59 AM
georgianab
it should work
7:59 AM
pudo
right now it indexes to ES
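For context on what "indexes to ES" typically involves, here is a sketch of building a request body for Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: one action line, then the document, per record, with a trailing newline. The index name `contracts` and the choice of `ocid` as document id are assumptions for illustration; the body would be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`.

```python
import json

def bulk_body(index, records, id_field="ocid"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each document takes two lines: an action line naming the target index
    and document id, then the document source itself.
    """
    lines = []
    for rec in records:
        lines.append(json.dumps({"index": {"_index": index, "_id": rec[id_field]}}))
        lines.append(json.dumps(rec))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    {"ocid": "2015-123", "tender": {"title": "Road repairs"}},
    {"ocid": "2015-124", "tender": {"title": "School supplies"}},
]
body = bulk_body("contracts", docs)
print(body)
```

Using a stable id from the data means re-running the import overwrites documents instead of duplicating them.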
8:00 AM
but I also want to make it run against RDF
8:00 AM
which is scary as shit
8:00 AM
but then I get graph queries
8:02 AM
where the graph is defined by json schema :)
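A plain-Python illustration of why mapping nested records to RDF yields graph queries: every nested object becomes a node and every property an edge. This is a stand-in, not real RDF (no IRIs, no RDF library), and in the setup pudo describes the JSON Schema, not the record instance, would decide which properties become edges. The record below is hypothetical.

```python
import itertools

def to_triples(obj, subject=None, counter=None):
    """Turn a nested JSON-like dict into (subject, predicate, object) triples.

    Each nested object gets its own node id (_:n0, _:n1, ... in the style of
    RDF blank nodes), so the nesting becomes edges in a graph.
    """
    if counter is None:
        counter = itertools.count()
    if subject is None:
        subject = f"_:n{next(counter)}"
    triples = []
    for key, value in obj.items():
        for v in value if isinstance(value, list) else [value]:
            if isinstance(v, dict):
                child = f"_:n{next(counter)}"
                triples.append((subject, key, child))
                triples.extend(to_triples(v, child, counter))
            else:
                triples.append((subject, key, v))
    return triples

record = {
    "ocid": "2015-123",
    "buyer": {"name": "City of Bucharest"},
    "tender": {"title": "Road repairs"},
}
triples = to_triples(record)

# A tiny "graph query": follow the buyer edge, then read the name.
buyers = {o for s, p, o in triples if p == "buyer"}
names = [o for s, p, o in triples if s in buyers and p == "name"]
print(names)
```

With real identifiers (e.g. the same buyer node shared across many notices), the same edge-following query would connect records that live in separate documents.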
8:10 AM
nightsh
pudo: who has sparql endpoints for procurement data?
8:11 AM
pudo
nightsh: I don't actually know. I don't yet :)
8:11 AM
there's a few EU projects that might
8:12 AM
georgianab
somebody needs me here
8:12 AM
will get back as soon as i can
8:13 AM
pudo
ok ciao
8:13 AM
sorry to rant on you :)
8:13 AM
nightsh
and pudo, what is barn>=0.4 in aleph's requirements.txt and where does it come from? :)
8:13 AM
pudo
aargh
8:14 AM
it's an old library I used
8:14 AM
you can just remove it apparently
8:15 AM
nightsh
I was about to try, but thought I'd search for stray module imports before doing that
8:18 AM
grep gives me nothing, I suppose it's safe-ish to get rid of it, thx
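A slightly more precise alternative to grep for checking whether a dependency is still used: parse each Python file and look only at real import statements. It assumes the package is imported under the same name it has in requirements.txt (`barn` here), and like grep it cannot see dynamic imports via `importlib` or `__import__`.

```python
import ast
import pathlib

def imports_in_source(source, module):
    """Return line numbers where `module` (or one of its submodules) is imported."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an absolute import statement
        if any(n == module or n.startswith(module + ".") for n in names):
            hits.append(node.lineno)
    return sorted(hits)

def find_imports(root, module):
    """Yield (path, line) pairs for every static import of `module` under `root`."""
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        try:
            lines = imports_in_source(path.read_text(encoding="utf-8"), module)
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue  # skip files that are not valid UTF-8 Python 3 source
        for line in lines:
            yield path, line
```

Usage, assuming the checkout lives in a directory called `aleph`: `for path, line in find_imports("aleph", "barn"): print(f"{path}:{line}")`. Unlike a plain text search, this ignores comments and names like `barnyard`.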
8:23 AM
leow has quit
8:36 AM
leow joined the channel
8:52 AM
pudo: docpipe is also MIA :)
8:53 AM
pudo
can you try with the openoil branch applied?
8:53 AM
I think I sorted out the errors in that
8:53 AM
FWIW, barn -> archivekit, docpipe -> loadkit
8:55 AM
nightsh
I'm slow ATM skype-ing, but sure :)
8:55 AM
pudo
sorry :) I'll be around all day in any case
9:31 AM
georgianab
pudo: i'm back
9:31 AM
pudo
hey hey georgianab
9:32 AM
georgianab
you were speaking about rdf and having graph queries
9:32 AM
pudo
I'm very sorry about that
9:32 AM
but yes
9:32 AM
georgianab
why are you sorry
9:33 AM
pudo
because I brought up RDF
9:33 AM
it's akin to a war crime
9:33 AM
georgianab
it's great to know something that cool will come up
9:33 AM
pudo
haha
9:33 AM
so I'm a bit cynical about it
9:33 AM
but it's useful for making data integrate
9:34 AM
georgianab
i worked on a project before
9:34 AM
pudo
oh cool, what did it do with RDF?
9:34 AM
georgianab
when we investigated tenders in romania
9:34 AM
and we had a lot of XMLs
9:35 AM
but we had to build a relational db similar to the one they came from
9:36 AM
because it was hard to make sense of them otherwise
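A minimal sketch of the approach georgianab describes: parse XML notices and load them into a relational table so that questions across many files become a single query. The SOAP fetching step is left out, and the element names, sample values, and table layout are hypothetical, not the structure of the actual Romanian data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical payload; element and attribute names are illustrative only.
SAMPLE_XML = """
<notices>
  <notice id="1001">
    <authority>Primaria X</authority>
    <supplier>SC Exemplu SRL</supplier>
    <value currency="RON">250000</value>
  </notice>
  <notice id="1002">
    <authority>Primaria Y</authority>
    <supplier>SC Exemplu SRL</supplier>
    <value currency="RON">90000.50</value>
  </notice>
</notices>
"""

def load_notices(xml_text, conn):
    """Parse <notice> elements and upsert them into a `notices` table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS notices (
        id TEXT PRIMARY KEY, authority TEXT, supplier TEXT,
        value REAL, currency TEXT)""")
    rows = []
    for n in ET.fromstring(xml_text.strip()).iter("notice"):
        value = n.find("value")
        rows.append((n.get("id"), n.findtext("authority"), n.findtext("supplier"),
                     float(value.text), value.get("currency")))
    conn.executemany("INSERT OR REPLACE INTO notices VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
load_notices(SAMPLE_XML, conn)
# Once the data is relational, cross-document questions are one query.
for supplier, total in conn.execute(
        "SELECT supplier, SUM(value) FROM notices GROUP BY supplier"):
    print(supplier, total)
```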
9:36 AM
pudo
yeah I get that sense a lot
9:36 AM
georgianab
i think the rdf functionality you were talking about would have been priceless for that
9:36 AM
pudo
sharing db dumps might be easier in many cases
9:36 AM
georgianab
it wasn't a dump
9:37 AM
we got them through soap
9:37 AM
messy business
9:37 AM
pudo
ouch
9:37 AM
that's hard core
9:37 AM
georgianab
but it happens a lot
9:37 AM
because most state institutions work on Windows
9:37 AM
at least in Eastern Europe
9:37 AM
and that's how they roll
9:37 AM
:))
9:38 AM
pudo
old-skool :)
9:38 AM
georgianab
which is always bad in software
9:38 AM
getting back to it, i will take a look at your tool
9:39 AM
but i wanted to ask
9:39 AM
pudo
I need to write this all up...
9:40 AM
georgianab
for the aleph project, did you use es as the main datastore?
9:40 AM
pudo
so aleph is based on this thing called archivekit
9:40 AM
which basically manages a sort of virtual filesystem
9:40 AM
in production, that would be S3 or something like that
9:40 AM
so the whole index can be rebuilt from an S3 bucket
9:41 AM
which has all the actual source documents
9:41 AM
if you click on [cached] next to anything on aleph, it actually forwards you there
9:42 AM
georgianab
oh, i thought that was a db that stored everything
9:42 AM
pudo
damn, with an HTTPS warning
9:42 AM
no it's just flat files on S3
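A sketch of the principle pudo describes: the flat files are the source of truth, and the search index is derived data that can be dropped and rebuilt at any time. This is not archivekit's API; the `FileStore` class, the per-document `source` and `meta.json` layout, and the toy word index are hypothetical, and a local temporary directory stands in for the S3 bucket.

```python
import json
import pathlib
import tempfile

class FileStore:
    """Hypothetical stand-in for a blob store such as an S3 bucket.

    Each document is saved as its source file plus a JSON metadata sidecar;
    nothing else is persisted, so any index is derived data.
    """
    def __init__(self, root):
        self.root = pathlib.Path(root)

    def put(self, doc_id, content: bytes, meta: dict):
        folder = self.root / doc_id
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "source").write_bytes(content)
        (folder / "meta.json").write_text(json.dumps(meta))

    def iter_documents(self):
        for meta_path in sorted(self.root.glob("*/meta.json")):
            yield meta_path.parent.name, json.loads(meta_path.read_text())

def rebuild_index(store):
    """Rebuild a toy in-memory index (title word -> doc ids) from the store alone."""
    index = {}
    for doc_id, meta in store.iter_documents():
        for word in meta.get("title", "").lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

with tempfile.TemporaryDirectory() as tmp:
    store = FileStore(tmp)
    store.put("doc1", b"%PDF-...", {"title": "Road repairs contract"})
    store.put("doc2", b"%PDF-...", {"title": "School repairs"})
    index = rebuild_index(store)
    print(sorted(index["repairs"]))
```

In a real deployment the rebuild step would re-extract and re-index into Elasticsearch rather than into a dict, but the dependency direction is the same.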
9:42 AM
georgianab
interesting
9:43 AM
for adriana's project however
9:43 AM
pudo
9:44 AM
nightsh
pudo: got it past pip, diving into the thing now
9:44 AM
nightsh hides
9:45 AM
pudo
haha, awesome nightsh!
9:46 AM
georgianab
so for adriana's project i think i need to store them all in a db because there are some calculations and things i need to do on that data
9:46 AM
pudo
what kind of DB were you thinking of?
9:47 AM
SQL would be really interesting, but there's so much stuff you can also do with elastic now
9:47 AM
georgianab
i was thinking about mongo because there are several levels of nesting
9:47 AM
since, i assume, in TED they were kept as flat files
9:48 AM
pudo
oh we're talking about TED? I thought you meant documents?
9:48 AM
georgianab
well, they are documents
9:48 AM
that come from ted
9:49 AM
as far as i know
9:49 AM
pudo
is it the actual TED web pages?
9:49 AM
georgianab
no, there are some csv's
9:49 AM
that contain only some of the fields in documents
9:50 AM
let me give you a sample
9:50 AM
pudo
please
9:51 AM
georgianab
oh, i got it all wrong
9:52 AM
sorry
9:54 AM
so, the thing is that i have some documents whose fields need to be stored in a db that we are going to use (alongside es), and i also need to map those fields to a csv structure and export that
9:55 AM
pudo
so all of that minus the database is in aleph
9:55 AM
you can index documents and extract any sort of field you like
9:55 AM
and then export these as a table
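For the CSV export side, one common approach (not necessarily what aleph does) is to flatten nested records into dotted column names and take the union of columns across all records. The sample records are hypothetical. Lists are joined into a single cell here, which is a simplification; some exports need one row per list item instead.

```python
import csv
import io

def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. tender.value.amount."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, column + "."))
        elif isinstance(value, list):
            # One cell per list; a real export might emit one row per item instead.
            flat[column] = "; ".join(str(v) for v in value)
        else:
            flat[column] = value
    return flat

def to_csv(records):
    """Write records as CSV; columns missing from a record are left empty."""
    rows = [flatten(r) for r in records]
    columns = sorted({c for row in rows for c in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

docs = [
    {"ocid": "2015-123", "buyer": {"name": "City of Bucharest"},
     "tender": {"title": "Road repairs", "value": {"amount": 150000, "currency": "RON"}}},
    {"ocid": "2015-124", "buyer": {"name": "City of Cluj"},
     "tender": {"title": "School supplies", "items": ["paper", "pens"]}},
]
print(to_csv(docs))
```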
9:56 AM
georgianab
indeed
9:57 AM
pudo
hooking in a database should not be very hard, but it's a bit of work
9:57 AM
georgianab
that's the thing i wanted to ask about, because i'm slightly undecided about how to design that db architecture
9:58 AM
pudo
where does the structured data come from, are you pulling it out of the docs or is it in a separate dataset?
10:04 AM
georgianab
out of the docs
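One possible relational layout for fields pulled out of documents, sketched with SQLite: a `documents` table pointing at the original file plus a generic name/value table for extracted fields (an entity-attribute-value design). This is an assumption offered for the architecture question above, not something settled in the chat; the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    id TEXT PRIMARY KEY,
    source_key TEXT            -- hypothetical: where the original file lives
);
CREATE TABLE document_fields (
    doc_id TEXT REFERENCES documents(id),
    name TEXT,                 -- dotted field name, e.g. 'tender.value.amount'
    value TEXT,                -- stored as text; cast when calculating
    PRIMARY KEY (doc_id, name)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO documents VALUES (?, ?)", ("doc1", "docs/doc1.pdf"))
conn.execute("INSERT INTO documents VALUES (?, ?)", ("doc2", "docs/doc2.pdf"))
conn.executemany(
    "INSERT INTO document_fields VALUES (?, ?, ?)",
    [("doc1", "buyer.name", "City of Bucharest"),
     ("doc1", "tender.value.amount", "150000"),
     ("doc2", "buyer.name", "City of Cluj"),
     ("doc2", "tender.value.amount", "90000.5")],
)
# Calculations happen in SQL; the same rows could be flattened for CSV or ES.
total = conn.execute(
    "SELECT SUM(CAST(value AS REAL)) FROM document_fields "
    "WHERE name = 'tender.value.amount'"
).fetchone()[0]
print(total)
```

The trade-off: a name/value table absorbs new fields without schema changes, but every calculation needs casts and pivots. If the set of OCDS fields of interest is fixed, one typed column per field is simpler to query.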