pwalsh: how's it going :) you seem to be rocking it today
pwalsh
hey pudo!
I'm working on a few projects, and getting up to speed after holidays
about the MT2 mock APIs, shall we do that via github issues?
pudo
pwalsh: github issues would be good. I get the sense we may want two APIs to access the thing
one would be the strict API, which assumes a bunch of stuff, e.g. that the header is row 1
the other one is the DIY one
where you get a tuple stream and apply a set of analysers and processors
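A minimal sketch of the "DIY" API described above, assuming a processor is any function that maps a stream of row tuples to another stream, and an analyser is any function that inspects rows and returns a finding. The names `strip_cells`, `drop_blank_rows`, `max_width` and `run_pipeline` are hypothetical illustrations, not part of any existing MT2 or GT API.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, ...]
# Assumption: a processor is any function from a row stream to a row stream.
Processor = Callable[[Iterator[Row]], Iterator[Row]]


def strip_cells(rows: Iterator[Row]) -> Iterator[Row]:
    """Hypothetical processor: trim surrounding whitespace in every cell."""
    for row in rows:
        yield tuple(cell.strip() for cell in row)


def drop_blank_rows(rows: Iterator[Row]) -> Iterator[Row]:
    """Hypothetical processor: skip rows whose cells are all empty strings."""
    for row in rows:
        if any(row):
            yield row


def max_width(rows: Iterable[Row]) -> int:
    """Hypothetical analyser: report the widest row seen (hints at ragged tables)."""
    return max((len(row) for row in rows), default=0)


def run_pipeline(rows: Iterable[Row], processors: List[Processor]) -> Iterator[Row]:
    """Chain processors lazily over a plain tuple stream, in the given order."""
    stream: Iterator[Row] = iter(rows)
    for process in processors:
        stream = process(stream)
    return stream
```

For example, `list(run_pipeline([(" a ", "b"), ("", "")], [strip_cells, drop_blank_rows]))` keeps only `("a", "b")`. The strict API could then be this same machinery with a fixed, opinionated list of processors.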
pwalsh
in GT, i assume the header is row one, or allow the caller to pass in an explicit index (and strictly no guessing). That is what I'd prefer as the base API as far as headers go
sorry, row 0, you know what i mean
definitely a tuple stream, yes
i was thinking that it might be possible to have a keyed option, to return dicts or named tuples if true, but that means we have to handle invalid dimensions etc. in the core API. perhaps it's better to just return tuples of values, and the headers separately, then the caller can make dicts or named tuples or whatever if required. WDYT?
pudo
definitely need to handle the case in which there is no header
i.e. no column names
pwalsh
sure
pudo
it might not be so bad to do column_0, column_1 etc. ....
because with totally unnamed rows it will also be impossible to link all of this to JTS somehow
pwalsh
I'd assume the constructor may default to header_index=0. And header_index=None could be passed for no headers. Or I think in GT I took a headers arg, an iterable, and if that is passed, I do not look for headers in the stream. something like that
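A minimal sketch of the constructor options discussed here, assuming a hypothetical `DataTable` class (not the actual GT code): `header_index=0` by default, `header_index=None` for no header row, and an explicit `headers` iterable that disables any header lookup in the stream. Skipping rows above the header row, and materialising rows into a list, are simplifying assumptions.

```python
from typing import Iterable, Iterator, Optional, Sequence, Tuple


class DataTable:
    """Hypothetical base table: yields plain value tuples, never guesses headers."""

    def __init__(self, rows: Iterable[Sequence[str]],
                 header_index: Optional[int] = 0,
                 headers: Optional[Iterable[str]] = None):
        # Simplification for the sketch: materialise the stream into a list.
        self._rows = [tuple(r) for r in rows]
        if headers is not None:
            # Explicit headers win: do not look for a header row in the stream.
            self.headers = tuple(headers)
            self._body_start = 0
        elif header_index is None:
            # Caller says there is no header row at all.
            self.headers = None
            self._body_start = 0
        else:
            # Header row at a fixed, caller-supplied index (0 by default).
            self.headers = self._rows[header_index]
            # Assumption: anything above the header row is dropped.
            self._body_start = header_index + 1

    def __iter__(self) -> Iterator[Tuple[str, ...]]:
        # Body rows only, as plain tuples; no keying, no dimension checks.
        return iter(self._rows[self._body_start:])
```

With `rows = [("id", "name"), ("1", "a")]`, `DataTable(rows).headers` is `("id", "name")`, while `DataTable(rows, header_index=None)` treats both rows as data.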
pudo
nice.
so shall we assume fake column names by default?
pwalsh
what do you mean by fake?
pudo
well if we assume that every cell has a column name it makes the whole model simpler
but then, for tables where we have no info, we need to make them up
pwalsh
well yeah, i started playing with it precisely when messing in ipython to think about this :)
so, you are saying that if headers=None and header_index=None, then we generate dummy headers
that still doesn't help with rows that have invalid dimensions
as instantiating a named tuple would fail
due to length mismatch
unless i misunderstand, and you want to generate fake headers when iterating, per row
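A small sketch of the dummy-header idea and the named-tuple problem raised above. `dummy_headers` and `try_named` are hypothetical helpers; taking the header width from the first row is an assumption, since a header-less table offers nothing better. The relevant real behaviour is that `namedtuple._make` raises `TypeError` when the number of values doesn't match the number of fields.

```python
from collections import namedtuple
from typing import Optional, Tuple


def dummy_headers(width: int) -> Tuple[str, ...]:
    """Placeholder names column_0, column_1, ... for a table with no header row."""
    return tuple(f"column_{i}" for i in range(width))


# Assumption: width is taken from the first row of a header-less table.
first_row = ("1", "alice", "x")
Row = namedtuple("Row", dummy_headers(len(first_row)))


def try_named(row: Tuple[str, ...]) -> Optional[Tuple[str, ...]]:
    """Build a Row, or return None when the row length doesn't match the header width."""
    try:
        return Row._make(row)
    except TypeError:
        # namedtuple._make raises TypeError on a length mismatch, short or long.
        return None
```

So a row like `("2", "bob")` cannot become a `Row` at all, which is why generating fake headers alone doesn't solve invalid dimensions in the base iterator.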
my feeling is that we have datatable.headers, which is a regular old tuple, and then each row is also a regular old tuple. then the caller can dict/zip them if required. that removes the need to handle invalid dimensions from the base datatable iterator, and pushes that logic up somewhere else, like a wrapping class that may do more MT stuff like guessing things
excuse my spelling, I'm notorious for typing errors :)
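A sketch of the caller side of this proposal, assuming the base iterator only provides a `headers` tuple and plain row tuples. `as_dicts` and `ragged_rows` are hypothetical helper names. One caveat worth knowing: the built-in `zip()` silently truncates, so keying rows this way hides dimension problems unless the caller checks widths separately.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Cells = Tuple[str, ...]


def as_dicts(headers: Cells, rows: Iterable[Cells]) -> Iterator[Dict[str, str]]:
    """Caller-side keying: zip the headers onto each plain value tuple.

    zip() stops at the shorter input: extra cells are dropped and short rows
    simply lack trailing keys, so this is a convenience, not validation.
    """
    for row in rows:
        yield dict(zip(headers, row))


def ragged_rows(headers: Cells, rows: Iterable[Cells]) -> List[int]:
    """Indexes of rows whose width differs from the header width."""
    return [i for i, row in enumerate(rows) if len(row) != len(headers)]
```

With `headers = ("id", "name")` and `rows = [("1", "alice"), ("2", "bob", "extra"), ("3",)]`, `as_dicts` yields three dicts without complaint, while `ragged_rows` flags rows 1 and 2. That check is the logic that would live in the wrapping class rather than the base iterator.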
pudo
ok, so the iterator would be for pure value tuples and then there'd be some metadata thingie somewhere that keeps the typing info and makes an OrderedDict available?
pwalsh
i think it depends on where we draw the line with the base class for datatable iteration. I think what i described above is the most basic requirement, and spitting that out, consistently, from the various input formats we desire is a solid start.
the next API layer as far as I can see is the typecasting. and i guess there, IF we have a schema, and IF the row matches and is castable (per cell), we probably want the cast values, and/or the cast + raw
but then there is a decision to make with fields that are not described in the schema/are invalid in terms of table dimensions, because obviously this layer requires key/value pairs
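A sketch of the typecasting layer on top of the plain tuple stream, keeping both raw and cast values per field. The schema shape here (field name mapped to a Python cast function) is an assumption for illustration, not JSON Table Schema syntax, and `cast_rows`/`CastRow` are hypothetical names. The two open decisions from the discussion are resolved one possible way: fields missing from the schema pass through uncast, and cells beyond the header width are reported in `extra` because there is no key to attach them to.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, Iterable, Iterator, List, NamedTuple, Tuple

# Hypothetical schema shape: field name -> cast function (not JSON Table Schema syntax).
Schema = Dict[str, Callable[[str], Any]]


class CastRow(NamedTuple):
    values: Dict[str, Tuple[str, Any]]  # field name -> (raw, cast value), in header order
    uncastable: List[str]               # fields whose cast raised ValueError
    extra: Tuple[str, ...]              # cells beyond the header width


def cast_rows(headers: Tuple[str, ...], rows: Iterable[Tuple[str, ...]],
              schema: Schema) -> Iterator[CastRow]:
    """Typecasting layer: needs key/value pairs, so it keys each row by the headers."""
    for row in rows:
        values: Dict[str, Tuple[str, Any]] = OrderedDict()
        uncastable: List[str] = []
        # zip stops at the shorter side, so short rows simply lack trailing fields.
        for name, raw in zip(headers, row):
            cast = schema.get(name)
            if cast is None:
                # One possible policy: fields not in the schema pass through uncast.
                values[name] = (raw, raw)
                continue
            try:
                values[name] = (raw, cast(raw))
            except ValueError:
                # Keep the raw value and record the failure instead of dropping the row.
                values[name] = (raw, None)
                uncastable.append(name)
        yield CastRow(values, uncastable, tuple(row[len(headers):]))
```

With `schema = {"id": int, "price": float}`, a row `("x", "2", "note", "spill")` under headers `("id", "price", "note")` yields `id` as uncastable, `price` cast to `2.0`, `note` passed through raw, and `("spill",)` as extra.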