#openspending

      • rgrp joined the channel
      • rgrp has quit
      • rgrp joined the channel
      • rgrp has quit
      • rgrp joined the channel
      • rgrp has quit
      • rgrp joined the channel
      • TheInfinity joined the channel
      • rgrp joined the channel
      • rgrp has quit
      • rgrp joined the channel
      • pudo
        pwalsh: how's it going :) you seem to be rocking it today
      • switch2mac joined the channel
      • switch2mac has quit
      • everton137 joined the channel
      • pwalsh
        hey pudo!
      • I'm working on a few projects, and getting up to speed after holidays
      • about the MT2 mock APIs, shall we do that via github issues?
      • pudo
        pwalsh: github issues would be good. I get the sense we may want two APIs to access the thing
      • one would be the strict API, which assumes a bunch of stuff, e.g. that the header is row 1
      • the other one is the DIY one
      • where you get a tuple stream and apply a set of analysers and processors
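
A minimal sketch of the "DIY" idea pudo describes, a tuple stream passed through analysers and processors. The function name `run_pipeline` and the convention that a processor returns `None` to drop a row are assumptions made for illustration, not part of any API agreed in this conversation.

```python
def run_pipeline(rows, analysers=(), processors=()):
    """Yield value tuples after applying analysers and processors.

    Hypothetical helper: analysers only observe each raw row,
    processors return a (possibly modified) row, or None to drop it.
    """
    for row in rows:
        row = tuple(row)
        # Analysers see the row as it arrived; their return value is ignored.
        for analyse in analysers:
            analyse(row)
        # Processors are applied in order; None short-circuits the chain.
        for process in processors:
            row = process(row)
            if row is None:
                break
        if row is not None:
            yield row


def strip_cells(row):
    """Example processor: trim whitespace from every cell."""
    return tuple(cell.strip() for cell in row)


def drop_blank(row):
    """Example processor: drop rows whose cells are all empty."""
    return row if any(row) else None
```

Usage might look like `list(run_pipeline(stream, analysers=[seen.append], processors=[strip_cells, drop_blank]))`, where `seen` is a plain list collecting the raw rows.
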
      • pwalsh
        in GT, i assume the header is row one, or allow the caller to pass in an explicit index (and strictly no guessing). That is what I'd prefer as the base API as far as headers go
      • sorry, row 0, you know what i mean
      • definitely a tuple stream, yes
      • i was thinking that it might be possible to have a keyed option, to return dicts or named tuples if true, but that means we have to handle invalid dimensions etc in the core API. perhaps better to just return tuples of values, and the headers separately, then the caller can make dicts or named tuples or whatever if required. WDYT?
      • pudo
        definitely need to handle the case in which there is no header
      • i.e. no column names
      • pwalsh
        sure
      • pudo
        it might not be so bad to do column_0, column_1 etc. ....
      • because with totally unnamed rows it will also be impossible to link all of this to JTS somehow
      • pwalsh
        I'd assume the constructor may default to header_index=0. And header_index=None could be passed for no headers. Or I think in GT I took a headers arg, an iterable, and if that is passed, I do not look for headers in the stream. something like that
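
The constructor behaviour pwalsh outlines could be sketched roughly as follows. The class name `DataTable` and the choice to skip any rows above `header_index` are assumptions for illustration; this is not the actual GT or MT2 code.

```python
class DataTable:
    """Hypothetical base iterator: headers kept apart from value tuples."""

    def __init__(self, rows, header_index=0, headers=None):
        self._rows = iter(rows)
        if headers is not None:
            # Explicit headers win: never look for them in the stream.
            self.headers = tuple(headers)
        elif header_index is not None:
            # Assumption: rows above the header row are discarded.
            for _ in range(header_index):
                next(self._rows, None)
            header_row = next(self._rows, None)
            self.headers = tuple(header_row) if header_row is not None else None
        else:
            # header_index=None and no headers argument: the table has no header.
            self.headers = None

    def __iter__(self):
        # Rows are plain tuples; no dimension checks happen at this level.
        for row in self._rows:
            yield tuple(row)
```

Note that a row with the wrong number of cells passes through untouched, in line with the "strictly no guessing" preference above.
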
      • pudo
        nice.
      • so shall we assume fake column names by default?
      • pwalsh
        what do you mean by fake?
      • pudo
        well if we assume that every cell has a column name it makes the whole model simpler
      • but then, for tables where we have no info, we need to make them up
      • it's supposed to save all the memory etc.
      • pwalsh
        well yeah, i started playing with it precisely when messing in ipython to think about this :)
      • so, you are saying that: if headers=None, if header_index=None, then generate dummy headers
      • that still doesn't help with rows that have invalid dimensions
      • as instantiating a named tuple would fail
      • due to length mismatch
      • unless i misunderstand, and you want to generate fake headers when iterating, per row
      • my feeling is that we have datatable.headers, which is a regular old tuple, and then each row is also a regular old tuple. then the caller can dict/zip them if required. that removes the need to handle invalid dimensions in the base datatable iterator, and pushes that logic up somewhere else, like a wrapping class that may do more MT stuff like guessing things
      • excuse my spelling, I'm notorious for typing errors :)
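
To illustrate the split discussed above, here is a sketch of a wrapping layer that sits on top of plain header and row tuples. The helper names `dummy_headers` and `to_records`, and the decision to derive the dummy header width from the first row, are assumptions; the conversation leaves those details open.

```python
def dummy_headers(width):
    """Make up names in the column_0, column_1, ... style mentioned above."""
    return tuple(f"column_{i}" for i in range(width))


def to_records(headers, rows):
    """Zip headers with rows outside the base iterator.

    Hypothetical wrapper: rows whose length does not match the headers
    are collected separately instead of failing the whole stream.
    """
    records, invalid = [], []
    for row in rows:
        row = tuple(row)
        if headers is None:
            # Assumption: the first row decides how many dummy names we need.
            headers = dummy_headers(len(row))
        if len(row) == len(headers):
            records.append(dict(zip(headers, row)))
        else:
            invalid.append(row)
    return headers, records, invalid
```

Because the mismatch handling lives here, the base iterator never needs to know whether the caller wants dicts, named tuples, or raw tuples.
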
      • pudo
        ok, so the iterator would be for pure value tuples and then there'd be some metadata thingie somewhere that keeps the typing and makes an OrderedDict available?
      • TheInfinity has quit
      • TheInfinity joined the channel
      • pwalsh
        i think it depends on where we draw the line with the base class for datatable iteration. I think what i described above is the most basic requirement, and spitting that out, consistently, from the various input formats we desire is a solid start.
      • the next API layer as far as I see is the typecasting. and i guess there, IF we have a schema, and IF the row matches and is castable (per cell), we probably want the cast values, and/or the cast + raw
      • but then there is a decision to make with fields that are not described in the schema/are invalid in terms of table dimensions, because obviously this layer requires key/value pairs
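
The typecasting layer could look something like the sketch below. The schema shape (a dict mapping field name to a cast callable) and the function name `cast_row` are assumptions. Passing fields that are missing from the schema through uncast is just one possible answer to the open question raised above, not a decision taken in this chat.

```python
def cast_row(headers, row, schema):
    """Return {name: (raw, cast)} for a row, or None if it cannot be cast.

    Hypothetical helper: `schema` maps a field name to a callable such as
    int or float. Fields absent from the schema keep their raw value.
    """
    if len(row) != len(headers):
        # Invalid dimensions: this layer needs key/value pairs, so give up.
        return None
    result = {}
    for name, raw in zip(headers, row):
        caster = schema.get(name)
        if caster is None:
            # Assumption: undescribed fields are passed through as-is.
            result[name] = (raw, raw)
            continue
        try:
            result[name] = (raw, caster(raw))
        except (TypeError, ValueError):
            # One uncastable cell makes the whole row uncastable.
            return None
    return result
```

Keeping both the raw and the cast value per cell covers the "cast + raw" option; a caller that only wants cast values can take the second element of each pair.
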
      • pudo
        I think you want to do headers before types
      • because they're always of a different type
      • pwalsh
        ok
      • SCK joined the channel
      • SCK has left the channel
      • leow joined the channel
      • TheInfinity joined the channel
      • leow has quit
      • TheInfinity has quit
      • leow joined the channel