i think it isn't chrome that inserts the byte order mark, we put it in ourselves in the Python version, and i'm not sure how dansmith_btc did it in the JS version, maybe it's in that pako thing?
linagee
annoying how chrome strips http/https off of things quite often, hah. (and google too)
you don't need to know! :)
dansmith_btc
waxwing, js code only adds the utf bom when the content type is html, it doesn't add it for e.g. pdf
rename html-1 to something.zip, unzip and behold :)
linagee
?????
matches hash?
waxwing
well i haven't checked yet because didn't download original, but unzipped and i can see the code repo
linagee
exciting. :)
waxwing
i like functional definitions of correctness :)
but of course you're right hash check is more scientific
linagee
how do I get the zip?
ah, html-1
waxwing
yes. important note: the forbrowser- version contains the byte order mark for display
the non-forbrowser one (html-1) is the raw file with the headers
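[A minimal sketch of what recovering the raw bytes from the forbrowser- file would look like, assuming the mark is the standard UTF-8 BOM; this is not PageSigner's own code.]

```python
# Strip a leading UTF-8 byte order mark (b"\xef\xbb\xbf") that was
# prepended for browser display, recovering the original bytes.
import codecs

def strip_bom(raw: bytes) -> bytes:
    bom = codecs.BOM_UTF8
    return raw[len(bom):] if raw.startswith(bom) else raw
```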
linagee
I think sha1sum mismatches... hrm...
waxwing
but dechunked and gunzipped if i remember right. so my unzip tool must have just ignored the headers i guess
linagee
ah what the hell. :)
coming across your first obstacle. redirect. :)
waxwing
when you reach the endpoint, remember that html-1 contains the http headers... but if you just unzip it (at least here on ubuntu) you get the contents of the zip file
so here's where we are: .zip is at least possible outside browser, but in browser there may be some rough edges we have to work on. The other issue is that large sizes do not seem to work, the oracle doesn't like it.
linagee
did you say there was a version with no headers?
waxwing
linagee: no, the version with headers is the *exact* response from the server, with headers and with no byte order mark.
if your zip tool does not support having headers, just strip them out.
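[Stripping the headers amounts to cutting everything up to the blank line that ends an HTTP header block. A sketch, assuming the usual CRLF-CRLF separator; the filenames are illustrative.]

```python
# Drop the HTTP response headers from a raw capture like "html-1" so a
# strict zip tool will accept the remaining body.
def strip_http_headers(raw: bytes) -> bytes:
    """Return everything after the first blank line ending the headers."""
    sep = raw.find(b"\r\n\r\n")
    return raw if sep == -1 else raw[sep + 4:]

# usage: body = strip_http_headers(open("html-1", "rb").read())
#        open("payload.zip", "wb").write(body)
```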
(now to try my original instead of this github zip)
waxwing
huh? that must mean the byte order mark wasn't added there. i'll check the code again. anyway, details, basically it works.
linagee
I fear it may need some cookie or something, but I'll give it a go
waxwing
linagee: your original won't work I bet, 5MB is prob. too big. but worth a try.
belcher is now known as Guest94843
linagee: yes. part of the browserless version is that you can add headers from a headers file.
linagee
waxwing: it was kind of slow even doing 161kB. like maybe... 15 seconds or so, hah.
belcher joined the channel
waxwing
you can see in the APIexamples directory how i use headers to do a bitfinex API request
linagee
nice
waxwing
so if it needs cookies you can do that
linagee
aha
my goal is to deliver the head of the opensnp project a sample .pgsg with instructions on how to consume it, then to get their server scripts to be able to verify it and deal with all of that. :)
waxwing
if you want to do something automated then browserless is the way to go, of course, but you'll have to pay attention to certificate verification
linagee
weird. zero content length reply delivered. hah. yep, probably a cookie problem.
(will try curl too...)
waxwing
yeah you'll need to read off your client headers somehow
it's easier with APIs that are documented because they tell you exactly what to do
linagee
wget failed. trying inserting cookies with wget...
wow, so it might be hard to do this!
tried cookies... hrm....
(maybe they're inspecting referer or something??)
waxwing
linagee: ideally you just want to copy the whole set of headers; that's what happens in the browser version
with something like an API it's easy, they tell you exactly what to use
linagee
aha. :)
waxwing
if there are session cookies that get timed out quickly, that can cause a problem of course, but it's rare. things like referer? maybe, i'm not sure.
just try to copy the entire set of headers if possible
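[Copying the whole header set mostly means folding the browser's cookies into one Cookie header alongside the other request headers. A sketch; the header names and cookie values below are illustrative, not the site's actual requirements.]

```python
# Build a Cookie header from cookies copied out of the browser, plus the
# other headers some sites check (User-Agent, Referer, ...).
def cookie_header(cookies: dict) -> str:
    return "; ".join(f"{k}={v}" for k, v in cookies.items())

headers = {
    "User-Agent": "Mozilla/5.0",        # match the browser you copied from
    "Referer": "https://example.com/",  # some sites inspect this too
    "Cookie": cookie_header({"session": "abc123", "csrf": "xyz"}),
}
```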
linagee
I think they at least give me like 5 minutes (at least that's what I've seen in the browser)
waxwing
yeah, usually it's longer than that i guess (sometimes a *lot* longer)
linagee
:)
waxwing
i guess you already saw, you can do a -e option for headers file. although it isn't hugely well tested, but i think it works.
linagee
ah I see that now. :)
that's all the bitfinex example is doing, I think: building something to pass to -e
(bitfinexAPI)
in json format
dansmith_btc has quit
waxwing
not important, but i know now why the forbrowser- version didn't have byte order mark: it wasn't transfer-encoding: chunked. in those cases we don't alter the body of the response at all
dansmith_btc joined the channel
linagee
I feel like I'm so close to having this working. Things expire before I can get all the headers prepared. :)
dansmith_btc has quit
I just need one working example, come on! :)
waxwing
if it's really time critical like that, that's very unusual. you might need the browser version (and you might need it in any case if you want ordinary users to produce the data)
dansmith_btc joined the channel
linagee
waxwing: I think I just got it to work, but I think they are custom making a zip file every time...
ah of course. I will compare the sha1sum of the extracted output, duh. :-D
yep. that was the only difference. the .txt file itself has a different timestamp appended to it, which changes the .zip
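[Since each server-built zip embeds a fresh timestamp, the comparison that works is hashing the extracted member rather than the archive itself. A sketch; the member name is illustrative.]

```python
# Hash an extracted zip member so timestamp-only differences in the
# archive container don't break the comparison.
import hashlib
import io
import zipfile

def sha1_of_member(zip_bytes: bytes, member: str) -> str:
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return hashlib.sha1(zf.read(member)).hexdigest()

# usage: sha1_of_member(open("html-1.zip", "rb").read(), "genome.txt")
```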
awesome!
will try to import .pgsg into chrome extension now and see what smoke pours out. :-D
dansmith_btc has quit
waxwing
linagee: just to let you know, the .pgsg can also be verified browserless
linagee
cool. will have to figure out how that works, then I'll have a working demo. :)
waxwing
python auditor.py <filename>
confusingly, also in the auditee directory :)
didn't clean up the naming yet
linagee
well it took nearly forever, but I think it actually worked. (my browser is still "patched", no idea if that would affect this part.)
the .pgsg imported and immediately delivered me a .zip (just as if I downloaded it)
and the .txt file looks intact.
waxwing
ok that's good. my guess is it does need the patch but would have to double check
linagee
patch is against the function "writeDatafile" in main.js, so not entirely sure.
(and deals with headers)
dansmith_btc joined the channel
wow, it makes things really slow with pagesigner, hah. "did manage files", and it is hung there...
(on the initial import, chrome was saying "do you want to kill this thing, it appears frozen?")
waxwing
yeah a lot of decryption in js aes .. not sure if that's causing it, would have to think. that's my first thought though
linagee
(took maybe... 2 or 3 minutes. "forever", lol.)
ok, patch is required. it spit a .html file at me without it.
so that is definitely a good step in the right direction / likely merge worthy.
waxwing
makes sense that js aes decryption is tons slower than C .. and python is even worse, believe me :)
linagee
I think the final conclusion here is that, large .zip files can be downloaded with -browserless, used by chrome extension using .pgsg import (with the patch)
would be nice to have browser "full loop", but it seems some other timeout problems were brought up there. (it's fine for now, will use -browserless)
waxwing
linagee: i wouldn't put it quite like that; i think large files are a problem oracle side. i think what makes it work in browserless that doesn't work in browser, can be fixed.
browserless is mainly just for when you need automated solutions
linagee
ah ok. :-)
(and it was quite a pain to manually push around the DOZEN plus cookies 23andme was requiring, hah....)
waxwing
but saying that, you have a good point that MB files are going to kill the browser because of the decryption.
there's a lot of factors involved here .. the python script browserless does have the virtue of making everything very "bare bones", so in edge cases it can help you out
but if you have headers issues, that's an annoyance