right, so what we have here is a wrapper that just copies the terminal codes, with no attempt to parse them, read metadata, fix encodings, etc.
that'd make for a good starting point
sickill
yes
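A minimal sketch of such a copy-only reader. It assumes the usual ttyrec layout: each frame is a 12-byte header holding seconds, microseconds and payload length as little-endian unsigned 32-bit integers, followed by the payload. The names read_frames and frames_from_bytes are hypothetical and are not taken from the script discussed here:

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple

# ttyrec frame header: seconds, microseconds, payload length,
# each a little-endian unsigned 32-bit integer (12 bytes in total).
HEADER = struct.Struct("<III")


def read_frames(f: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp_in_seconds, raw_payload) for each frame, untouched:
    no parsing, no metadata, no encoding fixes."""
    while True:
        header = f.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of file (a truncated trailing header is dropped)
        sec, usec, length = HEADER.unpack(header)
        payload = f.read(length)
        if len(payload) < length:
            return  # truncated final frame: stop rather than guess
        yield sec + usec / 1_000_000, payload


def frames_from_bytes(data: bytes) -> List[Tuple[float, bytes]]:
    """Convenience wrapper for in-memory recordings (handy in tests)."""
    return list(read_frames(io.BytesIO(data)))
```

Because read_frames is a generator, frames are read lazily, so a large recording never has to fit in memory at once.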
ais523
something that's perhaps not obvious is that the timestamp of the first frame often reflects the date when the ttyrec was created
sickill
the write_stdout method accepts either string or bytes, and if it's bytes it does incremental utf-8 decoding
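The point of incremental decoding, as a sketch: keep one decoder for the whole stream, so a UTF-8 character split across two frames is held back until it is complete instead of being replaced. OutputSink is a hypothetical stand-in, not asciinema's actual class; only the codecs calls are real standard-library API:

```python
import codecs
from typing import List, Union


class OutputSink:
    """Hypothetical writer whose write_stdout accepts str or bytes."""

    def __init__(self) -> None:
        # One decoder for the whole stream: a multi-byte UTF-8 character
        # split across two writes is buffered rather than mangled.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self.events: List[str] = []

    def write_stdout(self, data: Union[str, bytes]) -> None:
        if isinstance(data, bytes):
            data = self._decoder.decode(data)
        if data:  # the decoder may hold back an incomplete sequence
            self.events.append(data)
```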
ais523
(not always, some recorders will zero it, but that tends to be fairly common)
there are also some codes that can appear in the data payload for terminal size, encoding, etc.
although most ttyrecs use old recorders that don't add them
sickill
so the source encoding detection should happen somewhere in that loop after reading from file
ok, I see
I only implemented the very basic spec for starters
but that's actually a good point about the first timestamp being the actual timestamp of when recording started
we can "if" it
ais523
I think a "proper" ttyrec reader would need to concatenate all the data, re-encode it and extract control signals for things like terminal-size, then split it again at boundaries
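One way to get the effect of "concatenate, re-encode, split again" without physically concatenating, sketched under the assumption that the re-encoding step can be written as a stateful filter that consumes one byte at a time. State carries across frame boundaries, and whatever the filter emits for a byte stays in the frame that byte came from. refilter and utf8_feeder are hypothetical names, and the UTF-8 filter is only a placeholder for a real re-encoder:

```python
import codecs
from typing import Callable, Iterable, Iterator, Tuple


def refilter(frames: Iterable[Tuple[float, bytes]],
             feed: Callable[[int], str]) -> Iterator[Tuple[float, str]]:
    """Run all payloads through one stateful byte-at-a-time filter as if
    they were a single stream, keeping the original frame boundaries:
    output produced for a byte is attributed to that byte's frame."""
    for timestamp, payload in frames:
        yield timestamp, "".join(feed(b) for b in payload)


def utf8_feeder() -> Callable[[int], str]:
    """Placeholder filter: incremental UTF-8 decoding, one byte at a time."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    return lambda b: decoder.decode(bytes([b]))
```

A character split across two frames ends up in the frame holding its last byte, which is exactly what a naive per-frame decode gets wrong.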
sickill
if it's > some value that looks like seconds since epoch then we can use it to set timestamp value in asciicast's header
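A sketch of that check. The cutoff (1990-01-01T00:00:00Z, i.e. 631152000) is an arbitrary assumption; recordings whose first timestamp was zeroed fall far below it. Event times in asciicast v2 are relative to the start of the recording, so a second helper rebases them. Both function names are hypothetical:

```python
from typing import List, Optional

# Assumed cutoff: 1990-01-01T00:00:00Z. A first-frame time at or after this
# is taken as real wall-clock time; zeroed timestamps fall far below it.
MIN_PLAUSIBLE_EPOCH = 631_152_000


def header_timestamp(first_frame_time: float) -> Optional[int]:
    """Value for the asciicast header's "timestamp" field, or None to omit it."""
    if first_frame_time >= MIN_PLAUSIBLE_EPOCH:
        return int(first_frame_time)
    return None


def relative_times(times: List[float]) -> List[float]:
    """Rebase absolute frame times so the recording starts at 0.0."""
    if not times:
        return []
    start = times[0]
    return [t - start for t in times]
```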
ais523
hmm, eventually I'm going to write a ttyrec re-encoder that converts them into a consistent format but that's /way/ down the line
because it involves me finishing my terminal parsing and rendering libraries first
sickill
are resize control signals anything special? there's a family of (private) control sequences to trigger a window resize; is that what you're referring to?
ais523
right, ttyrecs just use the standard resize code to indicate terminal size
just like they use the standard change-encoding code to indicate encoding
that way it's backwards-compatible with old players
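Assuming "the standard resize code" means xterm's window-manipulation sequence ESC [ 8 ; rows ; cols t, a scanner for it can be very small. This sketch does not handle the sequence being split across frames or parameters being omitted, so it is best run on concatenated data; find_resizes is a hypothetical name:

```python
import re
from typing import List, Tuple

# xterm window manipulation, "resize the text area to rows x cols":
#   ESC [ 8 ; rows ; cols t
RESIZE = re.compile(rb"\x1b\[8;(\d+);(\d+)t")


def find_resizes(data: bytes) -> List[Tuple[int, int]]:
    """Return (rows, cols) for every resize sequence found in the data."""
    return [(int(rows), int(cols)) for rows, cols in RESIZE.findall(data)]
```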
sickill
yeah, ok
I've been considering these, but I don't think it's the right way
ais523
I think there are three ways to do it
the current script will be good enough in simple cases
sickill
these sequences are supposed to trigger a window resize, which, IF it happens, triggers SIGWINCH
ais523
the moderate-effort way is to parse the ttyrec looking only for control codes (size, encoding, etc.) and fix those accordingly
and the high-effort way is a full terminal parser that canonicalises all the codes, etc.
I don't think the high-effort way makes sense in the short term
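The moderate-effort approach, sketched for the encoding part only. The stream is split at the Ecma-35 "designate other coding system" sequences ESC % G (switch to UTF-8) and ESC % @ (return to the ISO 2022 default), and each run is decoded with the matching codec. Treating the default as IBM codepage 437 is an assumption that fits many old recordings, not a rule; split_by_encoding and decode_runs are hypothetical names:

```python
import re
from typing import List, Tuple

# Ecma-35 "designate other coding system":
#   ESC % G -> switch to UTF-8,  ESC % @ -> return to the default
DOCS = re.compile(rb"\x1b%([G@])")


def split_by_encoding(data: bytes, default: str = "cp437") -> List[Tuple[str, bytes]]:
    """Split data into (codec_name, chunk) runs at encoding switches.
    The switch sequences themselves are dropped from the output."""
    runs: List[Tuple[str, bytes]] = []
    codec, pos = default, 0
    for m in DOCS.finditer(data):
        if m.start() > pos:
            runs.append((codec, data[pos:m.start()]))
        codec = "utf-8" if m.group(1) == b"G" else default
        pos = m.end()
    if pos < len(data):
        runs.append((codec, data[pos:]))
    return runs


def decode_runs(runs: List[Tuple[str, bytes]]) -> str:
    """Decode every run with its own codec and join the results."""
    return "".join(chunk.decode(codec, errors="replace") for codec, chunk in runs)
```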
sickill
:)
ais523
but I've been planning to do something like that for other reasons, just never have had the time to work on it
sickill
yeah, I don't have time to focus on this, but wanted to get some starting base out there
specifically I wanted to abstract asciicast writing, create a simple API for that task
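A minimal sketch of what such an API could look like, assuming the asciicast v2 layout: a JSON header line with version, width, height and an optional timestamp, then one [time, "o", text] JSON array per output event. AsciicastWriter and its output method are hypothetical, not asciinema's own writer; emit is any callable that takes a line, such as an open file's write method:

```python
import json
from typing import Any, Callable, Dict, Optional


class AsciicastWriter:
    """Minimal asciicast v2 writer: one JSON header line, then one JSON array
    [seconds_since_start, "o", text] per chunk of terminal output."""

    def __init__(self, emit: Callable[[str], Any], width: int, height: int,
                 timestamp: Optional[int] = None) -> None:
        self._emit = emit  # e.g. an open text file's .write, or list.append
        header: Dict[str, Any] = {"version": 2, "width": width, "height": height}
        if timestamp is not None:
            header["timestamp"] = timestamp  # Unix time the recording began
        self._line(header)

    def output(self, time: float, text: str) -> None:
        self._line([round(time, 6), "o", text])

    def _line(self, obj: Any) -> None:
        # ensure_ascii=False keeps box-drawing characters readable in the file
        self._emit(json.dumps(obj, ensure_ascii=False) + "\n")
```

Taking a callable rather than a path keeps the writer independent of where the lines end up, which is the kind of separation that makes it easy to feed from any reader.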
ais523
(this seems not to be documented other than by reading the source?)
anyway there are probably recordings out there that do silly things like use UTF-8 and DECgraphics at the same time, so I'd really like to work on a re-encoder
sickill
probably
ais523
I was working on something unrelated just now but have run into a roadblock, maybe working on this will help
sickill
:)
ais523
let me look up how generators work in Python, they seem pretty much perfect for this problem
sickill
yes, generators are perfect, I use them all the way there
ais523
I thought "this is a perfect problem for coroutines" and then remembered that Python had coroutines under a different name
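As an illustration of a generator used as a coroutine: a byte-at-a-time state machine driven with send(), which splits a stream into plain bytes and complete Ecma-35 escape sequences (ESC, any intermediates in 0x20 to 0x2F, then one final byte). It deliberately ignores the parameters that follow a CSI, so it is a toy tokenizer rather than a terminal parser; escape_splitter and tokens are hypothetical names:

```python
from typing import Generator, Iterator, Optional


def escape_splitter() -> Generator[Optional[bytes], int, None]:
    """Coroutine: send() it one byte (an int) at a time. It answers with a
    complete token (a single plain byte, or a whole escape sequence) or
    with None while an escape sequence is still incomplete."""
    token: Optional[bytes] = None
    while True:
        byte = yield token
        if byte != 0x1B:
            token = bytes([byte])
            continue
        seq = bytearray([byte])
        # Ecma-35: ESC, then intermediates 0x20-0x2F, then one final byte.
        # Anything that isn't an intermediate ends the sequence (malformed
        # input is passed through rather than rejected).
        while True:
            byte = yield None
            seq.append(byte)
            if not 0x20 <= byte <= 0x2F:
                break
        token = bytes(seq)


def tokens(data: bytes) -> Iterator[bytes]:
    splitter = escape_splitter()
    next(splitter)  # prime the coroutine up to its first yield
    for b in data:
        token = splitter.send(b)
        if token is not None:
            yield token
```

The coroutine keeps its position in the state machine between calls, so input can arrive frame by frame without any explicit state variable.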
sickill
re what termrec is doing: that's one way of solving it, but again I don't like it. Imagine this: I record, and one of the programs prints this resize sequence, which is supposed to resize the window, but the window doesn't resize because the terminal emulator doesn't support that. There are also no user-triggered window resizes. Now, because the window hasn't been resized, there was no SIGWINCH and no buffer resize, so the vim inside the pty stayed at the same size for the whole recording. Now, when you replay this in a player which supports this sequence, the player will actually resize the buffer/window (let's say it will shrink it because the sequence said it's now smaller), because it can't tell whether the resize really happened (ttyrec saved it because of SIGWINCH) or was just a "wish" of the app running in the terminal
:)
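The distinction can be sketched from the recorder's side: record a size change only when SIGWINCH actually arrives, reading the real terminal size at that moment, and never interpret resize requests found in the program's output. This is Unix-only, and the handler is a hypothetical illustration rather than how any particular recorder is written:

```python
import shutil
import signal
import time
from typing import List, Tuple

# (monotonic time, columns, rows) for each resize that really happened.
size_changes: List[Tuple[float, int, int]] = []


def on_sigwinch(signum, frame) -> None:
    # Only a real resize (announced by SIGWINCH) is recorded. Escape
    # sequences in the output that merely *ask* for a resize are never
    # interpreted, so a player can't mistake a "wish" for a resize.
    cols, rows = shutil.get_terminal_size()
    size_changes.append((time.monotonic(), cols, rows))


signal.signal(signal.SIGWINCH, on_sigwinch)
```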
ais523
what's argv[2] in this script?
sickill
second command line argument
ais523
no, I mean
what should I set it to to run the script
sickill
ah, that's the filename (path) of the output asciicast file
it will be created/overwritten
python3 ttyrec2asciicast.py demo.ttyrec demo.cast
ais523
yep, I just tried it on a ttyrec I had lying around
and it's obvious that we need to correct the encoding
sickill
:)
ais523
"\u001b(0~~~~~~x\u001b(B" and the like all over the place
hmm, I need to get used to editing Python
with nearly every language I use an editing style where I just type the braces, semicolons, etc. and my editor automatically adds the whitespace
sickill
if the ttyrec was recorded in utf-8 encoding then it should be all good
and most existing ttyrecs use either IBM437, the DEC line drawing set via SI/SO, or else repeatedly change encoding because no single encoding that they support has all the characters they need
well, most text-based games are old and thus predate Unicode
the actual horrifying part of it is that there are two different ways to change encoding, and neither works in all terminals in their default configuration
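For the IBM437 case, Python's built-in cp437 codec already maps bytes to characters, and it maps 0x00 to 0x1F and 0x7F to control characters rather than PC glyphs, so escape sequences survive decoding. The fallback rule below (UTF-8 if the bytes are valid UTF-8, otherwise codepage 437) is a heuristic assumption; it should be applied to a whole recording rather than per frame, since frames can split multi-byte characters. decode_guess is a hypothetical name:

```python
def decode_guess(data: bytes) -> str:
    """Heuristic: prefer UTF-8 when the bytes are valid UTF-8, otherwise
    fall back to IBM codepage 437 (which accepts every byte value)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("cp437")
```

This cannot handle recordings that switch encodings midway; those need the switch sequences themselves to be tracked.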
sickill
changing char encoding on the fly during the decoding sounds like a fun thing to approach :P
not that I'm gonna approach that
yeah, so I guess this is where the proper, reliable anything->unicode re-coder would be needed
and having that, its output can just be shoved into this asciicast writer
if someone wanted that
now I understand why you mentioned parse/read meta-data/fix encoding as your first reaction :)
ais523
I'm going to write a ttyrec reading library in Python, and start by trying to recreate your script
then I'll add a re-encoding filter and see if it works then
sickill
:)
a tool of its own, for re-coding ttyrec into a "modern, fixed" ttyrec would be nice anyway
ais523
yes
sickill
and if it wasn't in python it would still be trivial to integrate it with asciinema's asciicast writer
in other words I thought you're much into python ;)
*not much
sickill is looking for an excuse to learn some Rust, start some small project with it to learn it
ais523
is repeatedly appending a byte to a string efficient in Python?
in some languages that's linear, in others it's quadratic
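In Python, bytes objects are immutable, so appending with += builds a new object each time and can cost time proportional to everything accumulated so far; bytearray is the mutable counterpart, and its append is amortised constant time (for text, the equivalent is collecting pieces in a list and joining them with "".join). A sketch with a hypothetical byte-at-a-time filter that drops NUL padding bytes:

```python
def drop_nuls(data: bytes) -> bytes:
    """Copy data byte by byte, skipping NULs, using a growable buffer."""
    out = bytearray()
    for b in data:         # iterating over bytes yields ints
        if b != 0:
            out.append(b)  # amortised O(1), unlike `result += bytes([b])`
    return bytes(out)
```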
the advantage of this is that it allows you to process the bytes individually / with a state machine / whatever and will still split it up into frames correctly
which means that I can worry about the encoding and the splitting into frames separately
(there's some of your code in this; you're OK with that copyright-wise, right? if not I can paraphrase it)
thought I could use it for parsing, but there's so much code there, doing all sorts of things I don't need
so I hacked this rough reader myself
but maybe there's some knowledge there you could use
ais523
sickill: I just finished a ttyrec re-encoder (and worked it into the ttyrec-to-asciicast converter, although you could just as easily write it back to ttyrec): http://nethack4.org/pastebin/ttyrec2asciicast.py
it took a while because I implemented a) pretty much the entirety of Ecma-35, b) UTF-8, c) some nonstandard extensions that some terminals use related to codepage 437
but I ran my character set test through it and it seemed to decode it correctly
hmm, maybe I should implement the special case for >96-char character sets using G0 and G2 (not G1), or inventing a G5 for the purpose
it's hard to know how nonstandard extensions should interact with standard commands if they're basically never used together
it's up to you whether you want to use the C1 splitter
the output without it is smaller and should in theory be equivalent, but it's become so nonidiomatic that basically nobody uses it in their programs and as such most terminals don't support it
(support using "raw" rather than split C1, that is)
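A sketch of a C1 splitter, assuming it runs on already-decoded text: each C1 control character U+0080 to U+009F is rewritten as its 7-bit equivalent, ESC followed by the character 0x40 lower (so U+009B, CSI, becomes ESC [). The function name split_c1 is hypothetical:

```python
import re

# C1 control characters occupy U+0080..U+009F. Each has a 7-bit form:
# ESC followed by the character 0x40 below it (U+009B CSI -> ESC [).
C1_CONTROL = re.compile("[\u0080-\u009f]")


def split_c1(text: str) -> str:
    """Replace every raw C1 control with its ESC-prefixed 7-bit equivalent."""
    return C1_CONTROL.sub(lambda m: "\x1b" + chr(ord(m.group()) - 0x40), text)
```

Running it after decoding, rather than on raw bytes, avoids confusing C1 controls with UTF-8 continuation bytes, which occupy the same 0x80 to 0x9F range.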
oh, duh, the correct fix is just to point GR at the high part of the ttyrec and ignore the numbers