Will badges and notifications be sent for any and all groups and individual chats?
hslack
<robertknight> M-sraja: The badge count includes private and group annotations
<robertknight> Notifications are sent when someone else replies to an annotation that is either public or part of a group.
MrWoohoo joined the channel
<chdorner> I've created a Discussion PR about a change that came out of a PR review and would like to get people's opinion on it please. https://github.com/hypothesis/h/pull/3250 @wyan, @seanh, and @nick - plus whoever else is interested
<chdorner> (I've just added a link to a gist with the postgres statement logging)
<nick> Thanks, @chdorner. I'll take a look as soon as I can.
<seanh> Just wondering something about our API, while dealing with this `extras` column that we have in the new Postgres model. Currently (on master, with the legacy Elasticsearch storage) I can use the API to add an arbitrary field like `foo: "bar"` to an annotation. But then I can't delete that field. I can set it to an empty string or to null/None but I can't delete it. It probably makes sense for the standard fields
same type (empty string for text, empty list for tags...) rather than to be completely removable. But additional custom fields like foo that I created myself I feel I ought to be able to delete. Should our API always require all fields to be sent when updating an annotation? Any fields not sent in the update request will be removed from the annotation. That way I could delete foo, and omitting a standard field like text
idea is that you'd get the annotation from the API's response to a previous create, update or read request, edit that object in your code, then send it back in the next update request, so always sending every field (that you don't want to delete) should be easy in client code. Our own client already works in this way. Another way we could do it is that you don't have to send the values for standard fields in each updat
fields (this is the current behaviour) but for custom fields not sending them will delete them. That's inconsistent though.
bengo joined the channel
M-McOmghall has left the channel
<seanh> The "always send everything" approach seems kinda nice actually, since it recurses all the way down into sub objects. For example if I want to add a tag to an annotation I have to send for the value of the tags field a list of all the existing tags plus the new one. If I just sent the new one it'd delete the existing ones.
M-austinhc joined the channel
<robertknight> @seanh: Deleting missing fields in an update could cause clients to unintentionally delete fields they don't support.
<robertknight> You _could_ stipulate that clients must send all fields back from the original annotation
<seanh> That's what I mean, yes, clients _must_ send back all fields that they received from us, unless they want to delete a field
<seanh> With our own client, if you call the API directly and add a foo='bar' field that our client doesn't have any particular support for, then you edit the annotation using our client, our client will send the foo field in the update request
<robertknight> But that may involve a bunch of extra work for clients which means that at least some won't implement it
<robertknight> And it also runs counter to PATCH conventions
<seanh> One way it could go wrong though is if the client received the annotation from the API first, then you edited the annotation by calling the API directly (or using a different client) and added a custom field, and then without reloading the annotation used our client to update it. Would delete the field I think
<seanh> It should involve any extra work - you take the object that you get from us as a Python dict or JavaScript object, edit that object by changing, adding or deleting fields, then send that object back. Seems simple
<seanh> I think it might be the only way to support custom fields and support deleting those fields, without making a special case out of the delete action
<seanh> Anyway, something to talk about later maybe
<nick> TBH I think that Sean is right.
<nick> Insofar as our current update view is a PUT
<nick> not a PATCH
<nick> and HTTP semantics would tend to lean towards requiring some reasonably complete representation to be sent for that
<nick> (which is, in practice, what our client already does)
<seanh> I'm not familiar with what PATCH semantics are supposed to be, but I'm guessing they're similar to how our API currently works then
<robertknight> It is a PUT?
<nick> yes, pretty much
<seanh> I've been using HTTP PUT in my testing of the API. Not sure whether PATCH works as well or 404s (can't easily check right now rebasing horribly)
<nick> anyway, I think realistically we're stuck with what we've got for now
<seanh> PATCH is more convenient when you're hacking around with the API with a command line tool like curl or httpie.
<nick> but we will need to start thinking in more detail about our API and how it should behave soon
<nick> at the moment an awful lot of API behaviour is, to put it politely, "implementation-defined"
<seanh> Sure, currently going out of my way to make sure that with 'postgres' on it behaves in the same inconsistent way as with it off. Just something to be aware of though - you can add custom fields, but you can't remove them
<nick> indeed
<nick> same goes for "non-custom" fields, tbh
<seanh> Yeah, although arguably you shouldn't be able to remove those?
<seanh> i.e. you would never want an annotation with literally no text field. An empty string, sure.
<nick> wouldn't you?
<nick> what about an annotation which only has tags?
<seanh> I'd argue that the semantics of a PUT with no text field are "delete the text field" but our response to that should be a validation error
<seanh> Empty string for the text in that case?
<nick> I'm not really arguing one way or the other.
<nick> I'm just saying it's not "obvious"
<seanh> It seems desirable for consistency and easy of use, to guarantee that all the standard fields are always there and with the right type
<nick> JSON-LD format says "an annotation has 0 or more bodies"
<nick> some of which might be tags
<nick> some of which might be text content
<nick> why would an annotation which is only tagging have a textual body?
<seanh> Do we want clients to be able to do `annotation["text"]` or to always have to do `annotation.get("text", "")`? That's the argument for not allowing them to be deleted I think
<seanh> So it's usability vs semantics maybe
<seanh> If we force client to send all the standard fields on an annotation _create_ (instead of inserting default values as we do currently) that would be a pain for clients. They have no object from us to start with, have to create all the fields. Maybe create is different than update though
<nick> we don't need to force the client to send us all the fields
<nick> for either create or update
<nick> there are fields that the client has control over
<nick> and there are fields that the server has control over
<nick> the point you made, which is a good one, is that currently there is no way for the client to remove fields that it putatively has control over
<nick> we don't need to become dogmatists all of a sudden and force the client to do extra work
<nick> it's just about making something practical which is currently impossible
<seanh> So some fields should behave differently than others, when omitted from an update request? Some would be deleted, others would be left unchanged?
<nick> well, uh, yes
<nick> because the client doesn't ever get to decide what the `user` field says
<seanh> Right
<nick> and it doesn't ever get to decide what the annotation `id` is
<seanh> But we _can_ make the semantics of all the fields consistent if we want. For example if the client sends a value for user that's different than the current one, the semantics of that could be "please change the user", and our response would be a validation error because you're not allowed to
<robertknight> Requiring the client to send all standard fields on annotation creation would become a breaking change if we ever wanted to expand the set of "standard" fields.
<seanh> And what I was thinking was that it's _not_ extra work for the client cos it just gets the object from us, makes just the changes it wants, sends it back
<seanh> Hmm
<robertknight> @seanh: That depends on how the client stores the data. If it serialized the annotation into a model in storage, then it needs extra code to roundtrip unsupported fields
<robertknight> As our Postgres storage does
<seanh> It wouldn't break client code if said code were implemented as I've been saying (and assuming the client didn't try to use cached annotation data from before the API change), but I see what you mean
<seanh> Yeah
<seanh> We could also support both PATCH and PUT, each with their correct semantics, which would give you the easiest to use and less breaky API (PATCH) and the one that you can use to delete fields (PUT)
<robertknight> Can you think of a reason that sending `null` as the value of a field shouldn't delete it in an update?
<robertknight> I think those were the semantics used in the Mendeley API to delete fields in a PATCH request.
<seanh> The only reason that I can think of is that it'd prevent you from actually storing null if you wanted to. That and I think it's semantically kind of inconsistent with how you have to treat sub-objects and lists like tags
<seanh> Basing it on the correct semantics of the verbs both in theory and in practice (aka what other APIs do) is probably the way to go
tilgovi joined the channel
<robertknight> Semantically inconsistent?
<seanh> If an annotation has tags `["foo", "bar"]` and I want to remove `"bar"` I do an update and send tags `["foo"]`, i.e. omitting `"bar"` removes it. I don't send `["foo", null]`.
<seanh> (It might not be quite as academic as it sounds. In code: get the annotation from the API, delete a tag from its tags list, send it back to the API. But if you were to do the same thing in code with a top-level field instead of an item or field in a sub-list or object, it wouldn't remove it.)
<chdorner> eager loading document data with our model where the annotation doesn't have a document id is nearly impossible :(
<chdorner> or rather, eager loading an association proxy
<seanh> How come it's impossible?
<chdorner> not quite sure yet, I might have to redefine the relationship between an annotation and a document to make it possible
<chdorner> I think the main problem is that our `documents` association proxy is a list and not just one document
<seanh> So you have an annotation and you're trying to get its document?
<chdorner> I'm trying to load annotations in batches and eager load its document, document uris, and document meta
<seanh> By "association proxy" you're referring to the sqlalchemy orm thing right? i.e. you should be able to just access `annotation.document` ideally.
<chdorner> because presenting 2000 annotations will result in 6001 queries
<chdorner> `annotation.document` is currently a property which calls `annotation.documents[0]`
<seanh> How are documents actually related to annotations? Not seeing an annotation id in the document table, nor a relation table
<seanh> Right I see the association proxy in the annotation model code now
<seanh> Why do you need to get the documents of 2000 annotations at once?
<chdorner> for the batch reindexing/checking
<chdorner> can also be less
<chdorner> but at the moment a full reindex would cause >1 million PG queries
<seanh> You could do them one at a time, and slowly so as not to hammer the server. Probably you do need to be able to do a lot fairly quickly though
<seanh> If you have the IDs of all 2000 annotations, can't you do one sqlalchemy query to get all the documents?
<chdorner> I would like to avoid having to load the document data myself and then inserting it into Annotation objects, but that would be the last resort :)
<chdorner> but it must be possible to define the relationships in a different way to allow us to eager load
<chdorner> it might not work with `yield_per` anymore, but that's a lesser concern at the moment
<seanh> I don't know how sqlalch's association proxy works I'm afraid
<chdorner> no worries, I will just have to dig through the docs :) I will get there at some point
<chdorner> I do miss ActiveRecord sometimes, which in these complicated situation decides on its own to fire off multiple queries and then build the objects together