Replacing an ORM with SQLite and dataclasses
Note: due to my new setup, this post came out almost in raw form. It has now been heavily edited, as it should have been, but at least the setup forced me to finish it :)
I started using Mastodon some three years ago, and from the beginning I had trouble adapting to how timelines work and how you are actually supposed to use the platform. I listed several things I didn't like, and I always thought that at some point I would just sit down and write my own client. This weekend looked like it would rain the whole time, so it was the perfect moment to do it.
As with any project that uses a database, I usually reach for an ORM. I have already used SQLAlchemy on 3 or 4 projects, so it was natural for me to use it again. One of them is probably dead, but the rest see repeated use; not daily, but many times a year, over many years. In most of them the data layer was written once and forgotten about, because the data model is quite simple; and this case is no different. The problem is that every time I mostly have to relearn how to use it, especially since, even though SQLAlchemy has evidently stayed quite backwards compatible, it has also evolved.
Version 2.0 has a very different interface. With the old version, if you wanted to fetch an object, a row from the database, you did something like this:
image = session.query(Image).filter_by(name=filename)[0]
But with the new version, they have exposed a thin abstraction over the SQL language; the previous example becomes:
image = session.scalars(select(Image).where(Image.name == filename)).first()
I love the magick1 they are using to convert that expression into something they can later render as SQL, which involves rewriting `__getattribute__()` to return another object that implements `__eq__()` to capture the value.
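The trick can be sketched in a few lines. This is only a toy illustration of the general idea, not SQLAlchemy's actual machinery, and all the names in it are made up:

```python
# Toy sketch of expression capture: overloading __eq__ so that comparing
# a column to a value builds a SQL fragment instead of returning a bool.

class Column:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Instead of True/False, return a node that remembers the
        # comparison so it can later be rendered as SQL.
        return Comparison(self, "=", other)

class Comparison:
    def __init__(self, column, op, value):
        self.column, self.op, self.value = column, op, value

    def as_sql(self):
        # Render with a placeholder; the value travels separately.
        return f"{self.column.name} {self.op} ?", (self.value,)

name = Column("name")
expr = (name == "cat.jpg")   # builds a Comparison, not a boolean!
sql, params = expr.as_sql()
print(sql, params)           # name = ? ('cat.jpg',)
```

The real implementation is of course far more elaborate (it handles all the operators, boolean composition, dialects, and so on), but the core sleight of hand is this one.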
It's like writing SQL in Python, right? And in some ways it makes a lot of sense, because one of the usual complaints about ORMs is that, at some point, things that are easy in SQL become too convoluted in the ORM. It's like bending over backwards just to scratch your left ear with your right arm, passing under your left leg.
So, putting a slight veneer of Python on top of SQL kind of makes sense, and I like that someone has explored that idea to the point that it has become the new way of doing things. And yes, the old API still works; all the code I have written so far still works. But the new way looks better.

Another thing ORMs tend to do, and SQLAlchemy does too, is to return data at two levels. At one level you get objects: you fetch stuff from the database and get back objects that you think of in terms of your application model, not in terms of tables. This is amazing because one-to-many and many-to-many relationships become natural. Say a person has several addresses: you just append an address to `person.addresses` and it goes into the database; you don't have to think about anything. Or you have a many-to-many relationship between photos and tags: you fetch a tag, ask for all its photos, and it just gives you the photos, and you completely forget about the intermediary table. It just works, and it's fantastic. At the other level you don't get objects, you get rows: tuples with data inside and no structure around them.

So, back to the project: what I did on Friday night was mostly to look at the responses from Mastodon servers, see what they look like, and sketch a data model based on that. I am not going to model accounts or toots completely, because they carry way more information than I want to display in my client; but that's all I did on Friday night.
Saturday morning I started writing the model in SQLAlchemy, and it was just fine: mapping JSON fields to tables and columns, and then to classes, objects and attributes. I started writing all the code and everything was kind of obvious. But at some point I hit a wall: I was fetching accounts by ID, and instead of objects the ORM was returning tuples. The flow is: when a new toot arrives, if I don't know the account that tooted it, I create a new one, then create the toot pointing to that account and push it to the list. But when the account already existed and I got a tuple instead of an object, setting `toot.author` stored just the ID string, because that's all the ORM was giving me. And of course later the ORM complained: it expected an object and I was giving it just an ID. I spent, well, not the full 24 hours from Saturday afternoon to this morning, but maybe 10 of them on that, asking everywhere without getting a good answer. So I just switched to plain sqlite3. I had to.

Before, my workflow was: if I needed to add a new column to a table, I only had to add a new attribute to the class, then destroy the database and create it again. One of the good decisions I made at the beginning was to store all the original information raw: just take the JSON and put it in the database. That means that if I make any changes to the data structures I use later, I can read everything back, create the new objects, and save them again. And this is something I can use to replace database structure migrations.
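The replacement needs surprisingly little machinery. A minimal sketch of the sqlite3-plus-dataclasses approach (the table and field names here are invented for illustration, not the client's actual model):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Account:
    id: str
    username: str

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, username TEXT)")
con.execute("INSERT INTO accounts VALUES (?, ?)", ("42", "mdione"))

# Rows come back as plain tuples; mapping them to objects is explicit
# but trivial, and there is no session to second-guess you.
row = con.execute(
    "SELECT id, username FROM accounts WHERE id = ?", ("42",)
).fetchone()
account = Account(*row)
print(account)  # Account(id='42', username='mdione')
```

The tuples the ORM was leaking by accident are now the honest, documented return type, and turning them into objects is one line.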
I just nuke all the secondary tables and recreate all the data from scratch, from the original raw data, so I can do these migrations very easily now. With SQLAlchemy I could drop the tables, create everything again, recreate all the objects, and that worked; but now I don't have that anymore. So what do I do? Well, my workflow now includes DB Browser for SQLite, a GUI application for manipulating SQLite files. I had to add a new column to the toots table, so I did just that in the GUI. And at some point I will dump the structure and keep it in a .sql file, so if I ever have to do this migration again, which I will, I can just execute that file to recreate the tables from scratch.

I really wish SQLAlchemy had been completely transparent for me. But I guess that's exactly the criticism ORMs usually get: 90% of the work is really transparent, but at some point it no longer is, and you start doing weird shit just to accomplish what you want. What I will do in the future, I'm not really sure; I will see how this raw sqlite3 approach works out. This project is rather simple: I have very few tables. There are two tables for the original raw content, one for the actual raw content and one for the raw toot content. The thing is, when you boost another toot, you basically generate a toot that has another toot nested inside it.
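Dumping the structure doesn't even require leaving Python: `sqlite_master` keeps the original `CREATE` statements. A sketch, with a made-up table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE toots (id TEXT PRIMARY KEY, author TEXT, text TEXT)")

# sqlite_master stores every CREATE statement verbatim, so the whole
# structure can be written to a .sql file and replayed later.
schema = "\n".join(
    row[0] + ";"
    for row in con.execute("SELECT sql FROM sqlite_master WHERE sql IS NOT NULL")
)
# open("schema.sql", "w").write(schema)  # keep it under version control
print(schema)
```

The sqlite3 command-line shell's `.schema` command does the same thing, and `Connection.iterdump()` goes further and dumps the data too.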
So the original raw content includes all that information, but for my client the most important thing is the original toot itself. So I have another table that only has the original toots, not the boosts, and the client just works from that. Then I have toots, tooters, and tags so far. I'm not really sure I will have tables for bookmarks, filters and all that shit, because those things usually live on the server, in your instance, and I should just query them every time. Although I'm also thinking of an offline mode where you can do stuff and, if it can't be sent to the instance immediately, it gets queued for later, for when you have a connection, or when you want to use one. So that's it; it's a very, very simple data model. If this goes right but later I end up with a more complex data model, I might try other ORMs, especially simpler ones that are less magic, a little more transparent. We'll see when the time comes.
-
Typo intended. It's not real magic, just a clever data model, but sometimes indistinguishable from black magic :) ↩