Touhou-Project.com

And the Beat Goes On (2)

Added 2023-05-12 23:58:00 +0000 UTC

Hey all, hope you’re well. It’s taken me a little longer than I would have liked to write this up, but better late than never, right? I want to pick up where I left off last time and keep covering the stuff that I’ve been working on. This time we’re focusing on the story list in particular.

As you may recall, THP’s old story list was something of a hacky mess and it took concerted effort to replace it with something that was both easier to maintain as an administrator and also easy for users to interact with. I’ve always meant to expand and refine the current design but since things worked more or less well, it wasn’t the highest of priorities. In fact, even though I’m going to talk about the work I’ve done to enhance it, I don’t consider it a finished project by any means—just one whose future expansion will happen further down the line.

Better tag filtering

One of the additions I’ve made is to allow for tags to be excluded when searching for stories. Such a feature may seem simple from the user’s perspective but it wasn’t quite as straightforward to implement. I wished to make the internals as uncomplicated as possible while minimizing calls to the database and avoiding too much duplicate work that could tie down resources. I’m going to gloss over the original way that tags were processed (previous posts on the story list have covered it and I suggest you read them if you’re interested) for the sake of expediency and just state that it tried to be efficient by first validating tags, then checking against a story if an id was supplied, before moving on to the more resource-intensive work of pulling in story information for display.

In order to exclude tags, things like how input is sanitized and tag results are validated had to be redone. Dealing with various corner cases and usage needs led to the code being refactored and my splitting off functions in their own discrete bits for maintainability, so that it’s easier to reuse and follow the logic. It’s only after those strings of text are turned into discrete and safe entries that can be understood by the rest of the program that you can do something simple like checking that the positive and negative tags don’t overlap and, if they do, to stop the program from further action.
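To make the idea concrete, here is a minimal Python sketch of that kind of input handling. It is not the site's actual code: the comma-separated input, the leading "-" for exclusions, the allowed-character rule and the names `parse_tags` and `validate_tags` are all assumptions made for illustration.

```python
import re

# Hypothetical rule: tags are 1-32 lowercase letters, digits, "_" or "-".
TAG_PATTERN = re.compile(r"[a-z0-9_-]{1,32}")


def parse_tags(raw: str) -> tuple[set[str], set[str]]:
    """Turn raw search text into (wanted, excluded) sets of clean tags."""
    wanted: set[str] = set()
    excluded: set[str] = set()
    for token in raw.lower().split(","):
        token = token.strip()
        # Assumed syntax: a leading "-" marks a tag to exclude.
        target = excluded if token.startswith("-") else wanted
        tag = token.lstrip("-").strip()
        # Anything that does not look like a tag is silently dropped.
        if TAG_PATTERN.fullmatch(tag):
            target.add(tag)
    return wanted, excluded


def validate_tags(wanted: set[str], excluded: set[str]) -> None:
    """Stop before any further work if a tag is both wanted and excluded."""
    overlap = wanted & excluded
    if overlap:
        raise ValueError(f"conflicting tags: {sorted(overlap)}")
```

Only once both sets are clean does the overlap check become a one-line set intersection, which is the point made above.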

Depending on whether you have only excluded tags, only desired tags, or a mix of both, different bits of the code will run, as you might imagine. With an eye toward making the more expensive calculations at the end (pulling in full story data with children) rarer, more filtering happens at this stage to make sure that the final number of stories that are queried is as small as possible.
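A rough Python sketch of that narrowing step, under the assumption that a cheap tag-to-story-id lookup is available. The in-memory `tag_index` dictionary and the function name are hypothetical stand-ins for whatever the database actually provides.

```python
def candidate_story_ids(tag_index, all_ids, wanted, excluded):
    """Narrow down story ids before any full story data is fetched.

    tag_index maps a tag to the set of story ids carrying it (a hypothetical
    stand-in for a cheap lookup); all_ids is every known story id.
    """
    if wanted:
        # A story must carry every wanted tag.
        candidates = set.intersection(
            *(set(tag_index.get(tag, ())) for tag in wanted)
        )
    else:
        # Only exclusions (or nothing at all): start from the full list.
        candidates = set(all_ids)
    # Drop every story carrying any excluded tag.
    for tag in excluded:
        candidates -= set(tag_index.get(tag, ()))
    return candidates
```

Whatever ids survive are the only ones that need the expensive fetch of tags, synopsis, board and children.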

While I think that there’s still room for optimization, the extra time I put in to ensure that the system was built correctly feels worth it. It’s easier to follow how data gets processed and the story list page itself doesn’t really concern itself with how things run elsewhere under the hood.

Pagination

The large, full, story list that was shown by default is gone, replaced by discrete pages of results. On the one hand it makes the story list less unwieldy and, on the other, is an obvious optimization as data needs to get processed each time the page is opened or someone searches for something.

This was something I tackled after the tag additions, and it took a long time to think of how best to implement the feature. It’s easy enough, you see, to take a dynamic list, limit it to x per page and output a total of y pages via math. It gets a bit more complicated when you have to include search criteria and keep it consistent across the pages because it is dynamic content. A different page is basically another query and, well, keeping variables consistent without overengineering is not as simple as it may first appear.
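The easy part, the "x per page, y pages" math, looks something like the Python sketch below. The function names are made up for illustration and the clamping of out-of-range page numbers is my own assumption about sensible behavior.

```python
import math


def page_count(total_items: int, per_page: int) -> int:
    """Total number of pages; an empty result still counts as one page."""
    return max(1, math.ceil(total_items / per_page))


def page_slice(items: list, page: int, per_page: int) -> list:
    """Return the items for one page, clamping out-of-range page numbers."""
    total = page_count(len(items), per_page)
    page = min(max(page, 1), total)
    start = (page - 1) * per_page
    return items[start:start + per_page]
```

The hard part is everything around it: making sure `items` is the same list, in the same order, every time a different page is requested.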

I was sure from the start that I wanted to cache as much as possible to avoid re-running database queries. Something like GET and POST could be rigged to pass that sort of data, sure. But, as that data is sent by the client, it can be manipulated or otherwise be malformed, needing revalidation every time around (not just a security precaution but also to ensure continuity).

Though imperfect, I decided that I would use sessions to keep track of data, even if they do have their own limitations (and also depend on cookies). Saving things like the list of story ids that are to be queried, whether there were no results, the total number of pages and the like means that there are fewer calculations that need to be made and fewer queries needed to display a single page of results.

(As an aside and a refresher, stories aren’t just a title, author etc but also have to fetch their tags, story synopsis, their board, any children or parent stories. All of that adds up over the several thousand entries that could potentially be displayed, so it’s best to do it only for the things that we strictly need.)

There are some additional complexities introduced, to be clear, but checking for session variables and their occasional validity is far less involved than running the whole tag processing again or, for that matter, searching for authors and story titles.
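As an illustration of the session approach, here is a small Python sketch where a plain dictionary stands in for the server-side session. The key names, the `PER_PAGE` value, the fingerprinting of search criteria and the `run_search` callback are all assumptions, not a description of the real code.

```python
import json
import math

PER_PAGE = 25  # hypothetical page size


def cached_search(session: dict, criteria: dict, run_search) -> dict:
    """Reuse stored results when the validated criteria have not changed.

    session is a dict-like stand-in for a server-side session and
    run_search is a callable doing the expensive tag/author/title work.
    """
    # A stable fingerprint lets us check cheaply whether the cache is valid.
    fingerprint = json.dumps(criteria, sort_keys=True)
    cached = session.get("story_search")
    if cached is not None and cached.get("fingerprint") == fingerprint:
        return cached
    ids = run_search(criteria)
    cached = {
        "fingerprint": fingerprint,
        "ids": ids,
        "no_results": not ids,
        "total_pages": max(1, math.ceil(len(ids) / PER_PAGE)),
    }
    session["story_search"] = cached
    return cached
```

Moving from page 2 to page 3 then only costs a dictionary lookup plus the fetch for that one page of ids.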

Before moving on, I’d like to add that something stupid like the display of the pages themselves at the bottom also required some thinking. I wanted to limit the number of pages displayed and have a consistent look that scaled well on mobile devices. That means figuring out how many pages and which numbers should be displayed depending on the total number of page results and where the user is (ie page 10/40). Always displaying 9 page results (5 on mobile), keeping the current page number centered unless it’s the first couple of pages takes a little bit of math and logic to work out for the 4 or so possible states that the page list could find itself in. My dumb self caught a few corner cases after thinking about it even longer and it was only after I had the programming side sorted that I could begin to think about styling it and its position on the outputted page. Keeping things friendly for all sorts of device sizes and avoiding misclicks was something I aimed for.
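For the curious, the windowing logic can be sketched in a few lines of Python. This is one possible way to do it rather than the exact code on the site, and `visible_pages` is a name invented for the example.

```python
def visible_pages(current: int, total: int, width: int = 9) -> list[int]:
    """Page numbers to show, keeping the current page centered when possible.

    width would be 9 on desktop and 5 on mobile, per the numbers above.
    """
    width = min(width, total)  # fewer pages than slots: show them all
    half = width // 2
    # Center on the current page, then clamp the window at both ends.
    start = max(1, min(current - half, total - width + 1))
    return list(range(start, start + width))
```

The two clamps cover the states mentioned above: near the start, near the end, somewhere in the middle, and too few pages to fill the window.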

Sorting

Previously, as the story list was one big result without pages, sorting of results could be done client-side with some JavaScript. With pages, however, results need to be sorted server-side before they’re served. Otherwise you’d simply get the same order of pages.

I got rid of the old JS and small associated library and replaced it with a system that also takes advantage of sessions to keep results consistent. That is to say, when one of the order buttons is now pressed, the query is re-submitted and the order within the database query is rechecked. This, naturally, involves sanitization checks as user input should not be trusted, especially when database queries are part of the picture.
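A common way to handle that kind of check is an allowlist, sketched below in Python. The sort names and column names are hypothetical; the idea is simply that user input only ever selects from fixed strings and never ends up in the query itself.

```python
# Hypothetical mapping from user-facing sort names to real column names.
SORT_COLUMNS = {
    "date": "timestamp",
    "title": "title",
    "author": "author_key",
    "board": "board",
}


def order_clause(sort: str, direction: str) -> str:
    """Build an ORDER BY clause from untrusted input using an allowlist."""
    # Unknown sort names fall back to the default (timestamp).
    column = SORT_COLUMNS.get(sort, "timestamp")
    # Anything other than an explicit "asc" is treated as descending.
    order = "ASC" if str(direction).lower() == "asc" else "DESC"
    return f"ORDER BY {column} {order}"
```

For example, `order_clause("title", "asc")` gives `ORDER BY title ASC`, while garbage input quietly becomes `ORDER BY timestamp DESC`.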

The default remains sorting by timestamp but for the other possibilities, I had to make sure that they all worked consistently. Like, sorting by author is something that needs to take into account tripcodes and anonymous authors. Author names are prioritized but tripcodes are evaluated if no name is set, omitting the actual delimiters so that they fall in alphabetical order and are intermixed with regular names. Further, even when you have that information set, in both the case of sorting by author or board, you may still get inconsistent results depending on execution speed. This is because the sorting criterion is non-unique, meaning that a second sort needs to be done on the database level, which I additionally set to title. That way you’ll always get an alphabetical order of works of each board or author.
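Here is the same idea expressed in Python instead of SQL, purely as an in-memory illustration. The "!" tripcode delimiter, the field names and the choice to sort anonymous authors under "anonymous" are assumptions for the sake of the example.

```python
def author_sort_key(name, tripcode) -> str:
    """Key used to sort by author: name first, tripcode as fallback."""
    if name:
        return name.casefold()
    if tripcode:
        # Assumed delimiter: strip leading "!" so tripcodes mix with names.
        return tripcode.lstrip("!").casefold()
    return "anonymous"  # assumed placement for authors with neither


def sort_by_author(stories: list[dict]) -> list[dict]:
    """Sort by author, breaking ties by title so the order is stable."""
    return sorted(
        stories,
        key=lambda s: (
            author_sort_key(s.get("name"), s.get("tripcode")),
            s["title"].casefold(),
        ),
    )
```

The second element of the key plays the role of the extra title sort: without it, two stories by the same author have no defined order relative to each other.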

I’ll spare you kvetching about databases this time around but making sure that criteria are strict and will be parsed by various possible backends and interfaces in a consistent manner can be a little bit of a headache. Thankfully our queries aren’t that complex in the larger scheme of things but my heart goes out to any developers out there who deal with far more complex structures regularly. It is way too easy to trip oneself up with subtle behavior and with different properties set to tables, columns or even databases themselves.

And others

There’s a few other small things that have been done on the story list. For example the addition of the same footer as the rest of the site and a “clear search” button. They took much less effort and brainpower to implement but are nice quality-of-life additions; they also work well for mobile users (had to check every step of testing that this was the case).

As you might imagine, the various possible combinations of search criteria, saved data, orders and the like necessitate a large degree of modularity. As I’ve been burned by the terrible code practices of Kusaba X developers elsewhere for many years, I’ve made the underlying code as clean-looking and reasonably well-commented as possible. Understanding what is happening and where is important, especially as I plan on more additions in the future.

As of the time of writing this, there are a few features that won’t make the cut just yet: things related to tagging, both on the story list and on the backend, and displaying messages on the story list itself. A couple of links are missing too. I haven’t had enough time to polish all the other systems that interact with those things so I decided to hold that back and just push most of the things mentioned in this post live (well, very soon™). Once I sort everything else out, you can expect a third part that sums up all the work on that front.

This is a good stopping point, at any rate. I always hesitate to make these posts too long as I’m never sure how many people actually take the time to read them in the first place. I don’t want to be too boring. Regardless, I hope that you enjoy the changes outlined above and, until next time, take it easy!