Mountain out of a molehill
Added 2017-12-22 03:48:08 +0000 UTC
Hey guys, hope you’ve been well. This latest installment in “me rambling about technical stuff” is about a few relatively unexciting things that I worked on recently. Nonetheless, it illustrates why making changes to the site can be such a time-consuming task.
A few weeks (months?) ago, I noticed something of a glaring omission on the site: in catalog view, threads started with webms would not display a thumbnail like other threads. I know, no one uses the catalog (evidenced by this having been the case for years without anyone mentioning it) and there are very few threads that are started by posting webms. Still, it bothered me and it was something that was promptly added to my ‘to-do’ list.
My first instinct was to see how the catalog page was built and inject a little snippet of code telling it to look for image previews that are generated for the webm video files. What you need to understand is that finding things in the site’s code is not the most straightforward task. Code is seldom commented, the comments themselves aren’t always useful and functions and variables are spread across a fair number of files. Tools like the ever-useful grep help a lot but, when you have a lot of text files and no clue as to what things may be called, there’s still a fair amount of patience required to manually check files.
Another problem is that, even if you find what you’re kind of looking for, it’s often nested in bigger functions and code blocks that do all sorts of things. In other words, you shouldn’t just edit blindly, as understanding what all the other code around it does is important to prevent breakage and other unpleasantness. As I mentioned, some functions are in other files, so if the section of code I’m looking at calls one of them, I usually do my due diligence and take a look at those files as well. Factoring in stuff like trying to see where variables are declared and whether they’re valid for other things makes the whole thing even slower going.
So, I found the bit that set the catalog thread images. Should have been as simple as adding an “if” statement for the file type and pointing it to the path to the corresponding file. Turns out that the site wasn’t generating smaller catalog-sized files for webms. Just hadn’t been set up when webms were first implemented. It was then I realized something else: other file types were generating catalog-sized images for every image uploaded every time—something completely useless if the images aren’t the first ones in the thread!
Naturally, in the name of efficiency, I began to look into how thumbnails were generated for other images (for webms it’s a separate function I wrote myself and could just add an if-statement for catalog-sized images). This led to looking through yet more files and trying to make sense of poorly-commented code and structures throughout. Our board software wasn’t finished when the project died and there’s plenty of places where you see half-implemented or messy (“spaghetti”) code with a lot of repetition and general inefficiency. Generally, you want to have functions with flexible parameters that you can call under different circumstances instead of duplicating lines of code with different variations. It’s a matter of both organization and performance.
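To make that last point a bit more concrete, here is a minimal sketch. It is not the site's actual code: it's written in Python purely for illustration, and the function name and the two maximum sizes are made up. The idea is that one function takes the maximum size as a parameter, instead of one copy of the scaling logic for regular thumbnails and another near-identical copy for catalog ones.

```python
from typing import Tuple

# Hypothetical sizes, chosen only for this example.
THUMB_MAX = 200
CATALOG_MAX = 100


def scaled_size(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Shrink (width, height) to fit inside a max_side square, keeping the aspect ratio."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    if width <= max_side and height <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / max(width, height)
    # Clamp to at least 1 pixel so extreme aspect ratios don't collapse to 0.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The same function then serves both cases, e.g. `scaled_size(800, 600, THUMB_MAX)` for a regular thumbnail and `scaled_size(800, 600, CATALOG_MAX)` for a catalog one.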
In order to keep sane, I don’t usually try to do too much at once. When there’s a problem in the code and things aren’t obvious (as is usually the case in poorly-commented code), I tackle it in bursts and often take a few days before returning to the problem. Sometimes that helps with inspiration or simply keeps my eyes from glazing over when staring at the same complex bit of conditionals time after time. I freely admit I’m not much of a programmer, and I’m sure more experienced individuals would have no problem, but I have to pace myself most of the time.
The incomplete nature of the code makes it so that, even when you think you have the solution, it can turn out that you’re applying it to the wrong place. It’s often not very clear if your code should go in one similar-looking code block or the other. I’ll talk about my testing process some other day in greater detail but, suffice it to say, it takes me a very long time to test the changes I’m making and getting the results I want. I’m easily at it for hours on end.
It doesn’t help that the codebase is old and that a lot of coding practices used in it are outdated (some, even for the time). So you have to guess why they went with that in the first place before changing it, something again complicated by the lack of comments. Some parts may be deprecated but it’s very possible that there was non-standard behavior they were relying upon anyhow. So you can’t just rip it all up without thinking about it, annoyingly enough.
So, like an archaeologist translating a nearly-forgotten script on a tomb wall, I eventually got there after a lot of trial and some error. I found the right place to put new code to prevent the creation of catalog-sized thumbnails for all images. Nothing seemed to catch fire. Then I added the bit to my webm function that generates them when the file is a webm and the first post image. Finally, I updated the generation of the catalog page itself to use them.
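The decision described above boils down to something like the following sketch. Again, this is Python for illustration only and every name in it is an assumption, not something lifted from the board software: the kind of preview source depends on the file type, and the catalog-sized preview is only requested for the image that opens a thread.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Assumption: webm is the only video type that needs a frame grab.
VIDEO_EXTENSIONS = {".webm"}


def plan_previews(filename: str, is_first_post: bool) -> Tuple[str, List[str]]:
    """Return (preview source, preview sizes to create) for an uploaded file."""
    ext = PurePosixPath(filename).suffix.lower()
    # Videos get their preview from an extracted frame, images from the file itself.
    source = "video_frame" if ext in VIDEO_EXTENSIONS else "image"
    sizes = ["thumb"]  # every upload gets a regular thumbnail
    if is_first_post:
        sizes.append("catalog")  # only opening posts ever show up in the catalog
    return source, sizes
```

A reply with an image would only get `["thumb"]`, while a webm that starts a thread would get both sizes.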
All was fine; I tested and saw that it was performing as expected. Then I noticed that these webm images weren’t being deleted from the various directories when they were otherwise deleted from display on the actual boards. And so I repeated the same laborious process for the delete function. There’s a specific to-do note that some of the functions in that file need to be refactored (simplified to a single thing, mostly) and, boy, does it show. All in all, long story short, it took a fair amount more time to sort it all out.
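One way to avoid this kind of leftover file is to keep the list of everything an upload produces in a single place and have the delete code walk that same list. The sketch below is only an illustration in Python; the directory layout and the `s`/`c` filename suffixes are assumptions for the example, not a description of how the site stores its files.

```python
from pathlib import Path
from typing import List


def upload_paths(board_dir: Path, stem: str, ext: str) -> List[Path]:
    """Every file an upload may have produced (hypothetical layout)."""
    return [
        board_dir / "src" / f"{stem}{ext}",      # the original upload
        board_dir / "thumb" / f"{stem}s.jpg",    # regular thumbnail
        board_dir / "thumb" / f"{stem}c.jpg",    # catalog-sized thumbnail
    ]


def delete_upload(board_dir: Path, stem: str, ext: str) -> List[Path]:
    """Delete the upload and all of its previews; return what was removed."""
    removed = []
    for path in upload_paths(board_dir, stem, ext):
        # The catalog thumbnail only exists for opening posts, so skip missing files.
        if path.exists():
            path.unlink()
            removed.append(path)
    return removed
```

With this shape, adding a new kind of preview means touching `upload_paths` once rather than remembering to update the delete code separately.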
Goes to show how messy the site’s codebase is and how the post I made earlier this year about it being a tangled mess still holds true. If you touch any one thing, you’re more than likely to end up touching something else in the code as well. Even things that are conceptually simple to understand and logically follow (by just reading the raw code) are a hassle because of how interconnected and all over the place things are.
There was another thing that had been bothering me for a while. The title for each of the site’s pages was less than ideal. I mean the bit at the top of the browser window which, as you’re reading this post, says something about Patreon and Touhou fanfiction. On THP, the front page says something about it being a place for chat and discussion but, on the board pages themselves, it outputs the board name and the thread number (if applicable). Seems like a silly thing to worry about, right? Well, it’s not. Having unique and pertinent titles usually makes pages more visible in search engine lookups. Generally you want to have something that’ll help people find your website and that shows that there’s a lot of varied content when the algorithms rank it.
The most obvious solution was to have the subject from the first post up in the titles. So, if you search for a story’s title, it’ll be likelier to show up in the search engine results. It’s an easy concept to visualize; the problem is that you then have to face up to the reality of the code. Its function, described in a single sentence: the site queries the database for relevant data, which is filtered, assigned to variables and arrays, and passed on to a template engine, which then generates the various HTML pages accordingly. So, if you want to have the post subject in the title, you will want to specify as much in the header template.
What took the most time and effort here was to figure out in which variable the subject was stored and how to retrieve it. This required some more sleuthing and reading of the database requests to see what was called when and stored where. Turned out that there was no clean way of passing the subject string of just the opening post to the image page header. I ended up assigning a whole new variable right after the board software queries the database (incidentally, it was unclear which of the queries was the relevant one because of similar tidiness issues as with the thumbnails) and then this value is called from the templates within a few if statements.
If there’s a subject, print that; if not, the thread id. The thread id was a whole different detour and I haven’t figured out why there’s existing code that, under some circumstances, replaces a string with the variable instead of calling the variable directly. It’s a weird mystery that’ll likely remain that way due to the lack of comments. I haven’t removed that because I wasn’t ready to spend a dozen hours troubleshooting in case everything else broke. More importantly, if the thread id is displayed, the board name follows. Finally, a trailing “Touhou-Project.com” caps off the title.
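Put as code, the title rules above look roughly like this. It's a sketch in Python rather than the site's real template logic: the function name, the " - " separator and the fallback for pages without a thread are all assumptions made for the example.

```python
from typing import Optional

SITE_SUFFIX = "Touhou-Project.com"
SEPARATOR = " - "  # assumption: the real separator may differ


def page_title(board_name: str,
               thread_id: Optional[int] = None,
               subject: Optional[str] = None) -> str:
    """Build a page title: subject if present, else thread id plus board name."""
    parts = []
    if subject and subject.strip():
        parts.append(subject.strip())
    elif thread_id is not None:
        # No subject: fall back to the thread id, followed by the board name.
        parts.extend([str(thread_id), board_name])
    else:
        # Assumption: a board page outside any thread just shows the board name.
        parts.append(board_name)
    parts.append(SITE_SUFFIX)
    return SEPARATOR.join(parts)
```

So a thread with the subject "My Story" would be titled "My Story - Touhou-Project.com", while a subject-less thread 123 on a board called "Example Board" would get "123 - Example Board - Touhou-Project.com".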
The last part took a little more effort and required partially rewriting the generic “title” function in the site code. Otherwise it kept overwriting these new variables. The actual amount of time I spent on this was only a couple of hours on and off again, but I did take long breaks between sessions as I tried to piece together the puzzle that is the site’s code.
In the end, I’m not sure if it’ll remain strictly in that format. I may replace the URL part with something like “THP – a place for Touhou discussion and fanfiction” but I’m not sure which would be best for the site’s visibility. If you have any suggestions of something clever and succinct, I’d love to hear it in the comments. I’ll definitely take it into consideration.
Well, that pretty much covers all the boring stuff I’ve finalized today. There’s other things I’m also working on but they all more or less undergo the same laborious process as I’ve described in this post. It’s rare to just be able to implement a thing, even something really simple, in under an hour of coding. THP is a complex beast and it requires a lot of forethought and patience to deal with. If I accidentally mess something up, it could mean that people can’t post or the site won’t load at all.
I hope to have a few new things done by early next month. There’s a number of overhauls that are nearing completion and a few minor features that hopefully will be done soon as well. Still not the kind of thing I’d place a hard ETA for, however. Not only do I need to code, debug and all that, but I also need to read books and reference materials in order to try to implement things. Setting up a Patreon page has been really helpful in getting things going, though, as funds have proven useful when it comes to doing a proper job of code maintenance and getting peace of mind. There’s probably been more done in the last year than in the many preceding ones combined and there’s no signs of things slowing down at all.
As a “bonus” I’ll attach a pair of screenshots of some of the actual site code and all of its partially-commented glory (at least one of the comments there was added by me). Hope this wasn’t too dull a read!


Comments
If you're interested, I think you can still find the source for kusaba x up on the web. It's what the site runs, though I guess now we more than qualify as a proper fork with all the modifications we've made over the years. Virtually all of the third party libraries have been replaced, for one.
Touhou-Project.com
2017-12-25 05:50:24 +0000 UTC
As a hopeful future webdev and weird guy who likes looking at site guts, I appreciate these posts. It's always neat to see the inner-workings of a site you use regularly.
Benjamin Oist
2017-12-24 19:06:18 +0000 UTC