Anatomy of a Bug - The Slowdown Bug
Added 2018-06-25 06:53:13 +0000 UTCI'll start this breakdown with a tl;dr: I've discovered and fixed the source of all of the slowdowns and halting in LR2. Expect greatly improved responsiveness and stability in v0.6.1. In the worst offending case the game now processes turns 650 times faster (6.5 second hang-up during end-of-day code down to 0.01 seconds, mostly Ren'py code).
When I find interesting, complicated, or in this case critical, bugs I like to breakdown exactly what caused it. Explaining it helps me make sure that the same issue never comes up again, and I hope it ends up being entertaining as well.
For the last few update cycles of Lab Rats 2 there has been a problem that was getting progressively worse. The most obvious issue was the hangup at the start and end of the day, sometimes lasting for four or five seconds and always getting longer as the game went on. Other symptoms were less obvious but I think more important - menu buttons felt sluggish, loading/saving took a noticeably long time, and rollback was slow enough that it felt useless.
To understand the bug I'll have to take a quick detour and explain how LR2 represents clothing internally (spoiler: my hunch was correct and the bug was in this system). In LR2 clothing is built up as a hierarchy of bits. A Wardrobe, the highest level of abstraction, holds a collection of Outfits. An Outfit is what is assigned to a character when they get dressed in the morning, when they put on their uniform, and so on. An Outfit is itself composed of pieces of Clothing. Each instance of Clothing stores a bunch of important information: it's name, the layer it sits on, it's colour, if it's considered underwear, ect. Importantly, each Clothing instance also keeps a reference to a Clothing_Images object which holds all of the images that are used for all body types. When LR2 draws a piece of clothing it asks the linked Clothing_Images what image matches the characters body type, breast size, and current position. Depending on what was being represented Clothing_Images might hold hundreds of images.
Now let me set up the cause of the bug. Anywhere an outfit is changed in LR2 a copy of the reference outfit is always used. This is important - without it a uniform would be forever changed when you stripped something off of an employee. There are two simple ways of copying objects python (and in programming in general): A shallow copy and a deep copy. A shallow copy only creates duplicates of the highest level variables - like an outfit's name - but would keep lower level references. A deep copy starts by copying all of the top level variables, then takes the next level and creates copies of them, and so on until it's created a duplicate of every reference all the way down. To change outfits I had to use deep copies, otherwise references in an outfit would point to the same piece of clothing - and recolouring that clothing on one girl would recolour it for everyone wearing that piece of clothing.
So here's the bug: Every time an Outfit was copied it copied everything below, right down to the Clothing_Images which it held. Each character might have ten or so outfits with 4 or 5 pieces of clothing, and each one might hold hundreds of individual images. The result was each character having thousands and thousands of images associated with them.
Lets look at why this caused some of the symptoms it did: At the start of each day each character picks out a new outfit to wear for the day from their Wardrobe and uses it as their planned outfit. At the end of the work day they would take a copy of their planned outfit and assign it as their active outfit. These two actions took far longer than it should have because they were manipulating huge amounts of data. The delay got longer as games went on because more characters were generated, each of which needed to assign outfits. Many people playing on android ran into memory issues, which was caused by the fact that there was a massive duplication of data in each outfit. The long save/load time and rollback delay was caused by the same issue - each character had way more "unique" information associated with them that had to be recorded. Menu delays were caused because Ren'py predicts button actions to speed up the response - a process that likely involves copying involved objects to test on and then slip back in if the button was used. Copying the massive character objects took a small-but-noticeable amount of time.
So how did the bug show up in the first place? When I first designed the clothing system there was no ability to dynamically recolour clothing and I assumed Clothing objects would be unchanged throughout the game and so only a reference would be needed to it (letting me use a shallow copy when copying Outfit's around). When recolouring was introduced I switched to using a deep copy to prevent modifying one piece of clothing from modifying all of the same type of clothing. The range of clothing items and postures was so much smaller then that the slowdown introduced wasn't immediately noticeable. It grew larger with each update because more items of clothing, more outfits, and more image sets were added to the game.
And finally, how was the bug fixed? Fixing this up required writing a custom copying function to be used with Outfits and Clothing. When an Outfit is copied now it copies all of the high level data, then takes a shallow copy of each piece of Clothing. This leaves the reference to the master Clothing_Images object intact, so only one of them exists for all pieces of clothing to share. In a demo game I created with a town of 50 people the end of day processing time has dropped from 6.5 seconds to 0.01 seconds, and a large chunk of that is likely to be time taken internally by Ren'py. Subjectively the rest of the game now runs far smoother as well.
So that's the story of the latest major bug in Lab Rats 2. I hope you enjoyed this peak behind the screen!
Comments
No sign of it in the new release. You done swatted it good.
Dubsington
2018-06-28 19:56:58 +0000 UTCIt was a great feeling and I was absolutely air-punching just the same as you. It was extra relieving to see that all of those problems shared a single root cause instead of being completely separate issues.
Vren
2018-06-28 05:15:28 +0000 UTCIt could be done like that and I think it would work just fine. I ended up with my method of attaching an object because it made the code for actually drawing each piece of clothing simple: For each clothing in outfit: clothing.draw() Where the draw function is basically the lookup function you are describing.
Vren
2018-06-28 05:12:02 +0000 UTCI'll be doing a demo game tonight before the public release, I'll take a look and make sure nothing strange is going on there,
Vren
2018-06-28 04:54:44 +0000 UTCHas anyone else had the problem of the game crashing whenever a serum with the Growing Breasts trait is used?
HighGrove
2018-06-27 21:21:14 +0000 UTCthis post is the main reason I support your work. You actually care about your project and about involving your patrons in the process of your work. I have stopped supporting many creators for various reasons, but I do not see me ever stopping my support of all the work you do to make your games great!
zombiegun71
2018-06-27 01:16:03 +0000 UTCSo this is more of a Python question, but isn’t there a pass by reference? Or from a more scripty sense: Each individual clothing object then would say I am x size of y color and z type, and passes that to a separate function which would return an image reference? Either way, good job on crushing a major annoyance.
Isaac Dansicker
2018-06-27 01:10:55 +0000 UTCGreat to see the going-ons behind the scenes. Hooray for another bug squashed!
2018-06-26 15:02:07 +0000 UTCIsn't swatting a long-standing bug a great weight off your shoulders? I had the same thing at my work today, although, sadly, the software in question involved no boobies. I had poked at it, tweaked the "likely" things, made some broad changes to the code, even rewrote the likely section and it persisted. Finally I had a couple quiet hours and busted out proper debugging methodology on it. When I got a clean test I was air-punching and whooping like a kid. Hope this fix was as good a feeling for you.
Dubsington
2018-06-26 00:00:41 +0000 UTCAgreed, this is interesting. Is this also related to why the women in crisis events will sometimes not be wearing their uniform?
Lazarus
2018-06-25 13:01:50 +0000 UTCOkay, I'll know now! ٩(◕‿◕)۶
SexBook _____ ADULT PHOTO AND VIDEO SIGNATURES
2018-06-25 11:52:39 +0000 UTCI've always found your more technical commentary interesting! I'm sure it will help budding ren'py game developers, or at least just give an insight to whats going on behind the scenes. Kudos to not encrypting your files too, I had fun going through them, and modifying it to get my lazy ass a head start.
Kin
2018-06-25 07:26:14 +0000 UTC