Parsing the c64 Bubble Bobble Wind Currents
Background
A while ago, I managed to rip the level data and graphics from the c64 version of Bubble Bobble.
I wrote a tool in typescript that reads the data from the actual game.
It was fun to see all the levels I had played so many times, but I wanted to see if I could get more of the related data. Platform graphics and colors are stored as simple arrays, while the platform positions and monsters are increasingly complex to read.
I had to study the disassembled binary in detail to understand how the game worked. Running it in a debugger/emulator to check the values of registers at each step helped a lot.
The platforms are stored as 1-bit bitmaps where each tile in the level is represented by a single bit. The levels are 32 by 25 tiles but the top and bottom rows are not stored in the bitmap, so the bitmap is 4 bytes wide and 23 bytes tall, giving 92 bytes each.
But levels have a separate array of metadata containing among other things a flag signaling if the level is symmetric. Instead of just storing the whole bitmap, symmetric levels store only half of it and mirror it in a separate step, saving 46 bytes.
The same metadata array is used to store wether a level has extra graphics decorating the sides. The extra graphics are stored as 2 x 2 tiles of 8 bytes each giving 32 bytes each, and are dynamically written to the charset when loading the level. Levels without this decoration just use the single tile platform graphics instead.
The monsters instead use a single null-byte to terminate the array of monsters for each level. Pretty easy.
At this point I had noticed that the titular bubbles follows “wind currents” clearly visible in the debugger. Martin Piper has a great Bubble Bobble code analysis video on youtube if that’s your thing.
It was’t too hard to find the code responsible for decoding this data. I basically just set a debugger breakpoint when the visible memory was written to. It was also easy to find where the source data was stored. Understanding how the format worked was a lot harder. I spent a lot of hours staring at it, but since this was almost my first exposure to ASM language, I found it absolutely brain melting.
This christmas, I tried again and finally solved it. Below is the annotated 6510 disassembly. I’ll walk you through it.
The Code
I’m sure there are errors in my notes, but it is accurate enough that I could finally write the code to extract the currents and visualize them.
At e18d, a pointer to the wind current data block is constructed. Pointers are 16 bits, but since the MOS 6502 family are 8 bit processors, it has to be loaded and stored one byte at a time. The 6502/6510 can only use pointers if they are stored in the zero-page, the first 256 bytes of the address space. The least significant bits are stored at address $02 and the most significant bits at $03. That’s called Little Endian.
At e19f, the level platforms are unpacked. Bits of that data is also used for wind currents.
At e1a4, the code starts looping through over the levels, skipping over a specified number of bytes for each level until it stops at the selected level. The loop counter starts at the wanted level index and is decremented. That way, only one register is needed.
The number of bytes to skip is read at e1a8. The first bit says if the level even has any wind data, or if it should copy the wind from another level instead. If it has it’s own wind data, the rest of the bits says how many bytes are used. At e1b1, that number is added to the wind data pointer, skipping the level. Being a 16 bit pointer on a 8 bit processor, any overflow must be handled manually, so at e1b5, the carry-bit is checked, and the most significant half of the pointer is incremented if set.
At e1b9, the loop counter is decremented and a BNE instruction (short for “Branch if Not Equal”) is used to check if the result after decrementing was zero. If it was not zero, the branch is taken to the top of the loop where the first byte of the next level is read.
After the loop at e1bc, the first byte of the selected level is read again. As explained above, if the first bit is set, the remaining bits are the level index to copy the wind data from. If set, the byte is masked with #%01111111, discarding the first bit. At e1c2, the code then jumps to the beginning of the function, just after the level index was loaded. That makes this a recursive function that will follow any number of indirections.
If the copy-bit was not set, execution continues at e1c5, where the level platform bitmap unpacked at e19f is used. The last 2 bits of each row stores the default wind direction for each row. Those 2 bits can be used since the sides of the levels always are solid anyway. The loop goes on until e1eb.
At e1ed, the main wind data parsing begins. The Y register was used as a loop counter earlier at e1da, and ended up as -1. Therefore it is incremented to 0 and used to read the first byte. Again. Since we have come this far, that first bit must be zero, and the rest of the bits is the number of bytes used by this level.
The count may be zero, and if so we are done, so just return. Interestingly, the developer choose to reuse a random RTS instruction at address LE188 instead of adding one in this function.
At e1f2, the array index Y register is incremented again and the second byte is read. Again, the first bit is used to select between 2 different cases: Either:
-
This byte and the 2 following make up a rectangle with a wind direction to draw to the wind buffer, in which case wee keep executing the next line at e1f7.
-
Or it tells us to make the wind data loaded so far symmetric by copying the left half to the right, mirroring it and flipping the wind direction horizontally. In That case the execution branches to LE24C, where this symmetry is handled. It is possible to include multiple symmetry markers, but it wouldn’t make sense to have more than one per level.
At e1f7, we handle the rectangle. The content of the A registry is the first rectangle byte. It is copied to the X registry as well, for safe keeping. Then bit 1 and 2 are masked out with #%01100000.
At e1fa, the byte is shifted left, discarding the top bit that told us this is a rectangle. Another left shift is done, causing the upper of the 2 masked bits to be shifted out as well. But the shift instruction stores the discarded bit in the carry-flag. The following 2 ROL instructions work pretty much like left shifts, but the fresh bit added at the right end is taken from the carry-flag instead of just adding a zero. The end result is equivalent to shifting right 5 times. Yay, we saved one instruction!
At e1fe, the resulting 2-bit value is the wind direction, and is stored away for later.
At e200, the first rectangle byte is copied back to A from the X register, since there is more data in it to handle. This time it is masked with #%00011111. That’s the left position of the rectangle. It too is stored away for later.
At e205, the second rectangle byte is read. It too is copied to the X registry. It is shifter right twice and the masked with #%11111110. This is the top position of the rectangle.
Notice how the lowest bit was masked away. You’d think the byte would be shifter right one more time instead. But the code used to actually draw the rectangle to the wind buffer would need to multiply it by 2 anyway, so it is stored as twice the actual value.
At e210, the remaining 3 bits are masked with #%00000111 and stored at address $06. They are the top 3 bits of the 5 bit rectangle width. The bottom 2 bits are taken from the top bits of the last byte of the rectangle. It is done at e21b, by shifting out the top bit from the A register and ROL-ing it into the bottom of the byte at address $06 where the width was stored. That’s done twice, once for each bit.
In the middle of that, at e218, the Y register used to index the rectangle bytes is incremented and stored away in address $0b. That’s where the next rectangle will start reading from.
Finally at e222, the third rectangle byte is masked with #%00011111, giving the rectangle height.
For some reason, both the width and the height are stored as one less than the actual number. Of course zero width/height rectangles would be pretty useless in this context, and there are only 5 bits, but I don’t think it actually helps with anything.
At e226, the rectangle coordinates are loaded and at e239 a loop begins drawing the rectangle to the wind buffer. After the loop comes the symmetry code, so it is JMP-ed past.
At e282, the position in the rectangle bytes array is compared to the size allocated for this level and if they match, the function returns, otherwise the code loops back to LE1F3 to read another rectangle.
Results
After successfully reading the wind data, I could visualize it. Here are a few examples. To the left is the level as it would look when playing the game. To the right is the wind, with arrows pointing in the direction of the current.
I color coded each tile.
- Cyan is the explicit stored data. The column just to the right of the level is the per-row default wind direction stored in the platforms bitmap. Everything else is stored as rectangles.
- White is used where Cyan overlaps a platform, just to make it easier to visually correlate the wind with the platforms.
- Dark blue is where the wind direction is determined by the per-row default direction.
- Purple is where the dark blue overlaps a platform.
- Red is where a direction has been reflected from the left side by a symmetry-marker.
- Yellow is where the red overlaps a platform.