jamchamb’s blog

Modifying Embedded Filesystems in ARM Linux zImages

2022-01-02T00:00:00+00:00

Ever run binwalk on an embedded Linux device’s kernel image and find its entire filesystem contained inside? Ever want to change one little line inside to enable a root shell on that device that’s just mocking you with its lack of boot security, only to be thwarted by a bit of compressed data entangled in machine code?

The mechanism for this built-in filesystem is known as the initial ramdisk,¹ which often takes the form of a CPIO archive (initramfs). The initial ramdisk is embedded in the kernel binary proper (vmlinux), which is in turn compressed and packed into a wrapper program (vmlinuz, zImage, bzImage). The wrapper performs initial setup, decompresses vmlinux, and then jumps into it.² The compressed vmlinux blob tends to be referred to as the “piggy” in Linux boot code.

While it’s quite easy to run extract-vmlinux or binwalk on these kernel images and unleash a flood of shell scripts, config files, and programs that one might have many reasons to want to modify, figuring out how to package these files back up into an image fit for execution is not so straightforward.

This article will demonstrate how to replace the piggy in a 32-bit ARM zImage without worrying about size constraints or finding the right toolchain and exact configuration options necessary to recompile the vmlinuz wrapper code. It’s not intended as a universal solution, but rather a guide that should provide one with enough understanding to make whatever tweaks are needed for their specific use case.

While this information is intended to help a neighbor make modifications to proprietary kernel images that they can’t simply rebuild from source, I’ve decided to use the 32-bit ARM virt build of OpenWRT for demonstration purposes. May a thin layer of obfuscation by compression never get in the way of your path to proofs of concept again!

Setup

First, download the OpenWRT ARM virt zImage-initramfs image:

$ wget -q https://downloads.openwrt.org/releases/17.01.0/targets/armvirt/generic/lede-17.01.0-r3205-59508e3-armvirt-zImage-initramfs -O zImage-initramfs
$ sha256sum zImage-initramfs
5ad269e95b2db16aea3794dd0e97dabb6f9712184d79b0764bb10a810f8d7639  zImage-initramfs

We can boot this in qemu with:

$ qemu-system-arm -serial stdio -M virt -m 1024 -kernel zImage-initramfs

After booting, press enter to activate the console. The following banner is displayed:

BusyBox v1.25.1 () built-in shell (ash)

     _________
    /        /\      _    ___ ___  ___
   /  LE    /  \    | |  | __|   \| __|
  /    DE  /    \   | |__| _|| |) | _|
 /________/  LE  \  |____|___|___/|___|                      lede-project.org
 \        \   DE /
  \    LE  \    /  -----------------------------------------------------------
   \  DE    \  /    Reboot (17.01.0, r3205-59508e3)
    \________\/    -----------------------------------------------------------

=== WARNING! =====================================
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@LEDE:/#

This looks like a good target for a proof of concept modification. Let’s use the shell to check the base Linux kernel version:

root@LEDE:/# uname -a
Linux LEDE 4.4.50 #0 SMP Mon Feb 20 17:13:44 2017 armv7l GNU/Linux

The core pieces of the zImage wrapper code are unlikely to change much from the original, if at all, so we can look up the assembly source of the wrapper for that version of Linux. Bootlin’s Elixir Cross Referencer provides a nice interface for browsing Linux source code across different versions. Open a browser tab and navigate to https://elixir.bootlin.com/linux/v4.4.50/source/, or clone the linux repo to search through the code.

Most of the files we’re interested in can be found in the arch/arm/boot/compressed directory.

Piggy Extraction

We need to extract the piggy before we can modify it. The well known extract-vmlinux script³ performs a brute force search for the magic bytes of commonly used compression schemes, runs the associated decompressor program on them, and checks if the output is an ELF.

For this image it fails – we’ll see why in a minute. binwalk identifies XZ compressed data:

$ binwalk zImage-initramfs

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             Linux kernel ARM boot executable zImage (little-endian)
15400         0x3C28          xz compressed data
15632         0x3D10          xz compressed data

There’s clearly an XZ stream header magic at 0x3d10:

00003D10   FD 37 7A 58  5A 00 00 01  69 22 DE 36  02 01 07 00  .7zXZ...i".6....
00003D20   21 01 1A 00  AF BB 14 35  E2 84 87 EF  FF 5D 00 20  !......5.....].

As well as an XZ stream footer magic (59 5A, “YZ”) at 0x2bf10e:

002BF100   70 AB A0 CD  9B E3 51 40  03 00 00 00  00 01 59 5A  p.....Q@......YZ
002BF110   28 CE 8A 00  00 00 00 00  00 00 00 00  00 00 00 00  (...............

binwalk can successfully extract the compressed vmlinux, but what we now need to know for certain is exactly which start and end offsets the decompressor code uses to delineate the piggy.

Notice that there are several piggy.*.S files in arch/arm/boot/compressed. These include the content of the piggy as a binary blob and store its start and end offsets in the globals input_data and input_data_end:

	.section .piggydata,#alloc
	.globl	input_data
input_data:
	.incbin	"arch/arm/boot/compressed/piggy.xzkern"
	.globl	input_data_end
input_data_end:

These globals are referenced in the decompress_kernel function in arch/arm/boot/compressed/misc.c, which prints a telltale string before calling do_decompress with the piggy start offset and length as arguments:

putstr("Uncompressing Linux...");
ret = do_decompress(input_data, input_data_end - input_data,
    output_data, error);
if (ret)
    error("decompressor returned an error");
else
    putstr(" done, booting the kernel.\n");

We can use cross-references to the “Uncompressing Linux…” string in the disassembled zImage-initramfs binary to locate the call to do_decompress, which will point us to the values of input_data and input_data_end.

I loaded the image up as a raw little endian⁴ ARM binary in Ghidra to find this call in the disassembly. Ghidra detects that 0x3d10 and 0x2bf114 are loaded from the Global Offset Table (GOT) section at the end of the zImage to set up the first two registers for the call to do_decompress.

These addresses match up with the XZ stream header and YZ stream footer bytes as seen above, but there is an extra word that comes just after the stream footer. A bit more digging into the source code confirms that this represents the uncompressed size of the XZ data, and that it’s expected to break decompression with the normal unxz command.⁵

Let’s carve out the piggy:

$ dd if=zImage-initramfs of=vmlinux.xz ibs=1 skip=$[0x3d10] count=$[0x2bf114-0x3d10]
2864132+0 records in
5594+1 records out
2864132 bytes (2.9 MB, 2.7 MiB) copied, 2.85465 s, 1.0 MB/s

Use the --single-stream option to avoid the “Unexpected end of input” error when decompressing it:

$ unxz --verbose --single-stream vmlinux.xz
vmlinux.xz (1/1)
  100 %   2,797.0 KiB / 8,883.5 KiB = 0.315

We now know the exact size and location of the piggy within the zImage. It’s 2864132 (0x2bb404) bytes long and occupies offsets 0x3d10 up to (but not including) 0x2bf114.
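The same carving step can be sketched in Python. This is a minimal illustration rather than part of the original workflow: the function name carve_piggy is hypothetical, and it assumes the piggy is a single XZ stream followed by a 4-byte little endian inflated size, as found above.

```python
import lzma
import struct


def carve_piggy(image: bytes, start: int, end: int):
    """Split image[start:end] into (decompressed vmlinux, inflated size word).

    Assumes the layout described in the article: one XZ stream followed by
    a little endian 32-bit word holding the uncompressed size.
    """
    piggy = image[start:end]
    xz_stream, size_word = piggy[:-4], piggy[-4:]
    (inflated_size,) = struct.unpack("<I", size_word)

    # LZMADecompressor handles exactly one stream, much like
    # `unxz --single-stream`; filters such as the ARM BCJ filter are read
    # from the stream itself.
    vmlinux = lzma.LZMADecompressor(format=lzma.FORMAT_XZ).decompress(xz_stream)

    if len(vmlinux) != inflated_size:
        raise ValueError(
            f"size word says {inflated_size}, decompressed {len(vmlinux)} bytes"
        )
    return vmlinux, inflated_size
```

For the image used here, the call would be carve_piggy(open("zImage-initramfs", "rb").read(), 0x3d10, 0x2bf114). The length check doubles as a sanity test that the start and end offsets were read correctly from the disassembly.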

Modification

Direct replacement

For the test modification, I will modify the bytes after the WARNING! string in the banner. These bytes show up within the initramfs section of the decompressed vmlinux binary, which consists of an uncompressed CPIO archive with no checksums. It’s simple enough to directly edit with a hex editor:

0076AC30   61 74 20 3C  3C 20 45 4F  46 0A 3D 3D  3D 20 57 41  at << EOF.=== WA
0076AC40   52 4E 49 4E  47 21 20 3D  4D 6F 64 69  66 69 65 64  RNING! =Modified
0076AC50   21 20 68 65  6C 6C 6F 20  6E 65 69 67  68 62 6F 72  ! hello neighbor
0076AC60   73 3D 3D 3D  3D 3D 3D 3D  3D 3D 3D 3D  0A 54 68 65  s===========.The

If we try to naively recompress the modified image, it comes out significantly larger than the original piggy. Trying all of the compression presets -0 through -9, even with the --extreme flag, results in at best a 2994568 byte output. In fact, even if we just recompress the original vmlinux unchanged, it ends up at 2994540 bytes in the best case. That’s 130408 bytes larger!

Digging around the Linux boot files we can find the xz options used to compress the original piggy. The command is in xz_wrap.sh:

xz --check=crc32 --arm --lzma2=$LZMA2OPTS,dict=32MiB

Let’s try with those options:

$ xz --check=crc32 --arm --lzma2=,dict=32MiB < vmlinux-mod-warning > /tmp/vmlinux-mod-warntest.xz
$ wc -c /tmp/vmlinux-mod-warntest.xz
2864204 /tmp/vmlinux-mod-warntest.xz

Close, but it’s still 76 bytes too large (including the four extra bytes needed for the inflated size word). After digging through xz’s man page and experimenting with compression options, I found a useful setting that resulted in a smaller output:

$ xz --check=crc32 --arm --lzma2=,dict=32MiB,nice=128 < vmlinux-mod-warning > /tmp/vmlinux-mod-warntest.xz
$ wc -c /tmp/vmlinux-mod-warntest.xz
2863580 /tmp/vmlinux-mod-warntest.xz

Here’s the description of the nice option from the xz man page:

Specify what is considered to be a nice length for a match. Once a match of at least nice bytes is found, the algorithm stops looking for possibly better matches. Nice can be 2-273 bytes. Higher values tend to give better compression ratio at the expense of speed. The default depends on the preset.

A nice option indeed. Now that we have a smaller output, we can append the inflated vmlinux size to the new piggy and try to replace the original piggy with it.
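For scripting, roughly the same filter chain can be expressed with Python’s lzma module. This is a sketch under assumptions: build_piggy is a hypothetical helper, and although Python’s lzma wraps liblzma, the output is not guaranteed to be byte-for-byte identical to what the xz command line tool produces, so the resulting size still needs to be checked.

```python
import lzma
import struct


def build_piggy(vmlinux: bytes,
                dict_size: int = 32 * 1024 * 1024,
                nice_len: int = 128) -> bytes:
    """Compress vmlinux and append the inflated size word.

    Mirrors `xz --check=crc32 --arm --lzma2=,dict=32MiB,nice=128`
    as closely as the lzma module allows (an approximation).
    """
    filters = [
        {"id": lzma.FILTER_ARM},  # ARM branch/call/jump filter (--arm)
        {"id": lzma.FILTER_LZMA2, "dict_size": dict_size, "nice_len": nice_len},
    ]
    xz_stream = lzma.compress(
        vmlinux,
        format=lzma.FORMAT_XZ,
        check=lzma.CHECK_CRC32,
        filters=filters,
    )
    # The decompressor wrapper expects the uncompressed size as a
    # little endian 32-bit word right after the XZ stream.
    return xz_stream + struct.pack("<I", len(vmlinux))
```

With the 9096744-byte vmlinux from this article, the appended word comes out as 28 ce 8a 00. A 32 MiB dictionary makes the encoder use a fair amount of memory, which is why dict_size is left as a parameter.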

Make a copy of the kernel image and zero out the piggy area:

$ cp zImage-initramfs zImage-initramfs-warnmod
$ dd if=/dev/zero of=zImage-initramfs-warnmod bs=1 seek=$[0x3d10] count=$[0x2bf114-0x3d10] conv=notrunc
2864132+0 records in
2864132+0 records out
2864132 bytes (2.9 MB, 2.7 MiB) copied, 7.53578 s, 380 kB/s

The size of the uncompressed vmlinux is still the same (9096744 bytes), so append that to the end of the new piggy as a little endian 32 bit integer (28 ce 8a 00). Then copy the new piggy into the piggy area:

$ echo -en "\x28\xce\x8a\x00" >> vmlinux-mod-warning.xz
$ dd if=vmlinux-mod-warning.xz of=zImage-initramfs-warnmod bs=1 seek=$[0x3d10] conv=notrunc
2864044+0 records in
2864044+0 records out
2864044 bytes (2.9 MB, 2.7 MiB) copied, 7.86938 s, 364 kB/s

Update the input_data_end word in the GOT near the end of the image (at 0x2bf124). The piggy now ends at 0x2beef0.

002BF110   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
002BF120   50 F1 2B 00  F0 EE 2B 00  68 F5 2B 00  10 3D 00 00  P.+...+.h.+..=..
002BF130   64 F5 2B 00  64 F1 2B 00  54 F1 2B 00  40 09 00 00  d.+.d.+.T.+.@...
002BF140   5C F1 2B 00  60 F1 2B 00  58 F1 2B 00  00 00 00 00  \.+.`.+.X.+.....
002BF150

Then attempt to load it in qemu:

$ qemu-system-arm -serial stdio -M virt -m 1024 -kernel zImage-initramfs-warnmod

It doesn’t work! With a quick debugging session and another look at the code, it’s clear the decompressor is still checking for the inflated piggy size at its original location, 0x2bf110. Because we zeroed out the original piggy area, the decompressor will read zero as the inflated size of the piggy.

The location of the inflated size word shows up in a block of addresses in the main assembly code for the decompressor:⁶

LC0:	.word	LC0			@ r1
		.word	__bss_start		@ r2
		.word	_end			@ r3
		.word	_edata			@ r6
		.word	input_data_end - 4	@ r10 (inflated size location)
		.word	_got_start		@ r11
		.word	_got_end		@ ip
		.word	.L_user_stack_end	@ sp
		.word	_end - restart + 16384 + 1024*1024
		.size	LC0, . - LC0

Easy enough to fix with the hex editor again. The word 0x002bf110 is stored at offset 0x258 in the image (the fifth entry of LC0, which starts at 0x248), and we can update it to the new inflated size word location, 0x2beef0 - 4 = 0x2beeec. Now it boots:

BusyBox v1.25.1 () built-in shell (ash)

     _________
    /        /\      _    ___ ___  ___
   /  LE    /  \    | |  | __|   \| __|
  /    DE  /    \   | |__| _|| |) | _|
 /________/  LE  \  |____|___|___/|___|                      lede-project.org
 \        \   DE /
  \    LE  \    /  -----------------------------------------------------------
   \  DE    \  /    Reboot (17.01.0, r3205-59508e3)
    \________\/    -----------------------------------------------------------

=== WARNING! =Modified! hello neighbors===========
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@LEDE:/#

It turns out we could’ve made things simpler by leaving the inflated size word in its original location. The extra zeros at the end of the XZ data don’t bother the XZ decompressor, and it would also save us from needing to update the location of the size word in LC0.

This approach works as long as we can recompress the modified vmlinux to a size equal to or smaller than the original. Some scripts such as repack-zImage.sh will perform modifications along these lines and attempt to optimize compression, but can’t repack an initramfs once modifications increase its compressed size.⁷
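The simpler variant, where the size word stays put and nothing in LC0 or the GOT changes, could look something like the sketch below. The helper name replace_piggy_in_place is made up for illustration, and it relies on the observation above that trailing zeros after the XZ data don’t upset the decompressor.

```python
import struct


def replace_piggy_in_place(image: bytes, start: int, end: int,
                           xz_stream: bytes, inflated_size: int) -> bytes:
    """Overwrite the piggy at image[start:end] without moving anything.

    The last four bytes of the region keep holding the inflated size,
    so input_data_end and the LC0 entry stay valid.
    """
    room = (end - 4) - start  # space available for the XZ stream itself
    if len(xz_stream) > room:
        raise ValueError(
            f"new XZ stream is {len(xz_stream) - room} bytes too large"
        )

    out = bytearray(image)
    out[start:end] = bytes(end - start)              # zero the old piggy
    out[start:start + len(xz_stream)] = xz_stream    # drop in the new one
    out[end - 4:end] = struct.pack("<I", inflated_size)
    return bytes(out)
```

For this image the call would use start=0x3d10 and end=0x2bf114. The size check is the important part: it fails loudly in exactly the situation the next section deals with.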

Extending the image

But what if no amount of compressor option tuning can save us? What if we must increase the size of the piggy? To figure out what needs to change if we move the end of the piggy to a higher address we can use the layout of the image as described in the linker script arch/arm/boot/compressed/vmlinux.lds.S.⁸

The GOT at the end of the image needs to be moved up, along with the bss section address. Nearly all of the references to the locations of these sections are baked into the LC0 object shown above.

We’ll need to update the addresses of anything that’s located after the start of the piggy, including:

Addresses in LC0 object
- __bss_start
- _end - end of program (including bss)
- _edata - end of image
- inflated piggy size location
- _got_start
- _got_end
- .L_user_stack_end
- _end - restart + 16384 + 1024*1024
Any entries in the GOT that come after input_data

Mapping of LC0 pointers to vmlinuz locations. Anything pointing to a location after .piggydata must be updated.

At this point I started using a Python script to automate the editing. The LC0 object is easy to locate dynamically because it starts with its own address (e.g., for this zImage the word 0x00000248 is at offset 0x248). We can pull the GOT location from LC0 and use it to get input_data and input_data_end (i.e., piggy start and end).⁹ For each value in the LC0 and GOT that’s greater than the piggy start offset, we increase it by the amount we’re increasing the size of the image. Then we can extend the image and insert the new larger piggy over the original one.
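To make the lookup concrete, here is a small sketch of how LC0 and the GOT might be located. It is not the script linked at the end of this article: the function names are hypothetical, the nine-word LC0 length comes from the v4.4.50 listing above, and the GOT indices for input_data (3) and input_data_end (1) are simply the ones observed in this particular image.

```python
import struct

LC0_WORDS = 9  # entries in the LC0 block shown above (v4.4.50 head.S)


def read_words(image: bytes, offset: int, count: int) -> list:
    """Read `count` little endian 32-bit words starting at `offset`."""
    return list(struct.unpack_from(f"<{count}I", image, offset))


def find_lc0(image: bytes, search_limit: int = 0x1000) -> int:
    """Find LC0 by looking for a word that equals its own file offset."""
    stop = min(search_limit, len(image) - 4 * LC0_WORDS + 1)
    for off in range(4, stop, 4):
        words = read_words(image, off, LC0_WORDS)
        # Heuristic: LC0 starts with its own address, and every other
        # entry points somewhere after it.
        if words[0] == off and all(w > off for w in words[1:]):
            return off
    raise ValueError("LC0 not found")


def parse_layout(image: bytes) -> dict:
    lc0 = find_lc0(image)
    words = read_words(image, lc0, LC0_WORDS)
    got_start, got_end = words[5], words[6]
    got = read_words(image, got_start, (got_end - got_start) // 4)
    return {
        "lc0": lc0,
        "lc0_words": words,
        "got_start": got_start,
        "got": got,
        "piggy_start": got[3],  # input_data (index is image-specific)
        "piggy_end": got[1],    # input_data_end (index is image-specific)
    }
```

Once the layout is known, the relocation step is a loop over lc0_words and got that adds the size delta to every value greater than piggy_start.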

One more thing to fix up is the _magic_end value near the beginning of the image: it matches the size of the zImage file. (This didn’t have any effect on whether qemu booted the image.)
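A sketch of that fix-up follows. It assumes the usual ARM zImage header layout, with the magic number 0x016F2818 at offset 0x24 followed by the start address, the end address, and the endianness flag; the helper names are hypothetical.

```python
import struct

ZIMAGE_MAGIC = 0x016F2818   # _magic_sig
HEADER_OFFSET = 0x24        # assumed location of _magic_sig in the image


def read_zimage_header(image) -> tuple:
    """Return (_magic_start, _magic_end, endianness flag)."""
    magic, start, end, endian = struct.unpack_from("<4I", image, HEADER_OFFSET)
    if magic != ZIMAGE_MAGIC:
        raise ValueError(f"unexpected zImage magic 0x{magic:08x}")
    return start, end, endian


def patch_magic_end(image: bytearray, delta: int) -> int:
    """Grow _magic_end by `delta` bytes and return the new value."""
    _, end, _ = read_zimage_header(image)
    new_end = end + delta
    struct.pack_into("<I", image, HEADER_OFFSET + 8, new_end)
    return new_end
```

Reading the header first also serves as a cheap check that the file really is a little endian ARM zImage before anything gets modified.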

Does it work yet? Nope! Another debugging session shows that the arguments for do_decompress are wrong: the input location, length, and error function pointer are all zero. Notice that these are all values in the GOT.

What’s happening here is that the handful of functions compiled from C code (misc.c and so on) have offset tables appended to them which are used to locate entries in the GOT. The first offset in the table is a PC-relative offset to the GOT itself. The subsequent offsets locate specific entries within it. Those entries contain a fixed up pointer to their global symbol.

input_data and input_data_end are globals referenced in the GOT, so we’ll have to fix this. Luckily we only have to fix the base GOT offset for each function.

There are some simple constraints we can use to implement a quick and dirty search and update routine for these values:

These words should only exist in between LC0 and the piggy.
The rough minimum possible GOT base offset is from where the code ends to where the GOT starts: got_start - piggy_start.
The rough maximum offset is from the beginning of the code after LC0 to the GOT start: got_start - lc0_end.
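Those constraints translate into a short scan. The following is a rough sketch with a hypothetical function name. Like the heuristic itself, it can in principle match an unrelated word that happens to fall in the range, so the list of patched offsets is worth reviewing by hand.

```python
import struct


def patch_got_offsets(image: bytearray, lc0_end: int, piggy_start: int,
                      got_start: int, delta: int) -> list:
    """Add `delta` to every word that looks like a PC-relative GOT offset.

    All offsets refer to the ORIGINAL (pre-extension) layout.
    Returns a list of (file offset, old value) pairs that were patched.
    """
    lowest = got_start - piggy_start   # code located right before the piggy
    highest = got_start - lc0_end      # code located right after LC0
    patched = []
    for off in range(lc0_end, piggy_start - 3, 4):
        (word,) = struct.unpack_from("<I", image, off)
        if lowest <= word <= highest:
            struct.pack_into("<I", image, off, word + delta)
            patched.append((off, word))
    return patched
```

For this image the bounds work out to 0x2bf120 - 0x3d10 = 0x2bb410 and 0x2bf120 - 0x26c = 0x2beeb4.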

Updating each of these words with the size increase delta works and fixes the extended image! Here’s a demonstration:

$ cp vmlinux vmlinux-mod-big
$ # changing /etc/banner...
$ xz --check=crc32 < vmlinux-mod-big > vmlinux-mod-big.xz
$ # add inflated size to end of XZ data
$ echo -en "\x28\xce\x8a\x00" >> vmlinux-mod-big.xz
$ wc -c vmlinux-mod-big.xz
2994648 vmlinux-mod-big.xz
$ # that is 130516 (0x1fdd4) bytes larger than original
$ ./arm_zimg_extend.py zImage-initramfs bigpig --replace vmlinux-mod-big.xz
LC0 @ 0x0248 - 0x026c
  0x00: 0x00000248
  0x01: 0x002bf150
  0x02: 0x002bf56c
  0x03: 0x002bf150
  0x04: 0x002bf110
  0x05: 0x002bf120
  0x06: 0x002bf14c
  0x07: 0x002c0570
  0x08: 0x003c34a4
GOT @ 0x002bf120 - 0x002bf14c
  0x00: 0x002bf150
  0x01: 0x002bf114
  0x02: 0x002bf568
  0x03: 0x00003d10
  0x04: 0x002bf564
  0x05: 0x002bf164
  0x06: 0x002bf154
  0x07: 0x00000940
  0x08: 0x002bf15c
  0x09: 0x002bf160
  0x0a: 0x002bf158
piggy data @ 0x00003d10 - 0x002bf114
piggy compressed size: 0x002bb404
piggy inflated size @ 0x002bf110
piggy inflated size: 0x008ace28
piggy new compressed size: 0x002db1d8
extending image by 0x0001fdd4
LC0 extended:
  0x00: 0x00000248
  0x01: 0x002def24
  0x02: 0x002df340
  0x03: 0x002def24
  0x04: 0x002deee4
  0x05: 0x002deef4
  0x06: 0x002def20
  0x07: 0x002e0344
  0x08: 0x003e3278
GOT extended:
  0x00: 0x002def24
  0x01: 0x002deee8
  0x02: 0x002df33c
  0x03: 0x00003d10
  0x04: 0x002df338
  0x05: 0x002def38
  0x06: 0x002def28
  0x07: 0x00000940
  0x08: 0x002def30
  0x09: 0x002def34
  0x0a: 0x002def2c
Searching for GOT offsets...
Candidate GOT offset @ 0x09d8: 0x002be750
Candidate GOT offset @ 0x0ac0: 0x002be6f0
Candidate GOT offset @ 0x0b78: 0x002be618
Candidate GOT offset @ 0x0cc4: 0x002be498
Candidate GOT offset @ 0x1080: 0x002be24c
Candidate GOT offset @ 0x1184: 0x002be074
Candidate GOT offset @ 0x3360: 0x002bcbb4
Candidate GOT offset @ 0x350c: 0x002bbd84
magic start: 0x00000000
magic end: 0x002bf150
magic end updated: 0x002def24
wrote new image
$ qemu-system-arm -serial stdio -M virt -m 1024 -kernel bigpig
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.4.50 (buildbot@builds-02.infra.lede-project.org) (gcc version 5.4.0 (LEDE GCC 5.4.0 r3101-bce140e) ) #0 SMP Mon Feb 20 17:13:44 2017
...
BusyBox v1.25.1 () built-in shell (ash)

         ^,    ,^
        /  ----  \
       / _\    /_ \  Ful
       |  / __ \  |
       |   /oo\   |            ,-.
       |   \__/   |____________.:'
       \   .__.   /            \ '
        '.______.'              \
            \                   |
             |  /____...-----\  |
             |  |            |  |
             |^^|            |^^|
 Big piggy mod!

=== WARNING! =====================================
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@LEDE:/#

The source code for this script can be found at https://gist.github.com/jamchamb/243e6973aeb5c9a2e302a4d4f57f16e1.

¹ In the context of PCs the initial ramdisk is a more limited filesystem used for an intermediary “early userspace” stage. These images can still contain interesting programs, such as those used to get a disk decryption key from the user or TPM.
² https://people.kernel.org/linusw/how-the-arm32-linux-kernel-decompresses
³ https://github.com/torvalds/linux/blob/master/scripts/extract-vmlinux
⁴ If binwalk hadn’t already told us the image is little endian, the magic endianness value of 0x04030201 would. It’s stored as 01 02 03 04 near the beginning of the image, which tells us it’s stored little endian.
⁵ https://elixir.bootlin.com/linux/v4.4.50/source/scripts/Makefile.lib#L374
⁶ https://elixir.bootlin.com/linux/v4.4.50/source/arch/arm/boot/compressed/head.S#L576
⁷ https://forum.xda-developers.com/t/script-repack-zimage-sh-unpack-and-repack-a-zimage-without-kernel-source-v-5.901152/
⁸ https://elixir.bootlin.com/linux/v4.4.50/source/arch/arm/boot/compressed/vmlinux.lds.S
⁹ I’ve used the known indices of these values in my code, but some smarts could be added to automatically detect the right entries based on compression header magic (piggy start) and greatest offset before the GOT (piggy end).

Reversing the Pokémon Snap Station without a Snap Station

2021-08-17T00:00:00+00:00

Back in 1999 when the original Pokémon Snap was released for the Nintendo 64, one of its coolest features was that you could print out the photos you took in-game on sticker sheets using a Snap Station. Snap Stations could only be found at a Blockbuster video store (or a Lawson convenience store in Japan), and you’d have to pay for credits in the form of Pokémon-styled smart cards each time you wanted to print out a sheet of stickers. I’ve had one of the Charmander cards sitting around with my collection of Nintendo stuff for a while, which got me thinking about what it would be like to hack one of these kiosks.

Description of the Snap Station from the June 1999 issue of Nintendo Power

The Snap Stations themselves are hard to come by these days, but while watching some YouTube videos made by collectors I recalled that, in order to print your photos, you would insert your own Pokémon Snap game cartridge into the station. Furthermore, some videos showing the interior of the station revealed that the printer hardware was actually connected to the Nintendo 64 through its fourth controller port. That suggested that the code for handling some amount of the printing behavior might be present on all retail copies of the game.

By looking into the Pokémon Snap ROM, I was able to quickly confirm that the print menu text was present in the retail copy of the game:

a08270 63 72 65 65 6e 2e 00 00 49 66 20 79 6f 75 20 6c  >creen...If you l<
a08280 69 6b 65 20 74 68 65 73 65 20 70 69 63 74 75 72  >ike these pictur<
a08290 65 73 2c 20 70 6c 65 61 73 65 0a 6d 61 6b 65 20  >es, please.make <
a082a0 73 75 72 65 20 61 20 70 72 69 6e 74 20 63 72 65  >sure a print cre<
a082b0 64 69 74 20 65 78 69 73 74 73 0a 74 68 65 6e 20  >dit exists.then <
a082c0 70 72 65 73 73 20 5c 61 20 74 6f 20 70 72 69 6e  >press \a to prin<
a082d0 74 2e 00 00 52 65 74 75 72 6e 20 74 6f 20 74 68  >t...Return to th<
a082e0 65 20 54 69 74 6c 65 20 53 63 72 65 65 6e 0a 62  >e Title Screen.b<
a082f0 79 20 73 61 76 69 6e 67 2e 00 00 00 47 61 6c 6c  >y saving....Gall<

It was much trickier to identify the actual code responsible for handling the print functionality for reasons I’ll describe later, but with a combination of static analysis, dynamic analysis, and a custom FPGA-based hardware tool, I was able to reverse engineer the Snap Station’s control protocol without having access to one. (With access to a real Snap Station, all I would’ve had to do was use a logic analyzer to observe what was being transmitted over the fourth controller port.)

Using this information, I’ve implemented a Snap Station simulator in the Project64 emulator, as well as a hardware implementation using an iCEBreaker FPGA board. The code for each can be found at:

Project64 fork: https://github.com/jamchamb/project64/tree/snapstation
iCEBreaker/iCE40UP5k FPGA design: https://github.com/jamchamb/cojiro

The following video shows the Snap Station simulator in Project64:

I recommend watching a video of the real Snap Station in action if you haven’t yet in order to help make sense of what happens at the end.

When the player presses the Print button, the selection of photos chosen for printing is saved to the game cartridge. The station then resets the console and instructs the game to enter a photo display mode after the boot logo screen. The printer in the Snap Station uses a video pass-through input to capture photos directly from the Nintendo 64’s video output. Each time a photo is displayed, the station instructs the printer to perform a screen capture. After all 16 photos have been displayed and captured the station can print out the sticker sheet.

The end result doesn’t look like much without any printer hardware involved, but using the FPGA design I’ve released, you could implement a full setup with a printer if you really wanted to. The big takeaway for me with this project was learning how to reverse engineer and simulate peripherals with an FPGA.

Snap Station protocol summary

The Snap Station protocol itself is pretty simple, and I will briefly describe how it works before going into the details of how I reverse engineered it.

The Snap Station acts like a controller with a peripheral plugged into it, much like a Rumble Pak or Controller Pak (memory card). Communication happens via the same read and write commands used by the Controller Pak, which read or write 32 bytes of data at a specified address. In this case the address indicates the type of message rather than an actual memory location.

To signal to the game that printing functionality is enabled, the station indicates a peripheral is plugged in. This causes the game to start querying what type of peripheral is connected through the controller. These messages use the address 0x8000:

write cmd: 8000 (addr CRC-5 01)
  fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  response: e1
read cmd: 8000 (addr CRC-5 01)
  response: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            00
write cmd: 8000 (addr CRC-5 01)
  85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85
  85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85
  response: f5
read cmd: 8000 (addr CRC-5 01)
  response: 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85
            85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85
            f5

The console first sends a message with repeating FE bytes, which is something like a reset or initialization message, followed by 85 repeating, which is the peripheral ID it’s checking for. The station should respond with 85, whereas a different device like the Rumble Pak would respond with 80. If this happens during the initial boot screen, the game goes into the photo display mode. Otherwise it just registers that the station is available, which enables the Print button in the Gallery.
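The “addr CRC-5” values in these traces can be reproduced with a small checksum routine. The sketch below assumes the commonly documented scheme for Controller Pak addresses, a 5-bit CRC with polynomial 0x15 computed over the upper 11 address bits; the function name is made up.

```python
def addr_crc5(addr: int) -> int:
    """5-bit CRC over the top 11 bits of a 16-bit pak address.

    Assumed polynomial: x^5 + x^4 + x^2 + 1 (0x15 without the top bit).
    The low 5 bits of `addr` are ignored, since that is where the CRC
    travels on the wire.
    """
    crc = 0
    for i in range(16):  # 11 address bits, then 5 zero bits of padding
        feedback = 0x15 if crc & 0x10 else 0
        crc = (crc << 1) & 0x1F
        if i < 11 and addr & (0x8000 >> i):
            crc |= 1
        crc ^= feedback
    return crc


print(hex(addr_crc5(0x8000)))  # 0x1, matching "8000 (addr CRC-5 01)"
print(hex(addr_crc5(0xC000)))  # 0x1b, the value logged for the C000 address
```

On the wire the address and CRC are packed into the same two bytes, so a read of 0x8000 is sent as 80 01.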

Messages that correspond to the state of the print flow use the address 0xC000. When the player presses the Print button in the Gallery menu, the following values are sent to the station:

read cmd: C000 (addr CRC-5 1b)
  response: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            00
write cmd: C000 (addr CRC-5 1b)
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cc
  response: 27
read cmd: C000 (addr CRC-5 1b)
  response: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cc
            27
read cmd: C000 (addr CRC-5 1b)
  response: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cc
            27
write cmd: C000 (addr CRC-5 1b)
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 33
  response: aa
read cmd: C000 (addr CRC-5 1b)
  response: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 33
            aa
read cmd: C000 (addr CRC-5 1b)
  response: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 33
            aa
write cmd: C000 (addr CRC-5 1b)
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 5a
  response: 59
read cmd: C000 (addr CRC-5 1b)
  response: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 5a
            59

Only the last data byte of each message matters, so in short the sequence is:

0xCC
0x33
0x5A

There isn’t much meaning to the values themselves; they’re just bit patterns (1100 1100, 0011 0011, and 0101 1010). The first two (CC, 33) are sent before and after the save operation is performed, which saves the selected photos for printing. The last one (5A) signals that it’s time to do a soft reset on the console, which I believe was handled in the Snap Station by directly triggering the console’s reset button or cycling the power.
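The single byte that follows each 32-byte payload in the traces (27, aa, 59) behaves like the Controller Pak data checksum. As a sketch, assuming an 8-bit CRC with polynomial 0x85, an initial value of zero, and most significant bit first, the logged values for the three print messages can be reproduced:

```python
def data_crc8(data: bytes) -> int:
    """CRC-8 over a pak data block (assumed: poly 0x85, init 0, MSB first)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x85) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def state_message(value: int) -> bytes:
    """32-byte payload where only the last byte carries information."""
    return bytes(31) + bytes([value])


for value in (0xCC, 0x33, 0x5A):
    crc = data_crc8(state_message(value))
    print(f"{value:02x} = {value:08b} -> crc {crc:02x}")
# cc = 11001100 -> crc 27
# 33 = 00110011 -> crc aa
# 5a = 01011010 -> crc 59
```

Because the initial value is zero, the 31 leading zero bytes don’t contribute anything, which is why an all-zero block also checksums to 00 in the first read of the trace.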

The station remains active during the reset, so it will be present during the Nintendo logo boot screen, which triggers the photo display mode. In this mode, the console sends the following bytes to the 0xC000 address:

0x01
0x02 (16 times in a row)
0x04

Knowing that 16 photos are displayed, it’s easy to guess that 01 signals the start of the display, 02 is sent each time a photo is displayed, and 04 signals the end of the display (when the screen goes blank).

At any point during the communications with the C000 address the station can return a value of 08 to trigger a busy loop, which can help synchronize the state of the station and the game code. In the gallery menu, I’ve used this to keep the game frozen at the 5A message until the console has been reset, which keeps the “Now Saving…” message displayed while the Print button is darkened. If the game were to proceed past reading 5A back, the player would just remain in the Gallery menu and nothing would appear to happen, which looks awkward. In the photo display mode I use the busy loop to keep the photos displayed a little longer so that they don’t flash by too quickly. With a real printer involved in the setup, these pauses could help ensure that each photo is captured and stored in the print buffer.
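The station side of this exchange boils down to a pair of registers. The class below is a loose sketch based only on the traces in this article, not a port of Snapstation.cpp: the class and attribute names are invented, and placing the busy value 08 in the last byte of the response is an assumption.

```python
class SnapStationSketch:
    """Toy model of the station's responses to pak reads and writes."""

    ID_ADDR = 0x8000      # peripheral identification messages
    STATE_ADDR = 0xC000   # print flow / photo display messages
    STATION_ID = 0x85
    BUSY = 0x08

    def __init__(self):
        self.id_reply = 0x00
        self.state = 0x00
        self.busy = False   # set by the host application to stall the game

    def write(self, addr: int, data: bytes) -> None:
        if addr == self.ID_ADDR:
            # FE.. acts like a reset; 85.. is the ID probe we answer to.
            matches = data[0] == self.STATION_ID
            self.id_reply = self.STATION_ID if matches else 0x00
        elif addr == self.STATE_ADDR:
            self.state = data[-1]   # only the last byte matters

    def read(self, addr: int) -> bytes:
        if addr == self.ID_ADDR:
            return bytes([self.id_reply]) * 32
        if addr == self.STATE_ADDR:
            last = self.BUSY if self.busy else self.state
            return bytes(31) + bytes([last])
        return bytes(32)
```

A host application would watch the state attribute for CC, 33, 5A in the Gallery and for 01, 02, 04 in photo display mode, toggling busy whenever it needs the game to wait.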

Take a look at the Source/Project64-core/N64System/Mips/Snapstation.cpp file in the Project64 fork linked above to see how this is described in code.

Reverse engineering

Static analysis difficulties

The toughest part of this project was simply finding code relevant to the Snap Station within the Pokémon Snap ROM. Disassembling the game code was complicated due to code moving around at runtime, with some mutually exclusive segments of code even occupying the same areas of memory when navigating through different menus or gameplay. Most of this was probably due to the use of overlays, a technique for conserving RAM where “some segments share the same memory region during different phases of execution.” This caused most of the automatic disassembly to fail, because the addresses of the code as loaded into the disassembler didn’t match where they would be at runtime. Disassemblers generally perform automatic code discovery by following jump instructions, which wouldn’t work here.

Besides a couple of debug and error strings, there were no symbol names to work with. Without the benefit of accurate cross-reference information for code and data, this made it very hard to build up any context with static analysis, which is crucial for understanding complex code without any metadata information about it. I was able to configure the memory layout in Ghidra to get one decent chunk of code to disassemble correctly, but loading the entire game ROM properly would’ve been an intensive task.

JoyBus FPGA tool

At this point I decided to switch over to working on the FPGA-based hardware interface for the JoyBus protocol used by Nintendo 64 controllers in the hopes that probing the controller port would provide quicker results. There are a few good resources with unofficial documentation on how JoyBus works online; in particular I found https://sites.google.com/site/consoleprotocols/home/nintendo-joy-bus-documentation helpful. Here’s a quick recap of how the hardware interface works:

There’s one bi-directional data line used by the console and controller to communicate with each other. The other two pins in a controller port are power and ground.
The data line is an “open drain” output. This means the line is held at the high logic level by default, and devices on the bus communicate by pulling the line down to ground. For the FPGA implementation this means the JoyBus pin will be left in high impedance mode except when transmitting a 0 bit.
A pull-up resistor on the data line helps the signal rise back to the high logic level faster, resulting in a more square waveform.
Every bit transmission starts with pulling the line low for a microsecond, holding it at high for a 1 bit or low for a 0 bit for two microseconds, and always returning high for the fourth microsecond. That’s four microseconds per bit, so the bit rate is roughly 250,000 bits per second.
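The bit timing above can be sketched in a few lines of Python. This is only an illustrative model of the waveform, not the FPGA implementation: the function names are made up for this example, the most-significant-bit-first ordering is an assumption, and stop bits are ignored.

```python
# Model each bit as four 1-microsecond slots on the open-drain line.
# 0 = line pulled low, 1 = line released (pulled high by the resistor).

BIT_TIME_US = 4
BIT_RATE = 1_000_000 // BIT_TIME_US  # bits per second


def encode_bit(bit):
    """Low for 1 us, the bit value for 2 us, then high for 1 us."""
    middle = 1 if bit else 0
    return [0, middle, middle, 1]


def encode_byte(value):
    """Per-microsecond line levels for one byte (MSB first is assumed)."""
    levels = []
    for shift in range(7, -1, -1):
        levels.extend(encode_bit((value >> shift) & 1))
    return levels


def decode_levels(levels):
    """Recover bits by sampling the second microsecond of each 4 us window."""
    return [levels[i + 1] for i in range(0, len(levels), BIT_TIME_US)]


if __name__ == "__main__":
    print(encode_bit(1), encode_bit(0))
    print(decode_levels(encode_byte(0x01)))
    print(BIT_RATE)
```

Sampling in the middle of the window is what makes the scheme tolerant of slow rising edges, which is also why the pull-up resistor value matters.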

For the physical connection to the console I cut an N64 controller extension cable in half and broke the wires out to a breadboard. Besides some jumper cables to connect to the iCEBreaker, the only other component on the breadboard is a 330 Ohm pull-up resistor between the 3.3V power line and the data line.

N64 controller extension cord breakout

Console commands consist of a one byte command ID followed by optional data bytes, such as a 16-bit read address for the Controller Pak read command. I started by implementing the two basic commands needed to simulate a regular N64 controller. The first (FF or 00) queries the type of device connected and its status, and the second (01) checks the current state of the buttons and analog stick.

I connected the iCEBreaker to the fourth controller port while Pokémon Snap was running and tried returning different values to the query command to see if different device type IDs or status values would be recognized. With the normal controller ID 0500, just changing the status to 01 to indicate a peripheral was plugged in caused the console to send a few more requests to the device. At the time I noticed this by watching an oscilloscope hooked up to the data line; to get a better look at the data I added UART forwarding to send the request packets over to my PC. With the UART forwarding I was able to see the write command with the FEFEFEFE... payload:

03 8001 fefefefefefefefefefefefefefefefefefefefefefefefefefefefefefefefe
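To make the layout of that request a little more concrete, here is a rough sketch of splitting it into fields on the PC side. The function name is hypothetical, and treating the two bytes after the 03 command ID as a big-endian address is an assumption based on how the capture above is grouped.

```python
def parse_joybus_request(packet):
    """Split a forwarded request into command ID, optional address, and data."""
    command_id = packet[0]
    fields = {"command": command_id, "address": None, "data": packet[1:]}

    # 0x03 is the write command seen in the capture: assume a 16-bit
    # address field followed by the payload bytes.
    if command_id == 0x03 and len(packet) >= 3:
        fields["address"] = int.from_bytes(packet[1:3], "big")
        fields["data"] = packet[3:]

    return fields


request = bytes.fromhex("038001" + "fe" * 32)
parsed = parse_joybus_request(request)

if __name__ == "__main__":
    print(hex(parsed["command"]), hex(parsed["address"]), len(parsed["data"]))
```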

Knowing that reporting a controller with a peripheral plugged in on the fourth port did something interesting, I switched back to dynamic analysis with the Project64 emulator to see if I could identify code that checked the status of the fourth controller.

Dynamic analysis

Using the Project64 debugger to set watchpoints on reads or writes to the controller state memory was fruitless because they were constantly triggered by code that I assume was generic system-level code checking and updating the controller state every frame.

Instead I tried a more indirect approach with memory scanning on the Gallery menu. First I tracked down the location of the current button index with the standard technique of repeatedly updating the selection and then scanning memory for the newly changed value. Then I could set read watchpoints on the button index value to see if I could identify where the menu handling code was.

Initially this also caused the watchpoint to trigger repeatedly. By patching out the read instruction that was repeatedly triggered, I saw it was caused by the glowing orange cursor that shows above the currently selected button. After removing that read instruction I got much more useful results: the watchpoint triggered when the menu description text for the currently selected button was changed, and when I pressed A to trigger the currently selected button.

With this approach I was finally able to find that, in the menu code, there were some conditional checks related to the button index where the Print button would appear (6, right before the Save button which has index 7). One of the checks was a call to a function that simply checked if a certain global variable had a value of 5. On a hunch that this was some kind of state related to printing, I patched the function to always return true, which resulted in the Print button appearing:

Forcing Print button to appear

Tracking down the code that would set this particular variable to 5 finally led me to the code that appeared to send out the FE and 85 sequences on the JoyBus. Looking at the code, I could see that for the first command the controller should not return FE in kind, and for the second it should return the repeating 85 sequence. Implementing this behavior on the FPGA unlocked the core behavior of enabling the Print button and triggering the photo display mode when the game first boots. With the UART forwarding feature, the rest of the protocol became evident when pressing the Print button or during the photo display, and with a little more reversing I found what the messages sent from the Gallery menu meant and how to trigger the busy loops by returning a value of 8.

Conclusion

For the final FPGA tool setup, I use a button for advancing the state of the station. The first button press enables the station, which helps avoid launching immediately into the photo display when you start the game. Next it waits to receive the 5A message and maintains a busy loop to keep the “Now Saving…” message displayed until the player resets the console manually. Once the console reboots into the photo display, pressing the button will advance the photo display one at a time until all 16 photos are displayed.

Snap Station simulator implemented with the iCEBreaker FPGA development board

Dumping K360 wireless keyboard firmware with a GreatFET

2021-05-29T00:00:00+00:00

I recently decided to do some keyboard hacking for fun, so I started with one of the cheapest Logitech wireless keyboard models available: the K360. This model is a little old and the main chip inside it, as well as the Logitech Unifying wireless protocol it uses, have been well covered before. See Marc Newlin’s MouseJack presentation, Travis Goodspeed’s nRF24 sniffing work, and the KeyKeriki research mentioned in each. I’m doing this more as an exercise than as novel research, and I didn’t know what I’d find going in. That said, I thought this was a neat little example of extracting bare metal firmware from on-chip flash.

I’m taking a hardware-first approach to reverse engineering the keyboard, so the first step is to extract the firmware. Disassembling the keyboard is pretty easy after removing the adhesive plate from the front, which exposes all the screws keeping the shell together. Inside there’s the rubber dome key matrix and a small PCB with very few components on it. It’s basically just an nRF24LE1 chip:

PCB front

PCB back

On the left of the nRF24LE1 there’s a grid of six large test pads. This seemed like an interesting interface to poke at, so I checked these pads first using a logic analyzer and some pogo pin probes. TP8 seemed to pulse out a short clock at boot, and TP4 was ground, but I didn’t observe much interesting activity here with passive probing. It became clear that the group of six words silkscreened on the upper right of the board were labels for these test pads due to the clock, ground, and VMCU label positions all matching up with the corresponding pad positions.

To actively probe the SPI interface I then wired up the test pads to a header using some enameled wire. I also wired up TP5 and TP7. According to the nRF24LE1 datasheet, TP7 connects to the PROG pin which is used to “enable flash programming,” and TP5 connects to RESET (also labelled right next to TP5). Reset is active low in this case.

Pin assignment diagram from the nRF24LE1 datasheet

I left the smaller test pads 10-13 alone at this point because they connected to generic GPIO pins and sat on the traces out to the key matrix.

SPI wires

I connected the SPI pads to the default SPI pins on the GreatFET, GND to one of the ground pins, and assigned a GPIO pin for the reset pad. I also used a 3V3 pin on the GreatFET to directly power the nRF24 chip through the VMCU pad.

from greatfet import GreatFET

gf = GreatFET()
reset_pin = gf.gpio.get_pin('J1_P4')
reset_pin.high()

I tried to send some simple test commands using the built-in SPI code, but didn’t get anything back in this state:

In [6]: gf.spi.transmit([0x05], receive_length=1)
Out[6]: b'\xff'

In [7]: gf.spi.transmit([0x03, 0x00, 0x00], receive_length=4)
Out[7]: b'\xff\xff\xff\xff'

I went back over the datasheet some more and found that the flash programming SPI interface enabled by the PROG pin has its own set of assigned pins. These pins did not match up with the test pads I just wired up, so I checked to see if those were broken out on the remaining test pads. I used a multimeter continuity test to trace where these pins lead to, but you can also follow the traces in the picture of the nRF24 side of the PCB. Luckily the smaller test pads TP10 through TP13 do connect to these flash SPI interface pins (P1.2, P1.5, P1.6, and P2.0). Since there were only four more pads, I used my PCBite probes again instead of soldering more wires up.

Probing the flash programming SPI

Enabling the flash programming SPI interface requires holding the PROG pin high and resetting the device by pulsing RESET low. I set up another GPIO pin on the GreatFET for the PROG pin, and then added a function to my test script for pulsing the RESET pin low.

#!/usr/bin/env python3
import hexdump
import time
from greatfet import GreatFET

def reset(gf, reset_pin):
    reset_pin.low()
    time.sleep(0.001)
    reset_pin.high()
    time.sleep(0.001)


def main():
    gf = GreatFET()
    reset_pin = gf.gpio.get_pin('J1_P4')
    prog_pin = gf.gpio.get_pin('J1_P6')

    # Reset is active low
    reset_pin.high()

    # Enter prog mode
    prog_pin.high()
    time.sleep(0.01)
    reset(gf, reset_pin)

    # ...

if __name__ == '__main__':
    main()

After resetting the device I’d attempt to send one of the flash SPI commands to get the flash status and flash protection status registers, as well as do a test read:

fsr = ord(gf.spi.transmit([0x05], receive_length=1))
fpcr = ord(gf.spi.transmit([0x89], receive_length=1))
print(f'flash status register: {fsr:#02x}')
print(f'flash protect register: {fpcr:#02x}')

# test read
print('test read:')
data = gf.spi.transmit([0x03, 0x00, 0x00], receive_length=256)
hexdump.hexdump(data)

I still wasn’t getting any response (just the same series of 0xFF bytes), so I did a bit of debugging with my oscilloscope to make sure all the pins were working correctly. I ran a simple script on the GreatFET to keep resetting the device and triggered the oscilloscope on the reset pin going back up high so I could look at the state of the scope probes soon after boot. In addition to checking the RESET and PROG pins, I took a simple power consumption measurement with a shunt resistor to see if the device was really resetting. Note the difference in the power consumption when the device successfully enters PROG mode:

Reset with PROG low

Reset with PROG high

One of the flash SPI probes might’ve been off, but after checking everything was OK on the oscilloscope I tried this process again and was able to read out some intelligible data from the flash:

$ ./test.py
flash status register: 0x80
flash protect register: 0x0
test read:
00000000: 80 A3 A3 02 00 03 78 FF  E4 F6 D8 FD 90 00 00 7F  ......x.........
00000010: 00 7E 04 E4 F0 A3 DF FC  DE FA 75 81 7E 02 07 82  .~........u.~...
00000020: FC 00 FF D9 00 11 01 FF  E0 FF E0 00 01 02 FF E1  ................
00000030: FF E1 00 01 02 FF E2 FF  E6 00 01 02 FF E7 FF EB  ................
00000040: 00 01 02 FF EC FF EF 00  04 02 FF F0 FF FF 00 01  ................
00000050: 00 57 69 72 65 6C 65 73  73 20 4B 65 79 62 6F 61  .Wireless Keyboa
00000060: 72 64 20 00 34 D9 1D F0  40 01 00 00 00 61 02 20  rd .4...@....a.
...

Note the “Wireless Keyboard” string. After that I did a single read for 18432 bytes, the maximum allowed according to the datasheet. The output looked like a sensible dump of the program flash. The nRF24LE1 uses the 8051 instruction set, so to confirm it was code I loaded it into Ghidra as an 8051 binary blob. There appears to be an initialization routine near the beginning of the blob, suggesting it’s valid code.

Test disassembly in Ghidra

To verify the flash readback protection and hardware debug enable settings I also wanted to read out the InfoPage mentioned in the datasheet:

InfoPage section of nRF24LE1 datasheet

Reading the InfoPage requires setting the INFEN bit in the flash status register, so to do that I just had to send a “write flash status register” command before performing the read:

import struct

def read_fsr(gf):
    fsr = gf.spi.transmit([0x05], receive_length=1)
    return ord(fsr)

def write_fsr(gf, fsr):
    fsr &= 0xff
    gf.spi.transmit([0x01, fsr])

def read_flash(gf, address, count):
    command = struct.pack('>BH', 0x03, address)
    data = gf.spi.transmit(command, receive_length=count)
    return data

def get_infoblock(gf):
    flash_stat_reg = read_fsr(gf)

    # INFEN is bit 3 (2^3)
    write_fsr(gf, flash_stat_reg | 8)
    time.sleep(0.001)

    infoblock = read_flash(gf, 0, 512)

    # Unset INFEN bit
    write_fsr(gf, flash_stat_reg & (~8 & 0xff))

    return infoblock

Now I can read out the InfoPage to confirm the readback protection and HW debug settings. Although the fact that my initial read attempts returned program data also indicated that readback protection was disabled, going through these steps helped me build confidence that I was using the flash programming interface correctly. It looks like a backup of the InfoPage would also be important to have in case I erase the flash later on.

With a few more simple additions it’s easy to use this code to dump the program flash and/or InfoPage out to files:

$ ./k360_spi.py --dump flashdump.bin
flash status register: 0x80
flash protect register: 0x0
InfoBlock content:
00000000: 00 A3 A3 48 31 57 54 79  70 14 0A 12 FF FF 98 04  ...H1WTyp.......
00000010: 79 7C 88 23 B1 50 0F 05  FF FF FF FF 82 79 FF FF  y|.#.P.......y..
00000020: FF FF FF FF FF FF FF FF  FF FF FF FF FF FF FF FF  ................
00000030: FF FF FF 4C 45 31 4F FF  FF FF FF FF FF FF FF FF  ...LE1O.........
...
000001F0: FF FF FF FF FF FF FF FF  FF FF FF FF FF FF FF FF  ................
Flash readback protection: False (ff)
HW debug enabled: False (ff)
wrote flash dump to flashdump.bin

The full source code for k360_spi.py can be found at https://gist.github.com/jamchamb/b2892a22ac0760346d4d617fedf9b541. The next step will be to analyze the firmware.

CSCG 2020 Maze game hacking challenge writeups

2020-06-21T00:00:00+00:00

A couple of months ago I took a crack at the Maze challenges in the CSCG 2020 CTF and thought a few of the challenges were really interesting, so I wanted to share how I solved them.

I found out about the Maze challenge while watching LiveOverflow’s Pwn Adventure 3 video series. The Maze challenge was created by LiveOverflow, so I figured there would be some similarities to the Pwn Adventure 3 CTF and thought it would be a good opportunity to try some online game hacking (without the risk of getting accounts banned :p).

This gave me a chance to get into some Unity game hacking, as well as Cheat Engine. I was familiar with the basic concepts of Cheat Engine, such as dynamic memory scanning and pointer scanning, but I hadn’t used it much before.

I was able to solve the first few challenges just using standard Cheat Engine techniques. The first was the “secret emoji” challenge.

Emoji

When you first start the game and make an account you’re only able to use two emojis:

This game was built with Unity, which is based on the Mono development framework. Using Cheat Engine’s Mono dissector, which is a built-in feature for analyzing metadata about object classes in a Mono binary, I found that there was a sendEmoji method on the server class. By setting a breakpoint on this function I could intercept calls to it and change the first argument, a 16-bit emoji ID.

I solved it by pressing the button for one of the available emojis and changing the argument register (RDX) to a new value each time using the debugger, and then observing what showed up. Eventually I reached the ID value of 0x0D, which resulted in the flag emoji being triggered. Here’s what it looked like:

And here’s how the emoji looks, since I didn’t capture it in the first screenshot:

Flying & Teleporting (across very short distances)

The next two challenges, “The Floor is Lava” and “Tower”, involved getting to hard-to-reach locations. I figured that in order to attempt this I should try to use Cheat Engine to figure out my player character’s coordinates in memory and see how I could tamper with them.

First I used the memory scanning feature to discover the player’s coordinates, and then I used pointer scanning to discover a reliable way to reference the player object in memory so that I wouldn’t need to find it again every time. There are a bunch of videos online about how to do this; here’s the one from the Pwn Adventure 3 series: https://youtu.be/yAl_6qg6ZnA.

There’s a fence in the maze at the very beginning of the path to these locations that blocks you from proceeding. This was good for testing out a simple teleport hack.

Once I had the coordinates visible in the Cheat Engine address view, I could watch how they changed as I approached the fence. I noticed that the Z axis coordinate increased as I approached it, so once I got stuck I edited that value and increased it by 1.

This didn’t cause any issues: the server accepted the small change in location despite there being an obstacle in the way in the game client. I also tried setting my coordinates to 0, 0, 0, but that caused the server to teleport me back to my previous location. There was clearly some sort of distance limit on how far you could go between position updates.

To do a simple fly hack I used the “Memory that writes to this location” feature on the Y coordinate, which corresponds to altitude, and NOP’d out the instructions that updated that coordinate one by one. Then I was able to set my altitude to about 30 or 40 and not fall back to the ground, giving me an overview of the map:

By freezing updates to all of the coordinates I could teleport myself anywhere in the map and see myself there in the game client, but the server clearly did not accept the position my game client was reporting. As long as I remained in the new location, it would keep trying to teleport me back to my last legitimate position. While blocking this allowed me to explore the map, any interactions with the world that depended on my location would not be recognized by the server, so it was unlikely that I’d be able to get any flags this way.

The Floor is Lava

While I was able to move around using the fly hack, I still couldn’t go over the maze walls even though I wasn’t colliding with them. It was also difficult to pass through them using the short-distance teleport hack. It seemed like the server was enforcing the maze wall barriers rather than just relying on the game client to block movement this time. However, if I positioned myself just right, I could do a short teleport of about 5 units in a given direction and pass through the wall without the server forcing me back.

To make the hack a little easier to use, I set up some key bindings in Cheat Engine to increase or decrease the X or Z coordinate by 5.0, giving me an arrow-key style way to move through the air and through walls. This allowed me to head right for the lava pit, over the lava, and to the treasure chest island:

Tower

Using the same fly hack solution, I was able to head over to the tower as well:

For the rest of the challenges I decided to dig deeper into the game disassembly and network protocol. This game was built with IL2CPP, making it a little trickier to work with than a Mono/.NET binary (which could be decompiled into something that looks much like the original source code using a tool like ILSpy or dnSpy).

I used Ghidra to disassemble the game binary and Il2CppDumper to extract its metadata and recreate function symbols.

To work with the network protocol I used the example network proxy script from the Pwn Adventure 3 video series as a base. The main modification I had to make was to convert it from using TCP connections to UDP packets - the Maze game uses UDP, but otherwise works similarly to Pwn Adventure 3.

When it first connects to the server it’s actually hitting some HTTP endpoints to get information about the game servers, such as the range of ports available to connect to. To intercept this traffic I used Burp Suite and redirected all traffic for maze.liveoverflow.com to my proxy VM.

The original range of UDP ports used by the game servers was 1337 to 1357, but to simplify things in the network proxy I eventually added an auto-replace rule for the /api/max_port response so that the game would always connect to the same port.

Once I had the basic network proxy for the UDP game server traffic working it was clear that the packets were encrypted. Using Ghidra I found two suspect pieces of code in the send and receive packet functions for the server class, which performed XOR operations on the packet data. The recreated encryption routine looks like this:

import random

def encrypt_data(data):
    # data is a bytes object; the two random key bytes are sent first
    key_x = random.randint(0, 255)
    key_y = random.randint(0, 255)

    encrypted = bytearray([key_x, key_y])

    for byte in data:
        encrypted.append(key_x ^ byte)
        new_key = key_x + key_y
        # Integer division keeps the rolling key an int in Python 3
        key_x = (new_key + (new_key // 0xff)) & 0xff

    return bytes(encrypted)

With encryption and decryption routines added to the proxy I could begin to analyze and tamper with the network protocol, using the disassembled binary as an aid for figuring out what the packets meant.
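Because the rolling key depends only on the two key bytes at the front of the packet, decryption is the same loop run over the received bytes. The sketch below repeats the XOR routine above (written for Python 3 and bytes) so that it is self-contained; `keystream_step` and `decrypt_data` are names I made up for this example, and the optional key arguments exist only to make it testable.

```python
import random


def keystream_step(key_x, key_y):
    """Advance the rolling key the same way the encryption routine does."""
    new_key = key_x + key_y
    return (new_key + (new_key // 0xff)) & 0xff


def encrypt_data(data, key_x=None, key_y=None):
    """XOR each byte with the rolling key; the two key bytes are sent first."""
    if key_x is None:
        key_x = random.randint(0, 255)
    if key_y is None:
        key_y = random.randint(0, 255)

    encrypted = bytearray([key_x, key_y])
    for byte in data:
        encrypted.append(key_x ^ byte)
        key_x = keystream_step(key_x, key_y)
    return bytes(encrypted)


def decrypt_data(packet):
    """Read the key bytes from the front of the packet and undo the XOR."""
    key_x, key_y = packet[0], packet[1]

    decrypted = bytearray()
    for byte in packet[2:]:
        decrypted.append(key_x ^ byte)
        key_x = keystream_step(key_x, key_y)
    return bytes(decrypted)


if __name__ == "__main__":
    print(decrypt_data(encrypt_data(b"hello maze")))
```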

Map Radar

“There are rumours of a player who found a secret place and walks in a weird pattern. A radar map could be useful.”

The Map Radar hack was one of the first challenges that was going to require more than manual Cheat Engine hacks. The two possibilities I saw were doing a DLL injection hack to extract player locations and add an actual GUI radar to the game, which would be awesome but time consuming, or reading player position information out of the network packets.

Using the network proxy I could see that periodically I’d receive an “I” packet containing a bunch of player names, and a “P” packet with a big chunk of data in it - much more than any of the other packet types.

[1337] <- 5787493713ffff00051054686520576869746520526162626974
00000000: 57 87 49 37 13 FF FF 00  05 10 54 68 65 20 57 68  W.I7......The Wh
00000010: 69 74 65 20 52 61 62 62  69 74                    ite Rabbit

Player info packet (server to client)

The “I” packet contains a 32-bit player ID, 16-bit unlocked abilities value, and finally the length of the player name and the player name string. One interesting “player” that kept showing up was “The White Rabbit” with an ID of 0xFFFF1337.
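As a rough illustration of that layout, here is a sketch that pulls the fields out of the sample packet shown above. The offsets are inferred from that one dump: the two leading bytes are skipped as an opaque prefix (an assumption), the player ID is read as little-endian, and the 16-bit unlocks value is left as raw bytes because a single sample doesn’t settle its byte order. `parse_player_info` is a made-up name, not the proxy’s actual function.

```python
import struct


def parse_player_info(packet):
    """Parse an "I" packet laid out like the sample dump above."""
    if chr(packet[2]) != "I":
        raise ValueError("not a player info packet")

    (player_id,) = struct.unpack_from("<I", packet, 3)  # little-endian u32
    unlocks_raw = packet[7:9]                           # byte order unknown
    name_length = packet[9]
    name = packet[10:10 + name_length].decode("utf-8")

    return {"player_id": player_id, "unlocks_raw": unlocks_raw, "name": name}


sample = bytes.fromhex(
    "5787493713ffff00051054686520576869746520526162626974"
)
info = parse_player_info(sample)

if __name__ == "__main__":
    print(hex(info["player_id"]), info["name"])
```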

These player IDs corresponded to values in the “P” packets, which I could tell contained position data based on the disassembly. Each entry in the position packet contains the player ID, timestamp, coordinates, “trigger” (whether they are jumping or landing, as far as I can tell), and animation blend values (show running vs. jumping/falling animation).

[1337] <- 048a50110000001f27b5000...
00000000: 04 8A 50 11 00 00 00 1F  27 B5 00 00 00 00 00 11  ..P.....'.......
00000010: 45 2B 00 00 00 00 00 BA  5F 23 00 00 00 00 00 D9  E+......_#......
00000020: FE 30 00 00 00 00 00 00  00 00 00 00 50 37 13 FF  .0..........P7..
00000030: FF 63 0C 3F 00 00 00 00  00 B4 F8 21 00 48 77 FF  .c.?.......!.Hw.
00000040: FF F3 86 03 00 D7 EB 36  00 64 EE 1D 00 00 00 00  .......6.d......
00000050: 00 00 C8 00 00 00 50 8E  01 00 00 00 E3 59 8F 01  ......P......Y..
00000060: 00 00 00 30 38 1F 00 20  4E 00 00 A8 AD 1D 00 00  ...08.. N.......
00000070: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................

Position packet (server to client)

Using this data I could figure out the position of “The White Rabbit”: they were somewhere underneath the map! Using the fly/teleport hack I moved underground (which works the same as “flying” - just freeze the altitude with a negative value), and found the area where the character was running around. It’s actually accessible through a special wall in the spaceship area of the map:

This character continuously navigates an unusual path through this underground space, so at this point I figured I should record its coordinates to get an overhead view of what was happening.

By adding a hook to the position data packet parser I recorded all coordinates received for this character and saved them to a text file. Then I used matplotlib to plot the points on a graph, which revealed the flag:

Maze Runner

The timed maze race challenges were the most interesting to me in this set. The first challenge was to simply complete the race without timing out, which was impossible without hacking the game. You needed to reach each consecutive checkpoint in the maze race within 10 seconds of the last or you would fail, and it was only possible to get to the third checkpoint before timing out.

I noticed that there was one type of packet the server would send to the client when a checkpoint was reached, which only contained the checkpoint ID. This packet would trigger the gotCheckpoint method of the RaceManager class with the provided waypoint ID. The first thing I tried was to send one of these packets for each checkpoint to the client in quick succession - it did not trigger the flag, so all of the race handling code must be implemented server-side. (I also tried the same attack via DLL injection before I had the network proxy set up, directly invoking the function for each checkpoint ID, but the result was the same.)

That didn’t mean that the checkpoint packets were useless, though - they were still valuable for telling me when the server believed my character had reached a waypoint.

Since the server needed to believe I reached each of the checkpoints legitimately in order to send me the flag, I first wanted to trace out a path from the start to the finish by extracting data from the client position packets, similar to the solution for the Map Radar challenge. I added “record”, “save recording”, “load recording”, and “play recording” commands to the network proxy to achieve this.

The “record” command would enable the logic for collecting the outgoing position coordinates, and the save command would save it into a JSON file. The load command would load coordinates from a JSON file so I could restore the recording later.
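The recording commands boil down to something like the sketch below. This is a simplified stand-in rather than the proxy’s real code: the class and method names are hypothetical, and storing each sample as a `[timestamp, x, y, z]` list is just one convenient JSON-friendly choice.

```python
import json


class PositionRecorder:
    """Collects outgoing position samples while recording is enabled."""

    def __init__(self):
        self.recording = False
        self.points = []

    def start(self):
        self.recording = True
        self.points = []

    def stop(self):
        self.recording = False

    def on_position(self, timestamp, x, y, z):
        # Called by the proxy for every client position packet it sees.
        if self.recording:
            self.points.append([timestamp, x, y, z])

    def to_json(self):
        return json.dumps(self.points)

    def from_json(self, text):
        self.points = json.loads(text)

    def save(self, path):
        with open(path, "w") as f:
            f.write(self.to_json())

    def load(self, path):
        with open(path) as f:
            self.from_json(f.read())


if __name__ == "__main__":
    recorder = PositionRecorder()
    recorder.start()
    recorder.on_position(1.0, 10.0, 0.0, 20.0)
    print(recorder.to_json())
```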

I disabled the in-game timeout logic so that I could see where all of the glowing checkpoints were within the maze, and then ran through it while recording my position. The resulting recording had 448 coordinates. Here’s what it looks like plotted on a graph:

The play command used server teleport packets to replay recorded positions. This was unusual but easier than implementing the client position packets, which I wasn’t able to get working right away.

The teleport packet type is what the server used to block illegitimate movements, as described earlier. It’s also used for the magic teleport gates found in the courtyard you land in after logging in. By injecting this type of packet I could force the game client to update the character’s position, causing the client to send out a position update to the server. Because the positions recorded were legitimate and played back with their original timing, the updated positions were accepted by the server.

It looks a little choppy, but it worked:

To beat the first challenge, I decreased the time between position updates in the playback. Reducing the interval from the original timing of about 0.2 seconds per packet to 0.16 seconds, I was able to beat the race without timing out.

However, this took about a minute to complete, and the next challenge was to complete the race within five seconds. Going any lower than the 0.16 second interval would cause the server to start rejecting the position packets, resulting in rubber banding.

M4z3 Runn3r

This was the hardest challenge, but the most fun to figure out. Here are the key points about how the maze race works, based on the previous solutions:

The race progress is kept track of server-side. In order to beat the race, the server must believe the player character has reached each of the checkpoints in succession. (Verified this by trying to teleport to each of the checkpoint coordinates in order and seeing if the server thought the race was completed anyway.)
There is a limit on how far the player can travel between position updates.
Trying to play back the path recording too quickly failed.

Also, until this point my attempts to directly send position packets to the server resulted in the server kicking me from the game. There was one important part of the position packet that I wasn’t handling properly yet, which was the timestamp. I was trying to just set it to 0 or a value similar to the last seen timestamp in a genuine position packet, but that didn’t work.

Before trying to handle timestamps correctly, I wanted to figure out why the position packets were getting rejected. I used Cheat Engine to disable the usual rate limit on position updates (one per 0.2 seconds), as well as the logic for only sending an update when the position in the client changes. This caused the game to constantly send out position updates. This never resulted in a kick, so I knew the problem was not how frequently I was sending the packets; it had to be something related to the distance.

Another quick hack I tried was to decrease the rate limit and artificially set a distance value to an amount I had seen work before. Normally, while sprinting and using the default rate limit of 0.2 seconds, I observed that the character would move just about 1 unit of distance. I tried turning the rate limit down to 0.01 while keeping the distance travelled at 1.0 units per packet, which also failed.

Based on all of this information, it seemed likely the server was enforcing a distance check based on the player’s speed, i.e., distance over time. Based on the observed 1 unit of movement per 0.2 seconds, the default sprint speed appeared to be 5 units per second. It seemed that defeating this check would require handling the position packet timestamps correctly.

In order to send the correct timestamp, I’d have to synchronize the time value with the game client’s time. The game client keeps track of time as the number of seconds that have passed since launching the game, and the number of seconds that have passed since the player last connected to a game server.

There is a “heartbeat” packet type constantly being sent back and forth between client and server. The game client reports its current time, and the server responds back with the same timestamp as well as the real date time as a Unix epoch timestamp, presumably the real time that it received the heartbeat. Each position update from the client to the server also includes a timestamp value based on the game client’s time. Once I understood how this worked, I could use the proxy to keep track of the time values and inject new values if needed.

Having control over the timestamp values enabled me to do some more checks. For one, I was able to determine that the timestamp value always had to be increasing. Going “back in time” or trying to freeze time would cause the server to boot me.

While running these tests a possibility occurred to me: what if I artificially sped up time so that I could pass the distance over time check? For example, if I wanted to move 100 units I could pretend that 20 seconds had passed by artificially increasing the timestamp value.

To implement this I added some more logic to the recording replay code. Based on a given units per second speed value, I would calculate a new scaled time interval between two positions that satisfied a distance over time check for a maximum speed of ~5 units per second. I rewrote the timestamps for all heartbeats and position packets, and made sure to keep track of how much extra time was accrued so that I never went backwards on the timestamp value and got kicked.
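The timestamp rescaling can be sketched roughly as follows. This is a minimal illustration rather than the actual replay code: the function name, the (x, y, z) tuple format, and the small minimum step are assumptions, and the 5 units per second limit is just the estimate from above.

```python
import math

MAX_SPEED = 5.0  # units per second, estimated from ~1 unit per 0.2 s of sprinting


def rescale_timestamps(points, first_timestamp, max_speed=MAX_SPEED):
    """Assign a timestamp to each (x, y, z) point so no hop exceeds max_speed.

    first_timestamp is the value to use for the first point. The returned
    timestamps are strictly increasing, so time never freezes or goes back.
    """
    timestamps = []
    current = first_timestamp
    previous = None
    for point in points:
        if previous is not None:
            distance = math.dist(previous, point)
            # Pretend enough time has passed to cover the distance at max_speed.
            # The minimum step keeps time moving even for zero-length hops.
            current += max(distance / max_speed, 1e-3)
        timestamps.append(current)
        previous = point
    return timestamps
```

For example, a single 100 unit hop starting at time 10.0 gets the timestamps 10.0 and 30.0, which matches the "pretend 20 seconds had passed" idea. Heartbeat packets would need to be shifted by the same accumulated offset.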

This turned out to be the solution! By tweaking this code I was able to get a finish time of 0.969 seconds using the original recording with 448 points.

Update (July 10th, 2020): I’ve uploaded the challenge solution code to GitHub at https://github.com/jamchamb/cscg2020-maze-proxy.

Making a GameCube memory card editor with Raspberry Pi

2018-12-03T00:00:00+00:00

I started this project because I wanted to be able to use save file modifications I was testing in the Dolphin emulator on real GameCube hardware. One, because some of the weirder features of the Animal Crossing NES emulator and exploit payloads might behave differently in an emulator, and two, because it’s more fun to see things working on a real console.

While it’s possible to transfer files between a PC and memory card using an “SDGecko” SD adapter and homebrew software on a Wii (or GameCube with something like the SD Launcher kit), I don’t have a Wii and it seemed like overkill to buy these things just to edit memory cards. There’s also this obscure GameCube memory card with a built-in USB adapter that allowed editing with custom PC software, but it seems like it went out of production a while ago.

Instead of hunting down out-of-print adapters or buying another console just to transfer some files, I thought it would be an interesting hardware reverse engineering project to figure out how GameCube memory cards work, and how I could simulate or edit them.

The first steps to understanding how the memory card works are:

Mapping out the physical interface between the memory card and console
Capturing electrical signals transmitted between the console and memory card to figure out the low level transmission protocol
Capturing transmitted data
Interpreting captured data to figure out the format of commands that the console sends to the memory card to perform read and write operations

After understanding how read and write commands are sent to the memory card, and how the memory card should respond, it will be possible to simulate a memory card or edit it directly.

Physical interface

Luckily there are already some diagrams of the memory card’s pins, such as in this post on the GC-Forever forum by Ashen: https://www.gc-forever.com/forums/viewtopic.php?t=666.

The first row of pins that enters the console is for power and ground connections. The second row has all of the pins involved in data transfer, and there’s also a “sense” pin in each row so that the console can tell when the card is plugged all the way in.

Here’s what the inside of a third-party memory card looks like, with the pins and main components labelled:

I assumed DI and DO stood for data in/data out, and the clock was the clock signal. INT and CS were less clear, as there was no protocol described to give context, but I guess that INT stands for interrupt, and CS is the “chip select” pin from SPI.

Capturing signals

To figure out what exactly these pins are used for and what the protocol is, I’d have to have some way to capture the signal going through them while the GameCube accesses the memory card.

To do this I soldered some thin enameled wire to each of the pins on the bottom row (DI, DO, CS, INT, CLK). Here’s how the first attempt turned out:

This card stopped getting a clean signal after a while, so the second time around I kept the wires shorter with more consistent lengths, and added some extra hot glue for support:

(Note that I also added wires to the power and ground pins - this was for directly connecting to the card from a Raspberry Pi later on.)

It’s a bit tricky to solder this up, and if you’re just looking to make the memory card editor it would be much cleaner to use a spare memory card slot from an actual console.

With the pins wired up, I could finally capture the signal from them using a logic analyzer. I used a Saleae Logic 8 and connected to all of the data row pins, and then performed captures while doing things like inserting the card or copying and deleting files with the system memory card manager.

Besides INT, the data pins map neatly to the standard SPI channels, and using the SPI analyzer turned up a byte stream without much fuss:

DI - MOSI
DO - MISO
CS - CS / Enable
Clock - Clock

This bit of ASCII text that says “Broken File” appearing in the data stream from the card made it easy to check that the settings were correct:

Now it’s clear that the communication protocol is almost entirely standard SPI, save for the INT pin. (If you’re not familiar with SPI, this Sparkfun tutorial is a good resource: https://learn.sparkfun.com/tutorials/serial-peripheral-interface-spi/all.) Luckily the INT pin doesn’t appear to do much, and its signal doesn’t change often, so you don’t need to worry about it just yet.

Reading the traffic

With the low level transmission protocol figured out, it was time to figure out what the bytes being sent between the console and memory card meant. The first thing I wanted to figure out was the read commands so that I’d be able to copy out the contents of the card, whether directly or from a logic analyzer dump.

The “Yet Another Gamecube Documentation” (YAGCD) memory card section, while incomplete, was helpful for identifying the main commands.

For example, “read block” commands start with 0x52 and an offset address. Here’s the beginning of a command to read the first block, at offset zero:

After that there are 128 filler bytes with the value 0xFF (possibly to give the card time to start reading), and finally the card will begin to return data beginning from the requested offset as long as the console continues reading. Write commands are similar: they begin with 0xF2 and an offset address, and then continue with bytes to be written starting from that offset.
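As a rough sketch of how a captured read transaction could be split up, assuming the MOSI and MISO bytes for one chip-select period have already been exported from the logic analyzer: the function name is hypothetical, and the 4-byte address length is an assumption that should be checked against the capture, since the exact address encoding isn't covered here.

```python
READ_CMD = 0x52
FILLER_LEN = 128  # 0xFF bytes clocked out before the card starts returning data


def parse_read_transaction(mosi, miso, addr_len=4):
    """Split one captured SPI transaction into (address bytes, returned data).

    mosi and miso are equal-length bytes objects covering a single CS-low
    period. Returns None if the transaction is not a read command.
    addr_len is an assumption; adjust it to match the capture.
    """
    if len(mosi) != len(miso) or not mosi or mosi[0] != READ_CMD:
        return None
    data_start = 1 + addr_len + FILLER_LEN
    if len(mosi) < data_start:
        return None
    address = bytes(mosi[1:1 + addr_len])  # left undecoded on purpose
    return address, bytes(miso[data_start:])
```

The address bytes are returned raw rather than converted to an integer, so the sketch doesn't bake in a guess about how the offset is packed.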

When a card is inserted in the console while using the memory card manager, all of the data blocks will be read. By writing a simple parser in Python I was able to reconstruct most of the content of my memory card by using the read commands and responses from a logic analyzer dump of this process.

Interfacing with card on Raspberry Pi

To read all of the contents of the memory card flash, and to start sending my own write commands to it, I’d need to interface directly with the card from hardware that I could program. This is where the Raspberry Pi comes in. It has some dedicated SPI pins that can be used via the Linux spidev interface. This allows me to write a program that will act like the console (SPI master) to the memory card.

Connecting the card to the RasPi requires adding the 3.3V power and ground pins, as seen in the picture of the second card I soldered, and connecting the dedicated SPI pins to the corresponding memory card pins.

RasPi SPI Clock -> Clock
RasPi SPI MOSI -> DI
RasPi SPI MISO -> DO
RasPi SPI CS -> CS

I wrote a simple program in Python that used a Python spidev library to read each block from the card. To get a reliable read you need to use a clock speed that the RasPi and memory card can handle. The average clock speed used by the console is 12.5 MHz, so I’ve been using 12 MHz as the clock speed.
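A minimal sketch of that kind of read, assuming the py-spidev package on the Pi. The helper names are hypothetical, the command bytes are passed in by the caller because the address encoding isn't shown here, and the SPI mode is left at the library default as an assumption.

```python
FILLER_LEN = 128  # 0xFF filler bytes between the command and the returned data


def read_from_card(xfer, command, length):
    """Send command, clock out the filler plus length more bytes, return the data.

    xfer is any callable that takes a list of bytes to send and returns the
    list of bytes received, such as the xfer2 method of a spidev.SpiDev object.
    """
    tx = list(command) + [0xFF] * (FILLER_LEN + length)
    rx = xfer(tx)
    return bytes(rx[len(command) + FILLER_LEN:])


def open_spi(bus=0, device=0, speed_hz=12_000_000):
    """Open /dev/spidev<bus>.<device> at roughly the console's clock speed."""
    import spidev  # imported here so the rest of the sketch runs off the Pi

    spi = spidev.SpiDev()
    spi.open(bus, device)
    spi.max_speed_hz = speed_hz
    return spi
```

On the Pi this would be used along the lines of read_from_card(open_spi().xfer2, command_bytes, 512). Keeping the transfer function as a parameter makes it easy to try the logic with a fake card first. Note that spidev limits the size of a single transfer, so large reads may need to be split up.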

It takes a little while to read every single block, but it worked and I was able to reliably read all the data out of the card. This also happened to reveal the source of the flash chips used on these third-party memory cards from Amazon: all of the ones I’ve looked at have leftover firmware for some Super-H based TV related device on them.

The final step was to implement write commands, as well as a few minor commands related to getting and setting the status of the card. I encountered a pretty painful bug at this point: I would send over a bunch of write commands, read the data back, and see that nothing changed. The first thing I tried was setting up an extra GPIO pin on the RasPi to use as the INT pin. It wasn’t necessary for getting read commands to work, but I thought maybe it was required for writes. That still didn’t fix it.

Finally, I hooked the logic analyzer up to the Raspberry Pi to debug my SPI traffic:

It turns out that I just never added the data I meant to send to the write command buffers! After fixing that, it still behaved oddly: only the first write command would work. This time I had to tweak the timing between commands, as well as use of the INT pin and status commands, to get all of the write commands to work in sequence.

The code is a little rough, but I’ve made it available at https://github.com/jamchamb/gc-memcard-adapter.

I had originally planned to directly simulate memory cards with a Raspberry Pi after figuring out the communication protocol, but it turns out that it’s only practical to use as the SPI master (it’s possible to “bit bang” this without the direct SPI hardware support, but it would be too slow to meet the required 12.5 MHz clock speed). I’ll have to look at some other options for creating a memory card simulating device, but for now, here’s a video of me loading a Mega Man ROM and my Dolphin save file on a real GameCube to play Mega Man with the NES emulator:

Here's the Mega Man ROM running on real hardware finally pic.twitter.com/3i27xNO3nY
— James Chambers (@jamchamb_) November 18, 2018

Finding and exploiting hidden features of Animal Crossing’s NES emulator

2018-07-11T00:00:00+00:00

While looking for ways to activate the developer menus left over in Animal Crossing, including the NES emulator game selection menu, I found an interesting feature that exists in the original game that was always active, but never used by Nintendo. In addition to the NES/Famicom games that can be obtained in-game, it was possible to load new NES games from the memory card. I was also able to find a way to exploit this ROM loader to patch custom code and data into the game, allowing for code execution via the memory card.

Introduction - The NES console items

The normal NES games that you could obtain in Animal Crossing each came as an individual furniture piece that appeared as an NES console with a single game box on top of it. When you placed the item in your house and interacted with it, it would only play that one game. Pictured below are the Excitebike and Golf items.

There was also a generic “NES Console” item that did not feature any of the built-in games. You could buy this item from Redd, or sometimes obtain it through random events such as a town bulletin-board message stating that one has been buried in a random location in town.

This item appeared as the NES console with no game boxes on top of it.

The problem with this item is that it was thought to be unplayable. Every time you interacted with it, you would just see a message indicating that you didn’t have any software to play.

It turns out that this generic console item actually attempts to scan the memory card for specially constructed files that contain NES ROM images! The NES emulator used to play the built-in games is apparently a complete, generic NES emulator for the GameCube, and it’s capable of playing most games thrown at it.

Before demonstrating these features, I’ll explain the process of reverse engineering them.

Finding the memory card ROM loader

Looking for dev menus

My original intention was to find code that activates the various developer menus, such as the map select menu or NES emulator game select menu. The “Forest Map Select” menu, which makes it easy to instantly load directly into different locations in the game, was easy enough to locate just by searching for the “FOREST MAP SELECT” string that appears at the top of the screen (as seen in various videos and screenshots online).

The “FOREST MAP SELECT” string had a data cross-reference to a function called select_print_wait, which led to a bunch of other functions that also had the select_* prefix, including one called select_init. These happen to be the functions that handle the map select menu.

The select_init function led to another interesting function called game_get_next_game_dlftbl. This one ties together all the other menus and “scenes” that can run: the Nintendo logo screen, the title screen, the map select menu, the NES (Famicom) emulator menu, and so on. It runs early in the main procedure of the game, looks up which scene initialization function it should run, and finds its entry in a table data structure called game_dlftbls. This table holds references to the different scene handling functions, as well as some other data.

A close up of the first block of the function shows that it loads the “next game init” function, and then starts comparing it to a series of known init functions:

first_game_init
select_init
play_init
second_game_init
trademark_init
player_select_init
save_menu_init
famicom_emu_init
prenmi_init

One of the function pointers it checks for is famicom_emu_init, which is responsible for starting up the NES/Famicom emulator. By forcing the result of game_get_next_game_init to be famicom_emu_init or select_init in the Dolphin debugger, I can get the special menus to display. The next step is to figure out how these pointers would normally be set during runtime. All the game_get_next_game_init function does is load a value at offset 0xC of the first argument to game_get_next_game_dlftbl.

Tracking how these values got set across various data structures was a bit tedious, so I’ll just cut to the chase. The main things I found were:

When the game starts up normally, it goes through this sequence:
- first_game_init
- second_game_init
- trademark_init
- play_init
player_select_init will set the next init to select_init. This screen is supposed to allow for player selection just before map selection, but didn’t seem to be working correctly.

There was also one unnamed function that would set the emulator init function, but nothing appeared to set the init function to the player or map select inits.

At this point I realized I had another silly issue with how I loaded function names into IDA, where I was missing any function names that began with a capital letter due to the regular expression I used to cut out lines in the debug symbol file. The function that would set up famicom_emu_init looked related to scene transitions, and indeed its name turned out to be Game_play_fbdemo_wipe_proc.

Game_play_fbdemo_wipe_proc handles scene transitions such as screen wipes and fades. Under certain conditions, the screen transition leads from normal gameplay into the emulator display. That’s what will set the emulator init function.

Console furniture handling

What causes the screen transition handler to switch over to the emulator is actually the furniture item handler functions for the NES consoles. aMR_FamicomEmuCommonMove is called when a player interacts with one of the consoles.

When this function is called, r6 holds an index value corresponding to the numbers seen in the filenames of the NES games in famicom.arc:

01_nes_cluclu3.bin.szs
02_usa_balloon.nes.szs
03_nes_donkey1_3.bin.szs
04_usa_jr_math.nes.szs
05_pinball_1.nes.szs
06_nes_tennis3.bin.szs
07_usa_golf.nes.szs
08_punch_wh.nes.szs
09_usa_baseball_1.nes.szs
10_cluclu_1.qd.szs
11_usa_donkey3.nes.szs
12_donkeyjr_1.nes.szs
13_soccer.nes.szs
14_exbike.nes.szs
15_usa_wario.nes.szs
16_usa_icecl.nes.szs
17_nes_mario1_2.bin.szs
18_smario_0.nes.szs
19_usa_zelda1_1.nes.szs

(.arc is a proprietary file archive format.)

When r6 is non-zero, it’s passed along in a call to aMR_RequestStartEmu. This eventually triggers the emulator transition.

However, if r6 is zero, a function named aMR_RequestStartEmu_MemoryC is called instead. Setting the value to zero in the debugger, I got the “I don’t have any software” message. I didn’t recall the generic “NES Console” item right away to see if that’s what would cause r6 to be zero, but it is - index zero is used for the generic console item.

While aMR_RequestStartEmu just stores the index value to some data structure, aMR_RequestStartEmu_MemoryC does something much more complex…

That third code block calls aMR_GetCardFamicomCount and checks for a non-zero result, or else it will short-circuit past most of the interesting stuff on the left side of the function graph.

aMR_GetCardFamicomCount calls into famicom_get_disksystem_titles, which then calls into memcard_game_list, which is where things start to get really interesting.

memcard_game_list will mount the memory card and start looping through its file entries, checking some values on each one. By tracing through it in the debugger, I could see what it was comparing the values to on each of my memory card files.

Whether or not the function decides to load in a file depends on a few string comparison checks. First, it checks for the presence of the strings “GAFE” and “01”, which are the game ID and company ID, respectively. The 01 refers to Nintendo, “GAFE” refers to Animal Crossing. My guess is that it’s short for “GameCube Animal Forest English”.

Then it checks for the strings “DobutsunomoriP_F_” and “SAVE”. In this case, the first string should match, but not the second. “DobutsunomoriP_F_SAVE” happens to be the name of the file that stores save data for the built-in NES games. So, any file besides that with the “DobutsunomoriP_F_” prefix will be loaded.
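Put together, the checks amount to something like the following sketch. The function name is hypothetical, and whether the game compares the suffix exactly or only as a prefix is an assumption on my part.

```python
ROM_PREFIX = "DobutsunomoriP_F_"
BUILTIN_SAVE_SUFFIX = "SAVE"


def is_loadable_rom_file(game_id, company_id, file_name):
    """Mimic the memcard_game_list checks described above."""
    # Game ID and company ID must be Animal Crossing and Nintendo.
    if game_id != "GAFE" or company_id != "01":
        return False
    # The file name must carry the ROM prefix...
    if not file_name.startswith(ROM_PREFIX):
        return False
    # ...but must not be the save file for the built-in NES games.
    return file_name[len(ROM_PREFIX):] != BUILTIN_SAVE_SUFFIX
```

So "DobutsunomoriP_F_TEST" would be picked up, while "DobutsunomoriP_F_SAVE" and files from other games would be skipped.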

By using the Dolphin debugger to skip over the “SAVE” string comparison and trick the game into thinking my “SAVE” file was OK to load, I got this menu to show up when I used the NES console:

I answered yes and attempted to load the save file up as a game, and got the built-in crash screen for the first time:

Cool! Now that I know it is in fact trying to load games from the memory card, I can start figuring out the format for the save files to see how to load up a real ROM.

One of the first things I tried to do was find out where the game name was being read from in the memory card file. By searching for the string “FEFSC” that appears in the “Would you like to play <name>?” message, I found the offset where it was being read from in the file: 0x642. By copying the save file, changing the filename to “DobutsunomoriP_F_TEST”, setting the bytes at offset 0x642 to “TESTING”, and re-importing the edited save, I could get the desired title name to display in the menu.
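The title edit itself is just a byte patch. Here is a minimal sketch, assuming save_data holds the file contents in the same view the 0x642 offset was measured against. The function name is hypothetical, and it doesn't rename the file or handle any length limit on the title.

```python
TITLE_OFFSET = 0x642  # where the displayed game name was read from


def set_rom_title(save_data, title):
    """Return a copy of save_data with title written at TITLE_OFFSET."""
    encoded = title.encode("ascii")
    patched = bytearray(save_data)
    if TITLE_OFFSET + len(encoded) > len(patched):
        raise ValueError("save file is too small to hold the title")
    # Overwrite in place so the file size and all other offsets are unchanged.
    patched[TITLE_OFFSET:TITLE_OFFSET + len(encoded)] = encoded
    return bytes(patched)
```

Calling set_rom_title(data, "TESTING") reproduces the edit described above without opening a hex editor.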

Adding multiple files in this format resulted in more options being added to the menu, as seen here:

Booting a ROM file

If aMR_GetCardFamicomCount returned non-zero, some memory is allocated on the heap, famicom_get_disksystem_titles is called again directly, and then a bunch of random offsets in a data structure get set. Instead of deciphering where all these values were going to be read, I started looking at the list of famicom functions.

famicom_rom_load turned out to be the right place to look. It handles ROM loading, whether from a memory card or the internal game resources.

The most significant thing in the “memory card load” block is that it calls memcard_game_load. This mounts the file on the memory card once again, reads it in, and parses it. The most important features of the file format become apparent here.

Checksum value

The first thing that happens after the file is loaded is a checksum calculation. The calcSum function is called, which is a very simple algorithm that sums up the values of all the bytes in the memory card data. The low eight bits of the result must be zero. So, to pass this check, you have to sum up the values of all the bytes in your original file, figure out what value to add to that sum to cause the low eight bits to be zero, and then set a checksum byte in your file to that value.
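In Python terms the check behaves roughly like this sketch. The names are mine rather than the game's (apart from calcSum), and the 32-bit wraparound is an assumption about the register width that doesn't affect the low eight bits anyway.

```python
def calc_sum(data):
    """Sum every byte of the file, wrapped to 32 bits."""
    return sum(data) & 0xFFFFFFFF


def passes_checksum(data):
    """True if the low eight bits of the byte sum are zero."""
    return (calc_sum(data) & 0xFF) == 0
```

For instance, the two bytes 0x01 and 0xFF sum to 0x100, so that pair would pass, while a lone 0x01 would not.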

If the check fails, you get a message stating that the memory card couldn’t be read correctly, and nothing happens. During the debugging process, all I have to do is skip over this check.

Copying the ROM

Down near the end of memcard_game_load, another interesting thing happens. There are some more interesting code blocks between this and the checksum, but none of them will result in a branch that skips over this behavior.

If a certain 16-bit integer read from the card is non-zero, a function will be called to check for a compression header on a buffer. It checks for some proprietary Nintendo compression formats by looking for “Yay0” or “Yaz0” at the beginning of the buffer. If one of these is found, a decompression function is called. Otherwise, a simple memory copy function is performed. Either way, a variable called nesinfo_data_size is updated afterwards.

Another context clue here is that the ROM files for the built-in NES games use “Yaz0” compression, and have that string in their file header.

By observing the value that’s checked for zero and the buffer that’s passed to the compression check functions, I can quickly identify where in the memory card file the game is reading from. The zero-check is performed against part of a 32 byte buffer that’s copied from offset 0x640 in the file, which is likely a header for the ROM. Other parts of it are also checked throughout this function, and it’s where the game title is located (starting from the third byte of the header).

With the specific code path I hit, the ROM buffer is located immediately after this 32 byte header buffer.

This is enough information to attempt to construct a valid ROM file. I simply took one of the other Animal Crossing save files and edited it with a hex editor to change the name of the file to DobutsunomoriP_F_TEST and clear out the areas where I needed to insert data.

I used the Pinball ROM that’s already present in the game for this test run, and copied its content in after the 32 byte header for a test. Instead of calculating the checksum value, I also set some breakpoints so that I could just skip over calcSum, as well as observe the results of other checks that might cause a branch that skips past loading the ROM.

Finally, I imported the new file through the Dolphin memory card manager, restarted the game, and went to try it out on the console.

It worked! There were some graphical quirks caused by Dolphin settings that affect the graphics mode used by the NES emulator, but the game played just fine. (In newer Dolphin builds it should work by default.)

To be sure that other games would work, I tried out some more ROMs that weren’t already present in the game. Battletoads would start up, but not continue past the intro text (with some more tweaking later on, it did become playable). Mega Man, on the other hand, worked perfectly:

To be able to generate more ROM files that could load without any debugger intervention I’d have to start writing code and dig into the file format parsing some more.

The external ROM file format

Most of the critical file parsing happens in memcard_game_load. There are six main sections to the parsing code blocks in this function:

Checksum
Save file name
ROM file header
Unknown buffer that’s copied without any processing
Text comment, icon, and banner loader (for new save file creation)
ROM loader

Checksum

The low eight bits of the sum of all the byte values in the save file must be zero. Here’s some simple Python code that generates a checksum byte that can achieve that:

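# new_data_tmp is a mutable byte sequence (e.g. a bytearray) holding the whole
# save file; its last byte is unused zero padding that receives the checksum.
checksum = 0
for byte_val in new_data_tmp:
    checksum += byte_val
    checksum = checksum % (2**32)  # keep it 32 bit

# Pick the byte that brings the low eight bits of the total back to zero.
checkbyte = (256 - (checksum % 256)) % 256
new_data_tmp[-1] = checkbyte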

There’s probably a designated location to store the checksum byte, but just placing it in empty padding space at the very end of the save file works fine.

File name

Just to reiterate, the save file name must begin with “DobutsunomoriP_F_” and end with something other than “SAVE”. This filename is copied a couple of times, and in one case the letter “F” is replaced with “S”. This will be the name of save files for the given NES game (“DobutsunomoriP_S_NAME”).
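The name mapping can be sketched like this. The helper name is hypothetical, and it only models the “F” to “S” swap, not the other copies of the filename.

```python
ROM_PREFIX = "DobutsunomoriP_F_"
SAVE_PREFIX = "DobutsunomoriP_S_"


def nes_save_file_name(rom_file_name):
    """Derive the save file name used for a ROM loaded from the memory card."""
    if not rom_file_name.startswith(ROM_PREFIX):
        raise ValueError("not an external ROM file name")
    # Only the prefix changes; the part after it is kept as-is.
    return SAVE_PREFIX + rom_file_name[len(ROM_PREFIX):]
```

With this, "DobutsunomoriP_F_TEST" maps to "DobutsunomoriP_S_TEST".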

ROM header

A direct copy of the 32 byte header is loaded into memory. A few of the values in this header are used to determine how to handle the upcoming sections. It mainly includes some 16-bit size values and packed setting bits.

If you trace the pointer that the header is copied to all the way to the beginning of the function and figure out its argument position, the function signature below reveals that its type is in fact MemcardGameHeader_t*.

memcard_game_load(unsigned char *, int, unsigned char **, char *, char *, MemcardGameHeader_t *, unsigned char *, unsigned long, unsigned char *, unsigned long)

Unknown buffer

A 16-bit size value from the header is checked. If it’s non-zero, that number of bytes will be directly copied from the file buffer into a new block of allocated memory. This advances a data pointer in the file buffer so that copying can resume from the next section later on.

Banner, icon, and comment

Another size value is checked in the header, and if it’s non-zero the compression check function is called. If necessary the decompression algorithm will run, and then SetupExternCommentImage is called.

This function handles three things: a “comment”, a banner image, and an icon. For each one there’s a code in the ROM header that indicates how it should be handled. The options are:

Use a default value
Copy from the ROM file banner/icon/comment section
Copy from an alternate buffer

The default value code will cause the icon or banner to be loaded from an on-disk resource, and the save file name and comment (a text description of the file) to be set to “Animal Crossing” and “NES Cassette Save Data” respectively. This is how it would look:

The second code value will just copy the game name from the ROM file (some alternative to “Animal Crossing”), and then attempt to find the string “] ROM” in the file comment and replace it with “] SAVE”. Presumably, the files Nintendo intended to release would have a name format like “Game Name [NES] ROM”, or something similar.

For the icon and banner it would attempt to figure out the format of the image, get a fixed size value according to that format, and then copy the image over.

For the last code value, the file name and description would be copied from another buffer without any changes, and the icon and banner would be loaded from the alternate buffer as well.

ROM

If you look carefully at the memcard_game_load screenshot of the ROM copying, the 16-bit value that’s checked for zero is left shifted by 4 bits (multiplied by 16) and then used as the size for the memcpy function when no compression is detected. This is another size value present in the header.

If the size is non-zero, the ROM data is checked for compression and then copied over.
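Here is a small sketch of that decision, with hypothetical function names. It assumes rom_buffer starts at the first byte of the ROM section and that the caller has already pulled the 16-bit size value out of the header. Decompression is not implemented; the compressed blob is simply handed back.

```python
COMPRESSION_MAGICS = (b"Yay0", b"Yaz0")


def rom_byte_size(size_field):
    """The 16-bit header value counts 16-byte units (left shift by 4)."""
    return (size_field & 0xFFFF) << 4


def extract_rom(rom_buffer, size_field):
    """Return ("compressed" or "raw", bytes) for the ROM section, or None if empty."""
    if size_field == 0:
        return None
    if bytes(rom_buffer[:4]) in COMPRESSION_MAGICS:
        # The game would run its decompression routine at this point.
        return "compressed", bytes(rom_buffer)
    # No compression header: a plain copy of size_field * 16 bytes.
    return "raw", bytes(rom_buffer[:rom_byte_size(size_field)])
```

One consequence of the shift is that an uncompressed ROM tops out at 0xFFFF0 bytes, a little under 1 MiB.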

The unknown buffer and the search for bugs

While getting new ROMs to load up was pretty cool, one of the most interesting things about this ROM loader to me was that it’s virtually the only thing in the game that accepts variable-size user input and copies it to different places in memory. Almost everything else uses fixed-size buffers. Things like names and letter text might seem like they’re variable in size, but the empty space is basically filled with space characters. Null-terminated strings are not used often, preventing some common memory corruption bugs such as using strcpy on a buffer that’s too small for the string being copied over to it.

I was really interested in finding a save file based exploit in the game, and this seemed like the best bet.

Most of the ROM file handling described above also used fixed-size copies, except for the unknown buffer and ROM data. Unfortunately, the code that handles this buffer allocates just as much space as is needed to copy it, so there’s no overflow, and setting really large ROM file sizes wasn’t very useful.

Still, I wanted to know what was going on with that buffer that would be directly copied without any handling.

The NES Info Tag processors

Revisiting famicom_rom_load, a few functions are called after a ROM gets loaded from the memory card or disk:

nesinfo_tag_process1
nesinfo_tag_process2
nesinfo_tag_process3

By tracing where the unknown buffer was copied to, I verified that it was being operated on by these functions. These start by calling nesinfo_next_tag, which goes through a simple algorithm:

Check if the given pointer matches the pointer in nesinfo_tags_end. If it’s less than nesinfo_tags_end, or nesinfo_tags_end is zero, it checks if the string “END” is present at the head of the pointer.
- If “END” has been reached, or the pointer has advanced up to or past nesinfo_tags_end, the function returns zero (null).
- Otherwise, the byte at offset 0x3 of the pointer is added to 4 and the current pointer, and that value is returned.

This suggests a tag format of some three letter name, a data size value, and data. The result is a pointer to the next tag, as the current tag will be skipped over (cur_ptr + 4 skips the three byte name and one byte size, and size_byte skips over the data).
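Under that reading of the format, walking the tag buffer would look something like this sketch. The generator name is hypothetical, and using the end of the Python buffer in place of nesinfo_tags_end is a simplification.

```python
def iter_nesinfo_tags(buf):
    """Yield (name, data) for each tag until "END" or the end of the buffer."""
    pos = 0
    while pos + 4 <= len(buf):
        name = bytes(buf[pos:pos + 3])
        if name == b"END":
            return
        size = buf[pos + 3]  # one-byte data size at offset 0x3
        data = bytes(buf[pos + 4:pos + 4 + size])
        yield name.decode("ascii", errors="replace"), data
        # Same step as nesinfo_next_tag: skip name, size byte, and data.
        pos += 4 + size
```

Because the size is a single byte, each tag can carry at most 255 bytes of data in this layout.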

If the result is non-zero, the tag processing function then goes through a series of string comparisons to figure out what tag to handle. Some of the tag names checked for in nesinfo_tag_process1 are VEQ, VNE, GID, GNO, BBR, and QDS.

If a tag is matched, some handler code is executed. Some of the handlers do nothing but print the tag to a debug message. Others have more complex handlers. After a tag is processed, the function attempts to get the next tag and continue processing.

Luckily, there are a bunch of descriptive debug messages that get printed out when these tags are found. They’re all in Japanese, so they have to be Shift-JIS decoded and translated first. The messages for QDS, for example, can say “Load Disk Save Area” or “Since it is the first play, keep the disk save area”. The messages for BBR say “battery backup load” or “because it is the first play, clear”.

Both of these codes also load some values from their tag data section and use them to calculate an offset into the ROM data and then perform copy operations. It’s apparent that they’re responsible for designating parts of the ROM memory that are related to saving state.

There’s also an “HSC” tag that has a debug message indicating that this handles high scores. It takes an offset into the ROM from its tag data, as well as an initial high score value. These tags can be used to mark where high score values are kept in the NES game’s memory, probably so that it can be saved and restored later.

These tags provide a fairly complex system for loading metadata about the ROMs. Even better, many of them result in memcpy calls based on values provided in the tag data.

Bug hunting

Most of the tags that caused memory manipulation weren’t going to be very useful for exploits, because they all had maximum offset and size values represented by 16-bit integers. This is all that would be needed to handle the 16-bit address space of the NES, but doesn’t provide much range for writing over useful targets such as function pointers or return addresses on the stack in the 32-bit address space of the GameCube.

However, there were a few cases where offsets or size values passed to memcpy could exceed 0xFFFF.

QDS

QDS actually loads a 24-bit offset from its tag data, as well as a 16-bit size value.

The good thing is that the offset is used to calculate the destination of a copy operation. The base address for the offset is the beginning of the loaded ROM data, the source of the copy is in the memory card ROM file, and the size is the given 16-bit size value from the tag.

A 24-bit offset has a maximum value of 0xFFFFFF, which is well above what’s needed to write outside the boundary of the loaded ROM data. There are some problems, though…

The first is that even though the maximum size value is 0xFFFF, it’s initially used to zero out a section of memory. If the size value is too high (not much more than 0x1000), this will actually zero out the “QDS” tag in the game’s code.

This is a problem because nesinfo_tag_process1 actually gets called twice. The first time, it will collect some information about space it needs to set up for save data. The QDS and BBR tags are not fully processed on the first run. After the first run, some space is set up for save data, and the function is called again. This time the QDS and BBR tags would be fully processed, but it’s impossible to match the tags again if the tag name strings have all been cleared out of memory!

So, setting a smaller size value can avoid that. The other problem is that the offset value can only go forwards in memory, and the NES rom data is located on the heap fairly close to the end of usable memory.

There are only a few heap entries that come after it, none of which had anything super useful like obvious function pointers.

Normally it might be possible to use this for a heap overflow exploit, but the malloc implementation used for this heap actually adds a load of sanity check bytes into the malloc blocks. It’s possible to write over pointer values in the subsequent heap blocks. Without the sanity checking, this could be used to write an arbitrary value to an arbitrary location in memory when free is called on the affected heap block.

However, the malloc implementation used here will check for a specific byte pattern (0x7373) at the beginning of the next and previous blocks it’s going to manipulate upon the call to free. If it doesn’t find those bytes, it calls OSPanic and the game stops.

Without being able to influence those bytes to be present at some target location, it’s not possible to write there. In other words, you can’t write something to an arbitrary location without already being able to write something right next to that location. There could be some way to get the value 0x73730000 to be stored on the stack right before a return address, and the location referenced by the value you want to write to the destination address (it will also be checked as if it’s a pointer to a heap block), but it’d be difficult to find and exploit.

`nesinfo_update_highscore`

Another function involving the QDS, BBR, and HSC tags is nesinfo_update_highscore. The QDS, BBR, and OFS (offset) tag size values are used to calculate an offset to write to, and an HSC tag triggers a write to that location. This function runs for every frame processed by the NES emulator.

The maximum offset value per tag in this case, even for QDS, is 0xFFFF. However, during the tag processing loop, size values from BBR and QDS tags actually get accumulated. This means that multiple tags can be used to calculate just about any offset value. The limit is the number of tags that can be fit in the ROM tag data section of the memory card file, which has a maximum size of 0xFFFF as well.

The base address that the offset gets added to is 0x800C3180, the save data buffer. This is at a much lower address than the ROM data, providing more freedom in choosing where to write to. Writing over the function’s return address on the stack at 0x812F95DC, for example, would be fairly easy.

Unfortunately, this doesn’t work either. nesinfo_tag_process1 happens to also figure out the accumulated size of the offsets from these tags, and uses that size to initialize some space like this:

bzero(nintendo_hi_0, ((offset_sum + 0xB) * 4) + 0x40)

With the offset value I tried to calculate, this resulted in 0x48D91EC (76,386,796) bytes of memory getting wiped out, causing the game to crash spectacularly.
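To get a feel for the numbers, here is a rough Python sketch of that size calculation. It assumes the accumulated sum is used directly as a byte offset from the save data buffer, and it uses the return address mentioned above as the target; the exact offset used in the original attempt isn't stated, so the result only lands in the same ballpark as the figure quoted above. The names SAVE_BUFFER, RETURN_ADDR and bzero_size are made up for illustration.

```python
SAVE_BUFFER = 0x800C3180   # base address the accumulated offset is added to
RETURN_ADDR = 0x812F95DC   # stack slot holding the function's return address


def bzero_size(offset_sum: int) -> int:
    """Size argument passed to bzero, following the expression shown above."""
    return ((offset_sum + 0xB) * 4) + 0x40


# Assumption: the offset sum is a plain byte distance from the buffer.
offset_sum = RETURN_ADDR - SAVE_BUFFER
size = bzero_size(offset_sum)

print(f"offset needed: {offset_sum:#x}")
print(f"bytes zeroed:  {size:#x} ({size:,})")
```

Any offset large enough to reach the stack means roughly 76 MB gets zeroed, far more than the GameCube's main memory.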

The PAT tag

It was starting to look hopeless, as all of the tags that made unsafe calls to memcpy would end up causing a crash before they could be useful. I decided to switch over to just documenting the purpose of each tag, and eventually reached the tags in nesinfo_tag_process2.

Most of the tag handlers in nesinfo_tag_process2 will never run because they only work when the pointer nesinfo_rom_start is not null. Nothing in the code ever sets that pointer to be non-null. It gets initialized to zero, and never gets used again. Only nesinfo_data_start is set when a ROM gets loaded, so this looks like a piece of dead code.

There is one tag that can still operate when nesinfo_rom_start is null, though: PAT. This is the most complex tag in the nesinfo_tag_process2 function.

It still uses nesinfo_rom_start as a pointer, but never performs a null check on it. The PAT tag will read through its own tag data buffer, processing codes that calculate offsets. Those offsets are added to the nesinfo_rom_start pointer to calculate a destination address, and then bytes are copied from the patch buffer into that location. This copy is performed with load and store byte instructions, rather than memcpy, which is why I hadn’t noticed it sooner.

Each PAT tag data buffer has an 8-bit type code, 8-bit patch size, and 16-bit offset value, followed by the patch data.

If the code is 2, the offset value is added to the current offset sum.
If the code is 9, the offset is shifted up 4 bits and added to the current offset sum.
If the code is 3, the offset sum is reset to 0.

The largest size an NES info tag can have is 255, so the largest possible PAT entry patch size is 251 bytes. Multiple PAT tags are allowed, though, so it’s possible to patch more than 251 bytes, as well as patch non-contiguous locations.

So long as there’s a series of code 2 or code 9 PAT sub-tags, the destination pointer offset continues to accumulate. It will be reset to zero when patch data gets copied, but using a patch size of zero avoids this. Writing this now, it’s clear that this could be used to calculate some arbitrary offset against the null pointer in nesinfo_rom_start by using lots of PAT tags.

However, there are two more code value checks…

If the code is between 0x80 and 0xFF, it gets added to 0x7F80 and then shifted up 16 bits. Finally, this is added to the 16-bit offset value and used as the destination address to patch.

This allows setting any address in the range 0x80000000 to 0x807FFFFF as the destination for the patch! That’s where a bunch of the code for Animal Crossing lives in memory. This means it’s possible to patch Animal Crossing’s code itself using the ROM metadata tags from a file on the memory card.
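The address arithmetic for these high code values can be sketched in a few lines of Python. This is only a model of the calculation described above, not the game's code, and pat_destination is a name invented for the example.

```python
def pat_destination(code: int, offset: int) -> int:
    """Destination address for a PAT entry whose code is in 0x80..0xFF."""
    if not 0x80 <= code <= 0xFF:
        raise ValueError("code must be between 0x80 and 0xFF")
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 16 bits")
    # The code is added to 0x7F80, shifted up 16 bits, then the offset is added.
    return (((code + 0x7F80) << 16) + offset) & 0xFFFFFFFF


print(hex(pat_destination(0x80, 0x0000)))  # lowest reachable address
print(hex(pat_destination(0xFF, 0xFFFF)))  # highest reachable address
```

The two calls give the bounds of the patchable range, 0x80000000 and 0x807FFFFF.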

With a small loader patch, it’d be possible to easily load even larger patches to any address from the memory card.

For a quick test, I set up a patch that would turn on “zuru mode 2” (the game’s developer mode, described in my last blog post) when the user loads a ROM from the game card. It turns out that the button cheat combo only activates “zuru mode 1”, which doesn’t have access to all the same features that mode 2 has. With this patcher, it’s now possible to get full access to developer mode on real hardware using a memory card.

The patch tags will be processed as the ROM is loaded up.

After the ROM loads, exit the NES emulator to see the result.

It works!

Patcher info tag format

The info tags in the save file that performs this patch look like this:

000000 5a 5a 5a 00 50 41 54 08 a0 04 6f 9c 00 00 00 7d  >ZZZ.PAT...o....}<
000010 45 4e 44 00                                      >END.<

ZZZ \x00: An ignored beginning tag. 0x00 is the size of its data buffer: zero.
PAT \x08 \xA0 \x04 \x6F\x9C \x00\x00\x00\x7D: Patches 0x80206F9C to 0x0000007D.
- 0x08 is the size of the tag buffer.
- 0xA0, when added to 0x7F80, is 0x8020, the upper 16 bits of the destination address.
- 0x04 is the size of the patch data (0x0000007D).
- 0x6F9C is the lower 16-bits of the destination address.
- 0x0000007D is the patch data.
END \x00: The end marker tag.
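As a cross-check of the layout described above, here is a small Python sketch that rebuilds the same 20 bytes from a destination address and patch data. It is not the code from the generator repository; pat_tag and tag_section are hypothetical helpers written just for this example.

```python
import struct


def pat_tag(dest: int, patch: bytes) -> bytes:
    """Build one PAT tag that writes `patch` at `dest` (0x80000000-0x807FFFFF)."""
    code = (dest >> 16) - 0x7F80
    if not 0x80 <= code <= 0xFF:
        raise ValueError("destination outside 0x80000000-0x807FFFFF")
    if len(patch) > 251:
        raise ValueError("patch data is limited to 251 bytes per tag")
    # 8-bit code, 8-bit patch size, big-endian 16-bit offset, then the data.
    body = struct.pack(">BBH", code, len(patch), dest & 0xFFFF) + patch
    return b"PAT" + bytes([len(body)]) + body


def tag_section(dest: int, patch: bytes) -> bytes:
    """Wrap a single PAT tag between the ZZZ and END marker tags."""
    return b"ZZZ\x00" + pat_tag(dest, patch) + b"END\x00"


data = tag_section(0x80206F9C, bytes.fromhex("0000007D"))
print(data.hex(" "))
```

The printed bytes should match the hex dump above.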

If you want to experiment with creating patcher or ROM save files yourself, I have some simple code at https://github.com/jamchamb/ac-nesrom-save-generator for generating the files. A patch like the one above can be generated with the following command:

$ ./patcher.py Patcher /dev/null zuru_mode_2.gci -p 80206F9c 0000007D

Arbitrary code execution

With this tag it’s possible to gain arbitrary code execution in Animal Crossing.

There’s one last hurdle: using patches against data works fine, but something’s wrong with patching code instructions.

While the patches do get written, the game continues to execute the old instructions that were there before. It seems like a caching issue, and in fact it is. The GameCube CPU had instruction caches, as seen in https://en.wikipedia.org/wiki/Nintendo_GameCube_technical_specifications.

To figure out how the cache could be cleared, I started looking up cache related functions in the GameCube SDK documentation, and found ICInvalidateRange. This function will invalidate cached blocks of instructions at a given memory address, allowing modified instruction memory to execute with the updated code.

Without a way to get initial code to run, it’d still be impossible to call ICInvalidateRange, though. Getting successful code execution will require one more trick.

While looking over the malloc implementation to figure out if a heap overflow exploit was possible, I learned that the malloc implementation functions could be switched out dynamically through a data structure and function named my_malloc. my_malloc would load a pointer to the current malloc or free implementation function from a static location in memory, and then call that function while passing along whatever arguments were given to my_malloc.

The NES emulator used my_malloc heavily to allocate and free memory for NES ROM-related data, so I knew it would be triggered multiple times around the same time that the PAT tags get processed.

Because my_malloc would load a pointer from memory and then branch to it, I could alter the control flow of the program just by overwriting the pointer for the current malloc or free functions. Instruction caching would not prevent this from running, as none of the instructions in my_malloc need to be changed.

Cuyler, the developer of the Dōbutsu no Mori e+ fan translation project, implemented a loader in PowerPC assembly and demonstrates using it to inject new code in this video: https://www.youtube.com/watch?v=BdxN7gP6WIc. (Dōbutsu no Mori e+ was the last iteration of Animal Crossing on GameCube, which has the most updates and was only released in Japan.) After being injected with PAT tags, the loader can read much larger patches from the memory card, bypassing the size restrictions of the tag info section in ROM files. In the demonstration video it loads in some code that allows the player to spawn any object by typing its ID into a letter and then pressing the Z button.

With that, it will be possible to load mods, cheats, and homebrew using a regular copy of Animal Crossing on a real GameCube.

Update: The previous video has been taken down, so here’s another example of injecting custom code that prints text to the screen and in-game debug console:

Reverse engineering Animal Crossing’s developer mode

2018-06-09T00:00:00+00:00

Last summer I began reverse engineering Animal Crossing for the GameCube to explore the possibility of creating mods for the game. I also wanted to document the process to create tutorials for people interested in ROM hacking and reverse engineering. In this post I explore the developer debugging features that are still left in the game, and how I discovered a cheat combo that can be used to unlock them.

`new_Debug_mode`

While looking around at some leftover debug symbols, I noticed functions and variable names that contained the word “debug”, and thought it would be interesting to see what debug functionality might be left in the game. If there were any debugging or developer features I could activate, it might also help with the process of creating mods.

The first function I took a look at was new_Debug_mode. It’s called by the entry function, which runs right after the Nintendo trademark screen finishes. All it does is allocate a 0x1C94 byte structure and save its pointer.

After it gets called in entry, a value of 0 is set at offset 0xD4 in the allocated structure, right before mainproc is called.

To see what happens when the value is non-zero, I patched the li r0, 0 instruction at 80407C8C to li r0, 1. The raw bytes for the instruction li r0, 0 are 38 00 00 00, where the assigned value is at the end of the instruction, so you can just change this to 38 00 00 01 to get li r0, 1. For a more reliable way to assemble instructions, you could use something like kstool:

$ kstool ppc32be "li 0, 1"
li 0, 1 = [ 38 00 00 01 ]
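If you don't have an assembler handy, this particular instruction is simple enough to encode by hand. li rD, value is a simplified mnemonic for addi rD, 0, value, which has primary opcode 14. The following Python sketch (encode_li is just an illustrative name) builds the 4 bytes:

```python
def encode_li(rd: int, value: int) -> bytes:
    """Encode PowerPC `li rD, value` (i.e. `addi rD, 0, value`) as big-endian bytes."""
    if not 0 <= rd <= 31:
        raise ValueError("register number must be 0..31")
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("immediate must fit in a signed 16-bit field")
    # opcode (6 bits) | rD (5 bits) | rA = 0 (5 bits) | signed immediate (16 bits)
    word = (14 << 26) | (rd << 21) | (0 << 16) | (value & 0xFFFF)
    return word.to_bytes(4, "big")


print(encode_li(0, 0).hex(" "))  # 38 00 00 00
print(encode_li(0, 1).hex(" "))  # 38 00 00 01
```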

You can apply this patch in the Dolphin emulator by going to the “Patches” tab of the game’s properties and entering it like so:

Setting this value to 1 caused an interesting looking graph to appear at the bottom of the screen:

It looked like a performance meter, with the little bars at the bottom of the screen growing and shrinking. (Later on when I looked up the names of the functions that draw the graph, I found that they do in fact display metrics for CPU and memory usage.) This was neat, but not particularly useful. Setting the value above 1 actually stopped my town from loading up, so it didn’t seem like there was much else to do with this.

Zuru mode

I started to look around at other references to debug-related things, and saw something called “zuru mode” pop up a few times. Branches to code blocks that had debug functionality often checked a variable called zurumode_flag.

In the game_move_first function pictured above, zzz_LotsOfDebug (a name I made up) only gets called if zurumode_flag is non-zero.

Looking for functions related to this value yields the following:

zurumode_init
zurumode_callback
zurumode_update
zurumode_cleanup

At first glance they’re all a bit obscure, twiddling around various bits at offsets in a variable called osAppNMIBuffer. Here’s a first look at what these functions do:

`zurumode_init`

Set zurumode_flag to 0
Check some bits in osAppNMIBuffer
Store a pointer to the zurumode_callback function in the padmgr structure
Call zurumode_update

`zurumode_update`

Check some bits in osAppNMIBuffer
Conditionally update zurumode_flag based on these bits
Print out a format string to the OS console.

This kind of thing is usually useful for giving context to the code, but there were a bunch of unprintable characters in the string. The only recognizable text was “zurumode_flag” and “%d”.

Guessing it might be Japanese text using a multi-byte character encoding, I ran the string through a character encoding detection tool and found out it was Shift-JIS encoded. The translated string just means “zurumode_flag has been changed from %d to %d”. That doesn’t provide much new information, but knowing about the use of Shift-JIS does, as there are many more strings in the binaries and string tables that use this encoding.
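Once the encoding is known, decoding strings pulled out of the binary is a one-liner in Python. The sketch below uses a made-up stand-in string (not the actual bytes from the game) to show why the text looks like garbage when treated as ASCII and how it reads once decoded as Shift-JIS.

```python
# Hypothetical stand-in for a string dumped from the binary.
sample = "zurumode_flag が %d から %d に変更されました"
raw = sample.encode("shift_jis")

# Treated as ASCII, the Japanese portion is just high, unprintable bytes.
try:
    raw.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False

decoded = raw.decode("shift_jis")
print(is_ascii)
print(decoded)
```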

`zurumode_callback`

Calls zerucheck_key_check
Check some bits in osAppNMIBuffer
Prints value of zurumode_flag somewhere
Calls zurumode_update

zerucheck_key_check didn’t show up before because of the different spelling... what is it?

A huge complex function that does lots more bit twiddling on values without names. At this point I decided to back off and look for other debug-related functions and variables, as I wasn’t even sure what the significance of zuru mode was. I also wasn’t sure what “key check” meant here. Could it be a cryptographic key?

Back to debug

Around this time I noticed that there was an issue with the way I loaded the debug symbols into IDA. The foresta.map file on the game disc contains a bunch of addresses and names for functions and variables. I hadn’t noticed initially that the addresses for each section started over at zero, so I just set up a simple script to add a name entry for each line in the file.

I set up some new IDA scripts to fix up the symbol map loading for the different sections of the program: .text, .rodata, .data, and .bss. The .text section is where all the functions are, so I set the script to automatically detect functions at each address when setting a name this time.

For the data sections, I set it to create a segment for each binary object (such as m_debug.o, which would be compiled code for something called m_debug), and set up space and names for each piece of data. This gave me much more information than I had before, although I now had to manually define the data type for each piece of data, as I set each data object to be a simple byte array. (In hindsight it would have been better to at least assume any data with a size that’s a multiple of 4 bytes contained 32-bit integers, as there are so many of them, and many contain addresses to functions and data that are important for building up cross-references.)

While looking through the new .bss segment for m_debug_mode.o, I saw some variables like quest_draw_status and event_status. These are interesting because I want to get debug mode to display some more useful stuff than the performance graph. Luckily, there were cross-references from these data entries to a huge piece of code that checks debug_print_flg.

Using the Dolphin debugger, I set a breakpoint on the function where debug_print_flg gets checked (at 8039816C) to see how the check works. The breakpoint never hit.

Let’s check why: this function is called by game_debug_draw_last. Guess what value is checked before conditionally calling it? zurumode_flag. What the heck is it?

I set a breakpoint on that check (80404E18) and it broke immediately. The value of zurumode_flag was zero, so it would normally skip calling this function. I NOPped out the branch instruction (replaced it with an instruction that does nothing) to see what would happen when the function does get called.

In the Dolphin debugger you can do this by pausing the game, right-clicking on an instruction, and then clicking “Insert nop”:

Nothing happened. Then I checked what was happening inside the function, and found another branch statement that could short circuit past all of the interesting stuff at 803981a8. I NOPped that out as well, and the letter “D” appeared at the top right corner of the screen.

There was a bunch more interesting looking code in this function at 8039816C (I called it zzz_DebugDrawPrint), but none of it was getting called. If you look at the graph view of this function, you can see that there’s a series of branch statements that skip over blocks throughout the entire function:

By NOPping out more of these branch statements, I started to see different things get printed to the screen:

The next question is how to activate these debug features without modifying the code. Also, zurumode_flag appears again for some branch statements made in this debug draw function. I added another patch so that zurumode_flag is always set to 2 in zurumode_update, because it’s usually compared specifically with 2 when it’s not being compared with 0. After restarting the game, I saw this “msg. no” message displayed at the top right of the screen.

The number 687 is the entry ID of the most recently displayed message. I checked this using a simple table viewer I made early on, but you can also check it with a full GUI string table editor I made for ROM hacking. Here’s what the message looks like in the editor:

At this point it was clear that figuring out zuru mode was no longer avoidable; it’s tied directly into the debugging features of the game.

Zuru mode revisited

Returning to zurumode_init, it initializes a few things:

0xC(padmgr_class) is set to the address of zurumode_callback
0x10(padmgr_class) is set to the address of padmgr_class itself
0x4(zuruKeyCheck) is set to the last bit of a word loaded from 0x3C(osAppNMIBuffer).

I looked into what padmgr means, and it’s short for “gamepad manager”. This suggests there could be a special key (button) combination to enter on the gamepad to activate zuru mode, or possibly some special debugging device or development console feature that could be used to send a signal to activate it.

zurumode_init only runs the first time the game is loaded (pressing the reset button doesn’t trigger it).

Setting a breakpoint at 8040efa4, where 0x4(zuruKeyCheck) is set, we can see that during boot without holding down any keys, it’s going to be set to 0. Replacing this with 1 causes an interesting thing to happen:

The letter “D” shows up in the upper right corner again (green instead of yellow this time), and there’s also some build info:

[CopyDate: 02/08/01 00:16:48 ]
[Date: 02-07-31 12:52:00]
[Creator:SRD@SRD036J]

A patch to have 0x4(zuruKeyCheck) always set to 1 on start:

8040ef9c 38c00001

This seems like it’s the correct way to get zuru mode initialized. After that, there may be different actions we need to take in order to get certain debug information to display. Starting up the game and walking around and talking to a villager didn’t show any of the displays mentioned previously (besides the letter “D” in the corner).

The likely suspects are zurumode_update and zurumode_callback.

`zurumode_update`

zurumode_update is first called by zurumode_init, and then repeatedly gets called by zurumode_callback.

It checks the last bit of 0x3C(osAppNMIBuffer) again and then updates zurumode_flag based on its value.

If the bit is zero, the flag is set to zero.

If not, the following instruction runs with r5 being the full value of 0x3c(osAppNMIBuffer):

extrwi r3, r5, 1, 28

This extracts the 28th bit from r5 and saves it into r3. Then 1 is added to the result, so the final result is always 1 or 2.

zurumode_flag is then compared with this new value, which is 0 if the last bit is clear, 1 if only the last bit is set, and 2 if both the last bit and the 28th bit are set in 0x3c(osAppNMIBuffer).

This value is written to zurumode_flag. If it didn’t change anything, the function ends and returns the current value of the flag. If it does change the value, a much more complex chain of code blocks executes.
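The flag calculation described above can be modeled in Python as follows. PowerPC numbers bits from the most significant end, so bit 31 is the last (lowest) bit and bit 28 has the value 0x08. The function names are mine, and this is only a model of the logic, not a translation of the actual assembly.

```python
def extrwi(value: int, n: int, b: int) -> int:
    """Model of `extrwi rA, rS, n, b`: extract n bits starting at bit b
    (bit 0 is the most significant bit of the 32-bit word), right-justified."""
    return (value >> (32 - b - n)) & ((1 << n) - 1)


def next_zurumode_flag(nmi_word: int) -> int:
    """New zurumode_flag value derived from the word at 0x3C(osAppNMIBuffer)."""
    if extrwi(nmi_word, 1, 31) == 0:   # last bit clear: zuru mode off
        return 0
    return extrwi(nmi_word, 1, 28) + 1  # 1, or 2 when bit 28 is also set


for word in (0x00, 0x01, 0x08, 0x09):
    print(f"{word:#04x} -> {next_zurumode_flag(word)}")
```

Note that bit 28 on its own does nothing: the flag stays 0 unless the last bit is set.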

A message in Japanese is printed: this is the “zurumode_flag has been changed from %d to %d” message mentioned earlier.

Then a series of functions are called with different arguments depending on whether the flag was changed to zero or not. The assembly for this part is tedious, so the pseudo code of it looks like this:

if (flag_changed_to_zero) {
    JC_JUTAssertion_changeDevice(2)
    JC_JUTDbPrint_setVisible(JC_JUTDbPrint_getManager(), 0)
} else if (BIT(nmiBuffer, 25) || BIT(nmiBuffer, 31)) {
    JC_JUTAssertion_changeDevice(3)
    JC_JUTDbPrint_setVisible(JC_JUTDbPrint_getManager(), 1)
}

Notice that if the flag is zero, JC_JUTDbPrint_setVisible is given an argument of 0. If the flag is not zero AND bit 25 or bit 31 are set in 0x3C(osAppNMIBuffer), the setVisible function is passed an argument of 1.

This is the first key to activating zuru mode: the last bit of 0x3C(osAppNMIBuffer) must be set to 1 in order to make the debug displays visible and set zurumode_flag to a non-zero value.

`zurumode_callback`

zurumode_callback is at 8040ee74 and is probably called by a function related to the gamepad. Setting a breakpoint on it in the Dolphin debugger, the callstack shows that it is indeed called from padmgr_HandleRetraceMsg.

One of the first things it does is run zerucheck_key_check. It’s complex, but overall it seems to read and then update the value of zuruKeyCheck. I decided to see how that value is used in the rest of the callback function before going any further into the keycheck function.

Next it checks some bits in 0x3c(osAppNMIBuffer) again. If bit 26 is set, or else if bit 25 is set and padmgr_isConnectedController(1) returns non-zero, the last bit in 0x3c(osAppNMIBuffer) is set to 1!

If neither of those bits are set, or if bit 25 is at least set but padmgr_isConnectedController(1) returns 0, then it checks if the byte at 0x4(zuruKeyCheck) is 0. If it is, then it will zero out the last bit in the original value and write it back to 0x3c(osAppNMIBuffer). If not, then it still sets the last bit to 1.

In pseudo-code this looks like:

x = osAppNMIBuffer[0x3c]

if (BIT(x, 26) || (BIT(x, 25) && isConnectedController(1)) || zuruKeyCheck[4] != 0) {
    osAppNMIBuffer[0x3c] = x | 1   // set last bit
} else {
    osAppNMIBuffer[0x3c] = x & ~1  // clear last bit
}

After that, if bit 26 is not set, it skips straight to calling zurumode_update and then finishes.

If it is set, then if 0x4(zuruKeyCheck) is not zero, it loads up a format string where it appears that it’s going to print out: “ZURU %d/%d”.

Recap

Here’s what happens:

padmgr_HandleRetraceMsg calls the zurumode_callback. My guess is that “handle retrace message” means it has just scanned key presses on the controller. Each time it scans, it may call a series of different callbacks.

When zurumode_callback runs, it checks the current key (button) presses. This seems to check for a specific button or combination of buttons.

The last bit in the NMI Buffer is updated based on specific bits in its current value, as well as the value of one of the zuruKeyCheck bytes (0x4(zuruKeyCheck)).

Then zurumode_update runs and checks that bit. If it’s 0, the zuru mode flag will be set to 0. If it’s 1, the mode flag is updated to 1 or 2 based on whether bit 28 is set.

The three ways to activate zuru mode are:

Bit 26 is set in 0x3C(osAppNMIBuffer)
Bit 25 is set in 0x3C(osAppNMIBuffer), and a controller is connected to port 2
0x4(zuruKeyCheck) is not zero

osAppNMIBuffer

Wondering what osAppNMIBuffer was, I started by searching for “NMI”, and found references to “non-maskable interrupt” in the context of Nintendo. It turns out that the entire variable name also shows up in the developer documentation for the Nintendo 64:

osAppNMIBuffer is a 64-byte buffer that is cleared on a cold reset. If the system reboots because of a NMI, this buffer is unchanged.

Basically, this is a small piece of memory that persists across soft reboots. A game can use this buffer to store whatever it wants as long as the console is powered on. The original Animal Crossing game was actually released on Nintendo 64, so it makes sense that something like this would show up in the code.

Switching over to the boot.dol binary (everything above is from foresta.rel), there are a lot of references to osAppNMIBuffer in the main function. A quick look shows that there is a series of checks that can result in various bits of 0x3c(osAppNMIBuffer) getting set with OR operations.

Interesting OR operand values to look out for would be:

Bit 31: 0x01
Bit 30: 0x02
Bit 29: 0x04
Bit 28: 0x08
Bit 27: 0x10
Bit 26: 0x20

Remember that bits 25, 26, and 28 are especially interesting: 25 and 26 determine whether zuru mode is enabled, and bit 28 determines the level of the flag (1 or 2). Bit 31 is also interesting, but primarily seems to be updated based on the values of the others.
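Since the bit numbering here counts from the most significant bit, it can be handy to have a tiny helper for converting between bit numbers and masks while reading the disassembly. A minimal Python sketch (both helper names are made up):

```python
def ppc_bit_mask(bit: int) -> int:
    """Mask for PowerPC-style bit numbering on a 32-bit word (bit 0 = MSB)."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be 0..31")
    return 1 << (31 - bit)


def set_bits(word: int) -> list[int]:
    """List the PowerPC-style bit numbers that are set in a 32-bit word."""
    return [bit for bit in range(32) if word & ppc_bit_mask(bit)]


print(hex(ppc_bit_mask(26)))  # 0x20, matching the list above
print(hex(ppc_bit_mask(25)))  # 0x40
print(set_bits(0x78))         # [25, 26, 27, 28]
```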

Bit 26

First up: at 800062e0 there’s an ori r0, r0, 0x20 instruction on the buffer value at 0x3c. This would set bit 26, which always results in zuru mode being enabled.

For the bit to be set, the 8th byte returned from DVDGetCurrentDiskID has to be 0x99. This ID is located at the very beginning of the game disc image, and is loaded up at 80000000 in memory. For a regular retail release of the game, the ID looks like this:

47 41 46 45 30 31 00 00    GAFE01..

Patching the last byte of the ID to 0x99 causes the following to happen when starting up the game:

And in the OS console, the following is printed:

06:43:404 HW\EXI_DeviceIPL.cpp:339 N[OSREPORT]: ZURUMODE2 ENABLE
08:00:288 HW\EXI_DeviceIPL.cpp:339 N[OSREPORT]: osAppNMIBuffer[15]=0x00000078

All of the other patches can be removed, and the letter D also appears in the top right corner of the screen again, but none of the other debug displays are activated.
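For reference, the disk ID change described above amounts to a one-byte edit. Here is a minimal Python sketch that applies it to the 8-byte ID; patch_disk_id is a hypothetical helper, and applying the same edit to a disc image would mean writing the byte at offset 7 of the file, which I've left out to keep the example self-contained.

```python
def patch_disk_id(disk_id: bytes, version: int = 0x99) -> bytes:
    """Return a copy of the 8-byte disk ID with its last byte replaced."""
    if len(disk_id) != 8:
        raise ValueError("expected the 8-byte disk ID")
    if not 0 <= version <= 0xFF:
        raise ValueError("version must fit in one byte")
    return disk_id[:7] + bytes([version])


retail_id = bytes.fromhex("4741464530310000")  # GAFE01 followed by two zero bytes
patched = patch_disk_id(retail_id)
print(patched.hex(" "))
```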

Bit 25

Bit 25 is used in conjunction with performing the port 2 controller check. What causes it to be enabled?

This turns out to have the same check used for bit 28: the version must be greater than or equal to 0x90. If bit 26 was set (ID is 0x99), both of these bits will also be set, and zuru mode will be enabled anyway.

If the version is between 0x90 and 0x98, though, zuru mode is not immediately enabled. Recalling the check made in zurumode_callback, it will only be enabled if bit 25 is set and padmgr_isConnectedController(1) returns non-zero. Once a controller is plugged into port 2 (the argument to isConnectedController is zero-indexed), zuru mode gets activated. The letter D and the build info display on the title screen, and… pressing buttons on the second controller controls the debug displays!

Some buttons also do things besides changing the display, such as increasing the speed of the game.

`zerucheck_key_check`

The last mystery is 0x4(zuruKeyCheck). It turns out that this value gets updated by the giant complex function shown before:

Using the Dolphin debugger, I was able to determine that the value checked by this function is a set of bits corresponding to button presses on the second controller. The button press trace is stored in a 16-bit value at 0x2(zuruKeyCheck). When there’s no controller plugged in, the value is 0x7638.

The 2 bytes containing flags for the controller 2 button presses are loaded and then updated near the beginning of zerucheck_key_check. The new value is passed in with register r4 by padmgr_HandleRetraceMsg when it calls the callback function.

Down near the end of zerucheck_key_check, there’s actually another place where 0x4(zuruKeyCheck) is updated. It didn’t appear in the list of cross-references because it’s using r3 as the base address, and we can only figure out what r3 is by looking at what it’s set to any time this function is about to be called.

At 8040ed88 the value of r4 is written to 0x4(zuruKeyCheck). It’s loaded from the same location and then XORd with 1 just before that. What this should do is toggle the value of the byte (really just the last bit) between 0 and 1. (If it’s 0, the result of XORing it with 1 will be 1. If it’s 1, the result will be 0. Look up the truth table for XOR to see this.)

I didn’t notice this behavior while watching the memory values before, but I’ll try breaking on this instruction in the debugger to see what’s happening. The original value is loaded at 8040ed7c.

Without touching any buttons on the controllers, I don’t hit this breakpoint during the title screen. To reach this code block, the value of r5 must be 0xb before the branch instruction that comes before it (8040ed74). Of the many different paths that lead up to that block, there’s one that will set r5 to 0xb before it, at 8040ed68.

Note that in order to reach the block that sets r5 to 0xB, r0 must have been equal to 0x1000 just before. Following the blocks up the chain to the beginning of the function, we can see the constraints necessary to reach this block:

8040ed74: r5 must be 0xB
8040ed60: r0 must be 0x1000
8040ebe8: r5 must be 0xA
8040ebe4: r5 must be less than 0x5B
8040eba4: r5 must be greater than 0x7
8040eb94: r6 must be 1
8040eb5c: r0 must not be 0
8040eb74: Port 2 button values must have changed

Here we reach the point where the old button values are loaded and the new values are stored. Afterwards there are a couple of operations applied to the new and old values:

old_vals = old_vals XOR new_vals
old_vals = old_vals AND new_vals

The XOR operation will mark all of the bits that have changed between the two values. The AND operation then masks the new input to unset any bits that are not currently set. The result in r0 is the set of new bits (button presses) in the new value. If it’s not empty, we’re on the right path.
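The same XOR/AND trick is easy to try out in Python. In this sketch, new_presses is an illustrative name, and the only button value used is the 0x1000 value discussed in this section.

```python
def new_presses(old_vals: int, new_vals: int) -> int:
    """Bits that are set in new_vals but were not set in old_vals."""
    changed = old_vals ^ new_vals   # every bit that differs between the two scans
    return changed & new_vals       # keep only the ones that are now pressed


print(hex(new_presses(0x0000, 0x1000)))  # 0x1000: button was just pressed
print(hex(new_presses(0x1000, 0x1000)))  # 0x0: button is being held
print(hex(new_presses(0x1000, 0x0000)))  # 0x0: button was just released
```

Holding a button or letting go of one produces an empty result, so only fresh presses advance the check.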

For r0 to be 0x1000, the 4th out of the 16 button trace bits must have just changed. By setting a breakpoint after the XOR/AND operation I can figure out which button press causes this: it’s the START button.

The next question is how to get r5 to start out as 0xA. r5 and r6 are loaded from 0x0(zuruKeyCheck) at the beginning of the key check function, and updated near the end when we don’t hit the code block that toggles 0x4(zuruKeyCheck).

There are a few places just before where r5 gets set to 0xA:

8040ed50
8040ed00
8040ed38

`8040ed38`

8040ed34: r0 must be 0x4000 (B button was pressed)
8040ebe0: r5 must be 0x5b
8040eba4: r5 must be greater than 0x7
same as before from here on…

r5 must start at 0x5b

`8040ed00`

8040ecfc: r0 must be 0xC000 (A and B pressed)
8040ebf8: r5 must be >= 9
8040ebf0: r5 must be less than 10
8040ebe4: r5 must be less than 0x5b
8040eba4: r5 must be greater than 0x7
same as before from here on…

r5 must start at 9

`8040ed50`

8040ed4c: r0 must be 0x8000 (A was pressed)
8040ec04: r5 must be less than 0x5d
8040ebe4: r5 must be greater than 0x5b
8040eba4: r5 must be greater than 0x7
same as before from here on…

r5 must start at 0x5c

It seems there’s some kind of state kept between button presses, and a certain sequence of button combos needs to be entered, ending with START. A and/or B appear to come just before START.

Following the code path that sets r5 to 9, a pattern emerges: r5 is an incrementing value that can either be increased when the correct button press value is found in r0, or reset to 0. The weirder cases where it’s not a value between 0x0 and 0xB occur when handling multi-button steps, such as pressing A and B at the same time. A person trying to input this combo usually isn’t going to press both buttons at the exact same time the pad trace occurs, so it has to handle either button being pressed before the other.
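The A+B handling can be sketched as a tiny state machine. This is a Python approximation, not the game’s code. The transitions into 0xA (from 9, 0x5b, and 0x5c) come from the constraints listed above. The transitions from 9 into 0x5b and 0x5c, and leaving the state unchanged when nothing new is pressed, are assumptions made to complete the picture. `step_ab` is a hypothetical name.

```python
A, B = 0x8000, 0x4000


def step_ab(state: int, pressed: int) -> int:
    """Advance the A+B step given the newly pressed bits; return the new state."""
    if pressed == 0:
        return state         # assumption: nothing new pressed, nothing changes
    if state == 9:
        if pressed == A | B:
            return 0xA       # both buttons landed in the same pad trace
        if pressed == A:
            return 0x5B      # assumption: A came first, now waiting for B
        if pressed == B:
            return 0x5C      # assumption: B came first, now waiting for A
    elif state == 0x5B and pressed == B:
        return 0xA
    elif state == 0x5C and pressed == A:
        return 0xA
    return 0                 # anything else resets the sequence


# A slightly before B still ends up in state 0xA.
print(hex(step_ab(step_ab(9, A), B)))  # 0xa
```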

Continuing with the different code paths:

r5 is set to 9 when RIGHT is pressed at 8040ece8.
r5 is set to 8 when C-stick right is pressed at 8040eccc.
r5 is set to 7 when C-stick left is pressed at 8040ecb0.
r5 is set to 6 when LEFT is pressed at 8040ec98.
r5 is set to 5 (and r6 to 1) when DOWN is pressed at 8040ec7c.
r5 is set to 4 when C-stick up is pressed at 8040ec64.
r5 is set to 3 when C-stick down is pressed at 8040ec48.
r5 is set to 2 when UP is pressed at 8040ec30.
r5 is set to 1 (and r6 to 1) when Z is pressed at 8040ec1c.

The sequence so far is:

Z, UP, C-DOWN, C-UP, DOWN, LEFT, C-LEFT, C-RIGHT, RIGHT, A+B, START

One more condition is checked before the Z check: while the newly pressed button must be Z, the current flags must be 0x2030, meaning the left and right bumpers must also be held (they have values of 0x10 and 0x20). Also, UP/DOWN/LEFT/RIGHT are the D-pad buttons, not the analog stick.
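To make the 0x2030 value concrete, here is a small Python sketch that decodes a flags value using only the bit values mentioned in this post. Z = 0x2000 is inferred from 0x2030 minus the two bumper bits. Which bumper is 0x10 and which is 0x20 isn’t stated, so they are labelled generically. `describe` is a hypothetical helper.

```python
# Bit values taken from the text; everything else is reported as unknown.
BUTTONS = {
    0x8000: "A",
    0x4000: "B",
    0x2000: "Z",            # inferred: 0x2030 without the bumper bits
    0x1000: "START",
    0x0020: "bumper (0x20)",
    0x0010: "bumper (0x10)",
}


def describe(flags: int) -> list[str]:
    """List the names of the known buttons set in flags."""
    names = [name for bit, name in BUTTONS.items() if flags & bit]
    unknown = flags & ~sum(BUTTONS)  # the keys are distinct bits, so sum == OR
    if unknown:
        names.append(f"unknown bits {unknown:#06x}")
    return names


print(describe(0x2030))  # ['Z', 'bumper (0x20)', 'bumper (0x10)']
print(describe(0xC000))  # ['A', 'B']
```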

The cheat code

The full combo is:

Hold L+R bumpers and press Z
D-UP
C-DOWN
C-UP
D-DOWN
D-LEFT
C-LEFT
C-RIGHT
D-RIGHT
A+B
START

It works! Attach a controller to the second port and enter the code, and the debug displays will show up. After that you can start pressing buttons on the second (or even third) controller to start doing things.

This combo will work without patching the version number of the game. You can even use this on a regular retail copy of the game without any cheat tools or console mods. Entering the combo a second time turns the zuru mode back off.
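Put together, the combo behaves like a position counter that advances on each correct press and resets on a wrong one, toggling the mode when the last step is reached. The Python sketch below models that with symbolic button names instead of the real bitmasks. It treats A+B as a single step and resets the counter after START, both of which are simplifying assumptions. `ComboTracker` is a hypothetical name.

```python
SEQUENCE = ["Z", "D-UP", "C-DOWN", "C-UP", "D-DOWN", "D-LEFT",
            "C-LEFT", "C-RIGHT", "D-RIGHT", "A+B", "START"]


class ComboTracker:
    def __init__(self):
        self.position = 0     # plays the role of r5
        self.enabled = False  # plays the role of the toggled zuru mode flag

    def press(self, button, held=frozenset()):
        """Feed one newly pressed button, plus the set of buttons being held."""
        ok = button == SEQUENCE[self.position]
        if self.position == 0:
            # The first step only counts while both bumpers are held.
            ok = ok and {"L", "R"} <= set(held)
        if not ok:
            self.position = 0  # wrong input: start over
            return
        self.position += 1
        if self.position == len(SEQUENCE):
            self.enabled = not self.enabled  # entering it again toggles it off
            self.position = 0                # assumption: counter restarts here


tracker = ComboTracker()
for button in SEQUENCE:
    tracker.press(button, held={"L", "R"})
print(tracker.enabled)  # True
```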

The “ZURU %d/%d” message in zurumode_callback is used to print out the status of this combination if you enter it when the disk ID is already 0x99 (presumably for debugging the cheat code itself). The first number is your current position in the sequence, matching r5. The second is set to 1 while certain buttons in the sequence are held down, which might correspond to when r6 is set to 1.

Most of the displays don’t explain what they are on the screen, so to figure out what they’re doing you have to find the functions that handle them. For example, the long line of blue and red asterisks that appears at the top of the screen is a set of placeholders for displaying the status of different quests. When a quest is active some numbers will appear there, indicating the state of the quest.

The black screen that shows up when you press Z is a console for printing debug messages, but specifically for low level stuff such as memory allocation and heap errors or other bad exceptions. The behavior of fault_callback_scroll suggests it may be for displaying those errors before the system is rebooted. I didn’t trigger any of these errors, but I was able to cause it to print a couple of garbage characters with some NOPs. I think this would be really useful for printing custom debug messages later on.

After doing all this, I found out that getting debug mode by patching the version ID to 0x99 is already known: https://tcrf.net/Animal_Crossing#Debug_Mode. (They also have some good notes on what the various displays are, and more things you can do using a controller in port 3.) As far as I can tell, the cheat combination has not been published yet, though.

That’s all for this post. There are still some more developer features that I’d like to explore, such as the debug map screen and NES emulator select screen, and how to activate them without using patches.

I’ll also be posting write-ups about reversing the dialog, event, and quest systems for the purpose of making mods.

Update: The slides for the talk I did on this can be found here.