A common technique for adding data to an existing Mach-O is to simply append the data to the end of the binary. This is the classic self-extracting archive trick, and it’s used by various tools for compiling/bundling scripting languages along with their interpreter into a single executable file. While this trick works, there is a major limitation: You cannot codesign the binary.
```
$ clang -o helloworld helloworld.c
```
Now you might be thinking: “What if I codesign it first, then append the data?” As you might expect, the signature is invalidated once data is appended (obviously it would be bad if two different files could have the same signature).
```
$ clang -o helloworld helloworld.c
```
The exact reason appended data makes a binary not strictly valid is somewhat obscure, but essentially the __LINKEDIT segment data must be at the end of the file. This is also the segment that gets modified to hold the code signature data itself.
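To make that rule concrete, here is a minimal Python sketch (standard library only; the helper name linkedit_is_last is hypothetical) that checks whether a thin, little-endian 64-bit Mach-O still ends with its __LINKEDIT data. Appending bytes to the file makes it return False. It only illustrates the layout rule and is not a reimplementation of codesign's own validation.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O magic, read little-endian (Intel and Apple Silicon)
LC_SEGMENT_64 = 0x19


def linkedit_is_last(data: bytes) -> bool:
    """Return True if the __LINKEDIT segment's file data ends exactly at EOF."""
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _res = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O")
    off = 32  # load commands start right after the 32-byte mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, off)
        if cmd == LC_SEGMENT_64:
            # segname, vmaddr, vmsize, fileoff, filesize follow cmd/cmdsize
            segname, _vmaddr, _vmsize, fileoff, filesize = struct.unpack_from(
                "<16sQQQQ", data, off + 8)
            if segname.rstrip(b"\0") == b"__LINKEDIT":
                return fileoff + filesize == len(data)
        off += cmdsize
    raise ValueError("no __LINKEDIT segment found")
```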
So if we can’t append more arbitrary data after the __LINKEDIT segment data, where can we put it so that the binary is strictly valid? Well, if you read the title, you probably already know the answer: in a new segment.
Getting Started
To get started on this project, let’s make some sample code to test with. In my case, I’ll just make some C code that gets a pointer to the Mach-O header in memory, finds my __CUSTOM,__custom segment and section, and writes its data to stdout. You could also read the Mach-O binary from disk, but this is more efficient since the segment will already be mapped into memory. I should note that this code uses only the 64-bit structures (you could add 32-bit support, but 32-bit code on macOS is obsolete at this point anyhow) and was only tested on 64-bit Intel and Apple Silicon binaries.
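For comparison with the read-from-disk alternative mentioned above, here is a minimal Python sketch (not the author's in-memory C code) that walks the load commands of a file's bytes and returns the contents of a named section. It assumes a thin, little-endian 64-bit Mach-O, and read_section is a hypothetical name.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
LC_SEGMENT_64 = 0x19
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")  # struct section_64, 80 bytes


def read_section(data: bytes, segname: str, sectname: str):
    """Return the raw bytes of segname,sectname from a thin 64-bit Mach-O, or None."""
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _res = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O")
    off = 32  # first load command follows mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, off)
        if cmd == LC_SEGMENT_64:
            nsects = struct.unpack_from("<I", data, off + 64)[0]
            first = off + 72  # section_64 entries follow the 72-byte segment_command_64
            for i in range(nsects):
                fields = SECTION_64.unpack_from(data, first + i * SECTION_64.size)
                sname, gname, _addr, size, offset = fields[:5]
                if (gname.rstrip(b"\0") == segname.encode()
                        and sname.rstrip(b"\0") == sectname.encode()):
                    return data[offset:offset + size]
        off += cmdsize
    return None
```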
Now if we compile and run our binary we see nothing is output since we haven’t added our custom segment yet.
```
$ clang -o main main.c
```
In order to test our code before we get to manually modifying our binary, we can have clang link in a new segment of seemingly unused data for us to work with.
```c
__attribute__((section("__CUSTOM,__custom")))
```

```
$ clang -o main-data main.c data.c
```
That string we forced into our custom segment gets printed to stdout (including the null byte, naturally, since it’s a C-string).
Alright, so now we know our C code can work, if we can just insert a new segment after compiling and linking the binary.
Modifying Existing Binaries
Now for the tricky part: what do we need to do to insert a new segment into a Mach-O? Again, the knowledge is somewhat obscure, but knowing that __LINKEDIT data must be last, reading through the loader.h kernel header, and comparing it to some real Mach-O examples makes it clear.
- The __LINKEDIT segment data must be at the end of the file.
  - Actually, dyld will reject any binary with segment data after __LINKEDIT.
- We need to add a new segment_command_64 into the header.
  - Such a command would normally be inserted before the __LINKEDIT command by clang, though this is not currently a hard requirement (we will do what clang does below anyway).
- We will have to shift the __LINKEDIT segment data down to make room for our new segment data, and also shift the offsets and addresses in its command.
- There are a few other load commands which can reference data within the __LINKEDIT segment, which might also need shifting:
  - dyld_info_command
  - symtab_command
  - dysymtab_command
  - linkedit_data_command
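The last step boils down to adding the same delta to every nonzero file offset those commands store. Here is a minimal Python sketch of just that step, assuming a thin little-endian 64-bit binary and a raw buffer holding only the load commands; shift_linkedit_offsets is a hypothetical name, and the __LINKEDIT segment command itself is left to the caller. The command list covers the four structures above plus common linkedit_data_command users, but is not exhaustive.

```python
import struct

LC_SYMTAB, LC_DYSYMTAB = 0x2, 0xB
LC_DYLD_INFO, LC_DYLD_INFO_ONLY = 0x22, 0x80000022
# linkedit_data_command users: LC_CODE_SIGNATURE, LC_SEGMENT_SPLIT_INFO,
# LC_FUNCTION_STARTS, LC_DATA_IN_CODE, LC_DYLIB_CODE_SIGN_DRS,
# LC_LINKER_OPTIMIZATION_HINT, LC_DYLD_EXPORTS_TRIE, LC_DYLD_CHAINED_FIXUPS
LINKEDIT_DATA_CMDS = (0x1D, 0x1E, 0x26, 0x29, 0x2B, 0x2E, 0x80000033, 0x80000034)

# Byte positions of the uint32 file-offset fields inside each command.
OFFSET_FIELDS = {
    LC_SYMTAB: (8, 16),                     # symoff, stroff
    LC_DYSYMTAB: (32, 40, 48, 56, 64, 72),  # tocoff, modtaboff, extrefsymoff,
                                            # indirectsymoff, extreloff, locreloff
    LC_DYLD_INFO: (8, 16, 24, 32, 40),      # rebase, bind, weak_bind, lazy_bind, export
    LC_DYLD_INFO_ONLY: (8, 16, 24, 32, 40),
    **{c: (8,) for c in LINKEDIT_DATA_CMDS},  # dataoff
}


def shift_linkedit_offsets(cmds: bytearray, ncmds: int, delta: int) -> None:
    """Add delta to every nonzero __LINKEDIT file offset in a raw load-command buffer."""
    off = 0
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", cmds, off)
        for field in OFFSET_FIELDS.get(cmd, ()):
            value = struct.unpack_from("<I", cmds, off + field)[0]
            if value:  # 0 means the table is absent, so leave it alone
                struct.pack_into("<I", cmds, off + field, value + delta)
        off += cmdsize
```

Zero offsets are skipped on purpose: loader.h uses 0 for "no such table", and shifting it would point the loader at garbage.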
Alright, onto the code. For simplicity, I will use Python with the macholib module for my proof of concept. As there are only a handful of structures you need to parse, this could be done fairly easily in other languages without the need for full Mach-O parsing. I also skipped FAT binary support, though it wouldn’t be too hard to add.
```python
#!/usr/bin/env python3
```
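As a rough, dependency-free illustration of the same steps (not the author's macholib script), here is a sketch of a hypothetical insert_segment function that uses only struct. It assumes a thin, little-endian 64-bit Mach-O with enough zero padding after the load commands, aligns the new segment to 16 KiB pages (also a multiple of the 4 KiB pages on Intel), marks it read-only, and sets the section align field to 0; the post's own update about the align value suggests that field may need more care for larger data. It only rewrites load-command fields and does not look inside the __LINKEDIT data, so formats that keep per-segment tables there (such as chained fixups) may need extra handling.

```python
import struct

MH_MAGIC_64, LC_SEGMENT_64 = 0xFEEDFACF, 0x19
SEG = struct.Struct("<II16sQQQQiiII")      # segment_command_64, 72 bytes
SECT = struct.Struct("<16s16sQQIIIIIIII")  # section_64, 80 bytes
# Byte positions of uint32 file offsets that point into __LINKEDIT: LC_SYMTAB,
# LC_DYSYMTAB, LC_DYLD_INFO(_ONLY), then linkedit_data_command users such as
# LC_CODE_SIGNATURE and LC_DYLD_CHAINED_FIXUPS. A common subset, not exhaustive.
LINKEDIT_FIELDS = {0x2: (8, 16), 0xB: (32, 40, 48, 56, 64, 72),
                   0x22: (8, 16, 24, 32, 40), 0x80000022: (8, 16, 24, 32, 40),
                   **{c: (8,) for c in (0x1D, 0x1E, 0x26, 0x29, 0x2B, 0x2E,
                                        0x80000033, 0x80000034)}}


def insert_segment(data: bytes, segname: str, sectname: str, payload: bytes,
                   page_size: int = 0x4000) -> bytes:
    """Return a copy of a thin 64-bit Mach-O with payload in a new one-section
    segment placed just before __LINKEDIT."""
    seg_b, sect_b = segname.encode(), sectname.encode()
    if len(seg_b) > 16 or len(sect_b) > 16:
        raise ValueError("segment and section names are limited to 16 bytes")
    magic, cpu, sub, ftype, ncmds, sizeofcmds, flags, res = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("only thin, 64-bit, little-endian Mach-O files are handled")
    cmds = bytearray(data[32:32 + sizeofcmds])

    # Pass 1: find the __LINKEDIT command and the lowest section file offset,
    # which bounds the padding available after the load commands.
    off, le_cmd, first_sect = 0, None, len(data)
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", cmds, off)
        if cmd == LC_SEGMENT_64:
            seg = SEG.unpack_from(cmds, off)
            if seg[2].rstrip(b"\0") == b"__LINKEDIT":
                le_cmd = off
            for i in range(seg[9]):  # seg[9] is nsects
                s_off = SECT.unpack_from(cmds, off + SEG.size + i * SECT.size)[4]
                if s_off:
                    first_sect = min(first_sect, s_off)
        off += cmdsize
    new_size = SEG.size + SECT.size
    pad_start = 32 + sizeofcmds
    if le_cmd is None:
        raise ValueError("no __LINKEDIT segment found")
    if pad_start + new_size > first_sect or any(data[pad_start:pad_start + new_size]):
        raise ValueError("not enough zero padding for another load command")

    le = SEG.unpack_from(cmds, le_cmd)
    le_vmaddr, le_fileoff = le[3], le[5]
    grow = -(-len(payload) // page_size) * page_size  # payload rounded up to whole pages

    # Pass 2: move __LINKEDIT (vmaddr at +24, fileoff at +40) and everything pointing into it.
    struct.pack_into("<Q", cmds, le_cmd + 24, le_vmaddr + grow)
    struct.pack_into("<Q", cmds, le_cmd + 40, le_fileoff + grow)
    off = 0
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", cmds, off)
        for field in LINKEDIT_FIELDS.get(cmd, ()):
            value = struct.unpack_from("<I", cmds, off + field)[0]
            if value:  # 0 means the table is absent
                struct.pack_into("<I", cmds, off + field, value + grow)
        off += cmdsize

    # The new read-only segment takes over __LINKEDIT's old address and file offset.
    new_cmd = SEG.pack(LC_SEGMENT_64, new_size, seg_b, le_vmaddr, grow,
                       le_fileoff, grow, 1, 1, 1, 0)
    # align, reloff, nreloc, flags, reserved1-3 are all 0 (align=0 is an assumption)
    new_cmd += SECT.pack(sect_b, seg_b, le_vmaddr, len(payload), le_fileoff,
                         0, 0, 0, 0, 0, 0, 0)
    cmds[le_cmd:le_cmd] = new_cmd  # placed before __LINKEDIT, as clang orders it

    header = struct.pack("<8I", magic, cpu, sub, ftype, ncmds + 1,
                         sizeofcmds + new_size, flags, res)
    # The new command consumes new_size bytes of the padding after the old commands.
    return (header + bytes(cmds) + data[pad_start + new_size:le_fileoff]
            + payload.ljust(grow, b"\0") + data[le_fileoff:])
```

Calling it with the bytes of main, the names __CUSTOM and __custom, and the bytes of the payload file would produce the modified image. Once written out, the file still has to be made executable and re-signed, since any existing LC_CODE_SIGNATURE now covers stale data (for example with codesign -fs -).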
I also made a simple text file to hold the data I’m going to append (in my case without the trailing newline character).
```
Hello, World!
```
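Many editors add a final newline automatically, so one way to create the payload file byte for byte is a short Python snippet (a minimal example; sectfile.txt is the name passed to the script later in the post):

```python
from pathlib import Path

# Write the payload exactly as given, with no trailing newline.
Path("sectfile.txt").write_bytes(b"Hello, World!")
assert not Path("sectfile.txt").read_bytes().endswith(b"\n")
```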
Now we just have to run it like so to make a modified copy of our main binary from above.
For Intel, you can just do this:
```
$ ./appendsection.py main main-appended __CUSTOM __custom sectfile.txt
```
For Apple Silicon, codesigning is required, so you have to do this:
```
$ cp main main.tmp
```
Then you can see it in action:
```
$ ./main-appended
```
Nice, our segment data was printed exactly as we created it in the file!
Now for the real moment of truth, can we codesign it?
```
$ codesign -fs - main-appended
```
Yep, signing worked perfectly! Of course, if you’re using Apple Silicon, you already know this because you had to sign it above.
This post was updated on 2022-09-22 to set the correct align value on larger section data and to add info for Apple Silicon usage.