blog.nebk.xyz

Random knowledge, projects, and musings by Sam Ellicott

08 Sep 2023

Binary Output from a C/C++ Function

This is a repost from my old site.

Just a quick post to demonstrate a script that can be used to generate the binary form of a C/C++ function on Linux. It prints both the disassembled code and a “C array” (array of hexadecimal bytes suitable for a .h file) of the function.

Motivation - Testing Executable Stacks

Update: I was asked why I would care about getting the machine code for a compiled function. This code came out of a discussion in cnlohr’s Discord server about executing code on the stack, and about whether Linux allows that by default. The easiest way to test our assumptions was to take the machine code of a simple precompiled C function, copy it into an array on the stack, cast that array to a function pointer, call it, and see what happens.

Basically, can we do the following?

// some code here...

// typedef for casting our precompiled buffer to a function
typedef int (*IntFuncHandle)(void);
int foo(void) {
    // buffer of compiled code that matches a particular function definition
    // i.e. returns an int and takes no arguments
    unsigned char code_array[] = {/* compiled code here */};
    // cast to function pointer 
    IntFuncHandle stack_code = (IntFuncHandle) code_array;
    // call the function
    return stack_code();
}
// more code here...

This is mostly useful as a thought experiment on how a just-in-time (JIT) compiler would work (or for writing viruses).
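For comparison, a JIT usually does not run code from the stack. A common pattern on Linux is to allocate a fresh page with mmap, copy the generated machine code into it, and then use mprotect to swap write permission for execute permission before calling it. The sketch below shows that pattern under a few assumptions: it targets x86-64 Linux, it reuses the same intFunc bytes as code_buff in the program further down, and the names run_from_mmap and int_func_code are made up for illustration.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS in strict C modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*IntFuncHandle)(void);

/* Hypothetical example bytes (x86-64 only): machine code for
 * `int intFunc(void) { return 42; }`, the same bytes as code_buff below. */
const unsigned char int_func_code[] = {0x55, 0x48, 0x89, 0xe5, 0xb8, 0x2a,
                                       0x00, 0x00, 0x00, 0x5d, 0xc3};

/* Hypothetical helper: copy `len` bytes of machine code into a new page,
 * make the page executable, call it as an int(void) function, and store
 * the result. Returns 0 on success, -1 if the page could not be set up. */
int run_from_mmap(const unsigned char *code, size_t len, int *result) {
  long page_size = sysconf(_SC_PAGESIZE);
  assert(page_size > 0 && len <= (size_t)page_size);

  /* Step 1: an anonymous read/write page to copy the code into. */
  void *page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED) return -1;
  memcpy(page, code, len);

  /* Step 2: remove write access and add execute access. */
  if (mprotect(page, (size_t)page_size, PROT_READ | PROT_EXEC) != 0) {
    munmap(page, (size_t)page_size);
    return -1;
  }

  /* Step 3: treat the page as a function and call it. */
  IntFuncHandle fn = (IntFuncHandle)page;
  *result = fn();

  munmap(page, (size_t)page_size);
  return 0;
}
```

Calling run_from_mmap(int_func_code, sizeof int_func_code, &r) should leave 42 in r without needing -z execstack, because only that one page is ever made executable. Mapping the page writable first and executable afterwards, rather than both at once, means it is never writable and executable at the same time.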

We can make the stack experiment concrete with a complete program and a real precompiled function.

#include <stdint.h>
#include <stdio.h>

typedef int (*IntFuncHandle)(void);
int main(void) {
  /* compiled code for the function
   * int intFunc(void) { return 42; }
   *
   * <intFunc>:
   * 1119:	55                   	push   %rbp
   * 111a:	48 89 e5             	mov    %rsp,%rbp
   * 111d:	b8 2a 00 00 00       	mov    $0x2a,%eax
   * 1122:	5d                   	pop    %rbp
   * 1123:	c3                   	ret
   */
  // machine code for intFunc, copied from the disassembly above
  unsigned char code_buff[] = {0x55, 0x48, 0x89, 0xe5, 0xb8, 0x2a,
                               0x00, 0x00, 0x00, 0x5d, 0xc3};
  // treat the code as a function
  IntFuncHandle stack_code = (IntFuncHandle)code_buff;
  // call the function
  int ret = stack_code();
  // print the returned value
  printf("return value: %d\n", ret);
  return 0;
}


Now, we can compile the code and see what happens.

$ gcc exe_stack_test.c; ./a.out
Segmentation fault (core dumped)

Interesting: by default the stack is not executable, so jumping into the buffer crashes the program (probably a good thing, because, you know, viruses). That raises the question of whether this is possible at all. As you might expect, since I am posing the question, the answer is yes. After looking through the gcc manual, I found that the option -z execstack (which gcc passes through to the linker) does what we want; recompiling with that option produces the expected result.

$ gcc -z execstack exe_stack_test.c; ./a.out
return value: 42
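To double-check what the two builds did without relying on a crash, you can ask the kernel directly: on Linux, /proc/self/maps lists every mapping of the current process with its permission bits, and the main thread’s stack is labeled [stack]. Here is a small sketch (stack_is_executable is a hypothetical name) that finds that line and checks the execute bit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report whether the main thread's stack is mapped
 * executable. Returns 1 if it is, 0 if it is not, and -1 if
 * /proc/self/maps cannot be read or has no [stack] entry. */
int stack_is_executable(void) {
  FILE *maps = fopen("/proc/self/maps", "r");
  if (maps == NULL) return -1;

  char line[4096];
  int result = -1;
  while (fgets(line, sizeof line, maps) != NULL) {
    if (strstr(line, "[stack]") == NULL) continue;

    /* Lines look like "start-end perms offset dev inode path", where
     * perms is four characters such as "rw-p" or "rwxp". */
    char perms[5] = {0};
    if (sscanf(line, "%*s %4s", perms) == 1 && strlen(perms) == 4)
      result = (perms[2] == 'x') ? 1 : 0;
    break;
  }

  fclose(maps);
  return result;
}
```

Printing the result from the test program is expected to show the execute bit clear for the plain gcc build and set for the -z execstack build. The linker records the same choice in the binary’s GNU_STACK program header, which readelf -l shows with flags RW or RWE.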

Neat! Now let’s look into how we were able to extract the precompiled C function.

Reading Compiled Opcodes

#!/bin/bash

# print_function_binary functionName binaryName
# Author: Samuel Ellicott 2022-11-16
# Outputs the disassembled and hex form of a C function to the command line

function_name=$1
binary_name=$2

# Get the function's virtual address and length from the symbol table
function_entry=$(objdump -t "$binary_name" | grep -P "$function_name([^@]|$)" | grep '\.text')
output=$(echo $function_entry | sed -En -e 's/^(\w*)(\s*[.[:alnum:]]*){3}\s*(\w*).*$/0x\1 0x\3/p')
read -r begin length <<<"$output"
end=$(( begin + length ))

# Get the base virtual address of the first LOAD segment; subtracting it from
# the function's address gives the function's offset in the file
# (assumes .text has the same address-to-offset delta as that segment)
code_offset=$(readelf -l "$binary_name" | grep LOAD | head -n 1 | sed -En -e 's/^\s*LOAD\s*\w*\s*(\w*).*$/\1/p')

file_begin=$(( begin - code_offset ))

echo "Disassembled Output"
objdump -d --start-address="$begin" --stop-address="$end" "$binary_name" | tail -n +7
echo ""

echo "C Output"
xxd -i -s "$file_begin" -l "$length" "$binary_name"
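The script chains three steps: objdump -t reports the function’s virtual address and size, readelf -l supplies the base address of the first LOAD segment so that address can be turned into a file offset, and xxd -i dumps that many bytes from that offset as a C array. For example, assuming the sample binary below has its first LOAD segment at 0x400000 (typical for older non-PIE x86-64 executables), intFunc at 0x40050d would start at file offset 0x50d. The same lookup can be sketched in C with the structures from <elf.h>. This is a minimal sketch, not a replacement for the script: it assumes a 64-bit, unstripped ELF file (it only reads .symtab), does only basic bounds checking, and computes the file offset from the symbol’s own section header instead of from the first LOAD segment. The names read_function_bytes, extract_function, and print_c_array are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find the function `name` in the .symtab of an in-memory 64-bit ELF image
 * and return a malloc'd copy of its bytes (caller frees), or NULL. */
static unsigned char *extract_function(const unsigned char *file, size_t len,
                                       const char *name, size_t *size) {
  const Elf64_Ehdr *eh = (const Elf64_Ehdr *)file;
  if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
      eh->e_ident[EI_CLASS] != ELFCLASS64 ||
      eh->e_shoff + (size_t)eh->e_shnum * sizeof(Elf64_Shdr) > len)
    return NULL;

  const Elf64_Shdr *sh = (const Elf64_Shdr *)(file + eh->e_shoff);
  for (size_t i = 0; i < eh->e_shnum; i++) {
    /* Only the full symbol table; stripped binaries do not have one. */
    if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh->e_shnum ||
        sh[i].sh_offset + sh[i].sh_size > len)
      continue;
    const Elf64_Sym *syms = (const Elf64_Sym *)(file + sh[i].sh_offset);
    const char *strtab = (const char *)(file + sh[sh[i].sh_link].sh_offset);
    size_t count = sh[i].sh_size / sizeof(Elf64_Sym);

    for (size_t j = 0; j < count; j++) {
      const Elf64_Sym *s = &syms[j];
      if (ELF64_ST_TYPE(s->st_info) != STT_FUNC || s->st_size == 0 ||
          s->st_shndx == SHN_UNDEF || s->st_shndx >= eh->e_shnum ||
          strcmp(strtab + s->st_name, name) != 0)
        continue;

      /* Same idea as `begin - code_offset` in the script, but using the
       * symbol's own section: offset = sh_offset + (st_value - sh_addr). */
      const Elf64_Shdr *sec = &sh[s->st_shndx];
      size_t offset = sec->sh_offset + (s->st_value - sec->sh_addr);
      if (offset + s->st_size > len) return NULL;

      unsigned char *out = malloc(s->st_size);
      if (out == NULL) return NULL;
      memcpy(out, file + offset, s->st_size);
      *size = s->st_size;
      return out;
    }
  }
  return NULL;
}

/* Load the ELF file at `path` and extract the function `name` from it. */
unsigned char *read_function_bytes(const char *path, const char *name,
                                   size_t *size) {
  FILE *f = fopen(path, "rb");
  if (f == NULL) return NULL;
  if (fseek(f, 0, SEEK_END) != 0) {
    fclose(f);
    return NULL;
  }
  long file_len = ftell(f);
  rewind(f);

  unsigned char *file = file_len > 0 ? malloc((size_t)file_len) : NULL;
  size_t got = file != NULL ? fread(file, 1, (size_t)file_len, f) : 0;
  fclose(f);

  unsigned char *result = NULL;
  if (file != NULL && got == (size_t)file_len)
    result = extract_function(file, (size_t)file_len, name, size);
  free(file);
  return result;
}

/* Print bytes in the same layout as `xxd -i`: 12 per line. */
void print_c_array(const char *var, const unsigned char *bytes, size_t n) {
  printf("unsigned char %s[] = {", var);
  for (size_t i = 0; i < n; i++)
    printf("%s0x%02x", i == 0 ? "\n  " : (i % 12 == 0 ? ",\n  " : ", "),
           bytes[i]);
  printf("\n};\nunsigned int %s_len = %zu;\n", var, n);
}
```

For example, read_function_bytes("a.out", "intFunc", &n) followed by print_c_array("a_out", bytes, n) should print an array in the same shape as the script’s C Output. Unlike the script’s grep, this matches the symbol name exactly, so a mangled C++ name such as _Z7intFuncif has to be passed in full.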


We will now use this piece of example code to test the script:

// main.cpp
int intFunc(int a, float b) {
    return a;
}

int main(void) {
    int ret_val;
    ret_val = intFunc(42, 0.0f);
    return ret_val;
}

Compile the example code as C++:

$ g++ main.cpp

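We can now extract intFunc as a set of opcodes. The symbol shows up under its mangled C++ name, _Z7intFuncif; the script still finds it because its grep pattern only requires the name to be followed by a character other than @ or by the end of the line.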

$ ./print_function_binary intFunc a.out
Disassembled Output
000000000040050d <_Z7intFuncif>:
  40050d:	55                   	push   %rbp
  40050e:	48 89 e5             	mov    %rsp,%rbp
  400511:	89 7d fc             	mov    %edi,-0x4(%rbp)
  400514:	f3 0f 11 45 f8       	movss  %xmm0,-0x8(%rbp)
  400519:	8b 45 fc             	mov    -0x4(%rbp),%eax
  40051c:	5d                   	pop    %rbp
  40051d:	c3                   	retq   

C Output
unsigned char a_out[] = {
  0x55, 0x48, 0x89, 0xe5, 0x89, 0x7d, 0xfc, 0xf3, 0x0f, 0x11, 0x45, 0xf8,
  0x8b, 0x45, 0xfc, 0x5d, 0xc3
};
unsigned int a_out_len = 17;

We can do the same for main:

$ ./print_function_binary main a.out
Disassembled Output
000000000040051e <main>:
  40051e:	55                   	push   %rbp
  40051f:	48 89 e5             	mov    %rsp,%rbp
  400522:	48 83 ec 10          	sub    $0x10,%rsp
  400526:	0f 57 c0             	xorps  %xmm0,%xmm0
  400529:	bf 2a 00 00 00       	mov    $0x2a,%edi
  40052e:	e8 da ff ff ff       	callq  40050d <_Z7intFuncif>
  400533:	89 45 fc             	mov    %eax,-0x4(%rbp)
  400536:	8b 45 fc             	mov    -0x4(%rbp),%eax
  400539:	c9                   	leaveq 
  40053a:	c3                   	retq   

C Output
unsigned char a_out[] = {
  0x55, 0x48, 0x89, 0xe5, 0x48, 0x83, 0xec, 0x10, 0x0f, 0x57, 0xc0, 0xbf,
  0x2a, 0x00, 0x00, 0x00, 0xe8, 0xda, 0xff, 0xff, 0xff, 0x89, 0x45, 0xfc,
  0x8b, 0x45, 0xfc, 0xc9, 0xc3
};
unsigned int a_out_len = 29;
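The main output also shows why bytes copied out this way are not always position independent. The call to intFunc is encoded as e8 da ff ff ff: opcode 0xe8 followed by a signed 32-bit little-endian displacement measured from the end of the 5-byte call instruction. Here 0xffffffda is -0x26, and 0x400533 - 0x26 = 0x40050d, exactly where objdump places intFunc. Pasted into a stack buffer at some other address, the same bytes would call an address 0x26 bytes before the end of the copied call instruction rather than intFunc. A small decoding sketch follows; call_rel32_target is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the destination of an x86-64 `call rel32` instruction (opcode
 * 0xe8). `call_addr` is the address of the call itself; the displacement
 * is relative to the next instruction, which starts 5 bytes later. */
uint64_t call_rel32_target(uint64_t call_addr, const unsigned char insn[5]) {
  assert(insn[0] == 0xe8);

  /* Assemble the little-endian 32-bit displacement... */
  uint32_t raw = (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
                 ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);
  /* ...and sign-extend it without relying on implementation-defined casts. */
  int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                     : (int64_t)raw;

  /* Unsigned wrap-around makes adding a negative displacement work. */
  return call_addr + 5 + (uint64_t)disp;
}
```

With the bytes from the listing, call_rel32_target(0x40052e, (const unsigned char[]){0xe8, 0xda, 0xff, 0xff, 0xff}) returns 0x40050d, matching the callq 40050d line above.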