• Function Calling Conventions

    2008-03-29

    Original link: http://www.delorie.com/djgpp/doc/ug/asm/calling.html

    GCC follows certain rules in generating and calling its functions. If you are writing portable C or C++ code, you never need to know about these rules. However, if you are writing assembly language or nonportable code that depends on these rules, you need to know what they are. This document attempts to describe them and gives some examples.

    Notes

    This document assumes a familiarity with assembly language. The assembler code used here is written in the AT&T syntax, as used by GNU as. If you're using an Intel-syntax assembler, like nasm, you'll have to translate appropriately.

    This document describes GCC's standard calling conventions. Many of them can be changed with options like -mregparm, but that is outside the scope of this document.

    These conventions apply to C. C++ introduces several additional complications (such as class pointers and name mangling), some of which can change between compiler versions. Thus, I suggest that asm functions called from C++ code be declared as extern "C". This will cause C calling conventions to be used.

    Writing Assembly-Language Functions

    Naming

    In DJGPP, a function's assembly-language name is the same as its C name, with an underscore ("_") prepended. Thus, the C function foo would be named _foo in assembly language. (This is in fact true for all symbol names, including variables.) C++ has much more complicated rules.
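
    The prefix can also be inspected from C. The sketch below is only an illustration, assuming a GCC-compatible compiler that predefines __USER_LABEL_PREFIX__; the helper name asm_symbol_name and the STR macros are hypothetical, not part of DJGPP.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Two-step stringification so the macro is expanded before it is quoted. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* GCC predefines __USER_LABEL_PREFIX__ as the text it prepends to C symbol
   names: `_` on DJGPP, empty on most ELF targets. */
#ifndef __USER_LABEL_PREFIX__
#define __USER_LABEL_PREFIX__   /* assumption: no prefix if the compiler lacks the macro */
#endif

/* Hypothetical helper: write the assembler-level name of the C symbol
   `c_name` into `out`. Returns the number of characters written. */
int asm_symbol_name(const char *c_name, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s%s", STR(__USER_LABEL_PREFIX__), c_name);
    assert(n >= 0 && (size_t)n < out_size);   /* the buffer must be large enough */
    return n;
}
```

    On a DJGPP build this would be expected to yield _foo for the input foo; on targets with an empty prefix the name comes back unchanged.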

    Registers

    GCC requires that some registers not change across a function call. If you want to use these registers in an assembly function, you must save and restore their values. They are:

    • %ebx
    • %esi
    • %edi
    • %ebp (see Footnotes)
    • The segment registers %ds, %es and %ss

    Other registers are available for your use (though some have other special uses; read on).

    Return Value

    • Integers (of any size up to 32 bits) and pointers are returned in the %eax register.
    • Floating-point values are returned in the 387 top-of-stack register, %st(0).
    • Return values of type long long int are returned in %edx:%eax (the most significant word in %edx and the least significant in %eax).
    • Returning a structure is complicated and rarely useful; try to avoid it. (Note that this is different from returning a pointer to a structure.)

    If your function returns void (i.e. no value), the contents of these registers are not used.
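
    To make the %edx:%eax split concrete, here is a minimal sketch in portable C of how a 64-bit value divides into the two halves and how a caller would reassemble it. The function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The half of a 64-bit return value that would travel in %eax. */
uint32_t low_word(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}

/* The half that would travel in %edx. */
uint32_t high_word(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Reassemble the value the way a caller reads %edx:%eax. */
uint64_t join_words(uint32_t edx, uint32_t eax)
{
    uint64_t v = ((uint64_t)edx << 32) | eax;
    assert(high_word(v) == edx && low_word(v) == eax);  /* round-trip sanity check */
    return v;
}
```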

    Memory Model

    Very simple; all pointers and addresses are near. You need not worry about segments (unless your asm code has a specific need to do so). Your function should end with a simple ret.

    Stack Layout

    When GCC calls your function, it pushes all its arguments onto the stack, starting with the last one, then issues a call. This means that, on entry to your function, the stack is laid out like this:

                Last argument
                ...
    4(%esp)     First argument
    (%esp)      Return address

    Sizes and layouts of individual arguments are as follows:

    • Integers up to 32 bits and pointers are pushed as a single longword.
    • long long int is pushed as two longwords; the least significant is pushed last (and so is located first in memory).
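    • double is pushed as a double-precision value, occupying 8 bytes. A float is promoted to double (and so also occupies 8 bytes) when passed to a variadic or unprototyped function; with a prototype in scope, it is pushed as a 4-byte single-precision value.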
    • long double is pushed as an extended-precision value followed by 2 bytes of padding, totalling 12 bytes.
    • As before, structures are more complicated and best avoided.
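
    The sizes above determine where each argument sits relative to %esp on entry. The following is a small sketch of that arithmetic, not anything GCC itself provides; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Argument kinds covered by the list above (illustrative names). */
enum arg_kind { ARG_INT_OR_PTR, ARG_LONG_LONG, ARG_DOUBLE, ARG_LONG_DOUBLE };

/* Bytes each kind occupies on the stack, per the list above. */
size_t arg_stack_size(enum arg_kind k)
{
    switch (k) {
    case ARG_INT_OR_PTR:  return 4;   /* one longword */
    case ARG_LONG_LONG:   return 8;   /* two longwords */
    case ARG_DOUBLE:      return 8;   /* double precision */
    case ARG_LONG_DOUBLE: return 12;  /* 10 bytes of data + 2 bytes of padding */
    }
    assert(0 && "unknown argument kind");
    return 0;
}

/* Offset from %esp, on entry, of argument number `index` (0-based).
   The return address sits at 0(%esp), so the first argument is at 4. */
size_t arg_offset(const enum arg_kind *kinds, size_t index)
{
    size_t off = 4;
    for (size_t i = 0; i < index; i++)
        off += arg_stack_size(kinds[i]);
    return off;
}
```

    For two long long arguments this gives offsets 4 and 12, and for two long double arguments 4 and 16.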

    These rules also apply to functions which take a variable number of arguments (like printf). As with any variadic function, the function must find its own way of determining how many arguments were actually passed (usually based on one of the required args).
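
    As an illustration of using a required argument to determine the count, here is a minimal variadic function in C. The name sum_ints is hypothetical; it simply trusts its first parameter to say how many ints follow.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments. The required first parameter is the only way
   the function can know how many values the caller pushed after it. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

    If the caller passes fewer values than count claims, the function reads whatever happens to lie further up the stack, which is why the count must be reliable.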

    The stack below the return address is available for temporary storage, but you must first decrement %esp to reserve the space: memory below %esp may be overwritten asynchronously by interrupt handlers and the like. Restore %esp to its original value before exiting, so that the ret works correctly. You may also push and pop at will, as long as the pushes and pops balance.

    You may modify your arguments in place if you wish; they will not be reused by the caller. Do not, however, attempt to pop them; the caller handles this.
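
    The same freedom is visible at the C level: a parameter is the callee's private copy. A brief sketch, with a hypothetical function name, that uses its own parameter as a loop counter:

```c
#include <assert.h>

/* Adds n + (n-1) + ... + 1, counting down with the parameter itself.
   The parameter is the callee's own copy, so the caller's variable is
   unaffected by the decrements. */
int sum_down_from(int n)
{
    int total = 0;

    assert(n >= 0);
    while (n > 0)
        total += n--;   /* modifies the parameter, not the caller's value */
    return total;
}
```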

    Calling C Functions From Assembly Language

    An assembly language function may wish to call a function written in C, either your own or one from the standard library. The same rules already explained apply; you just see them from the other side.

    First, you push the function's arguments (if any) onto the stack, last argument first. See above for the formats used. (Floating-point values are usually most easily handled by making space on the stack and then executing a store instruction; e.g. subl $8,%esp; fstpl (%esp).)

    Use a simple call instruction to call the function.

    You are responsible for removing the arguments you have pushed. The callee may have changed them, so you may not reuse them. You need not, however, discard them at once; when calling several functions, it may be more convenient to leave the arguments on the stack and remove them all together at the end. addl $n, %esp, where n is the total number of bytes pushed, is an efficient way to do this. It may also be convenient in this case to use %ebp as a frame pointer, since its value stays fixed while %esp moves. (The C compiler does this.)

    The return value may be found as detailed above.

    Expect the registers %eax, %ecx, and %edx, as well as the floating-point stack, to have changed. Standard library functions may modify the %gs register, and the _far* functions may modify %fs. Other registers will be preserved.

    Conclusion

    These are the basic calling conventions used by GCC; however, there are special cases, optional modifications, and so on that can apply in situations not covered here. In those cases, gcc -S is your best friend: from the assembly output, you can usually figure out the rules. Also helpful is the GCC source: see i386.h and i386.md in config/i386. They are well commented.
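
    As a hedged illustration of the gcc -S approach, a small file like the one below can be compiled without optimization (for example gcc -S conv.c, producing conv.s) and the output read to see the argument pushes, the call, and the stack cleanup. The file name conv.c and both function names are hypothetical.

```c
#include <assert.h>

/* A callee with mixed argument types, to see how each one is laid out. */
long long scale(int factor, long long value)
{
    return factor * value;
}

/* A small self-check that also gives the compiler a call site to emit,
   so the generated assembly shows the caller's side of the convention. */
void check_scale(void)
{
    assert(scale(3, 5LL) == 15);
}
```

    With optimization enabled the call may be inlined away, so an unoptimized build is usually the easier one to read for this purpose.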

    Examples

    These examples show how some C functions might be rewritten in assembly language. While the functions here are pretty useless themselves, hopefully they demonstrate the principles involved.

    i_avg

    This function finds the average of two ints.

    In C:

    int i_avg (int a, int b)
    {
        return (a + b) / 2;
    }

    In assembler:

    # Stack layout on entry:
    #
    #  8(%esp)  b
    #  4(%esp)  a
    #   (%esp)  return address

    .globl _i_avg
    _i_avg:
        movl 4(%esp), %eax
        addl 8(%esp), %eax      # Add the args
        sarl $1, %eax           # Halve (arithmetic shift; rounds toward minus infinity)
        ret                     # Return value is in %eax
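
    One detail is worth checking against the C version: sarl rounds toward negative infinity, while C's / truncates toward zero, so the two results differ by one when a + b is negative and odd. The sketch below models both behaviours in portable C; the helper names are hypothetical.

```c
#include <assert.h>

/* What the C source computes: division truncating toward zero. */
int avg_truncating(int a, int b)
{
    return (a + b) / 2;
}

/* What `sarl $1` computes: a halving that rounds toward negative infinity.
   Written without >> because right-shifting a negative int is
   implementation-defined in C. */
int avg_flooring(int a, int b)
{
    int sum = a + b;
    int q = sum / 2;

    if (sum % 2 != 0 && sum < 0)
        q -= 1;                 /* step down for negative odd sums */
    assert(2 * q <= sum);       /* a floor never exceeds the exact quotient */
    return q;
}
```

    For example, with a = -3 and b = 0 the truncating version gives -1 and the flooring version gives -2; for non-negative sums the two agree.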

    ull_avg

    This function finds the average of two unsigned long longs. (The unsigned-ness is a cop-out to make the division easier, since there is no sard instruction.)

    In C:

    unsigned long long ull_avg (unsigned long long a, unsigned long long b)
    {
        return (a + b) / 2;
    }

    In assembler:

    # Stack layout on entry:
    #
    # 16(%esp)  b (high half)
    # 12(%esp)  b (low half)
    #  8(%esp)  a (high half)
    #  4(%esp)  a (low half)
    #   (%esp)  return address

    .globl _ull_avg
    _ull_avg:
        movl 4(%esp), %eax
        movl 8(%esp), %edx
        addl 12(%esp), %eax     # Add low halves
        adcl 16(%esp), %edx     # Add high halves, with carry
        shrdl $1, %edx, %eax    # Shift low half right, pulling in bit 0 of %edx
        shrl $1, %edx           # Shift high half right: divide by 2
        ret                     # Return value is in %edx:%eax
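
    To see why the add-with-carry and double-shift sequence matches the C version, the steps can be modelled with two 32-bit variables standing in for %edx and %eax. This is only a sketch of the instruction semantics; ull_avg_model is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the instruction sequence, with edx:eax held in two variables. */
uint64_t ull_avg_model(uint64_t a, uint64_t b)
{
    uint32_t eax  = (uint32_t)a;              /* movl 4(%esp), %eax   */
    uint32_t edx  = (uint32_t)(a >> 32);      /* movl 8(%esp), %edx   */
    uint32_t b_lo = (uint32_t)b;
    uint32_t b_hi = (uint32_t)(b >> 32);

    eax += b_lo;                              /* addl: add low halves */
    uint32_t carry = eax < b_lo;              /* carry flag from addl */
    edx += b_hi + carry;                      /* adcl: high halves    */

    eax = (eax >> 1) | (edx << 31);           /* shrdl $1, %edx, %eax */
    edx >>= 1;                                /* shrl $1, %edx        */

    uint64_t result = ((uint64_t)edx << 32) | eax;
    assert(result == (uint64_t)(a + b) / 2);  /* agrees with the C version */
    return result;
}
```

    As in the C function, a carry out of the high half is discarded, so the sum wraps modulo 2^64 before it is halved.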

    ld_avg

    This function finds the average of two long doubles.

    In C:

    long double ld_avg (long double a, long double b)
    {
        return (a + b) / 2.0;
    }

    In assembler:

    # Stack layout on entry:
    #
    # 16(%esp)  b (12 bytes)
    #  4(%esp)  a (12 bytes)
    #   (%esp)  return address

    two:
        .double 0f2.0           # The number 2.0

    .globl _ld_avg
    _ld_avg:
        fldt 4(%esp)
        fldt 16(%esp)
        faddp %st(1), %st(0)    # Add
        fdivl two               # Divide %st(0) by 2.0
        ret                     # Result is in %st(0)

    array_of_42

    This function prints a message, allocates an array of a given size, and fills it with 42.

    In C:

    #include <stdio.h>
    #include <stdlib.h>

    int *array_of_42 (int n)
    {
        int *p;
        int i;
        printf("Creating array of %d elements\n", n);
        p = malloc(n * sizeof(int));
        if (!p)
            return NULL;
        for (i = 0; i < n; i++)
            p[i] = 42;
        return p;
    }

    In assembler:

    # Stack layout:
    #
    #  8(%ebp)  n
    #  4(%ebp)  return address
    #   (%ebp)  pushed %ebp

    format:
        .string "Creating array of %d elements\n"

    .globl _array_of_42
    _array_of_42:
        # We will use a frame pointer, since %esp will be changing.
        pushl %ebp
        movl %esp, %ebp
        pushl %edi              # Save %edi, since we'll be using it.
        # First, print the message.
        pushl 8(%ebp)
        pushl $format
        call _printf
        addl $8, %esp           # Remove printf args from the stack
        # Allocate the array.
        movl 8(%ebp), %ecx
        shll $2, %ecx           # Multiply by 4, which is sizeof(int)
        pushl %ecx
        call _malloc
        popl %ecx               # Remove malloc args from stack
        orl %eax, %eax          # Test return value
        jz finished
        # Fill the array, using stosl.
        movl %eax, %edi         # Address
        movl %eax, %edx         # and save a copy
        movl 8(%ebp), %ecx      # Count
        movl $42, %eax          # Fill value
        rep
        stosl
        movl %edx, %eax         # Return value
    finished:
        popl %edi               # Restore it
        popl %ebp
        ret

    Footnotes

    About using %ebp: Note that using -fomit-frame-pointer does not release you from the requirement to preserve %ebp. With this option enabled, the compiler may use %ebp for something else, but it still expects it to be saved across function calls. Furthermore, some functions cannot be compiled by GCC without a frame pointer.
