Techno-Plaza

TIGCC Assembly Programming Lessons

Lesson 4: Integrating C and Assembly

Step 1 - Background Information

As we have seen from our first three lessons, assembly is cumbersome, terse, and difficult to read. Nobody writes full programs in assembly unless they have no other choice or are masochists.

When people first started hacking their graphing calculators to write programs, those programs were written in assembly. This was for the first reason: they had no other choice. There were no compilers for C (or any other language, for that matter), so if you wanted a fast program, you wrote it in assembly.

A few months after PlusShel (the first assembly shell for the TI-89) was written and we got the first assembly programs, some people started porting gcc to the TI-89/92+. The result of their work is TIGCC, the fully integrated suite of tools we now use to develop programs for the TI-68k calculators. With the maturation of TIGCC, we now have an alternative to assembly programming: C.

Whether you like C or not, it is far easier to write programs in than assembly. This is so true that almost no "assembly" programs written for the TI-68k calculators are actually written in assembly now. This is the wonderful result of the hard work of the TIGCC team.

Most people prefer C to assembly, and there are rarely any significant speed differences between writing in C and writing in assembly. The C compiler is so good at optimizing code that it can usually match or surpass the efficiency of an assembly programmer of average to moderate skill. However, the C compiler is not perfect, and there are still times when people want to use assembly. This is fine, but even then, the goal is rarely to write an entire program. Instead, you usually just want to hand-optimize a particular function that needs to run as fast as possible.

Rather than writing entire programs in assembly, we can combine assembly and C in the same program. In fact, we have been doing this all along, though we haven't formally discussed it. This lesson is set up to teach you how to integrate the two languages so we can have the best of both worlds.

Let's see an example. Start TIGCC and create a new project called mulu32b. Create a new C Source File called main, and an assembly source file called mulu32b. Do not copy the main.c from the first three lessons. Here is the code for our two files.

You can download the project files, source code, and program files from our archives.

main.c


#include <tigcclib.h>

void _mulu32b(unsigned long a, unsigned long b,
			  unsigned long *result) __attribute__((regparm(4)));

void _main(void) {
    unsigned long result[2];
    unsigned long a, b;

    // clear the screen
    clrscr();

    // get the first hex number
    printf("a: ");
    scanf("%lx", &a);
    printf("\n");

    // get the second hex number
    printf("b: ");
    scanf("%lx", &b);
    printf("\n");

    // print the expression
    printf("%lXh x %lXh =\n", a, b);
    _mulu32b(a, b, result);

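    // if the upper longword is > 0, display it, then print the lower
    // longword zero-padded to 8 digits so the two halves line up
    if (result[0] > 0) {
        printf("%lX%08lXh", result[0], result[1]);
    } else {
        // otherwise print only the lower longword
        printf("%lXh", result[1]);
    }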

    // wait for user input before exiting
    ngetchx();
}

mulu32b.s


    .text
    .xdef _mulu32b

|-------------------------------------------------------------------------------
| void _mulu32b(unsigned long a, unsigned long b, unsigned long *result);
|   multiply 2 32 bit unsigned operands to get a 64-bit unsigned result
|
|   input:  %d0.l - first operand
|           %d1.l - second operand
|   i/o:    %a0.l - address of 8-byte buffer (2 longwords) to store result

_mulu32b:
    movem.l %d3-%d4,-(%sp)  | save registers

    moveq   #0,%d2          | reset counters
    moveq   #0,%d3
    moveq   #0,%d4          | upper 32-bits for shifting

1:
    tst.l   %d0             | base case, 0 means stop
    beq     3f              | goto done

    lsr.l   #1,%d0          | do we need to do an add
    bcc     2f              | if our bit was 0, then we do not

    add.l   %d1,%d3         | do the addition
    addx.l  %d4,%d2

2:
    lsl.l   #1,%d1          | 64-bit left shift
    roxl.l  #1,%d4

    bra     1b              | repeat

3:
    move.l  %d2,(%a0)+      | move result into address
    move.l  %d3,(%a0)

    movem.l (%sp)+,%d3-%d4  | restore registers
    rts

Build the program and send it to TiEmu. It will look something like the screenshot. You can check the results using Windows calc (or some other calculator program). Just remember that these numbers are all in hex.
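If you would rather check the results with a few lines of C on your PC, here is a minimal sketch. It is not part of the mulu32b project, and the names expected_product and print_expected are made up for this example. It assumes a desktop compiler with a 64-bit integer type, so the whole product fits in one variable.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: the 64-bit product we expect the calculator to show.
 * The cast widens a to 64 bits before multiplying, so nothing is lost. */
uint64_t expected_product(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Hypothetical helper: print the expression in the same hex style
 * as the calculator program. */
void print_expected(uint32_t a, uint32_t b)
{
    printf("%lXh x %lXh = %llXh\n",
           (unsigned long)a, (unsigned long)b,
           (unsigned long long)expected_product(a, b));
}
```

Call print_expected with the same two hex numbers you typed into the calculator and compare the output.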

Screenshot

 

Step 2 - Program Analysis

Since the purpose of this lesson is the integration of C and assembly, I'm going to assume you have a working knowledge of C. As with all our other programs, our program execution starts in the _main function in main.c. The difference here is that most of our program is also there.

void _mulu32b(unsigned long a, unsigned long b,
			  unsigned long *result) __attribute__((regparm(4)));

At the top of our main.c file, we have this function prototype. It's a pretty typical prototype except for the end. We have talked about calling conventions in the past. Here we need to tell the compiler which calling convention our assembly function uses, so it can pass the function arguments properly. By default, gcc passes all arguments to functions on the stack. This is fine, but we have chosen a different way. We are going to use the regparm(4) calling convention. This means up to the first 4 data arguments are passed in registers d0-d3, and up to the first 4 address arguments in registers a0-a3. For our function, that puts a in d0, b in d1, and the result pointer in a0. To tell the compiler this, we put __attribute__((regparm(4))) at the end of our prototype. These are the same calling conventions used by the TIGCC library for the functions written in C.

I will quickly go over the code in main.c, but it's not really important. What is important is how we are able to call the _mulu32b function from our C code.

The program clears the screen, then asks the user to input two 32-bit hex numbers. These numbers will be stored in the a and b variables. Our 64-bit result will be stored in the result array, with the most significant longword stored first. We display the two numbers the user entered so we can confirm that we didn't make a typo, then call the _mulu32b function to do our multiplication. Finally, we display the result and wait for the user to press a key before exiting.

Just so we know how it works, let's go over the _mulu32b function real quick.

_mulu32b:
    movem.l %d3-%d4,-(%sp)  | save registers

    moveq   #0,%d2          | reset counters
    moveq   #0,%d3
    moveq   #0,%d4          | upper 32-bits for shifting

1:
    tst.l   %d0             | base case, 0 means stop
    beq     3f              | goto done

Okay, _mulu32b takes two 32-bit (longword) unsigned numbers and returns the result of the multiplication. The 68000 has a built-in multiply opcode, but it only works for 16-bit operands. If we want to multiply two 32-bit operands, we have to do it ourselves. Multiplying two 32-bit operands yields a 64-bit result. This result is too big to fit in a register, so we will use the result array to return it.

If you're wondering how we are going to do the multiplication, it's easier than it might appear. Think about how you would do a multiply on paper. You take your two numbers, and multiply the right-most digit of the bottom number across the top number. You move over one decimal place for the next multiplication, and so on. Then you add all these partial products together to get a final result. It is even easier on the machine, because we can use binary instead of decimal. Each digit we multiply by is either 1 or 0, so each partial product is either the (shifted) other operand or nothing at all. This means we never have to multiply; we just add the shifted operand to the result whenever the bit is 1. For example, 101 x 11 in binary (5 x 3) has a 1 in bit positions 0 and 2 of the first number, so the result is 11 + 1100 = 1111, which is 15.

We start out by zeroing the d2, d3, and d4 registers. d2 and d3 hold our running sum (d2 the upper 32 bits, d3 the lower 32 bits), and d4 holds the upper 32 bits of our shift. The shift value is for the next partial product. Just like in a decimal multiply, for each new place we multiply by, we shift the result left by 1 place. It's the same in binary. The only problem is, we need a 64-bit space. So we will use two registers for our shift, d1 for the lower 32 bits, and d4 for the upper.

Our process then is simple. We test our first operand against 0. If it's zero, then we are done. There is nothing more to add. Otherwise, we grab the right-most bit of our first operand. If it is zero, there is nothing to add to our partial sum, so we continue. Otherwise, we add the second operand to our running sum. Then we shift the second operand left one bit. The upper bit will get shifted into the d4 register. We repeat until our first operand is zero. Let's have a look at more code.
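To make the process concrete, here is a rough C model of the loop that you can run on a PC. It is only a sketch: the function name mulu32b_model is made up for this example, and the variables are named after the registers so you can match them against the assembly. The real function is still the assembly one.

```c
#include <stdint.h>

/* Hypothetical host-side model of _mulu32b; not part of the project. */
void mulu32b_model(uint32_t d0, uint32_t d1, uint32_t result[2])
{
    uint32_t d2 = 0;    /* running sum, upper 32 bits */
    uint32_t d3 = 0;    /* running sum, lower 32 bits */
    uint32_t d4 = 0;    /* shifted second operand, upper 32 bits */

    while (d0 != 0) {                   /* tst.l / beq: no bits left, done */
        uint32_t bit = d0 & 1u;         /* the bit lsr.l shifts out */
        d0 >>= 1;

        if (bit) {                      /* bcc skips the add when bit is 0 */
            uint32_t low = d3 + d1;     /* add.l: lower halves */
            uint32_t carry = (low < d3) ? 1u : 0u;  /* carry out of low half */
            d3 = low;
            d2 = d2 + d4 + carry;       /* addx.l: upper halves plus carry */
        }

        d4 = (d4 << 1) | (d1 >> 31);    /* roxl.l: top bit of d1 enters d4 */
        d1 <<= 1;                       /* lsl.l */
    }

    result[0] = d2;                     /* most significant longword first */
    result[1] = d3;
}
```

The loop runs at most 32 times, once for each bit of the first operand, and stops early once the remaining bits are all zero.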

1:
    tst.l   %d0             | base case, 0 means stop
    beq     3f              | goto done

    lsr.l   #1,%d0          | do we need to do an add
    bcc     2f              | if our bit was 0, then we do not

Okay, here we check our base case, the first operand being 0. If it is, we're done, and we branch to 3. We haven't used local labels yet, so real quick: we can define as many of these numbered labels (1:, 2:, and so on) as we want, and we refer to them with b (back) or f (forward). So 1b means the nearest label 1 before this point in the code, and 3f means the nearest label 3 after it.

Here is where we grab the right-most bit of the first operand. We use the lsr (Logical Shift Right) opcode to shift the value right 1 bit. If this results in a carry, then the bit shifted was 1, and we need to do an addition to our partial sum. Otherwise, it must have been a 0, and we don't need to do the add.

    add.l   %d1,%d3         | do the addition
    addx.l  %d4,%d2

Here is the addition code. We have seen add before, but addx (ADD with eXtend) is new. It does the add, but also adds the X (eXtend) bit to the result. Most processors have an instruction like this so that additions wider than one register can be done. The add.l sets the extend bit to the carry out of the lower 32 bits (the carry and extend bits are the same here), and addx.l then adds that carry into our upper 32-bit partial sum.
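If the add/addx pair is hard to picture, here is the same idea as a small C sketch. The u64_pair type and the add64 function are made up for this example. C has no extend flag, so we work out the carry ourselves by checking whether the lower sum wrapped around.

```c
#include <stdint.h>

/* Hypothetical type for this sketch: a 64-bit value held as two longwords. */
typedef struct {
    uint32_t hi;    /* upper 32 bits */
    uint32_t lo;    /* lower 32 bits */
} u64_pair;

/* Add two 64-bit values one 32-bit half at a time. */
u64_pair add64(u64_pair sum, u64_pair addend)
{
    uint32_t lo = sum.lo + addend.lo;       /* like add.l: may wrap around */
    uint32_t x = (lo < sum.lo) ? 1u : 0u;   /* wrapped, so the carry is 1 */

    sum.lo = lo;
    sum.hi = sum.hi + addend.hi + x;        /* like addx.l: add the carry too */
    return sum;
}
```

Adding 1 to a value whose lower half is FFFFFFFFh wraps the lower half to 0 and bumps the upper half by 1, which is exactly the job addx does for us.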

2:
    lsl.l   #1,%d1          | 64-bit left shift
    roxl.l  #1,%d4

    bra     1b              | repeat

Here we perform another shift. In this case, we need to shift our second operand left 1 bit. However, there is a catch: the high bit we shift off needs to become the low bit of our upper 32 bits. lsl (Logical Shift Left) performs the left shift for us. Just like with lsr, the shifted bit will end up in the carry flag. The extend flag is almost always the same as the carry, and it has the bit as well. Our next opcode, roxl (ROtate with eXtend Left), performs the remainder of our shift. It places the extend flag in the low bit, and shifts the rest of the data left 1 bit. The high-order bit will be shifted into the carry flag; however, this will always be zero, because we never shift more than 32 times.
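The same two-register shift can be sketched in C. The function name shl64_by1 is made up for this example, and the local variable x stands in for the extend flag.

```c
#include <stdint.h>

/* Hypothetical helper: shift the 64-bit value hi:lo left by one bit. */
void shl64_by1(uint32_t *hi, uint32_t *lo)
{
    uint32_t x = *lo >> 31;     /* the bit lsl.l shifts out of the low half */

    *lo <<= 1;                  /* like lsl.l #1 */
    *hi = (*hi << 1) | x;       /* like roxl.l #1: x becomes bit 0 */
}
```

Note that we have to save the outgoing bit before shifting the low half. On the 68000 the extend flag holds onto it for us between the two opcodes.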

3:
    move.l  %d2,(%a0)+      | move result into address
    move.l  %d3,(%a0)

    movem.l (%sp)+,%d3-%d4  | restore registers
    rts

Finally, we move our 64-bit result into the supplied array, upper longword first, then restore the registers we saved and return.

Step 3 - Conclusions

This is a good first step. We learned about how to specify calling conventions for assembly function calls, and how that allows us to call an assembly function from C.

We have seen examples all along on how to call C functions from assembly, but remember that it all comes down to calling conventions. In the next section, we will talk about inline assembly, and C calling conventions for calling C functions from assembly.

Continue with Part II

 

Copyright © 1998-2007 Techno-Plaza
All Rights Reserved Unless Otherwise Noted
