Measuring DSP MIPS

People use digital signal processors (DSPs) when they need to get a lot of complicated processing done in a short amount of time. In other words, DSPs are blindingly fast! But even a DSP has a finite amount of processing time. What with the many complicated algorithms around for signal processing, you can still run out of CPU time.

Processing Time Budgets

One way to avoid running out of time is to plan ahead. In general, when you design software for a DSP, you start out with a processing time budget. For each of the main algorithms, you estimate how much processing time is required. The processing time is based on two factors: how frequently the algorithm must execute and how much time each execution takes.

Let's say you're designing a DSP-based modem. Some algorithms run at the analog sampling rate, commonly 9,600 samples per second. Other algorithms run at the baud rate, also known as the symbol rate, which for this example is 1,200 symbols per second. In modem software, you might wind up performing a lot of processing at the symbol rate but a relatively small amount of processing at the sample rate.

You need to take these factors into account when coming up with a processing time budget. When I figure out my estimates for each algorithm, what I wind up with is a number that represents how many milliseconds of CPU time are required for every second of real time. Then I convert each of these numbers into a percent of CPU utilization, all of which should add up to less than 100%. Finally, it's common in the DSP world to measure CPU utilization in millions of instructions per second (MIPS). This is done by multiplying each percent CPU utilization value by the raw MIPS rate of the DSP chip.

A Sample Budget

Let's look at the hypothetical example of some modem software. Assume you're running on a Texas Instruments TMS320C31 DSP chip with a 40-MHz clock, which has a raw MIPS rate of 20 MIPS.

Running at the sample rate (9,600 times a second), algorithm 1 is estimated to take 20 microseconds each time it executes. This is 0.02 milliseconds times 9,600, which equals 192 milliseconds of CPU time every second. The CPU utilization is therefore 19.2%. This algorithm requires 3.84 MIPS.

Running every 960 samples (10 times a second), algorithm 2 is estimated to take 250 microseconds each time it executes. This is 0.25 milliseconds times 10, which equals 2.5 milliseconds of CPU time every second. The CPU utilization is therefore 0.25%. This algorithm requires 0.05 MIPS.

Running at the symbol rate (1,200 times a second), algorithm 3 is estimated to take 180 microseconds each time it executes. This is 0.18 milliseconds times 1,200, which equals 216 milliseconds of CPU time every second. The CPU utilization is therefore 21.6%. This algorithm requires 4.32 MIPS.

Running every 120 symbols (10 times a second), algorithm 4 is estimated to take 500 microseconds each time it executes. This is 0.5 milliseconds times 10, which equals 5 milliseconds of CPU time every second. The CPU utilization is therefore 0.5%. This algorithm requires 0.1 MIPS.
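The arithmetic above is easy to script. The following C sketch recomputes the four estimates and totals them. The struct and function names are illustrative only, and the numbers are simply the estimates from this example.

```c
#include <stddef.h>

/* One row of a processing time budget (names are illustrative). */
struct budget_item {
    double exec_us;       /* estimated time per execution, in microseconds */
    double runs_per_sec;  /* how often the algorithm executes each second */
};

/* Fraction of the CPU used by one algorithm: 1.0 means 100%. */
double cpu_fraction(const struct budget_item *item)
{
    return item->exec_us * 1.0e-6 * item->runs_per_sec;
}

/* MIPS used by one algorithm on a chip with the given raw MIPS rate. */
double item_mips(const struct budget_item *item, double raw_mips)
{
    return cpu_fraction(item) * raw_mips;
}

/* Total MIPS for the whole budget. */
double total_mips(const struct budget_item *items, size_t count, double raw_mips)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < count; i++)
        sum += item_mips(&items[i], raw_mips);
    return sum;
}

/* The four estimates from the sample budget. */
static const struct budget_item modem_budget[4] = {
    {  20.0, 9600.0 },  /* algorithm 1: every sample */
    { 250.0,   10.0 },  /* algorithm 2: every 960 samples */
    { 180.0, 1200.0 },  /* algorithm 3: every symbol */
    { 500.0,   10.0 },  /* algorithm 4: every 120 symbols */
};
```

Calling total_mips(modem_budget, 4, 20.0) gives about 8.31 MIPS, or roughly 41.6% of the chip, so this hypothetical budget fits with room to spare.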

Making Measurements

Figure 1 - Connecting the frequency counter for MIPS measurements.

All this is well and good on paper, but how do things work out in the real world? When you have the real software running on the real DSP, do the algorithms stay within their processing time budgets?

To find out, you can construct a direct-reading, digital, DSP MIPS meter. This is a technique I used for the TMS320C31 DSP. You can probably use a similar technique for other DSP chips.

The technique is based on the fact that the 'C31 DSP chip puts out a timing signal on the H1 pin equal to the basic instruction rate—20 MHz, in this case. Unlike many microprocessor chips that require a variable number of clocks per instruction, the DSP chip executes one instruction every two input clock cycles. So with an input clock of 40 MHz, the instruction rate is 20 MIPS, and the timing output signal is 20 MHz. There are some exceptions to the "one instruction every two input clocks" rule, but for simplicity, you can assume the DSP runs at 20 MIPS.

In addition, the 'C31 chip has two spare output pins, XF0 and XF1, that can be used by the software for any purpose by writing to the 'C31 I/O flags register (IOF). For MIPS measurement, you can use the XF1 output as a software-controlled gating signal. You also need an external two-input AND gate that has the 20-MHz timing signal as one input and the software-controlled gating signal as the other. You then hook the output of the AND gate to an external frequency counter set up for "totalize" mode (more on this later). Using this setup, the frequency counter displays the real-time MIPS used by whatever algorithm you are profiling. Figure 1 shows a connection diagram.

To profile a particular algorithm, you modify the code slightly to raise XF1 when the algorithm starts executing, then drop XF1 when it stops executing. If the algorithm has more than one entry point or exit point, be sure to modify every one of them so XF1 is raised at each entry and dropped at each exit.

So long as the CPU is not executing the algorithm, XF1 is low, causing the external AND gate to shut off the 20-MHz timing signal to the frequency counter. When the algorithm executes, XF1 goes high, allowing the 20-MHz timing pulses to reach the counter. The counter is set up to count and display the number of pulses in every second, so it displays instructions per second. If you configure the counter to place its decimal point six places over, it displays MIPS. You now have a direct-reading, digital, DSP MIPS meter.
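To see why the counter reads instructions per second, it can help to model the gate in software. The sketch below is a host-side simulation, not code for the DSP. It assumes a hypothetical list of intervals, each with a length in instruction cycles and the XF1 level during that interval, and counts the H1 pulses that would pass the AND gate.

```c
#include <stddef.h>

/* A stretch of time during which XF1 holds one level (hypothetical model). */
struct xf1_interval {
    unsigned long cycles;  /* length in H1 cycles, one per instruction */
    int xf1_high;          /* nonzero while the profiled algorithm executes */
};

/* Pulses that pass the AND gate: H1 pulses count only while XF1 is high. */
unsigned long gated_pulses(const struct xf1_interval *iv, size_t count)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (iv[i].xf1_high)
            total += iv[i].cycles;
    }
    return total;
}
```

If one second of the 20-MHz signal is split into intervals where XF1 is high for 3,840,000 cycles in total, gated_pulses returns 3840000, which a counter with its decimal point six places over shows as 3.84.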

Software Details

In general, you can have foreground and background software. Foreground software is the code executed during an interrupt service routine, while background software is everything else.

Background software is easier to profile with this technique. Assuming you're using the TI C compiler for the 'C31, at the entry point of the algorithm, you can put a statement like:

#ifdef PROFILE_1
asm(" or 40h,iof");    /* raise XF1 */
#endif

At the exit point of the algorithm, you can say:

#ifdef PROFILE_1
asm(" andn 40h,iof");  /* drop XF1 */
#endif

It's probably a good idea to leave the profiling statements in your program permanently. If you modify an algorithm later, you can re-profile it with minimum fuss. If you want to profile several algorithms separately, you'll need to use #ifdef/#endif statements so you can selectively enable the profiling code for each algorithm. To make it easy to keep track, use a separate compiler switch for each algorithm, and put all the switch #defines in a common #include file. Then, you can enable and disable profiling by editing one file and recompiling your program. For example:

/* File PROFILE.H, to be #included in every algorithm .C file. */

/* To profile an algorithm, define one and only one switch, then recompile.
   The profiling code tests these switches with #ifdef, so a switch that is
   off must be left undefined (commented out), not defined as 0. */
#define PROFILE_1            /* profile algorithm 1 */
/* #define PROFILE_2 */      /* profile algorithm 2 */
/* #define PROFILE_3 */      /* profile algorithm 3 */
...etc.

/* Define this switch if any of the above switches is defined (you'll see why later). */
#define PROFILE_GLOBAL
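One way to keep the call sites tidy is to hide the #ifdef behind a pair of macros. This is only a sketch. The macro names and the TARGET_C31 switch are made up for illustration, and the host branch merely imitates the IOF register with an ordinary variable so the logic can be tried on a PC.

```c
/* Illustrative wrapper macros; all names here are hypothetical. */
#define PROFILE_1            /* profile algorithm 1 in this build */

#ifdef TARGET_C31            /* hypothetical switch: define when building for the DSP */
#define XF1_RAISE()  asm(" or 40h,iof")
#define XF1_DROP()   asm(" andn 40h,iof")
#else                        /* host build: imitate the IOF register with a variable */
static unsigned int sim_iof = 0x22u;
#define XF1_RAISE()  (sim_iof |= 0x40u)
#define XF1_DROP()   (sim_iof &= ~0x40u)
#endif

#ifdef PROFILE_1
#define PROFILE_1_ENTER()  XF1_RAISE()
#define PROFILE_1_EXIT()   XF1_DROP()
#else
#define PROFILE_1_ENTER()  ((void)0)
#define PROFILE_1_EXIT()   ((void)0)
#endif

/* Example background algorithm with a single entry and exit point. */
int algorithm_1(int x)
{
    int result;

    PROFILE_1_ENTER();
    result = x * 2;          /* stand-in for the real signal processing */
    PROFILE_1_EXIT();
    return result;
}
```

With this arrangement, an algorithm whose switch is not defined compiles to the same code as one with no profiling statements at all.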

Interrupt Code Is Tricky

Profiling interrupt code is a little more complicated. This is especially true if some of the algorithm executes in interrupt code and some in background code. The problem is that the interrupt code doesn't know what the background code was doing with XF1 prior to the interrupt. If the background code happened to be executing the algorithm of interest, then XF1 is already high, and it's not necessary for the interrupt code to raise XF1. At the end of the interrupt, the interrupt code had better not drop XF1, because when the background code resumes, XF1 should still be high.

To get around this problem, algorithms that execute at interrupt level should preserve the current value of XF1 and set XF1 to the desired state. At the end of the interrupt, they should restore XF1 to the old value. The profiling code for interrupt algorithms looks like:

#ifdef PROFILE_1
/* Allocate this variable someplace outside of a function. */
volatile static int save_iof_1;
#endif

/* At entry point of interrupt algorithm. */
#ifdef PROFILE_1
asm(" sti iof,@_save_iof_1");  /* save current XF1 */
asm(" or 40h,iof");            /* raise XF1 */
#endif

/* At exit point of interrupt algorithm. */
#ifdef PROFILE_1
asm(" ldi @_save_iof_1,iof");  /* restore old XF1 */
#endif

By the way, you must declare the save_iof_1 variable as static so you can use the sti/ldi instructions. If you don't force it to be static, you won't be able to use the desired addressing mode.
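The save-and-restore logic can be checked on a PC before it goes near the DSP. The sketch below is a host-side model under stated assumptions: sim_iof stands in for the IOF register, sim_save_iof_1 mirrors the save_iof_1 variable, and the function names are invented for illustration.

```c
#define XF1_OUT 0x40u                 /* XF1 output bit, as used in the asm above */

static unsigned int sim_iof;          /* stand-in for the IOF register (host model) */
static unsigned int sim_save_iof_1;   /* mirrors the save_iof_1 variable */

/* Entry point of the interrupt algorithm: save IOF, then raise XF1. */
void isr_profile_enter(void)
{
    sim_save_iof_1 = sim_iof;         /* models: sti iof,@_save_iof_1 */
    sim_iof |= XF1_OUT;               /* models: or 40h,iof */
}

/* Exit point of the interrupt algorithm: put IOF back as it was. */
void isr_profile_exit(void)
{
    sim_iof = sim_save_iof_1;         /* models: ldi @_save_iof_1,iof */
}

/* Run one simulated interrupt and report the XF1 level during and after it. */
void simulate_interrupt(unsigned int iof_before, int *high_during, int *high_after)
{
    sim_iof = iof_before;
    isr_profile_enter();
    *high_during = (sim_iof & XF1_OUT) != 0;
    isr_profile_exit();
    *high_after = (sim_iof & XF1_OUT) != 0;
}
```

Whether the background code had XF1 low (22h) or high (62h) before the interrupt, XF1 is high during the interrupt and back at its earlier level afterward.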

There is one additional complication from interrupts. Let's say you're profiling an algorithm that has some interrupt code and some background code. Both the interrupt and background code are turning XF1 on and off as required. However, without some extra care, your MIPS measurements will likely come out too high. The reason is that other interrupts can occur while the background code has XF1 raised, not just the interrupt of interest. So, not only does the interesting interrupt have to raise XF1, but all other uninteresting interrupts need to drop XF1 for the duration of the interrupt. Otherwise, the MIPS measurement will be increased by the duration of the uninteresting interrupts.

For example, say you have an interrupt-driven serial port used for debugging. Let's assume you never want to profile this particular interrupt. To keep the debugger serial interrupt from adversely affecting other MIPS measurements, add the following code to the debugger interrupt handler:

#ifdef PROFILE_GLOBAL /* defined above in PROFILE.H */
/* Allocate this variable someplace outside of a function. */
volatile static int save_iof_debug;
#endif

/* At beginning of debugger serial interrupt handler. */
#ifdef PROFILE_GLOBAL
asm(" sti iof,@_save_iof_debug"); /* save current XF1 */
asm(" andn 40h,iof");             /* drop XF1 */
#endif

/* At end of debugger interrupt handler. */
#ifdef PROFILE_GLOBAL
asm(" ldi @_save_iof_debug,iof"); /* restore old XF1 */
#endif
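A rough host-side model shows the size of the error that this code prevents. It assumes, purely for illustration, that the background algorithm needs a fixed number of cycles of its own and is interrupted by an unrelated handler for some further cycles while XF1 is raised.

```c
/* Cycles seen by the frequency counter for one run of a background
   algorithm that is interrupted by an unrelated handler (host model). */
unsigned long counted_cycles(unsigned long algorithm_cycles,
                             unsigned long other_isr_cycles,
                             int isr_drops_xf1)
{
    /* XF1 is high for all of the algorithm's own cycles. */
    unsigned long counted = algorithm_cycles;

    /* Unless the handler drops XF1, its cycles are counted as well. */
    if (!isr_drops_xf1)
        counted += other_isr_cycles;

    return counted;
}
```

With 1,000 algorithm cycles and a 200-cycle debugger interrupt, counted_cycles(1000, 200, 0) returns 1200, a 20% overstatement, while counted_cycles(1000, 200, 1) returns the correct 1000.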

If you have interrupt code you might want to profile, but you don't want it to affect other MIPS measurements, the interrupt profiling code gets a little messier. If any profiling is enabled, you want some profiling action on the part of this interrupt code. If you're profiling this algorithm, you want to raise XF1. If you're profiling any other algorithm, you want to drop XF1 so as not to affect the other algorithm's MIPS. For example:

#ifdef PROFILE_GLOBAL /* defined above in PROFILE.H */
/* Allocate this variable someplace outside of a function. */
volatile static int save_iof_2;
#endif

/* At entry point of interrupt algorithm. */
#ifdef PROFILE_GLOBAL
asm(" sti iof,@_save_iof_2"); /* always save old XF1 */
#ifdef PROFILE_2
asm(" or 40h,iof");           /* raise XF1 if this algorithm is being profiled */
#else
asm(" andn 40h,iof");         /* otherwise drop XF1 */
#endif
#endif

/* At exit point of interrupt algorithm. */
#ifdef PROFILE_GLOBAL
asm(" ldi @_save_iof_2,iof"); /* always restore old XF1 */
#endif
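The three cases can be summarized as a small decision function. This is a host-side sketch rather than DSP code. The function name is hypothetical, profiling_enabled plays the role of PROFILE_GLOBAL, and profiling_this plays the role of PROFILE_2.

```c
#define XF1_OUT 0x40u   /* XF1 output bit in the IOF register */

/* IOF value an interrupt handler should write at its entry point (host model). */
unsigned int isr_entry_iof(unsigned int iof, int profiling_enabled, int profiling_this)
{
    if (!profiling_enabled)
        return iof;              /* no profiling at all: leave XF1 alone */
    if (profiling_this)
        return iof | XF1_OUT;    /* this algorithm is being measured: raise XF1 */
    return iof & ~XF1_OUT;       /* another algorithm is being measured: drop XF1 */
}
```

In every case the handler restores the saved IOF value at its exit point, so the interrupted code sees XF1 exactly as it left it.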

The DSP's XF1 pin has multiple uses. To use it as a general-purpose output, you need to configure it properly. Somewhere during your initialization, before you enable interrupts, you need to add some code like:

#ifdef PROFILE_GLOBAL
/* Initialize I/O Flags Register: XF0 and XF1 = outputs, initially low. */
asm(" ldi 22h,iof");
#endif
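The constants 22h and 40h are easier to read with names. The breakdown below is illustrative and rests on an assumption: that bit 1 and bit 5 of IOF configure XF0 and XF1 as outputs, and that bit 6 is the level driven on XF1. That layout is consistent with the values used in this article, but check the bit positions against the TI user's guide before relying on them.

```c
/* Illustrative names for the IOF bits used in this article (assumed layout). */
#define IOF_XF0_IS_OUTPUT  0x02u   /* bit 1: configure XF0 as an output */
#define IOF_XF1_IS_OUTPUT  0x20u   /* bit 5: configure XF1 as an output */
#define IOF_XF1_LEVEL      0x40u   /* bit 6: level driven on the XF1 pin */

/* Value loaded at initialization: both pins are outputs, both low (22h). */
#define IOF_INIT  (IOF_XF0_IS_OUTPUT | IOF_XF1_IS_OUTPUT)

/* Nonzero if the IOF value has XF1 configured as an output and driven high. */
int xf1_is_driven_high(unsigned int iof)
{
    return (iof & IOF_XF1_IS_OUTPUT) != 0 && (iof & IOF_XF1_LEVEL) != 0;
}
```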

Frequency Counter Setup

One final problem has to do with the frequency counter. The signal you're feeding it is not a steady, continuous waveform. Instead, the signal consists of a series of rapid pulses, followed by nothing, followed by more rapid pulses, and so on. A frequency counter might get confused by this signal and report erratic values. Should it report the frequency of the rapid pulses? Should it report the frequency of the gaps, which is zero? Or should it average the value over time?

To get a reading you can trust, use the counter's event-counting mode. Most frequency counters have such a setting; on the counter I used, it was called "totalize." In this mode, the counter tallies how many pulses come in within a specified period. This might seem the same as measuring the frequency, except that in totalize mode the pulses do not have to have any recognizable periodicity.

Another option to configure is how long the counter counts pulses before resetting and starting over. For convenience, you can set this to one second, so the display reports "instructions per second." An interval of one second also keeps the display value from flickering, which would happen with an interval of, say, one-tenth of a second. Then, if you can, set the decimal point on the display to a fixed position, six places over. A floating decimal point can jump around and be confusing, whereas a fixed decimal point six places over makes the display read directly in MIPS.
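If your counter cannot use a one-second interval or a fixed decimal point, the conversion is simple enough to do by hand or in a few lines of C. The helper names below are hypothetical, and the code runs on a PC, not on the DSP.

```c
/* Convert a totalize-mode count into MIPS (illustrative host-side helper).
   count        : pulses tallied by the counter
   gate_seconds : how long the counter counted before resetting */
double count_to_mips(unsigned long count, double gate_seconds)
{
    return ((double)count / gate_seconds) / 1.0e6;
}

/* CPU utilization in percent, given the chip's raw MIPS rate. */
double mips_to_percent(double mips, double raw_mips)
{
    return 100.0 * mips / raw_mips;
}
```

For example, a count of 384,000 over a one-tenth-second interval works out to 3.84 MIPS, which is 19.2% of a 20-MIPS chip.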

The next time you need to cram all those algorithms into a poor, overworked DSP, at least you have a relatively easy way to measure the actual performance. Using a spare output pin, an AND gate, and a frequency counter, you can now have a direct-reading, digital, DSP MIPS meter.

