#### REGISTERS

- Two register files are available, each with sixteen 32-bit registers: register file A (A0 through A15) and register file B (B0 through B15).
- A0, A1, B0, B1, and B2 are used as conditional registers.
- A4 through A7 and B4 through B7 are used for circular addressing.
- Registers A0 through A9 and B0 through B9 (except B3) are temporary registers.
- A10 through A15 and B10 through B15 are saved and later restored before returning from a subroutine.
- A 40-bit data value can be held across a register pair: the 32 LSBs go in the even register (e.g., A2) and the remaining 8 bits go in the 8 LSBs of the next-higher (odd) register (A3).
- A similar scheme is used to hold a 64-bit double-precision value within a pair of registers (even and odd).

#### TMS320C6x ARCHITECTURE

Figure: Functional block diagram of TMS320C6x

#### TIMERS

Two 32-bit timers can be used to:

- Time events
- Count events
- Generate pulses
- Interrupt the CPU
- Send synchronization events to the DMA
- A timer can direct an external ADC to start conversion or the DMA controller to start a data transfer.
- A timer includes a time period register, which specifies the timer's frequency; a timer counter register, which contains the value of the incrementing counter; and a timer control register, which monitors the timer's status.
- The timer has two signaling modes and can be clocked by an internal or an external source.

## TMS320C6X FAMILY PROCESSORS

- The C6x family includes both fixed-point (e.g., C62x, C64x) and floating-point (e.g., C67x) processors.
- The C6x fixed-point processor is a 32-bit processor.
- The first member of the C6x fixed-point digital signal processors is the TMS320C6201 (C62x), announced in 1997.
- It is based on a very-long-instruction-word (VLIW) and Harvard architecture.
- The C62x is not code-compatible with the previous generation of fixed-point processors.
- The TMS320C6701 (C67x) floating-point processor was introduced as another member of the C6x family.
- The instruction set of the C62x fixed-point processor is a subset of the instruction set of the C67x processor.

## PIPELINING

#### The program fetch stage

- PG: program address generate (in the CPU), to generate the fetch address
- PS: program address send (to memory), to send the address
- PW: program address ready wait (memory read), to wait for data
- PR: program fetch packet receive (at the CPU), to read the opcode from memory

#### The decode stage

- DP: dispatch all instructions in a fetch packet (FP) to the appropriate functional units
- DC: instruction decode

#### The execute stage

- Six phases (with fixed point) to 10 phases (with floating point), due to delays (latencies) associated with the following instructions:
  - Multiply instruction, which consists of two phases due to one delay
  - Load instruction, which consists of five phases due to four delays

# TI: TMS320 Family Processors

Texas Instruments introduced the first-generation TMS32010 digital signal processor in 1983, the TMS320C25 in 1986, and the TMS320C50 in 1991. These 16-bit processors (C1x, C2x, and C5x) are available with different features, such as faster execution speed; they are all fixed-point processors and are code-compatible. The TMS320C30 floating-point processor was introduced in the late 1980s. The C31, C32, and the more recent C33 are all members of the C3x family of floating-point processors.
The C4x floating-point processors, introduced subsequently, are code-compatible with the C3x processors and are based on the modified Harvard architecture. The TMS320C6201 (C62x), announced in 1997, is the first member of the C6x family of fixed-point digital signal processors. Unlike the previous fixed-point processors (C1x, C2x, and C5x), the C62x is based on a very-long-instruction-word (VLIW) architecture, while still using separate memory spaces for instructions and data, as in the Harvard architecture.

# TMS320C6x ARCHITECTURE

- VLIW instruction architecture
- Modified Harvard memory architecture
- Internal memory includes a two-level cache architecture with 4 kB of level 1 program cache (L1P), 4 kB of level 1 data cache (L1D), and 64 kB of RAM or level 2 cache for data/program allocation (L2).
- On-chip peripherals include two multichannel buffered serial ports (McBSPs), two timers, a 16-bit host port interface (HPI), and a 32-bit external memory interface (EMIF).
- It requires 3.3 V for I/O and 1.8 V for the core (internal).
- Internal buses include a 32-bit program address bus, a 256-bit program data bus to accommodate eight 32-bit instructions, two 32-bit data address buses, two 64-bit data buses, and two 64-bit store data buses.
- With a 32-bit address bus, the total memory space is 2^32 bytes = 4 GB, including four external memory spaces: CE0, CE1, CE2, and CE3.

#### Internal memory block diagram

#### FUNCTIONAL UNITS

- Eight independent functional units are divided into two data paths.
- Four floating/fixed-point ALUs (two .L and two .S), two fixed-point ALUs (.D units), and two floating/fixed-point multipliers (.M units).
- Each path has:
  - a .M unit for multiply operations
  - a .L unit for logical and arithmetic operations
  - a .S unit for branch, bit manipulation, and arithmetic operations
  - a .D unit for loading/storing and arithmetic operations
- The .S and .L units are for arithmetic, logical, and branch instructions. All data transfers make use of the .D units.
- Arithmetic operations, such as subtract or add (SUB or ADD), can be performed by all the units except the .M units.
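
The division of work among the functional units can be sketched as a small lookup table. This is only an illustrative model built from the per-path list above, not a description of the full instruction set: the capability labels and the helper names `all_units` and `units_supporting` are hypothetical, and the `1`/`2` suffixes follow TI's usual naming of the two data paths (.L1, .L2, and so on), which these notes do not spell out.

```python
# Illustrative model of the C6x functional units, based only on the
# per-path list in these notes (not on the full TI instruction set).
UNIT_CAPABILITIES = {
    ".L": {"logical", "arithmetic"},
    ".S": {"branch", "bit manipulation", "arithmetic"},
    ".M": {"multiply"},
    ".D": {"load/store", "arithmetic"},
}

# Assumption: the two data paths are labeled with suffixes 1 and 2.
DATA_PATHS = ("1", "2")


def all_units():
    """Return the names of every functional unit across both data paths."""
    return [f"{kind}{path}" for path in DATA_PATHS for kind in UNIT_CAPABILITIES]


def units_supporting(operation):
    """Return the units whose capability set includes the given operation."""
    return [
        f"{kind}{path}"
        for path in DATA_PATHS
        for kind, capabilities in UNIT_CAPABILITIES.items()
        if operation in capabilities
    ]


if __name__ == "__main__":
    print(len(all_units()))                # eight units in total
    print(units_supporting("arithmetic"))  # every unit except the .M units
    print(units_supporting("load/store"))  # only the .D units
```

Querying `units_supporting("arithmetic")` returns six of the eight units, which matches the statement that ADD and SUB can run on every unit except the .M units.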