Saturday, 27 July 2013
PIC32 bit by bit
I love to wander from one platform to another, depending on the type of project I might be working on...
ATtiny2313, ATmega8 or 644, PIC16F886/887, PIC18, some 24F and dsPIC DSCs, Cortex-M3 more recently, and the one that still blows my mind is the PIC32 (the MX line is what I've been using).
I first came across it properly on the Pinguino board from Olimex (and what a dev board, I have to tell you... Arduino has a lot to learn from it), which was quite a change.
So let's start with what caught my attention.
The primary advantages of the 32-bit PICs over the 8-bit uCs are that they are faster (max clock rate of 80 MHz compared to 40 MHz, although DIP versions like the PIC32MX259f128b also run at 40 MHz), have more peripherals available, offer more program memory (flash) and data memory (RAM), and have significantly more computational capability thanks to the 32-bit address and data buses and the single-cycle multiply for 32-bit math.
As Jerry said in his blog, “Microchip didn’t follow in the footsteps of most the uC vendors by going with an ARM architecture. Instead, they went with MIPS (been around since mid-80’s and their cores are solid), and the M4K core.”
For a good comparison between the ARM Cortex-M3 and the PIC32, check Jerry’s blog ( http://www.gardnerdudes.com/blog/2012/04/09/pic32-review/ ).
The PIC32 uses a high-performance version of the Multiply and Divide hardware module, with its own autonomous pipeline. This means that, once a multiply or divide instruction is issued, the CPU may continue to fetch and execute the following instructions while the MDU (Multiply and Divide Unit) performs its calculation in parallel. There is one catch: if the CPU tries to read the result before the multiply or divide operation is complete, it stalls until the operation has finished.
Multiply and divide operations have different cycle counts:
• 16x16 or 32x16 multiply operations = 1 cycle
• Other multiply sizes = 2 cycles
• Divide operation = 11 to 32 cycles (depending on the dividend operand size)
By default, the PIC32 executes 32-bit instructions, but it can also use MIPS16e instructions. MIPS16e instructions are 16 bits wide and can save up to 40% of code size compared to the 32-bit instructions, at the cost of some performance; however, with the 128-bit wide prefetch cache, Microchip says some applications see no adverse impact.
The PIC32 architecture also brings a set of registers called SET, CLEAR, and INVERT.
When you write to any of these registers, the PIC32 performs the read-modify-write operation in a single clock, which lets you manipulate I/O ports and bits very quickly. This means you can toggle any general purpose I/O pin at SYSCLK speed! The bit manipulation is also atomic, meaning the SET, CLR, and INV operations cannot be interrupted.
For example, the LATA SFR is followed by LATACLR, LATASET, and LATAINV. To clear a group of bits in the LATA register, you would write the corresponding mask values into the LATACLR register.
Similarly, a write to the SET register sets the corresponding bits, and a write to the INV register toggles them.
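To make the mask behaviour concrete, here is a small sketch in plain C that models the three companion registers on a PC. The `port_model_t` type and the `write_clr`, `write_set` and `write_inv` helpers are names made up for this illustration; they only imitate what a write to LATACLR, LATASET or LATAINV does to LATA, and they say nothing about timing or atomicity.

```c
#include <stdint.h>

/* Hypothetical model of one port latch. On the real chip this would be
   the LATA SFR; here it is just a variable so the logic can be tried
   on any machine. */
typedef struct {
    uint32_t lat;   /* stands in for the LATA register */
} port_model_t;

/* Models a write to LATACLR: every 1 bit in the mask clears that bit in LATA. */
void write_clr(port_model_t *p, uint32_t mask)
{
    p->lat &= ~mask;
}

/* Models a write to LATASET: every 1 bit in the mask sets that bit in LATA. */
void write_set(port_model_t *p, uint32_t mask)
{
    p->lat |= mask;
}

/* Models a write to LATAINV: every 1 bit in the mask toggles that bit in LATA. */
void write_inv(port_model_t *p, uint32_t mask)
{
    p->lat ^= mask;
}
```

In every case the 0 bits of the mask leave the matching LATA bits alone, which is why no separate read of the port is needed. On the hardware itself, the equivalent of `write_clr(&p, 0x000F)` would simply be a single store of the mask to LATACLR.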
Two other instructions, Multiply-Add (MADD) and Multiply-Subtract (MSUB), are used to perform the multiply-accumulate and multiply-subtract operations. The MADD and MSUB operations are commonly used in DSP algorithms.
The MADD instruction multiplies two numbers and then adds the product to the current contents of the HI and LO registers.
Similarly, the MSUB instruction multiplies two operands and then subtracts the product from the HI and LO registers.
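A rough way to picture this is to treat HI and LO as one 64-bit accumulator, with HI as the upper 32 bits and LO as the lower 32 bits. The sketch below models that in portable C; `hilo_t`, `madd`, `msub`, `read_hi`, `read_lo` and `dot_product` are illustrative names of my own, not compiler intrinsics, and the model assumes signed operands and an accumulator that never overflows 64 bits.

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO register pair as a single
   64-bit signed accumulator (HI = upper half, LO = lower half). */
typedef struct {
    int64_t acc;
} hilo_t;

/* Models MADD: multiply rs by rt and add the product to HI/LO. */
void madd(hilo_t *r, int32_t rs, int32_t rt)
{
    r->acc += (int64_t)rs * rt;
}

/* Models MSUB: multiply rs by rt and subtract the product from HI/LO. */
void msub(hilo_t *r, int32_t rs, int32_t rt)
{
    r->acc -= (int64_t)rs * rt;
}

/* Upper 32 bits of the accumulator, as the HI register would hold them. */
uint32_t read_hi(const hilo_t *r)
{
    return (uint32_t)((uint64_t)r->acc >> 32);
}

/* Lower 32 bits of the accumulator, as the LO register would hold them. */
uint32_t read_lo(const hilo_t *r)
{
    return (uint32_t)(uint64_t)r->acc;
}

/* A dot product is the classic DSP use: one multiply-accumulate per sample. */
int64_t dot_product(const int32_t *a, const int32_t *b, int n)
{
    hilo_t r = { 0 };
    for (int i = 0; i < n; i++) {
        madd(&r, a[i], b[i]);
    }
    return r.acc;
}
```

The inner loop of `dot_product` is the pattern behind FIR filters and similar algorithms, which is why having the accumulate step folded into the multiply instruction matters for DSP work.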
According to chapter 3.2.2, “MULTIPLY/DIVIDE UNIT”, the PIC32 core includes a Multiply/Divide Unit (MDU) that contains a separate pipeline for multiply and divide operations.
The high-performance MDU consists of a 32x16 Booth-recoded multiplier, result/accumulation registers (HI and LO) and a divide state machine, along with all the necessary multiplexers and control logic.
The first number shown (‘32’ of 32x16) represents the rs operand. The second number (‘16’ of 32x16) represents the rt operand. The PIC32 core only checks the value of the latter (rt) operand to determine how many times the operation must pass through the multiplier. The 16x16 and 32x16 operations pass through the multiplier once. A 32x32 operation passes through the multiplier twice. The MDU allows a 16x16 or 32x16 multiply operation to be issued every clock cycle; 32x32 multiply operations can be issued every other clock cycle.
Appropriate interlocks are implemented to stall the issuance of back-to-back 32x32 multiply operations.
The multiply operand size is automatically determined by logic built into the MDU.
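One practical consequence is that operand order can matter for speed, since only rt is examined. The sketch below is a guess at that decision in C, assuming the check amounts to "does rt fit in a signed 16-bit value"; the exact test used by the hardware is not spelled out above, and `fits_in_16_bits` and `multiplier_passes` are hypothetical helper names.

```c
#include <stdint.h>

/* Returns 1 when the value can be represented as a signed 16-bit number.
   Assumption: this is how "16-bit operand" is decided; the real MDU logic
   may differ in detail. */
int fits_in_16_bits(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Estimates how many times a multiply goes through the 32x16 multiplier.
   Only rt is looked at, mirroring the description above: a 16-bit rt
   needs one pass, anything wider needs two. */
int multiplier_passes(int32_t rs, int32_t rt)
{
    (void)rs;   /* rs does not influence the pass count */
    return fits_in_16_bits(rt) ? 1 : 2;
}
```

Under this model, `multiplier_passes(1000000, 1000)` gives one pass while `multiplier_passes(1000, 1000000)` gives two, even though both compute the same product.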
Divide operations are implemented with a simple 1 bit per clock iterative algorithm. Any attempt to issue a subsequent MDU instruction while a divide is still active causes an IU pipeline stall until the divide operation is completed.
Also, in its review of the architecture, it says:
The PIC32 execution unit implements a load/store architecture with single-cycle ALU operations (logical, shift, add, subtract) and an autonomous multiply/divide unit. The core contains thirty-two 32-bit General Purpose Registers (GPRs) used for integer operations and address calculation.
The execution unit includes, among other things:
• 32-bit adder used for calculating the data address
• Address unit for calculating the next instruction address
• Leading Zero/One detect unit for implementing the CLZ and CLO instructions
• Arithmetic Logic Unit (ALU) for performing bitwise logical operations
• Shifter and store aligner
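For anyone who has not met the CLZ and CLO instructions from the list above, this is what they compute, written as a slow but portable C reference. The function names `clz32` and `clo32` are my own; on the chip each of these is a single instruction, and as far as I know the MIPS32 definition returns 32 when no matching bit is found, which this sketch follows.

```c
#include <stdint.h>

/* Count leading zeros: the number of 0 bits above the highest 1 bit.
   Returns 32 when x is 0. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    /* Walk from bit 31 downwards until a 1 bit is found or bits run out. */
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
        n++;
    }
    return n;
}

/* Count leading ones: the same count applied to the inverted value.
   Returns 32 when x is all ones. */
unsigned clo32(uint32_t x)
{
    return clz32(~x);
}
```

A common use is normalising a value before fixed-point math: `clz32(x)` tells you how far `x` can be shifted left without losing its top bit.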
As I love audio, DSP and the maths involved, I'm leaving you with some example benchmarks comparing the dsPIC and PIC32 DSP capabilities. The dsPIC is said to be much more efficient, despite the lower frequency of its core. I'll be getting a bit more into the dsPICs soon.
PS: PIC32 vs TI Stellaris http://electrodesigns.net/blog/pic32-vs-stellaris/