MF8A18 Microcode

Antti Lukats edited this page Dec 28, 2018 · 23 revisions

MF8A18: MicroFPGA 8 bit engine rev A year 2018

| Instr | Function | RISC-V code |
|---|---|---|
| ADD | Rd = Rd + Rs | ADD rd, rd, rs |
| ADC | Rd = Rd + Rs + Carry | * |
| SUB | Rd = Rd - Rs | SUB rd, rd, rs |
| SUBI | Rd = Rd - imm | ADDI t0, r0, imm; SUB rd, rd, t0 |
| SBC | Rd = Rd - Rs - Carry | * |
| AND | | AND rd, rd, rs |
| ANDI | | ANDI rd, rd, imm |
| OR | | OR rd, rd, rs |
| ORI | | ORI rd, rd, imm |
| XOR | | XOR rd, rd, rs |
| ROR | Rd = Rd >> 1; Rd[7] = Carry | * |
| SHR | Rd = Rd >> 1; Rd[7] = 0 | SRLI rd, rd, 1 |
| ASR | Rd = Rd >> 1; Rd[7] = Rd[7] | SRAI rd, rd, 1 |
| SWAP | Rd = Rd[3:0,7:4] | * |
| JMP | Jump (relative) | JAL r0, label |
| SKP0 | Skip if Rd bit 0 | * |
| SKP1 | Skip if Rd bit 1 | * |
| BEQ | Branch if Zero set | * |
| BNE | Branch if Zero clear | * |
| BLT | | * |
| BGE | | * |
| STORE | RAM[R31:R30] = Rd | SB |
| LOAD | Rd = RAM[R31:R30] | LB |
| IN | Rd = input | |
| OUT | output = Rd | |
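The shift, rotate and swap rows can be checked with a small software model. The following is a minimal Python sketch, not the MF8A18 RTL: the function names are hypothetical, registers are assumed to hold values 0..255, and only the result byte is modelled (how the flags are updated is not stated in the table, so it is left out).

```python
MASK = 0xFF  # registers are 8 bits wide


def ror(rd: int, carry: int) -> int:
    """ROR: Rd = Rd >> 1, with the incoming Carry placed in bit 7."""
    return ((rd >> 1) | ((carry & 1) << 7)) & MASK


def shr(rd: int) -> int:
    """SHR: Rd = Rd >> 1, bit 7 cleared (logical shift)."""
    return (rd >> 1) & MASK


def asr(rd: int) -> int:
    """ASR: Rd = Rd >> 1, bit 7 keeps its old value (arithmetic shift)."""
    return ((rd >> 1) | (rd & 0x80)) & MASK


def swap(rd: int) -> int:
    """SWAP: exchange the high and low nibbles, Rd = Rd[3:0,7:4]."""
    return ((rd << 4) | (rd >> 4)) & MASK


if __name__ == "__main__":
    print(hex(ror(0x01, 1)))  # 0x80
    print(hex(shr(0x81)))     # 0x40
    print(hex(asr(0x81)))     # 0xc0
    print(hex(swap(0xA5)))    # 0x5a
```

SHR and ASR differ only in what enters bit 7, which matches the SRLI/SRAI pairing in the RISC-V column.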

Emulated instructions

| Instr | Emulated as |
|---|---|
| Compare | WREG = const; WREG = WREG - value |
| ROL | Rd = Rd + Rd |
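The two emulations above can be sketched in Python to show why they work. This is an illustrative model under stated assumptions: the helper names are hypothetical, an 8-bit add is assumed to produce a carry-out in bit 8 of the sum, and the compare is assumed to set the Zero flag when the subtraction result is zero (the table only lists the instruction sequence).

```python
def add8(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """8-bit add; returns (result, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def rol_emulated(rd: int) -> tuple[int, int]:
    """ROL emulated as Rd = Rd + Rd: doubling shifts every bit left by one."""
    return add8(rd, rd)


def compare_emulated(const: int, value: int) -> tuple[int, bool]:
    """Compare emulated as WREG = const; WREG = WREG - value.

    Returns (wreg, zero); zero is True when const == value (assumed flag).
    """
    wreg = (const - value) & 0xFF
    return wreg, wreg == 0


if __name__ == "__main__":
    print(rol_emulated(0x81))            # (2, 1): old bit 7 ends up in carry
    print(compare_emulated(0x10, 0x10))  # (0, True)
```

With a plain add, bit 0 of the result is always 0 and the old bit 7 leaves through the carry-out. Feeding the previous carry back into bit 0 would need the ADC form (Rd + Rd + Carry); whether the emulation intends that is not stated here.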

Optimization Xilinx LUT6

| Optimization step | LUT | Slice | Comment |
|---|---|---|---|
| Initial | 187 | 52 | ? |
| PC unit # 1 | 181 | 55 | PC Unit from 25/10 to 10/3 |
| PC unit # 2a | 182 | 52 | PC Unit from 25/10 to 14/5 |
| PC unit # 2b | 179 | 52 | PC Unit from 25/10 to 12/5 |
| PC unit # 2c | 179 | 52 | PC Unit from 25/10 to 11/4 |
| misc #1 | 165 | 48 | cleaned up a bit |

PC unit optimization step 1

The PC unit should need 10 LUT5, 3 x CARRY4 and 10 FF and fit into 2 Slices, but initially it took 25 LUT and 5 Slices. The first attempt to optimize the PC unit reduced its own resources, but the overall design used more LUT and more Slices than before the optimization. Forcing the PC unit into 2 Slices and 10 LUT5 brought the overall LUT count down from 187 to 181, but still used more Slices than the initial design. One possible further optimization was found: the redesigned PC unit uses only 4 inputs of each LUT5, so one input is free in all 10 LUT, but because "keep hierarchy" was set, logic from the higher level could not be merged into the LUT inside the PC unit. A bigger issue is merging the offset select multiplexer into the PC unit, which keep hierarchy prevents as well.
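The target figures for the PC unit follow from its width. As a rough Python sketch (the function name is hypothetical), assuming one LUT and one FF per PC bit and four carry bits per CARRY4 primitive:

```python
import math


def pc_unit_estimate(pc_bits: int) -> dict:
    """Ideal primitive count for a pc_bits-wide adder-based PC.

    Assumes one LUT and one FF per bit, and 4 carry bits per CARRY4.
    """
    return {
        "lut": pc_bits,
        "carry4": math.ceil(pc_bits / 4),
        "ff": pc_bits,
    }


if __name__ == "__main__":
    print(pc_unit_estimate(10))  # {'lut': 10, 'carry4': 3, 'ff': 10}
```

This only counts primitives; how they pack into Slices depends on the device and on the placer, which the estimate does not model.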

Note: 6 CARRY4 are used because one extra CARRY4 serves as a route-through for the addsub unit; the Xilinx optimization does not use transparent latches as route-through. This CARRY4 could also be optimized away with a manually built ALU/addsub.

PC unit optimization step 2 a/b/c

The offset selector was manually merged into the PC unit. The main PC logic now appears to be optimized; 3 extra LUT remain: one inverts the active-low reset, and the other two generate the freeze and carry-in logic values. The next step would be to use an active-high reset.
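To make the roles of the offset selector, freeze and reset concrete, here is a behavioral Python sketch of a 10-bit PC. It is an assumption-heavy illustration, not the MF8A18 RTL: the signal names, the reset value of 0, the priority of reset over freeze, and a sequential step of +1 are all hypothetical.

```python
PC_BITS = 10                  # width taken from the 10 LUT5 / 10 FF figure
PC_MASK = (1 << PC_BITS) - 1


def next_pc(pc: int, reset: bool, freeze: bool,
            take_jump: bool, offset: int) -> int:
    """Next PC value for one clock cycle (all control names are assumed)."""
    if reset:                 # assumed: reset has priority and clears the PC
        return 0
    if freeze:                # assumed: freeze holds the current value
        return pc
    # Offset selector: relative jump offset or the sequential step of 1.
    step = offset if take_jump else 1
    return (pc + step) & PC_MASK  # wrap to 10 bits; negative offsets work too


if __name__ == "__main__":
    print(next_pc(5, False, False, False, 0))   # 6
    print(next_pc(5, False, False, True, -3))   # 2
    print(next_pc(5, False, True, True, -3))    # 5
```

Folding the selector into the same function as the adder mirrors the idea of merging the multiplexer into the PC unit, so the select and the add can share one LUT per bit.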

Misc optimization step 1

Various cleanups of the RTL. Several times, reducing logic complexity added new LUT to the final design. Some further cleanup is still possible to save a few more LUT/FF.