MF8A18 Microcode
MF8A18: MicroFPGA 8 bit engine rev A year 2018
Instr | Function | RISC-V code |
---|---|---|
ADD | Rd = Rd + Rs | ADD rd, rd, rs |
ADC | Rd = Rd + Rs + Carry | * |
SUB | Rd = Rd - Rs | SUB |
SUBI | Rd = Rd - imm | ADDI t0, r0, imm; SUB rd, rd, t0 |
SBC | Rd = Rd - Rs - Carry | * |
AND | Rd = Rd & Rs | AND rd, rd, rs |
ANDI | Rd = Rd & imm | ANDI rd, rd, imm |
OR | Rd = Rd \| Rs | OR rd, rd, rs |
ORI | Rd = Rd \| imm | ORI rd, rd, imm |
XOR | Rd = Rd ^ Rs | XOR rd, rd, rs |
ROR | Rd = Rd >> 1; Rd[7] = Carry | * |
SHR | Rd = Rd >> 1; Rd[7] = 0 | SRLI rd, rd, 1 |
ASR | Rd = Rd >> 1; Rd[7] = old Rd[7] | SRAI rd, rd, 1 |
SWAP | Rd = Rd[3:0,7:4] | * |
JMP | Jump (relative) | JAL r0, label |
SKP0 | Skip if Rd bit 0 | * |
SKP1 | Skip if Rd bit 1 | * |
BEQ | Branch if Zero set | * |
BNE | Branch if Zero clear | * |
BLT | * | |
BGE | * | |
STORE | RAM[R31:R30] = Rd | SB |
LOAD | Rd = RAM[R31:R30] | LB |
IN | Rd = input | |
OUT | output = Rd | |
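The rows marked `*` have no RISC-V code listed, so a small behavioural model may help pin down what they do on an 8-bit register. The following is a minimal Python sketch, not the actual MF8A18 RTL. The function names are hypothetical, SBC is assumed to subtract the carry as a borrow, and the carry-out of ROR (the bit shifted out of bit 0) is an assumption that the table does not state.

```python
MASK = 0xFF  # 8-bit data path


def adc(rd, rs, carry):
    """ADC: Rd = Rd + Rs + Carry. Returns (result, carry_out)."""
    total = rd + rs + carry
    return total & MASK, total >> 8


def sbc(rd, rs, carry):
    """SBC: Rd = Rd - Rs - Carry, with Carry treated as a borrow (assumed).

    Returns (result, borrow_out).
    """
    total = rd - rs - carry
    return total & MASK, 1 if total < 0 else 0


def ror(rd, carry):
    """ROR: shift right by one, the old Carry enters bit 7.

    The returned carry_out (old bit 0) is an assumption.
    """
    return (rd >> 1) | (carry << 7), rd & 1


def swap(rd):
    """SWAP: exchange the high and low nibbles."""
    return ((rd << 4) | (rd >> 4)) & MASK
```

For example, `swap(0xA5)` gives `0x5A`, and `adc(0xFF, 0x01, 0)` wraps to `0x00` with a carry-out of 1.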
Emulated instructions
Instr | Emulated as |
---|---|
Compare | WREG=const; WREG=WREG-value |
ROL | Rd = Rd + Rd |
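The two emulations above can be sketched in Python as follows. This is an illustrative model under assumptions: the helper names are hypothetical, and the Zero flag is assumed to be derived from the 8-bit result. Note that `Rd + Rd` shifts a 0 into bit 0; bit 7 ends up in the carry, so a rotate through carry would presumably need ADC instead of ADD.

```python
def add8(a, b):
    """8-bit add. Returns (result, carry_out)."""
    total = a + b
    return total & 0xFF, total >> 8


def rol_emulated(rd):
    """ROL emulated as Rd = Rd + Rd: bit 7 moves to carry, bit 0 becomes 0."""
    return add8(rd, rd)


def compare_emulated(const, value):
    """Compare emulated as WREG = const; WREG = WREG - value.

    Returns (wreg, zero), where zero is True when const == value (assumed flag).
    """
    wreg = (const - value) & 0xFF
    return wreg, wreg == 0
```

With this model, `rol_emulated(0x81)` returns `(0x02, 1)` and `compare_emulated(5, 5)` returns `(0, True)`.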
Optimization step | LUT | Slice | Comment |
---|---|---|---|
Initial | 187 | 52 | ? |
PC unit # 1 | 181 | 55 | PC Unit from 25/10 to 10/3 |
PC unit # 2a | 182 | 52 | PC Unit from 25/10 to 14/5 |
PC unit # 2b | 179 | 52 | PC Unit from 25/10 to 12/5 |
PC unit # 2c | 179 | 52 | PC Unit from 25/10 to 11/4 |
misc #1 | 165 | 48 | cleaned up a bit |
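To make the trend in the table easier to read, the change of each step relative to the initial design can be computed directly. The short Python sketch below only uses the LUT and Slice numbers from the table; the variable and function names are illustrative.

```python
# (step, LUT, Slice) copied from the table above
steps = [
    ("Initial", 187, 52),
    ("PC unit # 1", 181, 55),
    ("PC unit # 2a", 182, 52),
    ("PC unit # 2b", 179, 52),
    ("PC unit # 2c", 179, 52),
    ("misc #1", 165, 48),
]


def deltas(rows):
    """Return (step, LUT change, Slice change) relative to the first row."""
    base_lut, base_slice = rows[0][1], rows[0][2]
    return [(name, lut - base_lut, sl - base_slice) for name, lut, sl in rows]


for name, d_lut, d_slice in deltas(steps):
    print(f"{name:14s} LUT {d_lut:+4d}  Slice {d_slice:+3d}")
```

The first PC unit step saves 6 LUTs but costs 3 slices, while the last step is 22 LUTs and 4 slices below the initial design.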
The PC unit should need 10 LUT5, 3 x CARRY4 and 10 FF and fit into 2 slices, but it initially used 25 LUTs and 5 slices. The first attempt to optimize the PC unit reduced its own resources, but the overall design then used more LUTs and more slices than before the optimization. Forcing the PC unit into 2 slices and 10 LUT5 reduced the overall LUT count from 187 to 181, but used more slices than the initial design. One possible further optimization was found: the redesigned PC unit uses only 4 inputs of each LUT5, so one input is free in all 10 LUTs, but because "keep hierarchy" was set, logic from the higher level could not be merged into the LUTs inside the PC unit. A bigger issue is merging the offset select multiplexer into the PC unit, which keep hierarchy prevents as well.
Note: 6 CARRY4 are used because one extra CARRY4 serves as a route-through for the addsub unit; Xilinx optimization does not use transparent latches as route-throughs. This CARRY4 could also be optimized away with a manually implemented ALU/addsub.
The offset selector has been manually merged into the PC unit, and the main PC logic now seems to be optimized. There are 3 extra LUTs: one inverts the active-low reset, and the other two generate the freeze and carry-in logic values. The next step would be to use an active-high reset.
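As a rough behavioural illustration of a PC unit with a merged offset selector, the Python sketch below models one clock step. It is a sketch under several assumptions rather than the actual design: the 10-bit width is inferred from the 10 LUT5 / 10 FF figures, the signal names (`take_branch`, `freeze`, `reset_n`) are hypothetical, and the carry-in is assumed to supply the +1 increment.

```python
PC_BITS = 10                     # assumed from the 10 LUT5 / 10 FF figures
PC_MASK = (1 << PC_BITS) - 1


def next_pc(pc, offset, take_branch, freeze, reset_n=1):
    """Compute the next PC value for one clock step (behavioural sketch).

    reset_n     : active-low reset, forces the PC to 0 when low
    freeze      : holds the PC at its current value
    take_branch : selects the relative offset instead of 0 (the merged mux)
    """
    if not reset_n:
        return 0
    if freeze:
        return pc
    selected = offset if take_branch else 0  # offset select multiplexer
    carry_in = 1                             # assumed: carry-in provides the +1
    return (pc + selected + carry_in) & PC_MASK
```

In this model, `next_pc(5, 0, False, False)` steps to 6, and a negative offset wraps correctly because the result is masked to 10 bits.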
Various RTL cleanups were tried; several times, reducing the logic complexity added new LUTs to the final design. Some cleanup is still possible to save a few more LUTs/FFs.