Mcl86jr correctness fixes by mx-shift · Pull Request #55 · MicroCoreLabs/Projects

mx-shift · 2026-05-21T04:50:37Z

A variety of Verilog and microcode fixes to match documented Intel 8088 behavior. The currently released version (with microcode V4) hangs before reaching TOPBENCH's main menu. Tracking down the cause led me to build a system for running MCL86jr under Verilator in a harness that keeps it in lock-step, at the instruction level, with an actual Intel 8088. Differences in cycle timing or undocumented behavior were deliberately left unchanged. Under that harness, I've tested up to 150M instructions of TOPBENCH, MSFS2, King's Quest 1, Cartridge BASIC, the Lotus 1-2-3 cartridge, and Turbo Pascal compilation.

Real 8088 HALT is documented as one T1 phase (ALE pulse + S2:S0=011 status) and does NOT wait for READY. The original BIU runs HALT through the normal data-bus states 0x02..0x0A, including the READY-wait at state 0x07. There is no architectural reason for a host platform to drive READY in response to a HALT cycle -- HLT is an internal CPU state, not a bus transaction that needs acknowledgment -- so any HLT instruction risks stalling the BIU indefinitely on a board that doesn't happen to assert READY anyway. Once the BIU is stalled, it stops servicing PFQ refills and the EU's next fetch wedges.

Fix: in HALT state 0x18, set s_bits=3'b011 (HALT status) and signal biu_done immediately, jumping straight to bus-idle at 0x0B without asserting ALE or driving an address. This matches the documented 8088 HALT semantics ("just one T1"). Behaviorally identical on platforms whose bus would have ack'd HALT quickly anyway -- the bus-cycle states for HALT didn't write or read anything, they just consumed clock cycles.

JMP FAR / CALL FAR / RET FAR microcode strobes the BIU twice: first an "update CS" strobe (eu_biu_strobe=2'b11, code 3'h2), then the actual "jump request" (eu_biu_req_code=5'h19) which atomically updates pfq_addr_in to the new IP. Real 8088 microcode treats CS+IP as a single update from the BIU's point of view; MCL86jr's microcode strobes them separately, and without staging the BIU briefly sees new CS + old pfq_addr_in. Any prefetch dispatched in that window forms an address as (new_CS << 4) + old_pfq_addr_in -- a phantom fetch that has no analogue on real 8088. Visible at the PCjr reset vector as a fetch at F000:0005 (new CS=F000, pfq_addr_in was 5 from FFFF4 increment) instead of either the real-8088 prefetch overshoot at FFFF:0005 or the clean jump to F000:0043.

Stage the CS update. EU strobe code 3'h2 now writes biu_register_cs_staged + sets cs_update_staged; biu_register_cs itself stays at its old value. The matching JMP request commits both atomically on the same clock edge:

    if (eu_biu_req_caught==1'b1 && eu_biu_req_code==5'h19) begin
        pfq_addr_in <= eu_register_r3_d;
        if (cs_update_staged) begin
            biu_register_cs  <= biu_register_cs_staged;
            cs_update_staged <= 1'b0;
        end
    end

Side effect: prefetches dispatched in the gap between CS strobe and JMP request now see the OLD CS, producing the real-8088 prefetch overshoot at (old_CS << 4) + old_pfq_addr_in -- the byte gets discarded after the JMP-driven flush, just like real hardware.

Visible at the realistic 110 MHz : 4.77 MHz core_clk:CLK ratio (matches the FPGA PLL); a smaller ratio masks the bug because the in-flight prefetch has time to complete on the old CS before the EU finishes JMP decode.
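The address arithmetic behind the phantom fetch can be checked with a small behavioral model. This is only a sketch in Python, not the BIU's Verilog: the `Biu` class, its method names, and the `prefetch_in_gap` helper are hypothetical, and the model ignores clocking entirely. It only shows which address a prefetch would form in the gap between the CS strobe and the jump request, with and without staging.

```python
def linear(seg: int, off: int) -> int:
    """8088 physical address: (seg << 4) + off, wrapped to 20 bits."""
    return ((seg << 4) + off) & 0xFFFFF


class Biu:
    """Hypothetical, untimed model of the CS / pfq_addr_in update order."""

    def __init__(self, cs: int, pfq_addr: int, staged: bool):
        self.cs = cs
        self.pfq_addr = pfq_addr
        self.staged = staged      # True models the fix, False the old behavior
        self.cs_staged = None     # pending CS value, if any

    def strobe_cs(self, new_cs: int) -> None:
        """EU 'update CS' strobe."""
        if self.staged:
            self.cs_staged = new_cs   # hold the new CS until the jump request
        else:
            self.cs = new_cs          # old behavior: CS changes immediately

    def jump_request(self, new_ip: int) -> None:
        """EU 'jump request': commit IP and any staged CS together."""
        self.pfq_addr = new_ip
        if self.cs_staged is not None:
            self.cs = self.cs_staged
            self.cs_staged = None

    def prefetch_address(self) -> int:
        return linear(self.cs, self.pfq_addr)


def prefetch_in_gap(staged: bool) -> int:
    """Address of a prefetch issued between the CS strobe and the jump."""
    biu = Biu(cs=0xFFFF, pfq_addr=0x0005, staged=staged)
    biu.strobe_cs(0xF000)
    return biu.prefetch_address()


print(hex(prefetch_in_gap(staged=False)))  # phantom fetch, F000:0005
print(hex(prefetch_in_gap(staged=True)))   # overshoot on old CS, FFFF:0005
```

With the values from the PCjr reset-vector case, the unstaged model forms 0xF0005 and the staged model forms 0xFFFF5; after the jump request both end up at F000:0043.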

Standalone Python tool that parses the `p`-line triplet format used by Microcode_MCL86.txt and emits a Xilinx .coe (memory_initialization_vector) file or a Verilog $readmemh .mem file.

Usage:

    python3 MCL86/Core/ucode_assembler.py assemble Microcode_MCL86.txt \
        --coe MCL86_Microcode_Xilinx_Version_<N>.coe
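For readers unfamiliar with the two output formats, here is a rough sketch of what emitting them involves. It is not taken from ucode_assembler.py: the function names `words_to_coe` and `words_to_mem` are hypothetical, the 8-hex-digit default word width is an assumption, and parsing of the `p`-line input is deliberately left out because its layout is not described here.

```python
def words_to_coe(words, digits=8):
    """Render ROM words as Xilinx .coe text (hex radix, one word per line).

    `digits` is the number of hex digits per word; 8 is an assumed default.
    """
    body = ",\n".join(f"{w:0{digits}X}" for w in words)
    return (
        "memory_initialization_radix=16;\n"
        "memory_initialization_vector=\n"
        f"{body};\n"
    )


def words_to_mem(words, digits=8):
    """Render ROM words as a $readmemh-style file: one hex word per line."""
    return "".join(f"{w:0{digits}x}\n" for w in words)


if __name__ == "__main__":
    rom = [0x00000001, 0x0000ABCD]   # made-up words, not real microcode
    print(words_to_coe(rom), end="")
    print(words_to_mem(rom), end="")
```

Emitting both formats from one list of words keeps the FPGA image and the simulation image from drifting apart.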

The original procedures stashed the segment-override prefix bit in eu_flags bit 4 (AF) across the BIU operation. That corrupts the architectural AF on every string-op call site -- the 8086 spec says string ops preserve flags, but every MOVS/LODS/CMPS would clear AF (path B, no override) or force-set it to 1 (path A, override). This manifests during PCjr BIOS POST: after `INC DI 0x4F->0x50` sets AF=1, the next MOVSW iteration runs STORE_SEG_PREFIX path B and clears AF.

Switch to r2 as the scratch register -- it's dead during string ops, since the INC16/DEC16 flag-processing path that uses r2 isn't called between STORE and RESTORE. (r0 was tried first but conflicted with CMPSB/CMPSW, which load r0 with the source byte at 0x0645 right before the STORE_SEG_PREFIX call at 0x0646.)

Real 8086 delays interrupts by one instruction only after MOV/POP to SS (atomic stack reload paired with MOV SP, ...). MCL86jr's microcode applied that delay to all segment-register loads -- POP ES / POP CS / POP DS / MOV ES / MOV CS / MOV DS -- by routing them through the legacy 0x0470-0x0478 ending whose final jump goes to 0x0011 (skipping the 0x0006-0x000B interrupt-poll section of the main fetch loop). That made the EU silently swallow any hardware interrupt that arrived during one of these instructions.

Fix: split the common ending into two:

* 0x0470-0x0478 (legacy): final jump to 0x0011 (skip INT). Used by POP SS and MOV SS to preserve the architecturally-required 1-instruction interrupt delay.
* 0x0FC7-0x0FCF (new): mirror with final jump to 0x0006 (with INT poll).

POP ES/CS/DS handlers redirect their final jumps from 0x0470 to 0x0FC7. MOV ES/CS/DS reach the new tail via 0x0D63's redirected jump (0x0474 -> 0x0FCB, the PFQ-wait entry of the new tail). MOV SS bypasses 0x0D62/0x0D63 entirely by jumping from 0x0D6B to a new SS-specific debounce + skip-INT path at 0x0FD0-0x0FD1.

LDS/LES (0xC4/0xC5) already jumped to 0x0004 with INT poll, so they were already correct.

Real 8086 byte-2 rule: byte_2_linear = seg*16 + ((offset+1) & 0xFFFF). The offset wraps within the segment; the linear address does not. Two cases need to be handled correctly:

- byte_1 at any 0x?FFFF linear address with a non-paragraph-aligned segment: byte_2 needs the carry past bit 15 to propagate through the upper 4 bits of addr_out_temp. Hit by code that does word reads across a 64KB linear boundary using a non-paragraph-aligned segment (e.g. LODSW at DS=4EFE SI=101F crosses 0x4FFFF/0x50000).
- offset == 0xFFFF: byte_2 wraps back to seg*16 (offset 0 in same segment). The PCjr BIOS hits this on end-of-segment word reads with paragraph-aligned segments; a naive 20-bit linear+1 increment would land in the next paragraph instead.

Implementation: latch byte-1's offset (= eu_register_r3_d[15:0]) into a new biu_byte1_offset register at every BIU dispatch path, then at the byte-2 increment in state 0x0A (and the parallel SRAM fast path in state 0x16):

    if (byte1_offset == 0xFFFF)
        addr_out_temp <= addr_out_temp - 0xFFFF;  // back to seg*16
    else
        addr_out_temp <= addr_out_temp + 1;       // full 20-bit
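As a sanity check on the rule and on the incremental form used in the BIU, the two can be compared in Python. This is a sketch, not the Verilog: the function names are hypothetical, and masking results to 20 bits is an assumption based on the 8088's 20 address lines.

```python
def byte1_linear(seg: int, offset: int) -> int:
    """Physical address of the first byte of a word access."""
    return ((seg << 4) + offset) & 0xFFFFF


def byte2_linear(seg: int, offset: int) -> int:
    """Byte-2 rule from the text: the offset wraps at 16 bits, then add seg*16."""
    return ((seg << 4) + ((offset + 1) & 0xFFFF)) & 0xFFFFF


def byte2_incremental(addr: int, byte1_offset: int) -> int:
    """Same result computed from byte 1's address plus its latched offset."""
    if byte1_offset == 0xFFFF:
        return (addr - 0xFFFF) & 0xFFFFF   # back to seg*16
    return (addr + 1) & 0xFFFFF            # full 20-bit increment


# LODSW at DS=4EFE SI=101F: the carry must cross bit 15 of the address.
print(hex(byte1_linear(0x4EFE, 0x101F)), hex(byte2_linear(0x4EFE, 0x101F)))

# End-of-segment word read at F000:FFFF: byte 2 wraps to F000:0000.
print(hex(byte1_linear(0xF000, 0xFFFF)), hex(byte2_linear(0xF000, 0xFFFF)))
```

For the first case byte 1 is at 0x4FFFF and byte 2 at 0x50000; for the second, byte 1 is at 0xFFFFF and byte 2 at 0xF0000. In both cases the incremental form agrees with the direct formula.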

Real 8086 only delays interrupt recognition after STI / MOV SS / POP SS. MCL86jr was also deferring after IRET, which broke the BIOS timer ISR's return path: IRET pops FLAGS (with IF=1) and the very next 18.2 Hz tick that lands during the return-from-ISR should be taken immediately. MCL86jr was skipping that poll and the tick got dropped until the next real new_instruction edge.

Two pieces needed fixing in tandem:

1. Microcode (Microcode_MCL86.txt). IRET's terminating jump went to 0x0474 (the legacy skip-INT path used by POP/MOV SS). Redirected to the with-INT-poll ending at 0x0FCB.
2. EU (eu.v). intr_enable_delayed only updates on new_instruction edges, so even with the with-INT-poll path the IRET microcode's post-FLAGS-load INT poll would still see IF=0 from before the restore. eu.v now detects the "OR r0 into FLAGS" microcode step (at ROM addr 0x0532; BRAM has 1-cycle synchronous read so the detection is gated on eu_rom_address == 0x0533) and syncs intr_enable_delayed in-cycle from bit 9 of the ALU output. Must come BEFORE the eu_flag_i==0 short-circuit because eu_flag_i is the OLD value at this point.

Caught running King's Quest I on PCjr under the cosim: the BIOS timer ISR IRETs to user code, and the next 18.2 Hz tick races the return.

AAM (0xD4) and AAD (0xD5) microcode populated alu_last_result with the full AX register and called All_Flags_WORD, which uses the 16-bit domain for SF (bit 15) and ZF (whole word). Per Intel docs, AAM/AAD set SF, ZF, PF on AL only (CF, OF, AF are undefined).

For AAM, the post-instruction AX has AH=quotient, AL=remainder. When AL=0 but AH!=0 (e.g. AL_pre=0x50, AAM 10 -> AX=0x0800), All_Flags_WORD saw alu_last=0x0800 != 0 and left ZF=0. Real 8086 sets ZF=1 because AL=0. Caught by the Lotus 1-2-3 cosim at op=4901806 in the CS:IP F000:19CB area; the worker pushed ZF=0 (CF=1, AF=1 -- undefined and masked) while the host had ZF=1.

For AAD the body already zeroes AH, so AX AND 0xFF == AX and the flag domain doesn't actually matter in practice, but fix it consistently so the SF check is on bit 7 of AL, not bit 15 of AX, and the microcode reads idiomatically.

Both ops now AND AX with 0x00FF into alu_last_result and call CALCULATE_FLAGS_BYTE (the byte-domain path used by OR/AND/XOR).

Assembled output ships as Version_8.coe in both MCL86/Core and MCL86jr/FPGA/src4synth (4-word diff vs Version_7 at ROM addresses 0x2BF / 0x2C0 / 0x2D7 / 0x2D8, the AAM and AAD microcode entries).
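The AAM case from the Lotus 1-2-3 trace is easy to reproduce by hand. The following Python sketch models only the architectural result, not the microcode: `aam`, `parity_even` and `word_domain_zf` are hypothetical names, and only the defined flags (SF, ZF, PF) are computed.

```python
def parity_even(value: int) -> bool:
    """PF is set when the low byte has an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0


def aam(al: int, base: int = 10):
    """AAM: AH = AL // base, AL = AL % base; SF/ZF/PF come from AL only."""
    ah, al_new = divmod(al & 0xFF, base)
    ax = (ah << 8) | al_new
    flags = {
        "ZF": al_new == 0,
        "SF": bool(al_new & 0x80),
        "PF": parity_even(al_new),
    }
    return ax, flags


def word_domain_zf(ax: int) -> bool:
    """What the old All_Flags_WORD path effectively computed for ZF."""
    return (ax & 0xFFFF) == 0


ax, flags = aam(0x50)            # the AL_pre=0x50, AAM 10 case from the text
print(hex(ax), flags["ZF"], word_domain_zf(ax))
```

For AL=0x50 the result is AX=0x0800: the byte-domain ZF is set because AL is zero, while the word-domain check sees a non-zero AX and leaves ZF clear, which is the mismatch the cosim flagged.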

mx-shift added 8 commits May 9, 2026 22:40

mx-shift force-pushed the mcl86jr-fixes branch from 2b4f132 to ff88e39 Compare May 21, 2026 04:51

mx-shift wants to merge 8 commits into MicroCoreLabs:master from mx-shift:mcl86jr-fixes
