The "Big Golf" Microarchitecture


Allowed memory combinations:
  * Any two loads
  * Any two stores with different addresses (n.b. LLC is limited to 1 eviction per cycle)
  * Any load with any younger store
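
The three rules above can be sketched as a pairing predicate. This is a
minimal illustration, not the arbiter itself: the MemOp record and its
"age" field (smaller means older in program order) are hypothetical names,
and the LLC one-eviction-per-cycle limit is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    is_store: bool
    addr: int  # memory address of the access
    age: int   # hypothetical encoding: smaller = older in program order


def can_pair(a: MemOp, b: MemOp) -> bool:
    """True if a and b may share a cycle under the three rules above."""
    if not a.is_store and not b.is_store:
        return True                  # any two loads
    if a.is_store and b.is_store:
        return a.addr != b.addr      # two stores need different addresses
    load, store = (a, b) if b.is_store else (b, a)
    return store.age > load.age      # a load may pair with a younger store
```

Note that a load paired with an older store is rejected even when the
addresses differ, because the rules as written do not allow it.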

Instruction opcodes:
  0 AND logical AND from memory to accumulator
  1 TAD Two's-complement ADd from memory to accumulator
  2 ISZ Increment and Skip if Zero
  3 DCA Deposit and Clear Accumulator
  4 JMS JuMp Subroutine
  5 JMP JuMP
  6 IOT In-Out Transfer (device accesses)
  7 OPR microsequenced OPeRations (miscellaneous, like clear/rotate/etc)
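
As a small sketch of how Decode might pick the opcode out of a 12-bit
word: the opcode is bits 0-2 (bit 0 being the most significant), and for
the memory-reference opcodes 0-5 bit 3 is the indirect flag. The function
name and return shape are illustrative only.

```python
MNEMONICS = ("AND", "TAD", "ISZ", "DCA", "JMS", "JMP", "IOT", "OPR")


def decode(word):
    """Return (mnemonic, indirect) for a 12-bit instruction word."""
    opcode = (word >> 9) & 0o7          # bits 0-2
    # Bit 3 (mask 0o400) means "indirect" only for opcodes 0-5;
    # IOT and OPR reuse that bit for other purposes.
    indirect = opcode < 6 and bool(word & 0o400)
    return MNEMONICS[opcode], indirect
```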

Memory transactions:         Direct   Indirect   (opcodes that perform each)
  * Fetch instruction        01234567 01234567
  * Indirect address load             0123
  * Autoincrement store               0123
  * Execution load           012      012 45
  * Execution store            234      234
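
The table above can be transcribed directly into a lookup. This sketch
copies the rows as written; the row keys and the "autoinc" flag are
hypothetical names, and the flag assumes the autoincrement store only
happens when the pointer lies in an autoincrement location.

```python
# One entry per table row: (opcodes for direct forms, opcodes for indirect forms).
TRANSACTIONS = {
    "fetch":         (set(range(8)), set(range(8))),
    "indirect_load": (set(),         {0, 1, 2, 3}),
    "autoinc_store": (set(),         {0, 1, 2, 3}),
    "exec_load":     ({0, 1, 2},     {0, 1, 2, 4, 5}),
    "exec_store":    ({2, 3, 4},     {2, 3, 4}),
}


def transactions(opcode, indirect, autoinc=False):
    """List the memory transactions one instruction performs, in table order."""
    col = 1 if indirect else 0
    out = []
    for name, cols in TRANSACTIONS.items():
        if opcode not in cols[col]:
            continue
        if name == "autoinc_store" and not autoinc:
            continue  # only taken when the pointer autoincrements
        out.append(name)
    return out
```

For example, an indirect ISZ through an autoincrement location performs
all five transactions, which is the worst case the pipeline must arbitrate.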


┌─────┐       ┌──────┐                  ┌────┐
│Fetch├──────►│Decode│               ┌─►│Exec│
└─────┘       └──────┘               │  └────┘
                                     │
 next_pc   ┌───init_indirect_load    │  init_execution_store
           │   init_execution_load───┤  retire
           │   init_execution_store  │
           │   retire                │
           │   rubberband_stall(1/2) │
           │                         │
           │  ┌───────┐              │
           └─►│Autoinc│              │
              └───────┘              │
                                     │
           ┌───init_autoinc_store    │
           │   init_execution_load───┤
           │   init_execution_store  │
           │   retire                │
           │                         │
           │  ┌─────┐                │
           └─►│Indir│                │
              └─────┘                │
                                     │
               init_execution_load───┘
               init_execution_store
               retire


Possible arbitration techniques:
  * Rubberband stalling in Decode + positional arbitration
  * Age/address/operation comparison without rubberbanding
    * Longer clock cycles, or
    * Extra cycle

What to do with cache misses?
  * Stall entire pipeline to maintain simpler ordering constraints
  * If only loads miss, allow everything else to proceed?
  * Always allow Fetch to proceed?

Need separate logic to detect SMC (self-modifying code) clobbers *anyway*


OPR opcodes:

    "group 1"
         _0___1___2_ _3_ _4_ _5_ _6_ _7_ _8_ _9_ _10 _11
        |           |   |   |   |   |   |RAR|RAL| 0 |   |
        | 1   1   1 | 0 |CLA|CLL|CMA|CML|RTR|RTL| 1 |IAC|
        |___|___|___|___|___|___|___|___|___|___|___|___|

        CLA CLear Accumulator
        CLL CLear Link
        CMA CoMplement Accumulator
        CML CoMplement Link
        RAR Rotate Accumulator and link Right (if bit 10 is 0)
        RAL Rotate Accumulator and link Left (if bit 10 is 0)
        RTR Rotate (Twice) accumulator and link Right (if bit 10 is 1)
        RTL Rotate (Twice) accumulator and link Left (if bit 10 is 1)
        IAC Increment ACcumulator
        BSW Byte Swap word in accumulator (if bits 8 and 9 are 0, and bit 10 is 1)

        Logical order of operations:
            CLA, CLL
            CMA, CML
            IAC
            RAR, RAL, RTR, RTL, BSW
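
A minimal sketch of the group 1 ordering above as a software model. The
function names are illustrative. Two behaviors are assumptions taken from
standard PDP-8 behavior rather than from these notes: rotates pass through
the link as a 13-bit quantity, and a carry out of IAC complements the
link. The combination with both bit 8 and bit 9 set is not covered here
and is treated as a no-op.

```python
def bit(word, n):
    """PDP-8 numbering: bit 0 is the most significant of 12."""
    return (word >> (11 - n)) & 1


def rotate_left(ac, link):
    v = (link << 12) | ac                       # 13-bit link:accumulator
    v = ((v << 1) | (v >> 12)) & 0o17777
    return v & 0o7777, v >> 12


def rotate_right(ac, link):
    v = (link << 12) | ac
    v = (v >> 1) | ((v & 1) << 12)
    return v & 0o7777, v >> 12


def exec_group1(insn, ac, link):
    """Apply a group 1 OPR word; returns the new (ac, link)."""
    if bit(insn, 4):
        ac = 0                                  # CLA
    if bit(insn, 5):
        link = 0                                # CLL
    if bit(insn, 6):
        ac ^= 0o7777                            # CMA
    if bit(insn, 7):
        link ^= 1                               # CML
    if bit(insn, 11):                           # IAC
        ac += 1
        if ac > 0o7777:
            ac &= 0o7777
            link ^= 1                           # assumed: carry complements link
    right, left, twice = bit(insn, 8), bit(insn, 9), bit(insn, 10)
    if right and left:
        pass                                    # not specified above; no-op in this sketch
    elif right or left:
        step = rotate_right if right else rotate_left
        for _ in range(2 if twice else 1):      # RAR/RAL once, RTR/RTL twice
            ac, link = step(ac, link)
    elif twice:
        ac = ((ac << 6) | (ac >> 6)) & 0o7777   # BSW: swap 6-bit halves
    return ac, link
```

For instance, exec_group1(0o7341, ac, link) (CLA CLL CMA IAC) walks all
four stages in order: the accumulator is cleared, complemented to 7777,
then incremented back to 0 with a carry into the link.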

    "group 2"
         _0___1___2_ _3_ _4_ _5_ _6_ _7_ _8_ _9_ _10 _11
        |           |   |   |SMA|SZA|SNL| 0 |   |   |   |
        | 1   1   1 | 1 |CLA|SPA|SNA|SZL| 1 |OSR|HLT| 0 |
        |___|___|___|___|___|___|___|___|___|___|___|___|

        SMA Skip on Minus Accumulator (skip if high bit of accumulator is set) (if bit 8 is 0)
        SPA Skip on Plus Accumulator (skip if high bit of accumulator is clear) (if bit 8 is 1)
        SZA Skip on Zero Accumulator (if bit 8 is 0)
        SNA Skip on Nonzero Accumulator (if bit 8 is 1)
        SNL Skip on Nonzero Link (if bit 8 is 0)
        SZL Skip on Zero Link (if bit 8 is 1)
        OSR bitwise Or Switch Register into accumulator
        HLT HaLT processor
        CLA CLear Accumulator

        Logical order of operations:
            SMA, SZA, SNL
            SPA, SNA, SZL
            CLA
            OSR, HLT
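
The group 2 ordering above can be sketched the same way. The skip test
runs before CLA, so SMA CLA tests the old accumulator. With bit 8 clear,
the selected conditions are ORed; with bit 8 set, the sense is inverted,
which makes a word with no conditions selected an unconditional skip.
The function name, the "switches" parameter, and the return shape are
illustrative.

```python
def bit(word, n):
    """PDP-8 numbering: bit 0 is the most significant of 12."""
    return (word >> (11 - n)) & 1


def exec_group2(insn, ac, link, switches=0):
    """Apply a group 2 OPR word; returns (ac, skip, halt)."""
    selected = [
        bit(insn, 5) and (ac & 0o4000) != 0,    # SMA: high bit set
        bit(insn, 6) and ac == 0,               # SZA
        bit(insn, 7) and link != 0,             # SNL
    ]
    # Bit 8 inverts the sense, turning the tests into SPA / SNA / SZL.
    skip = any(selected) != bool(bit(insn, 8))
    if bit(insn, 4):
        ac = 0                                  # CLA, after the skip test
    if bit(insn, 9):
        ac |= switches & 0o7777                 # OSR
    halt = bool(bit(insn, 10))                  # HLT
    return ac, skip, halt
```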

    "mq"
         _0___1___2_ _3_ _4_ _5_ _6_ _7_ _8_ _9_ _10 _11
        |           |   |   |   |   |   |   |   |   |   |
        | 1   1   1 | 1 |CLA|MQA|   |MQL|   |   |   | 1 |
        |___|___|___|___|___|___|___|___|___|___|___|___|

        CLA CLear Accumulator
        MQL MQ Loads from Accumulator (and clears the accumulator)
        MQA bitwise Or MQ into Accumulator

        Bits 6, 8, 9, 10 are used for extended arithmetic instructions;
        see https://homepage.divms.uiowa.edu/~jones/pdp8/refcard/74.html

        Logical order of operations:
            CLA
            MQA, MQL (simultaneous parallel assignment)
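
A sketch of the "simultaneous parallel assignment" above: both new values
are computed from the old AC and MQ before either is written. It assumes
the standard PDP-8 behavior that MQL also clears the accumulator, so MQA
and MQL together act as a swap. Extended arithmetic bits are ignored, and
the function name is illustrative.

```python
def bit(word, n):
    """PDP-8 numbering: bit 0 is the most significant of 12."""
    return (word >> (11 - n)) & 1


def exec_mq(insn, ac, mq):
    """Apply CLA / MQA / MQL from an "mq" OPR word; returns (ac, mq)."""
    if bit(insn, 4):
        ac = 0                          # CLA happens first
    mqa, mql = bit(insn, 5), bit(insn, 7)
    # Both right-hand sides read the pre-assignment ac and mq.
    base = 0 if mql else ac             # assumed: MQL clears the accumulator
    new_ac = base | (mq if mqa else 0)  # MQA ORs MQ in
    new_mq = ac if mql else mq          # MQL loads MQ from AC
    return new_ac, new_mq
```

With both bits set (octal 7521) the result is (old MQ, old AC), which
would not fall out of applying MQA and MQL one after the other.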