The "Big Golf" Microarchitecture

Allowed memory combinations:
 * Any two loads
 * Any two stores with different addresses
   (n.b. LLC is limited to 1 eviction per cycle)
 * Any load with any younger store

Instruction opcodes:
 0 AND  logical AND from memory to accumulator
 1 TAD  Two's-complement ADd from memory to accumulator
 2 ISZ  Increment and Skip if Zero
 3 DCA  Deposit and Clear Accumulator
 4 JMS  JuMp Subroutine
 5 JMP  JuMP
 6 IOT  In-Out Transfer (device accesses)
 7 OPR  microsequenced OPeRations (miscellaneous, like clear/rotate/etc)

Memory transactions:        Opcodes that do it:
                            (second set is the indirect versions)
 * Fetch instruction        01234567  01234567
 * Indirect address load              0123
 * Autoincrement store                0123
 * Execution load           012       012 45
 * Execution store            234       234

┌─────┐       ┌──────┐                     ┌────┐
│Fetch├──────►│Decode│                  ┌─►│Exec│
└─────┘       └──────┘                  │  └────┘
next_pc   ┌───init_indirect_load        │  init_execution_store
          │   init_execution_load───────┤  retire
          │   init_execution_store      │
          │   retire                    │
          │   rubberband_stall(1/2)     │
          │                             │
          │  ┌───────┐                  │
          └─►│Autoinc│                  │
             └───────┘                  │
             ┌───init_autoinc_store     │
             │   init_execution_load────┤
             │   init_execution_store   │
             │   retire                 │
             │                          │
             │  ┌─────┐                 │
             └─►│Indir│                 │
                └─────┘                 │
                init_execution_load─────┘
                init_execution_store
                retire

Possible arbitration techniques:
 * Rubberband stalling in Decode + positional arbitration
 * Age/address/operation comparison without rubberbanding
    * Longer clock cycles, or
    * Extra cycle

What to do with cache misses?
 * Stall entire pipeline to maintain simpler ordering constraints
 * If only loads are missing, allow everything else to proceed?
 * Always allow Fetch to proceed?
   Need separate logic to detect SMC clobbers *anyway*

OPR opcodes:

"group 1"
 _0___1___2_ _3_ _4_ _5_ _6_ _7_ _8_ _9_ _10 _11
|           |   |   |   |   |   |RAR|RAL| 0 |   |
| 1   1   1 | 0 |CLA|CLL|CMA|CML|RTR|RTL| 1 |IAC|
|___|___|___|___|___|___|___|___|___|___|___|___|

 CLA  CLear Accumulator
 CLL  CLear Link
 CMA  CoMplement Accumulator
 CML  CoMplement Link
 RAR  Rotate Accumulator Right (if bit 10 is 0)
 RAL  Rotate Accumulator Left (if bit 10 is 0)
 RTR  Rotate (Twice) accumulator and link Right (if bit 10 is 1)
 RTL  Rotate (Twice) accumulator and link Left (if bit 10 is 1)
 IAC  Increment ACcumulator
 BSW  Byte Swap word in accumulator
      (if bits 8 and 9 are 0, and bit 10 is 1)

 Logical order of operations:
   CLA, CLL
   CMA, CML
   IAC
   RAR, RAL, RTR, RTL, BSW

"group 2"
 _0___1___2_ _3_ _4_ _5_ _6_ _7_ _8_ _9_ _10 _11
|           |   |   |SMA|SZA|SNL| 0 |   |   |   |
| 1   1   1 | 1 |CLA|SPA|SNA|SZL| 1 |OSR|HLT| 0 |
|___|___|___|___|___|___|___|___|___|___|___|___|

 SMA  Skip on Minus Accumulator
      (skip if high bit of accumulator is set) (if bit 8 is 0)
 SPA  Skip on Plus Accumulator
      (skip if high bit of accumulator is clear) (if bit 8 is 1)
 SZA  Skip on Zero Accumulator (if bit 8 is 0)
 SNA  Skip on Nonzero Accumulator (if bit 8 is 1)
 SNL  Skip on Nonzero Link (if bit 8 is 0)
 SZL  Skip on Zero Link (if bit 8 is 1)
 OSR  bitwise Or Switch Register into accumulator
 HLT  HaLT processor
 CLA  CLear Accumulator

 Logical order of operations:
   SMA, SZA, SNL
   SPA, SNA, SZL
   CLA
   OSR, HLT

"mq"
 _0___1___2_ _3_ _4_ _5_ _6_ _7_ _8_ _9_ _10 _11
|           |   |   |   |   |   |   |   |   |   |
| 1   1   1 | 1 |CLA|MQA|   |MQL|   |   |   | 1 |
|___|___|___|___|___|___|___|___|___|___|___|___|

 CLA  CLear Accumulator
 MQL  MQ Loads from Accumulator
 MQA  bitwise or MQ into Accumulator

 bits 6,8,9,10 are used for extended arithmetic instructions
 see https://homepage.divms.uiowa.edu/~jones/pdp8/refcard/74.html

 Logical order of operations:
   CLA
   MQA, MQL (simultaneous parallel assignment)
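
A rough behavioural sketch of the "group 1" ordering and the "group 2" skip
sense, as a way to sanity-check the decode tables. This is not the Big Golf
implementation. The function names are made up for illustration, and three
behaviours are assumptions taken from conventional PDP-8 practice rather than
from these notes: rotates go through the link, a carry out of IAC complements
the link, and with bit 8 set a group 2 skip needs every selected condition to
hold. Setting both rotate bits is left as a no-op here.

```python
MASK12 = 0o7777  # 12-bit accumulator mask


def bit(word, n):
    """Bit n of a 12-bit word in the numbering used above (bit 0 is the MSB)."""
    return (word >> (11 - n)) & 1


def rotate_left(link, acc):
    """Rotate the 13-bit link:accumulator value left by one place."""
    combined = (link << 12) | acc
    combined = ((combined << 1) | (combined >> 12)) & 0o17777
    return combined >> 12, combined & MASK12


def rotate_right(link, acc):
    """Rotate the 13-bit link:accumulator value right by one place."""
    combined = (link << 12) | acc
    combined = (combined >> 1) | ((combined & 1) << 12)
    return combined >> 12, combined & MASK12


def exec_group1(instr, link, acc):
    """Apply a group 1 OPR word in the logical order listed above."""
    if instr >> 9 != 0o7 or bit(instr, 3):
        raise ValueError("not a group 1 OPR instruction")
    if bit(instr, 4):            # CLA
        acc = 0
    if bit(instr, 5):            # CLL
        link = 0
    if bit(instr, 6):            # CMA
        acc ^= MASK12
    if bit(instr, 7):            # CML
        link ^= 1
    if bit(instr, 11):           # IAC (assumed: carry out complements the link)
        acc += 1
        if acc > MASK12:
            acc &= MASK12
            link ^= 1
    right, left, twice = bit(instr, 8), bit(instr, 9), bit(instr, 10)
    if right != left:            # exactly one direction: RAR/RAL/RTR/RTL
        step = rotate_right if right else rotate_left
        for _ in range(1 + twice):
            link, acc = step(link, acc)
    elif twice and not right:    # BSW: bits 8 and 9 clear, bit 10 set
        acc = ((acc & 0o77) << 6) | (acc >> 6)
    return link, acc


def group2_skip(instr, link, acc):
    """Return True if a group 2 OPR word skips the next instruction."""
    selected = []
    if bit(instr, 5):            # SMA / SPA
        selected.append(acc & 0o4000 != 0)
    if bit(instr, 6):            # SZA / SNA
        selected.append(acc == 0)
    if bit(instr, 7):            # SNL / SZL
        selected.append(link != 0)
    taken = any(selected)
    # Bit 8 reverses the sense (assumed: all selected conditions must then hold).
    return (not taken) if bit(instr, 8) else taken
```

For example, exec_group1(0o7041, 0, 1) runs CMA then IAC and returns
(0, 0o7777), the two's-complement negation of 1. Doing the steps in any other
order would give a different result, which is why the ordering has to be fixed
by the hardware rather than by the bit positions.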