PLAN



0. writeback cache
    [✔️] Cache dirty bits
    [✔️] Cache evicting dirty data on fills that would replace it
    [✔️] Cache not immediately forwarding writes
    [✔️] Fix mem_cache to actually instantiate memory correctly
        [✔️] Run Quartus in Windows to generate a Verilog template for manual instantiation of RAM blocks
            [X] Try to use asynchronous clears for reset instead of occupying a port for SETS cycles NOPE
            [✔️] Need at least one port capable of read-before-write
            [✔️] Maybe don't need a second port if the first port can make write optional
        [✔️] We might need to split our accesses across two cycles
            [X] If so, can we infer the correct logic without explicit instantiation of the megafunction? NOPE
        [X] Can we do asynchronous clear without explicit instantiation of the megafunction? NOPE
        [✔️] Copy from said template into mem_cache.sv instead of trying to use inference
    ---
    [ ] Arbiter sending snoops to caches in response to CLI writes
    [ ] Cache updating itself to clean state for write snoops
    ---
    [ ] Arbiter sending snoops to caches in response to CLI reads
    [ ] Arbiter waiting for snoop responses from caches for CLI reads
    [ ] Arbiter sending correct data for CLI reads (snoop responses in preference over RAM response)
    [ ] Cache sending snoop responses for read snoops
    ---
    [ ] Cache forwarding snoops upstream
    [ ] Core updating itself for write snoops (no-op)
    [ ] Core sending snoop responses for read snoops (always no data)
1. pipelining that works with SMC / start working on minhdl version of the core
2. write an SPI or I2C master on the FPGA to sample analog inputs
3. support wider-than-single-word cache lines
4. add global shared cache
5. ethernet
∞. fix hello.pal (in noncpu)
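
For reference, the completed write-back items under step 0 (dirty bits, eviction of dirty data on replacing fills, writes not forwarded immediately) can be sketched as a small behavioral model. This is a Python sketch, not the logic in mem_cache.sv: the direct-mapped organization, the class and method names, and the dict standing in for downstream memory are all assumptions made for illustration. Lines are a single word, matching the state before step 3.

```python
class WritebackCache:
    """Behavioral sketch of a direct-mapped, single-word-line write-back cache.

    Hypothetical model: names and organization are illustrative only.
    """

    def __init__(self, sets, backing):
        self.sets = sets
        self.backing = backing            # dict addr -> word, stands in for downstream memory
        self.valid = [False] * sets
        self.dirty = [False] * sets       # one dirty bit per line
        self.tag = [0] * sets
        self.data = [0] * sets

    def _split(self, addr):
        # Low part of the address selects the set, the rest is the tag.
        return addr % self.sets, addr // self.sets

    def _hit(self, index, tag):
        return self.valid[index] and self.tag[index] == tag

    def _evict(self, index):
        # Write the victim back only if it holds dirty data.
        if self.valid[index] and self.dirty[index]:
            victim_addr = self.tag[index] * self.sets + index
            self.backing[victim_addr] = self.data[index]
        self.dirty[index] = False

    def read(self, addr):
        index, tag = self._split(addr)
        if not self._hit(index, tag):
            self._evict(index)            # evict before the fill replaces the line
            self.valid[index] = True
            self.tag[index] = tag
            self.data[index] = self.backing.get(addr, 0)
        return self.data[index]

    def write(self, addr, word):
        index, tag = self._split(addr)
        if not self._hit(index, tag):
            self._evict(index)
            self.valid[index] = True
            self.tag[index] = tag
        # The write stays in the cache; it is not forwarded downstream here.
        self.data[index] = word
        self.dirty[index] = True
```

With single-word lines a write miss overwrites the whole line, so the sketch skips the fetch on write misses; wider lines (step 3) would need a fill or per-word valid bits there.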

   Command block:

             +--------------+   +----------------+
       +-----| Echo Arbiter |<--| Result Printer |<-------------------------------+
       |     +--------------+   +----------------+                                |
       |       ^                                                                  |
       v       |                                                                  |
+--------+   +----------------+                                                   |
| UART 0 |-->| Command Parser |------------+   RAM block:                         |
+--------+   +----------------+            |                                      |
                                           |               +----------------+     |
                                           |               | Off-chip DRAM  |     |
                                           |               +----------------+     |
   PDP-8 block:                            |                 ^           |        |
                                           |                 |           v        |
+--------+   +-------+                     |               +----------------+     |
| UART 1 |<->| PDP-8 |                     v               | RAM Controller |     |
+--------+   +-------+                   +-------------+   +----------------+   +---------------+
             | Cache |------------------>| Mem Arbiter |-->| Another Cache  |-->| Mem Broadcast |
             +-------+                   +-------------+   +----------------+   +---------------+
                   ^                       ^                                      | |
                   |                       |                                      | |
                   +--------------------------------------------------------------+ |
                                           |                                        |
           *                               |                                        |
           * (PDP-8s are replicated)       |                                        |
           *                               |                                        |
                                           |                                        |
+--------+   +-------+                     |                                        |
| UART N |<->| PDP-8 |                     |                                        |
+--------+   +-------+                     |                                        |
             | Cache |---------------------+                                        |
             +-------+                                                              |
                   ^                                                                |
                   |                                                                |
                   +----------------------------------------------------------------+

    Echo Arbiter:  Trivial priority arbiter (input echo has priority over the result printer)
    Mem Arbiter:   Adds clog2(1 + number of PDP-8s) tag bits indicating which channel was selected
                   For inputs coming from a PDP-8 as opposed to the command parser, add the appropriate prefix to the memory address
    Mem Broadcast: Removes clog2(1 + number of PDP-8s) tag bits to determine which channel to send to
                   For outputs going to a PDP-8 as opposed to the result printer, strip excess address bits
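
The tag handling in the Mem Arbiter and Mem Broadcast can be sketched in Python as below. Several details are assumptions for illustration and are not taken from the design: channel 0 is taken to be the command parser and channels 1..N the PDP-8s, the address prefix is taken to be the PDP-8's index, a PDP-8 local address is taken to be 12 bits (LOCAL_ADDR_BITS), and the tag is packed above the address. All function names are hypothetical.

```python
LOCAL_ADDR_BITS = 12  # assumption: width of the address a PDP-8 presents


def clog2(n):
    """Ceiling of log2(n) for n >= 1, matching SystemVerilog's $clog2."""
    return (n - 1).bit_length()


def arbiter_tag(channel, addr, num_pdp8s, global_addr_bits):
    """Model the Mem Arbiter: prefix PDP-8 addresses, then add the channel tag."""
    assert 0 <= channel <= num_pdp8s
    if channel > 0:
        # Request from a PDP-8: place its local address in its own region.
        assert 0 <= addr < (1 << LOCAL_ADDR_BITS)
        addr = ((channel - 1) << LOCAL_ADDR_BITS) | addr
    assert 0 <= addr < (1 << global_addr_bits)
    # Tag bits sit above the global address in this sketch.
    return (channel << global_addr_bits) | addr


def broadcast_untag(word, num_pdp8s, global_addr_bits):
    """Model the Mem Broadcast: remove the tag and pick the output channel."""
    tag_bits = clog2(1 + num_pdp8s)
    channel = (word >> global_addr_bits) & ((1 << tag_bits) - 1)
    addr = word & ((1 << global_addr_bits) - 1)
    if channel > 0:
        # Response to a PDP-8: strip the excess (prefix) address bits.
        addr &= (1 << LOCAL_ADDR_BITS) - 1
    return channel, addr
```

For example, with 3 PDP-8s the tag is clog2(4) = 2 bits, and a request from PDP-8 channel 2 at local address 0o1234 comes back out of the broadcast as channel 2, address 0o1234.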

Note that the mem arbiter and mem broadcast each have to be (number of
PDP-8s)+1 channels wide, so they will likely become bottlenecks as PDP-8s are
added. Because the memory protocol allows arbitrary stalls, arbitration can be
spread over multiple cycles.
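
One possible shape for multi-cycle arbitration, sketched in Python: the arbiter examines only a small window of channels per clock and stalls everyone until it finds a request. This is an illustrative assumption and not a description of the planned arbiter; the round-robin scan, the class name, and the `window` parameter are all hypothetical. Requesters are assumed to hold their request until granted, which the stall-tolerant memory protocol permits.

```python
class MultiCycleArbiter:
    """Hypothetical round-robin arbiter that scans `window` channels per cycle."""

    def __init__(self, channels, window):
        self.channels = channels
        self.window = window
        self.pos = 0  # first channel to examine on the next cycle

    def step(self, requests):
        """Advance one clock cycle.

        requests: list of bools, one per channel, held until granted.
        Returns the granted channel, or None if this cycle is a stall.
        """
        grant = None
        for offset in range(self.window):
            ch = (self.pos + offset) % self.channels
            if requests[ch]:
                grant = ch
                break
        if grant is None:
            # Nothing found in this window: stall and move the window on.
            self.pos = (self.pos + self.window) % self.channels
        else:
            # Resume after the granted channel so no channel is starved.
            self.pos = (grant + 1) % self.channels
        return grant
```

The per-cycle logic depends only on `window`, not on the total channel count, at the cost of up to ceil(channels / window) cycles of latency before a lone request is granted.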

Not shown is the front panel interface.