DLX RISC Simulator - USER MANUAL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Contents
~~~~~~~~

1   Introduction
    1.1 The Purpose of this Manual
    1.2 Programs
    1.3 Portability
    1.4 DLX CPU Simulation
    1.5 Throughput

2   DAsm - The DLX Assembler
    2.1 Introduction
    2.2 Using DAsm
    2.3 Pseudo Codes
    2.4 Register Names
    2.5 Operands
    2.6 Numerical Expressions

3   MAsm - The Microcode Assembler
    3.1 Introduction
    3.2 Using MAsm
    3.3 Source Files
    3.4 Microcode Table
    3.5 Decode Tables

4   Mon - The DLX Monitor
    4.1 Introduction
    4.2 Commands
    4.3 Auto-loading Files
    4.4 Break Key Combination
    4.5 Help

5   The dlx.ini file
    5.1 Introduction
    5.2 Settings

6   DLX Opcodes
    6.1 Introduction
    6.2 TRAP Usage
    6.3 Bit Pattern Allocation

7   Notes on Supplied Source Files
    7.1 Introduction
    7.2 Include Files
    7.3 System Source Files
    7.4 Application/Example Files
    7.5 Microcode Tables

===============================================================================

1 Introduction
~~~~~~~~~~~~~~

1.1 The Purpose of this Manual
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This manual describes an implementation of the DLX System which consists of
a suite of programs that simulate the DLX processor as defined in the first
edition of the book "Computer Architecture: A Quantitative Approach", by
John L Hennessy and David A Patterson (subsequently referred to as H&P).

DLX is a simple, general 32-bit RISC CPU 'invented' by Hennessy and Patterson
purely to illustrate the various methods that have been employed for producing
processors. The suite of programs described here attempts to provide the user
with a working simulation of three such methods: hardwired, microcode and
pipelined. Please note that this manual is not intended to teach DLX
programming.

1.2 Programs
~~~~~~~~~~~~

Mon - this is the simulator itself and incorporates all three processor
modules. Switching between the simulators is allowed so that the performance of
each may be directly compared.

Two assemblers are supplied: DAsm, which assembles DLX source code, and MAsm,
which assembles DLX microcode. Additionally, a file called dlx.ini allows users
to customise the system to their own requirements (see section 5).

1.3 Portability
~~~~~~~~~~~~~~~

The system has been designed such that much of the source code is portable
between several systems that use an ANSI compliant C compiler. The CPU, Monitor
and Assembler modules have been successfully compiled and executed on the UNIX,
PC and Amiga platforms. The coding fully takes into account any Big/Little
Endian issues and performs identically on either.

DLX is a Big Endian machine like the Motorola 680x0, but unlike the Intel
80x86. Upon start-up, all programs execute a routine to test whether they are
running on a Big or Little Endian host machine. The result of this test
determines whether the program needs to reorder bytes so that the internal
representation of any value is always output in Big Endian fashion.
Output from the assemblers is therefore always Big Endian and the simulators
always expect data to be presented to them in this format. Furthermore, this
means that DLX object code generated on one platform can be run on all other
platforms without change.
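
The start-up test and the byte reordering described above can be sketched as
follows. This is an illustration in Python only, not the C code used by the
system; the function names host_is_little_endian and to_big_endian_word are
assumptions made for this example.

```python
import struct


def host_is_little_endian():
    """Report whether the host stores the low-order byte first."""
    # Pack the value 1 in native byte order and inspect the first byte.
    return struct.pack("=I", 1)[0] == 1


def to_big_endian_word(value):
    """Return the 4 bytes of a 32-bit value, most significant byte first."""
    # ">I" forces Big Endian output whatever the host byte order is.
    return struct.pack(">I", value & 0xFFFFFFFF)
```

Because the output routine always emits the most significant byte first, the
same bytes are written on either kind of host, which is what makes the object
code portable between platforms.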

1.4 DLX CPU Simulation
~~~~~~~~~~~~~~~~~~~~~~

The hardwired, microcode and pipelined versions of the DLX CPU Simulator have
the following features:

                                         Hardwired Microcode Pipelined
Integer simulation                              X         *         X

Single precision floating point simulation      X         *

Double precision floating point simulation      X

Integral clock cycle count for performance
statistics                                      X         X         X

Outputs machine state at every change in the
machine cycle                                             X         X

3 simulated memory mapped hardware timers
(located at $FF000000-$FF000028) able to
generate CPU interrupts                         X         X         X

Interrupts generated for the following CPU
exceptions:
   Unimplemented instructions                   X         X         X
   Integer overflow                             X                   X
   Divide by zero                               X         X         X
   Timer                                        X         X         X

* with suitable microcode table - the standard Hennessy and Patterson table is
supplied with the system but this will require extensions to fully support
floating-point operations, special registers and returning from interrupts.

1.5 Throughput
~~~~~~~~~~~~~~

The following table shows the measured throughput of the hardwired simulator on
several platforms when run from Mon. The test program used the following types
of instruction: load, store, ALU, set and branch. Interrupts from one of the
simulated timers were also in operation. The performance statistics from Mon
gave an average of 6.15 CPI. The program did not output information to the
screen.

System                          Instructions/Sec (approx.)
Amiga 2000 (68000 7.14MHz)       1650
PC 386 16MHz 3MB RAM             2289
PC 486-DX2 66MHz 20MB RAM        7430
PC Pentium 100MHz 16MB RAM      27000
UEA UNIX network                20000 (system lightly loaded)
                                 8000 (normal load)

Throughput from the microcode simulator will depend upon the microcode table
currently in use. With the pipelined simulator the settings of the various
pipeline parameters will affect the number of cycles used in programs. The
hardwired simulator will always provide the fastest physical throughput due to
the algorithms used for its implementation although the pipelined simulator
will appear to provide better CPI ratings.

===============================================================================

2 DAsm - The DLX Assembler
~~~~~~~~~~~~~~~~~~~~~~~~~~

2.1 Introduction
~~~~~~~~~~~~~~~~

DAsm is a two-pass macro assembler which produces executable object code for
the DLX CPU simulators. Several example source files are supplied (see section
7) which not only provide working examples of DLX code but also provide a base
from which larger modules can be created. Note that the microcode version of
the CPU simulators may not be able to run all of the example code if the user
supplied microcode does not provide full functionality. The hardwired CPU
simulator will always be able to run the example code. The pipelined simulator
may provide unexpected results if the pipeline parameters are set such that
hazards are not eliminated.

The syntax used for DLX source has the following format:

(Label)     Opcode  Operands        (Comment)

Labels may be optionally suffixed with a colon. They are also case sensitive so
that LOOP and Loop will refer to different objects. With the current
implementation no local labels such as 0$ and 34$ are allowed. Because of this
restriction, labels should not be used in macros as a macro can be expanded
more than once, which would define the same label repeatedly.

Opcodes are not case sensitive so AddI, ADDI and addi are all recognised as
identical. Operands must not contain spaces between them; the assembler assumes
that any whitespace found after the first character in the operand list
terminates that list. Comments must always start with a semicolon no matter
where they are positioned.

Note that there is no Linker provided with the system and DAsm does not produce
object code that contains any information useful to a linker. It was not
considered to be a necessary element of the current project to provide linkable
object code. If anyone wants to expand upon this system in order to provide a
high level language compiler then it may be necessary to reconsider this aspect.

2.2 Using DAsm
~~~~~~~~~~~~~~

The convention adopted for DLX assembler files is that INCLUDE files have the
suffix .i and all others have .a. However, this is not a restriction and the
user may name their files as they wish. Note that under UNIX the file names
will be case sensitive.

The normal way to run DAsm is:

    dasm filename1.a filename2.a ...

DAsm will produce an output file called filename.o by stripping off any
extension after the first '.' found in the name. This means that a file name
of:

    test.one.a

will produce:

    test.o
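
The naming rule can be sketched in a few lines of Python. The function name
object_file_name is hypothetical, and the sketch assumes a bare file name with
no directory part; how DAsm treats dots in directory names is not stated here.

```python
def object_file_name(source_name):
    """Derive the .o name by cutting the name at its first '.'."""
    # str.partition splits at the first '.' only; everything after it,
    # including any later dots, is discarded.
    stem, _, _ = source_name.partition(".")
    return stem + ".o"
```

For example, object_file_name("test.one.a") gives "test.o", as in the case
shown above.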

There are several command line flags that alter the operation of DAsm:

-d      Debug. If problems are found which cannot be cured by altering the
        source code then there could be a fault with DAsm. Specifying -d will
        turn on debugging which will produce a significant amount of output
        which may go some way to locating the problem.

-e      Error Symbols. This flag forces DAsm to produce a full list of symbols
        upon detecting an error.

-fxxx   File List. Tell DAsm to assemble the files held in the text file xxx
        which contains each source file name on a separate line. DAsm can
        handle a mix of input files from both a file containing a list and
        directly from the command line.

-l      List. Specifying this will produce a listing to the screen on the
        second pass.

-lxxx   List to file. Adding a filename (xxx) after the -l (no space between
        them) will output the listing to that file. If multiple source file
        names are specified then all of their listings appear in the same file.

-s      Symbols. This flag will generate a symbol list at the end of the first
        pass showing all values as well as the full text of any macros.

Note that flags and file names may appear in any order on the command line. Any
flag appearing affects all files specified.


2.3 Pseudo Codes
~~~~~~~~~~~~~~~~

DAsm uses several pseudo codes to enhance its operation. Although specified
here in upper case they are not case sensitive.

*       * always contains the current address. It can be used as part of an
        expression, e.g.:

            JAL     4+*

        This particular piece of code can be found in the INCLUDE file
        macro.i and has the effect of doing a Jump And Link to the
        instruction following the JAL. The current address may be altered
        using:

            *=NewAddress        (e.g.: *=$1000)

        Note that specifying a start address is not recommended as DLX can
        easily support relocatable code. Only system code which is always
        expected to remain in the same locations should include a start
        address. An example of this is auto.a which always loads at $200.

DC.x    Declare Constant of size B, H or W (Byte, 16-bit Halfword or 32-bit
        Longword). DAsm will accept string constants in quotes mixed in with
        values such as:

            DC.B    10,'Abcd',10,0

        Note that, as with normal opcodes, whitespace is considered to be a
        terminator and code such as:

            DC.B    10, 'Abcd', 10, 0

        will only accept the first value (10 in this case).

DS.x    Declare Storage of size B, H or W (Byte, 16-bit Halfword or 32-bit
        Longword). This has the format, e.g.:

            DS.H    6

        which, in this case, will reserve 6, 16-bit halfwords or 12, 8-bit
        bytes.

        Note that any opcodes encountered after any DC or DS pseudo code will
        automatically get aligned to the next longword boundary. See PAD below
        for how to control padding bytes.
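
The size and alignment arithmetic can be sketched as below. The names
UNIT_SIZES, ds_bytes and align_longword are assumptions for this example, and
the sketch assumes that an address already on a longword boundary receives no
padding.

```python
UNIT_SIZES = {"B": 1, "H": 2, "W": 4}  # bytes per DC.x / DS.x unit


def ds_bytes(size_letter, count):
    """Number of bytes reserved by DS.<size_letter> <count>."""
    return UNIT_SIZES[size_letter.upper()] * count


def align_longword(address):
    """Round an address up to a multiple of 4 (unchanged if aligned)."""
    return (address + 3) & ~3
```

Here ds_bytes("H", 6) gives 12, and an opcode following data that ends at
$1006 would be placed at align_longword(0x1006), which is $1008.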

DEBUG/NODEBUG
        If the -d debugging flag has been enabled these pseudo-codes will
        provide the user with tools to localise the section of code being
        debugged. This can cut down on the amount of debugging data generated
        by DAsm. To use them to selectively debug a section insert a NODEBUG
        near the top of the source code (after any INCLUDEs) and then wrap a
        DEBUG/NODEBUG pair around the offending code. These codes have no
        effect if -d is not specified on the command line.

        Note that some of the supplied INCLUDE files deliberately disable
        debugging at the start and then re-enable it at the end. This is why
        it is best to insert NODEBUG after all of the INCLUDEs.

END     This specifies the physical end of the source and DAsm will not
        assemble any code after an END. It may be omitted in which case DAsm
        will assemble until it finds the end of the source file.


GENINC  DAsm has the ability to automatically generate an include file from a
        source file for specified labels. Each label has to be preceded with
        GENINC, e.g.:

                    *=$744
                    ADDI    R3,R4,#$14
                    GENINC
            Lab1    LW      R1,$80(R2)

        will generate the entry in the include file:

            Lab1    EQU     $00000748

        GENINC always works once only on the next label encountered and must
        therefore precede all labels required in the final include file. The
        name of the include file is xxx.i where xxx is the name of the source
        file with any .a suffix removed. For example, abc.a will produce abc.i.

        Note that the include file will always overwrite any other file of the
        same name without warning.

INCLUDE A source file may INCLUDE other source files using:

            INCLUDE filename

        The file name may optionally be enclosed in matching quotes which may
        be either ' or ". INCLUDEs may be nested as deep as the file handling
        limit of the system allows. See the supplied files for examples of
        usage.

LIST/NOLIST
        If the -l list flag has been enabled these pseudo codes will provide
        the user with control over which sections of the source will output a
        listing. Wrapping a NOLIST/LIST pair around a section of source will
        prevent listing of that section. An example of this is in the INCLUDE
        file macro.i.

MACRO/ENDM
        DAsm allows any number of macros to be created. They have the format:

            Label MACRO
                source of macro
                ENDM

        If the macro is to take parameters then these are specified within the
        macro source code as \1, \2 ... \n, where \1 corresponds to the first
        supplied parameter, \2 the second and so on.

        Currently, no initial check is made that the user has supplied the
        correct number of parameters when calling the macro; any surplus
        parameters are ignored. However, DAsm will complain if not enough
        parameters are supplied.

PAD     Because all DLX instructions are exactly 4 bytes in length this
        implementation requires that instructions are always longword aligned.
        As DC.x and DS.x calls of non-longword sizes could upset this, DAsm
        will, if necessary, pad the output with NULLs until alignment is
        achieved.

        The PAD pseudo code will allow the user control over how many longwords
        are inserted. E.g. PAD 3 will pad up to the third longword from the
        current address. This could add up to 12 zero bytes - it may be less if
        the current address is not currently on a longword boundary.

PAGE    PAGE forces a new page during a listing. A page title can be added by
        using:

            PAGE    'Title'

        where 'Title' is a string of up to 50 characters.

SKIP n  SKIP will force n blank lines during listing. This can be used to split
        up sections of code with larger gaps in the listing than appear in the
        actual source.

SYM/NOSYM
        These codes have the same effect as the -s command line flag over which
        they take priority. If you always require a symbol listing then it is
        probably easier to place SYM inside your code.

2.4 Register Names
~~~~~~~~~~~~~~~~~~

DLX has two main sets of registers which Hennessy and Patterson call General
Purpose Registers (GPRs) and Floating Point Registers (FPRs). GPRs are accessed
using R0 to R31, FPRs with F0 to F31. Because the FPRs double up as single
(4-byte) and double precision (8-byte) registers, only F0, F2, F4 ... F28 and
F30 are allowed for use as double precision. If DAsm finds an odd numbered
double precision FPR reference then it outputs a warning and reduces the
reference by one.
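
The adjustment of an odd double precision register number amounts to clearing
the lowest bit. A minimal sketch, with the hypothetical function name
double_register:

```python
def double_register(number):
    """Map an FPR number to the even register used for double precision."""
    if not 0 <= number <= 31:
        raise ValueError("FPR number must be in the range 0 to 31")
    # Clearing bit 0 reduces an odd number by one and leaves even ones alone.
    return number & ~1
```

So a double precision reference to F3 is treated as F2, while F30 is left
unchanged.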

2.5 Operands
~~~~~~~~~~~~

Numeric values in operands may be specified in decimal (no prefix), hexadecimal
(prefixed by $) or binary (prefixed by %). Immediate values should be prefixed
by a hash (#).

As it takes two operations to load a 32-bit immediate value into a GPR, DAsm
also provides a method of splitting a 32-bit value or address into two halves
using > (upper 16 bits) and < (lower 16 bits). The following code fragment
shows this in operation:

        LHI R1,#>Lab1
        ORI R1,R1,#<Lab1
        ...

Lab1    ADDU    R3,R5,R7

The label Lab1 will contain an address value. This is then loaded into R1 using
the LHI which places a 16-bit value into the top half of R1 while also clearing
the lower half. The ORI ORs the least significant half of the address into R1
without affecting the top half.

Alternatively, use the L32 macro in macro.i (see section 7).
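
The arithmetic behind > and < and the LHI/ORI pair can be sketched as follows.
The function names split_halves and lhi_then_ori are assumptions made for this
illustration.

```python
def split_halves(value):
    """Return the (upper, lower) 16-bit halves of a 32-bit value."""
    value &= 0xFFFFFFFF
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def lhi_then_ori(upper, lower):
    """Model the effect of LHI followed by ORI on one register."""
    reg = (upper << 16) & 0xFFFFFFFF  # LHI: upper half set, lower half cleared
    reg |= lower                      # ORI: lower half ORed in
    return reg
```

Splitting $12345678 gives $1234 and $5678, and feeding those two halves to
lhi_then_ori rebuilds the original 32-bit value.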

2.6 Numerical Expressions
~~~~~~~~~~~~~~~~~~~~~~~~~

Numerical operands such as immediate values or offsets may be made up from
expressions as well as single values. The following operators are available:

    *   Multiplication
    +   Addition
    -   Subtraction
    /   Division
    &   AND
    |   OR
    %   Modulo
    ^   Power

Each has the same function as in C, apart from ^ which raises to a power rather
than performing an exclusive OR. There is (currently) no operator precedence:
all values and operators are parsed from left to right. In other words an
expression such as:

    1+2*3

will be processed as (1+2)*3 and yield a result of 9 instead of 7.
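
Strict left-to-right evaluation can be sketched in Python as below. This is an
illustration rather than DAsm's parser: the names OPERATORS and
eval_left_to_right are hypothetical, only decimal and $-prefixed hexadecimal
values are handled, and integer division is assumed.

```python
import re

OPERATORS = {
    "*": lambda a, b: a * b,
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "/": lambda a, b: a // b,   # integer division is assumed here
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "%": lambda a, b: a % b,
    "^": lambda a, b: a ** b,   # power, as listed in the table above
}


def eval_left_to_right(expression):
    """Evaluate value, operator, value ... strictly from left to right."""
    tokens = re.findall(r"\$[0-9A-Fa-f]+|\d+|[*+\-/&|%^]", expression)

    def number(token):
        return int(token[1:], 16) if token.startswith("$") else int(token)

    result = number(tokens[0])
    # Apply each operator to the running result and the next value.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPERATORS[op](result, number(operand))
    return result
```

With this sketch eval_left_to_right("1+2*3") gives 9. Since there is no
precedence, the order in which terms are written decides the result, so
"2*3+1" gives 7.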

===============================================================================

3 MAsm - The Microcode Assembler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3.1 Introduction
~~~~~~~~~~~~~~~~

MAsm is a single-pass assembler which produces a microcode table for use with
the microcode version of the DLX CPU simulators. The tables described in H&P
can be found in the file hp.m. These instructions assume access to the above
book.

3.2 Using MAsm
~~~~~~~~~~~~~~

The normal way to run MAsm is:

    masm filename.m (flags)

MAsm normally produces an output file called dlxmcode.tbl which is the name
expected and automatically loaded by the microcode DLX CPU simulator.

The following command line flags offer extra features:

-oxxx   Output file name. The default table name of dlxmcode.tbl may be
        changed to xxx.

-s      Symbols or Labels. This flag will generate a list of labels used plus
        the lines on which they are defined and referenced.

-t      Show Tables. Display the resulting microcode and decode tables on
        screen.

Note that flags and file name may appear in any order on the command line.

3.3 Source Files
~~~~~~~~~~~~~~~~

By convention, all microcode source files have the extension '.m' to
distinguish them from other source files. The source must be split into three
sections:

    The Microcode table definitions
    #
    The Decode1 table definitions
    #
    The Decode2/3 table definitions
    #

MAsm recognises the hash character (#) as the indication to move onto the next
section when it appears as the first character on a line. The last # is
optional - MAsm will stop assembling after the third # or if the end of the
file is reached.
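
The way the # lines divide a source file can be sketched as follows. The
function name split_sections is hypothetical, and returning empty sections
when the file ends early is an assumption of this sketch; MAsm itself may
report an error instead.

```python
def split_sections(source_text):
    """Split microcode source into its three '#'-terminated sections."""
    sections = [[]]
    for line in source_text.splitlines():
        if line.startswith("#"):
            if len(sections) == 3:
                break            # third '#': stop assembling
            sections.append([])  # move on to the next section
        else:
            sections[-1].append(line)
    # Pad with empty sections if the file ended early.
    while len(sections) < 3:
        sections.append([])
    return sections
```

The three lists returned hold the Microcode, Decode1 and Decode2/3 lines in
that order. Anything after a third # is ignored, and a file that omits the
final # yields the same result.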

3.4 Microcode Table
~~~~~~~~~~~~~~~~~~~

The layout of a Microcode table line consists of 9 separate entries:

    Label, Dest, ALU Op, Src1, Src2, Const, Misc, Cond, Jump Label

where each entry is separated by a comma. Where entries are blank a comma must
still be inserted. This example is a fragment from the file hp.m:

    Mem,   MAR,  ADD,    A,    imm16, ,   ,       Load,    Load
    Store, MDR,  PassS2, ,     B,     ,   ,       ,
    Dloop, ,     ,       ,     ,      ,   DataWr, Mem,     Dloop
    ,      ,     ,       ,     ,      ,   ,       Uncond,  Ifetch
    Load,  ,     ,       ,     ,      ,   DataRd, Mem,     Load
    ,      ,     ,       ,     ,      ,   ,       Decode2,
    LB,    Temp, SLL,    MDR,  Const, 24, ,       ,
    ,      C,    SRA,    Temp, Const, 24, ,       Uncond,  Write1
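
Splitting such a line into its 9 entries can be sketched in Python as below.
The field names in FIELDS and the function name parse_microcode_line are
assumptions for this example and are not taken from MAsm.

```python
FIELDS = ("label", "dest", "alu_op", "src1", "src2",
          "const", "misc", "cond", "jump_label")


def parse_microcode_line(line):
    """Split one Microcode table line into its 9 named entries."""
    entries = [entry.strip() for entry in line.split(",")]
    if len(entries) != len(FIELDS):
        raise ValueError("expected 9 entries, found %d" % len(entries))
    return dict(zip(FIELDS, entries))
```

Blank entries come back as empty strings, which is why the commas must still
be present: without them the count of 9 would not be reached.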

The following tables show the symbols permitted for each entry. In most cases
these follow the H&P examples from figures 5.7 and 5.22 on pages 211 and 229.
In some cases the entry names have been rationalised in order to provide
standardisation or to reduce their length. Compare the file 'hp.m' with the two
tables.

Label   This is a user definable entry and is case sensitive such that Mem is
        considered to be a different label to MEM. Labels may be the same as
        any symbol used in any other column without affecting the operation of
        MAsm.

Dest    Only the following are allowed and correspond to the Destination column
        in figure 5.7 in H&P:
            C           Temp        PC
            IAR         MAR         MDR

ALU Op  This corresponds to figure 5.22 but has been expanded to allow for
        functions to handle floating point registers:
            ADD         SUB         RSUB
            AND         OR          XOR
            SLL         SRL         SRA
            PassS1      PassS2

        New non-H&P entries
            MULTI       Multiply integer values from FPRs
            DIVI        Divide integer values from FPRs
            MULTF       Multiply floating point values
            DIVF        Divide floating point values
            ADDF        Add floating point values
            SUBF        Subtract A-B floating point values
            RSUBF       Subtract B-A floating point values

Src1/2  These two entries are from figure 5.7 and, apart from one instance,
        both allow the same entries. The exception is where Src1 can accept
        data from output register A and Src2 from B.

            Src1            Src2
            ~~~~            ~~~~
            A               B
            Temp            Temp
            PC              PC
            IAR             IAR
            MAR             MAR
            MDR             MDR
            imm16           imm16
            imm26           imm26
            Const           Const

            (New non-H&P entries)

            SR              SR      Status register
            FPSR            FPSR    F/point status register

Misc        The Misc entry consists of one of the following six entries:
                InstrRd     DataRd      DataWr
                AB<-RF      Rd<-C       R31<-C

Cond        Possible conditions are:
                Uncond      Interrupt   Mem
                Zero        Negative    Load
                Decode1     Decode2     Decode3

3.5 Decode Tables
~~~~~~~~~~~~~~~~~

Each line of the Decode 1 and Decode 2/3 tables consists of two items separated
by a comma as follows:

    Operation, Label

The first is an entry determining the operation and must be selected from the
following lists:

    Decode 1:
        As defined by H&P
            Memory      MOVI2S      MOVS2I      S2=B
            S2=Imm      BEQZ        BNEZ        J
            JR          JAL         JALR        TRAP

        Additions
            RFE     Return from exception
            CVT     Convert
            SETF    Set floating point
            BFP     Floating point branch
            MOVFP2I Move F/p to Integer
            MOVI2FP Move Integer to F/p
            MOVFP   Move F/p to F/p

    Decode 2/3:
        As defined by H&P
            LB          LBU         LH          LHU
            LW          ADD         SUB         AND
            OR          XOR         SLL         SRL
            SRA         LHI         SEQ         SNE
            SLT         SGE         SGT         SLE

The operation is followed by a label that must already have been defined in the
Microcode table. The label determines the entry point into the main microcode
table for that operation.

===============================================================================

4 Mon - The DLX Monitor
~~~~~~~~~~~~~~~~~~~~~~~

4.1 Introduction
~~~~~~~~~~~~~~~~

Mon is a monitor program that contains all three DLX CPU simulator modules. It
has been designed to be as simple as possible to operate and to have the
ability to be compiled for any platform provided with an ANSI compatible C
compiler and a console interface - e.g.: UNIX, PC Compatible, Amiga.

Starting Mon is just a matter of typing mon and pressing the return key. Upon
start-up, Mon displays its version number and current datapath type. If the
program detects that it is running on a Little Endian system (such as an 80x86
based PC) it displays 'Little Endian system'. Finally, it displays a '>' as a
prompt and awaits user input.

4.2 Commands
~~~~~~~~~~~~

Entering ? or any unrecognised command displays a list of available commands as
follows:

    A  Memory set/display          O  Display registers
    B  Breakpoint set/display      P  Set PC
    C  Continue run                Q  Quit Monitor
    CD Continue run with Debug     R  Set registers
    D  Disassemble                 S  Save file
    DM Disassemble microcode       T  Trace
    E  Enter data                  U  Trace with Debug
    F  Fill memory                 V  Set/display version
    G  Go (Run)                    W  Write debug info to log file
    H  Go with Debug               X  Exit Monitor
    I  Set/display Load Addr       Y  Display performance
    J  Set/display debug           Z  Zero registers
    K  Pipeline Control            [  Start/end recording
    L  Load file                   ]  Play back recording
    M  Memory dump                 ^  Load Microcode table
    N  -                           @  System command

Further information can be accessed by entering ?x where x is any of the above
commands.

The commands are fully described below. Many of them require extra parameters,
the formats of which are:
    a, a1, a2   Addresses (specified in hexadecimal).

    n           Decimal number.

    r           Register number - a decimal number from 0 to 31. Where double
                precision registers are being specified, r should be even.
                If odd values are used then they are decreased by 1 so that
                    RD3 123.45
                actually sets double precision register 2 to 123.45.

    s           Size of memory unit (can be 1, 2 or 4 representing byte,
                halfword or word where the word length is 32 bits).

    v           Hexadecimal value. Where used with commands E and F the maximum
                size of v will depend upon the previously defined size.

The space between the command and the first parameter is optional, even in the
case of the L and S commands. However, two letter commands such as OD or RF
cannot contain a space between the two letters.

The following section describes the commands in full. They are arranged into
associated groups. Although all commands are specified here in upper case, Mon
will recognise lower case as well.

4.2.1 Breakpoint Commands
~~~~~~~~~~~~~~~~~~~~~~~~~

B       Any command starting with a B is to do with breakpoints. Mon allows up
        to 30 simple breakpoints to be set. When the Program Counter (PC) of a
        running program hits an address set as a breakpoint, execution stops
        immediately (pending interrupts will not be serviced).

        B on its own will display any breakpoints that are currently set.

        Example:
            >B
            Breakpoints:
               1  $00001200
               2  $00001350
               4  $00001210

BC      clears all currently set breakpoints.

        Example:
            >BC
            Breakpoints cleared
            >B
            No breakpoints set

Bn      will clear breakpoint number n where n is the number displayed in the
        list generated by B. Clearing an already cleared breakpoint is
        pointless but allowed.

        Example:
           >B2

Bn a    will set breakpoint n to address a. A range of 1 to 30 is allowed for n
        and a is any valid address (i.e. 32-bit word aligned) in the allowable
        4GB addressing range of the DLX CPU. Note that there is no check to see
        if a breakpoint is being set to actual allocated memory (see the A
        command).

        Example:
          >B4 1F00

B a     sets the first free breakpoint to address a. If all breakpoints are in
        use then an error message is generated.

        Example:
          >B 2D24
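
The behaviour of the breakpoint commands can be modelled with the sketch
below. It is not Mon's implementation: the class and method names are
hypothetical, 'first free' is assumed to mean the lowest-numbered free
breakpoint, and rejecting an unaligned address is also an assumption.

```python
MAX_BREAKPOINTS = 30


class Breakpoints:
    """Numbered breakpoint slots 1..30; None marks a free slot."""

    def __init__(self):
        self.slots = [None] * MAX_BREAKPOINTS

    def set_numbered(self, n, address):
        """Model 'Bn a': set breakpoint n to a word-aligned address."""
        if not 1 <= n <= MAX_BREAKPOINTS:
            raise ValueError("breakpoint number must be 1 to 30")
        if address % 4 != 0:
            raise ValueError("address must be 32-bit word aligned")
        self.slots[n - 1] = address

    def set_first_free(self, address):
        """Model 'B a': use the lowest-numbered free slot."""
        for n, current in enumerate(self.slots, start=1):
            if current is None:
                self.set_numbered(n, address)
                return n
        raise RuntimeError("all breakpoints are in use")

    def clear(self, n):
        """Model 'Bn': clearing an already clear slot is allowed."""
        self.slots[n - 1] = None

    def hit(self, pc):
        """True if the PC matches any breakpoint that is set."""
        return pc in self.slots
```

With breakpoints 1, 2 and 4 set as in the B example earlier in this section,
set_first_free(0x2D24) in this sketch would occupy breakpoint 3.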

4.2.2 Execution/Trace and Related Commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

C       Continue. This command continues a program run from the current PC
        setting. It is used after a breakpoint has been hit or the break key
        combination has been pressed (see section 4.4).

        Example:
          >C

CD      Continue with Debug. As C but with the debug setting (see below)
        turned on.

G       Go. This starts a program running from either the current DLX PC or
        from the specified address. Execution will continue until either:
            * the program encounters a HALT instruction (TRAP 0)
            * a breakpoint is hit
            * an error occurs
            * the break key combination is pressed (see section 4.4)

        Examples:
          >G
          >G 1000

H       Go with Debug. This is almost identical to G except that an internal
        debugging flag is turned on so that debugging information is displayed
        as each instruction is executed. See the J command to alter how much
        debugging information is displayed when using the microcode or
        pipelined CPU simulators.

        Examples:
          >H
          >H F60

        Note that both the G and H commands will reset the performance
        statistics to zero.

T       Trace. T n will trace from the current DLX PC for n instructions or
        until some instruction causes termination whichever is the sooner. If
        the number of instructions is omitted then one instruction is traced.

        Examples:
          >T
          >T 4

U       Trace with Debug. U performs almost the same function as T except that,
        as with H, an internal debugging flag is turned on so that debugging
        information is displayed as each instruction is executed. See the J
        command for how to alter how much debugging information is displayed
        when using the microcode or pipelined CPU simulators.

        Examples:
          >U
          >U6

J       Debugging control. The microcode and pipelined CPU simulators can
        display much more information during debugging:

            On      Off
            D        d       Disassemble DLX Instructions
            M        m       Show Microcode Instructions
            P        p       Show Pipeline Internal Registers
            R        r       Show Other DLX Registers
            S        s       Show Pipeline Stages

        Each option may be turned on or off by including either the upper or
        lower case letter after J, e.g.:
          >JMr

        This turns on microcode instruction display and turns off the display
        of other DLX registers while leaving all other settings alone.

        Entering J on its own just shows the current settings.
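
The upper/lower case convention of the J command can be sketched as follows.
The names OPTIONS and apply_j are hypothetical, and rejecting an unknown
letter is an assumption of this sketch.

```python
OPTIONS = "DMPRS"


def apply_j(settings, argument):
    """Return a copy of settings updated by the letters following J."""
    updated = dict(settings)
    for letter in argument:
        if letter.upper() not in OPTIONS:
            raise ValueError("unknown debug option: " + letter)
        # Upper case turns the option on, lower case turns it off.
        updated[letter.upper()] = letter.isupper()
    return updated
```

Applying the argument Mr to a set of options sets M to on and R to off, and
leaves D, P and S as they were.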

K       Pipeline control. This affects the settings of the pipeline hazard
        control. Three areas: Branch, Forwarding and Load Detection can be
        altered:

            B0     No Branch hazard control
            B1     Stall on branch hazards
            B2     Early branch detection in ID enabled
            B3     Early branch detection plus branch delay slot

            F0     No register forwarding or detection
            F1     Stall on register data hazard
            F2     Feed register values back to ID stage

            L0     No load hazard detection
            L1     Load hazard enabled (stalls pipeline)

        These are entered as follows:
             >kb1f0

        This example sets the pipeline to stall on branch hazards and turns off
        register forwarding and detection. The load hazard setting is
        unaffected.
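
        A K argument can be read as a run of letter/digit pairs. The Python
        sketch below models that under two assumptions: that the letters are
        not case sensitive (suggested by the lower case example above) and
        that out-of-range levels are rejected. The name apply_k is invented
        for the example.

```python
# Hypothetical model of the K command argument (not Mon's actual code).
LIMITS = {"B": 3, "F": 2, "L": 1}   # highest level documented for each area

def apply_k(settings, arg):
    """Return a copy of settings updated by a K argument such as 'b1f0'."""
    text = arg.upper()
    if len(text) % 2 != 0:
        raise ValueError("expected letter/digit pairs: " + arg)
    updated = dict(settings)
    for i in range(0, len(text), 2):
        area, level = text[i], text[i + 1]
        if area not in LIMITS or level not in "0123456789":
            raise ValueError("bad pipeline setting: " + text[i:i + 2])
        if int(level) > LIMITS[area]:
            raise ValueError("level out of range: " + text[i:i + 2])
        updated[area] = int(level)
    return updated

print(apply_k({"B": 3, "F": 2, "L": 1}, "b1f0"))
```

        Only the areas named in the argument change, so the L setting in the
        example keeps its previous value.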

W       Write debugging information to Log File. Any debugging output from
        trace (see the H and U commands) may be diverted to a log file. To
        create a log file use:
          >Wxxx

        where xxx is the name of your log file; the actual file will be named
        xxx.log so that it may be easily found later. To return tracing output
        to the screen use:
          >W-

        To append to an existing log file use:
          >W+xxx

4.2.3 Memory Display and Entry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All of these commands use an internal address variable called LastAddress so
that a memory dump of $1000 to $10FF will set LastAddress to $1100. If the next
memory command does not specify a start address then it will continue from
whatever address is held in LastAddress. Note that LastAddress should not be
confused with the DLX Program Counter (PC) which is completely separate.

D       Disassemble. Mon has a built in disassembler which is activated using
        the D command. D a1 a2 disassembles from address a1 to a2. If no end
        address is specified then 16 instructions are disassembled.

        D will always attempt to make sense out of each 32-bit value in memory
        even if the value is data and not code. If a value is not a valid DLX
        instruction then a memory dump of its 4 bytes is displayed instead, as
        shown in the example below at $1030.

        Example:
          >D 1000
          00001000 : 0C000000  JAL       $00001004
          00001004 : FFE0E821  ADDU      R29,R31,R0
          00001008 : 97BD0004  SUBUI     R29,R29,#4
          0000100C : 801E7FFC  ADDI      R30,R0,#$7FFC
          00001010 : BC1C0000  LHI       R28,#0
          00001014 : CB9C0200  ORI       R28,R28,#$200
          00001018 : 2F800000  JALR      R28
          0000101C : 80010001  ADDI      R1,R0,#1
          00001020 : BC1C0000  LHI       R28,#0
          00001024 : CB9C0420  ORI       R28,R28,#$420
          00001028 : 2F800000  JALR      R28
          0000102C : 48656C6C  LH        R5,$6C6C(R3)
          00001030 : 6F200000  ????      "o .."
          00001034 : 84210001  ADDUI     R1,R1,#1
          00001038 : BC1C0000  LHI       R28,#0
          0000103C : CB9C0420  ORI       R28,R28,#$420

E       Enter Data. The E command allows data to be entered as bytes, halfwords
        or words from an optional address. Es a will allow entry of size s
        (which may be 1, 2 or 4) from address a.

        Each memory location is prompted with its address and current value.
        Values to be entered are assumed to be hexadecimal. To terminate input
        just press the RETURN key. To move back a location enter a minus '-'
        character. To move to a new address enter a dollar '$' followed by the
        new address. To move to the next memory location without updating the
        value enter any non-hexadecimal character (other than - and $) and
        press RETURN.

        Example:
          >E4 800
          00000800 (00000000) > 1
          00000804 (00000000) > 2
          00000808 (00000000) > 3
          0000080C (00000000) > 4
          00000810 (00000000) > -
          0000080C (00000004) > $900
          00000900 (00000000) >

M       Memory Dump. The M command provides a memory dump showing word values
        as well as ASCII characters ($20 to $7E - all others appear as a dot).
        As with the D command, M can be optionally followed by either a start
        address or a start plus an end address.

        If non-existent memory is dumped it will appear as FF.
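
        The rule for the ASCII column can be written as a one-line filter.
        The Python sketch below is an illustration only; the function name
        ascii_column is an assumption. The sample bytes are the two words at
        $102C and $1030 in the D example earlier in this section.

```python
def ascii_column(data):
    """Show bytes $20 to $7E as characters and every other byte as a dot."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)

print(ascii_column(bytes.fromhex("48656C6C6F200000")))
```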

        Examples:
          >M
          >M6060
          >M FF000000 FF000010

4.2.4 Miscellaneous Memory Commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

F       Fill Memory. This command fills blocks of memory with specified values:

          Fs a1 a2 v

        This will fill a block between addresses a1 and a2 with the value v
        which is assumed to be of size s.

        Examples:
          >F1 800 811 A5
          >F2 1000 1032 A55A
          >F4 5000 5FFF 1234ABCD


4.2.5 Register Display and Entry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DLX has two main sets of registers: the 32 integer or General Purpose Registers
(GPRs) plus the 32 Floating Point Registers (FPRs). Just to confuse matters the
FPRs are used for three main purposes:
    *   As 32, 4 byte long single precision registers
    *   As 16, 8 byte long double precision registers
    *   As 32, 4 byte long integer registers for integer multiplication and
        division.

DLX also contains some other registers which are accessible by the programmer:
the program counter (PC), the interrupt status register, the floating point
status register and the Interrupt Address Register (IAR).

The following commands allow display and entry of the registers (note that
there are no commands to set the status or IAR registers). Some commands have
single character alternatives.

O       Output Register Values. O or ; will display the integer registers (GPR).
          >O
          >;

OD      Display the FPRs in double precision format. As pairs of registers are
        used to form them, only even numbered registers appear. '>' is the
        alternative to this command.
          >OD
          >>

OF      Display the FPRs in single precision format. '<' is the alternative.
          >OF
          ><

OI      Display the FPRs in integer format. ':' is the alternative.
          >OI
          >:

OS      Display the PC, status registers and IAR. '#' is the alternative.
          >OS
          >#

P       Set Program Counter. P a allows direct setting of the PC to address a.
        This action also resets the performance statistics (see the Y command).
          >P 5010

R       Set Register. Rr v sets integer register (GPR) r to value v. Note that
        R0 is always 0 and cannot be changed.
          >R3 15008FFF

RDr d   Sets double precision (FPR) register r to value d.
          >RD0 1234.567

RFr d   Sets single precision (FPR) register r to value d.
          >RF5 16.85

RIr v   Sets integer (FPR) register r to value v.
          >RI16 1FF

Z       Zero Registers. Before a program run this command will set all of the
        DLX registers to a known state. It also resets the performance
        statistics to zero.
          >Z

ZT      Reset all the timer registers at $FF000000 to $FF000028 to zero.

4.2.6 Program Load and Save
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Blocks of memory may be loaded and saved as separate files. Note that under
UNIX, file names are case sensitive. If required, file names may be enclosed
in quotes - this will be necessary if they contain spaces or other punctuation
that may be misinterpreted.

I       Set/Display Default Address. I is used to set a Default Load Address
        so that all files are loaded at a specific address. This address will
        override that in the object file header. I without a following value
        will display the current Default Load Address.

        Examples:
          >I
            Current default load address = $00001000
          >I 4000
            Current default load address = $00004000

L       Load File. 'L abcd a' will load the file abcd at optional address a.
        Note that DAsm produces object files containing a start address. Not
        specifying a load address will either use the address in the file or
        that set by the Default Load Address.

        Examples:
          >Ltest.o
          >L "Check Ints.o" 5000

S       Save File. 'S efgh a1 a2' will save the block at a1 to a2 with the file
        name efgh. Note that a1 and a2 are byte addresses and that the range
        is inclusive; the last byte saved in the example below will be that at
        address $2079.

        Example:
          >S AddSets 2000 2079

4.2.7 Miscellaneous Commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A       Allocate Memory. The A command sets or displays the current memory
        size. Upon start up Mon allocates 256K of memory from $0 to $3FFFF
        (PC compatibles currently start up with 32K due to built-in memory
        size limitations of the Turbo C++ compiler I used to compile it).
        This may be changed using A n where n is the number of kilobytes
        required. E.g. A 12 will allocate 12K (12 x 1024 bytes). Note that
        memory is always allocated from location 0. Programs or data in memory
        will not be lost except where the new size is less than the old and the
        program or data was being held at a memory location that no longer
        exists. The size of the memory that can be allocated is only restricted
        by the available host memory size. In theory it should be possible to
        allocate up to 4278190080 bytes ($0 to $FEFFFFFF - the built in timer
        is located at $FF000000) but, for obvious reasons, this has not been
        tested!

        A without any parameters just displays the current size.

        Examples:
          >A
          Current memory size = 256K
          >A 64
          Current memory size = 64K

DM      Disassemble Microcode Table. This disassembles the microcode table
        currently loaded. It does not show the decode tables as the tables
        generated by MAsm do not retain label information.

        Example:
          >DM

Q/X     Either Q or X will exit the Monitor. Example:
          >X

Y       Display Performance Statistics. This shows the following information:
            * Number of instructions run
            * Usage of instruction types as both quantity and percentage
            * Total clock cycles
            * Average Cycles Per Instruction (CPI)

        These values will differ depending upon which CPU module is used. For
        example, with the microcode simulator the number of cycles is usually
        higher than the hardwired version. Also, the microcode version cannot
        distinguish between taken and untaken branches, and just gives one
        combined total. The pipelined simulator splits instructions between
        those started, executed and completed.

        The G, H, P and Z commands will reset the performance statistics to
        zero.

        The following example is from a run of the hardwired simulator:
          >Y
          Performance of last run:
            No. of instructions executed: 152
              Loads                    28 (18.42%)
              Stores                    2 (01.32%)
              ALU                      37 (24.34%)
              Set                       0 (00.00%)
              Move                      0 (00.00%)
              Convert                   0 (00.00%)
              Jumps                     1 (00.66%)
              JALs                      2 (01.32%)
              Branch (taken)           27 (17.76%)
              Branch (not taken)       26 (17.11%)
              Trap/RFE                 29 (19.08%)

            Total clock cycles        829
            Average CPI             5.454
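
        The percentages and the average CPI follow directly from the counts.
        The short Python check below recomputes them from the figures in the
        sample output; it is a worked example and not part of Mon.

```python
# Figures copied from the sample Y output above.
counts = {
    "Loads": 28, "Stores": 2, "ALU": 37, "Set": 0, "Move": 0, "Convert": 0,
    "Jumps": 1, "JALs": 2, "Branch (taken)": 27, "Branch (not taken)": 26,
    "Trap/RFE": 29,
}
total_cycles = 829

instructions = sum(counts.values())
cpi = total_cycles / instructions      # average cycles per instruction

for name, n in counts.items():
    print(f"{name:<20}{n:>4} ({100 * n / instructions:05.2f}%)")
print(f"Average CPI {cpi:.3f}")
```

        The counts sum to the 152 instructions reported, and 829 / 152 rounds
        to the 5.454 shown.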

V       Version. Display the current CPU Simulator version.

VH      Change to hardwired CPU

VM      Change to microcode CPU

VP      Change to pipelined CPU

VT      Display full title and settings

@       Execute a System Command. E.g. @dir executed on a PC will perform a DOS
        DIRectory command. This command is included for the convenience of the
        user and is not guaranteed to produce the expected result on all
        platforms.

[       Record a Macro. This command allows the recording of keyboard entries
        into a named file so that lengthy repeated entries may be performed
        again and again using a short 'macro'. To set up a recording use:

          >[xxxx

        where xxxx is the name of the macro file (it is stored on disk with the
        extension .rec). To terminate the recording just enter another [
        command.

        The file generated is in ASCII format and may be edited as required so
        that changes can be made without having to manually enter the sequence
        again.

]       Play back a Macro. To play back a macro recorded using the [ command
        enter:

          >]xxxx

        where xxxx is the name of the required macro.

^       Load a Microcode Table. By default, Mon loads the microcode table
        'dlxmcode.tbl'. This command allows other tables to be loaded as
        follows:

          >^xxx.tbl

        The above example loads the table xxx.tbl.

4.3 Auto-loading Files
~~~~~~~~~~~~~~~~~~~~~~

If you require certain object files to be loaded automatically every time Mon
starts up then create a file called:

    dlxaload.lst

with entries as follows:

    name1   startloc1
    name2   startloc2
        ...
    namen   startlocn

where each object file name is followed by its start location. There is no
limit to the number of files that can be loaded in this way. Currently the
system auto-loads only one file by default: auto.o. Note that Mon does not
display any loading information when auto-loading.


4.4 Break Key Combination
~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to stop a running program using a combined keypress on the PC or
Amiga. On the Amiga use Ctrl-C while the PC can use Ctrl-A or Ctrl-C. However,
it is recommended that Ctrl-A is used on the PC as, depending upon the way the
PC is set up, Ctrl-C may completely drop out of Mon.

Currently the UNIX version cannot be stopped in this manner.

4.5 Help
~~~~~~~~

Mon comes with some on-line help which is contained in the file dlxhlp.dat. To
access this enter ? or ?X where X is one of the available commands. When first
run the help system creates an index file for the help file so that it can
access it faster next time - the file is called dlxhlp.idx.

===============================================================================

5. The dlx.ini File
~~~~~~~~~~~~~~~~~~~

5.1 Introduction
~~~~~~~~~~~~~~~~

The DLX CPU simulators can be customised through the use of a file named
dlx.ini (note that the file name must be in lower case under UNIX). If the file
is not found then the internal defaults are used.

Each setting is a keyword assigned to a decimal value:

        Keyword = value

The Keywords are not case sensitive. Some settings affect all simulators,
others affect only one or two. Each entry below has a single letter following
the keyword showing which are affected:

    H       Hardwired
    M       Microcode
    P       Pipelined
    A       All three

5.2 Settings
~~~~~~~~~~~~

The following settings can be changed:

CpuType (A)     Set the CPU Datapath type:
                        1   Hardwired
                        2   Microcode
                        3   Pipelined

                All unrecognised values will set the hardwired version.

DefLoad (A)     Default Load Address for Mon. This may be set so that all file
                loads are located at one specific place and override the load
                location in the load file itself. Note that the I command
                within Mon can further change the setting. The default is a
                load address of $0.

                This example sets the load address to $1000:
                    DefLoad = 4096

Memory (A)      The amount of memory allocated at start-up defined in
                Kilobytes, e.g.:
                    Memory = 512

                allocates 512K (512 x 1024). The default is 256K for UNIX, the
                Amiga and the Mac, while the PC defaults to 32K (due to a limit
                within the Turbo C++ compiler I was using to compile it).

Timer (A)       Enables or disables the simulated timer located at $FF000000.
                    Timer = 0   disabled
                    Timer = 1   enabled (default)

Traps (A)       Because the DLX CPUs have no direct access to hardware some of
                the TRAP instructions are used for input and output (via
                register R1). This may be disabled so that the TRAPs may be
                intercepted for user specified purposes.

                    Traps = 0   disabled
                    Traps = 1   enabled (default)

                TRAP 0 (used to HALT a program) cannot be disabled.

TrapType (H/P)  Normal TRAPs may be vectored through low memory:
                    PC <- M[TRAP value]

                or just allow the value after the TRAP to be copied into the
                PC:
                    PC <- TRAP value

                Setting TrapType = 1 sets the vector version (default).

Any line in dlx.ini that starts with a semicolon or a left square bracket is
treated as a comment.
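
The file format described in this section is simple enough to read with a few
lines of code. The Python sketch below is one possible reading of the rules
and is not the parser used by the simulators: the name parse_ini is invented,
and the handling of blank lines and of lines without an '=' is an assumption,
as the manual does not cover those cases.

```python
def parse_ini(text):
    """Parse 'Keyword = value' lines into a dict keyed by lower-case keyword."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines are skipped (an assumption); ';' and '[' start comments.
        if not stripped or stripped[0] in ";[":
            continue
        keyword, sep, value = stripped.partition("=")
        if not sep:
            raise ValueError("missing '=' in line: " + line)
        # Keywords are not case sensitive; values are decimal.
        settings[keyword.strip().lower()] = int(value.strip(), 10)
    return settings

sample = """\
; example settings
[simulator]
CpuType = 3
DEFLOAD = 4096
Memory = 512
"""
print(parse_ini(sample))
```

In the sample, DefLoad = 4096 is the decimal form of the load address $1000.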

===============================================================================

6. DLX Opcodes
~~~~~~~~~~~~~~

6.1 Introduction
~~~~~~~~~~~~~~~~

This section describes the bit patterns assigned to the DLX opcodes. Because
H&P do not specify any bit patterns they have had to be constructed for this
project.

(NOTE: These codes do not replicate those used by Boston University, USA for
their DLXsim program which, in most cases, uses the same codes as the MIPS
processor. This was due to DLXsim being converted from a MIPS simulator.)

There are three basic types of DLX instruction:
    I   Immediate, Conditional, Jump (and Link) Register
    J   Jump, Trap and RFE (using 26-bit offset)
    R   Register to Register

Figure 4.19 in H&P specifies the main layout of the three types. All
instructions are initially encoded using the high-end 6 bits. This
implementation specifies that all I and J types have a unique 6-bit code
while R types share one code (all 6 bits set to 1) and are identified by the
lower 11 bits (Function) of which only the lower 6 are used.
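
As an illustration of this layout, the Python sketch below splits a 32-bit
instruction word into its opcode and, for R types, its function code. The
function name split_fields is an assumption made for the example; the sample
words are taken from the D command listing in section 4.2.3.

```python
def split_fields(word):
    """Return (opcode, function) for a 32-bit DLX instruction word.

    function is None unless the opcode is the shared R type code 111111.
    """
    opcode = (word >> 26) & 0x3F       # high-end 6 bits
    if opcode == 0x3F:
        return opcode, word & 0x3F     # only the lower 6 of the 11 bits are used
    return opcode, None

for word in (0xFFE0E821, 0x97BD0004, 0x0C000000):
    opcode, function = split_fields(word)
    print(f"{word:08X} opcode={opcode:06b} function={function}")
```

The first word is the R type ADDU (function $21); the other two are SUBUI and
JAL, which are identified by the opcode alone.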

6.2 TRAP Usage
~~~~~~~~~~~~~~

Because the simulators do not have any real hardware to allow them to
communicate with the outside world a compromise option is required.
Additionally, there is no specific DLX Halt command which could be used to
return control back to the user.

The TRAP command is normally used to transfer program operation through a
vectored memory address. For example, TRAP 32 will copy the value stored in
8-bit memory locations 32 to 35 into the program counter so that execution
continues from that new point.
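
The vectored behaviour, and the direct alternative selected by TrapType in
section 5.2, can be sketched in Python as follows. The function name
trap_target, the big-endian byte order and the handler address $1234 are all
assumptions made for the example.

```python
def trap_target(memory, trap_value, vectored=True):
    """Return the new PC for a TRAP under the two documented behaviours."""
    if not vectored:
        return trap_value              # the TRAP value is copied into the PC
    # The PC is loaded from the 4 bytes starting at the TRAP value.
    # Big-endian byte order is an assumption here.
    return int.from_bytes(memory[trap_value:trap_value + 4], "big")

memory = bytearray(64)
memory[32:36] = (0x00001234).to_bytes(4, "big")   # hypothetical handler address

print(hex(trap_target(memory, 32)))
print(hex(trap_target(memory, 32, vectored=False)))
```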

The system therefore uses certain TRAP values to implement input/output
functions. Five TRAPS are used which, because DLX uses memory alignment for
32-bit values, are numbered as follows:

     0  Halt execution and return operation to the calling program.
     4  Output the ASCII character held in the lower byte of register R1 to the
        screen.
     8  Output the value held in R1 as a decimal number.
    12  Output the value held in R1 as a hexadecimal number.
    16  Get a character from the user and store it in the lower byte of R1.

There is one specific reason for using TRAP 0 as the halt instruction. In the
implementation, memory is allocated to a simulator using the C calloc library
call; this clears the memory as it is allocated. User programs are likely to
crash upon occasion with the result that they corrupt the program counter and
try to run non-existent code. If the system is set up to treat an instruction
whose bit pattern is all zeroes as a halt, then jumping into cleared memory
will immediately stop a wild program. The opcode of the TRAP instruction has
therefore been set to bit pattern 000000 and TRAP 0 results in an instruction
containing all zeroes.
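
The effect can be shown with a small Python model. It is a sketch only: the
names fetch_word and is_halt are invented, and a zero-filled bytearray stands
in for the memory returned by calloc.

```python
HALT_WORD = 0x00000000      # TRAP opcode 000000 with a TRAP value of 0

def fetch_word(memory, pc):
    """Read a 32-bit word; byte order does not matter for all-zero memory."""
    return int.from_bytes(memory[pc:pc + 4], "big")

def is_halt(word):
    """True when the instruction word is TRAP 0."""
    return word == HALT_WORD

memory = bytearray(64)      # zero-filled, as calloc would return it
wild_pc = 0x20              # a corrupted PC pointing into unused memory
print(is_halt(fetch_word(memory, wild_pc)))
```

Any fetch from untouched memory yields the halt pattern, whereas a real
instruction such as the JAL word $0C000000 does not.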

Because the TRAP command is a J type, the other three J types (RFE, J and JAL)
share the top four bits (0000) of its 6-bit opcode.

6.3 Bit Pattern Allocation
~~~~~~~~~~~~~~~~~~~~~~~~~~

Allocation of bit patterns to instructions is not arbitrary but has been done
as logically as possible. Where similar Immediate (I) and Register-to-Register
(R) operations exist then the top 6-bit code of the I opcode will be identical
to the lower 6 bits of the R opcode. This can also help some of the CPU
simulator decoding. For example, the ADDI and ADD codes are:

ADDI    100000xx xxxxxxxx xxxxxxxx xxxxxxxx
ADD     111111xx xxxxxxxx xxxxxxxx xx100000

Types of instruction have also been grouped together so that, in most cases,
the top three or four bits of the 6-bit codes define the 'class' or group as
follows:

I/J Type
    0000        J Type
    0001        Not used
    001x        Jump/Branch
    010x        Loads
    011x        Stores
    10xx        Math types
    1100        Bit
    1101        Shift
    111x        Set (with exception of R Type 111111)

R Type
    000x        Move
    001x        Convert
    010x        F/P Set (Single Precision)
    011x        F/P Set (Double Precision)
    10xx        Math types
    1100        Bit
    1101        Shift
    111x        Set
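
The I/J grouping can be expressed as a lookup on the top three or four bits of
the opcode. The Python sketch below does this for illustration; the name
ij_group and the two dictionaries are assumptions and do not come from the
simulator source.

```python
# Groups identified by the top four bits of the 6-bit code.
TOP4_GROUPS = {0b0000: "J Type", 0b0001: "Not used",
               0b1100: "Bit", 0b1101: "Shift"}
# Groups identified by the top three bits (10xx covers both 100 and 101).
TOP3_GROUPS = {0b001: "Jump/Branch", 0b010: "Loads", 0b011: "Stores",
               0b100: "Math types", 0b101: "Math types", 0b111: "Set"}

def ij_group(opcode):
    """Return the I/J group name for a 6-bit opcode, following the list above."""
    if not 0 <= opcode <= 0b111111:
        raise ValueError("opcode must fit in 6 bits")
    if opcode == 0b111111:
        return "R Type"                # the one exception within 111x
    top4 = opcode >> 2
    if top4 in TOP4_GROUPS:
        return TOP4_GROUPS[top4]
    return TOP3_GROUPS[opcode >> 3]

for name, opcode in (("JALR", 0b001011), ("LH", 0b010010),
                     ("ADDI", 0b100000), ("ORI", 0b110010),
                     ("SLTI", 0b111000)):
    print(f"{name:<6}{opcode:06b}  {ij_group(opcode)}")
```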

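The following table shows (in bit-pattern order) the allocation of all DLX
instructions. The two hexadecimal columns give the 6-bit code as it appears in
the top byte of an I/J instruction (the code shifted left by two bits) and in
the function field of an R instruction. Further explanation is shown where
other bits have specific functions (such as in Load/Store instructions).

                        Hex value
I/J     R       Code    I/J R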

TRAP    MOVS2I  000000  00  00
RFE     MOVI2S  000001  04  01
J       MOVFP2I 000010  08  02
JAL     MOVI2FP 000011  0C  03

-       MOVF    000100  10  04
-       MOVD    000101  14  05
-       -       000110  18  06
-       -       000111  1C  07

-       CVTF2D  001000  20  08
-       CVTF2I  001001  24  09
JR      CVTD2F  001010  28  0A
JALR    CVTD2I  001011  2C  0B

BEQZ    CVTI2F  001100  30  0C
BNEZ    CVTI2D  001101  34  0D
BFPF    -       001110  38  0E
BFPT    -       001111  3C  0F

LB      LTF     010000  40  10  Load/Store
LBU     GTF     010001  44  11    010xxx = Load
LH      LEF     010010  48  12    011xxx = Store
LHU     GEF     010011  4C  13
LW      EQF     010100  50  14    xxx00x = Byte
-       NEF     010101  54  15    xxx01x = Half word
LF      -       010110  58  16    xxx10x = Word
LD      -       010111  5C  17    xxx11x = Floating point
SB      LTD     011000  60  18
-       GTD     011001  64  19    xxxxx0 = Signed/Single Precision
SH      LED     011010  68  1A    xxxxx1 = Unsigned/Double Precision
-       GED     011011  6C  1B
SW      EQD     011100  70  1C
-       NED     011101  74  1D
SF      -       011110  78  1E
SD      -       011111  7C  1F

ADDI    ADD     100000  80  20    1000xx = ADD
ADDUI   ADDU    100001  84  21    1001xx = SUB
-       ADDF    100010  88  22    1010xx = MULT
-       ADDD    100011  8C  23    1011xx = DIV
SUBI    SUB     100100  90  24
SUBUI   SUBU    100101  94  25    xxxx00 = Signed
-       SUBF    100110  98  26    xxxx01 = Unsigned
-       SUBD    100111  9C  27    xxxx10 = Single Prec fp
-       MULT    101000  A0  28    xxxx11 = Double Prec fp
-       MULTU   101001  A4  29
-       MULTF   101010  A8  2A
-       MULTD   101011  AC  2B
-       DIV     101100  B0  2C
-       DIVU    101101  B4  2D
-       DIVF    101110  B8  2E
LHI     DIVD    101111  BC  2F

ANDI    AND     110000  C0  30    xxxx0x = AND
-       -       110001  C4  31    xxxx1x = OR
ORI     OR      110010  C8  32    xxxxx0 = non-exclusive
XORI    XOR     110011  CC  33    xxxxx1 = exclusive

SLLI    SLL     110100  D0  34    xxxx0x = Shift Left
-       -       110101  D4  35    xxxx1x = Shift Right
SRLI    SRL     110110  D8  36    xxxxx0 = Logical
SRAI    SRA     110111  DC  37    xxxxx1 = Arithmetic

SLTI    SLT     111000  E0  38
SGTI    SGT     111001  E4  39
SLEI    SLE     111010  E8  3A
SGEI    SGE     111011  EC  3B
SEQI    SEQ     111100  F0  3C
SNEI    SNE     111101  F4  3D
-       -       111110  F8  3E
RType   -       111111  FC  3F
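
The bit notes given beside the Load/Store rows can be applied mechanically.
The Python sketch below decodes them for illustration; the name
describe_load_store is an assumption. It describes the bit pattern only and
does not check whether the code is actually allocated to an instruction.

```python
SIZES = {0b00: "Byte", 0b01: "Half word", 0b10: "Word", 0b11: "Floating point"}

def describe_load_store(opcode):
    """Describe a 6-bit load/store opcode using the bit notes in the table."""
    if opcode >> 4 != 0b01:
        raise ValueError("not a load/store opcode")
    kind = "Store" if opcode & 0b001000 else "Load"    # 010xxx / 011xxx
    size = SIZES[(opcode >> 1) & 0b11]                 # xxx00x to xxx11x
    low = opcode & 1                                   # xxxxx0 / xxxxx1
    if size == "Floating point":
        variant = "Double Precision" if low else "Single Precision"
    else:
        variant = "Unsigned" if low else "Signed"
    return kind, size, variant

for name, opcode in (("LBU", 0b010001), ("LD", 0b010111), ("SW", 0b011100)):
    print(name, describe_load_store(opcode))
```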

===============================================================================

7. Notes on Supplied Source Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

7.1 Introduction
~~~~~~~~~~~~~~~~

Several source files are supplied with the package so that users may see
examples of how to produce executable files. There are four types of file:
include, system, application/example and microcode table.

7.2 Include Files
~~~~~~~~~~~~~~~~~

macro.i         This INCLUDE file contains several macros. Some are those
                mentioned by Hennessy and Patterson such as MOV and NOP, others
                included are ones that have proved useful during the
                development of the system. Full explanations are given with
                each macro along with the required format and, in some cases,
                examples of usage. If you intend using relocatable code then
                particular attention should be paid to the explanation of the
                LPC macro.

timer.i         The DLX simulators include a timer device. timer.i includes
                full instructions for its use.

start.i         Including this file is recommended as it gives access to the
                interrupt and printing facilities in auto.a. It also includes
                some initialisation code necessary for easy code relocation.
                start.i also INCLUDEs macro.i and auto.i.

auto.i          Automatically generated from auto.a (see below).


7.3 System Source Files
~~~~~~~~~~~~~~~~~~~~~~~

System source files should be assembled so that they automatically create their
include files using GENINC (see section 2.3). They should then be added to the
dlxaload.lst file. Their object files will then be automatically loaded upon
start-up. Currently there is only one supplied system source file.

auto.a          This file has been created to supply the DLX simulators with
                some standard interrupt and printing facilities. If you include
                start.i you will be able to call the InitInt, ClearInt and
                PRINT system routines. Assembling this file will automatically
                generate auto.i using GENINC.

7.4 Application/Example Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some of these files require auto.i to be included so auto.a must be assembled
first.

align.a         Tests memory access to non-aligned memory locations.

alu.a           Tests some of the ALU opcodes as well as the integer overflow
                exception.

clock.a         Tests the simulated timer.

clockp.a        Tests the simulated timer with on-screen countdown.

error.a         Tests the divide by zero exception.

input.a         Tests the INPUT (TRAP 16).

loop.a          Tests the branching as well as OUTPUT1 (TRAP 4).

num.a           Tests the floating point and int/fp conversion.

pbr.a           Pipeline branch test code.

pdata.a         Pipeline data hazard test code.

pld.a           Pipeline load hazard test file.

print.a         Tests the print routine in auto.a.

sra.a           Tests the SRA and SRAI instructions.

test.a          Original DLX test program. Tests JAL, macros etc.

trap.a          Tests the setting up and calling of a user TRAP.

7.5 Microcode Tables
~~~~~~~~~~~~~~~~~~~~

hp.m            This contains a copy of the standard H&P microcode which
                implements only a subset of the DLX instructions.

===============================================================================



