From a191f385878fd8a7725b3a95b0af1a8b0cef6225 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Mon, 2 Dec 2024 12:30:52 -0500 Subject: [PATCH 01/17] Overhaul of notes from Abscond to Hustle. --- www/notes/abscond.scrbl | 113 ++++---- www/notes/blackmail.scrbl | 240 +++++++++++------ www/notes/con.scrbl | 76 +++--- www/notes/dodger.scrbl | 62 ++--- www/notes/dupe.scrbl | 464 +++++++++++++++++++++----------- www/notes/evildoer.scrbl | 545 ++++++++++++++++++-------------------- www/notes/extort.scrbl | 315 ++++++++++++++++++---- www/notes/fraud.scrbl | 243 +++++++++-------- www/notes/hustle.scrbl | 352 ++++++++++++++---------- 9 files changed, 1442 insertions(+), 968 deletions(-) diff --git a/www/notes/abscond.scrbl b/www/notes/abscond.scrbl index 149ceae4..760c2ea8 100644 --- a/www/notes/abscond.scrbl +++ b/www/notes/abscond.scrbl @@ -15,7 +15,7 @@ @(ev '(require rackunit a86)) @(for-each (λ (f) (ev `(require (file ,(path->string (build-path langs "abscond" f)))))) - '("interp.rkt" "ast.rkt" "compile.rkt")) + '("main.rkt" "correct.rkt")) @(define (shellbox . s) (parameterize ([current-directory (build-path langs "abscond")]) @@ -80,7 +80,15 @@ implementation artifacts. Formal definitions balance precision while allowing for under-specification, but require detailed definitions and training to understand. -We will use a combination of each. +For the purposes of this course, we will use interpreters to specify +the meaning of programs. The interpreters provide a specification for +the compilers we write and make precise what it means for a compiler to +be @emph{correct}. Any time the compiler produces code that, when +run, produces a different result than the interpreter produces for the +same program, the compiler is broken (or the specification is wrong). +Interpreters are useful for specifying what the compiler should do and +sometimes writing interpreters is also useful for informing @emph{how} +it should do it.
To begin, let's start with a dead simple programming language called @@ -151,13 +159,6 @@ While not terribly useful for a language as overly simplistic as Abscond, we use an AST datatype for representing expressions and another syntactic categories. For each category, we will have an appropriate constructor. In the case of Abscond all expressions are integers, so we have a single constructor, @racket[Lit]. - -@(define-language A-concrete - (e ::= (Lit i)) - (i ::= integer)) - -@centered{@render-language[A-concrete]} - A datatype for representing expressions can be defined as: @codeblock-include["abscond/ast.rkt"] @@ -168,6 +169,12 @@ an integer and constructs an instance of the AST datatype if it is, otherwise it signals an error: @codeblock-include["abscond/parse.rkt"] +@ex[ +(parse 5) +(parse 42) +(eval:error (parse #t))] + + @section{Meaning of Abscond programs} The meaning of an Abscond program is simply the number itself. So @@ -184,6 +191,14 @@ produces it's meaning: (interp (Lit -8)) ) +The @racket[interp] function specifies the meaning of expressions, +i.e. elements of the type @tt{Expr}. This language is so simple, the +@racket[interp] function really doesn't @emph{do} much of anything, +but this will change as the language grows. + + + + We can add a command line wrapper program for interpreting Abscond programs from stdin: @@ -198,6 +213,7 @@ the result. For example, interpreting the program @tt{42.rkt} shown above: @shellbox["cat 42.rkt | racket -t interp-stdin.rkt -m"] +@;{ Even though the semantics is obvious, we can provide a formal definition of Abscond using @bold{operational semantics}. @@ -249,6 +265,7 @@ and integers @racket[i], if (@racket[e],@racket[i]) in @render-term[A We now have a complete (if overly simple) programming language with an operational semantics and an interpreter, which is (obviously) correct. Now let's write a compiler.
+} @section{Toward a Compiler for Abscond} @@ -283,7 +300,7 @@ computer; it's interpreter is implemented in hardware on your computer's CPU,} @item{it is one of the two dominant computing architectures (the other -being ARM), and} +being ARM) in use today, and} @item{it is a mature technology with good tools and materials.} ] @@ -303,7 +320,7 @@ as follows: Separating out @tt{print_result}, which at this point is just a simple @tt{printf} statement, seems like overkill, but it will be useful in -the future as the language gets more complicated. +the future as the language and its set of values gets more complicated. The runtime must be linked against an object file that provides the definition of @tt{entry}; this is the code our compiler will emit. @@ -496,75 +513,69 @@ Moreover, we can compare our compiled code to code compiled by Racket: @section{But is it @emph{Correct}?} At this point, we have a compiler for Abscond. But is it correct? +What does that even mean, to be correct? -Here is a statement of compiler correctness: +First, let's formulate an alternative implementation of +@racket[interp] that composes our compiler and a86 interpreter to define +a (hopefully!) equivalent function to @racket[interp]: -@bold{Compiler Correctness}: @emph{For all expressions @racket[e] and -integers @racket[i], if (@racket[e],@racket[i]) in @render-term[A -𝑨], then @racket[(asm-interp (compile e))] equals -@racket[i].} +@codeblock-include["abscond/exec.rkt"] -Ultimately, we want the compiler to capture the operational semantics -of our language (the ground truth of what programs mean). However, -from a practical stand-point, relating the compiler to the intepreter -may be more straightforward. What's nice about the interpreter is we -can run it, so we can @emph{test} the compiler against the -interpreter. Moreover, since we claimed the interpreter is correct -(w.r.t. 
to the semantics), testing the compiler against the interpreter -is a way of testing it against the semantics, indirectly. If the -compiler and interpreter agree on all possible inputs, then the -compiler is correct with respect to the semantics since it is -equivalent to the interpreter, and the interpreter is correct. +This function can be used as a drop-in replacement to @racket[interp]: -So, in this setting, means we have the following equivaluence: +@ex[ +(exec (Lit 42)) +(exec (Lit 19))] -@verbatim{ -(interp e) @emph{equals} (asm-interp (compile e)) -} +It captures the idea of a phase-distinction in that you can first +compile a program into a program in another language---in this case +a86---and can then interpret @emph{that} program to get the result. +If the compiler is correct, the result should be the same: + +@bold{Compiler Correctness}: @emph{For all @racket[e] @math{∈} +@tt{Expr} and @racket[i] @math{∈} @tt{Integer}, if @racket[(interp e)] +equals @racket[i], then @racket[(exec e)] equals +@racket[i].} + +One thing that is nice about specifying our language with an +interpreter is that we can run it. So we can @emph{test} the compiler +against the interpreter. If the compiler and interpreter agree on all +possible inputs, then the compiler is correct. -But we don't actually have @racket[asm-interp], a function that -interprets the Asm code we generate. Instead we printed the code and -had @tt{gcc} assembly and link it into an executable, which the OS -could run. But this is a minor distinction. We can use -@racket[asm-interp] to interact with the OS to do all of these steps. 
This is actually a handy tool to have for experimenting with compilation within Racket: -@examples[#:eval ev -(asm-interp (compile (Lit 42))) -(asm-interp (compile (Lit 37))) -(asm-interp (compile (Lit -8))) -] +@ex[ +(exec (Lit 42)) +(exec (Lit 37)) +(exec (Lit -8))] This of course agrees with what we will get from the interpreter: -@examples[#:eval ev +@ex[ (interp (Lit 42)) (interp (Lit 37)) -(interp (Lit -8)) -] +(interp (Lit -8))] We can turn this in a @bold{property-based test}, i.e. a function that computes a test expressing a single instance of our compiler correctness claim: -@examples[#:eval ev -(define (check-compiler e) - (check-eqv? (interp e) - (asm-interp (compile e)))) +@codeblock-include["abscond/correct.rkt"] + +@ex[ (check-compiler (Lit 42)) (check-compiler (Lit 37)) -(check-compiler (Lit -8)) -] +(check-compiler (Lit -8))] This is a powerful testing technique when combined with random generation. Since our correctness claim should hold for @emph{all} Abscond programs, we can randomly generate @emph{any} Abscond program and check that it holds. -@examples[#:eval ev +@ex[ (check-compiler (Lit (random 100))) ; test 10 random programs diff --git a/www/notes/blackmail.scrbl b/www/notes/blackmail.scrbl index 6b2e05c1..0492230f 100644 --- a/www/notes/blackmail.scrbl +++ b/www/notes/blackmail.scrbl @@ -13,7 +13,7 @@ @(ev '(require rackunit a86)) @(for-each (λ (f) (ev `(require (file ,(path->string (build-path langs "blackmail" f)))))) - '("interp.rkt" "compile.rkt" "random.rkt" "ast.rkt")) + '("main.rkt" "random.rkt" "correct.rkt")) @(define (shellbox . s) (parameterize ([current-directory (build-path langs "blackmail")]) @@ -41,9 +41,9 @@ @section{Refinement, take one} -We've seen all the essential pieces (a grammar, an AST data type -definition, an operational semantics, an interpreter, a compiler, -etc.) 
for implementing a programming language, albeit for an amazingly +We've seen all the essential pieces---a grammar, an AST data type +definition, a semantic specification (the interpreter), a compiler, +etc.---for implementing a programming language, albeit for an amazingly simple language. We will now, through a process of @bold{iterative refinement}, grow @@ -73,24 +73,27 @@ An example concrete program: @section{Abstract syntax for Blackmail} -The grammar of abstract Backmail expressions is: +The datatype for abstractly representing expressions can be defined +as: -@centered{@render-language[B]} +@codeblock-include["blackmail/ast.rkt"] So, @racket[(Lit 0)], @racket[(Lit 120)], and @racket[(Lit -42)] are Blackmail AST expressions, but so are @racket[(Prim1 'add1 (Lit 0))], @racket[(Sub1 (Lit 120))], @racket[(Prim1 'add1 (Prim1 'add1 (Prim1 'add1 (Lit -42))))]. -A datatype for representing expressions can be defined as: - -@codeblock-include["blackmail/ast.rkt"] The parser is more involved than Abscond, but still straightforward: @codeblock-include["blackmail/parse.rkt"] +@ex[ +(parse '42) +(parse '(add1 42)) +(parse '(add1 (sub1 (add1 42))))] + @section{Meaning of Blackmail programs} @@ -101,6 +104,7 @@ The meaning of a Blackmail program depends on the form of the expression: @item{the meaning of an increment expression is one more than the meaning of its subexpression, and} @item{the meaning of a decrement expression is one less than the meaning of its subexpression.}] +@;{ The operational semantics reflects this dependence on the form of the expression by having three rules, one for each kind of expression: @@ -150,13 +154,27 @@ This may seem a bit strange at the moment, but it helps to view the semantics through its correspondence with an interpreter, which given an expression @math{e}, computes an integer @math{i}, such that @math{(e,i)} is in @render-term[B 𝑩]. 
+} -Just as there are three rules, there will be three cases to the -interpreter, one for each form of expression: +To compute the meaning of an expression, the @racket[interp] +function does a case analysis of the expression: @codeblock-include["blackmail/interp.rkt"] -@examples[#:eval ev +In the case of a @racket[(Prim1 p e)] expression, the interpreter +first recursively computes the meaning of the subexpression @racket[e] +and then defers to a helper function @racket[interp-prim1] which +interprets the meaning of a given unary operation and value: + +@codeblock-include["blackmail/interp-prim.rkt"] + +If the given operation is @racket['add1], the function adds 1; +if it's @racket['sub1], it subtracts 1. + +We can write examples of the @racket[interp] function, writing inputs +in abstract syntax: + +@ex[ (interp (Lit 42)) (interp (Lit -7)) (interp (Prim1 'add1 (Lit 42))) @@ -164,6 +182,20 @@ interpreter, one for each form of expression: (interp (Prim1 'add1 (Prim1 'add1 (Prim1 'add1 (Lit 8))))) ] +We could also write examples using concrete syntax, using the parser +to construct the appropriate abstract syntax for us: + +@ex[ +(interp (parse '42)) +(interp (parse '-7)) +(interp (parse '(add1 42))) +(interp (parse '(sub1 8))) +(interp (parse '(add1 (add1 (add1 8))))) +] + + + +@;{ Here's how to connect the dots between the semantics and interpreter: the interpreter is computing, for a given expression @math{e}, the integer @math{i}, such that @math{(e,i)} is in @render-term[B 𝑩]. The @@ -195,56 +227,95 @@ induction of the interpreter's correctness: @render-term[B 𝑩], then @racket[(interp e)] equals @racket[i].} +} + @section{An Example of Blackmail compilation} Just as we did with Abscond, let's approach writing the compiler by first writing an example. -Suppose we want to compile @racket[(add1 (add1 40))]. We already -know how to compile the @racket[40]: @racket[(Mov 'rax 40)]. To do -the increment (and decrement) we need to know a bit more x86-64. 
In -particular, the @tt{add} (and @tt{sub}) instruction is relevant. It -increments the contents of a register by some given amount. - -Concretely, the program that adds 1 twice to 40 looks like: +Suppose we want to compile @racket[(add1 (add1 40))]. We already know +how to compile the @racket[40]: @racket[(Mov 'rax 40)]. To do the +increment (and decrement) we need to know a bit more a86. In +particular, the @racket[Add] instruction is relevant. It increments +the contents of a register by some given amount. -@filebox-include[fancy-nasm blackmail "add1-add1-40.s"] +So, a program that adds 1 twice to 40 looks like: -The runtime stays exactly the same as before. -@shellbox["make add1-add1-40.run" "./add1-add1-40.run"] +@ex[ +(asm-interp + (prog (Global 'entry) + (Label 'entry) + (Mov 'rax 40) + (Add 'rax 1) + (Add 'rax 1) + (Ret)))] + -@section{A Compiler for Blackmail} +@;{filebox-include[fancy-nasm blackmail "add1-add1-40.s"]} -To compile Blackmail, we make use of two more a86 -instructions, @racket[Add] and @racket[Sub]: -@ex[ -(asm-display - (list (Label 'entry) - (Mov 'rax 40) - (Add 'rax 1) - (Add 'rax 1) - (Ret))) -] +@section{A Compiler for Blackmail} -The compiler consists of two functions: the first, which is given a -program, emits the entry point and return instructions, invoking -another function to compile the expression: +To compile Blackmail, we make use of two more a86 instructions, +@racket[Add] and @racket[Sub]. The compiler consists of two +functions: the first, which is given a program, emits the entry point +and return instructions, invoking another function to compile the +expression: @codeblock-include["blackmail/compile.rkt"] Notice that @racket[compile-e] is defined by structural recursion, much like the interpreter. 
+In the case of a unary primitive @racket[(Prim1 p e)], the compiler +first compiles the subexpression @racket[e] obtaining a list of +instructions that, when executed, will place @racket[e]'s value in the +@racket['rax] register. After that sequence of instructions, the +compiler emits instructions for carrying out the operation @racket[p], +deferring to a helper function @racket[compile-op1]: + +@codeblock-include["blackmail/compile-ops.rkt"] + +This function emits either an @racket[Add] or a @racket[Sub] +instruction, depending upon @racket[p]. We can now try out a few examples: @ex[ -(compile (Prim1 'add1 (Prim1 'add1 (Lit 40)))) -(compile (Prim1 'sub1 (Lit 8))) -(compile (Prim1 'add1 (Prim1 'add1 (Prim1 'sub1 (Prim1 'add1 (Lit -8)))))) -] +(compile-e (parse '(add1 (add1 40)))) +(compile-e (parse '(sub1 8))) +(compile-e (parse '(add1 (add1 (sub1 (add1 -8))))))] + +To see the complete code for these examples, we can use the +@racket[compile] function: + +@ex[ +(compile (parse '(add1 (add1 40)))) +(compile (parse '(sub1 8))) +(compile (parse '(add1 (add1 (sub1 (add1 -8))))))] + +We can also run the code produced in these examples in order to see +what they each produce: + +@ex[ +(asm-interp (compile (parse '(add1 (add1 40))))) +(asm-interp (compile (parse '(sub1 8)))) +(asm-interp (compile (parse '(add1 (add1 (sub1 (add1 -8)))))))] + +Based on this, it's useful to define an @racket[exec] function that +(should) behave like @racket[interp], just as we did for Abscond: + +@codeblock-include["blackmail/exec.rkt"] + +@ex[ +(exec (parse '(add1 (add1 40)))) +(exec (parse '(sub1 8))) +(exec (parse '(add1 (add1 (sub1 (add1 -8))))))] + +This function will be the basis of our compiler correctness statement +and a primary tool for testing the compiler.
And give a command line wrapper for parsing, checking, and compiling in @link["code/blackmail/compile-stdin.rkt"]{@tt{compile-stdin.rkt}}, @@ -259,34 +330,26 @@ single command: @void[(shellbox "touch add1-add1-40.rkt")] @shellbox["make add1-add1-40.run" "./add1-add1-40.run"] -Likewise, to test the compiler from within Racket, we use -the same @racket[asm-interp] function to encapsulate running -assembly code: - -@ex[ -(asm-interp (compile (Prim1 'add1 (Prim1 'add1 (Lit 40))))) -(asm-interp (compile (Prim1 'sub1 (Lit 8)))) -(asm-interp (compile (Prim1 'add1 (Prim1 'add1 (Prim1 'add1 (Prim1 'add1 (Lit -8))))))) -] @section{Correctness and random testing} We can state correctness similarly to how it was stated for Abscond: -@bold{Compiler Correctness}: @emph{For all expressions @racket[e] and -integers @racket[i], if (@racket[e],@racket[i]) in @render-term[B -𝑩], then @racket[(asm-interp (compile e))] equals +@bold{Compiler Correctness}: @emph{For all @racket[e] @math{∈} +@tt{Expr} and @racket[i] @math{∈} @tt{Integer}, if @racket[(interp e)] +equals @racket[i], then @racket[(exec e)] equals @racket[i].} +(This statement is actually identical to the statement of correctness +for Abscond; note, however, that @tt{Expr}, @racket[interp], and +@racket[exec] here refer to their Blackmail +definitions.) And we can test this claim by comparing the results of running compiled and interpreted programs, leading to the following property, which hopefully holds: -@ex[ -(define (check-compiler e) - (check-eqv? (interp e) - (asm-interp (compile e))))] +@codeblock-include["blackmail/correct.rkt"] The problem, however, is that generating random Blackmail programs is less obvious compared to generating random Abscond programs @@ -311,6 +374,8 @@ e (check-compiler (random-expr))) ] +@section[#:tag "broken"]{A Broken Compiler} + It's now probably time to acknowledge a short-coming in our compiler.
Although it's great that random testing is confirming the correctness of the compiler on @@ -332,10 +397,10 @@ x86 does. Let's see: @ex[ (define max-int (sub1 (expt 2 63))) (define min-int (- (expt 2 63))) -(asm-interp (compile (Lit max-int))) -(asm-interp (compile (Prim1 'add1 (Lit max-int)))) -(asm-interp (compile (Lit min-int))) -(asm-interp (compile (Prim1 'sub1 (Lit min-int))))] +(exec (Lit max-int)) +(exec (Prim1 'add1 (Lit max-int))) +(exec (Lit min-int)) +(exec (Prim1 'sub1 (Lit min-int)))] Now there's a fact you didn't learn in grade school: in the first example, adding 1 to a number made it smaller; in the @@ -358,6 +423,29 @@ correctness: (check-compiler (Prim1 'sub1 (Lit min-int))) ] +The problem also exists in Abscond in that we can write literals that +exceed these bounds. The interpreter has no problem with such +literals: + +@ex[ +(interp (Lit (add1 max-int)))] + +But the compiler will produce the wrong result: + +@ex[ +(exec (Lit (add1 max-int)))] + +It's also possible to exceed the bounds so thoroughly that the +program can't even be compiled: + +@ex[ +(interp (Lit (expt 2 64))) +(eval:error (exec (Lit (expt 2 64))))] + +The issue here is that a @racket[Mov] instruction can only take an +argument that can be represented in 64 bits. + + What can we do? This is the basic problem of a program not satisfying its specification. We have two choices: @@ -366,17 +454,16 @@ satisfying its specification. We have two choices: @item{change the program (i.e. the compiler)} ] -We could change the spec to make it match the behaviour of -the compiler. This would involve writing out definitions -that match the ``wrapping'' behavior we see in the compiled -code. Of course if the specification is meant to capture -what Racket actually does, taking this route would be a -mistake. Even independent of Racket, this seems like a -questionable design choice.
Wouldn't it be nice to reason -about programs using the usual laws of mathematics (or at -least something as close as possible to what we think of as -math)? For example, wouldn't you like know that -@racket[(< i (add1 i))] for all integers @racket[i]? +We could change the spec to make it match the behaviour of the +compiler. This would involve writing out definitions that match the +``wrapping'' behavior we see in the compiled code. Of course if the +specification is meant to capture what Racket actually does, taking +this route would be a mistake. Even independent of Racket, this seems +like a questionable design choice. Wouldn't it be nice to reason about +programs using the usual laws of mathematics (or at least something as +close as possible to what we think of as math)? For example, wouldn't +you like to know that @racket[(< i (add1 i))] for all integers +@racket[i]? Unforunately, the other choice seems to paint us in to a corner. How can we ever hope to represent all possible @@ -402,15 +489,13 @@ these pieces in the two compilers we've written: @itemlist[@item{we use @racket[parse] to convert an s-expression into an AST}]} -@item{@bold{Checked} to make sure code is well-formed (and well-typed)} - -@item{@bold{Simplified} into some convenient @bold{Intermediate Representation} +@item{@bold{Checked} to make sure code is well-formed -@itemlist[@item{we don't do any; the AST is the IR}]} +@itemlist[@item{we don't currently do any, but more checking will come with more sophisticated languages.}]} @item{@bold{Optimized} into (equivalent) but faster program -@itemlist[@item{we don't do any}]} +@itemlist[@item{we don't do any yet}]} @item{@bold{Generated} into assembly x86 @@ -428,7 +513,8 @@ Our recipe for building compiler involves: @itemlist[#:style 'ordered @item{Build intuition with @bold{examples},} @item{Model problem with @bold{data types},} -@item{Implement compiler via @bold{type-transforming-functions},} +@item{Implement compiler via
@bold{type-guided-functions},} + @item{Validate compiler via @bold{tests}.} ] diff --git a/www/notes/con.scrbl b/www/notes/con.scrbl index c2a1ebac..5365b33c 100644 --- a/www/notes/con.scrbl +++ b/www/notes/con.scrbl @@ -18,7 +18,7 @@ @(ev '(require rackunit a86)) @(for-each (λ (f) (ev `(require (file ,(path->string (build-path langs "con" f)))))) - '("interp.rkt" "compile.rkt" "parse.rkt" "ast.rkt" "random.rkt")) + '("main.rkt" "random.rkt" "correct.rkt")) @title[#:tag "Con"]{Con: branching with conditionals} @@ -44,10 +44,6 @@ This leads to the following grammar for concrete Con: And abstract grammar: -@centered{@render-language[C]} - -Which can be modeled with the following definitions: - @codeblock-include["con/ast.rkt"] @;{ @@ -59,6 +55,11 @@ The parser is similar to what we've seen before: @codeblock-include["con/parse.rkt"] +@ex[ +(parse '(if (zero? 42) 1 2)) +(parse '(if (zero? (sub1 1)) (add1 2) (sub1 7))) +(parse '(if (zero? 0) (if (zero? 1) 2 3) 4))] + @section{Meaning of Con programs} @@ -85,7 +86,7 @@ Let's consider some examples (using concrete notation): ] -The semantics is inductively defined as before. There are @emph{two} +@;{The semantics is inductively defined as before. There are @emph{two} new rules added for handling if-expressions: one for when the test expression means @racket[0] and one for when it doesn't. @@ -160,19 +161,14 @@ according to @render-term[C 𝑪𝒓]: @(show-judgment 𝑪 0 1) } +} -The interpreter has an added case for if-expressions, which -recursively evaluates the test expression and branches based on its -value. +The semantics is defined by extending the interpreter to add a case +for if-expressions, which recursively evaluates the test expression +and branches based on its value. @codeblock-include["con/interp.rkt"] -We've also made one trivial change, which is to move @racket[interp-prim1] to its -own module. 
This will be useful in the future when more primitive operations are -added, we won't have to clutter up the interpreter: - -@codeblock-include["con/interp-prim.rkt"] - We can confirm the interpreter computes the right result for the examples given earlier (using @racket[parse] to state the examples with concrete notation): @@ -184,9 +180,6 @@ with concrete notation): (interp (parse '(if (zero? (add1 0)) (add1 2) (if (zero? (sub1 1)) 1 0)))) ] -The argument for the correctness of the interpreter follows the same -structure as for @seclink["Blackmail"]{Blackmail}, but with an added case for -if-expressions. @section{An Example of Con compilation} @@ -247,14 +240,14 @@ this, we arrive at the following code for the compiler: @racketblock[ (let ((l0 (gensym 'if)) (l1 (gensym 'if))) - (append (compile-e e1) - (list (Cmp 'rax 0) - (Je l0)) - (compile-e e3) - (list (Jmp l1) - (Label l0)) - (compile-e e2) - (list (Label l1)))) + (seq (compile-e e1) + (Cmp 'rax 0) + (Je l0) + (compile-e e3) + (Jmp l1) + (Label l0) + (compile-e e2) + (Label l1))) ] @@ -266,11 +259,6 @@ The complete compiler code is: @codeblock-include["con/compile.rkt"] -Mirroring the change we made to the interpreter, we separate out a -module for compiling primitives: - -@codeblock-include["con/compile-ops.rkt"] - Let's take a look at a few examples: @ex[ (define (show s) @@ -284,13 +272,10 @@ Let's take a look at a few examples: And confirm they are running as expected: @ex[ -(define (tell s) - (asm-interp (compile (parse s)))) - -(tell '(if (zero? 8) 2 3)) -(tell '(if (zero? 0) 1 2)) -(tell '(if (zero? 0) (if (zero? 0) 8 9) 2)) -(tell '(if (zero? (if (zero? 2) 1 0)) 4 5)) +(exec (parse '(if (zero? 8) 2 3))) +(exec (parse '(if (zero? 0) 1 2))) +(exec (parse '(if (zero? 0) (if (zero? 0) 8 9) 2))) +(exec (parse '(if (zero? (if (zero? 
2) 1 0)) 4 5))) ] @@ -298,17 +283,12 @@ And confirm they are running as expected: The statement of correctness follows the same outline as before: -@bold{Compiler Correctness}: @emph{For all expressions @racket[e] and -integers @racket[i], if (@racket[e],@racket[i]) in @render-term[C 𝑪], -then @racket[(asm-interp (compile e))] equals @racket[i].} +@bold{Compiler Correctness}: @emph{For all @racket[e] @math{∈} @tt{Expr}, +@racket[(interp e)] equals @racket[(exec e)].} Again, we formulate correctness as a property that can be tested: -@ex[ -(define (check-compiler e) - (check-equal? (asm-interp (compile e)) - (interp e) - e))] +@codeblock-include["con/correct.rkt"] Generating random Con programs is essentially the same as Blackmail programs, and are provided in a @link["con/random.rkt"]{random.rkt} @@ -323,3 +303,7 @@ module. (for ([i (in-range 10)]) (check-compiler (random-expr))) ] + +This compiler continues to have the issues identified in +@secref{broken}, but appears correct in its implementation of +conditional expressions. \ No newline at end of file diff --git a/www/notes/dodger.scrbl b/www/notes/dodger.scrbl index 83945c16..ae85114a 100644 --- a/www/notes/dodger.scrbl +++ b/www/notes/dodger.scrbl @@ -15,7 +15,7 @@ @(ev '(require rackunit a86)) @(for-each (λ (f) (ev `(require (file ,(path->string (build-path langs "dodger" f)))))) - '("interp.rkt" "compile.rkt" "ast.rkt" "parse.rkt" "types.rkt")) + '("main.rkt" "random.rkt" "correct.rkt")) @title[#:tag "Dodger"]{Dodger: addressing a lack of character} @@ -55,6 +55,12 @@ The s-expression parser is defined as follows: @codeblock-include["dodger/parse.rkt"] +@ex[ +(parse #\a) +(parse '(char? #\λ)) +(parse '(char->integer #\λ)) +(parse '(integer->char 97))] + @section{Characters in Racket} @@ -98,8 +104,6 @@ system, described below, takes care of printing them. @section{Meaning of Dodger programs} -The semantics are omitted for now (there's really nothing new that's interesting).
- The interpeter is much like that of Dupe, except we have a new base case: @codeblock-include["dodger/interp.rkt"] @@ -154,41 +158,40 @@ We can use the following encoding scheme: Notice that each kind of value is disjoint. -We can write an interpreter that operates at the level of bits just as we did for Dupe; -notice that it only ever constructs characters at the very end when converting from -bits to values. Let's first define our bit encodings: +We can write down functions for encoding into and decoding out of bits: @codeblock-include["dodger/types.rkt"] -And now the interpreter: - -@codeblock-include["dodger/interp-bits.rkt"] - - @section{A Compiler for Dodger} -Compilation is pretty easy, particularly since we took the -time to develop the bit-level interpreter. The compiler uses -the same bit-level representation of values and uses logical -operations to implement the same bit manipulating operations. -Most of the work happens in the compilation of primitives: +Compilation is pretty easy. The compiler uses the bit-level +representation of values described earlier and uses logical operations +to implement the bit manipulating operations. 
Most of the work +happens in the compilation of primitives: @codeblock-include["dodger/compile-ops.rkt"] -The top-level compiler for expressions now has a case for character -literals, which are compiled like other kinds of values: +In fact, @racket[compile] is identical to its Dupe predecessor, +since all the new work is done in @racket[value->bits] and +@racket[compile-op1]: @codeblock-include["dodger/compile.rkt"] We can take a look at a few examples: @ex[ -(define (show e) - (displayln (asm-string (compile-e (parse e))))) -(show '#\a) -(show '#\λ) -(show '(char->integer #\λ)) -(show '(integer->char 97))] +(compile-e (parse #\a)) +(compile-e (parse #\λ)) +(compile-e (parse '(char->integer #\λ))) +(compile-e (parse '(integer->char 97)))] + +We can run them: + +@ex[ +(exec (parse #\a)) +(exec (parse #\λ)) +(exec (parse '(char->integer #\λ))) +(exec (parse '(integer->char 97)))] @section{A Run-Time for Dodger} @@ -210,13 +213,4 @@ the case of printing characters: @filebox-include[fancy-c dodger "print.c"] -Will these pieces in place, we can try out some examples: - -@ex[ - (define (run e) - (bits->value (asm-interp (compile (parse e))))) - (run '#\a) - (run '(integer->char (add1 (char->integer #\a)))) - (run '(integer->char 955)) -] - +@;{FIXME: examples should be creating executable at the command-line, not exec.} \ No newline at end of file diff --git a/www/notes/dupe.scrbl b/www/notes/dupe.scrbl index 5c52a18c..20535ce0 100644 --- a/www/notes/dupe.scrbl +++ b/www/notes/dupe.scrbl @@ -18,7 +18,7 @@ @(ev '(require rackunit a86)) @(for-each (λ (f) (ev `(require (file ,(path->string (build-path langs "dupe" f)))))) - '("interp.rkt" "interp-prim.rkt" "compile.rkt" "ast.rkt" "parse.rkt" "random.rkt" "types.rkt")) + '("main.rkt" "random.rkt" "correct.rkt")) @title[#:tag "Dupe"]{Dupe: a duplicity of types} @@ -39,7 +39,7 @@ To start, we will consider two: integers and booleans. We'll call it @bold{Dupe}.
We will use the following syntax, which relaxes the syntax of Con to -make @racket['(zero? _e)] its own expression form and conditionals can +make @racket[(zero? _e)] its own expression form and conditionals can now have arbitrary expressions in test position: @racket[(if _e0 _e1 _e2)] instead of just @racket[(if (zero? _e0) _e1 _e2)]. We also add syntax for boolean literals. @@ -50,21 +50,18 @@ Together this leads to the following grammar for concrete Dupe. And abstract Dupe: -@centered{@render-language[D]} - - -One thing to take note of is the new nonterminal @math{v} -which ranges over @bold{values}, which are integers and -booleans. - -Abstract syntax is modelled with the following datatype definition: - @codeblock-include["dupe/ast.rkt"] The s-expression parser is defined as follows: @codeblock-include["dupe/parse.rkt"] +@ex[ +(parse '#t) +(parse '(if #t 1 2)) +(parse '(zero? 8)) +(parse '(if (zero? 8) 1 2))] + @section{Meaning of Dupe programs} To consider the meaning of Dupe programs, we must revisit the meaning @@ -117,7 +114,7 @@ Languages adopt several approaches: We are going to start by taking the last approach. Later we can reconsider the design, but for now this is a simple approach. - +@;{ The semantics is still a binary relation between expressions and their meaning, however the type of the relation is changed to reflect the values of Dupe, which may either be integers or booleans: @@ -179,66 +176,72 @@ meaning to an @racket[(Prim1 'add1 _e0)] expression and it's premise is that @math{(@racket[(Lit #f)], i) ∉ 𝑫} for any @math{i}. So there's no value @math{v} such that @math{(@racket[(Prim1 'add1 (Lit #f))], v) ∈ 𝑫}. This expression is @bold{undefined} according to the semantics. 
+} - -The interpreter follows the rules of the semantics closely and is -straightforward: +The interpreter follows a similar pattern to what we've done so far, +although notice that the result of interpretation is now a @tt{Value}: +either a boolean or an integer: @codeblock-include["dupe/interp.rkt"] -And the interpretation of primitives closely matches @math{𝑫-𝒑𝒓𝒊𝒎}: +The interpretation of primitives is extended to account for the new +@racket[zero?] primitive: @codeblock-include["dupe/interp-prim.rkt"] + We can confirm the interpreter computes the right result for the examples given earlier: @ex[ -(interp (Lit #t)) -(interp (Lit #f)) -(interp (If (Lit #f) (Lit 1) (Lit 2))) -(interp (If (Lit #t) (Lit 1) (Lit 2))) -(interp (If (Lit 0) (Lit 1) (Lit 2))) -(interp (If (Lit 7) (Lit 1) (Lit 2))) -(interp (If (Prim1 'zero? (Lit 7)) (Lit 1) (Lit 2))) -] +(interp (parse #t)) +(interp (parse #f)) +(interp (parse '(if #f 1 2))) +(interp (parse '(if #t 1 2))) +(interp (parse '(if 0 1 2))) +(interp (parse '(if 7 1 2))) +(interp (parse '(if (zero? 7) 1 2)))] + +@section{(Lack of) Meaning for some Dupe programs} -Correctness follows the same pattern as before, although it is worth -keeping in mind the ``hypothetical'' form of the statement: @emph{if} -the expression has some meaning, then the interpreter must produce it. -In cases where the semantics of the expression is undefined, the -interpreter can do whatever it pleases; there is no specification. +Viewed as a specification, what is this interpreter saying about programs that +do nonsensical things like @racket[(add1 #f)]? 
+First, let's revise the statement of compiler correctness to reflect +the fact that @racket[interp] can return different kinds of values: -@bold{Interpreter Correctness}: @emph{For all Dupe expressions -@racket[e] and values @racket[v], if (@racket[e],@racket[v]) in -@render-term[D 𝑫], then @racket[(interp e)] equals +@bold{Compiler Correctness}: @emph{For all @racket[e] @math{∈} +@tt{Expr} and @racket[v] @math{∈} @tt{Value}, if @racket[(interp e)] +equals @racket[v], then @racket[(exec e)] equals @racket[v].} -Consider what happens with @racket[interp] on undefined programs such -as @racket[(add1 #f)]: the interpretation of this expression is just -the application of the Racket @racket[add1] function to @racket[#f], -which results in the @racket[interp] program crashing and Racket -signalling an error: +Now, the thing to notice here is that this specification only +obligates the compiler to produce a result consistent with the +interpreter when the interpreter produces a @emph{value}. It says +nothing about what the compiler must produce if the interpreter fails +to produce a value, which is exactly what happens when the interpreter +is run on examples like @racket[(add1 #f)]: @ex[ -(eval:error (interp (Prim1 'add1 (Lit #f)))) -] - -This isn't a concern for correctness, because the interpreter is free -to crash (or do anything else) on undefined programs; it's not in -disagreement with the semantics, because there is no semantics. - -From a pragmatic point of view, this is a concern because it -complicates testing. If the interpreter is correct and every -expression has a meaning (as in all of our previous languages), it -follows that the interpreter can't crash on any @tt{Expr} input. That -makes testing easy: think of an expression, run the interpreter to -compute its meaning. But now the interpreter may break if the -expression is undefined. We can only safely run the interpreter on -expressions that have a meaning.
- -We'll return to this point in the design of later langauges. +(eval:error (interp (parse '(add1 #f))))] + +Since the interpreter does not return a value on such an input, the +meaning of such an expression is undefined and the compiler is +unconstrained and may do whatever it likes: crash at compile-time, +crash at run-time, return 7, format your hard drive; there are no +wrong answers. (These possibilities hopefully make clear why +undefined behavior opens up all kinds of bad outcomes and has been the +source of many costly computing failures and vulnerabilities, and why, +as a language designer, leaving the behavior of some programs +undefined may not be a great choice. As a compiler implementor, +however, it makes our life easier: we simply don't need to worry about +what happens on these kinds of programs.) + +This will, however, complicate testing the correctness of the compiler, +which is addressed in @secref["Correctness_and_testing"]. + +Let's now turn to the main technical challenge introduced by the Dupe +language. @section{Ex uno plures: Out of One, Many} @@ -253,30 +256,98 @@ as a 64-bit integer. But we put off worrying about this until later.) The problem now is how to represent integers @emph{and} booleans, which should be @bold{disjoint} sets of values. Representing these -things in the interpreter as Racket values was easy: we used booleans -and integers. Representing this things in x86 will be more -complicated. x86 doesn't have a notion of ``boolean'' per se and it -doesn't have a notion of ``disjoint'' datatypes. There is only one -data type and that is: bits. - -We chose 64-bit integers to represent Con integers, because that's the -kind of value that can be stored in a register or used as an argument -to a instruction. It's the kind of thing that can be returned to the -C run-time. It's (more or less) the only kind of thing we have to -work with.
So had we started with booleans instead of integers, we -still would have represented values as a sequence of bits -@emph{because that's all there is}. Now that we have booleans -@emph{and} integers, we will have to represent both as bits (64-bit -integers). The bits will have to encode both the value and the -@emph{type} of value. - -@margin-note{As discussed in the lecture video, there are many possible ways of -representing multiple types, this is just how we've decided to do it.} +things in the interpreter as Racket values was easy: we used Racket +booleans to represent Dupe booleans; we used Racket integers to +represent Dupe integers. Since Racket booleans and integers are +disjoint types, everything was easy and sensible. + +Representing these things in x86 will be more complicated. x86 doesn't +have a notion of ``boolean'' per se and it doesn't have a notion of +``disjoint'' datatypes. There is only one data type and that is: +bits. + +To make the problem concrete, consider the Dupe expression +@racket[5]; we know the compiler is going to emit a single instruction +that moves this value into the @racket[rax] register: + +@ex[ +(Mov 'rax 5)] + +But now consider @racket[#t]. The compiler needs to emit an +instruction that moves ``@racket[#t]'' into @racket[rax], but the +@racket[Mov] instruction doesn't take booleans: + +@ex[ +(eval:error (Mov 'rax #t))] + +We have to move some 64-bit integer into @racket[rax], but the +question is: which one? + +The immediate temptation is to just pick a couple of integers, one for +representing @racket[#t] and one for @racket[#f]. We could follow the +C tradition and say @racket[#f] will be @racket[0] and @racket[#t] +will be @racket[1]. So compiling @racket[#t] would emit: + +@ex[ +(Mov 'rax 1)] + +And compiling @racket[#f] would emit: + +@ex[ +(Mov 'rax 0)] + +Seems reasonable.
Well except that the specification of @racket[if] +in our interpreter requires that @racket[(if 0 1 2)] evaluates to +@racket[1] and @racket[(if #f 1 2)] evaluates to @racket[2]. But +notice that @racket[0] and @racket[#f] compile to exactly the same +thing under the above scheme. How could the code emitted for +@racket[if] possibly distinguish between whether the test expression +produced @racket[#f] or @racket[0] if they are represented by the same +bits? + +So the compiler is doomed to be incorrect on some expressions under +this scheme. Maybe we should revise our choice and say @racket[#t] +will be represented as @racket[0] and @racket[#f] should be +@racket[1], but now the compiler must be incorrect either on +@racket[(if #f 1 2)] or @racket[(if 1 1 2)]. We could choose +different integers for the booleans, but there's no escaping it: just +picking some bits for @racket[#t] and @racket[#f] without also +changing the representation of integers is bound to fail. + +Here's another perspective on the same problem. Our compiler emits +code that leaves the value of an expression in the @racket[rax] +register, which is how the value is returned to the run-time system +that then prints the result. Things were easy before having only +integers: the run-time just prints the integer. But with booleans, +the run-time will need to print either @tt{#t} or @tt{#f} if the +result is the true or false value. But if we pick integers to +represent @racket[#t], how can the run-time know whether it should +print the result as an integer or as @tt{#t}? It can't! + +The fundamental problem here is that according to our specification, +@racket[interp], these values need to be disjoint. No value can be +both an integer @emph{and} a boolean. Yet in x86 world, only one kind +of thing exists: bits, so how can we make values of different kinds? 
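The collision described here can be made concrete with a small C sketch. It is only an illustration, not part of the Dupe run-time: the names `naive_bool_bits`, `naive_int_bits`, `indistinguishable`, and `show_collision` are hypothetical, and the scheme modeled is the tempting one in which `#f` is 0, `#t` is 1, and integers are left unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical naive scheme (not the one Dupe adopts): booleans borrow
   the integers 0 and 1, and integers are left unchanged. */
int64_t naive_bool_bits(bool b) { return b ? 1 : 0; }
int64_t naive_int_bits(int64_t i) { return i; }

/* All the run-time ever sees is the 64 bits left in rax, so comparing
   bits is the best it can do. Returns true when two encoded results
   cannot be told apart. */
bool indistinguishable(int64_t a, int64_t b) { return a == b; }

/* Checks the two collisions the text describes. */
void show_collision(void) {
  /* #f and the integer 0 are the same bits. */
  assert(indistinguishable(naive_bool_bits(false), naive_int_bits(0)));
  /* #t and the integer 1 are the same bits. */
  assert(indistinguishable(naive_bool_bits(true), naive_int_bits(1)));
}
```

Swapping the two boolean patterns, or picking any other pair of integers, only moves the collision to a different pair of values, since every 64-bit pattern is already taken by some integer.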
+ +Ultimately, the only solution is to incorporate an explicit +representation of the @emph{type} of a value and use this information +to distinguish between values of different types. We could do this in +a number of ways: we could designate another register to hold (a +representation of) the type of the value in @racket[rax]. This would +mean you couldn't know what the value in @racket[rax] meant without +consulting this auxiliary register. In essence, this is just using +more than 64 bits to represent values. Alternatively, we could +instead encode this information within the 64 bits so that only a +single register is needed to completely determine a value. This does +come at a cost though: if some bits are needed to indicate the type, +there are fewer bits for the values! Here is the idea of how this could be done: We have two kinds of data: integers and booleans, so we could use one bit to indicate whether a value is a boolean or an integer. The remaining 63 bits can be used to represent the value itself, either true, false, or some integer. +There are other approaches to solving this problem, but this is a +common approach called @emph{type-tagging}. Let's use the least significant bit to indicate the type and let's use @binary[type-int] for integer and @@ -298,9 +369,29 @@ is represented by the number @racket[#,(value->bits #f)] One nice thing about our choice of encoding: @racket[0] is represented as @racket[0] (@binary[0 2]). + +To encode a value as bits: + +@itemlist[ + +@item{If the value is an integer, shift the value to the left one bit. (Mathematically, this has the effect of doubling the number.)} +@item{If the value is a boolean, + @itemlist[ + @item{if it's the boolean @racket[#t], encode as @racket[#,(value->bits #t)],} + @item{if it's the value @racket[#f], encode as @racket[#,(value->bits #f)].}]}] + +To decode bits as a value: + +@itemlist[ +@item{If the least significant bit is @racket[0], shift to the right one bit.
(Mathematically, this has the effect of halving the number.)} +@item{If the bits are @racket[#,(value->bits #t)], decode to @racket[#t].} +@item{If the bits are @racket[#,(value->bits #f)], decode to @racket[#f].} +@item{All other bits don't encode a value.}] + + If you wanted to determine if a 64-bit integer represented -an integer or a boolean, you simply need to inquire about -the value of the least significant bit. At a high-level, +an integer value or a boolean value, you simply need to inquire about +the value of the least significant bit. Mathematically, this just corresponds to asking if the number is even or odd. Odd numbers end in the bit (@binary[1]), so they reprepresent booleans. Even numbers represent integers. Here @@ -347,7 +438,11 @@ We can also write the inverse: ) The interpreter operates at the level of @tt{Value}s. The compiler -will have to work at the level of @tt{Bits}. Of course, we could, +will have to work at the level of @tt{Bits}. + + +@;{ +Of course, we could, as an intermediate step, define an interpreter that works on bits, which may help us think about how to implement the compiler. @@ -817,13 +912,12 @@ Notice the last two examples. What's going on? The @racket[interp.v2] function is also a correct interpreter for Dupe, and importantly, it sheds light on how to implement the compiler since it uses the same representation of values. +} @section{An Example of Dupe compilation} The most significant change from Con to Dupe for the compiler is the -change in representation, but having sorted those issues out at the -level of @racket[interp-bits], it should be pretty easy to write the -compiler. +change in representation of values. Let's consider some simple examples: @@ -831,30 +925,36 @@ Let's consider some simple examples: @item{@racket[42]: this should compile just like integer literals before, but needs to use the new representation, i.e. 
the compiler -should produce @racket[(Mov 'rax 84)], which is @racket[(* 42 2)].} +should produce @racket[(Mov 'rax #,(value->bits 42))], which is +@racket[42] shifted to the left @racket[#,int-shift]-bit.} @item{@racket[#f]: this should produce @racket[(Mov 'rax #,(value->bits #f))].} @item{@racket[#t]: this should produce @racket[(Mov 'rax #,(value->bits #t))].} @item{@racket[(add1 _e)]: this should produce the instructions for -@racket[_e] followed by an instruction to add @racket[#,(value->bits 1)], which is -just how @racket[interp-bits] interprets an @racket[add1].} +@racket[_e], which when executed would leave @emph{the encoding of the +value of @racket[_e]} in the @racket[rax] register. To these +instructions, the compiler needs to append instructions that will +leave the encoding of one more than the value of @racket[_e] in +@racket[rax]. In other words, it should add @racket[#,(value->bits +1)] to @racket[rax]!} @item{@racket[(sub1 _e)]: should work like @racket[(add1 _e)] but subtracting @racket[#,(value->bits 1)].} -@item{@racket[(zero? _e)]: this should produce the - instructions for @racket[_e] followed by instructions that - compare @racket['rax] to 0 and set @racket['rax] to - @racket[#t] (i.e. @binary[(value->bits #t) 2]) if true and - @racket[#f] (i.e. @binary[(value->bits #f) 2]) otherwise. +@item{@racket[(zero? _e)]: this should produce the instructions for + @racket[_e] followed by instructions that compare @racket[rax] to + the encoding of the value @racket[0], which is just the bits + @racket[0], and set @racket[rax] to @racket[#t] + (i.e. @binary[(value->bits #t) 2]) if true and @racket[#f] + (i.e. @binary[(value->bits #f) 2]) otherwise. This is a bit different from what we saw with Con, which combined conditional execution with testing for equality to @racket[0]. Here there is no need to @emph{jump} anywhere based on whether @racket[_e] produces @racket[0] or not. 
Instead we want to move either the -encoding of @racket[#t] or @racket[#f] into @racket['rax] depending on +encoding of @racket[#t] or @racket[#f] into @racket[rax] depending on what @racket[_e] produces. To accomplish that, we can use a new kind of instruction, the @bold{conditional move} instruction: @racket[Cmov]. @@ -864,7 +964,7 @@ of instruction, the @bold{conditional move} instruction: @racket[Cmov]. compiling each subexpression, generating some labels and the appropriate comparison and conditional jump. The only difference is we now want to compare the result of executing @racket[_e0] with -@racket[#f] (i.e. @binary[(value->bits #f) 2]) and jumping to the code for @racket[_e2] when +(the encoding of the value) @racket[#f] (i.e. @binary[(value->bits #f) 2]) and jump to the code for @racket[_e2] when they are equal.} ] @@ -880,7 +980,14 @@ they are equal.} @section{A Compiler for Dupe} -Based on the examples, we can write the compiler: +Based on the examples, we can write the compiler. Notice that the +compiler uses the @racket[value->bits] function we wrote earlier for +encoding values as bits. This helps make the code more readable and +easier to maintain should the encoding change in the future. But it's +important to note that this function is used only at compile-time. By +the time the assembly code executes (i.e. at run-time), the +@racket[value->bits] function (and indeed all Racket functions) no +longer exists. @codeblock-include["dupe/compile.rkt"] @@ -905,34 +1012,43 @@ but you'll notice the results are a bit surprising: The reason for this is @racket[asm-interp] doesn't do any interpretation of the bits it gets back; it is simply -producing the integer that lives in @racket['rax] when the +producing the integer that lives in @racket[rax] when the assembly code finishes.
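The encoding and decoding rules given earlier can also be sketched in C, which is roughly the job a run-time system has to do with the raw bits it finds in the result register. This is a minimal sketch under stated assumptions, not the course's `types.h` or `print.c`: every name below is hypothetical, and the bit patterns chosen for the booleans (binary 01 for `#t`, binary 11 for `#f`) are assumptions; the text only requires that booleans be odd and that integers be shifted left by one bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: one tag bit in the least significant position,
   0 for integers. The boolean patterns are illustrative choices. */
#define INT_SHIFT 1
#define INT_TAG_MASK 1
#define VAL_TRUE 1  /* binary 01 (assumption) */
#define VAL_FALSE 3 /* binary 11 (assumption) */

typedef enum { T_INT, T_BOOL, T_INVALID } val_type;

/* Encode an integer by doubling it, which leaves the tag bit 0.
   Multiplication is used rather than << so negative inputs are well
   defined; the input is assumed to fit in 63 bits. */
int64_t encode_int(int64_t i) { return i * (1 << INT_SHIFT); }

/* Encode a boolean as one of the two reserved odd patterns. */
int64_t encode_bool(bool b) { return b ? VAL_TRUE : VAL_FALSE; }

/* Classify bits: even means integer, the two reserved patterns mean
   boolean, and every other odd pattern encodes no value. */
val_type type_of(int64_t bits) {
  if ((bits & INT_TAG_MASK) == 0) return T_INT;
  if (bits == VAL_TRUE || bits == VAL_FALSE) return T_BOOL;
  return T_INVALID;
}

/* Decode an integer by halving; only meaningful when type_of is T_INT.
   The division is exact because integer encodings are even. */
int64_t decode_int(int64_t bits) { return bits / (1 << INT_SHIFT); }

/* Decode a boolean; only meaningful when type_of is T_BOOL. */
bool decode_bool(int64_t bits) { return bits == VAL_TRUE; }
```

Under these assumptions 42 is encoded as 84 and 0 as 0, as in the text, while an odd pattern such as 5 matches neither boolean and so does not decode to any value.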
This suggests adding a call to @racket[bits->value] can be added to interpret the bits as values: @ex[ -(define (interp-compile e) - (bits->value (asm-interp (compile e)))) - -(interp-compile (Lit #t)) -(interp-compile (Lit #f)) -(interp-compile (parse '(zero? 0))) -(interp-compile (parse '(zero? -7))) -(interp-compile (parse '(if #t 1 2))) -(interp-compile (parse '(if #f 1 2))) -(interp-compile (parse '(if (zero? 0) (if (zero? 0) 8 9) 2))) -(interp-compile (parse '(if (zero? (if (zero? 2) 1 0)) 4 5))) -] +(bits->value (asm-interp (compile (Lit #t))))] + +Which leads us to the following definition of @racket[exec]: +@codeblock-include["dupe/exec.rkt"] +@ex[ +(exec (parse #t)) +(exec (parse #f)) +(exec (parse '(zero? 0))) +(exec (parse '(zero? -7))) +(exec (parse '(if #t 1 2))) +(exec (parse '(if #f 1 2))) +(exec (parse '(if (zero? 0) (if (zero? 0) 8 9) 2))) +(exec (parse '(if (zero? (if (zero? 2) 1 0)) 4 5)))] The one last peice of the puzzle is updating the run-time system to incorporate the new representation. The run-time system is essentially playing the role of @racket[bits->value]: it determines what is being represented and prints it appropriately. -For the run-time system, we define the bit representations in a header -file corresponding to the definitions given in @tt{types.rkt}: +@section{Updated Run-time System for Dupe} + +Any time there's a change in the representation or set of values, +there's going to be a required change in the run-time system. From +Abscond through Con, there were no such changes, but now we have to +update our run-time system to reflect the changes made to values in +Dupe. + +We define the bit representations in a header file corresponding to +the definitions given in @tt{types.rkt}: @filebox-include[fancy-c dupe "types.h"] @@ -965,68 +1081,112 @@ type of the result and print accordingly: @section{Correctness and testing} -We can randomly generate Dupe programs.
The problem is many randomly -generated programs will have type errors in them: +We already established our definition of correctness: + +@bold{Compiler Correctness}: @emph{For all @racket[e] @math{∈} +@tt{Expr} and @racket[v] @math{∈} @tt{Value}, if @racket[(interp e)] +equals @racket[v], then @racket[(exec e)] equals +@racket[v].} + +As a starting point for testing, we can consider a revised version of +@racket[check-compiler] that also uses @racket[bits->value]: @ex[ -(eval:alts (require "random.rkt") (void)) -(random-expr) -(random-expr) -(random-expr) -(random-expr) -(random-expr) -(random-expr) -] +(define (check-compiler* e) + (check-equal? (interp e) + (exec e)))] + +This works just fine for meaningful expressions: -When interpreting programs with type errors, we get @emph{Racket} -errors, i.e. the Racket functions used in the implementation of the -interpreter will signal an error: @ex[ -(eval:error (interp (parse '(add1 #f)))) -(eval:error (interp (parse '(if (zero? #t) 7 8)))) -] +(check-compiler* (parse #t)) +(check-compiler* (parse '(add1 5))) +(check-compiler* (parse '(zero? 4))) +(check-compiler* (parse '(if (zero? 0) 1 2)))] + +But, as discussed earlier, the hypothetical form of the compiler +correctness statement is playing an important role: @emph{if} the +interpreter produces a value, the compiler must produce code that when +run produces the (representation of) that value. But if the +interpreter does not produce a value, all bets are off. + +From a testing perspective, this complicates how we use the +interpreter to test the compiler. If every expression has a meaning +according to @racket[interp] (as in all of our previous languages), it +follows that the interpreter cannot crash on any @tt{Expr} input. +That makes testing easy: think of an expression, run the interpreter +to compute its meaning and compare it to what running the compiled +code produces. But now the interpreter may break if the expression is +undefined.
We can only safely run the interpreter on expressions that +have a meaning. + +This means the above definition of @racket[check-compiler] won't +suffice, because this function crashes on inputs like @racket[(add1 +#f)], even though such an example doesn't actually demonstrate a +problem with the compiler: -On the other hand, the compiler may produce bits that are illegal +@ex[ +(check-compiler* (parse '(add1 #f)))] + +To overcome this issue, we can take advantage of Racket's exception +handling mechanism to refine the @racket[check-compiler] function: + +@codeblock-include["dupe/correct.rkt"] + +This version installs an exception handler around the call to +@racket[interp]; in case the interpreter raises an exception (such +as happens when interpreting @racket[(add1 #f)]), the +handler simply returns the exception itself. + +The function then guards the @racket[check-equal?] test so that the +test is run @emph{only} when the result was not an exception. (It's +important that the compiler is not run within the exception handler +since we don't want the compiler crashing to circumvent the test: if +the interpreter produces a value, the compiler is not allowed to +crash!) + +We can confirm that the compiler is still correct on meaningful expressions: + +@ex[ +(check-compiler (parse #t)) +(check-compiler (parse #f)) +(check-compiler (parse '(if #t 1 2))) +(check-compiler (parse '(if #f 1 2))) +(check-compiler (parse '(if 0 1 2))) +(check-compiler (parse '(if 7 1 2))) +(check-compiler (parse '(if (zero? 7) 1 2)))] + + +For meaningless expressions, the compiler may produce bits that are illegal or, even worse, simply do something by misinterpreting the meaning of the bits: @ex[ -(eval:error (interp-compile (parse '(add1 #f)))) -(interp-compile (parse '(if (zero? #t) 7 8))) +(eval:error (exec (parse '(add1 #f)))) +(exec (parse '(if (zero?
#t) 7 8))) ] -@;codeblock-include["dupe/correct.rkt"] +Yet these are not counter-examples to the compiler's correctness: -This complicates testing the correctness of the compiler. Consider -our usual appraoch: @ex[ -(define (check-correctness e) - (check-equal? (interp-compile e) - (interp e))) +(check-compiler (parse '(add1 #f))) +(check-compiler (parse '(if (zero? #t) 7 8)))] -(check-correctness (parse '(add1 7))) -;;(eval:error (check-correctness (parse '(add1 #f)))) -] - -This isn't a counter-example to correctness because @racket['(add1 -#f)] is not meaningful according to the semantics. Consequently the -interpreter and compiler are free to do anything on this input. -Since we know Racket will signal an error when the interpreter tries -to interpret a meaningless expression, we can write an alternate -@racket[check-correctness] function that first runs the interpreter -with an exception handler installed. Should an error occur, -the test is ignored, otherwise the value produced is compared -to that of the compiler: +With this setup, we can randomly generate Dupe programs and throw +them at @racket[check-compiler]. Many randomly generated programs will +have type errors in them, but with our revised version of +@racket[check-compiler], it won't matter (although it's worth thinking about +how well this is actually testing the compiler). @ex[ -(define (check-correctness e) - (with-handlers ([exn:fail? void]) - (let ((v (interp e))) - (check-equal? v (interp-compile e))))) - -(check-correctness (parse '(add1 7))) -(check-correctness (parse '(add1 #f))) +(eval:alts (require "random.rkt") (void)) +(random-expr) +(random-expr) +(random-expr) +(random-expr) +(random-expr) +(random-expr) +(for ([i (in-range 10)]) + (check-compiler (random-expr))) ] -Using this approach, we check the equivalence of the results only when -the interpreter runs without causing an error.
diff --git a/www/notes/evildoer.scrbl b/www/notes/evildoer.scrbl index 7f771bdb..3ae9feb8 100644 --- a/www/notes/evildoer.scrbl +++ b/www/notes/evildoer.scrbl @@ -4,6 +4,7 @@ @(require redex/pict racket/runtime-path scribble/examples + evildoer/types "../fancyverb.rkt" "utils.rkt" "ev.rkt" @@ -19,7 +20,7 @@ @(ev '(require rackunit a86)) @(for-each (λ (f) (ev `(require (file ,(path->string (build-path langs "evildoer" f)))))) - '("interp.rkt" "interp-io.rkt" "compile.rkt" "ast.rkt" "parse.rkt")) + '("main.rkt" "compile-ops.rkt" "correct.rkt")) @(ev `(current-directory ,(path->string (build-path langs "evildoer")))) @(void (ev '(with-output-to-string (thunk (system "make runtime.o"))))) @@ -154,6 +155,16 @@ The s-expression parser is defined as follows: @codeblock-include["evildoer/parse.rkt"] +@ex[ +(parse 'eof) +(parse '(void)) +(parse '(read-byte)) +(parse '(peek-byte)) +(parse '(write-byte 97)) +(parse '(eof-object? eof)) +(parse '(begin (write-byte 97) + (write-byte 98)))] + @section{Reading and writing bytes in Racket} @@ -297,16 +308,6 @@ can then use to assert the expected behavior: @section{Meaning of Evildoer programs} -Formulating the semantics of Evildoer is more complicated -than the languages we've developed so far. Let's put it off -for now and instead focus on the interpreter, which remains -basically as simple as before. The reason for this disparity -is that math doesn't have side-effects. The formal semantics -will need to account for effectful computations without -itself having access to them. Racket, on the other hand, can -model effectful computations directly as effectful Racket -programs. 
- Here's an interpreter for Evildoer: @codeblock-include["evildoer/interp.rkt"] @@ -328,8 +329,9 @@ read and write: (interp (parse '(write-byte (read-byte)))))) ] -We can also build a useful utility for interpreting programs -with strings representing stdin and stdout: +Using @racket[with-input-from-string] and +@racket[with-output-to-string], we can also build a useful utility for +interpreting programs with strings representing stdin and stdout: @codeblock-include["evildoer/interp-io.rkt"] @@ -347,6 +349,7 @@ computation into a pure one: (cons (void) "h")) ] +@;{ OK, so now, what about the formal mathematical model of Evildoer? We have to reconsider the domain of program meanings. No longer does an expression just mean a value; @@ -366,251 +369,201 @@ facility, but instead of capturing the effects with string ports, we will define the meaning of effects directly. (Semantics omitted for now.) +} -@section{A Run-Time for Evildoer} +@section{Encoding values in Evildoer} -With new values comes the need to add new bit encodings. So -we add new encodings for @racket[eof] and @racket[void]: +With new values, namely the void and eof values, comes the need to add +new bit encodings. So we add new encodings for @racket[eof] and +@racket[void], for which we simply pick two unused bit patterns: +@binary[(value->bits eof)] and @binary[(value->bits (void))], +respectively. -@filebox-include[fancy-c evildoer "types.h"] +@codeblock-include["evildoer/types.rkt"] -The main run-time file is extended slightly to take care of -printing the new kinds of values (eof and void). Note that a -void result causes nothing to be printed: +@section[#:tag "calling-c"]{Detour: Calling external functions} -@filebox-include[fancy-c evildoer "runtime.h"] -@filebox-include[fancy-c evildoer "main.c"] +Some aspects of the Evildoer compiler will be straightforward, e.g., +adding @racket[eof], @racket[(void)], @racket[eof-object?], etc. +There's conceptually nothing new going on there. 
But what about +@racket[read-byte], @racket[write-byte] and @racket[peek-byte]? These +will require a new set of tricks to implement. -But the real novelty of the Evildoer run-time is that there -will be new functions that implement @racket[read-byte], -@racket[peek-byte], and @racket[write-byte]; these will be C -functions called @racket[read_byte], @racket[peek_byte] and -@racket[write_byte]: +We have a couple of options for how to approach these primitives: -@filebox-include[fancy-c evildoer "io.c"] +@itemlist[#:style 'ordered -The main novely of the @emph{compiler} will be that emits code -to make calls to these C functions. +@item{generate assembly code for issuing operating system calls to +do I/O operations, or} -@section[#:tag "calling-c"]{Calling C functions from a86} +@item{add C code for I/O primitives in the run-time and generate +assembly code for calling them.} -If you haven't already, be sure to read up on how calls work -in @secref{a86}. +] -Once you brushed up on how calls work, you'll know you can -define labels that behave like functions and call them. +The first option will require looking up details for system calls on +the particular operating system in use, generating code to make those +calls, and adding logic to check for errors. For the second option, +we can simply write C code that calls standard functions like +@tt{getc}, @tt{putc}, etc. and let the C compiler do the heavy lifting +of generating robust assembly code for calling into the operating +system. The compiler would then only need to generate code to call +those functions defined in the run-time system. This is the simpler +approach and the one we adopt. -Let's start by assuming we have a simple stand-in for the -run-time system, which is this C program that invokes an -assembly program with a label called @tt{entry} and prints -the result: +Up to this point, we've seen how C code can call code written in +assembly as though it were a C function. 
To go the other direction, +we need to explore how to make calls to functions written in C from +assembly. Let's look at that now. -@filebox-include[fancy-c evildoer "simple.c"] +@margin-note{If you haven't already, be sure to read up on how calls +work in @secref{a86}.} -Now, here is a little program that has a function called @tt{meaning} -that returns @tt{42}. The main entry point calls @tt{meaning}, -adds 1 to the result, and returns: -@ex[ -(define p - (prog (Global 'entry) - (Label 'entry) - (Call 'meaning) - (Add 'rax 1) - (Ret) - (Label 'meaning) - (Mov 'rax 42) - (Ret))) -] -Let's save it to a file called @tt{p.s}: +Once you brushed up on how calls work, you'll know you can +define labels that behave like functions and call them. -@ex[ - (with-output-to-file "p.s" - (λ () - (asm-display p)) - #:exists 'truncate)] -We can assemble it, link it together with the printer, and run it: +Instead of @racket[read-byte] and friends, let's first start with +something simpler. Imagine we want a function to compute the greatest +common divisor of two numbers. We could of course write such a +function in assembly, but it's convenient to be able to write it +in a higher-level language like C: + +@filebox-include[fancy-c evildoer "gcd.c"] + +We can compile this into an object file: @(define format (if (eq? (system-type 'os) 'macosx) "macho64" "elf64")) -@shellbox["gcc -c simple.c -o simple.o" - (string-append "nasm -f " format " p.s -o p.o") - "gcc simple.o p.o -o simple" - "./simple"] - -In this case, the @tt{meaning} label is defined @emph{within} the same -assembly program as @tt{entry}, although that doesn't have to be the case. 
-We can separate out the definition of @tt{meaning} into its own file, -so long as we declare in this one that @tt{meaning} is an external label: +@shellbox["gcc -c gcd.c -o gcd.o"] -@ex[ -(define p - (prog (Extern 'meaning) - (Global 'entry) - (Label 'entry) - (Call 'meaning) - (Add 'rax 1) - (Ret))) -(define life - (prog (Global 'meaning) - (Label 'meaning) - (Mov 'rax 42) - (Ret))) -] - -By declaring an external label, we're saying this program -makes use of that label, but doesn't define it. The -definition will come from a later phase where the program is -linked against another that provides the definition. - -There is an important invariant that has to be maintained -once these programs are moved into separate object files -though. According to the System V ABI, the stack address -must be aligned to 16-bytes before the call instruction. Not -maintaining this alignment can result in a segmentation -fault. Since the @racket[p] program is the one doing the -calling, it is the one that has to worry about the issue. - -Now keep in mind that the @racket[p] program is itself -called by the C program that prints the result. So when the -call @emph{to} @racket[p] was made, the stack was aligned. -In executing the @racket[Push] instruction, a word, which is -8-byte, was pushed. This means at the point that control -transfers to @racket['entry], the stack is not aligned to a -16-byte boundary. To fix the problem, we can push another -element to the stack, making sure to pop it off before -returning. We opt to decrement (remember the stack grows -toward low memory) and increment to make clear we're not -saving anything; this is just about alignment. The revised -@racket[p] program is: - -@ex[ -(define p - (prog (Extern 'meaning) - (Global 'entry) - (Label 'entry) - (Sub 'rsp 8) - (Call 'meaning) - (Add 'rax 1) - (Add 'rsp 8) - (Ret)))] +Now, how can we call @tt{gcd} from assembly code?
Just as there is a +convention that a return value is communicated through @racket[rax], +there are conventions governing the communication of arguments. The +conventions are known as an @bold{Application Binary Interface} or +ABI. The set of conventions we're following is called the +@bold{System V} ABI, and it is used by Unix variants like Mac OS, Linux, +and BSD systems. (Windows follows a different ABI.) +The convention for arguments is that the first six integer +or pointer parameters are passed in the registers +@racket['rdi], @racket['rsi], @racket['rdx], @racket['rcx], +@racket['r8], @racket['r9]. Additional arguments and large +arguments such as @tt{struct}s are passed on the stack. -Now save each program in its nasm format: +So we will pass the two arguments of @tt{gcd} in registers +@racket[rdi] and @racket[rsi], respectively, then we use the +@racket[Call] instruction to call @tt{gcd}. Suppose we want to +compute @tt{gcd(36,60)}: @ex[ -(with-output-to-file "p.s" - (λ () - (asm-display p)) - #:exists 'truncate) -(with-output-to-file "life.s" - (λ () - (asm-display life)) - #:exists 'truncate)] - -And assemble: - -@shellbox[(string-append "nasm -f " format " p.s -o p.o") - (string-append "nasm -f " format " life.s -o life.o")] - - -Then we can link all the pieces together and run it: +(define p + (prog (Global 'entry) + (Label 'entry) + (Mov 'rdi 36) + (Mov 'rsi 60) + (Sub 'rsp 8) + (Extern 'gcd) + (Call 'gcd) + (Sal 'rax int-shift) + (Add 'rsp 8) + (Ret)))] + +A few things to notice in the above code: + +@itemlist[#:style 'ordered + +@item{The label @racket['gcd] is declared to be external with +@racket[Extern]; this means the label is used but not defined in this +program. We placed this declaration immediately before the @racket[Call] +instruction, but it can appear anywhere in the program.} + +@item{The stack pointer register is decremented by @racket[8] before +the @racket[Call] instruction and then incremented by @racket[8] after +the call returns.
This is to ensure the stack is aligned to 16 bytes +for the call, a requirement of the System V ABI.} + +@item{For consistency with our run-time system, we return the result +encoded as an integer value, which is accomplished by shifting the +result in @racket[rax] to the left by @racket[#,int-shift].} -@shellbox["gcc simple.o p.o life.o -o simple" - "./simple"] - -Now if we look at @tt{life.s}, this is an assembly program -that defines the @tt{meaning} label. We defined it by -writing assembly code, but we could've just as easily -defined it in any other language that can compile to an -object file. So let's write it in C: +] +We could attempt to run this program with @racket[asm-interp], but it +will complain about @tt{gcd} being an undefined label: +@ex[ +(eval:error (bits->value (asm-interp p)))] -@filebox-include[fancy-c evildoer "life.c"] +The problem is that @racket[asm-interp] doesn't know anything about +the @tt{gcd.o} file, which defines the @tt{gcd} symbol. However, +there is a mechanism for linking object files into the assembly +interpreter: -We can compile it to an object file: +@ex[ +(current-objs '("gcd.o")) +(bits->value (asm-interp p))] -@shellbox["gcc -c life.c -o life.o"] +We also could create an executable using the run-time system. +To do this, first, let's save the assembly code to a file: -This object file will have a single globally visible label -called @tt{meaning}, just like our previous implementation.
-@;{ -To confirm this, the standard @tt{nm} utility can be used to -list the defined symbols of an object file: +@ex[ + (with-output-to-file "p.s" + (λ () + (asm-display p)) + #:exists 'truncate)] -@shellbox["nm -j life.o"] -} -We can again link together the pieces and confirm that it -still produces the same results: +Now we can assemble it into an object file, link the objects together +to make an executable, and then run it: -@shellbox["gcc simple.o p.o life.o -o simple" - "./simple"] +@shellbox[(string-append "nasm -f " format " p.s -o p.o") + "gcc runtime.o gcd.o p.o -o p.run" + "./p.run"] +So now we've seen the essence of how to call functions from assembly +code, which opens up an implementation strategy for implementing +features: write C code as part of the run-time system and call it from +the compiled code. -At this point, we've written a little assembly program (@tt{ - p.s}) that calls a function named @tt{meaning}, that was -written in C. +@section{A Run-Time for Evildoer} -One thing that you can infer from this example is that the C -compiler generates code for @tt{meaning} that is like the -assembly code we wrote, namely it ``returns'' a value to the -caller by placing a value in @racket['rax]. +With new values comes the need to add new bit encodings. So +we add new encodings for @racket[eof] and @racket[void]: -The next natural question to ask is, how does an assembly -program provide arguments to the call of a C function? +@filebox-include[fancy-c evildoer "types.h"] -Just as there is a convention that a return value is -communicated through @racket['rax], there are conventions -governing the communication of arguments. The conventions -are known as an @bold{Application Binary Interface} or ABI. -The set of conventions we're following is called the @bold{ - System V} ABI, and it used by Unix variants like Mac OS, -Linux, and BSD systems. (Windows follows a different ABI.) 
+The interface for the run-time system is extended to include +file pointers for the input and output ports: -The convention for arguments is that the first six integer -or pointer parameters are passed in the registers -@racket['rdi], @racket['rsi], @racket['rdx], @racket['rcx], -@racket['r8], @racket['r9]. Additional arguments and large -arguments such as @tt{struct}s are passed on the stack. - -So now let's try calling a C function that takes a -parameter. Here we have a simple C function that doubles -it's input: +@filebox-include[fancy-c evildoer "runtime.h"] -@filebox-include[fancy-c evildoer "double.c"] +The main entry point for the run-time sets up the input and output +pointers to point to @tt{stdin} and @tt{stdout} and is updated +to handle the proper printing of a void result: -We can compile it to an object file: +@filebox-include[fancy-c evildoer "main.c"] -@shellbox["gcc -c double.c -o double.o"] +But the real novelty of the Evildoer run-time is that there +will be new functions that implement @racket[read-byte], +@racket[peek-byte], and @racket[write-byte]; these will be C +functions called @racket[read_byte], @racket[peek_byte] and +@racket[write_byte]: -Now, to call it, the assembly program should put the value -of its argument in @racket['rdi] before the call: +@filebox-include[fancy-c evildoer "io.c"] -@ex[ - (define q - (prog (Extern 'dbl) - (Global 'entry) - (Label 'entry) - (Mov 'rdi 21) - (Call 'dbl) - (Add 'rax 1) - (Ret))) -(with-output-to-file "q.s" - (λ () - (asm-display q)) - #:exists 'truncate)] +This functionality is implemented in terms of standard C library +functions @tt{getc}, @tt{ungetc}, @tt{putc} and the run-time system's +functions for encoding and decoding values such as +@tt{val_unwrap_int}, @tt{val_wrap_void}, etc. -We can assemble it into an object file: +As we'll see in the next section, the main novelty of the +@emph{compiler} will be that it emits code to make calls to these C +functions.
-@shellbox[(string-append "nasm -f " format " q.s -o q.o")] -And linking everything together and running shows it works -as expected: -@shellbox["gcc simple.o q.o double.o -o simple" - "./simple"] +@;{ Now we have all the tools needed to interact with libraries @@ -627,29 +580,6 @@ result of calling @tt{dbl}. Now we need to save it before writing the argument. All we need to do is add a push and pop around the call: -@ex[ - (define q - (prog (Extern 'dbl) - (Global 'entry) - (Label 'entry) - (Sub 'rsp 8) - (Mov 'rdi 1) - (Push 'rdi) - (Mov 'rdi 21) - (Call 'dbl) - (Pop 'rdi) - (Add 'rax 'rdi) - (Add 'rsp 8) - (Ret))) -(with-output-to-file "q.s" - (λ () - (asm-display q)) - #:exists 'truncate)] - -@shellbox[(string-append "nasm -f " format " q.s -o q.o") - "gcc simple.o q.o double.o -o simple" - "./simple"] - The wrinkle is actually a bit deeper than this too. Suppose we are using other registers, maybe some that are not used for parameters, but nonetheless are registers that the @@ -687,6 +617,10 @@ registers. OK, now let's use these new powers to write the compiler. +} + + + @section{A Compiler for Evildoer} @@ -715,67 +649,110 @@ The primitive operation compiler: @codeblock-include["evildoer/compile-ops.rkt"] + +Notice how expressions like @racket[(read-byte)] and @racket[(write-byte)] +compile to calls into the run-time system: + +@ex[ +(compile-op0 'read-byte) +(compile-op1 'write-byte)] + + + +@section{Testing and correctness} + + + We can continue to interactively try out examples with @racket[asm-interp], although there are two issues we need to deal with. -The first is that the @racket[asm-interp] utility doesn't -know anything about the Evildoer run-time. Hence we need to -tell @racket[asm-interp] to link it in when running -an example; otherwise labels like @tt{byte_write} will be -undefined. +The first is that the @racket[asm-interp] utility doesn't know +anything about the Evildoer run-time. 
Hence we need to tell +@racket[asm-interp] to link it in when running an example; otherwise +labels like @tt{write_byte} will be undefined. We saw how to do this +in @secref["calling-c"] using the @racket[current-objs] parameter to +link in object files to @racket[asm-interp]. This time, the object +file we want to link in is the Evildoer run-time. + +The other is that we need to have an @racket[asm-interp/io] analog of +@racket[interp/io], i.e. we need to be able to redirect input and +output so that we can run programs in a functional way. The +@secref["a86"] library provides this functionality by providing +@racket[asm-interp/io]. The way this function works is that @emph{if} +linked objects define an @tt{in} and @tt{out} symbol, it will set +these appropriately to read input from a given string and collect +output into a string. -The other is that we need to have an @racket[asm-interp/io] -counterpart that is analogous to @racket[interp/io], i.e. we -need to be able to redirect input and output so that we can -run programs in a functional way. +@ex[ +(current-objs '("runtime.o")) +(asm-interp/io (compile (parse '(write-byte (read-byte)))) "a")] -There is a parameter that @racket[asm-interp] uses called -@racket[current-objs] that can be used to add additional -object files to be linked against when running examples. +Notice, though, that @racket[asm-interp/io] gives back a pair +consisting of the @emph{bits} and the output string.
To match the +return type of @racket[interp/io] we need to convert the bits to a +value: + +@ex[ +(match (asm-interp/io (compile (parse '(write-byte (read-byte)))) "a") + [(cons b o) (cons (bits->value b) o)])] + +Using these pieces, we can write a function that matches the type signature +of @racket[interp/io]: -So for example, to make an example with the @tt{dbl} -function from before, we can do the following: +@codeblock-include["evildoer/exec-io.rkt"] @ex[ - (current-objs '("double.o")) - (asm-interp - (prog (Extern 'dbl) - (Global 'entry) - (Label 'entry) - (Mov 'rdi 21) - (Call 'dbl) - (Ret)))] - - -The other issue is bit uglier to deal with. We need to do -this redirection at the C-level. Our solution is write an -alternative version of @tt{byte.o} that has functions for -setting the input and out streams that are used in @tt{ - write_byte} etc. The implementation of -@racket[asm-interp/io] is expected to be linked against a -library that implements these functions and will use them to -set up temporary files and redirect input and output there. -It's a hack, but a useful one. +(exec/io (parse '(write-byte (read-byte))) "z")] -@;{ -You can see the alternative implementation of @tt{io.c} in -@link["code/evildoer/byte-shared.c"]{@tt{byte-shared.c}} if -interested. Once compiled, it can be used with -@racket[current-objs] in order to interactively run examples -involving IO: -} +Note that we still provide an @racket[exec] function that works for +programs that don't do I/O: + +@ex[ +(exec (parse '(eof-object? 
#f)))] + +But it will fail if executing a program that uses I/O: + +@ex[ +(eval:error (exec (parse '(write-byte 97))))] + +We can now state the correctness property we want of the compiler: + +@bold{Compiler Correctness}: @emph{For all @racket[e] @math{∈} +@tt{Expr}, @racket[i], @racket[o] @math{∈} @tt{String}, and @racket[v] +@math{∈} @tt{Value}, if @racket[(interp/io e i)] equals @racket[(cons +v o)], then @racket[(exec/io e i)] equals +@racket[(cons v o)].} + +Testing compiler correctness is updated as follows, notice it now +takes an additional parameter representing the state of the input +stream: + +@codeblock-include["evildoer/correct.rkt"] + +@ex[ +(check-compiler (parse '(void)) "") +(check-compiler (parse '(read-byte)) "a") +(check-compiler (parse '(write-byte 97)) "")] + +The @racket[random-expr] function generates random expressions and +@racket[random-good-expr] generates random expressions that are +guaranteed to be well-defined, as usual. Additionally, the +@racket[random-input] function produces a random string that can be +used as the input. 
+ +@ex[ +(require "random.rkt") +(random-expr) +(random-good-expr) +(random-input)] + +Together, these can be used to randomly test the correctness of the +compiler: @ex[ - (current-objs '("runtime.o")) - (asm-interp/io - (prog (Extern 'read_byte) - (Extern 'write_byte) - (Global 'entry) - (Label 'entry) - (Call 'read_byte) - (Mov 'rdi 'rax) - (Call 'write_byte) - (Mov 'rax 42) - (Ret)) - "a")] +(for ((i 100)) + (check-compiler (random-expr) (random-input))) +(for ((i 100)) + (check-compiler (random-good-expr) (random-input)))] + \ No newline at end of file diff --git a/www/notes/extort.scrbl b/www/notes/extort.scrbl index 9d83100f..0e4d0c03 100644 --- a/www/notes/extort.scrbl +++ b/www/notes/extort.scrbl @@ -7,6 +7,7 @@ "../fancyverb.rkt" "utils.rkt" "ev.rkt" + extort/types extort/semantics "../utils.rkt") @@ -16,10 +17,14 @@ @(ev '(require rackunit a86)) @(for-each (λ (f) (ev `(require (file ,(path->string (build-path langs "extort" f)))))) - '("interp.rkt" "ast.rkt" "parse.rkt" "compile.rkt" "types.rkt")) + '("main.rkt" "correct.rkt" "compile-ops.rkt")) @(ev `(current-directory ,(path->string (build-path langs "extort")))) @(void (ev '(with-output-to-string (thunk (system "make runtime.o"))))) +@;{Hack to get un-provided functions from compile-ops} +@(ev '(require (only-in rackunit require/expose))) +@(ev '(require/expose extort/compile-ops [assert-integer assert-char assert-byte assert-codepoint])) + @(define this-lang "Extort") @@ -67,7 +72,11 @@ does and signal an error. The meaning of @this-lang programs that have type errors will now be -defined as @racket['err]: +defined as @racket['err]. (You shouldn't make too much out of how we +choose to represent the ``this program caused an error'' answer; all +that really matters at this point is that it is disjoint from values. 
+Since the language doesn't yet include symbols, using a symbol is a +fine choice.): @itemlist[ @@ -79,6 +88,7 @@ defined as @racket['err]: ] +@;{ @(define ((rewrite s) lws) (define lhs (list-ref lws 2)) (define rhs (list-ref lws 3)) @@ -110,35 +120,247 @@ And there are four rules for propagating errors from subexpressions: Now what does the semantics say about @racket[(add1 #f)]? What about @racket[(if 7 #t -2)]? +} -The signature of the interpreter is extended to produce answers. Each -use of a Racket primitive is guarded by checking the type of the -arguments and an error is produced if the check fails. Errors are -also propagated when a subexpression produces an error: +In order to define the semantics, we first introduce the type of +results that may be given by the interpretation function: + +@#reader scribble/comment-reader +(ex +;; type Answer = Value | 'err +) + +Type mismatches can arise as the result of primitive operations being +applied to arguments for which the primitive is undefined, so we +revise @racket[interp-prim1] to check all necessary preconditions +before carrying out an operation, and to produce an error in case +those conditions are not met: -@codeblock-include["extort/interp.rkt"] @codeblock-include["extort/interp-prim.rkt"] +Within the interpreter, we update the type signature to reflect the +fact that interpreting an expression produces an answer, no longer +just a value. We must also take care to observe that evaluating +a subexpression may produce an error and as such it should prevent +further evaluation. To do this, the interpreter is written to check +for an error result of any subexpression it evaluates before +proceeding to evaluate another subexpression: + +@codeblock-include["extort/interp.rkt"] + We can confirm the interpreter computes the right result for the examples given earlier: @ex[ -(interp (Prim1 'add1 (Lit #f))) -(interp (Prim1 'zero? (Lit #t))) -(interp (If (Prim1 'zero?
(Lit #f)) (Lit 1) (Lit 2))) +(interp (parse '(add1 #f))) +(interp (parse '(zero? #t))) +(interp (parse '(if (zero? #f) 1 2))) ] -The statement of correctness stays the same, but now observe that -there is no way to crash the interpreter with any @tt{Expr} value. +This interpreter implicitly relies on the state of the input and +output port, but we can define a pure interpreter like before, +which we take as the specification of our language: + +@codeblock-include["extort/interp-io.rkt"] + +An important property of this semantics is that it provides a meaning +for all possible expressions; in other words, @racket[interp/io] is a +total function, and our language has no undefined behaviors. + +@bold{Total Semantics}: @emph{For all @racket[e] @math{∈} @tt{Expr}, +there exists @racket[a] @math{∈} @tt{Answer}, @racket[o] @math{∈} +@tt{String}, such that @racket[(interp/io e i)] equals @racket[(cons +a o)].} + +The statement of correctness, which is revised to use the answer type +in place of values, reads: + +@bold{Compiler Correctness}: @emph{For all @racket[e] @math{∈} +@tt{Expr}, @racket[i], @racket[o] @math{∈} @tt{String}, and @racket[a] +@math{∈} @tt{Answer}, if @racket[(interp/io e i)] equals @racket[(cons +a o)], then @racket[(exec/io e i)] equals +@racket[(cons a o)].} + +By virtue of the semantics being total, we have a complete +specification for the compiler. There are no longer programs for +which it is free to do arbitrary things; it is always obligated to +produce the same answer as the interpreter. -@section{A Compiler for @this-lang} +@section{Checking and signalling errors at run-time} Suppose we want to compile @racket[(add1 #f)], what needs to happen? Just as in the interpreter, we need to check the integerness of the argument's value before doing the addition operation. +This checking needs to be emitted as part of the compilation of the +@racket[add1] primitive. 
It no longer suffices to simply do an +@racket[Add] instruction on whatever is in the @racket[rax] register; +the compiled code should first check that the value in @racket[rax] +encodes an integer, and if it doesn't, it should somehow stop the +computation and signal that an error has occurred. + +The checking part is fairly easy. Our encoding of values, first +discussed in @secref["dupe"], devotes some number of bits within a +value to indicate the type. Checking whether something is an integer +involves inspecting just those parts of the value. + +Suppose we have an arbitrary value in @racket[rax]. If it's an +integer, the least significant bit has to be @racket[0]. We could +mask out just that bit by @racket[And]ing the value with the mask +@racket[#,mask-int] and compare the result to the type tag for integers +(i.e. @racket[#,type-int]). Let's write a little function to play +around with this idea: + +@#reader scribble/comment-reader +(ex +;; Produces 0 if v is an integer value +(define (is-int? v) ;; Value -> 0 | 1 + (asm-interp + (prog (Global 'entry) + (Label 'entry) + (Mov 'rax (value->bits v)) + (And 'rax mask-int) + (Ret)))) + +(is-int? 0) +(is-int? 1) +(is-int? 2) +(is-int? #t) +(is-int? #f)) + +Unfortunately, the use of @racket[And] here destroys the value in +@racket[rax] in order to determine if it is an integer. That's fine +for this predicate, but if we wanted to compute something with the +integer, we'd be toast as soon as we checked its type. + +Suppose we wanted to write a function that did @racket[add1] in case +the argument is an integer value, otherwise it produces false.
For +that, we would need to use a temporary register for the type tag +check: + +@#reader scribble/comment-reader +(ex +;; Produces (add1 v) if v is an integer value, #f otherwise +(define (plus1 v) ;; Value -> Integer | Boolean + (bits->value + (asm-interp + (prog (Global 'entry) + (Label 'entry) + (Mov 'rax (value->bits v)) + (Mov 'r9 'rax) + (And 'r9 mask-int) + (Cmp 'r9 type-int) + (Jne 'err) + (Add 'rax (value->bits 1)) + (Ret) + (Label 'err) + (Mov 'rax (value->bits #f)) + (Ret))))) + +(plus1 0) +(plus1 1) +(plus1 2) +(plus1 #t) +(plus1 #f)) + +This is pretty close to how we can implement primitives like +@racket[add1]. The only missing piece is that instead of returning a +specific @emph{value}, like @racket[#f], we want to stop computation +and signal that an error has occurred. To accomplish this we add a +function called @tt{raise_error} to our run-time system that, when +called, prints @tt{err} and exits with a non-zero number to signal an +error. The @racket[asm-interp] function intercepts these calls and returns the +@racket['err] symbol to match what the interpreter does: + +@ex[ +(current-objs '("runtime.o")) +(asm-interp + (prog (Global 'entry) + (Label 'entry) + (Extern 'raise_error) + (Call 'raise_error)))] + +Now we can make a function that either does the addition or signals an +error: + +@#reader scribble/comment-reader +(ex +;; Produces (add1 v) if v is an integer, 'err otherwise +(define (plus1 v) ;; Value -> Integer | 'err + (match + (asm-interp + (prog (Global 'entry) + (Label 'entry) + (Mov 'rax (value->bits v)) + (Mov 'r9 'rax) + (And 'r9 mask-int) + (Cmp 'r9 type-int) + (Jne 'err) + (Add 'rax (value->bits 1)) + (Ret) + (Label 'err) + (Extern 'raise_error) + (Call 'raise_error))) + ['err 'err] + [b (bits->value b)])) + +(plus1 0) +(plus1 1) +(plus1 2) +(plus1 #t) +(plus1 #f)) + +This can form the basis of how primitives with error checking can be +implemented.
Looking at @racket[interp-prim1], we can see that in +addition to checking whether an argument is an integer, we will +also need checks for characters, bytes, and Unicode codepoints; the +latter two are refinements of the integer check: they require their +arguments to be integers in a certain range. + +To support these checks, we develop a small library of type assertion +functions that, given a register name, produce code to check the type +of value held in the register, jumping to @racket[err] whenever the +value is not of the asserted type: + +@itemlist[ +@item{@racket[assert-integer] @tt{: Register -> Asm} produces code to check that the value in the given register is an integer,} +@item{@racket[assert-char] @tt{: Register -> Asm} produces code to check that the value in the given register is a character,} +@item{@racket[assert-byte] @tt{: Register -> Asm} produces code to check that the value in the given register is an integer and in the range [0,256),} +@item{@racket[assert-codepoint] @tt{: Register -> Asm} produces code to check that the value in the given register is an integer and either in the range [0,55295] or [57344, 1114111].} +] + +@codeblock-include["extort/assert.rkt"] + +The compiler for primitive operations is updated to include +appropriate type assertions: + +@codeblock-include["extort/compile-ops.rkt"] + +@ex[ +(compile-op1 'add1) +(compile-op1 'char->integer) +(compile-op1 'write-byte)] + + +The top-level compiler largely stays the same, but it now declares an +external label @racket['raise_error] that will be defined by the +run-time system and defines a label called @racket['err] that calls +@tt{raise_error}: + +@codeblock-include["extort/compile.rkt"] + +@ex[ +(compile-e (parse '(add1 #f))) +(compile-e (parse '(char->integer #\a))) +(compile-e (parse '(write-byte 97)))] + + + +@section{Run-time for @this-lang} + + We extend the run-time system with a C function called @tt{raise_error} that prints "err" and exits with a non-zero
status to indicate something has gone wrong. @margin-note{The runtime system is @@ -149,60 +371,37 @@ but it can be ignored.} @filebox-include[fancy-c extort "main.c"] -Most of the work of error checking happens in the code emitted for -primitive operations. Whenever an error is detected, control jumps to -a label called @racket['err] that immediately calls @tt{raise_error}: -@codeblock-include["extort/compile-ops.rkt"] +Linking in the run-time allows us to define the @racket[exec] and +@racket[exec/io] functions: -All that's left for the top-level compile function to declare an -external label @racket['raise_error] that will be defined by the -run-time system and to emit a label called @racket['err] that calls -@tt{raise_error}, otherwise this part of the compiler doesn't change: +@codeblock-include["extort/exec.rkt"] +@codeblock-include["extort/exec-io.rkt"] -@codeblock-include["extort/compile.rkt"] +We can run examples: -Here's the code we generate for @racket['(add1 #f)]: @ex[ -(define (show e) - (displayln (asm-string (compile-e (parse e))))) +(exec (parse '(add1 8))) +(exec (parse '(add1 #f)))] + -(show '(add1 #f)) -] -@(void (ev '(current-objs '("runtime.o")))) +@section{Correctness, revisited} + +This allows us to reformulate the compiler correctness check +in its earlier, simpler form that just blindly runs the interpreter +and compiler and checks that the results are the same, thanks to the +totality of the semantics: + +@codeblock-include["extort/correct.rkt"] -Here are some examples running the compiler: @ex[ -(define (tell e) - (match (asm-interp (compile (parse e))) - ['err 'err] - [b (bits->value b)])) -(tell #t) -(tell #f) -(tell '(zero? 0)) -(tell '(zero? -7)) -(tell '(if #t 1 2)) -(tell '(if #f 1 2)) -(tell '(if (zero? 0) (if (zero? 0) 8 9) 2)) -(tell '(if (zero? (if (zero? 2) 1 0)) 4 5)) -(tell '(add1 #t)) -(tell '(sub1 (add1 #f))) -(tell '(if (zero?
#t) 1 2)) -] +(check-compiler (parse '(add1 8)) "") +(check-compiler (parse '(add1 #f)) "")] -Since the interpreter and compiler have well defined specifications -for what should happen when type errors occur, we can test in the -usual way again: +And again, we can randomly test the compiler by generating programs and inputs: @ex[ -(define (check-correctness e) - (check-equal? (match (asm-interp (compile e)) - ['err 'err] - [b (bits->value b)]) - (interp e) - e)) - -(check-correctness (Prim1 'add1 (Lit 7))) -(check-correctness (Prim1 'add1 (Lit #f))) -] +(require "random.rkt") +(for ((i 100)) + (check-compiler (random-expr) (random-input)))] diff --git a/www/notes/fraud.scrbl b/www/notes/fraud.scrbl index 12cbc880..2ee0f6f0 100644 --- a/www/notes/fraud.scrbl +++ b/www/notes/fraud.scrbl @@ -16,7 +16,7 @@ @(ev `(current-directory ,(path->string (build-path langs "fraud")))) @(void (ev '(with-output-to-string (thunk (system "make runtime.o"))))) @(for-each (λ (f) (ev `(require (file ,f)))) - '("interp.rkt" "compile.rkt" "ast.rkt" "parse.rkt" "types.rkt" "translate.rkt")) + '("main.rkt" "translate.rkt")) @(define this-lang "Fraud") @@ -133,6 +133,83 @@ We can model it as a datatype as usual: } + +@section{Syntax matters} + +With the introduction of variables comes the issue of expressions that +have @bold{free} and @bold{bound variables}. A bound variable is a +variable that occurs within the scope of some binder for that +variable. A free variable is one that occurs within an expression +that does not bind that variable. So for example, in @racket[(let ((x +5)) (add1 x))], the occurrence of @racket[x] is bound because it +occurs within the body of a @racket[let] expression that binds +@racket[x]. Although, if the expression were just @racket[(add1 x)], +then the occurrence of @racket[x] is free (or unbound). Note that an +expression might include both free and bound occurrences of a +variable, e.g. 
in @racket[(+ (let ((x 5)) x) x)] there are two +occurrences of @racket[x]: one is bound and one is free. + +An important set of expressions are those that contain no free +variables. + +@#reader scribble/comment-reader +(racketblock +;; type ClosedExpr = { e ∈ Expr | e contains no free variables } +) + +This set is important because only closed expressions should be +interpreted (or compiled). We will consider free variables to be a +syntax error. To this end, we provide two versions of the parser. +The @racket[parse-closed] parser raises an error when it encounters an +unbound variable and therefore guarantees to always produce an element +of @tt{ClosedExpr}. The @racket[parse] parser, on the other hand, +produces elements of @tt{Expr} and parses unbound variables as +variables: + +@ex[ +(parse 'x) +(parse '(add1 x)) +(parse '(let ((x 5)) (add1 x))) +(eval:error (parse-closed 'x)) +(eval:error (parse-closed '(add1 x))) +(parse-closed '(let ((x 5)) (add1 x)))] + +Another issue that comes up now is the potential to bind variables +that conflict with other keywords used in the language such as +@racket[add1], @racket[sub1], @racket[if], etc. Racket, following +Scheme, adopts a flexible approach that allows variable bindings to +shadow @emph{any} keyword in the language. We can do the same. + +This means that parsing an expression depends on the binding structure +parsed so far. For example, @racket[add1] might be a variable +occurrence if it appears in a context that binds the variable +@racket[add1]. The parser has been updated to take this into account. +Consider the following examples: + +@ex[ +(parse '(let ((add1 1)) (sub1 add1))) +(eval:error (parse '(let ((add1 1)) (add1 add1)))) +(parse '(let ((let 1)) let)) +(parse 'let)] + + +The heart of this revised parsing strategy is the function +@racket[parse/acc] which takes an s-expression to be parsed into an +expression as well as a list of bound and free variables.
It computes +both the parsed expression and a list of variables that occur free in +it. + +When encountering a keyword like @racket[if], @racket[let], etc., we +check that it is not in the set of bound variables before parsing the +input as that particular form of expression. When we encounter a +variable occurrence, if it is not in the set of bound variables, we +add it to the set of free variables in the result. When binding a +variable, we add it to the set of bound variables before parsing the +relevant part of the input where the variable is bound: + +@codeblock-include["fraud/parse.rkt"] + + @section{Meaning of @this-lang programs} The meaning of @this-lang programs depends on the form of the expression and @@ -226,77 +303,22 @@ expressions, we will need to keep track of some number of pairs of variables and their meaning. We will refer to this contextual information as an @bold{environment}. -@margin-note{To keep things simple, we omit the treatment of - IO in the semantics, but it's easy enough to incorporate - back in if desired following the template of @secref{ - Evildoer}.} - The meaning of a variable is resolved by looking up its meaning in the environment. The meaning of a @racket[let] will depend on the meaning of its body with an extended environment that associates its variable binding to the value of the right hand side. 
-The heart of the semantics is an auxiliary relation, @render-term[F -𝑭-𝒆𝒏𝒗], which relates an expression and an environement to the integer -the expression evaluates to (in the given environment): - -@(define ((rewrite s) lws) - (define lhs (list-ref lws 2)) - (define rhs (list-ref lws 3)) - (list "" lhs (string-append " " (symbol->string s) " ") rhs "")) - -@(require (only-in racket add-between)) -@(define-syntax-rule (show-judgment name cases) - (with-unquote-rewriter - (lambda (lw) - (build-lw (lw-e lw) (lw-line lw) (lw-line-span lw) (lw-column lw) (lw-column-span lw))) - (with-compound-rewriters (['+ (rewrite '+)] - ['- (rewrite '–)] - ['< (rewrite '<)] - ['= (rewrite '=)]) - (apply centered - (add-between - (map (λ (c) (parameterize ([judgment-form-cases (list c)] - [judgment-form-show-rule-names #f]) - (render-judgment-form name))) - cases) - (hspace 4)))))) - -The rules for dealing with the new forms (variables and lets) are: -@(show-judgment 𝑭-𝒆𝒏𝒗 '("var" "let")) - -These rely on two functions: one for extending an environment with a -variable binding and one for lookup up a variable binding in an -environment: - -@centered{ -@render-metafunction[sem:ext #:contract? #t] - -@(with-atomic-rewriter - 'undefined - "⊥" - (render-metafunction sem:lookup #:contract? #t))} - -The remaining rules are just an adaptation of the existing rules from -Extort to thread the environment through. 
For example, here are just -a couple: -@(show-judgment 𝑭-𝒆𝒏𝒗 '("prim")) -@(show-judgment 𝑭-𝒆𝒏𝒗 '("if-true" "if-false")) -And rules for propagating errors through let: -@(show-judgment 𝑭-𝒆𝒏𝒗 '("let-err")) - - - -The operational semantics for @this-lang is then defined as a binary relation -@render-term[F 𝑭], which says that @math{(e,i)} in @render-term[F 𝑭], -only when @math{e} evaluates to @math{i} in the empty environment -according to @render-term[F 𝑭-𝒆𝒏𝒗]: - -@(show-judgment 𝑭 '("mt-env")) +The heart of the semantics is a function @racket[interp-env] that +provides the meaning of an expression under a given environment. The +top-level @racket[interp] function simply calls @racket[interp-env] +with an empty environment. +The interpreter relies on two functions: one for extending an environment with a +variable binding and one for looking up a variable binding in an +environment. With the semantics of @racket[let] and variables out of the @@ -307,37 +329,10 @@ of @racket[_e0] and @racket[_e1], when they mean integers, otherwise the meaning is an error. -The handling of primitives occurs in the following rule: - -@(show-judgment 𝑮-𝒆𝒏𝒗 '("prim")) - -It makes use of an auxiliary judgment for interpreting primitives: - -@centered[ - - (with-compound-rewriters (['+ (rewrite '+)] - ['- (rewrite '–)] - ['< (rewrite '<)] - ['= (rewrite '=)] - ['= (rewrite '=)] - ['!= (rewrite '≠)]) - (render-metafunction 𝑭-𝒑𝒓𝒊𝒎 #:contract? #t)) - - #;(with-unquote-rewriter - (lambda (lw) - (build-lw (lw-e lw) (lw-line lw) (lw-line-span lw) (lw-column lw) (lw-column-span lw))) - (render-metafunction 𝑮-𝒑𝒓𝒊𝒎 #:contract? #t))] - - - -The interpreter closely mirrors the semantics. The top-level -@racket[interp] function relies on a helper function -@racket[interp-env] that takes an expression and environment and -computes the result. It is defined by structural recursion on the -expression. Environments are represented as lists of associations -between variables and values. 
There are two helper functions for -@racket[ext] and @racket[lookup]: +It is defined by structural recursion on the expression. Environments +are represented as lists of associations between variables and values. +There are two helper functions for @racket[ext] and @racket[lookup]: @codeblock-include["fraud/interp.rkt"] @@ -352,22 +347,12 @@ examples given earlier: (interp (parse '(let ((x 7)) (let ((y 2)) x)))) (interp (parse '(let ((x 7)) (let ((x 2)) x)))) (interp (parse '(let ((x 7)) (let ((x (add1 x))) x)))) -] - -We can see that it works as expected: - -@ex[ (interp (parse '(+ 3 4))) (interp (parse '(+ 3 (+ 2 2)))) (interp (parse '(+ #f 8))) ] -@bold{Interpreter Correctness}: @emph{For all @this-lang expressions -@racket[e] and values @racket[v], if (@racket[e],@racket[v]) in -@render-term[F 𝑭], then @racket[(interp e)] equals -@racket[v].} - @section{Lexical Addressing} Just as we did with @seclink["Dupe"], the best way of understanding @@ -443,10 +428,8 @@ variables, but just lexical addresses: @#reader scribble/comment-reader (racketblock ;; type IExpr = +;; | (Lit Datum) ;; | (Eof) -;; | (Int Integer) -;; | (Bool Boolean) -;; | (Char Character) ;; | (Prim0 Op0) ;; | (Prim1 Op1 IExpr) ;; | (Prim2 Op2 IExpr IExpr) @@ -840,10 +823,7 @@ stack-alignment issues, but is otherwise the same as before: We can now take a look at the main compiler for expressions. Notice the compile-time environment which is weaved through out the @racket[compile-e] function and its subsidiaries, which is critical in -@racket[compile-variable] and extended in @racket[compile-let]. It is -passed to the @racket[compile-op0], @racket[compile-op1] and -@racket[compile-op2] functions for the purposes of stack alignment -before calls into the runtime system. +@racket[compile-variable] and extended in @racket[compile-let]. 
@filebox-include[codeblock fraud "compile.rkt"] @@ -872,18 +852,13 @@ Let's take a look at some examples of @racket[let]s and variables: And running the examples: @ex[ -(current-objs '("runtime.o")) -(define (tell e) - (match (asm-interp (compile (parse e))) - ['err 'err] - [b (bits->value b)])) -(tell '(let ((x 7)) x)) -(tell '(let ((x 7)) 2)) -(tell '(let ((x 7)) (add1 x))) -(tell '(let ((x (add1 7))) x)) -(tell '(let ((x 7)) (let ((y 2)) x))) -(tell '(let ((x 7)) (let ((x 2)) x))) -(tell '(let ((x 7)) (let ((x (add1 x))) x))) +(exec (parse '(let ((x 7)) x))) +(exec (parse '(let ((x 7)) 2))) +(exec (parse '(let ((x 7)) (add1 x)))) +(exec (parse '(let ((x (add1 7))) x))) +(exec (parse '(let ((x 7)) (let ((y 2)) x)))) +(exec (parse '(let ((x 7)) (let ((x 2)) x)))) +(exec (parse '(let ((x 7)) (let ((x (add1 x))) x)))) ] Here are some examples of binary operations: @@ -896,9 +871,9 @@ Here are some examples of binary operations: And running the examples: @ex[ -(tell '(+ 1 2)) -(tell '(+ (+ 3 4) (+ 1 2))) -(tell '(let ((y 3)) (let ((x 2)) (+ x y)))) +(exec (parse '(+ 1 2))) +(exec (parse '(+ (+ 3 4) (+ 1 2)))) +(exec (parse '(let ((y 3)) (let ((x 2)) (+ x y))))) ] Finally, we can see the stack alignment issues in action: @@ -909,3 +884,21 @@ Finally, we can see the stack alignment issues in action: (show '(add1 #f) '()) (show '(add1 #f) '(x)) ] + +@section{Correctness} + +For the statement of compiler correctness, we must now restrict the +domain of expressions to be just @bold{closed expressions}, i.e. those +that have no unbound variables. 
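+
+To make the notion of a closed expression concrete, here is a minimal
+sketch of a predicate that checks it. The struct definitions and the
+names @racket[closed?] and @racket[closed-under?] are assumptions made
+for illustration; they are not taken from @tt{fraud/ast.rkt} or
+@tt{fraud/correct.rkt}, and a real check could instead reuse the list
+of free variables computed by the parser:
+
+@#reader scribble/comment-reader
+(racketblock
+;; Hypothetical stand-ins for the Fraud AST (field names are assumed);
+;; written for a module in #lang racket, which provides match
+(struct Lit (d) #:prefab)
+(struct Eof () #:prefab)
+(struct Prim0 (p) #:prefab)
+(struct Prim1 (p e) #:prefab)
+(struct Prim2 (p e1 e2) #:prefab)
+(struct If (e1 e2 e3) #:prefab)
+(struct Begin (e1 e2) #:prefab)
+(struct Let (x e1 e2) #:prefab)
+(struct Var (x) #:prefab)
+
+;; Expr (Listof Symbol) -> Boolean
+;; Is every variable occurrence in e bound, given bound variables bvs?
+(define (closed-under? e bvs)
+  (match e
+    [(Var x) (and (memq x bvs) #t)]
+    [(Prim1 _ e1) (closed-under? e1 bvs)]
+    [(Prim2 _ e1 e2)
+     (and (closed-under? e1 bvs) (closed-under? e2 bvs))]
+    [(If e1 e2 e3)
+     (and (closed-under? e1 bvs)
+          (closed-under? e2 bvs)
+          (closed-under? e3 bvs))]
+    [(Begin e1 e2)
+     (and (closed-under? e1 bvs) (closed-under? e2 bvs))]
+    ;; the right-hand side is checked WITHOUT x in scope;
+    ;; only the body sees the new binding
+    [(Let x e1 e2)
+     (and (closed-under? e1 bvs)
+          (closed-under? e2 (cons x bvs)))]
+    ;; Lit, Eof, and Prim0 contain no variables
+    [_ #t]))
+
+;; Expr -> Boolean
+;; Does e have no unbound variables?
+(define (closed? e)
+  (closed-under? e '()))
+)
+
+Under these assumptions, @racket[(closed? (Let 'x (Lit 7) (Var 'x)))]
+holds, while @racket[(closed? (Let 'x (Var 'x) (Lit 7)))] does not,
+since the right-hand side of a @racket[let] is outside the scope of
+the variable it binds.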
+ +@bold{Compiler Correctness}: @emph{For all @racket[e] @math{∈} +@tt{ClosedExpr}, @racket[i], @racket[o] @math{∈} @tt{String}, and @racket[v] +@math{∈} @tt{Value}, if @racket[(interp/io e i)] equals @racket[(cons +v o)], then @racket[(exec/io e i)] equals +@racket[(cons v o)].} + +The check for correctness is the same as before, although the check should only be applied +to elements of @tt{ClosedExpr}: + +@filebox-include[codeblock fraud "correct.rkt"] + diff --git a/www/notes/hustle.scrbl b/www/notes/hustle.scrbl index 6630826b..0ad22987 100644 --- a/www/notes/hustle.scrbl +++ b/www/notes/hustle.scrbl @@ -17,7 +17,7 @@ @(ev `(current-directory ,(path->string (build-path langs "hustle")))) @(void (ev '(with-output-to-string (thunk (system "make runtime.o"))))) @(for-each (λ (f) (ev `(require (file ,f)))) - '("interp.rkt" "compile.rkt" "compile-ops.rkt" "ast.rkt" "parse.rkt" "types.rkt")) + '("main.rkt" "heap.rkt" "unload.rkt" "interp-prims-heap.rkt")) @(define this-lang "Hustle") @@ -100,66 +100,57 @@ just need another distinguished value to designate it. Using @racket[cons] and @racket['()] in a structured way we can form @emph{proper list}, among other useful data structures. -We use the following grammar for @|this-lang|: - -@centered[(render-language H)] - -We can model this as an AST data type: +We use the following AST data type for @|this-lang|: @filebox-include-fake[codeblock "hustle/ast.rkt"]{ #lang racket -;; type Expr = ... -;; | (Empty) -;; type Op1 = ... -;; | 'box | 'car | 'cdr | 'unbox | 'box? | 'cons? -;; type Op2 = ... -;; | 'cons +;; type Expr = ... | (Lit Datum) +;; type Datum = ... | '() +;; type Op1 = ... | 'box | 'car | 'cdr | 'unbox | 'box? | 'cons? +;; type Op2 = ... | 'cons } @section{Meaning of @this-lang programs, implicitly} -The meaning of @this-lang programs is just a slight update to the -prior language, namely we add a few new primitives. 
- -The update to the semantics is just an extension of the semantics of -primitives: - -@(judgment-form-cases #f) +The interpreter has an update to the @racket[interp-prim] +module: -@;centered[(render-judgment-form 𝑯-𝒆𝒏𝒗)] +@codeblock-include["hustle/interp-prim.rkt"] -@(define ((rewrite s) lws) - (define lhs (list-ref lws 2)) - (define rhs (list-ref lws 3)) - (list "" lhs (string-append " " (symbol->string s) " ") rhs "")) +The interpreter doesn't really shed light on how constructing +inductive data works because it simply uses the mechanism of the +defining language to construct it. Inductively defined data is easy +to model in this interpreter because we can rely on the mechanisms +provided for constructing inductively defined data at the meta-level +of Racket. -@centered[ - (with-compound-rewriters (['+ (rewrite '+)] - ['- (rewrite '–)] - ['= (rewrite '=)] - ['!= (rewrite '≠)]) - (render-metafunction 𝑯-𝒑𝒓𝒊𝒎 #:contract? #t)) -] - -The interpreter similarly has an update to the @racket[interp-prim] -module: +The real trickiness comes when we want to model such data in an +impoverished setting that doesn't have such things, which of course is +the case in assembly. -@codeblock-include["hustle/interp-prim.rkt"] +The problem is that a value such as @racket[(box _v)] has a value +inside it. Pairs are even worse: @racket[(cons _v0 _v1)] has +@emph{two} values inside it. If each value is represented with 64 +bits, it would seem a pair takes @emph{at a minimum} 128-bits to +represent (plus we need some bits to indicate this value is a pair). +What's worse, those @racket[_v0] and @racket[_v1] may themselves be +pairs or boxes. The great power of inductive data is that an +arbitrarily large piece of data can be constructed. But it would seem +impossible to represent each piece of data with a fixed set of bits. 
-Inductively defined data is easy to model in the semantics and -interpreter because we can rely on inductively defined data at the -meta-level in math or Racket, respectively. +The solution is to @bold{allocate} such data in memory, which can in +principle be arbitrarily large, and use a @bold{pointer} to refer to +the place in memory that contains the data. -In some sense, the semantics and interpreter don't shed light on -how constructing inductive data works because they simply use -the mechanism of the defining language to construct inductive data. -Let's try to address that. +Before tackling the compiler, let's look at an alternative version of +the interpreter that makes explicit a representation of memory and is +able to interpret programs that construct and manipulate inductive +data without itself relying on those mechanisms. @section{Meaning of @this-lang programs, explicitly} -Let's develop an alternative semantics and interpreter that -describes constructing inductive data without itself -constructing inductive data. +Let's develop an alternative interpreter that describes constructing +inductive data without itself constructing inductive data. The key here is to describe explicitly the mechanisms of memory allocation and dereference. Abstractly, memory can be @@ -168,120 +159,188 @@ values stored in those addresses. As programs run, there is a current state of the memory, which can be used to look up values (i.e. dereference memory) or to extend by making a new association between an available address and a value -(i.e. allocating memory). Memory will be assumed to be -limited to some finite association, but we'll always assume -programs are given a sufficiently large memory to run to -completion. +(i.e. allocating memory). -In the semantics, we can model memory as a finite function -from addresses to values. The datatype of addresses is left -abstract. All that matters is we can compare them for -equality. 
+The representation of values changes to represent inductive data +through pointers to memory: -We now change our definition of values to make it -non-recursive: +@#reader scribble/comment-reader +(racketblock +;; type Value* = +;; | Integer +;; | Boolean +;; | Character +;; | Eof +;; | Void +;; | '() +;; | (box-ptr Address) +;; | (cons-ptr Address) +(struct box-ptr (i)) +(struct cons-ptr (i)) -@centered{@render-language[Hm]} +;; type Address = Natural +) -We define an alternative semantic relation equivalent to 𝑯 called -𝑯′: +Here we have two kinds of pointer values, @emph{box pointers} and +@emph{cons pointers}. A box value is represented by an address (some +natural number) and a tag, the @racket[box-ptr] constructor, which +indicates that the memory at that address should be interpreted as the contents of a +box. A cons is represented by an address tagged with +@racket[cons-ptr], indicating that the memory contains a pair of +values. + +To model memory, we use a list of @tt{Value*} values. When memory is +allocated, new elements are placed at the front of the list. To model +memory locations, we use the distance from the element to the end of the +list; this way addresses don't change as memory is allocated. + +For example, suppose we have allocated memory to hold four values +@racket['(97 98 99 100)]. The address of 100 is 0; the address of 99 +is 1; etc. When a new value is allocated, say, @racket['(96 97 98 +99 100)], the address of 100 is still 0, and so on. The newly allocated +value 96 is at address 4. In this way, memory grows toward higher +addresses and the next address to allocate is given by the size of the +heap currently in use. -@centered[(render-judgment-form 𝑯′)] +@#reader scribble/comment-reader +(racketblock +;; type Heap = (Listof Value*) +) -Like 𝑯, it is defined in terms of another relation. 
Instead -of 𝑯-𝒆𝒏𝒗, we define a similar relation 𝑯-𝒎𝒆𝒎-𝒆𝒏𝒗 that has an -added memory component both as input and out: +When a program is interpreted, it results in a @tt{Value*} paired +together with a @tt{Heap} that gives meaning to the addresses in the +value, or an error: -@centered[(render-judgment-form 𝑯-𝒎𝒆𝒎-𝒆𝒏𝒗)] +@#reader scribble/comment-reader +(racketblock +;; type Answer* = (cons Heap Value*) | 'err +) -For most of the relation, the given memory σ is simply -threaded through the judgment. When interpreting a primitive -operation, we also thread the memory through a relation -analagous to 𝑯-𝒑𝒓𝒊𝒎 called 𝑯-𝒎𝒆𝒎-𝒑𝒓𝒊𝒎. The key difference -for 𝑯-𝒎𝒆𝒎-𝒑𝒓𝒊𝒎 is that @racket[cons] and @racket[box] -operations allocate memory by extending the given memory σ -and the @racket[car], @racket[cdr], and @racket[unbox] -operations dereference memory by looking up an association -in the given memory σ: +So for example, to represent a box @racket[(box 99)] we could have +a box value, i.e. a tagged pointer that points to memory containing 99: -@centered[(render-metafunction 𝑯-𝒎𝒆𝒎-𝒑𝒓𝒊𝒎 #:contract? #t)] +@#reader scribble/comment-reader +(racketblock +(cons (list 99) (box-ptr 0))) -There are only two unexplained bits at this point: +The value at address 0 is 99 and the box value points to that +element of the heap and indicates it is the contents of a box. -@itemlist[ - @item{the metafunction -@render-term[Hm (alloc σ (v ...))] which consumes a memory -and a list of values. It produces a memory and an address -@render-term[Hm (σ_′ α)] such that @render-term[Hm σ_′] is -like @render-term[Hm σ] except it has a new association for -some @render-term[Hm α] and @render-term[Hm α] is @bold{ - fresh}, i.e. it does not appear in the domain of -@render-term[Hm σ].} - - @item{the metafunction @render-term[Hm (unload σ a)] used - in the conclusion of @render-term[Hm 𝑯′]. 
This function does - a final unloading of the answer and memory to obtain a answer - in the style of 𝑯.}] - - -The definition of @render-term[Hm (alloc σ (v ...))] is -omitted, since it depends on the particular representation -chosen for @render-term[Hm α], but however you choose to -represent addresses, it will be easy to define appropriately. - -The definition of @render-term[Hm (unload σ a)] just traces -through the memory to reconstruct an inductive piece of data: - -@centered[(render-metafunction unload #:contract? #t)] - - -With the semantics of explicit memory allocation and -dereference in place, we can write an interepreter to match -it closely. - -We could define something @emph{very} similar to the -semantics by threading through some representation of a -finite function serving as the memory, just like the -semantics. Or we could do something that will produce the -same result but using a more concrete mechanism that is like -the actual memory on a computer. Let's consider the latter -approach. - -We can use a Racket @racket[list] to model the memory. - -@;{ -We will use a @racket[vector] of some size to model the -memory used in a program's evaluation. We can think of -@racket[vector] as giving us a continguous array of memory -that we can read and write to using natural number indices -as addresses. The interpreter keeps track of both the -@racket[vector] and an index for the next available memory -address. 
Every time the interpreter allocates, it writes in -to the appropriate cell in the @racket[vector] and bumps the -current address by 1.} +It's possible that other memory was used in computing this result, so +we might end up with an answer like: -@codeblock-include["hustle/interp-heap.rkt"] +@#reader scribble/comment-reader +(racketblock +(cons (list 97 98 99) (box-ptr 0))) +Or: +@#reader scribble/comment-reader +(racketblock +(cons (list 97 98 99 100 101) (box-ptr 2))) -The real trickiness comes when we want to model such data in an -impoverished setting that doesn't have such things, which of course is -the case in assembly. +Both of these really mean the same value: @racket[(box 99)]. + +A pair contains two values, so a @racket[cons-ptr] should point to the +start of the elements that make up the pair. For example, this answer +represents a pair @racket[(cons 100 99)]: + +@#reader scribble/comment-reader +(racketblock +(cons (list 99 100) (cons-ptr 0))) + +Note that the @racket[car] of this pair is at address 0 and the +@racket[cdr] is at address 1. + +Note that we could have other things residing in memory, but so long +as the address points to the same values as before, these answers mean the +same thing: + +@#reader scribble/comment-reader +(racketblock +(cons (list 97 98 99 100 101) (cons-ptr 1))) + +In fact, we can reconstruct a @tt{Value} from a @tt{Value*} and +@tt{Heap}: + +@codeblock-include["hustle/unload.rkt"] + +This relies on our interface for heaps: + +@codeblock-include["hustle/heap.rkt"] + +Try it out: + +@ex[ +(unload-value (box-ptr 0) (list 99)) +(unload-value (cons-ptr 0) (list 99 100)) +(unload-value (cons-ptr 1) (list 97 98 99 100 101))] + +What about nested pairs like @racket[(cons 1 (cons 2 (cons 3 '())))]? +Well, we already have all the pieces we need to represent values like +these. 
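+
+To see why this addressing scheme works out, here is a minimal sketch
+of how allocation and dereference could be written against a heap
+represented this way. The starred names are hypothetical and are not
+the definitions in @tt{hustle/heap.rkt}, which may differ:
+
+@#reader scribble/comment-reader
+(racketblock
+;; Assumed for illustration; mirrors the cons-ptr struct shown earlier
+(struct cons-ptr (i) #:transparent)
+
+;; Heap Address -> Value*
+;; Addresses count from the end of the list, so address 0 is the
+;; oldest (last) element
+(define (heap-ref* h a)
+  (list-ref h (- (length h) (add1 a))))
+
+;; Value* Value* Heap -> (cons Heap Value*)
+;; Place the car at the next free address and the cdr right after it
+(define (alloc-cons* v1 v2 h)
+  (cons (cons v2 (cons v1 h))
+        (cons-ptr (length h))))
+)
+
+With these definitions, @racket[(alloc-cons* 100 99 '())] produces the
+heap @racket[(list 99 100)] paired with @racket[(cons-ptr 0)], which is
+the answer shown earlier for @racket[(cons 100 99)]. Because the fields
+of a pair are themselves @tt{Value*}s, a field can hold another
+@racket[cons-ptr], which is all that is needed for nested pairs: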
+ +@ex[ +(unload-value (cons-ptr 0) + (list '() 3 (cons-ptr 4) 2 (cons-ptr 2) 1))] + +Notice that this list could be laid out in many different ways, but when +viewed through the lens of @racket[unload-value], they represent the +same list: + +@ex[ +(unload-value (cons-ptr 4) + (list (cons-ptr 2) 1 (cons-ptr 0) 2 '() 3))] + +The idea of the interpreter that explicitly models memory is to +thread through a heap that is used to represent the memory allocated +by the program. Operations that manipulate or create boxes and pairs +will have to be updated to work with this new representation. + +So for example, the @racket[cons] operation should allocate two new +memory locations and produce a tagged pointer to the address of the +first one. The @racket[car] operation should dereference the memory +pointed to by the given @racket[cons-ptr] value. + +@ex[ +(unload (alloc-cons 100 99 '())) +(unload (alloc-box 99 '())) + +#; +(unload + (match (alloc-cons 3 '() '()) + [(cons h v) + (match (alloc-cons 2 v h) + [(cons h v) + (alloc-cons 1 v h)])]))] + + +Much of the work is handled in the new @tt{interp-prims-heap} module: + +@codeblock-include["hustle/interp-prims-heap.rkt"] + + +@ex[ +(unload (interp-prim1 'box 99 '())) +(unload (interp-prim2 'cons 100 99 '())) +(unload + (match (interp-prim1 'box 99 '()) + [(cons h v) + (interp-prim1 'unbox v h)]))] + + +Finally, we can write the overall interpreter, which threads a heap +throughout the interpretation of a program in +@racket[interp-env-heap]. The top-level @racket[interp] function, +which is intended to be equivalent to the original @racket[interp] +function that modelled memory implicitly, calls +@racket[interp-env-heap] with an initially empty heap and then unloads +the final answer from the result: + +@codeblock-include["hustle/interp-heap.rkt"] -The problem is that a value such as @racket[(box _v)] has a value -inside it. Pairs are even worse: @racket[(cons _v0 _v1)] has -@emph{two} values inside it. 
If each value is represented with 64 -bits, it would seem a pair takes @emph{at a minimum} 128-bits to -represent (plus we need some bits to indicate this value is a pair). -What's worse, those @racket[_v0] and @racket[_v1] may themselves be -pairs or boxes. The great power of inductive data is that an -arbitrarily large piece of data can be constructed. But it would seem -impossible to represent each piece of data with a fixed set of bits. -The solution is to @bold{allocate} such data in memory, which can in -principle be arbitrarily large, and use a @bold{pointer} to refer to -the place in memory that contains the data. @;{ Really deserves a "bit" level interpreter to bring this idea across. } @@ -551,3 +610,14 @@ values to print them. It also must account for the wrinkle of how the printing of proper and improper lists is different: @filebox-include[fancy-c hustle "print.c"] + +@section{Correctness} + +The statement of correctness for the @|this-lang| compiler is the same +as the previous one: + +@bold{Compiler Correctness}: @emph{For all @racket[e] @math{∈} +@tt{ClosedExpr}, @racket[i], @racket[o] @math{∈} @tt{String}, and @racket[v] +@math{∈} @tt{Value}, if @racket[(interp/io e i)] equals @racket[(cons +v o)], then @racket[(exec/io e i)] equals +@racket[(cons v o)].} From 9e2d321f2058789d750e6d5cda6d192174a44201 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Mon, 2 Dec 2024 17:33:45 -0500 Subject: [PATCH 02/17] Fix up name for random expr generator. 
--- www/notes/evildoer.scrbl | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/www/notes/evildoer.scrbl b/www/notes/evildoer.scrbl index 3ae9feb8..1b2a4ecc 100644 --- a/www/notes/evildoer.scrbl +++ b/www/notes/evildoer.scrbl @@ -736,7 +736,7 @@ stream: (check-compiler (parse '(write-byte 97)) "")] The @racket[random-expr] function generates random expressions and -@racket[random-good-expr] generates random expressions that are +@racket[random-well-defined-expr] generates random expressions that are guaranteed to be well-defined, as usual. Additionally, the @racket[random-input] function produces a random string that can be used as the input. @@ -744,7 +744,7 @@ used as the input. @ex[ (require "random.rkt") (random-expr) -(random-good-expr) +(random-well-defined-expr) (random-input)] Together, these can be used to randomly test the correctness of the @@ -754,5 +754,5 @@ compiler: (for ((i 100)) (check-compiler (random-expr) (random-input))) (for ((i 100)) - (check-compiler (random-good-expr) (random-input)))] + (check-compiler (random-well-defined-expr) (random-input)))] \ No newline at end of file From 5bb67e152e80b59e74f5cfb7a285e7099860093c Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Mon, 16 Dec 2024 11:14:27 -0500 Subject: [PATCH 03/17] Working in literals to Hustle notes. --- www/notes/hustle.scrbl | 205 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 188 insertions(+), 17 deletions(-) diff --git a/www/notes/hustle.scrbl b/www/notes/hustle.scrbl index 0ad22987..065f2b9b 100644 --- a/www/notes/hustle.scrbl +++ b/www/notes/hustle.scrbl @@ -1,6 +1,6 @@ #lang scribble/manual -@(require (for-label (except-in racket ... compile) a86)) +@(require (for-label (except-in racket ... 
compile) (except-in a86 exp))) @(require redex/pict racket/runtime-path scribble/examples @@ -13,7 +13,7 @@ @(define codeblock-include (make-codeblock-include #'h)) -@(ev '(require rackunit a86)) +@(ev '(require rackunit (except-in a86 exp))) @(ev `(current-directory ,(path->string (build-path langs "hustle")))) @(void (ev '(with-output-to-string (thunk (system "make runtime.o"))))) @(for-each (λ (f) (ev `(require (file ,f)))) @@ -70,10 +70,6 @@ The new operations include constructors @racket[(box _e)] and predicates for identifying boxes and pairs: @racket[(box? _e)] and @racket[(cons? _e)]. -@margin-note{Usually boxes are @emph{mutable} data structures, like -OCaml's @tt{ref} type, but we will examine this aspect later. For now, -we treat boxes as immutable data structures.} - These features will operate like their Racket counterparts: @ex[ (unbox (box 7)) @@ -85,6 +81,24 @@ These features will operate like their Racket counterparts: (cons? (box 7)) ] +@margin-note{Usually boxes are @emph{mutable} data structures, like +OCaml's @tt{ref} type, but we will examine this aspect later. For now, +we treat boxes as immutable data structures.} + +We will also add support for writing pair and box @emph{literals} +using the same @racket[quote] notation that Racket uses. + +These features will operate like their Racket counterparts: +@ex[ +(unbox '#&7) +(car '(3 . 4)) +(cdr '(3 . 4)) +(box? '#&7) +(cons? '(3 . 4)) +(box? '(3 . 4)) +(cons? '#&7) +] + @section{Empty lists can be all and end all} While we've introduced pairs, you may wonder what about @emph{lists}? @@ -105,31 +119,188 @@ We use the following AST data type for @|this-lang|: @filebox-include-fake[codeblock "hustle/ast.rkt"]{ #lang racket ;; type Expr = ... | (Lit Datum) -;; type Datum = ... | '() +;; type Datum = ... | (cons Datum Datum) | (box Datum) | '() ;; type Op1 = ... | 'box | 'car | 'cdr | 'unbox | 'box? | 'cons? ;; type Op2 = ... 
| 'cons } +@section{Parsing} + +Mostly the parser updates for @|this-lang| are uninteresting. The +only slight twist is the addition of compound literal datums. + +It's worth observing a few things about how @racket[quote] works in +Racket. First, some datums are @emph{self-quoting}, i.e. we can +write them with or without quoting and they mean the same thing: +@ex[ +5 +'5] + +All of the datums considered prior to @|this-lang| have been self-quoting: +booleans, integers, and characters. + +Of the new datums, boxes are self-quoting, but pairs and the empty +list are not. +@ex[ +#&7 +'#&7 +(eval:error ()) +'() +(eval:error (1 . 2)) +'(1 . 2)] + +The reason for this is that unquoted list datums would be confused +with expression forms without the @racket[quote], so it's required; +however, for the other datums, there's no possible confusion and the +@racket[quote] is inferred. Note also that once inside a self-quoting +datum, it's unambiguous that we're talking about literal data and not +expressions that need to be evaluated, so you can have empty lists and +pairs: +@ex[ +#&() +#&(1 . 2)] + +This gives rise to two notions of datums that our parser uses, +with (mutually defined) predicates for each: + +@filebox-include-fake[codeblock "hustle/parse.rkt"]{ +;; Any -> Boolean +(define (self-quoting-datum? x) + (or (exact-integer? x) + (boolean? x) + (char? x) + (and (box? x) (datum? (unbox x))))) + +;; Any -> Boolean +(define (datum? x) + (or (self-quoting-datum? x) + (empty? x) + (and (cons? x) (datum? (car x)) (datum? (cdr x))))) +} + +Now when the parser encounters something that is a self-quoting datum, +it can parse it as a @racket[Lit]. But for datums that are quoted, it +will need to recognize the @racket[quote] form, so anything that has +the s-expression shape @racket[(quote d)] will also get parsed as a +@racket[Lit]. + +Things can get a little confusing here so let's look at some examples: +@ex[ +(parse 5) +(parse '5) +] + +Here, both examples are really the same. 
When we write @racket['5], +Racket evaluates it to @racket[5], so this is really the same +example and corresponds to an input program that just contains the +number @racket[5] and we are calling @racket[parse] with an argument +of @racket[5]. + +If the input program contained a quoted @racket[5], then it would be +@racket['5], which we would represent as an s-expression as +@racket[''5]. Note that this reads as @racket['(quote 5)], i.e. a +two-element list with the symbol @racket['quote] as the first element +and the number @racket[5] as the second. So when writing examples +where the input program itself uses @racket[quote] we will see this +kind of double quotation, and we are calling @racket[parse] with +a two-element list as the argument: + +@ex[ +(parse ''5)] + +This is saying that the input program was @racket['5]. Notice that it +gets parsed the same as @racket[5] by our parser. + +If we were to parse the empty list, this should be considered a parse +error because it's like writing @racket[()] in Racket; it's not a valid +expression form: + +@ex[ +(eval:error (parse '()))] + +However, if the empty list is quoted, i.e. @racket[''()], then we are +talking about the expression @racket['()], so this gets parsed as +@racket[(Lit '())]: + +@ex[ +(parse ''())] + +It works similarly for pairs: + +@ex[ +(eval:error (parse '(1 . 2))) +(parse ''(1 . 2))] + +While these examples can be a bit confusing at first, implementing +this behavior is pretty simple. If the input is a +@racket[self-quoting-datum?], then we parse it as a @racket[Lit] +containing that datum. If the input is a two-element list of the +form @racket[(list 'quote _d)] and @racket[_d] is a @racket[datum?], +then we parse it as a @racket[Lit] containing @racket[_d]. + +Note that @emph{if} the examples are confusing, the parser actually +explains what's going on in Racket. 
Somewhere down in the code that +implements @racket[read] is something equivalent to what we've done +here in @racket[parse] for handling self-quoting and explicitly quoted +datums. Also note that after the parsing phase, self-quoting and +quoted datums are unified as @racket[Lit]s and we no longer need to be +concerned with any distinctions that existed in the concrete syntax. + +The only other changes to the parser are that we've added some new +unary and binary primitive names that the parser now recognizes for +things like @racket[cons], @racket[car], @racket[cons?], etc. + +@codeblock-include["hustle/parse.rkt"] + + + + @section{Meaning of @this-lang programs, implicitly} -The interpreter has an update to the @racket[interp-prim] -module: +To extend our interpreter, we can follow the same pattern we've been +following so far. We have new kinds of values such as pairs, boxes, +and the empty list, so we have to think about how to represent them, +but the natural thing to do is to represent them with the +corresponding kind of value from Racket. Just as we represent Hustle +booleans with Racket booleans, Hustle integers with Racket integers, +and so on, we can also represent Hustle pairs with Racket pairs. We +can represent Hustle boxes with Racket boxes. We can represent +Hustle's empty list with Racket's empty list. + +Under this choice of representation, there's very little to do in +the interpreter. We only need to update the interpretation of +primitives to account for our new primitives such as @racket[cons], +@racket[car], etc. And how should these primitives be interpreted? +Using their Racket counterparts of course! @codeblock-include["hustle/interp-prim.rkt"] -The interpreter doesn't really shed light on how constructing -inductive data works because it simply uses the mechanism of the -defining language to construct it. 
Inductively defined data is easy -to model in this interpreter because we can rely on the mechanisms -provided for constructing inductively defined data at the meta-level -of Racket. +We can try it out: + +@ex[ +(interp (parse '(cons 1 2))) +(interp (parse '(car (cons 1 2)))) +(interp (parse '(cdr (cons 1 2)))) +(interp (parse '(car '(1 . 2)))) +(interp (parse '(cdr '(1 . 2)))) +(interp (parse '(let ((x (cons 1 2))) + (+ (car x) (cdr x))))) +] + + +Now while this is a perfectly good specification, this interpreter +doesn't really shed light on how constructing inductive data works +because it simply uses the mechanism of the defining language to +construct it. Inductively defined data is easy to model in this +interpreter because we can rely on the mechanisms provided for +constructing inductively defined data at the meta-level of Racket. The real trickiness comes when we want to model such data in an impoverished setting that doesn't have such things, which of course is the case in assembly. -The problem is that a value such as @racket[(box _v)] has a value -inside it. Pairs are even worse: @racket[(cons _v0 _v1)] has +The main challenge is that a value such as @racket[(box _v)] has a +value inside it. Pairs are even worse: @racket[(cons _v0 _v1)] has @emph{two} values inside it. If each value is represented with 64 bits, it would seem a pair takes @emph{at a minimum} 128-bits to represent (plus we need some bits to indicate this value is a pair). From 2ec898a3b2011b877454c49ccf8ddec2415d8900 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Sat, 23 Aug 2025 13:34:13 -0400 Subject: [PATCH 04/17] Old changes to overhaul branch. 
--- www/notes/a86.scrbl | 1596 +--------------------------------------- www/notes/hustle.scrbl | 4 +- 2 files changed, 3 insertions(+), 1597 deletions(-) diff --git a/www/notes/a86.scrbl b/www/notes/a86.scrbl index d470c313..d8723a0d 100644 --- a/www/notes/a86.scrbl +++ b/www/notes/a86.scrbl @@ -436,1599 +436,5 @@ interactively exploring the a86 language (you can write assembly in a REPL), but also an important tool when it comes time to test the compilers we write. -@section[#:tag "stacks"]{Stacks: pushing, popping, calling, returning} - -The a86 execution model includes access to memory that can -be used as a stack data structure. There are operations that -manipulate the stack, such as @racket[Push], @racket[Pop], -@racket[Call], and @racket[Ret], and the stack register -pointer @racket['rsp] is dedicated to the stack. Stack -memory is allocated in ``low'' address space and grows -downward. So pushing an element on to the stack @emph{ - decrements} @racket['rsp]. - -The stack is useful as a way to save away values that may be -needed later. For example, let's say you have two -(assembly-level) functions and you want to produce the sum -of their results. By convention, functions return their -result in @racket['rax], so doing something like this -won't work: -@racketblock[ -(seq (Call 'f) - (Call 'g) - (Add 'rax ...)) -] - -The problem is the return value of @racket['f] gets -clobbered by @racket['g]. You might be tempted to fix the -problem by moving the result to another register: - -@racketblock[ -(seq (Call 'f) - (Mov 'rbx 'rax) - (Call 'g) - (Add 'rax 'rbx)) -] - -This works only so long as @racket['g] doesn't clobber -@racket['rbx]. In general, it might not be possible to avoid -that situation. 
So the solution is to use the stack to save -the return value of @racket['f] while the call to @racket['g] -proceeds: - -@racketblock[ -(seq (Call 'f) - (Push 'rax) - (Call 'g) - (Pop 'rbx) - (Add 'rax 'rbx)) -] - -This code pushes the value in @racket['rax] on to the stack -and then pops it off and into @racket['rbx] after -@racket['g] returns. Everything works out so long as -@racket['g] maintains a stack-discipline, i.e. the stack -should be in the same state when @racket['g] returns as when -it was called. - -We can make a complete example to confirm that this works as -expected. First let's set up a little function for letting -us try out examples: - -@#reader scribble/comment-reader -(ex -(define (eg asm) - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - asm ; the example code we want to try out - (Ret) - - (Label 'f) ; calling 'f returns 36 - (Mov 'rax 36) - (Ret) - - (Label 'g) ; calling 'g returns 6, but - (Mov 'rbx 4) ; it clobbers 'rbx just for the lulz - (Mov 'rax 6) - (Ret)))) -) - -Now let's try it, using the stack to confirm it does the -right thing: - -@#reader scribble/comment-reader -(ex -(eg (seq (Call 'f) - (Push 'rax) - (Call 'g) - (Pop 'rbx) - (Add 'rax 'rbx))) -) - -Compare that with the first version that used a register to -save the result of @racket['f]: - -@#reader scribble/comment-reader -(ex -(eg (seq (Call 'f) - (Mov 'rbx 'rax) - (Call 'g) - (Add 'rax 'rbx))) -) - - -The @racket[Push] and @racket[Pop] instructions offer a -useful illusion, but of course, there's not really any data -structure abstraction here; there's just raw memory and -registers. But so long as code abides by conventions, the -illusion turns out to be the true state of affairs. - -What's really going on under the hood of @racket[Push] and -@racket[Pop] is that the @racket['rsp] register is -decremented and the value is written to the memory location -pointed to by the value of @racket['rsp]. 
- -The following code is @emph{mostly} equivalent to what we wrote -above (and we will discuss the difference in the next section): - -@#reader scribble/comment-reader -(ex -(eg (seq (Call 'f) - (Sub 'rsp 8) ; "allocate" a word on the stack - (Mov (Offset 'rsp 0) 'rax) ; write 'rax to top frame - (Call 'g) - (Mov 'rbx (Offset 'rsp 0)) ; load top frame into 'rbx - (Add 'rsp 8) ; "deallocate" word on the stack - (Add 'rax 'rbx))) -) - -As you can see from this code, it would be easy to violate -the usual invariants of stack data structure to, for -example, access elements beyond the top of the stack. The -value of @racket[Push] and @racket[Pop] is they make clear -that you are using things in a stack-like way and they keep -you from screwing up the accesses, offsets, and adjustments -to @racket['rsp]. - -Just as @racket[Push] and @racket[Pop] are useful illusions, -so too are @racket[Call] and @racket[Ret]. They give the -impression that there is a notion of a procedure and -procedure call mechanism in assembly, but actually there's -no such thing. - -Think for a moment about what it means to ``call'' @racket['f] -in the examples above. When executing @racket[(Call 'f)], -control jumps to the instruction following -@racket[(Label 'f)]. When we then get to @racket[(Ret)], -somehow the CPU knows to jump @emph{back} to the instruction -following the @racket[(Call 'f)] that we started with. - -What's really going on is that @racket[(Call 'f)] is pushing -the address of subsequent instruction on to the stack and -then jumping to the label @racket['f]. This works in concert -with @racket[Ret], which pops the return address off the -stack and jumping to it. - -Just as we could write equivalent code without @racket[Push] -and @racket[Pop], we can write the same code without -@racket[Call] and @racket[Ret]. - -We do need one new trick, which is the @racket[Lea] -instruction, which loads an effective address. 
You can think -of it like @racket[Mov] except that it loads the address of -something rather than what is pointed to by an address. For our -purposes, it is useful for loading the address of a label: - -@racketblock[ - (Lea 'rax 'f) - ] - -This instruction puts @emph{the address} of label -@racket['f] into @racket[rax]. You can think of this as -loading a @emph{function pointer} into @racket['rax]. With -this new instruction, we can illuminate what is really going -on with @racket[Call] and @racket[Ret]: - -@#reader scribble/comment-reader -(ex -(eg (seq (Lea 'rax 'fret) ; load address of 'fret label into 'rax - (Push 'rax) ; push the return pointer on to stack - (Jmp 'f) ; jump to 'f - (Label 'fret) ; <-- return point for "call" to 'f - (Push 'rax) ; save result (like before) - (Lea 'rax 'gret) ; load address of 'gret label into 'rax - (Push 'rax) ; push the return pointer on to stack - (Jmp 'g) ; jump to 'g - (Label 'gret) ; <-- return point for "call" to 'g - (Pop 'rbx) ; pop saved result from calling 'f - (Add 'rax 'rbx))) -) - -@;{ -Or to avoid the use of register to temporarily hold the -address to jump to, we could've also written it as: - -@#reader scribble/comment-reader -(ex -(eg (seq (Sub 'rsp 8) ; allocate a frame on the stack - ; load address of 'fret label into top of stack - (Lea (Offset 'rsp 0) 'fret) - (Jmp 'f) ; jump to 'f - (Label 'fret) ; <-- return point for "call" to 'f - (Push 'rax) ; save result (like before) - (Sub 'rsp 8) ; allocate a frame on the stack - ; load address of 'gret label into top of stack - (Lea (Offset 'rsp 0) 'gret) - (Jmp 'g) ; jump to 'g - (Label 'gret) ; <-- return point for "call" to 'g - (Pop 'rbx) ; pop saved result from calling 'f - (Add 'rax 'rbx))) -) -} - -The above shows how to encode @racket[Call] as @racket[Lea], -@racket[Push], and @racket[Jmp]. 
The encoding of @racket[Ret] is just: - -@racketblock[ - (seq (Pop 'rbx) ; pop the return pointer - (Jmp 'rbx)) ; jump to it - ] - - - -@section[#:tag "a86-flags"]{Flags} - -As mentioned earlier, the processor makes use of @emph{flags} to -handle comparisons. For our purposes, there are four flags to -be aware of: zero (ZF), sign (SF), carry (CF), and overflow (OF). - -These flags are set by each of the arithmetic operations, which -are appropriately annotated in the @secref{a86-instructions}. -Each of these operations is binary (meaning they take two -arguments), and the flags are set according to properties of -the result of the arithmetic operation. Many of these properties -look at the most-significant bit (MSB) of the inputs and output. - -@itemlist[ - @item{@bold{ZF} is set when the result is @tt{0}.} - @item{@bold{SF} is set when the MSB of the result is set.} - @item{@bold{CF} is set when a bit was set beyond the MSB.} - @item{@bold{OF} is set when one of two conditions is met: - - @itemlist[#:style 'ordered - @item{The MSB of each input is @emph{set} and the MSB of - the result is @emph{not set}.} - @item{The MSB of each input is @emph{not set} and the MSB - of the result is @emph{set}.} - ]} -] - -Note that CF is only useful for unsigned arithmetic, while OF -is only useful for signed arithmetic. In opposite cases, they -provide no interesting information. - -These flags, along with many others, are stored in a special -FLAGS register that cannot be accessed by normal means. Each -flag is represented by a single bit in the register, and they -all have specific bits assigned by the x86 specification. For -example, CF is bit 0, ZF is bit 6, SF is bit 7, and OF is bit -11, as indexed from the least-significant bit position (but -you don't need to know these numbers). - -The various conditions that can be tested for correspond to -combinations of the flags. 
For example, the @racket[Jc] -instruction will jump if CF is set, otherwise execution will -fall through to the next instruction. Most of the condition -suffixes are straightforward to deduce from their spelling, -but some are not. The suffixes (e.g., the @tt{c} in @tt{Jc}) -and their meanings are given below. For brevity's sake the -flags' names are abbreviated by ommitting the F suffix and -prefixing them with either @tt{+} or @tt{-} to indicate set -and unset positions, respectively, as needed. Some of the -meanings require use of the bitwise operators @tt{|} (OR), -@tt{&} (AND), @tt{^} (XOR), and @tt{=?} (equality). - -@tabular[#:style 'boxed - #:row-properties '(bottom-border ()) - (list (list @bold{Suffix} @bold{Flag} @bold{Suffix} @bold{Flag}) - (list @tt{z} @tt{+Z} @tt{nz} @tt{-Z}) - (list @tt{e} @tt{+Z} @tt{ne} @tt{-Z}) - (list @tt{s} @tt{+S} @tt{ns} @tt{-S}) - (list @tt{c} @tt{+C} @tt{nc} @tt{-C}) - (list @tt{o} @tt{+O} @tt{no} @tt{-O}) - (list @tt{l} @tt{ (S ^ O)} @tt{g} @tt{(-Z & (S =? O))}) - (list @tt{le} @tt{(+Z | (S ^ O))} @tt{ge} @tt{ (S =? O)}))] - -The @tt{e} suffix (``equal?'') is just a synonym -for the @tt{z} suffix (``zero?''). This is because it is -common to use the @racket[Cmp] instruction to perform -comparisons, but @racket[Cmp] is actually identical to -@racket[Sub] with the exception that the result is not -stored anywhere (i.e., it is only used for setting flags -according to subtraction). If two values are subtracted -and the resulting difference is zero (ZF is set), then the -values are equal. - - -@subsection{Push and Pop} - -In the previous section (@secref{stacks}), it was explained -that the @racket[Push] and @racket[Pop] operations are -essentially equivalent to manually adjusting the stack -pointer and target register. The one difference is that these -special stack-manipulation operations do not set any flags -like @racket[Add] and @racket[Sub] do. 
So while you can -often choose to manually implement stack manipulation, you'll -need to use these instructions specifically if you want to -preserve the condition flags after adjusting the stack. - - - - -@section{a86 Reference} - -@defmodule[a86 #:no-declare] - -@margin-note{The a86 language may evolve some over the - course of the semester, but we will aim to document any - changes by updating this section. Also, because the run-time - system changes for each language, you may need to do some - work to have @racket[asm-interp] cooperate with your - run-time system.} - -This module provides all of the bindings from -@racketmodname[a86/ast], @racketmodname[a86/printer], -and @racketmodname[a86/interp], described below. - -@section[#:tag "a86-instructions"]{Instruction set} - -@defmodule[a86/ast] - -This section describes the instruction set of a86. - -There are 16 registers: @racket['rax], @racket['rbx], @racket['rcx], -@racket['rdx], @racket['rbp], @racket['rsp], @racket['rsi], -@racket['rdi], @racket['r8], @racket['r9], @racket['r10], -@racket['r11], @racket['r12], @racket['r13], @racket['r14], and -@racket['r15]. These registers are 64-bits wide. There is also -@racket['eax] which accesses the lower 32-bits of @racket['rax]. -This is useful in case you need to read or write 32-bits of memory. - -The registers @racket['rbx], @racket['rsp], @racket['rbp], and -@racket['r12] through @racket['r15] are ``callee-saved'' registers, -meaning they are preserved across function calls (and must be saved -and restored by any callee code). - -Each register plays the same role as in x86, so for example -@racket['rsp] holds the current location of the stack. - -@defproc[(register? [x any/c]) boolean?]{ - A predicate for registers. -} - -@defproc[(label? [x any/c]) boolean?]{ - A predicate for label @emph{names}, i.e. symbols which are not register names. 
- - Labels must also follow the NASM restrictions on label names: "Valid - characters in labels are letters, numbers, @tt{_}, @tt{$}, @tt{#}, @tt{@"@"}, @tt{~}, @tt{.}, and - @tt{?}. The only characters which may be used as the first character of an - identifier are letters, @tt{.} (with special meaning), @tt{_} - and @tt{?}." - - @ex[ - (label? 'foo) - (label? "foo") - (label? 'rax) - (label? 'foo-bar) - (label? 'foo.bar) - ] - -} - -@defproc[(instruction? [x any/c]) boolean?]{ - A predicate for instructions. -} - -@defproc[(offset? [x any/c]) boolean?]{ - A predicate for offsets. -} - -@defproc[(64-bit-integer? [x any/c]) boolean?]{ - A predicate for determining if a value is an integer that fits in 64-bits. - - @ex[ - (64-bit-integer? 0) - (64-bit-integer? (sub1 (expt 2 64))) - (64-bit-integer? (expt 2 64)) - (64-bit-integer? (- (expt 2 63))) - (64-bit-integer? (sub1 (- (expt 2 63))))] -} - -@defproc[(32-bit-integer? [x any/c]) boolean?]{ - A predicate for determining if a value is an integer that fits in 64-bits. - - @ex[ - (32-bit-integer? 0) - (32-bit-integer? (sub1 (expt 2 32))) - (32-bit-integer? (expt 2 32)) - (32-bit-integer? (- (expt 2 32))) - (32-bit-integer? (sub1 (- (expt 2 32))))] -} - -@defproc[(seq [x (or/c instruction? (listof instruction?))] ...) (listof instruction?)]{ - A convenience function for splicing togeter instructions and lists of instructions. - - @ex[ - (seq) - (seq (Label 'foo)) - (seq (list (Label 'foo))) - (seq (list (Label 'foo) - (Mov 'rax 0)) - (Mov 'rdx 'rax) - (list (Call 'bar) - (Ret))) - ] -} - -@defproc[(prog [x (or/c instruction? (listof instruction?))] ...) (listof instruction?)]{ - - Like @racket[seq], but also checks that the instructions - are well-formed in the following sense: - - @itemlist[ - - @item{Programs have at least one label which is declared @racket[Global]; the first label is used as the entry point.} - @item{All label declarations are unique.} - @item{All label targets are declared.} - @item{... 
other properties may be added in the future.} - - ] - - This function is useful to do some early error checking - over whole programs and can help avoid confusing NASM - errors. Unlike @racket[seq] it should be called at the - outermost level of a function that produces a86 code and not - nested. - - @ex[ - (prog (Global 'foo) (Label 'foo)) - (eval:error (prog (Label 'foo))) - (eval:error (prog (list (Label 'foo)))) - (eval:error (prog (Mov 'rax 32))) - (eval:error (prog (Label 'foo) - (Label 'foo))) - (eval:error (prog (Jmp 'foo))) - (prog (Global 'foo) - (Label 'foo) - (Jmp 'foo)) - ] -} - -@defproc[(symbol->label [s symbol?]) label?]{ - - Returns a modified form of a symbol that follows NASM label conventions. - - @ex[ - (let ([l (symbol->label 'my-great-label)]) - (seq (Label l) - (Jmp l))) - ] -} - -@deftogether[(@defstruct*[% ([s string?])] - @defstruct*[%% ([s string?])] - @defstruct*[%%% ([s string?])])]{ - - Creates a comment in the assembly code. The @racket[%] - constructor adds a comment toward the right side of the - current line; @racket[%%] creates a comment on its own line - 1 tab over; @racket[%%%] creates a comment on its own line - aligned to the left. - - @#reader scribble/comment-reader - (ex - (asm-display - (prog (Global 'foo) - (%%% "Start of foo") - (Label 'foo) - ; Racket comments won't appear - (%% "Inputs one argument in rdi") - (Mov 'rax 'rdi) - (Add 'rax 'rax) (% "double it") - (Sub 'rax 1) (% "subtract one") - (%% "we're done!") - (Ret)))) -} - -@defstruct*[Offset ([r register?] [i exact-integer?])]{ - - Creates an memory offset from a register. Offsets are used - as arguments to instructions to indicate memory locations. - An error is signalled when given invalid inputs. - - @ex[ - (Offset 'rax 0) - (eval:error (Offset 'rax 4.1)) - ] -} - -@defstruct*[Text ()]{ - - Declares the start of a text section, which includes instructions to - be executed. 
- -} - -@defstruct*[Data ()]{ - - Declares the start of a data section, which includes data and constants. - -} - -@defstruct*[Label ([x label?])]{ - - Creates a label from the given symbol. Each label in a - program must be unique. Labels must follow the NASM restrictions - on valid label names (see @racket[label?] for details). - - @ex[ - (Label 'fred) - (eval:error (Label "fred")) - (eval:error (Label 'fred-wilma)) - ] - -} - -@defstruct*[Extern ([x label?])]{ - - Declares an external label. - -} - -@defstruct*[Global ([x label?])]{ - - Declares a label as global, i.e. linkable with other object files. - -} - - -@defstruct*[Call ([x (or/c label? register?)])]{ - - A call instruction. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Call 'f) - (Add 'rax 1) - (Ret) - (Label 'f) - (Mov 'rax 41) - (Ret))) - ] -} - -@defstruct*[Ret ()]{ - - A return instruction. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Ret))) - ] - -} - -@defstruct*[Mov ([dst (or/c register? offset?)] [src (or/c register? offset? 64-bit-integer?)])]{ - - A move instruction. Moves @racket[src] to @racket[dst]. - - Either @racket[dst] or @racket[src] may be offsets, but not both. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rbx 42) - (Mov 'rax 'rbx) - (Ret))) - (eval:error (Mov (Offset 'rax 0) (Offset 'rbx 0))) - ] - -} - -@defstruct*[Add ([dst register?] [src (or/c register? offset? 32-bit-integer?)])]{ - - An addition instruction. Adds @racket[src] to @racket[dst] - and writes the result to @racket[dst]. Updates the conditional flags. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 32) - (Add 'rax 10) - (Ret))) - ] -} - -@defstruct*[Sub ([dst register?] [src (or/c register? offset? 32-bit-integer?)])]{ - - A subtraction instruction. 
Subtracts @racket[src] from - @racket[dst] and writes the result to @racket[dst]. - Updates the conditional flags. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 32) - (Sub 'rax 10) - (Ret))) - ] -} - -@defstruct*[Cmp ([a1 (or/c register? offset?)] [a2 (or/c register? offset? 32-bit-integer?)])]{ - Compare @racket[a1] to @racket[a2] by subtracting @racket[a2] from @racket[a1] - and updating the comparison flags. Does not store the result of subtraction. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Cmp 'rax 2) - (Jg 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jmp ([x (or/c label? register?)])]{ - Jump to label @racket[x]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Jmp 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Pop 'rbx) - (Jmp 'rbx))) - ] - -} - -@defstruct*[Jz ([x (or/c label? register?)])]{ - Jump to label @racket[x] if the zero flag is set. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Cmp 'rax 2) - (Jz 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jnz ([x (or/c label? register?)])]{ - Jump to label @racket[x] if the zero flag is @emph{not} set. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Cmp 'rax 2) - (Jnz 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Je ([x (or/c label? register?)])]{ - An alias for @racket[Jz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Cmp 'rax 2) - (Je 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jne ([x (or/c label? 
register?)])]{ - An alias for @racket[Jnz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Cmp 'rax 2) - (Jne 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jl ([x (or/c label? register?)])]{ - Jump to label @racket[x] if the conditional flags are set to ``less than'' (see @secref{a86-flags}). - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Cmp 'rax 2) - (Jl 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jle ([x (or/c label? register?)])]{ - Jump to label @racket[x] if the conditional flags are set to ``less than or equal'' (see @secref{a86-flags}). - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Cmp 'rax 42) - (Jle 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jg ([x (or/c label? register?)])]{ - Jump to label @racket[x] if the conditional flags are set to ``greater than'' (see @secref{a86-flags}). - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Cmp 'rax 2) - (Jg 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jge ([x (or/c label? register?)])]{ - Jump to label @racket[x] if the conditional flags are set to ``greater than or equal'' (see @secref{a86-flags}). - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Cmp 'rax 42) - (Jg 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jo ([x (or/c label? register?)])]{ - Jump to @racket[x] if the overflow flag is set. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (sub1 (expt 2 63))) - (Add 'rax 1) - (Jo 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jno ([x (or/c label? register?)])]{ - Jump to @racket[x] if the overflow flag is @emph{not} set. 
- - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (sub1 (expt 2 63))) - (Add 'rax 1) - (Jno 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jc ([x (or/c label? register?)])]{ - Jump to @racket[x] if the carry flag is set. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax -1) - (Add 'rax 1) - (Jc 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Jnc ([x (or/c label? register?)])]{ - Jump to @racket[x] if the carry flag is @emph{not} set. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax -1) - (Add 'rax 1) - (Jnc 'l1) - (Mov 'rax 0) - (Label 'l1) - (Ret))) - ] -} - -@defstruct*[Cmovz ([dst register?] [src (or/c register? offset?)])]{ - Move from @racket[src] to @racket[dst] if the zero flag is set. - - Note that the semantics for conditional moves is not what many people expect. - The @racket[src] is @emph{always} read, regardless of the condition's evaluation. - This means that if your source is illegal (such as an offset beyond the bounds - of memory allocated to the current process), a segmentation fault will arise - even if the condition ``should have'' prevented the error. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 0) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovz 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 2) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovz 'rax 'r9) - (Ret))) - ] -} - - -@defstruct*[Cmove ([dst register?] [src (or/c register? offset?)])]{ - An alias for @racket[Cmovz]. See notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 0) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmove 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 2) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmove 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovnz ([dst register?] [src (or/c register? 
offset?)])]{ - Move from @racket[src] to @racket[dst] if the zero flag is @emph{not} set. - See notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 0) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovnz 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 2) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovnz 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovne ([dst register?] [src (or/c register? offset?)])]{ - An alias for @racket[Cmovnz]. See notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 0) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovne 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 2) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovne 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovl ([dst register?] [src (or/c register? offset?)])]{ - Move from @racket[src] to @racket[dst] if the conditional flags are set to ``less than'' (see @secref{a86-flags}). - See also the notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 0) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovl 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax -1) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovl 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovle ([dst register?] [src (or/c register? offset?)])]{ - Move from @racket[src] to @racket[dst] if the conditional flags are set to ``less than or equal'' (see @secref{a86-flags}). - See also the notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 0) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovle 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 2) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovle 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovg ([dst register?] [src (or/c register? 
offset?)])]{ - Move from @racket[src] to @racket[dst] if the conditional flags are set to ``greather than'' (see @secref{a86-flags}). - See also the notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 0) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovg 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 2) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovg 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovge ([dst register?] [src (or/c register? offset?)])]{ - Move from @racket[src] to @racket[dst] if the conditional flags are set to ``greater than or equal'' (see @secref{a86-flags}). - See also the notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax -1) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovge 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 2) - (Cmp 'rax 0) - (Mov 'r9 1) - (Cmovge 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovo ([dst register?] [src (or/c register? offset?)])]{ - Move from @racket[src] to @racket[dst] if the overflow flag is set. - See notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (- (expt 2 63) 1)) - (Add 'rax 1) - (Mov 'r9 1) - (Cmovo 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (- (expt 2 63) 2)) - (Add 'rax 1) - (Mov 'r9 1) - (Cmovo 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovno ([dst register?] [src (or/c register? offset?)])]{ - Move from @racket[src] to @racket[dst] if the overflow flag is @emph{not} set. - See notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (- (expt 2 63) 1)) - (Add 'rax 1) - (Mov 'r9 1) - (Cmovno 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (- (expt 2 63) 2)) - (Add 'rax 1) - (Mov 'r9 1) - (Cmovno 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovc ([dst register?] 
[src (or/c register? offset?)])]{ - Move from @racket[src] to @racket[dst] if the carry flag is set. - See notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (- (expt 2 64) 1)) - (Add 'rax 1) - (Mov 'r9 1) - (Cmovc 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (- (expt 2 64) 2)) - (Add 'rax 1) - (Mov 'r9 1) - (Cmovc 'rax 'r9) - (Ret))) - ] -} - -@defstruct*[Cmovnc ([dst register?] [src (or/c register? offset?)])]{ - Move from @racket[src] to @racket[dst] if the carry flag is @emph{not} set. - See notes on @racket[Cmovz]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (- (expt 2 64) 1)) - (Add 'rax 1) - (Mov 'r9 1) - (Cmovnc 'rax 'r9) - (Ret))) - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax (- (expt 2 64) 2)) - (Add 'rax 1) - (Mov 'r9 1) - (Cmovnc 'rax 'r9) - (Ret))) - ] -} - - -@defstruct*[And ([dst (or/c register? offset?)] [src (or/c register? offset? 32-bit-integer?)])]{ - - Compute logical ``and'' of @racket[dst] and @racket[src] and put result in @racket[dst]. Updates the conditional flags. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. - - @#reader scribble/comment-reader - (ex - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b1011) ; #b1011 = 11 - (And 'rax #b1110) ; #b1110 = 14 - (Ret))) ; #b1010 = 10 - ) -} - -@defstruct*[Or ([dst (or/c register? offset?)] [src (or/c register? offset? 32-bit-integer?)])]{ - Compute logical ``or'' of @racket[dst] and @racket[src] and put result in @racket[dst]. Updates the conditional flags. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. 
- - @#reader scribble/comment-reader - (ex - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b1011) ; #b1011 = 11 - (Or 'rax #b1110) ; #b1110 = 14 - (Ret))) ; #b1111 = 15 - ) -} - -@defstruct*[Xor ([dst (or/c register? offset?)] [src (or/c register? offset? 32-bit-integer?)])]{ - Compute logical ``exclusive or'' of @racket[dst] and @racket[src] and put result in @racket[dst]. Updates the conditional flags. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. - - @#reader scribble/comment-reader - (ex - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b1011) ; #b1011 = 11 - (Xor 'rax #b1110) ; #b1110 = 14 - (Ret))) ; #b0101 = 5 - ) -} - -@defstruct*[Sal ([dst register?] [i (integer-in 0 63)])]{ - Shift @racket[dst] to the left @racket[i] bits and put result in @racket[dst]. - The most-significant (leftmost) bits are discarded. Updates the conditional - flags. - - @#reader scribble/comment-reader - (ex - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b100) ; #b100 = 4 = 2^2 - (Sal 'rax 6) - (Ret))) ; #b100000000 = 256 - ) -} - -@defstruct*[Sar ([dst register?] [i (integer-in 0 63)])]{ - Shift @racket[dst] to the right @racket[i] bits and put result in @racket[dst]. - For each shift count, the least-significant (rightmost) bit is shifted into - the carry flag. The new most-significant (leftmost) bits are filled with the - sign bit of the original @racket[dst] value. Updates the conditional flags. 
- - @#reader scribble/comment-reader - (ex - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b100000000) ; #b100000000 = 256 - (Sar 'rax 6) - (Ret))) ; #b100 = 4 - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b100001101) ; #b100001101 = 269 - (Sar 'rax 6) - (Ret))) ; #b100 = 4 - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b1000000000000000000000000000000000000000000000000000000000000000) ; 1 in MSB - (Sar 'rax 6) - (Ret))) ; #b1111111000000000000000000000000000000000000000000000000000000000 - ) -} - -@defstruct*[Shl ([dst register?] [i (integer-in 0 63)])]{ - Alias for @racket[Sal]. -} - -@defstruct*[Shr ([dst register?] [i (integer-in 0 63)])]{ - Shift @racket[dst] to the right @racket[i] bits and put result in @racket[dst]. - For each shift count, the least-significant (rightmost) bit is shifted into - the carry flag, and the most-significant bit is cleared. Updates the - conditional flags. - - @#reader scribble/comment-reader - (ex - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b100000000) ; #b100000000 = 256 - (Shr 'rax 6) - (Ret))) ; #b100 = 4 - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b100001101) ; #b100001101 = 269 - (Shr 'rax 6) - (Ret))) ; #b100 = 4 - - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax #b1000000000000000000000000000000000000000000000000000000000000000) ; 1 in MSB - (Shr 'rax 6) - (Ret))) ; #b0000001000000000000000000000000000000000000000000000000000000000 - ) -} - -@defstruct*[Push ([a1 (or/c 32-bit-integer? register?)])]{ - - Decrements the stack pointer and then stores the source - operand on the top of the stack. - - In the case of a 32-bit immediate, it is sign-extended to 64-bits. 
- - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Push 'rax) - (Mov 'rax 0) - (Pop 'rax) - (Ret))) - ] -} - -@defstruct*[Pop ([a1 register?])]{ - Loads the value from the top of the stack to the destination operand and then increments the stack pointer. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Push 'rax) - (Mov 'rax 0) - (Pop 'rax) - (Ret))) - ] -} - -@defstruct*[Not ([a1 register?])]{ -Perform bitwise not operation (each 1 is set to 0, and each 0 is set to 1) on the destination operand. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Mov 'rax 0) - (Not 'rax) - (Ret))) - ] -} - -@defstruct*[Lea ([dst (or/c register? offset?)] [x label?])]{ - Loads the address of the given label into @racket[dst]. - - @ex[ - (asm-interp - (prog - (Global 'entry) - (Label 'entry) - (Lea 'rbx 'done) - (Mov 'rax 42) - (Jmp 'rbx) - (Mov 'rax 0) - (Label 'done) - (Ret))) - ] -} - -@defstruct*[Db ([d integer?])]{ - Psuedo-instruction for declaring 8-bits of initialized static memory. -} - -@defstruct*[Dw ([d integer?])]{ - Psuedo-instruction for declaring 16-bits of initialized static memory. -} - -@defstruct*[Dd ([d integer?])]{ - Psuedo-instruction for declaring 32-bits of initialized static memory. -} - -@defstruct*[Dq ([d integer?])]{ - Psuedo-instruction for declaring 64-bits of initialized static memory. -} - -@section{From a86 to x86} - -@defmodule[a86/printer] - -@defproc[(asm-display [is (listof instruction?)]) void?]{ - - Prints an a86 program to the current output port in nasm syntax. - - @ex[ - (asm-display (prog (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Ret))) - ] - -} - -@defproc[(asm-string [is (listof instruction?)]) string?]{ - - Converts an a86 program to a string in nasm syntax. 
- - @ex[ - (asm-string (prog (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Ret))) - ] - -} - -@section{An Interpreter for a86} - -@defmodule[a86/interp] - -As you've seen throughout this chapter, @racketmodname[a86] -is equiped with an interpreter, which enables you to run -assembly programs from within Racket. This won't be directly -useful in building a compiler, but it will be very handy for -interactively exploring assembly programs and making examples -and test cases for your compiler. - -The simplest form of interpreting an a86 program is to use -@racket[asm-interp]. - -@defproc[(asm-interp [is (listof instruction?)]) integer?]{ - - Assemble, link, and execute an a86 program. - - @ex[ - (asm-interp (prog (Global 'entry) - (Label 'entry) - (Mov 'rax 42) - (Ret))) - ] - - Programs do not have to start with @racket['entry]. The - interpreter will jump to whatever the first label in the - program is: - -@ex[ - (asm-interp (prog (Global 'f) - (Label 'f) - (Mov 'rax 42) - (Ret))) - ] - - The argument of @racket[asm-interp] should be a complete, - well-formed a86 program. For best results, always use - @racket[prog] to construct the program so that error - checking is done early. If you use @racket[prog] and - @racket[asm-interp] and you get a NASM syntax error message, - please report it to the course staff as this is a bug in the - interpreter. - - While we try to make syntax errors impossible, it is - possible---quite easy, in fact---to write well-formed, but - erroneous assembly programs. For example, this program tries - to jump to null, which causes a segmentation fault: - - @ex[ - (eval:error (asm-interp (prog (Global 'crash) - (Label 'crash) - (Mov 'rax 0) - (Jmp 'rax)))) - ] - -} - -It is often the case that we want our assembly programs to -interact with the oustide or to use functionality -implemented in other programming languages. For that reason, -it is possible to link in object files to the running of an -a86 program. 
- -The mechanism for controlling which objects should be linked -in is a parameter called @racket[current-objs], which -contains a list of paths to object files which are linked to -the assembly code when it is interpreted. - -@defparam[current-objs objs (listof path-string?) #:value '()]{ - -Parameter that controls object files that will be linked in to -assembly code when running @racket[asm-interp]. - -} - -For example, let's implement a GCD function in C: - -@filebox-include[fancy-c a86 "gcd.c"] - -First, compile the program to an object file: - -@shellbox["gcc -fPIC -c gcd.c -o gcd.o"] - -The option @tt{-fPIC} is important; it causes the C compiler -to emit ``position independent code,'' which is what enables -Racket to dynamically load and run the code. - -Once the object file exists, using the @racket[current-objs] -parameter, we can run code that uses things defined in the C -code: - -@ex[ -(parameterize ((current-objs '("gcd.o"))) - (asm-interp (prog (Extern 'gcd) - (Global 'f) - (Label 'f) - (Mov 'rdi 11571) - (Mov 'rsi 1767) - (Sub 'rsp 8) - (Call 'gcd) - (Add 'rsp 8) - (Ret))))] - -This will be particularly relevant for writing a compiler -where emitted code will make use of functionality defined in -a runtime system. - -Note that if you forget to set @racket[current-objs], you will get a -linking error saying a symbol is undefined: - -@ex[ -(eval:error - (asm-interp (prog (Extern 'gcd) - (Global 'f) - (Label 'f) - (Mov 'rdi 11571) - (Mov 'rsi 1767) - (Sub 'rsp 8) - (Call 'gcd) - (Add 'rsp 8) - (Ret))))] - - -@defproc[(asm-interp/io [is (listof instruction?)] [in string?]) (cons integer? string?)]{ - - Like @racket[asm-interp], but uses @racket[in] for input and produce the result along - with any output as a string. 
- -} +@include-section[(lib "a86/scribblings/a86.scrbl")] diff --git a/www/notes/hustle.scrbl b/www/notes/hustle.scrbl index 065f2b9b..da62a18d 100644 --- a/www/notes/hustle.scrbl +++ b/www/notes/hustle.scrbl @@ -1,6 +1,6 @@ #lang scribble/manual -@(require (for-label (except-in racket ... compile) (except-in a86 exp))) +@(require (for-label (except-in racket ... compile) a86)) @(require redex/pict racket/runtime-path scribble/examples @@ -13,7 +13,7 @@ @(define codeblock-include (make-codeblock-include #'h)) -@(ev '(require rackunit (except-in a86 exp))) +@(ev '(require rackunit a86)) @(ev `(current-directory ,(path->string (build-path langs "hustle")))) @(void (ev '(with-output-to-string (thunk (system "make runtime.o"))))) @(for-each (λ (f) (ev `(require (file ,f)))) From bb9e256b0676211b5ee0024fab83d8d99abf9557 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Sat, 23 Aug 2025 16:57:27 -0400 Subject: [PATCH 05/17] Temporary fix to Hustle notes. --- www/notes/hustle.scrbl | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/www/notes/hustle.scrbl b/www/notes/hustle.scrbl index da62a18d..66952b61 100644 --- a/www/notes/hustle.scrbl +++ b/www/notes/hustle.scrbl @@ -205,8 +205,10 @@ where the input program itself uses @racket[quote] we will see this kind of double quotation, and we are calling @racket[parse] with a two-element list as the argument: +@margin-note{FIXME: langs needs to be updated to parse this correctly.} + @ex[ -(parse ''5)] +(eval:error (parse ''5))] This is saying that the input program was @racket['5]. Notice that it gets parsed the same as @racket[5] by our parser. @@ -227,9 +229,11 @@ talking about the expression @racket['()], so this gets parsed as It works similarly for pairs: +@margin-note{FIXME: langs needs to be updated to parse the second example correctly.} + @ex[ (eval:error (parse '(1 . 2))) -(parse ''(1 . 2))] +(eval:error (parse ''(1 .
2)))] While these examples can be a bit confusing at first, implementing this behavior is pretty simple. If the input is a @@ -275,14 +279,16 @@ Using their Racket counterparts of course! @codeblock-include["hustle/interp-prim.rkt"] +@margin-note{FIXME} + We can try it out: @ex[ (interp (parse '(cons 1 2))) (interp (parse '(car (cons 1 2)))) (interp (parse '(cdr (cons 1 2)))) -(interp (parse '(car '(1 . 2)))) -(interp (parse '(cdr '(1 . 2)))) +(eval:error (interp (parse '(car '(1 . 2))))) +(eval:error (interp (parse '(cdr '(1 . 2))))) (interp (parse '(let ((x (cons 1 2))) (+ (car x) (cdr x))))) ] From bb4faaa3073cfa980d302e06804e15047d6a95d7 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Sat, 23 Aug 2025 17:03:42 -0400 Subject: [PATCH 06/17] Fixes #187. --- www/notes/mug.scrbl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/www/notes/mug.scrbl b/www/notes/mug.scrbl index f99bc6e8..0f4192c1 100644 --- a/www/notes/mug.scrbl +++ b/www/notes/mug.scrbl @@ -124,7 +124,7 @@ For example, here is a data section: ] These psuedo-instructions will add to the data segment of our program -56-bytes of data. The first 8-bytes consist of the number 6. The +32-bytes of data. The first 8-bytes consist of the number 6. The next 4-bytes consist of the number @racket[72], i.e. the codepoint for @racket[#\H]. The next 4-bytes consist of the codepoint for @racket[#\e] and so on. The names of these psuedo-instructions From fa091b051d7b086de0d3183ee2b56e4fea1ef419 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Sat, 23 Aug 2025 17:05:15 -0400 Subject: [PATCH 07/17] Fixes #186. --- www/notes/1/ocaml-to-racket.scrbl | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/www/notes/1/ocaml-to-racket.scrbl b/www/notes/1/ocaml-to-racket.scrbl index 85d18f3d..5a43154a 100644 --- a/www/notes/1/ocaml-to-racket.scrbl +++ b/www/notes/1/ocaml-to-racket.scrbl @@ -755,7 +755,7 @@ Racket. 
@section{Symbols} One of the built-in datatypes we will use often in Racket is -that of a @emph{symbol}. A symbol is just an atomic peice of +that of a @emph{symbol}. A symbol is just an atomic piece of data. A symbol is written using the @racket[quote] notation @racket[(code:quote symbol-name)], which is abbreviated @racket['symbol-name]. What's allowable as a symbol name @@ -965,7 +965,7 @@ representation of itself. For example, @racket[(+ 1 2)] is an expression. When run, it applies the @emph{function} bound to the variable @racket[+] to the arguments @racket[1] and @racket[2] and produces @racket[3]. On the other hand: -@racket['(+ 1 2)] constructs a peice of data, namely, a list of three +@racket['(+ 1 2)] constructs a piece of data, namely, a list of three elements. The first element is the @emph{symbol} @tt{+}, the second element is @racket[2], the third element is @racket[3]. @@ -989,7 +989,7 @@ then the @emph{expression} @racket[e] is evaluated and it's value will be used in place of @tt{(unquote e)}. This gives us the ability to ``escape'' out of a quoted -peice of data and go back to expression mode. +piece of data and go back to expression mode. If we think of @racket[quasiquote] like @racket[quote] in terms of ``pushing in'' then the rules are exactly the same @@ -1009,7 +1009,7 @@ instead as data.. @emph{unless we encounter a things as expressions. -The last remaining peice is @racket[unquote-splicing], which +The last remaining piece is @racket[unquote-splicing], which is abbreviated with ``comma-at'': @racket[,@e] means @tt{ (unquote-splicing e)}. The @racket[unquote-splicing] form is like @racket[unquote] in that if it occurs within a @@ -1048,7 +1048,7 @@ data. It doesn't contain anything and its only real purpose is to be distinguishable from @racket[node] structures. On the other hand a @racket[node] structure needs to be distinguishable from @racket[leaf]s, but also contain 3 -peices of data within it. +pieces of data within it. 
We can formulate definition of binary trees using only symbols and lists as: From 2108c53246ab5d1c68adad6a005c23c66c5bd623 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Sat, 23 Aug 2025 22:32:24 -0400 Subject: [PATCH 08/17] Start of new semester. --- www/Makefile | 2 +- www/assignments.scrbl | 8 ++++---- www/defns.rkt | 40 +++++++++++++--------------------------- www/main.scrbl | 20 ++++++++++---------- www/syllabus.scrbl | 4 ++-- 5 files changed, 30 insertions(+), 44 deletions(-) diff --git a/www/Makefile b/www/Makefile index aa9d83be..cdcefa97 100644 --- a/www/Makefile +++ b/www/Makefile @@ -28,7 +28,7 @@ scribble: $(course).scrbl push: - rsync -rvzp main/ dvanhorn@junkfood.cs.umd.edu:/fs/www/class/fall2024/cmsc430/ + rsync -rvzp main/ dvanhorn@junkfood.cs.umd.edu:/fs/www/class/fall2025/cmsc430/ clean: rm -rf $(course) diff --git a/www/assignments.scrbl b/www/assignments.scrbl index 51caaca9..4ad05e98 100644 --- a/www/assignments.scrbl +++ b/www/assignments.scrbl @@ -4,10 +4,10 @@ @local-table-of-contents[#:style 'immediate-only] @include-section{assignments/1.scrbl} -@include-section{assignments/2.scrbl} -@include-section{assignments/3.scrbl} -@include-section{assignments/4.scrbl} -@include-section{assignments/5.scrbl} +@;include-section{assignments/2.scrbl} +@;include-section{assignments/3.scrbl} +@;include-section{assignments/4.scrbl} +@;include-section{assignments/5.scrbl} @;include-section{assignments/6.scrbl} @;;include-section{assignments/7.scrbl} diff --git a/www/defns.rkt b/www/defns.rkt index 6b5488e0..b0586c02 100644 --- a/www/defns.rkt +++ b/www/defns.rkt @@ -13,7 +13,7 @@ (define prof1-initials "DVH") (define semester "fall") -(define year "2024") +(define year "2025") (define courseno "CMSC 430") (define lecture-dates "" #;"May 30 -- July 7, 2023") @@ -25,45 +25,31 @@ (define office-hour-location (elem AVW " " "4122")) -(define m1-date "October 10") -(define m2-date "November 7") +(define m1-date "October 9") +(define m2-date "November 6") 
(define midterm-hours "24") -(define final-date "Saturday, December 14") -(define final-end-time "12:30PM") -(define elms-url "https://umd.instructure.com/courses/1368381") +(define final-date "TBD") +(define final-end-time "TBD") +(define elms-url "https://umd.instructure.com/courses/1388468") -(define racket-version "8.13") +(define racket-version "8.18") (define staff (list (list "Pierce Darragh" "pdarragh@umd.edu") - (list "Kalyan Bhetwal" "kbhetwal@umd.edu") - ;(list "Justin Frank" "jpfrank@umd.edu") - (list "Deena Postol" "dpostol@umd.edu") - (list "Caspar Popova" "caspar@umd.edu") - (list "Emma Shroyer" "eshroyer@umd.edu") - (list "Kazi Tasnim Zinat" "kzintas@umd.edu") - #;(list "Fuxiao Liu" "fl3es@umd.edu") - #;(list "Vivian Chen" "vchen8@terpmail.umd.edu") - #;(list "Ian Morrill" "imorrill@terpmail.umd.edu") - #;(list "Matthew Schneider" "mgl@umd.edu") - #;(list "Rhea Jajodia" "rjajodia@terpmail.umd.edu") - #;(list "Syed Zaidi" "szaidi@umd.edu") - #;(list "William Wegand" "wfweg@verizon.net") - #;(list "Wilson Smith" "smith@umd.edu") - #;(list "Yuhwan Lee" "ylee9251@terpmail.umd.edu") - )) - + (list "Benjamin Quiring" "bquiring@umd.edu") + (list "Kalyan Bhetwal" "bhetwal@umd.edu") + (list "Zhongqi Wang" "zqwang@umd.edu"))) (define lecture-schedule1 "TTh, 2:00-3:15pm") (define classroom1 "LEF 2205") ;(define discord "TBD") -(define piazza "https://piazza.com/umd/fall2024/cmsc430/home") -(define gradescope "https://www.gradescope.com/courses/818295") +(define piazza "https://piazza.com/umd/fall2025/cmsc430/home") +(define gradescope "https://www.gradescope.com/courses/1098215/") -(define feedback "https://forms.gle/A6U3CCR2KyA86UTh6") +(define feedback "https://forms.gle/99yTz7HVfopCaDMz9") (define (assign-deadline i) (list-ref '("Tuesday, September 10, 11:59PM" diff --git a/www/main.scrbl b/www/main.scrbl index cd5b1bc1..1e2a793c 100644 --- a/www/main.scrbl +++ b/www/main.scrbl @@ -45,16 +45,16 @@ implement several related languages. 
@tabular[#:style 'boxed #:row-properties '(bottom-border ()) (list (list @bold{Time} @bold{Monday} @bold{Tuesday} @bold{Wednesday} @bold{Thursday} @bold{Friday}) - (list "8 AM" 'cont 'cont "Kalyan" "Kalyan" 'cont) - (list "9 AM" "Deena" "Deena" "Kalyan" "Kalyan" "Caspar") - (list "10 AM" "Deena" "Deena" "Kalyan" "Kalyan" "Caspar") - (list "11 AM" "Deena" 'cont "Emma" 'cont "Caspar") - (list "12 PM" "Deena" 'cont "Emma" 'cont "Emma") - (list "1 PM" 'cont 'cont "Emma" "Kazi" "Emma") - (list "2 PM" 'cont 'cont 'cont 'cont "Emma") - (list "3 PM" 'cont "Kazi" "Caspar" 'cont 'cont) - (list "4 PM" 'cont "Kazi" "Caspar" 'cont 'cont) - (list "5 PM" 'cont "Kazi" "Caspar" 'cont 'cont))] + (list "8 AM" 'cont 'cont 'cont 'cont 'cont) + (list "9 AM" 'cont 'cont 'cont 'cont 'cont) + (list "10 AM" 'cont 'cont 'cont 'cont 'cont) + (list "11 AM" 'cont 'cont 'cont 'cont 'cont) + (list "12 PM" 'cont 'cont 'cont 'cont 'cont) + (list "1 PM" 'cont 'cont 'cont 'cont 'cont) + (list "2 PM" 'cont 'cont 'cont 'cont 'cont) + (list "3 PM" 'cont 'cont 'cont 'cont 'cont) + (list "4 PM" 'cont 'cont 'cont 'cont 'cont) + (list "5 PM" 'cont 'cont 'cont 'cont 'cont))] @bold{Communications:} @link[@elms-url]{ELMS}, @link[@piazza]{Piazza} diff --git a/www/syllabus.scrbl b/www/syllabus.scrbl index ce33f611..98360b92 100644 --- a/www/syllabus.scrbl +++ b/www/syllabus.scrbl @@ -58,7 +58,7 @@ in-person lectures, which will be recorded and available on ELMS immediately after each lecture. There are two midterms, a final project, which counts as the final assessment for the class, several assignments, and several quizes and surveys. Midterms are take-home -exams and completed online over a @midterm-hours period. +exams and completed online over a @|midterm-hours|-hour period. 
@bold{Contents:} @@ -179,7 +179,7 @@ of the course: @itemlist[ @item{Overview of compilation} - @item{Operational semantics} + @;item{Operational semantics} @item{Interpreters} @item{Intermediate representations and bytecode} @item{Code generation} From 1c15f828eadf0b2594d351a1eb758c9ca8d2eb8c Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Sat, 23 Aug 2025 22:38:12 -0400 Subject: [PATCH 09/17] Final date and time. --- www/defns.rkt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/www/defns.rkt b/www/defns.rkt index b0586c02..1f7b7789 100644 --- a/www/defns.rkt +++ b/www/defns.rkt @@ -28,8 +28,8 @@ (define m1-date "October 9") (define m2-date "November 6") (define midterm-hours "24") -(define final-date "TBD") -(define final-end-time "TBD") +(define final-date "December 18") +(define final-end-time "12:30pm") (define elms-url "https://umd.instructure.com/courses/1388468") From 1b831eee01c32e71abebe50107c9b4a288b72e76 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Sat, 23 Aug 2025 22:38:40 -0400 Subject: [PATCH 10/17] Fix up midterm duration to use defns. --- www/syllabus.scrbl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/www/syllabus.scrbl b/www/syllabus.scrbl index 98360b92..40df3823 100644 --- a/www/syllabus.scrbl +++ b/www/syllabus.scrbl @@ -255,7 +255,7 @@ right to reject survey responses that are not considered thoughtful. @section[#:tag "syllabus-midterms"]{Midterms} There will be two @secref{Midterms}, which will be @bold{take-home} -exams. Exams will be distributed at least 48 hours before the due +exams. Exams will be distributed at least @|midterm-hours| hours before the due date of the midterm. @itemlist[ From ac2159d52b191a36cd1227e11218377d16bd334a Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Tue, 26 Aug 2025 13:23:09 -0400 Subject: [PATCH 11/17] Add start date and TA.
--- www/defns.rkt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/www/defns.rkt b/www/defns.rkt index 1f7b7789..fdf3eedc 100644 --- a/www/defns.rkt +++ b/www/defns.rkt @@ -24,7 +24,7 @@ (define office-hour-location (elem AVW " " "4122")) - +(define start-date "September 2") (define m1-date "October 9") (define m2-date "November 6") (define midterm-hours "24") @@ -39,7 +39,8 @@ (list (list "Pierce Darragh" "pdarragh@umd.edu") (list "Benjamin Quiring" "bquiring@umd.edu") (list "Kalyan Bhetwal" "bhetwal@umd.edu") - (list "Zhongqi Wang" "zqwang@umd.edu"))) + (list "Zhongqi Wang" "zqwang@umd.edu") + (list "Kazi Tasnim Zinat" "kzintas@umd.edu"))) (define lecture-schedule1 "TTh, 2:00-3:15pm") From 3e45a72a161471c986434e330292ddaf188bb6a0 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Tue, 26 Aug 2025 13:36:39 -0400 Subject: [PATCH 12/17] Generic midterms. --- www/midterms/1.scrbl | 16 +++++++++------- www/midterms/2.scrbl | 45 +++++++++++++++++++++++++++++--------------- 2 files changed, 39 insertions(+), 22 deletions(-) diff --git a/www/midterms/1.scrbl b/www/midterms/1.scrbl index 9ced0b71..fce52762 100644 --- a/www/midterms/1.scrbl +++ b/www/midterms/1.scrbl @@ -12,11 +12,13 @@ its due date. @section{Practice} -There is a practice midterm from Summer 2023 available on ELMS as -@tt{m1-summer-2023.zip}. You may submit to the Practice Midterm 1 - -Summer 2023 assignment on Gradescope to get feedback on your solution. -However during the real midterm, you will not get this level of -feedback from the autograder. +There is a practice midterm available on ELMS as @tt{m1-PRACTICE.zip}. +You may submit to the Practice Midterm 1 assignment on Gradescope to +get feedback on your solution. However during the real midterm, you +will not get this level of feedback from the autograder. @bold{Make +sure you do not submit your practice midterm solution for the real +midterm! 
We will not allow late submissions if you submit the wrong +work.} @section{Instructions} @@ -30,12 +32,12 @@ midterm. @section{Communications} If you have questions about the exam, send a @bold{private} message on -Piazza. +@link[piazza]{Piazza}. Answers to common clarifying questions will be posted to Piazza. -If you have trouble reaching the course staff via Discord, email +If you have trouble reaching the course staff via Piazza, email @tt{@prof1-email}. You may not communicate with anyone outside of the course staff about diff --git a/www/midterms/2.scrbl b/www/midterms/2.scrbl index 6d3f5788..596bbe9a 100644 --- a/www/midterms/2.scrbl +++ b/www/midterms/2.scrbl @@ -12,25 +12,33 @@ Midterm 2 will be released at least @midterm-hours hours prior to its due date. -@;{ +@section{Practice} + +There is a practice midterm from Summer 2023 available on ELMS as +@tt{m2-summer-2023.zip}. You may submit to the Practice Midterm 1 - +Summer 2023 assignment on Gradescope to get feedback on your solution. +However during the real midterm, you will not get this level of +feedback from the autograder. + @section{Instructions} -The midterm will be released as a zip file @tt{m2.zip} on ELMS. +The midterm will be released as a zip file @tt{m1.zip} on ELMS. -There are several parts to this midterm. Each part has its own directory -with a README and supplementary files. Read the README in each part -for instructions on how to complete that part of the midterm. +There are several parts to this midterm. Each part has its own +directory with a README and supplementary files. Read the README in +each part for instructions on how to complete that part of the +midterm. @section{Communications} -If you have questions about the exam, send a DM to ModMail on Discord. -This will go to the entire course staff. +If you have questions about the exam, send a @bold{private} message on +@link[piazza]{Piazza}. 
-Answers to common clarifying questions will be posted to the -@tt{#midterm-2} channel on Discord. +Answers to common clarifying questions will be posted to +Piazza. -If you have trouble reaching the course staff via Discord, email -@tt|{dvanhorn@cs.umd.edu}|. +If you have trouble reaching the course staff via Piazza, email +@tt{@prof1-email}. You may not communicate with anyone outside of the course staff about the midterm. @@ -38,7 +46,14 @@ the midterm. @section{Submissions} You should submit your work as a single zip file of this directory on -Gradescope. Unlike past assignments, Gradescope will not provide -feedback on the correctness of your solutions so you are encouraged to -check your own work. -} \ No newline at end of file +Gradescope. Unlike past assignments, Gradescope will only do a basic +test for well-formedness of your submission. It will make sure the +directory layout is correct and that all the functions that will be +tested are available. It will catch syntax errors in your code, but +it does not run any correctness tests. + +If you fail these tests, we will not be able to grade your submission. +Passing these tests only means your submission is well-formed. Your +actual grade will be computed after the deadline. + +You are encouraged to check your own work. From ca1adc0c751af488feb9c02a5aa24f81ed12bd5b Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Tue, 26 Aug 2025 13:37:00 -0400 Subject: [PATCH 13/17] Make project TBD. --- www/project.scrbl | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/www/project.scrbl b/www/project.scrbl index bfff34c8..1929f91a 100644 --- a/www/project.scrbl +++ b/www/project.scrbl @@ -15,6 +15,10 @@ completed project. @bold{Due: @final-date, @final-end-time} +Project details will be released later in the semester. 
+ +@;{ + @section[#:style 'unnumbered]{Arity Checking, Rest Arguments, Case Functions, and Apply} @(define-runtime-path iniquity-plus "iniquity-plus/") @@ -505,3 +509,4 @@ submit.zip} from within the @tt{iniquity-plus} directory to create a zip file with the proper structure. +} \ No newline at end of file From 29fc9ce2310f524d0c4eaf6df794070848a86c01 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Tue, 26 Aug 2025 22:50:33 -0400 Subject: [PATCH 14/17] Revise midterm dates. --- www/defns.rkt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/www/defns.rkt b/www/defns.rkt index fdf3eedc..625ca7b0 100644 --- a/www/defns.rkt +++ b/www/defns.rkt @@ -25,8 +25,8 @@ (define office-hour-location (elem AVW " " "4122")) (define start-date "September 2") -(define m1-date "October 9") -(define m2-date "November 6") +(define m1-date "October 16") +(define m2-date "November 13") (define midterm-hours "24") (define final-date "December 18") (define final-end-time "12:30pm") From 179a132d16d5933c51e1afd628b9c1682ece4856 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Tue, 26 Aug 2025 22:50:47 -0400 Subject: [PATCH 15/17] Semester schedule. 
--- www/schedule.scrbl | 48 +++++++++++++++++++++------------------------- 1 file changed, 22 insertions(+), 26 deletions(-) diff --git a/www/schedule.scrbl b/www/schedule.scrbl index d7f0e202..b28ba8ad 100644 --- a/www/schedule.scrbl +++ b/www/schedule.scrbl @@ -23,92 +23,88 @@ @bold{Tuesday} @bold{Thursday}) - (list @wk{8/27} + (list @wk{9/2} #;"" @secref["Intro"] @elem{@secref["OCaml to Racket"]}) - (list @wk{9/3} + (list @wk{9/9} @;seclink["Assignment 1"]{A1} @elem{@secref["a86"]} @elem{@secref["a86"]}) - (list @wk{9/10} + (list @wk{9/16} @;seclink["Assignment 2"]{A2} @itemlist[@item{@secref["Abscond"]} @item{@secref["Blackmail"]}] @itemlist[@item{@secref["Con"]} @item{@secref["Dupe"]}]) - (list @wk{9/17} + (list @wk{9/23} @;"" @secref["Dodger"] @secref["Evildoer"]) - (list @wk{9/24} + (list @wk{9/30} @;elem{A3} @;elem{@seclink["Assignment 2"]{A2}} @secref["Evildoer"] @secref{Extort}) - (list @wk{10/1} + (list @wk{10/7} @;"" - @secref{Extort} + @secref{Fraud} @secref{Fraud}) - (list @wk{10/8} + (list @wk{10/14} @;elem{A4} - @secref{Fraud} + @elem{No class: Fall Break} @secref["Midterm_1"]) - (list @wk{10/15} + (list @wk{10/21} @;"" - @secref{Fraud} - @secref{Hustle}) - (list @wk{10/22} - @;"" @secref{Hustle} @secref{Hustle}) - (list @wk{10/22} + (list @wk{10/28} @;elem{A5} @;elem{@seclink["Assignment 4"]{A4}} @secref{Hoax} - @secref{Iniquity}) + @secref{Hoax}) - (list @wk{10/29} + (list @wk{11/4} @;"" @secref{Iniquity} @secref{Iniquity}) - (list @wk{11/5} + (list @wk{11/11} @;elem{A6} @secref{Knock} @secref["Midterm_2"]) - (list @wk{11/12} + (list @wk{11/18} @;"" @secref{Jig} @secref{Loot}) - (list @wk{11/19} + (list @wk{11/25} @;elem{A7} @;elem{@seclink["Assignment 5"]{A5}} @secref{Loot} - @secref{Mug}) + @elem{No class: Thanksgiving}) - (list @wk{11/26} + (list @wk{12/2} @;"" - @secref{Neerdowell} - @elem{No class}) + @secref{Mug} + @secref{Neerdowell}) - (list @wk{12/3} + (list @wk{12/9} @;"" @secref{Outlaw} - @elem{@secref{Outlaw}, cont.}) + 
@secref{Outlaw}) )] From 646918aaf7dc97ff606fe8647b9e5e1d8b154a66 Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Tue, 2 Sep 2025 10:01:24 -0400 Subject: [PATCH 16/17] Overhaul ready for day 1. --- www/assignments.scrbl | 16 +- www/assignments/1.scrbl | 17 +- www/assignments/10.scrbl | 9 + www/assignments/2.scrbl | 17 +- www/assignments/3.scrbl | 2 +- www/assignments/4.scrbl | 291 ++++++++++++++++-------- www/assignments/5.scrbl | 128 ++--------- www/assignments/6.scrbl | 89 +------- www/assignments/7.scrbl | 234 +------------------ www/assignments/8.scrbl | 9 + www/assignments/9.scrbl | 9 + www/defns.rkt | 15 +- www/main.scrbl | 29 ++- www/midterms/1.scrbl | 16 +- www/midterms/2.scrbl | 14 +- www/notes/1/what-is-a-compiler.scrbl | 284 ++++++++++++++++++++++++ www/schedule.scrbl | 7 +- www/syllabus.scrbl | 321 +++++++++++++++++++-------- 18 files changed, 854 insertions(+), 653 deletions(-) create mode 100644 www/assignments/10.scrbl create mode 100644 www/assignments/8.scrbl create mode 100644 www/assignments/9.scrbl diff --git a/www/assignments.scrbl b/www/assignments.scrbl index 4ad05e98..1a3d71e8 100644 --- a/www/assignments.scrbl +++ b/www/assignments.scrbl @@ -4,12 +4,16 @@ @local-table-of-contents[#:style 'immediate-only] @include-section{assignments/1.scrbl} -@;include-section{assignments/2.scrbl} -@;include-section{assignments/3.scrbl} -@;include-section{assignments/4.scrbl} -@;include-section{assignments/5.scrbl} -@;include-section{assignments/6.scrbl} -@;;include-section{assignments/7.scrbl} +@include-section{assignments/2.scrbl} +@include-section{assignments/3.scrbl} +@include-section{assignments/4.scrbl} +@include-section{assignments/5.scrbl} +@include-section{assignments/6.scrbl} +@include-section{assignments/7.scrbl} +@include-section{assignments/8.scrbl} +@include-section{assignments/9.scrbl} +@include-section{assignments/10.scrbl} + @;{assignment 8: quote in general, and quasiquote} @;{assignment 9: standard library, IO} diff --git 
a/www/assignments/1.scrbl b/www/assignments/1.scrbl index 367ae713..41346251 100644 --- a/www/assignments/1.scrbl +++ b/www/assignments/1.scrbl @@ -1,6 +1,6 @@ #lang scribble/manual @(require "../defns.rkt") -@title[#:tag "Assignment 1" #:style 'unnumbered]{Assignment 1: Racket Primer} +@title[#:tag "Assignment 1" #:style 'unnumbered]{Assignment 1: Racket primer} @bold{Due: @assign-deadline[1]} @@ -10,13 +10,14 @@ The goal of this assignment is to gain practice programming in Racket. you'd like on this assignment, but each person must submit their @tt{submit.zip} file on Gradescope. -You are given a @tt{racket-basics.zip} file (on ELMS under "Files"), -that contains a README, a Makefile, and a number of Racket modules. -In each module there are several function ``stubs,'' i.e. incomplete -function definitions with type signatures, descriptions, and a small -set of tests. Each function has a bogus (but type correct) body -marked with a ``TODO'' comment. Your job is to replace each of these -expressions with a correct implementation of the function. +You are given a @tt{racket-basics.zip} file (in ELMS, linked in the +description for Assignment 1) that contains a README, a Makefile, and +a number of Racket modules. In each module there are several function +``stubs,'' i.e. incomplete function definitions with type signatures, +descriptions, and a small set of tests. Each function has a bogus +(but type correct) body marked with a ``TODO'' comment. Your job is +to replace each of these expressions with a correct implementation of +the function. 
The last section of problems deals with functions that operate over a representation of expressions in a lambda-calculus-like language and diff --git a/www/assignments/10.scrbl b/www/assignments/10.scrbl new file mode 100644 index 00000000..e1790abd --- /dev/null +++ b/www/assignments/10.scrbl @@ -0,0 +1,9 @@ +#lang scribble/manual +@(require "../defns.rkt") +@title[#:tag "Assignment 10" #:style 'unnumbered]{Assignment 10: Patterns} + +@(require (for-label a86 (except-in racket ...))) + +@bold{Due: @assign-deadline[10]} + +Details of this assignment will be released later in the semester. diff --git a/www/assignments/2.scrbl b/www/assignments/2.scrbl index 01ed28db..f9927e6e 100644 --- a/www/assignments/2.scrbl +++ b/www/assignments/2.scrbl @@ -1,6 +1,6 @@ #lang scribble/manual @(require "../defns.rkt") -@title[#:tag "Assignment 2" #:style 'unnumbered]{Assignment 2: a86 Primer} +@title[#:tag "Assignment 2" #:style 'unnumbered]{Assignment 2: Assembly primer} @bold{Due: @assign-deadline[2]} @@ -10,13 +10,14 @@ The goal of this assignment is to gain practice programming in a86. you'd like on this assignment, but each person must submit their @tt{submit.zip} file on Gradescope. -You are given a @tt{a86-basics.zip} file (on ELMS under "Files"), that -contains a README, a Makefile, and a number of Racket modules. In -each module there are several ``stubs,'' i.e. incomplete definitions -with type signatures, descriptions, and a small set of tests. Each -definition has a bogus (but type correct) body marked with a ``TODO'' -comment. Your job is to replace each of these expressions with a -correct implementation of the a86 code. +You are given a @tt{a86-basics.zip} file (in ELMS, linked in the +description for Assignment 1), that contains a README, a Makefile, and +a number of Racket modules. In each module there are several +``stubs,'' i.e. incomplete definitions with type signatures, +descriptions, and a small set of tests. 
Each definition has a bogus +(but type correct) body marked with a ``TODO'' comment. Your job is +to replace each of these expressions with a correct implementation of +the a86 code. Make sure you do not rename any files. Also make sure not to change the name or signature of any definition given to you. You may add any diff --git a/www/assignments/3.scrbl b/www/assignments/3.scrbl index 9f778063..4ac51298 100644 --- a/www/assignments/3.scrbl +++ b/www/assignments/3.scrbl @@ -1,6 +1,6 @@ #lang scribble/manual @(require "../defns.rkt") -@title[#:tag "Assignment 3" #:style 'unnumbered]{Assignment 3: Primitives, Conditionals, and Dispatch} +@title[#:tag "Assignment 3" #:style 'unnumbered]{Assignment 3: Primitives, conditionals} @(require (for-label a86 (except-in racket ...))) diff --git a/www/assignments/4.scrbl b/www/assignments/4.scrbl index 82cc2c0e..a52a7ac6 100644 --- a/www/assignments/4.scrbl +++ b/www/assignments/4.scrbl @@ -1,156 +1,259 @@ #lang scribble/manual @(require "../defns.rkt") -@title[#:tag "Assignment 4" #:style 'unnumbered]{Assignment 4: Let There Be (Many) Variables} +@title[#:tag "Assignment 4" #:style 'unnumbered]{Assignment 4: Case} -@bold{Due: @assign-deadline[4]} +@(require (for-label a86 (except-in racket ...))) -The goal of this assignment is to extend a compiler with binding forms and -primitives that can take any number of arguments. +@bold{Due: @assign-deadline[4]} +The goal of this assignment is to extend the language developed in +@secref{Dupe} with a new form of control flow expressions: +@racket[case]-expressions. -@section[#:tag-prefix "a4-" #:style 'unnumbered]{Overview} +@section[#:tag-prefix "a4-" #:style 'unnumbered]{Dupe+} -For this assignment, you are given a @tt{fraud-plus.zip} file on ELMS -with a starter compiler similar to the @seclink["Fraud"]{Fraud} -language we studied in class. 
+The Dupe+ language extends Dupe in the following ways: -Unlike @seclink["Assignment 3"]{Assignment 3}, the following files have already -been updated for you @bold{and should not be changed by you}: @itemlist[ -@item{@tt{ast.rkt}} -@item{@tt{parse.rkt}} +@item{adding @racket[case].} ] -So you will only need to modify: +@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{Primitives} + +The following new primitives are included in Dupe+: + @itemlist[ -@item{@tt{interp.rkt}} -@item{@tt{interp-prim.rkt}} -@item{@tt{compile.rkt}} -@item{@tt{compile-ops.rkt}} +@item{@racket[(abs _e)]: compute the absolute value of @racket[_e],} +@item{@racket[(- _e)]: flips the sign of @racket[_e], i.e. compute @math{0-@racket[_e]}, and} +@item{@racket[(not _e)]: compute the logical negation of @racket[_e]; note that the negation of @emph{any} value other than @racket[#f] is @racket[#f] and the negation of @racket[#f] is @racket[#t].} ] -to correctly implement the new features. These features are described below. +@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{Conditional expressions} -@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{Submitting} +The following new conditional form is included in Dupe+: -Submit a zip file containing your work to Gradescope. Use @tt{make submit.zip} -from within the @tt{fraud-plus} directory to create a zip file with the proper -structure. +@racketblock[ +(cond [_e-p1 _e-a1] + ... + [else _e-an]) +] -We will not use your @tt{ast.rkt} or @tt{parse.rkt} files. Part of Assignment 3 -was learning to design your own structures, but part of Assignment 4 is -learning to work within the constraints of an existing design! +A @racket[cond] expression has any number of clauses @racket[[_e-pi +_e-ai] ...], followed by an ``else'' clause @racket[[else _e-an]]. For +the purposes of this assignment, we will assume every @racket[cond] +expression ends in an @racket[else] clause, even though this is not +true in general for Racket. 
The parser should reject any +@racket[cond]-expression that does not end in @racket[else]. -@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{Testing} +The meaning of a @racket[cond] expression is computed by evaluating +each expression @racket[_e-pi] in order until the first one that +does not evaluate to @racket[#f] is found, in which case, the corresponding expression +@racket[_e-ai] is evaluated and its value is the value of the +@racket[cond] expression. If no such @racket[_e-pi] exists, the +expression @racket[_e-an]'s value is the value of the @racket[cond]. -You can test your code in several ways: +@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{Case expressions} + +The following new case form is included in Dupe+: + +@racketblock[ +(case _ev + [(_d1 ...) _e1] + ... + [else _en]) +] + +The @racket[case] expression form is a mechanism for dispatching +between a number of possible expressions based on a value, much like +C's notion of a @tt{switch}-statement. + +The meaning of a @racket[case] expression is computed by evaluating +the expression @racket[_ev] and then proceeding in order through each +clause until one is found that has a datum @racket[_di] equal to +@racket[_ev]'s value. Once such a clause is found, the corresponding +expression @racket[_ei] is evaluated and its value is the value of the +@racket[case] expression. If no such clause exists, expression +@racket[_en] is evaluated and its value is the value of the +@racket[case] expression. + +Note that each clause consists of a parenthesized list of +@emph{datums}, which in the setting of Dupe means either integer or +boolean literals. + +@section[#:tag-prefix "a4-" #:style 'unnumbered]{Implementing Dupe+} + +You must extend the parser, interpreter, and compiler to implement +Dupe+. You are given a file @tt{dupe-plus.zip} on ELMS with a starter +compiler based on the @secref{Dupe} language we studied in class. 
+You may use any a86 instructions you'd like, however it is possible to +complete the assignment using @racket[Cmp], @racket[Je], @racket[Jg], +@racket[Jmp], @racket[Label], @racket[Mov], and @racket[Sub]. + +@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{Implementing primitives} + +Implement the primitives as described earlier. + +There are many ways to implement these at the assembly level. You should try implementing +these using the limited a86 instruction set. + +To do this, you should: @itemlist[ +@item{Study @tt{ast.rkt} and the new forms of expression (i.e. new AST nodes) + then update the comment at the top describing what the grammar should look like.} + +@item{Study @tt{parse.rkt} and add support for parsing these +expressions. (See @secref[#:tag-prefixes '("a4-")]{parse} for guidance.)} - @item{Using the command line @tt{raco test test/} from the @tt{fraud-plus} - directory to test everything.} +@item{Update @tt{interp-prim.rkt} and @tt{interp.rkt} to correctly interpret these expressions.} - @item{Using the command line @tt{raco test } to only test @tt{}.} - ] +@item{Make examples of these primitives and potential translations of them +to assembly.} -Note that only a small number of tests are given to you, so you should -write additional test cases. +@item{Update @tt{compile.rkt} to correctly compile these expressions.} +@item{Check your implementation by running the tests in @tt{test/all.rkt}.} +] -@section[#:tag-prefix "a4-" #:style 'unnumbered]{Fraud+} +@section[#:tag-prefix "a4-" #:style 'unnumbered]{Implementing cond} -The Fraud+ language extends the Fraud language we studied in class with some -new features: +Implement the @racket[cond] expression form as described earlier. +To do this, you should: @itemlist[ +@item{Study @tt{ast.rkt} to add appropriate AST nodes.} +@item{Extend @tt{parse.rkt} to parse such expressions. 
(See @secref[#:tag-prefixes '("a4-")]{parse} for guidance.)} +@item{Update @tt{interp-prim.rkt} and @tt{interp.rkt} to correctly interpret @racket[cond] expressions.} -@item{The features added in @seclink["Assignment 3"]{Assignment 3}, namely: +@item{Make examples of @racket[cond]-expressions and potential translations of them +to assembly.} - @itemlist[ +@item{Update @tt{compile.rkt} to correctly compile @racket[cond] +expressions based on your examples.} - @item{@racket[abs], @racket[-], and @racket[not]} - @item{@racket[cond]} - @item{@racket[case]} +@item{Check your implementation by running the tests in @tt{test/all.rkt}.} +] - ]} +@section[#:tag-prefix "a4-" #:style 'unnumbered]{Implementing case} -@item{New primitives @racket[integer?] and @racket[boolean?].} +Implement the @racket[case] expression form as described earlier. +To do this, you should: -@item{An extended @racket[+] that accepts any number of arguments.} +@itemlist[ +@item{Study @tt{ast.rkt} to add appropriate AST nodes.} +@item{Extend @tt{parse.rkt} to parse such expressions. (See @secref[#:tag-prefixes '("a4-")]{parse} for guidance.)} +@item{Update @tt{interp-prim.rkt} and @tt{interp.rkt} to correctly interpret @racket[case] expressions.} -@item{An extended @racket[let] that can bind multiple variables at once.} +@item{Make examples of @racket[case]-expressions and potential translations of them +to assembly.} -@item{Back-referencing @racket[let*] that can bind multiple variables at once.} +@item{Update @tt{compile.rkt} to correctly compile @racket[case] expressions based on your examples.} +@item{Check your implementation by running the tests in @tt{test/all.rkt}.} ] +@section[#:tag-prefix "a4-" #:style 'unnumbered #:tag "parse"]{A Leg Up on Parsing} + +In the past, designing the AST type and structure definitions has +given students some grief. 
Getting stuck at this point means you +can't make any progress on the assignment and making a mistake at this +level can cause real trouble down the line for your compiler. + +For that reason, let us give you a strong hint for a potential design +of the ASTs and examples of how parsing could work. You are not +required to follow this design, but you certainly may. + +Here's a potential AST definition for the added primitives, +@racket[cond], and @racket[case]: + +@#reader scribble/comment-reader +(racketblock +;; type Expr = +;; ... +;; | (Cond [Listof CondClause] Expr) +;; | (Case Expr [Listof CaseClause] Expr) + +;; type CondClause = (Clause Expr Expr) +;; type CaseClause = (Clause [Listof Datum] Expr) + +;; type Datum = Integer | Boolean + +;; type Op = +;; ... +;; | 'abs | '- | 'not + +(struct Cond (cs e) #:prefab) +(struct Case (e cs el) #:prefab) +(struct Clause (p b) #:prefab) +) + +There are two new kinds of expression constructors: @racket[Cond] and +@racket[Case]. A @racket[Cond] AST node contains a list of +cond-clauses and an expression, which is the expression of the @racket[else] +clause. Each cond-clause is represented by a @racket[Clause] +structure containing two expressions: the left-hand-side of the +clause which is used to determine whether the right-hand-side is +evaluated, and the right-hand-side expression. + +The @racket[Case] AST node contains three things: an expression that +is the subject of the dispatch (i.e. the expression that is evaluated +to determine which clause should be taken), a list of case-clauses +(not to be confused with cond-clauses), and an @racket[else]-clause +expression. Each case-clause, like a cond-clause, consists of two +things. Hence we re-use the @racket[Clause] structure, but with +different types of elements. The first element is a list of +@emph{datums}, each being either an integer or a boolean. 
+ +Now, we won't go so far as to @emph{give} you the code for +@racket[parse], but we can give you some examples: -@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{From Dupe+ to Fraud+} +@itemlist[ -Implement the @racket[abs], unary @racket[-], and @racket[not] operations and -the @racket[cond] and @racket[case] forms from -@seclink["Assignment 3"]{Assignment 3} by modifying @tt{interp.rkt}, -@tt{interp-prim.rkt}, @tt{compile.rkt}, and @tt{compile-ops.rkt}. You can -start from your previous code, but you will need to update it to work for the -structures provided. What's essentially left for you to do is to make sure to -correctly signal an error (@racket['err]) when these constructs are -applied to the wrong type of argument. +@item{@racket[(abs 1)] parses as @racket[(Prim1 'abs (Lit 1))],} -While you're at it, implement the predicates @racket[integer?] and -@racket[boolean?] for checking the type of an argument, modeled by the -@racket[char?] predicate that was covered in the lectures. +@item{@racket[(not #t)] parses as @racket[(Prim1 'not (Lit #t))],} +@item{@racket[(cond [else 5])] parses as @racket[(Cond '() (Lit 5))],} -@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{From Binary to Variadic Addition} +@item{@racket[(cond [(not #t) 3] [else 5])] parses as @racket[(Cond +(list (Clause (Prim1 'not (Lit #t)) (Lit 3))) (Lit 5))],} -In Fraud, we implemented a binary operation for addition. However, Racket -supports an arbitrary number of arguments for @racket[+]. Your job is to extend -the interpreter and compiler to behave similarly. 
+@item{@racket[(cond [(not #t) 3] [7 4] [else 5])] parses as +@racket[(Cond (list (Clause (Prim1 'not (Lit #t)) (Lit 3)) (Clause +(Lit 7) (Lit 4))) (Lit 5))],} +@item{@racket[(case (add1 3) [else 2])] parses as @racket[(Case (Prim1 +'add1 (Lit 3)) '() (Lit 2))].} -@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{Generalizing Let} +@item{@racket[(case 4 [(4) 1] [else 2])] parses as @racket[(Case (Lit +4) (list (Clause (list 4) (Lit 1))) (Lit 2))],} -The Fraud language has a @tt{let} form that binds a single variable in the -scope of some expression. This is a restriction of the more general form of -@racket[let] that binds any number of expressions. So, for example, +@item{@racket[(case 4 [(4 5 6) 1] [else 2])] parses as @racket[(Case (Lit +4) (list (Clause (list 4 5 6) (Lit 1))) (Lit 2))], and} -@racketblock[ -(let ((x 1) (y 2) (z 3)) - _e) +@item{@racket[(case 4 [(4 5 6) 1] [(#t #f) 7] [else 2])] parses as @racket[(Case (Lit +4) (list (Clause (list 4 5 6) (Lit 1)) (Clause (list #t #f) (Lit 7))) (Lit 2))].} ] -simultaneously binds @racket[x], @racket[y], and @racket[z] in the scope of -@racket[_e]. -The syntax of a @racket[let] expression allows any number of binders to occur, -so @racket[(let () _e)] is valid syntax and is equivalent to @racket[_e]. - -The binding of each variable is only in-scope within the body, @bold{not} in -the right-hand sides of any of the @racket[let]. So, for example, -@racketblock[(let ((x 1) (y x)) 0)] is a syntax error because the occurrence of -@racket[x] is not bound. +@section[#:tag-prefix "a4-" #:style 'unnumbered]{Testing} +You can test your code in several ways: -@subsection[#:tag-prefix "a4-" #:style 'unnumbered]{Back-Referencing Let} +@itemlist[ -Similar to @racket[let], there is also @racket[let*] that can also bind any -number of expressions. The difference is that previous bindings are available -in the right-hand sides of subsequent bindings. 
For example, + @item{Using the command line @tt{raco test .} from + the directory containing the repository to test everything.} -@racketblock[ -(let* ((x 1) (y 2) (z (add1 y))) - _e) + @item{Using the command line @tt{raco test } to + test only @tt{}.} ] -binds @racket[x] to 1, @racket[y] to 2, and @racket[z] to 3 in -the scope of @racket[_e]. - -The syntax of a @racket[let*] expression allows any number of binders to occur, -so @racket[(let* () _e)] is valid syntax and is equivalent to @racket[_e]. +Note that only a small number of tests are given to you, so you should +write additional test cases. -Unlike @racket[let], @racketblock[(let* ((x 1) (y x)) 0)] is @emph{not} a -syntax error. However, bindings are only available forward, so -@racketblock[(let* ((x y) (y 1)) 0)] @emph{is} a syntax error. +@section[#:tag-prefix "a4-" #:style 'unnumbered]{Submitting} -HINT: Think about what a lazy compiler writer would do. +To submit, use @tt{make} from within the @tt{dupe-plus} directory to +create a zip file containing your work and submit it to Gradescope. diff --git a/www/assignments/5.scrbl b/www/assignments/5.scrbl index 8b044bb7..d754d1fd 100644 --- a/www/assignments/5.scrbl +++ b/www/assignments/5.scrbl @@ -1,127 +1,47 @@ #lang scribble/manual @(require "../defns.rkt") -@title[#:tag "Assignment 5" #:style 'unnumbered]{Assignment 5: Patterns} - -@(require (for-label (except-in racket ...))) -@(require "../notes/ev.rkt" - "../notes/utils.rkt") +@title[#:tag "Assignment 5" #:style 'unnumbered]{Assignment 5: When and unless} +@(require (for-label a86 (except-in racket ...))) @bold{Due: @assign-deadline[5]} -The goal of this assignment is to extend a compiler with new pattern -matching forms for matching lists, vectors, and predicates. - -You are given a file @tt{knock-plus.zip} on ELMS with a starter -compiler similar to the @seclink["Knock"]{Knock} language we studied -in class. You are tasked with: +Details of this assignment will be released later in the semester. 
-@itemlist[ +@;{ +The goal of this assignment is to extend the language developed in +@secref{Extort} with new forms of control flow expressions: +@racket[when]- and @racket[unless]-expressions. -@item{implementing the @tt{list} pattern,} +@section[#:tag-prefix "a5-" #:style 'unnumbered]{Extort+} -@item{implementing the @tt{vector} pattern, and} +The Extort+ language extends Extort in the following ways: -@item{implementing the @tt{?} pattern.} +@itemlist[ +@item{adding @racket[when],} +@item{adding @racket[unless], and} +@item{bringing forward all the features of Dupe+.} ] -Unlike previous assignments, you do not need to bring forward your -past features to this language; there is no need to implement -@racket[cond], @racket[case], etc. -The following files have already been updated for you @bold{and should -not be changed by you}: +@section[#:tag-prefix "a5-" #:style 'unnumbered]{Testing} -@itemlist[ @item{@tt{ast.rkt}} - @item{@tt{parse.rkt}} - @item{@tt{interp.rkt}} - @item{@tt{interp-prim.rkt}} - @item{@tt{compile-op.rkt}} -] +You can test your code in several ways: -So you will only need to modify: @itemlist[ -@item{@tt{compile.rkt}} -] -to correctly implement the new features. These features are described below. - -As a convenience, two new n-ary primitives have been added (and fully -implemented): @racket[list] and @racket[vector]. The @racket[list] -primitive takes any number of arguments and produces a list containing -the arguments as elements; the @racket[vector] primitive does the -same, but constructs a vector. - -@ex[ -(list) -(list 1 2 3) -(list 1 #t #\c) -(vector) -(vector 1 2 3) -(vector 1 #t #\c)] - -These are not directly useful in implementing the patterns above, but -do make it easier to write examples and tests. - -@section[#:tag-prefix "a5-" #:style 'unnumbered #:tag "list"]{List patterns} - -The @racket[(list _p1 ... _pn)] pattern matches a list of elements. 
The -pattern matches a list with as many elements as there are patterns -@racket[_p1] through @racket[_pn] and each element must match the -respective pattern. - - -@ex[ -(match (list) - [(list) #t] - [_ #f]) -(match (list 1 2 3) - [(list x y z) x]) -(match (list (list 1) (list 2)) - [(list (list x) (list 2)) x]) -] - -@section[#:tag-prefix "a5-" #:style 'unnumbered #:tag "vector"]{Vector patterns} - -The @racket[(vector _p1 ... _pn)] pattern matches a vector of elements. The -pattern matches a vector with as many elements as there are patterns -@racket[_p1] through @racket[_pn] and each element must match the -respective pattern. + @item{Using the command line @tt{raco test .} from + the directory containing the repository to test everything.} -@ex[ -(match (vector) - [(vector) #t] - [_ #f]) -(match (vector 1 2 3) - [(vector x y z) x]) -(match (vector (vector 1) (vector 2)) - [(vector (vector x) (vector 2)) x]) -] - -@section[#:tag-prefix "a5-" #:style 'unnumbered #:tag "vector"]{Predicate patterns} - -The @racket[(? _f)] pattern matches any value for which the predicate -@racket[_f] returns a true value (any value other than @racket[#f]) -when applied to the value being matched. In Knock+, @racket[_f] must be -the name of a user defined function. - -@ex[ -(define (is-eight? x) (= x 8)) -(define (id x) x) - -(match 8 - [(? is-eight?) #t] - [_ #f]) -(match (vector 1 2 3) - [(and (? id) x) x]) -(match 16 - [(? is-eight?) #t] - [_ #f]) + @item{Using the command line @tt{raco test } to + test only @tt{}.} ] +Note that only a small number of tests are given to you, so you should +write additional test cases. @section[#:tag-prefix "a5-" #:style 'unnumbered]{Submitting} -Submit a zip file containing your work to Gradescope. Use @tt{make -submit.zip} from within the @tt{knock-plus} directory to create a zip -file with the proper structure. +To submit, use @tt{make} from within the code directory to create a +zip file containing your work and submit it to Gradescope. 
+} \ No newline at end of file diff --git a/www/assignments/6.scrbl b/www/assignments/6.scrbl index ab2293b4..6fa300c9 100644 --- a/www/assignments/6.scrbl +++ b/www/assignments/6.scrbl @@ -1,88 +1,9 @@ #lang scribble/manual -@title[#:tag "Assignment 6" #:style 'unnumbered]{Assignment 6: Squid Game} +@(require "../defns.rkt") +@title[#:tag "Assignment 6" #:style 'unnumbered]{Assignment 6: Binding many variables} -@(require (for-label (except-in racket ...))) -@(require "../notes/ev.rkt" - "../notes/utils.rkt") +@(require (for-label a86 (except-in racket ...))) -@bold{Due: Monday, July 3, 11:59PM EST} +@bold{Due: @assign-deadline[6]} -The goal of this assignment is to hone your testing skills. - -@section[#:tag-prefix "a6-" #:style 'unnumbered #:tag "game"]{The Game} - -The autograder for this assignment includes a collection of compilers -that implement @secref["Assignment 5"] and a reference interpreter. - -You must submit a list of programs that will be run on each compiler. -If a compiler produces a result that is inconsistent with the -reference interpreter, it is eliminated. Your goal is to construct -a set of test programs that eliminate the largest number of compilers. -The player that eliminates the largest number of compilers, wins. - -Note that the notion of correctness we're using is the same one we've -been using all semester: if the interpreter crashes when evaluating a -program, that program has unspecified behavior and therefore the -compiler cannot be incorrect for that program. On the other hand if -the interpreter produces an answer (either a value or the error -result), then the compiler is obligated to produce the same answer. - -When you submit, choose a name to display on the leaderboard. It does -not need to be your real name, but please keep it appropriate for this -setting. - -After submitting, click "Leaderboard" to see the latest standings. - -There are 59 compilers included. 
Your score will be 15 + 2.5 times -the number of compilers you are able to eliminate, with a maximum -score of 100. - -We reserve the right to update the reference interpreter and will -announce any changes on Discord. - -The following updates have been made since the release: - -@itemlist[ - -@item{The interpreter checks for integer overflow and crashes when -this happens, thereby making overflow behavior unspecified for the compilers.} - -@item{The interpreter crashes when interpreting unbound variables, - making unbound variable behavior unspecified.} - -] - -Submissions should be written using the following format: - -@codeblock|{ -#lang info -(define programs - (list - '[ (add1 1) ] - '[ (write-byte 97) ] - '[ (define (f x) (+ x x)) (f 5) ])) -}| - -If you'd like to include a program reads data from the standard input -port, you can add an enties which are two-element lists, where the first -element is a string that is used as the contents of the input port -and the second element is the program, for example: - -@codeblock|{ -#lang info -(define programs - (list - '[ (add1 1) ] - '[ (write-byte 97) ] - '[ "abc" [ (read-byte) ]] - '[ (define (f x) (+ x x)) (f 5) ])) -}| - - -You may add as many programs as you'd like to the file. - - -@section[#:tag-prefix "a6-" #:style 'unnumbered]{Submitting} - -You should submit on Gradescope. You should a single file named -@tt{info.rkt} that conforms to the format shown above. +Details of this assignment will be released later in the semester. 
diff --git a/www/assignments/7.scrbl b/www/assignments/7.scrbl index 47f65396..0eb18c47 100644 --- a/www/assignments/7.scrbl +++ b/www/assignments/7.scrbl @@ -1,233 +1,9 @@ #lang scribble/manual -@title[#:tag "Assignment 7" #:style 'unnumbered]{Assignment 7: Symbols, interning, and gensym} +@(require "../defns.rkt") +@title[#:tag "Assignment 7" #:style 'unnumbered]{Assignment 7: Binding sequentially, n-ary primitives} -@(require (for-label (except-in racket ...))) +@(require (for-label a86 (except-in racket ...))) -@(require "../notes/ev.rkt") - -@bold{Due: Tues, Nov 12, 11:59PM} - -@(define repo "https://classroom.github.com/a/5UM2CXXa") - -The goal of this assignment is to (1) implement symbols and the -@racket[eq?] primitive operation, (2) to implement symbol interning by -program transformation. - -Assignment repository: -@centered{@link[repo repo]} - -You are given a repository with a starter compiler similar to the -@seclink["Loot"]{Loot} language we studied in class. - -The given code also implements all the ``plus'' features we've -developed in past assignments. - -@section[#:tag-prefix "a7-" #:style 'unnumbered]{Symbols} - -Your first task is to implement symbols for the Loot+ language. -You've used symbols extensively throughout the semester, so their use -should be familiar to you. A symbol evaluates to itself: - -@ex[ -'foo -] - -Your first task is to implement a symbol data type. The given code -includes syntax checking for programs that may contain symbols and -run-time support for printing symbols. The compiler has been stubbed -for compiling symbols. You will need to implement -@racket[compile-symbol] in @tt{compile.rkt}. - -A symbol can be represented much like a string: as a continuous -sequence of characters in memory, along with a length field. The type -tag is different, since strings and symbols should be disjoint data -types. - -Once you implement @racket[compile-symbol], you should be able to -write programs that contain symbols. 
- -@section[#:tag-prefix "a7-" #:style 'unnumbered]{Pointer equality} - -Your next task is to implement the @racket[eq?] primitive operation, -which compares two values for pointer equality. Immediate values -(characters, integers, booleans, empty list, etc.) should be -pointer-equal to values that are ``the same.'' So for example: - -@ex[ -(eq? '() '()) -(eq? 5 5) -(eq? #\a #\a) -(eq? #\t #\t) -] - -On the other hand, values that are allocated in memory such as boxes, -pairs, procedures, etc., are only @racket[eq?] to each other if they -are allocated to the same location in memory. So for example, the -following could all produce @racket[#f]: - -@ex[ -(eq? (λ (x) x) (λ (x) x)) -(eq? (cons 1 2) (cons 1 2)) -(eq? (box 1) (box 1)) -] - -However these must be produce @racket[#t]: - -@ex[ -(let ((x (λ (x) x))) - (eq? x x)) -(let ((x (cons 1 2))) - (eq? x x)) -(let ((x (box 1))) - (eq? x x)) -] - -Applying @racket[eq?] to any two values from disjoint data types -should produce @racket[#f]: - -@ex[ -(eq? 0 #f) -(eq? #\a "a") -(eq? '() #t) -(eq? 'fred "fred") -] - -The given compiler is stubbed for the @racket[eq?] primitive. You -must implement @racket[compile-eq?]. - -@section[#:tag-prefix "a7-" #:style 'unnumbered]{Interning symbols} - -One thing you may notice at this point is that because symbols are -allocated in memory, the behavior @racket[eq?] with your compiler -differs from Racket's behavior. - -In Racket, two symbols which are written the same way in a given -program are @racket[eq?] to each other. - -@ex[ -(eq? 'x 'x) -] - -But your compiler will (probably) produce @racket[#f]. - -The problem is that Racket ``interns'' symbols, meaning that all -occurrences of a symbol are allocated to the same memory location. -(Languages like Java also do this with string literals.) - -Extend your compiler so that @racket[eq?] behaves correctly on -symbols. Note, you should @emph{not change the way @racket[eq?] 
-works}, rather you should change how symbols are handled by the -compiler. - -The most effective way to implement symbol interning is to apply a -program transformation to the given program to compile. This -transformation should replace multiple occurrences of the same symbol -with a variable that is bound to that symbol, and that symbol should -be allocated exactly once. - -So for example, - -@racketblock[ -(eq? 'fred 'fred) -] - -could be transformed to: - -@racket[ -(let ((x 'fred)) - (eq? x x)) -] - -The latter should result in @racket[#t] since the @racket['fred] -symbol is allocated exactly once. - -The compiler uses a @racket[intern-symbols] function, which does -nothing in the given code, but should be re-defined to perform the -symbol interning program transformation. Note: you probably want to -define a few helper functions to make @racket[intern-symbols] work. - -@section[#:tag-prefix "a7-" #:style 'unnumbered]{Generating symbols} - -Finally, implement the @racket[gensym] primitive, which generates a -symbol distinct from all other symbols. - -To keep things simple, you should implement the nullary version of -@racket[gensym], i.e. it should take zero arguments and produce a new -symbol. - -The following program should always produce @racket[#f]: - -@ex[ -(eq? (gensym) (gensym)) -] - -But the following should always produce @racket[#t]: - - -@ex[ -(let ((x (gensym))) - (eq? x x)) -] - -Note: Racket's @racket[gensym] will generate a new name for a symbol, -usually something like @racket['g123456], where each successive call -to @racket[gensym] will produce @racket['g123457], @racket['g123458], -@racket['g123459], etc. Yours does not have to do this (although it's -fine if it does). All that matters is that @racket[gensym] produces a -symbol that is not @racket[eq?] to any other symbol but itself. 
- -@section[#:tag-prefix "a7-" #:style 'unnumbered]{Bonus} - -Should you find yourself having completed the assignment with time to -spare, you could try implementing @racket[compile-tail-apply], which -compiles uses of @racket[apply] that appear in tail position. It is -currently defined to use the non-tail-call code generator, which means -@racket[apply] does not make a proper tail call. - -Keep in mind that this language, the subexpression of @racket[apply] -are arbitrary expressions: @racket[(apply _e0 _e1)] and that -@racket[_e0] may evaluate to a closure, i.e. a function with a saved -environment. Moreover, the function may have been defined to have -variable arity. All of these issues will conspire to make tail calls -with @racket[apply] tricky to get right. - -This isn't worth any credit, but you might learn something. - -@section[#:tag-prefix "a7-" #:style 'unnumbered]{Testing} - -You can test your code in several ways: - -@itemlist[ - - @item{Using the command line @tt{raco test .} from - the directory containing the repository to test everything.} - - @item{Using the command line @tt{raco test } to - test only @tt{}.} - - @item{Pushing to github. You can - see test reports at: - @centered{@link["https://travis-ci.com/cmsc430/"]{ - https://travis-ci.com/cmsc430/}} - - (You will need to be signed in in order see results for your private repo.)}] - -Note that only a small number of tests are given to you, so you should -write additional test cases. - -@bold{There is separate a repository for tests!} When you push your -code, Travis will automatically run your code against the tests. If -you would like to run the tests locally, clone the following -repository into the directory that contains your compiler and run -@tt{raco test .} to test everything: - -@centered{@tt{https://github.com/cmsc430/assign07-test.git}} - -This repository will evolve as the week goes on, but any time there's -a significant update it will be announced on Piazza. 
- -@section[#:tag-prefix "a7-" #:style 'unnumbered]{Submitting} - -Pushing your local repository to github ``submits'' your work. We -will grade the latest submission that occurs before the deadline. +@bold{Due: @assign-deadline[7]} +Details of this assignment will be released later in the semester. diff --git a/www/assignments/8.scrbl b/www/assignments/8.scrbl new file mode 100644 index 00000000..7cb3f28e --- /dev/null +++ b/www/assignments/8.scrbl @@ -0,0 +1,9 @@ +#lang scribble/manual +@(require "../defns.rkt") +@title[#:tag "Assignment 8" #:style 'unnumbered]{Assignment 8: List primitives} + +@(require (for-label a86 (except-in racket ...))) + +@bold{Due: @assign-deadline[8]} + +Details of this assignment will be released later in the semester. diff --git a/www/assignments/9.scrbl b/www/assignments/9.scrbl new file mode 100644 index 00000000..d9b501b0 --- /dev/null +++ b/www/assignments/9.scrbl @@ -0,0 +1,9 @@ +#lang scribble/manual +@(require "../defns.rkt") +@title[#:tag "Assignment 9" #:style 'unnumbered]{Assignment 9: Functions with default arguments} + +@(require (for-label a86 (except-in racket ...))) + +@bold{Due: @assign-deadline[9]} + +Details of this assignment will be released later in the semester. 
diff --git a/www/defns.rkt b/www/defns.rkt index 625ca7b0..51614b27 100644 --- a/www/defns.rkt +++ b/www/defns.rkt @@ -53,9 +53,14 @@ (define feedback "https://forms.gle/99yTz7HVfopCaDMz9") (define (assign-deadline i) - (list-ref '("Tuesday, September 10, 11:59PM" - "Thursday, September 12, 11:59PM" - "Thursday, October 3, 11:59PM" - "Thursday, October 31, 11:59PM" - "Tuesday, November 26, 11:59PM") + (list-ref '("Thursday, September 11, 11:59PM" + "Thursday, September 18, 11:59PM" + "Thursday, September 25, 11:59PM" + "Thursday, October 2, 11:59PM" + "Thursday, October 9, 11:59PM" + "Thursday, October 23, 11:59PM" + "Thursday, October 30, 11:59PM" + "Thursday, November 6, 11:59PM" + "Thursday, November 20, 11:59PM" + "Thursday, December 4, 11:59PM") (sub1 i))) diff --git a/www/main.scrbl b/www/main.scrbl index 1e2a793c..00ca86ce 100644 --- a/www/main.scrbl +++ b/www/main.scrbl @@ -40,8 +40,8 @@ implement several related languages. #;(list prof2 prof2-email) staff)] -@bold{Office hours:} @office-hour-location +@;{ @tabular[#:style 'boxed #:row-properties '(bottom-border ()) (list (list @bold{Time} @bold{Monday} @bold{Tuesday} @bold{Wednesday} @bold{Thursday} @bold{Friday}) @@ -55,6 +55,7 @@ implement several related languages. (list "3 PM" 'cont 'cont 'cont 'cont 'cont) (list "4 PM" 'cont 'cont 'cont 'cont 'cont) (list "5 PM" 'cont 'cont 'cont 'cont 'cont))] +} @bold{Communications:} @link[@elms-url]{ELMS}, @link[@piazza]{Piazza} @@ -71,6 +72,32 @@ class via ELMS. @bold{Feedback:} We welcome anonymous feedback on the course and its staff using this @link[feedback]{form}. 
+@bold{TA office hours:} @office-hour-location + +@itemlist[ + #:style 'compact + @item{Monday + @itemlist[ + @item{9:00–12:00 — Zhonqi} + @item{12:00–3:00 — Ben} + @item{3:30–6:30 — Kalyan}]} + @item{Tuesday + @itemlist[ + @item{11:00–1:00 — Kazi}]} + @item{Wednesday + @itemlist[ + @item{9:00–12:00 — Zhonqi} + @item{12:00–3:00 — Ben} + @item{3:30–6:30 — Kalyan}]} + @item{Thursday + @itemlist[ + @item{11:00–1:00 — Kazi}]} + @item{Friday + @itemlist[ + @item{11:00–1:00 — Kazi} + @item{1:00–4:00 — Kalyan}]} +] + @include-section{syllabus.scrbl} @include-section{texts.scrbl} @include-section{schedule.scrbl} diff --git a/www/midterms/1.scrbl b/www/midterms/1.scrbl index fce52762..048009bb 100644 --- a/www/midterms/1.scrbl +++ b/www/midterms/1.scrbl @@ -12,17 +12,17 @@ its due date. @section{Practice} -There is a practice midterm available on ELMS as @tt{m1-PRACTICE.zip}. -You may submit to the Practice Midterm 1 assignment on Gradescope to -get feedback on your solution. However during the real midterm, you -will not get this level of feedback from the autograder. @bold{Make -sure you do not submit your practice midterm solution for the real -midterm! We will not allow late submissions if you submit the wrong -work.} +There is a practice midterm available on ELMS. You may submit to the +Practice Midterm 1 assignment on Gradescope to get feedback on your +solution. However during the real midterm, you will not get this +level of feedback from the autograder. @bold{Make sure you do not +submit your practice midterm solution for the real midterm! We will +not allow late submissions if you submit the wrong work.} @section{Instructions} -The midterm will be released as a zip file @tt{m1.zip} on ELMS. +The midterm will be released as a zip file on ELMS (see the +description of Midterm 1 there for the link). There are several parts to this midterm. Each part has its own directory with a README and supplementary files. 
Read the README in diff --git a/www/midterms/2.scrbl b/www/midterms/2.scrbl index 596bbe9a..130a7253 100644 --- a/www/midterms/2.scrbl +++ b/www/midterms/2.scrbl @@ -14,15 +14,17 @@ its due date. @section{Practice} -There is a practice midterm from Summer 2023 available on ELMS as -@tt{m2-summer-2023.zip}. You may submit to the Practice Midterm 1 - -Summer 2023 assignment on Gradescope to get feedback on your solution. -However during the real midterm, you will not get this level of -feedback from the autograder. +There is a practice midterm available on ELMS. You may submit to the +Practice Midterm 1 assignment on Gradescope to get feedback on your +solution. However during the real midterm, you will not get this +level of feedback from the autograder. @bold{Make sure you do not +submit your practice midterm solution for the real midterm! We will +not allow late submissions if you submit the wrong work.} @section{Instructions} -The midterm will be released as a zip file @tt{m1.zip} on ELMS. +The midterm will be released as a zip file on ELMS (see the +description of Midterm 1 there for the link). There are several parts to this midterm. Each part has its own directory with a README and supplementary files. Read the README in diff --git a/www/notes/1/what-is-a-compiler.scrbl b/www/notes/1/what-is-a-compiler.scrbl index 494e8624..ec6aed1d 100644 --- a/www/notes/1/what-is-a-compiler.scrbl +++ b/www/notes/1/what-is-a-compiler.scrbl @@ -6,6 +6,290 @@ @title[#:tag "Intro"]{What @emph{is} a Compiler?} + +@section{Introduction} + +Welcome to CMSC 430, An Introduction to Compilers. Ostensibly this +class is about @emph{compilers}. It's right there in the name. And +yes, this course is very much about compilers, but really this course +is about the design and implementation of programming languages. +Compilers just happen to be one possible implementation strategy. 
+ +@section{What is a Programming Language?} + +It is worth taking a moment and thinking about what exactly @emph{is} +a programming language? At this point in your life, you've probably +used several different programming languages (we make you, after all). +You've written programs in C, Java, Rust, OCaml, and likely several +others. You've maybe used these things without ever really +reflecting on what they are. So do that for a moment. + +What did you come up with? + +I start every semester by asking students to answer this question and +I get a variety of responses. Here are a few: + +A programming language: +@itemlist[ +@item{"is an interface with the computer"} +@item{"is a human readable way to create computer instructions"} +@item{"is a combination of syntax and semantics for creating behavior"} +@item{"a formal language (a set of strings) that can describe any Turing machine"} +@item{"is a toolbox that prioritizes certain methods of solving problems"} +@item{"is a mechanism for communicating computational ideas with other people"} +] + +Let's try to synthesize something coherent out of these different +perspectives. + +Formally speaking, we can characterize a programming language by the +set of programs in that language. For example, we might say the C +programming language is, at its core, the set of all possible C +programs that could be written. This is the way one might +characterize a language in a theory of computation course. If we +wanted to rigorously define this set, we could provide a formal +recognizer such as a finite automaton or even a Turing machine that, +given a string, determines membership in the set of programs. This +aspect of a language we call its @bold{syntax}: it concerns the rules +governing the formation of phrases in the language. While important, +syntax is not the end-all of a programming language; in fact it is really +just the start of where the rubber meets the road.
For example, when +writing a C program we mostly don't care about whether the thing we +are writing is or is not in the set of C programs --- it being a C +program is a prerequisite to the thing we want to do, which is to +@emph{compute}. We write programs so that we can run them. Sure, you +might make a syntax error, forget to put a semicolon here or misspell +an identifier there, but the thing you really care about comes after +those issues have been resolved. So another aspect of what a +programming language "is" concerns the @emph{meaning} of the sentences +in that language. Every language has meaning. In the context of +programming languages, the meaning is a computation. This aspect of a +language we call its @bold{semantics}: it concerns the computational +content of programs in the language. + +There are many ways we might define the meaning of a program, and +thereby define the semantics of the language. We could give examples, +e.g. "@tt{3 + 4} computes @tt{7}." We could informally describe what +expressions of the language should compute, e.g. "@tt{n + m} computes the +sum of @tt{n} and @tt{m}". We could make more rigorous formal +definitions, appealing to mathematical notation and concepts. We +could write a program (perhaps in a different language) that +@emph{interprets} expressions and computes their meaning. Typically +some combination of all of these approaches is used. + +But all of this @emph{defines} what a programming language is, which +is not the same as @emph{realizing} that definition. For that we need +to implement the language, meaning we need to construct a universal +program that, given any element of the language, carries out the +computation defined by the meaning of that program. + +Of course this is all a very formal view of what a programming +language is. Arguably more important are other more human-centered +perspectives. A programming language is a human-made language for +conveying computational ideas, both to machines and to other people.
+Programming language design can enable (or prohibit) effective +expression of those ideas. + +Programming languages have evolved over time from being very concerned +with low-level, machine-oriented details to higher-level abstractions, +freeing the programmer from those low-level details and often making +it easier to write better programs. Language design can eliminate +whole classes of mistakes that might be possible in other languages. + +@subsection{What is a Compiler?} + +All that and no mention of compilers. OK, so if we have a sense of +what a programming language is now, what is a compiler? You've +probably used several before but maybe never reflected too deeply on +what exactly they were. Take a moment and think about what being a +compiler means to you. + +Compilers are a particular implementation strategy for realizing a +programming language. At their heart, compilers employ a fundamental +technique of computer science for solving problems, they perform a +@emph{reduction}. In order to compute the meaning of a program, a +compiler translates that program into a program in a different +language and then uses the realization of that language to compute the +answer to the original program. This is in contrast to an +@emph{interpreter} which computes the meaning of a program directly. + +There's something lurking here that may be causing you some +discomfort. Let's say you have a compiler that translates programs in +language A into programs in language B. How does that help you run +your language A programs? Well you just run your language B program +and what it computes will be the thing that the language A program +computes (assuming the compiler performed a correct translation). But +you may be asking yourself, how do you compute the language B +program!? It's not clear you're any closer to the thing you want. +Well, maybe you have a compiler from B to C. And D to E. Uh oh. OK, +well maybe at some point you have an interpreter. 
Let's say you have +an interpreter for language E. After chaining all these compilers +together, you have a thing you can run directly by feeding it to the E +interpreter, which computes a result that is actually the result of the +original A program. Phew. + +But wait... the program that interprets E programs, it has to be +written in a language. Let's say it was written in F. How do we run F +programs? Maybe we compile to G, H, I, K. Maybe there's an interpreter +for K, written in L... Oh dear. We still have this seemingly endless +regress because in order to compute programs in one language we have +to know how to compute programs in another language: that other +language is either the target language of some compiler or the +implementation language of some interpreter. Where does this all +bottom out? + +One concrete answer to this question is: it bottoms out at the machine +language of your computer and its CPU. Eventually we hit a point +where there's a compiler that translates from some language into +machine code, or there's an interpreter that's written in machine code +(or we can compile the interpreter into machine code, etc.). The +machine code is just another programming language, but it's a language +that has an interpreter implemented in hardware. It runs programs by +physically carrying out the computation of the program, thus providing +a foundation for building up our computational house of cards. +Ultimately, @emph{all} of the computations your computer is doing are +carried out at the level of machine code interpreted by the CPU. So +even though you may be running Java, Python, or Rust code, what's +@emph{actually} happening is the CPU is running machine code. But +here's the magical thing: it doesn't really matter. Once you know +that you can run, say, Java code on your computer, you can view Java +as the foundation if you want.
If you wanted to develop a new +programming language, you could write a compiler that targets Java or +you could write an interpreter in Java. Your new language could run +anywhere that Java can run (which is essentially anywhere). One of +the great contributions of programming languages is that they allow us +to build the world we want to live in and leave the old one behind. +This is in fact @emph{why} compilers were invented in the first place. + + + + +An important observation to make: programming languages exist +independently of their implementations. Java is not @tt{javac}. C is +not @tt{gcc} (or @tt{clang}). Haskell is not @tt{ghc}. Compilers are a +particular implementation strategy for realizing a programming +language. Sometimes languages get conflated with their +implementations. And sometimes we go further and conflate properties +of the implementation with the language itself. This leads to people +saying things like "Python is an interpreted language; C is a compiled +language." Python is not an interpreted language; rather, its most +well-known implementation is an interpreter. Compilers for Python +exist, as do interpreters for C. + +@subsection{What will we do in this course?} + +We are going to study how to implement a programming language by +making our own programming language. In particular, we are going to +build a compiler for our language. + +The language we are going to build is a modern, high-level +general-purpose programming language.
It will have features like: + + +@itemlist[ +@item{built-in datatypes including integers, booleans, characters, strings, vectors, lists, and user-defined structures} +@item{first-class functions} +@item{pattern matching} +@item{memory safety} +@item{proper tail calls} +@item{automatic memory management} +@item{file I/O} +] + +Our compiler is going to target x86-64 machine code, meaning that +programs written in our high-level language can be run on any system +capable of running x86-64 code, either natively in hardware, or +through software simulation (e.g. Rosetta on Apple Silicon Macs). +x86-64 is one of the most ubiquitous instruction set architectures in +use today. It's also quite large, complicated, and old, with some +design constraints that date back over 4 decades as it has evolved +from its 16-bit precursor first released in 1978. The choice of +x86-64 as the target of our compiler is largely an arbitrary one. We +could easily have picked another instruction set architecture, like +ARM64, or targeted an intermediate representation like LLVM IR. In +choosing x86-64 we get a messy, "real", and bare-bones, low-level view +of computation and moving to any of these other similar settings would +be fairly easy to do. Alternatively, we could have targeted a +higher-level language such as C or Rust, or even JavaScript or OCaml. +The more high-level the language, the easier it will be to serve as a +target for the compiler, especially when the target language's +semantics are close to the source language's. Doing so could certainly +be interesting, but it would illuminate compilation only down to the +level of abstractions of the target language. By going all the way +down to machine code, we are able to understand the implementation of +our high-level language features completely down to the actions of the +CPU. There will be no more mystery.
+ +In order to side-step getting bogged down in making design choices, we +are going to implement (a subset of) an existing high-level language +called Racket. Racket is a member of the Lisp family of languages, a +descendant of the Scheme programming language, and a close cousin of +OCaml, which you've previously studied. This is largely an arbitrary +choice. We could have picked any programming language to build a +compiler for. In choosing Racket, we get a mature, well-designed +language with a minimal, easy to parse syntax and a well-understood +and clear semantics that has features representative of many modern +high-level programming languages such as OCaml, Java, Python, +JavaScript, Haskell, etc. If you can write a compiler for Racket, you +are well-equipped to write a compiler for any of those or similar +languages, or even to design your own new language. + +We will use an incremental approach to study and build compilers, +meaning we will start by specifying and implementing a very small +subset of Racket. Once specified, implemented, and tested, we will +enlarge the set of programs in the language, typically by adding some +new language feature. Doing so necessitates revisiting the +specification and implementation to handle this extended set of +programs. And then we rinse and repeat, growing the language from +something trivial into a fairly sophisticated language in which it's +possible to write interesting programs. Each iteration of the +language will be a strict superset of the previous one, meaning that +all programs in the previous language will continue to be programs in +the next language. The first iteration has nothing more than integer +literals and the final iteration contains a substantial subset of +Racket. + +In this course, we pay particular attention to the concept of a +specification, which defines what it means for our compiler to be +correct.
We will use testing extensively to gain some evidence that +our compiler is not incorrect. Our primary goal will be writing a +correct, maintainable compiler for a full-featured language. We will +not concern ourselves too much with efficiency or +optimization. First things first: we have to build something that +works. After that, we can think about making it work faster. + +We make extensive use of writing @emph{interpreters} as our form of +language specification. Interpreters, when written as clear, concise +code in a high-level language, provide a precise distillation of the +semantics of a programming language. They can also serve as a +reference implementation, used to validate other implementations of +the language. + +@section{Beyond Compilers} + +In my opinion programming languages are fascinating objects, worthy of +study in their own right. But there are reasons beyond compilers and +programming languages for taking this class. + +For any kind of craft it's worth reflecting on the tools one uses. +Doing so will make you better. It may open you up to use new tools or +existing tools in new ways. It may even allow you to design and +fabricate your own tools. As computer scientists, our primary +expression of computational ideas comes in the form of writing +programs. It's worth reflecting on the language in which we express +those ideas. + +But part of what's valuable about a course like this really has little +to do with programming languages or compilers. Compilers are +sophisticated artifacts. They have complex invariants and subtle +specifications. Getting them right is hard. Learning how to program +to a specification, to think about invariants, to write clear, concise +and well-tested code is valuable beyond just writing compilers. + + + + + A function that maps an @emph{input} string to an @emph{output} string.
diff --git a/www/schedule.scrbl b/www/schedule.scrbl index b28ba8ad..010eea92 100644 --- a/www/schedule.scrbl +++ b/www/schedule.scrbl @@ -4,8 +4,6 @@ @title[#:style 'unnumbered]{Schedule} -@;(TuTh 9:30-10:45, IRB 0318) - @(define (wk d) (nonbreaking (bold d))) @; for unreleased assignments, switch to seclink when ready to release @@ -25,8 +23,9 @@ (list @wk{9/2} #;"" - @secref["Intro"] - @elem{@secref["OCaml to Racket"]}) + @elem{No class} + @itemlist[@item{@secref["Intro"]} + @item{@secref["OCaml to Racket"]}]) (list @wk{9/9} diff --git a/www/syllabus.scrbl b/www/syllabus.scrbl index 40df3823..d5598ed6 100644 --- a/www/syllabus.scrbl +++ b/www/syllabus.scrbl @@ -20,7 +20,9 @@ @title[#:style 'unnumbered]{Syllabus} -@bold{Introduction to Compilers, CMSC 430} +@local-table-of-contents[] + +@section{Introduction to Compilers (@courseno)} @bold{Term:} @string-titlecase[semester], @year @@ -31,38 +33,151 @@ @bold{Office Hours:} By appointment. Send email or ELMS message to set up. -@bold{Prerequisite:} a grade of C or better in CMSC330; and permission -of department; or CMSC graduate student. +@bold{Credits:} 3 + +@bold{Course Dates:} From @start-date to @final-date + +@bold{Lectures:} +@lecture-schedule1, @classroom1 -@bold{Credits:} 3. @;{@bold{Lecture dates:} @lecture-dates} -@bold{Lectures:} -@lecture-schedule1, @classroom1 (@prof1-initials) -@bold{Course Description:} @courseno is an introduction to compilers. -Its major goal is to arm students with the ability to design, -implement, and extend a programming language. Throughout the course, -students will design and implement several related high-level -programming languages, building compilers that target the x86 CPU -architecture. +@section{Course Description} + +@;{ +Have you wondered what @emph{really} happens when you compile and run +a program? Have you ever had dreams of making your own programming +language, but thought that seems hard or even... mystical? 
Like maybe +it involves a kind of wizard-level ability that you don't have access +to, or perhaps even know about? Maybe the thought of making your own +programming language has never even occurred to you until reading this +just now. They have always seemed to be things given to you from on +high, right? Sure, you can choose from a bunch of existing options: +C, Rust, Java, PostScript, etc. But where did these choices come +from? How were they designed and built? And what if none of the +existing choices are quite what you want and you decide to strike out +on your own? + +As you read this, your mind is maybe spawning a background process +asking itself, what would even be involved in making my own +programming language? What even is a programming language, if you +really think about it? What are they made of? Are they made of the +same stuff as other kinds of programs? Are they like the pedestrian +kind I learned to write in school: the ones that shuffle lists or find +the shortest path between two vertices in a graph? If they are like +other programs, what are they written in? Some sort of programming +language ostensibly, but where did @emph{that} language come from? +What's @emph{it} made of? Where's the bottom? Is there a bottom? I +thought I was just going to read a syllabus and now I've lost my +computational footing. +} + +@courseno is an introduction to compilers. Its major goal is to arm +students with the ability to design, implement, and extend a +programming language. Throughout the course, students will design and +implement several related high-level programming languages, building +compilers that target the x86 CPU architecture. The course assumes familiarity with a functional programming such as OCaml from CMSC 330, and, to a lesser extent, imperative programming in C and Assembly as covered in CMSC 216. +@bold{Prerequisite:} a grade of C or better in CMSC330; and permission +of department; or CMSC graduate student.
-@bold{Course Structure:} The course will consist of -in-person lectures, which will be recorded and available on ELMS -immediately after each lecture. There are two midterms, a final -project, which counts as the final assessment for the class, several -assignments, and several quizes and surveys. Midterms are take-home -exams and completed online over a @|midterm-hours|-hour period. -@bold{Contents:} +@section{Learning Outcomes} -@local-table-of-contents[] +After successfully completing this course, you will be able to: + +@itemlist[ + +@item{Design a programming language.} + +@item{Implement a high-level programming language via interpretation in a high-level programming language.} + +@item{Implement a high-level programming language via compilation into a low-level programming language.} + +@item{Define and test a compiler's correctness.} + +@item{Analyze language design choices.} + +@item{Evaluate language implementation choices.} + +@item{Evolve a large software artifact with complex invariants and specifications.} + +] + +@section{Topics} + +The following list of lecture topics will vary according to the pace +of the course: + +@itemlist[ + @item{Overview of compilation} + @;item{Operational semantics} + @item{Interpreters} + @item{Intermediate representations and bytecode} + @item{Code generation} + @item{Run-time systems} + @item{Garbage collection} + @item{Type systems, type soundness, type inference} + @item{Register allocation and optimization} + @item{Language design} + @item{Advanced topics in compilation}] + +@;section{Required Resources} + + + + +@section{Course Structure} + +The course will consist of in-person lectures, which will be recorded +and available on ELMS immediately after each lecture. There are two +midterms, a final project, which counts as the final assessment for +the class, several assignments, and several quizzes and surveys. +Midterms are take-home exams and are completed online over a +@|midterm-hours|-hour period.
+ + +@section{Tips for Success in this Course} + +@itemlist[#:style 'numbered + +@item{@bold{Participate.} Engage deeply, ask questions, and talk about +the course content with your classmates. You can learn a great deal +from discussing ideas and perspectives with your peers and +professor. Participation can also help you articulate your thoughts +and develop critical thinking skills.} + +@item{@bold{Stay current.} This is a deeply cumulative class. If you +fall behind it will be challenging to get back on track. The moment +you disconnect with the material, seek help (from TAs, classmates, +prof, etc.).} + +@item{@bold{Go slow; you'll get there faster.} You will be asked to +implement several parts of a compiler. The code will not be large, +but it will require thought and precision. You will be served well to +make sure you know the material and have a clear idea of what you +intend to do before you start doing it. The other path is pain.} + +@item{@bold{Start early.} Thinking requires time and can't be rushed, +but if you spend the time thinking, it will ultimately be a huge time +saver. This means you have to start thinking early, which means +starting early.} + +@item{@bold{Submit often.} Things happen. Life goes sideways. Make +sure you submit your work as you go, as often as you can.} + +@item{@bold{Bring your curiosity.} The study of programming languages +is rooted in logic, philosophy, computation, algorithms, optimization, +and design. If you approach it with an open and curious mind, it will +be rewarding.} + +] @section{Policies and Resources for Undergraduate Courses} @@ -79,10 +194,10 @@ like: @item{Copyright and intellectual property} ] -Please visit -@link["https://www.ugst.umd.edu/courserelatedpolicies.html"]{https://www.ugst.umd.edu/courserelatedpolicies.html} -for the Office of Undergraduate Studies' full list of campus-wide -policies and follow up with the instructor if you have questions. 
+Please visit the +@link["https://www.ugst.umd.edu/courserelatedpolicies.html"]{Office of +Undergraduate Studies' full list of campus-wide policies} and follow +up with the course staff if you have questions. @section{Course Guidelines} @@ -105,7 +220,6 @@ will do their best to address and refer to all students accordingly, and we ask you to do the same for all of your fellow Terps. @bold{Communication with Instructor:} - Email: If you need to reach out and communicate with @prof1, please email at @|prof1-email|. Please DO NOT email questions that are easily found in the syllabus or on ELMS (i.e. When @@ -120,7 +234,6 @@ not miss any messages. You are responsible for checking your email and Canvas/ELMS inbox with regular frequency. @bold{Communication with Peers:} - With a diversity of perspectives and experience, we may find ourselves in disagreement and/or debate with one another. As such, it is important that we agree to conduct ourselves in a professional manner @@ -143,11 +256,9 @@ feel threatened, dismissed, or silenced at any point during the semester and/or if your engagement in discussion has been in some way hindered by the learning environment. -@;{HERE} - -@section{Office Hours} +@;{section{Office Hours}} -Office hours will be held online and in-person. Details TBD. +@;{TA office hours will be held online and in-person. 
Details TBD.} @;{Please make sure that you fill out @link["https://docs.google.com/spreadsheets/d/1sDCpekBHIGjVSuGDsabPb74wZ5nHA_sTLvIPOzTUQ4k/edit?usp=sharing"]{ @@ -172,24 +283,6 @@ discussion/questions/help regarding the material of the course, make sure that you keep that channel free from noise so that other students and course staff can easily see what issues are being brought up.} -@section{Topics} - -The following list of lecture topics will vary according to the pace -of the course: - -@itemlist[ - @item{Overview of compilation} - @;item{Operational semantics} - @item{Interpreters} - @item{Intermediate representations and bytecode} - @item{Code generation} - @item{Run-time systems} - @item{Garbage collection} - @item{Type systems, type soundness, type inference} - @item{Register allocation and optimization} - @item{Language design} - @item{Advanced topics in compilation}] - @section{Grades} All assessment scores will be posted on the course @@ -215,36 +308,34 @@ percentages: (list "Midterms (2)" "25%") (list "Final project" "15%")] -Final letter grades will be assigned based on the following cutoff -table: +Final letter grades are assigned following the University of +Maryland's "97 A+ Grading Scheme": + +@tabular[#:style 'boxed @;#:sep @;"|" @;@hspace[1] + (list (list "A+" "[100,100]" "B+" "(92,87]" "C+" "(80,77]" "D+" "(70,67]" " " " ") + (list "A" "(100,97]" "B" "(87,84]" "C" "(77,74]" "D" "(67,64]" "F" "(60,0]") + (list "A-" "(97,92]" "B-" "(84,80]" "C-" "(74,70]" "D-" "(64,60]" " " " "))] -@tabular[#:style 'boxed #:sep @hspace[1] - (list (list "A+" "97%" "C+" "77%" "D+" "67%" " " " ") - (list "A" "94%" "C" "74%" "D" "64%" "F" "<60%") - (list "A-" "90%" "C-" "70%" "D-" "60%" " " " "))] +This table uses interval notation, so "(@math{x},@math{y}]" means any +number less than @math{x} and greater than or equal to @math{y}. @section[#:tag "syllabus-videos"]{Videos} Lectures will be recorded and posted to ELMS shortly after every class. 
There are also prepared videos available covering the -material. - -These videos will be made available as the course +material. These videos will be made available as the course progresses. If there is ever any issue with accessing these videos, let the instructor know as soon as possible. @section[#:tag "syllabus-assignments"]{Assignments} -There will be several programming @secref{Assignments}, often with a full week -given for completion and submission (e.g. if it assigned on a Tuesday it will -be due the following Tuesday at 11:59pm EST unless otherwise noted). - -Assignments will be submitted through Gradescope. - -On the weeks were there are no programming assignments, there will be assigned -reading. +There will be several programming @secref{Assignments}, often with a +full week given for completion and submission (e.g. if it is assigned on +a Tuesday it will be due the following Tuesday at 11:59pm EST unless +otherwise noted). Assignments will be submitted through +@link[gradescope]{Gradescope}. @section[#:tag "syllabus-quiz"]{Quizzes & surveys} @@ -255,8 +346,8 @@ right to reject survey responses that are not considered thoughtful. @section[#:tag "syllabus-midterms"]{Midterms} There will be two @secref{Midterms}, which will be @bold{take-home} -exams. Exams will be distributed at least @|midterm-hours|-hours before the due -date of the midterm. +exams. Exams will be distributed at least @|midterm-hours| hours +before the due date of the midterm. @itemlist[ @item{Midterm 1: @bold{@m1-date}} @@ -278,32 +369,33 @@ before the due date. @section{Computing Resources} Programming projects can be developed on your own system and subitted -via Gradescope, which will provide virtual machines suitably -configured for running your code. All project submissions @bold{must} -work correctly on the Gradescope VMs, and your projects will be graded -solely based on their results on those machines.
Because language and -library versions may vary with the installation, in unfortunate -circumstances a program might work perfectly on your system but not -work at all on the VMs. Thus we strongly recommend that as you develop -any project, you should run it @bold{several days early} on Gradescope -to have time to address any compatibility problems. +via @link[gradescope]{Gradescope}, which will provide virtual machines +suitably configured for running your code. All project submissions +@bold{must} work correctly on the Gradescope VMs, and your projects +will be graded solely based on their results on those +machines. Because language and library versions may vary with the +installation, in unfortunate circumstances a program might work +perfectly on your system but not work at all on the VMs. Thus we +strongly recommend that as you develop any project, you should run it +@bold{several days early} on Gradescope to have time to address any +compatibility problems. @section{Outside-of-class communication with course staff} -Course staff will interact with students outside of class in primarily two -ways: office hours, and electronically via e-mail. The use of Piazza and/or -other classroom forums is allowed, and discussion amongst the students is -encouraged, as long as the discuss is @italic{about the concepts} and not -@italic{the solutions}. The majority of communication should be via office -hours. +Course staff will interact with students outside of class in primarily +two ways: office hours, and electronically via e-mail. The use of +@link[piazza]{Piazza} and/or other classroom forums is allowed, and +discussion amongst the students is encouraged, as long as the discussion +is @italic{about the concepts} and not @italic{the solutions}. @;{The +majority of communication should be via office hours.} Personalized assistance, e.g., with assignments or exam preparation, will be provided during office hours.
Office hours for the instructional staff will be posted on the course web page. Additional assistance will provided via discussion on -@link[@piazza]{Piazza}. You may use this forum to ask general +@link[piazza]{Piazza}. You may use this forum to ask general questions of interest to the class as a whole, e.g., administrative issues or problem set clarification questions. The course staff will monitor it on a daily basis, but do not expect immediate answers to @@ -414,8 +506,6 @@ All arrangements for exam accommodations as a result of disability three business days prior to the exam date; later requests (including retroactive ones) will be refused. - - @section{Academic Integrity} The Campus Senate has adopted a policy asking students to include the @@ -469,6 +559,18 @@ section of the program. } ] +@bold{AI tool disclosure:} If a student chooses to use an AI tool to +assist in any course work (e.g. assignments, programs, projects, +reports, etc.), they must disclose this information to the +instructor. This disclosure should include the name of the AI tool and +explain how it was used. + +Failure to adhere to this policy may result in a zero on the +particular course work where the AI tool is used. In addition, the +university honor code is applicable here: violations of the honor +code will be met with appropriate action. + + @bold{Violations of the Code of Academic Integrity may include, but are not limited to:} @@ -524,10 +626,38 @@ will not be extended upon in a later assignment. @section{Course Evaluations} If you have a suggestion for improving this class, don't hesitate to -tell the instructor or TAs during the semester. At the end of the -semester, please don't forget to provide your feedback using the -campus-wide @link["https://www.courseevalum.umd.edu/"]{CourseEvalUM} -system. Your comments will help make this class better. +tell the instructor or TAs during the semester. You may submit +feedback anonymously using this @link[feedback]{form}.
If you are +uncomfortable contacting the instructor or if issues are not addressed +to your satisfaction, you may use the +@link["https://www.cs.umd.edu/classconcern"]{CS Class Concern Form}. + +At the end of the semester, please provide your feedback using the +campus-wide @link["https://www.courseevalum.umd.edu/"]{Student +Feedback on Course Experiences} system. Your comments will help make +this class better. + +@section{Mandatory Reporting of Disclosures of Inappropriate Behavior} + +Instructors and teaching assistants are designated as Responsible +University Employees by the University and are required to promptly +notify the Title IX Coordinator when they become aware of any type of +sexual misconduct. @bold{They are not confidential resources.} + + +If you wish to speak with someone confidentially, please contact one +of UMD's confidential resources, such as CARE to Stop Violence +(located on the Ground Floor of the Health Center) at 301-741-3442 or +the Counseling Center (located at the Shoemaker Building) at +301-314-7651. + + +You may also seek assistance or supportive measures from UMD's Title +IX Coordinator, Angela Nastase, by calling 301-405-1142, or emailing +titleIXcoordinator@"@"umd.edu. To view further information on the above, +please visit the Office of Civil Rights and Sexual Misconduct’s +website at @link["https://ocrsm.umd.edu/"]{ocrsm.umd.edu}. + + @section{Right to Change Information} @@ -539,13 +669,14 @@ information on this syllabus or in other course materials. Such changes will be announced and prominently displayed at the top of the syllabus. -@section{Course Materials} +@section{Acknowledgments} Portions of the course materials are based on material developed by Ranjit Jhala and Joe Gibbs Politz.
-We gratefully acknowledge the work of past CMSC 430 TAs William Chung, -Pierce Darragh, Justin Frank, Vyas Gupta, Sankha Narayan Guria, Tasnim -Kabir, John Kastner, Yiyun Liu, Dhruv Maniktala, Christopher Maxey, -Deena Postol, Ivan Quiles-Rodriguez, Benjamin Quiring, Temur -Saidkhodjaev, Matvey Stepanov, Alex Taber. +We gratefully acknowledge the work of past CMSC 430 TAs Kaylan +Bhetwal, William Chung, Pierce Darragh, Justin Frank, Vyas Gupta, +Sankha Narayan Guria, Tasnim Kabir, John Kastner, Yiyun Liu, Dhruv +Maniktala, Christopher Maxey, Caspar Popova, Deena Postol, Ivan +Quiles-Rodriguez, Benjamin Quiring, Temur Saidkhodjaev, Emma Shroyer, +Matvey Stepanov, Alex Taber, Kazi Tasnim Zinat. From 8e62320a9ea5de63f4093942b265eb4b87996bad Mon Sep 17 00:00:00 2001 From: David Van Horn Date: Tue, 2 Sep 2025 10:01:59 -0400 Subject: [PATCH 17/17] Update README link. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index d40dc32e..24e888ea 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ University of Maryland, College Park. The current instance of this course is: -* http://www.cs.umd.edu/class/fall2024/cmsc430/ +* http://www.cs.umd.edu/class/fall2025/cmsc430/ Copyright © David Van Horn and José Manuel Calderón Trilla and Leonidas Lampropoulos