% -*- mode: Noweb; noweb-code-mode: caml-mode -*- % $Id: translate.nw,v 1.21 2005-03-08 03:36:22 govereau Exp $ % --------------------------------------------------------------------------- \section{Translation to Intermediate Representation} \label{sec:translate} % --------------------------------------------------------------------------- The [[Translate]] module transforms an abstract syntax tree (Section~\ref{sec:ast}) into the compiler's intermediate representation---a ``tree language'' expression (Section~\ref{sec:tree}). The [[Translate]] module is called by the [[Semantics]] module (Section~\ref{sec:semantics}) to transform expressions and statements after they have been type-checked. A translation function is defined for each of the different types of abstract syntax fragments that need to be translated. <>= type exp = Tree.exp type label = Tree.label val nil : exp val int_literal : int -> exp val str_literal : string -> exp val simple_var : Frame.frame -> Frame.access -> exp val field_var : exp -> int -> bool -> exp val subscript_var : exp -> exp -> bool -> int -> exp val assign : exp -> exp -> exp val call : Frame.frame -> label -> string option -> Frame.frame -> exp list -> label option -> bool -> exp val arithmetic : Ast.oper -> exp -> exp -> exp val compare_int : Ast.oper -> exp -> exp -> exp val compare_str : Ast.oper -> exp -> exp -> exp val ifexp : exp -> exp -> exp -> bool -> exp val loop : exp -> exp -> label -> exp val break : label -> exp val new_record : (exp * bool) list -> exp val new_array : exp -> exp -> bool -> exp val sequence : exp list -> exp val func : Frame.frame -> exp -> bool -> exp val try_block : exp -> label -> (int *exp) list -> exp val raise_exn : int -> exp val spawn : label -> exp @ % --------------------------------------------------------------------------- \subsection{Translation Implementation} % --------------------------------------------------------------------------- <>= module E = Error module A = Ast module S = Symbol module T = Tree module F = Frame type exp = T.exp type label = T.label let ws = Sys.word_size / 8 <> <> <> <> <> <> <> <> <> <> <> <> @ \paragraph{Utilities} A few small utility functions are defined to make the [[Translate]] module more readable. The [[seq]] function converts a list of statements into a tree of [[SEQ]] nodes. <>= let rec seq = function [] -> E.internal "nil passed to seq" | x :: [] -> x | x :: rest -> T.SEQ(x, seq rest) @ The [[eseq]] function produces an expression from and expression and a list of statements. The list of statement is executed first, then the expression is evaluated. <>= let eseq exp stmts = T.ESEQ (seq stmts, exp) @ During translation we often need to allocate new temporary variables. The [[temp]] function produces a fresh temporary variable. <>= let temp ptr = T.TEMP (T.new_temp(), ptr) @ The Tiger compiler uses static links to manage nested functions. Each generated function has a frame pointer that points to the top of the function's stack frame. The first word in each stack frame is the frame pointer of the calling function. Thus, we can fetch our caller's frame pointer by dereferencing the first location in our own frame pointer. The [[getfp]] function generates an expression that fetches the frame pointer of one frame relative to another. <>= let getfp frm parent_frm = let diff = F.level frm - F.level parent_frm in let rec deref = function 0 -> F.fp frm | x -> T.MEM (deref (x-1), true) in assert (diff >= 0); deref diff @ Finally, we define a few functions for constructing tree expressions and statements. For the binary operators, we simplify expressions when possible. <>= let alloc_ptr = T.NAME (S.symbol "alloc_ptr") let space_end = T.MEM (T.NAME (S.symbol "space_end"), true) let goto lbl = T.JUMP (T.NAME lbl) let ( =>) e v = T.MOVE (e, v) let simplify tig_op op x y = match (x,y) with (T.CONST x, T.CONST y) -> T.CONST (op x y) | _ -> T.BINOP (tig_op, x, y) let (<+>) = simplify T.PLUS ( + ) let (<->) = simplify T.MINUS ( - ) let (<*>) = simplify T.MUL ( * ) @ \paragraph{Literals} In Tiger, there are three types of literal value: nil, integers, and strings. For strings literals, we must output the string as read-only data and refer to it by a label. Most of the implementation of string literals is handled by the [[Frame]] module (Section~\ref{sec:frame}). This module gets a label from the [[Frame]] module and generates a reference to the label using a [[NAME]] expression. <>= let nil = T.CONST 0 let int_literal i = T.CONST i let str_literal s = T.NAME (F.alloc_string s) @ \paragraph{Function Calls} Each Tiger function will eventually be translated into a [[C--]] function. In addition, there are functions defined in the standard library and in the runtime system. We call these two types of functions ``internal'' and ``external'', respectively. All internal functions take as a first parameter the frame pointer of the enclosing function. To generate a call to an internal function, we must compute the correct frame pointer to pass as the first argument. If we are calling a nested function, then we pass the current frame pointer. Otherwise, we compute the frame pointer of the enclosing function using [[getfp]]. <>= let call myfrm lbl cc frm args k ptr = let args = if F.level frm == 0 then args else let pfp = if (F.level frm) > (F.level myfrm) then F.fp frm else T.MEM(getfp myfrm frm, true) in pfp :: args in @ If we are calling a foreign function (a function not written in [[C--]]), then we need to save and restore the global register variables. In Tiger, we only have one global register variable---the allocation pointer. <>= match cc with None -> T.CALL((T.NAME lbl), args, cc, k, ptr) | Some "gc" -> T.CALL((T.NAME lbl), args, cc, k, ptr) | Some _ -> let tmp1 = temp ptr and tmp2 = temp ptr in eseq tmp2 [ T.MOVE(alloc_ptr, tmp1); T.MOVE(T.CALL((T.NAME lbl), args, cc, k, ptr), tmp2); T.MOVE(tmp1, alloc_ptr) ] @ For internal use, we define a few abbreviations for calling both [[C]] and [[C--]] functions from the standard library. <>= let ext_call cc name args = call F.base_frame (S.symbol name) cc F.base_frame args None false let ext_c_call = ext_call (Some "C") let ext_gc_call = ext_call None (* (Some "gc") *) let ext_cmm_call = ext_call None @ \paragraph{Variables} There are three types of variable access expressions: simple variables, record fields, and array indices. Simple variables can live on the stack, or in [[C--]] local variables. We generate a memory access for stack variables, and for local variables we just output their name. <>= let simple_var frm = function F.Temp lbl -> T.NAME lbl | F.Stack(var_frm, offset, ptr) -> T.MEM(getfp frm var_frm <+> T.CONST(offset * ws), ptr) @ Record fields always live in a memory location. For record field variables, we generate a memory access at an offset from the beginning of the record. <>= let field_var ex i ptr = T.MEM(ex <+> T.CONST(i * ws), ptr) @ Array subscripts also always live in a memory location. However, before accessing an array element, we first do an array bounds check. There is a small optimization when the array index expression is a constant, in that we can compute the offset into memory at compile time. <>= let subscript_var e1 e2 ptr pos = let check = ext_c_call "bounds_check" [e1;e2;T.CONST(fst (Error.line_number pos))] and offset = (e2 <+> T.CONST 1) <*> T.CONST ws in eseq (T.MEM(e1 <+> offset, ptr)) [T.EXP check] @ Finally, any variable expression can be updated with a [[MOVE]] node. <>= let assign v e = eseq nil [e => v] @ \paragraph{Operator Expressions} There are three types of operator expressions: arithmetic, integer comparison, and string comparison. Arithmetic operations are translated into [[BINOP]] nodes, and integer comparisons are translated into [[RELOP]] nodes. <>= let arithmetic op ex1 ex2 = let oper = match op with A.PlusOp -> T.PLUS | A.MinusOp -> T.MINUS | A.TimesOp -> T.MUL | A.DivideOp -> T.DIV | _ -> E.internal "relop used as binop" in T.BINOP(oper, ex1, ex2) let compare_int op ex1 ex2 = let oper = match op with A.EqOp -> T.EQ | A.NeqOp -> T.NE | A.LtOp -> T.LT | A.LeOp -> T.LE | A.GtOp -> T.GT | A.GeOp -> T.GE | _ -> E.internal "binop used as relop" in T.RELOP(oper, ex1, ex2) @ String comparisons are translated into integer comparisons by first calling the standard library function [[compare_str]], which returns -1, 0, or 1. The result of this function is then compared to zero using an integer comparison. <>= let compare_str op ex1 ex2 = let result = ext_c_call "compare_str" [ex1;ex2] in compare_int op result (T.CONST 0) @ \paragraph{Conditionals} To translate an if expression, we create a new temporary variable which becomes the result. The true and false expressions are converted into move statements that store values into the temporary. <>= let ifexp test thn els ptr = let tmp = temp ptr and tru = T.new_label "ifTrue" and fls = T.new_label "ifFalse" and end' = T.new_label "ifEnd" in eseq tmp [ T.CJUMP(test, tru, fls); T.LABEL tru; thn => tmp; goto end'; T.LABEL fls; els => tmp; T.LABEL end'] @ \paragraph{Loops} For each loop statement, the [[Semantics]] module will call the [[loop]] function with translated test and body expressions, and a break label. The break label is produced by the [[Semantics]] module and stored in the environment. <>= let loop test body lend = let lbeg = T.new_label "loop_start" and lbdy = T.new_label "loop_body" in eseq nil [ T.LABEL lbeg; T.CJUMP(test, lbdy, lend); T.LABEL lbdy; T.EXP body; goto lbeg; T.LABEL lend ] @ A [[break]] statement is translated as a jump to the current break label. The current break label is fetched from the environment and provided by the [[Semantics]] module. <>= let break lbl = eseq nil [goto lbl] @ \paragraph{Records and Arrays} When new records and arrays are created, memory must be allocated on the heap for their storage. Memory is allocated by calling the [[alloc]] function defined below. This function generates code that checks for heap exhaustion, calls the garbage collector, and updates the allocation pointer. <>= let alloc size = let size = (size <+> T.CONST 1) <*> T.CONST ws in let test = T.RELOP(T.GT, alloc_ptr <+> size, space_end) and tmp = temp true and tru = T.new_label "alc_gc" and fls = T.new_label "alc" in eseq tmp [ T.CJUMP(test, tru, fls); T.LABEL tru; T.EXP (ext_gc_call "call_gc" []); T.LABEL fls; size => T.MEM(alloc_ptr, true); (alloc_ptr <+> T.CONST ws) => tmp; (alloc_ptr <+> size) => alloc_ptr (* ; T.EXP (ext_gc_call "call_gc" []) *) ] @ To initialize a new record, we must copy the results of the initializing expressions into the the record fields. The private function [[initialize]] creates a list of assignments to the record fields. The result of calling [[initialize]] is placed in sequence after an allocation statement, and a new temporary variable is returned that contains a pointer to the new record. <>= let new_record init = let tmp = temp true and size = T.CONST (List.length init) in let rec initialize offset = function [] -> [] | (ex,ptr)::rest -> (ex => field_var tmp offset ptr) :: initialize (offset+1) rest in eseq tmp ((alloc size => tmp) :: initialize 0 init) @ To initialize a new array, we must copy the result of the initializing expression into each array element. First, we allocate memory for the array and store the array size into the first element of the array. Then, we construct a small loop that initializes each element of the array. <>= let new_array sizeEx initEx ptr = let ary = temp true and i = temp false and lbeg = T.new_label "init_start" and lend = T.new_label "init_end" in eseq ary [ alloc (sizeEx <+> T.CONST 1) => ary; sizeEx => T.MEM(ary, false); T.CONST 1 => i; T.LABEL lbeg; initEx => T.MEM (ary <+> (i <*> T.CONST ws), ptr); i <+> T.CONST 1 => i; T.CJUMP(T.RELOP(T.LE, i, sizeEx <+> T.CONST 1), lbeg, lend); T.LABEL lend ] @ \paragraph{Sequences} To translate a sequence of $n$ expressions, we convert the first $n-1$ expressions into statements and execute them. Then, we evaluate the last expression and the result is used as the result of the whole sequence. <>= let rec sequence = function [] -> nil | e :: [] -> e | e :: es -> T.ESEQ((T.EXP e), (sequence es)) @ \paragraph{Functions} Once a code fragment has been produced, the [[Semantics]] module will call [[func]] to transform it into a function. To produce a function, we allocate a temporary to store the result of evaluating the body. Then, we add a return node that returns the temporary. <>= let func frm ex ptr = let tmp = temp ptr in eseq nil [ex => tmp; T.RET tmp] @ \paragraph{Exceptions} When a try block is encountered, first the expression within the block and all of the handlers will be translated. Then, the expression inside the try block and the expressions corresponding to each handler will be passed to the [[try_block]] function. When translating a try block, we place all of the handlers under a new continuation that takes a single parameter---the exception identifier. The private [[cont]] function constructs a continuation with the label [[l]] that accepts one parameter in the temporary [[t]]. <>= let try_block exp exn_lbl hs = let cont l = function T.TEMP(t,_) -> T.CONT(l, [t]) | _ -> E.internal "non temp in continuation node" in @ Each handler must test the continuation parameter, and then either handle the exception or pass control down to the next handler. The [[handler]] function constructs this sequence of statements for a single handler. <>= let try_endl = T.new_label "try_end" and tmp = temp false in let handler (uid,ex) = let hl = T.new_label "handle" and sl = T.new_label "skip" in [ T.CJUMP(T.RELOP(T.EQ, tmp, T.CONST uid), hl, sl); T.LABEL hl; T.EXP ex; goto try_endl; T.LABEL sl ] in @ The complete try block consists of the expression within the try block followed by a new continuation and the handlers for the block. The scope of the try block is marked by [[TRY]] and [[TRYEND]]. At the beginning and end of the try block we must set and reset the current exception handler. However, if we are generating code for the unwinding implementation, then we can omit these calls to [[set_handler]]. <>= let old = temp false in let set_handler = ext_cmm_call "set_handler" [T.NAME exn_lbl] => old and reset_handler = T.EXP (ext_cmm_call "set_handler" [old]) and not_unwind stm = if Option.use_unwind() then T.EXP nil else stm in eseq tmp [ T.TRY exn_lbl; not_unwind set_handler; exp => tmp; not_unwind reset_handler; T.TRYEND exn_lbl; goto try_endl; cont exn_lbl tmp; not_unwind reset_handler; seq (List.flatten (List.map handler hs)); T.LABEL try_endl ] @ When raising an exception, we will either call the [[raise]] function or the [[unwind]] function depending on how the compiler was called. <>= let raise_exn uid = let fn = if Option.use_unwind() then "unwind" else "raise" in ext_cmm_call fn [T.CONST uid] @ \paragraph{Threads} All of the thread magic is in the runtime system. Here, we just make a call to the runtime system function ``tig_spawn'' passing it the label of a function. <>= let spawn lbl = ext_cmm_call "spawn" [T.NAME lbl] @