Machine-Independent Support for
Garbage Collection, Debugging, Exception Handling, and Concurrency
(Draft)
For a compiler writer, generating good machine code for a variety of platforms is hard work. One might try to reuse a retargetable code generator from another compiler, but code generators are complex and difficult to use, and they limit one's choice of implementation language. One might try to use C as a portable assembly language, but C limits the compiler writer's flexibility and the performance of the resulting code. The wide use of C, despite these drawbacks, argues for a portable assembly language.C-- is a new language designed expressly as a portable assembly language. C-- eliminates some of the performance problems associated with C, but in its originally-proposed form it does not provide adequate support for garbage collection, exception handling, and debugging. The problem is that neither the high-level compiler nor the C-- compiler has all of the information needed to support these run-time features. This paper proposes a three-part solution: new language constructs for C--, run-time support for C--, and restrictions on optimization of C-- programs.
The new C-- language constructs enable a high-level compiler to associate initialized data with spans of C-- source ranges and to specify ``alternate continuations'' for calls to procedures that might raise exceptions. The run-time support is an interface (specified in C) that the garbage collector, exception mechanism, and debugger can use to get access to both high-level and low-level information, provided that the C-- program is suspended at a safe point. High- and low-level information is coordinated by means of the C-- spans and a common numbering for variables. Finally, the C-- optimizer operates under the constraints that the debugger or garbage collector can change the values of local variables while execution is suspended, and that a procedure call with alternate continuations can return to more than one location.
This three-part solution also provides adequate support for concurrency, so the paper illustrates the problem and the proposed solution with examples from garbage collection, exception handling, debugging, and threads. The paper also includes a model of the dataflow behavior of C-- calls.
A number of open problems remain. The most serious have to do with apparent redundancies among spans and safe points, and with the interaction of debugging support with optimization.
Suppose you are writing a compiler for a high-level language. How are you to generate high-quality machine code? You could do it yourself, or you could try to take advantage of the work of others by using an off-the-shelf code generator. Curiously, despite the huge amount of research in this area, only three retargetable, optimizing code generators are freely available: VPO [cite benitez:portable], ML-RISC [cite george:mlrisc], and the gcc back end [cite stallman:gcc]. Each of these impressive systems has a rich, complex, and ill-documented interface. Of course, these interfaces are quite different from one another, so once you start to use one, you will be unable to switch easily to another. Furthermore, they are language-specific. To use ML-RISC you must write your front end in ML, to use the gcc back end you must write it in C, and so on.
All of this is rather unsatisfactory. It would be much better to have one portable assembly language that could be generated by a front end and implemented by any of the available code generators. So pressing is this need that it has become common to use C as a portable assembly language [cite {]atkinson:experiences,henderson:compiling,pettersson:simulating, peyton-jones:spineless,tarditi:assembly,bartlett:scheme. Unfortunately, C was never intended for this purpose --- it is a programming language, not an assembly language. C locks the implementation into a particular calling convention, makes it impossible to compute targets of jumps, and provides no support for garbage collection, exceptions, or debugging, except such debugging support as may be provided by a particular C compiler. An earlier paper discusses C's shortcomings in more detail [cite peyton-jones:c--].
The obvious way forward is to design a language specifically as a compiler target language. Such a language should serve as the interface between a compiler for a high-level language (the front end) and a retargetable code generator (the back end). What makes the problem interesting is that we want to retain the high performance one would expect from a code generator crafted specifically for the front end. The design of a portable assembly language should enable the front end to make choices that maximize performance, while enabling the back end to apply the best known code-improvement technology.
Separating the front and back ends complicates run-time support. In general, a front end will be designed in conjunction with a run-time system, which helps implement such high-level features as garbage collection, exception handling, debugging, and concurrency. To avoid repetition we refer to such features as high-level run-time services. The difficulty is that some of the information required by these run-time services, such as the locations of variables, is known only to the back end. The back end must therefore have its own run-time system, which supports the front-end run-time system.
The primary contribution of this paper is a specification that makes it possible to keep these two run-time systems separate, so that different front ends (and their run-time systems) can be used with different back ends (and their run-time systems), provided both front and back end conform to the specification. The specification has two parts. The front end communicates with the back end by emitting programs in a portable assembly language called C--. The front-end run-time system communicates with the back-end run-time system through a run-time interface (specified in C) called the C-- run-time interface.
This paper builds on the existing design of C-- [cite peyton-jones:c--]. The new material, which enables C-- to support garbage collection, exception handling, and debugging, is in three parts: new language constructs for C--, run-time support for C--, and restrictions on optimization of C-- programs. The paper contains many examples of C-- code and C code. These examples have not been compiled, so they are surely riddled with errors and inconsistencies.
CAVEAT: This draft is being circulated for comment while the design is still incomplete. Some paragraphs may bear marginal notes discussing unresolved issues. Particularly sticky issues may be marked as ``caveats,'' as is this paragraph. Caveats may not be intelligible unless you've read the whole paper.
[Sum-and-product functions written in C--] Three functions that compute the sum \sum_i=1^n i and product \prod_i=1^n i, written in C--. [*]
/* Ordinary recursion */ export sp1; sp1( word32 n ) { word32 s, p; if n == 1 { return( 1, 1 ); } else { s, p = sp1( n-1 ); return( s+n, p*n ); } } /* Tail recursion */ export sp2; sp2( word32 n ) { jump sp2_help( n, 1, 1 ); } sp2_help( word32 n, word32 s, word32 p ) { if n==1 { return( s, p ); } else { jump sp2_help( n-1, s+n, p*n ) } }/* Loops */ export sp3; sp3( word32 n ) { word32 s, p; s = 1; p = 1; loop: if n==1 { return( s, p ); } else { s = s+n; p = p*n; n = n-1; goto loop; } }
An earlier paper describes the basic design of C-- [cite peyton-jones:c--]. We sketch the design here, with emphasis on those aspects that are relevant for understanding run-time support. Figure [<-] gives examples of some C-- procedures that give a flavour of the language. C-- has the following features:
The word types are used for characters, bit vectors, integers, and addresses (pointers). On each machine, one of the word types (typically word32 or word64) is designated the ``native word size'' of the machine. One (probably the same one) is also designated the ``native pointer type.'' Exported and imported names must have the native pointer type.
word32[foo] = word32[foo] + 1;loads a word32 from the location whose address is in foo, adds one to it, and stores it at the same location. The mnemonic for this syntax is to think of word32 as a C-like array representing all of memory, and word32[foo] as a particular element of that array. The semantics of the address is not C-like, however; the expression in brackets is the byte address of the item.
data {
foo: word32 0;
word32 27;
baz: float64[10] 0.0;
}
The keyword data specifies that the data is to be put in the data section.
The data block contains two 32-bit words and an array of ten 64-bit floats.
foo is the address of the first word, and baz is the address of the array.
foo and baz are immutable addresses; they cannot be
assigned to.
data "debug" {
...
}
This syntax declares the block of data to belong to the section named "debug".
Code is by default placed in the section "text", and a data directive with no
explicit section name defaults to the section "data".
Procedures can be enclosed in code "mytext" { ... } to place them
in a named section "mytext".
C-- assigns no semantics to the names of data sections, except that, when linking object files, the linker is required to concatenate sections with the same name. Particular implementations may, however, assign machine-dependent semantics. For example, a MIPS implementation might assume that data in sections named sdata is addressable from the global pointer and that data in sections named rodata is read-only.
const GC = 2;
global {
word32 hp;
}
the implementation attempts to put variable hp in a register,
but if no register is available, it puts hp in memory.
C-- programs use and assign to hp without knowing whether it is
in a register or in memory.
Unlike a name declared by data, a name declared by global is not
an address.
In fact,
there is no such thing as ``the address of a global,''and memory stores
to unknown addresses cannot affect the value of a global.
This guarantee permits
a global to be held in a register, and even if it has to be held in memory,
the optimizer does not need to reload it after a store to
an unknown memory address.
All separately compiled modules must have identical global declarations,
or horribly strange things will happen.
global declarations may name specific registers, for example:
global {
word32 hp "%eax";
word32 hplim "%ebx";
}
Register names are implementation-dependent.
C-- suports procedures that are both more and less general than C procedures (e.g., multiple results and full tail calls but no varargs). Specifically:
r = f( g(x) ); /* illegal */because result returned by g(x) cannot be an argument to f. Instead one must write:
y = g(x); r = f(y);This restriction makes explicit the order of evaluation, the location of each call site, and the names and types of temporaries used to hold the results of calls.
f (word32 x) {
word32 y;
stack { p : word32;
q : word32[40];
}
/* Here, p and q are the addresses of the relevant chunks of
data. Their type is the native pointer type of the machine. */
}
stack is rather like data; it has the same syntax between the
braces, but it allocates on the stack.
As with data, the names are
bound to the addresses of the relevant locations, and they are immutable.
C-- makes no provision for dynamically-sized stack allocation (yet).
word32[ptr] = sp1; /* Store procedure address */ ... r,s = (word32[ptr+4])( 4 ); /* Call some stored procedure */
The calling convention for C-- procedures is entirely a matter for the C-- implementation---we call it the standard C-- calling convention. In particular, C-- need not use the C calling convention.
The calling convention includes not only a specification of how parameters are passed during calls, but also a specification of how values are returned. The standard calling convention places no restrictions on the number of arguments passed to a function or the number of results returned from a function. The only restrictions are that the number and types of actual parameters must match those in the procedure declaration, and similarly, that the number and types of values returned must match those expected at the call site. These restrictions enable efficient calling sequences with no dynamic checks. (A C-- implementation is not required to check that C-- programs meet these restrictions.)
We note the following additional points:
foreign convention-name r = f( parameters );foreign calls and returns may be to other C-- procedures or to truly foreign code, provided, of course, that a foreign return returns to a foreign call.
foreign convention-name return( results );
foreign convention-name f( parameters ) { body };
Depending on the linker support available, names exported from C-- compilation units may be visible to foreign code. For example, the following C-- procedure can be called from C:
foreign "C" inc( word32 x ) {
foreign "C" return( x+1 );
}
The calling convention used at each call site must match the calling convention
given in the procedure declaration. Foreign calling conventions may be more
restrictive than C--'s standard convention; for example, the C calling
convention only permits one return value.
A C-- implementation is not required to implement any calling conventions other
than the standard one.
Section [->] discusses this and related issues to
be considered by designers and implementors of C-- calling conventions.
When a front end and back end are written together, as part of a single compiler, they can cooperate intimately to support high-level run-time services, such as garbage collection, exception handling, debugging, and concurrency [We view debuggers as part of the front-end run-time system, even though on many machines debuggers run in separate address spaces, using operating-system services to manipulate the program being debugged.] . The primary contribution of this paper is to propose a means by which the front and back end might cooperate at arm's length, and still get good performance. Our guiding principle is this:
C-- should make it possible to implement high-level run-time services, but it should not actually implement any of them. Rather, it should provide just enough ``hooks'' to allow the front-end run-time system to implement them.This paper identifies the hooks.
Keeping the front end and back end at arm's length requires complex interfaces at both compile time and run time. It might appear more palatable to incorporate garbage collection, exception handling, and debugging into the C-- language, as (say) the Java Virtual Machine does. But doing so would guarantee that C-- would never be used. Different source languages require different support, different object layouts, and different exception semantics---especially when performance matters. No one back end could satisfy all customers.
Why are the interfaces complex? High-level run-time services need to inspect and modify the state of a suspended program. A garbage collector must find, and perhaps modify, all live pointers. An exception handler must navigate, and perhaps unwind, the call stack. A debugger must allow the user to inspect, and perhaps modify, the values of variables. All of these tasks require information from both front and back ends. The rest of this section elaborates.
The difficulty is that neither the front end nor the back end has all the knowledge needed to find roots at run time. Only the front end knows which source-language variables, and therefore which C-- variables, represent pointers into the heap. Only the back end, which maps variables to registers and stack slots, knows where those variables are located at run time. Even the back end can't always identify exact locations; variables mapped to callee-saves registers may be saved arbitrarily far away in the call stack, at locations not identifiable until run time.
An exception mechanism also needs to identify the locus of control, because in some high-level languages, that locus determines which handler should receive the exception. When it identifies a handler, the exception mechanism unwinds the stack and changes the locus of control to refer to the handler.
At run time, loci of control are represented by values of the program counter (e.g., return addresses), but at the source level, loci of control are associated with statements in a high-level language or in C--. Only the front end knows how to associate high-level source locations or exception-handler scopes with C-- statements. Only the back end knows how to associate C-- statements with the program counter.
Succinctly stated, each of these operations must combine two kinds of information:
[*] We assume that executable programs are divided into three parts, each of which may be found in object files, libraries, or a combination.
Within this setting, our proposal has three main elements:
In what follows, we use the term ``variable'' to mean either a parameter of the procedure or a locally-declared variable.
High-level run-time services often inspect and modify the state of a running program. This observation necessarily imposes some constraints on the C-- optimizer, because such state changes might violate invariants that it could otherwise maintain. In this section we identify the constraints, and describe C-- language mechanisms that minimise their impact.
In the presence of garbage collection and debugging, calls have an unusual property: live local variables are potentially modified by any call. For example, a compacting garbage collector might modify pointers saved across a call. Consider
f( word32 a ) {
word32[a+8] = 10; /* store 10 in 32-bit word pointed to by a+8 */
g( a );
word32[a+8] = 0; /* store 0 in 32-bit word pointed to by a+8 */
return;
}
If g invokes
the garbage collector, the collector might modify a during the call to g, so
the code generator must recompute a+8 after the call---it would be unsafe to
save the common subexpression a+8
across the call.
The same constraint supports a debugger that might
change the values of local variables.
Calls may also modify C-- values that are declared to be allocated
on the stack.
A compiler writer might reasonably object to the performance penalty imposed by this constraint; the back end pays for compacting garbage collection whether the front end needs it or not. To eliminate this penalty, the front end can mark C-- parameters and variables as invariant across calls, using the keyword invariant, thus:
f( invariant word32 a ) {
invariant word16 b;
word32 c;
...
g( a, b, c ); /* "a" and "b" are not modified by the call,
but "c" might be */
...
}
The invariant keyword places an obligation on the front-end
run-time system, not on the caller
of f.
The keyword constitutes a promise to the C-- compiler
that the value of an invariant variable will not change
``unexpectedly'' across a call.
The run-time
system and debugger may not change the values of invariant variables.
In the absence of a debugger, a front end can safely mark non-pointer variables as invariant across calls, and front ends using mostly-copying collectors [cite bartlett:compacting, bartlett:mostly] or non-compacting collectors [cite boehm:garbage] can safely mark all variables as invariant across calls.
Some high-level run-time services, such as an exception dispatcher or debugger, may need to change the locus of control. They can't simply change the program counter, because two different program points may hold their live variables in different locations, and they may have different ideas about the layout of the activation record and the contents of callee-saves registers. They may even have different ideas about which variables are alive and which are dead. Unconstrained, dynamic changes in locus of control also make life hard for the optimizer: if the program counter can change arbitrarily, there is no such thing as dead code, and a variable live anywhere is live everywhere.
We address these problems in two ways. To support synchronous exception handling, we tell the optimizer exactly how the program counter might be changed dynamically, as described below. To support asynchronous exception handling and debugging, we give the optimizer substantial freedom, requiring only that it record decisions it makes. The debugger can make a reasonable effort at run time, as described in Section [->].
When exceptions are synchronous, it is reasonable to assume that raising an exception requires a call.
Typically, handling an exception involves first unwinding the stack to the caller of the current procedure, or its caller, etc., and then directing control to an ``exception handler.'' We support such exceptions by making the following extension to the semantics of a procedure call: a call might return to more than one location, and the C-- programmer specifies explicitly all the locations to which a call could return. In effect, the call has many possible continuations instead of just one. An example syntax [We are still arguing over syntax!] is:
r = f( x )
also returns v to { /* alternate continuation 1 */ }
also returns w1, w2 to { /* alternate continuation 2 */ }
also aborts;
/* code for the normal case */
This statement establishes four possible continuations for the call
f(x). The ``main'' continuation simply follows the call, as usual;
if this continuation is taken the result is assigned to r.
The two ``alternate'' continuations are in the blocks that follow
the returns ... to clauses.
If alternate continuation 2 were
taken, the net effect would be just as if the call above were replaced by:
w1, w2 = f( x ) ; /* alternate continuation 2 */That is, the results would be assigned to w1 and w2, and the appropriate code would be executed. To avoid confusion, C-- requires that alternate continuations end with an explicit control transfer (goto, jump, or return).
Unlike also returns, also aborts does not identify an explicit continuation. Instead, it indicates that the exception dispatcher can terminate this procedure by going to a handler in some calling procedure. In effect, also aborts introduces a flow edge from the call site to the procedure exit node, so that the optimizer does not eliminate assignments before the call that might otherwise be considered dead [cite hennessy:exception]. Any call site in a procedure P that could raise an exception not handled in P should be annotated with also aborts.
How are these multiple continuations used? A return statement causes execution to transfer to the ``main'' continuation in the caller. However, an exception dispatcher may abort a running C-- procedure and cause it to return to an alternate continuation, by using the C-- run-time procedures described in Section [->]. Section [->] presents an example exception dispatcher that uses this method.
So far as the optimizer is concerned, then, such a call site is simply a fork in the control graph. The flow edges from the call site to each continuation (or to the exit node) guarantee that the variable locations, liveness, etc., are consistent no matter which continuation gets control after the call. Alternate continuations are assumed to be rarely used, and back ends are encouraged to exploit this assumption, e.g., when allocating registers or deciding on the placement of spills.
Programmers and compiler writers may expect there to be virtually no execution-time penalty for associating alternate continuations with a call site. There may, however, be a significant cost associated with the C-- run-time procedure that finds the alternative continuation, though it is unlikely to be so large as to dominate the cost of exception dispatching. Section [->] discusses implementation techniques.
A future revision of C-- may specify a means by which alternate return addresses could be communicated to f, the locations into which f could place returned values, and a mechanism by which generated C-- code could arrange a return to an alternate continuation.
CAVEAT: The also returns mechanism limits potential exception implementations. In particular, it presumes that the exception dispatcher restores callee-saves registers, and it requires that exception dispatch be implemented in the run-time system (because there is no C-- value that represents the also returns continuation). These limitations preclude ``ML-style'' exceptions, which are very efficient to raise but which have a modest overhead in the normal case. Possible support for such exceptions is presented in Section [->].
[*] Front ends can use a data directive to record arbitrarily complex static information in the form of initialized data. For example, to support garbage collection, a front end could build a data block that says which of the parameters and local variables of a procedure are heap pointers. Only the front-end run-time system depends on the format of such data blocks. We need C-- language support only to identify the interesting data blocks at run time. For example, when tracing a particular procedure activation, the garbage collector needs the block describing the local variables of that procedure. For debugging or exception handling, the front end may record information that applies only to parts of procedures, or to individual C-- statements. We address all these applications by providing a mechanism for mapping suspended procedure activations to associated data blocks.
As an example, suppose we have a function f(x, y), with no other variables, in which x holds an integer and y holds a pointer into the heap. The front end can encode the heap-pointer information by emitting a data block, or descriptor, associating 0 with x and 1 with y:[*]
data {
gc1: word32 2; /* this procedure has two variables */
word8 0; /* x is a non-pointer */
word8 1; /* y is a pointer */
}
This encoding does not use the names of the variables;
instead, each variable is assigned an integer index, based on the textual order in
which it appears in the definition of f.
Therefore x has index 0 and y has index 1.
A more compact encoding would use the indexes more intelligently,
consuming only one bit per variable.
To associate this descriptor with f, the front end places the definition of f in a C-- span:
span GC gc1 {
f( word32 x, word32 y ) {
...code for f...
}
}
A span may apply to a sequence of
function definitions, or to a sequence of statements within a function
definition.
In this case, the span applies to all of f.
If execution of f is suspended to perform some run-time service,
the back end must be able to map the suspended activation of f, plus the token
GC, to the value gc1.
More concretely, the C-- run-time system provides the C procedure
There are no constraints on the form of the descriptor that gc1 labels; that form is private to the front end and its run-time system. All C-- does is transform span directives into mappings from program counters to values.
Spans cannot overlap, but they can nest; the innermost span bearing a given token takes precedence. One can achieve the effect of overlapping by binding the same data block to multiple spans.
There may be several independent mappings in use simultaneously, e.g., one for garbage collection, one for debugging, and so on. The ``token'' is an integer used to distinguish these mappings from one another. Again, C-- takes no interest in the tokens; it simply provides a map from a (token, pc) pair to an address.
For exception handling, a mapping may indicate which handler should receive control upon an exceptional return from a procedure call. Handlers may be implemented using alternate continuations. The C-- runtime provides procedures that enable the front-end exception mechanism to transfer control to to alternate continuations, which are identified by number, according to the order in which they are attached to the call.
The C-- back end has considerable freedom to optimize programs; in particular, front ends may not assume that each span maps to a contiguous sequence of machine instructions. All that a front end can count on is that the optimizer preserves the mapping of program counter to span value. In particular,
CAVEAT: Giving the optimizer this much freedom makes things hard for the debugger. For example, it's not clear what the debugger should do to set a breakpoint. Most of the problems seem to go away if the optimizer is prevented from splitting spans, i.e., if every span maps to a contiguous set of object-code locations.
All our intended high-level run-time services must be able to suspend a C-- computation, inspect its state, and modify it, before resuming execution.
What does ``the state of a suspended C-- computation'' look like? In addition to the contents of memory, it includes a logical stack of procedure activations. This stack may not correspond exactly to the programmer's intuition about calls and returns, because activations of procedures that end in tail calls may have disappeared. The logical stack is probably, though not necessarily, implemented as a physical stack of activation records, the layouts of which are determined by the back end.
In many implementations of high-level languages, the run-time system runs on the same physical stack as the program itself. In such implementations, walking the stack or unwinding the stack requires a thorough understanding of system calling conventions, especially if an interrupt can cause a transfer of control from generated code to the run-time system. We prefer to imagine that the generated code and the run-time system run on separate stacks, as separate threads:
Though we call them ``threads,'' ``coroutine'' may be a more accurate term. The system thread never runs concurrently with the C-- thread, and the two can be implemented by a single operating-system thread.
The C-- run-time system provides a synchronous control mechanism, which a front-end run-time system can use to run as a coroutine with C-- code. The front-end runtime calls the C procedure Resume(t) to transfer control to the C-- thread described by thread-control block t. Execution continues in the C-- thread until it calls the C-- procedure yield(r), which suspends execution of the C-- thread, saves its state in its control block, and co-routines back to the system thread's Resume call. The value r that is passed as a parameter to yield is returned as the result of Resume; it is the thread's yield code.
To start a program, execution begins in the front-end run-time system, which uses the C procedure InitTCB to initialize the C-- thread, to which it can then transfer control with Resume. These C procedures are described in greater detail in Section [->].
Though it is not the focus of this paper, we intend that C-- should support many very lightweight threads, in the style of Concurrent ML [cite reppy:cml], Concurrent Haskell [cite peyton-jones:concurrent], and many others. The C procedures InitTCB and Resume, together with the C-- procedure yield, are sufficient for front-end run-time systems to implement a complete non-preemptive threads package, as we discuss in Section [->].
At a call to yield, as at any other call, the code generator guarantees that the live variables are tidily saved away in the activation record or in callee-saves registers. yield stores the callee-saves registers, as well as the rest of the thread state, in the thread-control block, so at this point the execution of the C-- thread can safely be suspended, and the front-end runtime has full access to the state of the C-- thread. We call such a point a safe point.
More precisely, a program-counter value within a procedure is a safe point if it is safe to suspend execution of the procedure at that point, and to inspect and modify its variables. Accordingly, we require the following precondition for execution in the front-end run-time system:
A C-- thread can be suspended only at a safe point.Because any procedure could call yield, the code generator must make a safe point at every call site. This safe point is associated with the state in which the call as been made and the procedure is suspended awaiting the return.
It is impractical to try to make every instruction a safe point; recording local-variable liveness and location information for every instruction might represent a truly onerous burden. The front end can therefore insert extra safe points with the C-- statement
safepoint;The existence of a safe point constrains the optimizer in the following way:
CAVEAT: A stronger restriction would prevent the optimizer from moving assignments to C-- variables as well as memory loads and stores. This restriction might be sufficient to ensure that the observable state of the machine at a safe point would be consistent with the C-- source code, where ``observable'' means ``observable by the procedures defined in Section [->]''. (Front ends can achieve such an effect by making all variables non-invariant.) It also might, if enough safepoints are introduced to support debugging, effectively disable the optimizer, which we don't want. We would like to give the optimizer as much freedom as possible without creating a heroic task for the debugger; how to do this is a topic for research.
A safepoint directive may have an optional name, which the back end binds to the program counter that corresponds to the safe point. This name is visible anywhere in the compilation unit containing the function that contains the safe point. A safe-point name is not a label and may not be the target of a goto. Safe points may have names in order to support debugging; a front end can use the names in data declarations to build maps from source locations to program counters.
CAVEAT: We give safe points names only to support debugging; the debugger has to have a map from source-code locations to object-code locations. Safe-point names seem like an ugly wart; it might be better to supply a map from spans to sets of PCs, but in that case, it's not clear how to ensure that suspension occurs at a safe point.
So far we have suggested that a C-- program can only yield control voluntarily, through a yield call. What happens if an interrupt or fault occurs, transferring control to the front-end run-time system, and the currently executing C-- procedure is not at a safe point? This may happen if a user deliberately causes an interrupt, e.g., to request that the stack be unwound or the debugger invoked. It may happen if a hardware exception (e.g., divide by zero) is to be converted to a software exception. It may happen in a concurrent program if timer interrupts are used to pre-empt threads.
How can we achieve the safe-point invariant in the presence of such asynchronous events? There are the following conventional alternatives:
To implement pre-emption, the C-- runtime must sit between the operating-system interrupt mechanism and the front-end runtime. When it takes an interrupt, it bundles up the C-- thread state into its thread-control block, and it transfers control to the front-end handler. If the C-- thread was suspended at a safe point, all operations in Section [->] are valid. If not, only a restricted set of operations are valid. This restricted set must include ExecuteToNextSafePoint. It also must include some support for the debugger, in case a fault makes forward execution impossible. The details are a topic for research.
[*] [*] This section presents the detailed interface that a front-end run-time system can use to create C-- threads, to transfer control to them, and to inspect and modify their state. Rather than specify representations of thread-control blocks or activation records, we hide them behind simple abstractions. In particular, we specify a set of C procedures that must be provided by the C-- runtime, and that together allow the state of a C-- thread to be inspected and modified.
The state of a C-- thread consists of some saved registers and a logical stack of procedure activations. This logical stack is usually implemented as some sort of physical stack, but the correspondence between the two may not be very direct. In particular, callee-saves registers that logically belong with one activation are not necessarily stored with that activation, or even with the adjacent activation; they may be stored in the physical record of an activation that is arbitrarily far away. This problem is the reason that C's setjmp and longjmp functions don't necessarily restore callee-saves registers, which is why some C compilers make pessimistic assumptions when compiling procedures containing setjmp.
We hide this complexity behind a simple abstraction, the activation. The idea of an activation of procedure P is that it approximates the state the machine will be in when control returns to P. The approximation is not completely accurate because other procedures may change the global store or P's stack variables before control returns to P. At the machine level, the activation corresponds to the ``abstract memory'' of [cite ramsey:thesis, Chapter 3], which gives the contents of memory, including P's activation record (stack frame), and of registers. The contents of volatile registers are undefined, and such registers are treated as unavailable.
The activation abstraction hides machine-dependent details and raises the level of abstraction to the C-- source-code level. In particular, the abstraction hides:
In the C-- run-time interface, an activation record is represented by an ``activation handle,'' which is a value of type activation. Arbitrary registers and memory addresses are represented by variables, which are referred to by number. Values of the program counter are restricted to safe points.
[*] The procedures in the C-- run-time interface are:
Names bound by stack declarations are considered variables for purposes of FindVar, even though they are immutable. For such names, FindVar returns the value that the name has in C-- source code, i.e., the address of the stack-allocated block of storage. Storing through this address is meaningful; it alters the contents of the activation record a. Stack locations are not subject to liveness analysis.
We use result_index, not the number of the variable that will receive the result, because we expect that doing so will significantly reduce the amount of information that must be saved by the back end. For example, under a C-like calling convention, FindReturnLocation(a, 0) might always return the same register.
Notice that, unlike FindVar, FindReturnLocation only applies to the top-most activation --- hence we pass a pointer to the thread-control block, rather than an activation handle. For other activations it is nonsense to talk of the return locations, whereas for the topmost activation it makes perfect sense; for example, the first return value might be returned in register 1, which is held in the thread-control block.
| InitTCB(t, n, pc) | Initialises a thread-control block t of size n, with initial program counter pc. |
| Resume(t) | Resumes C-- thread t. |
| GetDescriptor(&a, token) | Returns pointer to data block associated with the smallest span tagged with token and containing the point where activation a is suspended. |
| FindVar(a, v) | Returns pointer to mutable location containing value of parameter or local variable v in activation a. |
| FindDeadVar(a, v) | Returns pointer to location containing last known value of v. |
| FirstActivation(t, &a) | Sets a to ``currently executing'' activation of thread t. |
| NextActivation(&a) | Mutates a to point to the activation to which a will return (normally a's caller). |
| SetActivation(t, a) | Mutates thread-control block so thread t will resume execution with activation a. |
| SetContinuation(t, k) | Mutates thread-control block so thread t will resume execution at its k'th continuation. |
| FindReturnLocation(t, n) | Returns a pointer to the location in which the n'th return value will be returned to thread t. |
| SetSafePoint(t, sp) | Sets the safe point at which the thread will resume. |
| ExecuteToNextSafePoint(t) | Runs thread t until it reaches a safe point. |
Table [<-] summarises the C-- run-time interface.
[*] [*] Can spans, multiple return continuations, and the C-- run-time interface can be implemented efficiently? By sketching a possible implementation, we argue that they can. Because the implementation is private to the back end and the back-end run-time system, there is wide latitude for experimentation. Any technique is acceptable provided it implements the semantics above at reasonable cost. We argue below that well-understood techniques do just that.
The span mappings of Section [<-] take their inspiration from table mappings for exception handling, and they can be implemented in similar ways [cite chase:exn-I]. The most common way is to use tables sorted by program counter. If suitable linker support is available, tables for different tokens can go in different sections, and they will automatically be concatenated at link time. Otherwise, tables can be chained together (or consolidated) by an initialisation procedure called at program initialisation time.
Walking a stack [*]The activation handle points to an activation record, which may contain values of some local variables. Other local variables may be stored in callee-saves registers, in which case their values are not saved in the current activation record, but in the activation records of one or more called procedures. These activation records can't be determined until run time, so the stack walker incrementally builds a map of the locations of callee-save registers, by noting the saved locations of each procedure.
In our sketch implementation, the call stack is a contiguous stack of activation records. An activation handle is a static record consisting of a pointer to an activation record on the stack, together with pointers to the locations containing the values that the non-volatile registers [The non-volatile registers are those registers whose values are unchanged after return from a procedure call. They include not only the classic callee-saves registers, but also registers like the frame pointer, which must be saved and restored but which aren't always thought of as callee-saves registers.] had at the moment when control left the activation record (Figure [<-]). FirstActivation initialises the activation handle to point to the topmost activation record on the stack and to the locations in the thread-control block that hold the values of registers. Depending on the mechanism used to suspend execution, the thread-control block might hold values of all registers or only of non-volatile registers, but this detail is hidden behind the run-time interface. [cite ramsey:thesis] discusses retargetable stack walking in Chapters 3 and 8.
The run-time system executes only when execution of C-- procedures is suspended. We assume that the front-end run-time system uses one of the techniques mentioned in Section [<-] to ensure that C-- execution is suspended only at safe points. For each safe point, the C-- code generator builds a statically-allocated activation-record descriptor that gives:
The C-- runtime can map activations to descriptors using the same mechanism it uses to implement the span mappings of Section [<-]. The run-time interface can cache these descriptors in the activation handle, so the lookup need be done only when NextActivation is called, i.e., when walking the stack. An alternative that avoids the lookup is to store a pointer to the descriptor in the code space, immediately following the call, and for the call to return to the instruction after the pointer. The SPARC C calling convention uses a similar trick for functions returning structures [cite sparc:architecture, Appendix D].
The details of descriptors and mapping of activations to descriptors are important for performance. At issue is the space overhead of storing descriptors and maps, and the time overhead of finding descriptors that correspond to PCs. [cite liskov:exception] suggests that sharing descriptors between different call sites has a significant impact on performance. Luckily, these details are private between the back end and the back-end run-time system, so we can experiment with different techniques without changing the approach, the run-time interface, or the front end.
This section explains how the C-- run-time interface might be used to help implement a garbage collector, an exception mechanism, a debugger, and a threads package. The first three all use the same technique to walk the stack: allocate an activation handle, initialize it with FirstActivation, and call NextActivation to move from one activation record to the next.
Our primary concern is how the collector finds, and possibly updates, roots. Other tasks, like finding pointers in heap objects and compacting the heap, can be managed entirely by the front-end run-time system (allocator and collector) with no support from the back end. C-- takes no responsibility for heap pointers passed to code written in other languages. It's up to the front end to pin such pointers or to negotiate changing them with the foreign code.
To help the collector find roots in global variables, the front end can arrange to deposit the addresses of such variables in a special data section. To find roots in local variables, the collector must walk the activation stack. For each activation handle a, it calls GetDescriptor(&a, GC) to get the garbage-collection descriptor deposited by the front end. The descriptor tells it how many variables there are and which contain pointers. For each pointer variable, it gets the address of that variable by calling FindVar. If the result is non-NULL, the collector marks or moves the object the variable points to, and it may redirect the variable to point to the object's new location. Note that the collector need not know which variables were stored on the stack and which were kept in callee-saves registers; FindVar provides the location of the variable no matter where it is. Figure [->] shows a simple collector based on [cite appel:simple], targeted to the C-- run-time interface and the descriptors shown in Section [<-].
struct gc_descriptor {
unsigned var_count;
char heap_ptr[1];
};
void gc(void) {
activation a;
FirstActivation(tcb, &a);
for (;;) {
struct gc_descriptor *d = GetDescriptor(&a, GC);
if (d) {
int i;
for (i = 0; i < d->var_count; i++)
if (d->heap_ptr[i]) {
typedef void *pointer;
pointer *rootp = FindVar(a, i);
*rootp = gc_forward(*rootp); /* copying forward as per Appel */
}
}
if (!NextActivation(&a))
break;
}
gc_copy(); /* from-space to to-space, as per Appel */
}
Part of a simple copying garbage collector
[*]
A more complicated collector might have to do more work to decide which variables represent heap pointers. TIL is the most complicated example we know of [cite tarditi:til]. In TIL, whether a parameter is a pointer or not may depend on the value of another parameter. For example, a C-- procedure generated by TIL might look like this:
f( word32 ty, word32 a, word32 b ) { ... }
The first parameter, ty, is a pointer to a heap-allocated type record.
It is not statically known, however, whether a is a heap pointer.
At run time, the
first field of the type record that ty points to describes whether a is a pointer.
Similarly, the second field of the type record describes whether b is a
pointer.
To support garbage collection, we attach to f's body a span that points to a statically allocated descriptor, which encodes precisely the information in the preceding paragraph. How this encoding is done is a private matter between the front end and the garbage collector; even this rather complicated situation is easily handled with no further support from C--.
Garbage collection is invoked, via a call to yield, when the allocator runs out of space. For example, here is how the code might look if allocation takes place in a single contiguous area bounded by heap_limit:
f( word32 a,b,c ) {
while (hp+12) > heap_limit {
yield( GC ); /* Need to GC */
}
hp = hp+12;
...
}
[*] [*] Exceptions have a rich history, and many models of exceptions and exception handling have been proposed and implemented [cite goodenough:exception, liskov:exception, drew:exception]. Exceptions may be associated with handlers statically, by source-language constructs, or dynamically, by calls to primitives like the POSIX sigaction. The raising of an exception transfers control to a handler, which may terminate, resume, or retry the code that raised the exception. Finally, there are synchronous exceptions, which occur only at well-defined points in the program, and asynchronous exceptions, which may occur at arbitrary points.
This paper addresses exceptions that are associated with handlers statically; by definition, dynamically associated handlers need no compile-time support. The C-- run-time interface supports termination and retry models, but not the resumption model. It also supports synchronous exceptions, and possibly asynchronous exceptions via ExecuteToNextSafePoint (Section [<-]).
C-- does not enforce a particular model of or semantics for exceptions; instead, C-- provides hooks that enable different front-end run-time systems to implement different high-level exception semantics. These hooks do not impose undue overhead on the normal case. (The designers of Modula-3 [cite cardelli:m3] urge implementors to spend 10,000 instructions in the exceptional case to save 1 instruction in the normal case.) We illustrate the idea of using a single set of hooks to implement various high-level exception semantics by sketching how one might implement exception dispatchers for Modula-3, Eiffel, and Exceptional C.
PROCEDURE TryAMove() =
BEGIN
TRY
makeMove(getMove(player));
next := (next + 1) MOD NUMBER(players);
EXCEPT
| IllegalMove(why) => player.illegalmove(why);
| NoMoreTiles => player.illegalmove("not enough tiles left");
END;
INC(movesTried);
END TryAMove;
Example Modula-3 procedure
[*]
Figure [<-] shows a fragment from a game-playing program written Modula-3. Modula-3 uses TRY-EXCEPT-END to show handlers and their scopes. The statement sequences to the right of the arrows (=>) are handlers for the exceptions IllegalMove and NoMoreTiles. If either of these exceptions is raised anywhere between TRY and EXCEPT, control transfers to the appropriate handler. Otherwise, after the assignment to next, control skips directly from EXCEPT to END. After execution of a handler, control also transfers to END.
void TryAMove() {
word32 s, t;
span EXN ex1 {
t = getMove(player) also returns s to {goto H1;} also returns to {goto H2;};
makeMove(t) also returns s to {goto H1;} also returns to {goto H2;};
t = word32[players]; /* load size of array from its descriptor */
next = (next + 1) mod t;
}
finish:
movesTried = movesTried + 1;
return;
H1:
t = word32[word32[player]+12]; /* load address of illegalmove method */
t (s);
goto finish;
H2:
t = word32[word32[player]+12]; /* load address of illegalmove method */
t (lit1);
goto finish;
}
data {
lit1 : word8[22] "not enough tiles left\0";
ex1 : align 4;
word32 2; /* two handlers in scope */
word32 Exn_IllegalMove;
word32 0; /* assign argument to variable 0 (s) */
word32 Exn_NoMoreTiles;
word32 -1; /* no argument */
}
C-- implementation of Modula-3 TryAMove.
[*]
This code might be translated into the C-- procedure shown in Figure [<-]. The data section contains both a string literal and an ``exception descriptor'' that shows which exceptions are handled in the C-- span tagged with ex1.
To see how exception dispatch works, let us suppose that getMove terminates normally, but makeMove discovers that the move cannot be made because there would be no more tiles. makeMove would contain the Modula-3 statement
RAISE IllegalMove("Your play goes off the board");
which might be translated into a yield to awaken the system
thread, using the yield code to request exception handling service.
The details of the particular exception would be pushed onto a global
``exception stack.''
push_exn_info(Exn_MoMoreTiles, lit19); yield( EXCEPTION );The system thread would invoke the exception dispatcher. The dispatcher would in turn call FirstActivation(tcb, &a) to get the activation handle for makeMove, and then call GetDescriptor(&a, EXN) to find handlers. GetDescriptor might return NULL if, for example, makeMove contained no handlers. The dispatcher would then call NextActivation(&a) to get the next frame. This time, GetDescriptor(&a, EXN) would return a pointer to ex1, and the dispatcher would identify handler 0 as the handler for the Modula-3 exception IllegalMove. It would then use SetActivation(tcb, &a) to establish a as the activation to resume, and SetContinuation(tcb, 1) to cause resumption at the first alternate continuation. Then it would use FindReturnLocation(tcb, 0) to find the location to which to assign the value stored on the exception stack as the result to return.
Figure [->] shows a simple dispatcher implemented in C. A real dispatcher for Modula-3 would be more complicated, because it would have to provide for finalization (TRY-FINALLY-END), for handlers that receive multiple exceptions, and for better recovery from unhandled exceptions. The dispatcher included with DEC SRC Modula-3 even includes performance optimizations, such as efficient finalization of locks.
struct exn_descriptor {
int handler_count;
struct { void *exn_tag; int arg_number; } handlers[1];
}
void dispatcher() {
activation a;
void *exn_tag, *argument;
pop_exn_info(&exn_tag, &argument);
FirstActivation(tcb, &a);
for (;;) {
struct exn_descriptor *d;
d = GetDescriptor(&a, EXN);
if (d) {
int i;
for (i = 0; i < d->handler_count; i++)
if (d->handlers[i].exn_tag == exn_tag) {
SetActivation(tcb, &a); /* unwind stack */
SetContinuation(tcb, i+1); /* choose handler */
if (d->handlers[i].arg_number >= 0) { /* exn expects value */
void *result;
result = FindReturnLocation(tcb, d->handlers[i].arg_number);
*result = argument; /* Assign result */
}
return; }
}
if (!NextActivation(&a))
abort(); /* unhandled exception: dump core */
}
}
A simplified exception dispatcher for Modula-3
[*]
The Eiffel language [cite meyer:eiffel] provides a somewhat different exception model, but one that can be implemented using the same C-- mechanisms. In Eiffel, exception handlers may not be attached to arbitrary sequences of statements, but only to whole procedures (called routines). Multiple handlers are not permitted; a handler, if present, receives all possible exceptions. Finally, handlers may not terminate normally, but must either retry the execution of their routines or terminate their routines and propagate the exception up the call stack. The high-level semantics are significantly different from Modula-3 semantics, but the implementation in C-- is almost the same. The new element is the retry, which can be implemented in C-- by a simple branch (goto) to the beginning of the routine. Propagating the exception requires re-invoking the run-time exception dispatcher, which is also possible in Modula-3.
Exceptional C [cite gehani:exceptional] offers a limited form of asynchronous exceptions with resumption. In Exceptional C, handlers for asynchronous exceptions do not have access to the local variables of the functions in which they appear; such handlers have access only to global variables. Such handlers can be compiled into C-- procedures, just as they are compiled into C functions in Gehani's implementation. Because the handler does not touch the state of the suspended C-- procedure, it can resume that procedure even when the procedure is suspended at an unsafe point. Unfortunately, the C-- run-time interface provides no general way to find the handler or to implement Exceptional C's retry or next. (Retry resumes execution at the beginning of the block protected by the handler, and next resumes execution at the end of the block protected by the handler.) The problem is that, if an asynchronous exception occurs at an arbitrary point, it is unsafe to call NextActivation to search for the handler---think about interrupting a procedure that is in the middle of saving callee-saves registers. If exceptions are truly asynchronous, all these problems can be solved by guaranteeing that ExecuteToNextSafePoint terminates in bounded time, and C-- can be used for the implementation.
[*] Debugging is not a solved problem. Questions remain in how compilers should support debuggers, how debuggers can be made retargetable, and how debuggers interact with optimizers. This paper does not address these questions. The paper does argue that the C-- run-time interface provides adequate support for parts of debuggers that are well understood, and that it is compatible with some simple but useful optimizations. Moreover, we illustrate how C-- can be used to express some tradeoffs between flexibility for the optimizer and power in the debugger.
CAVEAT: This is the weakest section in the paper. The main issues are (1) that named safe points seem redundant with spans, and (2) it's not at all clear what restrictions on the optimizer are needed even to do something as simple as set a breakpoint. We certainly don't have answers to the more complicated questions, e.g., suppose I don't want to pay the performance penalty of having a safepoint at every source-level statement. In this case, how do I set a breakpoint at a statement, and what can I assume about the machine's state when the breakpoint is taken?
The tasks of a source-level debugger are
Much of the information the debugger needs is known to the front end. Although many current compilers use specialized ``stabs'' or other machine-dependent formats to communicate these ``debugging symbols'' to the debugger, such specialized support is not necessary; ordinary initialized data is sufficient. It is helpful, but not necessary, to have some link-time support so this data can be put in a separate section and not automatically loaded into the target program's memory. The exact format of this data is a private matter between the front end and the debugger; we note in passing that this format need not be machine-dependent [cite ramsey:retargetable].
As an example of supporting a debugger using C--, this paper sketches a scheme derived from [cite ramsey:thesis, Chapter 4]. Debugging symbols emitted by a high-level front end might include
The tables record the location and type of each variable. Locations of global variables are placed directly in the debugging symbols; locations of local variables are given as indices and looked up at debug time using FindVar. The type contains the information needed to print the value of the variable and to evaluate expressions involving that variable.
To show active procedures and where they are suspended, the debugger would use procedures FirstActivation and NextActivation, and also the source-code locations stored in the safe-point spans.
To plant and remove breakpoints, a debugger must be able to map source-code locations to object-code locations. Typical debuggers use a two-level strategy, mapping a source file to a list of procedures, and each procedure to a list of safe points. They then search the safe points for the one nearest the specified source location.
CAVEAT: The problem with setting breakpoints at safepoints is that for them to be of any use at all, one must restrict the optimizer. We haven't yet figured out a good set of restrictions.
A debugger can use SetSafePoint to change the flow of control in a suspended procedure. As ever, arbitrary changes in the flow of control are unsafe: for example, what does it mean to transfer control to the middle of a sequence of instructions used to restore callee-saves registers in a procedure epilog? As its name suggests, SetSafePoint supports only transfer of control between safe points.
Even if control is transferred only from one safe point to another, the back end may have made different assumptions about the states of C-- variables at different safe points. For example, what if local variable x is currently held in a register, but at the new PC, x is expected to be on the stack? Worse, what if local variable y is dead, but the debugger changes to a safe point at which y is live? Some compilers and debuggers handle this problem by making worst-case assumptions at code-generation time, e.g., by putting all variables on the stack at every safe point in the procedure. The C-- run-time interface makes such pessimistic assumptions unnecessary. Before the debugger calls SetSafePoint, it must consult FindVar to discover, and separately record, the values of all live variables. After calling SetSafePoint, must again consult FindVar to find the locations of all variables live at the new safe point, and store the values it previously extracted. In the example above, this sequence of calls would copy the value of x from its register location to its stack location. Furthermore, if the debugger discovers that the new safe point is associated with a live variable that is currently dead, it must supply a new value for the variable, perhaps by consulting the user.
Even when supporting a debugger, C-- gives the back end some freedom to optimize code, and the front end some ability to trade off debuggability and optimization. Given the specification of SetSafepoint, the back end is free to put variables in registers, to split live ranges, and to use state-of-the-art register allocation [cite briggs:improvements, george:iterated]. The back end is also free to reorder spans, to break up spans, and to use global-optimization techniques within spans. (The optimizer must not move code across span boundaries.) If the relevant variables are marked as invariant, the optimizer can save common subexpressions across calls and other safe points.
The front end has two ways of controlling optimization. Marking variables as invariant across calls gives the optimizer more opportunities for common-subexpression or partial-redundancy elimination, at the cost of preventing a garbage collector or debugger from changing these variables. Defining larger spans and reducing the number of safe points gives the optimizer larger regions in which to operate and more opportunities for optimization, at the cost of providing less precise information and control at debug time.
[*] Section [<-], above, describes C--'s support for very lightweight threads. Here we briefly outline how the interface described there can be used to support a simple, conventional threads package. As ever, our goal is to provide the minimal hooks to make a threads package possible; the bulk of the implementation should be in the front end runtime, not the C-- runtime.
When forking a thread, the front-end runtime allocates a thread-control block, which includes space for a stack. It initializes the block by passing to InitTCB a pointer that actually points a few words past the start of the allocated area. These initial few words are under the complete control of the front end runtime, to use for any per-thread data it wishes. We call this area the thread-local data block. The front-end runtime starts the thread by calling Resume. We expect it puts a pointer to the thread-local data block in a known place, and the generated C-- code probably moves that pointer to a register.
Since threads are not interrupted asynchronously, mutual exclusion is not an issue. When a thread wants to synchronise with another thread or block awaiting some event, it simply mutates the state of the appropriate semaphore (or whatever), and yields to the front-end runtime with a suitable yield code. All of this can be done with a call to a support library written in C--, and provided along with the front-end runtime.
This model of concurrency may support a weak form of preemption by allowing ``preempted'' threads to continue to the next safe point. Front ends can ensure atomicity by keeping safe points out of critical sections. A similar strategy was used with Argus, made cheaper by forcing safe points to coincide with stack-limit checks [cite liskov:argus]. This style of concurrency is very cheap. The operating system is not involved, no protection boundaries are crossed, and shared data structures can be manipulated without OS-level synchronization. The resulting system can have thousands of threads.
[*] Most high-level threads packages will wish to pass arguments to threads, and to have threads return results upon termination. This can be done by putting a closure in thread-local data, and by passing something like the following function to InitTCB.
extern word32 "register 7" localdata;
extern word32 initial_localdata;
void run_a_closure () {
word32 f, arg, answer, closure;
localdata = initial_localdata;
closure = word32[localdata+closure_slot];
f = word32[closure];
arg = word32[closure+4];
answer = f(arg);
word32[localdata+answer_slot] = answer;
yield(I_HAVE_FINISHED);
}
Multiplexing lots of C-- threads onto one operating-system thread is all very well, but if a system call blocks, then all C-- threads block. A standard solution for this well-known problem is to use non-blocking I/O exclusively, but not all I/O libraries can be made to use non-blocking I/O.
If the operating system provides multiple threads, an alternative solution is possible. When a C-- thread wants to make a call that may block, it builds a data structure describing the call. Then it yields to the front-end runtime, with a code indicating that it wants a possibly-blocking call to be made. The front-end runtime keeps a pool of OS threads handy, and sends the request to it. Meanwhile, it blocks the C-- thread, and schedules another. When the OS thread completes the call, and the front end runtime is next awakened, it re-schedules the original C-- thread.
What should happen if the stack of a C-- thread overflows? Since the stack is under the control of C--, it must be C-- that detects stack overflow. This is primarily an implementation issue, but it bears on the run-time interface because the C-- runtime must be able to ask the front-end runtime to allocate additional stack space. It should do so by arranging to yield to the front end with a suitable yield code. This way the front end can run the garbage collector if it needs to. (It may be necessary to allocate a small amount of space statically to ensure that there is room enough to save the thread's state in case of stack overflow.) This aspect of C--'s design is not fully thought out.
What are the alternatives to a portable assembly language? One possibility is to use existing retargetable code generators. In the introduction, we mentioned that such code generators are complex, language-specific, and often poorly documented. They are also surprisingly few:
An alternative to using existing code generators is to use existing virtual machines.
Given that we are designing a portable assembly language, we considered several alternative mechanisms to support high-level run-time services, such as garbage collection, exception handling, and debugging.
It would be a little easier to build thread packages if initial arguments could be passed to InitTCB, and if yield could return a result on thread termination. (There are also other cases in which it is useful for yield to pass a value as well as a code, e.g., when generated code uses yield to ask the front-end runtime to make a system call on its behalf.) The problem with passing such values in this way is that there are times when the values could be hidden inside the C-- run-time system, inaccessible to the garbage collector. We saw three possible solution