Frequently Asked Questions about C--
What machines does the compiler support?
As of May 2004, only the Pentium back end is as expressive as a C compiler. The Alpha and MIPS back ends can run “hello, world.” We are working on back ends for PowerPC (Mac OS X), ARM, and IA-64. Let us know what other platforms you are interested in.
How is C-- different from the JVM or the CLR?
A major purpose of C-- is to give you the power to make all the same choices about performance tradeoffs that you would get to make if you were building a custom code generator. This attitude distinguishes C-- from the Java Virtual Machine or Microsoft’s Common Language Runtime, which pre-package JIT compilation, type checking, class loading, garbage collection, exception dispatch, and much more besides. If the JVM is a mansion (great if you like its design), then C-- is the bricks (from which you can build all sorts of houses).
Why doesn’t the system include a garbage collector?
A garbage collector fixes many design choices, such as heap object layout and relocatability, allocation strategy, real-time bounds, and so on. If we made these choices for you, we would be guaranteed to make the wrong ones! Instead, C-- provides the hooks you can use to attach your own garbage collector. You write your garbage collector in C (a programming language), not C-- (an assembly language). At some point your collector will probably need to traverse the stack to find the live roots; you use the C interface to the C-- runtime system to do that.
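The division of labour described above can be sketched in C. This is a minimal sketch under loud assumptions: the `activation` struct and `next_activation` are hypothetical stand-ins invented for this example, not the actual interface of the C-- runtime system, which has its own names and derives root information from what the front end attaches to call sites. The part being illustrated is the collector-side loop.

```c
#include <stddef.h>

/* HYPOTHETICAL mock of a runtime interface, invented for this sketch.
   Each activation (suspended stack frame) exposes the slots that hold
   live heap pointers at its current call site. */
typedef struct activation {
    struct activation *caller; /* next-older frame; NULL at the stack bottom */
    size_t nroots;             /* number of live pointer slots */
    void **roots;              /* the frame's pointer-holding slots */
} activation;

/* Hypothetical runtime call: step from a frame to its caller. */
static activation *next_activation(const activation *a) {
    return a->caller;
}

/* Collector-supplied action for one root, e.g. mark or forward. It gets
   the slot's address so a moving collector can update the pointer. */
typedef void (*root_visitor)(void **slot, void *env);

/* Collector side: visit every root in every frame, youngest first.
   Returns the number of slots visited. */
static size_t scan_stack_roots(activation *top, root_visitor visit, void *env) {
    size_t visited = 0;
    for (activation *a = top; a != NULL; a = next_activation(a)) {
        for (size_t i = 0; i < a->nroots; i++) {
            visit(&a->roots[i], env);
            visited++;
        }
    }
    return visited;
}

/* Example visitor: count non-NULL roots (a stand-in for marking). */
static void count_live(void **slot, void *env) {
    if (*slot != NULL)
        (*(size_t *)env)++;
}
```

A moving collector would use the same loop with a visitor that writes the forwarded address back through `slot`; the loop does not care which policy the collector implements, which is the point of leaving the collector to the client.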
Relationship to C
Why isn’t C-- a superset of C?
- C is a programming language designed for human programmers, whereas C-- is a compiler target language. So C has many things that are entirely unnecessary for C--; requiring a C-- compiler to support them would be exceptionally burdensome. Most notably, C has an elaborate type system that we don’t need for code generation (struct, union, prototype, and all that).
- There are a few features in C that are actually incompatible with C--, most notably support for varargs procedures (that is, procedures with a variable number of arguments). It seems impossible to have both C-style varargs and efficient, fully-general tail calls (which C-- must have).
- C-- deliberately provides different notation for many things that C can do. For example, where C would have “*p”, we write “bits32[p]” in C--.
- The C standard leaves too much up to the implementation, including the representations of structures, the sizes of the built-in types, and the meanings of the operators. C operators can have side effects, and the order of their evaluation is unspecified. Implementing other languages on top of C-- requires finer control. For example, Modula-3 requires division that rounds towards minus infinity, not towards zero. Standard ML requires arithmetic operations that detect overflow.
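The last point can be made concrete in C itself. Since C99, integer `/` truncates toward zero, so a Modula-3 front end that targets C must correct the quotient by hand; and signed overflow is undefined behaviour in C, so a Standard ML front end must test before adding rather than inspect a flag afterwards. A minimal sketch (the function names are our own):

```c
#include <limits.h>
#include <stdbool.h>

/* Quotient rounded toward minus infinity (Modula-3 DIV semantics).
   C's / truncates toward zero, so subtract one when the division is
   inexact and the operands have different signs.
   Caller must ensure b != 0 and not (a == INT_MIN && b == -1). */
static int floor_div(int a, int b) {
    int q = a / b;
    if (a % b != 0 && ((a < 0) != (b < 0)))
        q -= 1;
    return q;
}

/* Signed add that reports overflow instead of invoking undefined
   behaviour (a stand-in for raising SML's Overflow exception).
   Returns true and stores the sum on success; false on overflow. */
static bool checked_add(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

For example, `floor_div(-7, 2)` is -4, whereas C’s `-7 / 2` is -3.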
OK, then why isn’t C-- a subset of C?
For efficient compilation of modern languages, we need features that C just doesn’t provide efficiently:
- Ability to return multiple values in registers
- Optimized tail calls to any procedure
- Global variables bound to registers
- Ways to tie garbage-collection information to particular program points
- Support for exceptions
- Support for lightweight concurrency
The latter three are the real killers: C-- provides a run-time interface that allows the state of a suspended C-- computation to be inspected and modified at runtime. C has no equivalent for this.
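To see what the missing tail calls cost in practice: the C standard does not require a compiler to turn a tail call into a jump, so a long chain of tail calls can overflow the stack. Front ends that target C therefore often fall back on a trampoline, paying an extra return and indirect call per step. A minimal sketch of that workaround, using mutually tail-recursive even/odd procedures (all names are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

/* A trampolined step either produces a final answer or names the
   next procedure to call, together with its argument. */
struct step;
typedef struct step (*proc)(unsigned long n);

struct step {
    proc next;          /* NULL means "done" */
    unsigned long arg;  /* argument for next */
    bool result;        /* meaningful only when next == NULL */
};

static struct step is_odd_step(unsigned long n);

/* is_even(n) = n == 0 ? true : is_odd(n - 1), returned as a step
   instead of being performed as a call. */
static struct step is_even_step(unsigned long n) {
    if (n == 0) return (struct step){ NULL, 0, true };
    return (struct step){ is_odd_step, n - 1, false };
}

/* is_odd(n) = n == 0 ? false : is_even(n - 1) */
static struct step is_odd_step(unsigned long n) {
    if (n == 0) return (struct step){ NULL, 0, false };
    return (struct step){ is_even_step, n - 1, false };
}

/* The trampoline: a loop that makes each "tail call" on the callee's
   behalf, so the C stack stays flat however many steps run. */
static bool trampoline(proc f, unsigned long n) {
    struct step s = { f, n, false };
    while (s.next != NULL)
        s = s.next(s.arg);
    return s.result;
}
```

`trampoline(is_even_step, 10000001)` runs ten million steps in constant C stack space; a directly recursive version would depend on the C compiler choosing to optimize its tail calls.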
How is C-- different from UNCOL?
The goal of UNCOL was to provide a universal intermediate language. The goal of C-- is more modest: to encapsulate what code generators already do well. We don’t claim you could do any more with C-- than you could do with a standard code generator; we’re just trying to make it easier to do those things.
A key distinguishing feature of C-- is the data model. Put simply, C-- has no high-level types; it does not even distinguish floating-point variables from integer variables. This model gives the front end total control of representation and type system, which is quite different from an UNCOL. This data model also helps distinguish C-- from systems like the Java Virtual Machine and the Microsoft Common Language Runtime.
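Using C as a stand-in, the data model can be illustrated like this: a value is just a bit vector of some width, and it is the operation applied to it that decides whether those bits mean an integer or a floating-point number. The sketch below (names our own) moves the same 64 bits between the two views; it assumes `double` is IEEE 754 binary64, which is typical but not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* This sketch assumes double is an 8-byte IEEE 754 binary64 value. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Reinterpret the bits of a double as a 64-bit word; no numeric conversion. */
static uint64_t bits_of_double(double d) {
    uint64_t w;
    memcpy(&w, &d, sizeof w);
    return w;
}

/* Reinterpret a 64-bit word as a double; again, no numeric conversion. */
static double double_of_bits(uint64_t w) {
    double d;
    memcpy(&d, &w, sizeof d);
    return d;
}
```

Adding `UINT64_C(1) << 52` to the bits of 1.5 with an ordinary integer add gives the bits of 3.0, because it increments the exponent field. In C-- terms no `memcpy` is needed, since a variable declared as `bits64` never carried an integer or floating-point type in the first place.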
Why is the syntax of C-- so C-like?
We expect that compiler writers will have to read a lot of C-- while they’re debugging their front ends. Many compiler writers have significant experience reading low-level C code; making the syntax C-like helps them benefit from this experience. There are a number of syntactic tweaks in C-- that make it easier to generate than C; for example, every operator has a prefix form, so it’s not necessary to use infix operators.
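The payoff of prefix forms is on the generating side: a front end can print any expression tree with one recursive rule and no knowledge of precedence or parenthesization. A minimal sketch in C; the tree type is hypothetical, and the spellings `%add` and `%mul` are assumed to follow C--’s `%name(...)` convention for primitive operators.

```c
#include <stdio.h>
#include <string.h>

/* A hypothetical expression tree: a leaf (variable or literal) or a
   binary operator applied to two subtrees. */
typedef struct expr {
    const char *op;               /* e.g. "add", "mul"; NULL for a leaf */
    const char *leaf;             /* leaf text when op == NULL */
    const struct expr *lhs, *rhs; /* operands when op != NULL */
} expr;

/* Append the prefix form of e to buf, which must already hold a
   NUL-terminated string (start with ""). One rule covers every
   operator: "%op(lhs, rhs)". No precedence table is consulted and no
   parenthesization decisions are made. Output is truncated if buf is
   too small. */
static void emit_prefix(const expr *e, char *buf, size_t cap) {
    size_t len = strlen(buf);
    if (e->op == NULL) {
        snprintf(buf + len, cap - len, "%s", e->leaf);
        return;
    }
    snprintf(buf + len, cap - len, "%%%s(", e->op);
    emit_prefix(e->lhs, buf, cap);
    len = strlen(buf);
    snprintf(buf + len, cap - len, ", ");
    emit_prefix(e->rhs, buf, cap);
    len = strlen(buf);
    snprintf(buf + len, cap - len, ")");
}
```

For the tree of x + y * 2, this produces `%add(x, %mul(y, 2))`. An infix printer would need to know that multiplication binds tighter than addition to decide whether parentheses are required.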
Why is the domain name not c--.org?
The Evil Empire refuses even to consider registering a domain name that ends with a dash (DNS host-name rules do not allow a label to end with a hyphen).
Where is the source code?
What about that other C--?
C-- is such a good name that others have also used it.
- Peter Cellik developed Sphinx C-- as a programming language for the x86 only; it is like a mix of C and x86 assembly language. The project was taken over by Michael Sheker. Barry Kauler has more information about Sphinx C-- and a translation of the documentation into English.
- The Riverside Intermediate Format uses a source language called C--, which is a subset of GNU C.
- One of the intermediate codes of the Objective Caml compiler is called C--. Simon Peyton Jones has been suspected of stealing the name from this source.
- David Holdsworth and colleagues are using the name C-- for an emulation platform.
Is this a secret Microsoft project?
Does C-- provide the %xyz primitive?
Eventually we will divide the primitive operations into two categories:
- Required operations, which must be supported by every implementation in at least one size.
- Optional operations, which, if supported, must have the standard semantics.
The picture about sizes is less clear. For example, although we expect to require every implementation to implement two’s-complement add, we can’t imagine requiring every implementation to implement 32-bit two’s-complement add.
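One way to see why the operation can be specified once while the sizes vary: n-bit two’s-complement addition is addition modulo 2^n, and the result bits are the same whether the operands are read as signed or unsigned. A sketch in C for widths 1 to 64 (the function name is our own); it also shows that an implementation with a wide adder can provide narrower widths by masking.

```c
#include <stdint.h>

/* n-bit two's-complement add: the low n bits of the true sum.
   Valid for 1 <= n <= 64. Unsigned arithmetic in C wraps modulo 2^64,
   so the only extra step is discarding bits above position n-1. */
static uint64_t add_bits(unsigned n, uint64_t x, uint64_t y) {
    uint64_t mask = (n == 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
    return (x + y) & mask;
}
```

`add_bits(8, 0x7F, 1)` yields 0x80, which is 128 read as unsigned and -128 read as signed; one definition per width serves both readings.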
The reason for adding all known operations to C-- as standard opcodes is to encourage different implementations to use the same name for the same operation. Eventually we hope to have a “register a new primitive operator” page at cminusminus.org.
C-- has no primitive operators that return multiple results. But my target machine has a single instruction that performs both quotient and remainder. What do I do?
Ideally, the C-- compiler would spot separate %quot and %rem operations and combine them. But you might want to use a multiple assignment to communicate your intentions to the code generator, e.g.,
q, r = %quot(x, y), %rem(x, y);
We don’t have enough peephole optimization yet to know if this style will make a difference, but it can’t hurt.
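For comparison, standard C already offers a multiple-result interface for this case: `div` from `<stdlib.h>` returns both values in a `div_t` with members `quot` and `rem`, and whether that becomes a single machine instruction is, again, up to the compiler. A small wrapper (the name `quot_rem` is our own) mirroring the multiple assignment above:

```c
#include <stdlib.h>

/* Quotient and remainder from one call. div truncates toward zero,
   so x == q * y + r always holds. Caller must ensure y != 0. */
static void quot_rem(int x, int y, int *q, int *r) {
    div_t d = div(x, y);
    *q = d.quot;
    *r = d.rem;
}
```

`quot_rem(-17, 5, &q, &r)` sets q to -3 and r to -2.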