Glasgow Haskell Compiler
The Glasgow Haskell Compiler (GHC) is an open-source native code compiler for the functional programming language Haskell.[4] It provides a cross-platform environment for writing and testing Haskell code, and it supports numerous extensions, libraries, and optimizations that streamline the process of generating and executing code. GHC is the most commonly used Haskell compiler.[5] The lead developers are Simon Peyton Jones and Simon Marlow.
Original author(s) | Kevin Hammond |
---|---|
Developer(s) | The Glasgow Haskell Team[1] |
Initial release | December 1992[2] |
Stable release | 8.10.3 / 19 December 2020[3] |
Repository | |
Written in | Haskell and C |
Operating system | Linux, OS X 10.7 Lion and later, iOS, Windows 2000 and later, FreeBSD, Solaris 10 and later |
Platform | x86, x86-64, ARM |
Available in | English |
Type | Compiler |
License | New BSD License |
Website | www |
History
GHC began in 1989 as a prototype, written in LML (Lazy ML) by Kevin Hammond at the University of Glasgow. Later that year, the prototype was completely rewritten in Haskell, except for its parser, by Cordelia Hall, Will Partain, and Simon Peyton Jones. Its first beta release was on 1 April 1991. Subsequent releases added a strictness analyzer and a profiler, as well as language extensions such as monadic I/O, mutable arrays, unboxed data types, and concurrent and parallel programming models (such as software transactional memory and data parallelism).[2]
Peyton Jones and Marlow later moved to Microsoft Research in Cambridge, England, where they continued to be primarily responsible for developing GHC. GHC also contains code from more than three hundred other contributors.[1] Since 2009, third-party contributions to GHC have been funded by the Industrial Haskell Group.[6]
Architecture
GHC itself is written in Haskell,[7] but its runtime system, which every compiled Haskell program needs in order to run, is written in C and C--.
GHC's front end, which incorporates the lexer, parser and typechecker, is designed to preserve as much information about the source language as possible until after type inference is complete, so that it can give users clear error messages.[2] After type checking, the Haskell code is desugared into a typed intermediate language known as "Core" (based on System F, extended with let and case expressions). Core was later extended to support generalized algebraic datatypes in its type system, and is now based on an extension to System F known as System FC.[8]
In the tradition of type-directed compilation, GHC's simplifier, or "middle end", which performs most of GHC's optimizations, is structured as a series of source-to-source transformations on Core code. The analyses and transformations performed in this stage include demand analysis (a generalization of strictness analysis), application of user-defined rewrite rules (including a set of rules in GHC's standard libraries that performs foldr/build fusion), unfolding (called "inlining" in more traditional compilers), let-floating, an analysis that determines which function arguments can be unboxed, constructed product result analysis, and specialization of overloaded functions, as well as a set of simpler local transformations such as constant folding and beta reduction.[9]
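The foldr/build fusion mentioned above rests on a single equation, `foldr k z (build g) = g k z`, which eliminates an intermediate list. The sketch below checks that equation in ordinary Haskell. It defines its own `build'` instead of importing GHC's `build`. The names `upTo`, `sumUnfused`, and `sumFused` are hypothetical and chosen only for illustration. In GHC itself, a library rule performs this rewrite automatically during optimization, so no one writes it by hand.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A list "producer" abstracted over cons and nil (hypothetical name build').
build' :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build' g = g (:) []

-- Producer of the integers lo..hi, written against abstract cons/nil.
upTo :: Int -> Int -> (Int -> b -> b) -> b -> b
upTo lo hi cons nil = go lo
  where
    go i
      | i > hi = nil
      | otherwise = i `cons` go (i + 1)

-- Before fusion: build' materialises the list [1..n], then foldr consumes it.
sumUnfused :: Int -> Int
sumUnfused n = foldr (+) 0 (build' (upTo 1 n))

-- After applying foldr k z (build' g) = g k z: no intermediate list exists.
sumFused :: Int -> Int
sumFused n = upTo 1 n (+) 0

main :: IO ()
main = print (sumUnfused 100, sumFused 100)
```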
The back end of the compiler transforms Core code into an internal representation of C--, via an intermediate language called STG (short for "Spineless Tagless G-machine").[10] The C-- code can then take one of three routes. It can be printed as C code for compilation with GCC (current versions use this route mainly for unregisterised builds), converted directly into native machine code (the traditional "code generation" phase), or converted to LLVM intermediate representation for compilation with LLVM. In all three cases, the resulting native code is finally linked against the GHC runtime system to produce an executable.
Language
GHC complies with both language standards, Haskell 98[11] and Haskell 2010.[12] It also supports many optional extensions beyond the standard, along with libraries such as the software transactional memory (STM) library, which provides composable memory transactions.
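Here is a minimal STM sketch. It assumes the `stm` package that comes with GHC installations. The account-transfer scenario and the names `transfer` and `demo` are illustrative. The key point is that `orElse` composes two transactions: if the first one calls `retry` (here through `check`), the second one runs instead, and the combined choice commits atomically.

```haskell
-- Assumes the stm package (Control.Concurrent.STM) is available.
import Control.Concurrent.STM

-- Move `amount` from one account to another; if funds are insufficient,
-- `check` retries, i.e. this transaction blocks until the balance changes.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  check (balance >= amount)
  modifyTVar' from (subtract amount)
  modifyTVar' to (+ amount)

-- Composition: the first transfer would retry (b is empty), so orElse
-- falls through to the second; the whole choice commits atomically.
demo :: IO (Int, Int)
demo = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer b a 50 `orElse` transfer a b 30)
  (,) <$> readTVarIO a <*> readTVarIO b

main :: IO ()
main = demo >>= print
```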
Extensions to Haskell
A number of extensions to Haskell have been proposed. These extensions provide features not described in the language specification, or they redefine existing constructs. As such, a given extension may not be supported by every Haskell implementation. There is an ongoing effort[13] to describe extensions and select those which will be included in future versions of the language specification.
The extensions[14] supported by the Glasgow Haskell Compiler include:
- Unboxed types and operations. These represent the primitive datatypes of the underlying hardware, without the indirection of a pointer to the heap or the possibility of deferred evaluation. Numerically intensive code can be significantly faster when coded using these types.
- The ability to specify strict evaluation for a value, pattern binding, or datatype field.
- More convenient syntax for working with modules, patterns, list comprehensions, operators, records, and tuples.
- Syntactic sugar for computing with arrows and recursively defined monadic values. Both of these concepts extend the monadic do-notation provided in standard Haskell.
- A significantly more powerful system of types and typeclasses, described below.
- Template Haskell, a system for compile-time metaprogramming. A programmer can write expressions that produce Haskell code in the form of an abstract syntax tree. These expressions are typechecked and evaluated at compile time; the generated code is then included as if it were written directly by the programmer. Together with the ability to reflect on definitions, this provides a powerful tool for further extensions to the language.
- Quasi-quotation, which allows the user to define new concrete syntax for expressions and patterns. Quasi-quotation is useful when a metaprogram written in Haskell manipulates code written in a language other than Haskell.
- Generic typeclasses, which specify functions solely in terms of the algebraic structure of the types they operate on.
- Parallel evaluation of expressions using multiple CPU cores. This does not require explicitly spawning threads. The distribution of work happens implicitly, based on annotations provided by the programmer.
- Compiler pragmas for directing optimizations such as inline expansion and specializing functions for particular types.
- Customizable rewrite rules. The programmer can provide rules describing how to replace one expression with an equivalent but more efficiently evaluated expression. These are used within core data structure libraries to provide improved performance throughout application-level code.[15]
- Record dot syntax, which provides syntactic sugar, similar to that of many other programming languages, for accessing the fields of a (potentially nested) record. It is available since GHC 9.2 through the OverloadedRecordDot extension.[16]
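The sketch below illustrates the unboxed types from the list above. It writes a loop directly over `Int#` using the `MagicHash` extension and primitive operations from `GHC.Exts`. The function `sumSquares` is a hypothetical example. With optimization enabled, GHC's strictness analysis often produces an equivalent unboxed loop from ordinary boxed code, so hand-written code like this is usually reserved for hot spots.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Int#, isTrue#, (*#), (+#), (>#))

-- Sum of squares 1^2 + ... + n^2 computed by a loop whose counter and
-- accumulator are raw machine integers (Int#): no boxed Int, no thunks.
sumSquares :: Int -> Int
sumSquares (I# n) = I# (go 1# 0#)
  where
    go :: Int# -> Int# -> Int#
    go i acc
      | isTrue# (i ># n) = acc
      | otherwise = go (i +# 1#) (acc +# (i *# i))

main :: IO ()
main = print (sumSquares 1000)
```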
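Strict evaluation can be requested per field with `!` in a data declaration and per binding with the `BangPatterns` extension. The small sketch below uses the hypothetical names `Acc`, `mean`, and `sumTo`. It shows both forms preventing unevaluated thunks from piling up in an accumulator.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- Strict fields (!): evaluating an Acc to a constructor also evaluates both fields.
data Acc = Acc !Double !Int

-- Arithmetic mean. foldl' forces each intermediate Acc, and the strict
-- fields keep the running sum and count evaluated. Returns NaN for [].
mean :: [Double] -> Double
mean xs = finish (foldl' step (Acc 0 0) xs)
  where
    step (Acc s n) x = Acc (s + x) (n + 1)
    finish (Acc s n) = s / fromIntegral n

-- Bang pattern: the accumulator is evaluated on every iteration.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (mean [1, 2, 3, 4], sumTo 100)
```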
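The sketch below shows a few of the syntactic conveniences, each enabled by its own `LANGUAGE` pragma:

- `RecordWildCards` for records
- `TupleSections` for tuples
- `ParallelListComp` for list comprehensions
- `LambdaCase` for patterns

The choice of these four extensions is illustrative only, as are the names `Config`, `render`, `tagAll`, `numbered`, and `describe`.

```haskell
{-# LANGUAGE LambdaCase, ParallelListComp, RecordWildCards, TupleSections #-}

data Config = Config {host :: String, port :: Int}

-- RecordWildCards: Config {..} brings the fields host and port into scope.
render :: Config -> String
render Config {..} = host ++ ":" ++ show port

-- TupleSections: (n,) is shorthand for \x -> (n, x).
tagAll :: Int -> String -> [(Int, Char)]
tagAll n = map (n,)

-- ParallelListComp: the two generators are zipped rather than nested.
numbered :: [(Int, Char)]
numbered = [(i, c) | i <- [1 ..] | c <- "abc"]

-- LambdaCase: \case is a lambda that immediately pattern-matches its argument.
describe :: Maybe Int -> String
describe = \case
  Nothing -> "none"
  Just n -> "some " ++ show n

main :: IO ()
main = do
  putStrLn (render (Config "localhost" 8080))
  print (tagAll 1 "ab", numbered, describe (Just 3))
```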
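Both kinds of sugar for arrows and recursive monadic values can be sketched in a few lines:

- `proc` notation (the `Arrows` extension) wires values between arrows.
- `mdo` (the `RecursiveDo` extension) lets a monadic binding refer to its own result through `MonadFix`.

The definitions `addA` and `ones` are illustrative. Plain functions serve as the arrow, and `Maybe` serves as the monad.

```haskell
{-# LANGUAGE Arrows, RecursiveDo #-}

import Control.Arrow (Arrow, returnA)

-- Arrow notation: feed the same input to two arrows and add their outputs.
addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = proc x -> do
  y <- f -< x
  z <- g -< x
  returnA -< y + z

-- mdo: xs appears in its own definition; the knot is tied via mfix for Maybe,
-- giving the infinite list 1 : 1 : 1 : ..., of which we take three elements.
ones :: Maybe [Int]
ones = mdo
  xs <- Just (1 : xs)
  return (take 3 xs)

main :: IO ()
main = print (addA (+ 1) (* 2) 5, ones)
```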
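Here is a minimal Template Haskell sketch using the `Language.Haskell.TH` module from the `template-haskell` library. One splice computes a constant at compile time, and the other builds the syntax tree of a tuple selector. Both splices call only imported functions, because GHC's stage restriction forbids a splice from running functions defined in the same module. The names `kibibyte` and `second3` are hypothetical.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

-- Evaluated at compile time: the splice computes 2^10 and inserts the literal.
kibibyte :: Int
kibibyte = $(litE (integerL (2 ^ 10)))

-- The splice builds the AST of \(_, x, _) -> x; the generated lambda is then
-- typechecked against the signature as if it had been written by hand.
second3 :: (a, b, c) -> b
second3 =
  $( do
       x <- newName "x"
       lamE [tupP [wildP, varP x, wildP]] (varE x)
   )

main :: IO ()
main = print (kibibyte, second3 (1 :: Int, 'q', True))
```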
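GHC usually supports generic programming through `GHC.Generics`. Deriving `Generic` exposes a type's structure using a small set of building blocks:

- sums (`:+:`)
- products (`:*:`)
- fields (`K1`)
- constructors without fields (`U1`)
- metadata wrappers (`M1`)

The sketch below defines a hypothetical class `GCount` once over those building blocks. It can then count the fields of a value of any type that has a `Generic` instance.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, TypeOperators #-}

import GHC.Generics

-- Counts the fields of whichever constructor a value was built with.
class GCount f where
  gcount :: f p -> Int

instance GCount U1 where -- constructor with no fields
  gcount _ = 0

instance GCount (K1 i c) where -- exactly one field
  gcount _ = 1

instance (GCount f, GCount g) => GCount (f :*: g) where -- several fields
  gcount (a :*: b) = gcount a + gcount b

instance (GCount f, GCount g) => GCount (f :+: g) where -- choice of constructor
  gcount (L1 a) = gcount a
  gcount (R1 b) = gcount b

instance GCount f => GCount (M1 i c f) where -- metadata wrapper
  gcount (M1 x) = gcount x

-- Works for any type with a derived Generic instance.
fieldCount :: (Generic a, GCount (Rep a)) => a -> Int
fieldCount = gcount . from

data Shape = Circle Double | Rect Double Double
  deriving (Generic)

main :: IO ()
main = print (fieldCount (Circle 1), fieldCount (Rect 2 3), fieldCount ())
```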
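Parallel evaluation is typically annotated with `par` and `pseq`, which the `parallel` package exposes through `Control.Parallel`. The sketch below imports them from `GHC.Conc` in `base` instead, so it needs no extra package. `nfib` is a conventional toy benchmark, not an example taken from GHC. To actually use several cores, compile the program with `-threaded` and run it with `+RTS -N`. The result does not depend on how the work is scheduled.

```haskell
import GHC.Conc (par, pseq)

-- par sparks x so that an idle core may evaluate it; pseq makes this thread
-- evaluate y first, after which x + y + 1 is combined.
nfib :: Int -> Integer
nfib n
  | n < 2 = 1
  | otherwise = x `par` (y `pseq` (x + y + 1))
  where
    x = nfib (n - 1)
    y = nfib (n - 2)

main :: IO ()
main = print (nfib 25)
```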
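The sketch below shows compiler pragmas and rewrite rules together, using the hypothetical functions `square`, `sumSquares`, and `myMap`. `INLINE` and `SPECIALIZE` only affect how code is compiled. The `RULES` entry states the map-fusion law `myMap f (myMap g xs) = myMap (f . g) xs`, which GHC may apply when optimizing (for example with `-O`). GHC typechecks both sides of a rule but does not verify that they are equivalent, so a rule must preserve meaning. This program prints the same result whether or not the rule fires.

```haskell
-- INLINE: ask GHC to inline this small helper at its call sites.
square :: Num a => a -> a
square x = x * x
{-# INLINE square #-}

-- SPECIALIZE: also compile monomorphic copies for Int and Double, so that
-- calls at those types can avoid passing a Num dictionary.
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> square x + acc) 0
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}
{-# SPECIALIZE sumSquares :: [Double] -> Double #-}

-- A list map; NOINLINE keeps calls to myMap visible so the rule can match them.
myMap :: (a -> b) -> [a] -> [b]
myMap _ [] = []
myMap f (x : xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- User-defined rewrite rule: two traversals collapse into one.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

main :: IO ()
main = print (sumSquares [1, 2, 3 :: Int], myMap (+ 1) (myMap (* 2) [1, 2, 3 :: Int]))
```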
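Here is a sketch of record dot syntax. It assumes GHC 9.2 or later with the `OverloadedRecordDot` extension. `Point`, `Segment`, and `segLength` are hypothetical. The nested access `s.end.x` reads the `x` field of the `end` field of `s`.

```haskell
{-# LANGUAGE OverloadedRecordDot #-}
-- Assumes GHC 9.2 or later.

data Point = Point {x :: Double, y :: Double}

data Segment = Segment {start :: Point, end :: Point}

-- Euclidean length of a segment, using nested field access.
segLength :: Segment -> Double
segLength s = sqrt (dx * dx + dy * dy)
  where
    dx = s.end.x - s.start.x
    dy = s.end.y - s.start.y

main :: IO ()
main = print (segLength (Segment (Point 0 0) (Point 3 4)))
```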
Type system extensions
An expressive static type system is one of the major defining features of Haskell. Accordingly, much of the work in extending the language has been directed towards types and type classes.
The Glasgow Haskell Compiler supports an extended type system based on the theoretical System FC.[8] Major extensions to the type system include:
- Arbitrary-rank and impredicative polymorphism. That is, a polymorphic function or data constructor may require one of its arguments to be itself polymorphic.
- Generalized algebraic data types. Each constructor of a polymorphic datatype can encode information into the resulting type. A function which pattern-matches on this type can use the per-constructor type information to perform more specific operations on data.
- Existential types. These can be used to "bundle" some data together with operations on that data, in such a way that the operations can be used without exposing the specific type of the underlying data. Such a value is very similar to an object as found in object-oriented programming languages.
- Data types that do not actually contain any values. These can be useful to represent data in type-level metaprogramming.
- Type families: user-defined functions from types to types. Whereas parametric polymorphism provides the same structure for every type instantiation, type families provide ad hoc polymorphism with implementations that can differ between instantiations. Use cases include content-aware optimizing containers and type-level metaprogramming.
- Implicit function parameters that have dynamic scope. These are represented in types in much the same way as type class constraints.
- Linear types (since GHC 9.0), in which a function's type can require that it consume its argument exactly once.
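Here is a rank-2 type in practice. The hypothetical function `bothLengths` takes a function that must work on lists of every element type, because it applies that function to two lists of different types. The `RankNTypes` extension allows the `forall` to appear inside the argument type.

```haskell
{-# LANGUAGE RankNTypes #-}

-- f must itself be polymorphic: it is applied both to a list of a's and to
-- a list of b's, which an ordinary (rank-1) type could not express.
bothLengths :: (forall x. [x] -> Int) -> ([a], [b]) -> (Int, Int)
bothLengths f (xs, ys) = (f xs, f ys)

main :: IO ()
main = print (bothLengths length ([1, 2, 3 :: Int], "ab"))
```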
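A typed expression language is a common illustration of generalized algebraic data types. In the sketch below (constructor and function names are illustrative), each constructor fixes the type index of `Expr`. Matching on a constructor tells the type checker what `a` is in that branch, so `eval` needs no run-time tag checks.

```haskell
{-# LANGUAGE GADTs #-}

-- Each constructor fixes the index: IntE builds an Expr Int, Equal an Expr Bool.
data Expr a where
  IntE :: Int -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add :: Expr Int -> Expr Int -> Expr Int
  Equal :: Expr Int -> Expr Int -> Expr Bool
  If :: Expr Bool -> Expr a -> Expr a -> Expr a

-- Pattern matching refines a, so each branch may return a different type;
-- ill-typed terms such as Add (BoolE True) (IntE 1) are rejected at compile time.
eval :: Expr a -> a
eval (IntE n) = n
eval (BoolE b) = b
eval (Add l r) = eval l + eval r
eval (Equal l r) = eval l == eval r
eval (If c t e) = if eval c then eval t else eval e

main :: IO ()
main = print (eval (If (Equal (IntE 2) (Add (IntE 1) (IntE 1))) (IntE 10) (IntE 20)))
```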
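An existential type can be sketched with the `ExistentialQuantification` extension. `Shape` hides the concrete type of its contents and keeps only the `HasArea` interface, much like an object behind an interface. All class, type, and function names here are hypothetical.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

class HasArea s where
  area :: s -> Double

newtype Square = Square Double

newtype Circle = Circle Double

instance HasArea Square where
  area (Square side) = side * side

instance HasArea Circle where
  area (Circle r) = pi * r * r

-- A Shape packs a value with its HasArea instance; the concrete type s is
-- hidden, so differently typed shapes can live in one list.
data Shape = forall s. HasArea s => Shape s

shapeArea :: Shape -> Double
shapeArea (Shape s) = area s

main :: IO ()
main = print (sum (map shapeArea [Shape (Square 2), Shape (Circle 1)]))
```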
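Empty data declarations are part of Haskell 2010 and are commonly paired with a phantom type parameter. In this sketch (all names hypothetical), `Open` and `Closed` never have values. They only tag `Door` at the type level, so misuse is rejected at compile time.

```haskell
-- Open and Closed have no constructors and hence no values: type-level tags only.
data Open

data Closed

-- s is a phantom parameter: it appears in the type but not in the data.
newtype Door s = Door String

newDoor :: String -> Door Closed
newDoor = Door

openDoor :: Door Closed -> Door Open
openDoor (Door name) = Door name

-- Only an open door can be walked through; walkThrough (newDoor "front")
-- would be a type error.
walkThrough :: Door Open -> String
walkThrough (Door name) = "walked through " ++ name

main :: IO ()
main = putStrLn (walkThrough (openDoor (newDoor "front")))
```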
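The sketch below shows an open type family with the `TypeFamilies` extension. `Elem` maps each container type to its element type. A hypothetical `Container` class uses it so that two very different containers can share one interface: a general list, and an `IntSet` (from the `containers` package that ships with GHC) that is specialised to `Int` elements. The class and method names are illustrative.

```haskell
{-# LANGUAGE TypeFamilies #-}

import qualified Data.IntSet as IntSet

-- A type-level function from a container type to its element type; each
-- equation is given separately and may differ arbitrarily per instance.
type family Elem c

type instance Elem [e] = e

type instance Elem IntSet.IntSet = Int

class Container c where
  empty :: c
  insert :: Elem c -> c -> c
  toL :: c -> [Elem c]

-- A plain list works for any element type...
instance Container [a] where
  empty = []
  insert = (:)
  toL = id

-- ...while IntSet is a representation specialised to Int elements.
instance Container IntSet.IntSet where
  empty = IntSet.empty
  insert = IntSet.insert
  toL = IntSet.toList

fromL :: Container c => [Elem c] -> c
fromL = foldr insert empty

main :: IO ()
main = print (toL (fromL [3, 1, 2] :: IntSet.IntSet), toL (fromL "abc" :: String))
```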
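Here is a sketch of implicit parameters with the `ImplicitParams` extension. The parameter `?indent` and the functions are hypothetical. The constraint `(?indent :: Int)` is threaded through callers like a class constraint. The value itself comes from the nearest enclosing `let ?indent = ...` binding rather than being passed explicitly.

```haskell
{-# LANGUAGE ImplicitParams #-}

-- ?indent appears in the type much like a class constraint.
render :: (?indent :: Int) => String -> String
render s = replicate ?indent ' ' ++ s

-- renderAll never mentions ?indent in its body; the constraint flows through.
renderAll :: (?indent :: Int) => [String] -> [String]
renderAll = map render

-- The binding site: everything called inside sees ?indent = 2.
demo :: [String]
demo = let ?indent = 2 in renderAll ["alpha", "beta"]

main :: IO ()
main = mapM_ putStrLn demo
```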
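Here is a minimal sketch of linear types. It assumes GHC 9.0 or later with the `LinearTypes` extension, and `swap` is a hypothetical example. The arrow `%1 ->` marks a function that must consume its argument exactly once.

```haskell
{-# LANGUAGE LinearTypes #-}
-- Assumes GHC 9.0 or later.

-- Linear: the pair is consumed once, and each component is used exactly once.
swap :: (a, b) %1 -> (b, a)
swap (x, y) = (y, x)

-- Rejected by the type checker if uncommented, because x would be used twice:
-- dup :: a %1 -> (a, a)
-- dup x = (x, x)

main :: IO ()
main = print (swap (1 :: Int, "one"))
```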
Extensions relating to type classes include:
- A type class may be parametrized on more than one type. Thus a type class can describe not only a set of types, but an n-ary relation on types.
- Functional dependencies, which constrain parts of that relation to be a mathematical function on types. That is, the constraint specifies that some type class parameter is completely determined once some other set of parameters is fixed. This guides the process of type inference in situations where otherwise there would be ambiguity.
- Significantly relaxed rules regarding the allowable shape of type class instances. When these are enabled in full, the type class system becomes a Turing-complete language for logic programming at compile time.
- Type families, as described above, may also be associated with a type class.
- The automatic generation of certain type class instances is extended in several ways. New type classes for generic programming and common recursion patterns are supported. Additionally, when a new type is declared as isomorphic to an existing type, any type class instance declared for the underlying type may be lifted to the new type "for free".
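The sketch below shows a multi-parameter class with a functional dependency, using hypothetical names. `Mul a b c` relates two operand types to a result type. The dependency `a b -> c` lets the type checker infer the type of an intermediate product that nothing else annotates.

```haskell
{-# LANGUAGE FunctionalDependencies #-}

data Vec = Vec Double Double
  deriving (Eq, Show)

-- A three-way relation on types; "a b -> c" says the operand types fix the result.
class Mul a b c | a b -> c where
  (.*.) :: a -> b -> c

instance Mul Double Double Double where
  (.*.) = (*)

instance Mul Double Vec Vec where -- scaling
  k .*. Vec x y = Vec (k * x) (k * y)

instance Mul Vec Vec Double where -- dot product
  Vec a b .*. Vec c d = a * c + b * d

-- The inner product's type is never written down; the dependency infers Vec
-- from Mul Double Vec c. Without it, that type would be ambiguous.
combined :: Vec -> Vec -> Double
combined v w = ((2 :: Double) .*. v) .*. w

main :: IO ()
main = print (combined (Vec 1 2) (Vec 3 4))
```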
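The sketch below uses hypothetical types to demonstrate two of these deriving extensions. `GeneralizedNewtypeDeriving` lifts `Double`'s `Num` instance to the newtype `Meters`. `DeriveFunctor` and `DeriveFoldable` generate mapping and folding functions from the shape of `Tree`.

```haskell
{-# LANGUAGE DeriveFoldable, DeriveFunctor, GeneralizedNewtypeDeriving #-}

-- Meters reuses Double's Num instance "for free"; Show, Eq and Ord are derived as usual.
newtype Meters = Meters Double
  deriving (Show, Eq, Ord, Num)

-- fmap and the Foldable methods are generated from the declaration's shape,
-- visiting fields left to right (left subtree, value, right subtree).
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Functor, Foldable)

sample :: Tree Int
sample = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)

main :: IO ()
main = do
  print (Meters 2 + Meters 3)
  print (sum (fmap (* 2) sample), length sample)
```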
Portability
Versions of GHC are available for several platforms, including Windows and most varieties of Unix (such as Linux, FreeBSD, OpenBSD, and macOS).[17] GHC has also been ported to several different processor architectures.[17]
References
- "The GHC Team". Haskell.org. Retrieved 1 September 2016.
- Hudak, P.; Hughes, J.; Peyton Jones, S.; Wadler, P. (June 2007). "A History of Haskell: Being Lazy With Class" (PDF). Proc. Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III). Retrieved 1 September 2016.
- "The Glasgow Haskell Compiler". Haskell.org. Retrieved 25 December 2020.
- "The Glorious Glasgow Haskell Compilation System User's Guide". Haskell.org. Retrieved 27 July 2014.
- "2017 state of Haskell survey results". taylor.fausak.me. 15 November 2017. Retrieved 11 December 2017.
- "Industrial Haskell Group". Haskell.org. 2014. Retrieved 1 September 2016.
- "GHC Commentary: The Compiler". Haskell.org. 23 March 2016. Archived from the original on 23 March 2016. Retrieved 26 May 2016.
- Sulzmann, M.; Chakravarty, M. M. T.; Peyton Jones, S.; Donnelly, K. (January 2007). "System F with Type Equality Coercions". Proc. ACM Workshop on Types in Language Design and Implementation (TLDI).
- Peyton Jones, S. (April 1996). "Compiling Haskell by program transformation: a report from the trenches". Proc. European Symposium on Programming (ESOP).
- Peyton Jones, S. (April 1992). "Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine, Version 2.5". Journal of Functional Programming. 2 (2): 127–202. doi:10.1017/S0956796800000319.
- "Haskell 98 Language and Libraries: The Revised Report". Haskell.org. Retrieved 28 January 2007.
- "Haskell 2010 Language Report". Haskell.org. Retrieved 30 August 2012.
- "Welcome to Haskell' (Haskell Prime)". Haskell.org. Retrieved 26 May 2016.
- "GHC Language Features". Haskell.org. Retrieved 25 May 2016.
- Coutts, D.; Leshchinskiy, R.; Stewart, D. (April 2007). "Stream Fusion: From Lists to Streams to Nothing at All". Proc. ACM SIGPLAN International Conference on Functional Programming (ICFP). Archived from the original on 23 September 2007.
- Mitchell, Neil; Fletcher, Shayne (3 May 2020). "Record Dot Syntax". ghc-proposals. GitHub. Retrieved 30 June 2020.
- Platforms at gitlab.haskell.org