This compiler design and implementation site provides book reviews and free ebooks on the compiler design handbook, advanced compiler design, modern compiler design, compiler construction tools such as lex, yacc, and flex, automata techniques, and practical compiler design using C++.

Compilers and Compiler Generators an introduction with C++

By P.D. Terry
This book has been written to support a practically oriented course in programming language translation for senior undergraduates in Computer Science. More specifically, it is aimed at students who are probably quite competent in the art of imperative programming (for example, in C++, Pascal, or Modula-2), but whose mathematics may be a little weak; students who require only a solid introduction to the subject, so as to provide them with insight into areas of language design and implementation, rather than a deluge of theory which they will probably never use again; students who will enjoy fairly extensive case studies of translators for the sorts of languages with which they are most familiar; students who need to be made aware of compiler writing tools, and to come to appreciate and know how to use them. It will hopefully also appeal to a certain class of hobbyist who wishes to know more about how translators work.
The reader is expected to have a good knowledge of programming in an imperative language and, preferably, a knowledge of data structures. The book is practically oriented, and the reader who cannot read and write code will have difficulty following quite a lot of the discussion. However, it is difficult to imagine that students taking courses in compiler construction will not have that sort of background!
There are several excellent books already extant in this field. What is intended to distinguish this one from the others is that it attempts to mix theory and practice in a disciplined way, introducing the use of attribute grammars and compiler writing tools, at the same time giving a highly practical and pragmatic development of translators of only moderate size, yet large enough to provide considerable challenge in the many exercises that are suggested.

Compiler Construction

By Niklaus Wirth
This book has emerged from my lecture notes for an introductory course in compiler design at ETH Zürich. Several times I have been asked to justify this course, since compiler design is considered a somewhat esoteric subject, practised only in a few highly specialized software houses. Because nowadays everything which does not yield immediate profits has to be justified, I shall try to explain why I consider this subject as important and relevant to computer science students in general.
It is the essence of any academic education that not only knowledge, and, in the case of an engineering education, know-how is transmitted, but also understanding and insight. In particular, knowledge about system surfaces alone is insufficient in computer science; what is needed is an understanding of contents. Every academically educated computer scientist must know how a computer functions, and must understand the ways and methods in which programs are represented and interpreted. Compilers convert program texts into internal code. Hence they constitute the bridge between software and hardware.
Now, one may interject that knowledge about the method of translation is unnecessary for an understanding of the relationship between source program and object code, and even much less relevant is knowing how to actually construct a compiler. However, from my experience as a teacher, genuine understanding of a subject is best acquired from an in-depth involvement with both concepts and details. In this case, this involvement is nothing less than the construction of an actual compiler.

Compilers: Backend to Frontend and Back to Front Again

By Abdulaziz Ghuloum

Compilers are perceived to be magical artifacts, carefully crafted by wizards, and unfathomable by mere mortals. This paper attempts to dispel this myth. We build a simple compiler for a simple language in a step-by-step fashion. The input language accepted by the compiler starts minimal, and grows as our knowledge of how to build compilers grows. The final language is almost Scheme.

Although the compiler is written in the Scheme programming language, only minimal knowledge of Scheme is required. Essentially, the reader is assumed to be comfortable reading and writing recursive Scheme functions to the level presented in The Little Schemer. Additionally, we recommend the freely available tutorial Teach Yourself Scheme in Fixnum Days for people familiar with other programming languages but not Scheme. The Scheme Programming Language is an invaluable resource for understanding Scheme's semantics. You will find it most useful when you give up trying to work out how list? detects circular data structures.

Our compiler targets the Intel-386 architecture, the dominant architecture for personal computing. The output of our compiler is assembly code that can be assembled by gas, the GNU assembler, which is freely available for most operating systems. No knowledge of assembly language or the Intel-386 architecture is assumed beyond the basics: binary numbers, memory layout, and basic pointers. If you are familiar with arrays in C, and know how the bit-level operations (and, or, xor, and not) work, then you’re good to go.
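As a taste of how small that interface is: the compiler's gas output defines a single entry point, and a short C driver calls it and decodes the result. The sketch below is an illustration in that spirit; the entry-point name scheme_entry and the 2-bit fixnum tag are assumptions made here, not details quoted from the paper.

/* driver.c - a sketch of the C runtime half: call the compiled code,
 * decode the tagged value, print it. Names and tags are illustrative. */
#include <stdio.h>

/* Defined in the compiler's assembly output. For the program "42" the
 * emitted gas code could look like:
 *     .globl scheme_entry
 *     scheme_entry:
 *         movl $168, %eax      # 42 shifted left 2 bits: a tagged fixnum
 *         ret
 */
extern int scheme_entry(void);

int main(void) {
    int val = scheme_entry();
    if ((val & 3) == 0)              /* low two bits 00: a fixnum */
        printf("%d\n", val >> 2);
    else
        printf("#<value %08x>\n", (unsigned)val);
    return 0;
}

Assemble the compiler's output and link it with this driver (cc driver.c output.s) to get a runnable program.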

Advanced Programming Language Design

By Raphael A. Finkel
This book stems in part from courses taught at the University of Kentucky and at the University of Wisconsin–Madison on programming language design. There are many good books that deal with the subject at an undergraduate level, but there are few that are suitable for a one-semester graduate-level course. This book is my attempt to fill that gap.

The goal of this course, and hence of this book, is to expose first-year graduate students to a wide range of programming language paradigms and issues, so that they can understand the literature on programming languages and even conduct research in this field. It should improve the students’ appreciation of the art of designing programming languages and, to a limited degree, their skill in programming.

This book does not focus on any one language, or even on a few languages; it mentions, at least in passing, over seventy languages, including well-known ones (Algol, Pascal, C, C++, LISP, Ada, FORTRAN), important but less known ones (ML, SR, Modula-3, SNOBOL), significant research languages (CLU, Alphard, Linda), and little-known languages with important concepts (Io, Gödel). Several languages are discussed in some depth, primarily to reinforce particular programming paradigms. ML and LISP demonstrate functional programming, Smalltalk and C++ demonstrate object-oriented programming, and Prolog demonstrates logic programming.

Students are expected to have taken an undergraduate course in programming languages before using this book. The first chapter includes a review of much of the material on imperative programming languages that would be covered in such a course. This review makes the book self-contained, and also makes it accessible to advanced undergraduate students.

Most textbooks on programming languages cover the well-trodden areas of the field. In contrast, this book tries to go beyond the standard territory, making brief forays into regions that are under current research or that have been proposed and even rejected in the past. There are many fascinating constructs that appear in very few, if any, production programming languages.

Some (like power loops) should most likely not be included in a programming language. Others (like Io continuations) are so strange that it is not clear how to program with them. Some (APL arrays) show alternative ways to structure languages. These unusual ideas are important even though they do not pass the test of current usage, because they elucidate important aspects of programming language design, and they allow students to evaluate novel concepts.

Flex - fast lexical analyzer generator

cs.man.ac.uk
Flex is a tool for generating scanners: programs which recognize lexical patterns in text. flex reads the given input files, or its standard input if no file names are given, for a description of a scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules. flex generates as output a C source file, lex.yy.c, which defines a routine yylex(). This file is compiled and linked with the -lfl library to produce an executable. When the executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it finds one, it executes the corresponding C code.
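A minimal sketch of such a description (mine, for illustration): two rules whose C fragments count words and lines.

%{
/* count.l - a tiny flex scanner: each rule pairs a regular expression
 * with C code that runs whenever the expression matches. */
#include <stdio.h>
int words = 0, lines = 0;
%}
%%
[A-Za-z]+   { words++; }
\n          { lines++; }
.           ;                /* ignore anything else */
%%
int main(void) {
    yylex();                 /* scan standard input to end of file */
    printf("%d words, %d lines\n", words, lines);
    return 0;
}

Build it exactly as described above: flex count.l && cc lex.yy.c -lfl (the -lfl library supplies the default yywrap()).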

Flex, version 2.5 - A fast scanner generator

By Vern Paxson
This manual describes flex, a tool for generating programs that perform pattern-matching on text. The manual includes both tutorial and reference sections:
  • Description - a brief overview of the tool
  • Some Simple Examples
  • Format Of The Input File
  • Patterns - the extended regular expressions used by flex
  • How The Input Is Matched - the rules for determining what has been matched
  • Actions - how to specify what to do when a pattern is matched
  • The Generated Scanner - details regarding the scanner that flex produces; how to control the input source
  • Start Conditions - introducing context into your scanners, and managing "mini-scanners"
  • Multiple Input Buffers - how to manipulate multiple input sources; how to scan from strings instead of files
  • End-of-file Rules - special rules for matching the end of the input
  • Miscellaneous Macros - a summary of macros available to the actions
  • Values Available To The User - a summary of values available to the actions
  • Interfacing With Yacc - connecting flex scanners together with yacc parsers
  • Options - flex command-line options, and the "%option" directive
  • Performance Considerations - how to make your scanner go as fast as possible
  • Generating C++ Scanners - the (experimental) facility for generating C++ scanner classes
  • Incompatibilities With Lex And POSIX - how flex differs from AT&T lex and the POSIX lex standard
  • Diagnostics - those error messages produced by flex (or scanners it generates) whose meanings might not be apparent
  • Files - files used by flex
  • Deficiencies / Bugs - known problems with flex
  • See Also - other documentation, related tools
  • Author - includes contact information


Compiler Construction using Flex and Bison

By Anthony Aaby
A language translator is a program which translates a program in a source language into an equivalent program in an object language.
Keywords and phrases: source-language, object-language, syntax-directed, compiler, assembler, linker, loader, parser, scanner, top-down, bottom-up, context-free grammar, regular expressions.
Introduction
A computer constructed from actual physical devices is termed an actual computer or hardware computer. From the programming point of view, it is the instruction set of the hardware that defines a machine. An operating system is built on top of a machine to manage access to the machine and to provide additional services. The services provided by the operating system constitute another machine, a virtual machine.
A programming language provides a set of operations. Thus, for example, it is possible to speak of a Java computer or a Haskell computer. For the programmer, the programming language is the computer; the programming language defines a virtual computer. The virtual machine for Simple consists of a data area which contains the association between variables and values and the program which manipulates the data area.
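As a concrete picture of that data area (a sketch of mine, not Aaby's code), the whole machine state can be as simple as an association of names with values:

/* The "data area" of a small virtual machine: an association between
 * variable names and values (no overflow checking in this sketch). */
#include <string.h>

struct binding { char name[32]; int value; };
static struct binding data_area[64];
static int n_vars = 0;

/* Look a variable up, creating it with value 0 on first use. */
int *lookup(const char *name) {
    for (int i = 0; i < n_vars; i++)
        if (strcmp(data_area[i].name, name) == 0)
            return &data_area[i].value;
    strcpy(data_area[n_vars].name, name);
    data_area[n_vars].value = 0;
    return &data_area[n_vars++].value;
}

The program component of the virtual machine then reduces to reads and writes through lookup().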

Bison The YACC-compatible Parser Generator

by Charles Donnelly and Richard Stallman
Bison is a general-purpose parser generator that converts a grammar description for an LALR(1) context-free grammar into a C program to parse that grammar. Once you are proficient with Bison, you may use it to develop a wide range of language parsers, from those used in simple desk calculators to complex programming languages.
Bison is upward compatible with Yacc: all properly-written Yacc grammars ought to work with Bison with no change. Anyone familiar with Yacc should be able to use Bison with little trouble. You need to be fluent in C programming in order to use Bison or to understand this manual.
We begin with tutorial chapters that explain the basic concepts of using Bison and show three explained examples, each building on the last. If you don't know Bison or Yacc, start by reading these chapters. Reference chapters follow which describe specific aspects of Bison in detail.
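In the spirit of those tutorial examples, here is a minimal sketch of a grammar file for a one-line desk calculator (my own illustration, not one of the manual's):

%{
/* calc.y - a desk-calculator sketch; yylex() must be supplied
 * separately, by a flex scanner or a hand-written routine. */
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUM
%left '+' '-'
%left '*' '/'
%%
input : /* empty */
      | input line
      ;
line  : expr '\n'        { printf("= %d\n", $1); }
      ;
expr  : NUM
      | expr '+' expr    { $$ = $1 + $3; }
      | expr '-' expr    { $$ = $1 - $3; }
      | expr '*' expr    { $$ = $1 * $3; }
      | expr '/' expr    { $$ = $1 / $3; }
      ;
%%
int main(void) { return yyparse(); }

The %left lines are the disambiguating declarations that turn the deliberately ambiguous expr rules into the usual operator precedence.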
Bison was written primarily by Robert Corbett; Richard Stallman made it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added multicharacter string literals and other features.
This edition corresponds to version 1.25 of Bison.

Yacc: Yet Another Compiler-Compiler

By Stephen C. Johnson
Computer program input generally has some structure; in fact, every computer program that does input can be thought of as defining an "input language" which it accepts. An input language may be as complex as a programming language, or as simple as a sequence of numbers. Unfortunately, usual input facilities are limited, difficult to use, and often are lax about checking their inputs for validity.
Yacc provides a general tool for describing the input to a computer program. The Yacc user specifies the structures of his input, together with code to be invoked as each such structure is recognized. Yacc turns such a specification into a subroutine that handles the input process; frequently, it is convenient and appropriate to have most of the flow of control in the user's application handled by this subroutine.
The input subroutine produced by Yacc calls a user-supplied routine to return the next basic input item. Thus, the user can specify his input in terms of individual input characters, or in terms of higher level constructs such as names and numbers. The user-supplied routine may also handle idiomatic features such as comment and continuation conventions, which typically defy easy grammatical specification.
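By convention that user-supplied routine is called yylex. Here is a hand-written sketch (the token code NUM is assumed to come from the header that yacc -d writes):

/* lexer.c - a sketch of the user-supplied input routine. The header
 * y.tab.h, produced by "yacc -d", defines NUM and declares yylval. */
#include <stdio.h>
#include <ctype.h>
#include "y.tab.h"

extern int yylval;           /* semantic value handed to the parser */

int yylex(void) {
    int c = getchar();
    while (c == ' ' || c == '\t')    /* skip blanks */
        c = getchar();
    if (isdigit(c)) {                /* a higher-level item: a number */
        yylval = 0;
        while (isdigit(c)) { yylval = 10 * yylval + (c - '0'); c = getchar(); }
        ungetc(c, stdin);
        return NUM;
    }
    if (c == EOF)
        return 0;                    /* token 0 tells Yacc: end of input */
    return c;                        /* any other character is itself a token */
}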
Yacc is written in portable C. The class of specifications accepted is a very general one: LALR(1) grammars with disambiguating rules.
In addition to compilers for C, APL, Pascal, RATFOR, etc., Yacc has also been used for less conventional languages, including a phototypesetter language, several desk calculator languages, a document retrieval system, and a Fortran debugging system.

Therobs Lex & Yacc Examples and Download Links

Therobs Computing Resources
Overview
This page gives links to executables, reference material, and examples.
FAQ
Look at the lame, but developing, Lex and Yacc FAQ.
Questions? Any and all questions cheerfully answered by one of therobs.
  • Fill out the Help Request Form
  • Email: lexandyacc at rtdti period com


Lex and YACC primer/HOWTO

By bert hubert
If you have been programming for any length of time in a Unix environment, you will have encountered the mystical programs Lex & YACC, or as they are known to GNU/Linux users worldwide, Flex & Bison, where Flex is a Lex implementation by Vern Paxson and Bison is the GNU version of YACC. We will call these programs Lex and YACC throughout - the newer versions are upwardly compatible, so you can use Flex and Bison when trying our examples.
These programs are massively useful, but as with your C compiler, their manpage does not explain the language they understand, nor how to use them. YACC is really amazing when used in combination with Lex, however, the Bison manpage does not describe how to integrate Lex generated code with your Bison program.
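For the record, the glue is small. Run the parser generator with -d so it writes a header of token definitions, include that header on the Lex side, and fill in yylval before returning each token. A sketch (the token name NUMBER is an assumption here):

%{
/* scanner.l - glue sketch: "bison -y -d calc.y" writes y.tab.h, which
 * defines the token codes and declares yylval for us. */
#include <stdlib.h>
#include "y.tab.h"
%}
%%
[0-9]+    { yylval = atoi(yytext); return NUMBER; }
[ \t]     ;                        /* skip blanks */
\n        { return '\n'; }
.         { return yytext[0]; }    /* pass raw characters to the parser */
%%

Then compile the two generated files together: bison -y -d calc.y && flex scanner.l && cc y.tab.c lex.yy.c -lfl.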
There are several great books which deal with Lex & YACC. By all means read these books if you need to know more. They provide far more information than we ever will. See the 'Further Reading' section at the end. This document is aimed at bootstrapping your use of Lex & YACC, to allow you to create your first programs.
The documentation that comes with Flex and Bison is also excellent, but it is no tutorial. It does complement my HOWTO very well though. It too is referenced at the end.
I am by no means a YACC/Lex expert. When I started writing this document, I had exactly two days of experience. All I want to accomplish is to make those two days easier for you.
Do not expect this HOWTO to show proper YACC and Lex style. The examples have been kept very simple and there may be better ways to write them. If you know how, please let me know.

Lex - A Lexical Analyzer Generator

By M. E. Lesk and E. Schmidt
Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine.
Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program which reads an input stream, copying it to an output stream and partitioning the input into strings which match the given expressions. As each such string is recognized the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton generated by Lex. The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream.
The lexical analysis programs written with Lex accept ambiguous specifications and choose the longest match possible at each input point. If necessary, substantial lookahead is performed on the input, but the input stream will be backed up to the end of the current partition, so that the user has general freedom to manipulate it.
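Both of those behaviors are easy to see with a two-line sketch (mine): each pattern below can match at the start of the input iffy, and Lex takes the longest match, falling back on the earlier rule when lengths tie.

%{
#include <stdio.h>
%}
%%
if        { printf("keyword\n"); /* on "if" both rules tie; first wins */ }
[a-z]+    { printf("ident %s\n", yytext); /* on "iffy" this is longer */ }
%%

So iffy is scanned as one identifier rather than as the keyword if followed by fy.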
Lex can generate analyzers in either C or Ratfor, a language which can be translated automatically to portable Fortran. It is available on the PDP-11 UNIX, Honeywell GCOS, and IBM OS systems. This manual, however, will only discuss generating analyzers in C on the UNIX system, which is the only supported form of Lex under UNIX Version 7. Lex is designed to simplify interfacing with Yacc, for those with access to this compiler-compiler system.

How to Write a Simple Parser

by Ashim Gupta
This is a simple tutorial which gives the basic steps required to write a parser. It won't cover too many details about programming language constructs. The compiler design tools used in this tutorial are LEX (a lexical analyzer generator) and YACC (Yet Another Compiler-Compiler).
Certain Basic Concepts and Definitions:
What is a Parser ?
A parser for a grammar is a program which takes a string in the language as its input and produces either a corresponding parse tree or an error.
What is the Syntax of a Language?
The rules which tell whether a string is a valid program or not are called the syntax.
What is the Semantics of a Language?
The rules which give meaning to programs are called the semantics of a language.
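To make "produces either a corresponding parse tree or an error" concrete, here is a sketch (an illustration, not the tutorial's own code) of a tree node plus the YACC action that would build one:

/* A parse-tree node and its constructor. */
#include <stdlib.h>

struct node {
    int op;                       /* operator, or 0 for a number leaf */
    int value;                    /* set only for number leaves */
    struct node *left, *right;    /* subtrees, NULL at the leaves */
};

struct node *mknode(int op, struct node *l, struct node *r) {
    struct node *n = malloc(sizeof *n);
    n->op = op; n->value = 0; n->left = l; n->right = r;
    return n;
}

/* In the grammar, actions then build the tree bottom-up:
 *     expr : expr '+' expr   { $$ = mknode('+', $1, $3); }
 */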

Debugging Lex, Yacc

cs.man.ac.uk
Typical compile-time errors
Here are the commoner error messages that you may see, with some hints as to what to do about them. Often, correcting one problem gets rid of several error messages, so fix as many as you can using these hints and your knowledge of C. If you still have error messages that aren't in this list, email me.
One of the commonest is make's "missing separator" error: you have spaces in your makefile where you should have tabs, on the line before the line number reported, e.g.

CC=gcc

test: two
two
^^^^^^^^.................. this is a tab
two: two.c
$(CC) -o two two.c
^^^^^^^^.................. but these are spaces - replace them by a tab
The first character on the lines containing commands must be a tab.

A Compact Guide to Lex & Yacc

By Tom Niemann
epaperpress.com
This document explains how to construct a compiler using lex and yacc. Lex and yacc are tools used to generate lexical analyzers and parsers. I assume you can program in C, and understand data structures such as linked-lists and trees.
The Overview describes the basic building blocks of a compiler and explains the interaction between lex and yacc. The next two sections describe lex and yacc in more detail. With this background, we construct a sophisticated calculator. Conventional arithmetic operations and control statements, such as if-else and while, are implemented. With minor changes, we convert the calculator into a compiler for a stack-based machine. The remaining sections discuss issues that commonly arise in compiler writing.
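The calculator-to-compiler step is smaller than it sounds: the grammar's actions stop evaluating and instead print instructions such as push 2, push 3, push 4, mul, add for the input 2 + 3 * 4. A stack machine that executes such output can be sketched in a few lines (my illustration, not Niemann's code):

/* A tiny stack machine: reads postfix instructions, one per line. */
#include <stdio.h>
#include <string.h>

int main(void) {
    int stack[256], sp = 0;
    char op[16];
    int arg;
    while (scanf("%15s", op) == 1) {
        if (strcmp(op, "push") == 0 && scanf("%d", &arg) == 1)
            stack[sp++] = arg;
        else if (strcmp(op, "add") == 0) { sp--; stack[sp-1] += stack[sp]; }
        else if (strcmp(op, "mul") == 0) { sp--; stack[sp-1] *= stack[sp]; }
    }
    if (sp > 0)
        printf("%d\n", stack[sp-1]);   /* the expression's value */
    return 0;
}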

Tree Automata Techniques and Applications

grappa.univ-lille3.fr

The first two chapters contain the basics of tree automata theory for finite ordered ranked trees. Chapter 3 shows connections between Logic and Tree Automata. Chapter 4 presents Automata with Constraints. Chapter 5 presents Automata for Sets of Tree Languages. Chapter 6 gives the basics on Tree Transducers. Chapter 7 (new, work in progress) presents Alternating Tree Automata. We provide a search engine for TATA, a detailed description of the different chapters, and PostScript versions of all chapters:

We know that the book currently covers only a few aspects of tree automata. A new chapter covering Tree Automata for Unranked Trees, Unordered Trees and, more generally, Tree Automata for Trees modulo Equational Theories is scheduled. We welcome submissions of additional chapters investigating other aspects of tree automata. We believe it is important to keep the homogeneity of the book, which should not be a collection of chapters; hence submissions should be consistent with previous chapters. They will be reviewed by the current authors. If you wish to contribute, please send a message to


Let's Build a Compiler

by Jack Crenshaw
This series of articles is a tutorial on the theory and practice of developing language parsers and compilers. Before we are finished, we will have covered every aspect of compiler construction, designed a new programming language, and built a working compiler.
Though I am not a computer scientist by education (my Ph.D. is in a different field, Physics), I have been interested in compilers for many years. I have bought and tried to digest the contents of virtually every book on the subject ever written. I don't mind telling you that it was slow going. Compiler texts are written for Computer Science majors, and are tough sledding for the rest of us. But over the years a bit of it began to seep in. What really caused it to jell was when I began to branch off on my own and begin to try things on my own computer. Now I plan to share with you what I have learned. At the end of this series you will by no means be a computer scientist, nor will you know all the esoterics of compiler theory. I intend to completely ignore the more theoretical aspects of the subject. What you _WILL_ know is all the practical aspects that one needs to know to build a working system.
This is a "learn-by-doing" series. In the course of the series I will be performing experiments on a computer. You will be expected to follow along, repeating the experiments that I do, and performing some on your own. I will be using Turbo Pascal 4.0 on a PC clone. I will periodically insert examples written in TP. These will be executable code, which you will be expected to copy into your own computer and run. If you don't have a copy of Turbo, you will be severely limited in how well you will be able to follow what's going on. If you don't have a copy, I urge you to get one. After all, it's an excellent product, good for many other uses!
Some articles on compilers show you examples, or show you (as in the case of Small-C) a finished product, which you can then copy and use without a whole lot of understanding of how it works. I hope to do much more than that. I hope to teach you HOW the things get done, so that you can go off on your own and not only reproduce what I have done, but improve on it.
This is admittedly an ambitious undertaking, and it won't be done in one page. I expect to do it in the course of a number of articles. Each article will cover a single aspect of compiler theory, and will pretty much stand alone. If all you're interested in at a given time is one aspect, then you need to look only at that one article. Each article will be uploaded as it is complete, so you will have to wait for the last one before you can consider yourself finished. Please be patient.
The average text on compiler theory covers a lot of ground that we won't be covering here. The typical sequence is:
  • An introductory chapter describing what a compiler is.
  • A chapter or two on syntax equations, using Backus-Naur Form (BNF).
  • A chapter or two on lexical scanning, with emphasis on deterministic and non-deterministic finite automata.
  • Several chapters on parsing theory, beginning with top-down recursive descent, and ending with LALR parsers.
  • A chapter on intermediate languages, with emphasis on P-code and similar reverse Polish representations.
  • Many chapters on alternative ways to handle subroutines and parameter passing, type declarations, and such.
  • A chapter toward the end on code generation, usually for some imaginary CPU with a simple instruction set. Most readers (and in fact, most college classes) never make it this far.
  • A final chapter or two on optimization. This chapter often goes unread, too...

