I'm moving, and I don't yet have a new email address or phone number, although I'll try to remember to send Vaughan these things when I get them. If you have a question that this document and the source code don't answer, and you don't have any direct way to contact me, phone my parents at (412)-833-1401 and ask them how to contact me.
I don't attempt to explain the mathematics of Chu spaces in my code or this document. For interactive lessons on Chu spaces, try Living Chu spaces.
Another example: the bindings of names to values does not change when the Context (K or standardization) changes. Instead, names are bound to things that know how to conform to whatever Context is currently in use. Spaces conformed to the current Context are built only as needed.
I have also done my best to provide lots of flexability by making code more general than it needs to be. I'll point out opportunities for extension or optimization as I come to them.
I handle most Exceptions, including ExecutionExceptions, in the same simple way: I pass them up to the user by printing their message in the most convenient TextComponent.
I put these two functions together, and separated them from the user interface, because I wanted to break the UI down into small independent components. These components can learn all they need to know by consulting the Calc object. They do not have to store state themselves or coordinate with each other.
Most of the state of a calculator is stored in Hashtables which bind names to objects. The Calc constructor builds these tables and fills them with the standard initial contents. Many important objects are defined here in the Calc constructor, using anonymous inner classes.
The Conformable object "undef" (used to represent the lack of a binding) conforms by returning null. The table "variables" contains only Chu spaces or "undef". The table "constants" also contains mostly Chu spaces, but the constants bound to "1" and "_|_" conform by returning a Chu space which depends on K. Users of Calc.java do not have to worry about Conformable objects, because Conformable objects are never passed to or returned from Calc.
There are many methods for manipulating constants and variables. To check that a proposed variable name does not conflict with an existing binding, use "newVariableNameOK" (see ChuTarget.java). To create or change bindings of a variable, use "bindVariable" (see, for example, Statement.java). To find all variable names (possible targets for a result space), use "lvalueIdentifiers" (see, for example ChuTarget.java). To find all bound names (possible sources for an input space), use "rvalueIdentifiers" (see ChuSource.java). To get the (conformed) value of any binding, use "lookupChu" (see, for example, ChuSource.java). To get the underlying (unconformed) value of a variable, use "lookupVariable" (see ChuEdit.java). Since constants may have no unconformed value, there is no "lookupConstant".
For example, Chu.java defines product but not sum. The Calc constructor defines a BinaryOperator which dualizes the inputs and outputs of product, then stores this Operator in binops, under the name "+". The Calc constructor also defines a BinaryOperator which just calls product, and stores this in binops under the name "*".
The type of event which has occurred is given by the field "eventCode". Observers typically use a switch statement on this field to decide how to handle an event. Look at the "update" method of the Observer for GUI.java for an example. Notice that the argument passed to update is either a String (the text of the Program) or null (If the Program is long).
Therefore, I fork a separate Thread, called "calcThread", to handle the heavy computational work. The main Thread continues running, and can respond to a user's click on the cancel button by calling method "cancel" in Calc.java. This method halts calcThread, throws it away, and broadcasts the news that a calculation failed. Notice that nothing in the scripting language or the back end Chu space operations needs to know anything about multithreading or the possibility that a calculation might "fail" due to user impatience.
Multithreading is done in the usual Java way, by passing a Thread constructor something that implements a "run" method. In this case, class Calculation implements "run", and the constructor for both a Calculation and its associated Thread are called in method "calculate". I made Calculation an inner class, because it is used inside class Calc only, and because it calls "broadcast" often.
The fields "nrows" and "ncols" count the rows and columns. Notice that while nrows=matrix.length is redundant, ncols=matrix.length is not redundant, because we might have nrows=0, in which case there is no matrix. For simplicity and symmetry, I've included both fields.
The field "K" gives the number of distinct degrees of completion. The entries in the matrix are in the range 0..K-1 and give the degree of completion of that event in that state. Notice that "K" is only partially redundant: The contents of the matrix provide a lower bound on K, but not an upper bound.
The last field is "standard", a pointer to the standardized version of this Chu space. For more information on this field, see Conform and standardize
Chu.java provides only basic Chu space functionality: In Calc.java, more complex operations are built up from these basic operations (see Unary and Binary Operators). These basic operations include constructing Chu spaces, converting spaces to and from Strings ("parsing"), and applying unary and binary operators which return new spaces.
The basic operations "implication" and "query" are much more complicated than the others, because they involve generating matrixes of overlapping rows and columns. Below, I deal with construction and parsing first, all the simple operations second, and implication and query last.
As in Calc.java and Matrix.java, all data is private: inspectors are provided as needed. Since Chu spaces are never modified, I provide no operations for changing their data.
In addition to these small constructors, there is a parse constructor which builds Chu spaces from String input, and an "unparse" method which turns the matrix of a space back into text. I throw a Chu.ParseException whenever there is a problem with parsing or unparsing a Chu space.
The parse constructor takes four Strings as input. The first three strings provide values for K, nrows, and ncols, and the last String provides the matrix itself.
The lines of the matrix text are newline-delimited, and the entries are the digit characters '0'..'9', possibly separated by spaces or tabs. Currently, there is no way to parse for K>10. A reasonable extension might be to allow K>10 but require whitespace between entries, which could then have more than one digit. The unparse method also currently assumes that the matrix entries correspond to a digits '0'..'9', and prints them without intermediate whitespace. (The rest of the code assumes that K may be any positive integer: only parsing and unparsing are restricted to digits.)
If the value of K specified by the input String conflicts with a matrix entry, I throw a Chu.ParseException. If the value of nrows or ncols specified by the input Strings conflict with the shape of the matrix, then I crop the matrix to size or pad with zeroes as needed.
If K, nrows or ncols is left unspecified (by passing null), then I set a boolean to indicate that the corresponding value is not initialized, and fill in a default value later. The default value of K is the largest entry in the matrix, plus one. The default value of nrows is the number of newline-delimited lines in the matrix text. The default value of ncols is the number of entries in the first row of the matrix, or zero if there are no rows in the matrix.
The binary operation "choice" sticks together two spaces by their corners, and pads out the matrix with zeros. That is, the rows of the new matrix are rows from the first space followed by zeros, or rows from the second space preceded by zeros.
The binary operation "product" combines the rows of two spaces in all possible ways. That is, the rows of the new matrix are rows from the first space followed by rows from the second space. All possible pairs of rows are formed.
The binary operation "sequence" combines the columns of two spaces, but not in all possible ways. Columns are partially ordered by component-wise comparison. A column is "initial" if it is "larger" than no other column. A column is "final if it is "smaller" than no other column. Columns of the new matrix are any column of the first space followed by an initial column of the second space, or a final column of the first space followed by any column of the second space.
The implementation of sequence first calls the helper function "classifyCols" to determine which columns are initial and which are final (classifyCols in turn calls its helper function, "compareCols"). Columns classified as "unknown" are comparable with nothing, and are therefore both initial and final. Once the columns are classified, the implementation then considers all possible pairs of columns twice. In the first pass, it counts all combinations that will go into the result. Now the matrix can be built. In the second pass, all the good combinations are written into the result.
Because conform is a common operation, so is "standardize". Unfortunately, standardizing a large space can be very expensive. This pointer to the standardized version is manipulated only in the private constructor, in dual, and in standardize. Therefore, every space keeps a pointer to its standardized version (in the field "standard"), so the standardization needs to be computed at most once. Note that this optimization depends on the fact that Chu spaces are never modified. Spaces known to be standardized point to themselves, and spaces whose standardization is not yet known point to null.
The operation "standardize" eliminates all duplicate rows and columns. The helper functions row_sort and col_sort sort the indexes of the rows/columns. Sorting puts duplicates right next to each other, where they can be eliminated. The sorting functions return the number of unique rows/columns, and fill an array with their indexes. Once the unique row and column indexes are available, a new matrix can be built (we know how big it is). To fill in the matrix, just try all possible combinations of unique rows and columns, and copy the corresponding entry.
Every Node represents some valid prefix. The "children" of a Node are Nodes representing extensions of this prefix (if a given index is not a valid extension, then the child in that position is null). The "parent" of a Node is a Node representing the prefix which this Node extends. The "branch" of a Node is the last element of this prefix, and also the position of this Node in its parent's array of children.
There are two important operations in class Node. First, the operation "child" implements prefix extension. It returns a Node representing a given extension of this prefix, or null if the extension is not valid. Second, the operation "grow" is used to build up the prefix tree. It extends the current prefix by adding a child Node for the given extension, unless that child already exists.
Leaf Nodes represent complete lines, and thus have no children. However, they do have "data", a pointer to a Link. A Link is a linked list of integers. The "data" for a leaf Node is the indexes of all matching lines. If the space is standardized, then there are no duplicate lines, so this linked list has length one. In general, there may be repeated rows or columns, so there may be more than one entry in "data".
A Tree is basically a convenient wrapper for the root Node of a prefix tree ("top"). A Tree remembers how long lines are supposed to be ("length"), and remembers the K value of the set of lines, which is the number of possible children of a Node ("arity"). Class Tree provides two operations, "findLine" and "addLine", which search and modify the set of lines. These operations return a Link, a linked list of the indexes of matching lines.
After a call to "next" that returns true, look in "rowLinks" or "colLinks" to get the new matrix. The array "rowLinks" has one linked list for each row. The numbers in this list are the indexes of all the lines that match that row. (Naturally, "colLinks" does the same for columns). To get an entry (r,c) of the matrix, just look up any line in the list for row r, and find the element in position c.
Why not just provide the matrix itself, instead of these linked lists of indexes? First, the whole matrix is sometimes more information than is needed: query needs only the diagonal of each matrix, so there is no point in building more. Second, the matrix is sometimes less information than is needed: implication needs to know how many instances of each matrix to put in the answer, and thus needs to know the length of the linked lists. Also, if you want to use MatrixGenerator to find morphisms between two spaces, the rowLinks, colLinks form is more explicit: row r goes to row rowLinks[r].datum(), and similarly for columns.
MatrixGenerator does a depth-first search of all possible matrixes. Since the number of possible matrixes grows exponentially, it is important to prune the search tree as early as possible. MatrixGenerator works by trial extension of an region of overlapping partial rows and columns. Every new cell added to the region extends one row prefix and one column prefix. The arrays rowNodes and colNodes keep track of the prefixes for all rows and columns, and allow non-valid prefix extensions to be detected quickly. The order in which cells are added to the region ("herringbone" order) is also chosen to allow impossible combinations to be detected quickly (see the code for details of "herringbone" order).
Whenever a forward step in herringbone order takes the search out of bounds, this means that all the cells in the matrix have been filled, and a new matrix (or morphism) has been found. If a backwards step in herringbone order is attempted from cell (0,0), then the depth-first search has backed up to the beginning, and there are no more matrixes.
Before tinkering with MatrixGenerator, be very careful to think about boundry cases like rows or columns of length zero, or sets of lines for which there are no matrixes. Experiments to date indicate that the current implementation handles these cases correctly, but earlier implementations were buggy.
The binary operation "implication" has as its rows the morphisms from A to B. To generate these morphisms, a MatrixGenerator is constructed using the rows of B and the columns of A. Each new matrix found corresponds to several morphisms, one for each possible way to pick rows and columns to form that particular matrix. The linked lists in rowLinks and colLinks give all the choices for each row and column, so the product of their lengths is the number of morphisms corresponding to this matrix. For each matrix, the implementation counts instances, constructs a row, and enters the right number of copies of that row into the result.
The unary operation "query" has rows which are closed under two operations. First, all rows of constants are in query, and second, all square matrixes which can be formed from the rows has its diagonal among the rows. The implementation adds all constant rows first, then repeatedly uses MatrixGenerator to find all matrixes and add their diagonals to the set of rows. There are two representations for the set of rows: a Tree for use with the MatrixGenerator, and a Vector for quickly finding entry values and for building the final result. When all diagonals are found to already be in the set of rows, then all rows have been found.
Query is a very expensive operation, because it uses MatrixGenerator repeatedly. Fortunately, there is an optimization if K=2. In this case, the diagonals of matrixes are all just unions and intersections of rows, so query can be computed without using MatrixGenerator at all. The function "query2" handles this special case.
In general, the language is processed in two stages. In the first stage, a String is parsed down to a Program containing the names of spaces and operators. This stage may throw a SyntaxException if the String is unparsable. In the second stage, the Program gets executed against a Calc object. This stage may throw an ExecutionException if some name is not bound to a value or something else goes wrong at runtime. I chose this division of labor so that future versions of the calculator might store unexecuted Programs while still doing as much error checking as possible before the Program is stored.
The textual representation of Expressions does not exploit the full recursive generality described above. The simple "parse" function in class Expression currently assumes that the arguments of operators are all Identifiers. A more complex parser might handle nested, parenthesized expressions. Also, the current parser needs whitespace to find the separations between identifiers and operator names. A more clever parser might be able to do without this help. However, the problem is not simple, because (for example) the unary operator dual and the constant space bottom both have the same textual representation: "_|_".
AssignStatements are the heart of the scripting language. Every AssignStatement has a "lhs" (left hand side) String which names a variable, and a "rhs" (right hand side) Expression. To execute an AssignStatement, just evaluate the Expression and call "bindVariable" to store the result in the variable. The textual representation of AssignStatements is just var=expr . The parse function knows the difference between AssignStatements and other Statements by the presence of the equals sign.
InvokeStatements consist of just the name of an Executable. They are evaluated by looking up that Executable in the tables of a Calc object, then executing that Executable against that Calc object. Currently, the only InvokeStatements are "Big", "little", and other names bound to the Executables that turn standardization on and off.
SetKStatements fall between the cracks left by the other two kinds of Statements. They cannot be AssignStatements, because what is being altered is not a Chu space variable. They cannot be InvokeStatements, because there are as many ways to set K as there are new values for K. Therefore, SetKStatements are written by just giving the new value of K. The parse function knows the difference between SetKStatements and InvokeStatements, because SetKStatements parse to a number. (Note that it would therefore be unwise to use a number as the name of an Executable in the tables of a Calc object.)
GUI.java pulls together all the UI components into one interface. You may want to start here to get a top-down perspective on the structure of the UI.
Here's one improvement to the UI that I did not get to implement: All the components already get messages that tell them when calculations start and stop. While a calculation is running, most of the UI is useless: only the cancel button should be available. All components should enable/disable their buttons, etc. whenever calculations start and stop.
Whenever there is reason to believe that the space to display has changed, the right binding for the current name is found using "lookupChu". Whenever there is reason to believe that the set of names bound to values may have changed, then the new set of value names is found using "rvalueIdentifiers". The ChuSource learns about the end of a calculation (and the possible need to refresh "choice") by registering an Observer with the Calc object.
In this mode, the name of the current space is whatever the user types in the TextField "newVariableField". If the user hits return in this TextField, then the new name is bound to undef. If the user hits an operation button while the target is in newVariableMode, then the result ends up in a new variable with the name the user typed. In both cases, the new name has to be cleared using "newVariableNameOK" from Calc.java to make sure it does not conflict with an existing name of a variable or constant.
The user may take any existing value of a variable as a starting point for editing. A click on the parseButton or a return in any of the nrows, ncols, or K fields triggers an attempt to parse the contents of the left ChuView using the parse constructor in Chu.java. This constructor crops or pads the matrix to fit the given number of rows and columns, and throws an exception if the matrix conflicts with the given value of K. These behaviors are sometimes a useful feature, but it can be very annoying if it is not intended by the user. Therefore, by default spaces get displayed with the nrows, ncols, and K fields left blank: The user can fill them in if desired.
If the parse succeeds, then the resulting space is stored and displayed in the right ChuView. If the result is not what the user desired, no harm is done, because no bindings have yet changed. If the new space is correct, then it may be bound to any variable by using the storeButton.
The status panel has a TextField that reports the current state of the calculator, and a button for canceling a calculation in progress. The context panel displays the current standardization and K value, and allows the user to modify this data.
User clicks on the standardization checkbox cause a "Big" or "little" to be sent to the Calc objext and run. In general, user actions cause pieces of scripting language to be generated, sent to the Calc object, and run. After a calculation is finished, the context panel reloads the values of K and standardization from the Calc object. In general, all UI components resychronize themselves to the Calc object after a calculation finishes.
The most important panel in the UI is the main panel, which has two ChuSources and a ChuTarget, plus columns of buttons for unary and binary operations. Whenever one of these buttons is pressed, the names of the source(s), target, and operation are used to build an AssignStatement. This Statement is then sent to the Calc object, parsed and evaluated.
The bottom panel in the UI is the card panel. This panel is just a trick to save on-screen space. A ScriptPanel and a ChuEdit share the same region of the interface. The user can switch back and forth between them by hitting the associated CheckBoxes.