Slides for Friday's Fields Institute talk

*Preamble: Miscellaneous Links*

[Note: In the following, whereas in the past we've written A and X for the respective sets of points and states of a Chu space, Steve Vickers has them the other way round for the points and opens of the topological systems defined in his book ``Topology via Logic.'' We had originally justified this by matching up Chu spaces with frames rather than locales. However we felt it might be less confusing to orient Chu spaces to agree with the ``natural direction'' of topological systems, taking X to consist of points and A of states. The downside of this switch is the possibility of confusion and inconsistency during the transition. We considered the alternative notation X

[If any of the symbols × → ≤ ∧ ∨ ∀ ∞ ∈ ∉ is a box then your browser is lacking some of the HTML 4.0 mathematical symbols as listed in http://www.cs.tut.fi/~jkorpela/html/guide/entities.html. Consider upgrading your browser to Firefox. Safer too. If only the last symbol is a box, it denotes not-∈.]

*Short Form* A Chu space is a transformable matrix
whose rows transform forwards while its columns transform backwards.

*Generality of Chu spaces* Chu spaces unify a wide
range of mathematical structures, including the following.

Algebraic structures can be reduced to relational structures by a technique described below. Relational structures constitute a large class in their own right. However when adding topology to relational structures, the topology cannot be incorporated into the relational structure but must continue to use open sets.

Chu spaces offer a uniform way of representing relational and topological structure simultaneously. This is because Chu spaces can represent relational structures via a generalization of topological spaces which allows them to represent topological structure at the same time using the same machinery.

**Definition**

Surprisingly this degree of generality can be achieved with a remarkably simple form of structure. A Chu space (X,r,A) consists of just three things: a set X of points, individuals, or subjects, a set A of states or predicates, and a lookup table or matrix r: X×A → K which specifies for every subject x and predicate a the value a(x) of that predicate for that subject. The value of a(x) is given by the matrix r as the entry r(x,a). These matrix entries are drawn from a set K.

K can be as simple as {0,1}, as when representing topological spaces
or Boolean algebras. Or it can be as complex as the set of all complex
numbers, as when representing Hilbert spaces. Or it can be something
in between, such as the set of 16 subsets of {0,1,2,3} when
representing topological groups. The full generality of Chu spaces
derives from the ability of predicates to take other values than simply
*true* or *false* (the case K={0,1}).

This definition can be reconciled with the short form definition above by
taking A to be a subset of K^{X}, namely by representing the
predicate a as the function λx.r(x,a): X→K mapping each
individual to the value of the predicate on that individual, and taking
A to be the set of all such functions from X to K. In this case there is
no separate matrix, just the two sets consisting of respectively
individuals and predicates. However this breaks the symmetry of
subject and predicate. The more symmetric matrix encoding of this
information makes it equally reasonable to view X as a subset of
K^{A} when this is helpful, bearing in mind however that the
matrix may contain repeated rows and/or columns, i.e. in neither case
need these be extensional subsets of K^{X} or K^{A}.
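The two readings of the matrix described above, rows as functions A→K and columns as functions X→K, can be sketched in a few lines of Python. The encoding (lists for X and A, a dict for r) and the particular entries are illustrative assumptions rather than part of the definition.

```python
# Hypothetical encoding of a Chu space (X, r, A) over K = {0, 1}.
X = ["x0", "x1", "x2"]            # points (subjects)
A = ["a0", "a1", "a2"]            # states (predicates)
r = {
    ("x0", "a0"): 0, ("x0", "a1"): 1, ("x0", "a2"): 1,
    ("x1", "a0"): 1, ("x1", "a1"): 1, ("x1", "a2"): 1,
    ("x2", "a0"): 0, ("x2", "a1"): 0, ("x2", "a2"): 0,
}

def row(x):
    """The subject x viewed as a function A -> K (one row of the matrix)."""
    return tuple(r[(x, a)] for a in A)

def column(a):
    """The predicate a viewed as a function X -> K (one column of the matrix)."""
    return tuple(r[(x, a)] for x in X)

def has_distinct_rows():
    return len({row(x) for x in X}) == len(X)

def has_distinct_columns():
    return len({column(a) for a in A}) == len(A)

print(has_distinct_rows(), has_distinct_columns())
```

In this example the columns for a1 and a2 coincide, so A cannot be identified with a subset of K^{X} without losing a predicate, which is the caveat about repeated columns made above.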

A more satisfactory reconciliation with the short form definition is
to view the subject-predicate relationship as a symmetrically expressed
relation r. We do not *have* to interpret the notion of
predicate on a set X of subjects as a function a:X→K, any more
than we have to interpret a subject as a function x:A→K. Instead
we have three options: either of those two, or just leaving r as the
symmetric expression of the relationship. Unlike both algebras and
topological spaces, Chu spaces do not take subjects to be primitive and
predicates to be derived, but rather take both to be primitive.

In this symmetric view, ``subject'' is more natural than ``individual.'' One envisages an individual as having an independent existence. A subject on the other hand is a subject of something: it forms one half of an elementary proposition r(x,a) that combines a subject x with a predicate a.

This reconciliation has a historical link with the discovery and resolution of the paradoxes of Cartesian Dualism. If we identify the sets X and A with Descartes' 1647 division of the universe into physical and mental components respectively, then r is the mediator of these components sought by many philosophers during the following century. The respective proposals of Hume and Berkeley to make one side or the other primitive correspond to taking respectively X or A to be primitive and deriving the other in terms of functions from the former to K. That Hume won out would seem to be correlated with mathematics' preference for basing mathematical objects on their constituent individuals rather than their constituent predicates.

If we view the open sets of a topological space as its permitted predicates defining its structure, a topological space is an example of an object structured by the interaction of its subjects and predicates. It is by no means the only example however. Although both abelian groups and vector spaces are traditionally represented as algebras, they also have well-known alternative representations as what amount to Chu spaces. Finite (and more generally locally compact) abelian groups are representable as the sets consisting of their elements and their homomorphisms (continuous in the infinite case, where locally compact becomes necessary) into the multiplicative group of complex numbers of unit modulus (the circle group). And vector spaces V over a field F are representable as the sets of points and dual points of V, with the latter defined as the set of functionals on V, or linear transformations V→F where F in this context denotes the one-dimensional vector space over F.

Now K is always given merely as a set, with no particular structure of its own. In particular the case K={0,1} is not presumed to have the structure of a Boolean algebra or even a lattice, while when K is the set of complex numbers it is not assumed to be a field.

So a Chu space amounts to just a matrix r over a plain set K. As
such it hardly seems like a general or powerful notion.
The spark that animates Chu spaces is the manner in which they
transform. Whereas a set X transforms into a set Y with a single
function f: X→Y, a Chu space (X,r,A) transforms into a Chu space
(Y,s,B) with a pair of functions f: X→Y, g:B→A. These
functions are not entirely independent, being required to satisfy
s(f(x),b) = r(x,g(b)) for all x in X and b in B, called the
*adjointness* condition. We call f and g *adjoints* of
each other.

In general there is no requirement that the rows and columns of the Chu space matrix be distinct. When they are however, each of the two functions f,g of a Chu transform determines the other. This can be seen by associating with f the matrix m(x,b) = s(f(x),b). By adjointness of f,g, m(x,b) = r(x,g(b)) as well. Thus each of f and g determines m.

Now if all the columns of (X,r,A) are distinct, that is, if (X,r,A)
is *extensional*, there is only one possible function g: B→A
that can satisfy m(x,b) = r(x,g(b)) for all x and b. Hence under that
assumption, m determines g whence so does f. Dually, if all the rows
of (Y,s,B) are distinct, that is, if (Y,s,B) is *separated*,
there is only one possible f: X→Y that can satisfy m(x,b) =
s(f(x),b). So under *that* assumption, m determines f whence so
does g.
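As a sketch of how f determines g in the extensional case, one can compute each column of m(x,b) = s(f(x),b) and look it up among the columns of the source. The helper name `adjoint_of` and the example spaces are assumptions made for illustration.

```python
def adjoint_of(X, A, r, B, s, f):
    """Return the unique g: B -> A adjoint to f, or None if there is none.

    Assumes (X, r, A) is extensional, i.e. has no repeated columns.
    """
    column_to_state = {tuple(r[(x, a)] for x in X): a for a in A}
    g = {}
    for b in B:
        m_column = tuple(s[(f[x], b)] for x in X)   # column b of m
        if m_column not in column_to_state:
            return None                             # f is not continuous
        g[b] = column_to_state[m_column]
    return g

X, A = [0, 1, 2], ["a", "b"]
r = {(0, "a"): 0, (1, "a"): 1, (2, "a"): 1,
     (0, "b"): 0, (1, "b"): 0, (2, "b"): 1}
Y, B = ["u", "v"], ["c"]
s = {("u", "c"): 0, ("v", "c"): 1}

print(adjoint_of(X, A, r, B, s, {0: "u", 1: "v", 2: "v"}))
print(adjoint_of(X, A, r, B, s, {0: "u", 1: "u", 2: "u"}))
```

The second call returns None because the all-zero column that the constant function would need is not a column of the source.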

So actually f and g are far from independent. Even without any
requirement of distinct rows or columns, if we know f then g is almost
completely determined: the only degree of freedom it has is in choosing
one column from a set of columns all having the same entries. We say
that such columns are *isomorphic*. Two Chu spaces are
*equivalent* when they have the same sets of rows and columns up
to isomorphism, without regard for the identities of the subjects or
predicates.

We interpret r(x,a)=1 as meaning x∈a (x is a member of a), and r(x,a)=0 as x∉a (x is not a member of a).

No additional restriction on the notion of Chu transform is needed, as it exactly expresses the standard notion of continuity for ordinary topological spaces. It achieves this as follows. The adjointness condition on (f,g) asserts x∈g(b) ⇔ f(x)∈b, that is, g(b) = {x|f(x)∈b}. But this is exactly the notion of inverse image of f. The requirement that g map B to A then asserts that the inverse image of each open set of (Y,s,B) is an open set of (X,r,A).
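For K = {0,1} the argument above can be replayed on a small example. This sketch assumes open sets are stored as frozensets and uses the two-point Sierpinski space, in which {1} is open and {0} is not, as both source and target.

```python
def preimage(f, X, b):
    """g(b) = {x | f(x) in b}, the inverse image of b under f."""
    return frozenset(x for x in X if f[x] in b)

def is_continuous(f, X, A, B):
    """f is continuous iff the inverse image of every open set in B lies in A."""
    return all(preimage(f, X, b) in A for b in B)

X = [0, 1]
opens = {frozenset(), frozenset({1}), frozenset({0, 1})}

identity = {0: 0, 1: 1}
swap = {0: 1, 1: 0}
constant = {0: 0, 1: 0}

for name, f in [("identity", identity), ("swap", swap), ("constant", constant)]:
    print(name, is_continuous(f, X, opens, opens))
```

The swap fails because the inverse image of {1} is {0}, which is not open.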

Although we required topological spaces to be extensional, we did
not require them to be separated: they are permitted to have repeated
rows. A topological space defined in this way is said to be
T_{0} just when it is separated. The usual definition of a
T_{0} space is one such that for any two distinct points there
exists an open set containing one but not the other. But this can be
seen to be the same thing as requiring all rows to be distinct.

*Open sets as predicates.* The

Chu spaces generalize topological spaces analogously to the way topological spaces generalize partially ordered and preordered sets, namely by imposing fewer restrictions. To appreciate the former it is helpful to understand the latter, which we explain here.

A preorder (X,≤) can be understood as a restricted kind of
topological space, namely its *Alexandrov topology*, having as
its open sets the order filters or up-sets of (X,≤). These are
those sets a such that y≥x ∧ x∈a → y∈a. It can
be verified that both the union and the intersection of any set of
order filters are order filters; in particular the empty set and the
whole space are order filters, whence the Alexandrov topology so
defined is indeed a topological space.

Conversely a topological space determines a preorder called the
*specialization order* of the space. This is simply the
coordinatewise ordering of the rows: x≤y is defined as
∀a[x∈a → y∈a]. This binary relation is clearly
reflexive and transitive and hence a preorder. Furthermore it is
antisymmetric, and hence a partial order, if and only if the space is
T_{0}. The important point here is that distinct (i.e.
nonisomorphic) topological spaces can have the same (i.e. isomorphic)
specialization orders, as we will illustrate below with
N∪{∞}. Topology is more expressive than order in that it
permits distinctions to be drawn that order alone cannot.
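For finite spaces the specialization order can be computed directly from the open sets. The sketch below assumes opens are given as a list of frozensets and compares the Sierpinski space with the two-point indiscrete space; the function names are illustrative.

```python
def specialization(X, opens):
    """x <= y iff every open set containing x also contains y."""
    return {(x, y) for x in X for y in X
            if all(y in a for a in opens if x in a)}

def is_antisymmetric(order):
    return all(x == y for (x, y) in order if (y, x) in order)

def is_t0(X, opens):
    """T0 means all rows of the membership matrix are distinct."""
    rows = {tuple(x in a for a in opens) for x in X}
    return len(rows) == len(X)

X = [0, 1]
sierpinski = [frozenset(), frozenset({1}), frozenset({0, 1})]
indiscrete = [frozenset(), frozenset({0, 1})]

for opens in (sierpinski, indiscrete):
    order = specialization(X, opens)
    print(sorted(order), is_antisymmetric(order), is_t0(X, opens))
```

In both cases antisymmetry of the order agrees with the T_{0} test on rows.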

When the open sets are closed under arbitrary intersection (in addition to arbitrary union), the space is uniquely determined by its specialization order as being the Alexandrov topology of that order. Conversely any Alexandrov space, one whose open sets are closed under arbitrary intersection, is the Alexandrov topology on its specialization order.

All finite topological spaces are trivially Alexandrov spaces. The
simplest example of a non-Alexandrov space is obtained by starting with
the Alexandrov topology on the poset N∪{∞} standardly ordered
as 0≤1≤2≤…≤∞, and removing the open set
{∞}. When this open set is present, the predicate (as a
continuous function to the Sierpinski space described above) that is
false (0) on the natural numbers but true (1) on ∞ is permitted,
as the inverse image of {1} is then {∞}. That is, {0} is a
closed set whereas {1} is not. When {∞} is absent as an open set
however, any function mapping all the natural numbers to 0 must also
map ∞ to 0, i.e. to the same closed set {0} as where the natural
numbers went. This topology is called the *Scott topology* on
N∪{∞}.

This demonstrates the role of omitting closure of the open sets
under infinite intersection, namely to permit expressing a quantity
such as ∞ as the *limit* of an infinite sequence. The
Sierpinski space is a primitive yet sufficient target for expressing
this limit property; for any other target, if all finite numbers
greater than some value land inside some closed set however small, then
∞ must also land inside that closed set. When the open sets are
closed under arbitrary intersection, as with the Alexandrov topology of
a poset, it ceases to be possible to link the fate of one point under a
function to that of an infinite sequence of points.

Thus this additional flexibility of topological spaces over posets makes them more expressive. By allowing yet more possibilities, such as not being closed under arbitrary union for starters, Chu spaces are in turn more expressive than topological spaces.

As a simple example, the two-point Chu space having two open sets, namely the two singletons, has for its continuous functions on itself just the two permutations of the two points. Unlike topological spaces, for which every constant function is continuous, here neither of the constant functions is continuous.
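This example is small enough to check by brute force. The sketch below enumerates all four functions on the two points and keeps those for which the inverse image of every state is again a state; since the space is extensional over {0,1}, that is the same as having an adjoint.

```python
from itertools import product

X = [0, 1]
A = {frozenset({0}), frozenset({1})}     # the two singletons, nothing else

continuous = []
for images in product(X, repeat=len(X)):
    f = dict(zip(X, images))
    # Inverse image of each state b must itself be one of the states.
    if all(frozenset(x for x in X if f[x] in b) in A for b in A):
        continuous.append(images)

print(continuous)   # [(0, 1), (1, 0)]: the identity and the swap
```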

Sets are unstructured objects, like an unfurnished house. Structure has traditionally been provided in ad hoc ways according to need, like adding sofas and beds for a residence or desks and computers for an office. Mathematics furnishes its objects with operations to form algebras, relations to form relational structures, or topology to form topological spaces.

Chu spaces provide an elementary way of furnishing sets with structure that subsumes all the principal structuring techniques currently in use, both elementary and sophisticated. In so doing they remove the walls that divide up mathematics into its categories, and introduce many new structures previously unknown to mathematics, to create a new, universal, and homogeneous mathematical landscape. Every mathematical object is representable as a Chu space whose transformability is fully faithful to the transformability of the object it represents.

The extant mathematical categories tend in practice to group objects by stiffness. Sets are the most flexible, having the granularity of sand. Vector spaces are in the middle: picture a block of rubber. Boolean algebras are the stiffest.

Some categories have greater diversity of stiffness than others: partially ordered sets range in stiffness from that of sets to that of vector spaces, while the stiffness of distributive lattices ranges from that of vector spaces to that of Boolean algebras.

Chu spaces span the whole gamut of stiffnesses; we have called this
the *Stone Gamut* in honor of Marshall Stone, who discovered
that the dual of a Boolean algebra was a Stone space, and that of a
distributive lattice an ordered Stone space. (He of course did not
call them that, but identified them nonetheless by precisely and fully
describing them in the language of topology.)

Only a few standard categories of mathematical object share with Chu
spaces this property of being *self-dual*. Among the
better-known ones are finite abelian groups, finite-dimensional vector
spaces, sets transformed by binary relations, finite chains with bottom
(top also works), and (as a common generalization of the previous two)
complete semilattices.

Chu spaces are important to the foundations of mathematics because they demonstrate that when one has stepped back to view the mathematical landscape from a sufficient distance, a global symmetry appears, duality, that is not apparent when standing inside any particular category. This is like the difference between natural numbers and integers: with the former there is no operation of negation, which only springs into existence to create a symmetry when the view is broadened to include the negative numbers.

Further enlarging the universe of numbers to bring in the rationals, the reals, or the complex numbers, does not harm the basic symmetry of negation. This feature of the universe of all numbers becomes apparent when you step back just far enough to see the integers. (Seeing the rationals then the reals entails then stepping closer again without losing the panorama of the integers.) By the same token, once the basic universe of Chu spaces has been created, further expansion to yet larger universes (at least when they are constituted from yet more general Chu spaces) does not destroy the symmetry of duality.

We are therefore dealing with the duality of points and states. Now it is eminently reasonable to think of points as physical and states as mental. This would make duality for Chu spaces a sort of duality between the mental and the physical.

When we introduced Chu spaces near the start of this page, we indicated their generality in being able to represent a wide range of mathematical objects. These representations are concrete in the sense that we represent algebras and relational structures having underlying set X as a Chu space (X,r,A) having the same underlying set, and having a set A of states purporting to represent the same structure as the algebra or relational structure.

Our test for whether the structure has been fully and faithfully represented will be whether the continuous functions between the Chu spaces doing the representing are the same functions as the homomorphisms between the structures they represent.

We can reduce the case of algebras to that of relational structures
by the technique familiar from first order logic of representing an
n-ary operation f: X^{n}→X as an (n+1)-ary relation
R. This is accomplished by taking R to consist of all tuples of the
form
(x_{1},…x_{n},f(x_{1},…x_{n})),
one such tuple for every n-tuple
(x_{1},…,x_{n}). For example the addition
operation + of the algebra (N,+) of natural numbers under addition can
be represented as the ternary relation R consisting of all triples
(x,y,z) in N³ satisfying x+y=z.
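The passage from operation to relation can be sketched for a finite carrier. Since N is infinite, the example below substitutes addition modulo 4 on {0,1,2,3} as a stand-in; that substitution and the name `as_relation` are assumptions made for illustration.

```python
from itertools import product

def as_relation(op, X, n):
    """Graph of an n-ary operation on X as a set of (n+1)-tuples."""
    return {args + (op(*args),) for args in product(X, repeat=n)}

Z4 = range(4)
R = as_relation(lambda x, y: (x + y) % 4, Z4, 2)

print(len(R))            # one triple per pair (x, y)
print((1, 2, 3) in R)    # 1 + 2 = 3
print((1, 1, 3) in R)    # 1 + 1 is not 3
```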

We begin by showing how to represent a relational structure (X,R)
having just one n-ary relation R, that is, R is a subset of
X^{n}.

Let n denote the set {1,2,…,n}. We will represent the n-ary
relational structure (X,R) as a Chu space over 2^{n}. This is
stated more precisely as follows.

**Theorem.** *The category of n-ary relational structures and
their homomorphisms is a concretely full subcategory of
Chu(Set,2^{n}).*

``Concretely'' means that the representing Chu spaces have the same underlying sets as the structures they represent, and that the representing functions between them are the same functions as the homomorphisms they represent. (Hence the representation is faithful: distinct homomorphisms are represented by distinct continuous functions). ``Full'' means that every continuous function between Chu spaces representing structures represents a homomorphism between the represented structures.

**Proof. ** Represent (X,R) as the Chu space (X,r,A) where A
consists of those n-tuples of subsets (a_1,a_2,…,a_n), where
each a_i ⊆ X, such that for every n-tuple (x_1,x_2,…,x_n)
in R, there exists i such that x_i ∈ a_i. This makes A of type
(2^{X})^{n}, but this is isomorphic to
(2^{n})^{X} which is the desired type K^{X},
noting that K=2^{n}.

This representation is concrete, having the same carrier as (X,R). To see that the representation is full and faithful, we must show that any function f: X→Y is a homomorphism from (X,R) to (Y,S) if and only if it is a continuous function from the Chu space (X,r,A) representing (X,R) to the Chu space (Y,s,B) representing (Y,S).

(Only if) We first show that homomorphisms are continuous. For a
contradiction let f: X→Y be a homomorphism which is not
continuous. Then there must exist a state
(b_{1},…,b_{n}) in B for which
(f^{-1}(b_{1}),…,f^{-1}(b_{n}))
is not a state in A. Hence there exists
(x_{1},…,x_{n})∈ R for which
x_{i}∉ f^{-1}(b_{i}) for every i. But
then f(x_{i})∉ b_{i} for every i, whence
(f(x_{1}),…,f(x_{n}))∉ S, since every tuple of S has some
component y_{i} lying in b_{i}. This is impossible
because f is a homomorphism.

(If) We now show that continuous functions are homomorphisms. That is,
given (x_{1},…,x_{n})∈ R we must have
(f(x_{1}),…,f(x_{n}))∈ S. For if not then
({f(x_{1})}',…,{f(x_{n})}') is a state in B,
where {f(x_{i})}' denotes all of Y except for the one element
f(x_{i}). Then by continuity,
(f^{-1}({f(x_{1})}'),…,f^{-1}
({f(x_{n})}')) is a state in A. Hence for some i,
x_{i}∈ f^{-1}({f(x_{i})}'), i.e.
f(x_{i})∈ {f(x_{i})}', which is impossible.
**QED**
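The theorem can be spot-checked by brute force on a small binary example. The sketch below builds the state sets A and B exactly as in the proof and compares homomorphisms with continuous functions over all eight maps X→Y; the particular structures (X,R) and (Y,S) are made up for the purpose.

```python
from itertools import combinations, product

def subsets(X):
    return [frozenset(c) for k in range(len(X) + 1) for c in combinations(X, k)]

def states(X, R, n):
    """All n-tuples (a_1..a_n) of subsets of X such that every tuple in R
    has some component x_i lying in a_i."""
    return {a for a in product(subsets(X), repeat=n)
            if all(any(t[i] in a[i] for i in range(n)) for t in R)}

def is_homomorphism(f, R, S):
    return all(tuple(f[x] for x in t) in S for t in R)

def is_continuous(f, X, A, B):
    def pre(b):
        return frozenset(x for x in X if f[x] in b)
    return all(tuple(pre(b_i) for b_i in b) in A for b in B)

X, R = [0, 1, 2], {(0, 1), (1, 2)}
Y, S = ["u", "v"], {("u", "v"), ("v", "u")}
A, B = states(X, R, 2), states(Y, S, 2)

candidates = list(product(Y, repeat=len(X)))
homs = [im for im in candidates if is_homomorphism(dict(zip(X, im)), R, S)]
conts = [im for im in candidates if is_continuous(dict(zip(X, im)), X, A, B)]
print(homs == conts, homs)
```

A single agreeing example proves nothing, of course, but it makes the construction of A concrete.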

The above caters for the case of a structure with a single
relation. A more general case allows multiple relations, as in
(X,R_{1},R_{2},…,R_{n}) where the i-th
relation R_{i} has arity α_{i}.

This case is easily reduced to the single-relation case by first
forming R' as the product Π_{i}R_{i}. R' is the set
of all n-tuples whose i-th component is some α_{i}-tuple
from R_{i}. Take R to be the set of ``flattened'' tuples, each
resulting from concatenating the constituent α_{i}-tuples
of an n-tuple of R' to form a single composite tuple of arity
Σ_{i}α_{i}.

*Claim:* The homomorphisms between single-relation structures
(X,R) formed in this way coincide with the homomorphisms between the
n-relation structures
(X,R_{1},R_{2},…,R_{n}) they were
derived from. This is because preserving the composite tuples is
equivalent to preserving each of their constituent subtuples from the
individual relations R_{i}.
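The flattening step and the claim can be illustrated on a tiny two-relation structure. This is a sketch with made-up relations; it assumes every R_{i} is nonempty, since an empty factor would make the product empty.

```python
from itertools import product

def flatten(relations):
    """Concatenate one tuple from each relation, in every possible way."""
    return {sum(combo, ()) for combo in product(*relations)}

def preserves(f, R, S):
    return all(tuple(f[x] for x in t) in S for t in R)

X = Y = [0, 1]
R1, R2 = {(0, 1)}, {(0,)}                 # arities 2 and 1 on the source
S1, S2 = {(0, 1), (1, 1)}, {(0,)}         # arities 2 and 1 on the target
R, S = flatten([R1, R2]), flatten([S1, S2])

maps = [dict(zip(X, im)) for im in product(Y, repeat=len(X))]
separately = [f for f in maps if preserves(f, R1, S1) and preserves(f, R2, S2)]
flattened = [f for f in maps if preserves(f, R, S)]
print(sorted(R), separately == flattened)
```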

This can be further extended to multisorted relational structures
(X_{1},…,X_{m};
R_{1},…,R_{n}). Here each R_{i} is a
subset of some mixed product of X_{j}'s. Homomorphisms between
such structures are m-tuples (f_{1},…,f_{m}) of
functions f_{j}: X_{j}→Y_{j}.

This case is reducible to the previous single-sorted case by
replacing the X_{i}'s by X = Σ_{i}X_{i}.
In effect this erases the type information that distinguishes the sorts from
one another, thereby collecting all the elements into one
undifferentiated set X. The m-tuples of functions can then be viewed
as a single function from X to Y. We represent this single-sorted
version as before.

However the resulting single-sorted continuous functions, though
respecting the tuples, need not preserve sorts. That is, we have not
prevented f(x) from being an element of Y of sort different from that
of x. We enforce that requirement by replacing the K arising in the
single-sorted construction by K×m where m = {1,2,…,m}, and
replacing the matrix r: X×A → K arising in that construction
by r': X×A → K×m defined as r'(x,a) = (r(x,a),j) where
x was originally in X_{j}. The adjointness condition s(f(x),b)
= r(x,g(b)) then forces f(x) to have the same type as x.

Apart from crossword puzzles being real, is there anything else about Chu transforms that has anything to do with the real world? Well, suppose you are writing an email with your favorite editor and you have just entered the first three letters, say "cat", when you realize you need to go back and delete the first letter. You have transformed the 3-letter string "cat" to the 2-letter string "at".

Now think of the set of all 3-letter strings as the individuals of a Chu space whose "states" are pos1, pos2, pos3, with the value of state pos1 at any given string being the letter in position 1, and so on for the other positions. If your alphabet has 26 letters then these states have 26 "truth values" and there are 26×26×26 = 17576 strings of length 3.

Likewise think of the set of all 2-letter strings as the individuals of a Chu space whose states are pos1 and pos2. This is a smaller set with only 26×26 = 676 strings of length 2.

The act of deleting the first letter can be understood in terms of a function from strings of length 3 to strings of length 2. But it can also be understood as a function from {pos1,pos2} of the target to {pos1,pos2,pos3} of the source, namely the function that maps pos1 (in the target) to pos2 (in the source) and pos2 (in the target) to pos3 (in the source).

The latter function tells your editor how to obtain the new string from the old one. The letter at pos1 now must be whatever it was before at pos2, and likewise pos2 must have come from pos3.
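The editor's view can be sketched as follows: the backward map on positions is enough to build the new string, and adjointness says the result agrees with the forward function on every 3-letter string. Positions are 1-based to match pos1, pos2, pos3; the helper name `apply_position_map` is an assumption.

```python
from itertools import product
from string import ascii_lowercase

def apply_position_map(old, g):
    """New string whose letter at position b is the old letter at position g[b]."""
    return "".join(old[g[b] - 1] for b in sorted(g))

def delete_first(s):
    """Forward function: 3-letter strings -> 2-letter strings."""
    return s[1:]

g = {1: 2, 2: 3}   # backward function: pos1 -> pos2, pos2 -> pos3

three_letter = ["".join(p) for p in product(ascii_lowercase, repeat=3)]

# Adjointness: the letter of f(s) at position b is the letter of s at g(b).
adjoint = all(delete_first(s)[b - 1] == s[g[b] - 1]
              for s in three_letter for b in g)

print(len(three_letter), adjoint, apply_position_map("cat", g))
```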

So the simple act of deleting a letter from a text buffer is a Chu
transform. In one world it transforms one set of strings to a smaller
set of strings (smaller by a factor of the size of the alphabet). In
the dual world, at the same time it is busy transforming one set of
positions *backwards in time* to another, by explaining to the
editor how to obtain the letters in the new string from those of the
old.

The forward-looking function is the transformation you thought you were applying to the space of strings, taking you from the past to the future. The backward-looking one is the hidden historian, constructing the future by looking back into the past. Each determines the other, and together they constitute the Chu transform you realized by hitting backspace.

Anyone who has ever implemented an editor knows that deleting the last letter of the buffer is the easy operation; deleting any other letter requires moving all the following letters up one place to fill in the gap (or some equivalent thereof). We can see this directly from the backwards function, which leaves unchanged the positions before the cursor and sends each later position of the new string to the position one further along in the old string, just as pos1 went to pos2 and pos2 to pos3 above.
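A sketch of the backward map for deleting the letter at an arbitrary cursor position k makes the cost visible: when k is the last position the map is the identity on the surviving positions, and otherwise every position from k onward has to look one place further along in the old string. The names and the 1-based convention are assumptions.

```python
def delete_map(n, k):
    """Backward map for deleting position k (1-based) from a string of length n.

    New positions before k point to themselves; new positions k..n-1 point
    one place further along in the old string.
    """
    return {b: (b if b < k else b + 1) for b in range(1, n)}

def apply_position_map(old, g):
    """New string whose letter at position b is the old letter at position g[b]."""
    return "".join(old[g[b] - 1] for b in sorted(g))

print(delete_map(5, 5))                               # identity: nothing moves
print(delete_map(5, 2))                               # positions 2..4 shift
print(apply_position_map("chart", delete_map(5, 2)))  # cart
```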

In the foregoing we called the strings points and the positions states. But when you're staring at a particular string, what you see are letters at positions. So it really seems as though the positions should be the points and the letters at those positions the local state at each position, with the whole string being the global state of the text buffer. How did we end up with them the other way round?

In fact we could have transposed this matrix to make the positions points and the letters states. But then the Chu transform for deleting the first letter would point backwards in time, which is counterintuitive. How do we reconcile the conflicting results of these two points of view?

When we are processing information we are transforming states of the buffer, where a state conveys information. Such a transformation can be regarded as mental in nature. If we try to view information-manipulating operations in terms of physical positions of letters, transformation seems to be going backwards. But if we view it in terms of mental states it goes forwards, which is the customary direction we associate with transformations. So evidently (if it wasn't already obvious), deleting a letter is fundamentally an information-processing or mental activity rather than a physical one.

Another thing we can do besides deleting information is copying it. Suppose we duplicate the first letter, that is, abc becomes aabc. The Chu transform expressing this editing operation maps the set of 3-letter strings to the set of 4-letter strings. But it also maps the set {pos1,pos2,pos3,pos4} backwards to {pos1,pos2,pos3} by sending both pos1 and pos2 to pos1, pos3 to pos2, and pos4 to pos3.
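The copy operation can be checked the same way. This sketch uses a three-letter alphabet (an assumption, to keep the enumeration short) and verifies that duplicating the first letter is adjoint to the position map just described.

```python
from itertools import product

ALPHABET = "abc"   # assumed small alphabet; any alphabet behaves the same way

def duplicate_first(s):
    """Forward function: 3-letter strings -> 4-letter strings."""
    return s[0] + s

# Backward function on 1-based positions: pos1, pos2 -> pos1; pos3 -> pos2; pos4 -> pos3.
g = {1: 1, 2: 1, 3: 2, 4: 3}

sources = ["".join(p) for p in product(ALPHABET, repeat=3)]
adjoint = all(duplicate_first(s)[b - 1] == s[g[b] - 1]
              for s in sources for b in g)

print(duplicate_first("abc"), adjoint)
```

Here g is no longer one-to-one: two positions of the new string read from the same position of the old one, which is what copying amounts to.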

Once again we find that we are transforming mental entities, namely the possible states of the buffer, forwards in time while transforming the physical positions backwards in time. So copying is also an information-processing or mental transformation. Intuitively obvious of course, but it is nice to see that our intuition is supported by the respective directions taken by the mental and physical halves of the Chu transform representing this particular copy operation.

Are all editing operations mental in this sense? No. Consider the operation of extending a buffer of length 2 by appending one more position. This, we claim, is a physical operation, involving no information processing. All it does is create a new position.

This operation cannot be described as a function from the set of strings of length 2 to those of length 3 because we do not know which string to send, say, ab to; it could be any of aba or abb or ... Nor is there a function from {pos1,pos2,pos3} back to {pos1,pos2} because we do not know where to send pos3.

On the other hand there *is* a function from the set of
strings of length 3 to those of length 2, which sends each string abc
to ab. And this is mirrored by a function from {pos1,pos2} to
{pos1,pos2,pos3} which sends each of pos1 and pos2 to itself. So here
we have a Chu transform for which the physical half is going in the
forward direction. Creating space for letters is physical.

The very act of building a computer memory creates space for information. So the manufacturers of computers are not processing information, they are performing physical actions. Again intuitively obvious, but again it is nice to see this close up and in slow motion.

This section takes what for some at least is hopefully an entertaining detour into physics. By all means take it with a grain of salt, and bear in mind that nothing in this section should be taken as reflecting poorly on the material outside; we are here simply to have fun for a little while. While I see nothing fundamentally erroneous in the arguments below, a real physicist might well complain.

So far we have only considered mental states. What about physical
states? Aha, here we take the position that there is *no such
thing*. States are states, regardless of whether the points they
are states of are part of a brain, part of a computer, or part of a
swinging hammer. All states are mental, and the points that
coordinatize state vectors are physical.

The counterparts of point and state for physics are time and energy respectively (along with space and momentum respectively but let's keep things simple here). Points in time can reasonably be considered physical, but surely energy is physical too (it's all physics after all).

Mass lives on the energy side of time-energy duality, and indeed is
interchangeable with energy via E = mc^{2}. Surely
*mass* is physical at least!

Here we really go out on a limb and declare that energy, mass, and for that matter momentum, are all fundamentally mental qualities rather than physical.

Now one place that information does creep into physics is entropy, which except for a minus sign is information. Entropy S is intimately correlated with energy Q via the thermodynamic relationship dQ = T dS. That is, energy flux is in proportion to entropy flux with temperature constituting the constant of proportionality. Rewriting the relationship as T = dQ/dS, we can view temperature as joules (energy) per negative bit (entropy or neg-information).

To make the notion of flux more vivid we can divide both sides of the first equation by dt and solve for the entropy flux, giving dS/dt = (dQ/dt)/T. The units on the left are bits per second (data rate) while those on the right are joules per second (power) per degree. For a given level of power, temperature acts to dilute data rate.
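Written out, the step from the thermodynamic relationship to the flux form is just a division by dt followed by a division by T (assuming T > 0):

```latex
\[
  dQ = T\,dS
  \;\Longrightarrow\;
  \frac{dQ}{dt} = T\,\frac{dS}{dt}
  \;\Longrightarrow\;
  \frac{dS}{dt} = \frac{1}{T}\,\frac{dQ}{dt}.
\]
```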

So what is temperature? When analyzed carefully, temperature of a system of particles, whether those particles form a gas, liquid, or solid, turns out to be a statistical notion expressing where the distribution of energies of the particles of the system is most strongly peaked. For large systems in thermal equilibrium, the law of large numbers makes this peak astonishingly narrow, giving such systems a very well-defined temperature. This is a fundamental fact of statistical mechanics.

The statistical aspect aside, the basic idea with temperature of a system is that it is the prevailing energy of the particles of that system. Now the more energetically particles bang around, the noisier one can imagine things getting. Entropy is the result of this noise "drowning out" the information flow that we call energy flow.

Following our principle that energy is really information, we can understand entropy flow as the dilution of information flow, in the form of energy flow, by the noise that is temperature. This is the real content of dS = dQ/T. As with entropy, there is a minus sign that is needed to complete the connection between energy and information: energy like entropy is negative information.

The significance of all this is that both energy and entropy can be understood as information, respectively before and after dilution by temperature when viewed differentially as flux.

Since mass is just slowly moving energy (and lots of it for even a
small amount of mass, c^{2} being a very large number), we have
that mass too is information, huge amounts of it, and hence mental.

One situation having a clear-cut connection between mass and information is black holes. Here however we have that mass behaves not as information directly but as the square root of information. The information in a black hole is its area while its mass is the square root of its area. When black holes collide they merge violently to produce a single black hole, the only absolutely irreversible process in all of physics. The information in the resulting black hole in bits is the sum of the number of bits in each of the two colliding black holes.

If the colliding holes contribute 1 bit each then each weighed one mass unit (for the appropriate choice of unit). The resulting 2-bit black hole then weighs 1.414 mass units, with the missing .586 mass units radiated violently away as gravitational waves.
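The arithmetic behind these figures, taking the stated premises at face value (mass as the square root of information, in units where a 1-bit hole has mass 1, and information adding under merger), can be laid out as:

```latex
\[
  m = \sqrt{I}, \qquad
  m_{\mathrm{before}} = \sqrt{1} + \sqrt{1} = 2, \qquad
  m_{\mathrm{after}} = \sqrt{1 + 1} = \sqrt{2} \approx 1.414,
\]
\[
  m_{\mathrm{before}} - m_{\mathrm{after}} = 2 - \sqrt{2} \approx 0.586 .
\]
```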