Code and language

The JC pontificates about technology.
An occasional series.


Computer language (for ease of discussion let's call this “code”) differs from ordinary human languages (let's call these “languages”) in that it has no concept of tense. It can still record and transmit past and future states, but it does so by rendering them in the present. The present tense is different from the past or the future because it addresses an infinitesimal instant having no duration in time. A given object in the present tense cannot contradict itself. It can only have one state. Say Switch A has two possible states, On and Off. To describe the state of that switch at any time T, a computer language would describe it thus:
A(T1) = On
A(T2) = Off
A(T3) = On
And so on.
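To make this concrete, here is a minimal sketch in Python (the names and the data structure are purely illustrative): each observation of Switch A is a free-standing, date-stamped record, and nothing in the data itself connects one record to the next.
# A sketch: each observation is a free-standing, date-stamped record.
observations = [
    {"feature": "A", "time": 1, "state": "On"},
    {"feature": "A", "time": 2, "state": "Off"},
    {"feature": "A", "time": 3, "state": "On"},
]

# We *can* pick out the records sharing the feature "A" and sort them
# by time, but that grouping is our doing, not the data's.
a_states = sorted(
    (o for o in observations if o["feature"] == "A"),
    key=lambda o: o["time"],
)
for o in a_states:
    print(f'A(T{o["time"]}) = {o["state"]}')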
Each of these states is a discrete object in the code. They have a common feature (“A”) and they can be ordered by reference to their date (“Tn”), but they are not otherwise intrinsically connected, much less the same. The existence of A(T1) and A(T3) does not imply anything about configuration A(T2), or even that there is an A(T2).
This would be rather like how we imagine the mental life of a goldfish or someone with severe amnesia: every encounter with an A would present as a new, different (but very similar) object. “That's remarkable! I was just talking to a man who looked just like you!”
Indeed, A(T1) and A(T3) can contradict each other (one can be On and the other Off). It is open in the code to ignore some common features (“A-ness”) and instead draw different (less tenuous) connections: one could group A(T1) with B(T2) and C(T3) instead.
As long as they are rendered in the present tense, these are discrete, distinct objects. To interpret the language correctly one need only see them as date-stamped snapshots sharing a common feature, A. Though you can group these objects by reference to that common feature, and can calculate the differences between them, you don't have to. Nor does anything in configuration A(T3) imply anything about A(T1) or A(T2), or even that there is an A(T1) or A(T2).
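In code terms (continuing the illustrative sketch above), grouping by “A-ness” is just one filter among many; an alternative grouping is equally available:
# Continuing the sketch: the records are indifferent to how we group them.
observations = [
    {"feature": "A", "time": 1, "state": "On"},
    {"feature": "B", "time": 2, "state": "Off"},
    {"feature": "C", "time": 3, "state": "On"},
    {"feature": "A", "time": 2, "state": "Off"},
    {"feature": "A", "time": 3, "state": "On"},
]

# One grouping: everything sharing the feature "A".
group_by_a = [o for o in observations if o["feature"] == "A"]

# An equally available grouping: A at T1, B at T2, C at T3.
wanted = {("A", 1), ("B", 2), ("C", 3)}
arbitrary_group = [o for o in observations if (o["feature"], o["time"]) in wanted]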
It is possible to describe these time-bound discrete objects exclusively in the present tense of a natural language (indeed, mathematics is a dialect of a natural language), but doing so would miss a quality that we would see as an important feature of A: its continuity. A at T1 is the same switch as A at T2; it has simply changed. This gives A a special form of existence, with a past, a present and a future. A has a history.
Natural languages accommodate history by using tenses. Yesterday, Switch A was configured to X. Today it is Y. Tomorrow it will be X again. (That “again” is important. You cannot comfortably say A(T) is X again, because A(T) can only have one state. If you change A(T)'s state, you entirely wipe any record of its previous state. A state at an infinitesimal point in time cannot, itself, have a history.)
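The wiping is easy to see in code. A sketch, with illustrative names: reassigning a present-tense variable destroys the old value; only an append-only log - the nearest code gets to a tense - preserves it.
# The "present tense" view: updating in place destroys the old state.
switch_a = "On"
switch_a = "Off"           # no record survives that it was ever "On"

# To keep a history, record each state as a separate, date-stamped entry.
history = [(1, "On")]
history.append((2, "Off"))
history.append((3, "On"))  # that it is On *again* is visible only in the log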
This history gives Switch A an identity that transcends any specific point in time. Switch A, encountered at any point in time and regardless of its state, is, literally, the same thing.
Whenever we behold A we attribute to it all of its prior states, and all of its potential future states: all of them bound up in what it means to be A, even if we don't know them (in the case of past states), even if they are not ascertainable (as is the case for future states), and even if we are mistaken about them: if we later discover that at T2, A's state was Y and not, as we had previously believed, X, this does not change our conceptualisation of Switch A. It is still Switch A. We were just wrong about it, or we have now found out more about it.
But Switch A is an “it”, with a past, a present and a future. It has a history.
In computer code, A is simply a common property of existing fields A(T1) to A(Tn).
Both approaches capture the same “external data” - there is no degradation involved in either method - but a language applies a metaphorical overlay which designates fields sharing a common feature as an integral object with a causal history. This history is a causal chain running through the “object”. Thus, unless something intervenes to change it, Switch A's status at T+n will be the same as its state was at T. There is an equivalent causal chain operating on interactions between discrete integral objects.
By contrast, machine code sees only a correlation between states, not a causal chain.
It was the great Scottish philosopher David Hume who first observed that the very idea of causality is an explanatory fiction we lay over the raw data of our observations to organise and make sense of them.
Describing the universe in tenses is more conceptually complex than rendering it in code. As well as a basic horizontal layer of data (each discrete switch status at each point in time), between which we can describe mathematical correlations, there is a vertical layer tracking each switch's progress through time, from which we can prescribe identities and causal relations.
Describing means reporting states that are already there in the data: they may not be apparent, but their extraction is a matter of deduction and mathematical operation. This is the kind of thing binary code is very, very good at. A switch can be on or off. There are no shades of grey.
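For instance (a hypothetical illustration), measuring how often two switches agree is mechanical extraction from the records already present - a correlation, with no causal claim attached:
# Hypothetical data: the recorded states of two switches at three times.
a_states = {1: "On", 2: "Off", 3: "On"}
b_states = {1: "On", 2: "On", 3: "On"}

# "Describing": a pure deduction from the data already there.
agreements = sum(a_states[t] == b_states[t] for t in a_states)
agreement_rate = agreements / len(a_states)   # 2/3: they agree at T1 and T3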
Prescribing means imaginatively constructing new relationships between object states: converting a correlation between states A(T1), A(T2) and A(T3) (they share the feature “A”) into a causal one (they are all temporal instances of an integral object “A”, and any of these states at any time is to some degree influenced by, or dependent on, the prior state of that switch).
But unlike the binary states of a switch, a prescription is not exclusive of other prescriptions. There are an infinite number of potential prescriptions - some complex, some simple - that can sit alongside each other. They might be contradictory. Prescription is the process of organising and fitting data to an existing theory. The success of a prescription can only really be judged from the perspective of the theory. (Sometimes it is said that the success of a theory can only be judged from the perspective of the data, but this is a naïve view.)
So: prescriptions are powerful, flexible and ambiguous. Machine code is not ambiguous. Every state is a unique, infinitesimal, digital state. It can have one value only. So: to the extent it relies on history - that is, the identity of an object enduring through time - natural language is inherently ambiguous. The designation of the “object” itself is a constructive act: the decision to choose this common feature to define this object, and not that one, was at some level wilful or ad hoc. It was not logically constrained and cannot be justified except by reference to its success at the use to which we put it - an outcome which is itself in part a function of the very choice to group in that way in the first place.
Also, the state of an object in the infinitesimal now is statistically insignificant in the context of an infinitely larger history. It might have only just acquired its state; it might be about to lose it. It is never unequivocally true. It's ambiguous.
So far we've been talking about a single on/off switch - in any complex system (whether a computer, a machine, a cell, a molecule or an atom), the simplest possible unit. 1 or 0; existence or non-existence. One cannot reduce further. Even here we find, when we are using a natural language, ambiguity.
The ambiguity, and the power of the history we create, is amplified if we group switches together. All complex systems (brains, organisms, societies, the Internet) are combinations of billions of switches. By selecting the switches that we wish to group together we can identify meta-objects to which we apply a history. Meta-objects like “a bicycle” or “a program”, or “you” or “me”.
For exactly the same reason, any such grouping is (or at one point in time was) an imaginative act: nothing in the underlying configuration of switches requires any grouping at all (we could operate like a code), let alone any particular grouping. The groupings we apply - many of which seem profound - evolved and layered on top of each other throughout the development of natural language. Grouping is a way of organising sensory input to make sense of it. Call this “narratisation” (© Julian Jaynes) or coining a metaphor.
The ability to narratise - the susceptibility of a language to metaphor - implies ambiguity.
That history may be a kind of fiction: the ship of Theseus will tend to be treated as a continuous object when, if you look at it as a collection of infinitesimal states, the earliest and the latest may have not a single component in common.
So?
So natural languages have an advantage over code in that they can natively orient continuous objects in four dimensions (three spatial dimensions and time) rather than three, which, to a computing machine that can cope with that extra dimension, makes for a massive gain in efficiency and computing power. This historical overlay means you can treat A(T1), A(T2) and A(T3) not as separate objects but as the same thing, and you can infer the existence of an infinite number of intermediate states (A(T1.0001), A(T1.0002) and so on) and even (probabilistically) deduce their values: if the value of consecutive fields A(T1) and A(T2) is On, then it stands to reason hypothetical intermediate field A(T1.5) will be On, too. Note that in the basic data layer there is no A(T1.5), so arguing whether it is “true” or not is somewhat fraught.
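That probabilistic deduction is, in code terms, an interpolation. A sketch, on the assumption (which the data layer itself never makes) that a switch holds its state between observations:
# Observed states of A; nothing exists between these entries.
observed = {1.0: "On", 2.0: "On", 3.0: "Off"}

def infer_state(t):
    # Infer the state at an unobserved time t from its nearest
    # neighbours, assuming the switch holds its state between them.
    before = [u for u in observed if u <= t]
    after = [u for u in observed if u >= t]
    if not before or not after:
        return None                       # nothing to interpolate from
    lo, hi = observed[max(before)], observed[min(after)]
    return lo if lo == hi else "ambiguous"

print(infer_state(1.5))   # "On" - an inference, not a datum
print(infer_state(2.5))   # "ambiguous": the neighbours disagree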
But this comes at a cost: firstly, ambiguity. Even before we start inferring hypothetical intermediate states, the very history we have given A to help us make sense of it renders it ambiguous and volatile. It can change. It can contradict itself. The causal continuity we have given it may misrepresent its essence.
Secondly, arbitrariness. The continuity we assign to related switches and collections of switches is our own invention. Any given switch may have many particular features (“A”-ness, “B”-ness, “C”-ness and so on); which one (or more) we choose to associate with it when we see it as a continuous object is up to us: in our discourse a red car from Italy can be a car (that happens to be red and Italian), a red thing (that happens to be an Italian car), an Italian thing (that happens to be a red car), a red car (that happens to be from Italy) and so on. Any of these groupings is legitimate, but the act of preferring one over another is an act of creative “narratising”: it does not exist in the data - it is a function of the sentence we put it in. (Richard Rorty put it this way: truth is a property of sentences, not of objects.)
This brings us back to an important point. Algorithms can be fiendishly complex, but they must have one property: for any input, only one output. They cannot require the machine to see any nuance, or make any value judgment. There cannot be any ambiguity in the instructions. If an algorithm is presented with an input it does not expect (one the algorithm does not cater for), the program will stop. If the algorithm stipulates “you decide what to do next”, the program will freeze.
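In code terms, a hypothetical sketch of that property: an algorithm maps each expected input to exactly one output, and an input outside that mapping is an error, not an invitation to exercise judgment.
# One input, one output, every time. Anything else is an error: the
# machine cannot be asked to use its judgment.
TRANSITIONS = {"On": "Off", "Off": "On"}

def toggle(state):
    if state not in TRANSITIONS:
        # An input the algorithm does not cater for: the program stops.
        raise ValueError(f"unexpected input: {state!r}")
    return TRANSITIONS[state]

print(toggle("On"))    # always "Off"
print(toggle("dim"))   # raises ValueError - the code has no opinion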