Desktops, metadata and filing

Revision as of 14:22, 28 September 2024 by Amwelladmin

In 1973, Xerox’s Palo Alto Research Center released the “Alto” personal computer. This was the first machine to boast a graphical user interface (GUI) instead of the traditional character user interface.[1]

JC pontificates about technology
An occasional series.
What you see is what you get, yesterday
The Xerox PARC Alto “desktop”
VisiCalc on the Apple II

A bad metaphor: the desktop

To lessen the cognitive burden on users — at the time, bowler-hatted civil servants, sleeve-gartered clerks and others whose mental framework comprised girls in a typing pool and boys running memoranda between office in-trays in reusable manila envelopes, and whose idea of “information technology” was a pneumatic tube system that launched invoices around the clanking pipes of the organisation like mortar bombs — Xerox PARC created the metaphor of the “desktop”.

If they were going to be asked to stare instead at a computer screen then best make it as familiar as possible. Thus, the desktop: not an impenetrable wall of green code and a flashing cursor, but a cartoonish depiction of a literal desktop, with its comforting iconography of manila folders, a blotter, filing cabinets, in-trays, out-trays and even a dinky little waste-paper basket.

All very familiar; all just so: perhaps the transition to the atomic age would not be so bad after all.

A better metaphor: the spreadsheet

In 1979, Dan Bricklin and Bob Frankston created a new application for the Apple II computer. They called it “VisiCalc”. It was the first spreadsheet program. VisiCalc was, of course, the ancient ancestor of that beast we all now know and love as Microsoft Excel.

It might not have seemed much in 1979, but it would revolutionise business computing. While not nearly as intuitive as the “desktop” — there was no graphical user interface or anything like that — VisiCalc was a much purer expression of what a personal computer could do. It promised even modest undertakings a powerful means of storing, augmenting, filtering, analysing and manipulating unprecedented amounts of information as structured data.

Why it is a better metaphor

A spreadsheet is a much better way of thinking about how to organise digital information than a desktop because digital information has no physical dimension. It is not constrained by a physical “substrate” — usually paper — that analogue information is embedded in.

The desktop was designed around that physical problem: how to manage bits of paper. In printed information, paper is “form”, text is “substance”. A desktop is obliged to prioritise form over substance because the substance does not exist independently of the form.

Paper must be put somewhere. Unless you physically copy it, it can only be in one place.

Copying and transporting paper is expensive, slow and “lossy”. Each copy loses fidelity and increases storage costs.[2]

Digital information has (almost) no form.[3] It does not occupy physical space. It costs nothing to store. We can copy and move it costlessly, instantly, and with no loss of fidelity. At least when compared with physical information, digital information can be everywhere at once. We are not constrained by space or time when we store or move digital information. Yet to file it, we use a metaphor that assumes we are.

A spreadsheet is not constrained by the physical limitations of a desktop: being a conceptually infinite number of rows and columns, a spreadsheet extends in two infinite directions:

  1. Downwards: You can add items to your filing system without limit, unconstrained by the area of your desk or the volume of your filing cabinet, where each item occupies one of an infinite number of rows.
  2. Across: You can categorise each item in an unlimited number of ways by creating a new column. There is no relative hierarchy between columns. They need not even bear any relation to each other as long as they relate to the original item. Whereas a subfolder is necessarily a sub-division of the folder it sits in, this is not true of a new column.
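
The two directions can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the file names, the column names and the helper functions `add_item` and `add_column` are all hypothetical, and a plain list of dictionaries stands in for the spreadsheet.

```python
# Hypothetical filing "spreadsheet": each item is a row, each column a piece of metadata.
items = [
    {"name": "ISDA master agreement.docx"},
    {"name": "Q3 invoice.pdf"},
]


def add_item(table, name):
    """Downwards: a new item is just a new row."""
    row = {"name": name}
    table.append(row)
    return row


def add_column(table, column, values):
    """Across: a new column is a new, independent way to categorise items."""
    for row in table:
        if row["name"] in values:
            row[column] = values[row["name"]]


add_item(items, "Board minutes.docx")

# Two columns with no hierarchy between them: neither is a "subfolder" of the other.
add_column(items, "client", {
    "ISDA master agreement.docx": "Acme",
    "Q3 invoice.pdf": "Acme",
})
add_column(items, "doc_type", {
    "ISDA master agreement.docx": "contract",
    "Q3 invoice.pdf": "invoice",
    "Board minutes.docx": "minutes",
})
```

Note that the board minutes have a document type but no client, and nothing breaks: a column that does not apply to an item is simply left empty.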

Division versus multiplication

In a “desktop” structure, subfolders are sub-divisions, each further level down more fine-grained and subordinate than the last, and less important relative to the formal hierarchy. We prioritise the hierarchy over the item. The hierarchy explains and contextualises everything.

All columns in a spreadsheet have equal standing — they are, well, pari passu — and their combination has a multiplicative effect: if an existing column, or an artful combination of columns, doesn’t yield the information you need, you can always add more columns. In a spreadsheet, we prioritise the item over the hierarchy. The hierarchy is incidental.

A front in the battle between substance and form

The desktop prioritises form.
A spreadsheet prioritises substance.

The last thing to notice is our old friend the struggle between form and substance: if we take it that, whatever your metaphor of choice, the “item” — the thing being filed — is the substance and the organising system it goes into is the form, we can see that the desktop and the spreadsheet have fundamentally opposed philosophies.

The desktop prioritises form over substance — the “item” is buried at the bottom of a rigid formal structure of folders and subfolders which cannot easily be altered. This is why it is so hard to find things you have misfiled. You cannot put anything into the database until you have fully specified the folder path that comprises its formal structure.

By contrast, a spreadsheet prioritises substance over form. The “item” is the first thing to go in the database, naked of any formal structure. It therefore sits at the top of the structure. Only once it is in situ can you assign it any formal metadata properties. The item wears its metadata lightly, and is not affected if the metadata is later altered, removed or augmented.
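
The difference in filing order might look something like this in Python. It is a rough sketch, not a description of any real system: the classes `FolderStore` and `MetadataStore`, their methods and the sample file are all hypothetical.

```python
class FolderStore:
    """Desktop-style: the full folder path must be given before the item goes in."""

    def __init__(self):
        self.tree = {}

    def file(self, path, item):
        if not path:
            raise ValueError("no folder path given: the form must come first")
        node = self.tree
        for folder in path:
            node = node.setdefault(folder, {})
        node[item] = None  # the item sits at the bottom of the hierarchy


class MetadataStore:
    """Spreadsheet-style: the item goes in first; metadata is optional and comes later."""

    def __init__(self):
        self.rows = {}

    def file(self, item):
        self.rows[item] = {}  # filed naked of any formal structure

    def tag(self, item, **metadata):
        self.rows[item].update(metadata)

    def untag(self, item, column):
        self.rows[item].pop(column, None)  # the item itself is untouched


folders = FolderStore()
folders.file(["Clients", "Acme", "2024"], "contract.docx")

sheet = MetadataStore()
sheet.file("contract.docx")
sheet.tag("contract.docx", client="Acme", year=2024)
sheet.untag("contract.docx", "year")
```

In the folder version the item cannot exist without its path. In the metadata version the item is still there, unchanged, after a tag has been added and taken away again.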

Metadata

Each folder or column is its own item of metadata (literally, “information about information”) about the item being filed.[4]

A hierarchical folder structure uses a very limited range of metadata: just characters of text, and a limited number at that: so limited that it is hardly worth thinking of it as metadata at all.[5]

A spreadsheet, by contrast, puts no limits on how we use metadata. It can take the form of text, calculable numbers and dates, checkboxes, people,[6] colours, flags, choices, lookups, comments, or calculations. It can be validated, managed, controlled, compulsory, optional, pre-populated or free-form.

Each extra piece of metadata enriches the item without detracting from it. Even if the metadata is wrong, the inconsistencies between it and other correct fields allow us to triangulate and identify problematic data.

Metadata is, in this way, “non-destructive”. It only augments. Each metadata field creates a new way of ordering information. If not being used to force a hierarchy, it can be much more detailed — you can date-stamp your item to the microsecond, where you might have had a folder for a month.

Suddenly, you can organise the same information in multiple different ways, simultaneously, without upsetting anyone else’s existing categorisation.

You can then filter and group your items by one or more columns. You can sort, chart, pivot and triangulate. The more metadata you have, the more ways you can look at the data.
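
As a minimal sketch of filtering and grouping, assuming a made-up set of rows and two hypothetical helpers, `filter_rows` and `group_rows`, the same flat data can be cut several ways without moving anything:

```python
from collections import defaultdict

# Made-up sample data: one row per filed item, one key per metadata column.
rows = [
    {"name": "contract.docx", "client": "Acme", "doc_type": "contract", "year": 2023},
    {"name": "invoice-01.pdf", "client": "Acme", "doc_type": "invoice", "year": 2024},
    {"name": "invoice-02.pdf", "client": "Bolt", "doc_type": "invoice", "year": 2024},
    {"name": "minutes.docx", "client": "Bolt", "doc_type": "minutes", "year": 2024},
]


def filter_rows(rows, **criteria):
    """Keep the rows whose metadata matches every criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def group_rows(rows, *columns):
    """Group item names by the values of one or more columns."""
    groups = defaultdict(list)
    for r in rows:
        key = tuple(r.get(c) for c in columns)
        groups[key].append(r["name"])
    return dict(groups)


invoices_2024 = filter_rows(rows, doc_type="invoice", year=2024)
by_client = group_rows(rows, "client")
by_type_and_year = group_rows(rows, "doc_type", "year")
```

Each view is computed from the rows on demand, so one person’s grouping by client does nothing to disturb another’s grouping by document type and year.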

You can even sort your data using data about how much metadata there is.

This is metametadata.
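
For the sceptical, a small Python sketch of metametadata. The rows and the function `metadata_count` are invented for illustration, and “how much metadata” is assumed here to mean the number of filled-in columns other than the name.

```python
# Invented sample rows with differing amounts of metadata.
rows = [
    {"name": "scan0001.pdf"},
    {"name": "contract.docx", "client": "Acme", "doc_type": "contract", "signed": True},
    {"name": "invoice.pdf", "client": "Bolt"},
]


def metadata_count(row):
    """Metametadata: how many metadata fields, other than the name, are filled in."""
    return sum(
        1 for column, value in row.items()
        if column != "name" and value is not None
    )


# Least-described items first: a ready-made to-do list for whoever does the filing.
least_described_first = sorted(rows, key=metadata_count)
```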

The “spreadsheet” approach to file management is hierarchy agnostic (or “multi-hierarchical”, if you like things to have hierarchies). What about unused metadata? Ignore it. Unused hierarchies are almost costless. And you just never know —

The desktop clings on

Yet even in our modern, hyper-networked, cloud-based work environment, and even though we have had Microsoft Excel for nearly 40 years, the desktop metaphor hangs on.

We still call them “desktops”, though now for the prosaic reason that they generally are the only thing that sits on top of our desk. The desktop was a nice, quaint idea, and it got old men in green visors to sit down at a keyboard, and for that the ranks of middle management can be truly grateful, but it has well outlived its purpose now.

The desktop assumes there is a unique physical location for any document as if it were in a physical library. Older readers may remember the Dewey decimal system, which numbered the entire corpus of non-fiction wisdom from zero to 1,000.[7]

Plainly, this is an imperfect state of affairs. A document that belongs under two headings must either be filed under one and lost to the other, or be duplicated, and duplication creates a basis risk. Which is the canonical version of the document, if there are duplicates? How can we be sure they are the same? What happens if one of them, but not the other, gets updated? Where the document is a “living thing” plotting its own miserable trajectory through the cosmos — a contract under negotiation, or a periodically updated legal template, for example — then duplicating it duplicates the manual task of updating all copies of the document as it changes, and that introduces the opportunity for human error. There may be miskeys. A document may be forgotten. Version control is a pain.

Also, a preferred hierarchy can change as personnel, managers, business priorities or circumstances change. Changing your preferred hierarchy means completely re-engineering your folder structure.
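
With metadata, by contrast, a hierarchy need only be a view that is rebuilt whenever preferences change. The following Python sketch assumes invented sample rows and a hypothetical helper, `as_tree`, which nests flat rows in whatever column order is asked for. The `"_items"` key and the `"(unfiled)"` label are arbitrary choices for the illustration.

```python
def as_tree(rows, *hierarchy):
    """Build a nested, folder-like view from flat rows, in the given column order."""
    tree = {}
    for row in rows:
        node = tree
        for column in hierarchy:
            node = node.setdefault(row.get(column, "(unfiled)"), {})
        node.setdefault("_items", []).append(row["name"])
    return tree


# Invented sample data.
rows = [
    {"name": "contract.docx", "client": "Acme", "year": 2023},
    {"name": "invoice.pdf", "client": "Acme", "year": 2024},
    {"name": "minutes.docx", "client": "Bolt", "year": 2024},
]

# The old manager liked client first; the new one wants year first.
by_client_then_year = as_tree(rows, "client", "year")
by_year_then_client = as_tree(rows, "year", "client")
```

Both trees are derived from the same three rows. Nothing is moved, copied or renamed when the preferred hierarchy changes.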

Substrate neutrality

These are all problems of the physical realm; the spreadsheet metaphor shows us we need not be so troubled in the digital realm. In the digital world, the physical “substrate” is irrelevant. What matters is the ASCII code embedded in that document. In the digital realm, it has been abstracted from the substrate and floats free. Within a diverse network of collaborators, this is immensely empowering.

It did not take people long to realise that email was amazing.

From, more or less, a standing start in about 1993 — by lucky coincidence the year JC entered the workforce — the corporate world fell head over heels in love with electronic communication. Whatever reverence it had for the sacred substrate fell quickly away.[8] The expression “this document is not worth the paper it is written on” has lost its meaning because the paper it is written on no longer has much value at all.

Now we recognise the digital content embedded in the substrate is the valuable bit; the paper bit is just annoying. It is an inconvenient reminder of our erstwhile physical analogue reality. The better metaphor than the “desktop” here is the spreadsheet. A spreadsheet is, of course, a rudimentary form of a database.

In a spreadsheet that inconvenient imposition of substrate has gone: a “document” is nothing more than an information string: more or less costless to generate, transport, replicate and store. By simply appending metadata, we can enrich it and put the same thing in several places at once. We transcend the Euclidean geometry of physical space.

Now, I said our reverence for the sacred substrate fell quickly away. It did not entirely fall away. We still revere wet ink, counterparts clauses (for some reason), and the dear old desktop. For still, as we file, we cannot resist the siren call of folders. Folders in folders in folders in folders in folders. Why do we persist with folders?

More than twenty years ago Tom Zingale taught a young JC a valuable lesson. Battling with some byzantine folder structure, and losing, JC cried out in anguish, and Tom said this:

JC: How on earth am I meant to organise all this?

Tom: With metadata.

JC: Er, with what?

Tom: Metadata. The answer to your question is metadata. Metadata, metadata, metadata. Whatever your question is, the answer is metadata.

JC: Well, my question is, “How do I use metadata to fix this filing problem?”

Tom: Oh, right. Simple: SharePoint.

About SharePoint

Now a lot of good people viscerally hate SharePoint. And, to be sure, Microsoft seems to have gone out of its way, over 20 years, to make SharePoint as hard to love as it can. But at the same time, it has based its entire Office 365 suite on the SharePoint platform. It is, to be sure, monumentally confusing, and the Teams integration is baffling. The utterly dismal online versions of its Office suite drive people righteously up the wall.

But, still, a good part of the enmity for SharePoint arises from this fundamental misunderstanding. SharePoint is the first, philosophically, digitally native operating system.

SharePoint has abandoned the desktop metaphor.

SharePoint uses the spreadsheet metaphor.

In SharePoint you organise by metadata, not by folders.

DO NOT USE FOLDERS IN SHAREPOINT.

Folders are top-down. Metadata is bottom-up. Folders prefer form over substance. Metadata prefers substance over form.

SharePoint allows you to do exactly the same thing with a document library as Excel allows you to do with a spreadsheet.

So it is odd — isn’t it? We intuitively understand the power of metadata when we are presented with a spreadsheet. But the same power does not occur to us when we are presented with a file management system. The desktop metaphor is burned on our retina.

Even though it is, in essence, a supercharged online spreadsheet, SharePoint continues to be resented by almost everyone.

References

  1. It was well ahead of its time: the GUI would not become mainstream until Apple released its Macintosh a decade later, in 1984.
  2. All physical information is eventually destined for the Iron Mountain.
  3. Okay: almost no form. Compared with physical information. In this section take the word “almost” as read.
  4. Grammar pedants’ corner: Even though “data” is plural, “metadata” is generally treated as a singular mass noun. Please direct your letters to the Royal Statistical Society — not because it is their fault: rather, they might keep metadata about this sort of thing.
  5. Indeed, the Windows operating system doesn’t treat folder names as metadata at all, which is mad.
  6. As in, a lookup to an object in a people directory, and not just a text name.
  7. My favourite was 001.9.
  8. I have a lengthy essay about the gradual extraction of data from the substrate but can't for the life of me find it at the moment.