Desktops, metadata and filing

From The Jolly Contrarian

Latest revision as of 07:56, 1 October 2024

JC pontificates about technology
An occasional series.

The Jolly Contrarian holds forth™

Audio version of this article: listen to this podcast on Spotify, on Substack, or as JC Life on Apple.
What you see is what you got, yesterday: A Xerox Alto with a portrait monitor. Nice.
The Alto’s “desktop”
VisiCalc on the Apple II


The desktop

In 1973, Xerox’s Palo Alto Research Center released the “Alto”. This was the first personal computer equipped with a “graphical user interface” (GUI) — computing with pictures — instead of the traditional “character user interface”.

If potential users — bowler-hatted bureaucrats who didn’t use computers at all — were to be persuaded to give up their card catalogue systems, typing pools and reusable manila envelopes and instead stare at a screen all day, the system would need to look as familiar as possible.

And so, to lessen the cognitive burden, Xerox came up with a visual metaphor. The Alto’s graphic user interface was modelled on a “desktop”. Instead of an impenetrable wall of green code and a flashing cursor, users were presented with a cartoonish depiction of a literal desktop and all its familiar iconography: documents, folders, a blotter, filing cabinets, in-trays, out-trays and even a dinky little waste-paper basket. This even extended to types of document that did not exist in an analogue office: Emails were depicted as little envelopes with a stamp and a wax seal.

All was designed to reassure the meatware — as fearful of incipient obsolescence then as it is now — that the change journey from the comfy old analogue world to the coming atomic age would not be so bad after all.

The spreadsheet

In 1979, Dan Bricklin and Bob Frankston created a new application for the Apple II computer. They called it “VisiCalc”. It was a grid of cells that you could input numbers and text into and then run calculations on by reference to cell coordinates. VisiCalc was the first spreadsheet program: a primitive ancestor to that beast we all now know and love as Microsoft Excel.

VisiCalc’s brilliant innovation was to separate the data you wanted to manipulate — the numbers and text in the cells — from the logical operations you wanted to manipulate them with — quasi-mathematical formulae — which referenced just the coordinates of the cells holding the data, not the data itself. You could therefore change the data without upsetting the calculation parameters. VisiCalc established a rudimentary form of programming language. A spreadsheet is a sort of programme. This may seem redolent of a smart contract, by the way. That is because it is. But let us not be distracted.
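The sketch below shows that separation in Python. It assumes nothing about how VisiCalc itself was built. The data lives in cells keyed by coordinate, and the formula holds only coordinates. The names `cells`, `formula` and `evaluate`, and the tuple format of the formula, are hypothetical.

```python
# A toy "spreadsheet": the data lives in cells keyed by coordinate.
cells = {"A1": 120, "A2": 80, "A3": 45}

# The formula stores only cell references, never the values themselves.
# Hypothetical representation: (operation, [list of cell references]).
formula = ("SUM", ["A1", "A2", "A3"])


def evaluate(formula, cells):
    """Evaluate a formula against whatever the referenced cells hold right now."""
    op, refs = formula
    values = [cells[ref] for ref in refs]
    if op == "SUM":
        return sum(values)
    if op == "AVERAGE":
        return sum(values) / len(values)
    raise ValueError(f"unsupported operation: {op}")


total_before = evaluate(formula, cells)   # 245

# Change the data; the formula itself is untouched.
cells["A2"] = 200
total_after = evaluate(formula, cells)    # 365
```

The formula never captured the numbers. So changing cell A2 changes the answer without anyone touching the calculation.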

It might not have seemed much in 1979, but VisiCalc and its heirs would revolutionise business computing. While not nearly as intuitive as the Alto’s “desktop” — there was no graphic user interface or anything like that — VisiCalc was a much purer expression of what a personal computer could do. It promised even modest undertakings a powerful means of storing, augmenting, filtering, analysing and manipulating unprecedented amounts of information as structured data.

Good and bad metaphors

A spreadsheet is a much better way of thinking about how to organise digital information than a desktop because it is not constrained by physical space. Whereas the information on a traditional desktop is embedded in a physical “substrate” — usually paper — digital information has no such limitations. An empty spreadsheet stretches endlessly away in two directions:

Downwards: You can add items — “artefacts” — to your filing system without limit, unconstrained by the area of your desk or the volume of your filing cabinet. Each new artefact occupies a new row. There is an infinite number of rows.
Across: You can categorise each artefact however you like by creating new columns. There is no limit to the number of columns and no necessary hierarchy between them.
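The two directions can be sketched in Python as a list of rows, each row a dictionary of column values. The document names, column names and the `add_column` helper are hypothetical. The point is only that a new column extends every row without disturbing the columns already there.

```python
# Each row is an artefact; each key in a row is a metadata column.
# Names and values are invented for illustration.
library = [
    {"name": "Template v3.docx", "type": "Template"},
    {"name": "NDA draft.docx", "type": "Contract"},
]

# Downwards: adding an artefact is just adding a row.
library.append({"name": "Board memo.docx", "type": "Memo"})


def add_column(rows, column, default=None):
    """Add a metadata column to every row; existing columns are left alone."""
    for row in rows:
        row.setdefault(column, default)


# Across: a new way of categorising everything, added after the fact.
add_column(library, "practice_area")
library[1]["practice_area"] = "Litigation"
```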

The desktop is designed to manage physical properties that digital information does not have. In printed information, paper is “form”, text is “substance”. A desktop must prioritise form because in a physical system substance cannot exist independently of form. Data must be printed on paper. Paper must be put somewhere. Unless you physically copy it, you can only put a piece of paper in one place at a time.

Older readers may remember the Dewey decimal system, by which libraries numbered the entire corpus of non-fiction wisdom from 000 to 999. My favourite was 001.9: “mysteries and the unexplained”. But the Dewey system addressed an exclusively physical problem: where to put things that could only be in one place at a time. We don’t need it online.[1]

For the same reason, we don’t need the desktop. It was designed around other purely physical constraints, too. Storing physical information, on paper, is expensive.[2] Copying and transporting paper is expensive, slow and “lossy”. With each copy we make we increase our storage costs and lose some fidelity.

But digital information has — almost[3] — no “form” at all. It does not occupy physical space. It costs nothing to store. We can copy and move it costlessly, instantly, and with no loss of fidelity. At least when compared with physical information, digital information can be everywhere, and nowhere, at once. We are not constrained by space or time when we store or move digital information.

Yet to file it, we insist upon using a metaphor that assumes we are.

A front in the battle between substance and form

The desktop prioritises form.
A spreadsheet prioritises substance.

In a “desktop” structure, subfolders are sub-divisions, each further level down more fine-grained and subordinate than the last. The deeper the folder structure, the less significant the artefact relative to the formal hierarchy. The folder structure is logically prior to the artefact because you cannot put anything into the system until you have fully specified its folder path. The end-folder has to be there to put the document into it.

The desktop therefore buries documents at the bottom of the filing structure. If you want to retrieve one, down the folder-path rabbit hole you must go. Heaven help you if your document has been misfiled.
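For contrast, here is a sketch of the desktop model as nested folders, with hypothetical folder and file names. Saving demands a complete path, and opening demands the same path. A misfiled document turns up only if you walk the entire tree looking for it.

```python
# A folder tree as nested dicts: folders are dicts; "_files" holds documents.
# All folder and file names are hypothetical.
tree = {}


def save(tree, path, document):
    """Store a document under a fully specified folder path (folders created as needed)."""
    node = tree
    for folder in path:
        node = node.setdefault(folder, {})
    node.setdefault("_files", []).append(document)


def open_folder(tree, path):
    """Retrieval works only if you already know the exact path."""
    node = tree
    for folder in path:
        node = node.get(folder)
        if node is None:
            return []
    return node.get("_files", [])


def find_anywhere(node, document, path=()):
    """Without the right path, the only option is to walk the whole tree."""
    if document in node.get("_files", []):
        return path
    for name, child in node.items():
        if name != "_files":
            found = find_anywhere(child, document, path + (name,))
            if found is not None:
                return found
    return None


save(tree, ("Legal", "Contracts", "2024"), "Acme NDA.docx")
# Misfiled: saved under Litigation instead of Contracts.
save(tree, ("Legal", "Litigation", "2024"), "Beta NDA.docx")

open_folder(tree, ("Legal", "Contracts", "2024"))   # only ['Acme NDA.docx']
find_anywhere(tree, "Beta NDA.docx")                # ('Legal', 'Litigation', '2024')
```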

In a spreadsheet, the substance takes priority. The artefact is the first thing to go in the database, and sits at the top. Only then do we apply formal properties to our artefact. These formal properties are thus incidental.

And where folders are divisive in nature — a subfolder is necessarily a sub-division of the folder it sits in — spreadsheet columns have no hierarchy, need not bear any relation to each other, and are therefore multiplicative. They can be multiplied without limit: if an existing column, or an artful combination of them, doesn’t yield the exact information you need, you can always add more columns.
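Below is a sketch of columns imposing a hierarchy on the fly, using a small hypothetical library. The helper `group` nests the same rows by whichever columns you name, in whichever order. Client-then-type and type-then-client are therefore two calls, not two folder structures.

```python
from itertools import permutations

# Hypothetical library: one row per artefact, one key per metadata column.
docs = [
    {"name": "NDA A", "client": "Acme", "type": "NDA", "year": 2023},
    {"name": "NDA B", "client": "Beta", "type": "NDA", "year": 2024},
    {"name": "Loan A", "client": "Acme", "type": "Loan", "year": 2024},
]


def group(rows, columns):
    """Impose a hierarchy on the fly: nest rows by the given columns, in order."""
    if not columns:
        return [row["name"] for row in rows]
    first, rest = columns[0], columns[1:]
    buckets = {}
    for row in rows:
        buckets.setdefault(row[first], []).append(row)
    return {key: group(members, rest) for key, members in buckets.items()}


by_client = group(docs, ["client", "type"])
# {'Acme': {'NDA': ['NDA A'], 'Loan': ['Loan A']}, 'Beta': {'NDA': ['NDA B']}}
by_type = group(docs, ["type", "client"])
# {'NDA': {'Acme': ['NDA A'], 'Beta': ['NDA B']}, 'Loan': {'Acme': ['Loan A']}}

# Every ordering of every non-empty subset of columns is a possible hierarchy.
columns = ["client", "type", "year"]
hierarchies = [p for r in range(1, len(columns) + 1) for p in permutations(columns, r)]
# 3 + 6 + 6 = 15 hierarchies from three columns.
```

Three columns already yield fifteen orderings, and a fourth would yield sixty-four. None of them requires moving or copying a single document.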

Now, naturally we like hierarchies. There is not a social structure on the face of the earth that doesn’t have one. Hierarchies place things in a permanent, graspable relation to each other. This is comforting. It feels tangible in a virtual realm that natively is not tangible. Flexibility is good for experts, improvisers and virtuosi, but it intimidates everyone else. Hierarchy is important when we want dependability and reliability.

And here we find our old friend the struggle between form and substance: if we take it that, whatever your metaphor of choice, the “artefact” — the thing being filed — is the substance and the organising system it goes into is the form, we can see that the desktop and the spreadsheet are fundamentally opposed philosophies.

Metadata

Each desktop folder or spreadsheet column is metadata about its artefact — literally, “information about information”.[4] A folder structure generates a limited, anaemic sort of metadata in the shape of the folder name. This is a short text label; Windows has traditionally capped a whole file path, folder names included, at 260 characters. The label is so limited in what it can be used for that the Windows operating system does not treat it as metadata at all.[5] A spreadsheet, by contrast, imposes few limits on what form metadata can take: text, calculable numbers and dates, checkboxes, people,[6] colours, flags, choices, lookups, comments, concatenations or calculations. Spreadsheet metadata can be compulsory, optional, pre-populated or free-form. You can validate, manage, control, filter, group, sort, chart, or pivot it.
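Here is a sketch of typed, validated metadata columns in Python. The schema, column names and status choices are hypothetical. Only a few of the column types listed above are modelled (text, dates, numbers, checkboxes and choices); people lookups and calculated columns are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical column definition: a type, whether it is compulsory,
# and optionally a fixed list of permitted values.
@dataclass
class Column:
    name: str
    kind: type
    required: bool = False
    choices: Optional[tuple] = None


SCHEMA = [
    Column("title", str, required=True),
    Column("signed_on", date),
    Column("value", float),
    Column("status", str, required=True, choices=("Draft", "Agreed", "Executed")),
    Column("confidential", bool),
]


def validate(row):
    """Return a list of problems with one row's metadata (empty if it is clean)."""
    problems = []
    for col in SCHEMA:
        value = row.get(col.name)
        if value is None:
            if col.required:
                problems.append(f"{col.name}: missing")
            continue
        if not isinstance(value, col.kind):
            problems.append(f"{col.name}: expected {col.kind.__name__}")
        elif col.choices and value not in col.choices:
            problems.append(f"{col.name}: {value!r} not one of {col.choices}")
    return problems


validate({"title": "NDA", "status": "Executed", "signed_on": date(2024, 3, 1)})  # []
validate({})  # ['title: missing', 'status: missing']
```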

The more metadata you have, the more ways you can play with it. Each separate value represents a new and distinct way of organising your information. Even if your metadata is wrong, you can triangulate the inconsistencies to identify problematic data. Whereas a misfiled folder is lost forever, a mis-tagged cell reveals itself for self-cleansing.
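One way to triangulate is sketched below with hypothetical rows and rules. Pair up columns that ought to agree, and flag any row where they do not. The rules are illustrative assumptions, not a standard.

```python
# Hypothetical rows; each column was entered independently.
rows = [
    {"name": "NDA A", "status": "Executed", "signed_on": "2024-03-01"},
    {"name": "NDA B", "status": "Executed", "signed_on": None},
    {"name": "NDA C", "status": "Draft", "signed_on": "2024-05-10"},
]

# Cross-column rules: each pairs two fields that ought to agree.
# These particular rules are illustrative assumptions, not a standard.
RULES = {
    "executed but no signing date": lambda r: r["status"] == "Executed" and not r["signed_on"],
    "signing date but not executed": lambda r: bool(r["signed_on"]) and r["status"] != "Executed",
}


def suspects(rows):
    """Flag rows whose columns contradict each other, with the reason."""
    return [(r["name"], reason) for r in rows for reason, rule in RULES.items() if rule(r)]


suspects(rows)
# [('NDA B', 'executed but no signing date'), ('NDA C', 'signing date but not executed')]
```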

You can, in this way, generate metadata about metadata. This is meta-metadata.
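Here is a small sketch of meta-metadata over a hypothetical library. For each column it reports how many cells are filled and how many distinct values appear. A half-empty column, or one with suspiciously many spellings of the same client, is itself information.

```python
# Hypothetical library rows; None marks a cell nobody filled in.
rows = [
    {"client": "Acme", "type": "NDA", "owner": "Sam"},
    {"client": "Beta", "type": "NDA", "owner": None},
    {"client": "Acme", "type": None, "owner": None},
]


def profile(rows):
    """Metadata about the metadata: how full each column is, and how varied."""
    # Collect every column name across all rows, in first-seen order.
    columns = dict.fromkeys(c for r in rows for c in r)
    report = {}
    for column in columns:
        values = [r.get(column) for r in rows]
        filled = [v for v in values if v is not None]
        report[column] = {"filled": f"{len(filled)}/{len(values)}",
                          "distinct": len(set(filled))}
    return report


profile(rows)
# {'client': {'filled': '3/3', 'distinct': 2},
#  'type': {'filled': '2/3', 'distinct': 1},
#  'owner': {'filled': '1/3', 'distinct': 1}}
```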

The desktop clings on

Yet even in our modern, hyper-networked, cloud-based work environment; even though we have had Microsoft Excel for nearly 40 years and we’re quite good at it now, the desktop metaphor hangs on. We still call them “desktops”, for the prosaic reason that they are the only thing still allowed on the desktop in our clear-desk, humans-as-fungible-cogs-in-the-machine modern office environment. (Is it any wonder firms are struggling to get staff back to the office, by the way?)

The desktop was a nice, quaint idea. It got old geezers in green visors to sit down at keyboards. For that, the change managers of the world can be grateful. But the metaphor has long since outstayed its welcome. Enough already of the dinky desktop.

When information is digital and has no physical dimension, the desktop is an unnecessary constraint. Duplicating artefacts to suit multiple hierarchies creates basis risk. Which was the canonical version of the document? How can we be sure they are the same? What happens if one, but not the other, gets updated?

Where the document is a “living thing,” plotting its own miserable trajectory through the cosmos — say, a contract under negotiation, or a maintained legal template — then running multiple copies multiplies the job of maintaining all copies as the document changes, and that introduces the risk of human error. There may be miskeys. A document may be forgotten. Version control is a pain.
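The basis risk can be shown in a few lines of Python, using hypothetical folder names and document text. Two copies drift apart as soon as one of them is edited. A single canonical item tagged into two areas cannot drift, because each “folder” is only a filtered view of it.

```python
# Folder approach: the same template copied into two folders.
# Folder names and document text are hypothetical.
folders = {
    "Litigation": {"template.docx": "Governing law: England"},
    "Knowledge Management": {"template.docx": "Governing law: England"},
}

# Someone updates one copy and forgets the other.
folders["Litigation"]["template.docx"] = "Governing law: England and Wales"
copies_agree = (folders["Litigation"]["template.docx"]
                == folders["Knowledge Management"]["template.docx"])   # False

# Metadata approach: one canonical item, tagged into both areas.
documents = {"template.docx": {"text": "Governing law: England",
                               "areas": {"Litigation", "Knowledge Management"}}}


def view(area):
    """Each hierarchy is just a filtered view over the single canonical item."""
    return {name: doc["text"] for name, doc in documents.items() if area in doc["areas"]}


documents["template.docx"]["text"] = "Governing law: England and Wales"
views_agree = view("Litigation") == view("Knowledge Management")   # True
```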

Also, a preferred hierarchy can change. Personnel, managers, business priorities, and circumstances change. They change the priorities of formal organisation. Changing your preferred hierarchy means completely re-engineering your folder structure.

Substrate neutrality

These are all problems of the physical realm; the spreadsheet metaphor shows us we need not be troubled by them in the digital realm. Here, the physical “substrate” — the hard copy — is irrelevant. What matters is the ASCII code embedded in it. In the digital realm, that code has been abstracted and floats free of the papery substrate.

Across a diverse network of collaborators, the freedom to create multiple organising hierarchies on the fly, without upsetting other users and without needlessly duplicating documents, is immensely empowering.

Our reverence for the sacred substrate has fallen away, but not entirely. We still revere wet ink, counterparts clauses (for some reason), and the dear old desktop.

For still, as we file, we cannot resist the siren call of folders. Folders in folders in folders in folders in folders.

Why do we persist with folders?

More than twenty years ago Mr T. Zingale taught the young JC a valuable lesson. Battling with some byzantine folder structure, and losing, JC cried out in anguish to his technologist friend:

JC: How on earth am I meant to organise all this?

Mr. T: With metadata.

JC: Er, with what?

Mr. T: Metadata. The answer to your question is metadata. Metadata, metadata, metadata. Whatever your question is, the answer is metadata.

JC: Well, my question is, “How do I use metadata to fix this filing problem?”

Mr. T: Oh, right. Simple: SharePoint.

Wait: did somebody say, SharePoint?

About SharePoint

I know I’m unlovable
You don’t have to tell me.
Message received loud and clear
[...]
Loud and clear
If I seem a little strange well
That’s because I am.

—The Smiths, Unlovable

Now a lot of good people viscerally hate SharePoint. To be sure, Microsoft has apparently gone out of its way to foment this emotion, waging a sustained campaign, over 20 years, to render it as unlovable as can be.

But still, SharePoint was the first — and remains the only — philosophically digital operating system. It abandoned the dinky desktop metaphor and embraced the spreadsheet approach. Think of it as an online version of Excel that can also hold documents in a spreadsheet format. You can create as many metadata columns about each document as you like, just as you can in Excel.

And Microsoft must realise this: it has rebuilt its entire productivity suite around SharePoint. To be sure, the Office 365 platform — or whatever they call it now — is monumentally confusing: the Teams integration is baffling. The interaction between OneDrive, Teams, File Explorer, SharePoint and Outlook seems wilfully designed to create as much confusion, duplication and data loss as possible. Where once there was a single place for documents, communications and schedules, now there are three or four of each. Did you get that file I sent? I chatted it to you in Teams. Or, hang on, did I email it? Or Share a link from OneDrive?

Disaster.

The unbearable tolerance of spreadsheets

And this is the downside of the spreadsheet metaphor. A spreadsheet is a blank page: it leaves hierarchy and database design up to whoever wants to have a go at it. SharePoint leaves the discipline of getting the database design right, and then enforcing it, up to the user.

But this pluralism, this multiplicity, this total indifference to how users want to work, this tolerance of all the different possible ways users do work, however dopey, this total disarming agnosticism that will even let users embrace the desktop metaphor if that’s what they really want — is a recipe for organisational chaos.

Because, as we should know by now, most users will embrace the desktop metaphor if you give them half a chance. Because they don’t know any better. Why would they? Database design optimisation is its own branch of computer engineering. You can even get a certificate in it.

Most users do not have such a certificate. Nor do they want one. They have not thought much about database design optimisation. Nor do they want to.

Rather, when they entered the workforce, or first clapped eyes on a computer, they saw a desktop and they liked it. They understood it and got very used to it.

So the desktop metaphor has proven resilient. It will remain hard to dislodge.

Yet it is odd — isn’t it? We intuitively understand the power of metadata when we are presented with a spreadsheet. But the same power does not occur to us when we are presented with a file management system. The desktop metaphor is burned on our retina.

Even though it is, in essence, a supercharged online spreadsheet, SharePoint continues to be resented by almost everyone. This is no more than Microsoft deserves for its terrible implementation, but at least give it a try.

Cross the Rubicon. Redpill yourself. Leave the desktop metaphor behind and finally go truly digital.

See also

References

  1. Entertainingly, courtesy of some well-meaning rube, the Dewey decimal system limped into the information age in the shape of WebDewey. This may be the librarian’s equivalent of an online Amish supplies store.
  2. All physical information is eventually destined for the Iron Mountain.
  3. Compared with physical information. Yes, it does take up some space, and as we get carried away in our brute-force-computing AI swoon, the energy cost of data is no longer negligible, but that is a peripheral issue in “normal” computing. Nevertheless, in this section take the word “almost” as read.
  4. Grammar pedants’ corner: Even though “data” is plural, “metadata” is generally treated as a singular mass noun. Please direct your letters to the Royal Statistical Society — not because it is their fault: rather, they might keep metadata about this sort of thing.
  5. This, by the way, is mad.
  6. As in, a lookup to an object in a people directory, and not just a text name.