Metadata taxonomy

From The Jolly Contrarian

Eager reg tech providers may try to sell you some kind of artificially intelligent automated taxonomy application that will categorise, tag and organise documents in an unstructured database (your email corpus, for example).

We are skeptical, as these initiatives are predicated on the high-modernist delusion that all information an organisation handles cleaves to a common, static, uniform architecture, and the only challenge, beyond discovering that structure, is to apply it reliably to each file: that data has a single, unchanging nature that one can, as the saying goes, carve at its joints.

This is rather like a central bureaucracy forecasting the population’s forthcoming need for spoons, rather than letting a competitive market sort this out by itself.

For no chatbot, no neural network, however artificially intelligent, can apprehend the particular use a user may have for categorising data. That is not how they work.

We inhabit a dynamic, shape-shifting world. The “market” is a sprawling, inchoate patchwork of sprawling, inchoate, patchwork systems. What counts as a canonical category here is no use as a category there, even inside the same firm.[1]

Furthermore, the data we handle is already inundated with metadata (when it was sent; by and to whom; concerning what; and so on), not to mention actual data, being the text of the document and its attachments, none of which, broadly, is properly used. Rather than ignoring that trove and imposing further arbitrary metadata on top of it,[2] at least first make the most of it.

Here something like unglamorous search (virtual folders; that kind of thing) is a better option. The search parameters are, of course, ad hoc; they may (but need not) be impermanent; they categorise information in real time according to parameters the user at the time deems valuable.
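To make the virtual folder idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than any particular product's design: the Message record, the virtual_folder helper, the sample corpus and the example addresses are all hypothetical. The point is only that a "folder" can be a saved query over metadata the messages already carry, evaluated when someone looks, with no tagging step at all.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable


@dataclass(frozen=True)
class Message:
    """Hypothetical record holding metadata an email already carries."""
    sender: str
    recipients: tuple[str, ...]
    sent: date
    subject: str
    body: str


# A virtual folder is just a predicate over existing metadata.
# Nothing is tagged, moved or copied.
Query = Callable[[Message], bool]


def virtual_folder(corpus: Iterable[Message], query: Query) -> list[Message]:
    """Evaluate the query at read time, so results follow the corpus as it changes."""
    return [m for m in corpus if query(m)]


# Made-up sample data, for illustration only.
corpus = [
    Message("alice@bank.example", ("legal@bank.example",), date(2023, 3, 1),
            "ISDA negotiation", "Draft schedule attached."),
    Message("bob@client.example", ("alice@bank.example",), date(2023, 5, 12),
            "Margin call query", "Please confirm the threshold."),
    Message("carol@bank.example", ("tax@bank.example",), date(2024, 1, 9),
            "ISDA tax representations", "Updated payee reps."),
]

# Two users, two ad hoc views of the same messages; neither is canonical.
legal_view = virtual_folder(corpus, lambda m: "isda" in m.subject.lower())
recent_view = virtual_folder(corpus, lambda m: m.sent >= date(2023, 5, 1))
```

Each view can be thrown away or saved as the user pleases, and a new one costs a line of code rather than a taxonomy project.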


  1. The best example is the “client”. A sales desk might categorise a client by its sector; the credit department by its market capitalisation; the legal department by its corporate form; compliance by its sophistication; tax by its domicile. These categorisations are incommensurable, but need not be commensurated: all are relevant, and none has intellectual priority over the others. Building a system to manage these clients requires design choices.
  2. With the tedious overheads that implies: software licence fees and a squadron of librarians chasing users up to validate the taxonomy and update it.