Low-Background Steel; Low-Silicon Text

This was a metaphor I saw recently somewhere on the internet. Likely on Hacker News, my guilty pleasure.

Low-background steel

Low-background steel is steel that was produced before the advent of the nuclear bomb, and it contains far fewer radioactive inclusions than contemporary steel. This is not to say that modern steel is highly radioactive. Rather, if you are building a very sensitive instrument (say, a medical imaging device), your goal is to minimize interference, and so you would likely seek out materials that emit as little radiation as possible.

Why does steel contain this radiation? The nuclear tests following the end of the Second World War seeded the atmosphere with minuscule amounts of radioactive material. Producing steel is, at its core, a relatively simple reaction: iron oxide is combined with carbon, usually, to produce iron and carbon dioxide. The process consumes large volumes of air, and in the course of this reduction, atmospheric gases and whatever they carry sneak in.

So where does one obtain low-background steel? The past! The only steel that does not contain atmospheric hangers-on is that which was produced before such an atmosphere existed. This means large sources of steel that were created before nuclear testing and isolated from the atmosphere thereafter. The prime candidate? Shipwrecks!

Low-silicon text

You can see where this is going.

There is a similar turning point going on or, well, one that has already happened. Large Language Models (LLMs) are the latest iteration of artificial intelligence. These are neural networks large enough to generate humanlike text, often indistinguishable from human-written text.

Thus, for any text generated after a certain date (say, Nov 30, 2022, when OpenAI introduced ChatGPT), there is uncertainty as to whether a human or a machine generated it. Equally, there is a new quality attributed to text generated before this advent: a certainty that it is human-generated. Why care about this? Well, if you are studying humans or machines, you would likely want to be confident you are studying only the ones you are interested in.

Mirroring the plain-text dichotomy, there is an amusing observation that GitHub’s “Arctic Code Vault,” designed for code to survive the apocalypse, is now also a repository of human-generated code. If you are interested in analyzing code written by humans, you can no longer be certain that any code released after June 29, 2021 (when GitHub Copilot launched in technical preview) was written solely by humans.

So what?

I do not know!

A few things are certain though: