Template:AlphaGo v LLM
I see a difference between large language models and AlphaGo learning to play superhuman Go through self-play.
When AlphaGo adds one of its own self-vs-self games to its training database, it is adding a genuine game. The rules are followed. One side wins. The winning side did something right.
Perhaps the standard of play is low. One side makes some bad moves, the other side makes a fatal blunder, the first side pounces and wins. I was surprised that they got training through self-play to work; in the earlier stages the player who wins is only playing a little better than the player who loses and it is hard to work out what to learn. But the truth of Go is present in the games and not diluted beyond recovery.
But an LLM is playing a post-modern game of intertextuality. It doesn’t know that there is a world beyond language to which language sometimes refers. Is what an LLM writes true or false? It is unaware of either possibility. If its own output is added to the training data, that creates a fascinating dynamic. But where does it go? Without AlphaGo’s crutch of the “truth” of which player won the game according to the hard-coded rules, I think the dynamics have no anchorage in reality and would drift, first into surrealism and then psychosis.
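The contrast can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the "policy" is a single number, `TARGET` is a hypothetical stand-in for whatever the hard-coded rules reward, and neither loop resembles how AlphaGo or any real LLM is trained. The first loop lets a fixed rule pick the winner of each self-play game; the second simply refits a model to its own output with no outside judge.

```python
import random
import statistics

# Hypothetical stand-in for "what the rules reward"; not from any real system.
TARGET = 5.0


def anchored_self_play(generations=500, noise=0.5, step=0.5, seed=0):
    """Two noisy copies of the policy play; a fixed rule decides the winner."""
    rng = random.Random(seed)
    policy = 0.0  # starts far from TARGET, i.e. the standard of play is low
    for _ in range(generations):
        a = policy + rng.gauss(0.0, noise)
        b = policy + rng.gauss(0.0, noise)
        # The hard-coded rule: whoever lands closer to TARGET wins the game.
        winner = a if abs(a - TARGET) <= abs(b - TARGET) else b
        # Learn only from the winning side, which did something right.
        policy += step * (winner - policy)
    return policy


def unanchored_self_training(generations=200, sample_size=10, seed=0):
    """Each generation is refit to samples drawn from the previous one."""
    rng = random.Random(seed)
    mean, spread = 0.0, 1.0
    for _ in range(generations):
        samples = [rng.gauss(mean, spread) for _ in range(sample_size)]
        # No external judge: the model's own output is the whole training set.
        mean = statistics.fmean(samples)
        spread = statistics.pstdev(samples)
    return mean, spread


if __name__ == "__main__":
    print("anchored policy:", anchored_self_play())
    print("unanchored (mean, spread):", unanchored_self_training())
```

Under these assumptions the anchored policy should end up close to `TARGET` even though each winner is only slightly better than the loser. The unanchored model's spread should shrink toward zero while its mean settles wherever sampling noise happened to carry it. That is only a cartoon of the drift described above, not evidence about real language models.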
One sees that AlphaGo is copying the moves that it was trained on, that an LLM is also copying the moves that it was trained on, and that these two things are not the same.[1]