WOPR: Memory-based language modeling

¹Utrecht University  ²University of Lund

WOPR in brief

WOPR, short for Word Predictor, is a memory-based language model developed in 2006-2011. It has just woken up from its cryogenic sleep and is figuring out what the fuss about LLMs is all about.

WOPR is an ecologically friendly alternative LLM with a staggeringly simple core. Everyone who took "Machine Learning 101" knows that the k-nearest neighbor classifier is among the simplest yet most robust ML classifiers out there, perhaps only beaten by the Naive Bayes classifier. So what happens if you train a k-NN classifier to predict words?
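To make that concrete, here is a minimal sketch in plain Python of the idea (an illustration only, not WOPR's actual code): every training token becomes an instance consisting of a fixed-width left context plus the next word as its class label, and a new context is classified by majority vote over its k most similar stored contexts under a simple overlap metric.

    from collections import Counter

    def make_instances(tokens, width=3):
        """Turn a token sequence into (left-context, next-word) training instances."""
        padded = ["_"] * width + tokens
        return [(tuple(padded[i:i + width]), padded[i + width])
                for i in range(len(tokens))]

    def overlap(a, b):
        """Similarity = number of context positions whose tokens match exactly."""
        return sum(x == y for x, y in zip(a, b))

    def predict(instances, context, k=3):
        """k-NN next-word prediction: majority vote over the k most similar stored contexts."""
        neighbours = sorted(instances, key=lambda inst: overlap(inst[0], context),
                            reverse=True)[:k]
        votes = Counter(word for _, word in neighbours)
        return votes.most_common(1)[0][0], neighbours

    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    instances = make_instances(corpus)
    word, evidence = predict(instances, ("sat", "on", "the"))
    print(word)      # 'mat' or 'rug', depending on how the tie is broken
    print(evidence)  # the memorized training contexts behind the prediction

Returning the neighbours alongside the prediction also shows the transparency angle: the exact training contexts that voted for the predicted word are right there for inspection.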

WOPR's engine is the TiMBL classifier, which implements a number of fast approximations of k-NN classification, all partly based on decision-tree classification. On tasks like next-word prediction, exact k-NN is prohibitively slow, but the TiMBL approximations classify many orders of magnitude faster.
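To give a flavour of how such an approximation works, here is a much-simplified sketch (not TiMBL's actual IGTree implementation; the fixed feature ordering below is just a hypothetical stand-in for TiMBL's information-gain ranking): all training instances are compressed into a prefix tree over the context features, with class counts stored at every node as a default. Classification walks the tree for as long as the new context keeps matching and falls back on the default of the last matching node, replacing a full scan over all stored instances with a handful of dictionary lookups.

    from collections import Counter

    def build_tree(instances, feature_order):
        """Compress (context, next-word) instances into a prefix tree over the
        context features; every node keeps class counts so it can act as a default."""
        root = {"counts": Counter(), "children": {}}
        for context, label in instances:
            node = root
            node["counts"][label] += 1
            for pos in feature_order:            # most informative feature first
                node = node["children"].setdefault(
                    context[pos], {"counts": Counter(), "children": {}})
                node["counts"][label] += 1
        return root

    def classify(tree, context, feature_order):
        """Walk the tree while the context keeps matching; on a mismatch,
        return the most frequent class stored at the last matching node."""
        node = tree
        for pos in feature_order:
            child = node["children"].get(context[pos])
            if child is None:
                break
            node = child
        return node["counts"].most_common(1)[0][0]

    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    padded = ["_"] * 3 + corpus
    instances = [(tuple(padded[i:i + 3]), padded[i + 3]) for i in range(len(corpus))]
    order = [2, 1, 0]   # hypothetical ranking: rightmost context word treated as most informative
    tree = build_tree(instances, order)
    print(classify(tree, ("sat", "on", "the"), order))  # exact path found in the tree
    print(classify(tree, ("sat", "on", "a"), order))    # unseen context: backs off to a default

Note that building the tree is a single linear pass over the data, which is where the training-efficiency claim below comes from.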

Compared to Transformer-based LLMs, memory-based LLMs are, on the plus side,

  • very efficient in training. Training is essentially reading the data (in linear time) and compressing it into a decision-tree structure. This can be done on CPUs, with sufficient RAM. In short, their ecological footprint is dramatically lower;
  • pretty efficient in generation when running with the fastest decision-tree approximations of k-NN classification. This can be done on CPUs as well;
  • completely transparent in their functioning. There is also no doubt that they memorize training data patterns.

On the downside,

  • Their performance does not yet match that of current Transformer-based LLMs, but we have not trained on data sets orders of magnitude larger than 100 million words. Watch this space!
  • They do not have a delicate attention mechanism, arguably the killer feature of Transformer-based decoders;
  • Memory requirements during training are heavy for large data sets (more than 32 GB of RAM for more than 100 million words).

BibTeX

@article{VandenBosch+09,
    author  = {A. {Van den Bosch} and P. Berck},
    title   = {Memory-based machine translation and language modeling},
    journal = {The Prague Bulletin of Mathematical Linguistics},
    volume  = {91},
    pages   = {17--26},
    year    = {2009},
    url     = {http://ufal.mff.cuni.cz/pbml/91/art-bosch.pdf}
}