WOPR: Memory-based language modeling

¹Utrecht University  ²University of Lund

WOPR in brief

WOPR, short for Word Predictor, is a memory-based language model developed in 2006-2011. It has just woken up from its cryogenic sleep and is figuring out what the fuss about LLMs is all about.

WOPR is an ecologically friendly alternative LLM with a staggeringly simple core. Everyone who took "Machine Learning 101" knows that the k-nearest neighbor classifier is among the simplest yet most robust ML classifiers out there, perhaps only beaten by the Naive Bayes classifier. So what happens if you train a k-NN classifier to predict words?
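To make that concrete, here is a minimal sketch in plain Python of the idea (an illustration only, not WOPR's actual code): every training token becomes an instance consisting of a fixed-width left context plus the next word as its class label, and a new context is classified by majority vote over its k most similar stored contexts under a simple overlap metric.

    from collections import Counter

    def make_instances(tokens, width=3):
        """Turn a token sequence into (left-context, next-word) training instances."""
        padded = ["_"] * width + tokens
        return [(tuple(padded[i:i + width]), padded[i + width])
                for i in range(len(tokens))]

    def overlap(a, b):
        """Similarity = number of context positions whose tokens match exactly."""
        return sum(x == y for x, y in zip(a, b))

    def predict(instances, context, k=3):
        """k-NN next-word prediction: majority vote over the k most similar stored contexts."""
        neighbours = sorted(instances, key=lambda inst: overlap(inst[0], context),
                            reverse=True)[:k]
        votes = Counter(word for _, word in neighbours)
        return votes.most_common(1)[0][0], neighbours

    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    instances = make_instances(corpus)
    word, evidence = predict(instances, ("sat", "on", "the"))
    print(word)      # 'mat' or 'rug', depending on how the tie is broken
    print(evidence)  # the memorized training contexts behind the prediction

Returning the neighbours alongside the prediction also shows the transparency angle: the exact training contexts that voted for the predicted word are right there for inspection.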

WOPR's engine is the TiMBL classifier, which implements a number of fast approximations of k-NN classification, all partly based on decision-tree classification. On tasks like next-word prediction, exact k-NN is prohibitively slow, but the TiMBL approximations classify many orders of magnitude faster.
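To give a flavour of how such an approximation works, here is a much-simplified sketch (not TiMBL's actual IGTree implementation; the fixed feature ordering below is just a hypothetical stand-in for TiMBL's information-gain ranking): all training instances are compressed into a prefix tree over the context features, with class counts stored at every node as a default. Classification walks the tree for as long as the new context keeps matching and falls back on the default of the last matching node, replacing a full scan over all stored instances with a handful of dictionary lookups.

    from collections import Counter

    def build_tree(instances, feature_order):
        """Compress (context, next-word) instances into a prefix tree over the
        context features; every node keeps class counts so it can act as a default."""
        root = {"counts": Counter(), "children": {}}
        for context, label in instances:
            node = root
            node["counts"][label] += 1
            for pos in feature_order:            # most informative feature first
                node = node["children"].setdefault(
                    context[pos], {"counts": Counter(), "children": {}})
                node["counts"][label] += 1
        return root

    def classify(tree, context, feature_order):
        """Walk the tree while the context keeps matching; on a mismatch,
        return the most frequent class stored at the last matching node."""
        node = tree
        for pos in feature_order:
            child = node["children"].get(context[pos])
            if child is None:
                break
            node = child
        return node["counts"].most_common(1)[0][0]

    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    padded = ["_"] * 3 + corpus
    instances = [(tuple(padded[i:i + 3]), padded[i + 3]) for i in range(len(corpus))]
    order = [2, 1, 0]   # hypothetical ranking: rightmost context word treated as most informative
    tree = build_tree(instances, order)
    print(classify(tree, ("sat", "on", "the"), order))  # exact path found in the tree
    print(classify(tree, ("sat", "on", "a"), order))    # unseen context: backs off to a default

Note that building the tree is a single linear pass over the data, which is where the training-efficiency claim below comes from.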

Compared to Transformer-based LLMs, memory-based LLMs are, on the plus side,

  • very efficient in training. Training is essentially reading the data (in linear time) and compressing it into a decision-tree structure. This can be done on CPUs, with sufficient RAM. In short, their ecological footprint is dramatically lower;
  • pretty efficient in generation when running with the fastest decision-tree approximations of k-NN classification. This can be done on CPUs as well;
  • completely transparent in their functioning. There is also no doubt that they memorize training data patterns.

On the downside,

  • Their performance does not yet match that of current Transformer-based LLMs, but we have not trained on data sets orders of magnitude larger than 100 million words. Watch this space!
  • They do not have a delicate attention mechanism, arguably the killer feature of Transformer-based decoders;
  • Memory requirements during training are heavy for large data sets (more than 32 GB of RAM for more than 100 million words).

BibTeX

@article{VandenBosch+09,
    author  = {A. {Van den Bosch} and P. Berck},
    title   = {Memory-based machine translation and language modeling},
    journal = {The Prague Bulletin of Mathematical Linguistics},
    volume  = {91},
    pages   = {17--26},
    year    = {2009},
    url     = {http://ufal.mff.cuni.cz/pbml/91/art-bosch.pdf}
}