
The Lost Reading Items

In this post: An attempt to reconstruct Ilya Sutskever's 2020 AI reading list

I recently shared a summary of a viral AI reading list attributed to Ilya Sutskever, which laid claim to covering ‘90% of what matters’ back in 2020. That summary boils the reading items down to barely one percent of their original word count, forming the TL;DR I would have wished for before reading.

The viral version of the list as shared online is known to be incomplete, however: it includes only 27 of about 40 original reading items. The rest allegedly fell victim to the email deletion policy at Meta¹. These missing reading items have inspired some good discussions, with many different ideas as to which papers would have been important enough to include.

This post is an attempt to identify these lost reading items. It builds on clues gathered from the viral list, contemporary presentations given by Ilya Sutskever, resources shared by OpenAI and more.

¹Correction: An earlier version mistakenly referred to OpenAI here instead of Meta

Filling the Gaps

The main piece of evidence is a claim, shared along with the list, that an entire selection of meta-learning papers was lost.

Meta-learning is often described as the pursuit of ‘learning to learn’: neural networks are trained for a general ability to adapt easily to new tasks for which only few training samples are available. A network should thus be able to benefit from its existing weights without requiring entirely new training from scratch on the new data. One-shot learning provides just a single training sample from which a model is expected to learn a new downstream task, whereas zero-shot settings provide no annotated training samples at all.
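
To make the setting concrete, here is a minimal sketch (my own illustration, not taken from the list) of how N-way K-shot ‘episodes’ are typically constructed in meta-learning benchmarks; the `dataset` argument and its layout are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(dataset, n_way=5, k_shot=1, n_query=5):
    """Build one N-way K-shot episode: a tiny 'support' set with
    k_shot examples per class to adapt on, plus a 'query' set to
    evaluate the adapted model. `dataset` is assumed to map each
    class label to an array of examples."""
    classes = rng.choice(list(dataset), size=n_way, replace=False)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        idx = rng.permutation(len(dataset[cls]))
        support += [(dataset[cls][i], episode_label) for i in idx[:k_shot]]
        query += [(dataset[cls][i], episode_label)
                  for i in idx[k_shot:k_shot + n_query]]
    return support, query  # k_shot=1 gives the one-shot setting
```

A meta-learner is trained across many such episodes, so that adapting from a handful of support samples becomes a learned skill rather than a fresh optimization problem.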

For some of the candidate papers listed below, the case can be strengthened further by an endorsement straight from OpenAI itself. Ilya Sutskever was chief scientist at OpenAI when it published the educational resource ‘Spinning Up in Deep RL’, which features several of these candidates in an entirely separate reading list of 105 ‘Key Papers in Deep RL’. Below, the papers which also appear in that list are marked with a symbol (⚛).

Clues from the Preserved Reading Items

Some meta-learning concepts can be found even in the known parts of the list. The preserved reading items can be arranged into a narrative arc around a related branch of research on Memory-Augmented Neural Networks (MANNs). Following the ‘Neural Turing Machine’ (NTM) paper, ‘Set2Set’ and ‘Relational RNNs’ experimented with external memory banks that an RNN can read information from and write to. They directly cite or closely relate to several papers which may well have been part of the original list, gathered under Part 1 below.
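
To make the shared memory mechanism concrete, here is a minimal sketch of a content-based read (my own illustration, loosely following the NTM's addressing scheme; the paper's exact parameterization differs):

```python
import numpy as np

def content_read(memory, key, beta=5.0):
    """Minimal content-based read: score every memory slot (row) by
    cosine similarity to `key`, turn the scores into attention
    weights with sharpness `beta`, and return the weighted sum."""
    sims = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = np.exp(beta * sims)
    weights /= weights.sum()        # differentiable addressing
    return weights @ memory         # the 'read' vector

memory = np.random.default_rng(0).normal(size=(8, 4))  # 8 slots, dim 4
read_vector = content_read(memory, key=memory[3])      # ~recalls slot 3
```

Because the addressing is soft and differentiable, the controller network can be trained end-to-end with backpropagation to decide what to store and retrieve.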

Potential Reading Items (Part 1):

Clues from Contemporary Presentations

Certain papers about meta-learning and competitive self-play also feature repeatedly in a series of presentations given by Ilya Sutskever around this time, and may well have been included in the reading list too.

Recorded Presentations:
- Meta Learning and Self Play - Ilya Sutskever, OpenAI (YouTube), 2017
- OpenAI - Meta Learning & Self Play - Ilya Sutskever (YouTube), 2018
- Ilya Sutskever: OpenAI Meta-Learning and Self-Play (YouTube), 2018

These presentations largely overlap and repeatedly reference known contents of the reading list. They open with a fundamental motivation for why deep learning works, framing the training of neural networks by backpropagation as a search for small circuits. This framing relates to the Minimum Description Length (MDL) principle, according to which the shortest program that can explain the given data will generalize best.
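
In its standard two-part-code form (a textbook statement of the principle, not a formula from the talks), the MDL criterion reads:

```latex
% Prefer the hypothesis H that minimizes the bits needed to
% describe H itself plus the bits needed to describe the data D
% once H is known.
\[
  H^{*} \;=\; \underset{H \in \mathcal{H}}{\arg\min}
  \;\bigl[\, L(H) + L(D \mid H) \,\bigr]
\]
```

Under this reading, a small circuit that still explains the data keeps both terms short, which is the framing the presentations give for why trained networks generalize.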

Next, all three presentations reference the following meta-learning papers:

Potential Reading Items (Part 2):

Reinforcement Learning (RL) also features heavily in all three presentations, with close links to meta-learning. One key concept is competitive self-play, in which agents interact in a simulated environment to reach specific, typically adversarial objectives. As a way to ‘turn compute into data’, this approach enabled simulated agents to outperform human champions and invent new moves in rule-based games. Ilya Sutskever presents an evolutionary-biology perspective that relates competitive self-play to the impact of social interaction on brain size (pay-walled link). He goes on to suggest that rapid competence gains in a simulated ‘agent society’ may ultimately provide a plausible path towards a form of AGI.
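
As a toy illustration of the idea (a generic sketch, not an algorithm from the talks), fictitious self-play on rock-paper-scissors already shows the mechanism: each new ‘generation’ best-responds to the population of its past selves, and play converges toward equilibrium:

```python
import numpy as np

# Payoff matrix of rock-paper-scissors for the row player:
# rows/columns are 0=rock, 1=paper, 2=scissors.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

# Each iteration best-responds to the empirical mixture of all
# past selves, then joins that history: compute becomes data.
counts = np.ones(3)                  # moves played by past selves
for _ in range(10_000):
    opponent_mix = counts / counts.sum()
    best_response = int(np.argmax(PAYOFF @ opponent_mix))
    counts[best_response] += 1

print(counts / counts.sum())         # approaches [1/3, 1/3, 1/3]
```

The large-scale systems discussed in the presentations run roughly the same loop, with deep RL agents learning against a pool of past checkpoints instead of a table of move counts.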

Given the significance he ascribes to these concepts, it seems plausible that some of the cited papers on self-play were later also included in the reading list. They may form a sizeable chunk of the missing items, especially as RL otherwise features in only one of the preserved reading items.

Potential Reading Items (Part 3):

Even today, these presentations from around 2018 are still worth watching. Alongside fascinating bits of knowledge, they also include gems such as the statement:

‘Just like in the human world: The reason humans find life difficult is because of other humans’

-Ilya Sutskever

While some concepts in computer science thus appear timeless, other moments may seem surprising today, like this casual remark from an audience member in the Q&A session:

‘It seems like an important sub-problem on the path to AGI will be understanding language, and the state of generative language modelling right now is pretty abysmal.’

-Audience member

To which Ilya Sutskever responds:

‘Even without any particular innovations beyond models that exist today, simply scaling up models that exist today on larger datasets is going to go surprisingly far.’

-Ilya Sutskever (in 2018)

This response was later confirmed by the experimental results in the reading item ‘Scaling Laws for Neural Language Models’ (which echoes the ‘Bitter Lesson’ by Rich Sutton). It was ultimately proven true: he would go on to oversee Transformer architectures scaled up to an estimated 1.8 trillion parameters at a training cost estimated above $60 million, forming Large Language Models (LLMs) that are today capable of generating text increasingly difficult to distinguish from human writing.
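
For reference, the headline result of that paper is a power law relating test loss to model size. The sketch below plugs in the approximate constants reported by Kaplan et al. (2020); it illustrates the fitted trend, not a guarantee for any particular model:

```python
# Approximate constants from 'Scaling Laws for Neural Language
# Models' (Kaplan et al., 2020) for loss vs. model size, in the
# regime where data and compute are not bottlenecks.
ALPHA_N = 0.076      # power-law exponent for parameter count
N_C = 8.8e13         # critical scale (non-embedding parameters)

def predicted_loss(n_params: float) -> float:
    """Cross-entropy test loss (nats/token) predicted by the fit
    L(N) = (N_C / N) ** ALPHA_N."""
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f}")
```

The smooth, predictable decline of loss with scale is precisely what made ‘simply scaling up’ a defensible strategy rather than a gamble.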

Honorable Mentions

Many other works and authors may have featured on the original list, but the evidence wears increasingly thin from here on.

Overall, the preserved reading items strike an impressive balance between covering different model classes, applications and theory, while also including many famous authors in the field. Perhaps the exceptions to this rule are worth noting, even if they may have slipped among the ‘10% of what matters’ that didn't make the original list.

As such, it would have seemed plausible to include:

Conclusion

This post will remain largely speculative until more becomes known. After all, even the viral list itself was never officially confirmed to be authentic. Nonetheless, the potential candidates for the lost reading items listed above seemed worth sharing. Taken together, they may well fill a gap in the viral version of the list that would, in the words of its author, correspond roughly to a missing ‘30% of what matters’ at the time.