Road to AI Safety Excellence

From Lesswrongwiki
Revision as of 15:03, 25 September 2017 by RobbBB

Road to AI Safety Excellence (RAISE), previously named AASAA, is an initiative from u/toonalfrink/ to improve the pipeline for AI safety researchers, especially by creating an online course.


AI safety is a small field with only about 50 researchers, and it is mostly talent-constrained. Given the dangers of an uncontrolled intelligence explosion, increasing the number of AIS researchers is crucial for the long-term survival of humanity.

Within the LW community there are plenty of talented people that bear a sense of urgency about AI. They are willing to switch careers to doing research, but they are unable to get there. This is understandable: the path up to research-level understanding is lonely, arduous, long, and uncertain. It is like a pilgrimage. One has to study concepts from the papers in which they first appeared. This is not easy. Such papers are undistilled. Unless one is lucky, there is no one to provide guidance and answer questions. Then should one come out on top, there is no guarantee that the quality of their work will be sufficient for a paycheck or a useful contribution.

The field of AI safety is in an innovator phase. Innovators are highly risk-tolerant and have a large amount of agency, which allows them to survive an environment with little guidance or supporting infrastructure. Let community organisers not fall for the typical mind fallacy, expecting risk-averse people to move into AI safety all by themselves. Unless one is particularly risk-tolerant or has a perfect safety net, one will not be able to fully take the plunge. Plenty of measures can be taken to make getting into AI safety more like an "It's a small world" ride:

  • Let there be a tested path with signposts along the way to make progress clear and measurable.
  • Let there be social reinforcement so that we are not hindered but helped by our instinct for conformity.
  • Let there be high-quality explanations of the material to speed up and ease the learning process, so that it is cheap.

Becoming an AIS researcher in 2020

What follows is a vision of how things *could* be, should this project come to fruition.

The path

1. Tim Urban's Road to Superintelligence is a popular introduction to superintelligence. Hundreds of thousands of people have read it. At the end of the article is a link, saying "if you want to work on this, these guys can help". It sends one to an Arbital page, reading "Welcome to 'Prerequisites for Introduction to AI Safety'".

2. What follows is a series of articles explaining the math one should understand to be able to read AIS papers. It covers probability, game theory, computability theory, and a few other things. Most students with a technical major can follow along easily. Even some talented high school graduates do. When one comes to the end of the Arbital sequence, one is congratulated: "you are now ready to study AI safety". A link to the course appears at the bottom of the page.

3. The course teaches an array of subfields: technical subjects like corrigibility, value learning, and ML safety, but also some high-level subjects like preventing arms races around AI. Assignments are designed in such a way that they don't need manual grading, but do give some idea of the student's competence. Sometimes there is an assignment about an open problem. Students are given the chance to try to solve it by themselves. Interesting submissions are noted. One competent recruiter looks through these assignments to handpick high-potential students. When a student completes the course, they are awarded a nice polished certificate, something to print and hang on the wall.

Local study groups

When it comes to motivation, nothing beats the physical presence of people that share your goal. A clear and well-polished path is one major thing, social reinforcement is another. Some local study groups already exist, but there is no way for outsiders to find them. RAISE seems like a most natural place to index study groups and facilitate hosting them.

Course prerequisites & target audience

While the project originally targeted any student, it was decided that it will target those that are philosophically aligned first. The next step could be to persuade academics to model a course after this one, so that we will reach a broader audience too.

There are technical (math, logic) and philosophical (Bostrom/sequences/WaitButWhy) prerequisites. Technical prerequisites identified so far:

  • Probability theory
  • Decision/game theory
  • Computability theory
  • Logic
  • Linear algebra

As mentioned before, it seems best to cover this in a sequence of articles on Arbital, or to recommend an existing course that teaches this stuff well enough.

The state of the project & getting involved

If you're enthusiastic about volunteering, fill in this form

To be low-key notified of progress, join this Facebook group

One particularly useful and low-bar way to contribute is to join our special study group, in which you will be asked to summarize AIS resources (papers, talks, ...), and create mind maps of subjects. You can find it in the Facebook group.

A detailed outline of next steps can be found in this Workflowy tree


This is (like everything) subject to debate, but for now it looks like the following broad categories will be covered:

  • Agent foundations
  • Machine learning safety
  • AI macrostrategy

Each of these categories will be divided into a few subcategories. The specifics of that are mostly undecided, except that the agent foundations category will contain at least corrigibility and decision theory.

We are making efforts to list all available resources here and here

Course development process

Now that volunteers and capital are largely in place, we are iteratively developing the first unit, on corrigibility. When we are satisfied with the quality of this unit (which will probably take 1 or 2 more lecture meetings), we will use the process we developed to create the other units.

Lecture meetings

Every few weeks, a lecture meeting is held. A lecture room is set up with testers, a lecturer and a camera. By then, the instructional designer will have broken down the content into bits that are as small as possible, ideally bits that can be explained in 3 to 6 minutes. For each bit, the lecturer explains it, and right after, the testers indicate how clear it was. If it wasn't clear enough, we repeat. At the first meeting it became apparent that repeating at least once is always a good idea, because the second try is always of higher quality.

Levels of understanding for the testers:

5. I would be comfortable explaining this in front of a class
4. I would be comfortable explaining this to a friend
3. I get the gist of it, but I would like to stare at this a bit longer
2. I have a superficial sense of what you probably mean
1. Wtf?

The goal is to get everyone to at least level 3 from the lecture alone, and to at least level 4 when assignments are included.
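The repeat-until-clear loop of a lecture meeting can be sketched in a few lines of Python. This is only an illustration, not a tool the project uses: the names `Understanding` and `needs_repeat` are hypothetical, and using the level-3 goal as the threshold for "clear enough" is an assumption, since the text does not say exactly when a bit is repeated.

```python
from enum import IntEnum


class Understanding(IntEnum):
    """The five tester levels listed above (names are illustrative)."""
    WTF = 1
    SUPERFICIAL = 2
    GIST = 3
    EXPLAIN_TO_FRIEND = 4
    EXPLAIN_TO_CLASS = 5


# Assumption: the "level 3 from the lecture alone" goal doubles as the
# threshold for deciding whether a bit was clear enough.
LECTURE_TARGET = Understanding.GIST


def needs_repeat(ratings, attempt, target=LECTURE_TARGET, min_attempts=2):
    """Return True if the lecturer should explain the bit again.

    ratings: one level (1-5) per tester for the latest attempt.
    attempt: 1 for the first explanation, 2 for the second, and so on.
    min_attempts=2 encodes "repeating at least once is always a good idea".
    """
    if not ratings:
        raise ValueError("at least one tester rating is required")
    if attempt < min_attempts:
        return True
    # Repeat while any tester is still below the target level.
    return min(ratings) < target
```

For example, `needs_repeat([2, 4, 5], attempt=2)` is true because one tester is still at level 2, while `needs_repeat([3, 4, 5], attempt=2)` is false.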

Study groups

Even for volunteers it proved tricky to reach a high-level understanding of a topic by oneself, so we decided to learn together. The study group is constructed in such a way that it produces useful content for the course. More concretely:

- There are 'overview' and 'assignments' meetings.

- The 'overview' meeting asks its attendees to summarize a resource that is given to them. Then it asks every attendee to read every summary (or a subset of them). Then attendees are asked to construct a mind map (tree structure) of the subject. Finally, attendees are paired up to cross-check their mind maps with each other and make amendments where they see fit.

- The mind maps and summaries that are produced during the 'overview' meeting are used as an input to the lecture meetings.

- The set of lecture bits produced at the lecture meeting is used as an input to the 'assignments' meeting.

- At the 'assignments' meeting, for each lecture bit, attendees are asked to create assignments and try the assignments of others. A selection of these assignments is later added to the course.

Instruction strategy

The course will be strictly digital, which limits the number of strategies that can be used. These are some potentially useful strategies:

  • Text
  • Lecture
  • Documentary
  • Game
  • Assignment
  • Live discussion
  • Open problem
  • etc...

Content guides form

The best way to present an idea often depends on the nature of the idea. For example, the value alignment problem is easily explained with an illustrative story (the paperclip maximizer). This isn’t quite the case for FDT. Also, some ideas have been formalized, and we can go into mathematical detail with those. Other ideas are still in the realm of philosophy, and there we will have to resort to things like thought experiments. How to say it depends on what there is to say.

Gimmick: Open problems

(Inspired by The Failures of Eld Science) A special type of instruction strategy will be an assignment like this: “So here we have EDT, which is better than CDT, but it is still flawed in these ways. Can you think of a better decision theory that doesn’t have these flaws? Give it at least 10 minutes. If you have a useful idea, please let us know.”

The idea is to challenge students to think independently about how they might go about solving an open problem. It gives them an opportunity to actually make a contribution. I expect it to be strongly intrinsically motivating.

Taxonomy of content

At least three sorts of content will be delivered:

  • Anecdotes/stories to illustrate problems (paperclip maximizer, filling a cauldron, ...)
  • Unformalized philosophical considerations (intelligence explosion, convergent instrumental goals, acausal trade, ...)
  • Technical results (corrigibility, convergent instrumental goals, FDT, ...)

Example course unit: value learning & corrigibility

  • Preview of unit and its structure
  • An x-minute lecture that informally explains the value learning problem
  • Assignments
  • A 5-minute cutscene shows a fictional story of an agent that keeps its creators from pushing the off-button
  • An x-minute lecture that informally explains corrigibility
  • A piece of text that introduces the math
  • A video of the lecturer solving example math assignments
  • Corrigibility math assignments

Alternatively, we can interleave tiny bits of video with questions to keep the student engaged. A good example of this is the Google deep learning course.


Task allocation

The following is a reply to the common remark that “I’d like to help, but I’m not sure what I can do”.

Full responsibility

This means you can’t sleep when things are off track, and jump to your laptop every time you have a new idea to move things forward. This also means you are ready to take on most tasks if no one else volunteers for them, even if you’re not specialized in them. The project is your baby, and you’re a helicopter parent.

Required technical understanding: superficial. Minimum commitment: 2 full days per week

Armchair advice

You’re in the chat, and you’re interested in the project, but not ready to make significant contributions. You do want to see where things go, and sometimes you have some interesting remarks to make. On your own terms though.

Minimum commitment: none

Instructional design

There are many ways to get an idea across. There are documentaries, lectures, stories, texts, examples, proofs, assignments, etc. Different ideas need different media. It is hard to explain a math formula through a documentary, for example. On the other hand, if you want to explain some philosophical idea like value misalignment, a made-up story like the paperclip maximizer would be a good fit. You decide the layout of a course unit. Ideally, there will be small juicy bits of information/questions/thingies in quick succession. A great example of a good course design is Google's deep learning course. Notice how most bits are only a few minutes long, and the student is prompted to think all the time. The course design done so far can be found here

Required technical understanding: moderate. Minimum commitment: 1 full day per week

Giving lectures

You thoroughly study the material, making sure you know it well enough to explain it clearly. Together with the instructional designer, you sit down and go over the bits that need explanation. These bits may range from 10 seconds to an hour of speaking, and they are interleaved with questions and small assignments to keep the student engaged. Then you teach the bits to a small class of testers, on camera. Testers will give immediate feedback: a few repeats may be necessary to get it right.

Required technical understanding: thorough. Minimum commitment: 1 full day per week

Writing material

You write both assignments and text-based explanations, as requested by the instructional designer. To what extent this is needed depends on the nature of the concept that is being covered.

Required technical understanding: thorough. Minimum commitment: 2 hours to 1 full day per week

Taking care of hosting

We will host the course ourselves during development. edX is open source and ideally suited for our needs. We have it set up using Amazon's EC2 hosting service. I am not even a Linux noob (6 years of casual use), but this took me a full weekend to set up. If you have system administration skills, your help is highly valuable here.

Minimum commitment: 1 hour per week

Legal

Legal is a black box. Your first job is to write your job description.

Marketing/PR/acquisition

Are you good at connecting people? There are a lot of people who want to fix the world, would engage with this project if they knew about it, and have the means (funding, expertise) to help out. Things you can do include finding funders, hosting a round of review, inviting guest speakers with interesting credentials, connecting with relevant EA organisations, etc. Having high social capital in the EA/LW community is a plus.

Graphic design/animation

Good animation can make a course twice as polished and engaging, and this matters twice as much as you think. The whole point of a course instead of a loose collection of papers is that learners can trust they're on the right track. Virtue signaling builds that trust. Animation is also a skill that is impossible to pick up in a short enough timeframe. If you're interested in AI safety and skilled at animation, we really need you!


I want to note that what we are doing here isn’t hard. Courses at universities are often created on the fly by one person in a matter of weeks. They get away with it. There is little risk. The worst that can reasonably happen is that we waste some time and money on creating an unpopular course that doesn’t get much traction. On the other hand, there is a lot of opportunity. If we do this well, we might just double the number of FAI researchers. If that's not impact, I don't know what is.
