Road to AI Safety Excellence
Road to AI Safety Excellence (RAISE), previously named AASAA, is an initiative from u/toonalfrink to improve the pipeline for AI safety researchers, especially by creating an online course.
The field of AI safety is in an innovator phase. Innovators are highly risk-tolerant and have a large amount of agency, which allows them to survive in an environment with little guidance or supporting infrastructure. Community organisers should not fall for the typical mind fallacy by expecting risk-averse people to move into AI safety all by themselves. Unless one is particularly risk-tolerant or has a perfect safety net, one will not be able to fully take the plunge. Plenty of measures can be taken to make getting into AI safety more like an "It's a small world" ride:
- Let there be a tested path with signposts along the way to make progress clear and measurable.
Becoming an AIS researcher in 2020
What follows is a vision of how things *could* be, should this project come to fruition.
1. Tim Urban's "Road to Superintelligence" is a popular introduction to superintelligence. Hundreds of thousands of people have read it. At the end of the article is a link saying "if you want to work on this, these guys can help". It leads to an Arbital page that reads "Welcome to 'Prerequisites for Introduction to AI Safety'".
2. What follows is a series of articles explaining the math one should understand to be able to read AIS papers. It covers probability, game theory, computability theory, and a few other things. Most students with a technical major can follow along easily. Even some talented high school graduates can. When one reaches the end of the Arbital sequence, one is congratulated: "you are now ready to study AI safety". A link to the course appears at the bottom of the page.
3. The course covers an array of subfields: technical subjects like corrigibility, value learning and ML safety, but also high-level subjects like preventing arms races around AI. Assignments are designed so that they don't need manual grading but still give some idea of the student's competence. Sometimes there is an assignment about an open problem, and students are given the chance to try to solve it by themselves. Interesting submissions are noted. One competent recruiter looks through these assignments to handpick high-potential students. When students complete the course, they are awarded a nice polished certificate, something to print and hang on the wall.
Local study groups
When it comes to motivation, nothing beats the physical presence of people who share your goal. A clear and well-polished path is one major thing; social reinforcement is another. Some local study groups already exist, but there is no way for outsiders to find them. RAISE seems like the most natural place to index study groups and facilitate hosting them. You can see and edit the current list here: https://bit.ly/AISafetyLocalGroups.
Course prerequisites & target audience
There are technical (math, logic) and philosophical (Bostrom/sequences/WaitButWhy) prerequisites. Technical prerequisites identified so far:
- Probability theory
- Decision/game theory
- Computability theory
- Linear algebra
As mentioned before, it seems best to cover this in a sequence of articles on Arbital, or to recommend an existing course that teaches this stuff well enough.
The state of the project & getting involved
If you're enthusiastic about volunteering, fill in this form.
To be notified of progress in a low-key way, join this Facebook group.
One particularly useful and low-bar way to contribute is to join our special study group, in which you will be asked to summarize AIS resources (papers, talks, ...), and create mind maps of subjects. You can find it in the Facebook group.
This is (like everything) subject to debate, but for now it looks like the following broad categories will be covered:
- Agent foundations
- Machine learning safety
- AI macrostrategy
Each of these categories will be divided into a few subcategories. The specifics of that are mostly undecided, except that the agent foundations category will contain at least corrigibility and decision theory.
Course development process
Now that volunteers and capital are largely in place, we are running an iterative development process, starting with the first unit on corrigibility. When we are satisfied with the quality of this unit, we will use the process we developed to create the other units.
Study groups
Even for volunteers, it proved tricky to reach a high-level understanding of a topic alone, so we decided to learn together. The study group is set up so that it produces useful content for the course. More concretely:
- There are 'scripting' and 'assignments' meetings.
- The 'scripting' meetings embody an iterative process to go from papers to lecture scripts. We start with summaries, then create mind maps, then decide on a set of videos, and finally write a set of script drafts based on the summaries and mind maps.
- All of this content is used by the lecturer to finalize scripts, set up the studio and film.
- The videos produced by the lecturer are used as input to the 'assignments' meeting.
- At the 'assignments' meeting, attendees are asked to create assignments for each lecture segment and to try each other's assignments. A selection of these assignments is later added to the course.
We enlisted Rob Miles to shoot our lectures. About once a week, our content developer sits down with him to go over a particular script draft, which he modifies to his liking.
The setup includes a lightboard, which is a neat educational innovation that allows a lecturer to look at the camera while writing on a board simultaneously.
- Live discussion
- Open problem
Taxonomy of content
At least three sorts of content will be delivered:
- Anecdotes/stories to illustrate problems (paperclip maximizer, filling a cauldron, ...)
- Technical results (corrigibility, convergent instrumental goals, FDT, ...)
Example course unit: value learning & corrigibility
- Preview of unit and its structure
- An x-minute lecture that informally explains the value learning problem
- An x-minute lecture that informally explains corrigibility
- A piece of text that introduces the math
- A video of the lecturer solving example math assignments
- Corrigibility math assignments
(last updated on 2018-01-31)
Currently done by: Toon Alfrink, Veerle de Goederen, Remmelt Ellen, Johannes Heidecke, Mati Roy
Required technical understanding: superficial.
Minimum commitment: 1 full day per week
Currently done by: lots of people
Minimum commitment: none
As our content developer, you are responsible for the quality of the material. You coordinate the study group, review the quality of its output, and spend extra time learning the content on your own (if you haven't already) so you can be our expert. You also help the lecturer finalize his scripts and make sure he understands everything.
Currently done by: No one. This is a paid position. If interested, email us at email@example.com
Required technical understanding: near-complete.
Minimum commitment: 2 full days per week
You thoroughly study the material, making sure you know it well enough to explain it clearly. Together with the content developer, you sit down and go over the bits that need explanation. These bits range from 3 to 6 minutes, and they are interleaved with questions and small assignments to keep the student engaged.
Currently done by: Robert Miles
Required technical understanding: thorough.
Minimum commitment: 1 full day per week
Study group attendee
You help out in the weekly study group, creating summaries, mind maps, script drafts and assignments. We also give presentations.
Currently done by: Johannes Heidecke, Tom Rutten, Toon Alfrink, Tarn Somervell Fletcher, Nandi Schoots, Roland Pihlakas, Robert Miles, Rupert McCallum, Philine Widmer, Louie Terrill, Tim Bakker, Veerle de Goederen, Ofer Givoli
Required technical understanding: none
Minimum commitment: 4 hours per week
With about 60% certainty, we will use ihatestatistics as a platform. The company is run by EAs (we may use it for free), and its specialization in statistics (which is closely related to AI) makes it well-suited to our needs. Here is a demo lesson. There are a lot of diamonds buried in the field of automated assessment. The quality of our answer-checking software determines the quality of the questions we can ask, and elaborate feedback mechanisms can make a big difference in how fast a learner converges on the right kind of understanding. You write this software for us.
Currently done by: Arran Stirton
Minimum commitment: 2 days per week
Legal is a black box. Your first job is to write your job description.
Animation & editing
Good animation can make a course twice as polished and engaging, and this matters twice as much as you think. The whole point of a course, as opposed to a loose collection of papers, is that learners can trust they're on the right track. Polish builds that trust. Animation is also a skill that is hard to pick up quickly, so we can't do it ourselves. If you're interested in AI safety and skilled at animation, we need you!
I want to note that what we are doing here isn’t hard. Courses at universities are often created on the fly by one person in a matter of weeks, and they get away with it. There is little risk: the worst that can reasonably happen is that we waste some time and money on creating an unpopular course that doesn’t get much traction. On the other hand, there is a lot of opportunity. If we do this well, we might just double the number of FAI researchers. If that's not impact, I don't know what is.