The Open-Source Machine Learning Master's (OSMLM) is a self-curated deep-dive into select topics in machine learning and distributed computing. Educational resources are derived from online courses (MOOCs), textbooks, predictive modeling competitions, academic research (arXiv), and the open-source software community. In machine learning, both the quantity and quality of these resources - all available for free or at a trivial cost - are truly f*cking amazing.

## Why Am I Doing This?

I want to become more of a technical expert in machine learning. I want to use this expertise to solve real-world problems that actually matter.

To this end, I see two main roads: a traditional graduate program, and the OSMLM.

### Why Not Graduate School

For me, graduate school is suboptimal for 3 key reasons:

- **It's expensive.** Upon a quick Google search, a 2-year graduate program would cost, conservatively, $80,000 in tuition fees alone. This is a wholly nontrivial sum of money that would impact how I structure the next 10 years of my life.
- **There are *far* more dependencies.** I have to apply. I have to get accepted. I have to find the right professor. I have to find a city suitable to my broader interests and lifestyle. This takes time.
- **By the time I finish, the field of machine learning will look fundamentally different than it did when I started.** This is the most important point of all. The only way to remain current with the latest tools and techniques is to do just that. Given the furious and only-accelerating-faster pace at which machine learning is moving, this requires much more than just a few hours on the weekend.

### Why the OSMLM

- **I think the higher education paradigm is changing.** Access to critical, academic knowledge is increasingly democratic: Khan Academy can teach me about the Central Limit Theorem as well as any statistics professor. The ~$250,000 in tuition fees commanded by an undergraduate education at a private American university is, for some, several decades of debt and concession, and for others, prohibitive beyond comedy, reason and fantasy alike. If hard skills are your end, online self-education is an immensely attractive, intuitive, and practical road to follow - especially in an industry as meritocratic as tech.
- **I'm keenly aware of how productive I am in a self-teaching environment.** I'm largely self-taught in data science. Before that, it was online poker: a 5-year, $50 to $150,000 journey of instructional videos, online forums, critical discussion with other players and personal coaching - all from the comfort of my bedroom. I'm very effective at learning things online.
- **Some of the most impactful projects I've completed professionally stemmed directly from those I'd completed personally.** I would not know how to ensemble models if not for Kaggle. I would not know how to perform hierarchical Bayesian inference if not for Bayesian Methods for Hackers. The open-source data science community continues to teach me creative ways to use data to solve challenging problems. To this end, I want to consume, consume, consume.
- **The road to further technical expertise is a function of little more than time and effort.** I have a few years' industry experience as a Data Scientist. I can write clean code and productionize machine learning things. For me, the OSMLM is nothing more than taking all of the extra-curricular time spent learning new tools and algorithms and making it a full-time job.
- **I'm extremely motivated.** The thought of studying machine learning all day has me smiling from ear to ear. Simply put, I f*cking love this stuff.

## How Long is the OSMLM?

9-12 months. Not forever.

## Why Morocco?

I aim to speak indistinguishably fluent French and Spanish by the time I'm 30. I'm currently 27. The Spanish box is largely checked. With 6-9 months in Francophone Morocco, the French box will be largely checked as well.

Furthermore, I've always wanted to live in a Muslim country: I grew up in a predominantly Jewish suburb of Philadelphia, and have had fantastic experiences traveling the Muslim world.

## How Will I Spend My Time?

I'll be spending my best 8-10 hours of the day working from a co-working space. I'll be taking online courses, reading textbooks, participating in machine learning competitions and publishing open-source code. I intend to post frequently to this blog.

## What Will I Learn?

I have 4 main areas of focus:

- **"Deep Learning" with flavors of: auto-encoders, recommendation, and natural language processing.** I remain obsessed with encoding real-world entities as lists of numbers. I like applications that seek to understand people better than they understand themselves. Free-form text is everywhere (and relatively quick to process).
- **Bayesian Inference.** Because they taught me frequentist statistics in school.
- **Game Theory and Reinforcement Learning.** I wrote an undergraduate thesis in game theory and group dynamics and remain eager to tackle more. Reinforcement Learning seems like the hipster way to solve such problems these days.
- **Apache Spark and Distributed Computing.** I have a bit of professional experience with Spark. As data continues to grow in size, distributed computing will move from a thing Google does to a no-duh occupational necessity.

## What Does Success Look Like?

Success has a few faces:

- **Technical.** Have the technical expertise to lead teams focused on each of the above 4 topics (weighted towards the first 3, realistically).
- **Personal.** Learning how I best learn. How do I structure my ideal working day? Do I prefer working alone, or indeed as part of a team? What is my optimal balance of reading, thinking, and coding?
- **Language.** I intend to speak French like it's my mother tongue.

## What Happens Afterwards?

I'm likely headed back to the Americas, where I intend to devote myself to an impossibly awesome technology project and team for a period of several years. I'd like a technical mentor as well.

## How Can You Help?

In addition to self-study, I'd like to assist a few fascinating Moroccan technology organizations with their data problems. As such, if you know anyone in-country with even the most fleeting shared interest, please put me in touch.

## In Two Sentences

The Open-Source Machine Learning Master's in Casablanca, Morocco allows me to pursue several significant personal goals at the same time. This is my Francophone machine learning adventure.

## Update: Now Finished, Here's What I Did

#### Publications:

- Neurally Embedded Emojis
- Random Effects Neural Networks in Edward and Keras
- Further Exploring Common Probabilistic Models
- Minimizing the Negative Log-Likelihood, in English
- Transfer Learning for Flight Delay Prediction via Variational Autoencoders
- Deriving the Softmax from First Principles
- Approximating Implicit Matrix Factorization with Shallow Neural Networks
- Ordered Categorical GLMs for Product Feedback Scores
- Intercausal Reasoning in Bayesian Networks
- Bayesian Inference via Simulated Annealing
- RescueTime Inference via the "Poor Man's Dirichlet"
- Generating World Flags with Sparse Auto-Encoders
- Docker and Kaggle with Ernie and Bert
- Recurrent Neural Network Gradients, and Lessons Learned Therein
- Simulating the Colombian Peace Vote: Did the "No" Really Win?

#### Notable courses, books:

- Statistical Rethinking: A Bayesian Course with Examples in R and Stan
- Probabilistic Graphical Models: Representation, Stanford University
- Probabilistic Graphical Models: Inference, Stanford University
- Probabilistic Graphical Models: Learning, Stanford University
- Practical Deep Learning For Coders, fast.ai
- Discrete Optimization, University of Melbourne
- Artificial Intelligence Nanodegree (Part 1), Udacity
- Deep Learning, Udacity
- Apache Kafka, Udemy

#### Code:

Repositories I published (or contributed to) unrelated to the publications above.

- tensorflow-models: Some basic models in TensorFlow
- vanilla-neural-nets: A straightforward and highly readable implementation of vanilla neural nets
- dotify: A web application that recommends songs via "country arithmetic" and hand-rolled Implicit Matrix Factorization
- n-queens-sympy: A simple solver for the N-Queens Problem using SymPy
- markdown-insert-screenshot: A lightweight Atom plugin for saving an interactive screen capture to a relative file destination