The Day Netflix Refused to Integrate the $1 Million Algorithm

After awarding $1 million to the winners of a competition, Netflix decided not to use their algorithm.

In 2006, Netflix launched a competition, the Netflix Prize, offering one million dollars to the team whose algorithm most improved the accuracy of its recommendation system. In 2009, the winning team, BellKor's Pragmatic Chaos, delivered an algorithm that reduced the prediction error (RMSE) of Netflix's recommendation system by 10.06%, but Netflix ultimately decided not to integrate it into its platform.

How is it possible that Netflix paid that sum of money for something it wasn't even going to use? In Netflix's own words:

We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.

In short, the effort required to implement it would have far outweighed the benefit it would bring to the system.

Photo of team BellKor receiving the prize

Netflix published a post discussing its recommendation system and the reasons it decided against adopting certain algorithms. Clearly, Netflix does not discard good software it has paid a large sum for without good reason. In fact, it did put algorithms from an earlier stage of the competition into production, as that post describes:

A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement. They reported more than 2000 hours of work in order to come up with the final combination of 107 algorithms that gave them this prize. And, they gave us the source code. We looked at the two underlying algorithms with the best performance in the ensemble: Matrix Factorization (which the community generally called SVD, Singular Value Decomposition) and Restricted Boltzmann Machines (RBM). SVD by itself provided a 0.8914 RMSE (root mean squared error), while RBM alone provided a competitive but slightly worse 0.8990 RMSE. A linear blend of these two reduced the error to 0.88. To put these algorithms to use, we had to work to overcome some limitations, for instance that they were built to handle 100 million ratings, instead of the more than 5 billion that we have, and that they were not built to adapt as members added more ratings. But once we overcame those challenges, we put the two algorithms into production, where they are still used as part of our recommendation engine.
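To make the numbers in that passage concrete, here is a minimal sketch of how RMSE is computed and how a linear blend of two predictors can be fitted by least squares. It is not Netflix's or the Korbell team's code: the ratings are synthetic, the two "models" are just the true ratings plus random noise, and the names `rmse`, `fit_linear_blend`, and `apply_blend` are made up for this illustration.

```python
import numpy as np


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def fit_linear_blend(pred_a, pred_b, actual):
    """Least-squares weights (w_a, w_b, intercept) for blending two predictors."""
    design = np.column_stack([pred_a, pred_b, np.ones(len(actual))])
    weights, *_ = np.linalg.lstsq(design, np.asarray(actual, dtype=float), rcond=None)
    return weights


def apply_blend(weights, pred_a, pred_b):
    """Combine two prediction vectors using the fitted weights."""
    return weights[0] * np.asarray(pred_a) + weights[1] * np.asarray(pred_b) + weights[2]


# Synthetic data (an assumption for illustration): 1-5 star ratings, and two
# stand-in "models" whose errors are independent random noise.
rng = np.random.default_rng(0)
true_ratings = rng.integers(1, 6, size=5000).astype(float)
model_a = true_ratings + rng.normal(0.0, 0.90, size=true_ratings.size)
model_b = true_ratings + rng.normal(0.0, 0.95, size=true_ratings.size)

weights = fit_linear_blend(model_a, model_b, true_ratings)
blended = apply_blend(weights, model_a, model_b)

print("model A RMSE:", rmse(model_a, true_ratings))
print("model B RMSE:", rmse(model_b, true_ratings))
print("blend RMSE:  ", rmse(blended, true_ratings))
```

Because the weights are fitted on the same data they are scored on, the blend can never do worse than either model alone in this sketch. In a real setting the weights would be fitted on held-out ratings, and the gain depends on how different the two models' errors are.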

Netflix values the effort of the participants in its competition. These are not weekend hackathon-style contests. Many of these algorithms have hundreds of hours behind them. In the case of team BellKor, more than 2,000 hours went into the first Progress Prize entry alone, and the members of the final winning team collaborated primarily by email (in fact, they met in person for the first time when they publicly received the prize).

The point to keep in mind is that in this type of competition, companies are in no way obligated to integrate the winning solutions into their platforms. After all, it is third-party code and, as we have seen, sometimes the integration effort is so great and the benefit so small that, ultimately, it is not worth it.