Ever wonder how that latest season of Orange Is The New Black actually made its way across the planet from Netflix's little home in Los Gatos, California to your big-screen TV? Netflix engineers have talked in depth about how one copy of a new video is converted into dozens of different resolutions and bit-rates, uploaded to a storage service, and then distributed as smartly as possible to thousands of different Netflix nodes around the world — including the one closest to your home.
A new blog post (called Netflix and Fill, the best title ever) goes into some serious detail about Netflix's Open Connect Appliances, the boxes installed at major internet service providers and worldwide routing locations that store an entire copy of the region's Netflix content. An algorithm determines exactly what each Open Connect box holds: "Each manifest cluster gets configured with an appropriate content region (the group of countries that are expected to stream content from the cluster), a particular popularity feed (which in simplified terms is an ordered list of titles, based on previous data about their popularity), and how many copies of the content it should hold. We compute independent popularity rankings by country, region, or other selection criteria."
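To make the quoted description a little more concrete, here is a minimal sketch of what such a cluster configuration might look like. It is not Netflix's actual code: the `ClusterConfig` class, the `plan_manifest` function, the per-title sizes and the disk budget are all hypothetical names and assumptions made for illustration. Only the three configured items (content region, popularity feed, copy count) come from the quote.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    # The three fields mirror the quote; the class itself is hypothetical.
    content_region: set[str]    # countries expected to stream from this cluster
    popularity_feed: list[str]  # titles ordered from most to least popular
    copies: int                 # how many copies of each title to hold


def plan_manifest(config, title_sizes_gb, capacity_gb):
    """Walk the popularity feed in order, keeping every title that still fits.

    Assumption: a simple greedy fill against a fixed disk budget.
    """
    manifest, used = [], 0.0
    for title in config.popularity_feed:
        cost = title_sizes_gb[title] * config.copies
        if used + cost > capacity_gb:
            continue  # too big for the remaining space; a smaller title may still fit
        manifest.append(title)
        used += cost
    return manifest
```

With a feed of three titles sized 10, 50 and 5 GB, two copies each and a 40 GB budget, this sketch keeps the first and third titles and skips the second.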
Like any internet-connected device, Netflix's OCAs are I/O-limited: they have only a certain amount of total disk bandwidth, and it is shared between serving shows to customers and pulling down new and updated content. So the company relies on accurately predicted fill windows, times when the lowest possible number of users are hitting Netflix's servers, to transfer new content. It's the same concept as trying to write a large folder of files to your hard drive while simultaneously reading another large folder to copy it to an external disk, which slows both operations down, except on a scale a million times larger.
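The idea of a fill window can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `quietest_window` and the idea of feeding it a 24-entry hourly load forecast are hypothetical, and Netflix's real prediction method is not described here.

```python
def quietest_window(hourly_load, window_hours):
    """Return the start hour of the contiguous window with the lowest total load.

    hourly_load is a forecast with one entry per hour; the window is allowed
    to wrap past the end of the list (for example 23:00 to 02:00).
    """
    n = len(hourly_load)
    best_start, best_total = 0, float("inf")
    for start in range(n):
        total = sum(hourly_load[(start + i) % n] for i in range(window_hours))
        if total < best_total:
            best_start, best_total = start, total
    return best_start
```

Given a forecast where viewing bottoms out in the small hours, the function returns the hour at which a box could start pulling new content with the least competition from streaming.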
The initial files, in a variety of resolutions and bit-rates and sometimes with extra data like HDR encoding, are first stored on Amazon's S3 storage. From there, every gigabyte of data transfer around the world costs Netflix money (no precise figure is available, but the company spent $US800 million on technology costs in 2016), so the company offloads the data to its private network of OCAs as efficiently as possible, and uses a tiered system of access to distribute content. First an Open Connect box checks whether it can pull content from another box in its own cluster, then from a nearby cluster, then from a cluster further away, and finally from the original S3 source.
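That tiered lookup can be sketched as a simple "cheapest source first" choice. The `Tier` enum, the `choose_fill_source` function and the shape of the `peers` list are hypothetical; only the ordering of the tiers comes from the description above.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Lower value means a closer, cheaper source.
    SAME_CLUSTER = 0
    NEARBY_CLUSTER = 1
    DISTANT_CLUSTER = 2
    S3_ORIGIN = 3


def choose_fill_source(file_id, peers):
    """Pick the closest peer that already holds the file, else fall back to S3.

    peers is a list of (name, tier, files_held) tuples; this layout is an
    assumption made for the example.
    """
    candidates = [(tier, name) for name, tier, files in peers if file_id in files]
    if candidates:
        return min(candidates)  # lowest tier wins; the name breaks ties
    return (Tier.S3_ORIGIN, "s3")
```

A box that finds the file both in a nearby cluster and in a distant one pulls from the nearby cluster, and only a file that no peer holds is fetched from the origin.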
Then, once that hypothetical piece of content has reached enough Open Connect appliances in a region (not all of them, but enough to handle expected load while the title continues to be copied, again as efficiently as possible, to other OCA boxes), it is marked as live and appears in your Netflix library. Et voilà. The process continues to run in the background, further propagating the data within a region and sharing it with other regions, and then the whole thing happens all over again, ad infinitum, with any content updates or new content.
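The "enough, but not all" rule amounts to a threshold check. In this rough sketch, the function name `is_live` and the 50 per cent default are assumptions chosen purely for illustration; the real threshold is not given.

```python
def is_live(title, region_boxes, min_fraction=0.5):
    """Report whether enough boxes in a region hold the title to switch it on.

    region_boxes maps a box name to the set of titles it currently holds.
    min_fraction is a made-up threshold, not a published Netflix figure.
    """
    holders = sum(1 for files in region_boxes.values() if title in files)
    return bool(region_boxes) and holders / len(region_boxes) >= min_fraction
```

Once the check passes, the remaining boxes can keep filling in the background while the title is already available to viewers.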