Mega is here, and you’ve been hearing a lot about its encryption, as well as about how it isn’t working all that well just yet. But maybe the most important thing is Mega’s promise of being less of a lawsuit magnet. Mega has taken a lot of steps on that front, but one stands out as the biggest: Mega doesn’t use de-duplication.
Let’s talk de-duplication. It’s a pretty simple idea, with some far-reaching consequences. “De-duplication” basically means that a file storage system (in this case Mega or Megaupload) scans files as they come in. If a file is recognised as something that’s already been uploaded, the system doesn’t store the new copy. Instead, it points back to the version already on its servers. In addition to being a great space-saver, this can be an easy way to wipe out every copy of a copyright-infringing file in one swoop. In fact, if that option is available and you don’t use it, you can get in trouble. Which is why Mega takes the option off the table.
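To make the mechanism concrete, here’s a minimal sketch of a de-duplicating store. It assumes files are recognised by a SHA-256 hash of their contents; the article doesn’t say how Megaupload actually matched duplicates. The `DedupStore` class and the link names are hypothetical. Each upload gets its own link, but identical bytes are stored only once.

```python
import hashlib


class DedupStore:
    """Hypothetical de-duplicating file store, for illustration only.

    Assumption: duplicates are detected by hashing file contents with SHA-256.
    """

    def __init__(self):
        self.blobs = {}  # content hash -> file bytes (each distinct file stored once)
        self.links = {}  # link id -> content hash (one entry per upload)

    def upload(self, link_id: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            # First time this content has been seen, so actually store it.
            self.blobs[digest] = data
        # Every upload gets its own link, but identical content shares one blob.
        self.links[link_id] = digest
        return digest


store = DedupStore()
store.upload("link-a", b"same movie bytes")
store.upload("link-b", b"same movie bytes")
store.upload("link-c", b"different file")
print(len(store.links), len(store.blobs))  # 3 2
```

Three uploads produce three links but only two stored files, which is where the space savings come from.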
Megaupload did use de-duplication, which also saves on cost. But when it received a takedown request for copyright violations (you know, stemming from any of the zillion message boards and blogs posting illegal download links), Megaupload would disable only the one reported link, instead of every link associated with that file, and the file itself. So copyright holders, who already want to ritualistically disembowel people like Dotcom, didn’t take it well when they found out that, systemically, Megaupload had to know that copyright-protected files were being left up. For all the conspicuous consumption and willful ignorance involved in the Megaupload case, that was as big a factor as any.
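The gap between the two kinds of takedown can be sketched on a toy store. Everything here (`upload`, `disable_link`, `remove_file`, the link names) is hypothetical, and matching by SHA-256 content hash is an assumption. `disable_link` mimics the one-link-at-a-time approach described above. `remove_file` deletes the stored file and every link that points at it.

```python
import hashlib

blobs = {}  # content hash -> bytes, stored once (hypothetical store)
links = {}  # link id -> content hash


def upload(link_id, data):
    # Assumption: duplicates are matched by SHA-256 of the contents.
    digest = hashlib.sha256(data).hexdigest()
    blobs.setdefault(digest, data)
    links[link_id] = digest


def disable_link(link_id):
    """Link-level takedown: only the reported link goes away; the file stays."""
    links.pop(link_id, None)


def remove_file(link_id):
    """File-level takedown: drop the stored blob and every link pointing at it."""
    digest = links.get(link_id)
    if digest is None:
        return
    blobs.pop(digest, None)
    for other in [k for k, v in links.items() if v == digest]:
        del links[other]


for i in range(3):
    upload(f"forum-post-{i}", b"pirated film")
upload("backup", b"my own photos")

disable_link("forum-post-0")
print(sorted(links))  # ['backup', 'forum-post-1', 'forum-post-2']

remove_file("forum-post-1")
print(sorted(links))  # ['backup']
```

Disabling one link leaves the other copies of the same file reachable. `remove_file` clears them all, but it doesn’t care who uploaded the bytes, so anyone whose private copy happened to be identical would lose it too.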
Now, there are legitimate reasons for a system like Megaupload not to just nuke every copy of a file. Plenty of people use these lockers for legitimate storage of music and movies and never share them with anyone, using them only to move data from one machine to another. Nuke all the links associated with a file, and you wipe out all the legal users’ music too. Solving that properly is complicated: it would take a lot of traffic and usage analytics that would compromise anonymity and probably raise more than a few privacy issues. Megaupload’s problem, though, was that it basically just ignored the issue completely. Mega’s solution to this tricky situation is that, since you encrypt files as you upload them, the service can honestly say it has no idea what you’re storing on its drives.
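Why does encryption get in the way of de-duplication? Here’s a rough sketch. It assumes each upload is encrypted on the client with a fresh random key; the article only says files are encrypted as you upload them, so the cipher and key handling below are stand-ins. `toy_encrypt` is a hypothetical, deliberately simple hash-based stream cipher used only so the example runs on the standard library. It is not secure and is not Mega’s actual scheme.

```python
import hashlib
import os


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Hypothetical toy stream cipher: XOR data with a SHA-256 counter-mode keystream.

    Illustration only. NOT a secure cipher and NOT Mega's real encryption.
    Because it is a plain XOR, applying it twice with the same key decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


movie = b"same movie bytes" * 4

# Two users upload the identical file, each encrypting it locally with a
# fresh random key (assumption: the key never reaches the server).
key_1 = os.urandom(32)
key_2 = os.urandom(32)
upload_1 = toy_encrypt(key_1, movie)
upload_2 = toy_encrypt(key_2, movie)

# The key holder can still recover the original file.
assert toy_encrypt(key_1, upload_1) == movie

# The server only sees ciphertext, so content hashes no longer match.
print(hashlib.sha256(upload_1).digest() == hashlib.sha256(upload_2).digest())
```

With overwhelming probability this prints `False`. A hash-based de-duplicator on the server would see two unrelated files, even though both users uploaded exactly the same movie.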
Storing a separate copy of every upload adds significant cost and overhead, but other services have done it for a while. Rapidshare never ran into the kinds of link-based problems that Megaupload did, despite facing plenty of lawsuits of its own. Combined with Mega’s considerable encryption, this should be about as good a shield against piracy hawks as a site that’s basically all about piracy can get. It’s certainly a much better one than the flimsy, buck-passing Terms of Service. There will be other threats to the service (a group is already trying to shut down Mega’s finances), but no de-duping is one more finger in the dam.