Blockchain And The Cloud: How Data Storage Could Change In The Coming Years

Realizing what "the cloud" really means can be a letdown. Sure, sloughing all that data off was liberating. Companies no longer have to maintain their own finicky servers; individuals no longer have to devote so much cash to hardware or care to keeping everything backed up. The cloud untethers us from expensive circuitry and not a little stress.

Except, as the line goes, there is no cloud. It's just someone else's computer. Rather than ascending to the heavens, vast quantities of individual and enterprise data were dumped into server farms run by the likes of Google, Amazon and Microsoft. These companies are good at running server farms, and the user experience is for the most part above reproach. (I'm writing in Google Docs right now.)

But if you cringe at such deep, far-reaching centralization—if the Equifax breach makes you shudder at the headlines the coming years could bring—this situation won't do. Blockchain-based applications, which are trying to decentralize a range of digital services, are particularly reluctant to use Microsoft Azure or Amazon Web Services. That leaves local storage on user devices that are no longer designed for it.

Unless, that is, we can decentralize storage the way bitcoin has decentralized financial transactions.

A number of projects are working to make that happen. Their solutions employ various combinations of blockchain-based tokens, which storage renters use to compensate storage providers; smart contracts, which govern these transactions; encryption, which keeps the data secure; sharding, which keeps it manageable; and secure multiparty computation, which keeps it hidden even when it's being run through algorithms.

Through these techniques—some of which predate bitcoin, some of which are only now being perfected—decentralized storage platforms aim to accomplish a shared goal: to provide secure, reliable storage that doesn't rely on cloud providers or any other trusted party.

Storj and Sia

Storj, pronounced "storage," is the furthest along in bringing a decentralized storage application to market. The Atlanta-based company has a working product and, mercifully, a GUI, meaning users don't have to contend with the command line (though some actions do require editing slightly intimidating JSON files).

So how does Storj work? The application takes the files you'd like to store on someone else's hardware and encrypts them: that is, it scrambles them using cryptographic techniques so that only someone in possession of a specific key can render them readable again. It then shards the files, meaning that it splits (and occasionally combines) them into fragments of uniform size. These are divvied up among different nodes for storage, so that a given file might be stored across a few continents.
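To make the sharding step concrete, here is a minimal Python sketch. It is not Storj's code: the function names, the 1,024-byte shard size and the random-byte padding are all assumptions chosen for illustration, and the input is assumed to have been encrypted already. The owner keeps the true length privately so the padding can be stripped on retrieval.

```python
import hashlib
import os

SHARD_SIZE = 1024  # bytes; an illustrative value, not Storj's actual shard size


def shard(ciphertext: bytes, shard_size: int = SHARD_SIZE) -> tuple[list[bytes], int]:
    """Split already-encrypted bytes into fragments of identical length.

    Returns the shards plus the original length, which the owner keeps
    privately so the padding can be removed later.
    """
    # Pad with random bytes up to the next multiple of shard_size.
    padding = (-len(ciphertext)) % shard_size
    padded = ciphertext + os.urandom(padding)
    shards = [padded[i:i + shard_size] for i in range(0, len(padded), shard_size)]
    return shards, len(ciphertext)


def reassemble(shards: list[bytes], original_length: int) -> bytes:
    """Join the shards in order and drop the padding."""
    return b"".join(shards)[:original_length]


def shard_id(fragment: bytes) -> str:
    """A content hash, usable as a label when handing a shard to a node."""
    return hashlib.sha256(fragment).hexdigest()


if __name__ == "__main__":
    blob = os.urandom(2500)  # stands in for an encrypted file
    pieces, length = shard(blob)
    print(len(pieces), {len(p) for p in pieces})
    assert reassemble(pieces, length) == blob
```

Every fragment comes out the same length regardless of the input, so a node holding one shard learns nothing about the size of the file it belongs to.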

Sharding not only eases the storage of large files, it adds security, making it difficult for an attacker who's somehow broken the encryption to assemble the complete file. More practically—we'd have bigger problems if industry-standard encryption were broken—sharding disguises files' movements. Without it, someone snooping around could follow a certain file's movements based on its size, even if the contents couldn't be read. The size might also betray the file's type.

Another key feature of Storj is redundancy. Centralized cloud providers need to keep multiple copies of customers' data, in case some calamity befalls a server farm. Decentralized storage platforms have an even greater need for such precautions, since they cannot trust a given node not to vanish. In addition to making multiple copies of files, Storj incentivizes uptime by factoring it into its token payout calculations.
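The redundancy idea can be sketched in a few lines of Python. This is a toy model rather than Storj's placement logic: `place_replicas`, `recoverable` and the choice of three copies per shard are hypothetical, and real networks weigh factors such as node reputation that are ignored here.

```python
import random


def place_replicas(shard_ids, nodes, copies, rng):
    """Assign each shard to `copies` distinct nodes chosen at random."""
    return {sid: rng.sample(nodes, copies) for sid in shard_ids}


def recoverable(placement, offline):
    """A file survives if every shard has at least one holder still online."""
    return all(
        any(node not in offline for node in holders)
        for holders in placement.values()
    )


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    nodes = [f"node-{i}" for i in range(10)]
    placement = place_replicas(["shard-a", "shard-b", "shard-c"], nodes, 3, rng)
    # With three copies of each shard, any two nodes can vanish safely.
    print(recoverable(placement, offline={"node-0", "node-1"}))
```

With three copies of each shard on distinct nodes, no two departures can take a file offline; losing all three holders of a single shard would.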

While Storj only pays storage providers, or "farmers," in tokens, users can pay through a number of methods. Prices are quoted in fiat: $0.015 per gigabyte stored per month and $0.05 per gigabyte downloaded. There is currently a waitlist for prospective users.
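As a quick worked example of those quoted rates, the Python sketch below computes a monthly bill. The function name and the sample usage figures are made up for illustration; only the two prices come from the text above.

```python
from decimal import Decimal

# Rates quoted in the article, in U.S. dollars.
STORAGE_USD_PER_GB_MONTH = Decimal("0.015")
DOWNLOAD_USD_PER_GB = Decimal("0.05")


def monthly_cost(stored_gb, downloaded_gb) -> Decimal:
    """Storage charge plus download charge for one month."""
    stored = Decimal(str(stored_gb))
    downloaded = Decimal(str(downloaded_gb))
    return stored * STORAGE_USD_PER_GB_MONTH + downloaded * DOWNLOAD_USD_PER_GB


if __name__ == "__main__":
    # Hypothetical usage: 500 GB kept for the month, 100 GB downloaded.
    print(monthly_cost(500, 100))
```

At those rates, keeping 500 GB stored costs $7.50 a month and downloading 100 GB adds $5.00, for $12.50 in total.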

A similar project, Sia, is geared specifically towards businesses, meaning it's strict in the way it prioritizes storage providers' uptime. Unlike Storj, it records transactions between storage renters and "hosts" onto a dedicated blockchain, and hosts must pledge collateral for renters to claim if their data becomes unavailable. Higher collateral is more attractive to renters, introducing an element of competition. These agreements are written into smart contracts and specify timeframes among their terms.
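A rough Python model of such an agreement might look like the following. It follows the description above rather than Sia's actual contract format: the class, its fields and the settlement rule (the renter is refunded and claims the collateral when the host fails) are assumptions made for the sake of the sketch.

```python
from dataclasses import dataclass


@dataclass
class StorageContract:
    renter_payment: int   # tokens the renter locks up to pay for storage
    host_collateral: int  # tokens the host pledges as a guarantee
    end_height: int       # block height at which the agreement expires

    def settle(self, current_height: int, host_proved_storage: bool) -> dict:
        """Pay out the locked tokens once the contract's timeframe has ended."""
        if current_height < self.end_height:
            raise ValueError("contract has not expired yet")
        pot = self.renter_payment + self.host_collateral
        if host_proved_storage:
            # Host kept the data available: it earns the fee and recovers its pledge.
            return {"host": pot, "renter": 0}
        # Host failed: the renter is refunded and claims the collateral.
        return {"host": 0, "renter": pot}


if __name__ == "__main__":
    contract = StorageContract(renter_payment=100, host_collateral=50, end_height=1000)
    print(contract.settle(current_height=1000, host_proved_storage=False))
```

The larger the pledge, the more a host stands to lose by going offline, which is why renters find higher collateral more attractive.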

IPFS and Filecoin

The InterPlanetary File System, or IPFS, has ambitions that go beyond distributed storage. The goal, according to the whitepaper (which is itself hosted at an IPFS address), is "to connect all computing devices with the same system of files," displacing HTTP. The project bills itself as the "distributed web," a sort of real-life Pied Piper. On top of that protocol, IPFS' creator Juan Benet is building an "incentive layer" called Filecoin.

Filecoin is similar to Storj and Sia in that it connects storage providers with those looking to rent space; the latter compensate the former using an eponymous token. The network has a unique design, however, that, while not yet implemented, generated enough excitement to raise over $200 million in September 2017.

As with Sia, data storage on the Filecoin network is governed by smart contracts written onto a dedicated blockchain. In contrast to other networks, though, Filecoin assigns three different roles to participants: clients, who rent storage; storage miners, who provide that storage and post collateral to keep them honest; and retrieval miners, who are responsible for getting stored data to clients upon request.

Storj says its "farming" nodes are comparable to cryptocurrency miners in that they offer their hardware to the network and receive tokens, but these nodes play no role in maintaining the Ethereum blockchain, on which Storj's token is issued. Sia's blockchain is mined in the traditional way, through proof of work (the hash algorithm is Blake2b); storage nodes, while subject to blockchain-based contracts, are not active in updating the ledger.

In Filecoin's solution, "storage miners" actually mine, using a new consensus mechanism called "proof of spacetime." By repeatedly broadcasting proof that they are storing the data assigned to them in smart contracts, they create a proof that they've stored the data over time. This "useful" work replaces hashing for hashing's sake as the mechanism for securing the blockchain.
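The flavor of "proving storage over time" can be illustrated with a much simpler stand-in: a Merkle-tree challenge-response audit, sketched below in Python. This is not Filecoin's proof-of-spacetime construction. All names are hypothetical, and the sketch assumes the client kept only the Merkle root of its data while the miner kept the blocks.

```python
import hashlib
import secrets


def leaf_hash(block: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + block).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_tree(blocks):
    """Return every level of the Merkle tree, leaves first, root level last."""
    level = [leaf_hash(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level = level + [level[-1]]
            levels[-1] = level
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(blocks, levels, index):
    """Miner side: return the challenged block and its sibling hashes."""
    path, i = [], index
    for level in levels[:-1]:
        path.append(level[i ^ 1])
        i //= 2
    return blocks[index], path


def verify(root, index, block, path):
    """Client side: recompute the root from the block and the sibling path."""
    node, i = leaf_hash(block), index
    for sibling in path:
        node = node_hash(node, sibling) if i % 2 == 0 else node_hash(sibling, node)
        i //= 2
    return node == root


def audit(root, n_blocks, respond, epochs):
    """Challenge a random block in each epoch; any bad answer fails the audit."""
    for _ in range(epochs):
        index = secrets.randbelow(n_blocks)
        block, path = respond(index)
        if not verify(root, index, block, path):
            return False
    return True


if __name__ == "__main__":
    blocks = [bytes([i]) * 32 for i in range(5)]
    levels = build_tree(blocks)
    root = levels[-1][0]
    print(audit(root, len(blocks), lambda i: prove(blocks, levels, i), epochs=20))
```

Because each challenge is unpredictable, a miner that has discarded the data cannot answer reliably, and a run of correct answers across many epochs is evidence the data was held throughout.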

"Retrieval miner," on the other hand, is a bit of a misnomer. In order to avoid bottlenecks, these nodes mostly operate off-chain, acting as intermediaries between storage nodes and clients who want to download data. If they sound like trusted parties, they are, but Filecoin minimizes the risks by splitting data delivery up into a string of tiny transactions: a fraction of the total data for a fraction of the total payment. That way if one party fails to deliver, the exchange can be stopped before too much damage is done.
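The piecemeal exchange described above can be sketched as a simple loop. The Python below is an illustration only, not Filecoin's retrieval protocol: the `exchange` function, its callbacks and the integer token amounts are all hypothetical, and real payments would travel over some payment channel rather than a function call.

```python
from typing import Callable


def exchange(
    data: bytes,
    total_price: int,
    chunk_size: int,
    send_chunk: Callable[[bytes], bool],
    send_payment: Callable[[int], bool],
) -> tuple[int, int]:
    """Trade data for payment one chunk at a time.

    Returns (bytes_delivered, tokens_paid) and stops at the first failure,
    so neither side can lose more than one chunk's worth.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    delivered = paid = 0
    for n, chunk in enumerate(chunks, start=1):
        if not send_chunk(chunk):  # miner failed to deliver: client stops paying
            break
        delivered += len(chunk)
        # Keep cumulative payment in step with the share of data delivered.
        due = total_price * n // len(chunks) - paid
        if not send_payment(due):  # client failed to pay: miner stops sending
            break
        paid += due
    return delivered, paid


if __name__ == "__main__":
    payments_made = []

    def flaky_client(amount: int) -> bool:
        # Hypothetical client that stops paying after three payments.
        if len(payments_made) >= 3:
            return False
        payments_made.append(amount)
        return True

    print(exchange(b"x" * 1000, 100, 100, lambda chunk: True, flaky_client))
```

In the demo, the client walks away after three payments; the miner has sent four chunks and been paid for three, so its loss is capped at a tenth of the total.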

Enigma

One issue that none of the above solutions address is the fact that—for all the security that sharding, encryption and other cryptographic techniques provide—there is a glaring divide between data's security while in storage and its vulnerability outside of it. The moment you want to edit your data or run it through an algorithm, you have to reassemble and decrypt it. Share it with a third party in that form, and you have no choice but to trust that everything will be fine.

Enigma aims to remedy that. Beyond providing distributed storage, the project (which is still in early stages of development) would allow data to be computed over while still in decentralized, encrypted form. The technique, known as secure multiparty computation, could allow data to be useful and valuable even while sitting in secure storage. An Equifax, for example, could obtain a credit score using your financial data without ever being able to see it. Or lose it.
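One of the simplest building blocks of secure multiparty computation is additive secret sharing, sketched below in Python. This is not Enigma's protocol; the function names and the choice of modulus are assumptions for illustration. It shows several parties jointly computing a sum without any one of them seeing an individual input.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime used as the modulus; an arbitrary choice


def share(value: int, n_parties: int) -> list[int]:
    """Split a value into random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME


def secure_sum(private_values: list[int], n_parties: int) -> int:
    # Each data owner splits its value; party j receives the j-th share of each.
    all_shares = [share(v, n_parties) for v in private_values]
    # Each party adds up the shares it holds, learning nothing about the inputs.
    partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    # Only the combined partial sums reveal the total.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    # Hypothetical account balances held by three different people.
    print(secure_sum([1200, 350, 4075], n_parties=3))
```

Any single share is just a random number; only when all the partial sums are combined does the total emerge. Computing something as involved as a credit score takes considerably more machinery, but the principle is the same.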

Do they stand a chance?

As even some distributed app developers will admit, people don't always care about decentralization. When you need some item your local corner store lacks, you Google information on it, head to Amazon and type in your Visa card number. Citi transfers Federal Reserve tender from your account to the vendor's. (You then see Facebook ads about it for a week, as though you haven't already bought it.)

In this centralized milieu, it's hard to make the case that switching to a distributed storage platform from Amazon S3 is a priority. Decentralizing storage does remove single points of failure, but a string of high-profile hacks has not yet made a dent in demand for centralized services.

Low costs might do the trick. Storj's offering appears to be cheaper than S3's, though the comparison is potentially complicated. Storj's whitepaper argues that "an open market for data storage may drive down costs for various storage services by enabling more parties to compete using existing devices." Then again, the bitcoin whitepaper also promised to push down transaction costs. That didn't last long (though Nakamoto's rationale and Storj's differ).

Then there are the legal questions. With no way to know what—or whose—data you're hosting, how can you avoid harboring illegal content? What about EU rules governing the treatment of user data, or requirements that Chinese citizens' data be housed in-country?

Storj, Filecoin, Sia and Enigma are off to a promising start, but replacing server farms with decentralized storage has some way to go.

The views and opinions expressed herein are the views and opinions of the author and do not necessarily reflect those of Nasdaq, Inc.

David Floyd

David Floyd is an Atlanta native and a Kenyon alum living in Brooklyn. He writes about the intersections of investing, politics, energy and international relations. His work also appears at Investopedia.
