Does Richard's distributed storage have a flaw?

In season 4 of Silicon Valley, in his attempt to create the new internet, Richard finds ways to store large amounts of data on distributed devices like phones, and even smart fridges. Needless to say, redundant copies of the data segments would reside on multiple devices. When a user requests the data, the server perhaps broadcasts a message asking who is online and which devices can provide the needed data segments. Not knowing Richard's code, it is possible that the devices would all start streaming the data right away.

Won't this cause an extraordinary amount of network traffic, even with Richard's compression algorithm?



Best Answer

It depends...

The problem with a distributed storage system would not so much be joining the group and requesting data. That would be comparable to the bandwidth used by services like BitTorrent/Dropbox/OneDrive/Google Drive today, and the bandwidth usage of those services is small compared to media streaming services like Netflix/YouTube/Twitch.
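To put rough numbers on that comparison, here is a back-of-envelope sketch in Python. All the figures (the per-file coordination overhead, the stream bitrate) are my own ballpark assumptions for illustration, not measurements from any of those services:

```python
# Ballpark: coordination overhead vs. the payload itself.
# All numbers below are illustrative assumptions, not measurements.
request_overhead_bytes = 50 * 1024  # say ~50 KB of lookups/handshakes per file
stream_bitrate_bps = 5_000_000      # ~5 Mbit/s, a typical 1080p stream
movie_seconds = 2 * 60 * 60         # a two-hour film

payload_bytes = stream_bitrate_bps / 8 * movie_seconds
print(f"payload  : {payload_bytes / 1e9:.1f} GB")         # ~4.5 GB
print(f"overhead : {request_overhead_bytes / 1e3:.0f} KB")
print(f"ratio    : {payload_bytes / request_overhead_bytes:,.0f}x")
```

Under those assumptions the payload outweighs the coordination traffic by four to five orders of magnitude, which is why the join/request side of the protocol is not the worry.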

That being said, maintaining high availability and reliability of the data may be the bigger issue. If you can store data mainly on 'always-on' devices, this is fairly simple: you choose a few group members, shard and replicate the data, and it can live there for a long time. Only when a member dies do you choose a new replication target.
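To make that shard-and-replicate idea concrete, here is a minimal Python sketch. The node names, the replication factor of 3, and the random placement are all illustrative assumptions (a real system would use something smarter, like consistent hashing), but the shape is the point: copies sit still until a member dies, and only then does a re-replication copy cross the network.

```python
import random

class Swarm:
    """Toy placement model: shard the data, replicate each shard,
    and re-replicate only when a holder dies. All names and values
    here are illustrative assumptions."""

    def __init__(self, nodes, replicas=3):
        self.nodes = set(nodes)
        self.replicas = replicas
        self.placement = {}  # shard_id -> set of nodes holding a copy

    def store(self, shard_id):
        # Choose a few group members to hold copies of this shard.
        targets = random.sample(sorted(self.nodes), self.replicas)
        self.placement[shard_id] = set(targets)

    def node_died(self, node):
        # Only when a member dies do we pick new replication targets.
        self.nodes.discard(node)
        for holders in self.placement.values():
            holders.discard(node)
            candidates = self.nodes - holders
            while len(holders) < self.replicas and candidates:
                holders.add(candidates.pop())  # one shard copy over the network

swarm = Swarm(["nas-1", "desktop-2", "htpc-3", "server-4"], replicas=3)
swarm.store("shard-0001")
swarm.node_died("nas-1")
print(swarm.placement)  # any lost copy was re-created on the spare node
```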

However, most end-user devices with realistic app-install and storage capacity (i.e. not fridges) are laptops, desktops, phones, tablets, game consoles, NAS boxes, and so on. Only a few of these are always on; most have the annoying habit of being switched off or losing connectivity on a regular basis. If these make up the main bulk of your storage group/swarm, you need a fairly high replication rate to cover a multi-device loss, and every time this rate dips too low because of many disconnects you need new replications. If this operation becomes frequent, you may have huge bandwidth usage just to maintain the swarm.
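A back-of-envelope calculation shows why volatile devices blow up the replication rate. Assuming (optimistically) that device outages are independent, the chance that every copy of a shard is offline at once is (1 - uptime) ^ r, so the replica count needed to hit a fixed availability target grows quickly as per-device uptime falls. The uptime figures below are made-up illustrations, not measurements:

```python
def replicas_needed(device_uptime, target=0.999):
    """Smallest replica count r such that the probability of all r
    copies being offline at once, (1 - uptime) ** r, stays below the
    allowed outage. Assumes independent, uncorrelated outages."""
    allowed_outage = 1.0 - target
    r = 1
    while (1.0 - device_uptime) ** r > allowed_outage:
        r += 1
    return r

# An always-on server vs. a flaky desktop vs. a phone offline half the time.
for uptime in (0.99, 0.90, 0.50):
    print(f"uptime {uptime:.2f} -> {replicas_needed(uptime)} replicas")
# uptime 0.99 -> 2 replicas
# uptime 0.90 -> 3 replicas
# uptime 0.50 -> 10 replicas
```

Ten replicas instead of two means five times the maintenance copies whenever devices churn, which is where the swarm-upkeep bandwidth goes.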

So in this universe, if the company can get lots of storage on always-on devices, the answer is likely no, the traffic will not be extraordinary. If, however, they mainly rely on more volatile devices, the answer is most likely yes. Since at that point in the show they were targeting phones, I don't think this would have worked very well, but we never found out.

But who knows, maybe aside from magical lossless compression algorithms, the Silicon Valley universe also has magical infinite (wireless) bandwidth :).





More answers regarding "Does Richard's distributed storage have a flaw?"

Answer 2

it is possible that the devices would start streaming the data right away.

Possible but highly unlikely. I work in the field of enterprise distributed storage, and while he could theoretically do that, it's highly unlikely he would. He's made out to be a very good designer/coder, and while the writers could use that design flaw for narrative/humorous reasons, in reality he wouldn't do that and/or it would fail testing.

In general, and this is a very general explanation, the way distributed filesystems work is that, as you say, the blocks/fragments are encrypted and distributed to N+1 nodes, and a record is made in a distributed database (typically an in-memory key/value DB rather than one with referential integrity such as SQL) stating the inode, block references, encryption key references and node names. This entry is itself replicated amongst DB nodes (often the same nodes as those running the storage node code) for resilience, in the same way as the actual block data.

This way, when a client requests a file (and access authentication is passed), the service node (which again could be a combined node with DB and block roles) looks up the file's inode references, which are served by the DB node network. Individual GET requests are then made to the block nodes for the various blocks, and the file is assembled in order and decrypted by the service node, which then serves the file to the client and updates the various metadata interfaces to show that the file was read.

So essentially each block is typically only read once (you could opt for a multi-node parallel read if you wanted to race the replicas and take the fastest response), and therefore the data wouldn't cause a flood. Is that ok?
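As a rough illustration of that read path, here is a sketch in Python with the metadata DB and the block nodes stubbed out as dictionaries. Every name (path, block IDs, key reference) is a made-up placeholder and decryption is a no-op stub; the point is only the shape of the flow: one metadata lookup, one GET per block, then assembly and decryption at the service node.

```python
# Hypothetical stand-ins for the distributed KV metadata DB and the
# block nodes; nothing here is from the show or any real system.
metadata_db = {
    "/videos/demo.mp4": {                # the inode record, itself replicated
        "blocks": ["blk-07", "blk-12"],  # ordered block references
        "key_ref": "key-42",             # reference to the encryption key
    },
}
block_nodes = {
    "blk-07": b"encrypted-part-1|",
    "blk-12": b"encrypted-part-2",
}

def decrypt(data: bytes, key_ref: str) -> bytes:
    # Stub: a real service node would decrypt with the referenced key.
    return data

def read_file(path: str) -> bytes:
    record = metadata_db[path]                               # one lookup via DB nodes
    parts = [block_nodes[ref] for ref in record["blocks"]]   # one GET per block
    return decrypt(b"".join(parts), record["key_ref"])       # assemble in order, decrypt

print(read_file("/videos/demo.mp4"))  # each block was read exactly once
```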

Sources: Stack Exchange - This article follows the attribution requirements of Stack Exchange and is licensed under CC BY-SA 3.0.
