A long while ago I stumbled upon an excellent resource describing why distributed systems are hard (which is what I was claiming years ago, when OpenFlow was at the peak of the hype cycle ;)… lost it, and found it again a few weeks ago.
If you want to understand why networking is hard (apart from the obvious MacGyver reasons), read it several times; here are just a few of its points:
- Distributed systems are hard because they fail more often;
- Writing robust distributed systems costs more than writing robust single-machine systems;
- Coordination is hard;
- Find ways to be partially available.
The one thing I’d add to the list is “you have to deal with Byzantine failures”.
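To make the "partially available" point a bit more concrete, here is a minimal Python sketch of one way to read it: ask several nodes in parallel and return whatever came back before a deadline instead of failing the whole request. It is my illustration, not something taken from the resource; `query_shard`, `partial_query`, the shard names, and the simulated delays are all hypothetical.

```python
import concurrent.futures
import time


def query_shard(shard_id, delay):
    """Hypothetical shard lookup; `delay` simulates a slow or stuck node."""
    time.sleep(delay)
    return f"result-from-shard-{shard_id}"


def partial_query(shard_delays, deadline):
    """Query all shards in parallel and keep only the answers that
    arrive within `deadline` seconds.

    Returns (results, missing): a dict of shard_id -> answer, and a
    sorted list of the shards that did not answer in time.
    """
    pool = concurrent.futures.ThreadPoolExecutor(
        max_workers=max(1, len(shard_delays))
    )
    futures = {
        pool.submit(query_shard, shard_id, delay): shard_id
        for shard_id, delay in shard_delays.items()
    }
    # wait() returns after `deadline` even if some futures are still running.
    done, not_done = concurrent.futures.wait(futures, timeout=deadline)
    results = {futures[f]: f.result() for f in done}
    missing = sorted(futures[f] for f in not_done)
    # Do not block the caller on stragglers; they finish in the background.
    pool.shutdown(wait=False)
    return results, missing
```

Calling `partial_query({"a": 0.0, "b": 0.0, "c": 1.0}, deadline=0.3)` should hand back the answers from the two fast shards and report `c` as missing. The caller then has to decide whether an incomplete answer is good enough, which is exactly where the "coordination is hard" bullet comes back to bite you.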
Next time someone tells you “networking engineers are so obtuse, we solved $whatever in some other domain in no time”, point them to this document… not that it would help; RFC 1925 rule 4 cannot be beaten.