
Ethernet Checksums Are Not Good Enough for Storage (Updated)

A while ago I described why some storage vendors require end-to-end layer-2 connectivity for iSCSI replication.

TL&DR version: they were too lazy to implement iSCSI checksums and rely on Ethernet checksums because TCP/IP checksums are not good enough.

It turns out even Ethernet checksums fail every now and then.

2015-12-06: I misunderstood the main technical argument in Evan’s post. The real problem is that switches recalculate the CRC, so the Ethernet CRC is no longer an end-to-end protection mechanism.

2015-11-15: Fixed: TCP/IP checksums are not XORs (thanks to Andrew Yourtchenko).

TCP and IP checksums are simple ones' complement sums, and we know they’re weak. As Evan Jones explained in his blog, you might expect that one in ~65000 corrupt packets won’t be detected, which combined with the pretty low error rates we see on Ethernet these days might be good enough… or not, if you’re Twitter and dealing with petabytes of traffic.
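To see why a ones' complement sum is weak, here is a minimal Python sketch of an RFC 1071-style 16-bit checksum. The function name and the sample byte strings are made up for illustration. Because addition doesn't care about order, swapping two 16-bit words, or adding to one word what you subtract from another, leaves the checksum unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


original = b"\x12\x34\x56\x78\x9a\xbc"
# Corruption 1: the first two 16-bit words are swapped.
reordered = b"\x56\x78\x12\x34\x9a\xbc"
# Corruption 2: one word is incremented, another decremented by the same amount.
offsetting = b"\x12\x35\x56\x77\x9a\xbc"

same_after_reorder = internet_checksum(original) == internet_checksum(reordered)
same_after_offset = internet_checksum(original) == internet_checksum(offsetting)
print(same_after_reorder, same_after_offset)  # True True: both corruptions go undetected
```

A single flipped bit still changes the sum, so the checksum does catch the simplest errors; it's the compensating or reordering ones that slip through.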

Ethernet CRC is supposed to save the day. After all, a switch receiving a packet checks the CRC regardless of whether the packet is subsequently bridged or routed. Ethernet CRC should reliably detect transmission errors, and the TCP/IP checksums should detect extremely rare intra-device data corruption errors… or so the theory goes.

In practice, there’s a gap between theory and practice: cut-through switches (becoming yet again ever more popular due to reduced latency) don’t check the CRC. Even worse, they slap the correct CRC on corrupted data, making it impossible to detect the corruption further down the line (more details in an excellent blog post by John Harrington), and the store-and-forward switches recalculate the CRC, which thus no longer protects the integrity of the Ethernet frame between end hosts.
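The recalculation problem can be illustrated with a toy Python model. This is only a sketch: `add_fcs`, `fcs_ok` and `faulty_hop` are hypothetical helpers, real Ethernet framing is simplified away, and `zlib.crc32` stands in for the frame check sequence (it uses the same CRC-32 polynomial). A device that checks the CRC, corrupts the buffered payload, and then generates a fresh CRC hands the receiver a frame that looks perfectly valid.

```python
import struct
import zlib


def add_fcs(payload: bytes) -> bytes:
    """Append a CRC-32 trailer to the payload (simplified stand-in for the Ethernet FCS)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def fcs_ok(frame: bytes) -> bool:
    """Return True when the trailer matches the CRC-32 of the payload."""
    payload, trailer = frame[:-4], frame[-4:]
    return struct.unpack("<I", trailer)[0] == zlib.crc32(payload)


def faulty_hop(frame: bytes) -> bytes:
    """Hypothetical device: verify the FCS, corrupt the buffered payload, regenerate the FCS."""
    if not fcs_ok(frame):
        raise ValueError("bad FCS, frame dropped")
    payload = bytearray(frame[:-4])
    payload[0] ^= 0x01  # a flipped bit in buffer memory, after the CRC check
    return add_fcs(bytes(payload))


sent = b"block 42: important storage data"
received = faulty_hop(add_fcs(sent))
print(fcs_ok(received), received[:-4] == sent)  # True False: valid CRC, wrong data
```

The per-hop CRC still catches errors on each individual link; it just can't tell you anything about what happened inside the box between two links.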

Evan’s conclusion: if you care about data integrity, implement application-level checksum, preferably using CRC32C, which is implemented in hardware on recent CPUs.
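A minimal sketch of what an application-level checksum could look like, using a plain-Python, table-driven CRC32C. The `seal`/`unseal` helpers and the 4-byte little-endian trailer are assumptions made for this example, not the format of any particular protocol, and production code would typically call a hardware-accelerated library rather than a pure-Python loop.

```python
def _make_crc32c_table() -> list[int]:
    """Build the byte-wise lookup table for CRC32C (Castagnoli)."""
    poly = 0x82F63B78  # Castagnoli polynomial in bit-reflected form
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Software CRC32C; slow, but enough to show the idea."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def seal(payload: bytes) -> bytes:
    """Sender side: append the checksum before the data leaves the application."""
    return payload + crc32c(payload).to_bytes(4, "little")


def unseal(message: bytes) -> bytes:
    """Receiver side: verify the checksum end to end, whatever the network did."""
    payload, trailer = message[:-4], message[-4:]
    if crc32c(payload) != int.from_bytes(trailer, "little"):
        raise ValueError("application-level checksum mismatch")
    return payload


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC32C check value
```

Since the checksum is computed by the sending application and verified by the receiving one, it doesn't matter how many devices in the path recalculated their own CRCs.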

Also note: Stretched VLAN is not a data protection feature for your iSCSI network. If the iSCSI or NFS solution you’re using doesn’t support application-level checksums, your data is at risk no matter what.

Finally, how many application-level protocols apart from SSL/TLS and iSCSI (when implemented) implement an application-level checksum? Please write a comment!

9 comments:

  1. Craig Partridge, from BBN, published a couple of research papers around 2000 based on experiments discussing this issue and related ones.

  2. How does cut-through introduce a correct CRC when the incoming one fails? If I understand correctly, the post by John Harrington says that cut-through switches ‘stomp’ the outbound FCS when the inbound CRC fails, meaning they forward a garbage (incorrect) FCS. It gets over all the network (since you cannot un-forward the frame by the time you detect a corrupt CRC), but it reaches the destination with an incorrect FCS. Shouldn't that be detected by the iSCSI endpoint to discard the frame? Or is it that switches actually calculate a new CRC when they route between different subnetworks, without 'stomping'?

  3. You are correct, CRC errors are detected even in case of cut-through switching.

  4. Yes, CRC errors are detected even in cut-through mode, at least on Cisco Nexus 9000: http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/7-x/layer2/configuration/guide/b_Cisco_Nexus_9000_Series_NX-OS_Layer_2_Switching_Configuration_Guide_7x/configuring_switching_modes.html#concept_73023BB781B14EBCB7048F8B0CA7189D

  5. http://www.cisco.com/c/en/us/products/collateral/switches/nexus-5020-switch/white_paper_c11-465436.html


    Maybe nexus works differently...

  6. Whereas a store-and-forward switch drops invalid packets, cut-through devices forward them because they do not get a chance to evaluate the FCS before transmitting the packet.

  7. Does anyone know how to tell if the switch you are configuring/using is a cut-through type?

  8. Francois Labonte, 06 December, 2015 00:27

    I wholeheartedly agree with the idea that one should not rely on the Ethernet or IP/TCP checksums to verify the integrity of your data. Application-level checksums are very fast to check on a modern CPU and help you avoid incredibly hard bugs.

    It would be a serious bug for a cut-through switch to put a correct CRC on a frame that had an incorrect CRC upon reception, and none of the cut-through switches available since 2006 have had this bug. The real issue with cut-through and bad CRC is that the frames don't get dropped and keep using precious bandwidth, though in reality, unless you have a lot of FCS errors in your network, this is not a big deal at all.

    Also note that the IP checksum covers only the header. The TCP checksum covers the payload with a weak 16-bit ones' complement checksum. So UDP packets basically rely only on the Ethernet CRC to protect the payload.

    What can happen very rarely is that a piece of networking equipment has some memory where some bits are stuck at 0 or 1. If that memory buffers packets after the CRC is checked and happens not to be protected with parity or ECC, the packet payload is now corrupted and the networking equipment generates a new CRC that is correct for the corrupted payload. Here's a good description of such a failure from Edgecast (Verizon CDN):

    https://www.verizondigitalmedia.com/blog/2015/02/being-good-stewards-of-the-internet/

    Replies
    1. You (and everyone else) are absolutely right - I misunderstood the main technical argument in Evan's post. Fixed.

