Here’s another “do these things ever disappear?” question from Enrique Vallejo:
Regarding storage, is Fibre Channel still a thing in 2022, or most people employ SATA over Ethernet and NVMe over fabrics?
TL&DR: Yes. So is COBOL.
To understand why some people still use Fibre Channel, we have to start with an observation made by Howard Marks: “Storage is different.” It’s OK to drop a packet in transit. It’s NOT OK to lose data at rest.
The crazy pace of webinar sessions continued last week. Howard Marks continued his deep dive into Hyper-Converged Infrastructure, this time focusing on go-to-market strategies, failure resiliency with replicas and local RAID, and the eternal debate (if you happen to be working for a certain $vendor) whether it’s better to run your HCI code in a VM and not in hypervisor kernel like your competitor does. He concluded with the description of what major players (VMware VSAN, Nutanix and HPE Simplivity) do.
On Thursday I started my Ansible 2.7 Updates saga, describing how network_cli plugin works, how they implemented generic CLI modules, how to use SSH keys or usernames and passwords for authentication (and how to make them secure), and how to execute commands on network devices (including an introduction into the gory details of parsing text outputs, JSON or XML).
The last thing I managed to cover was the cli_command module and how you can use it to execute any command on a network device… and then I ran out of time. We’ll continue with sample playbooks and network device configurations on February 12th.
You can get access to both webinars with Standard ipSpace.net subscription.
I’m running a hyperconverged infrastructure event with Mitja Robas on April 6th, and so my friend Christoph Jaggi sent me a list of interesting questions, starting with:
What are hyperconverged infrastructures?
The German version of the interview is published on inside-it.ch.
How about replacing dedicated storage boxes with distributed file system?
In late September, Howard Marks will talk about software-defined storage in my Building Next Generation Data Center course. The course is sold out, but if you register for the spring 2017 session, you’ll get access to recording of Howard’s talk.
A while ago I described why some storage vendors require end-to-end layer-2 connectivity for iSCSI replication.
TL&DR version: they were too lazy to implement iSCSI checksums and rely on Ethernet checksums because TCP/IP checksums are not good enough.
It turns out even Ethernet checksums fail every now and then.
2015-12-06: I misunderstood the main technical argument in Evan’s post. The real problem is that switches recalculate CRC, so the Ethernet CRC is no longer end-to-end protection mechanism.
When I wrote my stretched VSAN post, I thought VSAN uses asynchronous replication across WAN. Duncan Epping quickly pointed out that it uses synchronous replication, and I fixed the blog post.
The “What about latency?” question immediately arose somewhere in my subconscious, but before I could add that thought to the blog post, Anders Henke wrote a lengthy comment that totally captured what I was thinking, so I’m including it in its entirety:
TL&DR answer: it makes way more sense than long-distance vMotion. However…
Olivier Hault sent me an interesting challenge:
I cannot find any simple network-layer solution that would allow me to use total available bandwidth between a Hypervisor with multiple uplinks and a Network Attached Storage (NAS) box.
TL&DR summary: you cannot find it because there’s none.
A while ago Cisco added dynamic FCoE support to Nexus 5000 switches. It sounded interesting and I wanted to talk about it in my Data Center Fabrics update session, but I couldn’t find any documentation at that time.
In the meantime, the Configuring Dynamic FCoE Using FabricPath configuration guide appeared on Cisco’s web site and J Metz wrote a lengthly blog post explaining how it all works, triggering a severe attack of déjà vu.
One of my regular readers sent me a long list of FCoE-related questions:
I wanted to get your thoughts around another topic – iSCSI vs. FCoE? Are there merits and business cases to moving to FCoE? Does FCoE deliver better performance in the end? Does FCoE make things easier or more complex?
He also made a very relevant remark: “Vendors that can support FCoE promote this over iSCSI; those that don’t have an FCoE solution say they aren’t seeing any growth in this area to warrant developing a solution”.
A while ago I wrote about ATAoE and why I think a layer-2-only TFTP-like protocol shouldn’t be used these days. As always, the answer to that black-and-white opinion (and I’m full of them) is “it depends” – ATAoE works great if you do it right.
Nicolas Vermandé sent me a really interesting question: “I've been looking for answers to a simple question that even different people at Cisco don't seem to agree on: Is it a good idea to class IP traffic (iSCSI or NFS over TCP) in pause no-drop class? What is the impact of having both pauses and TCP sliding windows at the same time?”
With all the Puppet buzz I’m hearing and claims that “compute and storage orchestration problems have been solved” I wanted to check the reality of those claims – is it (for example) possible to create a LUN on a storage array using a standard well-defined API.
Stephen Foskett, Simon Gordon and Scott Lowe quickly pointed me in the right direction: SMI-S. Thank you!
Chris Marget recently asked a really interesting question:
I've encountered an environment where the iSCSI networks are built just like FC networks: Multipathing software in use on servers and storage, switches dedicated to "SAN A" and "SAN B" VLANs, and full isolation of paths (redundant paths) between server and storage. I understand creating a dedicated iSCSI VLAN, but why would you need two? Isn’t the whole thing running on top of TCP? Am I missing something?
Well, it actually makes sense in some mission-critical environments.
Update 2015-12-06: Ethernet checksums are not a workaround for lack of iSCSI-level checksums. If your iSCSI solution doesn't support application-level checksum, your data might be at risk