Twilight Zone: File Transfer Causes Link Drop
A long, long time ago, we built a multi-protocol WAN network for a large organization. Everything worked great until we got the weirdest bug report I’ve seen thus far:
When trying to transfer a particular file with DECnet to the central location, the WAN link drops. That does not happen with any other file, or when transferring the same file with TCP/IP. The only way to recover is to power cycle the modem.
Try to figure out what was going on before reading any further ;)
I got onsite, the customer started the file transfer, and (as claimed) the link dropped… but when the customer reached for the power-off button on the modem, I noticed something weird: the “remote loopback” LED was on.
We power cycled the modem, the link went up, routing protocols did their job, we restarted the file transfer… and the remote loopback LED turned on. The link went down a few seconds after that.
Testing WAN links was a big deal in those days[1], and one of the tests was the loopback test: put a modem into a state where it would send back every bit it received. You could do a local loopback test (loop 3 in the V.54 recommendation), where the modem would create a loop as close to the physical line as possible, allowing you to test the DTE-DCE connection[2] and the local modem. In a remote loopback (loop 2 in the V.54 recommendation), the remote modem would loop the received data back onto the WAN link, so you’d be able to test all the components of a WAN link apart from the remote CPE.
A remote loopback could be triggered with a button on the modem front panel, but that obviously required an on-site person able to follow instructions. The remote loopback could also be triggered remotely: a modem would send a weird sequence of bits to the remote modem, which would enter the remote loopback state and stay there until it received another weird sequence of bits.
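To make the remote trigger mechanism concrete, here is a minimal Python sketch of a modem that watches the received bit stream for control patterns. The `ToyModem` class and both bit patterns are made up for illustration; they are not the real V.54 sequences, and a real modem would do this in hardware.

```python
# Hypothetical control patterns, NOT the real V.54 sequences.
ENTER_LOOPBACK = "1100101011110000"
EXIT_LOOPBACK = "0011010100001111"
WINDOW = max(len(ENTER_LOOPBACK), len(EXIT_LOOPBACK))


class ToyModem:
    """Toy receiver that toggles remote loopback on in-band bit patterns."""

    def __init__(self) -> None:
        self.loopback = False
        self._recent = ""  # sliding window of the most recent received bits

    def receive(self, bits: str) -> str:
        """Consume bits from the line; return the bits echoed back to the sender."""
        echoed = []
        for bit in bits:
            if self.loopback:
                echoed.append(bit)  # in loopback, every received bit is sent back
            self._recent = (self._recent + bit)[-WINDOW:]
            if not self.loopback and self._recent.endswith(ENTER_LOOPBACK):
                self.loopback = True
            elif self.loopback and self._recent.endswith(EXIT_LOOPBACK):
                self.loopback = False
        return "".join(echoed)


modem = ToyModem()
modem.receive("0101" + ENTER_LOOPBACK)  # activation pattern arrives on the line
print(modem.loopback)                   # the modem is now looping data back
print(modem.receive("1010"))            # subsequent bits are echoed
```

The detector sees only bits on the line; it has no notion of who sent them or why.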
CCITT[3] designed the weird sequence of bits to be something that was almost impossible to occur in a real-life network. The V.54 recommendation includes a further protection for HDLC links[4]:
In order to provide protection against false recognition caused by user HDLC frames, the bit sequence consisting of seven consecutive binary 1s, which is at present in the preparatory pattern, must be included in the recognition criteria.
I don’t know whether it was the almost part, or the modem designers not following the V.54 recommendation to the letter[5] (or they got it wrong). In any case, that particular file contained the precise sequence of bits needed to throw a modem into remote loopback, and DECnet sliced application data into packets in a slightly different way than TCP/IP, so the loopback was triggered only when transferring files with DECnet.
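The interplay between HDLC framing and packet slicing can be sketched in a few lines of Python. Zero-bit insertion (bit stuffing) is standard HDLC behavior: the sender adds a 0 after every five consecutive 1s, so seven consecutive 1s can never appear inside a frame. Everything else here is an assumption made for illustration: the `TRIGGER` pattern, the 4-bit `header`, and the segment sizes are hypothetical and do not model the real V.54 sequence, DECnet, or TCP/IP.

```python
FLAG = "01111110"             # HDLC frame delimiter
TRIGGER = "1011001110001101"  # hypothetical loopback pattern, NOT the V.54 one


def bit_stuff(bits: str) -> str:
    """HDLC zero-bit insertion: append a 0 after every run of five 1s."""
    out, run = [], 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == "1" else 0
        if run == 5:
            out.append("0")
            run = 0
    return "".join(out)


def on_wire(payload: str, segment_bits: int, header: str = "0000") -> str:
    """Slice payload into segments, add a header to each, frame them HDLC-style."""
    frames = []
    for i in range(0, len(payload), segment_bits):
        segment = payload[i:i + segment_bits]
        frames.append(FLAG + bit_stuff(header + segment) + FLAG)
    return "".join(frames)


# User data can never produce seven consecutive 1s inside a frame.
print("1111111" in bit_stuff("1" * 20))

# The same "file", sliced in two different ways by two hypothetical stacks.
payload = "0" * 24 + TRIGGER + "0" * 24
print(TRIGGER in on_wire(payload, segment_bits=64))  # pattern stays in one frame
print(TRIGGER in on_wire(payload, segment_bits=32))  # pattern split across frames
```

With the larger segment size, the pattern reaches the wire intact; with the smaller one, a frame boundary (flags plus the next header) lands in the middle of it. That is one plausible way the same file could trigger the loopback with one protocol and not with the other.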
Once we got that far, it was trivial to solve the problem: open the modem, and flip the enable remote loopback DIP switch to off.
More to Explore
Why don’t you check out the How Networks Really Work webinar? You can watch numerous videos in that webinar with the Free ipSpace.net Subscription.
[1] When we ordered our first international leased line, one of the final steps in the provisioning process involved a 24-hour test of the circuit. We even got a test report proving they did their job.
[2] Data Terminal Equipment, oftentimes called a router by the uninitiated. It’s usually connected to Data Circuit-terminating Equipment (DCE), colloquially called a modem.
[3] The entity now known as ITU.
[4] Seven consecutive ones is a signal to abort the current frame, and is almost never used.
[5] I don’t remember the modem manufacturer, but it was one of those “creative” baseband modems that managed to push 1 Mbps over a telephone circuit that was supposed to be able to carry 28 kbps… obviously only when the stars were properly aligned. Considering that, I wouldn’t be surprised if there were further mismatches between what the modem did and CCITT recommendations.
... or the perils of "in-band" signaling.
Reminds me of the hacks that could be played on public telephones to place free calls.
Experienced similar issues in the 90s when migrating a customer from X.25 Eicon interface PC adapters to Cisco 2500 with RS232D. It ended up as a P1 at Cisco TAC. We had to implement a hardware fix (some cable modification).
Had a very similar problem in the 90s at a remote site in Tahiti randomly dropping out. Eventually I found I could also trigger it with a "wr t" and then narrowed it down to the single line of config. The local service provider kept refusing to believe it was anything to do with them.
In the end I had to travel there 🙂 with a protocol analyzer (international customs nightmare). I recreated it in front of their eyes and stayed there until they replaced the line driver/baseband modem, which they previously said they had already done. One of the more bizarre faults.