OSPF Neighbors Stuck in EXSTART

Wednesday, October 24, 2007 07:06 CEST

This problem is rare but tantalizing enough to warrant mentioning: OSPF neighbors are forever stuck in the EXSTART state (occasionally going DOWN and back to EXSTART).

I stumbled across it by accident in my lab and, luckily, had seen it before, so I knew immediately what it was.

Once you suspect that something is wrong with the OSPF adjacencies and run the debug ip ospf adj command, the problem becomes obvious. The Database Description (DBD) packet contains an Interface MTU field; if the value received from the neighbor is higher than the IP MTU configured on the inbound interface, the DBD packet is rejected (section 10.6 of RFC 2328). The router with the lower MTU complains that “Nbr x.x.x.x has larger interface MTU”; the other router moans about protocol violations (“First DBD and we are not SLAVE”).
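The check described above can be sketched in a few lines of Python. This is only an illustration of the rule in section 10.6 of RFC 2328, not router code; the Interface class, the accept_dbd function, and the 1500/1400 MTU values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical model of a router interface (names are illustrative)."""
    name: str
    ip_mtu: int  # IP MTU configured on the interface, in bytes


def accept_dbd(iface: Interface, neighbor_mtu: int) -> bool:
    """Return True if a received DBD packet passes the Interface MTU check.

    The packet is rejected when the neighbor advertises an Interface MTU
    larger than the IP MTU of the interface the packet arrived on.
    """
    return neighbor_mtu <= iface.ip_mtu


# Assumed example values: R1 uses IP MTU 1500, R2 uses IP MTU 1400.
r1 = Interface("R1", ip_mtu=1500)
r2 = Interface("R2", ip_mtu=1400)

# R2 (lower MTU) receives R1's DBD advertising 1500 and rejects it.
r2_accepts = accept_dbd(r2, neighbor_mtu=r1.ip_mtu)
# R1 receives R2's DBD advertising 1400 and has no MTU objection.
r1_accepts = accept_dbd(r1, neighbor_mtu=r2.ip_mtu)

print(f"{r2.name} accepts DBD from {r1.name}: {r2_accepts}")
print(f"{r1.name} accepts DBD from {r2.name}: {r1_accepts}")
```

Note that the check is one-sided: only the router with the lower MTU rejects the packet, which matches the asymmetric debug messages quoted above.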

As always, there are two ways to solve this problem:

The correct one: fix the MTU issues;
The other one: disable MTU checks with the ip ospf mtu-ignore interface configuration command (which might be OK if the hardware can receive oversized packets and the router is not using fixed-size input buffers).
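The difference between the two fixes can be illustrated with a simplified Python model. It is a sketch under stated assumptions, not a description of how any router is implemented: the function names, the max_receivable parameter (the largest packet the hardware or transport path really delivers), and all the numbers are hypothetical.

```python
def dbd_accepted(local_ip_mtu: int, neighbor_mtu: int, mtu_ignore: bool = False) -> bool:
    """MTU check on a received DBD packet; skipped when mtu_ignore is set."""
    return mtu_ignore or neighbor_mtu <= local_ip_mtu


def adjacency_can_sync(local_ip_mtu: int, neighbor_mtu: int,
                       max_receivable: int, mtu_ignore: bool = False) -> bool:
    """Simplified model: the adjacency can fully synchronize only if the DBD
    check passes AND the largest packet the neighbor may send (neighbor_mtu
    bytes) is small enough to be received at all."""
    return (dbd_accepted(local_ip_mtu, neighbor_mtu, mtu_ignore)
            and neighbor_mtu <= max_receivable)


# Mismatch left as-is: local IP MTU 1400, neighbor advertises 1500.
unfixed = adjacency_can_sync(1400, 1500, max_receivable=1500)

# Fix 1: correct the MTU so that both sides use 1500.
fixed_mtu = adjacency_can_sync(1500, 1500, max_receivable=1500)

# Fix 2: ignore the mismatch; works if 1500-byte packets can be received.
ignore_ok = adjacency_can_sync(1400, 1500, max_receivable=1500, mtu_ignore=True)

# Fix 2 again, but oversized packets are dropped (only 1100 bytes get through).
ignore_dropped = adjacency_can_sync(1400, 1500, max_receivable=1100, mtu_ignore=True)

print(unfixed, fixed_mtu, ignore_ok, ignore_dropped)
```

In this model, ignoring the mismatch only helps when the oversized packets can really be received, which is the caveat attached to the second option.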

OSPF

16 comments:

Anonymous 24 October 2007 18:39

I have got an interesting one.

A few years ago I got called to troubleshoot an OSPF EXSTART problem. Both routers were connected over an international Frame Relay PVC. Both sides initially had an MTU of 1500 bytes set on their interfaces, but OSPF got stuck in EXSTART. I knew about the OSPF MTU mismatch issue back then, but this didn't seem to be it because the MTU sizes matched on both ends. However, since I was told it was an international Frame Relay PVC, I asked how the PVC was built. It actually went through three providers, and the provider in the middle had the PVC MTU set to 1100 bytes for some reason; that was the culprit. The fix, as it turned out, was to lower the interface IP MTU on the customer routers (IP MTU = 1024), because the ip ospf mtu-ignore command didn't solve it (the middle Frame Relay provider dropped the oversized frames at layer 2). It was an unusual problem, so I wanted to pass it along. Frame Relay is going away nowadays, so we may never encounter a problem like this one again.

Replies

Anonymous 11 August 2013 07:56

Thanks for sharing this.

Jay Swan 24 October 2007 19:10

The place I've seen this several times is when running OSPF between an SVI on a 3550 switch and a router, which have different default MTUs.

Anonymous 17 November 2007 01:11

What is the best way (not "ip ospf mtu-ignore") to resolve MTU mismatch between 3550 SVI and router's physical or BVI interface?

Without affecting other switch ports?

I know about "system mtu routing ..." on 3550, but it is system-wide.

Consider that the router has a BVI interface (which also produces a different MTU) and the switch has an SVI.

Router:

bridge 1 protocol ieee
bridge 1 route ip
bridge irb

interface GigabitEthernet0/0
 description trunk to 3750
 no ip address
!
interface GigabitEthernet0/0.1
 encapsulation dot1Q 100
 bridge-group 1
!
interface BVI1
 ip address 10.1.1.2 255.255.255.0
!
router ospf 1
 network 10.1.1.0 0.0.0.255 area 0

BVI1 is up, line protocol is up
MTU is 1514 bytes

Ivan Pepelnjak 18 November 2007 13:00

According to this discussion, you can only set system-wide MTU on 3550, not per interface.

Once I get my hands on a Catalyst switch (and have time to spare), I'll run a few tests.

Anonymous 04 December 2007 22:26

Thank you.
So should I set "system mtu routing 1514" on the 3750 to match the BVI's MTU and forget about it?

Any negative consequences?

What about other routers on the same L2 segment with regular routed interfaces? They currently have "ip ospf mtu-ignore" :)

The BVI interface would not accept MTU settings.

Thanks,
Vladimir

Ivan Pepelnjak 05 December 2007 08:36

You should set the system MTU to 1500, not 1514 (unless I'm gravely mistaken, the MTU specifies the payload size, not the layer-2 frame size).

There SHOULD be no negative impact, unless the workstations in your LAN use jumbo frames (and let's assume that the switches are not MPLS PE routers :).

As for the BVI interface: I can set the MTU and IP MTU on a BVI interface on a router (using 12.4(15)T1), but as I said in a previous comment, you cannot set a per-interface MTU on a Cat3550 at all.

Anonymous 07 December 2007 19:29

Thanks

Nicolas 15 May 2009 18:36

Google got me here with the magic words mtu + ospf while I was looking for some info on this topic for a post on my new blog. I basically wrote the same thing (in Spanish), but added something that I found pretty interesting: lowering the MTU back, or removing ip ospf mtu-ignore, to see what would happen. Only the latter would bring us back to the issue, and the MTU would only become an issue again when the adjacency is rebuilt... just my two cents.

Clive 17 December 2009 11:41

Yeh, got a strange issue.

If the MTU is set to 1500 or lower, full adjacency is achieved; with anything higher it stays in 2-way. Anyone got any ideas on that?

Setup is: Juniper -> Foundry -> SmartEdge

Setups on the Juniper and the SmartEdge are as follows:

Juniper:
metric 65535;
retransmit-interval 5;
transit-delay 1;
hello-interval 10;
dead-interval 40;

SmartEdge:

transmit-delay 1
router-priority 0
hello-interval 10
router-dead-interval 40
cost 65534

The only difference I can see is the metric cost, but then why would it work with 1500 but not anything larger?

Ivan Pepelnjak 17 December 2009 11:59

I would suspect the box in the middle is dropping jumbo frames. See also

http://blog.ioshints.info/2009/11/ip-ospf-mtu-ignore-is-dangerous-command.html

Robin M. 07 September 2011 18:58

Funny enough I'm experiencing this issue right now on a Gigabit Ethernet link between two 7609s. Looks like the MTU on the transport network is wrong and the carrier is looking at it now.

New technology, same old problems. :)

jeff 08 February 2012 15:16

Hi Robin, I'm experiencing it right now. I have two routers between two 7609s, and sometimes OSPF goes down. How did you resolve the issue?

Anonymous 14 November 2013 17:33

I am having an issue with OSPF. We have HP, Cisco, and H3C in our Area 0. Router priorities are set: remote sites are priority 0 and the main sites are 250 and lower (to specify the DR). However, we are still getting some strange intermittent adjacency losses.
This started with an existing network that I am trying to fix. Originally no priorities were set anywhere and all Area 0 routers were set to priority 1 (default). I fixed that and the problem became MORE common; it had been happening once or twice every 3 months.
I then discovered that the NTP server config on all the network equipment was inconsistent. So I fixed that and pointed all devices to the appropriate NTP servers (one of which was the loopback on our core router, which had an IP that already existed on the BDR as the router ID). Finally, yesterday, for the first time in 10 days, there were no OSPF adjacency change messages in the logs.
All devices have identical MTU, Hello, Dead, and Carrier delay timers.

My questions are:
What effect did NTP have on OSPF? Could all the issues have been resolved by finding that duplicate IP in Area 0? Has anyone else seen issues with this type of mixed environment (HP, Cisco, H3C)?

Replies

Ivan Pepelnjak 15 November 2013 07:21

Duplicate IPs (particularly if they're used for Router ID) could be the root cause of your problems.

Anonymous 15 November 2013 15:57

I agree; that is why I am going through the configs of all the devices on the network very carefully. I didn't build or design this network, but I can sure make it work better and redesign what I can to improve on the original design.
