Opened 5 months ago

Closed 5 months ago

#1185 closed defect (pending)

Default Route disappears intermittently while using ietf-nat64

Reported by: david.somers-harris@… Owned by: panda@…
Priority: tbd Milestone: ietf-100
Component: nat64 Keywords:
Cc: Clemens Schrimpe <csch@…> My Current Location: Canning
My MAC Address: 7c:04:d0:c5:3a:64 My OS: macOS Sierra 10.12.6

Description (last modified by david.somers-harris@…)

While using ietf-nat64, even though the connection works at first, eventually Internet access is lost because the default route disappears.

Attached are the outputs of netstat -nr

  1. Works: route.20171114144623 (attached)
  2. Broken: route.20171114145135 (attached)

I reproduced this at least twice on my Mac, and also on Windows.

Attachments (2)

route.20171114144623 (3.1 KB) - added by david.somers-harris@… 5 months ago.
Works
route.20171114145135 (3.0 KB) - added by david.somers-harris@… 5 months ago.
Broken


Change history (15)

Changed 5 months ago by david.somers-harris@…

Attachment: route.20171114144623 added

Works

Changed 5 months ago by david.somers-harris@…

Attachment: route.20171114145135 added

Broken

comment:1 Changed 5 months ago by david.somers-harris@…

Description: modified (diff)

comment:2 Changed 5 months ago by llynch@…

Cc: Clemens Schrimpe <csch@…> added
Component: changed from incoming to nat64
Owner: changed from llynch@… to panda@…
Status: changed from new to assigned
Type: changed from request to defect

Also see: https://tickets.meeting.ietf.org/wiki/nat64
for a list of current known issues!

comment:3 Changed 5 months ago by Bill Fenner

"me too". I see no RAs arriving -- e.g., I have only seen 4 RAs since 15:05:19:

fenner:noc-dns fenner$ date; sudo tcpdump -i en0 -s 0 -n 'ip6[40] == 134'
Tue Nov 14 15:05:19 +08 2017
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:06:42.890004 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:06:47.899741 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:07:44.928119 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:07:49.001407 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96

and it's now 15:23. The utun devices seem to be a red herring, it's not that they're pulling the default route away from en0, it's that en0 is losing the default route.
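
To put numbers on gaps like the ones in this capture, a small script can read tcpdump's text output and report the time between consecutive RAs. This is only a sketch: the names `ra_times` and `ra_gaps` are hypothetical, it assumes the one-line-per-packet format shown above, and it does not handle a capture that crosses midnight.

```python
import re
from datetime import datetime

# Matches tcpdump text lines such as:
# 15:06:42.890004 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
RA_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{6}) IP6 \S+ > \S+ ICMP6, router advertisement"
)

# The four RA lines from the capture in this comment.
SAMPLE = [
    "15:06:42.890004 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96",
    "15:06:47.899741 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96",
    "15:07:44.928119 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96",
    "15:07:49.001407 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96",
]


def ra_times(lines):
    """Return the time-of-day stamps of the RA lines, ignoring all other lines."""
    times = []
    for line in lines:
        match = RA_LINE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
    return times


def ra_gaps(lines):
    """Return the seconds elapsed between each pair of consecutive RAs."""
    times = ra_times(lines)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in ra_gaps(SAMPLE):
        print(f"{gap:.3f} s")
```

Feeding it a longer capture (for example, tcpdump output saved to a file and read with `open(path).read().splitlines()`) would show whether the gaps stay near the expected spacing or grow to many minutes.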

comment:4 Changed 5 months ago by stephenng.ietf@…

This issue also reproduces on Windows 10 Enterprise (10.0.15063.674).
The default route disappears after about 20 minutes of being associated to the -nat64 SSID.
Reassociating seems to work around it.
From what I see, it seems to happen only in the v6 session rooms where the v4 SSIDs are disabled (no proof, may be wrong).

A packet capture covering the initial association through to the failure has been taken; please let me know if you need it.

comment:5 Changed 5 months ago by jim@…

The NOC has reproduced this in Canning during 6MAN. We've reverted back to the normal Dual Stack SSIDs being available, but will test again in Canning for OPSAWG.

comment:6 Changed 5 months ago by jim@…

Just for anyone playing along at home, RA suppression is intentional to limit network impact. Specifically, the wireless controller is configured to allow 3 RAs through every 60 seconds, so the following is "expected behavior":

Jim@Dhcp-00d5:/>sudo tcpdump -i en0 -s 0 -n 'ip6[40] == 134'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:29:31.202621 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:29:36.200147 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:29:59.305427 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:30:30.383088 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:30:35.212021 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:30:39.220282 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:31:33.301240 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:31:37.306255 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:31:42.304711 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96

However, what we saw in 6MAN was MUCH larger gaps than this. We're still exploring what happened there. Please report if you see this misbehavior happening again...
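
A quick way to compare a capture against the "3 RAs every 60 seconds" policy is to count RAs per minute. The sketch below buckets by wall-clock minute, which is an assumption: the ticket does not say how the controller aligns its 60-second window. The function name `ras_per_minute` is hypothetical, and `CAPTURE` is simply the output pasted above.

```python
from collections import Counter

# RA lines copied from the capture in this comment.
CAPTURE = """
15:29:31.202621 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:29:36.200147 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:29:59.305427 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:30:30.383088 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:30:35.212021 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:30:39.220282 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:31:33.301240 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:31:37.306255 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
15:31:42.304711 IP6 fe80::1:1 > ff02::1: ICMP6, router advertisement, length 96
"""


def ras_per_minute(lines):
    """Count RA lines per wall-clock minute, keyed by "HH:MM"."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Keep only tcpdump packet lines that describe a router advertisement.
        if len(fields) > 1 and fields[1] == "IP6" and "router advertisement" in line:
            counts[fields[0][:5]] += 1  # first five characters are "HH:MM"
    return dict(counts)


if __name__ == "__main__":
    for minute, count in sorted(ras_per_minute(CAPTURE.splitlines()).items()):
        print(minute, count)
```

A minute with a count of zero never appears in the result, so a long outage shows up as missing keys rather than as a low count.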

comment:7 Changed 5 months ago by Clemens Schrimpe

My 2¢: Up the RA rate!

At present RAs are being "broadcast" (802.11-wise), hence not protected against interference stepping on them. If you happen to be in a noisy area, that might smudge enough of them that your device forgets about the router.

We could also extend the lifetime of our RAs, as our MX80s are the *only* official routers in each subnet and -due to VRRP- they both share the same link-local address anyway. So it's them or nobody → looooong lifetime is ok!

comment:8 in reply to:  7 Changed 5 months ago by cpetrie@…

Replying to Clemens Schrimpe:

We could also extend the lifetime of our RAs, as our MX80s are the *only* official routers in each subnet and -due to VRRP- they both share the same link-local address anyway. So it's them or nobody → looooong lifetime is ok!

This.
The prefix info has a valid lifetime of 30 days, and a preferred lifetime of 7 days.
If your current VRRP setup means that the RA next-hop L2/L3 info should not have to change, then there is no reason for the router lifetime to be set to 10 minutes.

My 2c :)

comment:9 in reply to:  7 Changed 5 months ago by Bill Fenner

Replying to Clemens Schrimpe:

My 2¢: Up the RA rate!

At present RAs are being "broadcast" (802.11-wise)

This is not the case. Each AP unicasts the RAs to each associated station.

There is strong evidence that the AP "forgets" about a station, dropping it from the list of stations it unicasts the RA to:

1) Warren added a default route explicitly pointing to the router, and immediately started getting RAs again.
2) Jen solicited a router advertisement, and immediately started getting (all) RAs again.

comment:10 Changed 5 months ago by Clemens Schrimpe

Hmmm, that was my assumption too and I asked yesterday, but was assured that this mode of operation does not apply to our current setup?!

So if there are different opinions about this, we should investigate and *verify* by sniffing the air, because this plays an important role in this case.

comment:11 in reply to:  10 Changed 5 months ago by Bill Fenner

Replying to Clemens Schrimpe:

Hmmm, that was my assumption too

On my part, it is not an assumption. Both Warren and I saw packets unicast at the MAC layer to our individual MAC addresses, and on #wireless in slack, Joe verified:

The APs are turning this into unicast.

comment:12 Changed 5 months ago by jim@…

This morning, the RA lifetime was changed from 600s to 9000s to ameliorate the bad behavior. We'll continue to monitor and will debug with Cisco to see if there's a bug here...
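
As a rough illustration of why the longer lifetime helps: a host keeps its default router only until the router lifetime from the last RA it received runs out. The sketch below applies the 600s and 9000s values to the timing from comment:3 (last RA at about 15:07:49, route checked at 15:23). The function names are hypothetical, and the model assumes no further RAs arrive in between.

```python
from datetime import datetime, timedelta


def default_route_expiry(last_ra, router_lifetime_s):
    """Time at which the default router learned from the last RA expires."""
    return last_ra + timedelta(seconds=router_lifetime_s)


def has_default_route(now, last_ra, router_lifetime_s):
    """True while the router lifetime from the last RA has not yet run out."""
    return now < default_route_expiry(last_ra, router_lifetime_s)


if __name__ == "__main__":
    # Timing taken from comment:3; seconds rounded down.
    last_ra = datetime(2017, 11, 14, 15, 7, 49)
    now = datetime(2017, 11, 14, 15, 23, 0)
    for lifetime in (600, 9000):
        expiry = default_route_expiry(last_ra, lifetime).time()
        print(lifetime, expiry, has_default_route(now, last_ra, lifetime))
```

With 600s the route would already have expired by 15:23, while with 9000s the same gap in RAs would be tolerated. The longer lifetime only masks the missing RAs; it does not restore them.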

comment:13 Changed 5 months ago by jim@…

Resolution: pending
Status: changed from assigned to closed

This has been clearly identified as a bug on the Cisco side. Joe Clarke has collected extensive information on the misbehavior and has submitted Cisco Bug CSCvg79061 as a Severe. We'll work with Cisco in hopes of getting this resolved before London. Closing out this ticket as Pending as there's no further action for this meeting.
