N7K vPC limitation with single 10GB line card – what to do?

THE SETUP

Dual 7K (with dual sup and single 10gb line card such as N7K-M132XP-12 in each 7K)

[Diagram: NexusEnv]

vPC domain between 7Ks and 5Ks
- Physically connected on single 10gb line card
- vPC 2

Peer-link between 7Ks
- Physically connected on single 10gb line card
- vPC 1

Keep alive link between 7Ks
- Physically connected on separate line card

Dual 5K
- Peer-link between 5Ks
- Keep alive link between 5Ks

A pretty typical data center deployment of a Nexus environment.


Here’s the drag. The peer-link on the 7Ks is on the same 10gb card as the links to the 5Ks. The keep alive link is on a different line card. If the 10gb line card fails on 7K1 (primary role for vPC), 7K2 will lose its peer-link to 7K1. This is where you want 7K2 to take over. No dice. The keep alive link is still up, so 7K2 thinks that 7K1 is still up and active. Since 7K2 lost its peer-link, it shuts down all its vPCs to avoid a split-brain scenario, including the vPC to the 5Ks. So now you have completely lost connectivity to the 5Ks and anything else on vPCs connected to the 7Ks. Bad day when this happens.
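To make that failure logic concrete, here is a minimal Python sketch of how the secondary reacts, as I understand it. The names (`Action`, `secondary_reaction`) are made up for illustration and are not anything from NX-OS, and the real switch has more states and timers than this toy model.

```python
from enum import Enum


class Action(Enum):
    NO_CHANGE = "no change"
    SUSPEND_VPCS = "suspend all vPCs"
    BECOME_OPERATIONAL_PRIMARY = "become operational primary"


def secondary_reaction(peer_link_up: bool, keepalive_up: bool) -> Action:
    """Simplified (assumed) reaction of the vPC secondary to link state."""
    if peer_link_up:
        # Peer-link is healthy, nothing to decide.
        return Action.NO_CHANGE
    if keepalive_up:
        # The peer still answers keepalives, so it looks alive.
        # The secondary steps aside to avoid split brain.
        return Action.SUSPEND_VPCS
    # Peer-link and keepalive are both gone: assume the peer is dead.
    return Action.BECOME_OPERATIONAL_PRIMARY


if __name__ == "__main__":
    # Line card failure on 7K1: peer-link down, keepalive still up.
    print(secondary_reaction(peer_link_up=False, keepalive_up=True).value)
```

The single line card failure lands in the middle branch, which is exactly the case you don’t want.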

What to do? Technically, the easiest solution is to make sure you have dual 10gb line cards and your vPC links and peer-links are spread across them. Now, when you lose a line card, you only lose one link and your vPCs and port-channels stay up. The downside to this solution is that it costs money. Enter the classic issue of dealing with a budget: we don’t have any money right now, can we do this next year, etc, etc, etc.

So what can we do now? Object-tracking.

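Basically, object tracking allows you to watch interfaces and take action when they go down. Tied to the vPC domain, a tracked list going down makes that switch suspend its own vPCs and its keep alive, so the peer stops believing it is still active.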
Basic object tracking config (also reference page 33 of the guide I have linked to below):

track 1 interface port-channel1 line-protocol
track 2 interface port-channel2 line-protocol

track 5 list boolean OR
  object 1
  object 2

vpc domain 10
  track 5
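If the boolean part is fuzzy, here is a small Python sketch of how I read the track list. It assumes an OR list is up while any tracked object is up, and an AND list is up only while every tracked object is up. The function name `track_list_up` is hypothetical, and this is only a truth table, not how the switch implements it.

```python
from typing import Sequence


def track_list_up(operator: str, objects_up: Sequence[bool]) -> bool:
    """Assumed state of a boolean track list given its member objects."""
    if operator == "OR":
        # Up as long as at least one tracked object is up.
        return any(objects_up)
    if operator == "AND":
        # Up only when every tracked object is up.
        return all(objects_up)
    raise ValueError(f"unknown operator: {operator}")


if __name__ == "__main__":
    # Objects are [port-channel1 (peer-link), port-channel2 (vPC to 5Ks)].
    for state in ([True, True], [False, True], [False, False]):
        print(state, "OR:", track_list_up("OR", state),
              "AND:", track_list_up("AND", state))
```

The interesting row is the one where only the peer-link is down: under this reading the OR list stays up, while the AND list goes down.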

Awesome. Let’s go ahead and implement and test failover. Telling the business you are going to take the core of the data center up and down can sometimes be met with a bit of apprehension. Of course, if you happen to have a pair of 7Ks hanging around in your lab then you are golden. But ain’t no one got budget for that. So you develop your implementation / testing / failover plan (which includes taking down the peer-link, vPC link, etc to validate object tracking is actually working) and go to the business for approval of the change.

Here’s how the conversation goes:

Network Dude: We are asking for a change window to implement something that will make us much more redundant in our Data Center. It can potentially save us from a large scale outage.
Business: OK. What does the change impact?
Network Dude: All traffic that goes through the core of the Data Center.
Business: I don’t understand when you say things like ‘traffic’ and ‘core’.
Network Dude: (sigh) Access to any application, Internet, websites, etc.
Business: WTF?!?!

Now what?
Cisco Nexus Gold Lab

Contact your Cisco SE to hook you up with a time slot for the Cisco Nexus Gold Lab. The purpose of the lab is to learn about the Nexus options and how to config the environment. But you can use it to test object-tracking and take stuff up and down as much as you want.

I went through the lab testing multiple variations of object-tracking. The documentation on the boolean AND / OR wasn’t crystal clear to me, so here are my test results:

FYI: all tests use the config referenced above, changing only the boolean AND/OR

TEST 1 – NO OBJECT TRACKING
7K1
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K2
N7K-C2-2-pod5 %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary
FAIL

TEST 2 – AND / AND
Enable object tracking with boolean AND on 7K1
Enable object tracking with boolean AND on 7K2
7K1
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K2
N7K-C2-2-pod5 %VPC-2-TRACK_INTFS_DOWN: In domain 10, vPC tracked interfaces down, suspending all vPCs and keep-alive
FAIL

TEST 3 – AND / OR
Enable object tracking with boolean AND on 7K1
Enable object tracking with boolean OR on 7K2
7K1
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K2
vPC role : secondary, operational primary
7K2 links did not shut down. Now let’s test what happens if 7K2 goes down, after bringing everything back up to normal.
7K2
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K1
N7K-C2-1-pod5 %VPC-2-TRACK_INTFS_DOWN: In domain 1, vPC tracked interfaces down, suspending all vPCs and keep-alive
FAIL

TEST 4 – OR / OR
Enable object tracking with boolean OR on 7K1
Enable object tracking with boolean OR on 7K2
7K1
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K2
vPC role : secondary, operational primary
vPC status
----------------------------------------------------------------------
id   Port   Status Consistency Reason       Active vlans
--   ----   ------ ----------- ------       ------------
20   Po20   up     success     success      1-4

7K2 links did not shut down. Now let’s test what happens if 7K2 goes down, after bringing everything back up to normal.
7K2
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K1
vPC keep-alive status : Suspended (Destination IP not reachable)
vPC role : primary
vPC status
----------------------------------------------------------------------
id   Port   Status Consistency Reason       Active vlans
--   ----   ------ ----------- ------       ------------
20   Po20   up     success     success      1-4

BOOM!
Simulate the 10gb line card going down on either 7K and connectivity to the 5Ks stays up!

I also tested the following under the OR / OR scenario:
Take down Peer Link on 7K1
Take down Peer Link on 7K2
Take down vPC link from 7K1 to 5K
Take down vPC link from 7K2 to 5K

Failover was successful for each of these tests.
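For anyone who wants to reason through the four line card tests without lab time, here is a toy Python model of them. It is a sketch built on assumptions, not Cisco’s logic: the failed switch loses both port-channels, the survivor loses only the peer-link, a switch whose track list goes down suspends its vPCs and its keep alive (per the TRACK_INTFS_DOWN log message), and 7K1 is the primary. All function names are made up.

```python
from typing import Dict, Optional, Sequence


def track_list_up(operator: str, objects_up: Sequence[bool]) -> bool:
    """Assumed semantics: OR is up if any object is up, AND if all are."""
    return any(objects_up) if operator == "OR" else all(objects_up)


def tracked_down(operator: Optional[str], objects_up: Sequence[bool]) -> bool:
    """True when tracking is configured and the tracked list is down."""
    return operator is not None and not track_list_up(operator, objects_up)


def survivor_keeps_forwarding(failed_op: Optional[str],
                              survivor_op: Optional[str],
                              survivor_is_primary: bool) -> bool:
    # Failed switch: line card gone, so Po1 (peer-link) and Po2 (vPC) are down.
    failed_objects = [False, False]
    # Survivor: its end of the peer-link drops, its own vPC link stays up.
    survivor_objects = [False, True]

    if tracked_down(survivor_op, survivor_objects):
        # Survivor's own tracking fires and it suspends its vPCs.
        return False
    # The failed switch keeps sending keepalives unless its tracking fired.
    keepalive_heard = not tracked_down(failed_op, failed_objects)
    if keepalive_heard and not survivor_is_primary:
        # Secondary sees peer-link down but peer alive: suspends its vPCs.
        return False
    return True


def run_test(op_7k1: Optional[str], op_7k2: Optional[str]) -> Dict[str, bool]:
    """Fail each 7K in turn and report whether the other keeps forwarding."""
    return {
        "7K1 fails": survivor_keeps_forwarding(op_7k1, op_7k2, False),
        "7K2 fails": survivor_keeps_forwarding(op_7k2, op_7k1, True),
    }


if __name__ == "__main__":
    for ops in [(None, None), ("AND", "AND"), ("AND", "OR"), ("OR", "OR")]:
        print(ops, run_test(*ops))
```

Under these assumptions, OR / OR is the only combination where the surviving 7K keeps forwarding no matter which side loses its line card, which lines up with what I saw in the lab.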


Object tracking is not designed to be the key in your massive fail-proof design. It will most likely not satisfy every single fail condition that could occur. Generally, it is better to go with redundant hardware. Push for redundant 10gb cards in each 7K. However, object tracking certainly can be used as the poor man’s solution. :)

Cisco vPC guide:

http://www.cisco.com/en/US/docs/switches/datacenter/sw/design/vpc_design/vpc_best_practices_design_guide.pdf

2 thoughts on “N7K vPC limitation with single 10GB line card – what to do?”

  1. I’m in the same boat with a single 10G line card. There wasn’t enough money to buy redundancy. This is twice in my career that management went with the product they couldn’t afford.

    I have been waiting for it to fail so I have more justification for redundancy.

  2. Joe – I know when my org originally looked at redundant 10G line cards, they said no due to price as well. This was a few years ago. Today those M1 10G cards are half the price they were a few years ago. Maybe the price is right now for your mgmt/budget!
