Feb 25

QOS on the 4500-E with IOS XE is different from the older 4500s.

For years I have configured QOS on Cisco switches, and the 4500s and 6500s have always caused me the most frustration. Depending on the line card, you may have 2 or 4 hardware queues, and the priority queue sits in a different place on each platform. The 4500s and 6500s are different from the other Cisco platforms.

I recently had the privilege of setting up a new 4506-E running IOS XE 3.3.0XO (15.1(1)XO). For the most part, this switch was very similar to the older 4500s, and most of my configuration from the other 4500s pasted easily into it. I was doing well until I got to the QOS portion of the config. I found that, with the exception of the marking policy, nothing else worked.

What did not work?
1. Trust DSCP commands on the uplink ports
2. Egress queuing: mapping COS values to hardware queues
3. COS-to-DSCP mapping
4. Selecting the priority queue (which queue was the priority queue?)

After some more digging, I found out that the 4500-E trusts DSCP and COS values by default. This explains why the trust commands would not work. This default may cause new challenges of its own. If you are not careful, an end user or application could put all of its traffic in the EF queue and use up all of your priority bandwidth. To resolve this, I re-mark all traffic coming in on edge ports. How do you handle this?
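Here is a minimal sketch of that edge re-marking idea, assuming a standard MQC input policy on IOS XE. The ACL name, class and policy names, UDP port range, and interface are hypothetical placeholders and are not my production config. Anything that does not match a known class is re-marked to DSCP 0, so a host cannot grant itself EF just by marking its own packets.

 ! Hypothetical names, port range and interface; adjust to your environment
 ip access-list extended VOICE-RTP-ACL
  permit udp any any range 16384 32767
 !
 class-map match-all EDGE-VOICE
  match access-group name VOICE-RTP-ACL
 !
 policy-map EDGE-MARKING
  class EDGE-VOICE
   set dscp ef
  ! Everything else arriving on the edge port is reset to best effort
  class class-default
   set dscp default
 !
 interface GigabitEthernet2/1
  service-policy input EDGE-MARKING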

Egress queuing is done with a policy map, just like on a router. The switch comes with 8 hardware queues per port. (I'm very happy to see that Cisco finally added 8 hardware queues per port on their switches. Other vendors have been doing this for years.) In your policy map, you identify which class should be the priority queue. Then, under each remaining class, you specify the bandwidth you want to give that queue. For more information on how to configure the policy map, please refer to the Cisco IOS XE documentation. You do need to be careful while going through it. IOS XE also runs on routers, so you may find documentation that applies only to ASR routers and does not work on the 4500 platform.
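As a rough sketch, an egress policy along these lines might look like the following, assuming a Sup7-E class supervisor. The class names, DSCP groupings, percentages, and uplink interface are illustrative assumptions. Check each command against the 4500 IOS XE configuration guide (not the ASR guides) before using it. Notice that no queue numbers appear anywhere: the priority class is selected with "priority", and the other classes simply get a share of the remaining bandwidth.

 ! Hypothetical class/policy names and illustrative percentages
 class-map match-any PRIORITY-TRAFFIC
  match dscp ef
 class-map match-any VIDEO-TRAFFIC
  match dscp af41
 class-map match-any CRITICAL-DATA
  match dscp af21 af31
 !
 policy-map EGRESS-QUEUING
  class PRIORITY-TRAFFIC
   priority
  class VIDEO-TRAFFIC
   bandwidth remaining percent 30
  class CRITICAL-DATA
   bandwidth remaining percent 30
  class class-default
   bandwidth remaining percent 25
 !
 interface TenGigabitEthernet1/1
  service-policy output EGRESS-QUEUING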

COS-to-DSCP mappings may not be needed. I mark all of my traffic at the edge port with DSCP values. I mark DSCP rather than COS because COS is carried in the 802.1Q tag. That means COS is lost whenever traffic crosses an untagged access port, while DSCP lives in the IP header and survives. With IOS XE, the outbound queuing policy can queue egress traffic by DSCP value. Because of this, I don't have to worry about the COS-to-DSCP or DSCP-to-COS mappings.

Now for the priority queue and mapping QOS markings to a hardware queue. This has been a major frustration for me because of the different capabilities and commands across the variety of Cisco platforms. To me, it has been no different than if every platform came from a different vendor. Given those past challenges, I was getting really upset when I couldn't find any documentation on how to perform this mapping. After speaking with my Cisco SE, I found out that IOS XE automatically places the different classes in your policy map into different hardware queues. The fixed, platform-specific priority queue is GONE!!! The unique commands for mapping QOS markings to hardware queues are GONE!!!
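Since the queue assignment is automatic, the easiest way I know to confirm how traffic is actually being handled is the standard MQC show command against the interface carrying the output policy. The interface name below is a hypothetical example, and the counters displayed vary by release.

 ! Hypothetical interface; use the port where your output policy is attached
 show policy-map interface TenGigabitEthernet1/1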

Even though this new IOS was frustrating at first, I believe its QOS changes are a drastic improvement. The old method was more difficult than I feel it should be. As long as you understand Cisco's MQC logic, the new default trust behavior, and the automatic queuing, I believe you will find this IOS much better to work with. Do you agree?

Other than marking and queuing, what other changes have you noticed in this new version of software?
Are you happy with the newer IOS XE on the 4500?

Nov 26

N7K vPC limitation with single 10GB line card – what to do?

THE SETUP

Dual 7K (with dual sup and single 10gb line card such as N7K-M132XP-12 in each 7K)


vPC domain between 7Ks and 5Ks
– Physically connected on single 10gb line card
– vPC 2

Peer-link between 7Ks
– Physically connected on single 10gb line card
– Port-channel 1

Keep alive link between 7Ks
– Physically connected on separate line card

Dual 5K
– Peer-link between 5Ks
– Keep alive link between 5Ks

Pretty typical Nexus data center deployment.
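For reference, the 7K side of this setup might look roughly like the following on 7K1. This is a sketch under assumptions. The module/port numbers and keepalive addresses are hypothetical. The keepalive is shown in vrf management for brevity, even though in my setup it ran on a separate line card. Domain 10 and port-channels 1 and 2 match the tracking config later in this post.

 ! Hypothetical ports and addresses; domain/port-channel numbers match the tracking config
 feature vpc
 feature lacp
 !
 vpc domain 10
   peer-keepalive destination 192.0.2.2 source 192.0.2.1 vrf management
 !
 ! Peer-link members, all on the single 10Gb card
 interface ethernet 1/1-2
   switchport
   switchport mode trunk
   channel-group 1 mode active
 interface port-channel1
   switchport mode trunk
   vpc peer-link
 !
 ! vPC to the 5Ks, also on the same 10Gb card
 interface ethernet 1/3-4
   switchport
   switchport mode trunk
   channel-group 2 mode active
 interface port-channel2
   switchport mode trunk
   vpc 2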


Here's the drag. The peer-link on the 7Ks is on the same 10Gb card as the links to the 5Ks, while the keepalive link is on a different line card. If the 10Gb line card fails on 7K1 (primary vPC role), 7K2 loses its peer-link to 7K1. This is where you want 7K2 to take over. No dice. The keepalive link is still up, so 7K2 thinks 7K1 is still up and active. Since 7K2 lost its peer-link but its peer still looks alive, it shuts down all of its vPCs to avoid a split-brain scenario, including the vPC to the 5Ks. Meanwhile, 7K1's own links to the 5Ks died with the line card. So now you have lost all connectivity to the 5Ks and anything else on vPCs connected to the 7Ks. Bad day when this happens.

What to do? Technically, the easiest solution is to make sure you have dual 10Gb line cards with your vPC links and peer-links spread across them. Now, when you lose a line card, you only lose one link, and your vPCs and port-channels stay up. The downside to this solution is that it costs money. Enter the classic budget conversation: we don't have any money right now, can we do this next year, etc., etc., etc.
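In config terms, the redundant-hardware fix is just a matter of building each port-channel from members on two different modules, so a single card failure only removes one member. This is a sketch: modules 1 and 2 and the port numbers are hypothetical, and the port-channel numbers match the tracking config used in this post.

 ! Hypothetical: module 1 = existing 10Gb card, module 2 = second 10Gb card
 interface ethernet 1/1, ethernet 2/1
   switchport
   switchport mode trunk
   channel-group 1 mode active
 !
 interface ethernet 1/3, ethernet 2/3
   switchport
   switchport mode trunk
   channel-group 2 mode active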

So what can we do now? Object-tracking.

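Basically, object tracking lets you watch the state of interfaces and take action when they change. When the object tracked under the vPC domain goes down, the switch suspends its vPCs and its keepalive, so the peer can take over as operational primary.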
Basic object tracking config (also reference page 33 of the guide I have linked to below):

track 1 interface port-channel1 line-protocol
track 2 interface port-channel2 line-protocol

track 5 list boolean OR
  object 1
  object 2

vpc domain 10
  track 5
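Before any failover testing, a couple of show commands help confirm that the tracked objects and the list are in the expected state. The object number matches the config above. Output formats differ between NX-OS releases, so I'm not reproducing sample output here. With boolean OR, object 5 stays up as long as at least one of the two port-channels is up. It goes down only when both are down, which is exactly what happens when the single 10Gb card dies.

 show track
 show track 5
 show vpc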

Awesome. Let's go ahead and implement and test failover. Telling the business you are going to take the core of the data center up and down can sometimes be met with a bit of apprehension. Of course, if you happen to have a pair of 7Ks hanging around in your lab, then you are golden. But ain't no one got budget for that. So you develop your implementation, testing, and failover plan, which includes taking down the peer-link, vPC links, etc. to validate that object tracking is actually working. Then you go to the business for approval of the change.

Here’s how the conversation goes:

Network Dude: We are asking for a change window to implement something that will make us much more redundant in our Data Center. It can potentially save us from a large scale outage.
Business: OK. What does the change impact?
Network Dude: All traffic that goes through the core of the Data Center.
Business: I don’t understand when you say things like ‘traffic’ and ‘core’.
Network Dude: (sigh) Access to any application, Internet, websites, etc.
Business: WTF?!?!

Now what?
Cisco Nexus Gold Lab

Contact your Cisco SE to hook you up with a time slot in the Cisco Nexus Gold Lab. The purpose of the lab is to learn about the Nexus options and how to configure the environment. But you can use it to test object tracking and take stuff up and down as much as you want.

I went through the lab testing multiple variations of object tracking. The documentation on boolean AND / OR wasn't crystal clear to me, so here are my test results:

FYI, all testing used the config referenced above, changing only the boolean AND/OR.
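For clarity, the AND variant used in some of the tests below differs from the config above only in the list keyword. With AND, object 5 goes down as soon as either port-channel goes down. This is a sketch of the test variant, not a recommendation. Depending on the release, you may have to remove and re-create the list to change its type.

 track 5 list boolean AND
   object 1
   object 2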

TEST 1 – NO OBJECT TRACKING
7K1
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K2
N7K-C2-2-pod5 %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary
FAIL

TEST 2 – AND / AND
Enable object tracking with boolean AND on 7K1
Enable object tracking with boolean AND on 7K2
7K1
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K2
N7K-C2-2-pod5 %VPC-2-TRACK_INTFS_DOWN: In domain 10, vPC tracked interfaces down, suspending all vPCs and keep-alive
FAIL

TEST 3 – AND / OR
Enable object tracking with boolean AND on 7K1
Enable object tracking with boolean OR on 7K2
7K1
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K2
vPC role : secondary, operational primary
7K2 links did not shut down. Now let's test what happens if 7K2 goes down. Bringing everything back up to normal first.
7K2
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K1
N7K-C2-1-pod5 %VPC-2-TRACK_INTFS_DOWN: In domain 1, vPC tracked interfaces down, suspending all vPCs and keep-alive
FAIL

TEST 4 – OR / OR
Enable object tracking with boolean OR on 7K1
Enable object tracking with boolean OR on 7K2
7K1
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K2
vPC role : secondary, operational primary
vPC status
----------------------------------------------------------------------
id   Port   Status  Consistency  Reason   Active vlans
--   ----   ------  -----------  ------   ------------
20   Po20   up      success      success  1-4

7K2 links did not shut down. Now let's test what happens if 7K2 goes down. Bringing everything back up to normal first.
7K2
Shut down interfaces on 10gb line card (simulates line card going down). The interfaces include 7K peer link and vPC link going to 5Ks
7K1
vPC keep-alive status : Suspended (Destination IP not reachable)
vPC role : primary
vPC status
----------------------------------------------------------------------
id   Port   Status  Consistency  Reason   Active vlans
--   ----   ------  -----------  ------   ------------
20   Po20   up      success      success  1-4

BOOM!
Simulating 10gb line card going down on either 7K and connectivity to the 5Ks stays up!

I also tested the following under the OR / OR scenario:
Take down Peer Link on 7K1
Take down Peer Link on 7K2
Take down vPC link from 7K1 to 5K
Take down vPC link from 7K2 to 5K

Failover was successful for each of these tests.
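If you want to repeat these tests, shutting the member ports is a simple way to take links down on one 7K. Shut everything on the 10Gb card to simulate the card failing, or shut only the peer-link to test that case. This is a sketch: the module/port range is hypothetical, and the port-channel number follows the tracking config above. On a production pair, only do this in an approved change window.

 ! Hypothetical: all peer-link and vPC members live on module 1, ports 1-4
 configure terminal
 interface ethernet 1/1-4
   shutdown
 ! Check "show vpc" on both 7Ks, then restore
 interface ethernet 1/1-4
   no shutdown
 ! Peer-link only
 interface port-channel1
   shutdown
 interface port-channel1
   no shutdown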


Object tracking is not designed to be the key to a massive fail-proof design, and it will most likely not cover every failure condition that could occur. Generally, it is better to go with redundant hardware, so push for redundant 10Gb cards in each 7K. However, object tracking certainly can be used as the poor man's solution. 🙂

Cisco vPC guide:

http://www.cisco.com/en/US/docs/switches/datacenter/sw/design/vpc_design/vpc_best_practices_design_guide.pdf