Home Lab: Mikrotik 10GbE Networking Poor Performance

Lessons Learned
How did I fix it?
Conclusion

Lessons Learned:

I have been using the Mikrotik CRS326-24S+2Q+ switch/router for about 6 to 8 months, exclusively in my home lab. Over the last few months, I have been experiencing timeouts, packet drops and general network slowness that you wouldn't expect from a 10 Gb switch.

Having had some downtime, I thought I'd investigate.

Without going into too much detail, I discovered the following issues:

  • The CPU was running very high
  • File transfers from one subnet to another ran at under 1 Gbps
  • File transfers within the same subnet ran at under 2.5 Gbps
  • The router would randomly reboot

One of the first things I'd like to highlight is that I find the Mikrotik documentation confusing and not as straightforward as other vendors', but the software and hardware are sound.

So what did I do wrong? Or was it the hardware or the software?

Well, to be truthful, it was me and my config. But why, and how?

First, it was clear that the CPU on my switch was being overloaded. These switches have small but efficient CPUs that are not designed to do lots of work.

After a few hours of trying to understand what was happening, it was clear I wasn't getting anywhere near line rate on my 10 Gb switch. But why?
After digging into more wikis, I discovered this:
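As a rough sketch of how the CPU load can be checked from the RouterOS terminal, the two commands below print the overall resource usage and a per-process breakdown. The exact process names shown by the profiler vary by RouterOS version, so treat the output as a guide rather than a definitive diagnosis.

```routeros
# Overall resource snapshot, including the cpu-load field
/system resource print

# Live per-process CPU breakdown (press Q to stop)
/tool profile
```

If bridging or networking sits at the top of the profile while traffic is flowing, that suggests frames are being forwarded by the CPU rather than by the switch chip.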

Mikrotik bridges are like logical switches within a physical switch; they are a means of isolating traffic from other interfaces. I had created a bridge per VLAN, and each physical interface had a logical VLAN interface that was a member of that VLAN's bridge. So, in short, all my VLANs were running in software and hitting the CPU.

How did I fix it?

According to the Mikrotik wiki, I could only have one Mikrotik bridge with hardware offloading.
Before I made any changes, I backed up my config. I then removed all the VLAN configuration, removed all unused ports from the bridge configuration, and finally removed all the bridges.
This left me with a clean slate to work with. I then created a new bridge and added the interfaces to it in the port configuration. Here I noticed a change: before the cleanup, none of my ports had an H flag against them. The H flag means that the port is hardware offloaded, so any L2 traffic through it should be switched at line rate, i.e. 10 Gb!
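For reference, a backup along these lines can be taken from the terminal before touching anything. The file name before-bridge-cleanup is just a placeholder I picked for illustration.

```routeros
# Binary backup, restorable on the same device
/system backup save name=before-bridge-cleanup

# Plain-text export of the config, handy for reading back the old setup
/export file=before-bridge-cleanup
```

The text export is worth keeping even if the binary backup exists, since it can be read and partially re-applied by hand.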
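A minimal sketch of the single-bridge setup looks something like the following. The bridge name bridge1 and the port names sfp-sfpplus1 and sfp-sfpplus2 are assumptions for illustration; substitute whatever your ports are actually called.

```routeros
# One bridge for the whole switch
/interface bridge
add name=bridge1

# Add the physical ports directly to that bridge
/interface bridge port
add bridge=bridge1 interface=sfp-sfpplus1
add bridge=bridge1 interface=sfp-sfpplus2

# List the ports; hardware-offloaded ports show an H in the flags column
/interface bridge port print
```

If a port is listed without the H flag, something in the configuration is forcing that port back onto the CPU.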

The other change I made was to stop configuring the VLANs on the individual interfaces, which is how I had originally set things up.
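To illustrate the pattern to avoid, here is a sketch of the kind of setup described earlier: a VLAN sub-interface on each physical port, all joined into a bridge created just for that VLAN. This is a reconstruction for illustration, not my exact old config, and every name in it (vlan10-port1, bridge-vlan10, sfp-sfpplus1 and so on) is made up.

```routeros
# How NOT to do it: VLAN sub-interfaces on each physical port...
/interface vlan
add name=vlan10-port1 vlan-id=10 interface=sfp-sfpplus1
add name=vlan10-port2 vlan-id=10 interface=sfp-sfpplus2

# ...bridged together in a dedicated per-VLAN bridge
/interface bridge
add name=bridge-vlan10
/interface bridge port
add bridge=bridge-vlan10 interface=vlan10-port1
add bridge=bridge-vlan10 interface=vlan10-port2
```

Because the bridge members here are VLAN interfaces rather than physical ports, the switch chip cannot offload them and every frame is handled by the CPU.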


Instead, I created the VLAN on the bridge and added the ports I wanted in that VLAN. After some tests, I was getting transfer speeds of over 8 Gbps, which is a significant increase over the earlier rates. In addition, the CPU on the switch stayed at around 20% utilisation.
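As a hedged sketch of what the VLAN-on-the-bridge approach looks like, the fragment below defines VLAN 10 in the bridge VLAN table with one access port and one trunk port. It assumes the bridge1 and sfp-sfpplus port names used for illustration above, and the VLAN ID is arbitrary.

```routeros
# Untagged frames arriving on the access port are placed in VLAN 10
/interface bridge port
set [find interface=sfp-sfpplus1] pvid=10

# VLAN 10: sfp-sfpplus1 is an untagged (access) member, sfp-sfpplus2 carries it tagged
/interface bridge vlan
add bridge=bridge1 vlan-ids=10 tagged=sfp-sfpplus2 untagged=sfp-sfpplus1

# Turn filtering on last, once the VLAN table is complete
/interface bridge
set bridge1 vlan-filtering=yes
```

Enabling vlan-filtering is deliberately the final step: switching it on before the table is complete can cut off management access to the switch.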

Now that I had L2 connectivity running at near line rate, I needed to look at L3 connectivity. L2 works brilliantly for single-site vSAN and localised NFS, but I also wanted to connect to services within my lab environment and use BGP to test NSX and similar products in various scenarios.

After more digging, I found out that L3 isn't offloaded on the switch (there is an update coming that will change this). This means that any L3 traffic goes via the switch's CPU. So I made the decision not to route on the switch and to use VyOS for all my BGP and L3 connectivity instead. I'll write up my routing topology in another blog post, but I can say that VLAN-to-VLAN communication through the VyOS router is getting 7 to 8 Gbps of throughput.

Conclusion:

I shouldn't have expected line-rate 10 Gb on L3, and I should have spent more time reading through and understanding the Mikrotik documentation. I have also discovered some great videos from the Mikrotik team. For the money, I can recommend these switches; they are an awesome product with a great feature set.