Stuck routing a VLAN over the same interface with netplan in Ubuntu



I have a Kubernetes server running Ubuntu 20.04, using a 10Gbit link to connect to my switch. It has an IP address of 192.168.4.x, and the switch port is an access port for VLAN 4. This works great, but all my NFS traffic goes across my firewall, bogging it down. I'd like to move NFS traffic off to 192.168.1.x, leaving 4.x for Kubernetes traffic only. If it helps, I'm using BGP as well... here are my subnets:

 

Default Lan: 192.168.1.0/24
Kubernetes hosts: 192.168.4.0/27
Kubernetes pods: 192.168.50.0/24

 
Storage: 192.168.1.110
Kubernetes host: 192.168.4.10

I am using BGP courtesy of metallb to advertise 192.168.5.0/24 out to the rest of my network via 192.168.4.10.   I'd like to maintain this, while having only storage-nfs traffic go over a 192.168.1.x address to the NFS server at 192.168.1.110.  

I am having trouble wrapping my head around how to do this with netplan. Would something like this work?

 

network:
  version: 2
  renderer: networkd
  ethernets:
    enp3s0f0:
      dhcp4: no
      addresses: [192.168.1.112/24]
      gateway4: 192.168.1.1
      nameservers:
        addresses: ["1.1.1.1", "8.8.8.8"]
  vlans:
    vlan4:
      id: 4
      link: enp3s0f0
      dhcp4: no
      addresses: [192.168.4.10/27]
      routes:
        - to: 192.168.4.0/27
          via: 192.168.4.1
          on-link: true


?  If yes, would I then also need to add a static route in my firewall to tell everything else that if it wants to talk to 192.168.4.10, it needs to go via 192.168.1.112 (the IP assigned to the untagged part of the interface)? Or would I need to do something like this:

 

network:
  version: 2
  renderer: networkd
  ethernets:
    enp3s0f0:
      dhcp4: no
      addresses: [192.168.1.112/24]
      gateway4: 192.168.1.1
      nameservers:
        addresses: ["1.1.1.1", "8.8.8.8"]
      routes:
        - to: 192.168.1.0/24
          via: 192.168.1.1
          table: 101
      routing-policy:
        - from: 192.168.1.0/24
          table: 101
  vlans:
    vlan4:
      id: 4
      link: enp3s0f0
      dhcp4: no
      addresses: [192.168.4.10/27]
      gateway4: 192.168.4.1
      routes:
        - to: 0.0.0.0/0
          via: 9.9.9.9
          on-link: true
?

Basically I'm trying to have the VLAN 4 interface be the default route and have all inbound/outbound traffic go over it, with the untagged part of the interface used only for storage (NFS) traffic. Since VLAN 1 is a protected VLAN, I don't think I can assign it to a VLAN interface. These two IPs will exist on the same physical 10Gb interface. Thanks in advance!
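One way to sanity-check whichever variant ends up applied (a sketch using the interface names and addresses from this post) is to ask the kernel which path it would actually pick for a given destination:

```shell
# After "sudo netplan apply", ask the kernel which route/source it would use.
# 192.168.1.110 is the NFS server, so it should resolve to the untagged
# interface with src 192.168.1.112:
ip route get 192.168.1.110

# Anything off-net should resolve via the vlan4 default route:
ip route get 8.8.8.8

# If the routing-policy variant is used, the extra rule and table show up here:
ip rule show
ip route show table 101
```

These commands only read the routing state, so they are safe to run while experimenting.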

Not exactly sure what you're trying to accomplish here.. oh, I get it, you don't want NFS traffic to route.. So you want a SAN (storage area network)..

 

So create a SAN - not sure what you're trying to do with routing.. If .110 and .112 are in the same L2, then they would use that address and interface to talk to the other IP in their network.

 

If you want this L2 you're using to be native (untagged), that is fine.. You just need to make sure that .110 and .112 are in the same L2. And this same network can also carry your tagged VLAN 4 traffic.. There is no reason to route this 192.168.1 network for devices directly attached to it.

 

Normally when setting up a SAN, you would not set a default gateway on this interface on the devices. If you want other networks that are not actual members of the 192.168.1 network to talk to it, then you could set up routes on the host they want to talk to, telling it how to get back to the source network.
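As a purely illustrative sketch of that return route (the 192.168.20.0/24 client network here is hypothetical, and the router address is an assumption):

```shell
# On the multihomed host (e.g. the storage box), tell it how to reach a
# client network back through a router on the SAN segment. Only needed if
# you really do want off-SAN hosts talking to the SAN address:
sudo ip route add 192.168.20.0/24 via 192.168.1.1 dev enp3s0f0
```

Without this route, replies to the off-SAN host would try to leave via the default gateway on the other interface (or nowhere, if the SAN interface has no gateway at all).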

 

But normally you would just talk to the multihomed host via its non-SAN IP.. so the SAN is left isolated, with no way to route to or from it..

 

I have a somewhat related setup... My PC's normal network is 192.168.9.0/24 and my NAS is also on this 192.168.9 network... but this is limited to gig.. I do not have any switch capable of more than gig interfaces. But I wanted my PC and NAS to talk at 2.5 gig.. This was done via a simple USB interface added to my PC and my NAS.. This uses a different network, or SAN: 192.168.10.0/24.

 

All of the other networks use the NAS's 192.168.9 address to talk to it. But my PC uses the 192.168.10.x address - so all traffic to and from my PC talking to the NAS via file-sharing protocols uses the 192.168.10 address. But when I want to manage the NAS, or talk to it over other protocols, I use the 192.168.9 address.

 

There is no routing at all of the 192.168.10 network, and no gateways on either the PC or the NAS for it.. my router doesn't even know about it.. etc..


@BudMan  Thanks for the reply. Yeah, basically I'm trying not to route NFS traffic inter-VLAN.

 

I have the following:
VLAN 1 (default) = normal LAN traffic
VLAN 4 = Kubernetes traffic

 

Although irrelevant, storage/K8s/firewall all link to the switch at 10Gb. As it is right now, any pod (or the K8s node) doing NFS traffic mounts 192.168.1.110:/whatever, meaning that NFS crosses 192.168.1.x <-> 192.168.4.x, traversing the firewall. With the hardware firewall I have (Palo Alto Networks PA-3020), this isn't an issue, since the ASIC in charge of the dataplane can handle it. But I'm moving to a virtual firewall (Palo Alto VM-Series), which relies on an off-the-shelf x86 CPU (Atom C3758). The Atom, while capable of gigabit routing, likely can't handle all the NFS packets going across it; it sends the one core that the dataplane uses up to around 85% usage. So the goal is to get NFS to travel intra-VLAN. As I see it, I have a few options.

Option 1.) As in the original post, set up netplan on the Kubernetes host to route over separate interfaces (probably the best option, since 10G can be for NFS and 1G can be for K8s traffic). This would also let me configure the NICs with an MTU of 9k, since they're the only things talking over that link.
Option 2.) Add a VLAN 4 IP to the storage appliance and have pods/host mount that. Keeps traffic intra-VLAN.

Option 3.) Move K8s to a 192.168.1.x address. Since the native interface of the K8s host was .4, I'd have to redeploy K8s with a .1 address; and since doing BGP (for advertising pod addresses) intra-VLAN isn't advised, this might pose an issue.

 

Given that I don't saturate 1Gb with pure pod traffic, I think option 1 is the easiest, keeping NFS traffic off the firewall. So, following what you said about setting up proper routing, I think it would be something like this:
 


network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:            #### 1G link on VLAN 4 for Kubernetes traffic
      dhcp4: no
      addresses: [192.168.4.10/27]
      gateway4: 192.168.4.1
      nameservers:
        addresses: ["1.1.1.1", "8.8.8.8"]
    enp3s0f0:        #### 10G link on VLAN 1 for NFS traffic
      dhcp4: no
      addresses: [192.168.1.112/24]
      mtu: 9000
      routes:
        - to: 192.168.1.0/24
          via: 192.168.1.1
          on-link: true
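If it helps, a safe way to roll a config like that out is `netplan try`, which reverts automatically if you lose connectivity and don't confirm within the timeout. A quick verification sketch (using the interface names and addresses from the config above):

```shell
sudo netplan try                  # applies the config, auto-reverts unless confirmed

ip -br addr show eno1 enp3s0f0    # confirm both addresses landed on the right NICs
ip route get 192.168.1.110        # NFS server should go out the 10G link, src .1.112
ip route get 1.1.1.1              # everything else should use the 1G VLAN 4 gateway
```

If `ip route get 192.168.1.110` shows the eno1 path instead, the connected /24 on enp3s0f0 didn't come up and the static route is worth re-checking.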

 

  On 29/12/2020 at 00:35, SirEvan said:

This would also let me configure the nics with an MTU of 9k, since they're the only things talking over it.


You will for sure want to test doing that.. Jumbo frames don't make sense most of the time, depending on what sort of traffic you're sending..

 

I leave mine set at the standard 1500.. When I bump it to 9k, I get worse speed..

 

Here with 9k set on the NAS and Windows machine:

$ iperf3.exe -c 192.168.10.10 -V
iperf 3.9
CYGWIN_NT-10.0-19042 I5-Win 3.1.6-340.x86_64 2020-07-09 08:20 UTC x86_64
Control connection MSS 8960
Time: Tue, 29 Dec 2020 15:46:09 GMT
Connecting to host 192.168.10.10, port 5201
      Cookie: nucme5bf7bctijjl46sbkp5yuax33vled34f
      TCP MSS: 8960 (default)
[  5] local 192.168.10.9 port 1054 connected to 192.168.10.10 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   1.00-2.00   sec   251 MBytes  2.10 Gbits/sec
[  5]   2.00-3.00   sec   250 MBytes  2.09 Gbits/sec
[  5]   3.00-4.00   sec   252 MBytes  2.11 Gbits/sec
[  5]   4.00-5.00   sec   251 MBytes  2.11 Gbits/sec
[  5]   5.00-6.00   sec   253 MBytes  2.13 Gbits/sec
[  5]   6.00-7.00   sec   252 MBytes  2.11 Gbits/sec
[  5]   7.00-8.00   sec   251 MBytes  2.11 Gbits/sec
[  5]   8.00-9.00   sec   250 MBytes  2.10 Gbits/sec
[  5]   9.00-10.00  sec   252 MBytes  2.11 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.45 GBytes  2.10 Gbits/sec                  sender
[  5]   0.00-10.01  sec  2.44 GBytes  2.10 Gbits/sec                  receiver
CPU Utilization: local/sender 23.2% (7.7%u/15.4%s), remote/receiver 18.1% (1.2%u/16.8%s)
rcv_tcp_congestion cubic

iperf Done.

 

Here with the standard 1500:

$ iperf3.exe -c 192.168.10.10 -V                                                                                
iperf 3.9                                                                                                       
CYGWIN_NT-10.0-19042 I5-Win 3.1.6-340.x86_64 2020-07-09 08:20 UTC x86_64                                        
Control connection MSS 1460                                                                                     
Time: Tue, 29 Dec 2020 15:43:12 GMT                                                                             
Connecting to host 192.168.10.10, port 5201                                                                     
      Cookie: 62tq5fnfv7vfdg65ve35uv3eobnwqqvu6d5g                                                              
      TCP MSS: 1460 (default)                                                                                   
[  5] local 192.168.10.9 port 35558 connected to 192.168.10.10 port 5201                                        
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0          
[ ID] Interval           Transfer     Bitrate                                                                   
[  5]   0.00-1.00   sec   267 MBytes  2.24 Gbits/sec                                                            
[  5]   1.00-2.00   sec   282 MBytes  2.36 Gbits/sec                                                            
[  5]   2.00-3.00   sec   282 MBytes  2.36 Gbits/sec                                                            
[  5]   3.00-4.00   sec   280 MBytes  2.35 Gbits/sec                                                            
[  5]   4.00-5.00   sec   283 MBytes  2.38 Gbits/sec                                                            
[  5]   5.00-6.00   sec   282 MBytes  2.36 Gbits/sec                                                            
[  5]   6.00-7.00   sec   281 MBytes  2.36 Gbits/sec                                                            
[  5]   7.00-8.00   sec   281 MBytes  2.36 Gbits/sec                                                            
[  5]   8.00-9.00   sec   282 MBytes  2.37 Gbits/sec                                                            
[  5]   9.00-10.00  sec   278 MBytes  2.33 Gbits/sec                                                            
- - - - - - - - - - - - - - - - - - - - - - - - -                                                               
Test Complete. Summary Results:                                                                                 
[ ID] Interval           Transfer     Bitrate                                                                   
[  5]   0.00-10.00  sec  2.73 GBytes  2.35 Gbits/sec                  sender                                    
[  5]   0.00-10.01  sec  2.73 GBytes  2.34 Gbits/sec                  receiver                                  
CPU Utilization: local/sender 31.8% (11.0%u/20.8%s), remote/receiver 10.8% (0.5%u/10.3%s)                       
rcv_tcp_congestion cubic                                                                                        
                                                                                                                
iperf Done.                                                                                                     

 

I will stick with the standard MTU of 1500..
