Understanding how Linux ethernet bridge is setup and works (2024)

This article is divided into three parts that discuss the Linux internals of the ethernet bridge:

  • Bridge kernel module
  • Adding interface into the bridge
  • Life of packet inside the bridge

Bridge Kernel Module

In the Linux kernel, the bridge is implemented as a kernel module “bridge”.

$ lsmod | grep bridge

The bridge module is not yet inserted, so lsmod doesn’t show anything about it.

brctl, bridge, and ip are the utilities used to manage bridges on a Linux system.

$ brctl
Usage: brctl [commands]
commands:
addbr <bridge> add bridge
delbr <bridge> delete bridge
addif <bridge> <device> add interface to bridge
delif <bridge> <device> delete interface from bridge
hairpin <bridge> <port> {on|off} turn hairpin on/off
...

Creating a new bridge interface

# brctl addbr br0

This creates a new network interface br0. ip link or brctl show will list the new bridge interface created.

# ip link show dev br0
3: br0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 5e:cd:d7:1c:52:6e brd ff:ff:ff:ff:ff:ff
# brctl show br0
bridge name     bridge id               STP enabled     interfaces
br0             8000.000000000000       no

Now let’s look at lsmod output again

# lsmod | grep bridge
bridge 135168 0
stp 16384 1 bridge
llc 16384 2 bridge,stp

Now we see that the bridge module and the modules it depends on (stp and llc) are inserted. But what inserted these modules automatically? Let us look into it.

$ lsmod | grep bridge
bridge 135168 0
stp 16384 1 bridge
llc 16384 2 bridge,stp
$ rmmod bridge
$ lsmod | grep bridge
$ strace -e trace=socket,ioctl brctl addbr br0
socket(AF_UNIX, SOCK_STREAM, 0) = 3
bridge name bridge id STP enabled interfaces
ioctl(3, SIOCBRADDBR, "br0") = 0
+++ exited with 0 +++
$ lsmod | grep bridge
bridge 135168 0
stp 16384 1 bridge
llc 16384 2 bridge,stp

Here I removed the “bridge” kernel module with rmmod and ran brctl under strace to see which system call triggers the insertion of the bridge module. It looks like an ioctl on a socket with the SIOCBRADDBR command is what inserts the modules into the kernel.

static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
{
    ...
    switch (cmd) {
    case FIOSETOWN:
    ...
    case SIOCGIFBR:
    case SIOCSIFBR:
    case SIOCBRADDBR:
    case SIOCBRDELBR:
        err = -ENOPKG;
        if (!br_ioctl_hook)
            request_module("bridge");
        mutex_lock(&br_ioctl_mutex);
        if (br_ioctl_hook)
            err = br_ioctl_hook(net, cmd, argp);
        mutex_unlock(&br_ioctl_mutex);
        break;
    case SIOCGIFVLAN:
    ...
    return err;
}

A socket ioctl with any of the commands SIOCGIFBR, SIOCSIFBR, SIOCBRADDBR, or SIOCBRDELBR inserts the “bridge” module into the kernel if br_ioctl_hook is NULL. request_module is a macro around __request_module, which takes care of inserting the module.

After __request_module completes, br_ioctl_hook(net, cmd, argp) is called to create the new bridge interface. So loading the module must be what sets up br_ioctl_hook. Now let us look into the init function of the bridge kernel module to see what is initialized and how br_ioctl_hook is set up.

static int __init br_init(void)
{
    ...
    err = stp_proto_register(&br_stp_proto);
    ...
    err = br_fdb_init();
    ...
    err = register_pernet_subsys(&br_net_ops);
    ...
    err = br_nf_core_init();
    ...
    err = br_netlink_init();
    ...
    brioctl_set(br_ioctl_deviceless_stub);
    ...
}
module_init(br_init)
...
MODULE_ALIAS_RTNL_LINK("bridge");

static const struct stp_proto br_stp_proto = {
    .rcv = br_stp_rcv,
};

br_init is the init function of the bridge kernel module. It calls stp_proto_register to register br_stp_proto, whose rcv callback br_stp_rcv handles BPDU (bridge protocol data unit) frames, which carry STP (spanning tree protocol) information.

int __init br_fdb_init(void)
{
    br_fdb_cache = kmem_cache_create("bridge_fdb_cache",
                                     sizeof(struct net_bridge_fdb_entry),
                                     0, SLAB_HWCACHE_ALIGN, NULL);
    ...
    return 0;
}

br_fdb_init creates a slab cache for the entries of the bridge forwarding database. struct net_bridge_fdb_entry is the important data structure here: each entry maps a struct net_bridge_fdb_key (MAC address and VLAN id) to a bridge port. We will see more about this structure later in the article.
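To make the key-to-port mapping concrete, here is a minimal userspace sketch in C. It is not the kernel implementation: the kernel allocates entries from br_fdb_cache and keeps them in a hash table, while this sketch assumes a small fixed array with linear search. All demo_* names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6
#define DEMO_FDB_SIZE 16   /* arbitrary capacity chosen for the sketch */

/* Lookup key: MAC address plus VLAN id, the two fields the article
 * names for struct net_bridge_fdb_key. */
struct demo_fdb_key {
    uint8_t  addr[DEMO_ETH_ALEN];
    uint16_t vlan_id;
};

/* One forwarding database entry: key -> bridge port. */
struct demo_fdb_entry {
    struct demo_fdb_key key;
    int port;      /* port the address was last seen on */
    int in_use;
};

static struct demo_fdb_entry demo_fdb[DEMO_FDB_SIZE];

static struct demo_fdb_entry *demo_fdb_lookup(const uint8_t *addr,
                                              uint16_t vlan_id)
{
    for (int i = 0; i < DEMO_FDB_SIZE; i++) {
        struct demo_fdb_entry *e = &demo_fdb[i];

        if (e->in_use && e->key.vlan_id == vlan_id &&
            memcmp(e->key.addr, addr, DEMO_ETH_ALEN) == 0)
            return e;
    }
    return NULL;
}

/* Returns the port for (addr, vlan_id), or -1 if the pair is unknown. */
int demo_fdb_find_port(const uint8_t *addr, uint16_t vlan_id)
{
    struct demo_fdb_entry *e = demo_fdb_lookup(addr, vlan_id);

    return e ? e->port : -1;
}

/* Learns a new (addr, vlan_id) pair or moves a known one to `port`.
 * Returns 0 on success, -1 if the table is full. */
int demo_fdb_learn(const uint8_t *addr, uint16_t vlan_id, int port)
{
    struct demo_fdb_entry *e = demo_fdb_lookup(addr, vlan_id);

    if (e) {
        e->port = port;
        return 0;
    }
    for (int i = 0; i < DEMO_FDB_SIZE; i++) {
        if (!demo_fdb[i].in_use) {
            memcpy(demo_fdb[i].key.addr, addr, DEMO_ETH_ALEN);
            demo_fdb[i].key.vlan_id = vlan_id;
            demo_fdb[i].port = port;
            demo_fdb[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}
```

Because the VLAN id is part of the key, the same MAC address can map to different ports in different VLANs, and learning it in one VLAN says nothing about the others.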

static struct pernet_operations br_net_ops = {
    .exit = br_net_exit,
};

static void __net_exit br_net_exit(struct net *net)
{
    ...
    for_each_netdev(net, dev)
        if (dev->priv_flags & IFF_EBRIDGE)
            br_dev_delete(dev, &list);
    ...
}

register_pernet_subsys registers a per-network-namespace subsystem. Here br_net_ops has only an exit function defined, which deletes all bridge devices in a network namespace when that namespace exits.

br_nf_core_init initializes the netfilter (firewall) core for the ethernet bridge. br_netlink_init registers the rtnetlink address family and link operations of the bridge.

void brioctl_set(int (*hook) (struct net *, unsigned int, void __user *))
{
    mutex_lock(&br_ioctl_mutex);
    br_ioctl_hook = hook;
    mutex_unlock(&br_ioctl_mutex);
}
EXPORT_SYMBOL(brioctl_set);

brioctl_set assigns the br_ioctl_deviceless_stub function to br_ioctl_hook which we saw earlier.
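The pieces seen so far (a hook pointer that starts out NULL, a module load on first use, and an init function that installs the hook) can be put together in a small userspace sketch. This is only a model of the pattern, not kernel code: the demo_* names are hypothetical stand-ins, module loading is simulated by a plain function call, and the mutex is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CMD_ADDBR 1   /* hypothetical stand-in for SIOCBRADDBR */

typedef int (*demo_ioctl_hook_t)(unsigned int cmd, const char *name);

static demo_ioctl_hook_t demo_br_ioctl_hook;  /* NULL until the "module" loads */
int demo_module_loads;                        /* counts simulated module loads */

/* Stand-in for br_ioctl_deviceless_stub(). */
static int demo_deviceless_stub(unsigned int cmd, const char *name)
{
    if (cmd != DEMO_CMD_ADDBR)
        return -EOPNOTSUPP;
    return name[0] != '\0' ? 0 : -EINVAL;
}

/* Stand-in for br_init(): the init function installs the hook. */
static void demo_br_init(void)
{
    demo_br_ioctl_hook = demo_deviceless_stub;
    demo_module_loads++;
}

/* Stand-in for request_module(): loading a module runs its init function. */
static int demo_request_module(const char *module)
{
    if (strcmp(module, "bridge") != 0)
        return -ENOENT;
    demo_br_init();
    return 0;
}

/* Stand-in for the bridge case in sock_ioctl(). */
int demo_sock_ioctl(unsigned int cmd, const char *name)
{
    int err = -ENOPKG;

    if (!demo_br_ioctl_hook)
        demo_request_module("bridge");
    if (demo_br_ioctl_hook)
        err = demo_br_ioctl_hook(cmd, name);
    return err;
}
```

The first call to demo_sock_ioctl triggers the simulated load; later calls find the hook already set and skip it. If the load fails, the hook stays NULL and the caller gets -ENOPKG, which matches the default err value in the sock_ioctl excerpt.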

int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *uarg)
{
    switch (cmd) {
    ...
    case SIOCBRADDBR:
    case SIOCBRDELBR:
    {
        ...
        if (cmd == SIOCBRADDBR)
            return br_add_bridge(net, buf);
        return br_del_bridge(net, buf);
    }
    }
    return -EOPNOTSUPP;
}

So a socket ioctl with the command SIOCBRADDBR calls br_ioctl_deviceless_stub, which in turn calls br_add_bridge with the struct net * and a buffer that contains the name of the interface to be created.

br_add_bridge creates a struct net_device (a core structure of the network driver layer). A struct net_device is created for both physical and virtual interfaces. For a NIC (network interface card), the device driver that manages the NIC creates the struct net_device, which is then added to the kernel’s global struct net_device list. In the bridge case, br_dev_setup, which is called from br_add_bridge, initializes the struct net_device of the bridge.

void br_dev_setup(struct net_device *dev)
{
    struct net_bridge *br = netdev_priv(dev);

    eth_hw_addr_random(dev);
    ether_setup(dev);
    dev->netdev_ops = &br_netdev_ops;
    dev->needs_free_netdev = true;
    dev->ethtool_ops = &br_ethtool_ops;
    ...
}

eth_hw_addr_random generates a random ethernet (MAC) address and assigns it to dev->dev_addr.
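A random MAC address cannot be just any six random bytes: it has to be a unicast address, and it should be marked as locally administered so that it does not collide with vendor-assigned addresses. The sketch below shows the two bit operations involved. It is a userspace approximation under my own assumptions: rand() stands in for the kernel random source, and the demo_* names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_ETH_ALEN 6

/* Fills addr with a random unicast, locally administered MAC address. */
void demo_eth_random_addr(uint8_t addr[DEMO_ETH_ALEN])
{
    for (int i = 0; i < DEMO_ETH_ALEN; i++)
        addr[i] = (uint8_t)(rand() & 0xff);

    addr[0] &= 0xfe;   /* clear bit 0: unicast, not multicast */
    addr[0] |= 0x02;   /* set bit 1: locally administered */
}

/* Nonzero if the address is unicast. */
int demo_is_unicast(const uint8_t *addr)
{
    return (addr[0] & 0x01) == 0;
}

/* Nonzero if the address is locally administered. */
int demo_is_locally_administered(const uint8_t *addr)
{
    return (addr[0] & 0x02) != 0;
}
```

The address that ip link reported for br0 earlier, 5e:cd:d7:1c:52:6e, fits this shape: its first byte 0x5e has bit 0 clear and bit 1 set.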

br_netdev_ops is of type struct net_device_ops, which contains the operations that can be performed on the net_device.

br_ethtool_ops is of type struct ethtool_ops, which contains optional device operations. The ethtool utility calls these operations to get or set the network device configuration.

$ ethtool -i br0
driver: bridge
version: 2.3

firmware-version: N/A
expansion-rom-version:
bus-info: N/A
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

Adding interface to the bridge

Next, let us see how Linux handles adding interfaces to a bridge. To do this, we will first create two pairs of veth (virtual ethernet device) interfaces.

$ ip link add veth10 type veth peer name veth20
$ ip link add veth30 type veth peer name veth40

An interface can be added to a bridge with either the iproute2 or the brctl utility. iproute2 uses a netlink socket and brctl uses ioctl. Either way, both utilities end up calling br_add_if in the bridge module.

$ brctl show br0
bridge name     bridge id               STP enabled     interfaces
br0             8000.000000000000       no
$ ip link set veth10 master br0
$ ip link set veth30 master br0
$ brctl show br0
bridge name     bridge id               STP enabled     interfaces
br0             8000.2669427cd774       no              veth10
                                                        veth30

Let us look into the internals of what happens when the interface is added to the bridge br0.

int br_add_if(struct net_bridge *br, struct net_device *dev,
              struct netlink_ext_ack *extack)
{
    ...
    p = new_nbp(br, dev);
    ...
    err = kobject_init_and_add(&p->kobj, &brport_ktype, &(dev->dev.kobj),
                               SYSFS_BRIDGE_PORT_ATTR);
    ...
    err = netdev_rx_handler_register(dev, br_handle_frame, p);
    ...
}

Some of the important initialization steps done in br_add_if are:

  • Creating a bridge port from the bridge and the net_device
  • Setting up a sysfs entry
  • Registering a bridge handler for packets received on the net_device

There are three main structures in br_add_if: struct net_bridge, struct net_device, and struct net_bridge_port. net_bridge is the bridge to which the net_device interface is being added. net_bridge_port is the new bridge port created by calling new_nbp. The net_bridge_port contains a kobject, which is initialized and added under the kobject of the net_device by calling kobject_init_and_add. The macro SYSFS_BRIDGE_PORT_ATTR is brport. We can check this addition under sysfs.

# ls -la /sys/class/net/veth10/brport/
total 0
drwxr-xr-x 2 root root 0 Mar 22 02:05 .
drwxr-xr-x 6 root root 0 Mar 22 02:04 ..
-rw-r--r-- 1 root root 4096 Mar 23 23:31 bpdu_guard
lrwxrwxrwx 1 root root 0 Mar 22 02:10 bridge -> ../../br0
-r--r--r-- 1 root root 4096 Mar 23 23:31 change_ack
-r--r--r-- 1 root root 4096 Mar 23 23:31 config_pending
-r--r--r-- 1 root root 4096 Mar 23 23:31 designated_bridge
-r--r--r-- 1 root root 4096 Mar 23 23:31 designated_cost
...

br_handle_frame is registered as the receive (rx) handler callback on the net_device, so that every packet received on this interface is handled by the bridge code.
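The rx handler mechanism can be pictured as a single callback slot on each device. The sketch below is a userspace model of that idea, not the kernel code: the struct and function names are hypothetical, and the return codes are simplified to two values. It assumes that a device holds at most one handler and that a second registration is refused with -EBUSY.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_RX_PASS     0   /* no handler: frame continues up the normal stack */
#define DEMO_RX_CONSUMED 1   /* handler took ownership of the frame */

struct demo_netdev;
typedef int (*demo_rx_handler_t)(struct demo_netdev *dev, const void *frame);

struct demo_netdev {
    const char *name;
    demo_rx_handler_t rx_handler;   /* single slot per device */
    void *rx_handler_data;          /* e.g. the bridge port */
};

/* Installs the handler; fails if the slot is already taken. */
int demo_rx_handler_register(struct demo_netdev *dev,
                             demo_rx_handler_t handler, void *data)
{
    if (dev->rx_handler)
        return -EBUSY;
    dev->rx_handler = handler;
    dev->rx_handler_data = data;
    return 0;
}

/* Receive path: a registered handler sees the frame first. */
int demo_receive(struct demo_netdev *dev, const void *frame)
{
    if (dev->rx_handler)
        return dev->rx_handler(dev, frame);
    return DEMO_RX_PASS;
}

/* Stand-in for br_handle_frame: counts frames and consumes them. */
int demo_br_handle_frame(struct demo_netdev *dev, const void *frame)
{
    int *frames_seen = dev->rx_handler_data;

    (void)frame;
    (*frames_seen)++;
    return DEMO_RX_CONSUMED;
}
```

In this model the handler data plays the role of the net_bridge_port pointer p that br_add_if passes to netdev_rx_handler_register: it lets the handler find the port, and through it the bridge, for any frame arriving on the device.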

Life of packet inside the bridge

# brctl show br0
bridge name     bridge id               STP enabled     interfaces
br0             8000.865ee85c4139       no              veth10
                                                        veth30

Let us create two network namespaces and move one end of each veth pair into a namespace.

# ip netns add ns1
# ip netns add ns2
# ip link set veth20 netns ns1
# ip link set veth40 netns ns2

Now bring up the bridge and the interfaces at both ends of the veth pairs

# ip link set br0 up
# ip link set veth10 up
# ip link set veth30 up
# ip netns exec ns1 ip link set veth20 up
# ip netns exec ns2 ip link set veth40 up

Assign IP addresses to the interfaces in the namespaces ns1 and ns2

# ip netns exec ns1 ip addr add dev veth20 192.168.56.1/24
# ip netns exec ns2 ip addr add dev veth40 192.168.56.2/24

With the interfaces and IP addresses set up, we can check whether there is connectivity.

$ ip netns exec ns1 ping -c 1 192.168.56.2
PING 192.168.56.2 (192.168.56.2) 56(84) bytes of data.
--- 192.168.56.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

Here I ran ping from the network namespace ns1 to reach the interface in the network namespace ns2, but there is no ICMP response. Running tcpdump on br0 shows that the ICMP request reached the bridge, but no ICMP response came back.

$ tcpdump -qnni br0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:04:47.431771 IP 192.168.56.1 > 192.168.56.2: ICMP echo request, id 1573, seq 1, length 64
15:04:52.561271 ARP, Request who-has 192.168.56.2 tell 192.168.56.1, length 28
15:04:52.561315 ARP, Reply 192.168.56.2 is-at 76:35:3c:05:68:ea, length 28

So it seems that the bridge is not forwarding the packet to the other port. Let us run ftrace on the bridge code to see what is happening.

$ cd /sys/kernel/debug/tracing
$ echo 'br*' > set_ftrace_filter
$ echo function_graph > current_tracer
$ echo 1 > tracing_on ; ip netns exec ns1 ping -c 1 192.168.56.2 ; echo 0 > tracing_on

Here I have set the ftrace filter to br* so that only bridge functions are traced. The tracing output shows that the last bridge function that handled the packet is br_nf_forward_ip.

1) | br_handle_frame [bridge]() {
1) | br_nf_pre_routing [br_netfilter]() {
...
1) | br_forward [bridge]() {
1) 0.149 us | br_allowed_egress [bridge]();
1) 0.141 us | br_handle_vlan [bridge]();
1) | br_nf_forward_ip [br_netfilter]() {
1) 0.198 us | br_validate_ipv4.isra.30 [br_netfilter]();
1) 0.147 us | brnf_get_logical_dev.isra.27 [br_netfilter]();
1) 7.276 us | }
...
1) + 44.181 us | }

Looking at the br_nf_forward_ip code, the packet is passed to the netfilter NF_INET_FORWARD hook. If the packet had been forwarded, we would have seen br_nf_forward_finish in the ftrace output. So the packet was dropped in the FORWARD chain.

static unsigned int br_nf_forward_ip(void *priv,
                                     struct sk_buff *skb,
                                     const struct nf_hook_state *state)
{
    ...

    NF_HOOK(pf, NF_INET_FORWARD, state->net, NULL, skb,
            brnf_get_logical_dev(skb, state->in),
            parent, br_nf_forward_finish);

    return NF_STOLEN;
}
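The reasoning "no br_nf_forward_finish in the trace, so the packet was dropped" relies on how a netfilter hook point treats its last argument: the finish function runs only if every rule on the hook accepts the packet. The following userspace sketch models just that contract. It is a simplification under my own assumptions: the demo_* names are hypothetical, and only two verdicts are modeled.

```c
#include <stddef.h>

enum demo_verdict { DEMO_DROP = 0, DEMO_ACCEPT = 1 };

struct demo_pkt {
    const char *src;
    const char *dst;
    int finished;   /* set to 1 once the finish function has run */
};

typedef enum demo_verdict (*demo_hook_fn)(const struct demo_pkt *pkt);
typedef int (*demo_okfn)(struct demo_pkt *pkt);

/* Runs the hooks in order and calls okfn only if all of them accept.
 * Returns 1 if okfn ran, 0 if the packet was dropped. */
int demo_nf_hook(const demo_hook_fn *hooks, size_t n,
                 struct demo_pkt *pkt, demo_okfn okfn)
{
    for (size_t i = 0; i < n; i++)
        if (hooks[i](pkt) == DEMO_DROP)
            return 0;
    okfn(pkt);
    return 1;
}

/* A rule set that drops everything. */
enum demo_verdict demo_policy_drop(const struct demo_pkt *pkt)
{
    (void)pkt;
    return DEMO_DROP;
}

/* A rule set that accepts everything. */
enum demo_verdict demo_policy_accept(const struct demo_pkt *pkt)
{
    (void)pkt;
    return DEMO_ACCEPT;
}

/* Stand-in for br_nf_forward_finish. */
int demo_forward_finish(struct demo_pkt *pkt)
{
    pkt->finished = 1;
    return 0;
}
```

With demo_policy_drop on the hook, demo_forward_finish never runs, which is the situation the first trace shows for br_nf_forward_finish.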

So there are two options: either we add an iptables rule to allow these packets, or we stop the bridge netfilter code from calling iptables altogether. For the sake of simplicity, we will do the latter. There is a sysctl for this.

$ sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
$ sysctl -w net.bridge.bridge-nf-call-iptables=0
net.bridge.bridge-nf-call-iptables = 0

We will run the same ping command from network namespace again and check if there is an ICMP response.

$ echo 1 > tracing_on ; ip netns exec ns1 ping -c 1 192.168.56.2 ; echo 0 > tracing_on
PING 192.168.56.2 (192.168.56.2) 56(84) bytes of data.
64 bytes from 192.168.56.2: icmp_seq=1 ttl=64 time=0.065 ms
--- 192.168.56.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.065/0.065/0.065/0.000 ms

Now we see the response to the ICMP request. The ftrace output for this ICMP request shows the function graph of the bridge code. Let us look more into these functions.

 0) | br_handle_frame [bridge]() {
0) 0.203 us | br_nf_pre_routing [br_netfilter]();
0) | br_handle_frame_finish [bridge]() {
0) 0.192 us | br_allowed_ingress [bridge]();
0) 0.962 us | br_fdb_update [bridge]();
0) 0.291 us | br_fdb_find_rcu [bridge]();
0) | br_forward [bridge]() {
0) 0.187 us | br_allowed_egress [bridge]();
0) 0.186 us | br_handle_vlan [bridge]();
0) 0.186 us | br_nf_forward_ip [br_netfilter]();
0) 0.184 us | br_nf_forward_arp [br_netfilter]();
0) | br_forward_finish [bridge]() {
0) 0.713 us | br_nf_post_routing [br_netfilter]();
0) 1.049 us | br_dev_queue_push_xmit [bridge]();
0) 2.391 us | }
0) 4.364 us | }
0) 6.833 us | }
0) 7.761 us | }

br_dev_queue_push_xmit is the last call; it forwards the packet to the destination interface.
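In the trace, br_fdb_find_rcu runs just before br_forward: the result of the forwarding database lookup decides where the frame goes. A rough sketch of that decision for unicast frames is given below. It is deliberately incomplete and rests on my own simplifications: it ignores broadcast and multicast, delivery to the bridge device itself, VLAN filtering, STP port states, and hairpin mode, and demo_egress_ports is a hypothetical name.

```c
#include <stddef.h>

/* Decides on which ports a frame received on in_port is sent out.
 *
 *   dst_port  port found in the forwarding database for the destination
 *             MAC address, or -1 if the address is unknown
 *   nports    number of ports on the bridge, numbered 0..nports-1
 *   out       array with room for at least nports entries
 *
 * Returns the number of egress ports written to out. */
size_t demo_egress_ports(int in_port, int dst_port, int nports, int *out)
{
    size_t n = 0;

    if (dst_port >= 0) {
        /* Known destination: send to that port only, and never back
         * out of the port the frame arrived on. */
        if (dst_port != in_port)
            out[n++] = dst_port;
        return n;
    }

    /* Unknown destination: flood to every port except the ingress one. */
    for (int p = 0; p < nports; p++)
        if (p != in_port)
            out[n++] = p;
    return n;
}
```

In the two-port setup of this article, with veth10 as port 0 and veth30 as port 1, a frame from ns1 to a known address behind veth30 yields exactly one egress port, which corresponds to the single br_forward call in the trace.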


Author: Tuan Roob DDS
