Chasing “Packet Drops” Down the Proxmox Rabbit Hole

A troubleshooting tour through Linux bridges, Proxmox vmbr interfaces, and the moment “drops” stopped meaning “packet loss”.

 

This post is a real troubleshooting walkthrough from my homelab. I’m running Proxmox on a host called Athena and noticed scary-looking dropped packet counters on a Linux bridge (vmbr0). What followed was a surprisingly deep dive into how Proxmox networking works under the hood: Linux bridges, bridge netfilter, per-interface counters, and a set of proprietary Layer-2 frames emitted by my FritzBox that Linux mostly ignores (but still counts as “dropped”).

I’m keeping this conversational and command-heavy, because this got solved the only way these things get solved: one fact at a time, measured on the system.


The scene: Proxmox bridges in plain terms

On Proxmox, a vmbrX interface is usually a Linux bridge:

  • A physical NIC (for example enp1s0) can be a port on that bridge.
  • VM interfaces show up as tap… devices and are also ports on that bridge.
  • The bridge can have an IP itself (host management), but it can also be “pure Layer-2”.

In my setup (as seen on Athena):

  • vmbr0 = Home LAN (FritzBox network)
  • vmbr10 = VLAN 10 (management network)
  • vmbr30 = VLAN 30 (lab network via VLAN subinterface)
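To see which ports belong to which bridge, the kernel exposes them under sysfs (a quick sketch; the bridge names match my setup, so substitute your own):

```shell
# Each Linux bridge lists its member ports under /sys/class/net/<br>/brif/.
# Bridge names below are from this post; adjust to your host.
for br in vmbr0 vmbr10 vmbr30; do
  echo "== $br =="
  ls /sys/class/net/"$br"/brif 2>/dev/null || echo "(bridge not present on this host)"
done
```

`bridge link show` gives the same membership information, plus per-port state.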

[Figure: overview of the whole interface, bridge, firewall, and routing configuration]


Step 1: First look at interface counters

I started with interface statistics:

ip -s link

The vmbr10 snippet looked healthy: link up, plenty of traffic, no errors or drops.

7: vmbr10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:d0:b4:04:18:e5 brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
     451790407 6750890      0       0       0 5213717
    TX:  bytes packets errors dropped carrier collsns
     213105192 1275126      0       0       0       0

Then I saw this on vmbr0:

6: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:d0:b4:04:18:e4 brd ff:ff:ff:ff:ff:ff
    RX:  bytes  packets errors dropped  missed   mcast
    2180069459 19289944      0 9604355       0 3048514
    TX:  bytes  packets errors dropped carrier collsns
    1521600467  4101828      0       0       0       0

9.6 million dropped packets. That number was big enough to trigger the usual instinct: “Something is broken.”

But Linux counters (especially on bridges) can be deceptive if you don’t first answer a more basic question:

Where are the drops happening? On the physical NIC? On the bridge logic? On a VM tap? Or are we looking at frames that were never meant to be forwarded in the first place?


Step 2: Confirm the bridge IP layout (dual-homed host warning)

Before treating drops as a performance issue, I wanted to understand the topology properly. So I checked whether Athena has IP addresses on these bridges:

ip addr show dev vmbr0
ip addr show dev vmbr10

Output on Athena:

6: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:d0:b4:04:18:e4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.178.50/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::2d0:b4ff:fe04:18e4/64 scope link
       valid_lft forever preferred_lft forever

7: vmbr10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:d0:b4:04:18:e5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.2/24 scope global vmbr10
       valid_lft forever preferred_lft forever
    inet6 fe80::2d0:b4ff:fe04:18e5/64 scope link
       valid_lft forever preferred_lft forever

This is important: Athena is a Layer-3 endpoint in both networks:

  • Home LAN: 192.168.178.50/24 on vmbr0
  • VLAN 10: 192.168.10.2/24 on vmbr10

That’s not automatically wrong, but it raises an immediate “control” question:

Could Athena route between Home LAN and VLAN10 and bypass the intended firewall segmentation?


Step 3: Prove Athena is not routing between networks

First check: IPv4 forwarding.

sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0

Second check: IPv6 forwarding (because IPv6 can surprise you even if IPv4 is locked down).

sysctl net.ipv6.conf.all.forwarding
net.ipv6.conf.all.forwarding = 0

Conclusion: Athena is not an IPv4 or IPv6 router. Good. That makes the topology safer, and it also means the “drops” investigation can stay focused on switching/bridging behavior rather than routing behavior.


Step 4: Is it a physical NIC problem or a bridge problem?

The first isolation step was simple: compare vmbr0 with the physical NIC that feeds it (enp1s0).

ip -s link show dev vmbr0
ip -s link show dev enp1s0

At that time the counters looked like this:

vmbr0:
    RX ... dropped  9604355

enp1s0:
    RX ... dropped   317885

So there were some drops counted on the NIC, but the bridge drop counter was massively larger. That suggested: even if the NIC was involved, a lot of the story was happening inside the bridge logic.
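"Massively larger" can be made concrete with a one-liner on the two counter values from the snapshot above:

```shell
# Ratio of bridge-level drops to NIC-level drops at that point in time
# (counter values copied from the snapshot above).
awk 'BEGIN { printf "bridge drops are %.1fx the NIC drops\n", 9604355 / 317885 }'
```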

Next question: are these “real drops” (driver ring overruns, missed buffers) or “logical drops” (frames the bridge discards because they aren’t forwardable or relevant)?


Step 5: Driver stats (ethtool) say “no real drops”

For a real NIC problem, the driver counters usually show it. So I checked:

ethtool -S enp1s0 | egrep -i 'drop|dropped|miss|err|fifo|over|no_buf|buffer|timeout|discard'

Relevant output:

rx_crc_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 0
rx_over_errors: 0
rx_fifo_errors: 0
rx_queue_0_drops: 0
rx_queue_1_drops: 0
rx_queue_2_drops: 0
rx_queue_3_drops: 0
tx_dropped: 0

No ring overruns. No FIFO errors. No missed buffers. No classic hardware receive drop signature.

That pushed the investigation away from “bad NIC” and toward “Linux bridge/netfilter path is doing something”.


Step 6: Surprise culprit — bridge netfilter is enabled

Linux can feed bridged frames into netfilter (iptables/ip6tables) via the “bridge-nf” sysctls. In setups where firewalling is handled by a dedicated VM or router (OPNsense, in my case), bridge-nf is often pure unnecessary overhead on the host.

I checked the sysctls:

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.bridge.bridge-nf-call-arptables 2>/dev/null

Output:

net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-arptables = 0

So bridged IPv4 and IPv6 frames were being pushed through netfilter. Next question: do I even have meaningful netfilter rules on the Proxmox host?

I checked nftables:

nft list ruleset | head

Output was empty.

Interpretation: I had bridge-nf enabled, but I wasn’t actually using nftables on the host. That’s a solid reason to disable bridge-nf unless you explicitly want host-level filtering of bridged traffic.


Step 7: Disable bridge-nf (temporarily) and measure the effect

First, I disabled it temporarily (no reboot):

sysctl -w net.bridge.bridge-nf-call-iptables=0
sysctl -w net.bridge.bridge-nf-call-ip6tables=0

Then I measured vmbr0 before/after a short time window:

ip -s link show dev vmbr0
sleep 120
ip -s link show dev vmbr0

My numbers after the change looked like this (showing the important parts):

Right after change:
    RX packets 20152543  dropped 10280787

After 120 seconds:
    RX packets 20152838  dropped 10280947

That’s +295 packets and only +160 drops across 120 seconds — a massive improvement compared to the earlier runaway drop counter growth.
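The deltas are plain shell arithmetic on the two snapshots (values copied from the output above):

```shell
# Before/after counters from the 120 s measurement window.
pkts_before=20152543; drops_before=10280787
pkts_after=20152838;  drops_after=10280947
echo "packets: +$((pkts_after - pkts_before))"   # +295
echo "drops:   +$((drops_after - drops_before))" # +160
```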

At this stage, the conclusion was clear: bridge netfilter was a major contributor to the drop behavior and overhead on vmbr0.


Step 8: Persist the change (so it survives reboot)

To make it permanent, the typical Debian/Proxmox approach is a sysctl.d file.

Create/open:

nano /etc/sysctl.d/99-bridge-nf.conf

Put this content into the file:

net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-arptables = 0

Then apply:

sysctl --system

At this point, “bridge-nf is off” became part of the stable configuration.
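One Debian/Proxmox detail worth knowing: the net.bridge.* keys only exist while the br_netfilter kernel module is loaded, so it pays to re-check after a reboot. A sketch of that check (if the module is absent, bridge-nf is effectively off anyway):

```shell
# If br_netfilter is loaded, confirm the setting stuck; if it is not loaded,
# the net.bridge.* keys do not exist and no bridge-nf filtering happens at all.
if lsmod | grep -q '^br_netfilter'; then
  sysctl net.bridge.bridge-nf-call-iptables
else
  echo "br_netfilter not loaded (bridge-nf effectively off)"
fi
```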


Step 9: “But I still see drops!” — why percentage math can be misleading on bridges

Even after the improvement, I still saw drops in small measurement windows. I measured a 10-second delta and got a scary-looking ratio:

10s delta: rx_packets +17, rx_dropped +10, drop%=58.823529%

That looks catastrophic until you consider a key detail:

On Linux bridges, rx_packets is not “all frames seen on the wire”; it largely reflects frames delivered up to the host’s own stack via the bridge interface.

Frames can arrive on a physical port, be handed to the bridge device, and then be discarded because no protocol handler claims them (unknown ethertypes, control-plane frames, traffic that is neither forwardable nor deliverable to a tap). Those frames inflate rx_dropped without increasing rx_packets in the way you intuitively expect.
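To make the arithmetic explicit: the 58.8% figure is drops relative to the (tiny) rx_packets delta, not relative to frames on the wire:

```shell
# The 10-second deltas from above: 17 packets counted, 10 drops counted.
awk -v pkts=17 -v drops=10 \
  'BEGIN { printf "drop%% = %f\n", 100 * drops / pkts }'
```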

So the next step was to locate where the drops were occurring: physical NIC vs bridge vs VM tap.


Step 10: Pinpoint drops with sysfs counters (vmbr0 vs enp1s0 vs tap)

On vmbr0 I had two relevant ports:

  • enp1s0 (physical)
  • tap101i0 (a VM NIC on vmbr0)

I used sysfs stats directly to get precise per-interface counters in a 10-second window:

bash -lc '
  snap() {
    for i in vmbr0 enp1s0 tap101i0; do
      echo "== $i =="
      cat /sys/class/net/$i/statistics/rx_packets \
          /sys/class/net/$i/statistics/rx_dropped
    done
  }
  snap
  sleep 10
  echo "--- after 10s ---"
  snap
'

Output (trimmed to the relevant parts):

== vmbr0 ==
20154043
10281439
== enp1s0 ==
628713222
340299
== tap101i0 ==
265801940
0
--- after 10s ---
== vmbr0 ==
20154062
10281451
== enp1s0 ==
628713997
340299
== tap101i0 ==
265802231
0

Compute the deltas:

  • vmbr0: rx_packets +19, rx_dropped +12
  • enp1s0: rx_packets +775, rx_dropped +0
  • tap101i0: rx_packets +291, rx_dropped +0

Interpretation: In that 10-second window, drops were not happening on the NIC and not happening on the VM tap. They were happening at the bridge logic level.

So the next step was obvious: look at what frames are actually present on vmbr0.


Step 11: tcpdump on the bridge reveals the “alien frames”

I captured 200 frames from vmbr0:

timeout 10 tcpdump -eni vmbr0 -c 200

In the capture, the majority of traffic looked normal: LAN clients talking to the internet, plus some multicast (SSDP, etc.). But there was a standout pattern:

  • Frames with Unknown ethertype 0x88e1
  • Frames with Unknown ethertype 0x8912

So I filtered for those ethertypes specifically:

timeout 10 tcpdump -eni vmbr0 'ether proto 0x88e1 or ether proto 0x8912' -c 200

Output (excerpt):

17:28:04.846933 38:10:d5:8a:e0:0c > 00:b0:52:00:00:01, ethertype Unknown (0x88e1), length 60:
17:28:04.847018 38:10:d5:8a:e0:0c > ff:ff:ff:ff:ff:ff, ethertype Unknown (0x88e1), length 60:
17:28:04.847157 38:10:d5:8a:e0:0c > ff:ff:ff:ff:ff:ff, ethertype Unknown (0x8912), length 60:

... repeats every ~2 seconds ...

15 packets captured
15 packets received by filter
0 packets dropped by kernel

At this point two facts were crucial:

  • 38:10:d5:8a:e0:0c is the FritzBox MAC (confirmed).
  • The destination 00:b0:52:00:00:01 did not look like a normal host MAC from my IPAM records.

So I checked whether the bridge ever learned that MAC address (FDB).

bridge fdb show | grep -i '00:b0:52:00:00:01'

Output was empty.

Interpretation: That destination MAC is not a “real learned host” behind a port. It’s most likely a protocol destination address used by the FritzBox for proprietary Layer-2 control frames: as far as I can tell, 00:B0:52 is the OUI of Intellon (the original HomePlug chipset maker), 0x88e1 is the registered ethertype for HomePlug AV, and 0x8912 likewise shows up on powerline gear, which fits AVM’s powerline/mesh discovery traffic. Linux labels these ethertypes “Unknown” because it has no decoder for them, and the bridge counts them as dropped because they are neither forwardable to any port nor claimed by the host stack.

That explains why vmbr0 rx_dropped can climb even when IP traffic is fine.


Step 12: Sanity test — is there real IP loss?

If these drops were actually hurting the network, it would show up in something simple: ping to the gateway.

ping -c 50 192.168.178.1

Output:

--- 192.168.178.1 ping statistics ---
50 packets transmitted, 50 received, 0% packet loss, time 50165ms
rtt min/avg/max/mdev = 0.219/0.287/0.597/0.067 ms

Conclusion: No IP packet loss. Sub-millisecond RTT. The network is healthy.

The big “drop” number was mostly the Linux bridge reporting that it is discarding frames it doesn’t deliver upward or forward — including the proprietary ethertype frames emitted by the FritzBox.


Conclusions and takeaways

1) Disabling bridge netfilter (bridge-nf) was a real improvement

In my setup, Proxmox isn’t meant to be a firewall. OPNsense handles segmentation and rules. With bridge-nf-call-iptables and bridge-nf-call-ip6tables enabled, bridged traffic gets pushed into netfilter paths, adding overhead and inflating counters. Turning it off reduced drop growth dramatically.

2) A large vmbr0 rx_dropped number does not automatically mean “packet loss”

On Linux bridges, “drops” can mean:

  • non-IP proprietary ethertypes discarded
  • frames not forwardable to any bridge port
  • multicast/broadcast/control-plane noise
  • filtering/bridge logic decisions

The correct way to interpret it is to measure real symptoms: ping loss, TCP retransmits, driver stats. Counters alone are not enough.

3) tcpdump was the turning point

Seeing periodic “Unknown ethertype” frames (0x88e1, 0x8912) from the FritzBox made the story coherent: the bridge was dropping frames that were never part of the IP traffic I care about.

4) Better monitoring signals than vmbr0 dropped

If you want meaningful health metrics, watch:

  • ethtool -S errors/drops on the physical NIC
  • ping loss/jitter to gateway and a stable external target
  • TCP retransmits / connection instability
  • CPU load or interrupt saturation during bursts (if suspected)

Quick copy/paste troubleshooting workflow

Start with bridge + NIC counters

ip -s link show dev vmbr0
ip -s link show dev enp1s0
ethtool -S enp1s0 | egrep -i 'drop|miss|err|fifo|over|no_buf|buffer|discard'

Check bridge netfilter sysctls

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.bridge.bridge-nf-call-arptables 2>/dev/null

If you’re not using host-level bridge filtering, disable bridge-nf

sysctl -w net.bridge.bridge-nf-call-iptables=0
sysctl -w net.bridge.bridge-nf-call-ip6tables=0

Persist the change on Proxmox/Debian

nano /etc/sysctl.d/99-bridge-nf.conf
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-arptables = 0
sysctl --system

When drops look scary: capture and identify “junk frames”

timeout 10 tcpdump -eni vmbr0 -c 200
timeout 10 tcpdump -eni vmbr0 'ether proto 0x88e1 or ether proto 0x8912' -c 200

Sanity check real packet loss

ping -c 50 192.168.178.1
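Finally, since bridge drop ratios are easy to misread, a tiny helper for honest delta math. It takes two (rx_packets, rx_dropped) snapshots and prints the deltas plus the ratio (a sketch; the sample values are the Step 10 vmbr0 snapshots):

```shell
# delta <pkts_before> <drops_before> <pkts_after> <drops_after>
# Feed it values read from /sys/class/net/<dev>/statistics/.
delta() {
  awk -v p0="$1" -v d0="$2" -v p1="$3" -v d1="$4" 'BEGIN {
    dp = p1 - p0; dd = d1 - d0
    printf "packets +%d, dropped +%d", dp, dd
    if (dp > 0) printf " (%.1f%% of counted packets)", 100 * dd / dp
    print ""
  }'
}
delta 20154043 10281439 20154062 10281451
```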

Final note

The key lesson for me wasn’t “drops are meaningless.” The key lesson was: drops are context-dependent. On a Linux bridge, rx_dropped can include frames that were never part of the IP traffic path you care about. If you treat every “drop” counter as packet loss, you’ll spend days fixing nothing.

In my case, the real optimization was disabling unnecessary bridge netfilter. The remaining drops were mostly the bridge discarding proprietary FritzBox Layer-2 frames, proven by tcpdump and disproven as a connectivity issue by a clean 50/50 ping test.