Skip to content

Conversation

@brada4
Copy link

@brada4 brada4 commented Feb 28, 2024

locate offload at the end of slowpath
... use builtin tcpudp filter in place of extra filter ... and directly yield to offload-add kworker

drop invalid asap and avoid further activity on useless packets ... which accidentally simplifies main state dispatch ... so make use of optimized output chain dispatch alternatives depending on global setting

Thanks-to: @CallMeR for tcpudp filter avoidance idea
Thanks-to: forum user kvic for detailed review and suggestions
Discussed: #20
Part-reverts: 19a8caf

Signed-Off-By: Andris PE neandris@gmail.com

locate offload at the end of slowpath
... use builtin tcpudp filter in place of extra filter
... and directly yield to offload-add kworker

drop invalid asap and avoid further activity on useless packets
... which accidentally simplifies main state dispatch
... so make use of optimized dispatch alternatives depending on global settings

Thanks-to: @CallMeR for tcpudp filter avoidance idea
Discussed: openwrt#20

Signed-Off-By: Andris PE <neandris@hmail.com>
@brada4
Copy link
Author

brada4 commented Feb 28, 2024

@jow- diff is identical to #20 , share if any (non-revolutionary) changes can improve it.
Diff visualisation misses logic change:
old:
filter.forward
if offload add flow
dispatch states
new:
filter.forward
if offload
dispatch states diverting to offload chain
else
dispatch states

@brada4
Copy link
Author

brada4 commented Feb 29, 2024

Dropping invalid packets over localhost would be swapping iif lo and ct state in output along removing iif != in new prerouting. I dont feel either way, so I maintained behaviour exactly.

brada4 added 2 commits March 2, 2024 11:45
As in old days, guilty not having idea on splitting state handling
earlier.
No need to consume CPU in default case for unrealistic corner case.
loopback invalid thus better dropped at ease.

Signed-off-by: Andris PE <neandris@gmail.com>
@brada4
Copy link
Author

brada4 commented Jun 3, 2024

@jow- this alters semantics for improved safety discarding invalid (out of state and bad checksum) packets before nat alg helpers.

firstly netfilter doc now has only vmap-y dispatch examples
secondly vmap includes "immediate" action in itself, as opposed to
setting bool in lookup and in separate bytecode insnis doing immediate or full action.
@brada4
Copy link
Author

brada4 commented Jun 6, 2024

@jow- made it vmap, netfilters own examples now has vmaps everywhere....

  • drop invalid early
  • change comments (not meant to obfuscate change)
  • use whole output lines in place of string concatenation, just to avoid superlong lines in code
  • add offload hash on first qualifying packet, not second, so that second packet is already offloaded.
    new connection -> first ack after synack sets hash (used to only set "assured" state and hash set at next packet)
    timer retirement -> hash is restored after mandatory slowpath (it is just state dispatch, not the full filter) for certain, not tried before mandatory slowpath traversal

brada4 added 4 commits June 8, 2024 14:30
Additionally since jump target is terminal no need to preserve callaback
and use goto in place of jump.
Suggested by forum user kvic at
https://forum.openwrt.org/t/first-rule-in-chain-input-output-for-firewall4/204723

Average latency is same, the jitter/distribution is halved, also max
latency conclusively reduced.
@brada4
Copy link
Author

brada4 commented Jul 24, 2024

@jow- hi, got nice pro feedback at https://forum.openwrt.org/t/first-rule-in-chain-input-output-for-firewall4/204723 and implemented best parts, 1 cosmetic 2 improves NAT performance by dozen hairs

@brada4
Copy link
Author

brada4 commented Jul 25, 2024

Also discovered that this adds easy flowtable exception via /e/n.d/ for more fifo-ish behaviour (still to dig up test case)

@jow-
Copy link
Contributor

jow- commented Jul 25, 2024

Should this PR drop commit a625924 since it is partially reverted in 5dc4d82 ?

@brada4
Copy link
Author

brada4 commented Jul 25, 2024

No, it should sray like this short simple.
1k evaluations on a pc totals to about same 7.abit ms for either but vmap version has broader deviation not explainable by any significant cpu consumption or absent in case network load.

@brada4
Copy link
Author

brada4 commented Jul 30, 2024

Yes, default configuration is revert (2 rules swapped tough)

@brada4
Copy link
Author

brada4 commented Mar 28, 2025

Ill split this in 2 pieces - 1/2 handling invalid packets early 2/2 jump-branching offload

@pesa1234
Copy link

Hi @brada4 this patch is still valid with latest update right?

@brada4
Copy link
Author

brada4 commented Mar 31, 2025

It is still valid. if i split it 2nd half has to be heavily re-based
other patches reduce non-mandatory packet examination in kind of default established,related accept adding some throughput to underpowered soc cpus

brada4 added a commit to brada4/firewall4 that referenced this pull request Apr 9, 2025
@feckert 's idea of pre-including rules before loopback
openwrt#55

Remove iif lo check from each packet
Part openwrt#22
Somewhat improving over
a5553da

Signed-off-by:
brada4 added a commit to brada4/firewall4 that referenced this pull request Apr 9, 2025
@feckert 's idea of pre-including rules before loopback
openwrt#55

Remove iif lo check from each packet
Part openwrt#22

Improves: a5553da

Signed-off-by: Andris PE <neandris@gmail.com>
@danpawlik
Copy link

That pull request might help Mediatek CPU device owners when HW offload is enabled.
@jow- , what @brada4 needs to do to make that PR to be merged?

@brada4
Copy link
Author

brada4 commented Apr 17, 2025

I am splitting this in 3 pieces, later today.

@brada4 brada4 changed the title Reorder early state dispatch for quicker outcome [2/3 overriden] Reorder early state dispatch for quicker outcome Apr 27, 2025
@glassd00r
Copy link

glassd00r commented Apr 28, 2025

@brada4 thank you.
my understanding from your posts is that this PR #22 will be replace by 3 separate PR.

  1. PR ruleset: apply loopback in/out rules later #56
  2. PR ruleset: drop invalid packets at mangle priority #59
  3. yet to come

correct?

@CallMeR
Copy link

CallMeR commented Apr 28, 2025

Based on my own usage and observation, I have identified some potential negative impacts (PR #59) :

  • Risk of misjudging legitimate traffic

Scenario: Certain special protocols (such as non-standard NAT traversal, legacy applications) or abnormal network environments (such as incorrect configuration of NAT devices) may cause normal traffic to be mislabeled as "invalid".

Impact: If legitimate traffic (such as UDP connections that are not correctly tracked, packets with failed fragmentation reassembly) is dropped, it may lead to connection interruptions or abnormal service operations.

  • Issues in handling fragmented packets

Mechanism: Conntrack relies on the first fragment to establish the connection state. If the first fragment is lost, subsequent fragments will be regarded as "invalid" and dropped.

Impact: For applications that rely on fragmented packet transmission (such as large file transfers, certain VPN protocols), it may cause performance problems or data loss. However, this is a normal protective behavior (subsequent packets without the initial fragment are inherently abnormal traffic).

@brada4
Copy link
Author

brada4 commented Apr 28, 2025

@CallMeR If you see the diagram conntrack state is classified at -200, last chance to make it valid (or notrack for more obvious usage) was respective raw table, e.g setting back same TTL to fix checksum.

@brada4
Copy link
Author

brada4 commented Apr 28, 2025

@glassd00r yes, thats correct.
3rd patch should be simple

+if offload devs > 0
+ ct state established related goto handle_offload
+else
 ct state established related accept
+endif
... later in file
+ if offload devs >0 
+ chain handle offload
+  add flow ... accept
+ accept

@glassd00r
Copy link

@brada4 sorry if this has been explained before. why break up pr #22 into 3 separate PR?
pros and cons?

@brada4
Copy link
Author

brada4 commented Apr 28, 2025

@glassd00r
Why one patch? It changes same lines over and over again
Bad: it looks messy in totality
Why layer up 3 patches
Good: simple few-line patches are easier to understand.

@glassd00r
Copy link

glassd00r commented Apr 28, 2025

@brada4 cool. will test and feedback after you push part3 of the PR.

same custom filter chains rules?

chain` handle_offload {
  # ct bytes < 1000000 counter accept
  # ct packets < 30 counter accept
  # meta l4proto udp counter accept
  # meta length < 100 counter accept
  # ct direction original counter accept
  # the fw4 offload table follows
  # flow add @ft accept
  # accept
}

@brada4
Copy link
Author

brada4 commented Apr 28, 2025

@glassd00r i remember writing those in forum, yep ill add example in /etc/nftables.d ;-)

@brada4
Copy link
Author

brada4 commented Apr 29, 2025

Last 3rd here:
https://github.com/brada4/firewall4/tree/guard-offload
ruleset.uc includes 2nd 3rd too.

@brada4
Copy link
Author

brada4 commented Apr 30, 2025

I am just trying to concentrate scattered discussions in one place.
https://forum.openwrt.org/t/lets-talk-about-firewall4-nftables/231246

@brada4
Copy link
Author

brada4 commented May 19, 2025

I am closing this
Somebody remember to submit 3rd part in two years
#22 (comment)

@brada4 brada4 closed this May 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants