Failures to apply L4WFPPROXY policy on Kubernetes nodes #2571

@zaharidichev

Hi there, we are using L4WFPPROXY to route traffic to and from a sidecar proxy that runs in a pod on a Windows node. We are running this setup on an AKS Kubernetes cluster whose nodes run image version AKSWindows-2022-containerd-20348.4297.251027.

Our workflow is essentially as follows:

  1. A CNI plugin is invoked for every newly created pod.
  2. The CNI resolves the pod metadata to an endpoint ID.
  3. An L4WFPPROXY policy is applied to that endpoint.

The policy that we add looks like:

{
  "FilterTuple": {
    "Protocols": "6"
  },
  "InboundExceptions": {
    "PortExceptions": [
      "4193",
      "4192"
    ]
  },
  "InboundProxyPort": "4143",
  "OutboundProxyPort": "4140",
  "Type": "L4WFPPROXY",
  "UserSID": "S-1-5-20"
}
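
For anyone trying to reproduce this, here is a minimal Python sketch that generates the same policy document. The function name `build_l4wfpproxy_policy` and its parameters are made up for illustration and are not part of any HNS or CNI API; the field names and values are copied from the JSON above, and protocol "6" is the IANA number for TCP.

```python
import json


def build_l4wfpproxy_policy(inbound_port, outbound_port, exception_ports,
                            user_sid="S-1-5-20"):
    """Build the L4WFPPROXY policy document shown above (hypothetical helper)."""
    # Reject anything that is not a valid TCP port before it reaches HNS.
    for port in (inbound_port, outbound_port, *exception_ports):
        if not 0 < int(port) < 65536:
            raise ValueError(f"invalid port: {port}")
    return {
        "FilterTuple": {"Protocols": "6"},  # 6 = TCP
        "InboundExceptions": {
            # The policy above carries ports as strings, so keep that shape.
            "PortExceptions": [str(p) for p in exception_ports],
        },
        "InboundProxyPort": str(inbound_port),
        "OutboundProxyPort": str(outbound_port),
        "Type": "L4WFPPROXY",
        "UserSID": user_sid,
    }


policy = build_l4wfpproxy_policy(4143, 4140, [4193, 4192])
policy_json = json.dumps(policy, indent=2)
```

The same document is generated for every endpoint; only the endpoint it is attached to changes.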

So from my perspective it looks quite vanilla. The problem is that after a certain number of policies have been added (~15), policy application starts failing even though the API call itself succeeds. It works for the first 10 pods or so and then starts failing at random. We observe the error when scaling a single workload to multiple replicas, so there is no apparent difference between the affected workloads/endpoints. The behavior is isolated to a single node.

If I run Get-WinEvent -ProviderName "Microsoft-Windows-Host-Network-Service" | Where-Object { $_.Message -like "*HNS-Policy-Apply*" } | ConvertTo-Json -Depth 20, I get the following for a successful policy application:

{
  "Id": 1057,
  "Version": 0,
  "Qualifiers": null,
  "Level": 4,
  "Task": 0,
  "Opcode": 0,
  "Keywords": -9223372036854775808,
  "RecordId": 41,
  "ProviderName": "Microsoft-Windows-Host-Network-Service",
  "ProviderId": "0c885e0d-6eb6-476c-a048-2457eed3a5c1",
  "LogName": "Microsoft-Windows-Host-Network-Service-Admin",
  "ProcessId": 3500,
  "ThreadId": 5732,
  "MachineName": "akswin000000",
  "UserId": {
    "BinaryLength": 12,
    "AccountDomainSid": null,
    "Value": "S-1-5-18"
  },
  "TimeCreated": "/Date(1765202244020)/",
  "ActivityId": null,
  "RelatedActivityId": null,
  "ContainerLog": "Microsoft-Windows-Host-Network-Service-Admin",
  "MatchedQueryIds": [],
  "Bookmark": {},
  "LevelDisplayName": "Information",
  "OpcodeDisplayName": "Info",
  "TaskDisplayName": null,
  "KeywordsDisplayNames": [],
  "Properties": [
    {
      "Value": "HNS-Policy-Apply"
    },
    {
      "Value": "ba96c595-9544-4b97-9195-a163bc92818e"
    },
    {
      "Value": "358bad56-900f-4dc7-8f12-9e9009279bdc"
    },
    {
      "Value": 17
    },
    {
      "Value": 0
    }
  ],
  "Message": "HNS-Policy-Apply :- \r\n Endpoint id = '{ba96c595-9544-4b97-9195-a163bc92818e}'.\r\n  Network id = '{358bad56-900f-4dc7-8f12-9e9009279bdc}'.\r\n  Policy type = 'L4WFPPROXY'.\r\n  Result code = '0x0'."
}

But often I get the following instead:

{
  "Id": 1056,
  "Version": 0,
  "Qualifiers": null,
  "Level": 2,
  "Task": 0,
  "Opcode": 0,
  "Keywords": -9223372036854775808,
  "RecordId": 3863,
  "ProviderName": "Microsoft-Windows-Host-Network-Service",
  "ProviderId": "0c885e0d-6eb6-476c-a048-2457eed3a5c1",
  "LogName": "Microsoft-Windows-Host-Network-Service-Admin",
  "ProcessId": 3500,
  "ThreadId": 972,
  "MachineName": "akswin000000",
  "UserId": {
    "BinaryLength": 12,
    "AccountDomainSid": null,
    "Value": "S-1-5-18"
  },
  "TimeCreated": "/Date(1765204269427)/",
  "ActivityId": null,
  "RelatedActivityId": null,
  "ContainerLog": "Microsoft-Windows-Host-Network-Service-Admin",
  "MatchedQueryIds": [],
  "Bookmark": {},
  "LevelDisplayName": "Error",
  "OpcodeDisplayName": "Info",
  "TaskDisplayName": null,
  "KeywordsDisplayNames": [],
  "Properties": [
    {
      "Value": "HNS-Policy-Apply"
    },
    {
      "Value": "e12c4dae-567b-459d-948a-4b7dc72b119b"
    },
    {
      "Value": "358bad56-900f-4dc7-8f12-9e9009279bdc"
    },
    {
      "Value": 17
    },
    {
      "Value": -2144206793
    }
  ],
  "Message": "HNS-Policy-Apply :- \r\n Endpoint id = '{e12c4dae-567b-459d-948a-4b7dc72b119b}'.\r\n  Network id = '{358bad56-900f-4dc7-8f12-9e9009279bdc}'.\r\n  Policy type = 'L4WFPPROXY'.\r\n  Result code = '0x80320037'."
}
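
The signed value in the last Properties entry (-2144206793) and the hex result code in the message (0x80320037) are the same number. A rough Python sketch of the conversion follows; it also splits the HRESULT into its standard facility and code fields and tallies result codes across exported events. The function names are my own, and the assumption that Properties[4] always holds the result code is based only on the two events shown above.

```python
from collections import Counter


def decode_hresult(value):
    """Split a signed 32-bit HRESULT into its standard bit fields."""
    unsigned = value & 0xFFFFFFFF  # reinterpret the signed int as unsigned
    return {
        "hex": f"0x{unsigned:X}",
        "failed": bool(unsigned >> 31),        # severity bit
        "facility": (unsigned >> 16) & 0x7FF,  # 11-bit facility field
        "code": unsigned & 0xFFFF,             # low 16 bits
    }


def tally_results(events):
    """Count HNS-Policy-Apply events per hex result code.

    Assumes each event is a dict shaped like the ConvertTo-Json output
    above, with the result code in Properties[4].
    """
    return Counter(
        decode_hresult(event["Properties"][4]["Value"])["hex"]
        for event in events
    )


failure = decode_hresult(-2144206793)
```

For the failing event this gives facility 50 (0x32), which winerror.h defines as FACILITY_WFP, and code 0x37, so the failure seems to originate in the Windows Filtering Platform layer rather than in HNS itself.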

I wonder whether this is a problem with my configuration or with the way I am applying the policy, or whether I am hitting some internal threshold. I would expect to be able to apply more than 15 policies of this type. Any pointers would be greatly appreciated.
