PXE server: fix WinPE re-image SMB connection loss

WinPE clients re-imaging the same machine hit "System error 53 -
network path not found" on the second attempt. systemctl restart smbd
did not help; only a full server power cycle cleared the state.

Root cause is the kernel's nf_conntrack table: the default TCP
ESTABLISHED timeout is 5 days (432000s), so when a client reboots
abnormally during the first WinPE run, its session leaves an ASSURED
ESTABLISHED entry behind. ufw's state-tracking rules then mis-classify
the client's new SYN against that stale entry.

Fix applied in three layers:
- /etc/sysctl.d/99-pxe-conntrack.conf drops TCP ESTABLISHED timeout
  to 1 hour and shortens the half-closed states to 30s each.
- smb.conf gains socket options TCP_NODELAY SO_KEEPALIVE IPTOS_LOWDELAY
  plus keepalive = 30 (seconds) and deadtime = 5 (minutes). Active
  sessions refresh the conntrack timer every 30s via keepalives so they
  never age out; dead ones expire in an hour.
- /usr/local/sbin/smb-diag.sh snapshots kernel + Samba state for
  remote diagnosis; /usr/local/sbin/smb-soft-reset.sh walks a
  progressive recovery (nmbd/smbd restart, conntrack flush, arp
  flush, ss -K) as an alternative to power-cycling.
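
The progressive recovery in the last bullet could look roughly like the
sketch below. It is an illustration only, not the shipped
smb-soft-reset.sh: the run() and soft_reset() helpers, the DRY_RUN
switch, the per-client scoping and the 192.0.2.10 placeholder address
are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a progressive SMB recovery; not the shipped script.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

soft_reset() {
    local client_ip="$1"   # address of the WinPE client that cannot reconnect

    # Step 1: restart the Samba daemons.
    run systemctl restart nmbd smbd
    # Step 2: delete conntrack entries whose original source is the client
    # (assumption: scoped to one client rather than a full table flush).
    run conntrack -D -s "$client_ip"
    # Step 3: flush the neighbour (ARP) entry for the client.
    run ip neigh flush to "$client_ip"
    # Step 4: destroy leftover kernel TCP sockets to the client on port 445.
    run ss -K dst "$client_ip" sport = :445
}

soft_reset "${1:-192.0.2.10}"
```

With DRY_RUN=0 the commands need root. ss -K only works on kernels
built with CONFIG_INET_DIAG_DESTROY, and conntrack -D exits non-zero
when no entry matched, so a real script would likely check the result
of each step before moving on to the next.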

conntrack package added to download-packages.sh and playbook verify
list so the offline .deb bundle ships with it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cproudlock
2026-04-11 13:00:43 -04:00
parent ee7d3bad66
commit 18537acbbc
5 changed files with 223 additions and 0 deletions

@@ -17,6 +17,7 @@
        - cron
        - ansible
        - wimtools
        - conntrack
      register: pkg_check
      failed_when: false
      changed_when: false
@@ -364,6 +365,23 @@
          wide links = yes
          unix extensions = no

    - name: "Samba SMB session handling for WinPE re-image robustness"
      blockinfile:
        path: /etc/samba/smb.conf
        backup: yes
        marker: "# {mark} MANAGED - PXE REIMAGE FIX"
        insertafter: "# END MANAGED - GLOBAL SYMLINKS"
        block: |
          # Reduce the chance a WinPE client rebooting mid-imaging leaves a
          # stale session on the server that blocks its next connection
          # attempt with "System error 53 network path not found". Combined
          # with /etc/sysctl.d/99-pxe-conntrack.conf (shorter nf_conntrack
          # TCP timeouts) this keeps the conntrack + smbd state in sync with
          # the short-lived flows that PXE imaging produces.
          socket options = TCP_NODELAY SO_KEEPALIVE IPTOS_LOWDELAY
          keepalive = 30
          deadtime = 5

    - name: "Configure Samba shares"
      blockinfile:
        path: /etc/samba/smb.conf
@@ -427,6 +445,22 @@
        executable: /bin/bash
      changed_when: false

    - name: "Deploy nf_conntrack TCP timeout sysctl for PXE workload"
      copy:
        src: "{{ usb_mount }}/pxe-server-helpers/99-pxe-conntrack.conf"
        dest: /etc/sysctl.d/99-pxe-conntrack.conf
        mode: '0644'
      notify: reload sysctl

    - name: "Deploy SMB diagnostic + soft-reset helper scripts"
      copy:
        src: "{{ usb_mount }}/pxe-server-helpers/{{ item }}"
        dest: "/usr/local/sbin/{{ item }}"
        mode: '0755'
      loop:
        - smb-diag.sh
        - smb-soft-reset.sh

    - name: "Create image-type top-level directories"
      file:
        path: "{{ samba_share }}/{{ item }}"
@@ -784,3 +818,6 @@
  handlers:
    - name: "Apply netplan"
      command: netplan apply

    - name: "reload sysctl"
      command: sysctl --system
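
The contents of 99-pxe-conntrack.conf are not part of the hunks shown
here. Going only by the commit message (ESTABLISHED down to 1 hour,
half-closed states at 30s each), the drop-in might look like what the
sketch below prints. The render_conntrack_conf helper is hypothetical,
and which sysctl keys count as "half-closed" is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print a sysctl drop-in matching the commit message.
# The real 99-pxe-conntrack.conf is not shown in this diff.
render_conntrack_conf() {
    cat <<'EOF'
# TCP ESTABLISHED: kernel default is 432000s (5 days); reduced to 1 hour.
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
# Assumed set of "half-closed" states, 30s each.
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
EOF
}

render_conntrack_conf
```

Redirecting the output to /etc/sysctl.d/99-pxe-conntrack.conf and then
running sysctl --system (what the "reload sysctl" handler does) would
apply it. The net.netfilter.* keys only exist once the nf_conntrack
module is loaded; sysctl net.netfilter.nf_conntrack_tcp_timeout_established
shows the value currently in effect.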