WinPE clients re-imaging the same machine hit "System error 53 - network path not found" on the second attempt. systemctl restart smbd did not help; only a full server power cycle cleared the state.

Root cause is kernel nf_conntrack: the default TCP ESTABLISHED timeout is 5 days (432000s). When a client from the first WinPE run reboots abnormally, its session leaves an ASSURED ESTABLISHED entry behind, and ufw's state-tracking rules then misclassify the client's new SYN against that stale entry.

Fix applied in three layers:

- /etc/sysctl.d/99-pxe-conntrack.conf drops the TCP ESTABLISHED timeout to 1 hour and shortens the half-closed states to 30s each.
- smb.conf gains "socket options = TCP_NODELAY SO_KEEPALIVE IPTOS_LOWDELAY" plus "keepalive = 30" (seconds) and "deadtime = 5" (minutes). Keepalives refresh the conntrack timer of every active session every 30s, so live sessions never age out; entries for dead sessions expire within an hour.
- /usr/local/sbin/smb-diag.sh snapshots kernel and Samba state for remote diagnosis; /usr/local/sbin/smb-soft-reset.sh walks through a progressive recovery (nmbd/smbd restart, conntrack flush, ARP flush, ss -K) as an alternative to power-cycling.

The conntrack package is added to download-packages.sh and to the playbook verify list so the offline .deb bundle ships with it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
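The sysctl layer of the fix can be sketched in Bash as below. The commit message does not name the exact keys, so treating conntrack's FIN_WAIT and CLOSE_WAIT timeouts as the "half-closed states" is an assumption, and write_conntrack_dropin and apply_conntrack_dropin are hypothetical helper names, not functions from the repository.

```bash
#!/bin/bash
# Sketch of the sysctl layer of the fix (not the shipped 99-pxe-conntrack.conf).
# ASSUMPTION: "half-closed states" = conntrack FIN_WAIT and CLOSE_WAIT.

# Hypothetical helper: write the drop-in to the path given as $1.
write_conntrack_dropin() {
    local conf=$1
    cat > "$conf" <<'EOF'
# Kernel default is 432000s (5 days); stale SMB sessions from crashed
# WinPE clients now expire after 1 hour instead.
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
# Half-closed states: 30s each (assumed key set).
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
EOF
}

# Hypothetical helper: install the drop-in and load it now (run as root).
apply_conntrack_dropin() {
    local conf=/etc/sysctl.d/99-pxe-conntrack.conf
    # The net.netfilter.* keys only exist once nf_conntrack is loaded.
    modprobe nf_conntrack || return 1
    write_conntrack_dropin "$conf" || return 1
    # Re-read every sysctl.d drop-in; errors from unrelated keys in other
    # files are not treated as fatal here.
    sysctl --system >/dev/null
    # Print the live value to confirm the new timeout took effect.
    sysctl net.netfilter.nf_conntrack_tcp_timeout_established
}
```

Sourcing the file and running apply_conntrack_dropin as root should end by printing net.netfilter.nf_conntrack_tcp_timeout_established = 3600. Because the net.netfilter.* keys do not exist until nf_conntrack is loaded, it is worth checking the same value again after a reboot.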
86 lines
2.2 KiB
Bash
Executable File
#!/bin/bash
# smb-diag.sh - snapshot Samba + kernel network state so a future failure
# can be diagnosed remotely. Run this on the PXE server BEFORE power-cycling
# when a WinPE re-image client is getting "cannot connect" errors.
#
# Output: /tmp/smb-diag-<timestamp>.log (pastebin-friendly)
#
# Captures: smbd processes, open SMB sessions, port 445 TCP sockets,
# conntrack, arp, bridge fdb, dnsmasq leases, recent smbd logs.

set -o pipefail

TS=$(date +%Y%m%d-%H%M%S)
OUT=/tmp/smb-diag-$TS.log

exec > >(tee "$OUT") 2>&1

echo "=============================================================="
echo "SMB diagnostic snapshot - $(date)"
echo "=============================================================="

echo
echo "### uptime / kernel ###"
uptime
uname -r

echo
echo "### interfaces + bridge state ###"
ip -brief addr
echo
bridge link show 2>/dev/null
echo
bridge fdb show 2>/dev/null | head -30

echo
echo "### smbd process tree ###"
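# Only call pstree with a real PID: MainPID is 0 when smbd is stopped, and an
# empty value would make `pstree -p` print the entire system process tree.
SMBD_PID=$(systemctl show -p MainPID --value smbd 2>/dev/null)
if [ -n "$SMBD_PID" ] && [ "$SMBD_PID" != "0" ]; then
    pstree -p "$SMBD_PID" 2>/dev/null
else
    echo "smbd main PID not found (MainPID=${SMBD_PID:-none})"
fi
echo
ps -eo pid,ppid,state,command | grep -E 'smbd|nmbd' | grep -v grep

echo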
echo "### systemctl status ###"
systemctl is-active smbd nmbd dnsmasq apache2

echo
echo "### smbstatus ###"
smbstatus 2>&1 | head -40

echo
echo "### port 445 sockets ###"
ss -tnp 2>/dev/null | grep :445

echo
echo "### conntrack entries for PXE subnet ###"
if command -v conntrack >/dev/null 2>&1; then
    conntrack -L 2>&1 | grep -E '10\.9\.100' | head -30
    echo "total conntrack entries: $(conntrack -C 2>&1)"
else
    echo "conntrack tool not installed"
fi

echo
echo "### arp / neighbour table for PXE subnet ###"
ip neigh show 2>/dev/null | grep -E '10\.9\.100|br-pxe'

echo
echo "### dnsmasq DHCP leases ###"
cat /var/lib/misc/dnsmasq.leases 2>/dev/null | head -20

echo
echo "### recent smbd log files ###"
ls -la /var/log/samba/ 2>/dev/null | head -20

echo
echo "### recent smbd auth / status errors (all machine logs) ###"
grep -hE 'NT_STATUS|error|denied' /var/log/samba/log.*.log 2>/dev/null | tail -30

echo
echo "### last 20 lines of smbd master log ###"
tail -20 /var/log/samba/log.smbd 2>/dev/null

echo
echo "=============================================================="
echo "Snapshot saved to $OUT"
echo "=============================================================="
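The conntrack section of the script greps the whole table for 10.9.100 but does not single out the stale ASSURED ESTABLISHED entries the commit message blames. A minimal awk filter for that is sketched below, assuming the PXE subnet is 10.9.100.0/24 (as the script's grep suggests) and conntrack's default one-line-per-flow output; stale_smb_entries is a hypothetical helper, not part of smb-diag.sh.

```bash
#!/bin/bash
# Sketch: list SMB flows from the PXE subnet that conntrack still tracks as
# ASSURED ESTABLISHED, with the seconds left before each entry expires.
# ASSUMPTIONS: subnet 10.9.100.0/24; default `conntrack -L` line layout
# ("tcp 6 <timeout> <STATE> src=... dst=... sport=... dport=... ...").

# Hypothetical helper: reads conntrack -L output on stdin and prints
# "<client-ip> <client-port> <seconds-left>" for each matching flow.
stale_smb_entries() {
    awk '
        $1 == "tcp" && $4 == "ESTABLISHED" && /\[ASSURED\]/ {
            src = ""; sport = ""; dport = ""
            # The first src=/sport=/dport= fields are the original
            # (client to server) direction; later ones are the reply tuple.
            for (i = 5; i <= NF; i++) {
                if (src == "" && $i ~ /^src=/)     src = substr($i, 5)
                if (sport == "" && $i ~ /^sport=/) sport = substr($i, 7)
                if (dport == "" && $i ~ /^dport=/) dport = substr($i, 7)
            }
            if (dport == "445" && src ~ /^10\.9\.100\./)
                print src, sport, $3
        }'
}
```

Typical use is conntrack -L -p tcp 2>/dev/null | stale_smb_entries. Comparing the client IPs and remaining seconds against the dnsmasq leases captured by the script is one way to spot sessions left behind by a client that has since rebooted.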