Commit Graph

3 Commits

Author SHA1 Message Date
cproudlock
ce604adcda Renumber PXE LAN from 10.9.100.0/24 to 172.16.9.0/24
Single-site bay-stuck issue at WJ: the GE Intune Report IP script filters
Get-NetIPAddress output on StartsWith("10.") and posts every match
to the GE Tines webhook. Bays at WJ have their PXE LAN 10.9.100.x IP
captured and reported -> GE backend tags the bays as on a non-corp 10.x
subnet -> dynamic group eligibility for the SFLD policy never matches.
Other GE sites work because their PXE LANs are not on 10.x at all.

Renumber the PXE LAN to 172.16.9.0/24 (still RFC1918, but outside
10.0.0.0/8) so the GE filter naturally skips wired PXE addresses
without any disable-NIC dance.
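
The effect of the renumber on a prefix filter can be sketched in shell.
This is an illustration only: the real GE script is PowerShell and is not
reproduced here, and matches_ge_filter is a hypothetical name that mimics
a StartsWith("10.") test.

```shell
#!/bin/sh
# Hypothetical stand-in for the GE script's StartsWith("10.") check.
matches_ge_filter() {
    case "$1" in
        10.*) return 0 ;;   # address would be posted to the webhook
        *)    return 1 ;;   # address is skipped by the filter
    esac
}

# Compare an old PXE LAN address with a renumbered one.
for ip in 10.9.100.23 172.16.9.23; do
    if matches_ge_filter "$ip"; then
        echo "$ip reported"
    else
        echo "$ip skipped"
    fi
done
```

The old-range address is reported and the renumbered one is skipped, which
is the behaviour the renumber relies on.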

Server-side already in flight (netplan dual-bound, dnsmasq scope +
boot URL repointed, blancco preferences + grub.cfg + iPXE GetPxeScript
all sed'd to 172.16.9.1). This commit is the playbook / scripts /
docs side: 109 hits across 35 files sed'd in one shot.
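
A bulk rewrite of this kind might look like the sketch below. The exact
command used for this commit is not recorded here; renumber_pxe is a
hypothetical helper, and the leading (^|[^0-9]) guard is an added
assumption so that an address such as 110.9.100.7 is left alone.

```shell
#!/bin/sh
# Rewrite every 10.9.100.x reference under directory $1 to 172.16.9.x.
renumber_pxe() {
    tmp=$(mktemp)   # scratch file kept outside the tree being rewritten
    grep -rlF '10.9.100.' "$1" | while IFS= read -r f; do
        # Write through a scratch file so the original keeps its mode bits.
        sed -E 's/(^|[^0-9])10\.9\.100\./\1172.16.9./g' "$f" > "$tmp" &&
            cat "$tmp" > "$f"
    done
    rm -f "$tmp"
}

# Demo on a throwaway directory rather than a real checkout.
demo=$(mktemp -d)
printf 'server=10.9.100.1\nrange=10.9.100.0/24\n' > "$demo/sample.conf"
renumber_pxe "$demo"
cat "$demo/sample.conf"
rm -rf "$demo"
```

On a GNU userland, sed -i would shorten the loop body; the scratch-file
form is used here only to keep the sketch portable.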

After this lands + boot.wim is rebuilt + bays renumber off DHCP,
the 10.9.100.1 binding will be dropped from netplan as the final
cleanup step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:30:32 -04:00
cproudlock
0292bc01ad Auto-flush stale SMB/conntrack state on DHCP lease, one-source PPKG model
Three changes that go together so a re-image never hits "System error 53":

1. dnsmasq dhcp-script hook (playbook/pxe-server-helpers/pxe-dhcp-hook.sh)
   Fires on every add/del lease event. Runs conntrack -D and ss -K for the
   client IP so any stale ESTABLISHED SMB session from a previous boot is
   cleared before the client reconnects. Runs as root (dnsmasq default).
   Wired into /etc/dnsmasq.conf via dhcp-script= directive in the playbook.

2. One-source PPKG (playbook/startnet.cmd + startnet-template.cmd)
   The 5 per-Office PPKG copies were bit-for-bit identical; only the
   filenames differed, because BPRT parses Office and Region out of the name.
   Store one source file (e.g. GCCH_Prod_SFLD_v4.11.ppkg) and construct
   the BPRT-tagged target filename at menu-selection time from variables:
     SOURCE_PPKG / PPKG_VER / PPKG_EXP / REGION / OFFICE
   copy /Y "Y:\ppkgs\%SOURCE_PPKG%" "W:\Enrollment\%PPKG%"
   Bumped PPKG_VER v4.10 -> v4.11 and PPKG_EXP 20260430 -> 20270430.
   Saves ~30G on disk per version.

3. run-enrollment.ps1, already committed in 5a9c3db, uses provtool.exe
   directly (avoiding the PowerShell cmdlet's 180s timeout). Included here
   because it is part of the same end-to-end PPKG path.
2026-04-15 09:03:16 -04:00
cproudlock
18537acbbc PXE server: fix WinPE re-image SMB connection loss
WinPE clients re-imaging the same machine hit "System error 53 -
network path not found" on the second attempt. systemctl restart smbd
did not help; only a full server power cycle cleared the state.

Root cause is kernel nf_conntrack: the default TCP ESTABLISHED timeout
is 5 days (432000s), so a session from the first WinPE run whose
client rebooted abnormally leaves an ASSURED ESTABLISHED entry behind.
ufw's state-tracking rules then mis-classify the new SYN against that
stale entry.

Fix applied in three layers:
- /etc/sysctl.d/99-pxe-conntrack.conf drops TCP ESTABLISHED timeout
  to 1 hour and shortens the half-closed states to 30s each.
- smb.conf gains socket options TCP_NODELAY SO_KEEPALIVE IPTOS_LOWDELAY
  plus keepalive = 30 and deadtime = 5. Active sessions refresh the
  conntrack timer every 30s via keepalives so they never age out;
  dead ones expire in an hour.
- /usr/local/sbin/smb-diag.sh snapshots kernel + Samba state for
  remote diagnosis; /usr/local/sbin/smb-soft-reset.sh walks a
  progressive recovery (nmbd/smbd restart, conntrack flush, arp
  flush, ss -K) as an alternative to power-cycling.
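
The sysctl layer above could be rendered along these lines. Only the
1 hour ESTABLISHED timeout and the 30s figure come from this message;
treating fin_wait and close_wait as the "half-closed states" is an
assumption, and render_conntrack_conf is a hypothetical helper.

```shell
#!/bin/sh
# Print a candidate /etc/sysctl.d/99-pxe-conntrack.conf to stdout.
render_conntrack_conf() {
    established=$((60 * 60))   # 1 hour, in seconds
    cat <<EOF
# kernel default for ESTABLISHED: 432000s = $((432000 / 86400)) days
net.netfilter.nf_conntrack_tcp_timeout_established = $established
# assumed meaning of "half-closed states", 30s each
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
EOF
}

render_conntrack_conf
```

Redirecting the output into the sysctl.d file and running "sysctl --system"
as root would apply it; the sketch only prints so it can be reviewed first.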

conntrack package added to download-packages.sh and playbook verify
list so the offline .deb bundle ships with it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 13:00:43 -04:00