Commit Graph

40 Commits

Author SHA1 Message Date
cproudlock
a165a79f95 imaging: force shopfloor unattend deploy (was force:no -> went stale)
The shopfloor unattend deploy used force:no, so once a live copy existed the
playbook never overwrote it. That let the live gea-shopfloor unattend drift for
weeks - missing the Fetch + Verify-And-Heal staging steps - which is why imaging
lost payloads (CMM bundle/backups). Flip to force:yes so the repo stays the
source of truth, matching the standard/engineer unattend task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 10:12:06 -04:00
cproudlock
db6be99d43 samba: deadtime=0 (was 5) so WIM-apply idle does not drop the staging mount
WinPE maps the enrollment share early, then idles for minutes while the WIM
applies. deadtime=5 dropped that idle session, so the post-apply staging
copies failed (bay left with only site-config.json staged). VM test against
the live share copied everything fine on a fresh mount; the only difference
was the idle. deadtime=0 disables idle auto-disconnect. Applied live to the
current box (smbd reloaded); this makes it permanent + covers the second box
on the next playbook run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 09:04:37 -04:00
cproudlock
ce604adcda Renumber PXE LAN from 10.9.100.0/24 to 172.16.9.0/24
Single-site bay-stuck issue at WJ: GE Intune Report IP script filters
Get-NetIPAddress on StartsWith("10.") and posts everything matching
to the GE Tines webhook. Bays at WJ get the PXE LAN 10.9.100.x IP
captured and reported -> GE backend tags bays as on a non-corp 10.x
subnet -> dynamic group eligibility for SFLD policy never matches.
Other GE sites work because their PXE LANs aren't on 10.x at all.

Renumber PXE LAN to RFC1918 172.16.9.0/24 so the GE filter naturally
skips wired PXE addresses without any disable-NIC dance.

Server-side already in flight (netplan dual-bound, dnsmasq scope +
boot URL repointed, blancco preferences + grub.cfg + iPXE GetPxeScript
all sed'd to 172.16.9.1). This commit is the playbook / scripts /
docs side: 109 hits across 35 files sed'd in one shot.

After this lands + boot.wim is rebuilt + bays renumber off DHCP,
the 10.9.100.1 binding will be dropped from netplan as the final
cleanup step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:30:32 -04:00
cproudlock
a57ed5fd96 winpe: externalize WinPE-phase status push to scripts/winpe-status-push.ps1
The inline one-liner in startnet.cmd called Get-NetAdapter, which is
not available in WinPE's stripped PowerShell (no NetTCPIP module).
Errors silently swallowed by the surrounding try/catch - POST never
fired, dashboard never showed bays during the WIM-apply phase.

Externalize to a standalone .ps1 on the enrollment share:

  * Uses wmic (always present in WinPE 10+) for both serial AND mac
    instead of Get-CimInstance / Get-NetAdapter.
  * Logs every step to X:\Windows\Temp\winpe-status-push.log so a
    future "POST didn't fire" debug is one file read away.
  * startnet.cmd now just runs powershell -File Y:\scripts\winpe-status-
    push.ps1. Future edits to the push logic do NOT require a boot.wim
    rebuild; just edit the .ps1 on the share.

Mirror the existing pattern for run-enrollment.ps1 / wait-for-internet.ps1
/ migrate-to-wifi.ps1 (all already at /srv/samba/enrollment/scripts/).
Add the new file to the playbook's enrollment-scripts copy loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 14:05:50 -04:00
cproudlock
974accf98a blancco: fix silent prefs fallback, suspend trap, display blank + add View
End-to-end fixes for Blancco Drive Eraser PXE flow uncovered by chasing
"reports never reach SMB share" across two air-gapped sites:

playbook/blancco-init.sh:
  * Drop silent || true on wget of preferences.xml + config.xml. Fail
    loud with shell-drop if download or marker grep fails. Background:
    airootfs /opt/scripts/validate_preferences.sh restores
    /albus/preferences.save (factory defaults, empty network_share) if
    xmllint fails. wget failure made every report silently land nowhere.
  * Clobber /albus/preferences.save with the same served file so even if
    the validator fallback fires, the SMB target survives.
  * Bind-mount /dev/null over /sys/power/{state,disk,mem_sleep,autosleep}
    before switch_root. Albus's license-retry path writes /sys/power/state
    directly (bypassing systemd targets); this is the last-line block.
  * /dev/null symlinks for sleep/suspend/hibernate systemd targets in the
    airootfs overlay + logind drop-in with IdleAction/Handle*=ignore.
    Three independent layers because cmdline systemd.mask alone is bypassed
    by direct /sys/power/state writes.
  * xinitrc.d/00-no-screen-blank.sh runs xset s off -dpms + setterm
    -blank 0 -powerdown 0 so the Blancco GUI doesn't blank during long
    erasures.
  * Removed the 20-failsafeDriver.conf "modesetting" pin. modesetting
    needs DRM/KMS which we disable on kernel cmdline; "vesa" also failed
    on NVIDIA. With the pin gone Xorg auto-picks fbdev which uses the
    kernel framebuffer from vga=normal - works across Intel, AMD, and
    older NVIDIA without nouveau.

playbook/pxe_server_setup.yml:
  * dnsmasq.conf: explicit empty-value dhcp-option=3 + dhcp-option=6.
    Without them, dnsmasq defaults to sending its own IP as router AND
    DNS. Commenting the configured-value lines did NOT disable the push
    (root cause of "wired keeps picking up 10.9.100.1 as gateway").
  * Split the Blancco config.img extraction and preferences.xml deploy
    into separate tasks. The previous shell-with-creates: gate caused
    playbook re-runs to skip the prefs deploy entirely after first run.
  * Added a validation task that runs python3 xml parse + grep on the
    deployed preferences.xml to fail the playbook at deploy time if the
    SMB markers are missing.
  * Added Environment=TZ=America/New_York to the pxe-webapp systemd
    service so report mtimes and audit log render in Eastern time even
    if the Python process is started before timedatectl converges.

webapp:
  * services/blancco_report.py: parse Blancco's XML report format
    (recursive <entries name="..."> walker) into a friendly dict.
  * templates/report_view.html: Bootstrap "Drive Erasure Certificate"
    layout - hero summary, customer + system cards, per-drive cards with
    step-by-step erasure timeline, document signing footer with
    integrity hash detail.
  * /reports/view/<filename> route + View button on the reports list
    (XML reports only; PDFs still download).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 07:38:54 -04:00
cproudlock
adc8d50e66 pxe: arch-aware NBP + undionly.kpxe for legacy BIOS clients
Legacy-BIOS PXE clients booting Blancco reported "NBP is too big to
fit in free base memory". Cause: dnsmasq unconditionally served
ipxe.efi (~675KB EFI binary) which legacy BIOS PXE ROMs cannot
execute and which exceeds their NBP cap.

Fix:
- Add undionly.kpxe (~70KB BIOS-mode iPXE, from boot.ipxe.org).
- dnsmasq: dhcp-match on option:client-arch,0 (BIOS) -> undionly.kpxe;
  default (everything else, including UEFI x86_64 arch 7 and 9) keeps
  getting ipxe.efi. Tag form is reversible: if the match fails to
  evaluate, fallback is the working EFI path, not the new binary.
- Ansible TFTP-copy loop: mirror undionly.kpxe alongside ipxe.efi.
- .gitignore exception: track the open-source kpxe binary so the
  air-gapped USB build stays self-contained.

UEFI clients unchanged. Blancco/Clonezilla/WinPE chain after the
iPXE menu is identical regardless of which iPXE variant delivered it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:13:44 -04:00
cproudlock
4d6438285b playbook: document USB-C 5 Gbps NIC bridge override on netplan task
The "Configure static IP for PXE interface" task writes a flat
single-NIC netplan config. Live PXE server (10.9.100.1) overrides
this with a bridge config bonding the USB-C 5 Gbps NIC and the
onboard NIC into br-pxe, because the onboard NIC alone cannot
sustain the imaging throughput required by the floor.

Add a comment block above the task warning that re-running it on
a box already using the bridge config will replace the bridge with
a flat single-iface config and likely break the PXE LAN. The full
bridge YAML is included for reference. Recovery is via
/etc/netplan/50-cloud-init.yaml.pre-gold-swap (preserved by netplan
backup).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:23:37 -04:00
cproudlock
ce3fbf5a28 sweep: pre-existing drift + matrix UDC entry + ignore 142MB EXE
Bundles drift left uncommitted from prior sessions and the UDC matrix
verify entry added today.

Drift items (all per session-progress.md, completed in earlier sessions
but never staged):

- playbook/check-bios.cmd (deleted, moved to BIOS/check-bios.cmd)
- playbook/migrate-to-wifi.ps1 (made no-op 2026-04-24 after the dnsmasq
  no-gateway fix removed the wired-NIC race that motivated it)
- playbook/preinstall/oracle/Install-Oracle11r2.cmd (post-OUI .ora copy
  added 2026-04-24)
- playbook/preinstall/oracle/tnsnames.ora (live tnsnames, 469 KB,
  deployed alongside the wrapper 2026-04-24)
- playbook/pxe_server_setup.yml (dnsmasq dhcp-option=3,6 commented,
  Oracle .ora deploy task added 2026-04-24)
- playbook/shopfloor-setup/BIOS/{check-bios.cmd, models.txt} (BIOS
  detection refinements)
- playbook/shopfloor-setup/Shopfloor/Force-Lockdown.bat
- playbook/shopfloor-setup/Shopfloor/Monitor-IntuneProgress.ps1
- playbook/shopfloor-setup/Shopfloor/SetShopfloorAutoLogon.bat (new)
- playbook/shopfloor-setup/Shopfloor/09-Install-PrinterInstallerMap.ps1
  (new, places PrinterInstallerMap.exe + Public Desktop shortcut at
  imaging time; manifest entry self-heals on tamper)
- playbook/shopfloor-setup/Shopfloor/lib/Show-IntuneDeviceQR.ps1 (new,
  standalone QR rendering for site that wanted just that piece)
- playbook/shopfloor-setup/gea-shopfloor-collections/{Install-eMxInfo.cmd.template,
  Restore-UDCData.ps1} (these were uncommitted in pre-rename Standard/;
  git mv didn't catch them because they were untracked at the time)
- docs/shopfloor-machine-imaging-guide.md (operator-facing how-to)

Matrix:
- common.test/matrix.json: add UDC verify entry to gea-shopfloor-collections
  row. Surfaces UDC silent-install issue (item H pending) instead of
  letting it pass silently.

.gitignore:
- PrinterInstallerMap.exe (142 MB) excluded. Track via LFS or stage on
  PXE server only - too big for regular git history. Untouched on disk
  so existing local copy still works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 08:49:43 -04:00
cproudlock
70f176650b Blancco: playbook now produces working Ubuntu-kernel initramfs out of the box
Companion to the previous commit (4550d43). Three files that should have
been in the same commit but got left out of `git add`:

- .gitignore: negate rule for boot-tools/blancco/grub-blancco.cfg so the
  tracked cfg (source of truth for grubx64.efi rebuilds) survives
  the blanket boot-tools/ ignore.

- playbook/blancco-init.sh: rewritten for modprobe-with-deps, full NIC
  driver coverage, set -x trace to /dev/console, dmesg + PCI-device +
  /proc/modules dump + interactive shell on "no NIC after 60s".
  Replaces the narrow insmod-loop version that silently hung on
  unsupported NICs.

- playbook/pxe_server_setup.yml "Build Blancco PXE initramfs" task now
  sweeps the full drivers/net/ tree (ethernet + phy + mdio + usb + fddi
  + wan) plus overlay / squashfs / loop / ptp / libphy / mii deps, runs
  depmod to regenerate modules.dep inside the initramfs (required for
  modprobe dependency resolution), and symlinks the full applet list
  blancco-init.sh needs (modprobe, insmod, dmesg, find, env, etc).
  Result: ~20 MB initramfs vs the old 2 MB narrow build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:08:57 -04:00
cproudlock
03ed694671 pxe_server_setup: close three playbook gaps from 2026-04-22 session
1. Deploy gea-standard / gea-engineer FlatUnattendW10.xml
   The playbook only copied the shopfloor variant before; standard and
   engineer's unattend was hand-staged on the running servers. New task
   loops the repo's playbook/FlatUnattendW10.xml into Deploy/ for each
   entry in standard_types (new var covering gea-standard, gea-engineer,
   ge-standard, ge-engineer). force: yes because repo drift vs deployed
   copy is what produced the Win10/Win11 search-cleanup regression
   earlier this session (d49f516).

2. Deploy Oracle Client 11.2 preinstall payload
   preinstall.json now leads with Oracle 11.2 (commit 3a29784). The CMD
   wrapper is tracked in the repo at playbook/preinstall/oracle/; the
   686 MB Oracle_OracleDatabase_11r2_V03.zip is too large to commit and
   rides on USB under oracle/ alongside BIOS exes. Three tasks:
   mkdir staging dir, copy CMD from usb_mount, copy zip from usb_root
   with a soft-fail + warning if absent.

3. No change needed for sync-preinstall.sh — Oracle 10.2.0.3 flat
   installer was already dropped in 9235d19.

YAML lints clean. Fresh server built from this commit will bring up
Blancco-agnostic imaging paths correctly; Blancco-specific gaps
(grubx64.efi native-vs-slim, narrow kexec-initrd driver tree,
narrow blancco-init.sh) are still deferred per earlier "option B"
decision and remain server-side-pinned only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:59:08 -04:00
cproudlock
c918dea9d1 Revert all Blancco changes from this session
User reports Blancco was working before our mirror/session activity
today - then my attempted fixes (grubx64.efi rebuild, kexec-initrd
driver sweep, verbose blancco-init.sh) made it worse:

  - First attempt (narrow igc driver add) did not help because the
    switch-root path was not the one actually loaded by grubx64.efi's
    embedded config.
  - Second attempt (swapped grub embedded config to Ubuntu-kernel path)
    got further, but then kexec-initrd modules failed on insmod.
  - Third attempt (full ethernet tree sweep) pulled in broken ancient
    drivers (winbond-840, w5100-spi, xirc2ps_cs) that failed with
    unknown-symbol errors and prevented good drivers from loading.

Full revert: .gitignore, blancco-init.sh, pxe_server_setup.yml back to
the pre-session commit 6dcf832 state. Removes boot-tools/blancco/grub-
blancco.cfg from git (it was only added this session).

Runtime on both PXE servers was also restored: grubx64.efi and
kexec-initrd.img reverted from the .bak files taken before each
modification this session.

Whatever was there before today is now restored byte-for-byte on both
servers. If there is still a Blancco boot issue on specific modern
hardware that the user needs to fix, we will diagnose that narrowly
against the actual failure mode on that specific machine, not by
making sweeping preemptive changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 13:33:49 -04:00
cproudlock
d7ec6a2b5f Blancco: sweep full NIC driver tree into kexec-initrd + verbose init
Previous approach listed ~6 specific drivers (e1000e, igb, tg3, bnx2,
bnxt_en, b44) and silenced insmod errors (2>/dev/null). On modern Dell
fleet (Latitude 5330/5440, Pro-series, newer OptiPlex) this missed
igc (Intel I225/I226) entirely, and for the drivers we did include,
dependency modules they need at insmod time (libeth, libie, dca,
i2c-algo-bit, macsec, mii, libphy, ptp, ...) were never bundled.
insmod does not resolve dependencies, so NIC drivers that need
helpers failed to load silently.

playbook/pxe_server_setup.yml (kexec-initrd build):
  - Sweep the whole drivers/net/ethernet tree (~170 drivers, all
    vendors, ~15 MB total). Drivers for hardware not present skip
    without binding.
  - Add common helper dirs: drivers/net/{phy,mdio}, drivers/i2c/algos,
    drivers/dca, drivers/ptp, net/macsec, drivers/ssb.
  - overlay.ko kept.

playbook/blancco-init.sh:
  - Load helpers BEFORE main NIC drivers (libeth/libie, dca,
    i2c-algo-bit, macsec, mii, ssb, libphy, mdio*, phy*, ptp*),
    then iterate remaining modules.
  - Remove 2>/dev/null on insmod so actual failures surface on the
    boot console.
  - Print kernel version + /sys/class/net before/after driver load,
    plus dmesg grep for NIC driver activity.
  - On "no interface found" failure, dump dmesg tail and drop to a
    busybox shell for manual debug rather than just hanging.

Separate from this commit but related: kexec-initrd.img on both PXE
servers (.1 and .2) was rebuilt inline with these changes. Pre-rebuild
binary kept as kexec-initrd.img.bak-<timestamp>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 13:29:35 -04:00
cproudlock
0292bc01ad Auto-flush stale SMB/conntrack state on DHCP lease, one-source PPKG model
Three changes that go together so a re-image never hits "System error 53":

1. dnsmasq dhcp-script hook (playbook/pxe-server-helpers/pxe-dhcp-hook.sh)
   Fires on every add/del lease event. Runs conntrack -D and ss -K for the
   client IP so any stale ESTABLISHED SMB session from a previous boot is
   cleared before the client reconnects. Runs as root (dnsmasq default).
   Wired into /etc/dnsmasq.conf via dhcp-script= directive in the playbook.

2. One-source PPKG (playbook/startnet.cmd + startnet-template.cmd)
   The 5 per-Office PPKG copies were bit-for-bit identical; only the
   filename differs because BPRT parses Office and Region out of the name.
   Store one source file (e.g. GCCH_Prod_SFLD_v4.11.ppkg) and construct
   the BPRT-tagged target filename at menu-selection time from variables:
     SOURCE_PPKG / PPKG_VER / PPKG_EXP / REGION / OFFICE
   copy /Y "Y:\ppkgs\%SOURCE_PPKG%" "W:\Enrollment\%PPKG%"
   Bumped PPKG_VER v4.10 -> v4.11 and PPKG_EXP 20260430 -> 20270430.
   Saves ~30G on disk per version.

3. run-enrollment.ps1 already committed in 5a9c3db uses provtool.exe
   directly (no PowerShell cmdlet 180s timeout). Included here because it
   is part of the same end-to-end PPKG path.
2026-04-15 09:03:16 -04:00
cproudlock
d6776f7c7f Reorganize repo, enrollment share taxonomy, Blancco USB-build fixes, v4.10 PPKGs
Workstation reorganization:
- All build/deploy/helper scripts moved into scripts/ (paths updated to use
  REPO_ROOT instead of SCRIPT_DIR so they resolve sibling dirs from the new
  depth)
- New config/ directory placeholder for site-specific overrides
- Removed stale: mok-keys/, test-vm.sh, test-lab.sh, setup-guide-original.txt,
  unattend/ (duplicate of moved playbook/FlatUnattendW10.xml)
- README.md and SETUP.md structure listings updated, dead "Testing with KVM"
  section removed
- .claude/ gitignored

Enrollment share internal taxonomy (forward-looking; existing servers
unaffected since they keep their current boot.wim with flat paths):
- Single SMB share kept (WinPE only mounts one Y: drive), but content now
  organised into ppkgs/, scripts/, config/, shopfloor-setup/, pre-install/{bios,
  installers}, installers-post/cmm/, blancco/, logs/
- README.md deployed to share root explaining each subdir
- New playbook tasks deploy site-config.json + wait-for-internet.ps1 +
  migrate-to-wifi.ps1 explicitly (were ad-hoc on legacy servers)
- BIOS subdir moved into pre-install/bios/, preinstall/ renamed to pre-install/
- startnet.cmd + startnet-template.cmd updated with new Y:\subdir\ paths
- Bumped GCCH PPKG references v4.9 -> v4.10

Blancco USB-build fixes (so next fresh USB install boots Blancco end-to-end
without the manual fixup we did against GOLD):
- grub-blancco.cfg: kernel/initrd switched HTTP -> TFTP (GRUB's HTTP module
  times out on multi-MB files); added modprobe.blacklist=iwlwifi,iwlmvm,btusb
  (WiFi drivers hang udev on Intel business PCs)
- grubx64.efi rebuilt from updated cfg
- Playbook task added to create /srv/tftp/blancco/ symlinks pointing at the
  HTTP-served binaries

run-enrollment.ps1: OOBEComplete is now set AFTER PPKG install (Win11 22H2+
hangs indefinitely if OOBEComplete is set before the bulk-enrollment PPKG runs).

Also includes deploy-bios.sh / pull-bios.sh / busybox-static / models.txt
that were sitting untracked at the repo root.
2026-04-14 16:01:02 -04:00
cproudlock
d14c240b48 Change dnsmasq-restart cron delay from 30s to 15s
Task name already said "15s after reboot" but content had sleep 30.
Align content with name; faster recovery from systemd-resolved race at boot.
2026-04-14 13:01:38 -04:00
cproudlock
ade2f3b5ff Fix USB install reliability: bash, LV resize, deps, idempotency
- autoinstall/user-data: move lvextend/growpart/pvresize BEFORE playbook
  so 130GB of drivers+PPKGs fits during first-boot copy. Use
  tr -d "[:space:]" to avoid breaking outer bash -c single-quote wrap.
- playbook: add executable: /bin/bash to Dell driver deploy (process
  substitution) and Blancco initramfs builder (brace expansion).
- playbook: make "Ensure Samba user for Blancco reports" idempotent via
  pdbedit check so re-runs don't abort the play.
- download-packages.sh: also download dist-upgrade package set. Explicit
  --simulate misses transitive version bumps (e.g. gnupg 17.4 needs
  matching gpgv 17.4) causing offline dpkg "dependency problems" when
  ISO baseline is older than noble-updates.
2026-04-14 12:57:28 -04:00
cproudlock
855af7312b Sub-type aware preinstall, USB drivers/PPKGs, Lab OpenText
- PreInstall runner reads pc-subtype.txt and matches PCTypes against
  both base type (Standard) and composite key (Standard-Machine).
- UDC scoped to Standard-Machine only. eDNC and MachineNumberACLs
  skip on Standard-Timeclock sub-type.
- Lab added to OpenText PCTypes.
- build-usb.sh copies enrollment/ (PPKGs) and drivers-staging/ (Dell
  driver packs) onto USB for self-contained deployment.
- Playbook deploys PPKGs and drivers from USB to PXE server shares.
- Gitignore enrollment/, drivers-staging/, *.ppkg (large binaries).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:00:23 -04:00
cproudlock
855d501fc2 Fix Display sync loop, PPKG deployment, dnsmasq cron, dpkg configure
- Monitor-IntuneProgress: Display PCs skip DSC phases entirely (no SAS
  token, no DSCInstall.log), complete after Phase 1 identity. Renderer
  hides Phase 2-5 for Display type.
- Playbook: deploy PPKG files and run-enrollment.ps1 from USB to
  enrollment share. Bump dnsmasq restart cron from 15s to 30s.
- build-usb.sh: copy enrollment/ directory (PPKGs) onto USB if present.
- user-data: add dpkg --configure -a after offline .deb install to fix
  packages left in unconfigured state (cron, systemd-timesyncd).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:27:21 -04:00
cproudlock
18537acbbc PXE server: fix WinPE re-image SMB connection loss
WinPE clients re-imaging the same machine hit "System error 53 -
network path not found" on the second attempt. systemctl restart smbd
did not help; only a full server power cycle cleared the state.

Root cause is kernel nf_conntrack: the default TCP ESTABLISHED timeout
is 5 days (432000s), so a session from the first WinPE run whose
client rebooted abnormally leaves an ASSURED ESTABLISHED entry that
ufw's state-tracking rules then mis-classify the new SYN against.

Fix applied in three layers:
- /etc/sysctl.d/99-pxe-conntrack.conf drops TCP ESTABLISHED timeout
  to 1 hour and shortens the half-closed states to 30s each.
- smb.conf gains socket options TCP_NODELAY SO_KEEPALIVE IPTOS_LOWDELAY
  plus keepalive = 30 and deadtime = 5. Active sessions refresh the
  conntrack timer every 30s via keepalives so they never age out;
  dead ones expire in an hour.
- /usr/local/sbin/smb-diag.sh snapshots kernel + Samba state for
  remote diagnosis; /usr/local/sbin/smb-soft-reset.sh walks a
  progressive recovery (nmbd/smbd restart, conntrack flush, arp
  flush, ss -K) as an alternative to power-cycling.

conntrack package added to download-packages.sh and playbook verify
list so the offline .deb bundle ships with it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 13:00:43 -04:00
cproudlock
a1a78e2ba3 PXE preinstall pipeline + Set-MachineNumber helper for Standard PCs
Adds a local-install pipeline so Standard shopfloor PCs get Oracle, the
VC++ redists (2008-2022), and UDC installed during PXE imaging via Samba
instead of pulling ~215 MB per device from Azure blob over the corporate
WAN. Intune DSC then verifies (already-installed apps are skipped) and
the only Azure traffic on the happy path is ~11 KB of CustomScripts
wrapper polling.

New files:
- playbook/preinstall/preinstall.json — curated app list with PCTypes
  filter and per-app detection rules. Install order puts VC++ 2008
  LAST so its (formerly) reboot-triggering bootstrapper doesn't kill
  the runner mid-loop. (2008 itself now uses extracted vc_red.msi with
  REBOOT=ReallySuppress; the reorder is defense in depth.)
- playbook/shopfloor-setup/Shopfloor/00-PreInstall-MachineApps.ps1 —
  the runner. Numbered 00- so it runs first in the baseline sequence.
  Reads preinstall.json, filters by PCTYPE, polls for completion via
  detection check (handles UDC's hung WPF process by killing it once
  detection passes), uses synchronous WriteThrough logging that
  survives hard reboots, preserves log history across runs.
- playbook/shopfloor-setup/Standard/Set-MachineNumber.{ps1,bat} — desktop
  helper for SupportUser. Reads current UDC + eDNC machine numbers,
  prompts via VB InputBox, validates digits-only, kills running UDC,
  edits both C:\ProgramData\UDC\udc_settings.json and HKLM\…\GE Aircraft
  Engines\DNC\General\MachineNo, relaunches UDC. Lets a tech assign a
  real machine number to a mass-produced PC without admin/LAPS.
- playbook/sync-preinstall.sh — workstation helper to push installer
  binaries from /home/camp/pxe-images/main/ to the live PXE Samba.

Changes:
- playbook/startnet.cmd + startnet-template.cmd — add xcopy to stage
  preinstall bundle from Y:\preinstall\ to W:\PreInstall\ during the
  WinPE imaging phase, gated on PCTYPE being set.
- playbook/pxe_server_setup.yml — create /srv/samba/enrollment/preinstall
  + installers/ directories and deploy preinstall.json there.
- playbook/shopfloor-setup/Run-ShopfloorSetup.ps1 — bump AutoLogonCount
  to 99 at start (defense against any installer triggering an immediate
  reboot mid-dispatcher; final line still resets to 2 on successful
  completion). Copy Set-MachineNumber.{ps1,bat} to SupportUser desktop
  on Standard PCs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:06:26 -04:00
cproudlock
163e58ab0b Fix dnsmasq reboot cron: use /etc/cron.d/ instead of crontab
Ansible cron module writes to root's crontab which requires cron
daemon running. Drop file in /etc/cron.d/ instead for reliability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:03:09 -04:00
cproudlock
76165495ff Shopfloor PC type system, webapp enhancements, slim Blancco GRUB
- Shopfloor PC type menu (CMM, WaxAndTrace, Keyence, Genspect, Display, Standard)
- Baseline scripts: OpenText CSF, Start Menu shortcuts, network/WinRM, power/display
- Standard type: eDNC + MarkZebra with 64-bit path mirroring
- CMM type: Hexagon CLM Tools, PC-DMIS 2016/2019 R2
- Display sub-type: Lobby vs Dashboard
- Webapp: enrollment management, image config editor, UI refresh
- Upload-Image.ps1: robocopy MCL cache to PXE server
- Download-Drivers.ps1: Dell driver download pipeline
- Slim Blancco GRUB EFI (10MB -> 660KB) for old hardware compat
- Shopfloor display imaging guide docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:25:07 -04:00
cproudlock
86660a0939 Remove UEFI HTTP Boot config from dnsmasq
Not needed since iPXE chains to grubx64.efi for Blancco boot.
Simplifies DHCP config and avoids interfering with other UEFI clients.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 11:58:33 -05:00
cproudlock
dd2fec5a41 Blancco PXE boot via Ubuntu kernel switch_root
Blancco's own kernel freezes on Dell Precision towers during PXE boot.
Workaround: boot Ubuntu kernel via GRUB chainload, download Blancco's
666MB squashfs rootfs + 132MB kernel modules over HTTP, mount overlay
filesystem, and switch_root into Blancco's userspace.

- Add blancco-init.sh: custom initramfs init script for switch_root approach
- Add blancco-preferences.xml: pre-configured with network share for reports
- Update playbook: build initramfs, deploy Ubuntu kernel/modules, config
- Update prepare-boot-tools.sh: add HTTP modules to GRUB EFI build
- Add UEFI HTTP Boot support to dnsmasq config
- iPXE menu chains to grubx64.efi (replaces sanboot of ISO)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 11:20:00 -05:00
cproudlock
57a53381f2 Ensure Media.tag exists for all images after import and via cron
Webapp now creates Deploy/Control/Media.tag after every image import.
Cron updated to create (not just touch) Media.tag for any image
directory that has Deploy/Control/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 16:58:39 -05:00
cproudlock
093a4d713b Add @reboot to Media.tag refresh cron
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 16:41:24 -05:00
cproudlock
a7636887b1 Add daily cron to refresh Media.tag (30-day expiry workaround)
PESetup.exe checks Media.tag last modified date and rejects it after
30 days. Cron job touches all Media.tag files daily at midnight.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 16:41:07 -05:00
cproudlock
1a5c4f7124 Eliminate USB requirement for WinPE PXE boot, add image upload script
- Add startnet.cmd: FlatSetupLoader.exe + Boot.tag/Media.tag eliminates
  physical USB requirement for WinPE PXE deployment
- Add Upload-Image.ps1: PowerShell script to robocopy MCL cached images
  to PXE server via SMB (Deploy, Tools, Sources)
- Add gea-shopfloor-mce image type across playbook, webapp, startnet
- Change webapp import to move (not copy) for upload sources to save disk
- Add Samba symlink following config for shared image directories
- Add Media.tag creation task in playbook for drive detection
- Update prepare-boot-tools.sh with Blancco config/initramfs patching
- Add grub-efi-amd64-bin to download-packages.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 16:40:27 -05:00
cproudlock
f4c158a5ac Fix PXE interface detection, add br-pxe bridge to test VM, network upload import
- Playbook: detect interface already configured with 10.9.100.1 before
  falling back to non-default-gateway heuristic (fixes dnsmasq binding
  to wrong NIC when multiple interfaces exist)
- test-vm.sh: auto-attach br-pxe bridge NIC if available on host
- Webapp: add network upload import via SMB share with shared driver
  deduplication and symlinks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:15:14 -05:00
cproudlock
7486b9ed66 Apache reverse proxy for webapp, UI improvements
- Move Flask to localhost:9010, Apache serves port 9009 with static file
  handling and reverse proxy to fix intermittent asset loading on remote clients
- Add "PXE Manager" branding beneath logo in sidebar
- Increase code editor size (startnet.cmd and unattend XML) to 70vh
- Add test-lab.sh for full lab VM testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 17:45:10 -05:00
cproudlock
f3a384fa1a Add Proxmox ISO builder, CSRF protection, boot-files integration
- Add build-proxmox-iso.sh: remaster Ubuntu ISO with autoinstall config,
  offline packages, playbook, webapp, and boot files for zero-touch
  Proxmox VM deployment
- Add boot-files/ directory for WinPE boot files (wimboot, boot.wim,
  BCD, ipxe.efi, etc.) sourced from WestJeff playbook
- Update build-usb.sh and test-vm.sh to bundle boot-files automatically
- Add usb_root variable to playbook, fix all file copy paths to use it
- Unify Apache VirtualHost config (merge default site + webapp proxy)
- Add CSRF token protection to all webapp POST forms and API endpoints
- Update README with Proxmox deployment instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 20:01:19 -05:00
cproudlock
cb442f971b Fix air-gapped deployment: pip wheel install, UFW ports, installer crash
- Fix pip/distutils incompatibility: install Python wheels directly via
  zipfile extraction instead of broken pip3 from Ubuntu 22.04 .debs
  (pip3 crashes on Python 3.12 with ModuleNotFoundError: distutils)
- Fix UFW port types: quote loop items so string comparison works
  correctly, giving ports 67/69 UDP rules instead of TCP
- Fix autoinstall crash: set refresh-installer to no (can't reach
  internet on air-gapped network, was crashing subiquity)
- Remove python3-pip and python3-venv from download-packages.sh
  (no longer needed with direct wheel extraction)
- Add ignore_errors to WinPE/iPXE copy tasks (files only present
  on real USB media, not test VM)
- Use system python3 instead of venv for webapp service

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 16:16:55 -05:00
cproudlock
851225d062 Add README, update docs, fix CRLF, SSH, and playbook network detection
- Add comprehensive README.md with full project documentation
- Update SETUP.md to reflect current state (7 image types, webapp, boot tools, Samba shares)
- Enable SSH in autoinstall user-data for remote access
- Fix ansible_default_ipv4.interface error when no default gateway exists
- Fix Windows CRLF line endings on all shell scripts and YAML files
- Fix test-vm.sh: use --install kernel extraction instead of --location, don't delete source ISO on --destroy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 17:38:55 -05:00
cproudlock
725c8f43de Change webapp to port 9009, add test VM script
- Webapp now listens on port 9009 (UFW rule added)
- Apache reverse proxy updated to proxy to 9009
- test-vm.sh creates a KVM test environment with:
  - CIDATA ISO built from project files
  - Isolated libvirt network (10.9.100.0/24)
  - Ubuntu 24.04 VM with autoinstall

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 16:53:23 -05:00
cproudlock
92c9b0f762 Fix review findings: offline assets, security, audit logging
- Bundle Bootstrap CSS/JS/icons locally for air-gapped operation
- Add path traversal validation on image import source
- Disable Flask debug mode in production
- Fix file handle leaks, remove unused import
- Add python3-pip, python3-venv, p7zip-full to offline packages
- Add pip wheel download/bundling for offline Flask install
- Change UFW default policy from allow to deny
- Fix wrong path displayed in unattend editor template
- Dynamic sidebar image lists from all_image_types
- Add audit logging for all write operations
- Audit log viewer page with activity history

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 16:50:20 -05:00
cproudlock
05dbb7ed5d Add Blancco erasure reports Samba share and webapp viewer
- Samba share at \\server\blancco-reports for automatic report collection
- Webapp reports page with list, download, and delete
- Compliance warning on delete confirmation
- Sidebar link under Tools section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 16:27:27 -05:00
cproudlock
89b58347d9 Add wimtools and startnet.cmd editor for boot.wim modification
- Added wimtools to offline packages and playbook verification
- Webapp startnet.cmd editor: extract, view, edit, save back to boot.wim
- Uses wimextract/wimupdate for in-place WIM modification
- Dark-themed code editor with tab support and common command reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 16:23:22 -05:00
cproudlock
e7313c2ca3 Add multi-boot PXE menu, Clonezilla backup management, and GE Aerospace branding
- iPXE boot menu with WinPE, Clonezilla, Blancco Drive Eraser, Memtest86+
- prepare-boot-tools.sh to download/extract boot tool binaries
- Clonezilla backup management in webapp (upload, download, delete)
- Clonezilla Samba share for network backup/restore
- GE Aerospace logo and favicon in webapp
- Updated playbook with boot tool directories and webapp env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 16:20:50 -05:00
cproudlock
cee4ecd18d Add web management UI, offline packages, WinPE consolidation, and docs
- webapp/: Flask web management app with:
  - Dashboard showing image types and service status
  - USB import page for WinPE deployment content
  - Unattend.xml visual editor (driver paths, specialize commands,
    OOBE settings, first logon commands, raw XML view)
  - API endpoints for services and image management
- SETUP.md: Complete setup documentation for streamlined process
- build-usb.sh: Now copies webapp and optional WinPE images to USB
- playbook: Added webapp deployment (systemd service, Apache reverse
  proxy), offline package verification, WinPE auto-import from USB

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 15:57:34 -05:00
cproudlock
5791bd1b49 Initial project setup: automated PXE server provisioning
Reorganized from OneDrive export into a clean project structure:
- autoinstall/: cloud-init user-data and meta-data for Ubuntu 24.04 autoinstall
- playbook/: Ansible playbook for PXE server config (dnsmasq, Apache, Samba, iPXE)
- unattend/: Windows unattend.xml sample for image deployment
- build-usb.sh: builds a bootable USB with Ubuntu installer + CIDATA partition
- download-packages.sh: downloads all offline .deb dependencies via Docker

Key improvements over original:
- Fully air-gapped: all packages bundled offline, no WiFi needed
- Hardware-agnostic network config (wildcard NIC matching)
- Removed plaintext WiFi credentials
- Single USB build process (was 15+ manual steps)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 15:47:36 -05:00