pxe-server

Author	SHA1	Message	Date
cproudlock	520d4aa791	Monitor: fix AESFMA-connected detection + stop retrying once connected Two bugs caused "AESFMA cert detected, connecting AESFMA..." to log over and over even after AESFMA was already up: 1. Regex 'SSID\s*:\s*AESFMA.*?State\s*:\s*connected' required the SSID line BEFORE the State line. The actual netsh wlan show interfaces order on Win11 is "Name / State / SSID" - State comes FIRST. The non-greedy match never succeeded, so the check always concluded AESFMA wasn't connected. Refactor to a Test-AESFMAConnected helper that splits output into per-adapter blocks and checks SSID + State independently, tolerating either order. 2. Added a fast-path at the top of the WiFi-swap block: if AESFMA is already connected (no help needed from us), just delete INTERNETACCESS if still present and flip the cache flag to stop running this block. Previously the block only set the flag after a successful connect-then-verify-then-delete cycle; if AESFMA was already up at first check, the cycle "succeeded" each tick but the flag never flipped, producing the log spam. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 20:06:00 -04:00
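A minimal Python model of the per-adapter check described above. The real helper is PowerShell; this sketch assumes English-locale `netsh wlan show interfaces` output in which each adapter block begins with a `Name :` line, and the function name is hypothetical. Splitting on `Name` first means SSID and State can appear in either order within a block, and one adapter's SSID can never pair with another adapter's State.

```python
import re


def aesfma_connected(netsh_output: str, ssid: str = "AESFMA") -> bool:
    """True if one adapter block reports both SSID == ssid and State == connected.

    Each block starts at a 'Name :' line, so SSID/State order inside a block
    does not matter and fields from different adapters are never mixed.
    """
    for block in re.split(r"(?m)^\s*Name\s*:", netsh_output):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip().lower()] = value.strip()
        # 'BSSID' lands under its own key, so it cannot shadow 'ssid'.
        if fields.get("ssid") == ssid and fields.get("state", "").lower() == "connected":
            return True
    return False
```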
cproudlock	894305e906	Monitor: drop AESFMA-connected from Phase 1 done; webapp: LAPS endpoint 1. Phase 1 done gate was requiring 'AESFMA WLAN connected' in addition to the data-side signals (AAD + Intune + EmTask + baseline). If the bay never reached AESFMA (cert never landed, RADIUS unreachable), Phase 1 stayed IN PROGRESS forever even though Intune registration was actually complete. Reverting to the data-side-only definition. 2. New webapp endpoint POST /imaging/<serial>/laps stores a LAPS password in the session JSON so it survives the 5s dashboard auto-refresh. Empty body clears the field. Daily reset of the server (cron/restart) is the lifetime cap on stored passwords. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 19:53:05 -04:00
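A rough sketch of the store-or-clear behavior for the LAPS endpoint, written as a plain Python function over a per-serial session JSON file rather than as a web route. The file layout and the `laps_password` key are hypothetical; the commit only specifies that a non-empty body is stored and an empty body clears the field.

```python
import json
from pathlib import Path


def update_laps(session_path: Path, body: str) -> dict:
    """Store a LAPS password in the session JSON, or clear it when body is empty.

    Assumes LAPS passwords carry no meaningful leading/trailing whitespace.
    The 'laps_password' key is a hypothetical field name.
    """
    session = json.loads(session_path.read_text()) if session_path.exists() else {}
    password = body.strip()
    if password:
        session["laps_password"] = password
    else:
        session.pop("laps_password", None)  # empty body clears the field
    session_path.write_text(json.dumps(session))
    return session
```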
cproudlock	1b7e1bfee4	imaging: pause page auto-refresh while a LAPS QR is showing meta http-equiv=refresh fires every 5s and reloads the entire page, wiping the LAPS QR state mid-scan. Replaced the meta tag with a JS-driven setTimeout(location.reload, 5000) so renderLapsQR() can clearTimeout it. Reload resumes when the QR is cleared (manual or 60s auto). Multi-bay safety: only resumes if no other bay still has a QR rendered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 19:50:24 -04:00
cproudlock	d5398bdd74	imaging: LAPS-password-to-QR generator per bay card Per-bay <details> section with: - Input field for LAPS password (paste from Intune portal manually, since deep-link to LAPS blade needs AAD objectId we can't obtain) - Make QR button generates a client-side QR from the input - QR displayed below at 280px with 4-cell quiet zone - Auto-clears input + QR after 60s with live countdown - Manual Clear button - Enter key on the input also triggers QR generation Password never POSTs to server, never logged, never persists past the 60s window. Generated using the same qrcode-generator lib already loaded for the device-id QR. Scan with a USB barcode scanner plugged into the bay (HID keyboard mode) -> password types into bay login field. Faster than reading off the Intune portal letter-by-letter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 19:48:43 -04:00
cproudlock	cdb6655e4a	imaging: drop LAPS deep-link, keep only category LAPS retrieval blade is keyed on AAD object id, not aadDeviceId / mdmDeviceId. We capture aadDeviceId from dsregcmd; resolving to objectId would require a Graph API call with Device.Read.All which we don't have at WJ. Removed the LAPS button - operator goes to Intune portal manually for LAPS as before. set-category button stays - aadDeviceId works for that blade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 19:47:46 -04:00
cproudlock	74ba3d1339	imaging: deep-link buttons for Set Category + LAPS per bay Two buttons next to the Intune device id on each bay card: - "set category" -> portal.azure.us Intune device blade properties via aadDeviceId/{deviceId} - "LAPS" -> intune.microsoft.us encryptionKeys blade via mdmDeviceId/{deviceId} Both use the dsregcmd DeviceId we already capture - no Graph API lookup or objectId resolution needed. One click from the dashboard takes the tech to the right page for category assignment or LAPS retrieval. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 19:20:10 -04:00
cproudlock	4599c85509	Monitor: strip ANSI escape codes from dsregcmd output before regex Smoking gun for "Monitor's on-screen QR works but no idx=7 push lands on the PXE dashboard". Win11's dsregcmd emits ANSI VT100 escape codes (e.g. \x1B[7mDeviceId\x1B[0m :) around field labels. Captured output strings then have those codes between "DeviceId" and ":". The strict regex 'DeviceId\s*:\s*<guid>' fails because \s* doesn't match ANSI escape chars, so $script:cache.DeviceId stays null and the idx=7 push never fires. Build-QRCodeText was unaffected because it uses Select-String 'DeviceId' (substring match, tolerates anything in between) then splits on ':'. Fix: strip ANSI sequences via -replace '\x1B\[[0-9;]*[A-Za-z]', '' before running the regex. Same pattern covers all CSI sequences dsregcmd uses. Also force Out-String to get a single string back (was an array of lines from 2>&1; -match on arrays returns matching elements but $matches behavior across mixed objects is fragile). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 19:17:05 -04:00
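The strip-then-match fix is easy to reproduce outside PowerShell. In this Python sketch the CSI pattern is the one from the commit; the GUID regex, function name, and sample dsregcmd line are made up for illustration.

```python
import re
from typing import Optional

# CSI sequences such as ESC[7m / ESC[0m (pattern from the commit).
ANSI_CSI = re.compile(r"\x1B\[[0-9;]*[A-Za-z]")
GUID = r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}"
DEVICE_ID = re.compile(r"DeviceId\s*:\s*(" + GUID + ")")


def parse_device_id(dsregcmd_output: str) -> Optional[str]:
    """Strip ANSI escapes first, then pull the DeviceId GUID (or None)."""
    clean = ANSI_CSI.sub("", dsregcmd_output)
    match = DEVICE_ID.search(clean)
    return match.group(1) if match else None
```

Running `DEVICE_ID` directly against the raw line fails for exactly the reason the commit gives: the escape byte sits between the label and the colon, and `\s*` cannot consume it.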
cproudlock	2d75935dfc	PostPpkg settle 60s -> 120s Empirical: a fresh-imaged bay often hasn't finished AAD-join + first Intune sync by 60s, so the post-PPKG-reboot Monitor instance starts without DeviceId visible to dsregcmd yet. Doubling the settle to 120s gives MDM more time to land baseline policies before the reboot, which means the post-reboot Monitor sees AAD-joined + DeviceId on first tick and fires idx=7 immediately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 19:13:26 -04:00
cproudlock	3fb1d983df	Stop moving OpenText / WJ Shopfloor shortcuts into Shopfloor Tools OpenText / Host Explorer shortcut filenames vary by installed profile (e.g. 'WJ Shopfloor OpenText.lnk', 'WJ Shopfloor.lnk', 'HostExplorer ShopFloor.lnk'). The taskbar-pin path in site-config.json hardcodes 'Shopfloor Tools\WJ Shopfloor.lnk', which mismatches the actual filename, so 07-TaskbarLayout silently skips pinning it. Drop OpenText/ShopFloor/HostExplorer pattern moves from 06's categorization regex. Shortcuts stay at the public-desktop top level where the OpenText installer placed them. Tech sees the icon on the desktop, no taskbar pin (the variable filename made the pin unreliable anyway). Other categories (UDC, eDNC, NTLARS, etc., with stable filenames) still move into Shopfloor Tools and pin correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:10:16 -04:00
cproudlock	9beee842f1	Monitor: deterministic AESFMA cert check via X509Chain root match Walk Cert:\LocalMachine\My, build each cert's chain, look for chain element with thumbprint 27F0C9A22B28CE7687B115A29E31BF4B3ABB180F. That's the AESFMA.xml TrustedRootCA value = the GE Aerospace FreeRADIUS root that AESFMA EAP-TLS validates against. A client cert chained to that root is the SCEP-provisioned AESFMA machine cert. Combined with the verify-before-delete connect attempt, this gives two gates: 1. Cert deterministically exists + chains correctly 2. netsh wlan connect to AESFMA actually reports State=connected Only after both pass does INTERNETACCESS get deleted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:48:00 -04:00
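The .NET `X509Chain` walk has no stdlib equivalent in Python, so this sketch models each certificate as a list of already-built chain-element thumbprints (leaf first, root last). The thumbprint constant is the one quoted in the commit; the `CertInfo` type and `find_aesfma_cert` name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# TrustedRootCA thumbprint quoted in the commit (AESFMA.xml).
AESFMA_ROOT_THUMBPRINT = "27F0C9A22B28CE7687B115A29E31BF4B3ABB180F"


@dataclass
class CertInfo:
    subject: str
    # Thumbprints of the built chain, leaf first and root last
    # (hypothetical stand-in for X509Chain.ChainElements).
    chain_thumbprints: List[str] = field(default_factory=list)


def normalize(thumbprint: str) -> str:
    return thumbprint.replace(" ", "").upper()


def find_aesfma_cert(store: List[CertInfo]) -> Optional[CertInfo]:
    """Return the first cert whose chain contains the AESFMA root, else None."""
    for cert in store:
        if any(normalize(t) == AESFMA_ROOT_THUMBPRINT for t in cert.chain_thumbprints):
            return cert
    return None
```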
cproudlock	f013aa2bff	Monitor: AESFMA verify-before-delete - keep INTERNETACCESS until cert ready Old gate (SCEP cert in LocalMachine\My with Client Auth EKU) was both too loose (matches non-AESFMA certs) and unable to verify the cert chains to GE's RADIUS root. INTERNETACCESS got deleted before AESFMA could actually authenticate, orphaning the bay. New flow: when Phase 1 essentials (AAD + Intune + EmTask + baseline) are complete, ATTEMPT netsh wlan connect AESFMA with INTERNETACCESS still up as fallback. Wait 8s, parse netsh wlan show interfaces for SSID=AESFMA + State=connected. Only delete INTERNETACCESS after operational verification. If AESFMA connect fails (cert not provisioned yet, RADIUS server unreachable, etc), keep INTERNETACCESS and retry next tick. Loop runs every 5s while DeviceIdReported is false, so the swap fires as soon as AESFMA is operationally viable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:46:19 -04:00
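The verify-before-delete ordering, reduced to a Python sketch with the netsh calls injected as callables (all names here are hypothetical). The property that matters is that the fallback profile is deleted only after the post-connect check passes; a failed check leaves INTERNETACCESS in place for the next tick.

```python
import time
from typing import Callable


def try_swap_to_aesfma(connect: Callable[[], None],
                       is_connected: Callable[[], bool],
                       delete_fallback: Callable[[], None],
                       settle_secs: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt AESFMA while INTERNETACCESS stays up; delete the fallback only after verification."""
    connect()              # e.g. netsh wlan connect name=AESFMA
    sleep(settle_secs)     # give EAP-TLS time to complete
    if is_connected():     # SSID=AESFMA and State=connected
        delete_fallback()  # only now remove INTERNETACCESS
        return True
    return False           # keep INTERNETACCESS; the caller retries next tick
```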
cproudlock	a9260ecadd	Monitor: 5s tight poll while DeviceId still missing DeviceId may not be in dsregcmd output the moment Monitor starts after PPKG reboot - takes a few minutes for AAD-join to settle. Default 30s PollSecs leaves wide gaps where Monitor isn't checking. Sleep 5s instead while DeviceIdReported is still false. Once captured + idx=7 push lands, falls back to PollSecs (30s) for the rest of the loop. Worst case for QR-on-dashboard latency: ~5 seconds after dsregcmd starts returning a DeviceId. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:08:12 -04:00
cproudlock	ab3e1c98f6	Monitor: fire idx=7 immediately on DeviceId capture (beat LAPS reboot) User constraint: GE-issued LAPS-prompt reboot lands ~1 minute after Report IP posts its log. Need the QR on the PXE dashboard BEFORE that reboot or the operator has no way to look up the device for LAPS retrieval. Previously idx=7 was gated on Phase 1 essentials (AAD + Intune enrolled + EmTask + baseline policies >=5). Those flips happen later than DeviceId capture (dsregcmd shows DeviceId the instant AAD-join completes during PPKG). Dropping the gate so idx=7 fires the moment the cache has a DeviceId. Phase 1 row on the on-bay Monitor display still has its own AESFMA-required gate for operational completeness; only the dashboard push is moved earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:06:02 -04:00
cproudlock	842ef88ccb	Monitor: gate WiFi swap on SCEP cert + Phase 1 done on AESFMA connected Two related fixes for the WiFi handoff timing: 1. WiFi swap (delete INTERNETACCESS + connect AESFMA) was firing on Phase 1 essentials being green (AAD + Intune + EmTask + baseline policies >=5). That signal flips ~minutes BEFORE the Intune SCEP machine cert actually lands in LocalMachine\My. Without the cert, AESFMA EAP-TLS auth fails and the bay has no path at all (we just deleted INTERNETACCESS). Stuck. New gate: walk Cert:\LocalMachine\My for any cert with Client Authentication EKU (1.3.6.1.5.5.7.3.2). When that's present, SCEP has delivered, AESFMA EAP-TLS will succeed. Swap then fires safely. 2. Phase 1 row on the on-bay Monitor display now ALSO requires AESFMA to be actively connected (parsed from netsh wlan show interfaces: SSID=AESFMA + State=connected). Phase 1 stays IN PROGRESS until the bay is operationally on corp WLAN, not just data-side enrolled. Matches user request "not complete phase 1 until AESFMA is ready". idx=7 dashboard push still fires on the original Phase 1 essentials gate so the QR appears as soon as Intune registers the device, independent of AESFMA join timing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:04:09 -04:00
cproudlock	a17b3fae6a	Retire wired-disable/re-enable dance now that PXE LAN is 172.16.9.0/24 GE Report IP filters Get-NetIPAddress on StartsWith("10.") and PXE LAN addresses are now 172.16.9.x which the filter skips naturally. The disable-then-re-enable workaround was only needed when PXE LAN was 10.9.100.x and bays leaked that IP to the GE webhook. With the renumber that whole flow is dead weight. Removed: - playbook/shopfloor-setup/Shopfloor/lib/Disable-WiredNics.ps1 (file) - Run-ShopfloorSetup: Disable-WiredNics call after PPKG returns - Run-ShopfloorSetup: "GE Re-enable Wired NICs" SYSTEM task registration - Monitor-IntuneProgress: reportIpLog-gated wired re-enable + idx=7 retry - Monitor-IntuneProgress: reportIpDone gate on Phase 1 done check Side benefit: stages 2-6 dashboard pushes no longer go dark mid-flow (used to die between idx=6 and idx=7 when wired was off). Phase 1 row on the Monitor screen now flips COMPLETE on the natural AAD + Intune + EmTask + baseline-policies condition instead of waiting on the Report IP log file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:45:54 -04:00
cproudlock	ce604adcda	Renumber PXE LAN from 10.9.100.0/24 to 172.16.9.0/24 Single-site bay-stuck issue at WJ: GE Intune Report IP script filters Get-NetIPAddress on StartsWith("10.") and posts everything matching to the GE Tines webhook. Bays at WJ get the PXE LAN 10.9.100.x IP captured and reported -> GE backend tags bays as on a non-corp 10.x subnet -> dynamic group eligibility for SFLD policy never matches. Other GE sites work because their PXE LANs aren't on 10.x at all. Renumber PXE LAN to RFC1918 172.16.9.0/24 so the GE filter naturally skips wired PXE addresses without any disable-NIC dance. Server-side already in flight (netplan dual-bound, dnsmasq scope + boot URL repointed, blancco preferences + grub.cfg + iPXE GetPxeScript all sed'd to 172.16.9.1). This commit is the playbook / scripts / docs side: 109 hits across 35 files sed'd in one shot. After this lands + boot.wim is rebuilt + bays renumber off DHCP, the 10.9.100.1 binding will be dropped from netplan as the final cleanup step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:30:32 -04:00
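A short Python model of why the renumber works. It assumes the GE script's filter is a plain string-prefix test, as the commit describes, and checks which PXE LAN addresses would slip through it; the function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List


def ge_report_ip_filter(addresses: Iterable[str]) -> List[str]:
    """Model of the GE Report IP filter: keep anything whose text starts with '10.'."""
    return [a for a in addresses if a.startswith("10.")]


def leaked_pxe_addresses(addresses: Iterable[str], pxe_lan: str) -> List[str]:
    """Addresses the GE filter would report even though they belong to the PXE LAN."""
    net = ipaddress.ip_network(pxe_lan)
    return [a for a in ge_report_ip_filter(addresses) if ipaddress.ip_address(a) in net]
```

With the old 10.9.100.0/24 LAN the wired address leaks; after the move to 172.16.9.0/24 nothing does, with no NIC disabling needed.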
cproudlock	c6b249f866	Monitor: idx=7 push fires on Phase 1 essentials complete Pair with the INTERNETACCESS -> AESFMA WiFi-swap commit. Once AAD-joined + IntuneEnrolled + EmTaskExists + baseline policies all true AND DeviceId is captured, push idx=7 to PXE dashboard with the DeviceId immediately - don't wait for the Report IP log (which depends on AESFMA join + script timing). Side note: the legacy wired-NIC re-enable + reportIpLog-gated idx=7 push block earlier in Get-Phase1 still exists. Both paths guard on $script:cache.DeviceIdReported so only one fires, but that block is dead-ish under the new WiFi-swap flow (no wired disable -> no NIC state file -> re-enable block no-ops; Report IP log gate may still fire idx=7 if Phase 1 essentials haven't all flipped yet but Report IP did). Worth cleaning up next pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:24:18 -04:00
cproudlock	f404cd2892	Monitor: drop INTERNETACCESS WiFi + connect AESFMA on Phase 1 complete When Intune registration lands (AAD-joined + IntuneEnrolled + EnterpriseMgmt task present + baseline policies >=5), the bay is presumed to have its SCEP-provisioned machine cert in LocalMachine\My. At that point the INTERNETACCESS profile (172.16.x guest/internet WiFi) is no longer useful - it just keeps the bay on a non-corp range so Report IP can't find a 10.x to POST and the SFLD assignment filter never matches. Action: in Get-Phase1, once all four registration signals are green, fire 'netsh wlan delete profile name=INTERNETACCESS' then immediately 'netsh wlan connect name=AESFMA ssid=AESFMA'. Bay drops onto corp WLAN with EAP-TLS, picks up a 10.x lease, Report IP fires cleanly. One-shot per Monitor lifetime via $script:cache.InternetAccessDeleted flag. This is the alternative to pre-staging the AESFMA profile during imaging (which was reverted). AESFMA profile is assumed to exist already because Intune's WiFi config profile delivers it during the same enrollment that just completed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:22:40 -04:00
cproudlock	f8944fbc49	Revert AESFMA preinstall stage from Run-ShopfloorSetup User call: don't install AESFMA profiles during imaging preinstall. Removed the MA package copy + MA4NetworkConfigv2.bat invocation from Run-ShopfloorSetup line 43 area. .gitignore additions for profile XML patterns are kept - those are harmless safety net. PXE share's /srv/samba/enrollment/MachineAuth/ staging directory is left in place (not deleted) - no consumer references it after this revert. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:19:41 -04:00
cproudlock	a80bdd6923	Filtered Report IP shim - POSTs only WJ corp ranges to GE webhook GE's Intune-deployed ReportIPAddresses_v2.ps1 filters Get-NetIPAddress with $_.StartsWith("10.") - too broad for WJ where PXE LAN is 10.9.x. Can't modify the GE script (signed). Workaround: run our own POST to the same Tines webhook with a tight subnet filter, beating GE's script to the punch. New Invoke-FilteredReportIP.ps1 (lib/): - Walks Get-NetIPAddress -AddressFamily IPv4 - Filters strictly to 10.134.48.0/23 OR 10.48.249.0/26 (WJ corp) - POSTs to https://tines.apps.geaerospace.com/webhook/.../... with {host, fqdn, IP, force_update} body matching GE's payload shape - Local dedup via C:\ProgramData\GEA\FilteredReportIP\last-ip.txt - 6 retries with 10s backoff on transient HTTP error - Logs to C:\Logs\FilteredReportIP.log Monitor-IntuneProgress main loop calls it each tick until it succeeds once. After success, $filteredReportIpSucceeded flag short-circuits further attempts. If WJ later moves to a different VLAN, edit the $allowedRanges array in Invoke-FilteredReportIP.ps1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:16:39 -04:00
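A Python sketch of the two local checks in Invoke-FilteredReportIP.ps1: the strict subnet allow-list and the last-IP dedup file. The ranges are the ones from the commit; the function names are hypothetical, and the webhook POST and retry loop are omitted.

```python
import ipaddress
from pathlib import Path
from typing import Iterable, List

# WJ corp ranges from the commit; edit this list if WJ moves VLANs.
ALLOWED_RANGES = [
    ipaddress.ip_network("10.134.48.0/23"),
    ipaddress.ip_network("10.48.249.0/26"),
]


def wj_corp_addresses(addresses: Iterable[str]) -> List[str]:
    """Keep only IPv4 addresses inside one of the allowed WJ corp ranges."""
    return [a for a in addresses
            if any(ipaddress.ip_address(a) in net for net in ALLOWED_RANGES)]


def already_reported(ip: str, last_ip_file: Path) -> bool:
    """Local dedup: True if last_ip_file already holds this IP."""
    return last_ip_file.exists() and last_ip_file.read_text().strip() == ip


def record_reported(ip: str, last_ip_file: Path) -> None:
    """Remember the IP after a successful POST."""
    last_ip_file.parent.mkdir(parents=True, exist_ok=True)
    last_ip_file.write_text(ip)
```

Unlike a `StartsWith("10.")` test, the CIDR membership check rejects 10.9.x and any other 10.x address outside the two WJ ranges.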
cproudlock	4dd300e7ab	Stage GE MachineAuth profiles at imaging time (AESFMA auto-join) Hypothesis test for WJ Phase 2 stuck issue. GE Report IP script filters Get-NetIPAddress on StartsWith("10.") - WJ bays don't see ANY 10.x because: - PXE LAN is 10.9.100.x (we'd disable wired anyway to avoid leak) - Internet WiFi at site is 172.16.x (filter rejects) - AESFMA corp WiFi (10.x) requires machine cert that Intune SCEP provisions a few minutes AFTER PPKG enrollment Result: Report IP webhook gets nothing -> GE backend never sees the bay -> bay never enters the dynamic group that SFLD policy is assigned to. Other GE sites work because their corp WiFi/wired is on a real 10.x corp network and the script always finds a 10.x to report. Drop the MA package (8021x.xml + AESFMA.xml + multi-NIC bat) onto each bay early in Run-ShopfloorSetup, run MA4NetworkConfigv2.bat to import both profiles to every physical wired + wireless adapter. AESFMA.xml patched to connectionMode=auto (default V02 was manual) so WLAN service auto-joins as soon as the SCEP cert lands. Bay gets a real 10.x corp address. Report IP webhook fires cleanly. Profile XMLs (8021x.xml, AESFMA.xml, BLUESSO.xml, WiFi-Profile.xml, .wlanprofile, .lanprofile) added to .gitignore - they contain GE-internal SSID + trusted-root thumbprint and are staged on the PXE enrollment share at /srv/samba/enrollment/MachineAuth/ instead of git. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:13:11 -04:00
cproudlock	86c7ffccd5	Monitor: bump post-reEnable settle 1s->5s + retry idx=7 push Field test: bay imaged end-to-end, Monitor saw Report IP log, captured DeviceId (Phase 1 went COMPLETE on screen + QR rendered from dsregcmd), but the idx=7 push to the PXE dashboard never landed before the Intune-triggered LAPS-prompt reboot. Root cause: Enable-NetAdapter + 1s sleep doesn't give Windows time to renew DHCP + populate routes before Send-PxeStatus POSTs to PXE webapp. Push silently caught (Send-PxeStatus has try/catch), next tick was 30s away, LAPS reboot fired in between. Two changes: 1. Sleep bumped 1s -> 5s after Enable-NetAdapter so wired path is actually carrying traffic before we POST. 2. When the tick that did the re-enable is also the push tick, retry Send-PxeStatus up to 6 times with 2s spacing (~12s total) instead of one-shot-then-give-up. Surfaces the warning to the transcript if all attempts fail so we can diagnose next time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 15:33:11 -04:00
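The bounded retry can be sketched in Python as follows. It assumes the push signals failure by raising; the real Send-PxeStatus swallows its own errors, so the PowerShell version needs some other failure signal. The function name is hypothetical, and the sketch sleeps only between attempts.

```python
import time
from typing import Callable


def send_with_retry(send: Callable[[], None],
                    attempts: int = 6,
                    spacing_secs: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send() up to `attempts` times, sleeping spacing_secs between failures.

    Returns True on the first success. If every attempt fails, prints a warning
    (standing in for the transcript) and returns False instead of raising.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            send()
            return True
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                sleep(spacing_secs)
    print(f"WARNING: status push failed after {attempts} attempts: {last_error}")
    return False
```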
cproudlock	0eb52c6a9e	imaging: copy button HTTP fallback (execCommand) navigator.clipboard.writeText is gated on isSecureContext - HTTPS or localhost only. PXE dashboard is served over plain HTTP (10.9.100.1:9009) so the API was undefined and the chain threw before .catch fired - user saw nothing. Wrap clipboard write in copyText() that prefers the modern API and falls back to the classic invisible-textarea + document.execCommand('copy') path which works on HTTP. Visual flash logic moved into flashCopied() for reuse. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 18:30:16 -04:00
cproudlock	6275a6a2b3	imaging: add visual feedback to device-id copy button Click effect: button flashes green with "copied!" text and 1.15x scale pulse, reverts after 1.2s. Failure case (clipboard API blocked or HTTP context) shows red "failed" for 1.5s. Handler moved out of inline onclick into a single delegated click listener at the doc level so future copy buttons just need the .copy-btn class + data-copy-text attribute. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 18:29:00 -04:00
cproudlock	59dbd64e37	Fix Report IP glob (.LOG not .txt) + add device-id copy button Field bay surfaced two bugs in one diag dump (mdm-diag-F907T5X3 - 6PPSF24): 1. GE Proactive Remediation Report IP actually writes GE_Report_IP_Address_2_5.LOG (uppercase .LOG), not the .txt I assumed. Globs in two places had .txt filter -> never matched -> Phase 1 stuck IN PROGRESS forever even after the file landed and wired-NIC re-enable never fired. Drop extension from both globs in Monitor-IntuneProgress.ps1 (id=7 push gate + p1Done check). 2. The "GE Re-enable Wired NICs" SYSTEM task registered by Run-ShopfloorSetup was polling Autologon_Remediation.log for "Autologon set for ShopFloor" - a lockdown-time signal. Re-enable needs to fire at Report-IP time (well before lockdown) so that Monitor can push idx=7 with the QR before the Intune-triggered LAPS-prompt reboot. Repoint the SYSTEM task's poll to C:\Logs\GE_Report_IP_Address* (any extension). Plus minor UX: copy button next to the Intune device ID on /imaging dashboard so techs can grab the GUID without having to double-click-select the <code>. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 18:27:26 -04:00
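This Python sketch uses `fnmatchcase` to show why the extension-less glob fixes the match. Case-sensitive matching is used only to keep the demo deterministic. The `.txt` pattern fails on Windows too, because the extension itself differs and is not just a case variant. The function name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def report_ip_log_present(filenames: Iterable[str],
                          pattern: str = "GE_Report_IP_Address*") -> bool:
    """True if any file name matches the pattern (no extension filter by default)."""
    return any(fnmatchcase(name, pattern) for name in filenames)
```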
cproudlock	7e1ea03f02	Decouple wired-NIC re-enable from DeviceId capture Previous logic bundled re-enable into the idx=7 DeviceId-push gate. If DeviceId hadn't been captured yet (AAD join lag, dsregcmd parse miss), re-enable never fired even though the Report IP log was already sitting at C:\Logs\GE_Report_IP_Address*.txt and the NIC state file was on disk. Split into two independent checks per tick: 1. Re-enable: triggered by (Report IP log) AND (NIC state file) only. 2. idx=7 push: still gated on (DeviceId) AND (Report IP log). Fixes case observed in field: file exists in C:\Logs but wired NICs stayed off and the bay couldn't reach the PXE dashboard for idx=7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 18:15:04 -04:00
cproudlock	2bfb2522c7	Phase 1 stays "in progress" until Report IP log appears Monitor on-screen Phase 1 row used to show COMPLETE the instant AAD join + Intune enroll + EmTask + baseline policies (>=15 subkeys) all hit. That's misleading: the bay isn't actually registration-clean until GE's Proactive Remediation Report IP script has fired on WiFi-only and dropped C:\Logs\GE_Report_IP_Address*.txt. Without that log, the SFLD ConfigurationProfile assignment filter still sees a leaked 10.9.100.x IP and Phase 2 won't unblock. Add reportIpDone to both the p1Done gate and the Get-PhaseStatus input list so the on-screen Intune Registration row stays IN PROGRESS until the file lands. Matches the dashboard side: idx=7 push is already gated on the same file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 18:11:36 -04:00
cproudlock	d87be4c40d	Move wired-disable from stage 2 to post-PPKG-return Push stages 2-6 to dashboard before going dark. Wired stays up through PPKG enrollment so all standard imaging progress lights up the dashboard card. Disable fires AFTER idx=6 push (handoff to Monitor PostPpkg) + BEFORE PostPpkg settle's Schedule #3 hammer + BEFORE the PPKG-driven reboot + BEFORE IME starts firing Report IP. Result: dashboard shows 2-6 cleanly, dark from 6 to 7, then catches up at 7 with QR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:28:58 -04:00
cproudlock	b8328171eb	Kill wired NICs post-stage-2 until Report IP log appears Recurring Phase 2 "Device Configuration" stuck: GE Intune Proactive Remediation "Report IP" script enumerates Get-NetIPAddress and POSTs all IPs to a GE webhook. Bays cabled to air-gapped PXE LAN have 10.9.100.x leak into that report. GE backend tags bays "not on corp net" -> dynamic-group assignment-filter at GE excludes them from the SFLD ConfigurationProfile (Function + SasToken OMA-URI) -> HKLM:\SOFTWARE\GE\SFLD\DSC never populates -> Monitor Phase 2 gate never closes. Confirmed via mdm-diag-F907T5X3 dump: every Microsoft policy delivered fine, zero SFLD/GE-namespace OMA-URI present. Fix flow: 1. Run-ShopfloorSetup line 43: disable every Up wired NIC right after stage 2 push. NIC names persisted to C:\Enrollment\disabled-wired-nics.txt for later re-enable. 2. Stages 3-6 status pushes fail silently while wired is down (PXE server lives on the air-gapped 10.9.100.0/24 LAN, unreachable from WiFi). Dashboard goes dark in that window. 3. PPKG installs, immediate reboot, AAD/Intune enroll over WiFi only. 4. IME boots, Report IP script fires with corp-WiFi IP only, writes C:\Logs\GE_Report_IP_Address*.txt. Webhook records clean IP. GE dynamic group eligibility flips. SFLD policy delivers next sync. 5. Monitor-IntuneProgress detects the log file, re-enables every NIC in the persisted list, sleeps 1s for link, then pushes idx=7 with DeviceId so the dashboard card flips to QR before the Intune-triggered LAPS-prompt reboot lands. Phase 1 remains "in progress" on the dashboard until Report IP fires - correct, the bay isn't actually registration-clean until then. Files: - Disable-WiredNics.ps1 (new) - persists names + disables - Run-ShopfloorSetup.ps1 - call after stage 2 Report-Stage - Monitor-IntuneProgress.ps1 - gate idx=7 push + re-enable Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:22:41 -04:00
cproudlock	b5a067bd48	Cut Post-PPKG settle from 180s to 60s Empirical evidence: MDM baseline policy push lands well within 60s after PPKG triggers immediate reboot path on bays where assignment filter matches. Bays where it doesn't deliver in 60s aren't going to deliver in 180s either - they're blocked on an assignment-filter or dynamic-group lag (sometimes 30+ min in GCC-High), not on the raw sync window. Trimming 120s of dead wait off every imaging cycle. Aggressive 30s Schedule #3 hammer + early-exit on baseline (>=5 subkeys) preserved - those still help bays that DO deliver fast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:10:50 -04:00
cproudlock	44bbd23e4d	Monitor-IntuneProgress: auto-fire idx=8 on lockdown detection Local-user shopfloor fleet (ShopFloor is a LOCAL account) means AzureAdPrt stays NO, user-scoped Intune policies never deliver, and the natural completion gate (baseline policies >= 5 + DSCInstall success + Phase 4 wrappers) never closes. Dashboard sessions stuck at 7/8 / 87.5% forever even though the bay is functionally complete. Real-world definition of "done" for these bays is lockdown applied. Add a per-tick check in Get-Phase1 (alongside the DeviceId push) that detects either: * Winlogon DefaultUserName -like 'ShopFloor' AND AutoAdminLogon=1 OR C:\Enrollment\force-lockdown-applied.txt marker file When either is present and not yet pushed, fire Send-PxeStatus StageIndex=8 / StageTotal=8 / Status='succeeded' with the captured IntuneDeviceId for the dashboard QR. One-shot per session via the LockdownCompletePushed cache flag. No need for the operator to run mark-complete.bat anymore - Monitor's main loop will fire idx=8 within ~5s of force-lockdown or autologon flip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 15:47:23 -04:00
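A Python sketch of the lockdown predicate, with the Winlogon values passed in as a dict instead of read from the registry. PowerShell's `-like` without wildcard characters behaves as a case-insensitive equality test, which the sketch mirrors with `casefold()`. The function name is hypothetical; the one-shot `LockdownCompletePushed` flag would sit in the caller.

```python
from pathlib import Path
from typing import Mapping


def lockdown_applied(winlogon: Mapping[str, str], marker: Path) -> bool:
    """Either autologon is set for ShopFloor, or the force-lockdown marker exists."""
    user = str(winlogon.get("DefaultUserName", ""))
    auto = str(winlogon.get("AutoAdminLogon", "0"))
    # -like 'ShopFloor' has no wildcards, so it acts as a case-insensitive equality test.
    autologon_set = user.casefold() == "shopfloor" and auto == "1"
    return autologon_set or marker.exists()
```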
cproudlock	a8d38f6117	imaging: load Send-PxeStatus at script scope + bump QR size to 160px Monitor-IntuneProgress.ps1: the previous Ensure-SendPxeStatus function ran '. $lib' from inside the function body. PowerShell's dot-source-inside-function semantics put the imported Send-PxeStatus into the function's LOCAL scope, not the script scope. By the time Get-Phase1 called Get-Command Send-PxeStatus, the function had already returned and Send-PxeStatus was out of scope - silently never invoked, no log entry at all (success or failure). Diagnostic confirmed: bay had DeviceId in dsregcmd, manual Send-PxeStatus from operator prompt fired idx=7 cleanly with QR rendered, but Monitor's automatic call never showed up in C:\Logs\send-pxe-status.log. Fix: dot-source at script top-level (outside any function). Then Send-PxeStatus is in script scope where every function in the file can call it. Keep Ensure-SendPxeStatus as a no-op stub for any caller still invoking it. imaging.html: bump QR data-qr-size from 56 to 160 px. A 36-char UUID at ECC M needs ~29x29 modules; at 56px each module was ~1.5px which is too tight for a phone camera to lock onto from typical distance. 160 px gives ~5 px/module which scans cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 15:41:52 -04:00
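The module-size arithmetic behind the 56 px to 160 px bump, as a Python sketch. A QR code of version v is 17 + 4v modules on a side. A lowercase UUID is encoded in byte mode, because QR alphanumeric mode has no lowercase letters. At ECC level M, version 2 holds 26 bytes and version 3 holds 42, so 36 characters need version 3 (29x29), assuming the library picks the smallest version that fits. Whether the rendered size includes the 4-module quiet zone depends on how the page draws the code, so both cases are computed.

```python
# QR byte-mode capacity at error-correction level M, versions 1-4.
BYTE_CAPACITY_M = {1: 14, 2: 26, 3: 42, 4: 62}


def smallest_version(n_bytes: int) -> int:
    """Smallest QR version (1-4) whose level-M byte capacity fits n_bytes."""
    for version in sorted(BYTE_CAPACITY_M):
        if n_bytes <= BYTE_CAPACITY_M[version]:
            return version
    raise ValueError("payload too large for versions 1-4")


def px_per_module(size_px: float, version: int, quiet_zone: int = 4) -> float:
    """Pixels per module when size_px spans the symbol plus quiet_zone modules per side."""
    side = 17 + 4 * version
    return size_px / (side + 2 * quiet_zone)


uuid_len = len("12345678-abcd-ef01-2345-6789abcdef01")  # 36 characters
v = smallest_version(uuid_len)                            # version 3 -> 29x29 modules
```

With the quiet zone counted, 56 px gives about 1.5 px per module and 160 px about 4.3; without it, 160 px gives about 5.5.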
cproudlock	320b241942	winpe-status-push: brace var names before colon (parser bug) PowerShell parses $var: as scope-namespaced syntax (e.g. $env:NAME, $global:foo). The line Log "server=$PxeServer:$Port pctype=$PCType" errored at line 26 col 13 - parser interpreted $PxeServer: as a scope prefix and bailed. Fix: use ${PxeServer}:${Port} so the colon is literal. The $uri line below already had the right form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 14:45:37 -04:00
cproudlock	2e8cf4b5be	Monitor-IntuneProgress: fix DeviceId capture gate DeviceId capture was nested inside the -not AzureAdJoined gate. Once AAD joined flipped true the block stopped running, but DeviceId only appears in dsregcmd output AFTER AzureAdJoined is set, so the capture never fired and Send-PxeStatus -IntuneDeviceId never pushed. Webapp session JSON missing intune_device_id field; /imaging card couldn't render the QR even though the bay-side Build-QRCodeText showed the QR correctly (it calls dsregcmd each render with no gate). Fix: change the gate condition so the dsregcmd call keeps running while EITHER AzureAdJoined OR DeviceId is still missing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 14:35:18 -04:00
cproudlock	85278e01bf	Run-ShopfloorSetup: fix Send-PxeStatus dot-source path Run-ShopfloorSetup.ps1 is copied by startnet.cmd to W:\Enrollment\ (root of Enrollment, NOT inside shopfloor-setup/). So $PSScriptRoot is W:\Enrollment\. The dot-source path was Join-Path $PSScriptRoot 'Shopfloor\lib\Send-PxeStatus.ps1' which resolves to W:\Enrollment\Shopfloor\lib\Send-PxeStatus.ps1 - that path does not exist. The actual file lands at W:\Enrollment\shopfloor-setup\Shopfloor\lib\Send-PxeStatus.ps1 (xcopied by startnet from the Shopfloor share dir into the shopfloor-setup\ subdir). Test-Path returned false, dot-source silently skipped, Send-PxeStatus was never defined, every Report-Stage call no-op'd, no log file was written, no POSTs reached the dashboard. Symptom: bay reaches Windows desktop + runs Run-ShopfloorSetup but never appears on /imaging dashboard. C:\Logs\send-pxe-status.log does not exist on the bay. Fix: add the missing 'shopfloor-setup\' segment so the path resolves to the actual file location. 09-Setup-*.ps1 use a different relative path ('..\Shopfloor\lib\...') from inside the per-type dir and were unaffected. Monitor-IntuneProgress sits in Shopfloor\lib already and uses a sibling lookup - also fine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 14:24:36 -04:00
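The path mix-up is easy to see with Python's `PureWindowsPath`, which joins Windows-style paths on any OS. `W:\Enrollment` stands in for `$PSScriptRoot` as described in the commit; the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# $PSScriptRoot when Run-ShopfloorSetup.ps1 runs from W:\Enrollment\.
script_root = PureWindowsPath(r"W:\Enrollment")

# Old join: missing the shopfloor-setup\ segment, so Test-Path was false.
old_path = script_root / "Shopfloor" / "lib" / "Send-PxeStatus.ps1"

# Fixed join: matches where startnet's xcopy actually places the file.
new_path = script_root / "shopfloor-setup" / "Shopfloor" / "lib" / "Send-PxeStatus.ps1"
```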
cproudlock	a57ed5fd96	winpe: externalize WinPE-phase status push to scripts/winpe-status-push.ps1 The inline one-liner in startnet.cmd called Get-NetAdapter, which is not available in WinPE's stripped PowerShell (no NetTCPIP module). Errors silently swallowed by the surrounding try/catch - POST never fired, dashboard never showed bays during the WIM-apply phase. Externalize to a standalone .ps1 on the enrollment share: * Uses wmic (always present in WinPE 10+) for both serial AND mac instead of Get-CimInstance / Get-NetAdapter. * Logs every step to X:\Windows\Temp\winpe-status-push.log so a future "POST didn't fire" debug is one file read away. * startnet.cmd now just runs powershell -File Y:\scripts\winpe-status-push.ps1. Future edits to the push logic do NOT require a boot.wim rebuild; just edit the .ps1 on the share. Mirror the existing pattern for run-enrollment.ps1 / wait-for-internet.ps1 / migrate-to-wifi.ps1 (all already at /srv/samba/enrollment/scripts/). Add the new file to the playbook's enrollment-scripts copy loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 14:05:50 -04:00
cproudlock	1e21a54a41	imaging: idx=8 completion + Send-PxeStatus success+failure logging Two related changes so the /imaging dashboard reaches 100% and so the operator can see why POSTs are not arriving when a session stalls. Monitor-IntuneProgress.ps1: * After sync-complete.txt is written (DSC + lockdown done) fire a final Send-PxeStatus -StageIndex 8 -StageTotal 8 -Status 'succeeded' + IntuneDeviceId. Previously the script exited without any final status push, so even a perfect run capped at idx=7 / 87.5%. The session now reaches 8/8 / 100% green when imaging actually finishes. Send-PxeStatus.ps1: * Log EVERY POST attempt (both success and failure) to C:\Logs\send-pxe-status.log with idx, status, stage name, and either the HTTP code on success or the exception message on failure. Was previously silent-on-success, log-on-failure. Operator can now correlate dashboard state to actual outbound activity: OK idx=2/8 status=in_progress http=200 stage='Run-ShopfloorSetup: starting' ERR idx=2/8 status=in_progress uri=http://10.9.100.1:9009/... err=Unable to connect * Errors still swallowed - imaging never blocks on a failed status push. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 13:32:33 -04:00
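The log-every-attempt pattern above can be sketched in Python. `send_status` is a hypothetical name and not the PowerShell helper. The `post` callable is injected so the sketch stays transport-agnostic. The payload keys `stage_total` and `status` are assumptions; only `stage_index` and `current_stage` appear in this log. The two log formats mirror the OK/ERR examples quoted in the commit.

```python
from typing import Callable, List

def send_status(post: Callable[[str, dict], int], log: List[str], uri: str,
                idx: int, total: int, status: str, stage: str) -> None:
    """Push one status update and log the outcome either way.

    `post` returns an HTTP status code or raises on failure.
    """
    payload = {"stage_index": idx, "stage_total": total,
               "status": status, "current_stage": stage}
    try:
        code = post(uri, payload)
        log.append(f"OK idx={idx}/{total} status={status} http={code} stage='{stage}'")
    except Exception as exc:
        # Swallow the error: imaging must never block on a failed status push.
        log.append(f"ERR idx={idx}/{total} status={status} uri={uri} err={exc}")
```

Logging the success path as well as the failure path is what lets an operator tell "the bay never tried" (no lines at all) apart from "the bay tried and the server was unreachable" (ERR lines).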
cproudlock	6f88075e98	ntlars: add _ntlars-backups-manual/ sibling for operator NTLARS Load The existing _ntlars-backups/.reg files are used by the automated imaging pipeline. They are REGEDIT5 UTF-16 with explicit WOW6432Node in the key path. That works for automated reg-import during install but the NTLARS Load... button in the NTLARS settings dialog rejects them - NTLARS expects: REGEDIT4 ANSI / CRLF / no BOM * Bare path: HKLM\SOFTWARE\GE Aircraft Engines\DNC\... (no \WOW6432Node\ - NTLARS is 32-bit, Windows redirector handles the mapping transparently when NTLARS reads/writes) * No semicolon comment header This commit generates the parallel 147-file tree at playbook/shopfloor-setup/_ntlars-backups-manual/ derived from the existing _ntlars-backups/ files. Content (FMSHostPrimary + FMSHostSecondary edits from `df443d5` + `802d85e`) is preserved; only format and path are transformed. Both trees coexist - automation continues to pull from _ntlars-backups/, operators use _ntlars-backups-manual/ when they need to Load... a bay's reg manually. Also live at /srv/samba/enrollment/shopfloor-setup/_ntlars-backups-manual/ on the PXE server (\\10.9.100.1\enrollment) for share-based access by the operator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 13:08:27 -04:00
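The REGEDIT5-to-REGEDIT4 transform described above can be sketched in Python. `to_regedit4` is a hypothetical helper, not the generator used in this commit. It assumes UTF-16 input with a BOM and uses cp1252 as the "ANSI" code page. It also assumes plain string/dword values only: hex(2)/hex(7) value data is UTF-16 encoded in version 5.00 files and would need separate re-encoding, which this sketch does not do.

```python
import re

def to_regedit4(data: bytes) -> bytes:
    """Convert a REGEDIT5 UTF-16 .reg export into REGEDIT4 ANSI/CRLF, no BOM."""
    text = data.decode("utf-16")  # the utf-16 codec consumes the BOM
    out = []
    for line in text.splitlines():
        if line.startswith(";"):
            continue  # drop the semicolon comment header
        if line.strip() == "Windows Registry Editor Version 5.00":
            line = "REGEDIT4"
        elif line.startswith("["):
            # Bare path: let the 32-bit redirector map WOW6432Node itself.
            line = re.sub(r"\\WOW6432Node\\", r"\\", line, flags=re.IGNORECASE)
        out.append(line)
    # Assumption: cp1252 stands in for the bay's ANSI code page.
    return ("\r\n".join(out) + "\r\n").encode("cp1252")
```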
cproudlock	ae037d0f49	Revert "migrate-to-wifi: restore wired-disable behavior" This reverts commit `2b730969dd`.	2026-05-13 12:29:26 -04:00
cproudlock	2b730969dd	migrate-to-wifi: restore wired-disable behavior Reverts the 2026-04-24 no-op stub. Empirically the gateway-suppression fix (dnsmasq dhcp-option=3/=6 empty) alone is NOT sufficient to keep Windows from using the wired NIC for Intune Device Configuration / DSC traffic. Even with no default route on wired AND the unattend's InterfaceMetric trick (WiFi=10, Wired=100), the bay stalls on the DSC phase until the wired cable is physically unplugged. Restoring the prior body that disables non-WiFi NICs at first logon post-PPKG. Gated on Get-NetAdapter for a Wi-Fi/Wireless/WLAN/802.11 adapter - towers without WiFi stay on ethernet (the only-NIC scenario where disabling would hang first logon). Falls back to re-enabling ethernet if login.microsoftonline.us:443 doesn't respond within 5 min. Monitor-IntuneProgress.ps1 already has the symmetric re-enable ("Re-enable any wired NICs that Order 5 disabled") at the start of its monitor loop, which kicks in after DSC creds land. Net effect: wired disabled during the DSC fetch window, re-enabled by the time eDNC autostart needs the local NIC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 12:26:43 -04:00
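The adapter gate in this commit, which ae037d0f49 above reverts, can be sketched in Python. `nics_to_disable` and the `(name, description)` tuples are hypothetical stand-ins for Get-NetAdapter output. The pattern list comes from the commit text. The important branch is the empty result: a machine with no WiFi adapter keeps its only NIC.

```python
import re

# Name/description fragments that count as a wireless adapter, per the commit.
WIFI_PATTERN = re.compile(r"Wi-?Fi|Wireless|WLAN|802\.11", re.IGNORECASE)

def nics_to_disable(adapters):
    """adapters: iterable of (name, interface_description) tuples.

    Returns the names of non-WiFi NICs to disable, or an empty list when the
    machine has no WiFi adapter at all (ethernet is the only way out).
    """
    adapters = list(adapters)
    is_wifi = [bool(WIFI_PATTERN.search(f"{name} {desc}")) for name, desc in adapters]
    if not any(is_wifi):
        return []
    return [name for (name, _), wifi in zip(adapters, is_wifi) if not wifi]
```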
cproudlock	c2f7285090	keyence: patch VR-6000 MSI to disable dpinst CustomAction VM-test-confirmed root cause for the imaging "Setup wizard prompt" hang. The bundled MSI has CustomAction caDriverInstall_x64 that invokes dpinst.exe at install time: caDriverInstall_x64 type=3666 (exe, deferred) Source=dpinst.exe Condition: (Not Installed) And VersionNT64 And (VersionNT>=601) Sequence: 6505 The bundled dpinst.xml is minimal (<dpInst><legacyMode/></dpInst>) - no <quietInstall/> directive - so dpinst pops its wizard and waits for operator click-through. /qn on the parent MSI does NOT propagate into the dpinst child process. pnputil pre-staging the INF + cert pre-trust to TrustedPublisher does NOT prevent the CA from firing (the CA runs unconditionally on first install regardless of DriverStore presence). Fix: msibuild patch the MSI's InstallExecuteSequence to set the action's Condition column from "(Not Installed) And VersionNT64 And (VersionNT>=601)" to "0" which evaluates false on every install attempt - the action never fires, dpinst never runs, no wizard pops. The driver itself is now installed exclusively by: 1. our pnputil pre-stage in 09-Setup-Keyence.ps1 (already there), 2. the manifest's separate "KEYENCE VR Series USB Driver" INF entry. End-to-end VM test: 36s, exit 0, VR-6000 DisplayVersion 4.3.7 detected, zero dpinst processes at finish. MSI size 1751552 -> 1744896 bytes (msibuild table rewrite). md5: c6edcfc6c6808617598bcb7a15072a30. Backups of original MSI on live PXE server enrollment share + local SFLD share mirror as .bak-<TS>-orig. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 12:16:13 -04:00
cproudlock	8cd0c147d8	imaging: renumber stages to be time-monotonic (1=WinPE, 7=Intune ID) Previously the stage indices reflected logical milestones but not the order they fire in. Run-ShopfloorSetup posted idx=1 (start) and idx=4 (PPKG) - but 09-Setup-Keyence (inside per-type loop) ran BETWEEN them and posted idx=5/6. The dashboard then "regressed" from 6 back to 4 when PPKG fired, making it look stuck at the per-type-complete card. New numbering matches actual execution order: 1 - WinPE: PESetup / WIM apply (startnet.cmd) 2 - Run-ShopfloorSetup: starting (Run-ShopfloorSetup.ps1) 3 - 09-Setup-<Type>: starting (per-type) 4 - 09-Setup-<Type>: complete (per-type) 5 - Run-ShopfloorSetup: PPKG enrollment (Run-ShopfloorSetup.ps1) 6 - Run-ShopfloorSetup: handoff to Monitor (Run-ShopfloorSetup.ps1) 7 - Monitor-IntuneProgress: Intune Device ID captured services/imaging_status.py rewind threshold reverts to stage_index <= 1 now that WinPE startnet posts idx=1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:34:01 -04:00
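The renumbering above can be checked mechanically. This Python sketch encodes the new stage table. It shows that the old firing order (Run-ShopfloorSetup start at 1, Keyence at 5 and 6, then PPKG at 4) contains a backwards step and the new order does not. `STAGES`, `regressed` and `percent` are illustrative names. STAGE_TOTAL = 8 matches the x/8 totals used elsewhere in this log.

```python
STAGES = {
    1: "WinPE: PESetup / WIM apply",
    2: "Run-ShopfloorSetup: starting",
    3: "09-Setup-<Type>: starting",
    4: "09-Setup-<Type>: complete",
    5: "Run-ShopfloorSetup: PPKG enrollment",
    6: "Run-ShopfloorSetup: handoff to Monitor",
    7: "Monitor-IntuneProgress: Intune Device ID captured",
}
STAGE_TOTAL = 8

def regressed(previous_idx: int, new_idx: int) -> bool:
    """True when a later POST carries a lower index (the 6 -> 4 dashboard symptom)."""
    return new_idx < previous_idx

def percent(idx: int, total: int = STAGE_TOTAL) -> float:
    return 100.0 * idx / total

old_order = [1, 5, 6, 4]  # start, Keyence start/complete, then PPKG
new_order = list(STAGES)  # 1..7 in firing order

print(any(regressed(a, b) for a, b in zip(old_order, old_order[1:])))  # True
print(any(regressed(a, b) for a, b in zip(new_order, new_order[1:])))  # False
print(percent(7))  # 87.5
```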
cproudlock	e3f523eedd	webapp/imaging: bump rewind threshold to stage_index <= 2 Reset trigger previously fired only when a new POST landed at idx <= 1, which meant a reimage didn't reset the dashboard card until Run-ShopfloorSetup ran post-PPKG (~10-20 min in). With the WinPE-phase status push from startnet.cmd in commit `4e018fe` firing at idx=2, that earlier signal needs to count as a new-run marker too. Threshold of 2 makes startnet.cmd the canonical reset point: within seconds of PXE menu choice on the bay, the dashboard card flips from the previous run's high-idx state back to "WinPE: PESetup / WIM apply" + fresh started_at. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:29:34 -04:00
cproudlock	6de19fd250	webapp/reports: trim list to Serial/Model/Date/Result Drops the filename, type, and size columns from the Blancco Reports list - operators want bay-identification fields, not file metadata. Filename moves to a row hover tooltip (title attribute) so it is still recoverable for ad-hoc lookups. Adds a Result column derived from each XML report's overall erasure state: * Successful -> green badge (all erasure entries report Successful) * Failed -> red badge (any erasure entry reports a non-Successful state) * other -> grey badge with the verbatim state * blank/non-XML -> dash The state roll-up lives in the blancco_reports route's per-file parse loop next to the existing serial/model extraction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:13:09 -04:00
cproudlock	4e018feaa0	webapp/imaging: rewind detection + WinPE-phase status push services/imaging_status.py - if a new POST arrives with stage_index <= 1 that is lower than the cached stage_index, OR the previous run already finished (status=succeeded|failed), reset the session: clear log_tail, mint a fresh started_at, drop the status field so the in_progress default re-applies. Preserves serial + records the previous run's last_updated under previous_run_at for audit. Without this, a reimage on the same bay would leave a stale 6/8 "succeeded" card visible until the new run progressed past that index. playbook/startnet.cmd - one-line PowerShell POST after the PXE menu choice + enrollment-share mount, before PESetup.exe waits to start. Captures BIOS serial via wmic, MAC via Get-NetAdapter, and posts: stage_index=2, current_stage="WinPE: PESetup / WIM apply". Best-effort; try/catch swallows any network failure so a missing webapp never blocks imaging. PXE clients will now appear on the /imaging dashboard during WinPE phase instead of only post-PPKG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:11:03 -04:00
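The rewind rule above can be sketched in Python. It is not the code in services/imaging_status.py, and `should_rewind` / `merge_update` are hypothetical names. The sketch reads the condition as "the new index is at or below the threshold, and it is either lower than the cached index or the previous run already finished". The commit text could also be read with the threshold applying only to the first branch. The threshold is a parameter because later commits in this log move it between 1 and 2.

```python
from datetime import datetime, timezone

REWIND_THRESHOLD = 1

def _now() -> str:
    return datetime.now(timezone.utc).isoformat()

def should_rewind(cached: dict, update: dict, threshold: int = REWIND_THRESHOLD) -> bool:
    new_idx = update.get("stage_index")
    if not cached or new_idx is None or new_idx > threshold:
        return False
    lower = new_idx < cached.get("stage_index", 0)
    finished = cached.get("status") in ("succeeded", "failed")
    return lower or finished

def merge_update(cached: dict, update: dict) -> dict:
    if should_rewind(cached, update):
        # Keep serial, remember when the old run last reported, start fresh.
        # 'status' is dropped on purpose so the in_progress default re-applies.
        cached = {
            "serial": cached.get("serial"),
            "previous_run_at": cached.get("last_updated"),
            "log_tail": [],
            "started_at": _now(),
        }
    merged = {**cached, **update}
    merged.setdefault("status", "in_progress")
    merged.setdefault("started_at", _now())
    merged["last_updated"] = _now()
    return merged
```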
cproudlock	908b668bde	shopfloor: instrument 09-Setup-CMM, Common, Heattreat with Send-PxeStatus Wires the imaging-progress helper into the three PC-type setup scripts that were either clean (CMM) or untracked (Common, Heattreat). Each gains two calls per the pattern committed for Keyence in `9122b28`: * idx 5/8 - "09-Setup-<Type>: starting" right after the session start banner * idx 6/8 - "09-Setup-<Type>: complete" just before the completion banner Display, Genspect, and WaxAndTrace also got the same two-line additions locally and on the live server, but those files have pre-existing WIP edits intermixed so they aren't staged here. They'll travel along when the operator commits their unrelated shopfloor work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 10:17:24 -04:00
cproudlock	9122b28c31	webapp: imaging progress dashboard + serial column on reports list Adds end-to-end progress tracking for PXE imaging sessions and surfaces each Blancco report's BIOS serial in the report list. webapp: * services/imaging_status.py - JSON-per-serial state store under IMAGING_DIR (default /var/log/pxe-imaging). Atomic write via tempfile + rename. log_tail capped at 50 lines. Merges partial updates so clients can post just the current_stage tick. * config.py - new IMAGING_DIR env-overridable path. * services/csrf.py - explicit exempt list for machine-to-machine endpoints; /imaging/status is the first entry. Air-gapped LAN; trust-by-network for client posts. * app.py - four new routes: GET /imaging dashboard (renders all sessions) POST /imaging/status client status push (JSON body) GET /imaging/<serial>.json raw session JSON for ad-hoc polling POST /imaging/delete/<s> clear a session from the dashboard Also parses each Blancco XML in the /reports list to surface system.serial + system.model columns. * templates/imaging.html - Bootstrap dashboard with per-session cards (state badge, progress bar, stage idx/total, mac, elapsed, log tail). meta http-equiv refresh=5 for auto-tick. * templates/base.html - new "Imaging Progress" nav entry. * templates/reports.html - Serial + Model columns added. playbook: * shopfloor-setup/Shopfloor/lib/Send-PxeStatus.ps1 - new helper. Dot-source this then call Send-PxeStatus -Stage X -StageIndex N -StageTotal M from any stage script. BIOS serial via CIM, MAC via Get-NetAdapter, pctype + machinenumber from C:\Enrollment. Failures are swallowed to a local log so a network blip doesn't block imaging. * shopfloor-setup/Run-ShopfloorSetup.ps1 - dot-sources helper + posts at three coarse milestones (start, PPKG enrollment, handoff to Monitor-IntuneProgress). * shopfloor-setup/gea-shopfloor-keyence/09-Setup-Keyence.ps1 - posts at session start + after Install-FromManifest with succeeded/failed status derived from $rc. 
Other 09-Setup-*.ps1 scripts can follow the same pattern. ID is BIOS serial (stable across WinPE -> Windows transition and across reboots, unlike hostname which is random pre-PPKG). Operator already knows the serial of the bay they imaged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 10:07:18 -04:00
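The state store described above (JSON per serial, atomic write via tempfile + rename, 50-line log_tail, partial-update merge) can be sketched in Python. Every name here is hypothetical. That includes the `log_lines` update key and the serial whitelist, which is an added assumption that keeps a posted serial from escaping IMAGING_DIR.

```python
import json
import os
import re
import tempfile
from pathlib import Path

LOG_TAIL_MAX = 50
SERIAL_RE = re.compile(r"[A-Za-z0-9._-]+")  # assumption: serial must be a plain filename

def _path(imaging_dir: Path, serial: str) -> Path:
    if not SERIAL_RE.fullmatch(serial):
        raise ValueError(f"unsafe serial: {serial!r}")
    return imaging_dir / f"{serial}.json"

def load_session(imaging_dir, serial: str) -> dict:
    try:
        return json.loads(_path(Path(imaging_dir), serial).read_text())
    except FileNotFoundError:
        return {}

def save_session(imaging_dir, serial: str, session: dict) -> None:
    imaging_dir = Path(imaging_dir)
    target = _path(imaging_dir, serial)
    imaging_dir.mkdir(parents=True, exist_ok=True)
    # Write a temp file in the same directory, then rename it over the target
    # so the dashboard never reads a half-written JSON document.
    fd, tmp = tempfile.mkstemp(dir=imaging_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(session, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, target)
    except BaseException:
        os.unlink(tmp)
        raise

def post_update(imaging_dir, serial: str, update: dict) -> dict:
    """Merge a partial update into the stored session and persist it."""
    update = dict(update)
    new_lines = update.pop("log_lines", [])
    session = load_session(imaging_dir, serial)
    session.update(update)
    session["serial"] = serial
    session["log_tail"] = (session.get("log_tail", []) + list(new_lines))[-LOG_TAIL_MAX:]
    save_session(imaging_dir, serial, session)
    return session
```

Because `os.replace` renames within a single directory, a reader sees either the old file or the new one and never a partial write. That is why the temp file is created next to the target rather than in the system temp dir.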
cproudlock	1d3f21f814	keyence: document Data1.cab staging requirement, gitignore the 560 MB cab VR-6000 Series Software.msi is an InstallShield MSI that references Data1.cab in the same directory for its compressed payload. The cab was never staged into the repo's keyence installers/ dir, so msiexec exited 1603 with "SECREPAIR: Failed to open the file ... Data1.cab" on every imaging run (see Logs/Keyence/install.log on a failed bay for the canonical signature). Only the 1.75 MB MSI was committed; the 560 MB cab lives on the GE-Enforce SFLD share at tsgwp00525\sfld$\v2\shared\dt\shopfloor\gea-shopfloor-keyence\apps\. This commit doesn't add the cab itself (560 MB; same gitignore convention as PrinterInstallerMap.exe and other large binaries). Instead it pins the staging requirement in two places: * .gitignore: explicit entry with the SFLD share path so a future operator wiring up a fresh PXE server build knows where to source it. * keyence-manifest.json _comment: documents the dependency next to the MSI declaration that needs it. The local repo at /home/camp/projects/pxe now has the cab staged in playbook/shopfloor-setup/gea-shopfloor-keyence/installers/ for the next USB build. Rebuilding the Keyence image and re-imaging the failed bay should now reach DisplayVersion 4.3.7 detection successfully. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 09:36:52 -04:00
cproudlock	974accf98a	blancco: fix silent prefs fallback, suspend trap, display blank + add View End-to-end fixes for Blancco Drive Eraser PXE flow uncovered by chasing "reports never reach SMB share" across two air-gapped sites: playbook/blancco-init.sh: * Drop silent || true on wget of preferences.xml + config.xml. Fail loud with shell-drop if download or marker grep fails. Background: airootfs /opt/scripts/validate_preferences.sh restores /albus/preferences.save (factory defaults, empty network_share) if xmllint fails. wget failure made every report silently land nowhere. * Clobber /albus/preferences.save with the same served file so even if the validator fallback fires, the SMB target survives. * Bind-mount /dev/null over /sys/power/{state,disk,mem_sleep,autosleep} before switch_root. Albus's license-retry path writes /sys/power/state directly (bypassing systemd targets); this is the last-line block. * /dev/null symlinks for sleep/suspend/hibernate systemd targets in the airootfs overlay + logind drop-in with IdleAction/Handle=ignore. Three independent layers because cmdline systemd.mask alone is bypassed by direct /sys/power/state writes. xinitrc.d/00-no-screen-blank.sh runs xset s off -dpms + setterm -blank 0 -powerdown 0 so the Blancco GUI doesn't blank during long erasures. * Removed the 20-failsafeDriver.conf "modesetting" pin. modesetting needs DRM/KMS which we disable on kernel cmdline; "vesa" also failed on NVIDIA. With the pin gone Xorg auto-picks fbdev which uses the kernel framebuffer from vga=normal - works across Intel, AMD, and older NVIDIA without nouveau. playbook/pxe_server_setup.yml: * dnsmasq.conf: explicit empty-value dhcp-option=3 + dhcp-option=6. Without them, dnsmasq defaults to sending its own IP as router AND DNS. Commenting the configured-value lines did NOT disable the push (root cause of "wired keeps picking up 10.9.100.1 as gateway"). * Split the Blancco config.img extraction and preferences.xml deploy into separate tasks.
The previous shell-with-creates: gate caused playbook re-runs to skip the prefs deploy entirely after first run. * Added a validation task that runs python3 xml parse + grep on the deployed preferences.xml to fail the playbook at deploy time if the SMB markers are missing. * Added Environment=TZ=America/New_York to the pxe-webapp systemd service so report mtimes and audit log render in Eastern time even if the Python process is started before timedatectl converges. webapp: * services/blancco_report.py: parse Blancco's XML report format (recursive <entries name="..."> walker) into a friendly dict. * templates/report_view.html: Bootstrap "Drive Erasure Certificate" layout - hero summary, customer + system cards, per-drive cards with step-by-step erasure timeline, document signing footer with integrity hash detail. * /reports/view/<filename> route + View button on the reports list (XML reports only; PDFs still download). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 07:38:54 -04:00
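The recursive `<entries name="...">` walker mentioned above can be sketched in Python with `xml.etree.ElementTree`. This is not services/blancco_report.py, and the document shape is an assumption. Named `<entries>` elements nest, named `<entry>` leaves hold text, unnamed wrapper elements are walked through transparently, and repeated names (one per drive, say) collect into a list.

```python
import xml.etree.ElementTree as ET

def _add(out: dict, name: str, value) -> None:
    # Repeated names (e.g. one <entries name="erasure"> per drive) become a list.
    if name not in out:
        out[name] = value
    elif isinstance(out[name], list):
        out[name].append(value)
    else:
        out[name] = [out[name], value]

def walk(elem) -> dict:
    out = {}
    for child in elem:
        name = child.get("name")
        if child.tag == "entries" and name:
            _add(out, name, walk(child))
        elif child.tag == "entry" and name:
            _add(out, name, (child.text or "").strip())
        else:
            # Unnamed wrapper element: hoist its contents into this level.
            for key, value in walk(child).items():
                _add(out, key, value)
    return out

def parse_report(xml_text: str) -> dict:
    return walk(ET.fromstring(xml_text))
```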
cproudlock	adc8d50e66	pxe: arch-aware NBP + undionly.kpxe for legacy BIOS clients Legacy-BIOS PXE clients booting Blancco reported "NBP is too big to fit in free base memory". Cause: dnsmasq unconditionally served ipxe.efi (~675KB EFI binary) which legacy BIOS PXE ROMs cannot execute and which exceeds their NBP cap. Fix: - Add undionly.kpxe (~70KB BIOS-mode iPXE, from boot.ipxe.org). - dnsmasq: dhcp-match on option:client-arch,0 (BIOS) -> undionly.kpxe; default (everything else, including UEFI x86_64 arch 7 and 9) keeps getting ipxe.efi. Tag form is reversible: if the match fails to evaluate, fallback is the working EFI path, not the new binary. - Ansible TFTP-copy loop: mirror undionly.kpxe alongside ipxe.efi. - .gitignore exception: track the open-source kpxe binary so the air-gapped USB build stays self-contained. UEFI clients unchanged. Blancco/Clonezilla/WinPE chain after the iPXE menu is identical regardless of which iPXE variant delivered it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:13:44 -04:00
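The dnsmasq tag logic above reduces to a small decision function. This Python sketch uses `select_nbp`, a hypothetical name. The arch values are from RFC 4578 (DHCP option 93). The point it illustrates is the fail-safe default: only an explicit arch 0 gets the small BIOS binary, and anything else, including a missing option, gets the known-good ipxe.efi.

```python
from typing import Optional

# RFC 4578 client system architecture (DHCP option 93) values referenced above.
ARCH_BIOS_X86 = 0     # Intel x86PC: legacy BIOS PXE ROM
ARCH_EFI_BC = 7       # EFI BC; x86-64 UEFI clients send 7 or 9 per the commit
ARCH_EFI_X86_64 = 9   # EFI x86-64

def select_nbp(client_arch: Optional[int]) -> str:
    """Only arch 0 gets undionly.kpxe; everything else falls through to ipxe.efi."""
    if client_arch == ARCH_BIOS_X86:
        return "undionly.kpxe"
    return "ipxe.efi"
```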

280 Commits