Commit Graph

223 Commits

Author SHA1 Message Date
cproudlock
4015adeb33 utilities/waxtrace-recovery: ship cal diagnostic + repair scripts
Pair of operational tools used when a wax/trace bay's cal apply fails
(218-378-13 series cal Setup.exe crash, or any future variant). Were
living in /home/camp/pxe-images/ on the workstation; promoting to the
repo so they ship with the codebase, get version-controlled, and can
be pushed onto each PXE server's enrollment share via the standard
sync flow.

debug-waxtrace-cal.ps1 (+ .bat launcher):
- 9-section forensic walkthrough that runs on the bay as admin.
- Autodetects FormTracePak install location, dumps data/ dir contents,
  finds + mounts the per-asset cal ISO, lists its contents, checks the
  on-disk 09-Setup-WaxAndTrace.ps1 for the direct-copy bypass marker,
  greps the imaging-time log for cal lines, pulls the last 24h of
  .NET Runtime / Application Error / WER events related to Setup.exe.
- Output: C:\Logs\WaxTrace\debug-waxtrace-cal.log

fix-waxtrace-cal.ps1 (+ .bat launcher):
- Idempotent recovery: mounts the bay's cal ISO, unconditionally copies
  data\* into FormTracePak's data dir, renames any filename containing
  ' _' (space-underscore) to drop the embedded space, clears read-only,
  dismounts. Works on both 218-378-13 (broken filenames) and 218-458A
  (clean filenames) since the rename is a no-op when no space is
  present. Bypasses the buggy vendor cal Setup.exe entirely.
- Output: C:\Logs\WaxTrace\fix-waxtrace-cal.log

Both already pushed to \\172.16.9.1\enrollment\tools\ on both PXE
servers earlier today; this commit lands them in the repo as the
source of truth so future PXE server builds + ad-hoc rsyncs pick
them up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:10:23 -04:00
cproudlock
de3018512a 09-Setup-WaxAndTrace: fix broken-filename detection (Get-ChildItem -Filter)
The direct-copy bypass added earlier was never firing - Get-ChildItem -Filter
uses Win32 filename filtering and does NOT honor PowerShell wildcards or
character classes. The previous detection filter '*[0-9] _*.txt' matched
literal bracket-zero-through-nine text, which never appears in any
filename. $hasBrokenFilenames was therefore always False, and every
218-378-13 series cal apply fell through to the vendor setup.exe which
crashes with System.ArgumentException (exit -532462766).

Confirmed via debug-waxtrace-cal.ps1 log on WJF00159: section 6 reports
the on-disk script has the direct-copy fix, section 7 shows the actual
runtime log line 'running cal Setup.exe' followed by exit -532462766.
The "fix" was never executing because the gate was broken.

Replace -Filter with Get-ChildItem -File + Where-Object regex match on
' _\d+\.txt$' which catches the actual buggy filename pattern (space-
underscore-digits-.txt at end of name) regardless of probe series.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:09:10 -04:00
cproudlock
7f93347f74 CMM: park DODA entry under _pending_doda_entry until binary arrives
Removed the placeholder DODA entry from Applications so a bay imaged
with the 'With DODA' submenu choice today does not log a 'Installer
not found: DODA-PLACEHOLDER.exe' error per cycle. Wiring is otherwise
unchanged: startnet.cmd still offers the With-DODA submenu, pc-subtype.txt
is still written as 'doda', and 09-Setup-CMM.ps1 still passes PCSubType
through to Install-FromManifest. When the DODA installer is sourced,
move the entry from _pending_doda_entry back into Applications (engine
ignores any top-level field other than Version + Applications).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:59:05 -04:00
cproudlock
548d85fed5 CMM: subtype gating + relax exact-version + conditional cleanup + DODA placeholder
Carries over the lessons learned from wax/trace + Keyence imaging today
and threads the same pattern through the CMM path.

09-Setup-CMM.ps1:
- Pass PCType + PCSubType to Install-FromManifest so the manifest's
  per-entry PCTypes filter is honored. Without this every entry runs
  regardless of bay variant - the same bug Keyence had before per-model
  gating was added.
- Move bootstrap cleanup to a conditional that only deletes
  C:\CMM-Install once every (filter-applicable) manifest entry detects
  as installed. If a Hexagon installer forces an unplanned reboot
  mid-install, the new Run-ShopfloorSetup self-resume RunOnce fires on
  the next auto-login; the staging dir needs to still be on disk for
  the re-run to recover. Logs "retained ... not all entries installed
  yet - will retry on next self-resumed run" when partial.

cmm-manifest.json:
- Drop exact DetectionValue from PC-DMIS 2016, PC-DMIS 2019 R2, CLM
  1.8.73, and goCMM. Detection is now uninstall-key presence only, so
  a Hexagon security patch that bumps the DisplayVersion does not
  trigger a re-install loop with exit 1638 every GE-Enforce cycle.
  Bumping the installer in apps/ is the upgrade path - manifest engine
  detection should not also be a version drift catcher for vendor MSIs
  whose backward-compat is established by the vendor.
- Specific to goCMM: the installer filename version (1.1.6718.31289)
  does not match what the installer registers under its uninstall key.
  Dropping DetectionValue silences the false-mismatch loop the prior
  version would have triggered.
- Add DODA placeholder entry gated to PCTypes=["cmm-doda"]. Real
  Installer filename, args, and DetectionPath still TODO once the
  DODA binary is sourced + dropped at installers-post/cmm/.

startnet.cmd:
- Add :cmm_submenu after the user picks gea-shopfloor-cmm from the
  main menu. Two options: Standard (default PC-DMIS + CLM + goCMM
  + Protect Viewer) or With DODA. Mirrors :keyence_submenu pattern.
- Write CMMVARIANT to W:\Enrollment\pc-subtype.txt so Install-FromManifest
  on the bay can apply the PCTypes filter against gea-shopfloor-cmm-doda.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:48:16 -04:00
cproudlock
45f39fd431 startnet.cmd: suppress 'System error 85' on duplicate Y: mount
The :prompt_waxtrace_asset picker block maps Y: to the enrollment share
so select-waxtrace-asset.ps1 can read INDEX.csv. The later :skip_machinenum
block also runs `net use Y: \\172.16.9.1\enrollment ...` to keep the share
mounted through the rest of imaging, but Y: is already mapped at that
point so the second net use throws "System error 85: The local device
name is already in use". Harmless (script proceeds and "the command
completed successfully" prints right after) but visible noise to the
operator during PXE imaging.

Gate the second net use behind `if exist Y:\` so we only map when Y: is
not already mounted. Same end state, no error message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:38:34 -04:00
cproudlock
b86b830568 Run-ShopfloorSetup: self-resume RunOnce + top up AutoLogonCount
Imaging chain stalled on WJF00159 after FormTracePak Setup.exe forced
a reboot: SupportUser auto-logged in fine, briefly flashed something
(an HKLM\Run logon hook), then idle - no resume of Run-ShopfloorSetup.
Confirmed via diag-dispatcher.ps1: bay had AutoLogon working, RunOnce
empty, no Stage-Dispatcher.ps1 on disk, no setup-stage.txt.

Root cause: Run-ShopfloorSetup launches once from unattend XML's
FirstLogonCommands and has no self-resume mechanism. If anything cuts
it off mid-flight (FormTracePak Setup, eDNC MSI, Oracle install with
forced reboot, etc) the chain dies and nothing brings it back.
Stage-Dispatcher.ps1 in the repo is academic infrastructure that was
never wired into the live flow - startnet.cmd does not stage it and
nothing creates setup-stage.txt.

Fix: have Run-ShopfloorSetup register ITSELF as RunOnce at the top of
the script. The script is idempotent throughout (detection checks
skip already-done work) so re-entry post-reboot picks up cleanly.
Normal completion path removes the RunOnce so it does not re-fire
after the planned end-of-script reboot.

Also top up AutoLogonCount to 10 at script start. The unattend XML's
LogonCount=7 budget gets consumed across typical imaging reboots
(Office, Oracle, FormTracePak, Run-ShopfloorSetup explicit, sync-intune)
and an unplanned FormTracePak forced reboot pushes the counter past 0,
clearing AutoAdminLogon and parking the bay at the login screen.
Restoring the budget every Run-ShopfloorSetup entry keeps SupportUser
auto-logging in across any number of forced reboots until normal
completion. Lockdown's Autologon.exe sets its own AutoAdminLogon for
the ShopFloor user when it runs, so the post-completion natural
decrement to 0 does not affect the final state.

Verified via diag-dispatcher.ps1 capture on WJF00159 today - that bay
had AutoLogonCount=4 and no Stage-Dispatcher.ps1 on disk, which both
match this root cause + fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:17:07 -04:00
cproudlock
a22d2f0313 Stage-Dispatcher: top up AutoLogonCount during shopfloor-setup stage
Unattend XML sets LogonCount=7 for SupportUser autologon. Each interactive
login decrements the counter; at 0 Windows clears AutoAdminLogon and the
bay parks at the login screen with no one to fire RunOnce -> dispatcher
never re-runs -> imaging chain stalls.

Typical imaging burns several logons: Windows OOBE first logon,
post-Office reboot, Oracle install reboot, FormTracePak Setup.exe forced
reboot, Run-ShopfloorSetup's own shutdown /r at end of script, stage
advances. The unplanned FormTracePak reboot pushes the count past 0 on
some bays - the exact failure mode the WJF00159 imaging today hit, where
Stage-Dispatcher.ps1 had the new defensive RunOnce-re-register fix
landed but the dispatcher still never re-fired because there was no
auto-logon to trigger it.

Top up AutoLogonCount to 10 every time the dispatcher hits the
'shopfloor-setup' stage. Restores the autologon budget across any
vendor-forced reboots that fire during the stage. When sync-intune
finishes the whole pipeline, AutoLogonCount is left to decrement
naturally; by then lockdown's Autologon.exe has set its own
AutoAdminLogon for the ShopFloor user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:59:22 -04:00
cproudlock
44554b95b0 gea-shopfloor-waxtrace: stage Mitutoyo Backup/Install/Export triad
User received three PowerShell scripts from a Mitutoyo source for
backing up + restoring FormTracePak settings per asset:

  Export-FormtracepakInventory.ps1 - audit: enumerate files + reg keys
  Backup-FormtracepakSettings.ps1  - capture: config + data + reg into
                                     timestamped ZIP, manifest-driven
  Install-FormtracepakSettings.ps1 - restore: replay ZIP to a new bay,
                                     hash-skip identicals, backup
                                     existing as .pre_restore_bak

Cleanup pass over the vendor-shipped versions:
- Strip Unicode box-drawing characters from banners (ASCII-only policy)
- Install: switch to [ordered]@{} for DefaultAppTargets/DefaultDataTargets
  so fallback priority is deterministic
- Install: add -AssetNumber gate that defaults to per-asset SFLD path
  \\tsgwp00525...\Shopfloor\backup\formtracepac\<AssetNumber>
- Install: timestamp the .pre_restore_bak filename so re-runs don't
  clobber the previous backup
- Install: handle BackupPath being a directory containing
  formtracepak_backup_*.zip files (picks newest)
- .bat launchers for each PS1 (bypass execution policy, double-click)

Not yet wired into 09-Setup-WaxAndTrace.ps1; pending reference-backup
capture from a known-good bay before promoting to imaging path. Today
the V6.213 vendor MSI install + per-asset cal ISO still handle the
imaging-time setup directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:39:24 -04:00
cproudlock
27045d5e4a gea-shopfloor-collections: controller NIC auto-IP + credential break-glass
04-SetControllerNicIP.ps1 (imaging-time, runs once via Run-ShopfloorSetup):
- Finds the Realtek physical Ethernet adapter (controller NIC on every
  collections bay; corp LAN is Intel)
- Skips any candidate with a DHCP default gateway (that one is the corp
  LAN, not the controller)
- Skips any candidate already on 192.168.1.2
- Sets static 192.168.1.2/24, no gateway, clears DNS - matches the
  manual procedure documented in post-deploy-debug-flowchart.md section 2B
- Refuses to guess when multiple Realtek NICs remain ambiguous
- Imaging-time only, not enforced via GE-Enforce so the tech can override
  on a specific bay if needed without the drift-catcher reverting

Set-ControllerCredential.ps1 + manifest-entry-controller-credential.json:
- Break-glass cmdkey /add for the controller SMB share (\\192.168.1.1\md1
  used by DNC). Scoped to the 12 Okuma LOC650 machine numbers (3201-3212).
- Manifest entry is detection-less so it runs every enforce cycle if the
  script is armed (.ps1 extension); disarmed by default (.ps1.bak on the
  share) so a coach can rename when a bay loses its credential without
  the enforcer overwriting per-bay deviations between events.
- Smoke-tested end-to-end on win11 VM via QGA: SYSTEM context cmdkey /add
  succeeds, cmdkey /list shows the entry. DNC service runs as LocalSystem
  so SYSTEM vault is the right target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:38:59 -04:00
cproudlock
1f60c86ec8 Shopfloor: auto-register 9999-placeholder machine number prompt
If a bay is imaged with the 9999 placeholder (tech leaves the WinPE
prompt blank or types 9999), the lockdown+auto-login chain ends up at
the ShopFloor user with no real machine number. We had Check-MachineNumber.ps1
written - InputBox + Update-MachineNumber pulls per-machine NTLARS .reg
+ udc_settings_<N>.json from the SFLD share - but it only got registered
when a tech manually ran Configure-PC + toggled item 6. Fresh 9999 bays
never got the prompt, leaving the bay stuck on placeholder values until
someone noticed.

New Register-CheckMachineNumberTask.ps1 auto-registers the logon task
at imaging time. Gated on C:\Enrollment\machine-number.txt == 9999;
bays imaged with a real number never get the task (and any stale task
from a prior 9999-imaging on the same disk is cleaned up).

Wired into Run-ShopfloorSetup.ps1 right after the S: drive logon mapper
register. Skipped for self-contained types (display kiosks have no
machine number).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:38:48 -04:00
cproudlock
44d2f0afd5 Stage-Dispatcher: re-register RunOnce BEFORE calling Run-ShopfloorSetup
Setup chains we do not control (FormTracePak Setup.exe, eDNC MSI, any
vendor installer that forces an immediate reboot) cut & $script off
mid-flight. Without this, the dispatcher never returns from the call
and the post-call Register-NextRun never fires, leaving the next boot
with no RunOnce + a stalled imaging chain. Observed today on WJF00159
where the FormTracePak v6.213 Setup.exe rebooted the bay before the
dispatcher could advance the stage.

Register the dispatcher's own RunOnce defensively before invoking the
sub-script. If a reboot interrupts the call, the next boot re-fires the
same dispatcher, which re-reads the still-'shopfloor-setup' stage file,
re-runs Run-ShopfloorSetup (every step is idempotent + detects already-
installed state), and converges. The existing post-call Register-NextRun
+ stage advance still run on the happy path - cheap, idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:38:39 -04:00
cproudlock
5891a1966f Wax/Trace: heal 218-378-13 cal disc filename bug + VC++ 2017 + picker
09-Setup-WaxAndTrace.ps1 Step 3:
- Detect Mitutoyo's burn-time typo on 218-378-13 series cal discs
  (filenames carry a trailing space inside the probe ID component,
  e.g. "Linear_X_218-378-13 _100072210.txt"). Their own .NET Setup.exe
  calls FileSystemInfo.set_Attributes on the source path and throws
  System.ArgumentException because the path contains an embedded space
  component, crashing every cal apply on 218-378-13 bays (exit
  -532462766 = 0xE0434352, .NET unhandled exception). Confirmed via
  WER Event 1026 captured during today's WJF00159 imaging.
- When the buggy filenames are detected, bypass the broken vendor
  Setup.exe and direct-copy data\*.* into
  C:\Program Files (x86)\MitutoyoApp\Formtracepak\data\, renaming
  each file to strip ' _' (space-underscore) -> '_'. Clear read-only
  attr on each landed file. Older 218-458A discs have clean filenames
  and still use the vendor Setup.exe path.

waxtrace-manifest.json:
- Drop DetectionValue=v14.15.26706 from both VC++ 2017 redist entries.
  Windows Update routinely bumps the VS14 runtime to 14.16+ / 14.3x+,
  the older Mitutoyo redist refuses to install over the newer (exit
  1638 'Another version already installed') and the manifest engine
  marked it as failed even though the runtime was fine. Detection is
  now by registry-key+name presence, which any VC++ 2015-2022 redist
  satisfies (they are backward-compatible).

startnet.cmd:prompt_waxtrace_asset:
- Replace free-text input with select-waxtrace-asset.ps1 arrow-key
  picker driven from installers-post/waxtrace/calibrations/INDEX.csv.
- Map Y: enrollment share early so the picker can read INDEX.csv.
- Replace parens-in-parens block (echo of '(e.g. WJRP2335)' inside
  the if-paren caused 'to was unexpected at this time' parse error
  observed by tech mid-imaging) with goto-flow.
- Fall back to free-text prompt if picker unavailable or operator
  presses Esc.

select-waxtrace-asset.ps1:
- Sort bays descending by asset tag so WJRP* lands at top of menu.
- Also staged as gea-shopfloor-waxtrace/select-waxtrace-asset.ps1 so
  sync-waxtrace.sh ships it to installers-post/waxtrace/ on the share.

sync-waxtrace.sh:
- Push select-waxtrace-asset.ps1 next to INDEX.csv on the share.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:38:19 -04:00
cproudlock
e1ea6b7c62 Wax/Trace: switch baseline to FormTracePak v6.213 vendor install
Replace the V6.0 captured-binary replay (pf-x86-MitutoyoApp.zip +
c-MitutoyoApp.zip + hklm-wow-mitutoyo.reg.gz) with a real vendor install
from FORMTRACEPAK-V6.213.iso mounted via Mount-DiskImage.

09-Setup-WaxAndTrace.ps1:
- Step 2 rewritten: Mount-DiskImage on
  C:\WaxTrace-Install\formtracepak\FORMTRACEPAK-V6.213.iso, run the VB6
  Setup.exe wrapper from the assigned drive letter (DRIVE_CDROM check
  satisfied by virtual mount, no real CD needed), then Dismount.
- Header rewritten: drop captured/ description, note legacy fallback
  remains in the repo (captured-binary/ unchanged) for manual recovery
  if the v6.213 vendor install fails on a bay.

sync-waxtrace.sh:
- Push formtracepak/FORMTRACEPAK-V6.213.iso (2.0 GB) into the bundle
  instead of captured/ payload. Override path via $FTPAK_ISO env var if
  needed (e.g. testing a v6.213 patch ISO).
- Sanity check no longer demands pf-x86-MitutoyoApp.zip; only requires
  prereqs/ + the manifest + dispatcher PS1.

playbook/utilities/convert-cal-iso.sh:
- New helper. Rebuilds a Linux-dd of a multi-session UDF cal disc into
  a clean ISO9660+UDF hybrid that Windows Mount-DiskImage reads. mkisofs
  reads the file via loop-udf, repacks single-session with the asset tag
  as volume label. Run when `file CAL-*.iso` reports "data" (multi-session
  UDF dd produced a Linux-readable but Windows-unreadable container).
  Single-session 218-458A discs from dd are already ISO9660 and don't
  need this.

Verified on win11 VM via qga: V6.213 ISO mounts, Setup.exe locatable.
14 cal ISOs all converted to ISO9660 (md5s refreshed in INDEX.csv),
re-synced to /srv/samba/enrollment/installers-post/waxtrace/calibrations/.
PXE share bundle now 2.0 GB total (V6.213 ISO + 14 cal ISOs + prereqs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 15:44:42 -04:00
cproudlock
2a0b4885fe Keyence VR-3000 G2: imaging-time FIPS opt-out for .exe.configs
Under Intune-enforced LSA FIPS policy, Profilometer / VRAnalyzer /
VRInspection apps crash at device init when MD5CryptoServiceProvider's
ctor is called to verify the probe EEPROM calibration (see keyence3000.txt
+ .png in pxe-images for the dialog + stack).

Patch each .exe.config under C:\Program Files\KEYENCE\<model>\ with
<runtime><enforceFIPSPolicy enabled="false"/></runtime>. Scope is app-CLR
only; OS-wide Lsa FIPS policy stays enforced. CMMC posture: scoped
exception, non-CUI integrity hash, documented in SSP. Each affected bay's
hostname must be on InfoSec's FIPS-exception list before imaging.

09-Setup-Keyence.ps1 gates the patch behind model=vr3000 only. vr5000 /
vr6000 bays do not auto-apply. Verified on win11 VM via qga: 29 configs
across vr5000+vr6000 layouts (vr3000 install was incomplete on VM),
patched + idempotent on re-run, existing <runtime> children preserved.
Also verified on a real PC: 27 patched + 2 skipped (Keyence pre-shipped
the element in two configs), 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 14:35:29 -04:00
cproudlock
37357eee43 Shopfloor images: add Wax/Trace + Keyence per-model variants
Wax/Trace (gea-shopfloor-waxtrace):
- captured/ holds master FormTracePak v6.0 state (Program Files reg dump
  gzipped, ARP entries) taken from a win11 VM where the CD-ROM-bound VB6
  wrapper was driven to completion. xcopy + reg-import replays the install
  on real bays without running the wrapper itself.
- 09-Setup-WaxAndTrace.ps1 rewrites the stub: installs prereqs via manifest
  (VC++ 2008/2017 x86+x64, Sentinel HASP), expands the captured zips into
  C:\Program Files (x86)\MitutoyoApp + C:\MitutoyoApp, imports the reg
  hive, then mounts the bay's per-machine cal ISO (matched by asset tag
  in machine-number.txt) and runs its Setup.exe.
- waxtrace-manifest.json lists the 5 prereqs with InstallShield-style
  silent flags verified on the win11 VM.
- sync-waxtrace.sh ships captured-binary/ + prereqs + cal ISOs from
  /home/camp/pxe-images/iso/mitutoyo-cal/ to
  /srv/samba/enrollment/installers-post/waxtrace/ on the PXE box.
- select-waxtrace-asset.ps1 arrow-key bay picker for WinPE (parses
  INDEX.csv from the cal share, offers "Other (new bay)" fallback).
- startnet.cmd: prompt_waxtrace_asset prompt, skip_waxtrace_stage xcopy
  block (mirrors :skip_cmm_stage), machine-number.txt write covers bay
  asset tag (WJRP*).

Keyence (gea-shopfloor-keyence) - now multi-model:
- vr3000/manifest.json + vr5000/manifest.json + vr6000/manifest.json
  (current single-model VR-6000 moved into vr6000/ subdir). Each ships
  the model's MSI silent-install + DetectionPath via ProductCode.
  Big payloads (Data1.cab, Data11.cab) gitignored, staged via
  sync-keyence.sh from /home/camp/pxe-images/iso/keyence/.
- 09-Setup-Keyence.ps1 dispatches by C:\Enrollment\keyence-model.txt
  (written by startnet.cmd in :keyence_submenu) and points
  InstallerRoot at C:\KeyenceInstall\<model>. DXSETUP probe widened
  to all three Program Files paths (VR-3000 G2, VR-5000, VR-6000).
- startnet.cmd: :keyence_submenu picks vr3000/vr5000/vr6000,
  :skip_keyence_stage xcopy block selectively stages chosen model bundle,
  pc-subtype.txt also written = drops directly into existing GE-Enforce
  PCSubType wiring (looks for gea-shopfloor-keyence-<model>\manifest.json
  on the tsgwp00525 share for ongoing enforcement, no dispatcher change
  needed).
- sync-keyence.sh mirrors sync-waxtrace.sh pattern.

Verified silent MSI install for VR-3000 G2 v2.5.0 and VR-5000 v3.3.1 on
the win11 VM 2026-05-18 with /qn /norestart ALLUSERS=1 REBOOT=ReallySuppress
TRANSFORMS=1033.mst. boot.wim on 172.16.9.1 wimupdate'd with the new
startnet.cmd.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:34:20 -04:00
cproudlock
7f097013fc BIOS sub-stage: switch to flag-file signaling, move push after W: copies
check-bios.cmd: drop literal `^` from BIOS_STATUS (caret survives quoted
SET so substring search for `->` never matched). Write X:\bios-fired.flag
on flash_done + staged paths so startnet.cmd can detect via if-exist.

startnet.cmd: replace `call set` substring-replace with `if exist
X:\bios-fired.flag`. Move push to after W:\Enrollment xcopy completes
(before Y: cleanup) so dashboard reflects "BIOS firmware update" stage
once file staging is done, matching user mental model of imaging order.

Tested flag-file logic in win11 VM cmd.exe: missing -> SKIPS, present
-> FIRES, removed -> SKIPS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 15:58:25 -04:00
cproudlock
9108b495c9 startnet.cmd: fix BIOS sub-stage detection (substring-replace trick)
Original used 'echo %BIOS_STATUS% | findstr /C:"->"' to detect whether
check-bios.cmd actually applied a firmware update. The '>' inside the
value (from check-bios writing e.g. 'updated 1.5.0 -> 1.6.0') gets
parsed by cmd.exe as a redirect operator BEFORE the pipe is set up.
Result: echo wrote a file named '1.6.0' in cwd instead of piping to
findstr. findstr saw no input, returned errorlevel=1, block never
fired. Plus a stray file got created in X:\Windows\System32.

Confirmed empirically in the win11 VM with test bat:
 - echo|findstr approach: SKIPS even when '->' present  (bug)
 - substring-replace: FIRES iff '->' present  (correct)

Fix: replace echo/pipe/findstr with a substring-replace test:

  call set "BIOS_STATUS_STRIPPED=%%BIOS_STATUS:->=%%"
  if not "%BIOS_STATUS%"=="%BIOS_STATUS_STRIPPED%" ( ... )

The '>' inside %VAR:->=...% is parsed as part of the substring
substitution token, not as a redirect. Yields true diff only when
the arrow was actually in BIOS_STATUS.

xcopy region was never impacted by the bug because subsequent
if-exist blocks didn't depend on the previous errorlevel state.
But the BIOS sub-stage push to the dashboard was silently broken.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 11:50:12 -04:00
cproudlock
d8c64bef2b Add conditional BIOS-update sub-stage on idx=1
winpe-status-push.ps1 now accepts -CurrentStage / -StageIndex
params so callers can override the default "WinPE: PESetup / WIM
apply" string. Backwards compatible.

startnet.cmd: after the existing initial WinPE status push,
inspect $BIOS_STATUS for the "->" marker that check-bios.cmd
writes when an update was actually applied or staged. If present,
fire a second idx=1 push with stage="WinPE: BIOS firmware update -
<status>". No-op for clean "up to date" / "no update in catalog"
runs.

imaging.html: at stage_idx=1 with "bios" in current_stage, swap
friendly label to "Updating BIOS firmware" with a do-NOT-power-off
hint. Bays without firmware updates show the default "Booting from
PXE" label as before.

boot.wim startnet.cmd updated via wimupdate so live PXE clients
pick it up at next boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 10:26:31 -04:00
cproudlock
76a3ba513c Monitor: drop cert pre-gate + force Report IP after AESFMA connect
Two fixes for the AESFMA swap path:

1. Removed the X509Chain root-thumbprint pre-check. Bay user reported
   "claims connect not yet operational, but i was able to manually
   connect" - meaning the cert IS in LocalMachine\My but
   $chain.Build() returns a partial chain (probably missing an
   intermediate in the local trust store), so our root-thumbprint
   match returned false and Monitor never even tried the netsh
   connect. Letting netsh attempt directly - it's the source of
   truth on whether EAP-TLS auth succeeds. Rate-limited to 30s
   between attempts to avoid log spam when AESFMA truly isn't
   reachable.

2. Bumped post-connect verify sleep 8s -> 15s. WLAN auth + DHCP can
   take longer than 8s on first attempt.

3. New: once Test-AESFMAConnected returns true and INTERNETACCESS
   is deleted, force-run GE_ReportIP_3_v1.EXE /ForceUpdate=True /S
   so the webhook gets the corp-AESFMA IP immediately instead of
   waiting for the next DHCP-change trigger (which may never fire
   if AESFMA was the bay's first 10.x lease). $script:cache.
   ReportIpForced caches the one-shot fire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 08:25:37 -04:00
cproudlock
3385bc87aa Monitor + imaging: per-phase sub-stages within idx=7
Monitor's Get-Snapshot already tracks Phase 1-5 (Intune Registration,
Device Configuration, Software Deployment, Credentials, Lockdown).
The webapp dashboard only saw a single idx=7 push for the entire
post-PPKG / pre-lockdown window, so the friendly label couldn't
reflect "where is this bay actually". Operator looking at the
dashboard had no idea whether to assign category or hit ARTS for
lockdown next.

Monitor now pushes additional idx=7 entries as it crosses Phase
boundaries:
 - On DeviceId capture: "Intune Device ID captured" (existing)
 - On Phase 2 done (SFLD policy delivered = category was assigned):
   "Phase 2 SFLD policy delivered (device configuration)"
 - On Phase 1-4 all complete: "Phases 1-4 complete - ready for
   lockdown (ARTS request)"
 - On lockdown done: idx=8 (existing)

imaging.html maps the stage_string substring to friendly labels:
 - default idx=7         -> "Registered - assign category"
 - 'sfld policy' / 'phase 2' -> "Phase 2 - device configuration"
 - 'credentials' / 'phase 4' -> "Phase 3 / 4 - DSC + credentials"
 - 'ready for lockdown' / 'request lockdown' -> "Ready - request
                              lockdown" (hint: click ARTS request)

Operator now knows exactly when to act vs when to wait.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:39:20 -04:00
cproudlock
520d4aa791 Monitor: fix AESFMA-connected detection + stop retrying once connected
Two bugs causing "AESFMA cert detected, connecting AESFMA..." to log
over and over even after AESFMA is already up:

1. Regex 'SSID\s*:\s*AESFMA.*?State\s*:\s*connected' required SSID
   line BEFORE State line. Actual netsh wlan show interfaces order
   on Win11 is "Name / State / SSID" - State comes FIRST. The non-
   greedy match never succeeded. Always thought AESFMA wasn't
   connected. Refactor to a Test-AESFMAConnected helper that splits
   output into per-adapter blocks and checks SSID + State independently,
   tolerating either order.

2. Added a fast-path at top of the WiFi-swap block: if AESFMA is
   already connected (no help needed from us), just delete
   INTERNETACCESS if still present and flip the cache flag to stop
   running this block. Previously the block only set the flag after a
   successful connect-then-verify-then-delete cycle; if AESFMA was
   already up at first check, the cycle "succeeded" each tick but
   the flag never flipped, producing the log spam.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 20:06:00 -04:00
cproudlock
894305e906 Monitor: drop AESFMA-connected from Phase 1 done; webapp: LAPS endpoint
1. Phase 1 done gate was requiring 'AESFMA WLAN connected' in addition
   to the data-side signals (AAD + Intune + EmTask + baseline). If the
   bay never reached AESFMA (cert never landed, RADIUS unreachable),
   Phase 1 stayed IN PROGRESS forever even though Intune registration
   was actually complete. Reverting to the data-side-only definition.

2. New webapp endpoint POST /imaging/<serial>/laps stores a LAPS
   password in the session JSON so it survives the 5s dashboard
   auto-refresh. Empty body clears the field. Daily reset of the
   server (cron/restart) is the lifetime cap on stored passwords.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 19:53:05 -04:00
cproudlock
4599c85509 Monitor: strip ANSI escape codes from dsregcmd output before regex
Smoking gun for "Monitor's on-screen QR works but no idx=7 push lands
on the PXE dashboard". Win11's dsregcmd emits ANSI VT100 escape codes
(e.g. \x1B[7mDeviceId\x1B[0m :) around field labels. Captured output
strings then have those codes between "DeviceId" and ":". The strict
regex 'DeviceId\s*:\s*<guid>' fails because \s* doesn't match ANSI
escape chars. $script:cache.DeviceId stays null, idx=7 push never
fires.

Build-QRCodeText was unaffected because it uses Select-String 'DeviceId'
(substring match, tolerates anything in between) then splits on ':'.

Fix: strip ANSI sequences via -replace '\x1B\[[0-9;]*[A-Za-z]', '' before
running the regex. Same pattern covers all CSI sequences dsregcmd uses.
Also force Out-String to get a single string back (was an array of lines
from 2>&1; -match on arrays returns matching elements but $matches
behavior across mixed objects is fragile).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 19:17:05 -04:00
cproudlock
2d75935dfc PostPpkg settle 60s -> 120s
Empirical: a fresh-imaged bay often hasn't finished AAD-join + first
Intune sync by 60s, so the post-PPKG-reboot Monitor instance starts
without DeviceId visible to dsregcmd yet. Doubling the settle to 120s
gives MDM more time to land baseline policies before the reboot,
which means the post-reboot Monitor sees AAD-joined + DeviceId on
first tick and fires idx=7 immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 19:13:26 -04:00
cproudlock
3fb1d983df Stop moving OpenText / WJ Shopfloor shortcuts into Shopfloor Tools
OpenText / Host Explorer shortcut filenames vary by installed profile
(e.g. 'WJ Shopfloor OpenText.lnk', 'WJ Shopfloor.lnk', 'HostExplorer
ShopFloor.lnk'). The taskbar-pin path in site-config.json hardcodes
'Shopfloor Tools\WJ Shopfloor.lnk' - mismatches the actual filename
so 07-TaskbarLayout silently skips pinning it.

Drop OpenText/ShopFloor/HostExplorer pattern moves from 06's
categorization regex. Shortcuts stay at the public-desktop top
level where the OpenText installer placed them. Tech sees the
icon on the desktop, no taskbar pin (the variable filename made
the pin unreliable anyway).

Other categories (UDC, eDNC, NTLARS, etc with stable filenames)
still move into Shopfloor Tools and pin correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:10:16 -04:00
cproudlock
9beee842f1 Monitor: deterministic AESFMA cert check via X509Chain root match
Walk Cert:\LocalMachine\My, build each cert's chain, look for chain
element with thumbprint 27F0C9A22B28CE7687B115A29E31BF4B3ABB180F.
That's the AESFMA.xml TrustedRootCA value = the GE Aerospace
FreeRADIUS root that AESFMA EAP-TLS validates against. A client cert
chained to that root is the SCEP-provisioned AESFMA machine cert.

Combined with the verify-before-delete connect attempt, this gives
two gates:
 1. Cert deterministically exists + chains correctly
 2. netsh wlan connect to AESFMA actually reports State=connected

Only after both pass does INTERNETACCESS get deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:48:00 -04:00
cproudlock
f013aa2bff Monitor: AESFMA verify-before-delete - keep INTERNETACCESS until cert ready
Old gate (SCEP cert in LocalMachine\My with Client Auth EKU) was both
too loose (matches non-AESFMA certs) and unable to verify the cert
chains to GE's RADIUS root. INTERNETACCESS got deleted before AESFMA
could actually authenticate, orphaning the bay.

New flow: when Phase 1 essentials (AAD + Intune + EmTask + baseline)
are complete, ATTEMPT netsh wlan connect AESFMA with INTERNETACCESS
still up as fallback. Wait 8s, parse netsh wlan show interfaces for
SSID=AESFMA + State=connected. Only delete INTERNETACCESS after
operational verification. If AESFMA connect fails (cert not provisioned
yet, RADIUS server unreachable, etc), keep INTERNETACCESS and retry
next tick. Loop runs every 5s while DeviceIdReported is false, so the
swap fires as soon as AESFMA is operationally viable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:46:19 -04:00
cproudlock
a9260ecadd Monitor: 5s tight poll while DeviceId still missing
DeviceId may not be in dsregcmd output the moment Monitor starts after
PPKG reboot - takes a few minutes for AAD-join to settle. Default 30s
PollSecs leaves wide gaps where Monitor isn't checking. Sleep 5s
instead while DeviceIdReported is still false. Once captured + idx=7
push lands, falls back to PollSecs (30s) for the rest of the loop.

Worst case for QR-on-dashboard latency: ~5 seconds after dsregcmd
starts returning a DeviceId.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:08:12 -04:00
cproudlock
ab3e1c98f6 Monitor: fire idx=7 immediately on DeviceId capture (beat LAPS reboot)
User constraint: GE-issued LAPS-prompt reboot lands ~1 minute after
Report IP posts its log. Need the QR on the PXE dashboard BEFORE
that reboot or the operator has no way to look up the device for
LAPS retrieval.

Previously idx=7 was gated on Phase 1 essentials (AAD + Intune
enrolled + EmTask + baseline policies >=5). Those flips happen
later than DeviceId capture (dsregcmd shows DeviceId the instant
AAD-join completes during PPKG). Dropping the gate so idx=7
fires the moment the cache has a DeviceId. Phase 1 row on the
on-bay Monitor display still has its own AESFMA-required gate
for operational completeness; only the dashboard push is moved
earlier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:06:02 -04:00
cproudlock
842ef88ccb Monitor: gate WiFi swap on SCEP cert + Phase 1 done on AESFMA connected
Two related fixes for the WiFi handoff timing:

1. WiFi swap (delete INTERNETACCESS + connect AESFMA) was firing on
   Phase 1 essentials being green (AAD + Intune + EmTask + baseline
   policies >=5). That signal flips ~minutes BEFORE the Intune SCEP
   machine cert actually lands in LocalMachine\My. Without the cert,
   AESFMA EAP-TLS auth fails and the bay has no path at all (we just
   deleted INTERNETACCESS). Stuck.

   New gate: walk Cert:\LocalMachine\My for any cert with Client
   Authentication EKU (1.3.6.1.5.5.7.3.2). When that's present, SCEP
   has delivered, AESFMA EAP-TLS will succeed. Swap then fires safely.

2. Phase 1 row on the on-bay Monitor display now ALSO requires
   AESFMA to be actively connected (parsed from netsh wlan show
   interfaces: SSID=AESFMA + State=connected). Phase 1 stays IN
   PROGRESS until the bay is operationally on corp WLAN, not just
   data-side enrolled. Matches user request "not complete phase 1
   until AESFMA is ready".

idx=7 dashboard push still fires on the original Phase 1 essentials
gate so the QR appears as soon as Intune registers the device,
independent of AESFMA join timing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:04:09 -04:00
cproudlock
a17b3fae6a Retire wired-disable/re-enable dance now that PXE LAN is 172.16.9.0/24
GE Report IP filters Get-NetIPAddress on StartsWith("10.") and PXE LAN
addresses are now 172.16.9.x which the filter skips naturally. The
disable-then-re-enable workaround was only needed when PXE LAN was
10.9.100.x and bays leaked that IP to the GE webhook. With the renumber
that whole flow is dead weight.

Removed:
 - playbook/shopfloor-setup/Shopfloor/lib/Disable-WiredNics.ps1 (file)
 - Run-ShopfloorSetup: Disable-WiredNics call after PPKG returns
 - Run-ShopfloorSetup: "GE Re-enable Wired NICs" SYSTEM task registration
 - Monitor-IntuneProgress: reportIpLog-gated wired re-enable + idx=7 retry
 - Monitor-IntuneProgress: reportIpDone gate on Phase 1 done check

Side benefit: stages 2-6 dashboard pushes no longer go dark mid-flow
(used to die between idx=6 and idx=7 when wired was off). Phase 1 row
on the Monitor screen now flips COMPLETE on the natural AAD + Intune
+ EmTask + baseline-policies condition instead of waiting on the
Report IP log file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:45:54 -04:00
cproudlock
ce604adcda Renumber PXE LAN from 10.9.100.0/24 to 172.16.9.0/24
Single-site bay-stuck issue at WJ: GE Intune Report IP script filters
Get-NetIPAddress on StartsWith("10.") and posts everything matching
to the GE Tines webhook. Bays at WJ get the PXE LAN 10.9.100.x IP
captured and reported -> GE backend tags bays as on a non-corp 10.x
subnet -> dynamic group eligibility for SFLD policy never matches.
Other GE sites work because their PXE LANs aren't on 10.x at all.

Renumber PXE LAN to RFC1918 172.16.9.0/24 so the GE filter naturally
skips wired PXE addresses without any disable-NIC dance.

Server-side already in flight (netplan dual-bound, dnsmasq scope +
boot URL repointed, blancco preferences + grub.cfg + iPXE GetPxeScript
all sed'd to 172.16.9.1). This commit is the playbook / scripts /
docs side: 109 hits across 35 files sed'd in one shot.

After this lands + boot.wim is rebuilt + bays renumber off DHCP,
the 10.9.100.1 binding will be dropped from netplan as the final
cleanup step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:30:32 -04:00
cproudlock
c6b249f866 Monitor: idx=7 push fires on Phase 1 essentials complete
Pair with the INTERNETACCESS -> AESFMA WiFi-swap commit. Once
AAD-joined + IntuneEnrolled + EmTaskExists + baseline policies
all true AND DeviceId is captured, push idx=7 to PXE dashboard
with the DeviceId immediately - don't wait for the Report IP log
(which depends on AESFMA join + script timing).

Side note: the legacy wired-NIC re-enable + reportIpLog-gated
idx=7 push block earlier in Get-Phase1 still exists. Both paths
guard on $script:cache.DeviceIdReported so only one fires, but
that block is dead-ish under the new WiFi-swap flow (no wired
disable -> no NIC state file -> re-enable block no-ops; Report
IP log gate may still fire idx=7 if Phase 1 essentials haven't
all flipped yet but Report IP did). Worth cleaning up next pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:24:18 -04:00
cproudlock
f404cd2892 Monitor: drop INTERNETACCESS WiFi + connect AESFMA on Phase 1 complete
When Intune registration lands (AAD-joined + IntuneEnrolled + EnterpriseMgmt
task present + baseline policies >=5), the bay is presumed to have its
SCEP-provisioned machine cert in LocalMachine\My. At that point the
INTERNETACCESS profile (172.16.x guest/internet WiFi) is no longer
useful - it just keeps the bay on a non-corp range so Report IP can't
find a 10.x to POST and the SFLD assignment filter never matches.

Action: in Get-Phase1, once all four registration signals are green,
fire 'netsh wlan delete profile name=INTERNETACCESS' then immediately
'netsh wlan connect name=AESFMA ssid=AESFMA'. Bay drops onto corp WLAN
with EAP-TLS, picks up a 10.x lease, Report IP fires cleanly. One-shot
per Monitor lifetime via $script:cache.InternetAccessDeleted flag.

This is the alternative to pre-staging the AESFMA profile during
imaging (which was reverted). AESFMA profile is assumed to exist
already because Intune's WiFi config profile delivers it during the
same enrollment that just completed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:22:40 -04:00
cproudlock
f8944fbc49 Revert AESFMA preinstall stage from Run-ShopfloorSetup
User call: don't install AESFMA profiles during imaging preinstall.
Removed the MA package copy + MA4NetworkConfigv2.bat invocation
from Run-ShopfloorSetup line 43 area. .gitignore additions for
profile XML patterns are kept - those are harmless safety net.

PXE share's /srv/samba/enrollment/MachineAuth/ staging directory
is left in place (not deleted) - no consumer references it after
this revert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:19:41 -04:00
cproudlock
a80bdd6923 Filtered Report IP shim - POSTs only WJ corp ranges to GE webhook
GE's Intune-deployed ReportIPAddresses_v2.ps1 filters Get-NetIPAddress
with $_.StartsWith("10.") - too broad for WJ where PXE LAN is 10.9.x.
Can't modify the GE script (signed). Workaround: run our own POST to
the same Tines webhook with a tight subnet filter, beating GE's
script to the punch.

New Invoke-FilteredReportIP.ps1 (lib/):
 - Walks Get-NetIPAddress -AddressFamily IPv4
 - Filters strictly to 10.134.48.0/23 OR 10.48.249.0/26 (WJ corp)
 - POSTs to https://tines.apps.geaerospace.com/webhook/.../... with
   {host, fqdn, IP, force_update} body matching GE's payload shape
 - Local dedup via C:\ProgramData\GEA\FilteredReportIP\last-ip.txt
 - 6 retries with 10s backoff on transient HTTP error
 - Logs to C:\Logs\FilteredReportIP.log

Monitor-IntuneProgress main loop calls it each tick until it
succeeds once. After success, $filteredReportIpSucceeded flag short-
circuits further attempts.

If WJ later moves to a different VLAN, edit the $allowedRanges array
in Invoke-FilteredReportIP.ps1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:16:39 -04:00
cproudlock
4dd300e7ab Stage GE MachineAuth profiles at imaging time (AESFMA auto-join)
Hypothesis test for WJ Phase 2 stuck issue. GE Report IP script
filters Get-NetIPAddress on StartsWith("10.") - WJ bays don't see
ANY 10.x because:
 - PXE LAN is 10.9.100.x (we'd disable wired anyway to avoid leak)
 - Internet WiFi at site is 172.16.x (filter rejects)
 - AESFMA corp WiFi (10.x) requires machine cert that Intune SCEP
   provisions a few minutes AFTER PPKG enrollment

Result: Report IP webhook gets nothing -> GE backend never sees the
bay -> bay never enters the dynamic group that SFLD policy is
assigned to. Other GE sites work because their corp WiFi/wired is
on a real 10.x corp network and the script always finds a 10.x to
report.

Drop the MA package (8021x.xml + AESFMA.xml + multi-NIC bat) onto
each bay early in Run-ShopfloorSetup, run MA4NetworkConfigv2.bat to
import both profiles to every physical wired + wireless adapter.
AESFMA.xml patched to connectionMode=auto (default V02 was manual)
so WLAN service auto-joins as soon as the SCEP cert lands. Bay
gets a real 10.x corp address. Report IP webhook fires cleanly.

Profile XMLs (8021x.xml, AESFMA.xml, BLUESSO.xml, WiFi-Profile.xml,
*.wlanprofile, *.lanprofile) added to .gitignore - they contain
GE-internal SSID + trusted-root thumbprint and are staged on the
PXE enrollment share at /srv/samba/enrollment/MachineAuth/ instead
of git.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:13:11 -04:00
cproudlock
86c7ffccd5 Monitor: bump post-reEnable settle 1s->5s + retry idx=7 push
Field test: bay imaged end-to-end, Monitor saw Report IP log,
captured DeviceId (Phase 1 went COMPLETE on screen + QR rendered
from dsregcmd), but the idx=7 push to the PXE dashboard never
landed before the Intune-triggered LAPS-prompt reboot. Root cause:
Enable-NetAdapter + 1s sleep doesn't give Windows time to renew
DHCP + populate routes before Send-PxeStatus POSTs to PXE webapp.
Push silently caught (Send-PxeStatus has try/catch), next tick was
30s away, LAPS reboot fired in between.

Two changes:
1. Sleep bumped 1s -> 5s after Enable-NetAdapter so wired path is
   actually carrying traffic before we POST.
2. When the tick that did the re-enable is also the push tick,
   retry Send-PxeStatus up to 6 times with 2s spacing (~12s total)
   instead of one-shot-then-give-up. Surfaces the warning to the
   transcript if all attempts fail so we can diagnose next time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:33:11 -04:00
cproudlock
59dbd64e37 Fix Report IP glob (.LOG not .txt) + add device-id copy button
Field bay surfaced two bugs in one diag dump (mdm-diag-F907T5X3 -
6PPSF24):

1. GE Proactive Remediation Report IP actually writes
   GE_Report_IP_Address_2_5.LOG (uppercase .LOG), not the .txt I
   assumed. Globs in two places had .txt filter -> never matched ->
   Phase 1 stuck IN PROGRESS forever even after the file landed and
   wired-NIC re-enable never fired. Drop extension from both globs
   in Monitor-IntuneProgress.ps1 (id=7 push gate + p1Done check).

2. The "GE Re-enable Wired NICs" SYSTEM task registered by
   Run-ShopfloorSetup was polling Autologon_Remediation.log for
   "Autologon set for ShopFloor" - a lockdown-time signal. Re-enable
   needs to fire at Report-IP time (well before lockdown) so that
   Monitor can push idx=7 with the QR before the Intune-triggered
   LAPS-prompt reboot. Repoint the SYSTEM task's poll to
   C:\Logs\GE_Report_IP_Address* (any extension).

Plus minor UX: copy button next to the Intune device ID on
/imaging dashboard so techs can grab the GUID without having to
double-click-select the <code>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:27:26 -04:00
cproudlock
7e1ea03f02 Decouple wired-NIC re-enable from DeviceId capture
Previous logic bundled re-enable into the idx=7 DeviceId-push gate.
If DeviceId hadn't been captured yet (AAD join lag, dsregcmd parse
miss), re-enable never fired even though the Report IP log was
already sitting at C:\Logs\GE_Report_IP_Address*.txt and the NIC
state file was on disk.

Split into two independent checks per tick:
 1. Re-enable: triggered by (Report IP log) AND (NIC state file) only.
 2. idx=7 push: still gated on (DeviceId) AND (Report IP log).

Fixes case observed in field: file exists in C:\Logs but wired NICs
stayed off and the bay couldn't reach the PXE dashboard for idx=7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:15:04 -04:00
cproudlock
2bfb2522c7 Phase 1 stays "in progress" until Report IP log appears
Monitor on-screen Phase 1 row used to show COMPLETE the instant AAD
join + Intune enroll + EmTask + baseline policies (>=15 subkeys) all
hit. That's misleading: the bay isn't actually registration-clean
until GE's Proactive Remediation Report IP script has fired on
WiFi-only and dropped C:\Logs\GE_Report_IP_Address*.txt. Without
that log, the SFLD ConfigurationProfile assignment filter still sees
a leaked 10.9.100.x IP and Phase 2 won't unblock.

Add reportIpDone to both the p1Done gate and the Get-PhaseStatus
input list so the on-screen Intune Registration row stays IN PROGRESS
until the file lands. Matches the dashboard side: idx=7 push is
already gated on the same file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:11:36 -04:00
cproudlock
d87be4c40d Move wired-disable from stage 2 to post-PPKG-return
Push stages 2-6 to dashboard before going dark. Wired stays up through
PPKG enrollment so all standard imaging progress lights up the dashboard
card. Disable fires AFTER idx=6 push (handoff to Monitor PostPpkg) +
BEFORE PostPpkg settle's Schedule #3 hammer + BEFORE the PPKG-driven
reboot + BEFORE IME starts firing Report IP. Result: dashboard shows
2-6 cleanly, dark from 6 to 7, then catches up at 7 with QR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:28:58 -04:00
cproudlock
b8328171eb Kill wired NICs post-stage-2 until Report IP log appears
Recurring Phase 2 "Device Configuration" stuck: GE Intune Proactive
Remediation "Report IP" script enumerates Get-NetIPAddress and POSTs
all IPs to a GE webhook. Bays cabled to air-gapped PXE LAN have
10.9.100.x leak into that report. GE backend tags bays "not on corp
net" -> dynamic-group assignment-filter at GE excludes them from the
SFLD ConfigurationProfile (Function + SasToken OMA-URI) ->
HKLM:\SOFTWARE\GE\SFLD\DSC never populates -> Monitor Phase 2 gate
never closes. Confirmed via mdm-diag-F907T5X3 dump: every Microsoft
policy delivered fine, zero SFLD/GE-namespace OMA-URI present.

Fix flow:
1. Run-ShopfloorSetup line 43: disable every Up wired NIC right after
   stage 2 push. NIC names persisted to
   C:\Enrollment\disabled-wired-nics.txt for later re-enable.
2. Stages 3-6 status pushes fail silently while wired is down (PXE
   server lives on the air-gapped 10.9.100.0/24 LAN, unreachable from
   WiFi). Dashboard goes dark in that window.
3. PPKG installs, immediate reboot, AAD/Intune enroll over WiFi only.
4. IME boots, Report IP script fires with corp-WiFi IP only, writes
   C:\Logs\GE_Report_IP_Address*.txt. Webhook records clean IP. GE
   dynamic group eligibility flips. SFLD policy delivers next sync.
5. Monitor-IntuneProgress detects the log file, re-enables every NIC
   in the persisted list, sleeps 1s for link, then pushes idx=7 with
   DeviceId so the dashboard card flips to QR before the Intune-
   triggered LAPS-prompt reboot lands.

Phase 1 remains "in progress" on the dashboard until Report IP fires
- correct, the bay isn't actually registration-clean until then.

Files:
- Disable-WiredNics.ps1 (new) - persists names + disables
- Run-ShopfloorSetup.ps1 - call after stage 2 Report-Stage
- Monitor-IntuneProgress.ps1 - gate idx=7 push + re-enable

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:22:41 -04:00
cproudlock
b5a067bd48 Cut Post-PPKG settle from 180s to 60s
Empirical evidence: MDM baseline policy push lands well within 60s
after PPKG triggers immediate reboot path on bays where assignment
filter matches. Bays where it doesn't deliver in 60s aren't going
to deliver in 180s either - they're blocked on an assignment-filter
or dynamic-group lag (sometimes 30+ min in GCC-High), not on the
raw sync window. Trimming 120s of dead wait off every imaging cycle.

Aggressive 30s Schedule #3 hammer + early-exit on baseline (>=5
subkeys) preserved - those still help bays that DO deliver fast.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:10:50 -04:00
cproudlock
44bbd23e4d Monitor-IntuneProgress: auto-fire idx=8 on lockdown detection
Local-user shopfloor fleet (ShopFloor is a LOCAL account) means
AzureAdPrt stays NO, user-scoped Intune policies never deliver, and
the natural completion gate (baseline policies >= 5 + DSCInstall
success + Phase 4 wrappers) never closes. Dashboard sessions stuck
at 7/8 / 87.5% forever even though the bay is functionally complete.

Real-world definition of "done" for these bays is lockdown applied.
Add a per-tick check in Get-Phase1 (alongside the DeviceId push) that
detects either:
  * Winlogon DefaultUserName -like 'ShopFloor*' AND AutoAdminLogon=1
  * OR C:\Enrollment\force-lockdown-applied.txt marker file

When either is present and not yet pushed, fire Send-PxeStatus
StageIndex=8 / StageTotal=8 / Status='succeeded' with the captured
IntuneDeviceId for the dashboard QR. One-shot per session via the
LockdownCompletePushed cache flag.

No need for the operator to run mark-complete.bat anymore -
Monitor's main loop will fire idx=8 within ~5s of force-lockdown
or autologon flip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 15:47:23 -04:00
cproudlock
a8d38f6117 imaging: load Send-PxeStatus at script scope + bump QR size to 160px
Monitor-IntuneProgress.ps1: the previous Ensure-SendPxeStatus function
ran '. $lib' from inside the function body. PowerShell's dot-source-
inside-function semantics put the imported Send-PxeStatus into the
function's LOCAL scope, not the script scope. By the time Get-Phase1
called Get-Command Send-PxeStatus, the function had already returned
and Send-PxeStatus was out of scope - silently never invoked, no log
entry at all (success or failure). Diagnostic confirmed: bay had
DeviceId in dsregcmd, manual Send-PxeStatus from operator prompt
fired idx=7 cleanly with QR rendered, but Monitor's automatic call
never showed up in C:\Logs\send-pxe-status.log.

Fix: dot-source at script top-level (outside any function). Then
Send-PxeStatus is in script scope where every function in the file
can call it. Keep Ensure-SendPxeStatus as a no-op stub for any caller
still invoking it.

imaging.html: bump QR data-qr-size from 56 to 160 px. A 36-char UUID
at ECC M needs ~29x29 modules; at 56px each module was ~1.5px which
is too tight for a phone camera to lock onto from typical distance.
160 px gives ~5 px/module which scans cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 15:41:52 -04:00
cproudlock
320b241942 winpe-status-push: brace var names before colon (parser bug)
PowerShell parses $var: as scope-namespaced syntax (e.g. $env:NAME,
$global:foo). The line

    Log "server=$PxeServer:$Port  pctype=$PCType"

errored at line 26 col 13 - parser interpreted $PxeServer: as a scope
prefix and bailed. Fix: use ${PxeServer}:${Port} so the colon is
literal. The $uri line below already had the right form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 14:45:37 -04:00
cproudlock
2e8cf4b5be Monitor-IntuneProgress: fix DeviceId capture gate
DeviceId capture was nested inside the -not AzureAdJoined gate. Once
AAD joined flipped true the block stopped running, but DeviceId only
appears in dsregcmd output AFTER AzureAdJoined is set, so the capture
never fired and Send-PxeStatus -IntuneDeviceId never pushed. Webapp
session JSON missing intune_device_id field; /imaging card couldn't
render the QR even though the bay-side Build-QRCodeText showed the
QR correctly (it calls dsregcmd each render with no gate).

Fix: change the gate condition so the dsregcmd call keeps running
while EITHER AzureAdJoined OR DeviceId is still missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 14:35:18 -04:00
cproudlock
85278e01bf Run-ShopfloorSetup: fix Send-PxeStatus dot-source path
Run-ShopfloorSetup.ps1 is copied by startnet.cmd to W:\Enrollment\
(root of Enrollment, NOT inside shopfloor-setup/). So $PSScriptRoot is
W:\Enrollment\. The dot-source path was Join-Path $PSScriptRoot
'Shopfloor\lib\Send-PxeStatus.ps1' which resolves to
W:\Enrollment\Shopfloor\lib\Send-PxeStatus.ps1 - that path does not
exist.

The actual file lands at W:\Enrollment\shopfloor-setup\Shopfloor\lib\
Send-PxeStatus.ps1 (xcopied by startnet from the Shopfloor share dir
into the shopfloor-setup\ subdir). Test-Path returned false, dot-source
silently skipped, Send-PxeStatus was never defined, every Report-Stage
call no-op'd, no log file was written, no POSTs reached the dashboard.

Symptom: bay reaches Windows desktop + runs Run-ShopfloorSetup but
never appears on /imaging dashboard. C:\Logs\send-pxe-status.log does
not exist on the bay.

Fix: add the missing 'shopfloor-setup\' segment so the path resolves
to the actual file location.

09-Setup-*.ps1 use a different relative path ('..\Shopfloor\lib\...')
from inside the per-type dir and were unaffected. Monitor-IntuneProgress
sits in Shopfloor\lib already and uses a sibling lookup - also fine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 14:24:36 -04:00
cproudlock
a57ed5fd96 winpe: externalize WinPE-phase status push to scripts/winpe-status-push.ps1
The inline one-liner in startnet.cmd called Get-NetAdapter, which is
not available in WinPE's stripped PowerShell (no NetTCPIP module).
Errors silently swallowed by the surrounding try/catch - POST never
fired, dashboard never showed bays during the WIM-apply phase.

Externalize to a standalone .ps1 on the enrollment share:

  * Uses wmic (always present in WinPE 10+) for both serial AND mac
    instead of Get-CimInstance / Get-NetAdapter.
  * Logs every step to X:\Windows\Temp\winpe-status-push.log so a
    future "POST didn't fire" debug is one file read away.
  * startnet.cmd now just runs powershell -File Y:\scripts\winpe-status-
    push.ps1. Future edits to the push logic do NOT require a boot.wim
    rebuild; just edit the .ps1 on the share.

Mirror the existing pattern for run-enrollment.ps1 / wait-for-internet.ps1
/ migrate-to-wifi.ps1 (all already at /srv/samba/enrollment/scripts/).
Add the new file to the playbook's enrollment-scripts copy loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 14:05:50 -04:00