DeviceId may not be in dsregcmd output the moment Monitor starts after
PPKG reboot - takes a few minutes for AAD-join to settle. Default 30s
PollSecs leaves wide gaps where Monitor isn't checking. Sleep 5s
instead while DeviceIdReported is still false. Once captured + idx=7
push lands, falls back to PollSecs (30s) for the rest of the loop.
Worst case for QR-on-dashboard latency: ~5 seconds after dsregcmd
starts returning a DeviceId.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User constraint: GE-issued LAPS-prompt reboot lands ~1 minute after
Report IP posts its log. Need the QR on the PXE dashboard BEFORE
that reboot or the operator has no way to look up the device for
LAPS retrieval.
Previously idx=7 was gated on Phase 1 essentials (AAD + Intune
enrolled + EmTask + baseline policies >=5). Those flips happen
later than DeviceId capture (dsregcmd shows DeviceId the instant
AAD-join completes during PPKG). Dropping the gate so idx=7
fires the moment the cache has a DeviceId. Phase 1 row on the
on-bay Monitor display still has its own AESFMA-required gate
for operational completeness; only the dashboard push is moved
earlier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes for the WiFi handoff timing:
1. WiFi swap (delete INTERNETACCESS + connect AESFMA) was firing on
Phase 1 essentials being green (AAD + Intune + EmTask + baseline
policies >=5). That signal flips ~minutes BEFORE the Intune SCEP
machine cert actually lands in LocalMachine\My. Without the cert,
AESFMA EAP-TLS auth fails and the bay has no path at all (we just
deleted INTERNETACCESS). Stuck.
New gate: walk Cert:\LocalMachine\My for any cert with Client
Authentication EKU (1.3.6.1.5.5.7.3.2). When that's present, SCEP
has delivered, AESFMA EAP-TLS will succeed. Swap then fires safely.
2. Phase 1 row on the on-bay Monitor display now ALSO requires
AESFMA to be actively connected (parsed from netsh wlan show
interfaces: SSID=AESFMA + State=connected). Phase 1 stays IN
PROGRESS until the bay is operationally on corp WLAN, not just
data-side enrolled. Matches user request "not complete phase 1
until AESFMA is ready".
idx=7 dashboard push still fires on the original Phase 1 essentials
gate so the QR appears as soon as Intune registers the device,
independent of AESFMA join timing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GE Report IP filters Get-NetIPAddress on StartsWith("10.") and PXE LAN
addresses are now 172.16.9.x which the filter skips naturally. The
disable-then-re-enable workaround was only needed when PXE LAN was
10.9.100.x and bays leaked that IP to the GE webhook. With the renumber
that whole flow is dead weight.
Removed:
- playbook/shopfloor-setup/Shopfloor/lib/Disable-WiredNics.ps1 (file)
- Run-ShopfloorSetup: Disable-WiredNics call after PPKG returns
- Run-ShopfloorSetup: "GE Re-enable Wired NICs" SYSTEM task registration
- Monitor-IntuneProgress: reportIpLog-gated wired re-enable + idx=7 retry
- Monitor-IntuneProgress: reportIpDone gate on Phase 1 done check
Side benefit: stages 2-6 dashboard pushes no longer go dark mid-flow
(used to die between idx=6 and idx=7 when wired was off). Phase 1 row
on the Monitor screen now flips COMPLETE on the natural AAD + Intune
+ EmTask + baseline-policies condition instead of waiting on the
Report IP log file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-site bay-stuck issue at WJ: GE Intune Report IP script filters
Get-NetIPAddress on StartsWith("10.") and posts everything matching
to the GE Tines webhook. Bays at WJ get the PXE LAN 10.9.100.x IP
captured and reported -> GE backend tags bays as on a non-corp 10.x
subnet -> dynamic group eligibility for SFLD policy never matches.
Other GE sites work because their PXE LANs aren't on 10.x at all.
Renumber PXE LAN to RFC1918 172.16.9.0/24 so the GE filter naturally
skips wired PXE addresses without any disable-NIC dance.
Server-side already in flight (netplan dual-bound, dnsmasq scope +
boot URL repointed, blancco preferences + grub.cfg + iPXE GetPxeScript
all sed'd to 172.16.9.1). This commit is the playbook / scripts /
docs side: 109 hits across 35 files sed'd in one shot.
After this lands + boot.wim is rebuilt + bays renumber off DHCP,
the 10.9.100.1 binding will be dropped from netplan as the final
cleanup step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pair with the INTERNETACCESS -> AESFMA WiFi-swap commit. Once
AAD-joined + IntuneEnrolled + EmTaskExists + baseline policies
all true AND DeviceId is captured, push idx=7 to PXE dashboard
with the DeviceId immediately - don't wait for the Report IP log
(which depends on AESFMA join + script timing).
Side note: the legacy wired-NIC re-enable + reportIpLog-gated
idx=7 push block earlier in Get-Phase1 still exists. Both paths
guard on $script:cache.DeviceIdReported so only one fires, but
that block is dead-ish under the new WiFi-swap flow (no wired
disable -> no NIC state file -> re-enable block no-ops; Report
IP log gate may still fire idx=7 if Phase 1 essentials haven't
all flipped yet but Report IP did). Worth cleaning up next pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When Intune registration lands (AAD-joined + IntuneEnrolled + EnterpriseMgmt
task present + baseline policies >=5), the bay is presumed to have its
SCEP-provisioned machine cert in LocalMachine\My. At that point the
INTERNETACCESS profile (172.16.x guest/internet WiFi) is no longer
useful - it just keeps the bay on a non-corp range so Report IP can't
find a 10.x to POST and the SFLD assignment filter never matches.
Action: in Get-Phase1, once all four registration signals are green,
fire 'netsh wlan delete profile name=INTERNETACCESS' then immediately
'netsh wlan connect name=AESFMA ssid=AESFMA'. Bay drops onto corp WLAN
with EAP-TLS, picks up a 10.x lease, Report IP fires cleanly. One-shot
per Monitor lifetime via $script:cache.InternetAccessDeleted flag.
This is the alternative to pre-staging the AESFMA profile during
imaging (which was reverted). AESFMA profile is assumed to exist
already because Intune's WiFi config profile delivers it during the
same enrollment that just completed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User call: don't install AESFMA profiles during imaging preinstall.
Removed the MA package copy + MA4NetworkConfigv2.bat invocation
from Run-ShopfloorSetup line 43 area. .gitignore additions for
profile XML patterns are kept - those are harmless safety net.
PXE share's /srv/samba/enrollment/MachineAuth/ staging directory
is left in place (not deleted) - no consumer references it after
this revert.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GE's Intune-deployed ReportIPAddresses_v2.ps1 filters Get-NetIPAddress
with $_.StartsWith("10.") - too broad for WJ where PXE LAN is 10.9.x.
Can't modify the GE script (signed). Workaround: run our own POST to
the same Tines webhook with a tight subnet filter, beating GE's
script to the punch.
New Invoke-FilteredReportIP.ps1 (lib/):
- Walks Get-NetIPAddress -AddressFamily IPv4
- Filters strictly to 10.134.48.0/23 OR 10.48.249.0/26 (WJ corp)
- POSTs to https://tines.apps.geaerospace.com/webhook/.../... with
{host, fqdn, IP, force_update} body matching GE's payload shape
- Local dedup via C:\ProgramData\GEA\FilteredReportIP\last-ip.txt
- 6 retries with 10s backoff on transient HTTP error
- Logs to C:\Logs\FilteredReportIP.log
Monitor-IntuneProgress main loop calls it each tick until it
succeeds once. After success, $filteredReportIpSucceeded flag short-
circuits further attempts.
If WJ later moves to a different VLAN, edit the $allowedRanges array
in Invoke-FilteredReportIP.ps1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hypothesis test for WJ Phase 2 stuck issue. GE Report IP script
filters Get-NetIPAddress on StartsWith("10.") - WJ bays don't see
ANY 10.x because:
- PXE LAN is 10.9.100.x (we'd disable wired anyway to avoid leak)
- Internet WiFi at site is 172.16.x (filter rejects)
- AESFMA corp WiFi (10.x) requires machine cert that Intune SCEP
provisions a few minutes AFTER PPKG enrollment
Result: Report IP webhook gets nothing -> GE backend never sees the
bay -> bay never enters the dynamic group that SFLD policy is
assigned to. Other GE sites work because their corp WiFi/wired is
on a real 10.x corp network and the script always finds a 10.x to
report.
Drop the MA package (8021x.xml + AESFMA.xml + multi-NIC bat) onto
each bay early in Run-ShopfloorSetup, run MA4NetworkConfigv2.bat to
import both profiles to every physical wired + wireless adapter.
AESFMA.xml patched to connectionMode=auto (default V02 was manual)
so WLAN service auto-joins as soon as the SCEP cert lands. Bay
gets a real 10.x corp address. Report IP webhook fires cleanly.
Profile XMLs (8021x.xml, AESFMA.xml, BLUESSO.xml, WiFi-Profile.xml,
*.wlanprofile, *.lanprofile) added to .gitignore - they contain
GE-internal SSID + trusted-root thumbprint and are staged on the
PXE enrollment share at /srv/samba/enrollment/MachineAuth/ instead
of git.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Field test: bay imaged end-to-end, Monitor saw Report IP log,
captured DeviceId (Phase 1 went COMPLETE on screen + QR rendered
from dsregcmd), but the idx=7 push to the PXE dashboard never
landed before the Intune-triggered LAPS-prompt reboot. Root cause:
Enable-NetAdapter + 1s sleep doesn't give Windows time to renew
DHCP + populate routes before Send-PxeStatus POSTs to PXE webapp.
Push silently caught (Send-PxeStatus has try/catch), next tick was
30s away, LAPS reboot fired in between.
Two changes:
1. Sleep bumped 1s -> 5s after Enable-NetAdapter so wired path is
actually carrying traffic before we POST.
2. When the tick that did the re-enable is also the push tick,
retry Send-PxeStatus up to 6 times with 2s spacing (~12s total)
instead of one-shot-then-give-up. Surfaces the warning to the
transcript if all attempts fail so we can diagnose next time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
navigator.clipboard.writeText is gated on isSecureContext - HTTPS
or localhost only. PXE dashboard is served over plain HTTP
(10.9.100.1:9009) so the API was undefined and the chain threw
before .catch fired - user saw nothing. Wrap clipboard write in
copyText() that prefers the modern API and falls back to the
classic invisible-textarea + document.execCommand('copy') path
which works on HTTP. Visual flash logic moved into flashCopied()
for reuse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Click effect: button flashes green with "copied!" text and 1.15x
scale pulse, reverts after 1.2s. Failure case (clipboard API blocked
or HTTP context) shows red "failed" for 1.5s. Handler moved out of
inline onclick into a single delegated click listener at the doc
level so future copy buttons just need the .copy-btn class +
data-copy-text attribute.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Field bay surfaced two bugs in one diag dump (mdm-diag-F907T5X3 -
6PPSF24):
1. GE Proactive Remediation Report IP actually writes
GE_Report_IP_Address_2_5.LOG (uppercase .LOG), not the .txt I
assumed. Globs in two places had .txt filter -> never matched ->
Phase 1 stuck IN PROGRESS forever even after the file landed and
wired-NIC re-enable never fired. Drop extension from both globs
in Monitor-IntuneProgress.ps1 (id=7 push gate + p1Done check).
2. The "GE Re-enable Wired NICs" SYSTEM task registered by
Run-ShopfloorSetup was polling Autologon_Remediation.log for
"Autologon set for ShopFloor" - a lockdown-time signal. Re-enable
needs to fire at Report-IP time (well before lockdown) so that
Monitor can push idx=7 with the QR before the Intune-triggered
LAPS-prompt reboot. Repoint the SYSTEM task's poll to
C:\Logs\GE_Report_IP_Address* (any extension).
Plus minor UX: copy button next to the Intune device ID on
/imaging dashboard so techs can grab the GUID without having to
double-click-select the <code>.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous logic bundled re-enable into the idx=7 DeviceId-push gate.
If DeviceId hadn't been captured yet (AAD join lag, dsregcmd parse
miss), re-enable never fired even though the Report IP log was
already sitting at C:\Logs\GE_Report_IP_Address*.txt and the NIC
state file was on disk.
Split into two independent checks per tick:
1. Re-enable: triggered by (Report IP log) AND (NIC state file) only.
2. idx=7 push: still gated on (DeviceId) AND (Report IP log).
Fixes case observed in field: file exists in C:\Logs but wired NICs
stayed off and the bay couldn't reach the PXE dashboard for idx=7.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Monitor on-screen Phase 1 row used to show COMPLETE the instant AAD
join + Intune enroll + EmTask + baseline policies (>=15 subkeys) all
hit. That's misleading: the bay isn't actually registration-clean
until GE's Proactive Remediation Report IP script has fired on
WiFi-only and dropped C:\Logs\GE_Report_IP_Address*.txt. Without
that log, the SFLD ConfigurationProfile assignment filter still sees
a leaked 10.9.100.x IP and Phase 2 won't unblock.
Add reportIpDone to both the p1Done gate and the Get-PhaseStatus
input list so the on-screen Intune Registration row stays IN PROGRESS
until the file lands. Matches the dashboard side: idx=7 push is
already gated on the same file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Push stages 2-6 to dashboard before going dark. Wired stays up through
PPKG enrollment so all standard imaging progress lights up the dashboard
card. Disable fires AFTER idx=6 push (handoff to Monitor PostPpkg) +
BEFORE PostPpkg settle's Schedule #3 hammer + BEFORE the PPKG-driven
reboot + BEFORE IME starts firing Report IP. Result: dashboard shows
2-6 cleanly, dark from 6 to 7, then catches up at 7 with QR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Recurring Phase 2 "Device Configuration" stuck: GE Intune Proactive
Remediation "Report IP" script enumerates Get-NetIPAddress and POSTs
all IPs to a GE webhook. Bays cabled to air-gapped PXE LAN have
10.9.100.x leak into that report. GE backend tags bays "not on corp
net" -> dynamic-group assignment-filter at GE excludes them from the
SFLD ConfigurationProfile (Function + SasToken OMA-URI) ->
HKLM:\SOFTWARE\GE\SFLD\DSC never populates -> Monitor Phase 2 gate
never closes. Confirmed via mdm-diag-F907T5X3 dump: every Microsoft
policy delivered fine, zero SFLD/GE-namespace OMA-URI present.
Fix flow:
1. Run-ShopfloorSetup line 43: disable every Up wired NIC right after
stage 2 push. NIC names persisted to
C:\Enrollment\disabled-wired-nics.txt for later re-enable.
2. Stages 3-6 status pushes fail silently while wired is down (PXE
server lives on the air-gapped 10.9.100.0/24 LAN, unreachable from
WiFi). Dashboard goes dark in that window.
3. PPKG installs, immediate reboot, AAD/Intune enroll over WiFi only.
4. IME boots, Report IP script fires with corp-WiFi IP only, writes
C:\Logs\GE_Report_IP_Address*.txt. Webhook records clean IP. GE
dynamic group eligibility flips. SFLD policy delivers next sync.
5. Monitor-IntuneProgress detects the log file, re-enables every NIC
in the persisted list, sleeps 1s for link, then pushes idx=7 with
DeviceId so the dashboard card flips to QR before the Intune-
triggered LAPS-prompt reboot lands.
Phase 1 remains "in progress" on the dashboard until Report IP fires
- correct, the bay isn't actually registration-clean until then.
Files:
- Disable-WiredNics.ps1 (new) - persists names + disables
- Run-ShopfloorSetup.ps1 - call after stage 2 Report-Stage
- Monitor-IntuneProgress.ps1 - gate idx=7 push + re-enable
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical evidence: MDM baseline policy push lands well within 60s
after PPKG triggers immediate reboot path on bays where assignment
filter matches. Bays where it doesn't deliver in 60s aren't going
to deliver in 180s either - they're blocked on an assignment-filter
or dynamic-group lag (sometimes 30+ min in GCC-High), not on the
raw sync window. Trimming 120s of dead wait off every imaging cycle.
Aggressive 30s Schedule #3 hammer + early-exit on baseline (>=5
subkeys) preserved - those still help bays that DO deliver fast.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local-user shopfloor fleet (ShopFloor is a LOCAL account) means
AzureAdPrt stays NO, user-scoped Intune policies never deliver, and
the natural completion gate (baseline policies >= 5 + DSCInstall
success + Phase 4 wrappers) never closes. Dashboard sessions stuck
at 7/8 / 87.5% forever even though the bay is functionally complete.
Real-world definition of "done" for these bays is lockdown applied.
Add a per-tick check in Get-Phase1 (alongside the DeviceId push) that
detects either:
* Winlogon DefaultUserName -like 'ShopFloor*' AND AutoAdminLogon=1
* OR C:\Enrollment\force-lockdown-applied.txt marker file
When either is present and not yet pushed, fire Send-PxeStatus
StageIndex=8 / StageTotal=8 / Status='succeeded' with the captured
IntuneDeviceId for the dashboard QR. One-shot per session via the
LockdownCompletePushed cache flag.
No need for the operator to run mark-complete.bat anymore -
Monitor's main loop will fire idx=8 within ~5s of force-lockdown
or autologon flip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Monitor-IntuneProgress.ps1: the previous Ensure-SendPxeStatus function
ran '. $lib' from inside the function body. PowerShell's dot-source-
inside-function semantics put the imported Send-PxeStatus into the
function's LOCAL scope, not the script scope. By the time Get-Phase1
called Get-Command Send-PxeStatus, the function had already returned
and Send-PxeStatus was out of scope - silently never invoked, no log
entry at all (success or failure). Diagnostic confirmed: bay had
DeviceId in dsregcmd, manual Send-PxeStatus from operator prompt
fired idx=7 cleanly with QR rendered, but Monitor's automatic call
never showed up in C:\Logs\send-pxe-status.log.
Fix: dot-source at script top-level (outside any function). Then
Send-PxeStatus is in script scope where every function in the file
can call it. Keep Ensure-SendPxeStatus as a no-op stub for any caller
still invoking it.
imaging.html: bump QR data-qr-size from 56 to 160 px. A 36-char UUID
at ECC M needs ~29x29 modules; at 56px each module was ~1.5px which
is too tight for a phone camera to lock onto from typical distance.
160 px gives ~5 px/module which scans cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PowerShell parses $var: as scope-namespaced syntax (e.g. $env:NAME,
$global:foo). The line
Log "server=$PxeServer:$Port pctype=$PCType"
errored at line 26 col 13 - parser interpreted $PxeServer: as a scope
prefix and bailed. Fix: use ${PxeServer}:${Port} so the colon is
literal. The $uri line below already had the right form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeviceId capture was nested inside the -not AzureAdJoined gate. Once
AAD joined flipped true the block stopped running, but DeviceId only
appears in dsregcmd output AFTER AzureAdJoined is set, so the capture
never fired and Send-PxeStatus -IntuneDeviceId never pushed. Webapp
session JSON missing intune_device_id field; /imaging card couldn't
render the QR even though the bay-side Build-QRCodeText showed the
QR correctly (it calls dsregcmd each render with no gate).
Fix: change the gate condition so the dsregcmd call keeps running
while EITHER AzureAdJoined OR DeviceId is still missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run-ShopfloorSetup.ps1 is copied by startnet.cmd to W:\Enrollment\
(root of Enrollment, NOT inside shopfloor-setup/). So $PSScriptRoot is
W:\Enrollment\. The dot-source path was Join-Path $PSScriptRoot
'Shopfloor\lib\Send-PxeStatus.ps1' which resolves to
W:\Enrollment\Shopfloor\lib\Send-PxeStatus.ps1 - that path does not
exist.
The actual file lands at W:\Enrollment\shopfloor-setup\Shopfloor\lib\
Send-PxeStatus.ps1 (xcopied by startnet from the Shopfloor share dir
into the shopfloor-setup\ subdir). Test-Path returned false, dot-source
silently skipped, Send-PxeStatus was never defined, every Report-Stage
call no-op'd, no log file was written, no POSTs reached the dashboard.
Symptom: bay reaches Windows desktop + runs Run-ShopfloorSetup but
never appears on /imaging dashboard. C:\Logs\send-pxe-status.log does
not exist on the bay.
Fix: add the missing 'shopfloor-setup\' segment so the path resolves
to the actual file location.
09-Setup-*.ps1 use a different relative path ('..\Shopfloor\lib\...')
from inside the per-type dir and were unaffected. Monitor-IntuneProgress
sits in Shopfloor\lib already and uses a sibling lookup - also fine.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The inline one-liner in startnet.cmd called Get-NetAdapter, which is
not available in WinPE's stripped PowerShell (no NetTCPIP module).
Errors silently swallowed by the surrounding try/catch - POST never
fired, dashboard never showed bays during the WIM-apply phase.
Externalize to a standalone .ps1 on the enrollment share:
* Uses wmic (always present in WinPE 10+) for both serial AND mac
instead of Get-CimInstance / Get-NetAdapter.
* Logs every step to X:\Windows\Temp\winpe-status-push.log so a
future "POST didn't fire" debug is one file read away.
* startnet.cmd now just runs powershell -File Y:\scripts\winpe-status-
push.ps1. Future edits to the push logic do NOT require a boot.wim
rebuild; just edit the .ps1 on the share.
Mirror the existing pattern for run-enrollment.ps1 / wait-for-internet.ps1
/ migrate-to-wifi.ps1 (all already at /srv/samba/enrollment/scripts/).
Add the new file to the playbook's enrollment-scripts copy loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related changes so the /imaging dashboard reaches 100% and so the
operator can see why POSTs are not arriving when a session stalls.
Monitor-IntuneProgress.ps1:
* After sync-complete.txt is written (DSC + lockdown done) fire a
final Send-PxeStatus -StageIndex 8 -StageTotal 8 -Status 'succeeded'
+ IntuneDeviceId. Previously the script exited without any final
status push, so even a perfect run capped at idx=7 / 87.5%. The
session now reaches 8/8 / 100% green when imaging actually finishes.
Send-PxeStatus.ps1:
* Log EVERY POST attempt (both success and failure) to C:\Logs\
send-pxe-status.log with idx, status, stage name, and either the
HTTP code on success or the exception message on failure. Was
previously silent-on-success, log-on-failure. Operator can now
correlate dashboard state to actual outbound activity:
OK idx=2/8 status=in_progress http=200 stage='Run-ShopfloorSetup: starting'
ERR idx=2/8 status=in_progress uri=http://10.9.100.1:9009/... err=Unable to connect
* Errors still swallowed - imaging never blocks on a failed status push.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing _ntlars-backups/*.reg files are used by the automated
imaging pipeline. They are REGEDIT5 UTF-16 with explicit WOW6432Node
in the key path. That works for automated reg-import during install
but the NTLARS Load... button in the NTLARS settings dialog rejects
them - NTLARS expects:
* REGEDIT4 ANSI / CRLF / no BOM
* Bare path: HKLM\SOFTWARE\GE Aircraft Engines\DNC\...
(no \WOW6432Node\ - NTLARS is 32-bit, Windows redirector handles
the mapping transparently when NTLARS reads/writes)
* No semicolon comment header
This commit generates the parallel 147-file tree at
playbook/shopfloor-setup/_ntlars-backups-manual/ derived from the
existing _ntlars-backups/ files. Content (FMSHostPrimary +
FMSHostSecondary edits from df443d5 + 802d85e) is preserved; only
format and path are transformed. Both trees coexist - automation
continues to pull from _ntlars-backups/, operators use _ntlars-
backups-manual/ when they need to Load... a bay's reg manually.
Also live at /srv/samba/enrollment/shopfloor-setup/_ntlars-backups-
manual/ on the PXE server (\\10.9.100.1\enrollment) for share-based
access by the operator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverts the 2026-04-24 no-op stub. Empirically the gateway-suppression
fix (dnsmasq dhcp-option=3/=6 empty) alone is NOT sufficient to keep
Windows from using the wired NIC for Intune Device Configuration / DSC
traffic. Even with no default route on wired AND the unattend's
InterfaceMetric trick (WiFi=10, Wired=100), the bay stalls on the DSC
phase until the wired cable is physically unplugged.
Restoring the prior body that disables non-WiFi NICs at first logon
post-PPKG. Gated on Get-NetAdapter for a Wi-Fi/Wireless/WLAN/802.11
adapter - towers without WiFi stay on ethernet (the only-NIC scenario
where disabling would hang first logon). Falls back to re-enabling
ethernet if login.microsoftonline.us:443 doesn't respond within 5 min.
Monitor-IntuneProgress.ps1 already has the symmetric re-enable
("Re-enable any wired NICs that Order 5 disabled") at the start of its
monitor loop, which kicks in after DSC creds land. Net effect: wired
disabled during the DSC fetch window, re-enabled by the time eDNC
autostart needs the local NIC.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VM-test-confirmed root cause for the imaging "Setup wizard prompt"
hang. The bundled MSI has CustomAction caDriverInstall_x64 that
invokes dpinst.exe at install time:
caDriverInstall_x64 type=3666 (exe, deferred) Source=dpinst.exe
Condition: (Not Installed) And VersionNT64 And (VersionNT>=601)
Sequence: 6505
The bundled dpinst.xml is minimal (<dpInst><legacyMode/></dpInst>) -
no <quietInstall/> directive - so dpinst pops its wizard and waits
for operator click-through. /qn on the parent MSI does NOT propagate
into the dpinst child process. pnputil pre-staging the INF + cert
pre-trust to TrustedPublisher does NOT prevent the CA from firing
(the CA runs unconditionally on first install regardless of
DriverStore presence).
Fix: msibuild patch the MSI's InstallExecuteSequence to set the
action's Condition column from
"(Not Installed) And VersionNT64 And (VersionNT>=601)"
to
"0"
which evaluates false on every install attempt - the action never
fires, dpinst never runs, no wizard pops.
The driver itself is now installed exclusively by:
1. our pnputil pre-stage in 09-Setup-Keyence.ps1 (already there),
2. the manifest's separate "KEYENCE VR Series USB Driver" INF entry.
End-to-end VM test: 36s, exit 0, VR-6000 DisplayVersion 4.3.7
detected, zero dpinst processes at finish.
MSI size 1751552 -> 1744896 bytes (msibuild table rewrite).
md5: c6edcfc6c6808617598bcb7a15072a30.
Backups of original MSI on live PXE server enrollment share + local
SFLD share mirror as .bak-<TS>-orig.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the stage indices reflected logical milestones but not the
order they fire in. Run-ShopfloorSetup posted idx=1 (start) and idx=4
(PPKG) - but 09-Setup-Keyence (inside per-type loop) ran BETWEEN them
and posted idx=5/6. The dashboard then "regressed" from 6 back to 4
when PPKG fired, making it look stuck at the per-type-complete card.
New numbering matches actual execution order:
1 - WinPE: PESetup / WIM apply (startnet.cmd)
2 - Run-ShopfloorSetup: starting (Run-ShopfloorSetup.ps1)
3 - 09-Setup-<Type>: starting (per-type)
4 - 09-Setup-<Type>: complete (per-type)
5 - Run-ShopfloorSetup: PPKG enrollment (Run-ShopfloorSetup.ps1)
6 - Run-ShopfloorSetup: handoff to Monitor (Run-ShopfloorSetup.ps1)
7 - Monitor-IntuneProgress: Intune Device ID captured
services/imaging_status.py rewind threshold reverts to stage_index <= 1
now that WinPE startnet posts idx=1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reset trigger previously fired only when a new POST landed at idx <= 1,
which meant a reimage didn't reset the dashboard card until
Run-ShopfloorSetup ran post-PPKG (~10-20 min in). With the WinPE-phase
status push from startnet.cmd in commit 4e018fe firing at idx=2, that
earlier signal needs to count as a new-run marker too.
Threshold of 2 makes startnet.cmd the canonical reset point: within
seconds of PXE menu choice on the bay, the dashboard card flips from
the previous run's high-idx state back to "WinPE: PESetup / WIM apply"
+ fresh started_at.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the filename, type, and size columns from the Blancco Reports
list - operators want bay-identification fields, not file metadata.
Filename moves to a row hover tooltip (title attribute) so it is still
recoverable for ad-hoc lookups.
Adds a Result column derived from each XML report's overall erasure
state:
* Successful -> green badge (all erasure entries report Successful)
* Failed -> red badge (any erasure entry reports a non-Successful state)
* other -> grey badge with the verbatim state
* blank/non-XML -> dash
The state roll-up lives in the blancco_reports route's per-file parse
loop next to the existing serial/model extraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
services/imaging_status.py - if a new POST arrives with stage_index <= 1
that is lower than the cached stage_index, OR the previous run already
finished (status=succeeded|failed), reset the session: clear log_tail,
mint a fresh started_at, drop the status field so the in_progress
default re-applies. Preserves serial + records the previous run's
last_updated under previous_run_at for audit. Without this, a reimage
on the same bay would leave a stale 6/8 "succeeded" card visible until
the new run progressed past that index.
playbook/startnet.cmd - one-line PowerShell POST after the PXE menu
choice + enrollment-share mount, before PESetup.exe waits to start.
Captures BIOS serial via wmic, MAC via Get-NetAdapter, and posts:
stage_index=2, current_stage="WinPE: PESetup / WIM apply".
Best-effort; try/catch swallows any network failure so a missing
webapp never blocks imaging. PXE clients will now appear on the
/imaging dashboard during WinPE phase instead of only post-PPKG.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the imaging-progress helper into the three PC-type setup scripts
that were either clean (CMM) or untracked (Common, Heattreat). Each
gains two calls per the pattern committed for Keyence in 9122b28:
* idx 5/8 - "09-Setup-<Type>: starting" right after the session start banner
* idx 6/8 - "09-Setup-<Type>: complete" just before the completion banner
Display, Genspect, and WaxAndTrace also got the same two-line additions
locally and on the live server, but those files have pre-existing WIP
edits intermixed so they aren't staged here. They'll travel along
when the operator commits their unrelated shopfloor work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds end-to-end progress tracking for PXE imaging sessions and surfaces
each Blancco report's BIOS serial in the report list.
webapp:
* services/imaging_status.py - JSON-per-serial state store under
IMAGING_DIR (default /var/log/pxe-imaging). Atomic write via
tempfile + rename. log_tail capped at 50 lines. Merges partial
updates so clients can post just the current_stage tick.
* config.py - new IMAGING_DIR env-overridable path.
* services/csrf.py - explicit exempt list for machine-to-machine
endpoints; /imaging/status is the first entry. Air-gapped LAN;
trust-by-network for client posts.
* app.py - four new routes:
GET /imaging dashboard (renders all sessions)
POST /imaging/status client status push (JSON body)
GET /imaging/<serial>.json raw session JSON for ad-hoc polling
POST /imaging/delete/<s> clear a session from the dashboard
Also parses each Blancco XML in the /reports list to surface
system.serial + system.model columns.
* templates/imaging.html - Bootstrap dashboard with per-session
cards (state badge, progress bar, stage idx/total, mac, elapsed,
log tail). meta http-equiv refresh=5 for auto-tick.
* templates/base.html - new "Imaging Progress" nav entry.
* templates/reports.html - Serial + Model columns added.
playbook:
* shopfloor-setup/Shopfloor/lib/Send-PxeStatus.ps1 - new helper.
Dot-source this then call Send-PxeStatus -Stage X -StageIndex N
-StageTotal M from any stage script. BIOS serial via CIM, MAC via
Get-NetAdapter, pctype + machinenumber from C:\Enrollment.
Failures are swallowed to a local log so a network blip doesn't
block imaging.
* shopfloor-setup/Run-ShopfloorSetup.ps1 - dot-sources helper +
posts at three coarse milestones (start, PPKG enrollment,
handoff to Monitor-IntuneProgress).
* shopfloor-setup/gea-shopfloor-keyence/09-Setup-Keyence.ps1 -
posts at session start + after Install-FromManifest with
succeeded/failed status derived from $rc. Other 09-Setup-*.ps1
scripts can follow the same pattern.
ID is BIOS serial (stable across WinPE -> Windows transition and
across reboots, unlike hostname which is random pre-PPKG). Operator
already knows the serial of the bay they imaged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VR-6000 Series Software.msi is an InstallShield MSI that references
Data1.cab in the same directory for its compressed payload. The cab was
never staged into the repo's keyence installers/ dir, so msiexec exited
1603 with "SECREPAIR: Failed to open the file ... Data1.cab" on every
imaging run (see Logs/Keyence/install.log on a failed bay for the
canonical signature). Only the 1.75 MB MSI was committed; the 560 MB
cab lives on the GE-Enforce SFLD share at
tsgwp00525\sfld$\v2\shared\dt\shopfloor\gea-shopfloor-keyence\apps\.
This commit doesn't add the cab itself (560 MB; same gitignore convention
as PrinterInstallerMap.exe and other large binaries). Instead it pins the
staging requirement in two places:
* .gitignore: explicit entry with the SFLD share path so a future
operator wiring up a fresh PXE server build knows where to source it.
* keyence-manifest.json _comment: documents the dependency next to the
MSI declaration that needs it.
The local repo at /home/camp/projects/pxe now has the cab staged in
playbook/shopfloor-setup/gea-shopfloor-keyence/installers/ for the next
USB build. Rebuilding the Keyence image and re-imaging the failed bay
should now reach DisplayVersion 4.3.7 detection successfully.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end fixes for Blancco Drive Eraser PXE flow uncovered by chasing
"reports never reach SMB share" across two air-gapped sites:
playbook/blancco-init.sh:
* Drop silent || true on wget of preferences.xml + config.xml. Fail
loud with shell-drop if download or marker grep fails. Background:
airootfs /opt/scripts/validate_preferences.sh restores
/albus/preferences.save (factory defaults, empty network_share) if
xmllint fails. wget failure made every report silently land nowhere.
* Clobber /albus/preferences.save with the same served file so even if
the validator fallback fires, the SMB target survives.
* Bind-mount /dev/null over /sys/power/{state,disk,mem_sleep,autosleep}
before switch_root. Albus's license-retry path writes /sys/power/state
directly (bypassing systemd targets); this is the last-line block.
* /dev/null symlinks for sleep/suspend/hibernate systemd targets in the
airootfs overlay + logind drop-in with IdleAction/Handle*=ignore.
Three independent layers because cmdline systemd.mask alone is bypassed
by direct /sys/power/state writes.
* xinitrc.d/00-no-screen-blank.sh runs xset s off -dpms + setterm
-blank 0 -powerdown 0 so the Blancco GUI doesn't blank during long
erasures.
* Removed the 20-failsafeDriver.conf "modesetting" pin. modesetting
needs DRM/KMS which we disable on kernel cmdline; "vesa" also failed
on NVIDIA. With the pin gone Xorg auto-picks fbdev which uses the
kernel framebuffer from vga=normal - works across Intel, AMD, and
older NVIDIA without nouveau.
playbook/pxe_server_setup.yml:
* dnsmasq.conf: explicit empty-value dhcp-option=3 + dhcp-option=6.
Without them, dnsmasq defaults to sending its own IP as router AND
DNS. Commenting the configured-value lines did NOT disable the push
(root cause of "wired keeps picking up 10.9.100.1 as gateway").
* Split the Blancco config.img extraction and preferences.xml deploy
into separate tasks. The previous shell-with-creates: gate caused
playbook re-runs to skip the prefs deploy entirely after first run.
* Added a validation task that runs python3 xml parse + grep on the
deployed preferences.xml to fail the playbook at deploy time if the
SMB markers are missing.
* Added Environment=TZ=America/New_York to the pxe-webapp systemd
service so report mtimes and audit log render in Eastern time even
if the Python process is started before timedatectl converges.
webapp:
* services/blancco_report.py: parse Blancco's XML report format
(recursive <entries name="..."> walker) into a friendly dict.
* templates/report_view.html: Bootstrap "Drive Erasure Certificate"
layout - hero summary, customer + system cards, per-drive cards with
step-by-step erasure timeline, document signing footer with
integrity hash detail.
* /reports/view/<filename> route + View button on the reports list
(XML reports only; PDFs still download).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Legacy-BIOS PXE clients booting Blancco reported "NBP is too big to
fit in free base memory". Cause: dnsmasq unconditionally served
ipxe.efi (~675KB EFI binary) which legacy BIOS PXE ROMs cannot
execute and which exceeds their NBP cap.
Fix:
- Add undionly.kpxe (~70KB BIOS-mode iPXE, from boot.ipxe.org).
- dnsmasq: dhcp-match on option:client-arch,0 (BIOS) -> undionly.kpxe;
default (everything else, including UEFI x86_64 arch 7 and 9) keeps
getting ipxe.efi. Tag form is reversible: if the match fails to
evaluate, fallback is the working EFI path, not the new binary.
- Ansible TFTP-copy loop: mirror undionly.kpxe alongside ipxe.efi.
- .gitignore exception: track the open-source kpxe binary so the
air-gapped USB build stays self-contained.
UEFI clients unchanged. Blancco/Clonezilla/WinPE chain after the
iPXE menu is identical regardless of which iPXE variant delivered it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tech catches a PC imaged with a wrong machine number. Previously the
share restore (NTLARS .reg + UDC settings + UDC live data) only fired
on the placeholder->real transition, so a real->real change rewrote
only UDC JSON, eDNC reg, and MTConnect Devices.xml - leaving the wrong
NTLARS config in place.
Update-MachineNumber.ps1: replace the placeholder-only guard with an
any-change guard so the share restore block fires on reassign too.
The existing one-shot migrated/ consumption keeps live-data restore
idempotent. Also writes C:\Enrollment\machine-number.txt to keep
imaging-time scripts in sync.
Set-MachineNumber.ps1 (both collections + nocollections): show a
confirmation dialog when reassigning between two real numbers, naming
old/new and listing what gets pulled. Audit each call to
C:\Logs\Shopfloor\reassign.log.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step-by-step runbook for techs at the imaged PC. Two symptoms covered:
UDC not collecting data (admin-unlock + COM port walkthrough) and DNC
not pushing to controller (NTLARS reg restore + FMS Host FQDN + Realtek
PCIe GbE static IP). Mermaid overview links to each section. Live HTML
uses CDN; static HTML pre-renders SVG for offline / printable use.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The GE Aerospace Shop Floor MCE image type was unused (never had
content on the live PXE server) and cluttered the dashboard. The
GE Legacy Shop Floor MCE entry stays.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The playbook deploys SFLD provisioning packages to
/srv/samba/enrollment/ppkgs/ but the /enrollment route scanned the
share root. Result: every visit reported "no enrollment packages
found" even though three .ppkg files were present.
Add ENROLLMENT_PPKG_DIR (defaults to {ENROLLMENT_SHARE}/ppkgs,
overridable via env var) and point all four enrollment routes at it
(list, upload, download, delete).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Configure static IP for PXE interface" task writes a flat
single-NIC netplan config. Live PXE server (10.9.100.1) overrides
this with a bridge config bonding the USB-C 5 Gbps NIC and the
onboard NIC into br-pxe, because the onboard NIC alone cannot
sustain the imaging throughput required by the floor.
Add a comment block above the task warning that re-running it on
a box already using the bridge config will replace the bridge with
a flat single-iface config and likely break the PXE LAN. The full
bridge YAML is included for reference. Recovery is via
/etc/netplan/50-cloud-init.yaml.pre-gold-swap (preserved by netplan
backup).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Models.txt entry maps "7080" substring (matches WMI csproduct name
"OptiPlex 7080") to OptiPlex_7080_1.37.0.exe. BIOS .exe already
deployed to /srv/samba/winpeapps/_shared/BIOS/ on the live PXE
server via download-drivers.py.
Also adds docs/geastandardpbr-overrides.md tracking the local
geastandardpbr/ edits (user_selections.json + HardwareDriver.json
get a 7080 entry under "D11 OptiPlex Family") that the gitignore
prevents from being tracked directly. Includes a Python snippet
to idempotently re-apply after a fresh USB import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three related fixes:
1. Hard-coded BIOS push path was /srv/samba/enrollment/BIOS, which does
not exist on the live PXE server. Real path is the shared
/srv/samba/winpeapps/_shared/BIOS/ where check-bios.cmd lives and
playbook task pxe_server_setup.yml:485 deploys Flash64W.exe + the
per-model BIOS .exe files.
2. Generated models.txt was 2-column (ModelSubstring|BIOSFile) but
check-bios.cmd reads tokens=1,2,3 with delims=| and uses field 3
for the version compare. Without the 3rd column, the version-skip
logic never engages and every imaged PC re-flashes BIOS on every
boot. Now writes 3-column (ModelSubstring|BIOSFile|Version).
3. The script overwrote the live models.txt with only the entries it
touched in the current run. Live had 50+ entries; a single-model
run wiped the other 49. Now prints the lines and asks the operator
to merge them into playbook/shopfloor-setup/BIOS/models.txt and
re-deploy via scripts/deploy-bios.sh.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Display kiosk user cannot authenticate to the tsgwp00525 SFLD share,
so any share-dependent enforcement task on Displays would fail every
cycle. Display is now self-contained: kiosk EXE installs at imaging
time via preinstall.json (Install-KioskApp.cmd) and Edge kiosk
policies via 09-Setup-Display.ps1. No ongoing SFLD-share dependency.
Gate both registrations behind a $noEnforceTypes alias group so
either pcType form (Display, gea-shopfloor-display) hits the skip
path. Other PC types still register both tasks unchanged.
Verified on win11 VM: matrix test confirmed Display + gea-shopfloor-
display SKIP both gates while Standard / CMM / gea-shopfloor-
collections still REGISTER.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Block run-enrollment when this PC has no WiFi adapter and no default
route. PXE imaging LAN has no DHCP gateway, so towers without WiFi
get stuck in PPKG enrollment (AAD + Intune endpoints unreachable)
and require a re-image. Recurring failure mode observed 2026-05-05.
Tech-facing R/X retry+abort prompt walks them through plugging into
a corp wall jack.
Replace plain post-PPKG reboot with handoff to Monitor-IntuneProgress
-PostPpkg: cancel the pending shutdown timer, run a 180s settle so
MDM can push the baseline policy, render live status during settle,
then issue a clean reboot. The persistent @logon sync_intune task
resumes tracking on the next boot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Filter by PhysicalMediaType + HardwareInterface instead of
InterfaceDescription regex. Name/description varies per vendor
(Realtek Gaming GbE, Intel I219-V, etc.) so a name-only filter
missed adapters on some hardware. Keep an InterfaceDescription
negative guard for drivers that mis-report PhysicalMediaType.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>