pxe-server/scripts/diagnostics/snapshot-runbook.txt
cproudlock ce604adcda Renumber PXE LAN from 10.9.100.0/24 to 172.16.9.0/24
Single-site bay-stuck issue at WJ: GE Intune Report IP script filters
Get-NetIPAddress on StartsWith("10.") and posts everything matching
to the GE Tines webhook. Bays at WJ get the PXE LAN 10.9.100.x IP
captured and reported -> GE backend tags bays as on a non-corp 10.x
subnet -> dynamic group eligibility for SFLD policy never matches.
Other GE sites work because their PXE LANs aren't on 10.x at all.

Renumber PXE LAN to RFC1918 172.16.9.0/24 so the GE filter naturally
skips wired PXE addresses without any disable-NIC dance.

Server-side already in flight (netplan dual-bound, dnsmasq scope +
boot URL repointed, blancco preferences + grub.cfg + iPXE GetPxeScript
all sed'd to 172.16.9.1). This commit is the playbook / scripts /
docs side: 109 hits across 35 files sed'd in one shot.

After this lands + boot.wim is rebuilt + bays renumber off DHCP,
the 10.9.100.1 binding will be dropped from netplan as the final
cleanup step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:30:32 -04:00

SFLD imaging-lifecycle snapshot runbook
========================================
Run all three snapshots on the imaged device (elevated PowerShell). Each
captures registry, files, logs, scheduled-task state, and event logs at a
distinct lifecycle checkpoint so the deltas between them isolate which
phase delivered (or failed to deliver) each component.

Prereq: device is in SupportUser auto-logon, has just finished PPKG bulk
enrollment, and is enrolled to Intune with no device category assigned yet.
----------------------------------------
0. Map share + stage script (run once, at the start)
----------------------------------------
    net use Z: \\172.16.9.1\image-upload /user:pxe-upload pxe /persistent:no
    Copy-Item Z:\Capture-LockdownState.ps1 C:\Windows\Temp\
    Set-ExecutionPolicy -Scope Process Bypass -Force
----------------------------------------
1. Snapshot BEFORE assigning device category
----------------------------------------
State:
- PPKG ran, enrolled to Intune
- Device sitting in SupportUser, no category assigned in portal yet
- Win32Apps + DSC profiles tied to category have NOT delivered

    C:\Windows\Temp\Capture-LockdownState.ps1 -Stage pre-category

Output: C:\ProgramData\state-pre-category-<stamp>\
----------------------------------------
2. Assign device category in Intune portal
----------------------------------------
- Intune portal -> Devices -> Windows -> [this device] -> Properties
- Set Device category to MAIN (or whichever is correct)
- Wait ~5-10 min for sync (or force sync via Settings -> Accounts ->
  Access work or school -> Info -> Sync)
----------------------------------------
3. Snapshot AFTER category, BEFORE lockdown
----------------------------------------
State:
- Category has landed and dynamic-group membership has evaluated
- SFLD-DSC Win32App (or whatever the category-driven config is) has had
  a chance to download, install, write registry, and schedule its task
- Lockdown has NOT yet flipped Winlogon DefaultUserName from SupportUser
  to ShopFloor (i.e., still in tech / setup mode)

    C:\Windows\Temp\Capture-LockdownState.ps1 -Stage post-category

Output: C:\ProgramData\state-post-category-<stamp>\
----------------------------------------
4. Wait for lockdown to finish landing
----------------------------------------
Watch for the two terminal signals (per Monitor-IntuneProgress.ps1
notes):
- HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon
  DefaultUserName flips from "SupportUser" to "ShopFloor"
- AssignedAccess kiosk profile becomes active
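
The Winlogon signal above can be polled rather than watched by hand. A
minimal sketch, assuming it runs in the same elevated session.
Wait-RegistryValue is a hypothetical helper, and the 30-second interval and
60-minute timeout are arbitrary defaults, not values taken from
Monitor-IntuneProgress.ps1. It does not check the AssignedAccess profile,
so confirm that one separately:

```powershell
# Sketch only: poll a registry value until it matches, or give up.
# Defaults for interval and timeout are assumptions, tune as needed.
function Wait-RegistryValue {
    param(
        [string]$Key      = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon',
        [string]$Name     = 'DefaultUserName',
        [string]$Expected = 'ShopFloor',
        [int]$TimeoutSeconds  = 3600,
        [int]$IntervalSeconds = 30
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ($true) {
        # A missing key or value yields $null here rather than an error.
        $value = (Get-ItemProperty -Path $Key -Name $Name -ErrorAction SilentlyContinue).$Name
        if ($value -eq $Expected) { return $true }
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $IntervalSeconds
    }
}
```

Run Wait-RegistryValue with no arguments: $true means the flip landed,
$false means the timeout expired first. If the lockdown logs the session
off, the loop ends with it, so re-check the value after signing back in.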
----------------------------------------
5. Snapshot AFTER lockdown
----------------------------------------
    C:\Windows\Temp\Capture-LockdownState.ps1 -Stage post-lockdown

Output: C:\ProgramData\state-post-lockdown-<stamp>\
----------------------------------------
6. Copy all three snapshots back to PXE
----------------------------------------
    Get-ChildItem 'C:\ProgramData\state-*' -Directory |
        Where-Object Name -match '^state-(pre-category|post-category|post-lockdown)-' |
        ForEach-Object {
            robocopy $_.FullName "Z:\$($_.Name)" /E /NFL /NDL /NJH /NJS
        }
    net use Z: /delete /y

The three folders land at \\172.16.9.1\image-upload\state-*-<stamp>\.
On the workstation: pull from /home/pxe/image-upload/ on the PXE server
(scp or local mount) and diff against any prior baseline (e.g. the
4/15 v1.3.1 working snapshot at pxe-images/state-post-lockdown-20260415-154705/).
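
One way to diff a single file between two snapshot folders, sketched in
PowerShell (pwsh on the workstation should behave the same).
Compare-SnapshotFile is a hypothetical helper, not part of
Capture-LockdownState.ps1. It ignores line order, so a row that merely
moved is not reported, and it assumes both files are non-empty:

```powershell
# Sketch only: report lines that exist in one snapshot's file but not the other's.
function Compare-SnapshotFile {
    param(
        [Parameter(Mandatory)][string]$Baseline,   # older snapshot folder
        [Parameter(Mandatory)][string]$Current,    # newer snapshot folder
        [Parameter(Mandatory)][string]$FileName    # e.g. Tasks-RunHistory.csv
    )
    $old = @(Get-Content -Path (Join-Path $Baseline $FileName))
    $new = @(Get-Content -Path (Join-Path $Current  $FileName))
    # '<=' marks lines only in the baseline, '=>' lines only in the current snapshot.
    Compare-Object -ReferenceObject $old -DifferenceObject $new |
        Select-Object SideIndicator, InputObject
}
```

Example, with <stamp> standing in for the real folder suffixes:

    Compare-SnapshotFile -Baseline .\state-pre-category-<stamp> -Current .\state-post-category-<stamp> -FileName Tasks-RunHistory.csv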
----------------------------------------
What each diff reveals
----------------------------------------
pre-category -> post-category
- Which Win32Apps Intune assigned via the category
- Whether SFLD-DSC bootstrap actually ran (DSCDeployment.log,
  HKLM:\SOFTWARE\GE\SFLD\Credentials\baseVersion)
- Whether sastoken.txt was present in the IMECache (IMECache-Files.csv)
- Scheduled task SFLD-ApplyDSCConfig - was it created? Did it run?
  What was its last result code? (Tasks-RunHistory.csv)
- Outbound MDM events: 429 throttles, AAD failures
  (DeviceManagement-Events.csv)

post-category -> post-lockdown
- Lockdown DSC delta: AssignedAccess kiosk config, AppLocker rules,
  Winlogon flip, autologon change
- Final registry state for HKLM:\SOFTWARE\GE\SFLD\Credentials\* +
  HKLM:\SOFTWARE\GE\SFLD\DSC (Site, Environment, Function, SasToken)
- Final PolicyManager state (which Intune profiles fully landed)
----------------------------------------
Key files to look at first when comparing
----------------------------------------
SFLD.reg                    -> creds + DSC values landed?
IMECache-Files.csv          -> sastoken.txt present in Win32App content?
DSCDeployment.log           -> bootstrap version + warnings
Tasks-RunHistory.csv        -> SFLD-ApplyDSCConfig LastRunTime + LastTaskResult
DeviceManagement-Events.csv -> 429s, AAD token failures, sync stalls
GE-WOW6432.reg              -> baseVersion (1.3.1 = working, 2.0.2 = broken)
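
For a quick working/broken verdict, baseVersion can be pulled straight out
of the exported .reg file. A sketch only: Get-BaseVersion is a hypothetical
helper, and it assumes the value was exported as a plain string line of the
form "baseVersion"="1.3.1" (a DWORD or hex export would not match):

```powershell
# Sketch only: return the baseVersion string from an exported .reg file.
# Returns nothing if no matching line is found.
function Get-BaseVersion {
    param([Parameter(Mandatory)][string]$RegFile)
    $hit = Select-String -Path $RegFile -Pattern '^"baseVersion"="([^"]*)"' |
        Select-Object -First 1
    if ($hit) { $hit.Matches[0].Groups[1].Value }
}
```

Example, with <stamp> standing in for the real folder suffix:

    Get-BaseVersion -RegFile .\state-post-lockdown-<stamp>\GE-WOW6432.reg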