shopdb-flask/docs/adr/ADR-006-collector-contract.md
cproudlock d4e3ac9fc8 Phase 5: Alembic baseline, per-site deploy, ADRs to docs/adr
Migration runner ready and a sister site can deploy from a clean
checkout with one .env file.

ADRs relocated (migrations/adr/ -> docs/adr/):
- migrations/ is now Alembic territory, not docs.
- All cross-references updated: CLAUDE.md, docs/PLUGIN-HOOKS.md,
  docs/PLUGIN-QUICKSTART.md.

Alembic initialized (migrations/):
- env.py, script.py.mako, alembic.ini copied from Flask-Migrate
  templates so `flask db migrate` and `flask db upgrade` work without
  a one-time `flask db init` (which would clash with the existing
  migrations/ directory).
- Baseline migration generated via autogenerate, captures all 47
  tables (core models + 6 plugins) as the upgrade target. Ready for
  per-site `flask db upgrade` from an empty schema.

Deploy artifacts:
- Dockerfile: python:3.12-slim base, gunicorn server, non-root user,
  healthcheck against /api/auth/login. Single image bundles all six
  plugins; sites enable via `flask plugin install <name>`.
- docker-compose.yml: MySQL 8 + API container, healthcheck-gated
  startup, env-driven secrets that fail loud on missing values
  (`${SECRET_KEY:?}` form).
- .env.example: full env-var inventory with comments. Calls out
  required vs optional. Matches what ProductionConfig.validate
  enforces.

docs/DEPLOY.md:
- Step-by-step per-site runbook: clone, configure .env, bring up
  stack, run migrations, seed reference data, install plugins,
  create admin, front with TLS, backups, updates.
- Common-issues table.
- Cross-links to ADR-004 (per-site rationale), ADR-003 (plugin
  distribution), and the config source.

Skills:
- migrating-asset-schema: Alembic + one-shot data migration policy.
  Rules: additive first, renames are three steps, destructive ops
  need rollback, equipment migration filter per ADR-001 + ADR-005.
- hardening-flask-config: production validation, CORS allowlist
  policy, JWT cookie hardening, per-site deploy isolation per ADR-004.

CLAUDE.md updated to reflect the post-Phase-5 state. No tests added
this commit; the Alembic baseline is exercised by the existing
db.create_all-based test suite (tests do not touch the migration
runner; that's by design until per-plugin migrations land).

Test count unchanged: 101 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:56:19 -04:00


ADR-006: Plugin collector contract pattern

  • Status: ACCEPTED
  • Date: 2026-05-08
  • Deciders: cproudlock
  • Supersedes: none

Context

PC inventory data was collected by PowerShell scripts pushing to /api/collector/pc (shopdb/core/api/collector.py, ~374 LOC). The endpoint is hardcoded for PCs: it accepts a fixed schema and writes to the legacy Machine model.

Per ADR-001, Machine is being retired in favor of Asset. With the project's shift to PXE-driven imaging, PC inventory is also moving to a new collection pipeline, in which the PXE / GE-Enforce / manifest engine produces JSON about each PC. Other asset classes may want similar collector pipelines (printers via Zabbix, network gear via SNMP scan).

This calls for a generalizable contract: any plugin that wants to accept external collector input declares a JSON schema, and the framework wires the endpoint, auth, and idempotency.

Decision

BasePlugin gets one new optional hook:

```python
from typing import Optional


class BasePlugin:
    # ... existing BasePlugin members ...

    def get_collector_schema(self) -> Optional[dict]:
        """Return JSON Schema describing the collector payload for this plugin.
        Return None if the plugin does not accept collector input.

        The schema must include:
        - 'identityfield': name of the field that uniquely identifies an asset
          across submissions (e.g., 'hostname' for PCs, 'macaddress' for network
          devices). Used for idempotent upsert.
        - 'fields': JSON Schema definitions for the rest of the payload.
        """
        return None
```

The plugin loader auto-registers an endpoint at /api/collector/<pluginname> for each plugin that returns a schema. Collectors authenticate with an API key, separate from JWT. Keys come from environment variables:

  • COLLECTOR_API_KEY_<PLUGINNAME> (preferred, plugin-specific)
  • COLLECTOR_API_KEY (fallback, shared)
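A minimal sketch of the loader side, assuming each plugin exposes a `name` attribute and that the env-var suffix is the uppercased plugin name (neither is specified above). It uses plain Python rather than Flask so it runs standalone; `collector_routes`, `resolve_collector_key`, `check_api_key`, and the `NotesPlugin` example are hypothetical.

```python
import hmac
import os
from typing import Iterable, Mapping, Optional


class BasePlugin:
    """Stand-in for BasePlugin; only the collector hook is modeled."""

    name = "base"  # assumption: plugins carry a name used in the URL

    def get_collector_schema(self) -> Optional[dict]:
        return None


class ComputersPlugin(BasePlugin):
    name = "computers"

    def get_collector_schema(self) -> Optional[dict]:
        return {"identityfield": "hostname", "fields": {"hostname": {"type": "string"}}}


class NotesPlugin(BasePlugin):
    # Hypothetical plugin that accepts no collector input.
    name = "notes"


def collector_routes(plugins: Iterable[BasePlugin]) -> dict:
    """Map /api/collector/<pluginname> to the schema of each plugin that declares one."""
    routes = {}
    for plugin in plugins:
        schema = plugin.get_collector_schema()
        if schema is not None:
            routes[f"/api/collector/{plugin.name}"] = schema
    return routes


def resolve_collector_key(plugin_name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer COLLECTOR_API_KEY_<PLUGINNAME>, fall back to the shared COLLECTOR_API_KEY."""
    specific = env.get(f"COLLECTOR_API_KEY_{plugin_name.upper()}")
    return specific or env.get("COLLECTOR_API_KEY")


def check_api_key(plugin_name: str, presented: str, env: Mapping[str, str] = os.environ) -> bool:
    """Reject when no key is configured; otherwise compare in constant time."""
    expected = resolve_collector_key(plugin_name, env)
    if not expected:
        return False
    return hmac.compare_digest(presented.encode(), expected.encode())
```

Comparing keys with `hmac.compare_digest` avoids leaking key prefixes through response timing, and refusing requests when neither variable is set keeps a missing key from silently opening the endpoint.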

Idempotent upsert

The endpoint uses the identityfield value to look up an existing Asset with the same identity. If one is found, it is updated; otherwise, a new Asset is inserted. Existing relationships are preserved on update.
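The upsert rule can be sketched with an in-memory store standing in for the Asset table. This assumes relationships live in a separate collection (mirroring separate AssetRelationship rows), so an update that only touches the asset's own fields leaves them intact, and it assumes `noop` means the payload changed nothing. `AssetStore` and `upsert` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class AssetStore:
    """In-memory stand-in for the Asset table plus relationship rows."""

    assets: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    relationships: List[Tuple[int, str, int]] = field(default_factory=list)
    next_id: int = 1

    def find_by_identity(self, identityfield: str, value: Any) -> Optional[int]:
        for assetid, attrs in self.assets.items():
            if attrs.get(identityfield) == value:
                return assetid
        return None


def upsert(store: AssetStore, identityfield: str, payload: Dict[str, Any]) -> Tuple[str, int]:
    """Insert or update by identity value; returns (action, assetid)."""
    assetid = store.find_by_identity(identityfield, payload[identityfield])
    if assetid is None:
        assetid = store.next_id
        store.next_id += 1
        store.assets[assetid] = dict(payload)
        return "created", assetid
    current = store.assets[assetid]
    changes = {k: v for k, v in payload.items() if current.get(k) != v}
    if not changes:
        return "noop", assetid
    # Only the asset's own fields change; relationship rows are never touched.
    current.update(changes)
    return "updated", assetid
```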

Response contract

```json
{
  "status": "ok",
  "action": "created" | "updated" | "noop",
  "assetid": 12345,
  "identityvalue": "PC-1234",
  "warnings": []
}
```

Audit logging

Every collector submission produces an audit log entry: {action, plugin, identityvalue, before/after diff}. Audit log retention follows each site's policy.
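One way to compute the before/after diff is a per-field comparison of the stored state before and after the upsert. The entry keys follow the list above; the `diff` layout (field mapped to `{before, after}`) and the `field_diff` / `audit_entry` helpers are assumptions. `after` is assumed to be the full stored state after the upsert, not just the submitted payload, so unsubmitted fields do not show up as removals.

```python
from typing import Any, Dict, Optional


def field_diff(before: Optional[Dict[str, Any]], after: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return {field: {"before": old, "after": new}} for every field whose value changed."""
    before = before or {}  # a created asset has no prior state
    diff = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            diff[key] = {"before": old, "after": new}
    return diff


def audit_entry(action: str, plugin: str, identityvalue: str,
                before: Optional[Dict[str, Any]], after: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble the {action, plugin, identityvalue, diff} record for one submission."""
    return {
        "action": action,
        "plugin": plugin,
        "identityvalue": identityvalue,
        "diff": field_diff(before, after),
    }
```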

Schema discovery

The framework exposes the registered schemas at /api/collector/_schemas (read-only, JWT-protected) so external collector authors can introspect which payloads each plugin accepts.
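The discovery payload could simply gather each plugin's schema under its name, next to the endpoint that accepts it. This is a sketch under assumptions: the response shape (`{"schemas": {name: {"endpoint", "schema"}}}`) and `schemas_listing` are hypothetical, and JWT enforcement would wrap the real route and is not modeled.

```python
import json
from typing import Dict, Optional


def schemas_listing(registered: Dict[str, Optional[dict]]) -> str:
    """Serialize collector schemas keyed by plugin name.

    `registered` maps plugin name to its get_collector_schema() result; None
    entries are plugins without a collector and are left out.
    """
    schemas = {
        name: {"endpoint": f"/api/collector/{name}", "schema": schema}
        for name, schema in registered.items()
        if schema is not None
    }
    return json.dumps({"schemas": schemas}, sort_keys=True)
```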

Concrete first user: computers plugin

The computers plugin is the first to implement get_collector_schema, and the PXE pipeline's output conforms to that schema.

Initial computers collector schema (sketch, finalized when plugin is built):

```json
{
  "identityfield": "hostname",
  "fields": {
    "hostname": "string, required",
    "macaddress": "string, optional, secondary identity",
    "osname": "string",
    "osversion": "string",
    "lastboottime": "datetime",
    "currentuser": "string",
    "ipaddress": "string",
    "memorygb": "number",
    "cputype": "string",
    "imagename": "string (PXE image deployed)",
    "imageappliedat": "datetime",
    "installedsoftware": "array of {name, version}"
  }
}
```
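The informal field descriptions in the sketch could translate into JSON Schema roughly as follows. This is one possible encoding, not the finalized schema: treating `fields` as a single object schema, the `format` values, and the `missing_or_mistyped` checker are assumptions. The checker covers only `required` and top-level `type`; a real implementation would more likely hand the schema to a full JSON Schema validator.

```python
from typing import Any, Dict, List

# One possible JSON Schema encoding of the computers sketch (assumption, not final).
COMPUTERS_SCHEMA: Dict[str, Any] = {
    "identityfield": "hostname",
    "fields": {
        "type": "object",
        "required": ["hostname"],
        "properties": {
            "hostname": {"type": "string"},
            "macaddress": {"type": "string"},  # secondary identity
            "osname": {"type": "string"},
            "osversion": {"type": "string"},
            "lastboottime": {"type": "string", "format": "date-time"},
            "currentuser": {"type": "string"},
            "ipaddress": {"type": "string"},
            "memorygb": {"type": "number"},
            "cputype": {"type": "string"},
            "imagename": {"type": "string"},  # PXE image deployed
            "imageappliedat": {"type": "string", "format": "date-time"},
            "installedsoftware": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}, "version": {"type": "string"}},
                },
            },
        },
    },
}

# Python types accepted for each JSON Schema "type" keyword used above.
_TYPES = {"string": str, "number": (int, float), "array": list, "object": dict}


def missing_or_mistyped(schema: Dict[str, Any], payload: Dict[str, Any]) -> List[str]:
    """Check `required` and top-level `type` only; `format` and nested items are not checked."""
    fields = schema["fields"]
    problems = [f"missing: {name}" for name in fields.get("required", []) if name not in payload]
    for name, value in payload.items():
        spec = fields["properties"].get(name)
        if spec is None:
            continue  # extra fields are allowed, as in JSON Schema's default
        # bool is a subclass of int in Python, but JSON Schema does not treat it as a number.
        if isinstance(value, bool) or not isinstance(value, _TYPES[spec["type"]]):
            problems.append(f"wrong type: {name}")
    return problems
```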

The PC re-image case is handled by the identity field: a freshly imaged PC keeps its hostname, so the existing Asset row is updated rather than duplicated. Existing AssetRelationship rows pointing at that PC (e.g., a "controls" relationship to a machine) are preserved across re-images.

Migration of the existing endpoint

shopdb/core/api/collector.py (/api/collector/pc) is deprecated during v1 development and removed before the v1.0 release.

Migration path:

  1. Implement get_collector_schema on the computers plugin. New endpoint /api/collector/computers is auto-registered.
  2. Run both endpoints in parallel for one cycle of PXE imaging across the floor while the PXE pipeline switches to /api/collector/computers.
  3. Remove shopdb/core/api/collector.py and the legacy blueprint registration.

Consequences

Positive

  • Generalizable across plugins. Sister sites adopting printers, network gear, etc. can wire up their own collectors with no core change.
  • Identity-based idempotency makes PC re-imaging safe by default.
  • Audit logging is uniform across plugins.
  • Schema discovery enables external tools to validate before submission.

Negative

  • Plugin authors must write a JSON schema. Slight learning curve, but JSON Schema is widely understood and the framework can ship a few examples.
  • The /api/collector/_schemas endpoint plus per-plugin endpoints expand the public API surface; minor maintenance cost.

Neutral

  • The API-key auth pattern stays as it is today (separate from JWT). Sites manage their own per-plugin collector keys via env vars.

Alternatives considered

  1. Keep /api/collector/pc and add new plugin-specific endpoints alongside it. This leaves two ways to send PC data and confuses plugin authors. Rejected.
  2. Use JWT for collectors instead of API key. Collectors are headless processes (PXE pipeline, scripts), not interactive users. JWT lifecycle (refresh tokens, expiry) is the wrong tool. API key is simpler. Rejected.
  3. Plugins write directly to the database, no collector endpoint. Skips audit logging and schema validation. Rejected.

References

  • shopdb/core/api/collector.py (legacy endpoint to be removed)
  • shopdb/plugins/base.py (get_collector_schema hook to be added)
  • ADR-001 (asset model the collectors target)
  • ADR-002 (collector schema is part of plugin contract; changes to the hook signature are major bumps)
  • The PXE project (/home/camp/projects/pxe/) which feeds the computers collector