methodic — design

methodic is the Python client for the Chronicle experiment platform's REST API. It is the same client menlo-park (Chronicle's first-party training runner) uses; extracting it gives third-party runners a supported integration path and gives researchers a Python-native way to drive experiments.

Scope

Two audiences, one client:

  • Researchers — experiment lifecycle (create / commit / conclude / retract), variation management, lineage walks, full-text search across research docs and arxiv assets, asset metadata.
  • Runners — pull variation config at run start, transition runs through pending → running → succeeded/failed, upload outputs, send heartbeats, download resume checkpoints.

The client does not know about training, models, datasets, or PyTorch state. Those belong in menlo-park (or whatever runner the adopter is building).

API shape

The top-level export is methodic.Chronicle. Construct one per process and reuse it for both researcher and worker operations.

from methodic import Chronicle, Run, UploadTracker

Every endpoint lives on a namespace (chronicle.experiments, chronicle.variations, chronicle.search, chronicle.runs, chronicle.assets). Each namespace is a class that takes the underlying transport as a dependency, so adding the next 20 endpoints means adding methods to a namespace, not to the top-level class.

Resource handles (Experiment, Variation, Run) are sugar over the namespaces. They hold an id (and optionally a cached server response) and proxy mutations into the relevant namespace, returning self so callers can chain:

exp.commit().variations.create(config_yaml="...")

Mutations invalidate the cached server data internally; the next attribute access (exp.committed_at, exp.state) re-fetches transparently. Users do not call refresh().
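The handle pattern above can be sketched in miniature. The namespace and field names here are stand-ins, not the real methodic internals; the point is the shape: mutations proxy to the namespace, return self, and drop the cache so the next attribute access re-fetches.

```python
from typing import Any

class FakeExperimentsNamespace:
    """Stand-in for chronicle.experiments; returns fresh server payloads."""
    def __init__(self):
        self.state = "draft"

    def commit(self, exp_id: str) -> None:
        self.state = "committed"

    def retrieve(self, exp_id: str) -> dict[str, Any]:
        return {"id": exp_id, "state": self.state}

class ExperimentHandle:
    """Sugar over the namespace: holds an id plus an optional cached payload."""
    def __init__(self, ns: FakeExperimentsNamespace, exp_id: str):
        self._ns = ns
        self._id = exp_id
        self._cache = None

    def commit(self) -> "ExperimentHandle":
        self._ns.commit(self._id)
        self._cache = None          # mutation invalidates cached server data
        return self                 # return self so callers can chain

    def __getattr__(self, name: str) -> Any:
        if self._cache is None:     # next attribute access re-fetches
            self._cache = self._ns.retrieve(self._id)
        return self._cache[name]

ns = FakeExperimentsNamespace()
exp = ExperimentHandle(ns, "exp_123")
print(exp.state)            # "draft" — fetched lazily on first access
print(exp.commit().state)   # "committed" — commit() dropped the cache
```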

Namespaces vs handles

The namespace methods are canonical and complete: chronicle.experiments.commit(exp_id) works without a handle. Handles are optional ergonomics, not mandatory state. Tests, scripts that hold an id but no handle, and codepaths that walk lists of ids all use the namespace form directly.

Type system

Response types are plain dataclasses (Experiment, Variation, ExperimentDetail, LineageResponse, SearchResponse, etc.) with from_dict classmethods that ignore unknown keys — server-side schema additions don't break older clients. Fields the server treats as arbitrary JSON (config_json, accelerate_config_json, launch_config, run environment snapshots) stay as dict[str, Any]. UUIDs and timestamps come through as strings (no uuid.UUID / datetime coercion), matching the JSON wire format.
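A minimal sketch of the from_dict convention, with made-up field names (not the real Variation schema): filter the payload down to known dataclass fields so new server-side keys pass through harmlessly.

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass
class Variation:
    """Illustrative response type — field names here are assumptions."""
    id: str
    experiment_id: str
    config_json: dict[str, Any]   # arbitrary server-side JSON stays a dict
    created_at: str               # timestamps stay strings, matching the wire format

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Variation":
        # Ignore unknown keys so server-side schema additions
        # don't break older clients.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

v = Variation.from_dict({
    "id": "var_1",
    "experiment_id": "exp_1",
    "config_json": {"lr": 3e-4},
    "created_at": "2024-01-01T00:00:00Z",
    "brand_new_server_field": True,   # silently ignored
})
```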

Authentication

Chronicle accepts both Auth0 access tokens (researcher use) and API keys with the sk_<type>_<id><secret> format (agents and workers). The Python client doesn't distinguish — it puts whatever string you pass as api_key into the Authorization: Bearer header.
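The pass-through behavior amounts to this (the helper name is hypothetical; the real client builds the header internally):

```python
def auth_header(api_key: str) -> dict[str, str]:
    """Whatever string is passed — Auth0 access token or sk_... API key —
    goes straight into the bearer header; the client does not inspect it."""
    return {"Authorization": f"Bearer {api_key}"}

print(auth_header("sk_worker_abc123secret"))
```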

Asset upload protocol

Three-step for binary assets (the upload itself goes to cloud storage, not Chronicle):

  1. POST /assets with presign: true → Chronicle returns presigned upload URLs keyed by component name.
  2. PUT each component to its presigned URL.
  3. PUT /assets/{id}/finalize → asset transitions to ready (immutable).

For small inline payloads (research reports, environment snapshots), POST /assets accepts the content directly and Chronicle auto-finalizes.
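The three-step presigned flow can be simulated end to end with stub transports (the function names, URL shapes, and response keys below are assumptions for illustration — only the step sequence mirrors the protocol):

```python
# A dict standing in for cloud storage; uploads never touch Chronicle itself.
cloud_storage: dict[str, bytes] = {}

def post_assets_presigned(components: list[str]) -> dict:
    # Step 1: POST /assets with presign: true —
    # Chronicle returns presigned upload URLs keyed by component name.
    return {"id": "asset_1",
            "upload_urls": {c: f"https://storage.example/{c}" for c in components}}

def put_component(url: str, data: bytes) -> None:
    # Step 2: PUT each component to its presigned URL (cloud storage, not Chronicle).
    cloud_storage[url] = data

def finalize_asset(asset_id: str) -> dict:
    # Step 3: PUT /assets/{id}/finalize — asset transitions to ready (immutable).
    return {"id": asset_id, "state": "ready"}

asset = post_assets_presigned(["weights", "optimizer"])
for name, url in asset["upload_urls"].items():
    put_component(url, f"<{name} bytes>".encode())
final = finalize_asset(asset["id"])
print(final)   # {'id': 'asset_1', 'state': 'ready'}
```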

The Run.register_and_upload_async flow uses an UploadTracker (local SQLite, WAL mode) for crash recovery: components are registered before upload begins, so a restarted process can detect incomplete uploads and resume.
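The crash-recovery idea reduces to register-before-upload in a local database. A sketch with an assumed table shape (the real tracker is file-backed in WAL mode; an in-memory DB is used here only to keep the example self-contained):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA journal_mode=WAL")   # file-backed WAL survives crashes mid-write
conn.execute("""CREATE TABLE uploads (
    component TEXT PRIMARY KEY,
    state TEXT NOT NULL DEFAULT 'registered')""")

def register(component: str) -> None:
    # Registered BEFORE the upload begins, so a restarted process
    # can find stragglers.
    conn.execute("INSERT INTO uploads (component) VALUES (?)", (component,))
    conn.commit()

def mark_done(component: str) -> None:
    conn.execute("UPDATE uploads SET state='done' WHERE component=?", (component,))
    conn.commit()

def incomplete() -> list[str]:
    rows = conn.execute("SELECT component FROM uploads WHERE state != 'done'")
    return [r[0] for r in rows]

register("weights")
register("optimizer")
mark_done("weights")
# After a crash + restart, incomplete() tells the process what to resume:
print(incomplete())   # ['optimizer']
```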

Run lifecycle

get_variation_config()  →  start()  →  [heartbeat() loop]  →  succeed() | fail()
                                       →  [upload_asset() as outputs are produced]

succeed() waits for pending background uploads before reporting success. fail(reason="abandoned") is for user/agent cancels; fail(reason="crash") for unrecoverable errors. The 15-minute heartbeat watchdog is enforced server-side.
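A runner main loop following this lifecycle might look like the sketch below. FakeRun is a stand-in with the method names from the diagram; only the call order and failure-reason mapping are taken from this section.

```python
class FakeRun:
    """Stand-in for methodic's Run handle; records the lifecycle calls."""
    def __init__(self):
        self.events: list[str] = []

    def get_variation_config(self) -> dict:
        self.events.append("config")
        return {"steps": 3}

    def start(self): self.events.append("start")
    def heartbeat(self): self.events.append("heartbeat")
    def upload_asset(self, name): self.events.append(f"upload:{name}")
    def succeed(self): self.events.append("succeed")  # real client waits for pending uploads
    def fail(self, reason): self.events.append(f"fail:{reason}")

run = FakeRun()
config = run.get_variation_config()
run.start()
try:
    for step in range(config["steps"]):
        run.heartbeat()                   # keep the server-side 15-minute watchdog happy
        if step == config["steps"] - 1:
            run.upload_asset("final-checkpoint")
    run.succeed()
except KeyboardInterrupt:
    run.fail(reason="abandoned")          # user/agent cancel
except Exception:
    run.fail(reason="crash")              # unrecoverable error
```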

Errors

The client raises typed exceptions for non-2xx responses: BadRequestError (400/422), AuthenticationError (401), PermissionDeniedError (403), NotFoundError (404), ConflictError (409), ServerError (5xx). All inherit APIError, which in turn subclasses ChronicleError. Search returns 503 when Vertex AI Search isn't configured server-side; the SDK surfaces that as ServerError(status_code=503) so callers can branch.
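Caller-side branching then looks like this. The hierarchy below is an illustrative reconstruction (assuming APIError carries a status_code and sits under a ChronicleError base), not the SDK source:

```python
class ChronicleError(Exception):
    """Illustrative base for everything the client raises."""

class APIError(ChronicleError):
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code

class NotFoundError(APIError): ...
class ServerError(APIError): ...

def classify(exc: APIError) -> str:
    # Branch on ServerError(status_code=503): search backend not configured.
    if isinstance(exc, ServerError) and exc.status_code == 503:
        return "search backend not configured"
    if isinstance(exc, NotFoundError):
        return "missing"
    return "unexpected"

print(classify(ServerError(503)))   # search backend not configured
```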

Backwards compatibility

The class was originally methodic.Client (and before that menlo_park.client.ChronicleClient). With the researcher API expansion the top-level class is methodic.Chronicle. Run-lifecycle methods moved off the top-level class onto Run (run.start() instead of client.start_run()). No deprecation alias — the package hasn't been released to PyPI yet, so callers update once.

Release process

methodic is published to PyPI from this monorepo via GitHub Actions (.github/workflows/methodic-lib-publish.yml) using PyPI's trusted publishing (OIDC) — no long-lived API tokens.

Tags follow the prefix methodic-lib-v* (e.g., methodic-lib-v0.1.0). The lib infix is intentional: the methodic-v* prefix is reserved in case the Chronicle platform itself is later renamed to methodic, at which point platform releases would naturally take that prefix. Keeping the SDK on methodic-lib-* from day one avoids a future tag-collision migration.

Non-goals

  • Async (httpx.AsyncClient) — the runner contract is sequential and the threadpool covers async uploads. Async can be layered on later without breaking the sync API.
  • Retries / backoff — wrap calls yourself. Adding retry logic that's correct for every endpoint (idempotent vs. mutating, server-side rate limits) belongs in the application, not the client.
  • Caching beyond per-handle data — handle-level cache is invalidated on mutation; we don't keep a cross-handle entity store.
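"Wrap calls yourself" for retries can be as small as the sketch below — an application-side wrapper with exponential backoff, applied only to calls the caller knows are idempotent. The names and defaults are illustrative, not a methodic API:

```python
import time

def with_retries(fn, *, attempts=3, base_delay=0.01, retry_on=(ConnectionError,)):
    """Retry fn() on transient errors, backing off exponentially.
    The caller decides which endpoints are safe to retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise                              # out of attempts: re-raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky))   # "ok" after two transient failures
```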