Loom Databricks Job Editor — Parity build spec¶

Reference: Azure Databricks Workflows / Lakeflow Jobs UI (adb-<id>.azuredatabricks.net/jobs). Jobs are the orchestration primitive: one or more tasks arranged as a DAG, triggered on a schedule, on continuous, on file arrival, or manually.

Why this exists¶

Loom ships DatabricksJobEditor in lib/editors/databricks-editors.tsx plus API routes at /api/items/databricks-job/**. Today it lists jobs (listJobs), reads job spec (getJob), creates / updates / deletes (createJob / updateJob / deleteJob), runs now (runJob → runNow), and lists run history (listJobRuns). DAG editing is a flat task table with task_key, notebook path, cluster, and CSV depends_on. That's A-grade — real Lakeflow Jobs API, real DAG persistence, real runs. Polish gaps are about visual DAG, broader task types, and notification config.

Databricks Workflows UX inventory (Workflows / Jobs UI)¶

Jobs list page (`/jobs`)¶

Region	Elements
Header	"Jobs" title · Create job button · Filter (owner / tag / status) · Search
Table	Name · Tasks count · Last run time · Last run status · Triggered by · Created by · Tags
Row action	Pause/unpause schedule · Run now · Edit · Delete · Permissions

Job detail / editor page¶

Tab	Contents
Tasks	Visual DAG canvas — nodes are tasks, edges are `depends_on`. Click a node → right panel: Task name · Task type · Source · Path · Cluster · Parameters · Retries · Timeout · Email/webhook notifications · Run if dependencies
Runs	Run history table (run_id, state, duration, trigger, parameters); click a run → per-task timeline, output, logs, Spark UI link
Job details	Tags · Permissions (ACL) · Notifications (job-level) · Maximum concurrent runs · Run as (user / SP) · Git source · Job parameters · Queue · Health rules
Schedules & triggers	None / Scheduled (cron + timezone, pause/unpause) / Continuous / File arrival / Table update

Task types supported by Lakeflow Jobs¶

Notebook (most common) — path + base_parameters
Python script — workspace path / DBFS / Git
Python wheel — package + entry_point
JAR — main_class + parameters
SQL — File / Query / Alert / Dashboard / Legacy dashboard
Pipeline — Lakeflow Spark Declarative Pipeline
Run Job — chain another job (max nesting 3)
If/else condition — branching with boolean expression
For each — looping over an input array
dbt — dbt project run
Spark Submit — legacy

Compute per task¶

Job cluster (preferred, cheaper) — spec defined in the job, ephemeral per run
Existing all-purpose cluster — existing_cluster_id (what Loom uses today)
Serverless — no cluster spec needed; managed compute

Notifications¶

Email: on_start / on_success / on_failure / on_duration_warning_threshold_exceeded / on_streaming_backlog_exceeded
Webhook: integrate with PagerDuty, Slack, Teams
System destinations: notification destinations configured by admin

Retries / timeouts¶

Per-task: max_retries, min_retry_interval_millis, retry_on_timeout, timeout_seconds
Per-job: timeout_seconds, max_concurrent_runs

Parameters¶

Job parameters — defined once at job level, accessible as {{job.parameters.<key>}}
Task parameters — task-specific overrides
Dynamic value references — {{job.id}}, {{job.run_id}}, {{job.start_time.[iso_date]}}, {{tasks.<task_key>.values.<key>}}

What Loom has today (wired)¶

Capability	Backend	UI
List jobs	`GET /api/items/databricks-job` → `listJobs()` → `/api/2.1/jobs/list`	Left panel
Read job spec	`GET /api/items/databricks-job/[id]?jobId=` → `getJob()` → `/api/2.1/jobs/get`	Form populates name/cron/tz/tasks
Create job	`POST /api/items/databricks-job` → `createJob()` → `/api/2.1/jobs/create`	Save (when no jobId)
Update job	`PUT /api/items/databricks-job/[id]?jobId=` → `updateJob()` → `/api/2.1/jobs/reset`	Save (when jobId)
Delete job	`DELETE …?jobId=` → `deleteJob()` → `/api/2.1/jobs/delete`	Delete button
Run now	`POST …/run?jobId=` → `runJob()` → `/api/2.1/jobs/run-now`	Run now button
Run history	`GET …/runs?jobId=` → `listJobRuns()` → `/api/2.1/jobs/runs/list`	Runs table
Schedule	UI checkbox + cron + timezone → persists as `schedule.quartz_cron_expression`	Switch + 2 inputs
Tasks DAG	CSV `depends_on` text field per task row	Table with rows

Status: A-grade. Real Lakeflow Jobs end-to-end. No mocks. Limitation: only notebook_task with existing_cluster_id is exposed in the form (other fields round-trip in the spec but aren't editable).

Gaps for parity (polish)¶

Visual DAG canvas — current "depends_on csv" works but isn't visual. Add a React-Flow / dagre canvas showing tasks as nodes, dependencies as edges; drag-to-connect. The job spec already has full DAG info from getJob().
More task types — add type picker per task: notebook, python_wheel, sql_file, pipeline, run_job, if_else_condition, for_each_task. Each renders a different sub-form. Backend already passes full spec through.
Job-cluster (new_cluster) option — currently only existing_cluster_id. Add radio "Existing cluster | New job cluster | Serverless"; for "new job cluster", reuse the cluster spec form (node type, spark version, autoscale).
Retry / timeout per task — surface max_retries, min_retry_interval_millis, timeout_seconds, retry_on_timeout.
Run if dependencies — ALL_SUCCESS (default) · AT_LEAST_ONE_SUCCESS · NONE_FAILED · ALL_DONE · AT_LEAST_ONE_FAILED · ALL_FAILED.
Notifications — email_notifications + webhook_notifications editor (job-level and per-task).
Job parameters — top-level parameters: [{name, default}] editor; passed in run-now dialog as overrides.
Triggers beyond cron — File arrival trigger (file_arrival_trigger), Continuous, Table update. Each is a discriminated union in the spec.
Run detail drilldown — click a run → side drawer with per-task timeline, output, logs link, Spark UI link. Backend: GET /api/2.1/jobs/runs/get?run_id=&include_history=true + get-output.
Permissions — /api/2.0/permissions/jobs/<job_id> GET + PATCH (Owner / Can Manage / Can Manage Run / Can View).
Pause/unpause schedule — quick toggle from list, not just from edit form. Wire pause_status: PAUSED|UNPAUSED via updateJob.
Tags — tags: { key: value } editor for cost-allocation.

Backend mapping¶

List: GET /api/2.1/jobs/list?limit=50&expand_tasks=false (wired)
Get: GET /api/2.1/jobs/get?job_id= (wired)
Create: POST /api/2.1/jobs/create (wired)
Update: POST /api/2.1/jobs/reset (wired)
Update partial: POST /api/2.1/jobs/update (not yet used; useful for pause toggle)
Delete: POST /api/2.1/jobs/delete (wired)
Run now: POST /api/2.1/jobs/run-now (wired)
Cancel run: POST /api/2.1/jobs/runs/cancel (not yet exposed)
Repair run: POST /api/2.1/jobs/runs/repair (not yet exposed)
Runs list: GET /api/2.1/jobs/runs/list?job_id= (wired)
Run get: GET /api/2.1/jobs/runs/get?run_id= (wired)
Run output: GET /api/2.1/jobs/runs/get-output?run_id= (wired)
NEW for ACL: GET /api/2.0/permissions/jobs/<id> · PATCH same path

Required Azure resources¶

Azure Databricks workspace (existing — same as notebook editor)
UAMI as workspace user with Workflow create permission (already granted via SCIM bootstrap)
Clusters to attach tasks to (reuses cluster editor's list)
No new Bicep needed.

Estimated effort¶

Gap	Hours
Visual DAG canvas (React Flow + dagre layout)	5
Multi-task-type form (notebook / python / sql / pipeline / run_job / if_else / for_each)	4
New-cluster + serverless compute options per task	2
Retry / timeout / run-if-dependencies fields	1.5
Notifications editor (email + webhook)	2
Job parameters + run-now-with-params dialog	1.5
Triggers beyond cron (file arrival, continuous)	2
Run detail drawer (per-task timeline + logs)	2
Permissions panel	1.5
Pause toggle + tags	1
Total	~22.5 hrs (3-4 focused sessions)