Recording Mutations, Not Events

Half of the library code in Moments, my photo manager, calls a one-method trait after every database write. That trait has two implementations. One of them returns Ok(()) and does nothing else. From inside the library, there is no way to tell which one is wired up.

That is the entire architecture of how Moments syncs to Immich — and the entire reason the local backend, which has no sync at all, is the same program as the Immich one.

This post is about that trait, the enum it consumes, the outbox table underneath it, and the architectural property they buy together: a library that has no idea it has a sync backend, and a sync backend that’s purely additive to it.

The offline-first problem

Moments has two backends. The local backend stores photos on disk and indexes them in SQLite. The Immich backend talks to a self-hosted Immich server but caches everything in the same SQLite schema — the UI reads from the local DB either way, and an Immich library works fully offline after initial sync.

That means local writes have two possible destinations. Always the local DB. Sometimes — if the backend happens to be Immich — also a remote server, eventually, when the network cooperates. “Trash these three assets” should commit locally, return immediately, and arrange for the server to be told later. “Create this album” should be visible in the UI the moment the row hits the database, regardless of whether the push call has happened, will happen, or will fail and retry six times.

The naive answer is to give every service a reference to the sync layer. MediaService::trash calls sync.enqueue_trash(ids) after writing to the DB. AlbumService::create calls sync.enqueue_album_created(album). It works for the Immich backend. It also pollutes the local backend, which has no sync, with either a second constructor for every service or an Option<SyncHandle> carried everywhere it’s irrelevant. Every service in the library starts to know that there is such a thing as sync.

What I wanted instead was: services produce a value describing what they changed, and someone else decides whether to do anything about it.

The vocabulary

The value is Mutation, in src/library/mutation.rs. It is an enum of state changes the library has just committed:

#[derive(Debug, Clone)]
pub enum Mutation {
    AssetTrashed       { ids: Vec<MediaId> },
    AssetFavorited     { ids: Vec<MediaId>, favorite: bool },
    AssetDeleted       { items: Vec<(MediaId, Option<String>)> },
    AlbumCreated       { id: AlbumId, name: String },
    AlbumMediaAdded    { album_id: AlbumId, media_ids: Vec<MediaId> },
    AssetEditsApplied  { id: MediaId },
    // …
}

Eighteen variants in total — asset lifecycle, album lifecycle, edit application, stacks, tags, people. The shape matters more than the count: every variant names a state change the database has just committed.

The discipline that keeps this enum small is what it deliberately isn’t. There are no *Requested variants — UI intent isn’t a mutation, it’s a method call on a service. There are no *Result variants — results are return values. There are no UI hint variants like ThumbnailReady — those are events on a specific service’s event emitter and don’t belong here.

A Mutation is one thing: a state change the database has just committed, that might need to leave the machine. Nothing more.

The trait

MutationRecorder is in src/library/recorder.rs. It is the entire interface between the library and whatever wants to know what changed:

#[async_trait]
pub trait MutationRecorder: Send + Sync {
    async fn record(&self, mutation: &Mutation) -> Result<(), LibraryError>;
}

That is it. One method, async, takes a borrowed Mutation, returns an error if recording itself failed. Services hold an Arc<dyn MutationRecorder> and call it after a successful write:

pub async fn trash(&self, ids: Vec<MediaId>) -> Result<(), LibraryError> {
    self.repo.trash(&ids).await?;
    self.recorder.record(&Mutation::AssetTrashed { ids }).await?;
    Ok(())
}

A service has no idea what the recorder does with the mutation. It might write a row to a queue table. It might publish to a message broker. It might do absolutely nothing.
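Here is a std-only sketch of that contract, cut down so it runs without async_trait or a database. Everything in it is a simplification. The trait is synchronous, MediaId is a bare u64, LibraryError wraps a String, and the repository is an in-memory set. CountingRecorder is a hypothetical recorder that only counts calls. What it demonstrates is ordering: the recorder is reached only after the write succeeded, so a failed write records nothing.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Simplified stand-ins (hypothetical); the real types live in src/library/.
type MediaId = u64;

#[derive(Debug, Clone, PartialEq)]
enum Mutation {
    AssetTrashed { ids: Vec<MediaId> },
}

#[derive(Debug, PartialEq)]
struct LibraryError(String);

// Synchronous analogue of MutationRecorder; the real trait is async.
trait MutationRecorder: Send + Sync {
    fn record(&self, mutation: &Mutation) -> Result<(), LibraryError>;
}

// Counts calls so the recorder can be observed from outside the service.
#[derive(Default)]
struct CountingRecorder {
    calls: Mutex<usize>,
}

impl MutationRecorder for CountingRecorder {
    fn record(&self, _mutation: &Mutation) -> Result<(), LibraryError> {
        *self.calls.lock().unwrap() += 1;
        Ok(())
    }
}

struct MediaService {
    known: HashSet<MediaId>,          // assets that exist
    trashed: Mutex<HashSet<MediaId>>, // in-memory "trashed" column
    recorder: Arc<dyn MutationRecorder>,
}

impl MediaService {
    fn trash(&self, ids: Vec<MediaId>) -> Result<(), LibraryError> {
        // The "database write": reject unknown ids before touching any state.
        if let Some(missing) = ids.iter().find(|id| !self.known.contains(*id)) {
            return Err(LibraryError(format!("unknown asset {missing}")));
        }
        self.trashed.lock().unwrap().extend(ids.iter().copied());
        // Only reached once the write has succeeded.
        self.recorder.record(&Mutation::AssetTrashed { ids })
    }
}
```

Swapping CountingRecorder for any other implementation changes nothing inside MediaService::trash.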

Two strategies, one of which is empty

There are exactly two production implementations, both in src/sync/outbox/mod.rs.

The first is the entire point of the design:

pub struct NoOpRecorder;

#[async_trait]
impl MutationRecorder for NoOpRecorder {
    async fn record(&self, _mutation: &Mutation) -> Result<(), LibraryError> {
        Ok(())
    }
}

The local backend takes NoOpRecorder. Every service in the library still calls recorder.record(...) after every mutation. Those calls do nothing.

The code path is identical to the Immich one. No if backend == local anywhere in the services. No second constructor. Just a vtable dispatch and an Ok(()).

The second is the implementation that does the actual work:

pub struct QueueWriterOutbox { repo: OutboxRepository }

#[async_trait]
impl MutationRecorder for QueueWriterOutbox {
    async fn record(&self, mutation: &Mutation) -> Result<(), LibraryError> {
        for row in mutation.to_outbox_rows() {
            self.repo.insert(&row).await?;
        }
        Ok(())
    }
}

QueueWriterOutbox is used by the Immich backend. It serialises the mutation into one or more rows in a sync_outbox table. Multi-id mutations like AssetTrashed { ids: [a, b, c] } fan out to three rows, so retries can be per-entity rather than per-batch. The schema is unsurprising: id, entity_type, entity_id, action, payload (JSON), created_at, plus status, attempts, next_attempt_at, and last_error for retry bookkeeping. This is the transactional outbox pattern, more commonly seen in service-to-service messaging, applied here to client-server sync, with one small adaptation to the trait signature that I’ll come back to under Adapting the pattern.

The mapping from Mutation to OutboxRow lives in src/sync/outbox/mutation.rs and is genuinely boring — a match on the variant producing an entity_type string, an entity_id, an action verb, and an optional JSON payload. It is the kind of code that should be boring; the interesting part has already happened.
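Here is a sketch of what such a match can look like. Several parts are assumed rather than taken from the repository: the entity_type and action strings, the payload shapes, and the OutboxRow::new helper. It also covers only three variants and builds payload JSON with format! instead of a real serialiser. It shows the fan-out: a multi-id variant becomes one row per asset.

```rust
// Simplified ids (hypothetical); the real MediaId/AlbumId are project types.
type MediaId = u64;
type AlbumId = u64;

#[derive(Debug, Clone)]
enum Mutation {
    AssetTrashed { ids: Vec<MediaId> },
    AssetFavorited { ids: Vec<MediaId>, favorite: bool },
    AlbumCreated { id: AlbumId, name: String },
}

// One row per entity, so retries and failures are tracked per entity.
#[derive(Debug, PartialEq)]
struct OutboxRow {
    entity_type: &'static str,
    entity_id: String,
    action: &'static str,
    payload: Option<String>, // JSON text, when the action needs more than the id
}

impl OutboxRow {
    fn new(entity_type: &'static str, id: u64, action: &'static str, payload: Option<String>) -> Self {
        Self { entity_type, entity_id: id.to_string(), action, payload }
    }
}

impl Mutation {
    fn to_outbox_rows(&self) -> Vec<OutboxRow> {
        match self {
            // Multi-id mutations fan out: one row per asset.
            Mutation::AssetTrashed { ids } => ids
                .iter()
                .map(|id| OutboxRow::new("asset", *id, "trash", None))
                .collect(),
            Mutation::AssetFavorited { ids, favorite } => ids
                .iter()
                .map(|id| {
                    let payload = format!(r#"{{"favorite":{favorite}}}"#);
                    OutboxRow::new("asset", *id, "favorite", Some(payload))
                })
                .collect(),
            // Debug formatting quotes the name; a real serialiser would escape it as JSON.
            Mutation::AlbumCreated { id, name } => vec![OutboxRow::new(
                "album",
                *id,
                "create",
                Some(format!(r#"{{"name":{name:?}}}"#)),
            )],
        }
    }
}
```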

My favourite test in this codebase

My favourite test in Moments doesn’t assert on database state. It doesn’t assert on side effects. It asserts on a sequence of Mutations the library produced: on what the library said it did, in its own vocabulary, in order. That test works because of a third, test-only implementation of MutationRecorder, which lives in src/library/recorder.rs:

#[derive(Default)]
pub(crate) struct CapturingRecorder {
    recorded: Mutex<Vec<Mutation>>,
}

impl CapturingRecorder {
    pub(crate) fn snapshot(&self) -> Vec<Mutation> {
        self.recorded.lock().unwrap().clone()
    }
}

#[async_trait]
impl MutationRecorder for CapturingRecorder {
    async fn record(&self, mutation: &Mutation) -> Result<(), LibraryError> {
        self.recorded.lock().unwrap().push(mutation.clone());
        Ok(())
    }
}

Service tests construct a Library with Arc::new(CapturingRecorder::default()) and then assert on the recorded sequence after exercising a method. “When I trash three assets, then create an album from two of them, the recorder should see AssetTrashed followed by AlbumCreated followed by AlbumMediaAdded.” That assertion is a stronger property than “the database ends up in the right state”, because it pins down the intent surface the library exposes to anything that might eventually act on it — including providers that don’t exist yet.
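Here is a cut-down version of such a test. It assumes a synchronous recorder trait without error returns, plus a toy Library that makes the recorder calls a real service would make but stores nothing. Library, create_album_with, and the album id counter are hypothetical. CapturingRecorder mirrors the one above.

```rust
use std::sync::{Arc, Mutex};

type MediaId = u64;
type AlbumId = u64;

#[derive(Debug, Clone, PartialEq)]
enum Mutation {
    AssetTrashed { ids: Vec<MediaId> },
    AlbumCreated { id: AlbumId, name: String },
    AlbumMediaAdded { album_id: AlbumId, media_ids: Vec<MediaId> },
}

// Synchronous, infallible simplification of the real async trait.
trait MutationRecorder: Send + Sync {
    fn record(&self, mutation: &Mutation);
}

#[derive(Default)]
struct CapturingRecorder {
    recorded: Mutex<Vec<Mutation>>,
}

impl CapturingRecorder {
    fn snapshot(&self) -> Vec<Mutation> {
        self.recorded.lock().unwrap().clone()
    }
}

impl MutationRecorder for CapturingRecorder {
    fn record(&self, mutation: &Mutation) {
        self.recorded.lock().unwrap().push(mutation.clone());
    }
}

// Toy library: no storage, only the mutations a real service would record.
struct Library {
    recorder: Arc<dyn MutationRecorder>,
    next_album: Mutex<AlbumId>,
}

impl Library {
    fn trash(&self, ids: Vec<MediaId>) {
        self.recorder.record(&Mutation::AssetTrashed { ids });
    }

    // Creating an album with contents is two committed changes, so two mutations.
    fn create_album_with(&self, name: &str, media_ids: Vec<MediaId>) -> AlbumId {
        let id = {
            let mut next = self.next_album.lock().unwrap();
            *next += 1;
            *next
        };
        self.recorder.record(&Mutation::AlbumCreated { id, name: name.to_string() });
        self.recorder.record(&Mutation::AlbumMediaAdded { album_id: id, media_ids });
        id
    }
}
```

A test then builds the library around Arc::new(CapturingRecorder::default()), trashes [1, 2, 3], creates an album from [1, 2], and asserts that snapshot() returns AssetTrashed, AlbumCreated, and AlbumMediaAdded in that order.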

This is what the trait actually buys: not two production strategies, but three perspectives on the same library code — a no-op that proves the local backend doesn’t care, a queue writer that ships changes upstream, and a capturing fixture that pins down what services promise. None of them required a single change to the services themselves.

Dependency injection: sync as someone else’s problem

The wiring happens in exactly one place. Library::open takes the recorder as a parameter and threads it into every service:

pub async fn open(
    bundle: Bundle,
    mode: LocalStorageMode,
    db: Database,
    recorder: Arc<dyn MutationRecorder>,
    resolver: Arc<dyn OriginalResolver>,
) -> Result<Self, LibraryError> {
    db.open(&bundle.database.join("moments.db")).await?;

    let albums  = AlbumService::new(db.clone(), Arc::clone(&recorder));
    let faces   = FacesService::new(db.clone(), bundle.thumbnails.clone(), Arc::clone(&recorder));
    let editing = EditingService::new(db.clone(), Arc::clone(&recorder));
    let media   = MediaService::new(db.clone(), bundle.originals.clone(), mode,
                                    Arc::clone(&recorder), resolver);
    // …

    Ok(Self { albums, faces, editing, media, /* … */ recorder })
}

The local backend’s caller hands in Arc::new(NoOpRecorder). The Immich backend’s caller hands in Arc::new(QueueWriterOutbox::new(db.clone())), and separately starts a background PushManager that drains the same table.

That single substitution — the choice of recorder at the call site — is the only thing in the system that knows whether sync is happening. Everything below it is identical. From the library’s point of view, sync is someone else’s problem: it dutifully describes what it changed and trusts that someone, somewhere, may care.
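Here is a sketch of that single point of choice. Backend, recorder_for, and QueueWriter are all hypothetical; QueueWriter is an in-memory stand-in for QueueWriterOutbox. The trait is synchronous for brevity. The free function trash plays the role of library code: it receives a &dyn MutationRecorder and cannot tell which one it got.

```rust
use std::sync::{Arc, Mutex};

type MediaId = u64;

#[derive(Debug, Clone)]
enum Mutation {
    AssetTrashed { ids: Vec<MediaId> },
}

trait MutationRecorder: Send + Sync {
    fn record(&self, mutation: &Mutation);
}

struct NoOpRecorder;

impl MutationRecorder for NoOpRecorder {
    fn record(&self, _mutation: &Mutation) {}
}

// Stand-in for QueueWriterOutbox: a shared Vec instead of the sync_outbox table.
struct QueueWriter {
    queue: Arc<Mutex<Vec<Mutation>>>,
}

impl MutationRecorder for QueueWriter {
    fn record(&self, mutation: &Mutation) {
        self.queue.lock().unwrap().push(mutation.clone());
    }
}

// Hypothetical selector; in Moments each backend's caller makes this choice.
enum Backend {
    Local,
    Immich { outbox: Arc<Mutex<Vec<Mutation>>> },
}

// The only code that knows whether sync exists.
fn recorder_for(backend: &Backend) -> Arc<dyn MutationRecorder> {
    match backend {
        Backend::Local => Arc::new(NoOpRecorder),
        Backend::Immich { outbox } => Arc::new(QueueWriter { queue: Arc::clone(outbox) }),
    }
}

// "Library code": identical whichever recorder it is handed.
fn trash(recorder: &dyn MutationRecorder, ids: Vec<MediaId>) {
    // ...the repository write would happen here...
    recorder.record(&Mutation::AssetTrashed { ids });
}
```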

The drain side

For completeness: src/sync/providers/immich/push.rs is the PushManager. It is a long-running Tokio task that wakes up on an interval, reads up to a hundred pending rows from sync_outbox, deserialises each back into an OutboxMutation, makes the appropriate Immich API call, and marks the row done or failed. There is exponential backoff on failure, a DeadLetter status after ten consecutive failures, and a MAX_BACKOFF_SECS cap so attempt ten doesn’t sleep for seventeen hours.
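Here is a sketch of that retry bookkeeping. The post only says there is exponential backoff, a MAX_BACKOFF_SECS cap, and a DeadLetter status after ten consecutive failures. Everything else here is assumed: the 60-second base, the one-hour cap value, the doubling formula, and the names backoff_secs and after_attempt. With these assumed numbers, an uncapped tenth attempt would wait 60 × 2^10 = 61,440 seconds, roughly seventeen hours, which is the kind of sleep the cap exists to prevent.

```rust
// Assumed values: the post names MAX_BACKOFF_SECS but not its value or the base delay.
const BASE_BACKOFF_SECS: u64 = 60;
const MAX_BACKOFF_SECS: u64 = 60 * 60;
const MAX_ATTEMPTS: u32 = 10;

#[derive(Debug, PartialEq)]
enum RowStatus {
    Done,
    Failed { retry_in_secs: u64 },
    DeadLetter,
}

// Delay after `failures` consecutive failures: doubles each time, then caps.
// The shift is clamped at 32, so the multiplication can never overflow a u64.
fn backoff_secs(failures: u32) -> u64 {
    (BASE_BACKOFF_SECS << failures.min(32)).min(MAX_BACKOFF_SECS)
}

// New status of a row after one push attempt, given its earlier failure count.
fn after_attempt(prior_failures: u32, succeeded: bool) -> RowStatus {
    if succeeded {
        return RowStatus::Done;
    }
    let failures = prior_failures + 1;
    if failures >= MAX_ATTEMPTS {
        RowStatus::DeadLetter // stop retrying; leave the row for inspection
    } else {
        RowStatus::Failed { retry_in_secs: backoff_secs(failures) }
    }
}
```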

It is the side of the system that has to know about HTTP, retries, idempotency, and Immich-specific endpoints. It is also the side that doesn’t exist at all when you’re running the local backend, because nobody constructed it.

The pull side (pull.rs + handlers/) is the reverse direction — POST /sync/stream streams server-side changes back into the local DB. It does not go through the recorder: pull-driven changes are server-originated, and recording them would feed the server’s own news straight back into the outbox heading the other way.
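Here is a minimal illustration of why pull bypasses the recorder. It uses a hypothetical in-memory LocalDb whose outbox is a Vec. User-originated writes take the recording path. Server-originated ones write the same state without enqueuing anything, so a change pulled from the server is never pushed back to it.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

type MediaId = u64;

// Hypothetical stand-ins for the local DB and the sync_outbox table.
#[derive(Default)]
struct LocalDb {
    trashed: Mutex<HashSet<MediaId>>,
    outbox: Mutex<Vec<(MediaId, &'static str)>>,
}

impl LocalDb {
    // User-originated: write, then record, so the server hears about it later.
    fn trash_local(&self, id: MediaId) {
        self.trashed.lock().unwrap().insert(id);
        self.outbox.lock().unwrap().push((id, "trash"));
    }

    // Server-originated (pull): write only. Recording here would queue a push
    // telling the server something it has just told us.
    fn apply_remote_trash(&self, id: MediaId) {
        self.trashed.lock().unwrap().insert(id);
    }
}
```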

Adapting the pattern

The classical transactional outbox pattern — the one I borrowed the name and the table from — requires the business write and the outbox INSERT to share a single database transaction. That’s the whole point of the pattern: if the entity write commits, the outbox row commits with it; if either fails, both roll back. Atomicity is what guarantees the consumer eventually sees every committed change.

Moments needs that guarantee, non-negotiably. If trash returns success to the caller, the outbox row must be there waiting for the push manager. Otherwise the local state silently drifts ahead of the server: the user trashes a photo, the desktop hides it, the server never hears about it, and the next pull cycle dutifully restores it. A sync layer that loses writes is worse than no sync layer at all — once the user catches it doing that, they stop trusting it for everything else.

The adaptation is small but load-bearing. The trait signature I’ve shown so far is simplified for exposition; the design calls for one that takes a transaction handle, threaded through all three implementations. Services own the transaction lifecycle, and both the repository write and the recorder call happen inside the same transaction:

#[async_trait]
pub trait MutationRecorder: Send + Sync {
    async fn record(
        &self,
        tx: &mut sqlx::Transaction<'_, sqlx::Sqlite>,
        mutation: &Mutation,
    ) -> Result<(), LibraryError>;
}

// in MediaService:
pub async fn trash(&self, ids: &[MediaId]) -> Result<(), LibraryError> {
    let mut tx = self.db.begin().await?;
    self.repo.trash_in_tx(&mut tx, ids).await?;
    self.recorder
        .record(&mut tx, &Mutation::AssetTrashed { ids: ids.to_vec() })
        .await?;
    tx.commit().await?;
    Ok(())
}

QueueWriterOutbox::record uses the transaction handle to insert into sync_outbox inside the same transaction the service is about to commit. If the insert fails, the ? propagates and tx.commit() is never called. sqlx rolls back a Transaction that is dropped without being committed, so the business write rolls back with it. The local database and the outbox share a fate by construction: neither can be ahead of the other.
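Here is a std-only model of that shared fate. Db and Tx are hypothetical in-memory stand-ins for SQLite and sqlx::Transaction. Tx buffers writes, and commit applies them to both tables. Dropping a Tx without committing discards everything, which is how an uncommitted sqlx transaction behaves when it is dropped. The mutation is reduced to a slice of trashed ids, and QueueWriter's fail flag simulates an INSERT error.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for the database: two committed tables.
#[derive(Default)]
struct Db {
    trashed: RefCell<Vec<u64>>,
    outbox: RefCell<Vec<String>>,
}

// Buffers writes; they reach Db only through commit().
struct Tx<'a> {
    db: &'a Db,
    trashed: Vec<u64>,
    outbox: Vec<String>,
}

impl Db {
    fn begin(&self) -> Tx<'_> {
        Tx { db: self, trashed: Vec::new(), outbox: Vec::new() }
    }
}

impl Tx<'_> {
    // Both tables become visible together. A Tx dropped without commit() is discarded.
    fn commit(self) {
        self.db.trashed.borrow_mut().extend(self.trashed);
        self.db.outbox.borrow_mut().extend(self.outbox);
    }
}

trait MutationRecorder {
    fn record(&self, tx: &mut Tx<'_>, trashed_ids: &[u64]) -> Result<(), String>;
}

struct QueueWriter {
    fail: bool, // simulates the outbox INSERT failing
}

impl MutationRecorder for QueueWriter {
    fn record(&self, tx: &mut Tx<'_>, trashed_ids: &[u64]) -> Result<(), String> {
        if self.fail {
            return Err("outbox insert failed".into());
        }
        tx.outbox.extend(trashed_ids.iter().map(|id| format!("asset:{id}:trash")));
        Ok(())
    }
}

fn trash(db: &Db, recorder: &dyn MutationRecorder, ids: &[u64]) -> Result<(), String> {
    let mut tx = db.begin();
    tx.trashed.extend_from_slice(ids); // business write, inside the transaction
    recorder.record(&mut tx, ids)?;    // on error, tx is dropped and nothing commits
    tx.commit();
    Ok(())
}
```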

NoOpRecorder ignores the transaction parameter entirely. The local backend pays for opening and committing a transaction the recorder didn’t touch — a small, fixed cost on every mutation, in exchange for the library never having to know which recorder it’s been given. That’s the price of the abstraction, and it’s the only price the abstraction asks the local backend to pay.

An honest aside, because the timing is too good to leave out. I drafted this section assuming the code matched the design above. Then I went to check. It didn’t. The actual service code commits the repository write and then calls the recorder in a second, separate await — and at most call sites the recorder error is logged and swallowed rather than propagated. A recorder failure can silently leave the local state ahead of the outbox, which is exactly the failure mode this section claims is impossible by construction. I filed the bug while finishing this paragraph.
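The failure mode described in the aside is easy to reproduce in a toy setting. This is a sketch of the shape described (commit first, record second, log and swallow the error), not the Moments code. Db, trash_two_step, and drift are hypothetical, and each table commits as soon as it is written.

```rust
use std::cell::RefCell;

// Hypothetical stand-ins: each Vec is a table that commits on write.
#[derive(Default)]
struct Db {
    trashed: RefCell<Vec<u64>>,
    outbox: RefCell<Vec<u64>>,
}

// Commit first, record second, log and swallow any recorder error.
fn trash_two_step(db: &Db, outbox_fails: bool, ids: &[u64]) -> Result<(), String> {
    db.trashed.borrow_mut().extend_from_slice(ids); // already committed
    let recorded: Result<(), String> = if outbox_fails {
        Err("outbox insert failed".into())
    } else {
        db.outbox.borrow_mut().extend_from_slice(ids);
        Ok(())
    };
    if let Err(e) = recorded {
        eprintln!("recorder failed: {e}"); // logged...
    }
    Ok(()) // ...and swallowed: the caller sees success either way
}

// Locally trashed ids that the push side will never hear about.
fn drift(db: &Db) -> Vec<u64> {
    let outbox = db.outbox.borrow();
    let trashed = db.trashed.borrow();
    trashed.iter().copied().filter(|id| !outbox.contains(id)).collect()
}
```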

The design above is the design. The code is catching up.

Adding a third backend

The shape of the substitution generalises beyond Immich. Adding a Nextcloud backend tomorrow would touch zero lines in src/library/. The services already produce Mutations. The outbox table is provider-agnostic. The new code lives entirely in src/sync/providers/nextcloud/: an HTTP client, a push manager that drains the outbox into Nextcloud’s API, and a pull manager that streams server-side changes back into the local DB. The wiring in main swaps QueueWriterOutbox in as the recorder, starts the Nextcloud managers, and is done.
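Here is a sketch of the part that would stay shared between providers. It assumes a hypothetical PushProvider trait that each backend implements, plus a drain function that turns each pending row into a per-row status. None of these names come from the Moments source. FakeNextcloud is a stand-in that fails one action so that both outcomes appear. A real drain would also update attempts and next_attempt_at.

```rust
use std::collections::HashMap;

// Hypothetical provider-agnostic outbox entry.
#[derive(Debug, Clone, PartialEq)]
struct OutboxRow {
    id: u64,
    entity_id: String,
    action: &'static str,
}

#[derive(Debug, PartialEq)]
enum Status {
    Done,
    Failed(String),
}

// What each new backend would implement (hypothetical).
trait PushProvider {
    fn push(&mut self, row: &OutboxRow) -> Result<(), String>;
}

// Shared drain step: knows rows and statuses, nothing about HTTP.
fn drain(provider: &mut dyn PushProvider, pending: &[OutboxRow]) -> HashMap<u64, Status> {
    pending
        .iter()
        .map(|row| {
            let status = match provider.push(row) {
                Ok(()) => Status::Done,
                Err(e) => Status::Failed(e),
            };
            (row.id, status)
        })
        .collect()
}

// Fake client that rejects one action so both outcomes are visible.
struct FakeNextcloud {
    pushed: Vec<String>,
}

impl PushProvider for FakeNextcloud {
    fn push(&mut self, row: &OutboxRow) -> Result<(), String> {
        if row.action == "stack" {
            return Err("unsupported action".into());
        }
        self.pushed.push(format!("{} {}", row.action, row.entity_id));
        Ok(())
    }
}
```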

That doesn’t make a new backend free — push and pull are still real work, with their own retry semantics and identity gotchas — but it does make the cost proportional to what’s actually different. The 99% of the app that doesn’t care which server it’s talking to doesn’t have to be told.

Why a trait, and not a channel

The same job could be done with an mpsc::UnboundedSender<Mutation>: services hold a sender and push mutations into it, and a consumer task somewhere pulls them out. A channel can even imitate NoOpRecorder by never spawning a consumer, though not for free. If the receiver is kept alive, the sends buffer without limit. If it is dropped, every send returns an error that each call site has to ignore.

But a channel ties the producer and the consumer to a specific lifecycle. Who closes the sender? When? What happens when a bounded buffer fills, or an unbounded one keeps growing? Is the consumer guaranteed to run, and if not, are the dropped messages a bug or a feature? The trait sidesteps all of this by making “what to do with a mutation” a decision that is made, and awaited, at the point of production. NoOpRecorder returns immediately. QueueWriterOutbox writes a row and returns. CapturingRecorder pushes onto a vec and returns. There is no pending state, no backpressure, no “what if the consumer crashed” failure mode lurking in the design. The service doesn’t have to reason about whether its mutation was received, because reception is part of the same await as the rest of the operation.
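Here are the two ways a channel can imitate a no-op recorder. The sketch uses std::sync::mpsc in place of tokio's unbounded channel; both buffer without bound while the receiver lives and reject sends once it is dropped. The function names are hypothetical. What they show is that neither option is really a no-op: one keeps every mutation in memory, and the other turns every send into an error.

```rust
use std::sync::mpsc;

#[derive(Debug)]
enum Mutation {
    AssetTrashed { ids: Vec<u64> },
}

// Option 1: keep the receiver alive and never read it. Every send succeeds,
// and every mutation stays buffered for as long as the receiver exists.
fn buffered_forever(n: usize) -> (mpsc::Receiver<Mutation>, usize) {
    let (tx, rx) = mpsc::channel();
    let mut accepted = 0;
    for i in 0..n {
        if tx.send(Mutation::AssetTrashed { ids: vec![i as u64] }).is_ok() {
            accepted += 1;
        }
    }
    (rx, accepted)
}

// Option 2: drop the receiver. Every send now fails, and each call site
// must decide whether that error is a bug or expected.
fn receiver_dropped() -> bool {
    let (tx, rx) = mpsc::channel::<Mutation>();
    drop(rx);
    tx.send(Mutation::AssetTrashed { ids: vec![1] }).is_err()
}
```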

A Box<dyn Fn(Mutation)> closure could work too, and would even be slightly cheaper. The reason the trait wins isn’t performance; it’s naming. NoOpRecorder, QueueWriterOutbox, CapturingRecorder — each one is a documented, importable type that says what role it plays. A closure is anonymous; it can do anything; you cannot grep for who’s using it. The trait makes the contract a concrete thing that can be referred to in design documents and PR descriptions, and the production implementations a fixed, small set that anyone can enumerate.

The verb “record” matters too. “Emit” or “publish” or “broadcast” would have invited fan-out — and fan-out is exactly the property Mutation has been built to avoid. There is one recipient, statically chosen at Library::open time. “Record” implies durability and report-after-the-fact, which is what services actually do: they write to the database, then they describe what they wrote. The naming makes the contract harder to violate.

What I’d tell myself

If I were starting Moments again, I would write MutationRecorder and NoOpRecorder on day one, before there was anything that needed syncing, and have services call the recorder from the start. The local backend would carry a NoOpRecorder it would never need, and when the Immich backend arrived months later, nothing in the services would have to change. That’s the value the trait actually provides: not a separation of concerns at the API surface, but the freedom to add an entire concern later without rewriting what’s already there.

The library doesn’t know sync exists. grep -rn 'crate::sync' src/library/ returns six matches: five are #[cfg(test)] imports of NoOpRecorder from test helpers, one a comment pointer. Production imports: zero. From the inside, the local backend and the Immich backend are the same program; the only difference is what an Arc<dyn …> happens to do.
