Structural fix on services/mongodb-api/src/utils/tokenHoldings.ts:reconcileHoldings. The previous version used deleteMany({ tokenMint, userPublicKey: { $nin: activeUsers } }) to remove any user not in the freshly-computed set on every cron run. That made every transient RPC failure a destructive write — if a single Helius 429 caused a stream to silently drop from positiveHoldings, the user vanished from token_holdings until the next successful run re-inserted them. Combined with the cron's high-concurrency RPC pattern, this produced the user-visible flapping that motivated v0.3.4 (naive retry) and v0.3.5 (rate-limit-aware retry).
Retries reduce the rate at which the destructive write fires; soft-delete makes the destructive write structurally impossible from a single bad tick.
What changed
reconcileHoldings now writes in three steps per call:
- Upsert positive holdings, resetting consecutiveMissing: 0. (The reset is the critical part: without it, a flapping user could carry a stale miss count across cycles.)
- Increment consecutiveMissing for every existing doc on this token whose user is NOT in the current positive set. $inc on a missing field starts from 0, so this is backwards-compatible with pre-soft-delete docs.
- Hard-delete only entries whose consecutiveMissing >= SOFT_DELETE_THRESHOLD (currently 3).
Result shape gains a new incremented field; existing fields (upserted, removed, totalHolders) keep their meaning. The Vercel cron's typed flyApiPost<{...}> ignores the extra field — no compatibility shim needed.
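As a rough illustration of the three steps, the sketch below builds the filter and update documents each step would send to MongoDB. It is not the code in tokenHoldings.ts: buildReconcileOps and the ReconcileOps type are hypothetical names, and the real upsert presumably sets other holding fields that are omitted here.

```typescript
// Sketch only: builds the MongoDB filter/update documents for the three steps.
// Field names come from the description; function and type names are hypothetical.

const SOFT_DELETE_THRESHOLD = 3; // "currently 3" per the description

interface ReconcileOps {
  upserts: Array<{
    updateOne: {
      filter: { userPublicKey: string; tokenMint: string };
      update: { $set: { consecutiveMissing: number } };
      upsert: true;
    };
  }>;
  incrementMissing: {
    filter: { tokenMint: string; userPublicKey: { $nin: string[] } };
    update: { $inc: { consecutiveMissing: number } };
  };
  purge: {
    filter: { tokenMint: string; consecutiveMissing: { $gte: number } };
  };
}

function buildReconcileOps(
  tokenMint: string,
  activeUsers: string[],
  threshold: number = SOFT_DELETE_THRESHOLD,
): ReconcileOps {
  return {
    // Step 1: one upsert per positive holder, always resetting the miss counter.
    // (Other holding fields would be set here too; omitted in this sketch.)
    upserts: activeUsers.map((userPublicKey) => ({
      updateOne: {
        filter: { userPublicKey, tokenMint },
        update: { $set: { consecutiveMissing: 0 } },
        upsert: true as const,
      },
    })),
    // Step 2: every doc on this token whose user was NOT seen gets +1.
    incrementMissing: {
      filter: { tokenMint, userPublicKey: { $nin: activeUsers } },
      update: { $inc: { consecutiveMissing: 1 } },
    },
    // Step 3: hard-delete only docs that have reached the threshold.
    purge: {
      filter: { tokenMint, consecutiveMissing: { $gte: threshold } },
    },
  };
}
```

With an empty activeUsers array the step 2 filter becomes $nin: [], which matches every doc for the token. That corresponds to the "0 positive holdings" case: everyone is incremented once and nobody is deleted on that tick unless they were already one miss short of the threshold.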
Failure-mode comparison
| Scenario | Old behavior | New behavior |
|---|---|---|
| RPC succeeds for all streams | User stays in DB | Same — consecutiveMissing reset to 0 |
| 1 transient RPC failure → user missing this run | Hard-deleted; vanishes from leaderboard | consecutiveMissing: 1, still in DB, still ranked |
| User missing 2 consecutive runs | Already gone | consecutiveMissing: 2, still in DB |
| User missing N≥3 consecutive runs | Already gone | Hard-deleted (genuinely closed position) |
| 0 positive holdings this run (e.g., total RPC outage) | All users for token nuked | All users get +1; no one purged unless threshold reached over multiple ticks |
At threshold=3 and a ~5 min cron cadence, a genuinely closed position is purged within ~15 min. That is slower than before, when removal was immediate, but in exchange the table gets full immunity to single-tick RPC failures, which are the overwhelmingly more common case in practice.
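The table and the timing estimate can be replayed with a small in-memory model of one holder's document across cron ticks. This is only a sketch that mirrors the three steps as described. HolderState, applyTick and simulate are hypothetical names, and the 5-minute cadence is the approximate figure from the text.

```typescript
// In-memory model of a single holder across cron ticks (not the service code).

const SOFT_DELETE_THRESHOLD = 3; // value stated in the description
const CRON_CADENCE_MINUTES = 5; // "~5min" in the text; approximate

interface HolderState {
  inDb: boolean;
  consecutiveMissing: number;
}

function applyTick(
  state: HolderState,
  seen: boolean,
  threshold: number = SOFT_DELETE_THRESHOLD,
): HolderState {
  if (seen) {
    // Step 1: upsert (re-inserting if needed) and reset the miss counter.
    return { inDb: true, consecutiveMissing: 0 };
  }
  if (!state.inDb) {
    return state; // already purged; there is no doc to increment
  }
  const misses = state.consecutiveMissing + 1; // Step 2: increment
  // Step 3: the doc survives only while it is below the threshold.
  return { inDb: misses < threshold, consecutiveMissing: misses };
}

function simulate(
  seenPerTick: boolean[],
  threshold: number = SOFT_DELETE_THRESHOLD,
): HolderState[] {
  let state: HolderState = { inDb: true, consecutiveMissing: 0 };
  return seenPerTick.map((seen) => (state = applyTick(state, seen, threshold)));
}

// Ticks needed to purge a genuinely closed position, converted to minutes.
const purgeLatencyMinutes = SOFT_DELETE_THRESHOLD * CRON_CADENCE_MINUTES;

// One transient miss, a recovery, then three consecutive misses.
const timeline = simulate([false, true, false, false, false]);
console.log(timeline.map((s) => `${s.inDb ? "kept" : "purged"}:${s.consecutiveMissing}`));
console.log(`purge latency: ~${purgeLatencyMinutes} min`);
```

In this run the holder survives the first miss, is reset to 0 by the successful tick, and is removed only on the third consecutive miss.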
Schema migration
TokenHoldingDocument.consecutiveMissing is added as optional (?: number). Pre-existing docs lack the field; MongoDB's $inc on a missing field starts from 0, so the next miss for an old doc sets it to 1, the next to 2, etc. No batch migration script required — the field populates lazily as docs cycle.
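A minimal sketch of the lazy migration, assuming a trimmed-down document type (the real TokenHoldingDocument has more fields than shown). incMissing is a local stand-in for what $inc does server-side, and missCount is a hypothetical read helper that treats an absent field as 0.

```typescript
// Trimmed-down document shape; only the fields relevant to the migration.
interface TokenHoldingDocument {
  userPublicKey: string;
  tokenMint: string;
  consecutiveMissing?: number; // absent on pre-soft-delete docs
}

// Local stand-in for MongoDB's { $inc: { consecutiveMissing: 1 } }:
// a missing field behaves as if it were 0 before the increment.
function incMissing(doc: TokenHoldingDocument): TokenHoldingDocument {
  return { ...doc, consecutiveMissing: (doc.consecutiveMissing ?? 0) + 1 };
}

// Readers should apply the same default when the field is absent.
function missCount(doc: TokenHoldingDocument): number {
  return doc.consecutiveMissing ?? 0;
}

// A doc written before this change has no counter at all.
const legacyDoc: TokenHoldingDocument = { userPublicKey: "userA", tokenMint: "MINT" };
const afterOneMiss = incMissing(legacyDoc);
const afterTwoMisses = incMissing(afterOneMiss);
console.log(missCount(legacyDoc), missCount(afterOneMiss), missCount(afterTwoMisses));
```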
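If for any reason you need to force a purge by hand, run db.token_holdings.deleteMany({ consecutiveMissing: { $exists: true, $gte: 3 } }). Note that this filter is not scoped to a tokenMint, so it removes every doc at or past the threshold across all tokens.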
Index considerations
The two existing indexes still cover the existing access patterns:
{ userPublicKey: 1, tokenMint: 1 } (unique) — covers Step 1 upserts
{ tokenMint: 1, rank: 1 } — covers leaderboard reads
Step 2's updateMany({ tokenMint, userPublicKey: { $nin: [...] } }) uses the unique compound index. Step 3's deleteMany({ tokenMint, consecutiveMissing: { $gte: SOFT_DELETE_THRESHOLD } }) does a tokenMint-scoped scan; at the current ~76-row scale this is trivial. If the collection grows past tens of thousands of rows per token, add { tokenMint: 1, consecutiveMissing: 1 } opportunistically; it is not needed today.
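If that index is ever added, it could look roughly like the sketch below. IndexableCollection is a minimal structural type modeled on the Node.js driver's Collection.createIndex(spec, options) so the snippet compiles without the mongodb package. ensureSoftDeleteIndex and the explicit index name are hypothetical, and how the service actually registers indexes is not shown in this description.

```typescript
// Minimal structural type modeled on the driver's createIndex signature.
interface IndexableCollection {
  createIndex(
    spec: Record<string, 1 | -1>,
    options?: { name?: string },
  ): Promise<string>;
}

// Equality on tokenMint first, then the range predicate on consecutiveMissing.
const SOFT_DELETE_INDEX: Record<string, 1 | -1> = {
  tokenMint: 1,
  consecutiveMissing: 1,
};

// Hypothetical explicit name; it follows MongoDB's default field_direction naming.
const SOFT_DELETE_INDEX_NAME = "tokenMint_1_consecutiveMissing_1";

// createIndex is a no-op when an identical index already exists,
// so this is safe to call on every service start.
function ensureSoftDeleteIndex(collection: IndexableCollection): Promise<string> {
  return collection.createIndex(SOFT_DELETE_INDEX, { name: SOFT_DELETE_INDEX_NAME });
}
```

The field order matters: Step 3 filters by equality on tokenMint and by range on consecutiveMissing, so tokenMint comes first.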
Deploy
services/mongodb-api/ is a separate Fly.io service. After this commit lands on main, ship it with:
cd services/mongodb-api
fly deploy
The Vercel app (StreamlockFun frontend) does not need a separate redeploy for this change — the cron route's contract with the upstream is unchanged.
Verification
After deploying, trigger the cron and confirm the response includes incremented:
curl -H "Authorization: Bearer $CRON_SECRET" \
"https://app.streamlock.fun/api/cron/reconcile-holdings"
# expect each row to include incremented: <N>
If you want to verify the soft-delete actually fires, induce a temporary RPC outage (or run the cron with a bad RPC URL once) and confirm the leaderboard count does NOT drop. Then run with the good RPC again and confirm it stays at the correct count without flapping.
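The response check can also be scripted. The sketch below assumes each row carries the four numeric fields named earlier (upserted, incremented, removed, totalHolders). The envelope around those rows is not specified in this description, so the caller is expected to extract the rows first. isReconcileResult and invalidRowIndexes are hypothetical helper names.

```typescript
// Per-token result shape, using the field names from the description.
interface ReconcileResult {
  upserted: number;
  incremented: number;
  removed: number;
  totalHolders: number;
}

const REQUIRED_FIELDS = ["upserted", "incremented", "removed", "totalHolders"] as const;

// Type guard: true only when every expected field is present and numeric.
function isReconcileResult(row: unknown): row is ReconcileResult {
  if (typeof row !== "object" || row === null) return false;
  const record = row as Record<string, unknown>;
  return REQUIRED_FIELDS.every((field) => typeof record[field] === "number");
}

// Returns the positions of rows that fail the check, e.g. rows produced by
// a service build that predates the incremented field.
function invalidRowIndexes(rows: unknown[]): number[] {
  return rows
    .map((row, index) => (isReconcileResult(row) ? -1 : index))
    .filter((index) => index >= 0);
}

// Example with made-up rows: the second one lacks `incremented`.
const sampleRows: unknown[] = [
  { upserted: 2, incremented: 1, removed: 0, totalHolders: 76 },
  { upserted: 2, removed: 0, totalHolders: 76 },
];
console.log(invalidRowIndexes(sampleRows));
```

An empty result means every row includes incremented, which is a quick way to confirm the new Fly.io build is the one answering the cron.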
Follow-ups not in this change
- Add retryWithBackoff to the four mainnet-tx crons (lst-stake-crank, lst-unstake-crank, accumulator-distribute, timelock-settle-crank). Cosmetic now that soft-delete handles the corruption case, but reduces ops noise.
- Audit other destructive writes in services/mongodb-api/src/utils/*.ts for the same (compute → deleteMany($nin) → insert) pattern. None spotted on a quick grep, but worth a careful pass.
- Eventually wrap Connection in a factory that auto-retries getAccountInfo / getMultipleAccountsInfo / getProgramAccounts so future contributors can't forget. Higher effort; deferred.