01 · The Problem
Three entirely different payment problems, one system
The product is a B2B SaaS platform serving professional coaches worldwide. When I joined, the billing architecture had three separate, largely disconnected payment concerns that needed to be unified, hardened, and made production-reliable.
The fundamental challenge was that these three domains had different requirements, different provider accounts, different failure modes, and different compliance concerns. Yet all of them lived inside one Rails application and had to coexist without bleeding into each other.
What the system needed to solve — all at once
- Stripe webhooks firing multiple times for the same event, causing duplicate subscription activations, plan changes, and invoice records
- No clean separation between platform Stripe credentials and connected coach Stripe accounts — risk of cross-coach payment leakage
- Race conditions during subscription checkout where two simultaneous browser tabs could create two subscriptions for one coach
- Failed SaaS payments needed graceful degradation: keep access alive during retries and terminate only after all retries are exhausted
- Plan upgrades with proration were billed incorrectly when the interval changed from monthly to annual
- PayPal execute callbacks had no idempotency — network retries could mark the same invoice paid twice
02 · Architecture
Clean isolation across three billing domains
The first architectural decision was absolute: platform billing and marketplace billing must never share credentials or state. A bug in one domain must never affect the other. Here is how the system is structured:
PLATFORM SAAS BILLING (coach pays platform)
  SubscriptionsController
    → Stripe Platform Account (credentials.stripe_secret_key)
    → business_plans (local subscription state)
    → stripe_invoices (local invoice cache)
    → payment_methods (card metadata only, never full card)
  StripeWebhooksController (POST /stripe_webhook)
    → processed_stripe_events (idempotency: unique stripe_event_id)
    → StripeBusinessPlanSyncService

STRIPE CONNECT MARKETPLACE (client pays coach)
  StripeController / PaymentsController
    → omini_auths (provider: stripe_connect, token, o_uid)
    → PaymentService → StripePaymentService
    → api_key = coach_token (never the platform key)
    → stripe_account: coach_o_uid (full isolation)
  StripeWebhooksController (POST /stripe_connect_webhook)
    → invoices / payment_histories / client_subscriptions

PAYPAL MARKETPLACE (client pays coach via PayPal)
  PaypalController
    → omini_auths (provider: paypal, email)
    → PaypalCheckoutService (Express Checkout)
    → payment_histories (first_or_create on payment_token)
    → invoices / event_bookings

SHARED LAYER (common to all domains)
  → invoices (coach-created, provider-agnostic)
  → payment_histories (all successful payments)
  → invoice_histories (immutable audit trail)
Two separate webhook endpoints, /stripe_webhook and /stripe_connect_webhook, each verify events against their own signing secret, so a connected account event can never accidentally trigger platform billing logic. The boundary is enforced by Stripe's endpoint configuration and signature verification, not just by conditionals in application code.
03 · The Hard Parts
Five engineering problems that actually mattered
Problem 1: Webhook idempotency. Stripe retries failed webhook deliveries for up to 72 hours. Without idempotency, a network timeout after processing but before responding would trigger the same subscription creation twice: activating two plans, sending two emails, and granting double SMS credits.
# Insert FIRST: process ONLY if the insert succeeds
begin
  ProcessedStripeEvent.create!(
    stripe_event_id: event.id,
    event_type: event.type,
    stripe_created_at: Time.at(event.created)
  )
rescue ActiveRecord::RecordNotUnique
  # Unique index on stripe_event_id: the event was already processed,
  # so acknowledge with 200 OK silently and run no side effects
  return head :ok
end

Billing::StripeBusinessPlanSyncService.new(event).handle
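The insert-first ordering can be modeled without a database. In this minimal plain-Ruby sketch, a Mutex-guarded `Set` stands in for the unique index on `stripe_event_id` and `DuplicateEvent` stands in for `ActiveRecord::RecordNotUnique`; `ProcessedEventStore` and `handle_webhook` are hypothetical names, not the production classes.

```ruby
require "set"

# Stand-in for ActiveRecord::RecordNotUnique (illustrative only)
class DuplicateEvent < StandardError; end

# Simulates processed_stripe_events with a unique index on stripe_event_id
class ProcessedEventStore
  def initialize
    @ids = Set.new
    @lock = Mutex.new
  end

  # Atomic insert: raises if the id already exists, like a unique index violation
  def insert!(event_id)
    @lock.synchronize do
      raise DuplicateEvent, event_id unless @ids.add?(event_id)
    end
  end
end

# Hypothetical handler: insert first, run side effects only if the insert succeeded
def handle_webhook(store, event_id, &side_effect)
  store.insert!(event_id)
  side_effect.call
  :processed
rescue DuplicateEvent
  :already_processed # the real controller still answers 200 OK here
end

store = ProcessedEventStore.new
activations = 0
3.times { handle_webhook(store, "evt_123") { activations += 1 } }
puts activations # => 1
```

One trade-off of inserting first: if the side effect raises after the insert succeeds, the event is already recorded and Stripe's retry will be skipped. Wrapping the insert and the database side effects in a single transaction is one way to let such a failure roll back the marker too.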
Problem 2: Race condition during checkout. A coach opening two browser tabs simultaneously could both reach the Stripe checkout creation endpoint before either tab received a response — creating two Stripe subscriptions billed on day one.
current_user.with_lock do
  # Reuse an unexpired pending checkout session
  # (&. guards against a session id saved without an expiry)
  if current_user.pending_checkout_session_id.present? &&
     current_user.checkout_session_expires_at&.future?
    return render_existing_session
  end

  # Block if a live Stripe subscription already exists
  ensure_no_live_stripe_subscription!(customer)

  # Only now: create a new Checkout Session
  session = Stripe::Checkout::Session.create(...)
  current_user.update!(pending_checkout_session_id: session.id, ...)
end
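The effect of the lock plus session reuse is easiest to see with two concurrent "tabs". This sketch assumes a `Mutex` is a fair stand-in for `with_lock` (which locks the user row with `SELECT ... FOR UPDATE`) and a counter for `Stripe::Checkout::Session.create`; `CheckoutService`, `CheckoutUser`, and the 30-minute expiry are illustrative.

```ruby
# Hypothetical stand-in for the user row's checkout columns
CheckoutUser = Struct.new(:pending_session_id, :session_expires_at)

class CheckoutService
  attr_reader :sessions_created

  def initialize
    @row_lock = Mutex.new   # stand-in for current_user.with_lock
    @sessions_created = 0   # stand-in for Stripe::Checkout::Session.create calls
  end

  def checkout!(user, now: Time.now)
    @row_lock.synchronize do
      # Reuse an unexpired pending session instead of creating a second one
      if user.pending_session_id && user.session_expires_at && user.session_expires_at > now
        return user.pending_session_id
      end

      sleep 0.01 # simulate Stripe API latency while the lock is held
      @sessions_created += 1
      user.pending_session_id = "cs_#{@sessions_created}"
      user.session_expires_at = now + 30 * 60 # assumed 30-minute window for the sketch
      user.pending_session_id
    end
  end
end

user = CheckoutUser.new(nil, nil)
service = CheckoutService.new
tabs = 2.times.map { Thread.new { service.checkout!(user) } }
ids = tabs.map(&:value)
p ids                          # => ["cs_1", "cs_1"]
puts service.sessions_created  # => 1
```

Holding the lock across the Stripe call is what serializes the two tabs; the cost is that a slow API response also blocks other writes to that user row until it returns.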
Problem 3: Plan upgrade proration with interval change. Upgrading from monthly to annual required resetting the billing cycle anchor to now — otherwise Stripe prorated against the old monthly anchor, generating incorrect invoices.
interval_changed = current_plan.interval != new_plan.interval

stripe_subscription = Stripe::Subscription.update(
  stripe_subscription_id,
  items: rebuilt_items,
  proration_behavior: 'always_invoice',
  payment_behavior: 'pending_if_incomplete',
  # Reset the anchor when the interval changes so proration starts from today
  billing_cycle_anchor: interval_changed ? 'now' : 'unchanged'
)

# Only update the local plan if the payment actually succeeded.
# Declined card = no local state change, user stays on the old plan.
if payment_captured?(stripe_subscription)
  business_plan.update!(plan: new_plan, ...)
end
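To make the anchor reset concrete, here is a simplified worked example of what the upgrade invoice roughly contains once the anchor moves to `now`: the full annual price minus a credit for the unused part of the monthly period. The prices and dates are hypothetical, and this approximates the idea only; it is not Stripe's exact proration algorithm, which works to the second and applies its own rounding.

```ruby
# Simplified monthly -> annual upgrade with the billing cycle anchor reset to "now".
# All amounts are in cents. Approximation for illustration, not Stripe's algorithm.
def upgrade_invoice_cents(old_price:, period_start:, period_end:, new_price:, now:)
  period_length = period_end - period_start   # seconds in the current monthly period
  remaining = [period_end - now, 0].max       # unused seconds (never negative)
  unused_credit = (old_price * remaining / period_length).round
  new_price - unused_credit                   # new annual period starts today
end

amount = upgrade_invoice_cents(
  old_price: 3_000,                  # hypothetical $30/month plan
  period_start: Time.utc(2024, 4, 1),
  period_end: Time.utc(2024, 5, 1),  # 30-day period
  new_price: 30_000,                 # hypothetical $300/year plan
  now: Time.utc(2024, 4, 16)         # halfway through the month
)
puts amount # => 28500
```

Halfway through a 30-day period, half of the $30 monthly charge is credited, so the invoice is $300 minus $15.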
Problem 4: Connected account isolation. Every Stripe Connect API call must use the coach's token, not the platform key. A single wrong credential resolves to a different Stripe account entirely — charging or reading data from another coach's account.
# find_by! raises instead of returning nil when the coach has not connected Stripe
stripe_auth = coach.omini_auths.find_by!(provider: 'stripe_connect')

# Pass the coach's token per request rather than assigning the
# process-wide Stripe.api_key, which concurrent threads share
session = Stripe::Checkout::Session.create(
  {
    line_items: line_items,
    mode: 'payment',
    success_url: success_url,
    cancel_url: cancel_url,
    metadata: metadata
  },
  {
    api_key: stripe_auth.token,        # coach's connected token, never the platform key
    stripe_account: stripe_auth.o_uid  # explicit account header
  }
)
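A small guard can make the "never the platform key" rule fail closed. This is a sketch with hypothetical names (`ConnectAuth`, `connect_request_opts`, `MissingConnectedAccount`) that mirror the omini_auths columns; the returned hash is shaped like the per-request options (`api_key`, `stripe_account`) that stripe-ruby accepts as the second argument of calls such as `Stripe::Checkout::Session.create`.

```ruby
# Hypothetical mirror of an omini_auths row
ConnectAuth = Struct.new(:provider, :token, :o_uid)

class MissingConnectedAccount < StandardError; end

# Returns per-request Stripe options for the coach's connected account,
# and raises instead of ever falling back to the platform key
def connect_request_opts(auths)
  auth = auths.find { |a| a.provider == "stripe_connect" }
  if auth.nil? || auth.token.to_s.empty? || auth.o_uid.to_s.empty?
    raise MissingConnectedAccount, "coach has no usable Stripe Connect credentials"
  end

  { api_key: auth.token, stripe_account: auth.o_uid }
end

coach_auths = [
  ConnectAuth.new("paypal", nil, nil),
  ConnectAuth.new("stripe_connect", "coach_access_token", "acct_coach_1") # placeholder values
]
opts = connect_request_opts(coach_auths)
puts opts[:stripe_account] # => acct_coach_1
```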
Problem 5: Failed payment graceful degradation. Stripe retries failed subscription payments automatically. The wrong behaviour is to immediately terminate access on first failure. The correct behaviour is to keep access alive during retries and only terminate after the final retry fails.
def handle_payment_failed(invoice)
  subscription = Stripe::Subscription.retrieve(invoice.subscription)

  # next_payment_attempt is a field of the Invoice, not the Subscription;
  # it is nil once Stripe has no further retries scheduled
  if invoice.next_payment_attempt.present?
    # Stripe has another retry scheduled: keep access active
    sync_payment_status(subscription)
    send_payment_warning_email
  else
    # Final failure: no more retries scheduled
    Stripe::Subscription.cancel(subscription.id)
    Stripe::Invoice.void_invoice(invoice.id)
    business_plan.update!(is_plan_active: false, status: false)
    business.update!(terminated: true)
    send_suspension_emails
  end
end
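The retry-aware branch reduces to one question per `invoice.payment_failed` event: is another attempt scheduled? This sketch assumes, as on Stripe's Invoice object, that `next_payment_attempt` is a Unix timestamp while retries remain and `nil` after the final attempt; `FailedInvoice`, `on_payment_failed`, and the retry schedule are hypothetical.

```ruby
# Hypothetical slice of a failed invoice
FailedInvoice = Struct.new(:id, :next_payment_attempt)

# Decide the local outcome of one invoice.payment_failed event
def on_payment_failed(invoice, state)
  if invoice.next_payment_attempt
    state[:warnings] += 1   # a retry is scheduled: keep access, warn the coach
    :warned
  else
    state[:active] = false  # no retries left: cancel and suspend
    :suspended
  end
end

# Hypothetical dunning sequence: three failures with retries pending, then the final one
now = Time.now.to_i
events = [3, 5, 7].map { |days| FailedInvoice.new("in_1", now + days * 86_400) }
events << FailedInvoice.new("in_1", nil)

state = { active: true, warnings: 0 }
results = events.map { |invoice| on_payment_failed(invoice, state) }
p results        # => [:warned, :warned, :warned, :suspended]
p state[:active] # => false
```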
04 · Engineering Decisions
Why the system is built this way
Two webhook endpoints, two signing secrets
Platform events and connected account events use different Stripe webhook secrets and different endpoints. A misconfigured connected account can never accidentally trigger platform billing logic.
Insert-first idempotency pattern
The stripe_event_id is inserted with a unique constraint before any side effects run. If the insert fails, the event was already processed. No risk of double-activation, double-deduction, or double-email under any retry scenario.
Pessimistic locking during checkout creation
with_lock on the user record during checkout session creation prevents two simultaneous requests from both creating Stripe subscriptions. The second request reuses the first session or fails cleanly.
Local invoice cache reduces Stripe API calls
stripe_invoices acts as a local cache for hosted URLs, PDFs, status, and amounts. Common reads never hit the Stripe API, avoiding rate limits and reducing latency for billing history pages.
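A read-through cache with webhook-driven updates can be sketched in a few lines. Here a Hash stands in for the stripe_invoices table and a lambda for the Stripe API fetch; `InvoiceCache` and its methods are illustrative assumptions, not the production model.

```ruby
class InvoiceCache
  attr_reader :api_calls

  def initialize(fetcher)
    @rows = {}          # stand-in for the stripe_invoices table
    @fetcher = fetcher  # stand-in for a Stripe API retrieve call
    @api_calls = 0
  end

  # Read path: serve the local row, calling the API only on a miss
  def fetch(invoice_id)
    @rows[invoice_id] ||= begin
      @api_calls += 1
      @fetcher.call(invoice_id)
    end
  end

  # Webhook path: overwrite the local row when Stripe reports a change
  def upsert(invoice_id, attrs)
    @rows[invoice_id] = attrs
  end
end

fake_stripe = ->(id) { { id: id, status: "open", amount_due: 4_900 } }
cache = InvoiceCache.new(fake_stripe)
3.times { cache.fetch("in_42") }
cache.upsert("in_42", { id: "in_42", status: "paid", amount_due: 0 })
puts cache.api_calls               # => 1
puts cache.fetch("in_42")[:status] # => paid
```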
Graceful degradation on payment failure
Access remains active during Stripe retry windows. Only the final failed payment triggers termination. This matches how real users expect billing failures to behave and reduces support tickets from false lockouts.
PayPal first_or_create idempotency
PaymentHistory.first_or_create on payment_token prevents a PayPal callback retry from marking the same invoice paid twice. The same pattern protects Stripe invoice payments from duplicate payment_histories.
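One caveat worth spelling out: `first_or_create` is a SELECT followed by an INSERT, so two callbacks arriving at the same moment can both miss the SELECT. Sequential retries are covered, but concurrent ones also need a unique index on `payment_token` (assumed here) so the second INSERT fails and can be recovered from; Rails' `create_or_find_by` is built around that same idea. The in-memory table below is a hypothetical stand-in for the database.

```ruby
# Stand-in for ActiveRecord::RecordNotUnique
class UniqueViolation < StandardError; end

# Hypothetical payment_histories table with a unique index on payment_token
class PaymentHistoryTable
  def initialize
    @rows = {}
    @lock = Mutex.new
  end

  def find_by_token(token)
    @lock.synchronize { @rows[token] }
  end

  # The uniqueness check and the write happen atomically, like a unique index
  def insert!(token, attrs)
    @lock.synchronize do
      raise UniqueViolation, token if @rows.key?(token)
      @rows[token] = attrs
    end
  end

  def count
    @lock.synchronize { @rows.size }
  end
end

# Read, then insert; if a concurrent callback won the race, return the winner's row
def record_payment(table, token, attrs)
  table.find_by_token(token) || table.insert!(token, attrs)
rescue UniqueViolation
  table.find_by_token(token)
end

table = PaymentHistoryTable.new
callbacks = 5.times.map do
  Thread.new { record_payment(table, "EC-TOKEN1", { amount_cents: 12_000 }) } # placeholder token
end
callbacks.each(&:join)
puts table.count # => 1
```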
"The billing system taught me that financial engineering is not about integrating an API. It is about anticipating every way a distributed system can lie to you — duplicate events, concurrent requests, network timeouts between charge and response — and building explicit defences against each one."
05 · What I Learned
Ten things this system taught me as an engineer
Idempotency is the first requirement of any payment system
Networks fail between charge and response. Stripe retries webhooks. Users double-click buttons. Every financial side effect must be safe to trigger multiple times without multiplying the result.
Credential isolation is a security boundary, not a code style
Platform Stripe credentials and connected account credentials must never be confused. One wrong variable assignment charges the wrong account. Explicit account headers on every connected API call enforce this at the Stripe API level.
Local state is the source of truth, not the payment provider
Stripe's state and your database can diverge. Always sync local records from webhook events, not from real-time API calls. Your database is what your users see — keep it accurate and cache aggressively.
Billing cycle anchors matter more than the subscription update itself
Upgrading a monthly plan to annual without resetting the billing anchor generates incorrect prorations. The billing_cycle_anchor parameter is one line of code with enormous financial consequences.
Graceful degradation protects both users and business
Immediately terminating access on first payment failure loses customers who would have paid if given time. Stripe's retry window exists for a reason. Honour it in your local state machine.
Stripe Prices are immutable — design around it
You cannot update amount, currency, or interval on an existing Stripe Price. Archive the old one and create a new one. Building this into the update flow from the start prevents production errors later.
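The archive-and-recreate rule can be sketched with an in-memory store. `Price`, `PriceStore`, and `replace` are hypothetical stand-ins, not Stripe's API; the point is the ordering: create the replacement first, then archive the old price, so a failure part-way never leaves the option without an active price.

```ruby
# Hypothetical stand-in for a Stripe Price
Price = Struct.new(:id, :unit_amount, :currency, :interval, :active)

class PriceStore
  REPLACEABLE_FIELDS = %i[unit_amount currency interval].freeze

  def initialize
    @prices = {}
    @seq = 0
  end

  def create(unit_amount:, currency:, interval:)
    @seq += 1
    price = Price.new("price_#{@seq}", unit_amount, currency, interval, true)
    @prices[price.id] = price
  end

  # "Change" an immutable field: create the replacement first, then archive the old price
  def replace(old_id, **changes)
    unknown = changes.keys - REPLACEABLE_FIELDS
    raise ArgumentError, "unsupported fields: #{unknown.join(', ')}" unless unknown.empty?

    old = @prices.fetch(old_id)
    replacement = create(
      unit_amount: changes.fetch(:unit_amount, old.unit_amount),
      currency: changes.fetch(:currency, old.currency),
      interval: changes.fetch(:interval, old.interval)
    )
    old.active = false # archived, not deleted
    replacement
  end
end

store = PriceStore.new
monthly = store.create(unit_amount: 2_900, currency: "usd", interval: "month")
new_price = store.replace(monthly.id, unit_amount: 3_900)
summary = [monthly.active, new_price.active, new_price.unit_amount]
p summary # => [false, true, 3900]
```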
Webhook signature verification is non-negotiable
Unsigned webhook payloads must never be processed in production. Anyone who can hit your endpoint can fake a payment success event. The signature check is the entire security model of webhook-driven billing.
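For reference, here is a minimal sketch of Stripe's documented signing scheme: the `Stripe-Signature` header carries `t=<timestamp>` and one or more `v1=<signature>` values, where each signature is an HMAC-SHA256 of `"<timestamp>.<raw body>"` keyed with the endpoint's secret. It also shows why two endpoints with two secrets isolate each other. The secrets are placeholders, and in production `Stripe::Webhook.construct_event(payload, sig_header, secret)` performs this check.

```ruby
require "openssl"

class SignatureError < StandardError; end

TOLERANCE_SECONDS = 300 # matches the default tolerance in Stripe's official libraries

# Constant-time comparison (Rack::Utils.secure_compare or
# ActiveSupport::SecurityUtils.secure_compare do the same in a Rails app)
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Verifies a header of the form "t=<unix time>,v1=<hex hmac>[,v1=...]"
def verify_stripe_signature!(payload, header, secret, now: Time.now.to_i)
  fields = header.split(",").map { |pair| pair.strip.split("=", 2) }
  t_field = fields.assoc("t")
  timestamp = t_field && Integer(t_field.last.to_s, exception: false)
  signatures = fields.select { |key, _| key == "v1" }.map(&:last)
  raise SignatureError, "malformed header" if timestamp.nil? || signatures.empty?
  raise SignatureError, "timestamp outside tolerance" if (now - timestamp).abs > TOLERANCE_SECONDS

  # Signed payload is "<timestamp>.<raw request body>", keyed with this endpoint's secret
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, "#{timestamp}.#{payload}")
  raise SignatureError, "signature mismatch" unless signatures.any? { |sig| secure_compare(sig, expected) }

  true
end

payload = '{"id":"evt_1","type":"invoice.paid"}'
t = Time.now.to_i
platform_sig = OpenSSL::HMAC.hexdigest("SHA256", "whsec_platform", "#{t}.#{payload}")
header = "t=#{t},v1=#{platform_sig}"

puts verify_stripe_signature!(payload, header, "whsec_platform", now: t) # => true
begin
  # The same event presented to the Connect endpoint fails under that endpoint's secret
  verify_stripe_signature!(payload, header, "whsec_connect", now: t)
rescue SignatureError => e
  puts e.message # => signature mismatch
end
```

The payload must be the raw request body exactly as received; parsing and re-serializing the JSON changes the bytes and breaks the signature.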
Multiple payment providers means multiple failure modes
Stripe and PayPal fail differently, retry differently, and represent success differently. Building a shared invoice and payment_history layer on top abstracts these differences and keeps reporting consistent.
Checkout session reuse prevents duplicate subscriptions
Users open multiple tabs. Caching the pending checkout session and reusing it within its expiry window is a simple, effective defence against duplicate subscription creation that most tutorials never mention.
Audit trails are legal infrastructure, not nice-to-haves
invoice_histories creates an immutable record of every state transition. payment_histories records every successful charge. These are not logs — they are the evidence trail for disputes, chargebacks, and reconciliation.
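An append-only history can be enforced in application code by freezing entries on write and exposing no update or delete path. `HistoryEntry` and `InvoiceHistoryLog` are hypothetical; in a database, the same guarantee could also come from permissions that disallow UPDATE and DELETE on the history table.

```ruby
# Hypothetical shape of one invoice_histories row
HistoryEntry = Struct.new(:invoice_id, :from_status, :to_status, :recorded_at)

class InvoiceHistoryLog
  def initialize
    @entries = []
  end

  # Record a transition; the stored entry is frozen so it cannot be edited later
  def record(invoice_id, from_status, to_status, at: Time.now.utc)
    entry = HistoryEntry.new(invoice_id, from_status, to_status, at).freeze
    @entries << entry
    entry
  end

  # Read-only view: callers receive a frozen list of frozen entries
  def for_invoice(invoice_id)
    @entries.select { |e| e.invoice_id == invoice_id }.freeze
  end
end

log = InvoiceHistoryLog.new
log.record("inv_7", "draft", "sent")
log.record("inv_7", "sent", "paid")
trail = log.for_invoice("inv_7")
puts trail.map { |e| "#{e.from_status}->#{e.to_status}" }.join(", ") # => draft->sent, sent->paid

begin
  trail.first.to_status = "void"
rescue FrozenError
  puts "history entries are immutable"
end
```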
06 · What I'd Do Next
If I were continuing to improve this system
Stripe API call optimization
Several flows use Stripe list endpoints where a direct retrieve by ID would suffice. Replacing list calls with ID-based retrieval reduces API latency and avoids pagination issues under load.
Guest invoice payment authorization
Public invoice payment flows rely on invoice ID and session ID combinations. Adding a signed token to public payment URLs prevents unauthorized access to invoices not intended for that payer.
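A signed, expiring token could look like the sketch below, assuming a server-side secret and a token format of `<invoice_id>.<expiry>.<hmac>` invented here for illustration. In a Rails app, `signed_id` / `find_signed` or `ActiveSupport::MessageVerifier` provide the same guarantee without hand-rolled cryptography.

```ruby
require "openssl"

INVOICE_LINK_SECRET = "replace-with-a-server-side-secret" # assumption: loaded from credentials

# Builds "<invoice_id>.<expires_unix>.<hex hmac>" for a public payment link
def invoice_token(invoice_id, expires_at)
  data = "#{invoice_id}.#{expires_at.to_i}"
  "#{data}.#{OpenSSL::HMAC.hexdigest('SHA256', INVOICE_LINK_SECRET, data)}"
end

# Returns the invoice id if the token is authentic and unexpired, otherwise nil
def verify_invoice_token(token, now: Time.now)
  data, _, mac = token.rpartition(".")
  invoice_id, _, expires = data.rpartition(".")
  return nil if invoice_id.empty? || expires.empty?

  expected = OpenSSL::HMAC.hexdigest("SHA256", INVOICE_LINK_SECRET, data)
  return nil unless mac.bytesize == expected.bytesize &&
                    mac.bytes.zip(expected.bytes).sum { |x, y| x ^ y }.zero?
  return nil if now.to_i > expires.to_i

  invoice_id
end

token = invoice_token("1042", Time.now + 7 * 86_400) # hypothetical 7-day link
puts verify_invoice_token(token)                             # => 1042
puts verify_invoice_token(token.sub("1042", "1043")).inspect # => nil
```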
Credit balance Redis caching
SMS credit balance is calculated via a database aggregation query on every SMS send. Caching with Redis and invalidating on transaction creation would eliminate this query from the hot path entirely.
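A cache-aside version of the balance lookup might look like this. A Hash stands in for Redis and an Array for the credit transactions table; `CreditLedger`, the `sms_credits:` key format, and the amounts are hypothetical. With Redis itself, the read would be a GET on the key, the fill a SET, and the invalidation a DEL when a transaction is created.

```ruby
class CreditLedger
  attr_reader :aggregate_queries

  def initialize
    @transactions = []      # stand-in for the credit transactions table
    @cache = {}             # stand-in for Redis
    @aggregate_queries = 0
  end

  # Every credit purchase or SMS deduction invalidates the cached balance
  def add_transaction(business_id, amount)
    @transactions << { business_id: business_id, amount: amount }
    @cache.delete(cache_key(business_id))
  end

  # Cache-aside read: run the aggregation only on a cache miss
  def balance(business_id)
    @cache.fetch(cache_key(business_id)) do
      @aggregate_queries += 1
      total = @transactions.select { |t| t[:business_id] == business_id }.sum { |t| t[:amount] }
      @cache[cache_key(business_id)] = total
    end
  end

  private

  def cache_key(business_id)
    "sms_credits:#{business_id}"
  end
end

ledger = CreditLedger.new
ledger.add_transaction(1, 500)  # credits purchased
ledger.add_transaction(1, -1)   # one SMS sent
3.times { ledger.balance(1) }   # one aggregation, then two cache hits
ledger.add_transaction(1, -1)   # another SMS invalidates the cached value
puts ledger.balance(1)          # => 498
puts ledger.aggregate_queries   # => 2
```

Deleting the key on write, rather than adjusting the cached number, keeps the aggregation query as the single place the balance is computed.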
Stripe Price archive cleanup
Each payment option update archives the old price and creates a new one. Over time this creates significant archived price accumulation. A background job to clean up stale archived prices would maintain account hygiene.
Real-time payment status via webhooks
Currently clients poll for payment confirmation. A WebSocket or server-sent events channel that pushes the webhook result to the open browser tab would eliminate polling latency entirely.
Stripe Connect credential rotation
Connected account tokens in omini_auths have no automatic rotation. Adding refresh token rotation on a schedule and alerting on expired tokens would prevent silent payment failures from stale credentials.
07 · Outcome
What shipped and what it changed
The billing system now handles three completely different payment flows cleanly: coaches subscribing to the platform, clients paying coaches through Stripe Connect, and clients paying coaches through PayPal. All three domains write to the same invoice and payment_history layer for consistent reporting.
What I am most proud of is what did not happen. No duplicate subscriptions in production. No coach accidentally charged through another coach's account. No invoice marked paid twice from a PayPal callback retry. The defensive engineering held under real production conditions.
08 · Tech Stack