# Webhook best practices

Source: https://docs.mambu.com/docs/notifications-channels-webhooks-best-practices

This page outlines practical recommendations to avoid common pitfalls and keep webhook integrations reliable at scale. The guidance reflects how the **Notifications** service actually dispatches webhooks.

## Template configuration

- **Prefer specific triggers over broad ones**
  - Very broad events (for example, “account activity” style events) can generate high volumes of webhook calls.
  - Retries for transient failures may amplify traffic bursts.
  - For high-volume use cases, consider:
    - Narrowing conditions.
    - Using separate templates and/or endpoints to isolate the load.
- **Use HTTPS endpoints only**
  - Webhook destinations must use `https`. Non-HTTPS URLs are rejected before dispatch.
  - Avoid private IPs or internal-only hosts that cannot be reached from Mambu’s infrastructure.
- **Validate destinations and headers**
  - Use valid production URLs, not placeholder or test values.
  - Avoid putting secrets in URLs (for example, query parameters). Put credentials or tokens into headers instead.
  - Do not rely on redirects: `3xx` responses are **not** treated as success. Use the final URL directly.
  - Custom headers defined on the template are forwarded to your endpoint (except for some reserved internal headers that are stripped for security).
- **Deactivate templates you no longer need**
  - Deactivating a template stops **new** notifications from being produced upstream.
  - Messages already queued for delivery may still be dispatched by the Notifications service, so expect a short tail of in-flight requests.
- **Multiple templates to the same URL**
  - If several templates post to the same endpoint, size and tune that endpoint for the combined throughput and potential retry bursts.
  - Consider separating high-volume and low-volume integrations by URL.

## Endpoint design

- **Respond quickly with a 2xx status**
  - Only HTTP **2xx** responses are treated as success.
  - `3xx`, `4xx`, `5xx`, timeouts, and network/TLS errors are treated as failures.
  - Process the webhook asynchronously on your side (for example, enqueue to a job queue) and return 2xx as soon as you accept the payload.
  - Requests are sent with a timeout; heavy synchronous work increases the chance of timeouts and repeated delivery attempts. For more details, see [Troubleshoot delayed notifications](/docs/troubleshooting-delayed-notifications).
- **Idempotency and deduplication**
  - Webhook requests include `x-notifications-idempotency-key`.
    - Automatic retries of the same delivery use the **same** idempotency key.
    - Manual “resend” actions generate a **new** idempotency key.
  - Use this header to:
    - Detect duplicate deliveries.
    - Make handlers idempotent (for example, ignore a second request with a key that has already been processed successfully).
- **Design for at-least-once delivery**
  - Because of retries in case of transient failures, delivery is **at-least-once**.
  - Your receiver must tolerate duplicates:
    - Use `x-notifications-idempotency-key` and/or identifiers in the payload to implement safe deduplication.
    - Avoid non-idempotent side effects (for example, double-charging or double-posting).
- **Request method and payload**
  - The HTTP method and body are defined by the notification template/message.
  - Ensure your endpoint:
    - Supports the configured HTTP method.
    - Validates and safely processes the payload format you expect (for example, JSON).
- **Security and authentication**
  - Require HTTPS.
  - Use header-based authentication (for example, API keys, HMAC signatures, or OAuth tokens) rather than credentials in URLs.
  - Rotate credentials regularly and update template headers accordingly.
  - Validate and sanitize incoming payloads before further processing.
- **Avoid redirects and heavy synchronous logic**
  - Do not rely on redirect chains; `3xx` responses are not treated as success.
  - Keep synchronous logic small and fail-fast; push heavy work to background jobs.

## Monitoring and alerting

- **Use status codes as signals**
  - **2xx** – Success. No retry.
  - **4xx** – Client errors (typically non-retryable).
    - Often indicates misconfiguration, missing/invalid credentials, or authorization issues.
    - Fix the problem quickly to restore delivery.
  - **5xx**, timeouts, and network/TLS errors – Treated as transient.
    - These are retried with backoff and contribute to protection mechanisms (circuit breaker).
- **Track latency and timeouts**
  - Monitor response times for your webhook endpoints.
  - Sustained increases in latency lead to timeouts and retries.
  - Alert on:
    - Rising average or P95/P99 latency.
    - Increases in timeout rates.
- **Instrument your receiver logs**

  Log enough information to understand what happened without storing unnecessary sensitive data. For each request, log, for example:

  - Timestamp.
  - Endpoint/path and HTTP method.
  - HTTP status code.
  - The `x-notifications-idempotency-key` value.
  - High-level processing result (accepted / rejected / failed).
  - A correlation ID or your own trace ID if you add one via custom headers.

  Avoid logging full payloads if they may contain personal or sensitive data; consider structured, redacted logging.
- **Watch for failure streaks**
  - Spikes in non-2xx responses (especially 5xx) and sustained failure streaks usually indicate:
    - Receiver outages or dependency failures.
    - Misconfigurations (for 4xx).
  - These patterns also influence retry behavior and may contribute to circuit-breaker protection, temporarily reducing or pausing deliveries to protect your systems.

## Failure handling, retries, and protection

- **Retries**
  - Certain failures (for example, 5xx, timeouts, and network errors) are retried according to internal policies with backoff.
  - This means you may see repeated attempts for the same webhook until either:
    - It succeeds with a 2xx, or
    - Retry limits are reached / protection mechanisms apply.
- **Circuit breaker**
  - When repeated failures occur for a specific destination (for example, persistent 5xx or timeouts), a protection mechanism may temporarily pause or slow deliveries for that template and tenant.
  - This helps:
    - Prevent overwhelming an unstable endpoint.
    - Protect the overall system.
  - Once the destination recovers and failures cease, deliveries resume automatically, according to the circuit breaker’s rules.
- **What this means for you**
  - Keep your endpoint healthy and fast.
  - Use clear 2xx/4xx/5xx responses to signal the real state.
  - Fix persistent 4xx errors quickly.
  - Scale receiver capacity or apply graceful degradation during incidents.

## Security recommendations

- Use HTTPS everywhere for webhook endpoints.
- Prefer short-lived tokens or signed headers rather than long-lived static secrets.
- Restrict access to your endpoints using:
  - Network controls (IP allowlists, if appropriate).
  - Application-level authentication and authorization.
- Validate payloads strictly and apply input validation to avoid injection or deserialization issues.

## Troubleshooting tips

- **No events are received**
  - Check that:
    - The relevant webhook template is **Active**.
    - The triggering business event actually occurs.
    - The destination URL is correct and reachable from the public internet.
  - Verify that your endpoint is returning 2xx and not silently failing with 4xx/5xx.
- **Lots of 4xx responses**
  - Likely misconfiguration:
    - Wrong URL or path.
    - Missing or invalid authentication.
    - Authorization rules block the request.
  - Update configuration and redeploy. Consider returning 2xx if you accept the event and handle business-level errors asynchronously.
- **Lots of 5xx, timeouts, or connection errors**
  - Indicates that your endpoint or its dependencies are unhealthy or overloaded.
  - Scale horizontally, add caching, or temporarily reduce downstream work.
  - Expect the Notifications service to retry with backoff and, in persistent cases, to apply circuit-breaker protection.
- **Duplicates seen in logs or processing**
  - Use `x-notifications-idempotency-key` and/or your own business identifiers to avoid processing the same event twice.
  - Confirm your handlers are idempotent (safe to call multiple times with the same key).

---

See also:

- [Circuit breaker](/docs/webhooks-circuit-breaker)
- [Troubleshooting](/docs/troubleshooting-webhooks)
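
As an illustration of the endpoint-design guidance on this page (acknowledge quickly with a 2xx, defer heavy work, and deduplicate on `x-notifications-idempotency-key`), the following is a minimal Python sketch of a receiver's core logic. Only the header name comes from this page. The `WebhookReceiver` class, the in-memory set of seen keys, the `queue.Queue` stand-in for a job queue, and the choice to answer 400 when the key is missing are all assumptions made for the example, not part of the Notifications service.

```python
import json
import queue
import threading

# Header name documented on this page; everything else below is hypothetical.
IDEMPOTENCY_HEADER = "x-notifications-idempotency-key"


class WebhookReceiver:
    """Accepts webhooks quickly and defers the real work to a background queue."""

    def __init__(self):
        # Assumption: in-memory storage. A real receiver would need a durable,
        # shared store so that restarts and multiple instances still deduplicate.
        self._seen_keys = set()
        self._lock = threading.Lock()
        self.jobs = queue.Queue()  # stand-in for a real job queue

    def handle(self, headers, body):
        """Return the HTTP status code to send back for one incoming request."""
        # HTTP header names are case-insensitive, so normalize before lookup.
        normalized = {name.lower(): value for name, value in headers.items()}
        key = normalized.get(IDEMPOTENCY_HEADER)
        if not key:
            return 400  # assumption of this sketch: reject requests without a key

        try:
            payload = json.loads(body)
        except ValueError:
            return 400  # malformed payload; resending the same body would not help

        with self._lock:
            if key in self._seen_keys:
                # Duplicate delivery (for example, an automatic retry):
                # acknowledge it so it is not retried, but do not process it again.
                return 200
            self._seen_keys.add(key)

        # Heavy work happens later, off the request path.
        self.jobs.put((key, payload))
        return 202
```

In a real service, `handle` would be called from whatever web framework serves the endpoint, and a worker would drain `jobs`. Because a manual resend carries a new idempotency key, this sketch would treat it as a new event; deduplicating on a business identifier from the payload would be needed to catch that case.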