Notifications - v1.75.0
v1.75.0
Release Dates
- Sandbox: 31.03.2026
Features
Circuit Breaker Poller for Notifications Dispatcher
Introduces automated recovery testing for notification templates that have been temporarily disabled due to failures. The system now performs controlled dispatch tests on templates in HALF_OPEN state, automatically transitioning them back to normal operation once they pass the configured number of test messages (max_half_open_tests). Failed recovery attempts extend the cooldown period and revert templates to OPEN state. Includes comprehensive logging of all state transitions with tenant and template context, plus Prometheus metrics for monitoring circuit recovery rates and open circuit counts.
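The HALF_OPEN recovery rule described above can be sketched as a small state transition function. This is only an illustration under stated assumptions, not the service's implementation: the Circuit class, the CLOSED state name for "normal operation", and the doubling of the cooldown on failure are all hypothetical, since the release only states that failed attempts extend the cooldown.

```python
from dataclasses import dataclass


@dataclass
class Circuit:
    """Hypothetical in-memory view of one (tenant_id, template_id) circuit."""
    tenant_id: str
    template_id: str
    state: str = "HALF_OPEN"   # OPEN, HALF_OPEN, or CLOSED (assumed name)
    passed_tests: int = 0
    cooldown_sec: int = 60


def record_test_result(circuit, success, max_half_open_tests, cooldown_multiplier=2):
    """Apply the result of one HALF_OPEN test dispatch and return the new state."""
    if circuit.state != "HALF_OPEN":
        return circuit.state  # only HALF_OPEN circuits receive test messages
    if success:
        circuit.passed_tests += 1
        # Close the circuit once enough test messages have passed.
        if circuit.passed_tests >= max_half_open_tests:
            circuit.state = "CLOSED"
    else:
        # A failed recovery attempt reverts to OPEN with a longer cooldown.
        circuit.state = "OPEN"
        circuit.passed_tests = 0
        circuit.cooldown_sec *= cooldown_multiplier  # assumed extension policy
    return circuit.state
```

With max_half_open_tests set to 2, a template stays HALF_OPEN after the first successful test and closes after the second; a single failure sends it back to OPEN.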
Retry Poller for Notifications Dispatcher
Implements retry processing for failed notification messages through a dedicated background poller. Messages marked as ON_RETRY are reprocessed using configurable batch sizes and pacing controls to prevent downstream system overload. The system increments retry_count on each failure and automatically marks messages as FAILED when they exceed the MAX_RETRIES threshold. All retry attempts update the failure_aggregate table for the corresponding tenant_id and template_id combination. Row-level locking allows the poller to run concurrently across multiple pods.
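A minimal sketch of one retry pass, assuming messages are plain dictionaries held in memory. The send callback, the SENT status, and the choice to count only failed attempts in the aggregate are assumptions made for illustration; pacing and row-level locking are left out.

```python
from collections import Counter


def process_retry_batch(messages, send, max_retries, batch_size, failure_aggregate):
    """Reprocess up to batch_size ON_RETRY messages, updating them in place."""
    batch = [m for m in messages if m["status"] == "ON_RETRY"][:batch_size]
    for msg in batch:
        if send(msg):
            msg["status"] = "SENT"  # hypothetical success status
            continue
        # Each failure bumps the retry counter and the per-template aggregate.
        msg["retry_count"] += 1
        failure_aggregate[(msg["tenant_id"], msg["template_id"])] += 1
        # Messages that exceed the threshold stop being retried.
        if msg["retry_count"] > max_retries:
            msg["status"] = "FAILED"
    return batch
```

Here failure_aggregate can be a Counter keyed by (tenant_id, template_id), standing in for the failure_aggregate table.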
Circuit Breaker Evaluator Job and Failure Aggregate Tracking
Establishes the core circuit breaker decision engine, with a new failure_aggregate table that tracks failure metrics per (tenant_id, template_id). The Evaluator Job runs periodically to analyze failure patterns and automatically opens circuits by inserting entries into on_hold_templates when FAIL_THRESHOLD is exceeded. Key parameters, including FAIL_THRESHOLD, COOLDOWN_SEC, and MAX_HALF_OPEN_TESTS, are externally configurable. The system preserves cumulative failure data for observability and uses fail_count_snapshot deltas to detect new failure incidents without opening the same circuit twice.
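One way to read the snapshot-delta rule is sketched below. It assumes each failure_aggregate row carries a cumulative fail_count next to fail_count_snapshot, and that the snapshot is advanced whenever a circuit is opened; both the fail_count column name and that update rule are assumptions, and dictionaries stand in for the two tables.

```python
def evaluate(failure_aggregate, on_hold_templates, fail_threshold):
    """Open circuits for templates whose new failures exceed the threshold.

    failure_aggregate maps (tenant_id, template_id) to a row dict;
    on_hold_templates maps the same key to a circuit state.
    """
    opened = []
    for key, row in failure_aggregate.items():
        # Only failures recorded since the last snapshot count as new.
        delta = row["fail_count"] - row["fail_count_snapshot"]
        if delta > fail_threshold and key not in on_hold_templates:
            on_hold_templates[key] = "OPEN"
            # Advance the snapshot; the cumulative count is left untouched.
            row["fail_count_snapshot"] = row["fail_count"]
            opened.append(key)
    return opened
```

Because the snapshot moves forward while fail_count keeps accumulating, a second evaluator run with no new failures does not open the circuit again.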
Improvements
AWS Secrets Manager Integration for Encrypted Message Processing
The Notifications service now supports AWS Secrets Manager for retrieving AEAD keysets required for decrypting MessageQueueItem payloads. The implementation includes a 24-hour cache using Caffeine to minimize AWS API calls and reduce latency. The system fetches the complete Tink keyset JSON in a single operation and leverages AWS SDK v2's built-in retry mechanisms for resilience. Failed decryption attempts are tracked via notifications.secret.decrypt.failure metrics and logged with AWS secret version IDs for debugging.
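The effect of the 24-hour cache can be illustrated with a small time-to-live cache. This is a language-neutral sketch and not the Caffeine configuration the service uses: KeysetCache, the fetch callback, and the injectable clock are hypothetical names, and the AWS call itself is replaced by whatever function is passed in.

```python
import time


class KeysetCache:
    """Caches fetched keyset JSON per secret id for a fixed time-to-live."""

    def __init__(self, fetch, ttl_sec=24 * 60 * 60, clock=time.monotonic):
        self._fetch = fetch      # stand-in for the Secrets Manager lookup
        self._ttl = ttl_sec
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # secret_id -> (value, expires_at)

    def get(self, secret_id):
        now = self._clock()
        entry = self._entries.get(secret_id)
        if entry is not None and now < entry[1]:
            return entry[0]      # still fresh: no remote call
        value = self._fetch(secret_id)
        self._entries[secret_id] = (value, now + self._ttl)
        return value
```

Repeated lookups within the 24-hour window return the cached keyset, so the remote fetch runs at most once per secret per window.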
Local Tink Decryption Support for Development Environments
Local development environments can now decrypt MessageQueueItem payloads using mock Tink keysets configured in config.YAML. The implementation supports both plaintext legacy messages (tinkKeyId=NULL) and encrypted messages using the Header Detachment pattern. Decryption failures automatically mark messages as FAILED and increment the notifications.decryption.failure metric. The LocalSecretFetcher reads keyset configuration directly from the YAML file, while SubscriberApplication handles environment-specific wiring based on the ENVIRONMENT_VENDOR system property.
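The branching between legacy plaintext and encrypted messages might look roughly like the sketch below. The payload and status field names, the decrypt callback, and the metrics dictionary are hypothetical; actual Tink decryption and the Header Detachment handling are not shown.

```python
def read_payload(item, decrypt, metrics):
    """Return the readable payload of a queue item, or None if decryption fails."""
    if item["tinkKeyId"] is None:
        # Legacy message: the payload was stored as plaintext.
        return item["payload"]
    try:
        return decrypt(item["tinkKeyId"], item["payload"])
    except Exception:
        # A failed decryption marks the message FAILED and bumps the metric.
        item["status"] = "FAILED"
        name = "notifications.decryption.failure"
        metrics[name] = metrics.get(name, 0) + 1
        return None
```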
Bug fixes
Fixed NullPointerException in SMS notification dispatch causing notification failures
Resolved a critical null pointer exception that was preventing SMS notifications from being dispatched through the Infobip SMS gateway. The error occurred during HTTP client initialization when the executor service parameter was unexpectedly null, causing the entire notification dispatch process to fail. This fix ensures SMS notifications are processed reliably without interruption.
For more information on the release timeline, see Mambu Release Cycle.