SSL/TLS connection failure on Wirecard & First Data gateways
Incident Report for Chargify
We were notified at 9:45am CST of failing subscriptions on the First Data gateway due to a TLS connection failure. We investigated promptly and found that First Data (and also Wirecard) had switched to a new TLS certificate provider whose root certificate was not in our default package. We deployed an update to begin trusting the new certificate, and subscriptions for these gateways were restored to normal function at 11:25am CST.

We take our role in connecting to gateway providers very seriously, as this is a critical function of Chargify and your business. Although the impact was very limited, we're posting additional information in the interest of full transparency.

We support over 25 different gateways. In most cases, we have no direct relationship to the gateway providers (that relationship exists between the merchant and the gateway). Each gateway has its own procedures, maintenance windows, change schedules, and general "ways of doing things" (some are better than others). We do our best to monitor each one for availability, problems, and changes through whatever channels the provider makes publicly available as well as with our own tooling.

Normally, a TLS root certificate change is a non-event and is handled transparently without any need for change on our end (just like with web browsers). Even so, when we hear of an upcoming change, we test to make sure nothing will go wrong.
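For readers wondering what "trusting the new certificate" involves, here is a minimal sketch in Python. It is not our actual deployment, only an illustration of the general idea: the client starts from the platform's default set of trusted roots and, when a gateway's new root is missing from that set, loads the extra root on top. The function name `build_gateway_context` and the file path in the usage note are hypothetical.

```python
import ssl
from typing import Optional


def build_gateway_context(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Return a TLS client context that trusts the default roots,
    plus one additional root certificate file if given.

    extra_ca_file is a hypothetical path to a PEM file holding the
    root certificate of the gateway's new certificate provider.
    """
    # Start from the platform defaults: certificate verification and
    # hostname checking are both enabled.
    ctx = ssl.create_default_context()

    if extra_ca_file is not None:
        # Add the extra root to the trusted set; the default roots
        # remain trusted as well.
        ctx.load_verify_locations(cafile=extra_ca_file)

    return ctx
```

A call such as `build_gateway_context("/etc/ssl/extra/new-gateway-root.pem")` (path invented for illustration) would return a context that accepts certificates chaining to either the default roots or the added one. Verification stays on throughout; the fix is to widen the trusted set, never to disable checking.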

In this particular case, we had not received any prior notice from First Data about the upcoming change to their TLS certificate. They had published a notice about the change schedule, but we hadn't seen it and had received no email or other notification. The page(s) we previously monitored for such notices no longer work or are no longer actively maintained, likely because of the several name changes that the First Data / Payeezy product brand has gone through.

Even without prior notification, a change like this is usually preceded by a change to the sandbox/test environments first. This is an early opportunity to automatically catch such problems before live transactions are affected. But in the case of First Data, they regularly revoke testing credentials and we have difficulty getting new ones in a timely fashion. So this failsafe was not effective.
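One part of such a check needs no gateway credentials at all: the TLS handshake itself. The sketch below, again in Python and purely illustrative rather than a description of our tooling, connects to an endpoint, attempts a verified handshake, and sorts any failure into a coarse category so that an untrusted certificate can be told apart from an ordinary network problem. The names `classify_failure` and `probe_tls` are hypothetical, as is any hostname passed in.

```python
import socket
import ssl
from typing import Optional


def classify_failure(exc: OSError) -> str:
    """Map a connection error to a coarse category.

    Order matters: SSLCertVerificationError is a subclass of SSLError,
    which is itself a subclass of OSError.
    """
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "untrusted_certificate"
    if isinstance(exc, ssl.SSLError):
        return "tls_error"
    return "network_error"


def probe_tls(host: str, port: int = 443,
              context: Optional[ssl.SSLContext] = None,
              timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and return "ok" or a failure category."""
    ctx = context or ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake (including chain and hostname verification)
            # happens inside wrap_socket.
            with ctx.wrap_socket(sock, server_hostname=host):
                return "ok"
    except OSError as exc:
        return classify_failure(exc)
```

Run on a schedule against each gateway's sandbox hostname, a result of `"untrusted_certificate"` would be an early signal that the provider has moved to a root the client does not yet trust. It only helps, of course, if the sandbox certificate changes before the production one does.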

We certainly want to do better and offer the best reliability we can. First Data did nothing wrong or unexpected and are not to blame; their actions were a typical and normal part of routine upgrades that should not cause problems for our merchants. We'll be taking a number of steps to improve and prevent further occurrences like this. We'll add improvements to our monitoring of publicly posted change notices from gateways. We'll also more aggressively test sandbox endpoints and ensure we maintain adequate test credentials (hopefully with some cooperation from the gateways our merchants rely on).

Thank you for trusting us with your business. As always, if you have any questions, we're here to help and would be happy to answer them.
Posted Mar 07, 2018 - 12:00 CST