Oliver Nassar

Stripe Webhooks Security Concern: Accidental Replay Attacks

February 16, 2015

An interesting edge-case came up with our Stripe application that we wanted to share with the community.

While building out the webhook functionality for AccountDock, we stumbled onto a bug that was causing our webhook code to be triggered twice.

Not a replay attack

This stood out to us because we already account for replay attacks.

As Stripe recommends:

We also advise you to guard against replay-attacks by recording which events you receive, and never processing events twice.

We do this through a relatively trivial process:

  1. We record the ID of each incoming event in a table
  2. Before running any webhook controller logic, we check that the event isn't already in that table
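The two steps above can be sketched roughly as follows, in the order a request actually runs them (check, then record). This is a minimal illustration, not AccountDock's code: it assumes Python with an in-memory SQLite database, and the `stripe_events` table, `make_db` and `handle_webhook` are hypothetical names. The check and the insert are two separate statements, which matters later in this post.

```python
import sqlite3
from typing import Callable

# Minimal sketch of the replay guard described above. Assumptions: SQLite (for a
# self-contained example) and a hypothetical `stripe_events` table; the post does
# not describe AccountDock's real schema. There is no UNIQUE constraint here.


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stripe_events (event_id TEXT)")
    return db


def handle_webhook(
    db: sqlite3.Connection, event: dict, controller: Callable[[dict], None]
) -> bool:
    """Run `controller` unless this event ID was already recorded.

    Returns True if the event was processed, False if treated as a replay.
    """
    # Check: has this event ID been seen before?
    seen = db.execute(
        "SELECT 1 FROM stripe_events WHERE event_id = ?", (event["id"],)
    ).fetchone()
    if seen is not None:
        return False  # treat as a replay and skip the controller logic
    # Record: store the event ID before running the controller logic.
    db.execute("INSERT INTO stripe_events (event_id) VALUES (?)", (event["id"],))
    db.commit()
    controller(event)
    return True


if __name__ == "__main__":
    db = make_db()
    receipts: list = []
    event = {"id": "evt_example", "type": "charge.succeeded"}  # hypothetical event
    print(handle_webhook(db, event, receipts.append))  # True
    print(handle_webhook(db, event, receipts.append))  # False
    print(len(receipts))  # 1
```

When the two requests arrive one after the other, this is enough: the second lookup finds the row written by the first.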

The edge-case

What happens when a Stripe account has connected to both our development and our production applications?

For example, they connected to our development Stripe application while building their platform, and then, once they went live with payments, connected to our production application to give us access to their live transactions.

Here's a visual of what they would see in their App Settings page if they did so:

Well, here's what happens (as our server sees it):

  1. If a transaction (e.g. a charge) occurs for a live customer, Stripe fires one webhook for that charge (because of the connection to our production Stripe application)
  2. If a transaction (e.g. a charge) occurs for a test customer, Stripe fires two webhooks for that charge: one for the development application and one for the production application (this is because a connection to a production application forwards webhooks for both test and live transactions, whereas a connection to a development application only forwards webhooks for test transactions)
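The delivery rule above can be written down as a small model. This is a toy sketch of the behaviour we observed, not Stripe's API: `webhook_deliveries` and the `"development"`/`"production"` labels are hypothetical, while `livemode` mirrors the boolean field Stripe includes on its event objects.

```python
# Toy model of the fan-out described above (as we observed it), not Stripe's API.
# `connected_apps` is a hypothetical set naming the platform applications an
# account has connected to: "development" and/or "production".


def webhook_deliveries(connected_apps: set[str], livemode: bool) -> list[str]:
    """Return which application connections receive a webhook for one event."""
    deliveries = []
    # A production-application connection forwards both test and live events.
    if "production" in connected_apps:
        deliveries.append("production")
    # A development-application connection forwards test events only.
    if "development" in connected_apps and not livemode:
        deliveries.append("development")
    return deliveries


both = {"development", "production"}
print(webhook_deliveries(both, livemode=True))   # ['production']
print(webhook_deliveries(both, livemode=False))  # ['production', 'development']
```

Only the second case, a test event on an account connected to both applications, produces two deliveries of the same event.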

Why this affected us

Our servers are set up to receive webhooks from both the development and production application connections at a single endpoint (e.g. https://accountdock.com/webhooks).

(This may be unique to us, so I'll come back to it later as a precondition for this edge case.)

The expected flow

Ordinarily, the two webhooks would arrive sequentially, and the second one would be rejected by our replay-attack check.

However, and this is a big however:
What happens if Stripe sends these webhooks so close together that they're effectively being made in parallel?

If the requests were made at even moderately different times (e.g. 10+ milliseconds apart), the following would happen:

  1. Our servers would receive the first webhook
  2. They would find out that the event is unique
  3. They would insert the event into our database as a precaution against replay attacks
  4. They would run our webhook controller logic

This would all happen before the next webhook was received, so that webhook would fail because it would be perceived as a replay attack.

However, that isn't what happened.
We've tested webhooks hundreds of times.
But this one time, something different happened.

(Close enough to) Parallel Stripe Webhooks

The webhooks were received so close together that the second one was checked before the insert query tracking the first (unique) event had finished. Because the second webhook's lookup ran while that insert was still in progress, our server concluded that the second webhook was also unique.
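The race can be reproduced deterministically in a few lines. This sketch assumes Python threads standing in for the two concurrent webhook requests and an in-memory set standing in for the event-tracking table; the `threading.Barrier` is an artificial device that holds each request after its lookup until both have looked, which is the window our slow-ish insert left open by chance.

```python
import threading

# Deterministic reproduction of the race. Assumptions: an in-memory set stands in
# for the event-tracking table, and two threads stand in for the two webhooks.

seen_events: set[str] = set()
processed: list[str] = []
processed_lock = threading.Lock()  # protects `processed` only, not the check
both_checked = threading.Barrier(2)  # forces both lookups to finish before any insert


def racy_handler(event_id: str) -> None:
    is_new = event_id not in seen_events  # check: both requests see "unique"
    both_checked.wait()                   # simulate the window while the insert runs
    if is_new:
        seen_events.add(event_id)         # record: too late for the other request
        with processed_lock:
            processed.append(event_id)    # controller logic, e.g. send a receipt


threads = [threading.Thread(target=racy_handler, args=("evt_dup",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(processed))  # 2: the controller logic ran twice for one event
```

In production nothing forces this interleaving; it only needs the second lookup to land before the first insert completes.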

Our insert queries are fast, but these webhooks were fired even faster (which makes sense: I'm sure Stripe has a robust queueing system for outbound webhooks).

This caused the controller logic to run twice, and the event to be logged twice in our database.

Implications

The application-level implications of this were fairly manageable:
A receipt was sent to a customer twice.

However, we were lucky.
If we had been relying on truly unique webhooks for something more vital (e.g. issuing a charge after a credit card was updated or a customer record was created), this could have resulted in duplicate charges that weren't immediately noticed.

The origin of this edge-case

This bug is an edge case for three reasons:

  1. Most Stripe accounts will only be connected to either a development application or a production application, not both
  2. Most Stripe applications will use different webhook endpoints for their development and production applications (I'll go into why we don't in a future post; it's not arbitrary, and it's really important for a synchronous testing environment)
  3. The time it took to insert the first Stripe webhook into our database was longer than the gap between Stripe's webhook calls (often, the time between Stripe's webhook calls is between 5 and 15 seconds). As mentioned, we've tested webhooks hundreds of times, and this has only ever happened once (that we know of)

Despite these three conditions, we were hit by it.

The Solution

The solution was actually pretty simple:

  1. Acquire a WRITE lock on the table that tracks your Stripe events before checking it for the incoming event
  2. Release the lock after the Stripe event has been successfully tracked and stored in your database
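Within a single process, the two steps above map onto a mutex. In this minimal sketch, `events_lock` (hypothetical) stands in for the WRITE lock on the tracking table, and both the lookup and the insert happen while it is held, so a second request can't read the table until the first has recorded the event. A mutex only serializes requests inside one process; if your webhook endpoint is served by several processes or servers, the lock has to live in the database instead (for example MySQL's `LOCK TABLES ... WRITE` and `UNLOCK TABLES`, if that is your database).

```python
import threading
from typing import Callable

# In-process analogue of the fix. Assumption: a single Python process, where the
# hypothetical `events_lock` plays the role of the WRITE lock on the events table.

seen_events: set[str] = set()
events_lock = threading.Lock()


def handle_event(event_id: str, controller: Callable[[str], None]) -> bool:
    with events_lock:                # 1. acquire the lock before the lookup
        if event_id in seen_events:  #    other requests wait here while it is held
            return False             #    already tracked: treat as a replay
        seen_events.add(event_id)    #    track the event
    # 2. the lock is released once the event is stored; controller runs unlocked
    controller(event_id)
    return True


receipts: list[str] = []
results: list[bool] = []


def worker() -> None:
    results.append(handle_event("evt_dup", receipts.append))


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [False, True]
print(len(receipts))    # 1
```

Keeping the controller logic outside the lock keeps the locked section as short as the insert itself.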

By locking your table for what will likely be less than 50 ms, you ensure that any other lookups on that table in the interim are held until the lock has been released, so a second copy of the same event will find the first one already recorded.
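The same idea at the database level might look like the following. This sketch assumes SQLite purely because it is self-contained (the post doesn't say which database AccountDock uses), and `stripe_events`, `setup` and `track_event` are hypothetical. SQLite has no per-table lock: `BEGIN IMMEDIATE` takes a write lock on the whole database file, and a second handler issuing the same statement waits (up to the connection's `timeout`) until the first commits. Unlike a MySQL WRITE lock, this doesn't block plain reads from other connections; it works here because every handler starts with `BEGIN IMMEDIATE` before its lookup.

```python
import os
import sqlite3
import tempfile
import threading

# Sketch assuming SQLite, with the database file in a temp dir for self-containment.
DB_PATH = os.path.join(tempfile.mkdtemp(), "events.db")


def setup() -> None:
    db = sqlite3.connect(DB_PATH)
    db.execute("CREATE TABLE IF NOT EXISTS stripe_events (event_id TEXT)")
    db.commit()
    db.close()


def track_event(event_id: str) -> bool:
    """Return True if this request is the first to record `event_id`.

    Controller logic would run only when this returns True, after the lock is released.
    """
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    db = sqlite3.connect(DB_PATH, timeout=5.0, isolation_level=None)
    try:
        # Take the write lock before the lookup; a concurrent handler blocks here.
        db.execute("BEGIN IMMEDIATE")
        seen = db.execute(
            "SELECT 1 FROM stripe_events WHERE event_id = ?", (event_id,)
        ).fetchone()
        if seen is None:
            db.execute("INSERT INTO stripe_events (event_id) VALUES (?)", (event_id,))
        db.execute("COMMIT")  # release the lock once the event is stored
        return seen is None
    finally:
        db.close()  # closing without COMMIT (e.g. after an error) discards the changes


setup()
results: list[bool] = []
workers = [
    threading.Thread(target=lambda: results.append(track_event("evt_dup")))
    for _ in range(2)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(results))  # [False, True]
```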

Why we think this case should be accounted for

As mentioned, this is an edge case, and AccountDock was the perfect storm for it. However, we'd advise anyone using Stripe to power their payments to take note of it (and even more so, those building Stripe applications).

The implications could be pretty damaging if this case is not accounted for, and considering the simplicity of the fix (a WRITE lock), we think it's worth it.