Oliver Nassar

You're not a robot, so why am I treating you like one?

April 16, 2011

First Impressions

When I meet someone new, I don't make the defensive assumption that they're going to steal from me. They might, but I don't want to start the relationship off like that.

Similarly, when you fill out a form, you're not trying to spam me. You might be (and in many cases, "you" are actually a "robot"), but the default case shouldn't be that you are.

Form-verification

Forms that do not require authentication (and even some that do) often require you to fill in a reCAPTCHA-inspired input. As you know, this input requires you to type out the text from an image, which is apparently challenging for a bot. I don't like this for a few reasons, the foremost being that weeding out the bots/automated scripts is my job, not yours. Semantically, you shouldn't have to care about that.

I've seen some creative approaches to alternative verification systems, but they all require user input to verify your non-robotness (aka humanness), and are therefore flawed.

My Workaround

I've been using the following approach for a few years, and while it requires an upfront cost to implement, it pays off in the long run (in theory; I have no analytics to back this up).

Here are the steps (a rough client-side sketch follows the list):

  1. You set up your form however you want it
  2. You use JS to override the submit event for the form
  3. You do whatever (if any) client-side validation you need
  4. Before passing to the server, you hit an endpoint (via Ajax) such as /ajax/token/
  5. Your endpoint opens a session, generates a token, and passes it back
  6. The callback for this Ajax call creates a hidden input (e.g. named token) whose value is the response from the endpoint; it then submits the form
  7. On the server side, before validating the form inputs, you ensure that the posted token equals the session's
  8. If it does, success. If it doesn't (or wasn't even posted), fail
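
To make the flow concrete, here's a minimal client-side sketch of steps 2 through 6, in TypeScript. The form id ("comment-form") and the hidden field name ("token") are illustrative assumptions, and it uses the modern fetch API where 2011-era code would have reached for XMLHttpRequest or jQuery; the /ajax/token/ path is the one from the list.

    // Minimal client-side sketch of steps 2-6; the form id and field name
    // are assumptions, not part of the original write-up.
    const form = document.querySelector<HTMLFormElement>('#comment-form');

    if (form) {
      form.addEventListener('submit', async (event) => {
        // Step 2: take over the native submit
        event.preventDefault();

        // Step 3: run whatever client-side validation you need here

        // Steps 4-5: ask the server for a token; the session cookie rides along
        const response = await fetch('/ajax/token/', { credentials: 'same-origin' });
        const token = await response.text();

        // Step 6: stash the token in a hidden input, then submit for real
        const input = document.createElement('input');
        input.type = 'hidden';
        input.name = 'token';
        input.value = token;
        form.appendChild(input);

        // form.submit() bypasses submit handlers, so this won't loop
        form.submit();
      });
    }

Calling form.submit() directly is deliberate: it skips the submit event, so the handler above doesn't fire a second time.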

Why this works

There are a few reasons this works:

  1. Many bots can't initiate sessions since they're cookie-based; doing so would open the bots up to potential security holes and, quite frankly, would just be a hassle to deal with from a disk perspective (e.g. when spamming millions of sites)
  2. Bots generally can't/don't run JS (although this is shifting)
  3. This gives you greater control: during the endpoint/token-generation phase, you can record details such as the browser, IP, and micro-time of the request, and use them to further secure your check (see the server-side sketch below)
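
And here's the matching server-side sketch: the token endpoint from step 5, plus the check from steps 7 and 8. It assumes Node with Express and express-session purely for illustration (any server-side session mechanism works the same way), and the extra details it records are the ones mentioned above; route and field names are, again, just assumptions.

    // Minimal server-side sketch of steps 5, 7 and 8, assuming Express and
    // express-session; route and field names are illustrative.
    import { randomBytes } from 'crypto';
    import express from 'express';
    import session from 'express-session';

    // Tell TypeScript what we stash in the session
    declare module 'express-session' {
      interface SessionData {
        token?: string;
        issuedAt?: number;
        ip?: string;
        userAgent?: string;
      }
    }

    const app = express();
    app.use(express.urlencoded({ extended: false }));
    app.use(session({ secret: 'change-me', resave: false, saveUninitialized: false }));

    // Step 5: open a session, generate a token, and record some extra details
    app.get('/ajax/token/', (req, res) => {
      const token = randomBytes(16).toString('hex');
      req.session.token = token;
      req.session.issuedAt = Date.now();       // the "micro-time" detail
      req.session.ip = req.ip;                 // who asked for the token
      req.session.userAgent = req.get('user-agent');
      res.send(token);
    });

    // Steps 7-8: compare the posted token against the session's before anything else
    app.post('/comments', (req, res) => {
      const posted = req.body.token;
      if (!posted || posted !== req.session.token) {
        return res.status(403).send('Token missing or invalid');
      }
      // Further secure the check if you like: reject tokens older than a few
      // minutes, or ones issued to a different IP or user agent.
      // ...then validate the actual form inputs and handle the submission.
      res.send('Thanks!');
    });

    app.listen(3000);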

While I'm sure there are ways to improve this method, I find it just as effective (again, no analytics) and far less obtrusive (virtually invisible) to the end user, and it gives me greater control.

And for some subjective proof (which is always the worst): when my blog was hosted on WordPress (the software, not the web service), it received over 500 spam comments (with reCAPTCHA turned off). This current one, which uses the outlined approach? None. The bots haven't cracked it yet :)

PS. Worth noting is the "lag" this introduces into the system. Yes, before submitting a form I'm firing an Ajax call, but the request and response are tiny, and generally only add 50-200ms (depending on whether you're running a flat codebase, a framework, or a library) to the total time. I believe users are fine with this, as they're expecting a delay when they submit a form anyway.