First Impressions
When I meet someone new, I don't make the defensive assumption that they're going to steal from me. They might, but I don't want to start the relationship off like that.
Similarly, when you fill out a form, you're not trying to spam me. You might be (and in many cases, "you" are actually a "robot"), but the default case shouldn't be that you are.
Form Verification
Forms that do not require authentication (and even some that do) often require you to fill in a reCAPTCHA-inspired input. As you know, this input asks you to type out the text from an image, which is apparently challenging for a bot. I don't like this for a few reasons, the foremost being that weeding out the bots/automated scripts is my job, not yours. Semantically, you shouldn't have to care about that.
I've seen some creative approaches to alternative verification systems, but they all require user input to verify your non-robotness (a.k.a. humanness), and are therefore flawed.
My Workaround
I've been using the following approach for a few years, and while it requires an upfront cost to implement, it pays off in the long run (in theory; I have no analytics to back this up).
Here are the steps:
- You set up your form however you want it
- You use JS to override the submit event for the form
- You do whatever client-side validation you need (if any)
- Before passing to the server, you hit an endpoint (via ajax) such as /ajax/token/
- Your endpoint opens a session, generates a token, and passes it back
- The callback on this ajax call creates a hidden input (e.g. named token) whose value is the response from the endpoint; it then submits the form
- On the server side, before validating the form inputs, you ensure that the posted token equals the session's
- If it does, success. If it doesn't (or wasn't even posted), fail (the client side is sketched just below)
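To make the flow concrete, here's a minimal client-side sketch in plain JavaScript. The form id and the plain-text token response are placeholders for illustration; the only parts the approach actually prescribes are overriding submit, fetching a token from something like /ajax/token/, injecting it as a hidden input, and then submitting for real.

// Minimal client-side sketch. Assumes a form with id="contact-form" and that
// /ajax/token/ returns the token as plain text; adjust for your own markup
// and response format.
document.getElementById('contact-form').addEventListener('submit', function (e) {
  e.preventDefault();            // override the default submit
  var form = this;

  // ...run whatever client-side validation you want here...

  // Hit the token endpoint; the server opens the session at this point
  fetch('/ajax/token/', { credentials: 'same-origin' })
    .then(function (response) { return response.text(); })
    .then(function (token) {
      // Create the hidden input carrying the token, then submit for real
      var input = document.createElement('input');
      input.type = 'hidden';
      input.name = 'token';
      input.value = token;
      form.appendChild(input);
      form.submit();             // native submit() skips this handler, so no loop
    });
});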
Why this works
There are a few reasons this works.
- Many bots can't initiate sessions since sessions are cookie-based; accepting cookies would open the bots up to potential security holes and, quite frankly, would be a hassle to deal with from a disk perspective (e.g. when spamming millions of sites)
- Bots generally (although this is shifting) can't/don't run JS
- This gives you greater control: in the endpoint/token-generation phase you can record details such as the browser, IP, and the time the token was generated, and use them to further secure your check (see the server-side sketch after this list)
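Here's what that looks like on the server, sketched in Node/Express with express-session so it stays in the same language as the client snippet above (a PHP session works the same way in spirit). The route names, the one-hour expiry, and the user-agent comparison are illustrative choices, not part of the core approach.

// Server-side sketch: token endpoint plus the pre-validation check.
const crypto = require('crypto');
const express = require('express');
const session = require('express-session');

const app = express();
app.use(express.urlencoded({ extended: false }));
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: false }));

// Token endpoint: open a session, generate a token, record some context
app.get('/ajax/token/', (req, res) => {
  req.session.token = crypto.randomBytes(16).toString('hex');
  req.session.tokenMeta = {
    ip: req.ip,
    userAgent: req.get('User-Agent'),
    issuedAt: Date.now()                 // the "time generated" detail
  };
  res.type('text/plain').send(req.session.token);
});

// Form handler: check the token (and the recorded context) before even
// looking at the rest of the form input
app.post('/contact', (req, res) => {
  const posted = req.body.token;
  const stored = req.session.token;

  const tokenOk = posted && stored && posted === stored;
  const contextOk = req.session.tokenMeta &&
    req.session.tokenMeta.userAgent === req.get('User-Agent') &&
    Date.now() - req.session.tokenMeta.issuedAt < 60 * 60 * 1000; // e.g. 1 hour

  if (!tokenOk || !contextOk) {
    return res.status(400).send('Sorry, something went wrong. Please try again.');
  }

  delete req.session.token;              // one-time use
  // ...validate and process the actual form fields here...
  res.send('Thanks!');
});

Comparing the stored user agent and timestamp against the actual request is what "further secure your check" means in practice: a token fetched by one client and replayed much later, or replayed by a different client, gets rejected before the form is ever processed.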
While I'm sure there are ways to improve this method, I find it just as effective (again, no analytics), less obtrusive to the end user (virtually invisible, in fact), and it gives me greater control.
And for some subjective proof (which is always the worst kind): when my blog was hosted on WordPress (the self-hosted software, not the web service), it received over 500 spam comments (with reCAPTCHA turned off). This current one, which uses the outlined approach? None. The bots haven't cracked it yet :)
PS. Worth noting is the "lag" this introduces into the system. Yes, before submitting a form I'm firing an ajax call, but the request and response are tiny, and generally only add 50-200ms (depending on whether you're running a flat codebase, a framework, or a library) to the total time. I believe users are fine with this, as they're expecting a delay when they submit a form anyway.