CAPTCHA and a proposed alternative, SAPTCHA

Introduction

(Skip to the next section if you are familiar with the concept of CAPTCHA.)
CAPTCHA stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart. [Wikipedia / Captcha]
Simply put, CAPTCHA is a set of methods commonly used to block automated account registration and spamming. The most common types of CAPTCHA present the user with an image that the user must recognize. Most likely you have already been tested by CAPTCHAs: they are those images of distorted and obstructed letters that you must enter into a text field to complete registration of an email account or to post a reply to a blog.

The name is somewhat of a misnomer. There is no such thing as an "automated Turing Test". While computers are not yet smart enough to pass a true Turing Test (with a human examiner), they are not smart enough to reliably tell computers and humans apart either. A computer can fool another computer into recognizing it as human; it takes intelligence to tell dumb automation and intelligence apart.

CAPTCHA suffers from many problems. First, it is often unethical: it unnecessarily discriminates against blind and otherwise visually impaired people.
Many sites offer audio as an alternative. While such a measure lets blind but hearing people in, it still discriminates against a smaller minority of impaired people and is thus still ethically unacceptable. Neither is it acceptable technically: it lets computers in (computers can recognize voice rather well, especially if trained on a specific voice or samples, unlike a human who hears such a CAPTCHA for the first time).

Secondly, CAPTCHA is not always very good at keeping spam away, because computer software can already, generally speaking, recognize letters as well as humans can, and a computer has plenty of advantages over a human. Often you see a misguided attempt at making a CAPTCHA harder: for example, low text-to-background contrast or a bad color combination does nothing to stop a computer but makes the text harder for a human to read. Often a human doesn't know how many letters should be there, and random lines may look like yet another distorted letter, whereas CAPTCHA-breaking software would know how many letters are supposed to be in this CAPTCHA and, when it detects more, can eliminate the least likely ones. Some letters in common fonts differ too little to be reliably recognized by a human when distorted (such as 0,O ; I,l,i,!,j ; vv,w and so on). Humans recognize heavily distorted letters in handwriting based on context, but letters in CAPTCHAs lack context.
All of the above, in combination, often results in a CAPTCHA that a computer can, in principle, recognize better than a human. Furthermore, a computer does not get tired and can keep trying even if it succeeds only once per ten attempts.
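
The effect of tireless retrying can be put into numbers. The sketch below is only an illustration: it assumes that every attempt succeeds independently with the same probability (an idealization), and the function names are made up for this example. It uses the once-per-ten-attempts figure from the paragraph above.

```python
import math


def chance_of_breaking(p_success: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_success) ** attempts


def attempts_needed(p_success: float, target: float) -> int:
    """Smallest number of tries whose overall success chance reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))


if __name__ == "__main__":
    p = 0.1  # assumed: the bot reads the image correctly once per ten attempts
    print(f"chance after 10 tries: {chance_of_breaking(p, 10):.2f}")
    print(f"tries needed for a 99% chance: {attempts_needed(p, 0.99)}")
```

Under these assumptions a bot that fails nine times out of ten still gets through about 65% of the time within ten tries, and a few dozen tries make success almost certain.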

Thirdly, CAPTCHA turns away undecided visitors and may very well result in a loss of revenue.

You can read the Wikipedia article linked above for a good overview of the problems with CAPTCHA.

But wait, I hear you think, how does it happen that CAPTCHAs 'work' even though the principle is flawed? Very simply. It takes effort to make a spambot that passes through a CAPTCHA; as long as different sites use different CAPTCHAs, it is not very useful for a spammer to break some specific CAPTCHA. Furthermore, once a CAPTCHA is broken, it can be replaced. CAPTCHA works by wasting time and money on software that forces the spammer to waste time and money, and hoping that the spammer runs out of time and money first; if you have a CAPTCHA you're mostly wasting other people's time rather than yours, so the deal seems far more viable than it really is.

SAPTCHA

SAPTCHA stands for Semi Automatic Public Turing Test to Tell Computers and Humans Apart.

The key concept is the same as with CAPTCHA: the user is presented with a test question or instructions and must give the correct answer to use the resource. The main difference is that the computer does not try to automatically generate "unique" test questions on each query; only verification of the answer is automatic. Instead, a unique test question and its answer[s] are set by the moderator or owner when SAPTCHA is installed, and are changed every time spamming happens.
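
As a rough illustration of "only verification is automatic", here is a minimal sketch. The class name Saptcha, the normalize helper and the choice to ignore case and extra spaces are all assumptions made for this example, not part of any existing forum software.

```python
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial typing differences pass."""
    return " ".join(text.lower().split())


class Saptcha:
    """A question written by the moderator plus the answers accepted for it."""

    def __init__(self, question: str, answers: Iterable[str]) -> None:
        self.question = question  # shown to every visitor unchanged
        self.answers = {normalize(a) for a in answers}

    def check(self, reply: str) -> bool:
        """The only automatic step: compare the reply with the stored answers."""
        return normalize(reply) in self.answers


if __name__ == "__main__":
    # Illustration only: a real site should invent its own question.
    test = Saptcha("Write 'i'm human' in reverse", ["namuh m'i"])
    print(test.check("  Namuh m'I "))
    print(test.check("i'm human"))
```

Nothing here generates questions; the question and its answers are plain data that the moderator types in and can swap at any time.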

SAPTCHA is proposed as a more accessible alternative to CAPTCHA that may replace CAPTCHA in services such as most blogs and forums. SAPTCHA works as a lightweight CAPTCHA.
The concept follows from the observation that there are many cases where automated generation of a unique test question or image does not add any security: a spammer does not need to pass the test more than once on the same forum or blog. Often, there's no human spammer interacting with the website at all [Indeed, every blog or site owner would love to believe that his site is so important that it is spammed personally, in a very weird way whereby, rather than just reading the image with his eyes, the spammer would write visual recognition software, but that's in fact not happening :-)]. In such cases, a static question can't be worse at stopping a bot than a dynamic one.
Human-generated questions have much broader diversity and are thus harder for a computer to answer. Parsing arbitrary sentences is an unsolved problem, unlike distorted letter recognition, which is a solved problem. It must also be noted that CAPTCHA itself is not really "completely automatic": a human has to write and maintain the CAPTCHA software and update it every time it is broken.

Example questions: the user is given an instruction like "write [no i'm not a computer!] in this text field" or "write 'i'm human' in reverse" or "write [or copy-paste] the web address of this page there". (Please don't use things too similar to these. There are no default questions and answers; think up something yourself. Don't try to be clever. It should be no more complex to understand and do than the rest of registration, and thus shouldn't decrease the website's accessibility(!). It's better if the answer is more than 1 character long.)
Bots can try to understand text written by a human in normal language (a very hard problem in AI), or try to guess (some delay between attempts can make that pointless), or try some common test answers if there are any (in which case common test questions and answers will disappear).
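
The delay mentioned above could look roughly like this. It is a sketch under assumptions: the class name GuessThrottle, the 30-second default and the idea of identifying a client by some string (for example its IP address) are all invented for illustration.

```python
import time
from typing import Callable, Dict


class GuessThrottle:
    """Allow one answer attempt per client every `delay` seconds."""

    def __init__(self, delay: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.delay = delay
        self.clock = clock  # injectable so the class can be tested without waiting
        self._last_attempt: Dict[str, float] = {}

    def allow(self, client_id: str) -> bool:
        """Return True if this client may submit an answer right now."""
        now = self.clock()
        last = self._last_attempt.get(client_id)
        if last is not None and now - last < self.delay:
            return False  # too soon: the guess is not even looked at
        self._last_attempt[client_id] = now
        return True
```

With a 30-second delay, trying 10,000 candidate answers from one client would take about three and a half days, which is plenty of time for a moderator to notice. A bot that switches addresses can get around a per-client delay, so this only raises the cost of guessing.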
The spammer has to answer the question manually to start spamming. This is exactly the same problem as with CAPTCHA at registration. As with CAPTCHA at registration, human intervention is necessary to stop spam: the account must be banned and, for SAPTCHA, the question must be changed.

In a way, SAPTCHA can be viewed as a lightweight, disposable CAPTCHA test that is cheap to replace when it gets compromised.

Comparison

Sample use scenarios

SAPTCHA
s.0) A normal user comes across your blog. If he can answer the question, he can post a reply, unless you made a bad question and/or instructions. If the user can't read your question, he probably can't read your blog either, so the SAPTCHA shouldn't make it less accessible.
s.1) A spammer's bot comes across your blog. No spamming happens. Bots can't understand human language yet.
s.2) A human spammer comes across your blog/forum, answers the question, registers an account, and possibly adds the answer and account to a spambot database or proceeds to spam manually. You are spammed. You'll have to take action manually to ban the spammer and stop the spam; you may also want to change the question if the spamming was done by a bot that "knows" the answer to the question.
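
The manual step in s.2 can be sketched as follows. This is a toy model, not real forum code: the class Forum, its method names and the sample questions are hypothetical.

```python
class Forum:
    """Toy registration guarded by one moderator-written question."""

    def __init__(self, question: str, answer: str) -> None:
        self.question = question
        self._answer = answer.strip().lower()
        self.accounts = set()
        self.banned = set()

    def register(self, name: str, reply: str) -> bool:
        """Automatic part: accept the account only if the reply matches."""
        if name in self.banned or reply.strip().lower() != self._answer:
            return False
        self.accounts.add(name)
        return True

    def handle_spam(self, spammer: str, new_question: str, new_answer: str) -> None:
        """Manual part: the moderator bans the account and retires the question."""
        self.accounts.discard(spammer)
        self.banned.add(spammer)
        self.question = new_question
        self._answer = new_answer.strip().lower()


if __name__ == "__main__":
    forum = Forum("What colour is a clear daytime sky?", "blue")
    forum.register("spambot1", "blue")  # a human spammer supplied the answer
    forum.handle_spam("spambot1", "How many days are in a week?", "seven")
    print(forum.register("spambot2", "blue"))   # the stored answer is now stale
    print(forum.register("alice", "Seven"))
```

Once the question is changed, the answer saved in the spambot database is worthless and the spammer has to come back in person.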

CAPTCHA
s.0) A normal user comes across your blog/forum. If he can see and the CAPTCHA is simple, he can post a reply with a small hassle, provided he doesn't have to pass the CAPTCHA every time he replies. If the CAPTCHA is "unbreakable" or uses bad colors, he will need several attempts, especially if he needs to pass it for every reply. This makes it much less likely that someone will reply to your posts at all. If the user is blind or otherwise can't see it, there is no way in.
s.1) A spammer's bot comes across your blog. You might get spammed if the bot can recognize the image (which is possible if you are using a popular CAPTCHA that has been broken).
s.2) A human spammer comes across your blog/forum. He can pass the test, register an account, and add it to a spambot database. You are spammed. It will take a moderator to ban the bot and delete the spam [assuming that spam filters alone don't suffice without CAPTCHA], so you still need human intervention from your side. As has been said before, if you asked users to pass the CAPTCHA for every message, it would be too annoying for normal users as well.

Comparison of SAPTCHA versus CAPTCHA features

Advantages of SAPTCHA over CAPTCHA:
  1. SAPTCHA software is much easier to implement or replace than CAPTCHA
  2. Textual SAPTCHA does not discriminate against disabled people who can use the internet. [Audio CAPTCHA plus visual CAPTCHA still discriminates against some people, plus the audio part is far easier for a computer to break]
  3. There are methods for breaking image-based CAPTCHAs. If you use a popular CAPTCHA, you may still get spammed by an entirely automatic bot. SAPTCHAs can be much more varied, and there won't be a common method of breaking them until it becomes possible for computers to interpret human instructions in normal human language, at which time we'd rather have to worry about things like Skynet.
Advantages of CAPTCHA over SAPTCHA (disadvantages of SAPTCHA):
  1. With SAPTCHA, when banning a spammer, the moderator must enter a new question and answer. With CAPTCHA, though, there's point 1 above (and CAPTCHA code won't remain useful forever either), so for websites that are not extremely popular it seems highly unlikely that CAPTCHA would save work even in the long run.
  2. If SAPTCHA is used to protect registration, it is easier to register many accounts at once with SAPTCHA than with CAPTCHA; this might or might not matter with popular email services.
  3. Verbal SAPTCHA may be problematic for multi-language resources that need frequent changes.
  4. When the resource is something like a photo gallery, a visual CAPTCHA is all right, as it doesn't contribute to inaccessibility.

Conclusion

SAPTCHA can be a viable alternative to CAPTCHA for web resources like forums and blogs, and in other situations where spammers cannot afford to target resources individually. With textual resources, SAPTCHA does not lessen the accessibility of the resource to disabled people.

It is suggested that forum and blogging software should offer support for SAPTCHA in addition to existing support for CAPTCHA, thus allowing the administrator to use SAPTCHA and switch to CAPTCHA only when and if SAPTCHA is found to be really inadequate in a given situation (which is expected to happen only on very popular web resources. How popular? Millions-of-users kind of popular). By its method of operation, SAPTCHA can give only limited protection against account registration abuse when the abuser is willing to solve the SAPTCHA and then run a bot that registers a great many accounts (e.g. to use email as storage), which would be prevented by a CAPTCHA on every registration.

Live example of a question

John had one thousand apples and five oranges. He ate as many of his apples as there are letters in the word "apple". Also he ate two bananas :-). How many apples does John have?
Your answer:
Another example: in a mathematical forum, you can ask what the square root of minus one is. This reduces not only spam but trolling as well.
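
For the apples question, the arithmetic a human does in his head can be written out as below. The function name and parameters are made up for illustration; the point is that a person writes this down in seconds after reading the question, while a bot would first have to work out from the sentence alone what to compute.

```python
def apples_left(start: int = 1000, word: str = "apple") -> int:
    """Apples remaining after eating one apple per letter of `word`.

    The oranges and bananas in the question are distractions and play no
    part in the calculation.
    """
    return start - len(word)


if __name__ == "__main__":
    print(apples_left())
```

A moderator setting up such a question would store the printed number as the accepted answer, perhaps together with its spelled-out form.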

If you are annoyed by CAPTCHA, think about alternatives and discuss the concept of SAPTCHA with others. Make the best meme win.
Another trick is at the bottom of this page: the email address is generated by JavaScript and is thus invisible to most email-harvesting bots. The same can be done for form submission. Most spam bots do not execute JavaScript.

(C) 2004..2013 Dmytry Lavrov.
Want to say something or ask some question? Contact:
