What is CAPTCHA?
The term CAPTCHA is a contrived acronym, meaning its wording was chosen deliberately so that it is apt for what it names. CAPTCHA stands for Completely Automated Public Turing [test to tell] Computers [and] Humans Apart. The term was first used by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford in 2003. At its core, a CAPTCHA is a form of challenge-response test. Sometimes, when trying to access certain portions of a website, you may encounter a CAPTCHA before you are granted access. A test is presented to you, and whether you get into the page is decided by your response to the challenge. Using this challenge, in theory, a CAPTCHA is able to determine whether you are a real, live meat-bag or actually a computer/bot posing as a human.
As the name says, CAPTCHAs are completely automated. This means that once one is built into a website, it’ll require little to no human maintenance or mediation to keep functioning properly. This makes CAPTCHAs an extremely cost-efficient form of cybersecurity.
The Turing test was proposed in 1950 by Alan Turing. Its purpose was to see whether a machine could be told apart from a human. To put it simply, a human evaluator views a conversation between a computer and a human. The conversation takes place over a text-based medium (think texting), and the evaluator cannot see either party. The evaluator knows only that one of the two is human. Following the conversation, if the evaluator can correctly choose which of the two is the computer, then the computer has failed the test. If the evaluator cannot distinguish between the two, then the AI has successfully fooled the evaluator. CAPTCHAs are sometimes considered a reverse Turing test, since the roles of human and machine are reversed: the judge is a computer, and the party being tested is expected to be a human.
By the Turing test’s standard, a computer that gives the factually right answer has not necessarily given the ‘correct’ one. For an answer to be considered ‘correct’, it must closely resemble something a human would say, to the point of indistinguishability. So CAPTCHAs are merely tests that we have rigged to our benefit. This means creating a test that needs little to no instruction on what to do or how to do it. People must know what to do as soon as they encounter it. It must also be quick and painless to take, or users won’t be happy. Most importantly, though, it must be extremely easy for pretty much every human who will encounter it, and nearly impossible for any AI that tries to crack it. The standard CAPTCHA demonstrates these principles beautifully.
The Go-To CAPTCHA
The standard version, the one that most likely comes to mind for most people, was invented in 1997. It’s a text-based CAPTCHA, and it is still widely used today. Sign-in and sign-up pages are often accompanied by an image containing a string of letters and/or numbers. Nearby will be a text-input box, along with a checkbox next to the statement “I am not a robot”. To pass the test, and ultimately ‘pass’ through the virtual guard station, you must correctly type out the string. The text in the image, however, has been distorted in ways that make the task a difficult one for computers.
To create such a gap in difficulty, the distorted text forces the use of three specific mental abilities. Invariant recognition is the ability to recognize a letter or number despite the vast variability in its physical shape. Segmentation, in this sense, is the ability to pick out letters that are crowded together and possibly overlapping other letters. Parsing is the most uniquely human ability of the three. This is the capacity to use context clues to identify letters. For example, viewed individually, a certain letter may look like an ‘m’, but within context it becomes clear that it is actually a ‘w’. Our minds’ flexibility and power of deduction make it all possible.
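The challenge-response flow behind a text CAPTCHA (the server picks a random string, shows it to you as a distorted image, then checks what you typed) can be sketched in a few lines of Python. This is only a rough illustration, not how any particular CAPTCHA service works: the names ALPHABET, new_challenge, and check_response are made up for this example, the choice to skip look-alike characters is an assumption, and the image distortion step is left out entirely.

```python
import hmac
import secrets
import string

# Hypothetical alphabet: skip look-alike characters such as 0/O and 1/I/L.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "0O1IL"
)


def new_challenge(length=6):
    """Return a random string that a server would render as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_response(expected, response):
    """Check the typed answer, ignoring case and surrounding whitespace."""
    cleaned = response.strip().upper()
    # Constant-time comparison, so timing does not leak how much matched.
    return hmac.compare_digest(expected.encode(), cleaned.encode())
```

In a real deployment the expected string would stay on the server (for example, in the user's session), and only the distorted image would be sent to the browser.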
While CAPTCHAs are undoubtedly great at their main job of protecting our personal information from the eyes of prying bots, they also picked up a side gig over at Google. In 2009, Google acquired a well-known deployment of CAPTCHA technology, cleverly named reCAPTCHA. Along with the normal security work, in 2011 Google began using reCAPTCHA to convert the archives of The New York Times and the books from Google Books to a digital format. Millions of books later (literally), quite possibly the sum of human knowledge is preserved for future generations. On top of that, you are owed a thank-you for the task: reCAPTCHA trains itself using the input that users give on its own tests.
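The original reCAPTCHA is generally described as pairing a word whose answer was already known with a scanned word that software could not read. If you typed the known word correctly, you were presumed human, and your reading of the unknown word was recorded as a vote. Here is a minimal Python sketch of that idea. The function names, the example words, and the three-vote threshold are all assumptions made for illustration, not details of Google's actual system.

```python
from collections import Counter


def grade(control_answer, typed_control, typed_unknown, votes):
    """Pass or fail the user on the control word.

    If they pass, count their reading of the unknown word as one vote.
    """
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        votes[typed_unknown.strip().lower()] += 1
    return passed


def accepted_reading(votes, min_votes=3):
    """Return the most common reading once it has min_votes, else None."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None
```

A caller would keep one Counter per unknown word, pass it to grade on every attempt, and treat the word as digitized once accepted_reading returns something other than None.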
CAPTCHAs keep you safe from bots patrolling for personal data. They are powerful and inexpensive, in terms of both money and resources. To demonstrate confidence in them as a legitimate form of security, the algorithms that create CAPTCHA tests must always be made public (though they may sometimes be protected by a patent). Doing so proves the concept: passing the test remains extremely difficult for an AI, and for hackers, even when they can simply reverse engineer the algorithm.
Further Reading:
- AlterEgo. “Everything You Need to Know about CAPTCHA - AlterEgo.” Medium, 2 Apr. 2018, medium.com/@cyberalterego/everything-you-need-to-know-about-captcha-f77d65c56d2c.
- “By Typing Captcha, You Are Actually Helping AI’s Training.” AP NEWS, 27 Nov. 2020, apnews.com/article/technology-technology-issues-digitization-spamming-artificial-intelligence-9e2aec49792c3a1e31c1f94f1a5e7ede.
- O’Malley, James. “Captcha If You Can: How You’ve Been Training AI for Years without Realising It.” TechRadar, 12 Jan. 2018, www.techradar.com/news/captcha-if-you-can-how-youve-been-training-ai-for-years-without-realising-it.
- “The Surprisingly Devious History of CAPTCHA.” Mental Floss, 21 June 2016, www.mentalfloss.com/article/81927/surprisingly-devious-history-captcha.