A great question, and at the original posting, an extraordinary discussion.


I can see several papers coming out of this.

Originally shared by Yonatan Zunger

A CS question that I don't know the answer to

A conversation on another thread raised an interesting question about computers that I can't figure out the answer to: Is judging a Turing Test easier than, harder than, or just as hard as passing a Turing Test?

I figured I would throw this question out to the various computer scientists in the audience, since the answer isn't at all clear to me -- a Turing Test-passer doesn't seem to automatically be convertible into a Turing Test-judger or vice-versa -- and for the rest of you, I'll give some of the backstory of what this question means.

So, what's a Turing Test?

The Turing Test is a method proposed by Alan Turing (one of the founders of computer science) to determine whether something has human-equivalent intelligence. In this test, a judge tries to engage both a human and a computer in conversation. The human and computer are hidden from the judge, and the conversation is over some medium which doesn't make it obvious which is which -- say, IM -- and the judge's job is simple: to figure out which is which. Turing's idea was that reliably passing such a test would be evidence that the computer is of human-equivalent intelligence.
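
For the programmers in the audience, the setup described above can be sketched as a small harness. This is only an illustration of the protocol, not anything from Turing's paper: the function name run_turing_test, the seat labels "A" and "B", and all of the toy participants are made up for the example, and a real judge would of course be far less trivial.

```python
import random
from typing import Callable, List, Optional, Tuple

Transcript = List[Tuple[str, str]]             # (question, answer) pairs for one seat
Respondent = Callable[[Transcript, str], str]  # history + new question -> reply


def run_turing_test(
    ask: Callable[[Transcript, Transcript], str],
    decide: Callable[[Transcript, Transcript], str],
    human: Respondent,
    machine: Respondent,
    rounds: int = 3,
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if the judge correctly names the machine's seat."""
    rng = rng or random.Random()
    # Hide the identities: the machine is placed in seat "A" or "B" at random.
    machine_seat = rng.choice(["A", "B"])
    seats = {"A": human, "B": human}
    seats[machine_seat] = machine
    logs = {"A": [], "B": []}
    for _ in range(rounds):
        # The judge sees both transcripts so far and picks the next question.
        question = ask(logs["A"], logs["B"])
        for seat in ("A", "B"):
            reply = seats[seat](logs[seat], question)
            logs[seat].append((question, reply))
    # The judge names the seat it believes holds the machine.
    return decide(logs["A"], logs["B"]) == machine_seat


# Hypothetical toy participants, only here to make the sketch runnable.
def toy_human(history: Transcript, question: str) -> str:
    return f"Hmm, '{question}' is a good one."


def toy_machine(history: Transcript, question: str) -> str:
    return "BEEP"


def toy_ask(log_a: Transcript, log_b: Transcript) -> str:
    return f"Question {len(log_a) + 1}?"


def toy_decide(log_a: Transcript, log_b: Transcript) -> str:
    # Accuse seat A if everything it said was "BEEP", otherwise seat B.
    return "A" if all(answer == "BEEP" for _, answer in log_a) else "B"
```

Calling run_turing_test(toy_ask, toy_decide, toy_human, toy_machine) returns True here because the toy machine gives itself away at once. In this framing, "passing" means supplying a machine that pushes the judge's success rate down toward a coin flip, and "judging" means supplying ask and decide functions that keep it well above one.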

Today in CS, we refer to problems that require human-equivalent intelligence to solve as "AI-complete" problems; so Turing hypothesized that this test is AI-complete, and for several decades it was considered the prototypical AI-complete problem, even the definition of AI-completeness. In recent years, this has been cast into doubt as chatbots have gotten better and better at fooling people, doing everything from customer service to cybersex. However, this doubt may or may not be warranted: another long-standing principle of AI research is that, whenever computers start to get good at a task that was historically considered AI, people redefine AI to be "oh, well, not that, even a computer can do it."

The reason a Turing Test is complicated is that to carry on a conversation requires a surprisingly complex understanding of the world. For example, consider the "wug test," which human children can pass starting from an extremely early age. You make up a new word, "wug," and explain what it means, then have conversations about it. In one classic example, the experimenter shows the kids a whiteboard, and rubs a sponge which he calls a "wug" across it, which (thanks to some dye) marks the board purple. Human children will spontaneously talk about "wugging" the board; but they will never say that they are "wugging" the sponge. (It turns out that this has to do with how, when we put together sentence structures, the grammar we use depends a lot on which object is being changed by the action. This is why you can "pour water into a glass" and "fill a glass with water," but never "pour a glass with water" or "fill water into a glass.") 

It turns out that even resolving what pronouns refer to is AI-complete. Consider the following dialogue:

Woman: I'm leaving you.
Man: ... Who is he?

If you're a fluent English speaker, you probably had no difficulty understanding this dialogue. So tell me: who does "he" refer to in the second sentence? And what knowledge did you need in order to answer that?

(If you want to learn more about this kind of cognitive linguistics, I highly recommend Steven Pinker's The Stuff of Thought [http://www.amazon.com/The-Stuff-Thought-Language-Window/dp/0143114247] as a good layman's introduction.)

In Turing's proposal, the test was always administered by a human: the challenge, after all, was to see if a computer could be good enough to fool a human into accepting it as another human. But given that we're getting computers which are doing a not-bad job at these tests, I'm starting to wonder: how good would a computer be at identifying other computers?

It might be easier than passing a Turing Test. It could be that a computer could do a reasonable job of driving "ordinary" conversation off the rails (that being a common way of finding weaknesses in a Turing-bot) and, once a conversation had gone far enough away from what the computer attempting to pass the test could handle, its failures would become so obvious that it would be easy to identify.
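
To make the "off the rails" idea concrete, here is a toy sketch of the scoring half of such a judge: after asking deliberately odd questions, it flags whichever participant falls back on stock or repeated replies. Everything in it is an assumption made for illustration (the STOCK_REPLIES list, the scoring rule, and the seat labels "A" and "B"), and it says nothing about how hard real judging actually is.

```python
from typing import List

# Hypothetical fallback phrases a weak chatbot might retreat to.
STOCK_REPLIES = {"i don't understand.", "tell me more.", "interesting."}


def evasion_score(answers: List[str]) -> int:
    """Count answers that are stock phrases or verbatim repeats of earlier ones."""
    seen = set()
    score = 0
    for answer in answers:
        normalized = answer.strip().lower()
        if normalized in STOCK_REPLIES or normalized in seen:
            score += 1
        seen.add(normalized)
    return score


def pick_machine(answers_a: List[str], answers_b: List[str]) -> str:
    """Return the seat ("A" or "B") with the more evasive answers; ties go to "A"."""
    return "A" if evasion_score(answers_a) >= evasion_score(answers_b) else "B"
```

A judge like this only catches bots that fail in this one particular way, which is part of why the question is open: a better bot would simply stop giving stock replies.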

It might be harder than passing a Turing Test. It's possible that we could prove that any working Turing Test administrator could use that skill to also pass such a test -- but not every Turing Test-passing bot could be an administrator. Such a proof isn't obvious to me, but I wouldn't rule it out.

Or it might be equally hard: either equivalent in the practical sense, that both would be AI-complete, or equivalent in the deeper mathematical sense, that if you had a Turing Test-passing bot you could use it to build a Turing Test-administering bot and vice-versa.

If there is a difference between the two, then this might prove useful: for example, if it's easier to build a judge than a test passer, then Turing Tests could be the new CAPTCHA. (That was Chris Stehlik's original suggestion, which sparked this whole conversation.)

And either way, this might tell us something deep about the nature of intelligence.
