by George Dvorsky
Anyone who plays video games knows that game
bots, artificially intelligent virtual gamers, can be spotted a mile
away on account of their mindless predictability and utter lack of
behavioral realism. Looking to change this, 2K Games recently launched the BotPrize competition,
a kind of Turing Test for nonplayer characters (NPCs). And remarkably,
this year's co-winner, a team from The University of Texas at Austin,
created an NPC so realistic that it appeared to be more human than the human players, which is kind of a problem when you think about it.
Neuroevolution
To
create their super-realistic game bots, the software developers, a team
led by Risto Miikkulainen, programmed their NPCs with pre-existing
models of human behavior and fed them through a Darwinian weeding-out
process called neuroevolution. Essentially, the only bots that survived
into successive generations were the ones that appeared to be the most
human — what the developers and competition organizers likened to
passing the classic Turing Test (an attempt to distinguish AIs from
actual humans).
With each passing generation, the developers re-inserted exact copies
of the surviving NPCs, along with slightly modified (or mutated)
versions, thus allowing for ongoing variation and selection. The
simulation was run over and over again until the developers were
satisfied that their game bot had evolved the desired characteristics
and behavior. And in fact, Miikkulainen and his team have been refining
their virtual player over the past five years.
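To make that process concrete, here is a minimal toy sketch of such an evolutionary loop in Python. It is not UT^2's actual code: the bot representation (a flat vector of network weights), the population sizes, and the judge_humanness placeholder (standing in for the human judges who scored the real bots in-game) are all assumptions made purely for illustration.

# Toy sketch of a neuroevolution-style loop, not the UT^2 implementation.
# A "bot" here is just a vector of neural-network weights.
import random

POPULATION_SIZE = 20
SURVIVORS = 5          # bots judged most human survive each generation
MUTATION_RATE = 0.1    # std. dev. of Gaussian noise added to weights
GENOME_LENGTH = 50     # number of weights per bot (arbitrary for this sketch)

def judge_humanness(genome):
    # Placeholder fitness: in BotPrize this score came from human judges
    # watching the bot play; here it is an arbitrary stand-in function.
    return -sum((w - 0.5) ** 2 for w in genome)

def mutate(genome):
    # Return a slightly perturbed copy of a surviving bot's weights.
    return [w + random.gauss(0, MUTATION_RATE) for w in genome]

# Start from random bots (the real bots were seeded with pre-existing
# models of human behavior rather than random weights).
population = [[random.random() for _ in range(GENOME_LENGTH)]
              for _ in range(POPULATION_SIZE)]

for generation in range(100):
    # Score every bot and keep only the ones that appear most human...
    ranked = sorted(population, key=judge_humanness, reverse=True)
    survivors = ranked[:SURVIVORS]

    # ...then refill the population with exact copies plus mutated variants.
    population = list(survivors)
    while len(population) < POPULATION_SIZE:
        population.append(mutate(random.choice(survivors)))

best = max(population, key=judge_humanness)
print(f"best humanness score after evolution: {judge_humanness(best):.4f}")

The key move, mirroring the description above, is that survivors are carried over unchanged while their mutated copies supply the variation for the next round of selection.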
Humanness
The
final manifestation of their efforts was dubbed UT^2 — and it was this
NPC that went head-to-head against human opponents and other game bots
at the 2K Games tournament.
And the game of choice? Unreal
Tournament 2004, of course. The game was selected on account of its
complex gameplay and 3D environments — a challenge that would require
humans and bots to move around in 3D space, engage in chaotic combat
against multiple opponents, and reason about the best strategy at
crucial moments. Moreover, the game tends to bring out some telltale human behaviors, including irrationality, anger, and impulsivity.
As each player (human or otherwise) worked to
eliminate their opponents, they were assessed for their
"humanness." By the end of the tournament, there were two clear winners,
UT^2 and MirrorBot (developed by Romanian computer scientist Mihai
Polceanu). Both NPCs scored a humanness rating of 52%, which would be all well and good except that the human players scored only 40%.
In other words, the game bots appeared to be more human than human.
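As a rough illustration of how such percentages can arise (the article doesn't detail BotPrize's exact scoring, so this is only an assumed scheme with made-up vote counts), a humanness rating can be read as the share of judgments in which a player was tagged as human:

# Assumed scoring scheme, not BotPrize's documented method: a player's
# humanness is the percentage of judgments that labelled them human.
def humanness(judgments):
    # judgments: list of booleans, True whenever a judge tagged the player as human
    return 100.0 * sum(judgments) / len(judgments)

bot_votes = [True] * 13 + [False] * 12    # 13 of 25 judgments said "human" -> 52%
human_votes = [True] * 10 + [False] * 15  # 10 of 25 judgments said "human" -> 40%
print(humanness(bot_votes), humanness(human_votes))   # 52.0 40.0

Under a scheme like this, a bot can out-score the humans simply because judges mislabel real people more often than they mislabel the bot.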
Limits of the Turing Test
Now,
this is a serious problem. Human players should have been assessed with
a humanness rating of 100%, not 40%. Clearly, the judges utterly failed
to identify true human characteristics among the human players. As a consequence, UT^2 and MirrorBot effectively achieved a rating better than 100%, which should be impossible. How can something be more human than the thing it's trying to emulate?
And indeed, this experiment
is a good showcase for the limits of the Turing Test. Admittedly, the 2K
Games tournament wasn't meant to be a true Turing Test, merely one that
measured the humanness of NPCs in a very specific gaming setting. That
said, the results demonstrated that human behavior is much more complex
and difficult to quantify than we tend to think. Human idiosyncrasies, plus our ability to adapt and counter-adapt to attempts at identification, will likely keep human behavior forever beyond the reach of a simple Turing Test.
For
example, given the implications of the 2K Games tournament, how are we
supposed to assess something like a chatbot for its humanness, now that we know something can apparently appear more human than actual humans? Moreover, given all the subjectivity involved in the evaluation,
how accurate is any of this?
Perhaps it's time to retire the Turing Test and come up with something a bit more... scientific.
http://io9.com/5947796/how-can-a-game-bot-score-higher-than-humans-on-a-turing-test