The three major mobile operating systems each have a digital voice assistant that helps users with various tasks, simple or complicated, throughout the day. But what happens when things get really tough?
Cortana, Google Now and Siri were each asked a series of highly specific and complex questions produced by a random query generator. The generator was built by two developers, Daniel and Aimee Hendrycks, as part of a test called PAAIST (Personal Assistant Artificial Intelligence Strength Test). Their results suggest that Cortana is seriously lacking when you want assistance with something less trivial than the weather.
Of the 60 questions put to each assistant using this generator, Microsoft's Cortana scored only 11.7%, Siri scored 25.8%, and Google Now led the race with a third (33.3%) of the available credit.
The questions were extremely specific and tough, such as "Sphere A, with a charge of 258 micro C, is located near another charged sphere B. Sphere B has a charge of 784 micro C, and is located 47.6 cm to the right of A. What is the force of sphere B on sphere A?" and "Zelda strikes a 0.436 kg golf ball with a force of 106 N and gives it a velocity of 3 m/s. How long was Zelda’s club in contact with the ball?"
The two developers have published the full list of questions, which appears below:
- What is the principal quantum number of selenium?
- Is radon an actinide?
- What will be the weather in two days?
- What will be the weather Sunday?
- How long is the movie Fanny and Alexander?
- What is the genre of the movie The Fighter?
- What is the genre of the show The Sopranos?
- Who is the director of the series John Doe?
- What is The University of Kansas’s rank in Higher Education Administration?
- What is UCLA’s rank in Finance?
- Find the limit of 5x/(1+x^3) as x approaches infinity.
- Determine the interior angles of a rhombus with side lengths 1 and 3.
- What is the special ability of the Pokemon Scrafty?
- What type or types is the Pokemon Victreebel?
- Which studio developed the game Mario Tennis?
- How long does it typically take to complete the game Dragon Age: Origins?
- What is the governmental leader of Greece?
- What is the median home value in Macedonia?
- What was the release date of Where the Wild Things Are?
- Who wrote The Girl with the Dragon Tattoo?
- What animals are related to anteaters?
- What is the diet of a cichlid?
- Where and when was Peter Norvig born?
- Who are relatives of Niels Henrik Abel?
- What is the midcareer salary of a Aircraft Maintenance Engineer (Structures)?
- What is the midcareer salary of a Fire Fighter?
- Where, near here, can I get a pineapple?
- How much vitamin D3 is in a margarine teaspoon?
- Give me directions to the nearest Chinese restaurant.
- Give me directions to the nearest bakery.
- What are ways to prevent typhoid fever?
- Does scarlet fever go away on its own?
- What is Commercial International Bank’s total assets?
- What is the YTM of a zero-coupon bond with a face value of $3000 a current price of 2500 and a maturity of 26 years?
- What is the cost of miracle grow?
- What is a complementary product for pots?
- What is “cat” translated to Lao?
- Did the Houston Texans win?
- Did the Dallas Cowboys win?
- Show me sad images related to Bleach, if available.
- Show me animated images related to Middle of the Earth in Ecuador, if available.
- Tell me an argument for and against the claim, “It is morally permissible to kill one innocent person to save the lives of more innocent people.”
- Tell me an argument for and against the claim, “Justice requires the recognition of animal rights.”
- How do people get alzheimer’s?
- How does mindfulness reduce depression?
- Why is gold considered valuable?
- Why are people against cloning?
- When is the next IMO test?
- When is the next Chicago Marathon?
- Whose epitaph reads Lived a philosopher died a Christian?
- According to the proverb which fruit tastes sweetest?
- Sphere A, with a charge of 258 micro C, is located near another charged sphere B. Sphere B has a charge of 784 micro C, and is located 47.6 cm to the right of A. What is the force of sphere B on sphere A?
- Zelda strikes a 0.436 kg golf ball with a force of 106 N and gives it a velocity of 3 m/s. How long was Zelda’s club in contact with the ball?
- Play me a song or piece in the genre K-pop.
- Play me a song or piece in the genre trad jazz.
- Bring me to a credible report describing that perceived price of an object affects the experience one has with the object.
- List the countries that allow autonomous vehicles.
- Advise me how to reduce my mortality rate given that I am a middle-aged woman.
- Advise me how to improve sustained attention and executive processing.
Given the difficulty of the questions, it's not surprising that none of the assistants could correctly answer even half of the queries. The developers have not revealed which questions each of the three assistants answered correctly.
The developers describe how the Query Generator works and how points are awarded for correct answers as follows:
The Query Generator randomly selects a type of question and from there it randomly selects attributes for that type of question. For example, the generator might select a geometry problem, then a rhombus problem, and then it will run the code.
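The selection process described above can be sketched roughly as follows. This is only an illustration, not the developers' code: the `TAXONOMY` table, its categories, and the `generate_query` function are hypothetical names invented here, and the templates are borrowed from questions in the published list.

```python
import random

# Hypothetical taxonomy: question type -> subtype -> (template, attribute options).
# The real generator's categories and attributes are not published in this article.
TAXONOMY = {
    "geometry": {
        "rhombus": (
            "Determine the interior angles of a rhombus with side lengths {a} and {b}.",
            {"a": range(1, 10), "b": range(1, 10)},
        ),
    },
    "pokemon": {
        "type": (
            "What type or types is the Pokemon {name}?",
            {"name": ["Victreebel", "Scrafty"]},
        ),
    },
    "sports": {
        "result": (
            "Did the {team} win?",
            {"team": ["Houston Texans", "Dallas Cowboys"]},
        ),
    },
}


def generate_query(rng):
    """Pick a question type, then a subtype, then random attributes for it."""
    category = rng.choice(sorted(TAXONOMY))
    subtype = rng.choice(sorted(TAXONOMY[category]))
    template, attributes = TAXONOMY[category][subtype]
    # Fill each placeholder in the template with a randomly chosen value.
    values = {name: rng.choice(list(options)) for name, options in attributes.items()}
    return template.format(**values)


if __name__ == "__main__":
    rng = random.Random()  # pass a seed for repeatable output
    for _ in range(3):
        print(generate_query(rng))
```

Passing the random number generator in as an argument keeps the sketch easy to seed, which would matter if the same set of questions had to be replayed to several assistants.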
Points are assigned as follows:
0%: Award a grand total of nothing if the AI answers incorrectly, does not answer, or cannot understand the question
75%: Possible only for relatively long answers. In this case, give 75% if the AI returns a correct paragraph, video, etc. with unnecessary information and without highlighting the relevant part (e.g., bolding text, skipping to the relevant part of the video, etc.). There must be a distinction between the answer and a list of links.
100%: Give full credit if it answers correctly and, if the response is relatively long, highlights or reads the relevant part
Sum the percentages and divide by the number of questions.
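As a minimal sketch of that arithmetic, assuming each answer is awarded exactly 0, 75, or 100 as listed above: the function name `paaist_score` is made up for this example, and because the per-question results were not published, the tallies below are just one of several hypothetical combinations that reproduce the reported figures.

```python
ALLOWED_AWARDS = {0, 75, 100}  # the three award levels listed in the rubric


def paaist_score(awards):
    """Sum the per-question percentages and divide by the number of questions."""
    if not awards:
        raise ValueError("at least one question is required")
    for award in awards:
        if award not in ALLOWED_AWARDS:
            raise ValueError(f"unexpected award: {award}")
    return sum(awards) / len(awards)


if __name__ == "__main__":
    # Hypothetical tallies over 60 questions; the real breakdown is unknown.
    google_now = [100] * 20 + [0] * 40
    siri = [100] * 14 + [75] * 2 + [0] * 44
    cortana = [100] * 7 + [0] * 53
    print(round(paaist_score(google_now), 1))  # 33.3
    print(round(paaist_score(siri), 1))        # 25.8
    print(round(paaist_score(cortana), 1))     # 11.7
```

Note that 25.8% of 60 questions works out to 15.5 questions' worth of credit, which is only possible if at least some answers received the 75% partial award.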
While the results may not surprise some, they offer reasonable insight into how well each of the three voice assistants can answer questions, and it's quite clear that the AI tech behind the assistants isn't quite there yet.