Passive voice activation is coming to Cortana

Cortana has been the subject of many conversations over the past two weeks as Marcus Ash, a program manager for the platform, has been spilling details about the personal digital assistant. What started with some confusion over whether Cortana might be heading to iOS and Android led to many more tweets about the assistant, and in the latest one, Ash revealed that voice activation is coming to the platform.

In a tweet, Ash said that as the team designs voice activation, one factor it has to keep in mind is ambient background noise. Ash was referring to the issue of some Xbox One commercials turning on viewers' Xbox One consoles when the ads are played.

It shouldn't come as a major surprise that Microsoft will bring this passive listening feature to Cortana. Currently, the only ways to launch Cortana are to hold the search button or to tap the live tile. Google has a similar feature, where saying 'Ok Google' launches its digital assistant, and Apple will add a similar feature to Siri in iOS 8, albeit with limitations. Of course, Xbox One already has this feature too.

Since Ash is talking about the feature publicly, it appears to be in development, but there is no word yet on when it will arrive on your device.

Source: Marcus Ash

20 Comments

Does that mean they are thinking about adding the feature, or that they are thinking about how to avoid accidental activation of it? I hope it means they're working on it; it'd be great, especially with Google and Siri (in iOS 8) both having it.

I am concerned about security, though. Will it be able to recognize your voice well enough? Otherwise anyone will be able to do things with your phone.

Both: the ability to reject ambient noise while still detecting real commands is the part they're talking about refining here.

Starting a question with 'Cortana' should be enough, as in 'Cortana, what is the weather like?' And if you use the word 'Cortana' mid-sentence, the app will start a response and cancel it once it recognizes that no question was asked.
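The rule proposed above can be sketched in a few lines. This is a hypothetical illustration, not how Cortana actually works: it assumes speech has already been transcribed to text, and the opener word list (`REQUEST_OPENERS`) and function name (`extract_request`) are invented for the example. Only words after the wake word count, and if they don't open a question or command, the tentative response is cancelled.

```python
import re
from typing import Optional

# Hypothetical wake word and opener list, for illustration only; a real
# assistant would use a trained language model rather than a word list.
WAKE_WORD = "cortana"
REQUEST_OPENERS = {
    "what", "what's", "when", "where", "who", "why", "how",
    "can", "will", "should", "remind", "call", "play", "set",
}


def extract_request(utterance: str) -> Optional[str]:
    """Return the request addressed to the assistant, or None to cancel.

    Only the words *after* the wake word are considered. If they do not
    open with a question or command word, the tentative response is
    cancelled.
    """
    words = re.findall(r"[a-z']+", utterance.lower())
    if WAKE_WORD not in words:
        return None  # wake word never heard: stay idle
    after = words[words.index(WAKE_WORD) + 1:]
    if not after or after[0] not in REQUEST_OPENERS:
        return None  # name heard, but no request followed: cancel
    return " ".join(after)


if __name__ == "__main__":
    for text in ["Cortana, what is the weather like?",
                 "Is Cortana the new voice assistant?"]:
        print(repr(text), "->", extract_request(text))
```

With this rule, "Cortana, what is the weather like?" yields the request "what is the weather like", while "Is Cortana the new voice assistant?" is ignored because nothing after the name opens a request. A word list like this is brittle (it deliberately leaves out "is" so that "Cortana is great" doesn't trigger), which is part of why the real problem is harder than it looks.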

It is annoying if an Xbox or Cortana commercial is on, but then again it is the only way for Microsoft to show off how it works. If they change the voice command for the commercial, it will only confuse future owners and make it seem less cool on TV.

One solution would be voice recognition, i.e., having Cortana / Xbox respond only to your voice.
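Responding only to the owner's voice is usually framed as speaker verification: compare a fixed-length "voiceprint" of the owner with one computed from the new utterance and accept if they are similar enough. The sketch below assumes those embeddings come from some speaker-embedding model, which is not shown; the toy 3-dimensional vectors, the function names, and the 0.8 threshold are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical threshold; raising it rejects more strangers (security)
# but also rejects the owner more often (annoyance).
ACCEPT_THRESHOLD = 0.8


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enroll(samples: List[Sequence[float]]) -> List[float]:
    """Average several embeddings of the owner's voice into one voiceprint."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def is_owner(voiceprint: Sequence[float], utterance: Sequence[float]) -> bool:
    """Accept the utterance only if it is close enough to the voiceprint."""
    return cosine(voiceprint, utterance) >= ACCEPT_THRESHOLD
```

The threshold is where the security concern raised earlier in the thread shows up: no setting makes both false accepts and false rejects zero.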

That's asking for a lot of extra complexity at v1 for people who don't want to use the name Cortana.

Like me. Because I think the name is stupid. ;P

Though Samsung managed to allow custom trigger words for S Voice, so I guess it's a question of implementation. Meanwhile, Android's "Ok Google" barely has international support.

Joshie said,
That's asking for a lot of extra complexity at v1 for people who don't want to use the name Cortana.

Like me. Because I think the name is stupid. ;P

Though Samsung managed to allow custom trigger words for S Voice, so I guess it's a question of implementation. Meanwhile, Android's "Ok Google" barely has international support.

Didn't they find code in Cortana that suggests room for different personalities? An important step in having your own assistant is being able to give it its own name. But that is much further down the line. At the moment it's all about promoting VAs by giving them one (sassy) personality that they can market. Naturally, not everyone will like the personality or name of this VA...

Sir Topham Hatt said,
But then when you're discussing Cortana with your friend, with your phone in your pocket, what happens then?

Needs to be a bit more than just that.

Can you give an example of an exact sentence? I already described how it could work in real-life circumstances, so I'd like to hear a sentence that would break the rules I described. Otherwise I can't really answer your question; I would just be guessing at what you mean.

Ronnet said,

Didn't they find code in Cortana that suggests room for different personalities? An important step in having your own assistant is being able to give it its own name. But that is much further down the line. At the moment it's all about promoting VAs by giving them one (sassy) personality that they can market. Naturally, not everyone will like the personality or name of this VA...

Much, much further down the line. We still can't even choose whether the voice is male or female on any of these platforms. Android offers male/female and regional dialect settings in its accessibility menu...which have absolutely no effect whatsoever on regular use cases. Map navigation voice cues? No effect. Ok Google voice responses? No effect.

This level of customization has been so slow to come that I doubt we'll see it even in the next five years. Easily ten. These product roadmaps are totally sold on the studies that show yada yada type of voice is the 'best' to use for whatever language. So for the last and the next 10 years, nothing changes.

Joshie said,

Much, much further down the line. We still can't even choose whether the voice is male or female on any of these platforms. Android offers male/female and regional dialect settings in its accessibility menu...which have absolutely no effect whatsoever on regular use cases. Map navigation voice cues? No effect. Ok Google voice responses? No effect.

This level of customization has been so slow to come that I doubt we'll see it even in the next five years. Easily ten. These product roadmaps are totally sold on the studies that show yada yada type of voice is the 'best' to use for whatever language. So for the last and the next 10 years, nothing changes.

Are you serious? Cortana is only 2 months old... and you're complaining that progress is slow? I'm surprised they've already pushed as many upgrades as they have. A year from now, Cortana will be very different from what she is today. Whether that includes customization will depend on how they want to market the app. But if they do go for customization, I expect them to deliver it within the next two years easily.

Ronnet said,

Are you serious? Cortana is only 2 months old... and you're complaining that progress is slow? I'm surprised they've already pushed as many upgrades as they have. A year from now, Cortana will be very different from what she is today. Whether that includes customization will depend on how they want to market the app. But if they do go for customization, I expect them to deliver it within the next two years easily.

I'm talking about the voice industry as a whole. The pace of one player in a short period of time absolutely in no way indicates they're going to maintain that pace. My point was that this has been a struggle across the industry. Microsoft isn't magically immune to this. They're undoubtedly partnering with other players.

Joshie said,

I'm talking about the voice industry as a whole. The pace of one player in a short period of time absolutely in no way indicates they're going to maintain that pace. My point was that this has been a struggle across the industry. Microsoft isn't magically immune to this. They're undoubtedly partnering with other players.

Ah OK, I guess we weren't talking about exactly the same thing then. But could it be that the industry as a whole simply didn't feel the need to customize?

Right now the most important aspect is improving the quality of the service with regard to response rate and accuracy. There is so much to gain there that nobody is considering simpler stuff like customizing the personality of the service.

It seems to me Google isn't even interested in offering a personality, and Apple likes giving Siri one specific reputation. Microsoft is the new kid on the block, so they could fuel the fire by competing on customization (partly to hide the fact that they're still behind on quality).

Ronnet said,

Ah OK, I guess we weren't talking about exactly the same thing then. But could it be that the industry as a whole simply didn't feel the need to customize?

Right now the most important aspect is improving the quality of the service with regard to response rate and accuracy. There is so much to gain there that nobody is considering simpler stuff like customizing the personality of the service.

It seems to me Google isn't even interested in offering a personality, and Apple likes giving Siri one specific reputation. Microsoft is the new kid on the block, so they could fuel the fire by competing on customization (partly to hide the fact that they're still behind on quality).

I think, like many sciences and areas of research, priorities differ from group to group. It's hard to say what's most important across the board, but it's a little easier to say what the biggest challenges are, even if they aren't all getting much attention yet.

Voice synthesis and voice recognition are pretty independent of each other in terms of R&D. Personal assistants like Siri and Cortana are just examples of where these two fields meet. Voice synthesis is going to get more focus in the accessibility field and in software design research for people with, say, impaired vision.

Voice recognition is broad and can be integrated into products that have no need for synthesis. It's also dramatically more complex. Synthesis can be "good enough" and, like someone just beginning to learn a new foreign language, still get the idea across to the listener for successful communication.

Recognition doesn't have that much wiggle room. "Getting the idea across" isn't good enough for users, and might not even be possible. A 20% recognition success rate would probably produce completely unintelligible output. For any viability at all, 90-95% or better is the minimum acceptable success rate to keep people from rage-quitting the product entirely. And even that might not be good enough--software hasn't figured out how to say "You wanted to jump over the what?" or "What was the last word you said?" So correcting misinterpretations is tedious and not quite "magical".
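Recognition accuracy is commonly measured as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the reference length. If the comment's "success rate" is read as word accuracy (1 - WER), which is an interpretation rather than something the comment states, its 90-95% floor means a WER of at most 5-10%. A minimal sketch using a word-level edit distance:

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / len(reference).

    Computed with a word-level Levenshtein (edit distance) table.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("jump over the lazy dog", "jump over the lazy frog")
    print(f"WER = {wer:.0%}, word accuracy = {1 - wer:.0%}")  # WER = 20%, word accuracy = 80%
```

One misheard word in a five-word command already drops word accuracy to 80%, well below the floor the comment describes, which illustrates why short voice commands leave so little room for error.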

And don't even get me started on accents and speech impediments. Internationalization is a massive challenge, and from a business perspective, it's hard to convince the decision makers that the UK English setting needs a way to handle Norwegian accents if it's going to be the only English in a "Europe" package. Intelligently recognizing different accents from the same room of users is depressingly far away from us.

So compared to recognition, synthesis is not only a separate field of research, but a dramatically less complex one where, if the challenges aren't fewer in number, they're smaller in scope. Gender and variations require little in the way of intelligence, and don't care anything about the unique characteristics of the user. Such a level of customization would just be for the sake of vanity and differentiation--stuff your marketing department would ask for, and not a "real" challenge.

It should be noted that Google has a perfectly good male voice synthesis option in Android's accessibility settings menu. It sounds really good, especially if you download the HQ package. I was actually stunned by how natural it sounds compared to where we were just a few years ago. Why they don't allow your accessibility setting to apply to any of Google's apps, but only to accessibility-related functions (read the screen, etc), is baffling to me, and part of why I suspect that there's a willful dismissal of it as an option.

"Is Cortana the new voice assistant?" <-- it's a question, and Cortana won't be listening to what comes before her name.
You're not directly asking her a question if you're asking your friend that.

The newer Qualcomm chips already out in the wild support this kind of thing (low-power listening; see the Sensor core API), so it really was just a matter of time.
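The comment doesn't describe how those chips work internally, so what follows is only a generic sketch of a common idea behind always-on, low-power listening. A very cheap check, here per-frame RMS energy, runs constantly, and the expensive keyword detector is woken only when something loud enough to be speech arrives. The function names, the threshold, and the `keyword_detector` callback are hypothetical stand-ins. An energy gate alone does not reject a TV commercial; that is left to the second stage.

```python
import math
from typing import Callable, Iterable, List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def two_stage_listen(frames: Iterable[Sequence[float]],
                     energy_threshold: float,
                     keyword_detector: Callable[[Sequence[float]], bool]) -> List[int]:
    """Return indices of frames where the keyword detector fired.

    Stage 1 (cheap, always on): skip frames quieter than energy_threshold.
    Stage 2 (expensive): run keyword_detector only on frames that pass.
    """
    hits: List[int] = []
    for i, frame in enumerate(frames):
        if frame_rms(frame) < energy_threshold:
            continue  # near-silence: keep the expensive stage asleep
        if keyword_detector(frame):
            hits.append(i)
    return hits
```

The battery saving comes from how rarely stage 2 runs. In a quiet room most frames never reach it, which is why dedicated low-power hardware for stage 1 matters for the battery concern raised below.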

Yes, I hope it can be turned off. While I would dearly LOVE the feature, I'm absolutely certain it will be a huge drain on the battery.