Microsoft patents AI device with fisheye camera and multiple microphones

Microsoft's latest patent showcases an ambient capture device that includes a fisheye camera and a microphone array. The system features a much less complex design than similar tech today.

Microsoft has often highlighted the importance of AI in the recent past, even predicting that its various applications will contribute $5 trillion to global GDP growth in the next few years. Many of the company's partnerships in the past couple of months have also centered on the same field.

Now, a new Microsoft patent published this month has emerged, showcasing an AI system with a single fisheye camera and multiple microphones.

Fisheye cameras have special lenses that enable them to monitor a full 360-degree view. Although the technology isn't new by any means, Microsoft plans to combine it with ambient capture devices in a different manner. Such ambient capture systems typically have multiple cameras to expand their field of view. This requires complicated designs and also creates the need for additional hardware to fuse multiple audiovisual streams of data.

As such, Microsoft has proposed pairing these devices with an upward-facing fisheye camera and a microphone array. This would resolve the difficulties associated with capturing moving objects, or focusing on the movement of a single object among multiple targets. The patent describes several options for hardware placement. For example, the majority of microphones may be placed in a circular or hexagonal pattern, with one more placed at the center point in the same plane. Similarly, it is desirable for the fisheye camera to be located in close proximity to the microphone array, as well as to a floor or table surface, so as to capture data in an optimal manner.
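The microphone layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `mic_positions`, the default of six outer microphones, and the 5 cm radius are assumptions made for the example, not figures from the patent.

```python
import math


def mic_positions(n_outer=6, radius_m=0.05):
    """Return (x, y) coordinates in meters for a planar microphone array.

    One microphone sits at the center and n_outer microphones are spaced
    evenly on a circle around it. With n_outer=6 the outer microphones
    form a regular hexagon. Both defaults are illustrative assumptions.
    """
    positions = [(0.0, 0.0)]  # center microphone
    for k in range(n_outer):
        angle = 2 * math.pi * k / n_outer
        positions.append((radius_m * math.cos(angle), radius_m * math.sin(angle)))
    return positions


if __name__ == "__main__":
    for x, y in mic_positions():
        print(f"{x:+.4f} m, {y:+.4f} m")
```

Because every microphone shares the same plane, only two coordinates are needed; a camera placed near the center of this layout would see roughly the same scene the array hears.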

A fusion model may also work as part of this system, using deep learning algorithms to gain better intuition regarding the audio and visual data it gathers. As an example, a long short-term memory (LSTM) recurrent neural network may be deployed. By its nature, this kind of network can store contextual and historical information, making any analysis it performs more useful for future use cases as well.
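To make the idea of an LSTM "storing" context more concrete, here is a minimal, untrained sketch in plain Python. It is not the patent's model: the weights are random placeholders, the names `lstm_step` and `fuse_sequence` are hypothetical, and joining audio and video features by simple concatenation is an assumption made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_weights(input_size, hidden_size, seed=0):
    """Random placeholder weights for the four LSTM gates (untrained)."""
    rng = random.Random(seed)
    cols = input_size + hidden_size
    return {
        gate: {
            "W": [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                  for _ in range(hidden_size)],
            "b": [0.0] * hidden_size,
        }
        for gate in ("input", "forget", "output", "candidate")
    }


def affine(layer, vec):
    """Compute W @ vec + b for one gate."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(layer["W"], layer["b"])]


def lstm_step(weights, x, h_prev, c_prev):
    """One LSTM time step: returns the new hidden state and cell state."""
    z = x + h_prev  # concatenate current input with previous hidden state
    i = [sigmoid(a) for a in affine(weights["input"], z)]
    f = [sigmoid(a) for a in affine(weights["forget"], z)]
    o = [sigmoid(a) for a in affine(weights["output"], z)]
    g = [math.tanh(a) for a in affine(weights["candidate"], z)]
    # The cell state keeps part of its old value (forget gate) and adds
    # new information (input gate); this is how history is carried forward.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def fuse_sequence(audio_frames, video_frames, hidden_size=4):
    """Run concatenated audio and video features through the LSTM in order."""
    input_size = len(audio_frames[0]) + len(video_frames[0])
    weights = make_weights(input_size, hidden_size)
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for audio, video in zip(audio_frames, video_frames):
        h, c = lstm_step(weights, list(audio) + list(video), h, c)
    return h
```

Calling `fuse_sequence([[0.1, 0.2], [0.3, 0.4]], [[0.5], [0.6]])` returns a four-element summary vector that depends on every frame seen so far, which is the property the article refers to as storing contextual and historical information.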

*AI system with capture device and cloud server*

Based upon the described model, it is noted that the device's uses could be further expanded, such as to recognizing speech from an identified human speaker. This essentially means that the tech may be incorporated into digital assistants like Cortana as well. Microsoft notes that the described setup can enable digital assistants to record richer information about their environment, while simultaneously recognizing and acting upon the higher-quality sound and video cues received from users with relative ease. The fisheye camera and microphone array would serve as an integral part of this application. Data collected from these would be stored locally in compressed form, then passed to an AI cloud server, where the aforementioned LSTM models could carry out further analysis or fusion.

However, the system's deployment is not limited to digital assistants. Any of the described techniques may well allow it to work with PCs, tablets, mobile phones, and more. Either way, as interesting as it sounds, there's no guarantee Microsoft plans to introduce such a device any time soon, if at all.