This year’s Baidu World was aimed at showing how the company’s tech will “Bring AI to life”, and the video and voice recognition demonstrated could start to have a significant impact on how we interact with what now seem like fairly traditional services: maps and online video. DuerOS 2.0 was announced, just four months after the initial system was launched in July. Third-party developers are launching five DuerOS-powered devices a month.
In his opening keynote, Baidu CEO Robin Li announced that every day there are around 218.8 billion uses of the Baidu Brain, the central hub that handles AI tasks such as natural language processing and voice recognition. The repeated message given throughout the day was that we live in a complicated world and Baidu is striving to simplify it for us with a vast array of products. As DuerOS continues to improve and more of Baidu’s AI functions are being shared with partners via its open platform—over 80 capabilities being used by 370,000 partners—Baidu’s AI will spread, though most announcements were still from Baidu’s own departments.
Conference goers were so keen to get a vision of this easy life that a scuffle even erupted at the doors after seats ran out inside.
AI and Video
The AI analysis of video and DuerOS TV capabilities demonstrated at Baidu World were perhaps the most intriguing of the day’s announcements. Baidu’s online video portal iQiyi wants to become a “large-scale entertainment company that is powered by creative technology,” announced Gong Yu, CEO of iQiyi. “Baidu understands entertainment even better” was one of the slogans that actually held water. Baidu is using AI to analyze content and work out why hits succeed. That will shape how it works with content creator partners to generate more hits, and it will affect viewers too.
The example given was an AI analysis of an episode of The Rap of China, a smash hit remake of South Korea’s Show Me the Money TV rap competition. Run the video plus the rest of Baidu and iQiyi’s data through the analysis and the show becomes searchable by lyrics, song names and participants. A heat map of the whole show is also generated from user interaction, allowing show producers to edit highlights rapidly and giving viewers a way to skip to the best bits.
Agonizing conversations of “What’s his name? You know, he was in that show with that other man from that other show” may be about to disappear forever.
TVs equipped with Baidu’s DuerOS—a system of hardware and software that brings AI capabilities to devices and conversation-based interaction with users—will offer similar functions. The demo showed a viewer talking to his TV while watching Wolf Warrior 2. “Who’s the actor on the left?” he asks and the film pauses, a box appears around the face of the actor on the left and an infobox inset pops up with his name, which the TV also reads out. Later he asks to go to a scene with tanks. The film jumps to the scene and plays it. He asks what the music is. “Tank Ballet,” the interface tells him, and the viewer asks for the track to be saved to his favorites. He then asks to see all the scenes that include a certain actor and the interface goes to a panel of clips with him in.
Robin Li announced that Baidu is going to work with authorities to bring intelligent transportation to the Xiong’an New Area, which is to be built 100 km southwest of Beijing. Baidu is also partnering with bus manufacturer King Long to mass-produce driverless buses by July 2018.
Li also demonstrated a facial recognition device that can be fitted to truck cabs. The camera scans the driver’s face for signs of fatigue and will play loud dance music if the driver starts to fall asleep, similar to the CarRobot device that anyone can install. Trying to hide drowsy eyes behind sunglasses will not fool it.
Apollo, Baidu’s open source autonomous driving platform, will become more integrated with other devices as the company’s AI follows users around. If you’re watching a TV show or listening to a certain artist at home, go out to your car and the entertainment will pick up where you left off. And tell your smart speaker to turn on the car’s air conditioning before you head outside.
Baidu’s mapping functionality will improve the more it gets to know you as a user. By building up an ever greater picture of your life (“Baidu understands you better” and “Baidu has been getting to know you for 17 years” were some of the slightly Orwellian slogans popping up throughout the day), the map will use context to better understand what a user wants.
The app accepts voice commands and will now also accept voice requests while already navigating a journey. Saying “Xiaodu xiaodu” (“小度小度”) wakes the voice recognition. If the request is for a restaurant along the way, the map will already have plotted a route, and when you arrive and head indoors it will be able to continue navigating you right to the very building. And it will already know whether, for example, your trip to the Kerry Center in Beijing (where part of Baidu World was held) is for work or as a visit and will offer information accordingly, such as whether your beloved Starbucks is nearby.
The voice recognition system used by DuerOS has been trained with over 2,000 hours of recordings to better identify whether the ambient noise around a user speaking to the device is that of a car and whether the car is moving.
Voice Recognition, Singing and Music
Baidu has been working on voice for seven years and has highly accurate voice recognition and semantics, particularly in Mandarin, though other Chinese languages are not far behind. Emerging developments include the detection of user age, gender, and mood. Baidu has signed a strategic partnership with Qualcomm to build chips for the DuerOS developer kits that are embedded in products.
The system has been tweaked so that users can speak any way they want and don’t have to affect a certain style just for speaking to devices. Speak quickly to a DuerOS device and it will recognize that you are in a rush and speak back to you more quickly. The time to respond is now 1.4 seconds, quicker than competitor systems. The new software has also improved speech synthesis to give its voice a much more human sound.
If it detects that the person speaking is a child, different content will be suggested in the form of search results, with an emphasis on visuals and educational AR.
Song recognition is now integrated. Sing a song and it will try to work out what it is and start playing it for you. Users can initiate voice searches for track names, lyrics, artists, genres and song language. Devices can integrate multiple music streaming clients, which allows a user to tell a speaker to play tracks by a list of different artists and have it curate a playlist.
DuerOS will become more powerful as more devices around the home are integrated. Ask your speaker when the next Liverpool match is and it will tell you, then ask if you want a reminder of when it’s on and even switch on your TV for you. Baidu Waimai and Ele.me are also developing skills that let DuerOS devices take your takeaway food orders by voice.