The recent actions against ZTE have acted as a catalyst for China’s chipmaking and AI sectors. But well before the international spotlight arrived, a group of companies had already started to bolster their core tech capabilities.

Rokid, a Hangzhou-based startup that specializes in robotics research and AI development, is about to launch and mass-produce its own dedicated voice-first AI chip after two years of research and development. The company told TechNode that the custom AI chip is more power-efficient, lower in cost, and better designed for third-party vendors, OEMs, and small appliance manufacturers. The chip’s specifications will be unveiled at the “Rokid Jungle” event in Hangzhou on June 26, along with new product developments and major partnerships.

The company was founded in 2014, and its product lineup includes the smart speaker Rokid Pebble, the home AI assistant Alien, and AR Glass, all of which are currently available in China.

TechNode spoke to Dr. Zhou Jun, who headed Samsung’s Semiconductor Institute in China prior to joining Rokid as vice president in April, about the new AI chip and its significance in the AI chip wave that we’re witnessing now.

Dr. Jun Zhou (l) and Misa Zhu (r), Rokid founder & CEO (Image Credit: Rokid)

Voice recognition is all the rage

In China, voice recognition is an increasingly competitive market that has bred a handful of prominent AI companies like iFlytek (科大讯飞), Aispeech (思必驰), and Unisound (云知声). Chinese tech powerhouses have also been scrambling to get their share of the smart speaker market. iFlytek and Huawei recently announced that they have signed a cooperation agreement, in large part to enhance consumer voice recognition technology.

Despite tough competition, Zhou said, Rokid is doing well in the vertical because it has an obvious advantage. Unlike some companies that focus on specific aspects of AI product development (for example, signal processing), Rokid has experience and knowledge in developing both the front-end and back-end technologies for its products.

The 4-year-old startup has been innovating on and optimizing its voice recognition algorithms, such as noise reduction on the front end and speech recognition and speech understanding on the back end.

Rokid started developing its voice-first AI chip back in 2016, when the AI voice recognition hardware space was—relatively speaking—a no man’s land. Getting a head start boded well for Rokid since AI chip development is generally a year-long process. The company said it initially developed the AI chip for its own smart devices because, even though tech pioneers like Google, Apple, and Amazon had started developing voice recognition technologies, there weren’t many companies in China developing voice recognition hardware.

But the market has been heating up since last year as an increasing number of companies bet on smart speakers, consumer voice recognition’s biggest application, as the “next big thing” in consumer electronics. Alibaba has its Tmall Genie, Xiaomi its MI AI Speaker, and JD its DingDong; most recently, in April, Tencent launched its own smart speaker, TingTing.

General purpose vs. custom chips

“We discovered that developing AI products on general-purpose chips is more power-consuming and costly, which is a clear disadvantage to the implementation and development of such a powerful technology,” Zhou explained. 

He said that Rokid’s self-developed algorithms could not load or run optimally on general-purpose chips, which lack a custom digital signal processor (DSP) and neural processing unit (NPU).

“Developing AI products like smart speakers involves other front-end algorithms like noise reduction and acoustic echo cancellation (AEC) algorithms, which, in reality, need more powerful computational capabilities [than what general purpose chips can offer],” he added.

Rokid’s AI chip is tailored to voice recognition systems: the company has developed its own DSP and NPU designed for smart speakers. General-purpose chips perform well across a broad range of applications but are less efficient at specific tasks.

The development of voice recognition technologies is still in its early stages, and many areas, such as multi-person voice recognition, need a breakthrough, Zhou said.

Towards the edge

“Back in 2014, there were discussions in academic circles surrounding AI applications but there weren’t many real-world edge AI applications like smart speakers,” Zhou said. But the trend in AI applications is becoming more and more apparent: they are moving towards the edge.

In the age of AI, data is being generated and gathered from different sources such as smartphones, drones, sensors, and autonomous vehicles. The massive demand for data computing has given rise to processing information closer to its source (at the edge of the network) instead of sending it to data centers or the cloud.

Chipsets and system software are now being integrated more tightly, and many big companies are moving their processing capability from the cloud to the edge to share the processing load and overcome some of the vulnerabilities associated with the cloud.

“I still think Rokid’s decision to make its own standalone edge product is forward-thinking,” Zhou said. As AI moves rapidly into edge devices, more standalone edge products are surfacing. Voice recognition was previously used in the cloud or on smartphones, as with Siri, but its use in standalone devices is quite recent.

The ZTE ban is now commonly referred to as a wake-up call for the Chinese chip industry about its heavy reliance on foreign technology. Although China may be lagging behind technologically advanced nations like the US, with the government’s AI development plan and the “Made in China 2025” strategic plan, it is hard to say how quickly the tables will turn.

“I reckon that with or without the recent ZTE ban, the development of integrated circuit technology in China is moving forward steadily,” Zhou said.