MOST Launches “AI Voice Data Set” to Assist Chinese AI Language Technology – CTIMES

What happened: Taiwan’s Ministry of Science and Technology (MOST) released 400 hours of Chinese-language voice data to the public for use as training material for AI-powered voice applications. According to CTimes, the dataset includes self-recordings, as well as “data related to police and educational broadcasts.” It will be uploaded onto the National Center for High Performance Computing’s (NCHC) Data Market Platform, and is the first of multiple planned releases from MOST’s collection of 2,000 to 3,000 hours of voice data.

Why it’s important: Voice recognition tech is one of the hottest subcategories of China’s rapidly growing AI industry, as evidenced by recent efforts by iFlyTek, the country’s voice recognition unicorn valued at $9 billion, to raise up to $350 million to invest in AI startups worldwide. And while iFlyTek and China’s other tech giants may hold a sizable share of the $55 billion voice recognition market, MOST’s data release can be particularly helpful to smaller players who lack large training sets and are looking to improve the quality of their machine-learning processes.
