About the speaker:
Prof. Dong Wang is an associate professor at Tsinghua University and the deputy dean of the Center for Speech and Language Technologies (CSLT) at Tsinghua University. He obtained his Bachelor's and Master's degrees at Tsinghua University, and his PhD at the University of Edinburgh in 2010. Prof. Wang has worked at Oracle China, IBM China, EURECOM France, and Nuance US. He has worked on speech processing since 1998 and has published more than 140 academic papers. He is the chair of the APSIPA SLA track and serves as a distinguished lecturer for 2018-2019.
Abstract:
State-of-the-art speech processing technologies, including speech recognition, language recognition, and speaker recognition, are mostly based on large-scale deep neural nets trained with large amounts of data. This approach, however, cannot fully utilize the information embedded in speech signals, which are assumed to be highly complex and convolved in an unknown way. Recently, we found that a deep generative model can effectively simulate the speech production process, paving the way for factorizing speech signals into independent information factors. This new approach integrates generative and discriminative models, and combines the capabilities of neural nets and Bayesian methods.