www.elsevier.com/locate/neucom A dynamic algorithm for blind separation of convolutive sound mixtures
Computational and Applied Mathematics Seminar, Nov 10, 2004
Sound signal processing is based on spectral analysis, normally done with Fourier-type transforms. In this talk, we discuss an invertible transform with built-in auditory filter characteristics, its mathematical properties, and its applications.
Multi-resolution paths and multi-scale feature representation are key elements of semantic segmentation networks. We develop two techniques for efficient networks based on the recent FasterSeg network architecture. The first is to use a state-of-the-art high-resolution network (e.g. HRNet) as a teacher to distill a lightweight student network. Because the teacher and student networks have dissimilar structures, carrying out distillation directly in the standard way is not effective. To solve this problem, we introduce a tutor network with an added high-resolution path to help distill a student network, which improves the FasterSeg student while maintaining its parameter/FLOPs counts. The second is to replace the standard bilinear interpolation in the upscaling module of the FasterSeg student net with a depth-wise separable convolution and a Pixel Shuffle module, which leads to 1.9% (1.4%) mIoU improvements on low (high) input image sizes without increasing model size. A combination of these techniques will be pursued in future work.
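The Pixel Shuffle step mentioned above rearranges channels into spatial resolution. A minimal NumPy sketch of that rearrangement follows; the function name `pixel_shuffle` and the channel-first layout are assumptions for illustration, not code from the paper, and the depth-wise separable convolution that would precede it is omitted.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange an array of shape (C*r*r, H, W) into (C, H*r, W*r).

    Hypothetical helper: output[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w],
    so each group of r*r channels fills one r-by-r block of output pixels.
    """
    channels, height, width = x.shape
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels // (r * r)
    # Split the channel axis into (c_out, i, j), then interleave i with H and j with W.
    x = x.reshape(c_out, r, r, height, width)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c_out, height * r, width * r)

# Usage: four 2x2 feature maps become one 4x4 map (upscaling factor 2).
features = np.arange(16).reshape(4, 2, 2)
upscaled = pixel_shuffle(features, 2)
```

Unlike bilinear interpolation, this rearrangement has no arithmetic of its own; any learned upscaling comes from the convolution that produces the extra channels.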
The variational auto-encoder (VAE) is an effective neural network architecture for disentangling a speech utterance into speaker-identity and linguistic-content latent embeddings, and then generating an utterance for a target speaker from that of a source speaker. This is done by concatenating the identity embedding of the target speaker with the content embedding of the source speaker uttering a desired sentence. In this work, we find a suitable location in the VAE’s decoder to add a self-attention layer that incorporates non-local information when generating a converted utterance and hides the source speaker’s identity. In experiments on the zero-shot many-to-many voice conversion task on the VCTK data set, the self-attention layer improves speaker classification accuracy on unseen speakers by 27% while increasing the decoder parameter size by 12%. The voice quality of the converted utterance degrades by merely 3% as measured by MOSNet scores. To reduce over-fitting and generalization error, we further app...
G-equations are level-set type Hamilton-Jacobi partial differential equations modeling the propagation of a flame front under a flow velocity and a laminar velocity. To account for flame stretching, a strain rate term may be added to the laminar speed. We perform finite difference computations of G-equations with the discretized strain term monotone with respect to the one-sided spatial derivatives. Taking the flow velocity to be a time-periodic cellular flow (modeling Rayleigh-Bénard advection), we compute the turbulent flame speeds as the asymptotic propagation speeds from a planar initial flame front. In the strain G-equation model, front propagation is enhanced by the cellular flow, and flame quenching occurs if the flow intensity is large enough. In contrast to the results for steady cellular flow, front propagation in time-periodic cellular flow may be locked into a certain spatial-temporal periodicity pattern, and the turbulent flame speed becomes a piecewise constant function of flow intensity....
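For orientation, the basic (strain-free) G-equation can be derived from the front motion law. The sketch below assumes one common sign convention (burned region $\{G<0\}$, unit normal $n$ pointing toward the unburned side); the symbols $V$, $s_l$, $P$ and $s_T$ are notation chosen here and may differ from the paper's, and the strain term is not included.

```latex
% Front: \{x : G(x,t) = 0\}, unit normal n = \nabla G / |\nabla G|.
% Motion law: a point x(t) on the front moves with normal speed s_l + V \cdot n.
\begin{align*}
  0 &= \frac{d}{dt}\, G(x(t),t) = G_t + \dot{x}\cdot\nabla G
     = G_t + (\dot{x}\cdot n)\,|\nabla G| \\
    &= G_t + (s_l + V\cdot n)\,|\nabla G|
     = G_t + V(x,t)\cdot\nabla G + s_l\,|\nabla G| .
\end{align*}
% Planar initial front in a unit direction P: G(x,0) = P \cdot x.
% Asymptotic (turbulent) propagation speed, when the limit exists:
\[
  s_T(P) = -\lim_{t\to\infty} \frac{G(x,t)}{t} .
\]
% Check with V \equiv 0: G = P\cdot x - s_l t solves the equation, so s_T = s_l.
```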
The approach combines second- and fourth-order statistics to perform blind source separation (BSS) of instantaneous mixtures. It applies to any number of receivers, provided there are as many receivers as sources. It is a batch algorithm that uses the non-Gaussianity and stationarity of the source signals. It is a direct, linear-algebra-based method that is reliable and robust, though a large number of sources may slow down the computation significantly. It is, however, limited to instantaneous mixtures.
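As a rough illustration of the two kinds of statistics involved, the NumPy sketch below whitens instantaneous mixtures using the sample covariance (second order) and measures non-Gaussianity with excess kurtosis (fourth order). This is a generic textbook preprocessing step, not the algorithm of the paper; the function names and the example mixing matrix are assumptions.

```python
import numpy as np

def whiten(x):
    """Whiten mixtures x of shape (n_channels, n_samples) via second-order statistics.

    Returns the whitened signals and the whitening matrix. Assumes the sample
    covariance is full rank (as many receivers as sources).
    """
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Symmetric inverse square root of the covariance matrix.
    w = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return w @ x, w

def excess_kurtosis(z):
    """Fourth-order non-Gaussianity measure per channel (0 for a Gaussian)."""
    z = z - z.mean(axis=1, keepdims=True)
    m2 = (z ** 2).mean(axis=1)
    m4 = (z ** 4).mean(axis=1)
    return m4 / m2 ** 2 - 3.0

# Usage: two uniform (non-Gaussian) sources, mixed instantaneously.
rng = np.random.default_rng(0)
sources = rng.uniform(-1.0, 1.0, size=(2, 5000))
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])  # hypothetical mixing matrix
mixtures = mixing @ sources
whitened, w = whiten(mixtures)
```

After whitening, the remaining unknown is an orthogonal rotation, which is where fourth-order information such as kurtosis typically enters.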
Papers by Jack Xin