Abstract
Audio objects can be transmitted efficiently with spatial audio object coding (SAOC), which conveys a mono downmix signal together with side information parameters that enable object reconstruction at the decoder. To allow the transmission of audio objects at low bitrates, we present a new audio coding method based on a mixture model of a convolutional auto-encoder (CAE) and a dense convolutional network (DenseNet), which optimizes the compression of the side information parameters of audio objects. The method has two main advantages: 1) unlike linear transform methods, the CAE can capture the nonlinear relationships among the side information parameters and effectively reduce their dimensionality; 2) a DenseNet is added to the decoder to make full use of the low-dimensional features of the side information parameters, which improves audio quality at low bitrates. Experiments show that our method outperforms baseline methods while permitting bitrates as low as 1 kbps per object.