Blind Audio Source Separation Using Independent Component Analysis and Independent Vector Analysis

Blind Source Separation (BSS) is one of the most important and challenging problems for researchers in the audio and speech processing area. Many different methods have been proposed in the literature to solve the BSS problem. In this study, we compare the performance of three popular BSS methods based on the Independent Component Analysis (ICA) and Independent Vector Analysis (IVA) models: Fast-ICA, Kernel-ICA and Fast-IVA. We collected experimental data by recording speech from 13 people. Three different scenarios are proposed to compare the performance of the BSS methods effectively. Experimental results show that Fast-IVA performs better than the ICA-based methods according to the Source-to-Artifact Ratio (SAR), Source-to-Distortion Ratio (SDR) and Source-to-Noise Ratio (SNR) metrics. However, the ICA-based methods give better results than Fast-IVA according to the Source-to-Interference Ratio (SIR).


Introduction
Blind Source Separation (BSS) is one of the most important problems in the speech processing area. The problem is best described by a question: how can we accurately determine what a particular person is saying when several speakers talk at the same time? Figure 1 shows an illustration of the BSS problem. The problem describes the situation of focusing on one speaker while several people are talking simultaneously in the same room. Separating the mixed speech signals to obtain only the speech signal that belongs to a particular speaker is a very challenging and complicated problem [1]. In the literature, many different methods based on signal processing and statistics have been proposed to solve the BSS problem. BSS was first addressed by Herault and Jutten in 1985 [2]. In their work, the sound is transmitted directly to the microphones without any delay, which is known as standard blind source separation. Then, in 1995, Bell and Sejnowski developed the Independent Component Analysis (ICA) method to solve the BSS problem when the sources are mixed simultaneously [3]. Several other ICA-based algorithms, such as Fast Fixed-Point ICA [4], JADE-ICA [5], EGLD-ICA, MS-ICA [6] and Kernel-ICA [7], have also been proposed in the literature.
In a real room environment, the BSS problem becomes more complicated, and this speech propagation problem is called convolutive blind source separation (CBSS) [8]. In the literature, some solutions have been proposed in the time domain. Because of the complicated calculations caused by convolution, Parra et al. [15] suggested another method based on the frequency domain. In the frequency domain, convolution is replaced with multiplication, which lowers the cost in terms of execution time. However, frequency-domain methods still suffer from scaling and permutation ambiguities. To prevent the permutation problem, an advanced method named Independent Vector Analysis (IVA) was proposed by Kim et al. [9]. In this study, we compare the performance of two ICA-based algorithms and Fast-IVA according to performance measurement metrics commonly used for the BSS problem. This paper is organized as follows: Section 2 presents the standard ICA method, its properties, and the ICA-based algorithms (Fast-ICA and Kernel-ICA). Section 3 explains the details of the Fast Fixed-Point IVA algorithm. Section 4 gives brief information about the commonly used performance measurement metrics. Experimental results obtained with the proposed scenarios are presented in Section 5. Finally, the conclusion is presented in the last section.

Independent Component Analysis
Independent Component Analysis (ICA) is one of the most popular BSS methods. ICA has been used extensively in many applications in various fields of science and engineering. ICA is a statistical computational method for finding the hidden factors that underlie a set of random vectors. The main aim of the ICA method is to obtain the independent components (ICs), which are statistically independent or as independent as possible, by finding a linear representation of non-Gaussian data.

ICA Model
Suppose that two people are talking simultaneously in a room and that two microphones are placed at different locations to record them. Two speech signals are then recorded by these microphones. These signals can be represented as $x_1(t)$ and $x_2(t)$, where $t$ is the time index. The signals from the speakers can be represented as $s_1(t)$ and $s_2(t)$, so the linear combination of the speech signals can be expressed as:

$$x_1(t) = a_{11} s_1(t) + a_{12} s_2(t), \qquad x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) \tag{1}$$

where $s_1$ and $s_2$ are the original speech signals, and $a_{11}$, $a_{12}$, $a_{21}$ and $a_{22}$ denote parameters that depend on the distances between the speakers and the microphones. Assume that $M$ is the number of observed mixture signals and $L$ is the number of independent source signals. Then the model is as follows:

$$x_i(t) = \sum_{j=1}^{L} a_{ij} s_j(t), \qquad i = 1, \dots, M \tag{2}$$

Since vector-matrix notation is more convenient than the summations in (2), the ICA model can be rewritten as:

$$x = A s \tag{3}$$

where $A$ is an unknown matrix called the mixing matrix, $x$ is the vector of observed signals and $s$ is the vector of source signals. The challenge is to estimate both $A$ and $s$ using only the observed random vector $x$. It is assumed that the unknown mixing matrix is invertible or pseudo-invertible. Once $A$ has been estimated, its inverse, denoted by $W$, can be computed. The ICs, denoted by $y$, are then obtained simply from the following equation:

$$y = W x \tag{4}$$

ICA requires some assumptions about the sources and the mixing process. These assumptions distinguish the method from other source separation approaches [10]. The first assumption is that the sources are statistically independent. The second is that the sources must have non-Gaussian distributions. The last is that the mixing matrix is invertible [10]. Two ambiguities are inherent in the ICA framework: the magnitude (scaling) ambiguity and the permutation ambiguity [10]. Some preprocessing steps can be performed to improve the performance of ICA-based methods; two useful techniques are centering and whitening [10]. There are several kinds of ICA-based algorithms. In this study, we have used two of them, both of which are widely used.
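The matrix form of the model, x = As, and the unmixing step, y = Wx, can be illustrated with a minimal sketch. It assumes a known 2x2 mixing matrix purely for illustration (in BSS the matrix is unknown and must be estimated); the function names, the synthetic test signals and the matrix entries are all hypothetical.

```python
import math


def mix(A, signals):
    """Apply matrix A to a list of signals (each a list of samples)."""
    n_samples = len(signals[0])
    return [
        [sum(A[i][j] * signals[j][t] for j in range(len(signals)))
         for t in range(n_samples)]
        for i in range(len(A))
    ]


def inverse_2x2(A):
    """Closed-form inverse of a 2x2 matrix; raises if A is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


# Two synthetic "sources" (assumed for illustration, not speech recordings).
s1 = [math.sin(0.1 * t) for t in range(200)]
s2 = [((t % 20) / 10.0) - 1.0 for t in range(200)]  # sawtooth

A = [[0.8, 0.3], [0.4, 0.9]]   # hypothetical mixing matrix
x = mix(A, [s1, s2])           # observed mixtures, x = A s
W = inverse_2x2(A)             # unmixing matrix, W = A^{-1}
y = mix(W, x)                  # recovered components, y = W x
```

With the true A, y reproduces the sources up to rounding error. An ICA algorithm that only estimates A can recover them only up to scaling and ordering.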
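The centering and whitening steps can be sketched for the two-microphone case as follows. This is a minimal illustration, not the implementation used in the study: it assumes Cholesky-based whitening (eigenvalue-based whitening is another common choice), and all function names and test signals are hypothetical.

```python
import math
import random


def center(x):
    """Subtract the sample mean from a signal."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def cov(a, b):
    """Sample covariance of two already-centered signals."""
    return sum(p * q for p, q in zip(a, b)) / len(a)


def whiten_2ch(x1, x2):
    """Center two mixtures and transform them to unit covariance.

    Uses the Cholesky factor L of the 2x2 covariance C (C = L L^T)
    and returns z = L^{-1} x, so that cov(z) is the identity matrix.
    """
    x1, x2 = center(x1), center(x2)
    c11, c12, c22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    l11 = math.sqrt(c11)
    l21 = c12 / l11
    l22 = math.sqrt(c22 - l21 * l21)
    # Forward substitution solves L z = x sample by sample.
    z1 = [a / l11 for a in x1]
    z2 = [(b - l21 * za) / l22 for b, za in zip(x2, z1)]
    return z1, z2


rng = random.Random(0)
s1 = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
s2 = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
x1 = [0.8 * a + 0.3 * b + 2.0 for a, b in zip(s1, s2)]  # offset removed by centering
x2 = [0.4 * a + 0.9 * b - 1.0 for a, b in zip(s1, s2)]
z1, z2 = whiten_2ch(x1, x2)
```

After whitening, the remaining unmixing matrix can be restricted to a rotation, which is what makes fixed-point ICA iterations simpler.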

Kernel Independent Component Analysis
Kernel Independent Component Analysis (Kernel-ICA) is a variant of the ICA model that is based on the minimization of a contrast function built on kernel ideas. Kernel-ICA relies on an entire function space of candidate nonlinearities. In particular, Kernel-ICA works with functions in a reproducing kernel Hilbert space, using contrast functions based on canonical correlations. The kernel trick is used to search over this space efficiently. Some modifications have been proposed to make the algorithm more robust and efficient for different source distributions [7]. In short, Kernel-ICA uses optimization methods for canonical correlations in a reproducing kernel Hilbert space.

Independent Vector Analysis
Independent Vector Analysis (IVA), one of the most advanced methods, shows better performance in the field of BSS. In the frequency domain, mixing and unmixing at frequency bin $k$ can be written as

$$X(k) = H(k)\,S(k), \qquad \hat{S}(k) = W(k)\,X(k)$$

where $X(k) = [X_1(k); X_2(k); \dots; X_m(k)]^T$ denotes the observed (mixed) vector, $S(k) = [S_1(k); S_2(k); \dots; S_n(k)]^T$ denotes the original source vector in the frequency domain and $\hat{S}(k) = [\hat{S}_1(k); \hat{S}_2(k); \dots; \hat{S}_n(k)]^T$ represents the estimated source vector in the frequency domain; $(\cdot)^T$ is the vector transpose. The number of microphones and the number of sources are represented by $m$ and $n$, respectively. $H(k)$, an $m \times n$ matrix, denotes the mixing matrix, and $W(k)$, an $n \times m$ matrix, denotes the unmixing matrix. In this study, we assume that the number of sources and the number of microphones are the same. The main goal is to estimate the sources using only the observed (mixed) signals. An objective function is defined to separate multivariate sources from multivariate observations. IVA employs the Kullback-Leibler divergence between two functions as the measure of dependence. These two functions are the joint probability density function $p(\hat{S}_1, \dots, \hat{S}_n)$ and the product of the probability density functions of the individual source vectors $\prod_i q(\hat{S}_i)$, where $\hat{S}_i$ collects the $i$th estimated source over all frequency bins. The objective function can be defined as follows:

$$J = \mathrm{KL}\!\left( p(\hat{S}_1, \dots, \hat{S}_n) \,\Big\|\, \prod_{i} q(\hat{S}_i) \right)$$

Minimizing this cost function removes the dependency between the source vectors while keeping the dependency between the components of each vector [11]. In the literature, there are different versions of IVA, such as NG-IVA, Fast-IVA and Aux-IVA [11]. In this study, the Fast-IVA algorithm is employed for BSS.

Fast Fixed-Point Independent Vector Analysis
This algorithm applies Newton's method to the update rule of the original IVA method; it converges quadratically and is free from selecting an efficient learning rate. In order to apply Newton's method in the update rules, a quadratic Taylor polynomial is derived in complex-variable notation. In this way, it can be used for a contrast function of complex-valued variables [11]. The contrast function used by Fast-IVA involves $\lambda_i$, the $i$th Lagrange multiplier; $w_i$, the $i$th row of the unmixing matrix $W$; and $G(\cdot)$, the nonlinearity function, which can take several different forms as discussed in [11].
The learning rule, with normalization, is defined in terms of $G'(\cdot)$ and $G''(\cdot)$, the first and second derivatives of $G(\cdot)$, respectively. Applying the rule to all sources yields an unmixing matrix $W(k)$, which must then be decorrelated.
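The equations themselves are not reproduced in the text above. As a hedged sketch, the form commonly given for Fast-IVA in the literature is written out below in the row-vector convention, with $\hat{S}_i(k) = w_i(k) X(k)$, where $X(k)$ is the observed (mixed) vector at frequency bin $k$ and is assumed to be whitened. The symbol $r_i$ is a shorthand introduced here, and the exact expressions should be checked against [11].

```latex
% Shorthand (assumed here): summed power of the i-th estimated source over all bins
r_i = \sum_{k} \left| \hat{S}_i(k) \right|^2

% Contrast function with Lagrange multipliers enforcing unit-norm rows
J = \sum_{i} \left( E\!\left[ G(r_i) \right]
    - \sum_{k} \lambda_i(k) \left( w_i(k)\, w_i(k)^{H} - 1 \right) \right)

% Fixed-point (Newton-type) update of the i-th row at bin k
w_i(k) \leftarrow E\!\left[ G'(r_i) + \left| \hat{S}_i(k) \right|^2 G''(r_i) \right] w_i(k)
    - E\!\left[ \hat{S}_i(k)\, G'(r_i)\, X(k)^{H} \right]

% Symmetric decorrelation of the full unmixing matrix at bin k
W(k) \leftarrow \left( W(k)\, W(k)^{H} \right)^{-1/2} W(k)
```

Because each row update depends on $r_i$, which sums over all frequency bins, the bins of one source are updated jointly. This coupling is what lets IVA avoid the permutation ambiguity of per-bin ICA.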

Performance Measurement
There are several performance metrics for evaluating the quality of the estimated signals obtained by BSS methods. The performance of BSS algorithms can be measured by comparing each estimated source $\hat{s}_j$ to a given true source $s_j$. The measurement process includes two successive steps [12]. The first step decomposes $\hat{s}_j$ as:

$$\hat{s}_j = s_{\text{target}} + s_{\text{interf}} + s_{\text{noise}} + s_{\text{artif}} \tag{11}$$

where $s_{\text{target}} = f(s_j)$ denotes a version of $s_j$ modified by an allowed distortion, and the interference, noise and artifact error terms are represented by $s_{\text{interf}}$, $s_{\text{noise}}$ and $s_{\text{artif}}$, respectively. The second step computes energy ratios in order to estimate the relative amount of each of these four terms, either over local frames of the signal or over the whole signal duration. The decomposition into four terms is described in detail in [12].
After $\hat{s}_j$ has been decomposed following the procedures given in [12], the relevant energy ratios between these terms are defined.
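The second step can be sketched as follows. The ratio formulas are not reproduced in the text above, so this sketch assumes the standard BSS_EVAL-style definitions of SDR, SIR, SNR and SAR. It also assumes that the four terms of the decomposition are already available as equal-length sequences; the function names are hypothetical.

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def add(*signals):
    """Sample-wise sum of equal-length signals."""
    return [sum(vals) for vals in zip(*signals)]


def ratio_db(numerator, denominator):
    """Energy ratio in dB; the denominator must have non-zero energy."""
    return 10.0 * math.log10(energy(numerator) / energy(denominator))


def bss_ratios(s_target, s_interf, s_noise, s_artif):
    """Assumed BSS_EVAL-style energy ratios computed from the four terms."""
    return {
        # target vs. all error terms together
        "SDR": ratio_db(s_target, add(s_interf, s_noise, s_artif)),
        # target vs. interference from the other sources
        "SIR": ratio_db(s_target, s_interf),
        # target plus interference vs. noise
        "SNR": ratio_db(add(s_target, s_interf), s_noise),
        # everything except artifacts vs. artifacts
        "SAR": ratio_db(add(s_target, s_interf, s_noise), s_artif),
    }


# Toy decomposition (made-up values, for illustration only).
ratios = bss_ratios(
    s_target=[1.0, 0.0, 0.0, 0.0],
    s_interf=[0.0, 0.1, 0.0, 0.0],
    s_noise=[0.0, 0.0, 0.1, 0.0],
    s_artif=[0.0, 0.0, 0.0, 0.1],
)
```

Higher values are better for all four ratios. SDR is the most global of the four, because its denominator contains every error term.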

Experimental Results
The performance of the Fast-ICA, Kernel-ICA and Fast-IVA methods for separating mixed speech signals was compared. We collected 13 speech signals recorded in a real room; each recording is 10 s long, in Arabic, and sampled at 16 kHz. These recordings are mixed using random parameters. In our experiment, three different scenarios are proposed to compare performance more effectively. The first scenario measures and compares the performance of Fast-ICA, Kernel-ICA and Fast-IVA for separating mixed speech signals without noise, as shown in Figure 2. Figure 3 illustrates the second scenario, which evaluates these methods on mixed speech signals with Gaussian noise added to the signals before mixing. In the third scenario, we add Gaussian noise to the signals after mixing, as shown in Figure 4. Since Gaussian noise is added to the sources or mixtures in the second and third scenarios, a Savitzky-Golay smoothing filter [13] is applied to enhance the signals before the separation.
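The three scenarios can be sketched as a small data-preparation routine. Several details are assumptions, because the text does not specify them: the noise standard deviation, the use of a 5-point quadratic Savitzky-Golay window, and the application of the filter to the mixtures rather than to the sources. The function names are hypothetical.

```python
import random

# 5-point quadratic Savitzky-Golay smoothing coefficients (window assumed).
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def mix(A, signals):
    """Instantaneous mixing: apply matrix A to a list of signals."""
    n_samples = len(signals[0])
    return [
        [sum(A[i][j] * signals[j][t] for j in range(len(signals)))
         for t in range(n_samples)]
        for i in range(len(A))
    ]


def add_gaussian_noise(signal, std, rng):
    """Add zero-mean Gaussian noise with standard deviation std."""
    return [v + rng.gauss(0.0, std) for v in signal]


def sg_smooth5(x):
    """Savitzky-Golay smoothing; the two samples at each edge are left as-is."""
    y = list(x)
    for t in range(2, len(x) - 2):
        y[t] = sum(c * x[t + k - 2] for k, c in enumerate(SG5))
    return y


def build_scenario(sources, A, scenario, noise_std=0.05, seed=0):
    """Return the mixtures handed to the separation algorithm."""
    rng = random.Random(seed)
    if scenario == 1:   # no noise
        return mix(A, sources)
    if scenario == 2:   # noise added to the sources before mixing
        noisy = [add_gaussian_noise(s, noise_std, rng) for s in sources]
        return [sg_smooth5(x) for x in mix(A, noisy)]
    if scenario == 3:   # noise added to the mixtures after mixing
        noisy = [add_gaussian_noise(x, noise_std, rng) for x in mix(A, sources)]
        return [sg_smooth5(x) for x in noisy]
    raise ValueError("scenario must be 1, 2 or 3")
```

A quadratic Savitzky-Golay filter reproduces any locally quadratic signal exactly, so it attenuates the added noise while distorting slowly varying speech content less than a plain moving average of the same length.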

Conclusion
Blind source separation (BSS) is one of the most important and challenging problems for researchers in the audio and speech processing fields. In this study, we have implemented and compared three popular BSS methods: Fast-ICA, Kernel-ICA and Fast-IVA. Three different scenarios were proposed to measure the performance of the BSS methods extensively, using four commonly used performance metrics. According to the experimental results, Fast-IVA performs better than the other methods on the SDR, SAR and SNR metrics, whereas Fast-ICA and Kernel-ICA perform better than Fast-IVA on the SIR metric. In the future, some advanced BSS algorithms will be implemented and compared using different scenarios.

Figure 2. Illustration of Scenario 1 for the experiment

Figure 4. Illustration of Scenario 3 for the experiment

Theoretically, IVA overcomes the permutation problem inherent in ICA. It is designed to remove the dependency between different source vectors while keeping the dependency within individual source vectors. The problem is transformed into the frequency domain to reduce the computational complexity of the time domain. The noise-free model is defined in the frequency domain.

The experimental results for the three scenarios are given in Table 1, Table 2 and Table 3. In this study, we implemented the Fast-ICA algorithm in MATLAB. A different version of the Fast-ICA algorithm can be downloaded from http://research.ics.aalto.fi/ica/fastica/code/dlcode.shtml. For the other BSS methods (Kernel-ICA and Fast-IVA), we used the shared codes. The Kernel-ICA package is Copyright (c) 2002 by Francis Bach [14], and the Fast-IVA code is by Taesu Kim, revised on Nov. 2, 2005 [9]. The experiments were performed in MATLAB ver. 8.1.0.604 (R2013a). The algorithms are also compared in terms of execution time. The time needed to run the algorithms is given in Table 4.

Table 1. Experimental results for the first scenario

Table 2. Experimental results for the second scenario

Table 3. Experimental results for the third scenario

Table 4. Execution time of the algorithms