Bilinear regression


Idea

In statistics/machine learning the individual samples often come in the form of 2-D arrays, e.g., a set of population counts of different species (one axis) at different points in time (second axis). Standard regression collapses these arrays into vectors and thus loses the array structure in the regression process. Bilinear regression attempts to exploit that structure by treating each sample as a matrix.

Details

Basic model

The bilinear predictor function takes the form

(1)  f(X) = \mathrm{tr}(U^T X V) + b = \sum_{i=1:m} u^T_i X v_i + b

Here U = (u_1, \ldots, u_m) and V = (v_1, \ldots, v_m) are coefficient matrices whose columns act on the sample matrix X from the left and right respectively, and b is a scalar intercept. Note that, as is particularly apparent in the (u_i, v_i) form, there is a freedom to move a multiplicative factor between each column u_i and the matching column v_i.
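
As a concrete illustration, here is a minimal numpy sketch of this predictor; the function name bilinear_predict and the convention of storing the u_i and v_i as columns of matrices U and V are assumptions made for the example, not notation fixed by the page.

```python
import numpy as np

def bilinear_predict(X, U, V, b):
    """Bilinear predictor f(X) = tr(U^T X V) + b = sum_i u_i^T X v_i + b.

    X : (p, q) sample matrix
    U : (p, m) matrix with the u_i as columns
    V : (q, m) matrix with the v_i as columns
    b : scalar intercept
    """
    # equivalently: sum(U[:, i] @ X @ V[:, i] for i in range(U.shape[1])) + b
    return float(np.trace(U.T @ X @ V)) + b
```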

When performing regularised fitting, the score function is

(2)  E = \sum_{j=1:n} \left(\sum_{i=1:m} u^T_i X_j v_i + b - y_j\right)^2 + \lambda \sum_{i=1:m} \left( R(u_i) + R(v_i) \right)

where λ\lambda is the regularization strength and R()R() is the regularization function.
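
A matching sketch of the score E follows; the squared L2 norm used as the default R below is only an illustrative choice, since the page leaves the regularisation function unspecified.

```python
import numpy as np

def score(Xs, y, U, V, b, lam, R=lambda w: float(w @ w)):
    """Regularised score E of equation (2).

    Xs  : list of (p, q) sample matrices X_j
    y   : length-n array of targets y_j
    lam : regularisation strength lambda
    R   : regulariser applied to each column u_i and v_i
          (squared L2 norm here, an illustrative default only)
    """
    residuals = np.array([np.trace(U.T @ X @ V) + b - t for X, t in zip(Xs, y)])
    penalty = sum(R(U[:, i]) + R(V[:, i]) for i in range(U.shape[1]))
    return float(np.sum(residuals ** 2) + lam * penalty)
```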

Writing A = \sum_{j=1:n} \left(\sum_{i=1:m} u^T_i X_j v_i + b - y_j\right) X_j for the residual-weighted sum of the sample matrices, the derivatives are

(3)  \frac{\partial E}{\partial v_I} = 2 \sum_{j=1:n} \left(\sum_{i=1:m} u^T_i X_j v_i + b - y_j\right) (u^T_I X_j)^T + \lambda \frac{\partial R(v_I)}{\partial v_I}
(4)  = 2 \left( u^T_I \left( \sum_{j=1:n} \left(\sum_{i=1:m} u^T_i X_j v_i + b - y_j\right) X_j \right)\right)^T + \lambda \frac{\partial R(v_I)}{\partial v_I}
(5)  = 2 (u^T_I A)^T + \lambda \frac{\partial R(v_I)}{\partial v_I}

and

(6)  \frac{\partial E}{\partial u_I} = 2 \sum_{j=1:n} \left(\sum_{i=1:m} u^T_i X_j v_i + b - y_j\right) (X_j v_I) + \lambda \frac{\partial R(u_I)}{\partial u_I}
(7)  = 2 \left( \sum_{j=1:n} \left(\sum_{i=1:m} u^T_i X_j v_i + b - y_j\right) X_j\right) v_I + \lambda \frac{\partial R(u_I)}{\partial u_I}
(8)  = 2 A v_I + \lambda \frac{\partial R(u_I)}{\partial u_I}
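
Both families of gradients can be computed together by forming A once. The sketch below assumes the squared L2 regulariser R(w) = w^T w (so ∂R/∂w = 2w), again purely as an example.

```python
import numpy as np

def gradients(Xs, y, U, V, b, lam):
    """Gradients of E from equations (3)-(8), with the illustrative
    choice R(w) = w^T w, so that dR/dw = 2 w."""
    # residuals: sum_i u_i^T X_j v_i + b - y_j, one per sample
    r = np.array([np.trace(U.T @ X @ V) + b - t for X, t in zip(Xs, y)])
    # A = sum_j residual_j * X_j, shared by equations (5) and (8)
    A = sum(rj * X for rj, X in zip(r, Xs))
    grad_U = 2 * (A @ V) + lam * 2 * U      # column I is dE/du_I, equation (8)
    grad_V = 2 * (A.T @ U) + lam * 2 * V    # column I is dE/dv_I, equation (5)
    return grad_U, grad_V
```

Stacking the per-column gradients into matrices this way matches the equations above: column I of A V is A v_I, and column I of A^T U is (u^T_I A)^T.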

Finally, for the constant term b the derivative is

(9)  \frac{\partial E}{\partial b} = 2 \sum_{j=1:n} \left(\sum_{i=1:m} u^T_i X_j v_i + b - y_j\right)

so that, given the other variables, the optimal b is

(10)  b = -\frac{1}{n} \sum_{j=1:n} \left(\sum_{i=1:m} u^T_i X_j v_i - y_j\right)
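
A one-line sketch of this closed-form update (the helper name optimal_b is of course an assumption):

```python
import numpy as np

def optimal_b(Xs, y, U, V):
    """Closed-form intercept of equation (10):
    b = -(1/n) * sum_j (sum_i u_i^T X_j v_i - y_j)."""
    fits = np.array([np.trace(U.T @ X @ V) for X in Xs])
    return float(-np.mean(fits - np.asarray(y)))
```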

Another possibility for fitting the regression coefficients is Co-ordinate descent.
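
For illustration, here is a block-wise alternating scheme in that spirit, reusing the gradients and optimal_b sketches above; the step size, iteration count and regularisation strength are placeholders rather than recommended values.

```python
import numpy as np

def fit_alternating(Xs, y, m, n_iters=500, step=1e-3, lam=0.1, seed=0):
    """Block-wise alternating fit: gradient steps on U, then V, then the
    closed-form b of equation (10).  Hyper-parameters are placeholders."""
    rng = np.random.default_rng(seed)
    p, q = Xs[0].shape
    U = 0.1 * rng.standard_normal((p, m))
    V = 0.1 * rng.standard_normal((q, m))
    b = 0.0
    for _ in range(n_iters):
        gU, _ = gradients(Xs, y, U, V, b, lam)   # from the sketch above
        U = U - step * gU                        # update the u_i block
        _, gV = gradients(Xs, y, U, V, b, lam)
        V = V - step * gV                        # update the v_i block
        b = optimal_b(Xs, y, U, V)               # equation (10)
    return U, V, b
```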

References

J. V. Shi, Y. Xu, and R. G. Baraniuk, “Sparse Bilinear Logistic Regression”, submitted to the Journal of Machine Learning Research, April 2014.