EMBL Centre for Network Analysis workshop 9-11 December 2015

Data integration and mining with graphs:

Tensor factorizations for dynamic graphs

Jean-Karim Hériché

Ellenberg lab

Cell Biology and Biophysics unit

Summary

As gene networks become available for multiple time points and experimental conditions, there's a need for principled ways of detecting patterns and their evolution in such dynamic graphs. We will introduce a tensor representation of dynamic graphs and tensor factorization methods as ways of investigating patterns in these graphs.

Motivation

Many biological processes are driven by systems that evolve over time. New technologies make the dynamics of such systems more accessible with increasingly higher time resolution. In particular, data on gene networks can be acquired for multiple time points and experimental conditions. Detecting patterns and how they evolve over time and/or across experimental conditions in such multidimensional data is becoming a common task. For example, one may have performed AP-MS experiments of several proteins at different cell cycle or developmental stages and be interested in identifying protein complexes and how they change between time points.

Dynamic graphs

- What is a dynamic graph?

A dynamic graph can be seen as multiple instances of a graph whose edges change between instances. This definition includes time-varying graphs, whose edges change as a function of time, but also multigraphs (graphs in which nodes are connected by edges of different types) if each edge type is segregated into a separate instance of the graph. Examples of dynamic graphs are protein interaction networks and gene co-expression networks obtained at different cell cycle or developmental stages or under different experimental conditions (e.g. control vs treatment vs disease). With this definition, a dynamic graph can therefore be represented as a series of adjacency matrices.

- Different approaches to mining dynamic graphs

It is common to treat dynamic graphs in one of three ways:

• Aggregation:

All instances are combined into one, for example by using a weighted average of the edge weights across instances. This has the advantage of reducing the size of the data and can be a way of addressing changes in individual edges over long periods of time. This is also a good approach to take when assuming that the instances are not different from each other (as, for example, in the kernel-based integration approach seen earlier). However, this approach can reveal patterns that don't actually exist if nodes belong to different clusters in different instances, and any temporal pattern or correlation is lost.

• Splitting:

In this case, each graph instance is treated in isolation. Patterns found in each instance are then linked and analysed in post-processing steps. In addition to the difficulty of devising adequate post-processing steps, the problem here is that there is no guarantee that a structure inferred in one instance is the same in another.

• Evolutionary clustering:

This approach assumes that there is continuity between instances, i.e. structures do not change dramatically from one instance to the next. One can therefore constrain the analysis of one instance using the result obtained on another instance, or analyse two aggregated consecutive instances. This approach can work in some cases but the continuity assumption is generally not realistic: abrupt changes in patterns will make it fail, and it is unsuitable for the analysis of temporal patterns over long periods.

All the above treat time or experimental conditions separately from the nodes. A global approach treating time/conditions and nodes simultaneously would better capture evolving patterns. One of the first such approaches extended the modularity coefficient [1] to multiple instances of a graph [2,3] by defining links between graph instances. Here, we'll introduce another global approach which treats genes, time and conditions as multiple dimensions of the data.

Tensors

- What is a tensor?

There are different definitions of tensors used in different fields but they all come down to the same concept: tensors are a generalization of scalars, vectors and matrices. A scalar is a tensor of order zero, a vector a tensor of order one, a matrix a tensor of order two... Therefore, for most practical purposes, a tensor is a multidimensional array. It's easy to see that a dynamic graph represented as a series of adjacency matrices is a three-dimensional tensor.
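To make this concrete, here is a minimal sketch in Python (numpy assumed; the toy data and names are illustrative, not from the workshop files) that stacks the adjacency matrices of a small dynamic graph into a third-order tensor:

```python
import numpy as np

# Toy dynamic graph: 4 nodes observed at 3 time points. Each slice is one
# symmetric weighted adjacency matrix; stacking the slices along a third
# axis gives a third-order tensor.
rng = np.random.default_rng(0)
slices = []
for _ in range(3):
    M = rng.random((4, 4))
    slices.append((M + M.T) / 2)   # symmetrize the edge weights

G = np.stack(slices, axis=2)       # shape: nodes x nodes x time points
print(G.shape)                     # (4, 4, 3)
```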

- Terminology and notation:

Although the notation and terminology are not fully standardized, many papers now follow the conventions in [4], which we also adopt here. A tensor will be denoted using underlined bold face, e.g. X, to distinguish it from the matrix X.

The order of a tensor is its number of dimensions, also called modes (or sometimes ways). The equivalent of matrix rows and columns are fibers; when used as vectors, they are assumed to be column vectors. Slices are 2D sections of a tensor. See Figure 1 for a graphical representation of these terms for third-order tensors.

Figure 1: Tensor terminology

- Working with tensors:

Most of the tasks we're interested in rely on transforming tensors into matrices using a process called matricization or unfolding. Matricization consists of rearranging the elements of the tensor into a matrix. The most commonly used way of doing this is called mode-n matricization, which takes the mode-n fibers of the tensor and arranges them as columns of a matrix (Figure 2). The mode-n unfolding of Z is denoted Z(n). The order of the columns is not important as long as it is consistent across calculations. Sometimes it can be useful to represent a tensor as a vector, in a process called vectorization. The ordering of the elements in the vector is also not important as long as it is consistent.
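As an illustration, mode-n matricization takes only a few lines of Python (numpy assumed; we pick one fixed column ordering, with earlier modes varying fastest, and will use it consistently):

```python
import numpy as np

# Mode-n unfolding: move mode n to the front, then flatten the remaining
# modes. Fortran-order reshaping makes earlier modes vary fastest in the
# column index; any ordering works as long as it is used consistently.
def unfold(T, n):
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')

X = np.arange(24).reshape(2, 3, 4)
print(unfold(X, 0).shape)   # (2, 12)
print(unfold(X, 1).shape)   # (3, 8)
print(unfold(X, 2).shape)   # (4, 6)
```

Each column of unfold(X, n) is one mode-n fiber of X.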

Although tensors can be multiplied together, we will only need the mode-n multiplication of a tensor with a matrix, in which the mode-n fibers of the tensor are multiplied by the matrix. This is written Z = Y ×n V and can be expressed in matricized form: Z(n) = V Y(n). See Figure 3 for an intuitive graphical example.
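The matricized form Z(n) = V Y(n) directly suggests an implementation. A minimal numpy sketch (the helper names are ours, not from any package):

```python
import numpy as np

def unfold(T, n):
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')

def fold(M, n, shape):
    # Inverse of unfold: reshape back, then move mode n to its position.
    full = [shape[n]] + shape[:n] + shape[n + 1:]
    return np.moveaxis(np.reshape(M, full, order='F'), 0, n)

# Mode-n product Z = Y x_n V, computed through the identity Z(n) = V Y(n).
def mode_n_product(Y, V, n):
    shape = list(Y.shape)
    shape[n] = V.shape[0]
    return fold(V @ unfold(Y, n), n, shape)

rng = np.random.default_rng(1)
Y = rng.random((2, 3, 4))
V = rng.random((5, 3))
Z = mode_n_product(Y, V, 1)   # multiply the fibers along the second mode
print(Z.shape)                # (2, 5, 4)
```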

Working with matricized tensors also makes use of various matrix products: the Kronecker product, the Khatri-Rao product and the Hadamard product. See sections 1 and 2 of [5] and chapter 1.4 of [6] for a more detailed introduction to tensor algebra.
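The three products can be illustrated with numpy; a hand-rolled Khatri-Rao keeps the definition visible (recent scipy versions also ship a khatri_rao function):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[0., 5.],
              [6., 7.]])

# Kronecker product: every entry of A scales a full copy of B.
K = np.kron(A, B)                                   # shape (4, 4)

# Khatri-Rao product: column-wise Kronecker product; A and B must have
# the same number of columns.
KR = np.stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])],
              axis=1)                               # shape (4, 2)

# Hadamard product: element-wise multiplication of same-shaped matrices.
H = A * B                                           # shape (2, 2)

print(K.shape, KR.shape, H.shape)   # (4, 4) (4, 2) (2, 2)
```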

Figure 2: Matricization of a tensor

Figure 3: Mode-n multiplication with a matrix

Graph node clustering by matrix factorization

Nodes of static graphs can be clustered by applying clustering methods to the adjacency matrix. Among these methods, matrix factorization approaches have already been successfully applied to some computational biology questions (see [7] for a review of non-negative matrix factorization (NMF) in computational biology and [8,9] for examples of spectral clustering). Spectral clustering and NMF solve a problem of the form: A = W * H + E. A more general form of factorization is the singular value decomposition (SVD): A = U * S * V^T, where the columns of U and V are constrained to be orthogonal. SVD corresponds to the decomposition of the matrix into a sum of outer products of vectors (Figure 4).
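This rank-one view of SVD is easy to verify numerically; a small Python check with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 4))

# SVD: A = U S V^T, with orthogonal columns in U and V.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as a sum of rank-one terms s_r * (u_r outer v_r).
A_rebuilt = sum(s[r] * np.outer(U[:, r], Vt[r, :]) for r in range(len(s)))
print(np.allclose(A, A_rebuilt))   # True
```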

Figure 4: Singular value decomposition

Tensor factorization

Tensor factorization can be seen as a natural extension of SVD to an arbitrary number of dimensions (Figure 5).

Figure 5: A higher order SVD

The general form is the Tucker decomposition which factorizes a tensor into a small core tensor and a set of matrices (See Figure 6). See [5] for a general introduction and [6] for non-negative factorizations.

The Tucker decomposition can be written in terms of matrix unfolding of the tensor X (of size I x J x K):

X(1) ≈ A G(1) (C ⊗ B)^T

X(2) ≈ B G(2) (C ⊗ A)^T

X(3) ≈ C G(3) (B ⊗ A)^T

where ⊗ is the Kronecker product and A, B and C are factor matrices of size I × R, J × S and K × T respectively, where R, S and T are the numbers of factors desired for each dimension. The factor matrices are usually constrained to have orthogonal columns.
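These identities can be checked numerically. The sketch below (Python/numpy; the helper functions are ours, and the unfolding's column ordering has earlier modes varying fastest, the convention behind these formulas) builds X = G ×1 A ×2 B ×3 C and compares both sides of the mode-1 equation:

```python
import numpy as np

def unfold(T, n):
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')

def mode_n_product(Y, V, n):
    shape = list(Y.shape)
    shape[n] = V.shape[0]
    full = [shape[n]] + shape[:n] + shape[n + 1:]
    return np.moveaxis(np.reshape(V @ unfold(Y, n), full, order='F'), 0, n)

I, J, K = 4, 3, 5      # tensor dimensions
R, S, T = 2, 2, 3      # numbers of factors per mode
rng = np.random.default_rng(0)
G = rng.random((R, S, T))          # core tensor
A = rng.random((I, R))
B = rng.random((J, S))
C = rng.random((K, T))

# Tucker model: X = G x1 A x2 B x3 C
X = mode_n_product(mode_n_product(mode_n_product(G, A, 0), B, 1), C, 2)

lhs = unfold(X, 0)
rhs = A @ unfold(G, 0) @ np.kron(C, B).T
print(np.allclose(lhs, rhs))
```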

Like PCA and other matrix decompositions, the Tucker decomposition is not unique: the core can be modified without affecting the quality of the solution as long as the inverse modification is applied to the factor matrices. Restricting the core tensor to be diagonal gives the CP (or PARAFAC) decomposition (Figure 7, which is equivalent to Figure 5), in which the same R factors are shared across the matrices. The CP decomposition can also be expressed in terms of matrices:

X(1) ≈ A (C ⊙ B)^T

X(2) ≈ B (C ⊙ A)^T

X(3) ≈ C (B ⊙ A)^T

where ⊙ is the Khatri-Rao product. A form of non-negative tensor factorisation is obtained by constraining A, B and C to have non-negative entries.
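The same kind of numerical check works for CP. Below (Python/numpy; names are illustrative) we build X as a sum of R rank-one terms and verify the mode-1 formula, with the same earlier-modes-vary-fastest column ordering as before:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 5, 2
A = rng.random((I, R))
B = rng.random((J, R))
C = rng.random((K, R))

# CP model: X is a sum of R rank-one tensors a_r o b_r o c_r.
X = np.einsum('ir,jr,kr->ijk', A, B, C)

def unfold(T, n):
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')

# Khatri-Rao product C (.) B: column-wise Kronecker product.
CB = np.stack([np.kron(C[:, r], B[:, r]) for r in range(R)], axis=1)

print(np.allclose(unfold(X, 0), A @ CB.T))
```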

Figure 6: Tucker decomposition of a third order tensor

Figure 7: The CP decomposition

Non-negative tensor factorisation of dynamic graphs

This can be thought of as an extension of spectral clustering to more than one adjacency matrix. Our dynamic graph is represented by a third-order tensor G of size genes × genes × time points (or experimental conditions) in which each frontal slice is the genes × genes adjacency matrix of a weighted undirected graph. The edge weights are also assumed to be non-negative (≥ 0). Using a non-negative CP decomposition, we can factorize G into non-negative factor matrices A, B and C. Because the graph is undirected, the adjacency matrices are symmetric, which leads to A = B. A is a genes × factors matrix and C is a time points × factors matrix. Therefore, if we think of the factors as representing clusters, elements of A can be interpreted as gene cluster memberships and elements of C as measuring the temporal activity of the clusters.
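To see why this interpretation works, note that with A = B each frontal slice of the model is G[:, :, k] = A diag(c_k) A^T, where c_k is row k of C. A small constructed example (Python/numpy; the clusters and activity profiles are made up for illustration):

```python
import numpy as np

# Two hypothetical gene clusters with different temporal activities.
# Column r of A is the membership vector of cluster r; row k of C gives
# the activities of the clusters at time point k.
A = np.zeros((6, 2))
A[:3, 0] = 1.0                  # cluster 1: genes 0-2
A[2:, 1] = 1.0                  # cluster 2: genes 2-5 (gene 2 in both)
C = np.array([[1.0, 0.0],       # cluster 1 active at early time points,
              [1.0, 0.2],
              [0.2, 1.0],       # cluster 2 active at late time points
              [0.0, 1.0]])

# Assemble the dynamic graph: G[i, j, k] = sum_r A[i, r] A[j, r] C[k, r],
# i.e. each frontal slice is A diag(C[k, :]) A^T.
G = np.einsum('ir,jr,kr->ijk', A, A, C)

# Every frontal slice is a symmetric, non-negative adjacency matrix.
print(all(np.allclose(G[:, :, k], G[:, :, k].T) for k in range(4)))
print(G.min() >= 0)
```

A non-negative CP decomposition of such a tensor aims to recover A and C (up to scaling and permutation of the factors) from G alone.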

Tools

- MATLAB toolboxes: tensor, nway (compatible with Octave)

- R packages*: PTAk, ThreeWay, rTensor

- perl: Algorithms::Cube (https://github.com/jkh1/Perlstuff)

- python*: scikit-tensor (https://github.com/mnick/scikit-tensor)

* non-negativity constraint is not available in R and python packages (December 2015).

Example in R

## Load required package

library(rTensor)

## Where the data is

## This is a simulated dynamic graph with 30 nodes and 6 time points

dataDir <- "~/Documents/Bio-IT/courses/Data_integration_and_mining_with_graphs/dynamic_graph"

## Get list of adjacency matrices

## Each matrix is in a csv file

fileList <- dir(path=dataDir, pattern="\\.csv$")

## Read all adjacency matrices into an array

A <- array(as.numeric(NA),dim=c(30,30,6)) # There are 6 matrices of size 30 x 30

for (i in 1:length(fileList)){

A[,,i] <- as.matrix(read.delim(file.path(dataDir,fileList[i]), sep = ';', header=TRUE, row.names=1))

}

## Convert list of adjacency matrices to a tensor

G <- as.tensor(A)

## Perform CP factorization

cpG <- cp(G,num_components = 4)

## View the factor matrices

## Note that the first two are identical (up to numerical accuracy)

## due to the fact that the graph adjacency matrices are symmetric

cpG$U

Example in perl

#!/usr/bin/perl

use strict;

use warnings;

use Algorithms::Matrix;

use Algorithms::Cube;

# Where the data is

my $dataDir = "$ENV{'HOME'}/Documents/Bio-IT/courses/Data_integration_and_mining_with_graphs/dynamic_graph";

# Number of time points

my $T = 6;

# Read dynamic graph from files as series of adjacency matrices

my @G;

foreach my $t(0..$T-1) {

my ($rowLabels,$colLabels);

($G[$t], $rowLabels, $colLabels) =

Algorithms::Matrix->load_matrix("$dataDir/A$t.csv",";",1,1);

}

# Create Cube (i.e. third order tensor) object from matrices

my $G = Algorithms::Cube->new(@G);

# CP factorization with non-negativity constraint

my $r = 4; # Number of components

my ($A,$B,$C,@norms) = $G->nncp($r,symmetry=>1,norms=>1);

References:

1. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113.

2. Mucha PJ, Richardson T, Macon K, Porter MA, Onnela JP (2010) Community structure in time-dependent, multiscale, and multiplex networks. Science 328: 876–878.

3. Bassett DS, Porter MA, Wymbs NF, Grafton ST, Carlson JM, et al. (2013) Robust detection of dynamic community structure in networks. Chaos 23: 013142.

4. Kiers HAL (2000) Towards a standardized notation and terminology in multiway analysis. J. Chemometrics 14: 105-122.

5. Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev., 51(3), 455–500.

6. Cichocki A, Zdunek R, Phan AH, Amari SI (2009) Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. John Wiley & Sons.

7. Devarajan K (2008) Nonnegative matrix factorization: an analytical and interpretive tool in computational biology. PLoS Comput Biol 4(7):e1000029.

8. Kluger Y, Basri R, Chang JT, et al. (2003) Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res. 13(4):703–16.

9. Imangaliyev S, Keijser B, Crielaard W, Tsivtsivadze E (2015) Personalized microbial network inference via co-regularized spectral clustering. Methods 83:28-35.