Saskia Haupt, Alexander Zeilmann, Aysel Ahadova, Hendrik Bläker, Magnus von Knebel Doeberitz, Matthias Kloor, Vincent Heuveline

Read the paperWithin this collaboration, we wanted to have a mathematical model of colorectal carcinogenesis that fulfills the following requirements:

- It offers a
**simultaneous description**of the multiple pathway nature of colorectal carcinogenesis, - it is
**modular**, meaning that we can add various driver genes, mutational dependencies and whole pathways of carcinogenesis in an easy way based on current biomedical data, - it is
**medically interpretable**, meaning that the parameters and components have a biomedical meaning, - it can be
**analyzed systematically**, meaning that we can study the influence of different components on the evolution of cancer, - it is
**computationally feasible**, meaning that even a large system can be computed quite efficiently.

For this, we have to think about how to model the individual steps of the pathways of carcinogenesis, a topic which has been of interest to the mathematical oncology community for a long time

Let’s take our example of Lynch syndrome carcinogenesis where we currently consider five possible driver genes: the classical colorectal cancer driver genes

Here, the state vector, x(t), describes the mutational states of all involved genes at each time point t. This allows us to analyze the effect of, e.g., a specific mutational dependency on the overall carcinogenesis and we can check off requirement 4 on a systematic analysis of the individual components from above. Further, each component has a medical meaning.

An example gene mutation graph is given in Figure 1. For the classical

**Figure 1**: Gene mutation graphs. These graphs represent the possible mutation states and the connections of two subsequent mutation states. This means, the vertices represent the mutations which the alleles of a gene can have accumulated. Here, \(\phi\) represents both alleles to be non-mutated for this gene, \(m\) and \(mm\) represent one respectively two alleles being hit by a point mutation, \(l\) and \(ll\) describe one (respectively two) alleles being affected by an LOH event, and ml is the mutation status where one allele is hit by a point mutation and the other by an LOH event. The edges connecting different vertices represent mutational events, whereas self-loops, i.e., edges that connect a vertex with itself, describe no mutation occurring at the current point in time. The edges are labeled by the amount of change which happens at each point in time.

As you may have noticed, we have built these mutation graphs for each gene individually instead of building one large mutation graph of all possible combinations of mutational events for all genes at once. This can be done because of the following: First, let’s consider the independent mutational events. A mutation in one gene does not affect the other mutations. This means, the mutation probabilities in one gene remain the same no matter what the mutation status in all the other genes is. You can think of this as each gene represents a coordinate axis in a Cartesian coordinate system. And this is what happens to the individual mutation graphs. They are combined using the Cartesian graph product. This huge graph now consists of all possible mutational states of all involved genes of all pathways of carcinogenesis for this type of cancer. This allows us to check off requirement 1 on describing multiple pathways of carcinogenesis simultaneously.

The graph can be written in matrix form as an adjacency matrix. And finally, here is the Kronecker structure: Combining the graphs using the Cartesian graph product leads to a large system matrix of the ODE system built by the Kronecker sum of the small adjacency matrices of the individual gene mutation graphs. This is illustrated in Figure 2.

**Figure 2**: Cartesian product of graphs corresponds to Kronecker sum of matrices. The upper row shows two small graphs (on the left) and their Cartesian graph product (on the right). Notice how each vertex (\(\alpha,\beta,\gamma\)) of the first graph is combined with each vertex (a,b,c,d) from the second graph, yielding a total of 12 (=3x4) vertices in the Cartesian graph product. The edge weights (indicated by numbers next to the edges and the edge thickness) of the graphs on the left and middle transfer to the corresponding edges in the product graph. The bottom row displays the adjacency matrices corresponding to the graphs in the upper row as an equation involving the Kronecker sum of the matrices.

Adding more genes only means that the Kronecker sum includes more adjacency matrices.

And now? What about the dependent mutational events? Again, we can use the Kronecker structure because dependency refers to the correlation of a limited number of genes or events. For example, we might want to model an increased point mutation rate of one gene after a mutational event in another gene. In our case, an increased point mutation rate of

You can see that these biological processes can be highly complex and highly correlated. However, they can be modeled in a structured way using the Kronecker structure because, and this is very important: Although there might be a mutational dependency between some genes, all the other mutational events are not affected by this dependency. And this is the trick: You build the small mutation graph for the affected gene, e.g., for the increased point mutation rate of

This illustrates that we can model quite complex mutational dependencies with a large number of involved genes and resulting pathways of carcinogenesis in a modular way by splitting this high-dimensional problem into small parts and combining them using the Kronecker structure.

In our case, the system matrix A+B+C+D+E+F is a sparse triangular matrix with only a few non-zero entries. In addition, we do not need to compute the matrix exponential as a full matrix. Instead, we only need the action of the matrix exponential on the vector \(x_0\), where a variety of efficient algorithms exist in the context of exponential integrators

Moreover, if we only consider independent mutational events, the Kronecker structure can be used to also split the matrix exponential in small components. In the case of Lynch syndrome carcinogenesis, this means we only have to compute the matrix exponential of at most 6×6 matrices instead of 1250×1250 matrices. With this, also requirement 5 on computational feasibility is satisfied and thus, all requirements are fulfilled by the presented modeling approach.

As a side note, the Kronecker structure is not only very advantageous for modeling multiple pathways of carcinogenesis. It also leads to a visually quite nice fractal matrix structure, as illustrated with the Cover Art of the Math Onco Newsletter, Week 166.

Further details on the modeling approach and how it can be applied to the three pathways of colorectal Lynch syndrome carcinogenesis are available in our recent publication in PLOS Computational Biology, found here.

- Al-Mohy, Awad H., and Nicholas J. Higham. 2011. “Computing the Action of the Matrix Exponential, with an Application to Exponential Integrators.” SIAM Journal on Scientific Computing. https://doi.org/10.1137/100788860.
- Gerstung, Moritz, Michael Baudis, Holger Moch, and Niko Beerenwinkel. 2009. “Quantifying Cancer Progression with Conjunctive Bayesian Networks.” Bioinformatics 25 (21): 2809–15.
- Komarova, Natalia L., Christoph Lengauer, Bert Vogelstein, and Martin A. Nowak. 2002. “Dynamics of Genetic Instability in Sporadic and Familial Colorectal Cancer.” Cancer Biology & Therapy 1 (6): 685–92.
- Niesen, Jitse, and Will M. Wright. 2012. “Algorithm 919.” ACM Transactions on Mathematical Software. https://doi.org/10.1145/2168773.2168781.
- Paterson, Chay, Hans Clevers, and Ivana Bozic. 2020. “Mathematical Model of Colorectal Cancer Initiation.” Proceedings of the National Academy of Sciences of the United States of America 117 (34): 20681–88.
- Tan, W. Y., and C. C. Brown. 1988. “A Nonhomogeneous Two-Stage Model of Carcinogenesis.” Mathematical and Computer Modelling. https://doi.org/10.1016/0895-7177(88)90531-6.