One of the purposes of this blog is to start conversations about ideas before they are fully matured and ready to be published. While I love to see discussions of recently released preprints or published papers, I think blogs are uniquely suited for conversations about ideas and potential projects, even (or especially) at the earlier stages.

In this post I will be talking about Evolutionary Computing (EC), Genetic Algorithms (GAs), Genetic Programming (GP), Cellular Automata (CA), Machine Learning (ML) and High Performance Computing (HPC).

Machine learning (ML) approaches are proving invaluable in using the increasingly abundant longitudinal clinical data to find patterns that help make predictions and suggest treatment schedules. The one critique consistently leveled at ML-based approaches is that they are black boxes: the resulting algorithm, potentially capable of outstanding predictions, is inscrutable. Various efforts are under way to make that black box more translucent, and I would like to share one based on my experience.

I came to mathematical oncology after doing a PhD in evolutionary computing. My thesis work explored ways to create a Cellular Automaton (CA)-based agent-based model that would behave and produce spatial patterns of my choice. Later we used this study, and similar ones, to investigate, from a high-level perspective, how this could apply to cancer (see here and here). While most people think of neural networks when they think about ML, there is a growing subfield that uses evolutionary algorithms such as genetic algorithms (GAs) instead. What we got from these GA-evolved CAs was what we wanted, but the CA rules that create those interesting results? Surely more meaningful than the weights in the connections of a neural network, but good luck trying to decipher them!
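To give a flavour of what a GA-evolved CA looks like, here is a minimal sketch, not the setup from my thesis. It assumes a one-dimensional binary CA with a three-cell neighbourhood, a target pattern generated by the well-known rule 90, and a very plain GA (truncation selection, one-point crossover, bit-flip mutation). All names and parameter values are illustrative choices.

```python
import random

WIDTH, STEPS = 31, 15  # illustrative grid width and number of updates

def rule_table(rule_number):
    """Unpack an integer 0..255 into the 8-entry lookup table of an elementary CA."""
    return [(rule_number >> i) & 1 for i in range(8)]

def run_ca(table, width=WIDTH, steps=STEPS):
    """Run the CA from a single central seed, with wrap-around edges."""
    row = [0] * width
    row[width // 2] = 1
    history = [row]
    for _ in range(steps):
        # The neighbourhood (left, centre, right) indexes into the lookup table.
        row = [table[(row[i - 1] << 2) | (row[i] << 1) | row[(i + 1) % width]]
               for i in range(width)]
        history.append(row)
    return history

def fitness(table, target):
    """Number of cells that agree with the target space-time pattern."""
    pattern = run_ca(table)
    return sum(a == b for r1, r2 in zip(pattern, target) for a, b in zip(r1, r2))

def evolve(target, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Plain GA: keep the better half, refill with mutated crossovers."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, target), reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, 8)
            child = mum[:cut] + dad[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda t: fitness(t, target))

target = run_ca(rule_table(90))  # the spatial pattern "of my choice"
best = evolve(target)
print(best, fitness(best, target), "out of", WIDTH * (STEPS + 1))
```

The evolved answer is just a list of eight bits. Even in this toy case, the list says what the CA does in each neighbourhood but not why the resulting pattern emerges, and the rule tables in a realistic agent-based model are far larger.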

A more structured approach would constrain the rules the GA is allowed to evolve. In ongoing work, Etienne Baratchart has been developing an ODE model that aims to capture bone injury repair dynamics where much of the biology is not yet known but where, together with collaborators at the Lynch lab, he has access to data describing key cellular population dynamics over time. Hypothesizing how the populations could influence one another is hard work, but existing literature has allowed him to narrow down the list of hypotheses to a manageable 18. This is in contrast with most mathematical oncology models, where typically only one hypothesis is considered.

Genetic programming (GP) is a subfield of evolutionary computing where the goal is not to evolve solutions to a problem (often an optimization one) but to evolve the algorithms that solve it. Whereas in a typical GA one would evolve solutions to optimize the output of a system, one could use GP to evolve the system itself. In that light, the use of an evolutionary algorithm to produce the rules of a CA could be considered GP.

Thus, one could imagine an unbiased approach where several different hypotheses are tested, combined and constrained by experimental or clinical data. The number of hypotheses could be determined by a combination of what is present in the existing literature (if most studies suggest only one hypothesis, then it might not be worth speculating about the role of other, less solid ones) and the data available to test them. The key thing is that this process of combining hypotheses could be led by GP, which would evolve mathematical models from reported cellular mechanisms, put together in a way that explains existing data.

A badly sketched outline can be seen in the feature figure below. Different hypotheses are collected and the GP uses them as building blocks for comprehensive models, selecting for those models that better explain or fit experimental or clinical data. The top models can then be used to make novel predictions, for which new data will have to be generated, allowing us to differentiate between the best GP-generated models.
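The workflow above can be sketched in a few lines, with heavy simplifications that are mine and not part of any existing project. The five "hypotheses" below are made-up interaction terms between two populations x and y with fixed, arbitrary coefficients; the "data" are synthetic, generated from one particular combination; and the search is a simple evolutionary loop over which terms are switched on, a stand-in for a proper tree-based GP.

```python
import random

# Hypothetical candidate mechanisms: each maps the state (x, y)
# to its contribution to (dx/dt, dy/dt). Coefficients are arbitrary.
HYPOTHESES = {
    "x_grows_logistically": lambda x, y: (0.5 * x * (1 - x), 0.0),
    "y_kills_x":            lambda x, y: (-0.8 * x * y, 0.0),
    "x_recruits_y":         lambda x, y: (0.0, 0.3 * x),
    "y_decays":             lambda x, y: (0.0, -0.2 * y),
    "y_promotes_x":         lambda x, y: (0.4 * x * y, 0.0),
}
NAMES = list(HYPOTHESES)

def decode(genome):
    """Turn a bit list into the names of the hypotheses that are switched on."""
    return [name for name, bit in zip(NAMES, genome) if bit]

def simulate(active, x0=0.1, y0=0.05, dt=0.1, steps=100):
    """Forward-Euler integration of the ODE model built from the active terms."""
    x, y, trajectory = x0, y0, []
    for _ in range(steps):
        rate_x = sum(HYPOTHESES[n](x, y)[0] for n in active)
        rate_y = sum(HYPOTHESES[n](x, y)[1] for n in active)
        x, y = x + dt * rate_x, y + dt * rate_y
        trajectory.append((x, y))
    return trajectory

def error(active, data):
    """Sum of squared differences between the model trajectory and the data."""
    return sum((mx - ox) ** 2 + (my - oy) ** 2
               for (mx, my), (ox, oy) in zip(simulate(active), data))

def evolve(data, pop_size=12, generations=25, seed=1):
    """Keep the better-fitting half, refill with single-bit mutants."""
    rng = random.Random(seed)
    score = lambda genome: error(decode(genome), data)
    population = [[rng.randint(0, 1) for _ in NAMES] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        survivors = population[:pop_size // 2]
        offspring = []
        for parent in survivors:
            child = parent[:]
            child[rng.randrange(len(child))] ^= 1  # switch one hypothesis on or off
            offspring.append(child)
        population = survivors + offspring
    return min(population, key=score)

# Synthetic "experimental" data from a known combination of hypotheses.
TRUE_GENOME = [1, 1, 1, 1, 0]
data = simulate(decode(TRUE_GENOME))
best = evolve(data)
print(decode(best), error(decode(best), data))
```

With only five terms one could simply enumerate all 32 combinations; the evolutionary search only starts to earn its keep when the number of hypotheses, and the ways of wiring them together, grows. In a real application the coefficients would also have to be fitted to the data rather than fixed in advance.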

This workflow can easily work in conjunction with simpler modeling tools such as ODEs, but what about more complex ones like agent-based models? Many of my models are agent-based, and recent frameworks such as HAL and PhysiCell make it easier not only for people to delve into this modeling technique but also for these models to fully utilize high performance computing. For instance, CAs are highly parallelizable and many academic institutions have HPC facilities. Frameworks that facilitate the use of these HPC clusters, together with agent-based models built with evolvability in mind, could allow for GP approaches that lead to a mechanistic multiscale understanding of oncological processes.
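The reason CAs parallelize so well is that each cell's next state depends only on the previous state of its neighbours, so different regions of the grid can be updated independently. The sketch below illustrates this with Conway's Game of Life, chosen only because it is familiar and not because it is an oncology model. The row chunking and the thread pool are illustrative choices and do not reflect how HAL or PhysiCell work internally.

```python
from concurrent.futures import ThreadPoolExecutor

def next_cell(grid, r, c):
    """Game of Life rule for one cell on a wrap-around grid."""
    rows, cols = len(grid), len(grid[0])
    neighbours = sum(grid[(r + dr) % rows][(c + dc) % cols]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    return 1 if neighbours == 3 or (grid[r][c] and neighbours == 2) else 0

def next_rows(grid, row_indices):
    """Update a block of rows; it only reads the old grid, never writes to it."""
    return [[next_cell(grid, r, c) for c in range(len(grid[0]))]
            for r in row_indices]

def step_serial(grid):
    return next_rows(grid, range(len(grid)))

def step_parallel(grid, workers=4):
    """Split the rows into contiguous chunks and update each chunk separately."""
    rows = len(grid)
    size = -(-rows // workers)  # ceiling division
    chunks = [range(start, min(start + size, rows))
              for start in range(0, rows, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda chunk: next_rows(grid, chunk), chunks)
    return [row for block in blocks for row in block]

# A glider on an 8x8 grid.
grid = [[0] * 8 for _ in range(8)]
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[r][c] = 1

print(step_parallel(grid) == step_serial(grid))
```

In standard CPython, threads will not give a real speed-up for this kind of pure-Python work, so the point here is the independence of the chunks and not the timing. On a cluster the same chunks would be handed to separate processes or nodes, and in an evolutionary setting every candidate model in the population can additionally be simulated in parallel.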

This is not meant to be a precise roadmap but, hopefully, the start of a conversation about how evolutionary algorithms could help us utilize new resources (data and high-performance computing) to take mathematical and computational oncology to the next level with models that can both explain and predict.