This post explains how to integrate two tools: HAL and EvoFreq. HAL (Hybrid Automata Library) is an agent-based modeling framework. EvoFreq is a comprehensive, flexible, open-source R package for visualizing the evolutionary and population frequency dynamics of clones over time.
Changes in genotype frequencies are often visualized using Muller plots, wherein each polygon represents a genotype (or clone) and the thickness of each band indicates the number of individuals with that genotype. Here's an example:
There has been an influx of publications using various modeling approaches to combine agent-based models (especially spatially-explicit models) with phylogenetic measurements. Recent examples include: quantification of spatial tumor sampling, the influence of tumor architecture, selection due to space, normal tissue turnover, and more.
The Anderson lab at Moffitt Cancer Center has developed HAL and EvoFreq to provide a straightforward pipeline for modeling and visualizing simulated tumor evolution. HAL can be used to design an agent-based model and generate a clonal lineage tree. This lineage is automatically output in an EvoFreq-friendly format.
Download the HAL and EvoFreq code and follow along here.
Tracking lineages in HAL takes only five simple steps:
1. Define the "Clone" class's initialization and attributes.
2. Create a seed/ancestor clone.
3. Increment Clone sizes during birth.
4. Decrement Clone sizes during death.
5. Create a new clone after a mutation.
A simple birth-death model
The first thing you should notice in the repository is the "Model" folder, which contains a simple birth-death agent-based model written in Java using the HAL framework. To run this code, you'll need to download the latest version of Java and an editor (we suggest IntelliJ IDEA). Detailed instructions for setting up HAL in IntelliJ are posted on the GitHub repository.
In this simple birth-death process, each cell inherits the number of driver mutations and passenger mutations from its parent cell, and new mutations may arise along the way. Cells divide faster with each subsequent driver mutation. Cells may also acquire (neutral) passenger mutations. Each clone is color-coded by the number of driver mutations it carries. Here's an example simulation:
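To make the rule "cells divide faster with each driver" concrete, here is a minimal stand-alone sketch in plain Java. It is not the repository's code: the class name, the multiplicative fitness form, and every rate below are assumptions chosen purely for illustration.

```java
import java.util.Random;

// Illustrative only: the rates below are made-up values, not the parameters
// used in the post's repository.
final class BirthDeathSketch {
    static final double BASE_DIVISION_PROB = 0.05;  // hypothetical per-step probability
    static final double DRIVER_ADVANTAGE = 0.25;    // hypothetical fitness gain per driver
    static final double DRIVER_MUT_PROB = 0.01;     // hypothetical, per division
    static final double PASSENGER_MUT_PROB = 0.10;  // hypothetical, per division

    // Division probability grows with the driver count and is capped at 1.
    static double divisionProb(int drivers) {
        return Math.min(1.0, BASE_DIVISION_PROB * Math.pow(1.0 + DRIVER_ADVANTAGE, drivers));
    }

    // Returns the daughter's {drivers, passengers}: inherited from the parent,
    // each possibly incremented by one new mutation.
    static int[] inherit(int drivers, int passengers, Random rng) {
        int kd = drivers + (rng.nextDouble() < DRIVER_MUT_PROB ? 1 : 0);
        int kp = passengers + (rng.nextDouble() < PASSENGER_MUT_PROB ? 1 : 0);
        return new int[] {kd, kp};
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int kd = 0; kd <= 3; kd++) {
            System.out.printf("drivers=%d divisionProb=%.4f%n", kd, divisionProb(kd));
        }
        int[] daughter = inherit(1, 2, rng);
        System.out.println("daughter kd=" + daughter[0] + " kp=" + daughter[1]);
    }
}
```

Passenger mutations change only the count a cell carries, never its division probability, which is what makes them neutral in this sketch.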
Tumor Evolution Simulation:
Next, I'll explain how to track all the lineage information from this simulation and plot it in a Muller diagram, similar to the first figure, above. Our HAL implementation has one type of "grid" (a 2-dimensional on-lattice grid) and one type of "agent" (the cancer "Cell" class).
Each cell has a "clone" attribute, which facilitates lineage tracking. The Clone class stores all the clone-specific information: the number of driver mutations ("kd"), the number of passenger mutations ("kp"), and the clone-specific color ("color").
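As a rough picture of what such a set of clone attributes might look like, here is a small stand-alone sketch. The field names "kd", "kp", and "color" come from the description above; the class name, the packed-integer color encoding, and the palette are my own assumptions and are not taken from HAL.

```java
// Hypothetical container for the clone-specific attributes named in the text
// (kd, kp, color); HAL's own color utilities are deliberately not used here.
final class CloneAttributes {
    final int kd;      // number of driver mutations
    final int kp;      // number of passenger mutations
    final int color;   // packed 0xAARRGGBB color, determined by kd only

    // Small made-up palette: one opaque color per driver count, reused cyclically.
    private static final int[] PALETTE = {
        0xFF1F77B4, 0xFFFF7F0E, 0xFF2CA02C, 0xFFD62728, 0xFF9467BD
    };

    CloneAttributes(int kd, int kp) {
        this.kd = kd;
        this.kp = kp;
        this.color = PALETTE[kd % PALETTE.length];
    }

    // Formats the color as "#RRGGBB", dropping the alpha channel.
    String hexColor() {
        return String.format("#%06X", color & 0xFFFFFF);
    }

    public static void main(String[] args) {
        CloneAttributes a = new CloneAttributes(1, 0);
        CloneAttributes b = new CloneAttributes(1, 7);
        // Clones with the same driver count share a color, whatever their passengers.
        System.out.println(a.hexColor() + " " + (a.color == b.color));
    }
}
```

Deriving the color from the driver count alone mirrors the color-coding described earlier, where clones are colored by how many drivers they carry.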
When we initialize the agent-based model, we place a 20 by 20 square of cells in the center of the domain. The ancestry of these cells (and each subsequent cell) is tracked by HAL's lineage tracking class. To begin, we set a common ancestor clone: clone0. The model's constructor looks as follows:
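As a rough stand-alone illustration of the seeding step (plain Java arrays rather than HAL's grid, and not the repository's actual constructor), the centered square can be computed like this. The 100 by 100 domain size and all names are assumptions made for the example.

```java
// Plain-array sketch of the seeding step; HAL's grid and agent classes are
// not used, and the domain size in main() is an assumed value.
final class SeedSketch {
    // Returns a grid where true marks a seeded site: a side x side square
    // centered in a width x height domain.
    static boolean[][] seedCenterSquare(int width, int height, int side) {
        if (side > width || side > height) {
            throw new IllegalArgumentException("square does not fit in the domain");
        }
        boolean[][] occupied = new boolean[width][height];
        int x0 = (width - side) / 2;   // left edge of the square
        int y0 = (height - side) / 2;  // lower edge of the square
        for (int x = x0; x < x0 + side; x++) {
            for (int y = y0; y < y0 + side; y++) {
                occupied[x][y] = true; // in the model, a new cell of clone0 would go here
            }
        }
        return occupied;
    }

    // Counts the seeded sites.
    static int count(boolean[][] grid) {
        int n = 0;
        for (boolean[] column : grid) {
            for (boolean site : column) {
                if (site) n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        boolean[][] grid = seedCenterSquare(100, 100, 20);
        System.out.println("seeded cells: " + count(grid));
    }
}
```

Every seeded cell would be assigned to the same ancestor clone, so the ancestor's population after seeding equals the number of seeded sites.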
Recording clonal populations over time
Inside the main function of our HAL code is a "for" loop which iterates the birth-death process for each cell (this is inside the "Step()" function). Every few time steps I save another frame of the simulation into a GIF, and I save the clonal populations by calling the "RecordClones()" function:
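A stand-alone sketch of the record-every-few-steps pattern follows. It does not call HAL's "RecordClones()"; the recorder class, the interval of 10 steps, and the placeholder growth rule are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recorder: keeps one population snapshot per recorded time point.
final class RecorderSketch {
    final List<Integer> times = new ArrayList<>();
    final List<Map<Integer, Long>> snapshots = new ArrayList<>();

    // Stores a copy of the current population of every clone id at this time point.
    void record(int time, Map<Integer, Long> popByCloneId) {
        times.add(time);
        snapshots.add(new LinkedHashMap<>(popByCloneId));
    }

    // Runs `steps` iterations and records every `recordEvery` steps.
    static RecorderSketch run(int steps, int recordEvery) {
        Map<Integer, Long> pops = new LinkedHashMap<>();
        pops.put(0, 400L);                     // clone id 0 starts with 400 cells
        RecorderSketch rec = new RecorderSketch();
        for (int t = 0; t < steps; t++) {
            pops.merge(0, 1L, Long::sum);      // placeholder for the real birth-death step
            if (t % recordEvery == 0) {
                rec.record(t, pops);
            }
        }
        return rec;
    }

    public static void main(String[] args) {
        RecorderSketch rec = run(100, 10);
        System.out.println("recorded time points: " + rec.times);
    }
}
```

Copying the map on each call matters: storing a reference to the live map would make every snapshot show the final populations.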
Updating clonal lineage during birth & death
If a mutation has occurred, we need to update several pieces of information. First, we decrement the population of the original clone (DecPop). Next, we create a new Clone with the current clone as its parent, passing in the new numbers of drivers and passengers. Lastly, we increment the population of the newest clone (IncPop).
Please note that IncPop is also called every time a cell is initialized (the "Init()" function in Cell.java) and DecPop is called every time a cell dies (the "Death()" function in Cell.java).
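The bookkeeping described here, together with the five steps listed earlier, can be sketched without HAL as follows. This is not HAL's lineage-tracking class: the class and method names are hypothetical stand-ins, and the sketch tracks only a driver count.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration of the five bookkeeping steps; this is NOT HAL's
// lineage-tracking class, and every name here is hypothetical.
final class SketchClone {
    final SketchClone parent;                        // null for the seed/ancestor clone
    final List<SketchClone> children = new ArrayList<>();
    final int drivers;                               // step 1: clone-specific attribute
    private long pop = 0;                            // living cells in this clone

    SketchClone(SketchClone parent, int drivers) {   // step 1: initialization
        this.parent = parent;
        this.drivers = drivers;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    void incPop() { pop++; }                         // step 3: a cell is born

    void decPop() {                                  // step 4: a cell dies
        if (pop == 0) {
            throw new IllegalStateException("population already zero");
        }
        pop--;
    }

    long pop() { return pop; }

    // Step 5: one cell of this clone mutates and founds a new child clone.
    SketchClone mutate() {
        decPop();                                    // the cell leaves the original clone
        SketchClone child = new SketchClone(this, drivers + 1);
        child.incPop();                              // and becomes the child's first member
        return child;
    }

    public static void main(String[] args) {
        SketchClone clone0 = new SketchClone(null, 0);   // step 2: seed clone
        clone0.incPop();                                  // founder cell
        clone0.incPop();                                  // a birth
        SketchClone clone1 = clone0.mutate();             // a mutation
        clone0.decPop();                                  // a death
        System.out.println(clone0.pop() + " " + clone1.pop());
    }
}
```

Because a mutation moves exactly one cell from the parent clone to the child, the sum of all clone populations always equals the number of living cells.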
At the end of the main() function, the clonal lineage tree is output in a convenient format for EvoFreq. The following lines facilitate this:
The "OutputClonesToCSV()" function is where the tree is built and recorded. This function is a member of the "Clone" class, meaning it can build a tree from an arbitrary ancestor clone. Here we want the full tumor history, so we call it on "clone0," the seed clone created at initialization.
We can also output some clone-specific attributes, which may be useful later. In fact, we can output the same clone-specific color used for visualization in HAL, so that our Muller plots match our color scheme! We need to pass in a String array of clonal attribute names, plus an inline function that generates the attributes (it takes a clone and returns an array of attributes). To do so, I'm calling "GetAttributes()." We also ignore any clones whose population never rises above a threshold value.
The output will look something like this:
The first line contains the column headers. There are 3 clone-specific attributes: Drivers, Passengers, and Color. The rest of the columns are a unique clonal id, the corresponding parent's id, and a column for each time point (to store population sizes).
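The column layout described above can be illustrated with a stand-alone sketch that builds such rows from an attribute-name array, an inline attribute function, and a threshold. It is not HAL's "OutputClonesToCSV()"; the column order, the "CloneID" and "ParentID" header names, and the Row class are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.function.Function;

// Hypothetical writer for the layout described in the text: attribute columns,
// clone id, parent id, then one population column per recorded time point.
final class CsvLayoutSketch {
    // Made-up record of one clone's attributes and population history.
    static final class Row {
        final int id, parentId, drivers, passengers;
        final String color;
        final long[] pops;   // one entry per recorded time point

        Row(int id, int parentId, int drivers, int passengers, String color, long[] pops) {
            this.id = id;
            this.parentId = parentId;
            this.drivers = drivers;
            this.passengers = passengers;
            this.color = color;
            this.pops = pops;
        }
    }

    static List<String> toCsv(List<Row> rows, int[] times, String[] attrNames,
                              Function<Row, String[]> attrs, long threshold) {
        List<String> lines = new ArrayList<>();
        StringJoiner header = new StringJoiner(",");
        for (String name : attrNames) header.add(name);
        header.add("CloneID").add("ParentID");           // assumed header names
        for (int t : times) header.add(Integer.toString(t));
        lines.add(header.toString());

        for (Row r : rows) {
            long max = 0;
            for (long p : r.pops) max = Math.max(max, p);
            if (max <= threshold) continue;              // never rose above the threshold
            StringJoiner line = new StringJoiner(",");
            for (String a : attrs.apply(r)) line.add(a);
            line.add(Integer.toString(r.id)).add(Integer.toString(r.parentId));
            for (long p : r.pops) line.add(Long.toString(p));
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(1, 0, 0, 0, "#1F77B4", new long[] {400, 380}));
        rows.add(new Row(2, 1, 1, 0, "#FF7F0E", new long[] {0, 1}));
        String[] names = {"Drivers", "Passengers", "Color"};
        List<String> csv = toCsv(rows, new int[] {0, 10}, names,
            r -> new String[] {Integer.toString(r.drivers), Integer.toString(r.passengers), r.color},
            5);
        csv.forEach(System.out::println);
    }
}
```

Note that this sketch simply drops small clones; it makes no attempt to re-attach the descendants of a dropped clone, which a real tree writer would have to consider.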
Read in EvoFreq
After running this example in HAL, there should be a file in the "data-output" folder called "phylogeny_tracker.csv." This is the file we will read into EvoFreq. Since EvoFreq is an R package, you'll need to open the "using_evo_freq.R" script in RStudio (or something similar). Here's the script:
The first few lines install EvoFreq. Next, you'll use the function "read.HAL" and pass in the directory where your CSV is stored. EvoFreq can plot a variety of formats including Muller plots, dendrograms, and even videos. With this line, "fill_value = hal_info$attributes$Color", I ensure that my color scheme is identical to HAL's color scheme for coloring clones. See below:
To adapt this code to your own purposes, rewrite the Clone class to include all the clone-specific attributes you need. Next, implement the birth-death scheme you want, paying close attention to where clonal populations must be incremented and decremented. Lastly, call "RecordClones" at each time step you want to record, in order to save the populations over time. To output in an EvoFreq-friendly format, call the "OutputClonesToCSV" function, and use "read.HAL" in EvoFreq. Again, the link to download all the code used in this post is here.
Although this post focuses on using EvoFreq within a modeling framework, it is also fully capable of handling data output from common subclonal reconstruction software such as CALDER, PhyloWGS, and ClonEvol. EvoFreq is also a great tool for visualizing evolutionary studies where barcoding methodologies are employed to track populations over time. Read more in the preprint, here.