Final Project Presentations

Project Presentations

Each team will have an opportunity to present its project to the class on either Wednesday, 27 April or Monday, 2 May. You will have at most 5 minutes for your presentation (this limit will need to be enforced strictly so that all the presentations fit within the scheduled class period). It is fine if your presentation is shorter (which might even allow time for questions), but you should keep the time limit in mind when planning your presentation.

Presenting teams should submit slides as a PDF file, a PowerPoint file, or a link to a publicly visible URL by sending me a Slack message in your team channel no later than 10:29am on the date of your presentation.

Wednesday, 27 April

  • Emily Buckley, Investigating Heart Disease in Cavalier King Charles Spaniels
  • Joshita Gullanki, Sindhu Mente, Shruthi Nyshadham, The Role of Computational Biology in Prenatal Testing
  • Anna Williamson, An Introduction to Connectomics
  • Joshua Devine, Ian Switzer, Ronith Ranjan, New DNA/RNA/Amino Acid File Formats with Biopython Support and a Standalone Python Library
  • Davis Garwood, Kevin Wen, Identifying Genes in DNA Sequences
  • Mohit Srivastav, Trying to re-create the fractal evolution of gene promoter networks using aggregation
  • Meghan Anderson, Kathia Crawford, Izzy Shehan, Visualizing and Understanding Olfaction
  • Marvin Cheng, Computer Vision and Human Vision
  • Brenna Courtney, On Christian Bök’s The Xenotext (Book 1)
  • Shreyas Gullapalli, Zachary Heidel, Nikhil Aluru, An Analysis of COVID Variants
  • Alyce Hong, Ife Adetunji, Faisal Refai, David Kim, Eugene Lee, Rachel Lee, Synthesizing Covid Test Information
  • Yanjin Chen, Gene network analysis
  • Caroline Linkous, Taylor Brooks, GAPDH Primer Selection: Finding the best primers for targeting the GAPDH gene
  • Medhini Rachamallu, Anna Brower, Creation of Synthetic Patient Data
  • Sion Kim, Sequencing my own genome

Monday, 2 May

  • Jason Calem, Will Pemble, Gabriel Silliman, Cooper Scher, DNA Profiling with Incomplete Databases
  • Lily Roark, Allison Branch, Toy CODIS: Loci Variation, Matching and Encryption
  • Yuchen Sun, Nafisa Amrula, SARS-CoV-2 Sequence Analysis
  • Harshita Pathipati, Noor Rafiq, Scientific Exploration of Popular American COVID-19 Vaccines and Other Novel Vaccines
  • Ho Yeon Jeong, Pawan Jayakumar, A Survey of CRISPR and its Newest Variations and Applications
  • Sid Chauhan, Aging and Reversing Aging
  • Justin Ngo, Emily Franklin, The Ethics of Gene Editing on Intellectual Disability
  • Riley Heck, Trophic web modeling using weighted directed graphs
  • Raymond Wen, DIY DNA Extraction and Education
  • Neil Phan, Alip Arslan, Emil Diaz, Evolve
  • Zachery Boner, Jason Yu, Grant Matteo, SmartSleep
  • Tatiana Kennedy, DNA Encoded Library Enumeration
  • Jacob Hilliard, MRI Data Pipeline Builder
  • Ethan Gahm, Peptide Sequencing: A New Project for Computational Biology Students
  • Colin Crowe, The Mystery of Chargaff’s Second Parity Rule

Class 24: Protein Folding and AlphaFold


Project Update: due tomorrow, Tuesday, 19 April, 4:59pm.

Submission Form


Slides: class24.pdf

  • Ken A. Dill, S. Banu Ozkan, M. Scott Shell, and Thomas R. Weikl. The Protein Folding Problem. Annual Review of Biophysics, 2008.

  • Bonnie Berger and Tom Leighton. Protein Folding in the Hydrophobic-Hydrophilic Model is NP-Complete. RECOMB 1998.

  • Scott Aaronson, NP-complete Problems and Physical Reality.

  • Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu, and Demis Hassabis. Improved protein structure prediction using potentials from deep learning. January 2020.

  • John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, and Demis Hassabis. Highly accurate protein structure prediction with AlphaFold. July 2021.

Class 23: Programmable Pharmaceuticals


Project Update: due Tuesday, 19 April, 4:59pm. An update on how your project is going, any changes from your original plans, and a summary of the progress you have been able to make. (A link to a form for submitting this will be posted on April 18.)


Slides: class23.pdf

Causes of Death

There was a great question about what fraction of all deaths are accounted for in the statistics I showed from Our World In Data. The source of most of the data is the Global Burden of Disease Study (mostly funded by the Gates Foundation). My understanding is that the statistics do attempt to assign a cause of death to every death, but are using a lot of imputation to do it. According to Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019,

Input data were extracted from censuses, household surveys, civil registration and vital statistics, disease registries, health service use, air pollution monitors, satellite imaging, disease notifications, and other sources. Cause-specific death rates and cause fractions were calculated using the Cause of Death Ensemble model and spatiotemporal Gaussian process regression. Cause-specific deaths were adjusted to match the total all-cause deaths calculated as part of the GBD population, fertility, and mortality estimates.

They make many adjustments and corrections to the reported data to account for all sorts of biases in it (full details are in this 1813-page Appendix 1 and the available source code). This should, of course, raise concerns about how closely the statistics match reality, but it is the best data we have.
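To make the last step in the quoted passage concrete (adjusting cause-specific deaths so they match the all-cause total), here is a minimal sketch of simple proportional rescaling. This is only an illustration of the general idea, not the GBD procedure, which is far more involved. The function name and all of the numbers are made up for the example.

```python
def rescale_to_total(cause_estimates, all_cause_total):
    """Scale cause-specific counts proportionally so they sum to all_cause_total.

    Illustrative only: plain proportional scaling, not the actual GBD method.
    """
    raw_total = sum(cause_estimates.values())
    if raw_total <= 0:
        raise ValueError("cause-specific estimates must sum to a positive number")
    factor = all_cause_total / raw_total
    return {cause: count * factor for cause, count in cause_estimates.items()}


# Hypothetical numbers, for illustration only (not GBD data).
raw = {"cardiovascular": 450.0, "cancer": 300.0, "other": 150.0}
adjusted = rescale_to_total(raw, all_cause_total=1000.0)

for cause, count in adjusted.items():
    print(f"{cause}: {count:.1f}")
```

Here the raw estimates sum to 900 while the independently estimated total is 1000, so every cause is scaled up by the same factor. The relative shares of the causes are preserved, but every reported number ends up being a modeled value rather than a direct count.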

For specific countries such as the US, there is more standardized and perhaps more carefully collected data. You can see the CDC form doctors in the US fill out for cause of death: Instructions for Completing the Cause-of-Death Section of the Death Certificate. It does require at least one cause of death to be filled in, and discourages use of anything like “natural causes” as the cause of death:

The elderly decedent should have a clear and distinct etiological sequence for cause of death, if possible. Terms such as senescence, infirmity, old age, and advanced age have little value for public health or medical research.

The “manner of death” includes Natural as an option (with the others being Accident, Suicide, Homicide, Pending investigation, and “Could not be determined”).

If you’re interested in exploring causes of death more, this is a (morbidly) great site (based on the CDC data): How Will I Die?.

Logic goes in vitro:

Advances in Applications of Molecular Logic Gates:

Class 22: Implementing DNA Storage


Project Update: due Tuesday, 19 April, 4:59pm. An update on how your project is going, any changes from your original plans, and a summary of the progress you have been able to make. (A link to a form for submitting this will be posted on April 18.)


Slides: class22.pdf