DNA Design: CAD of Gene Expression Systems
We are developing DNA design strategies relying on design rules that accelerate the development of expression vectors and reduce costly errors.
Most bioinformatics software packages include sequence editors that facilitate the design and assembly of new DNA sequences. Automatic recognition of sequence features, identification of restriction sites, and tools to add sequence annotations help biologists visualize the different elements of the DNA sequences they manipulate. Software lets users switch between graphical representations of DNA sequences, which provide a macroscopic view, and textual representations better suited to examining sequences at base-level resolution. Irrespective of the software environment used, the cut-and-paste approach to sequence editing increases the chance of introducing errors, such as leaving in an unwanted DNA segment, deleting a required one, accidentally inserting a segment twice, or truncating a functional element. For a large, multi-gene construct (e.g., an expression cassette encoding all components of a biochemical pathway), these risks can become unacceptably high. In many cases, errors are uncovered only after several months of unsuccessful attempts to express a gene.
Many of these errors can be avoided by developing a library of genetic parts prior to designing DNA sequences. The notion of genetic parts supports a different approach to sequence design: complex genetic constructs can be designed using drag-and-drop user interfaces that rely on icons to represent different categories of genetic parts. This added level of abstraction makes it easier to understand the structure of a new sequence. It also avoids sequence manipulation errors. However, it still makes it possible to design sequences lacking components required for proper gene expression.
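The following is a minimal sketch of this part-level abstraction, not GenoCAD's data model. The `Part` class, the `LIBRARY` contents, and the `assemble` function are hypothetical names introduced for illustration, and the sequences are placeholders rather than real genetic parts. It shows that assembling whole parts avoids base-level editing errors, yet still accepts a design that lacks a terminator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    category: str  # e.g. "promoter", "rbs", "cds", "terminator"
    sequence: str


# Hypothetical toy library: names and sequences are placeholders, not real parts.
LIBRARY = {
    "pA": Part("pA", "promoter", "TTGACA"),
    "rbs1": Part("rbs1", "rbs", "AGGAGG"),
    "geneX": Part("geneX", "cds", "ATGAAATAA"),
    "t1": Part("t1", "terminator", "GCGCTTTT"),
}


def assemble(part_names):
    """Concatenate whole parts in order; parts are never edited at base level."""
    parts = [LIBRARY[name] for name in part_names]
    categories = [p.category for p in parts]
    sequence = "".join(p.sequence for p in parts)
    return categories, sequence


# A complete cassette and an incomplete one are both accepted:
# nothing at this level of abstraction requires a terminator.
complete = assemble(["pA", "rbs1", "geneX", "t1"])
no_terminator = assemble(["pA", "rbs1", "geneX"])
```

Because `assemble` only concatenates intact parts, a part cannot be truncated or partially duplicated, but the second call succeeds even though the resulting design is unlikely to express properly.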
We demonstrated that DNA design rules can be modeled as context-free grammars. GenoCAD, a web-based application to design synthetic DNA sequences, relies on the notion of “grammars” to organize large collections of genetic parts. It also includes a wizard-like sequence editor that guides users through a series of design decisions, each corresponding to a rewriting rule of the grammar selected by the user. Initially, users could only choose from a set of public grammars when designing sequences. These public grammars were developed by manually adding records to the GenoCAD backend database. The process was tedious, and only GenoCAD administrators familiar with the application data model could develop new grammars. These early grammars were interesting as proofs of concept, but they did not necessarily reflect the design rules that users wished to apply to specific research projects.
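As an illustration of design rules expressed as a context-free grammar, here is a small sketch under stated assumptions. The grammar below is a hypothetical toy, not one of the GenoCAD public grammars, and `rewrite` and `is_valid` are names invented for this example. `rewrite` mimics one wizard step (applying a single rewriting rule), while `is_valid` checks whether a sequence of part categories can be derived from the start symbol.

```python
from functools import lru_cache

# Hypothetical toy grammar: each nonterminal maps to its alternative right-hand sides.
# Symbols that are not keys of GRAMMAR are terminals (part categories).
GRAMMAR = {
    "Construct": [["Cassette"], ["Cassette", "Construct"]],
    "Cassette": [["promoter", "Gene", "terminator"]],
    "Gene": [["rbs", "cds"]],
}
START = "Construct"


def rewrite(form, index, choice):
    """Apply one rewriting rule to the nonterminal at position `index`."""
    rhs = GRAMMAR[form[index]][choice]
    return form[:index] + rhs + form[index + 1:]


def is_valid(categories):
    """Return True if the category sequence can be derived from START.

    Simple top-down recognizer; it assumes the grammar has no left
    recursion and no empty right-hand sides.
    """
    tokens = tuple(categories)

    @lru_cache(maxsize=None)
    def ends(symbol, start):
        # Positions at which a derivation of `symbol` beginning at `start` can end.
        if symbol not in GRAMMAR:  # terminal symbol
            if start < len(tokens) and tokens[start] == symbol:
                return frozenset({start + 1})
            return frozenset()
        result = set()
        for rhs in GRAMMAR[symbol]:
            positions = {start}
            for sym in rhs:
                positions = {e for p in positions for e in ends(sym, p)}
            result |= positions
        return frozenset(result)

    return len(tokens) in ends(START, 0)


# Wizard-style derivation: each step is one design decision.
form = [START]
form = rewrite(form, 0, 0)  # Construct -> Cassette
form = rewrite(form, 0, 0)  # Cassette  -> promoter Gene terminator
form = rewrite(form, 1, 0)  # Gene      -> rbs cds
```

In this toy grammar, every derivation ends with a terminator after each gene, so a design that omits one cannot be produced by the wizard and is rejected by `is_valid`.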
As a result, we have formalized the DNA design process and developed a graphical user interface that enables life scientists to develop context-free grammars. The GenoCAD Grammar Editor allows users to revise existing grammars or to develop entirely new ones. These grammars can be fairly generic, in order to generate a broad range of expression vectors for a new host. Alternatively, they can be made very specific to capture project-specific design constraints, such as those resulting from intellectual property licensing agreements. These languages describing families of synthetic DNA molecules are comparable to the Domain-Specific Languages (DSLs) used in computer programming.
Additional information about GenoCAD can be found at www.genocad.com.
Source of funding
- 2009-2013: National Science Foundation Award EF-0850100
Collaborators
- OpenHelix: Training and outreach
- Synthetic Biology Open Language (SBOL): Standardization
- Sakiko Okumoto: Design of vectors for gene expression in the chloroplast of Chlamydomonas reinhardtii
- Tim Lu: Design of synthetic transcription factors
- Oliver Purcell: Design of synthetic transcription factors
- Eric van Wyk: Language design
- Anna Coll and Kristina Gruden: Design of plant expression vectors
Selected publications related to this project
- Cai Y, Hartnett B, Gustafsson C, Peccoud J. (2007) A syntactic model to design and verify synthetic genetic constructs derived from standard biological parts. Bioinformatics. 23, 2760-7.
- Peccoud J, Blauvelt MF, Cai Y, Cooper KL, Crasta O, DeLalla EC, Evans C, Folkerts O, Lyons BM, Mane SP, Shelton R, Sweede MA, Waldon SA (2008) Targeted development of registries of biological parts. PLoS ONE 3(7):e2671.
- Czar M, Cai Y, Peccoud J (2009) Writing DNA with GenoCAD, Nucleic Acids Research 37, W40-47
- Cai Y, Lux MW, Adam L, Peccoud J (2009) Modeling structure-function relationships in synthetic DNA sequences using attribute grammars, PLoS Comp Bio 5(10):e1000529
- Cai Y, Wilson ML, Peccoud J (2010) GenoCAD for iGEM: a grammatical approach to the design of standard-compliant constructs Nucleic Acids Research 38 (8): 2637-2644
- Wilson ML, Hertzberg R, Adam L, Peccoud J. (2011) A step-by-step introduction to rule-based design of synthetic genetic constructs using GenoCAD. Methods in Enzymology 498:173-88
- Lux MW, Bramlett BW, Ball DA, Peccoud J (2012) Genetic Design Automation: engineering fantasy or scientific renewal? Trends in Biotechnology 30:120-126.
- Wilson ML, Okumoto S, Adam L, Peccoud J (2014) Development of a domain-specific genetic language to design Chlamydomonas reinhardtii expression vectors Bioinformatics 30 (2) 251-257
- O Purcell, J Peccoud, TK Lu (2014) Rule-based design of synthetic transcription factors in eukaryotes ACS Synthetic Biology 3 (10) 737–744
- Michal Galdzicki, Mandy Wilson, Cesar Rodriguez, Ernst Oberortner, Matthew Pocock, Laura Adam, J. Christopher Anderson, Bryan Bartley, Jacob Beal, Deepak Chandran, Joanna Chen, Douglas Densmore, Drew Endy, Raik Gruenberg, Jennifer Hallinan, Nathan Hillson, Cassie Huang, Jeffrey Johnson, Allan Kuchinsky, Matthew Lux, Goksel Misirli, Chris Myers, Jean Peccoud, Hector Plahar, Nicholas Roehner, Evren Sirin, Guy-Bart Stan, Alan Villalobos, Anil Wipat, John H. Gennari, and Herbert M. Sauro (2014) SBOL: A community standard for communicating designs in synthetic biology Nature Biotechnology 32(6):545-50.
- Adames NR, Wilson ML, Fang G, Lux MW, Glick BS, Peccoud J (2015) Nucleic Acids Research 43 (10) 4823–4832