MAIZE GENOME SEQUENCING CONSORTIUM 2006 REPORT
As provided to MNL by Sandy Clifton, Wash U Genome Sequencing Center. What follows is a condensed version of the report that was supplied to the project's advisory group (Oct 2006). An updated report will be prepared as the MNL goes to press after the Annual Maize Genetics Conference in March of 2007.
Original Project Objectives/Deliverables.
1. Provide the complete sequence and structures of all maize genes and
their locations (in linear order) on both the genetic and physical maps
of maize.
2. The gene space of B73 maize (gene sequences and adjacent regulatory
regions) should be finished to high quality according to currently
acceptable standards.
3. If applicable, the sizes of gaps between the genes should be
estimated and draft sequences of repetitive DNA between genes presented
where possible.
4. The sequence will be fully integrated with the genetic and physical
maps.
5. Annotation will include gene models, predicted exon/intron structure,
incorporation of EST and full-length cDNA data, gene ontology, and
relationship with homologs in other organisms, including, but not limited
to, the other sequenced plant genomes.
6. Annotation will be coordinated with existing maize community and
comparative databases with the eventual goal of generating complete
curation of the genomic sequences to a standard set by established model
organism databases.
Research Activities and Results.
The first 6 months of the year were spent coordinating the various tasks
assigned to the participating institutions: Washington University School
of Medicine Genome Sequencing Center (GSC), Arizona Genomics Institute
(AGI), Cold Spring Harbor Laboratory (CSHL) and Iowa State University
(ISU). AGI is responsible for choosing a minimal tiling path
(MTP) of mapped BAC clones and preparing DNA for sequencing in
consultation with the GSC, the primary sequencing center. By year's end,
an optimized pipeline running from MTP clone selection through BAC library
construction had been developed. AGI is on target to process an additional
960 clones/month until the entire maize genome is covered with BACs. In
addition, the first months of the past grant year were spent optimizing
procedures and improving computer access and communication among the
three Centers (GSC, AGI and CSHL) sharing production and finishing tasks.
Bimonthly or monthly conference calls among the three centers involved in
the production/finishing procedures have been, and continue to be, an
effective means of anticipating and dealing with problems
in a timely manner. A smoothly operating protocol is now in place, and
throughput numbers are on track to meet the
project goals. Clone selection has increased to 1,000 clones per month,
with library construction and production sequencing throughput scaling
proportionally.
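To illustrate what choosing a minimal tiling path involves, the following Python sketch applies a generic greedy interval-cover strategy to clones whose start and end coordinates on the physical map are known. It is not the AGI selection pipeline: the clone names, the coordinates, and the `minimal_tiling_path` function are hypothetical, and a production procedure would also weigh factors such as required overlap and clone quality.

```python
from typing import List, Tuple

# (name, start, end) on the physical map; a hypothetical layout for illustration
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], region_start: int,
                        region_end: int) -> List[Clone]:
    """Greedy interval cover: at each step, among the clones that start at or
    before the position covered so far, take the one that reaches farthest."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Scan every not-yet-examined clone that starts within covered territory.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


# Hypothetical clones covering positions 0 to 300
example_clones = [
    ("a", 0, 100), ("b", 50, 120), ("c", 90, 200),
    ("d", 150, 260), ("e", 190, 300),
]
example_path = minimal_tiling_path(example_clones, 0, 300)
print([name for name, _, _ in example_path])
```

With these made-up coordinates the sketch selects clones a, c and e and skips b and d, whose coverage is redundant.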
The bioinformatics teams at CSHL and Iowa State have been crucial to the
development and/or adaptation of software that has allowed the project to
move forward in a timely manner. The largely manual clone picking
process at AGI has been improved in cooperation with CSHL by automating
the data pipeline and setting up visualization methods using existing
GMOD tools, CMap and GBrowse (http://www.gmod.org/node/).
Annotation is also a responsibility of the bioinformatics teams and the
first year of this endeavor has focused on the development of protocols
that will form the basis of the annotation pipeline. While some analysis
protocols can be directly adopted from the Gramene Project
(http://www.gramene.org) with little modification, new approaches are required
to address characteristics that are specific to the maize genome, such as
repetitive sequences. Repeat classification is also essential for
understanding how proliferation of different transposon classes
contributed to the expansion of the maize genome. To classify repeat
sequences, we are using the MIPS Repeat Element Catalog (MIPS-REcat;
http://mips.gsf.de/proj/plant/webapp/recat/RecatTreeFrameset.jsp). Of
course, most users will be interested in the gene space between the
repetitive regions. We are using both ab initio gene prediction and
evidence-based gene-build approaches to define protein-coding genes.
We are also working with Brad Barbazuk (Donald Danforth Plant Science
Center) to adopt TWINSCAN software, which is being trained to annotate
the maize genome. The staffs of CSHL and GSC have worked closely to
develop a standardized GenBank submission record that will be accurately
parsed as primary annotation for the sequenced clone. The Maize Project
differs from other clone-by-clone sequencing projects in that most clones
will not be sequenced beyond the Phase I level. Thus most clones will be
represented as multiple contigs. Information on how these contigs are
oriented and associated into scaffolds will be essential to users of the
genome browser. We have therefore established methods to encode this
information within the GenBank record submitted by GSC, so that it can be
conveyed by the CSHL annotation team to the user. The ISU team has been
developing a scaffolding approach that makes use of retrotransposons.
Preliminary experiments conducted on maize BACs have been promising.
Applied to 257 "random" BACs finished by the GSC and downloaded from
GenBank, this approach yielded, by a conservative estimate, 1.5 additional
scaffolds per BAC. The first release of the MaizeSequence Browser at
CSHL (http://www.maizesequence.org) went live on 28 September 2006. The browser
and database infrastructure are powered by Ensembl
(http://www.ensembl.org/index.html), which has proven itself highly
robust and flexible in the service of genome projects. The interface
provides convenient entry points to the genome, both by searching and
browsing, and displays salient features, including predicted genes,
markers, repeats, and expressed and conserved regions. A high priority
was to make searching and browsing easy for all members of the maize
community, whether geneticist, breeder, or molecular biologist. Thus
entry is currently possible by sequence accession, physical position,
genetic position, and conserved synteny with rice
(http://www.maizesequence.org). A BLAST search engine will be added in
December 2006, and the ability to view annotated BAC clones for the maize
project in ContigView will be available in January 2007. Future plans
include an automated notification system
allowing end users to "subscribe" to specific regions of the maize
genome. The system, leveraging the annotation pipeline, will notify
users when a region of interest has an updated sequence or marker
alignment. Another goal is to use Ensembl's Distributed Annotation System
(DAS) infrastructure to provide alignments of procured data sets (such as
mRNAs and full-length cDNAs). A third feature is the visual integration
of the larger-scale FPC view with the more targeted, sequence-based BAC
view, which will provide a uniform browsing context.
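The planned region-subscription feature mentioned above can be pictured as an interval-overlap check between each user's subscribed regions and a region whose sequence or marker alignment has changed. The Python sketch below is only an illustration of that idea under simple assumptions; the `Region` class, the `users_to_notify` function, and the user names are hypothetical and are not part of Ensembl or the MaizeSequence Browser.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """A closed interval on one chromosome (hypothetical representation)."""
    chromosome: str
    start: int
    end: int

    def overlaps(self, other: "Region") -> bool:
        # Two closed intervals overlap when each starts before the other ends.
        return (self.chromosome == other.chromosome
                and self.start <= other.end
                and other.start <= self.end)


def users_to_notify(subscriptions: Dict[str, List[Region]],
                    updated: Region) -> List[str]:
    """Return, in sorted order, the users with at least one subscribed region
    that overlaps the updated region."""
    return sorted(
        user for user, regions in subscriptions.items()
        if any(region.overlaps(updated) for region in regions)
    )


# Hypothetical subscriptions
subscriptions = {
    "alice": [Region("chr1", 1000, 5000)],
    "bob": [Region("chr1", 6000, 9000), Region("chr2", 1, 100)],
}
print(users_to_notify(subscriptions, Region("chr1", 4500, 6500)))
```

An update spanning positions 4500 to 6500 on the made-up "chr1" touches a subscribed region of both users, so both would be notified; an update elsewhere would notify no one.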
Outreach is an important part of the project. In the past year, outreach
activities to the maize community included soliciting maize researchers
for preliminary requirements for the maize genome sequence site,
coordinating with the maize community database MaizeGDB, establishing
appropriate contacts with existing and future maize research initiatives,
and attending annual meetings. To establish browser
requirements, we asked maize researchers at CSHL, Missouri, Iowa and
the Plant Gene Expression Center for feedback on the existing Gramene
browser and specific maize requirements. This was done via phone calls,
in-person meetings, and email exchanges. To enhance the existing
working relationship with MaizeGDB, Dr. Lawrence and her group spent a
day at CSHL reviewing the browser, discussing mechanisms for linking
between the project sites, data exchange, and establishing a working
model for feedback between the groups. To make the broader maize
community aware of ongoing efforts, Drs. Lawrence and Ware coauthored
"MGSC: Gramene and MaizeGDB cooperate to provide access to sequences and
related data," published in the Maize Genetics Cooperation Newsletter volume 80,
describing efforts to coordinate on the delivery of the maize sequence to
the community. To build upon existing resources, we have worked
closely with several community members on data integration. These efforts
include annotation of maize retrotransposon elements with Drs. Phillip
San Miguel and Jeff Bennetzen, the maize optical map with Dr. Schwartz's
group, and gene predictions with Dr. Brad Barbazuk.
The outreach program at CSHL is largely focused on developing a website
with 3D graphics for high school students and the general public. The
outreach team has been waiting for enough data to support this activity. Now that the
first large "super contig" is nearing completion, the team will begin building the website and
incorporating this data.
A standard presentation has been developed to describe the project to other scientists at
meetings. Dr. Wilson and co-PIs have made several presentations to the
plant biology community concerning the maize genome sequencing project:
1. The National Corn Growers Association Meeting Action Team,
Chesterfield MO, December 2006.
2. Plant Genomics European Meeting, Venice, Italy, October 2006.
3. National Academy of Sciences Workshop on Agricultural Biotechnology
for the Global Public Good, Chennai, India, October 2006.
4. Plant Genomics in China VII, Harbin, China, August 2006.
5. China Agricultural University, China, August 2006.
6. Monsanto, St. Louis, May 2006.
7. Biology of Genomes, Cold Spring Harbor, May 2006.
8. Maize Genetics, Asilomar, March 2006.
9. Advances in Genome Biology, February 2006.
10. Plant and Animal Genomes, San Diego, January 2006.
The GSC maintains a web site where current progress can be viewed:
http://genome.wustl.edu/genome.cgi?GENOME=Zea%20mays%20mays%20cv.%20B73&SECTION=research
Other related links can be found at
http://genome.wustl.edu/genome.cgi?GENOME=Zea%20mays%20mays%20cv.%20B73&SECTION=links
Last updated 10:55 am, Jan 26, 2007.