MaizeGDB

MaizeGDB Genome Assembly and Annotation Manifesto

Genome Assembly | Genome Annotation | Suggested Elements for a Successful Collaboration with the Maize Community

If you need to document involvement of MaizeGDB in your planned assembly or annotation efforts, contact Carolyn Lawrence (Carolyn.Lawrence@ars.usda.gov) for a letter of collaboration.


GENOME ASSEMBLY

It is imperative that the community work from the same genome coordinate system across projects in order to allow the data generated by various groups to be leveraged and displayed in a comparable manner. Like many Model Organism Databases, MaizeGDB is charged to facilitate this process and is committed to releasing official genome assemblies as they are made available.

November of 2010 will mark the release of B73 RefGen_v2 as the default view of the assembly at MaizeGDB. This version was calculated by the Maize Genome Sequencing Consortium. Thereafter, the default assembly view of the MaizeGDB Genome Browser will be updated at least annually to the highest-quality assembly available. FTP and BLAST access to builds will be made available at least one month before full deployment via the MaizeGDB Genome Browser, so that research groups have time to align their sequence-indexed data to the new assemblies.

Descriptions regarding which groups are assembling, how you can contribute information, and cutoff dates for representing your related data via MaizeGDB (i.e., genome annotations; see below) will be posted here as details emerge.


GENOME ANNOTATION

The term annotation is often used indiscriminately to describe both structural annotation (which identifies the genes, their boundaries, and regulatory motifs) and functional annotation (which describes what is known about the cellular functions of a gene and its products). The majority of annotation done on the maize genome sequence up to this point has been structural, but it is expected that there will be a larger emphasis on functional annotation in the future. The Maize Genome Sequencing Consortium is in the process of calculating their final sets of gene models (called the working gene set and the filtered gene set where the latter is a more robust subset of the former). Once available, these gene sets will be considered the official maize genome structural annotation version 1.0 and will appear at MaizeGDB, both within the context of the MaizeGDB Genome Browser and as unique entities that integrate with the existing data at MaizeGDB. Where the Maize Genome Sequencing Consortium has assigned putative function to these gene models, that data will be made available via MaizeGDB alongside related information. Data displays are under construction and are modeled after those available at TAIR (e.g., http://www.arabidopsis.org/servlets/TairObject?id=34708&type=locus).

To find out more on how your group's genome annotation endeavors can be coordinated for community distribution via MaizeGDB, read on!

Structural Annotation

Currently the gene structures available via MaizeGDB comprise the 5a.59 "Working Gene Set" product from the Maize Genome Sequencing Consortium, which is based on the B73 RefGen_v2 assembly. As you likely already know, the structures calculated by the Maize Genome Sequencing Consortium are a great resource, but computational and experimental evaluations will improve the accuracy of these predicted structures over time. Thereafter, improved gene sets will be calculated by groups focused on structural annotation of the maize genome in collaboration with MaizeGDB. Wholesale updates to the genome annotation will occur annually: like the genome assemblies, the updated gene sets are anticipated to become available within the MaizeGDB Genome Browser each year.

Community Structural Annotation: Currently, the PlantGDB/ZmGDB group has performed quality assessments of the 5a.59 working gene set and makes available the yrGATE tool for community improvement of these annotations. To contribute structural annotation for your gene of interest, visit the MaizeGDB Genome Browser, click the model for your gene in the track labeled "Working Gene Set 5a.59 Quality", and follow the links to "Annotate this Locus".

Functional Annotation

Functional annotation can mean different things to different people but generally involves attaching information regarding gene product identity, biological or biochemical function, expression, regulation, and interactions to a genomic element. Are you generating RNAseq data that you wish to have aligned to assemblies to show that the genes in a particular region are expressed? Do you have a mutation for a gene that is mapped to a genome assembly and the mutant phenotype is known? Have you experimentally determined the temporal and spatial regulation of a small group of transcription factors? MaizeGDB is interested in both small and large functional annotation data sets determined by either in silico analysis or experimental validation. Contact us at MaizeGDB to find out how data such as these can be included in the MaizeGDB resource.

In addition to the sorts of functional annotations already described, we at MaizeGDB accept functional annotations that are based upon assignment of terms from the Gene Ontologies (GO; http://www.geneontology.org/) to gene structures. The first set of GO annotations to gene structures that will be made available via MaizeGDB will come from the Maize Genome Sequencing Consortium's MaizeSequence.org resource. Additions and changes to those functional assignments by groups working on genome annotation are welcome and will be released via MaizeGDB on an annual basis.

More on GO: When GO terms are assigned to a particular gene, standard Evidence Codes are required to document how the inference of function was made. For example, an annotation that was made on the basis of an automatic computational analysis would have the evidence code IEA (or ISS, if a sequence-similarity-based inference has been reviewed by a curator), whereas an annotation made on the basis of an enzyme assay would have the evidence code IDA. Evidence Codes used by the Gene Ontology Consortium are available here: http://www.geneontology.org/GO.evidence.shtml.
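As a rough illustration of how evidence codes can be used to separate experimentally supported annotations from purely computational ones, the following Python sketch groups a handful of GO evidence codes into broad categories. The code sets are an illustrative subset rather than the complete list maintained by the Gene Ontology Consortium, and the function name, gene identifiers, and example records are hypothetical.

```python
# Illustrative subset of GO evidence codes, grouped by broad category.
# This is not the complete list; consult the Gene Ontology Consortium
# documentation for the authoritative set.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
COMPUTATIONAL_ANALYSIS = {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"}
ELECTRONIC = {"IEA"}


def classify_evidence(code):
    """Return a broad category for a GO evidence code (hypothetical helper)."""
    code = code.strip().upper()
    if code in EXPERIMENTAL:
        return "experimental"
    if code in COMPUTATIONAL_ANALYSIS:
        return "computational analysis"
    if code in ELECTRONIC:
        return "electronic"
    return "other"


# Hypothetical annotation records: (gene identifier, GO term, evidence code).
# The gene identifiers are placeholders, not real maize gene models.
annotations = [
    ("gene_A", "GO:0016301", "IDA"),
    ("gene_B", "GO:0003677", "ISS"),
    ("gene_C", "GO:0006355", "IEA"),
]

for gene, go_term, code in annotations:
    print(gene, go_term, code, classify_evidence(code))
```

A grouping of this kind makes it straightforward to report experimentally supported annotations separately from in silico ones when preparing a data set for submission.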


SUGGESTED ELEMENTS FOR A SUCCESSFUL COLLABORATION WITH THE MAIZE COMMUNITY

A plan for providing documentation that is complete, accurate, and timely. A centrally accessible plan should be made available at the time that your project begins and include a timeline for data delivery. Functional and structural annotation should be provided with standard evidence codes, clearly discriminating annotation with experimental evidence from purely in silico analyses.

A plan for developing a close working relationship with MaizeGDB as the ultimate disseminators of the information. Assemblies and annotations should be delivered to MaizeGDB regularly and in a timely fashion. MaizeGDB will display the deliverable dates. These dates should be known in advance, and they should be adhered to. MaizeGDB will also create a mechanism to display progress. It is understood that delays can occur. The intent here is to make the process more transparent to the research community.

A mechanism for interacting with the maize community directly and with a single voice. Maize researchers comprise a vibrant community with researchers at all levels in both the public and private sectors, and as such an annotation project means different things to different people. A bidirectional means of communicating with the maize community should be deployed at the start of the project so that the maize community can both absorb and respond to new annotation information quickly. The goal is to provide all community members with the same information at the same time so that they can plan their research activities accordingly. This can be accomplished in many ways (quarterly e-newsletters, FAQs, blogs, social media, conferences, etc.), and all options should be considered so as to reach the largest number of stakeholders.

A robust way to capture genome assembly information from the community. Currently, individual researchers are generating excellent, lab-validated markers and order/orientation information of sequence fragments within BACs. Researchers are usually willing to share this information freely, but there is currently no robust means to capture it. There should be an easy-to-use web interface for researchers to submit data. All data submitted by community members should be vetted manually by expert annotators, then incorporated into the assembly, with an indication of who provided the data. It is expected that while comparatively little data will enter the assembly process in this way, these data will be of very high quality.

A plan to allow distributed annotation. As with genome assembly, researchers currently have high-quality structural and functional annotation for their genes of interest, both stored on lab computers and documented in publications. Researchers should be provided with tools to improve structural and functional annotation information that may be integrated with the larger project's outcomes. The same tools could be leveraged for classroom teaching. The ZmGDB/PlantGDB yrGATE system and iPlant's DNA Subway are good examples of the sort of interface that could serve both groups.

A workshop for education, outreach, and training (EOT) in all aspects of annotation. EOT plans are valuable mechanisms for increasing science literacy. An EOT plan for increasing community involvement in the annotation of the maize genome is strongly encouraged. One obvious way to involve the community would be to contact the Maize Genetics Conference Steering Committee (http://www.maizegdb.org/maize_meeting/) to find out how you can get your message out at the Maize Meeting.



Last updated 2:51 pm, Jun 30, 2011.
