Information about assembly Zm-W22-REFERENCE-NRGENE-2.0    (also known as W22)
Genome Sequencing Project Information

   Stock provided by Hugo Dooner. This stock was derived by R.A. Brink at the U. of Wisconsin for his studies on paramutation at the R locus by five back-crosses of the regular W22 inbred (colorless seed because of c1; r alleles) to a purple seed stock that he apparently obtained from Cornell (Brink, 1956, Genetics 41:872-889). This resulted in the introgression of C1 and R alleles from the purple stock into W22. The C1 allele was characterized by Karen Cone. The R allele is the paramutable R-r: standard allele (Brink, 1956, op. cit.; Dooner & Kermicle, 1971, Genetics 67:427-436). Allele was obtained in 1972 from the U. of Wisconsin cold storage collection (pedigree CS-810) and has been selfed for 30 generations. Known alleles of color genes: C1 (Cooper and Cone, unpub.; GenBank AF320614); R-r: std (Walker et al., 1995, EMBO J. 10: 2360-2363; only partial sequence); Bz1-W22 (Dooner & He, 2008, Plant Cell 20:249-258; GenBank EU338354).
   This sequence has been released under the Toronto Agreement. No whole-genome research may be submitted for publication until the official publication for this genome assembly has been published.

   GenBank BioProject   PRJNA311133  
   Project PI   Tom Brutnell
   Project start date   August, 2014
   Release date   2017
   Consortium   W22 Sequencing Consortium
   Publication status   in preparation
   Project reference   The maize W22 genome: a foundation for gene discovery and functional genomics. Tom Brutnell, Omer Barad, Kobi Baruch, Gil Ben-Zvi, Ed Buckler, Ethalinda Cannon, Paul Chomet, Hugo Dooner, Chunguang Du, Georg Jander, Karen Koch, Don McCarty, Ilya Soifer, Doron Shem-Tov, Erik Vollbrecht, Doreen Ware, Maggie Woodhouse

Stock and Biosample Information

Stock information
   Stock name   cultivar:W22 (C1:R-r:std - PI 674445)
   Stock record   9034197
   Stock details   cultivar:W22 (C1:R-r:std - PI 674445)
   Stock provided by   Hugo Dooner
Biosample information
   GenBank BioSample   SAMN04479043  
   Sample type   whole organism
   Sample description   Plant sample collected by hand; high-molecular-weight (HMW) DNA extracted (80-120 kb in size)
   Collection date   5-Sep-14
   Collected by   Jiang Hui
   Age   9th day after sowing the seed
   Plant structure   PO:0000003
   Developmental stage   seedling

Sequencing and Assembly Information

   Assembly name   Zm-W22-REFERENCE-NRGENE-2.0
   Sequencing description   Sequence service provider: Roy J. Carver Biotechnology Center (Urbana, IL) at the University of Illinois
Sequencing method: Illumina short read and 10x Genomics
Sequencing hardware: Illumina short read and 10x Genomics
   Assembly description   Assembly methods: DenovoMAGIC
Construction of pseudomolecules: Scaffolds were ordered and oriented
   Release date   2017
   Sequencing method   NRGene de novo assembly
   Finishing strategy   Complete genome
   Seq hardware   Illumina HiSeq2500
   Seq chemistry   v4 and rapid mode 2
   Seq chemistry version   v4 and rapid mode 2
   Genome coverage   210x
   Seq service provider   Roy J. Carver Biotechnology Center (Urbana, IL) at the University of Illinois
Assembly statistics
   Scaff num   306
   Longest scaff   83 Mbp
   N50 scaff length   35 Mbp
   N50 scaff count   18
   N90 scaff length   10,997,073 bp
   N90 scaff count   58
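
The N50 and N90 fields above follow the usual definition: sort scaffolds from longest to shortest and accumulate their lengths until 50% (or 90%) of the total assembly length is reached. The length of the scaffold that crosses the threshold is the Nx length, and the number of scaffolds used is the Nx count. A minimal sketch of that calculation follows; the function name `nx_stats` and the toy lengths are illustrative assumptions, not part of the W22 release or its assembly pipeline.

```python
def nx_stats(lengths, percent):
    """Return (Nx length, Nx count) for a list of scaffold lengths.

    percent is an integer such as 50 (N50) or 90 (N90).
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count


# Hypothetical scaffold lengths, for illustration only.
toy_lengths = [50, 40, 30, 20, 10]
n50_length, n50_count = nx_stats(toy_lengths, 50)
n90_length, n90_count = nx_stats(toy_lengths, 90)
```

With the toy lengths, the total is 150, so N50 is reached at the second scaffold (length 40) and N90 at the fourth (length 20).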
   Annotation Identifier   Zm00004b.1
   Annotation Provider   Yinping Jiao, Ware lab
   Annotation Date   May, 2017
   Annotation Software   MAKER-P
   Annotation Description   Annotation of protein-coding genes was performed using the MAKER-P pipeline (Campbell et al. 2014), with parameters and evidence similar to those recently used to annotate B73 (Law et al. 2015; Jiao et al. 2016). Repeat masking by RepeatMasker was performed using exemplar transposon sequences (Schnable et al. 2009) available online at the maize transposable element database. We excluded helitron and MULE elements to avoid false-positive masking from captured exon sequences in such elements. Gene expression evidence included PacBio Iso-Seq long reads sequenced from cDNA libraries of six tissues in B73 (n=111,151) (Wang et al. 2016). In addition, we included the following transcriptome assemblies, each processed to exclude short transcripts (<300 bp) and redundancies based on application of CD-HIT (Fu et al. 2012): 1) a pooled set of 94 transcriptome assemblies constructed from publicly available RNA-seq reads (n=508,233) (Law et al. 2015), 2) a transcriptome assembly of B73 seedlings (n=112,963) (Martin et al. 2014), 3) a transcriptome assembly of W22 tissues (n=589,743). Cross-species evidence was supplied in the form of the following annotated protein files downloaded from Gramene release 46 (Gramene FTP) (Tello-Ruiz et al. 2016): 1) Arabidopsis_thaliana.TAIR10.27.pep.all.fa, 2) Brachypodium_distachyon.v1.0.27.pep.all.fa, 3) Oryza_sativa.IRGSP-1.0.27.pep.all.fa, 4) Setaria_italica.JGIv2.0.27.pep.all.fa, and 5) Sorghum_bicolor.Sorbi1.27.pep.all.fa. Alignment of sequence evidence to the repeat-masked W22 reference and downstream processing were performed within the MAKER-P pipeline using default parameters. For gene model prediction, the pipeline incorporated AUGUSTUS (Stanke et al. 2006) applied with the maize5 model and FGENESH (Salamov and Solovyev 2000) applied with the monocot model.
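
The length filter mentioned above (transcripts shorter than 300 bp were excluded) can be illustrated with a short sketch. This is not the script used for the annotation, and it does not perform the CD-HIT redundancy step; the function names `read_fasta` and `drop_short` and the example records are assumptions made for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records, min_length=300):
    """Keep records whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Hypothetical transcripts: one of 40 bases, one of 400 bases split over two lines.
example = [">short", "ACGT" * 10, ">long", "ACGT" * 50, "ACGT" * 50]
kept = drop_short(read_fasta(example))
```

Only the 400-base record survives the filter here; a record of exactly 300 bases would also be kept, since the exclusion applies to transcripts shorter than 300 bp.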
Stable gene identifiers were assigned using the format Zm00004bXXXXXX (where the X's represent a random 6-digit number), as specified under A Standard For Maize Genetics Nomenclature available at MaizeGDB.
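
As a rough illustration of the identifier format described above, the following sketch checks whether a string has the form Zm00004b followed by exactly six digits. The function name `is_w22_gene_id` is hypothetical, and the pattern covers gene-level identifiers only; any transcript or protein suffixes used in the release files are not handled here.

```python
import re

# Assumed pattern: the literal prefix "Zm00004b" followed by exactly six ASCII digits.
W22_GENE_ID = re.compile(r"Zm00004b[0-9]{6}")


def is_w22_gene_id(identifier):
    """Return True if identifier matches the Zm00004bXXXXXX gene ID format."""
    return W22_GENE_ID.fullmatch(identifier) is not None


# Made-up identifiers, for illustration only.
checks = {
    "Zm00004b012345": is_w22_gene_id("Zm00004b012345"),
    "Zm00004b12345": is_w22_gene_id("Zm00004b12345"),
    "Zm00001d012345": is_w22_gene_id("Zm00001d012345"),
}
```

Only the first of the three made-up identifiers passes: the second has five digits, and the third carries a different assembly prefix.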