Information about assembly Zm-W22-REFERENCE-NRGENE-2.0
(also known as W22)
Genome Sequencing Project Information
Stock provided by Hugo Dooner. This stock was derived by R.A. Brink at the University of Wisconsin for his studies of paramutation at the R locus, by five backcrosses of the regular W22 inbred (colorless seed because of its c1 and r alleles) to a purple-seeded stock that he apparently obtained from Cornell (Brink, 1956, Genetics 41:872-889). This introgressed the C1 and R alleles of the purple stock into W22. The C1 allele was characterized by Karen Cone. The R allele is the paramutable R-r:standard allele (Brink, 1956, op. cit.; Dooner & Kermicle, 1971, Genetics 67:427-436). The allele was obtained in 1972 from the University of Wisconsin cold storage collection (pedigree CS-810) and has been selfed for 30 generations.
Known alleles of color genes: C1 (Cooper and Cone, unpub.; GenBank AF320614); R-r:std (Walker et al., 1995, EMBO J. 10: 2360-2363; partial sequence only); Bz1-W22 (Dooner & He, 2008, Plant Cell 20:249-258; GenBank EU338354).
The maize W22 genome provides a foundation for functional genomics and transposon biology.
Springer NM, Anderson SN, Andorf CM, Ahern KR, Bai F, Barad O, Barbazuk WB, Bass HW, Baruch K, Ben-Zvi G, Buckler ES, Bukowski R, Campbell MS, Cannon EKS, Chomet P, Dawe RK, Davenport R, Dooner HK, Du LH, Du C, Easterling KA, Gault C, Guan JC, Hunter CT, Jander G, Jiao Y, Koch KE, Kol G, Köllner TG, Kudo T, Li Q, Lu F, Mayfield-Jones D, Mei W, McCarty DR, Noshay JM, Portwood JL 2nd, Ronen G, Settles AM, Shem-Tov D, Shi J, Soifer I, Stein JC, Stitzer MC, Suzuki M, Vera DL, Vollbrecht E, Vrebalov JT, Ware D, Wei S, Wimalanathan K, Woodhouse MR, Xiong, Brutnell TP.
Sequence service provider: Roy J. Carver Biotechnology Center (Urbana, IL) at the University of Illinois
Sequencing method: Illumina short read and 10x Genomics
Sequencing hardware: Illumina short read and 10x Genomics
Assembly methods: DenovoMAGIC
Construction of pseudomolecules: Scaffolds were ordered and oriented
Scaffold count: the total number of scaffolds in the assembly.
Longest scaffold: the length of the longest scaffold in the assembly.
N50 scaff length: the length of the scaffold at which the cumulative length (summing from the longest to the shortest scaffold) first reaches or passes 50% of the total assembly size.
N50 scaff count: the number of scaffolds counted in reaching the N50 threshold.
N90 scaff length: the length of the scaffold at which the cumulative length (summing from the longest to the shortest scaffold) first reaches or passes 90% of the total assembly size.
N90 scaff count: the number of scaffolds counted in reaching the N90 threshold.
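As a minimal sketch (not the tool actually used to compute the statistics for this assembly), the scaffold statistics defined above can be calculated from a list of scaffold lengths as follows. The function names are illustrative, and the threshold test uses integer arithmetic to avoid floating-point rounding at exactly 50% or 90%.

```python
from typing import Dict, List, Tuple


def nx_stats(lengths: List[int], percent: int) -> Tuple[int, int]:
    """Return (Nx length, Nx count); percent=50 gives (N50 length, N50 count).

    Scaffolds are summed from longest to shortest. The Nx length is the length of
    the scaffold at which the running total first reaches percent% of the total
    assembly size; the Nx count is how many scaffolds were summed to get there.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer form of: running / total >= percent / 100
        if running * 100 >= percent * total:
            return length, count
    raise AssertionError("unreachable: the running total always reaches 100%")


def assembly_summary(lengths: List[int]) -> Dict[str, int]:
    """Collect the scaffold statistics described above into one dictionary."""
    n50_length, n50_count = nx_stats(lengths, 50)
    n90_length, n90_count = nx_stats(lengths, 90)
    return {
        "scaffold_count": len(lengths),
        "longest_scaffold": max(lengths),
        "n50_length": n50_length,
        "n50_count": n50_count,
        "n90_length": n90_length,
        "n90_count": n90_count,
    }


# Toy lengths for illustration only; these are not W22 values.
print(assembly_summary([100, 80, 60, 40, 20]))
# prints {'scaffold_count': 5, 'longest_scaffold': 100, 'n50_length': 80, 'n50_count': 2, 'n90_length': 40, 'n90_count': 4}
```

In the toy example the two longest scaffolds (100 + 80 = 180) are the first to pass half of the 300 total, so N50 is 80 with a count of 2.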
A contig is a contiguous consensus sequence derived from a collection of overlapping reads.
A scaffold is a set of ordered and oriented contigs that are linked to one another by mate pairs of sequencing reads.
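To illustrate what "ordered and oriented" means, the following minimal sketch writes out a scaffold sequence from contigs whose order and strand have already been inferred. It does not infer the layout from read pairs, and it is not how DenovoMAGIC works internally. The contig names and the fixed 100-N gap are hypothetical assumptions; real assemblies usually size gaps from the linking evidence.

```python
from typing import Dict, List, Tuple

# Hypothetical fixed gap: 100 Ns between adjacent contigs (an assumption).
DEFAULT_GAP = "N" * 100

# Complement table for DNA, keeping N (unknown base) as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_scaffold(
    contigs: Dict[str, str],
    layout: List[Tuple[str, str]],
    gap: str = DEFAULT_GAP,
) -> str:
    """Join contigs in the given order, reverse-complementing '-' strand contigs.

    `layout` is a list of (contig_name, strand) pairs whose order and
    orientation are assumed to have been inferred already from read-pair links.
    """
    pieces = []
    for name, strand in layout:
        seq = contigs[name]
        if strand == "-":
            seq = reverse_complement(seq)
        elif strand != "+":
            raise ValueError(f"strand must be '+' or '-', got {strand!r}")
        pieces.append(seq)
    return gap.join(pieces)


# Hypothetical two-contig example with a 2-bp gap.
print(build_scaffold({"ctg1": "ACGT", "ctg2": "AAGG"}, [("ctg1", "+"), ("ctg2", "-")], gap="NN"))
# prints ACGTNNCCTT
```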
Yinping Jiao, Ware lab
Annotation of protein-coding genes was performed with the MAKER-P pipeline (Campbell et al. 2014), using parameters and evidence similar to those recently used to annotate B73 (Law et al. 2015; Jiao et al. 2016).
Repeat masking was performed with RepeatMasker using exemplar transposon sequences (Schnable et al. 2009) available online at the maize transposable element database. Helitron and MULE elements were excluded from the masking library, because exon sequences captured by these elements would otherwise cause false-positive masking of genes.
Gene expression evidence included PacBio Iso-Seq long reads sequenced from cDNA libraries of six B73 tissues (n=111,151) (Wang et al. 2016). We also included the following transcriptome assemblies, each processed to exclude short transcripts (<300 bp) and, using CD-HIT (Fu et al. 2012), redundant sequences: 1) a pooled set of 94 transcriptome assemblies constructed from publicly available RNA-seq reads (n=508,233) (Law et al. 2015), 2) a transcriptome assembly of B73 seedlings (n=112,963) (Martin et al. 2014), and 3) a transcriptome assembly of W22 tissues (n=589,743).
Cross-species evidence was supplied as the following annotated protein files, downloaded from the Gramene release 46 FTP site (Tello-Ruiz et al. 2016): 1) Arabidopsis_thaliana.TAIR10.27.pep.all.fa, 2) Brachypodium_distachyon.v1.0.27.pep.all.fa, 3) Oryza_sativa.IRGSP-1.0.27.pep.all.fa, 4) Setaria_italica.JGIv2.0.27.pep.all.fa, and 5) Sorghum_bicolor.Sorbi1.27.pep.all.fa. Alignment and downstream processing of sequence evidence against the repeat-masked W22 reference were performed within the MAKER-P pipeline using default parameters.
For gene model prediction, the pipeline incorporated AUGUSTUS (Stanke et al. 2006) with the maize5 model and FGENESH (Salamov and Solovyev 2000) with the monocot model.
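The transcript preprocessing step (dropping transcripts under 300 bp, then removing redundancy) can be sketched as below. This is a deliberately simplified stand-in, not CD-HIT: it follows CD-HIT's greedy, longest-first idea but only at 100% identity on the same strand, treating a transcript as redundant when an already kept, longer representative contains it exactly. The function name and toy sequences are hypothetical.

```python
from typing import Dict

MIN_LENGTH = 300  # transcripts shorter than 300 bp are excluded, as stated above


def filter_transcripts(transcripts: Dict[str, str], min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Drop short transcripts, then drop any transcript contained in a kept longer one.

    Simplified stand-in for CD-HIT's greedy clustering, restricted to exact
    same-strand containment; it does not reproduce CD-HIT's identity
    thresholds, word filtering or reverse-complement comparison.
    """
    # Step 1: length filter (keep transcripts of at least min_length bases).
    long_enough = {
        tid: seq.upper() for tid, seq in transcripts.items() if len(seq) >= min_length
    }
    # Step 2: visit sequences longest first (ties broken by ID for determinism);
    # a sequence becomes a representative only if no kept one contains it.
    kept: Dict[str, str] = {}
    for tid, seq in sorted(long_enough.items(), key=lambda item: (-len(item[1]), item[0])):
        if not any(seq in rep for rep in kept.values()):
            kept[tid] = seq
    return kept


# Toy transcripts for illustration only.
example = {
    "a": "A" * 400 + "C" * 100,  # 500 bp, becomes a representative
    "b": "A" * 350,              # contained in "a", removed as redundant
    "c": "G" * 200,              # shorter than 300 bp, removed
    "d": "T" * 310,              # unique and long enough, kept
}
print(sorted(filter_transcripts(example)))  # prints ['a', 'd']
```

The pairwise containment check is quadratic in the number of transcripts, which is fine for illustration but not for assemblies of hundreds of thousands of sequences; that scaling problem is what CD-HIT's word filter addresses.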
Stable gene identifiers were assigned using the format Zm00004bXXXXXX, where XXXXXX is a random six-digit number, as specified in A Standard for Maize Genetics Nomenclature, available at MaizeGDB.
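When handling W22 gene lists, a quick format check against the Zm00004bXXXXXX pattern can catch truncated or mistyped identifiers. The minimal sketch below checks only the gene-level pattern stated above (transcript or protein suffixes are not covered); the function name and the example identifiers are made up for illustration and are not claimed to be real W22 genes.

```python
import re
from typing import Optional

# "Zm00004b" followed by exactly six ASCII digits, per the format above.
# [0-9] is used instead of \d, which would also match non-ASCII digits.
W22_GENE_ID = re.compile(r"Zm00004b([0-9]{6})")


def w22_gene_number(gene_id: str) -> Optional[str]:
    """Return the six-digit part of a W22 gene ID, or None if the ID is malformed.

    The number is returned as a string so that leading zeros are preserved.
    """
    match = W22_GENE_ID.fullmatch(gene_id)
    return match.group(1) if match else None


# Made-up identifiers: one well-formed, one with five digits, one with another prefix.
for candidate in ["Zm00004b012345", "Zm00004b12345", "Zm00001d012345"]:
    print(candidate, w22_gene_number(candidate))
```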