TWGFD
The tetraploid wheat gene family database
1. Gene Family Identification in Tetraploid Wheat
Amino acid sequences of Triticum dicoccoides and T. durum were retrieved from Ensembl Plants, and hidden Markov model (HMM) profiles were obtained from Pfam. Candidate genes were identified with HMMER v3.0 using default parameters and validated with NCBI-CDD, SMART, HMMER, and InterPro. Protein physicochemical properties (molecular weight, theoretical pI, and grand average of hydropathy, GRAVY) were calculated with ExPASy ProtParam. Chromosomal locations were extracted from the genome annotation files. Orthologs in Arabidopsis thaliana and Oryza sativa were identified with InParanoid v8.0.
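ProtParam reports GRAVY as the mean Kyte-Doolittle hydropathy value over all residues of a protein. The Python sketch below reproduces only that one property, for readers who want to script the calculation locally; the function name gravy and the toy peptide are hypothetical, and the database values themselves were produced with ExPASy ProtParam.

```python
# Kyte-Doolittle hydropathy values, the scale used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol, if present
    unknown = set(seq).difference(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # Toy peptide for illustration only; not a sequence from the database.
    print(round(gravy("MKVLA"), 3))
```

Positive GRAVY values indicate an overall hydrophobic protein and negative values a hydrophilic one.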
2. Structural and Regulatory Element Analysis
Gene structures (exon/intron organization) were visualized with GSDS v2.0 from the genome annotation files. Promoter regions (1.5 kb upstream of the translation start site) were submitted to PlantCARE to identify cis-regulatory elements, and element abundance was visualized as heatmaps with the pheatmap package in R. Conserved protein motifs were detected with MEME v5.3.0 (maximum number of motifs = 8; motif width 6–250 amino acids).
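Extracting the 1.5 kb windows before PlantCARE analysis is a strand-sensitive step. The Python sketch below assumes GFF3-style 1-based inclusive CDS coordinates and a chromosome held in memory as a string; the function names are hypothetical, and this is not necessarily how the promoter sequences in the database were extracted.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (IUPAC ambiguity codes other than N not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_sequence(chrom_seq: str, cds_start: int, cds_end: int,
                      strand: str, length: int = 1500) -> str:
    """Return up to `length` bp upstream of the translation start, 5'->3' in gene orientation.

    cds_start/cds_end are 1-based inclusive forward-strand coordinates of the
    CDS (cds_start <= cds_end on both strands). Windows are clipped at
    chromosome ends rather than padded.
    """
    if strand == "+":
        # Translation starts at cds_start; upstream is cds_start-length .. cds_start-1 (1-based).
        begin = max(0, cds_start - 1 - length)
        return chrom_seq[begin:cds_start - 1]
    if strand == "-":
        # Translation starts at cds_end; upstream lies at higher forward-strand coordinates.
        return reverse_complement(chrom_seq[cds_end:cds_end + length])
    raise ValueError(f"unknown strand: {strand!r}")
```

For minus-strand genes the window is taken from the higher coordinates and reverse-complemented, so every promoter is reported 5'→3' relative to its own gene.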
3. Phylogenetic and Evolutionary Analysis
Protein sequences were aligned with Clustal X v1.83, and maximum-likelihood phylogenies were constructed in MEGA X. Syntenic relationships among A. thaliana, O. sativa, T. aestivum, T. dicoccoides, and T. durum were determined with InParanoid v8.0 and visualized with Circos v0.65. Codon alignments generated with PAL2NAL v14 were used to calculate nonsynonymous (Ka) and synonymous (Ks) substitution rates with codeml in PAML v4.9. Divergence times (T) were estimated as T = Ks / (2λ), where λ = 6.5 × 10⁻⁹ substitutions per site per year.
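The divergence-time formula, together with the Ka/Ks ratio commonly derived from the same codeml output, can be checked with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and in practice Ka and Ks would be parsed from codeml results rather than typed in.

```python
LAMBDA = 6.5e-9  # substitutions per site per year, as stated in the text


def divergence_time_mya(ks: float, rate: float = LAMBDA) -> float:
    """T = Ks / (2 * lambda), converted from years to millions of years."""
    if ks < 0 or rate <= 0:
        raise ValueError("Ks must be non-negative and the rate positive")
    return ks / (2 * rate) / 1e6


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ka/Ks: < 1 suggests purifying selection, ~1 neutrality, > 1 positive selection."""
    if ks == 0:
        raise ZeroDivisionError("Ks is zero; Ka/Ks is undefined")
    return ka / ks
```

For example, a gene pair with Ks = 0.13 gives T = 0.13 / (2 × 6.5 × 10⁻⁹) = 1.0 × 10⁷ years, i.e. about 10 Mya.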
4. Expression Profiling and Variation Analysis
RNA-seq data covering different tissues, developmental stages, and stress conditions were downloaded from the NCBI SRA. Reads were aligned with HISAT2 v2.1.0, and transcript abundance (FPKM) was quantified with StringTie v1.3.5. Heatmaps of log₂-transformed FPKM values were generated with the pheatmap package in R.

Whole-genome resequencing data from 107 tetraploid accessions (33 T. dicoccoides, 25 T. dicoccum, and 49 T. durum) were aligned to the T. dicoccoides Zavitan v1.0 reference genome with BWA-MEM v0.7.13 (-k 32 -M -R). PCR duplicates were removed with Picard v2.18. Variants were called with GATK v4.0 HaplotypeCaller, jointly genotyped, and hard-filtered, removing SNPs with QD < 2.0, FS > 60, MQ < 40, MQRankSum < -12.5, or ReadPosRankSum < -8.0. Biallelic SNPs were then retained with VCFtools v0.1.14 if they had MAF > 0.05, a missing rate below 10%, and a depth between one-third and three times the mean coverage. Functional annotation was performed with SnpEff v4.3 (categories: exonic/synonymous, UTRs, splice sites, and introns).
(Sample_information.xlsx)
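The SNP filtering criteria above can be expressed as two predicates. This Python sketch is illustrative only and is not how GATK or VCFtools implement them. It assumes INFO annotations have already been parsed into a dictionary, that biallelic genotypes are encoded as alternate-allele counts (0, 1, 2, or None when missing), and that "depth" means a site's mean per-accession depth compared with the genome-wide mean coverage. A site is removed when QD < 2.0, FS > 60, MQ < 40, MQRankSum < -12.5, or ReadPosRankSum < -8.0; a missing annotation is treated as passing, which we believe matches the GATK VariantFiltration default. All function names are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Hard filters: a site FAILS if any expression evaluates to True.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def passes_hard_filters(info: Mapping[str, float]) -> bool:
    """True unless a present annotation trips its threshold (missing values pass)."""
    return not any(name in info and fails(info[name])
                   for name, fails in HARD_FILTERS.items())


def passes_site_filters(genotypes: Sequence[Optional[int]], site_depth: float,
                        mean_depth: float, min_maf: float = 0.05,
                        max_missing: float = 0.10) -> bool:
    """Apply MAF, missingness, and depth filters to one biallelic SNP."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))  # two allele copies per call
    maf = min(alt_freq, 1 - alt_freq)
    depth_ok = mean_depth / 3 <= site_depth <= 3 * mean_depth
    return maf > min_maf and missing_rate < max_missing and depth_ok
```

Keeping the two predicates separate mirrors the two stages described in the text: annotation-based hard filtering after joint genotyping, followed by population-level site filtering.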