University of Nebraska–Lincoln


Home of the FARRP allergen protein database

Celiac Database Beta-2: 21 September 2017

Contact: Rick Goodman

Celiac Tools Updated 21 September 2017

Browse Entries
By Peptides '1013'
By References '71'
By Proteins '68'
Sequence Search
Exact peptide match

New Beta-2 Release CELIAC Database, 21 September 2017 Peptides, Proteins and References: Final update will be January 2018.

This database has been updated to support prediction of the risk of stimulating celiac disease (CD) in CD subjects. The update removes peptides shorter than 9 amino acids (AA), as they are not predictive. It adds all peptides included in the July 2017 EFSA Appendix A Allergenicity update (EFSA Journal 15(6):4862) and retains most of the 1016 peptides from our 2012 Celiac Database previously on this website. The peptides include native and predicted deamidated peptides of wheat, barley, rye and oats that are associated with CD. NOTE: the peptides on the EFSA list of 9 AA peptides are considered strong candidates for binding MHC Class II HLA DQ2 or DQ8, but those HLA types are found in 30% of the general population. Since CD affects only approximately 1.3% of the global population, the specificity of T-cell receptors for longer peptides or substituted AA positions will alter specificity. In addition, other genes or environmental factors (gut bacteria, viral infections) are also important in stimulating development of CD. This database is intended as a risk assessment tool for developers of genetically engineered (modified) crops and novel food ingredients, as well as researchers. Exact identity matches to a 9 AA peptide therefore need to be interpreted with caution, as they will likely over-estimate the peptides that pose a risk of eliciting CD. We include 1013 peptides from 9 AA to 53 AA that are published as binding HLA DQ2 or DQ8, or as stimulating CD T-cell proliferation or toxicity using human samples or subjects. We have therefore included representative proteins and a FASTA algorithm for further analysis. Developers finding matches to the peptides and proteins, as discussed in detail here, are encouraged to label their products as containing "gluten" unless they conduct further research to demonstrate their protein will not elicit CD. The references are updated. Search tools remain the same. The peptide entries were reviewed by committee.
Proteins will be updated following additional testing. The guidance and suggested criteria for significance are currently being tested and reviewed and will be modified by January 2018. Special thanks to Plaimein Amnuaycheewa, Frits Koning and Barbara Bohle for their contributions. RE Goodman, 22 September 2017.

Celiac Disease (CD) Novel Protein Risk Assessment Tool

The Food Allergy Research & Resource Program (FARRP) in the Department of Food Science & Technology, University of Nebraska, has added a new bioinformatics tool to identify Exact Peptide matches between the amino acid sequence of a query protein and the 1,013 naturally occurring, mutated or deamidated (Gln = Q, converted to Glu = E, by tissue transglutaminase) peptides from wheat and wheat relatives (barley, rye and two proteins from oats) that have been demonstrated to elicit celiac disease or activate MHC Class II restricted T cells of subjects with celiac disease. Specificity arises from presentation of these peptides by the genetically inherited Major Histocompatibility Complex class II receptors HLA DQ2.5 or DQ8, or variants (DQ2.2, DQ8.5), which activate T cells in affected individuals. Proteins derived from the wheat subfamily (Pooideae) of the grass family (Poaceae) that are considered for use as novel food ingredients, or that are introduced into other species of food crops through genetic engineering, may pose a risk for those with celiac disease if they contain celiac active peptides. The database provides a simple screening tool to identify proteins that might pose a risk of eliciting celiac disease, or that are sufficiently similar to CD eliciting proteins/peptides that further testing would be reasonable to demonstrate safety for consumption by affected individuals.
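The Exact Peptide match described above amounts to searching the query protein for each database peptide as an exact substring. The following is a minimal sketch of that idea, not the FARRP implementation: the function name, the peptide IDs and the example sequences are hypothetical placeholders chosen for illustration and are not records from the database.

```python
from typing import Dict, List, Tuple


def find_exact_peptide_matches(
    query: str, peptides: Dict[str, str]
) -> List[Tuple[str, str, int]]:
    """Return (peptide_id, peptide, 1-based start) for every exact occurrence."""
    # Remove whitespace and line breaks (e.g. from pasted FASTA text) and
    # normalize case so the comparison is purely on residue letters.
    query = "".join(query.split()).upper()
    hits = []
    for peptide_id, peptide in peptides.items():
        peptide = peptide.strip().upper()
        start = query.find(peptide)
        while start != -1:
            hits.append((peptide_id, peptide, start + 1))
            # Advance by one residue so overlapping occurrences are also found.
            start = query.find(peptide, start + 1)
    # Report hits in order of position along the query protein.
    return sorted(hits, key=lambda hit: (hit[2], hit[0]))


# Hypothetical placeholder entries, NOT records copied from the database.
example_peptides = {
    "pep_A": "PFPQPQLPY",  # native-style 9 AA peptide
    "pep_B": "PQPELPYPQ",  # deamidated-style peptide (contains E)
}
example_query = "MKTLPFPQPQLPYPQPQLPFPQPQLPY"

matches = find_exact_peptide_matches(example_query, example_peptides)
for peptide_id, peptide, position in matches:
    print(f"{peptide_id}\t{peptide}\tstart={position}")
```

Because native and deamidated forms are stored as separate peptides, a plain substring search of this kind covers both without any special handling of Q and E.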

In addition to the Exact Peptide match, the linked Celiac Disease database also includes a FASTA algorithm to compare the query protein against the 68 celiac inducing proteins that are the sources of the peptides, and a list of 71+ published references supporting the inclusion of peptides and proteins in the database. Proteins lacking any identity match to the 1,013 peptides are not likely to trigger celiac disease; however, it is possible that not all peptides that can trigger CD are known. Thus a FASTA comparison to the 68 proteins adds a level of safety. The FASTA comparison has not (yet) been validated sufficiently to set absolute thresholds of concern for celiac disease. However, preliminary searches with proteins from rice, sorghum, maize and other food sources that are considered safe for those with celiac disease allowed us to establish reasonably conservative guidelines. Identity matches of less than 45 percent over at least one-half of the FASTA aligned CD protein, and those with an E score greater than 1 x 10^-15 using this database, are unlikely to present a risk of inducing celiac disease. These criteria are being re-reviewed by our committee and should be finalized by December 2017.

Note: The first version of this database was available for public use on 14 February 2012. The peptide entries were updated on 10 August 2017 to include 1030 peptides. Further review completed on 21 September 2017 demonstrated that the 8 AA peptides are not predictive, and they were removed. The update on 21 September 2017 has 1013 peptides. In addition, a BLAST search of all 9 AA peptides found that some have 100% identity matches to proteins of other species unrelated to wheat (Pooideae). Words of CAUTION were added to the data table for those 9 AA sequences. Additional tests of FASTA criteria will be conducted before January 2018.
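The length cutoff described in this note can be pictured as a simple filter that keeps peptides of 9 AA or longer. The sketch below is an illustration only: the function name and the four candidate sequences are hypothetical and are not entries from the database. The last line does the arithmetic on the two totals stated above, which gives the number of entries dropped if no other changes were made between the two updates.

```python
from typing import Iterable, List, Tuple

MIN_PEPTIDE_LENGTH = 9  # peptides shorter than 9 AA were judged not predictive


def split_by_length(
    peptides: Iterable[str], min_length: int = MIN_PEPTIDE_LENGTH
) -> Tuple[List[str], List[str]]:
    """Split peptides into (kept, removed) according to the minimum length."""
    kept: List[str] = []
    removed: List[str] = []
    for peptide in peptides:
        sequence = peptide.strip().upper()
        if len(sequence) >= min_length:
            kept.append(sequence)
        else:
            removed.append(sequence)
    return kept, removed


# Hypothetical placeholder sequences, NOT entries copied from the database.
candidates = ["PQQPFPQQ", "PFPQPQLPY", "QQPQQPFPQ", "LQPFPQPQ"]
kept, removed = split_by_length(candidates)

# Difference between the two totals stated in the note (1030 and 1013).
implied_removed = 1030 - 1013
print(len(kept), len(removed), implied_removed)
```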

What is CD? Celiac Disease, also known as gluten-sensitive enteropathy or celiac sprue, is a genetically linked inflammatory immune disease with varying severity in an estimated 0.5% to 1.5% of the population in various geographies. (A, B) Affected individuals experience symptoms after the consumption of food containing proteins from wheat, barley, rye and possibly oats and other grass family grains closely related to wheat.(C) The primary target organ is the upper small intestine and symptoms are usually associated with the digestive tract with chronic diarrhea, abdominal pain, cramping, bloating or irritable bowel syndrome.(D) However, general nutritional deficiency, failure to thrive, mouth ulcers, and fatigue are experienced by many subjects. Continuing exposure to glutens leads to increased immune response, increased expression of tissue transglutaminase and inflammation that leads to flattening of the villi in the small intestine, erosion of the mucosal epithelium and loss of absorptive capacity.(E) Vitamin deficiency is common. Loss of calcium density in bones is associated with the disease and there is an increased risk of developing adenocarcinoma of the small intestine and T-cell lymphoma.

The specificity of the disease is determined by T lymphocytes that bind to specific native or deamidated peptides of certain wheat-family glutens (glutenins and gliadins) that are presented in the antigen presenting groove of MHC class II, HLA DQ2.5 or DQ8, leading to activation of a CD4 T cell response and inflammation involving macrophages, NK cells and other inflammatory cells that cause tissue destruction.(F) Refer to the reference list for publications on peptides and MHC restriction. Interestingly, while nearly 20% of individuals in North America and Europe express HLA DQ2, and ~95% of those with celiac disease have HLA DQ2.5 (and others express DQ2.2, DQ8, DQ8.5 and possibly DQ9), only about 1% of the population has been diagnosed with celiac disease. Thus, other unknown factors are also very important determinants leading to celiac disease or tolerance. Many speculate that a much higher percentage of the population simply has subtle symptoms or is undiagnosed, but there is little hard evidence to support a prevalence of disease much greater than 1% of the global population.

Avoiding the proteins that stimulate the CD immune response is the only effective treatment for those with celiac disease. These cereal grains are commonly used as major carbohydrate and protein food sources in breads and pasta, and processed wheat and wheat relatives are also used as functional food ingredients in many restaurant and processed foods, which makes dietary avoidance complex. Protection of those with celiac disease requires separation of commodities that are intended for “gluten-free” foods, from the source of the commodity, through processing and packaging. Gluten-free foods must also be labeled clearly and accurately in order to protect the most sensitive affected consumers. Food companies that produce gluten-free foods work hard to source commodities from suppliers with minimal (no) contamination. Interestingly, there is evidence that gluten-like proteins in oats do not affect most celiac patients if the oats are pure and free from wheat, barley and rye grain. However, oats are often produced on farms that also grow wheat, or wheat is grown on neighboring farms, creating a potential source of contamination. Further, farming equipment and shipping containers (trucks, trains and ships) often carry wheat, barley or rye and can serve as sources of contamination. In addition, commodity processing and food manufacturing facilities are often used for products that contain wheat, so accurate segregation is difficult. Consumers with celiac disease must trust food producers to accurately represent foods as being free from gluten, and there are stringent standards for claiming “gluten-free” in many industrialized countries.

Evaluating Genetically Modified Organisms and Novel Food Ingredients

In order to help ensure that those with gluten sensitivity would not be at greater risk of exposure, regulatory guidelines for genetically modified crops recommend that the proteins encoded by genes transferred from wheat and wheat relatives (members of the Pooideae subfamily) into different food sources (e.g. rice, maize, potato) should be evaluated regarding their capacity to elicit celiac disease.(G) FARRP believes that the current Celiac Disease Peptide and Protein searchable database provides an efficient screening tool to determine whether additional tests (e.g. laboratory T cell activation tests using samples from individuals with CD, or tissue biopsy challenge or clinical challenge of volunteers with CD) would need to be undertaken to demonstrate safety of a new protein. Proteins isolated from wheat and wheat relatives for use as novel food ingredients could also be assessed using this computer comparison.

Proteins that do not contain an exact peptide match to those identified in this database are unlikely to induce symptoms in those with celiac disease. The FASTA search routine is provided as an added safety measure in case not all CD active peptides are known. In the event a candidate food protein from a member of the Pooideae subfamily matches one of the 68 proteins in the CD database with an identity of at least 45% over at least one-half of the length of the CD inducing protein, and with an E score smaller than 1 x 10^-15, it would be prudent to perform additional tests to rule out any risk that the protein might cause CD.
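The provisional FASTA criteria above can be written as a single check on each alignment. This is a minimal sketch under stated assumptions, not the database's own code: the `FastaHit` fields and the function name are hypothetical, the values are assumed to have been read from FASTA output beforehand, and the thresholds are the preliminary ones quoted in the text, which remain under review.

```python
from dataclasses import dataclass

# Provisional thresholds quoted in the text; they are still under review.
MIN_PERCENT_IDENTITY = 45.0   # identity of at least 45%
MIN_FRACTION_OF_CD_PROTEIN = 0.5  # over at least one-half of the CD protein
MAX_E_VALUE = 1e-15           # E score smaller than 1 x 10^-15


@dataclass
class FastaHit:
    """One alignment of a query protein to a CD database protein (hypothetical)."""
    cd_protein_id: str
    percent_identity: float   # identity within the aligned region, in percent
    overlap_length: int       # number of aligned residues
    cd_protein_length: int    # full length of the CD database protein
    e_value: float


def warrants_further_testing(hit: FastaHit) -> bool:
    """True when all three criteria are met, so extra tests would be prudent."""
    fraction_covered = hit.overlap_length / hit.cd_protein_length
    return (
        hit.percent_identity >= MIN_PERCENT_IDENTITY
        and fraction_covered >= MIN_FRACTION_OF_CD_PROTEIN
        and hit.e_value < MAX_E_VALUE
    )


# Hypothetical example values, not real search results.
strong_hit = FastaHit("cd_protein_1", 62.0, 200, 300, 1e-40)
weak_hit = FastaHit("cd_protein_2", 30.0, 200, 300, 1e-3)
print(warrants_further_testing(strong_hit), warrants_further_testing(weak_hit))
```

A hit that fails any one of the three conditions is treated here as below the level of concern, which mirrors the "unlikely to present a risk" wording used for the same thresholds elsewhere on this page.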

Compilation and Review of FARRP Celiac Disease Peptide and Protein database

Plaimein Amnuaycheewa, PhD, compiled the original set of probable CD active peptides (1,016) from his review of approximately 100 publications describing proteins and peptides as part of his dissertation research. The update was produced by re-review of the original publications and of newer publications up to 2017, ending with a total of 1030 peptides. The publications provide data showing that these peptides have been tested for T cell activation potential or induction of celiac enteropathy. The list of 68 celiac associated wheat-related proteins was compiled to represent proteins containing one or more of the peptides. Our bioinformatician, John Wise, compiled the data for the 2012 database and structured the database and search routines. The MHC-II binding peptides listed by EFSA (2017) were identified and renamed by Sollid et al. in 2012.(H) The dataset was reviewed by Afua O. Tetteh, PhD, Plaimein Amnuaycheewa, PhD, Barbara Bohle, PhD, Fatima Ferreira, PhD, Frits Koning, PhD and Richard Goodman, PhD. The effort for the 2012 database was funded primarily by FARRP and partly by the six biotechnology companies that fund the FARRP database. Funding in 2017 has been provided by FARRP, two major biotech companies (BASF and Pioneer), JR Simplot, NuSeed and Unilever. The database and bioinformatics methods will be reviewed further and updated by January 2018.


  • A. Biagi F, Klersy C, Balduzzi D, Corazza GR. 2010. Are we not over-estimating the prevalence of coeliac disease in the general population? Annals of Medicine 42:557-561. PMID:20883139
  • B. Abadie V, Sollid LM, Barreiro LB, Jabri B. 2011. Integration of genetic and immunological insights into a model of celiac disease pathogenesis. Annual Review of Immunology 29:493-525. PMID:21219178
  • C. Tye-Din JA, Stewart JA, Dromey JA, Beissbarth T, van Heel DA, Tatham A, Henderson K, Mannering SI, Gianfrani C, Jewell DP, Hill AVS, McCluskey J, Rossjohn J, Anderson RP. 2010. Comprehensive, quantitative mapping of T cell epitopes in gluten in celiac disease. Science Translational Medicine 2(41):41ra51. PMID:20650871
  • D. Scanlon SA, Murray JA. 2011. Update on celiac disease—etiology, differential diagnosis, drug targets, and management advances. Clinical and Experimental Gastroenterology. PMID:22235174
  • E. Sollid LM, Jabri B. 2011. Celiac disease and transglutaminase 2: a model for posttranslational modification of antigens in HLA association in the pathogenesis of autoimmune disorders. Current Opinion in Immunology 23:732-738. PMID: 21917438.
  • F. Kagnoff MF. 2007. Celiac disease: pathogenesis of a model immunogenetic disease. Journal of Clinical Investigation. 117:41-49. PMID:17200705
  • G. Codex (2003). Codex Alimentarius Guidelines. Alinorm 03/34, Joint FAO/WHO Food Standards Programme, Twenty-Fifth Session (FA), Rome, Italy
  • H. Sollid LM, Qiao S-W, Anderson RP, Gianfrani C, Koning F. 2012. Nomenclature and listing of celiac disease relevant gluten T-cell epitopes restricted by HLA-DQ molecules. Immunogenetics 64:455-460.
  • EFSA. EFSA GMO Panel (EFSA Panel on Genetically Modified Organisms) (2017). Naegeli H, Birch AN, Casacuberta J et al., EFSA Journal 2017;15(6):4862, 49 pp.