
First “truly complete human genome” sequenced; software from India plays a key role


NEW DELHI: In what could be the largest improvement to the human reference genome since its initial release 20 years ago, researchers from the Telomere-to-Telomere (T2T) consortium, an international collaboration of around 30 institutions, have sequenced the “first truly complete” human reference genome.
This could mark a new era of genomics in which no region of the genome, the entire human genetic code, is out of reach. It opens up previously unread regions of human DNA and has the potential to improve understanding of a wide variety of disorders. It could also lead to better genetic screening, enabling rapid and specific diagnostic tests for various diseases.
TOI accessed the preprint, entitled ‘The Complete Sequence of a Human Genome’, which names the new sequence “T2T-CHM13”.
Final validation of the sequence was aided by software from Chirag Jain, Assistant Professor, Department of Computational and Data Sciences, Indian Institute of Science (IISc).
Gap filling
In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published the first drafts of the human genome and revolutionized genomics. But there were gaps: according to Nature, the sequencing was not really complete, and about 15% was missing due to technological limitations. Scientists later filled in some of the missing pieces, but the most recent human genome, which geneticists have used as a reference since 2013, still lacked 8% of the complete sequence.
Now, researchers from the T2T consortium have sequenced the “first truly complete reference human genome.”
The human genome is a person's complete set of DNA. Strings of DNA are like a four-letter language, with four chemical units, or bases, as the alphabet. Each letter pairs with a specific letter on the opposite strand to form words (base pairs, or bp), which encode information. All of these words are stored on chromosomes in human cells.
If a human genome were a history book, it would have around 3 billion words (bp) in 23 chapters (chromosomes), recounting the human journey through time and laying out a detailed plan for building each human cell, knowledge that would give health care providers new powers to treat, prevent and cure diseases.
So if 8% of the genome had not been sequenced, some pages of this book were missing: not all of the more than 3 billion base pairs in a human genome had been read.
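The pairing rule described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the sample strand is made up, and the sketch pairs letters position by position without modelling the opposite reading direction of the two strands.

```python
# The four bases and their partners on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_strand(strand: str) -> str:
    """Return the letters that pair with each base of `strand`, position by position."""
    return "".join(COMPLEMENT[base] for base in strand)


def base_pairs(strand: str) -> list[tuple[str, str]]:
    """Return the list of base pairs ("words") formed by `strand` and its partner."""
    return list(zip(strand, opposite_strand(strand)))


sample = "GATTACA"  # made-up example strand, not real genome data
partner = opposite_strand(sample)
pairs = base_pairs(sample)
print(partner)
print(len(pairs), "base pairs")
```

A seven-letter strand gives seven base pairs; a full human genome has around 3 billion.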
“… By addressing this 8% gap, T2T has completed the first truly complete sequence of a human genome,” the paper reads.
The reference sequence includes gapless assemblies for the 22 autosomes plus the X chromosome (which is present in both males and females), corrects errors, and introduces 200 million bp of novel sequence containing 2,226 gene copies, 115 of which are predicted to code for proteins, which is important for understanding disease.
The newly completed regions include all the centromeric satellite arrays and the short arms of the five acrocentric chromosomes.
Resolving the satellite arrays, which are known to vary widely across the human population, will aid medical genomics by providing a better understanding of the inherited variation that underlies human physiology, evolution, and disease.
Similarly, a better understanding of the acrocentric chromosomes, which are linked to disorders such as Down syndrome, should also prove useful.
Final validation
“The construction of the genome involved many newly designed computer algorithms and software to process sequencing data and convert it into the entire human genome. One piece of software (Winnowmap2) was developed and contributed by me with collaborators. Winnowmap2 was instrumental in the final validation of the genome,” Jain told TOI.
Noting that the software takes genome sequencing data as input and maps it to the genome assembly, he added that the mapping method must take into account a large number of repetitive segments.
“The presence of repeats in a genome makes this challenging because there are many possible candidate alignments for a sequence, and the correct one is rarely obvious. Once the data was aligned correctly, the differences found between the genome and the sequencing data exposed some errors that were corrected by T2T before the final publication of the genome,” he said.
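A toy sketch can show why repeats make mapping ambiguous. It is not how Winnowmap2 works: it uses plain exact string search on a made-up reference, and the function and variable names are hypothetical. A read that falls entirely inside a repeated segment matches several places, while a read that extends into unique sequence matches only one.

```python
def find_candidate_positions(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one character further so overlapping hits are found too.
        start = reference.find(read, start + 1)
    return positions


# Made-up reference: unique flank, three copies of a repeat unit, unique flank.
repeat_unit = "GATTACA"
reference = "CCGT" + repeat_unit * 3 + "TTGC"

repeat_read = "GATTACA"    # lies entirely inside the repeat
anchored_read = "ACATTGC"  # crosses from the repeat into the unique flank

repeat_hits = find_candidate_positions(reference, repeat_read)
anchored_hits = find_candidate_positions(reference, anchored_read)
print("read inside repeat:", repeat_hits)
print("read anchored in unique sequence:", anchored_hits)
```

The first read has three equally good candidate positions, so exact matching alone cannot place it. Real mappers must also tolerate sequencing errors, which makes the choice harder still.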
Not the last word
T2T-CHM13 represents the genome of just one individual, and T2T has now partnered with the Human Pangenome Reference Consortium to sequence more than 300 genomes from people around the world.
The new sequence is not the last word on the human genome, according to the scientific journal Nature, “since T2T had problems solving some regions in the chromosomes, and it is estimated that around 0.3% of the genome could contain errors”.
In their paper, the T2T researchers note that a limitation of CHM13 is its lack of a Y chromosome: “To finish a T2T reference sequence for all human chromosomes, we are in the process of sequencing and assembling the Y chromosome.”
