Researchers identify genes responsible for sugar production in sugarcane

Brazilian geneticists' strategy identifies genes responsible for sugar production in sugarcane

Eighteen years after the publication of the human genome (in 2000), the genomes of 250 animals and 265 plants (excluding fungi and bacteria) have been sequenced to date. In the grass family, to which sugarcane belongs, 33 species have been sequenced. Among them are crops fundamental to humankind, such as rice, barley, corn, sorghum and wheat. What about sugarcane? Where is its genome? No one has been able to sequence it to this day. When that happens, imagine the economic dividends that will come from plant breeding.

What is missing to sequence the DNA of sugarcane? Dozens of laboratories around the world have been trying for years to sequence this genome, but the task is extremely intricate. There are two strategies for sequencing: top-down and bottom-up. Top-down (or Big Science) seeks to sequence the sugarcane genome through the massive use of DNA sequencing machines, from which raw, indecipherable sequences with trillions of bases emerge. These, in turn, are crunched and mined with the most advanced bioinformatics techniques, consuming tens of thousands of hours of computing time throughout the process. It is an expensive, labor-intensive strategy and, so far, it has not yielded the expected results.

But there is another, more economical approach: the bottom-up strategy. It does not seek to sequence the entire genome, but to identify the specific genes whose function is linked to the aspects of plant development that one wants to select. "I do not need to know the complete genome of sugarcane to try to identify the genes responsible for sugar production," says plant geneticist Anete Pereira de Souza, head of the Laboratory of Molecular Genetic Analysis at the Institute of Biology of the State University of Campinas (Unicamp).

This was achieved in early 2018 by Souza and her group, formed by Melina Mancini, Danilo Augusto Sforça and Claudio Cardoso-Silva. They are the first group to identify - in the middle of the huge sugarcane genomic haystack - the candidate genes for sugar production. The finding, if confirmed, could in theory lead to large future gains in the production of ethanol and sugar per hectare of planted sugarcane. It could even increase the calorific value of the sugarcane bagasse produced by the sugar and ethanol industry, which is burned in thermoelectric plants. Bagasse with a higher calorific value means more energy produced from less bagasse.

Sugarcane (Saccharum hybridum) is a hybrid species cultivated around the world

A (very) complicated genome

The fact that so far no laboratory has been able to map the genome of sugarcane is due to the extreme complexity of its DNA. Plant genomes are notoriously complex - and the genome of sugarcane is one of the most complex in existence. Plant genomes are often larger and more complex than those of mammals, birds or reptiles and amphibians (fish are an exception). Human DNA, for example, consists of 3.2 billion base pairs distributed over 23 pairs of chromosomes (46 in total). The wheat genome, on the other hand, has 17 billion base pairs divided into 21 pairs of chromosomes. The sugarcane genome is composed of 10 billion base pairs, distributed over 100 to 130 chromosomes. Wait a minute. Is it 100 chromosomes or 130? It depends. It can be either number, or any number in between.

The sugarcane cultivated today is a hybrid species (Saccharum hybridum) created from successive crosses of two other species of the genus Saccharum: Saccharum officinarum and Saccharum spontaneum. The first, S. officinarum, is the sugarcane that was originally domesticated on the island of New Guinea some 8,000 years ago and began to be cultivated in India 3,000 years ago. This was the species cultivated in the sugarcane mills of colonial Brazil and the rest of the world. More than three centuries of inbreeding led to a loss of genetic diversity. As a result, by the mid-19th century, sugarcane yields around the world began to decline. At the same time, the plant's loss of vigor opened the door to diseases and pests against which it had lost resistance.

To restore production, sugarcane urgently needed to recover its defenses against the biotic agents that attacked it. The plant needed to regain its lost vigor. The solution was to cross S. officinarum with another plant of the same genus, a grass called S. spontaneum.

"S. spontaneum is not sugarcane, it is a grass. It has no sugar, but it is resistant to diseases and pests, and rich in fiber," explains Souza. "It was a good marriage. The resulting hybrid, S. hybridum, produces a lot of sugar and is resistant to grass diseases and pests."

To reach S. hybridum, S. officinarum (sugarcane) was first crossed with the grass S. spontaneum. From this, a first hybrid was obtained, which was resistant to diseases and pests, but produced little sugar. To raise the sugar content of the plant, several successive crosses of the hybrids with sugarcane (S. officinarum) were made. "It was humans who created the hybrid sugarcane that we plant today: it is strong and has a lot of sugar," says Souza.

If, on the one hand, successive crosses resulted in a quality hybrid, on the other hand they created a plant whose genome is a genetic smorgasbord. The genomic complexity of Saccharum lies in its tendency toward polyploidy, that is, the multiplication of chromosomes. Most organisms, including humans, have a diploid genome, carrying two complete sets of chromosomes, one inherited from the father and another from the mother. The genus Saccharum, however, is polyploid, that is, it carries more than two copies of each chromosome. In the specific case of S. officinarum, the species is octoploid: its 10 basic chromosomes are multiplied by eight. The plant receives from each parent not one but four copies of each chromosome, for a total of 80 chromosomes.

To complicate matters further, in S. hybridum the number of chromosomes is not only higher than 80 (due to the addition of DNA from the grass S. spontaneum); the number of copies of each chromosome received from both parents is also not fixed, but ranges from eight to 14. Hence the number of chromosomes is not the same across the various varieties of hybrid sugarcane. Some plants may have 100 chromosomes, while others have 112, 120, or even 130. But they all belong to the same species.
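The chromosome arithmetic described above can be checked with a short sketch. It uses only the figures quoted in the article (10 basic chromosomes, eight copies in S. officinarum, eight to 14 copies in the hybrid); the names are illustrative, and the bounds assume, as a simplification, that every basic chromosome has the same number of copies.

```python
# Figures quoted in the article; all names here are illustrative only.
BASE_CHROMOSOMES = 10        # basic chromosome set of S. officinarum
OFFICINARUM_COPIES = 8       # octoploid: eight copies of each chromosome
HYBRID_COPY_RANGE = (8, 14)  # copies per chromosome reported for S. hybridum


def total_chromosomes(copies_per_chromosome):
    """Add up the copies of every basic chromosome."""
    return sum(copies_per_chromosome)


# S. officinarum: 10 basic chromosomes x 8 copies each.
officinarum_total = total_chromosomes([OFFICINARUM_COPIES] * BASE_CHROMOSOMES)

# Simplifying assumption: all chromosomes sit at the lowest or highest copy number.
hybrid_min = total_chromosomes([HYBRID_COPY_RANGE[0]] * BASE_CHROMOSOMES)
hybrid_max = total_chromosomes([HYBRID_COPY_RANGE[1]] * BASE_CHROMOSOMES)

print(officinarum_total)       # 80
print(hybrid_min, hybrid_max)  # 80 140
```

In a real plant the copy number differs from one chromosome to the next, which is why the counts reported for hybrid varieties (100 to 130) fall inside these bounds rather than at either end.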

Saccharum spontaneum is a grass species from the same genus as sugarcane

Genomic shuffling

To get an idea of the complexity of the sugarcane DNA - and the size of the problem that geneticists face - imagine two hypothetical decks of cards. One is called SUGARCANE (S. officinarum) and the other GRASS (S. spontaneum). They have about 1 billion cards each. There are four types of cards: A, T, C, and G. Cards A and T are always paired, no matter what their position in the deck. The same happens with cards C and G.

Now take the SUGARCANE and GRASS decks and shuffle them together once. The result is a HYBRID deck, with 2 billion cards. Then take the HYBRID deck and shuffle it again with a SUGARCANE deck. The result will be a new HYBRID with 3 billion cards, right? Repeat the operation seven times, always mixing the HYBRID deck resulting from the previous operation with another SUGARCANE deck.

Done! The final result is a massive HYBRID deck of 10 billion cards, in which the A + T and C + G card pairs of the original SUGARCANE and GRASS decks are mixed in a ratio of 20% to 80%. That is, 20% of the cards came from the GRASS deck and 80% came from the various SUGARCANE decks used consecutively.
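The size of the final deck can be tallied step by step. This is only a sketch of the card-deck analogy, using the article's round figure of 1 billion cards per deck; it tracks deck size and nothing else.

```python
BILLION = 10**9

# Step 1: shuffle one SUGARCANE deck with one GRASS deck.
hybrid = 1 * BILLION + 1 * BILLION   # 2 billion cards

# Step 2: shuffle the HYBRID deck with another SUGARCANE deck.
hybrid += 1 * BILLION                # 3 billion cards

# Step 3: repeat the operation seven more times, one SUGARCANE deck each time.
for _ in range(7):
    hybrid += 1 * BILLION

print(hybrid // BILLION)  # 10 (billion cards)
```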

This monumental mess corresponds, in a very simplified way, to the genome of the sugarcane planted today. In real life, however, the DNA of sugarcane is even more complex. This is because, during the successive crosses between S. hybridum and several generations of S. officinarum, random gene duplications occurred in addition to the accumulation of repeated S. officinarum base pairs.

The challenge for researchers trying to sequence and map the sugarcane genome is to decipher, amid billions of repetitions and duplications, which sequences of the bases A, T, C and G (the cards in our hypothetical deck) were originally in S. spontaneum and which in S. officinarum, as well as to identify their positions in the gigantic hybrid genome. It's a genomic nightmare!

Sorghum (Sorghum bicolor) is a close relative of sugarcane

Solving the puzzle

"To make the breeding of sugarcane, we need to understand the genetics of the plant. This understanding necessarily involves the sequencing of its genome," says Souza. "The complexity of the sugarcane DNA has prevented its sequencing. We can get the gene sequences, there are machines that just do that. The problem is when it comes to put together the pieces of the puzzle. We do not know which is the specific chromosome of each piece belongs and neither do we know its correct place inside the chromosome. We can not even identify specific genes because there are 12 types for each of them. "

But there is always a solution. Instead of trying to sequence millions of bases at the same time, Anete Pereira de Souza and her students chose to try to identify genes or sets of genes whose specific functions are of interest to breeding. How can they be identified if there are multiple types of each gene? The answer is by similarity: comparing sugarcane genes with similar genes, whose functions are known, in the already sequenced genome of sorghum (Sorghum bicolor).
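The idea of finding a gene "by similarity" can be illustrated with a toy comparison. The sketch below is not the procedure used in the study; it scores made-up DNA fragments against a made-up reference by the fraction of short subsequences (k-mers) they share. All sequences, names and the choice of k = 4 are assumptions for illustration.

```python
def kmers(seq, k=4):
    """Return the set of all substrings of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


# Made-up sequences for illustration only; not real sorghum or sugarcane DNA.
reference_gene = "ATGGCGTACGTTAGCCTAGGCTA"
fragments = {
    "fragment_1": "ATGGCGTACGTTAGCCTAGGCTT",  # differs from the reference in its last base
    "fragment_2": "TTTTCCCCAAAAGGGGTTTTCCC",  # unrelated sequence
}

# Pick the fragment whose k-mer content is closest to the reference gene.
best = max(fragments, key=lambda name: similarity(reference_gene, fragments[name]))
print(best)  # fragment_1
```

Real comparisons between genomes rely on dedicated alignment tools and laboratory screening rather than a simple set overlap, but the principle is the same: the fragment most similar to the known gene is the best candidate.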

"The whole world wants to sequence sugarcane. We have developed a strategy for sequencing specific regions of the sugarcane genome by similarities with the genome of sorghum," Souza reveals. The study has just been published in the journal Frontiers in Plant Science.

Among grasses, sorghum is the closest relative of Saccharum. The two share a common ancestor that lived about 3.5 million years ago. Unlike the octoploid genomic confusion of S. officinarum, sorghum is diploid, that is, it has only two copies of each of its 10 chromosomes, totaling 730 million bases, a fraction of the 10 billion base pairs of sugarcane.

The region in the sorghum genome that is responsible for the accumulation of sugar is already known. "Given that both species are so closely related, we assume that genes for sugar production in sweet sorghum should retain the same function in sugarcane," says Souza. "Thus, our work aimed to identify this same genetic sequence within the genome of sugarcane."

The project began in 2011, when Souza's postdoctoral fellow, Danilo Augusto Sforça, left for the Institut National de la Recherche Agronomique in Toulouse, France, to set up BAC libraries for the sugarcane genome in collaboration with Dr. Hélène Bergès.

Geneticists Danilo Sforça and Anete Pereira de Souza, in her laboratory at Unicamp

BAC libraries, or libraries of bacterial artificial chromosomes, have served as a starting point for the sequencing of various organisms with large genomes, including human DNA and wheat DNA. BAC libraries were developed to allow the cloning and storage of pieces of DNA within bacteria such as Escherichia coli. Such bacteria are able to take up stretches of DNA from other organisms into the bacterial cell in the form of circular DNA (plasmids). Once the DNA fragment to be preserved has been inserted into the bacteria, they are frozen and stored. When the time comes to manipulate the DNA carried by the bacteria as plasmids, the researcher only needs to thaw the bacteria and let them multiply. There will soon be trillions of clones, each containing a copy of the stretch of foreign DNA to be studied.

From the DNA of two different sugarcane varieties, Sforça constructed two BAC libraries. Each library has about 200,000 pieces of the sugarcane genome. These were inserted into bacteria and stored in 576 plates, each containing 384 pieces of sugarcane DNA, frozen at -80 °C.
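The plate figures are consistent with the "about 200,000 pieces" quoted for each library, as a quick multiplication shows. The sketch assumes that the 576 plates refer to a single library, which the article does not state explicitly.

```python
PLATES_PER_LIBRARY = 576  # plates reported in the article
CLONES_PER_PLATE = 384    # pieces of sugarcane DNA stored on each plate

# Assumption: the 576 plates hold one library, not both.
clones_per_library = PLATES_PER_LIBRARY * CLONES_PER_PLATE
print(clones_per_library)  # 221184
```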

"It's as if you take a piece of a sugarcane chromosome and place it inside a bacterium. Bacteria are very easy to manipulate, extract, multiply, and sequence," explains Sforça. "If we want to access a gene from a sugarcane chromosome, we can."

According to Souza, "in a bank of clones I can look for the specific bits of a particular genome that interest me. The possibilities are endless. Knowing the sorghum gene for sugar accumulation, you just have to find a similar sequence in the sugarcane BAC library."

When Sforça finished his work in France, he brought his libraries to Campinas. Then began the work of postdoctoral fellow Melina Mancini. It was she who decided to search the BAC libraries for sequences shared by sorghum and sugarcane. "I chose to work with a trait linked to the production and accumulation of sugar. Once we identified the region responsible for sugar accumulation in the genome of sorghum, it was easy to locate similar sequences in sugarcane in the BAC libraries," she says.

One of the 576 plates (each with 384 pieces of DNA) that preserve the sugarcane genome in the BAC library

Mancini developed genetic markers to search for sorghum-like sequences in the BAC libraries. The specific region of sorghum DNA that contains the genes for sugar accumulation is 500,000 base pairs long. The geneticist was able to identify genes with similar sequences in sugarcane. The stretches of sugarcane DNA selected by the molecular markers in the BAC libraries were sequenced and arranged in the order in which they occur in the sugarcane genome. The result was a sequence of 1.2 million bases in sugarcane corresponding to the 500,000-base region of the sorghum genome.

"We identified and mounted 68 continuous sequences (totaling 1.2 million bases) that were supposed to be linked to sugar production," says Mancini. The next step was to arrange the 68 sequences corresponding to 1.2 million bases in a continuous sequence. Mancini was able to mount a base sequence with nine gaps. A fine tuning work meant that six gaps were filled. Three gaps remained, preventing the 1.2 million bases from forming a single fragment.

Using the sorghum genome as a reference, 253 genes linked to sugar accumulation were identified in the sugarcane genome. Of these, 74 sorghum genes were found in the 500,000 base pairs studied, while only 59 of them were found in the 1.2 million base pairs in sugarcane.

The next steps of the research include trying to close the three gaps to obtain a complete sequence. According to Mancini, "we will try to find out which of the genes in this region of the hybrid sugarcane (S. hybridum) genome came from S. officinarum and which came from S. spontaneum."

Another important aspect that remains to be confirmed is whether the function of the 253 genes found in sugarcane is, in fact, to produce and accumulate sugar. It is not enough to know that they perform this function in sorghum, rice and corn. It makes perfect sense to think that this is the case. But science does not work that way. Science requires proof. Therefore, it is absolutely necessary to demonstrate that the action of those genes of sugarcane is to make sugar.

National interest

"My interest in this research is not to sequence the entire genome of the sugarcane," says Souza. "What interests me is to sequence important regions that encode genes for sugar production, or that can confer pest tolerance, resistance to disease, or allow planting sugarcane in various soil types, with more or less sunshine, water, fertilizer, etc. My research is aimed at helping the work of sugarcane breeders. "

Brazil is the most interested party. The numbers of the country's sugar and ethanol sector are impressive. Brazil is the world's largest producer of sugarcane. In the 2016/2017 harvest, 657.2 million tons were harvested, almost double the production of the second-largest producer, India. In the same harvest, 11 billion liters of ethanol were produced, enough to supply 25 million flex-fuel vehicles, or 60% of the Brazilian vehicle fleet.

The discovery of genes for sugar accumulation by Souza and her team may, in the not-so-distant future, increase these numbers dramatically.

The sugarcane sequencing project led by Anete Pereira de Souza has been supported by FAPESP, CNPq and CAPES.

Dr. Anete Pereira de Souza
Laboratório de Análise Genética Molecular 
Instituto de Biologia (IB)
Universidade Estadual de Campinas (UNICAMP)
Phone: (55 19) 3521-1132
Mobile: (55 19) 99111-6547

Mancini MC, Cardoso-Silva CB, Sforça DA and Pereira de Souza A (2018) "Targeted Sequencing by Gene Synteny," a New Strategy for Polyploid Species: Sequencing and Physical Structure of a Complex Sugarcane Region. Front. Plant Sci. 9:397.
doi: 10.3389/fpls.2018.00397 

The images used in this report are licensed under the Creative Commons Attribution License

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0).