DOE Joint Genome Institute


May 28, 2020

Natural Prodcast Episode 9: Roger Linington

How to start connecting the large and ever-growing set of omics data that natural products research continues to produce.

Show Notes

Roger Linington, Simon Fraser University

This episode features our conversation with Roger Linington, from Simon Fraser University. Roger is a natural products chemist, and his research group works in metabolomics, drug discovery and screening, structure elucidation, and chemical biology. In recent years, his group has turned to some software development in order to build the tools that they need, and The Natural Product Atlas, a high-quality freely-available natural products structure database is one great result. In our conversation, we talk about NPAtlas, and the large international collaboration that produced it, what one can do with all that information, and we muse on how to start connecting the large and ever-growing set of omics data that natural products research continues to produce.

Transcript

DAN: Welcome back for episode 9 of Natural Prodcast. This week, Alison and I talk to Roger Linington, from Simon Fraser University in British Columbia, Canada. This is the last of the podcasts we recorded in my hotel room at SIMB, or the Society for Industrial Microbiology conference that took place in January this year in San Diego, and that city is where I first met Roger, when we were both postdocs at Scripps Institution of Oceanography. Roger was splitting his time between Bill Gerwick’s lab and doing research in Panama through support from the ICBG training program, which stands for International Cooperative Biodiversity Groups, and it’s this great program funded by the National Institutes of Health and the National Science Foundation to create international partnerships in developing countries for research into various aspects of biodiversity, of which natural products chemistry is one.

DAN: So, I’ve always known Roger as an expert in natural product structure elucidation, but more recently, he’s been leading an effort to build The Natural Product Atlas, or NPAtlas, a freely accessible natural products structure database, which is going to be just invaluable to the natural products community. It’s a big international community effort, and I want to say thank you, personally, to everyone involved in making this happen. We had a really fun conversation with Roger about all this, in terms of how to build an international community collaboration, and in thinking about the future of the natural products field as it relates to structure elucidation and genome mining, and how we might start to better connect those two things, because I think that’s ultimately what needs to happen if we’re really going to understand the chemical diversity and chemical utility that nature has to offer us. I found this conversation really inspiring, and I’m super happy to be able to bring it to you. You can access NPAtlas right now at npatlas.org, and we’ll have links to it in the show notes at naturalprodcast.com.

DAN: I also just want to say thanks to everyone listening. Our download numbers keep going up, which continues to surprise and amuse me. I can’t tell, of course, whether this is just the natural products community listening to itself talk, or if we’re finding an audience outside of just us. If you’re new to natural products science, please let me know! I’m dying to hear from you. Toss an email to jgi-comms@lbl.gov, or leave us a review on Apple Podcasts or wherever you’re getting this and tell me why you’re listening.

DAN: But, now, here’s Natural Prodcast Episode 9 – our conversation with Roger Linington.

DAN:  Hey, Alison. We’re still at SIMB.

ALISON: Yay! SIMB!

DAN: We have one more interview to do at SIMB. And sitting in the crazy hotel chair now is Roger Linington. Roger is a professor at Simon Fraser University. And I have known Roger for quite a while. We met right here in San Diego, at Scripps [Institution of Oceanography].

ROGER:  Yeah, that’s right. We were postdocs together in the mid 2000s, I guess. You were working for Brad Moore. I was working for Bill Gerwick as part of the International Biodiversity Drug Discovery consortium doing natural products isolation/structure elucidation type work.

DAN: Yeah, that’s right. I want to talk about that – what were the letters? IBDD? 

ROGER:  ICBG. It’s the International Cooperative Biodiversity Groups program, a really amazing program, funded by the National Institutes of Health.

DAN:  All right, why don’t you tell us a little bit about that.

ROGER:  Yeah, this was an amazing program. So the program is still running. And its premise was to partner US institutions interested in natural products, with institutions in developing nations to do natural products based drug discovery, predominantly in diseases of importance to each host country. It was built very much on the foundations of sort of cooperative and collaborative science. It was designed to ensure equitable benefit sharing if any discoveries were made that were of value to the host country. And it had a huge mission to perform technology transfer and infrastructure development in these host countries. So, yeah, it was an amazing program.

DAN:  Yeah, a lot of really great academics came out of that too.

ROGER:  Many academics came out of it. Many scientists were trained on both sides – on the US side and on the host nation side. So there were many of these, with different groups. Our one was partnered with institutions in Panama. And so I had this very unusual postdoc where I went to Panama, straight out of my graduate school, and was given a small lab in the National Science Center there and ran this sort of semi independent research program, as part of my postdoc. It was great fun and did lots of fun science.

DAN:  Yeah, so let’s rewind in time a little bit and, before that, what got you into natural products in the first place?

ROGER:  So I was originally trained in the UK, I did my undergraduate degree at University of Leeds. And there the focus was very much on sort of synthetic organic chemistry and sort of classical divisions in the chemistry landscape. So I was very much an organic chemist and I thought that I would become a synthetic medicinal chemist.

DAN:  Yeah, a lot of us start that way.

ROGER:  Yeah, it’s a common track.

DAN:  At least, the chemistry oriented people, yeah.

ROGER:  At that time, the UK didn’t have a particularly strong emphasis in natural products, certainly not in the school that I did my undergrad degree in. But I worked – as part of the undergraduate degree, there was a chance to do a year in industry. And in many ways, that was the best part of my undergraduate training.

I got to work for Pfizer, who at that time, had a big facility in the south of England. And that facility included both European scientists and North American scientists. And I saw a very different viewpoint between those two groups and the ways in which they approached scientific problems. And so that sort of opened my eyes to the idea that you could go internationally to do the next stage of your training and that there might be more to be gained than just subject matter, by making that change. So I went to the University of British Columbia, in Canada, to do my PhD. And there I was exposed to many more themes surrounding natural products. And so that sort of really captured my attention. And so I worked for Ray Andersen there, and had a very enjoyable time learning about natural products science.

DAN:  Yeah, great. So Pfizer was doing natural product based drug discovery, then?

ROGER:  They did still have a natural products program. But again, on the European side, we didn’t hear much about that. That was very much medicinal chemistry. My job was to make a molecule a day. And one of the things I wanted from that experience was the chance to see what the industrial life was like. And I think I learned that I’m more of an academic than an industrialist…. But that’s good! I mean, I think that one of the many good things that comes out of those kinds of training experiences is the chance for you to test the waters and see what it’s really like behind the scenes. So yeah, I’ve benefited enormously. I’m not sure how much Pfizer benefited from my employment but I got a great deal out of it.

ALISON:  I am curious to hear a little bit about what you … What you saw in the different perspectives that people trained in North America brought to the UK?

ROGER:  Yeah, it’s difficult to generalize, of course, but I felt overall that the British approach at that time was quite linear and that people would take problems. And they would take a sort of foundational starting point. And they would work linearly through all of the possibilities until they reached some conclusion about that problem, whereas the North Americans tended to be slightly more out of the box. So they would approach the problem sometimes from very unexpected orthogonal perspectives. And sometimes one of those approaches would work more, would be more successful, and then other times the other one would. But I had come from a very linear background, like, the training I had received had been quite linear. And so I liked the idea that it was valuable to read broadly and think broadly about problems and to be open minded about other ways in which you might address or solve those problems. And so I hope that I might get more exposure to that here in North America.

ALISON:  Yeah, and it does sound like your career has incorporated a lot of diversity. I mean, going to Panama, as well. And then also Canada. 

ROGER:  Yeah, there’s been geographic diversity, I’d say perhaps more interestingly, though, there’s been quite a lot of thematic diversity. So when we first started out as a research lab, we were also quite focused on finding bioactive molecules and doing medicinal chemistry, doing development of projects on a project-by-project basis. And the more time has gone on, the more we’ve become interested in sort of systems level approaches. So rather than asking one question about one class of molecules or one particular target, we’re now very much – we’re interested in how we can build tools which will let us see the whole landscape of natural products. So you know, for example, what are all the compounds in a sample set, and how are they distributed? And that’s been a sort of evolving trajectory. I think, you know, most scientific careers are like that, in that they follow an arc. And so, you know, definitely there has been a strong evolution in terms of the way the group thinks and the kinds of problems we’re interested in tackling.

DAN:  Yeah, that is the main reason I wanted to get you in the gaudy hotel chair today is to sort of talk about that direction of things. Because I think there’s a lot of this going around now, which I haven’t seen in natural products a whole lot in the past. This sort of idea of, “there’s a lot of data out there: let’s put it together”. So you have – at least you’re the corresponding author on the NPAtlas effort. 

ROGER:  That’s right. 

DAN:  So I know there’s a lot of people involved in this. It’s a big community and you should name them all right now off the top of your head.

ROGER:  I don’t know that I can list them all! So there are … You’re right. So this is a big, community driven collaborative effort that we’ve put together to try and create an open source database of all known microbial natural products. So if you think about natural products science as a field, we have arguably been studying microbial natural products in an ordered way for perhaps 80 years. And in that time, we have discovered many thousands of different natural product classes and tens of thousands of members of these classes. And yet, there is no central open repository, which contains all of that information.

In other words, we have invested an enormous amount of time and resources and money in studying this field. Yet we don’t really know what we know. The information is scattered throughout literature. It’s across a huge array of different dates, journal titles, languages, fields – it’s really a bit of a mess. And many of the system wide tools that people would like to build would greatly benefit by knowing what’s already known. So you can imagine, if you have a particular sample, and you don’t have such a database, then if you want to know its identity, you really have no idea. You have to start from ground zero again in order to solve what might be an already known compound.

ROGER:  However, if you had a comprehensive list of all the things that have previously been found, the first question you should ask is, is it one of these? And there you’re going from a question which is essentially boundless to a question which is very well bounded. And so there are plenty of strategies one can use to compare existing data to a new data set. But you have to have those data in a reasonable format. So that was our main motivation. So it is an altruistic effort in that it is a database which is now shared with the whole community in a completely free and open manner. But it is also self-centered in that we want that data set in order to do some of these projects on our own. And our motivation for starting the project was that we felt hamstrung by not having that information available. And so that’s sort of what, what motivated us to do it. Of course, it’s pretty complicated because what you’re effectively aiming to do is to graze across all of the published literature for the past hundred years and identify every article which describes the discovery of a microbial natural product –

DAN:  Right. 

ROGER:  And that’s a pretty tall ask. So this has been a big community effort. We’ve had tremendous support from lots of researchers around the world. Europeans and North Americans, folks from South America – all over the place. And we would never have been able to do this without that sort of consulting effort. So, you know, it’s maintained by us, but it’s built by the community. And I think that’s something we all should be quite proud of, actually.
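
To make the "is it one of these?" idea concrete, here is a minimal Python sketch of a bounded lookup: an observed mass is compared against a small table of known compounds within a parts-per-million tolerance. The three-entry table, the function names, and the 5 ppm default are illustrative assumptions and are not taken from NPAtlas or from the conversation. Real dereplication would also account for adducts, fragmentation, and NMR data.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}

# Toy reference table for illustration only (NOT NPAtlas data):
# compound name -> elemental composition of the neutral molecule.
KNOWN = {
    "penicillin G": {"C": 16, "H": 18, "N": 2, "O": 4, "S": 1},
    "erythromycin A": {"C": 37, "H": 67, "N": 1, "O": 13},
    "staurosporine": {"C": 28, "H": 26, "N": 4, "O": 3},
}


def exact_mass(composition):
    """Monoisotopic mass of a neutral molecule from its element counts."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def dereplicate(observed_mass, known=KNOWN, tol_ppm=5.0):
    """Return names of known compounds within tol_ppm of the observed neutral mass."""
    hits = []
    for name, composition in known.items():
        reference = exact_mass(composition)
        error_ppm = abs(observed_mass - reference) / reference * 1e6
        if error_ppm <= tol_ppm:
            hits.append(name)
    return hits
```

An empty result means the sample is not in the table, which is only informative if the table is reasonably complete. That is the case Roger makes for a comprehensive database.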

DAN:  Absolutely. How do you get started on building a collaborative community like that to do such a big job?

ROGER:  Yeah. So we were a little overwhelmed by the project at the beginning. It wasn’t really clear how best to begin. And so in the end, we did it in several stages. So we started out internally by making a list on the whiteboard of the top 50 journals that we thought were most likely to contain natural products articles. And then we acquired the titles and abstracts for all of the articles from all of the issues for those 50 journals. For – [it] varied – something like 20 years, and then we – I – forced the lab group to take a week off from lab chemistry research, and to go through all of these abstracts and titles and to fish out data about new molecules that have been found. And that initial work was very labor intensive. But it gave us a basal training set we could use to build more sophisticated tools, because now we had a very reasonably large set of articles which had been manually curated for whether they were or weren’t about the topic we cared about. And that gave us about 12,000 molecules. And that was enough for us to build a basic database structure and a basic web interface. And we could then show that to collaborators and say, “This is a prototype. If you will help, we could do this in a more sophisticated way.” And that was enough to capture people’s attention and convince them it was worth putting that time into.

ALISON:  What was the role of potential collaborators? Like, what did you pitch to them that they could do?

ROGER:  So we made a fairly mercenary trade with the collaborators. So, we used those initial analyses we’d done by hand, in order to build a machine learning model that would look at all of the data that’s in PubMed, and pick out articles which the system considered to be likely to be about microbial natural products. And then we used a bunch of text mining tricks to try and extract all the relevant data from each title and abstract and find structures that went with those. But we needed somebody to look at each one of those and curate the information and make sure that what the algorithm had done was actually accurate. And so we told people that if they would curate 1000 entries, that that would earn them authorship of the study when it came out. And that gave us enough coverage that we could ensure that most entries were looked at more than once, and so the quality – the data quality – was reasonably high. And yeah, it gave us enough pairs of hands to be able to get through all the entries.
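
The conversation does not say which model the group used, so the following is only a sketch of the general idea: train a text classifier on hand-labelled titles, then use it to flag likely natural products articles for human curators. It uses a small multinomial naive Bayes classifier written in plain Python. The class name, the labels, and the six training titles are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a title and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    smoothed = (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
                    score += math.log(smoothed)
            scores[c] = score
        return max(scores, key=scores.get)


# Invented toy titles standing in for a hand-curated training set.
TITLES = [
    "Isolation and structure elucidation of a new macrolide from a marine Streptomyces",
    "New cyclic peptides from a soil fungus",
    "A new polyketide antibiotic isolated from a marine actinomycete",
    "Total knee replacement outcomes in elderly patients",
    "Deep learning for image segmentation in radiology",
    "Outcomes of a randomized trial of statin therapy in elderly patients",
]
LABELS = ["relevant", "relevant", "relevant", "other", "other", "other"]

model = NaiveBayes().fit(TITLES, LABELS)
```

With this toy model, `model.predict("A new antibiotic isolated from a marine fungus")` returns `"relevant"`. Articles flagged this way would still go to human curators, as Roger describes, since the classifier only narrows down what people need to check.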

ALISON:  And how many entries are there now?

ROGER:  There are currently just over 25,000 compounds in the database. We don’t know exactly how many there are to find. Conservatively, we estimate perhaps 40,000. The best we can do is take sections of the literature we’ve already curated, and then go through them manually to see what we missed. And from that you can get some kind of projection for what’s yet to come. So there are lots of tricks we can do to try and backfill the legacy data. Some of that’s quite difficult because styles have changed a lot over the years. So the ways in which people write have changed. And so that means that your text matching machine learning model does better with more modern articles than it does with older articles. Some of the older articles aren’t digitized. And so that makes them hard to access. The articles – the structures in older articles – are presented in quite unusual ways sometimes. And some of the really early work – it took a long time for … I mean, structural elucidation was extremely difficult before the advent of two dimensional NMR and high resolution mass spectrometry. And so some of those stories percolate through the literature for a long time before researchers reached consensus on structural identity. And so those stories can be very laborious to curate, because you end up having to sort of walk through a lot of historical literature until you find the place where the structure was first proven. 
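
The projection Roger mentions can be sketched as simple arithmetic: re-check a slice of already-curated literature by hand, measure what fraction of its compounds the pipeline had captured, and scale the database size by that fraction. The function name and the audit counts below are hypothetical. They are chosen only so that the result lines up with the 25,000 and 40,000 figures quoted above, since the actual audit numbers are not given.

```python
def project_total(found_by_pipeline, audit_captured, audit_actual):
    """Estimate the total number of compounds from a manual audit.

    found_by_pipeline: compounds currently in the database.
    audit_captured:    compounds the pipeline had captured in the audited slice.
    audit_actual:      compounds a careful manual pass found in the same slice.
    """
    if not 0 < audit_captured <= audit_actual:
        raise ValueError("audit_captured must be positive and no larger than audit_actual")
    recall = audit_captured / audit_actual  # fraction the pipeline did not miss
    return found_by_pipeline / recall


# Hypothetical audit: the pipeline had 125 of the 200 compounds in the slice.
estimate = project_total(25_000, 125, 200)
print(round(estimate))  # 40000
```

The estimate assumes the audited slice is representative. As Roger notes, older articles are harder for the model, so a slice of recent literature would likely overstate recall and understate the total.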

DAN:  Yeah. You wonder how many of those old structures are just flat out wrong, too, right? 

ROGER:  Yeah, so we do have an ongoing effort to try and capture reassignments, and many of those come from efforts with total synthesis or re-isolation. So that is something that we are trying to put more energy into now. Ultimately, we would like to capture all the instances of total synthesis because I think not only will that draw in the synthetic organic community, but also it means that if you have a compound or a class that you’re really interested in, you have immediate information about whether or not any of the members of that class have been subject to synthesis and whether there are routes which can be used to produce analogues or to do development. And I think those kinds of things can really help people to prioritize projects and do project selection. So, yes, there’s lots and lots we can add.

DAN:  So in natural products, having a large data set of structures is obviously very valuable. From my perspective, JGI has a large database of biosynthetic gene clusters and biosynthetic gene cluster fragments, and all kinds of other crazy things. What do you think the community needs to do to start connecting that data?

ROGER:  I think this is a really fascinating question. I think it’s one of the unsolved questions in natural products. 

DAN:  Absolutely. 

ROGER:  If I want to be provocative, I have noticed that many talks about biosynthesis start by saying “Organisms contain many gene clusters, but very few of these have known molecules. Therefore biosynthetic gene clusters represent this untapped and cryptic world of new chemistry, which we can discover.” But as devil’s advocate, I could start most natural products discovery talks by saying, “We have found all these compounds and there are no gene clusters.” 

DAN:  That’s right.

ROGER:  So, either biosynthesis researchers are not very good at finding compounds and chemists are not very good at finding gene clusters. Or, there is a high degree of overlap between those two data sets, which we have not yet recognized. So.

DAN:  It’s one or the other, right?

ROGER:  It’s one or the other, or possibly some hybrid of the two. Likely somewhere in between.

DAN:  Halfway in between. Yeah.

ROGER:  But at the moment, you know, most gene clusters have no product and most products have no gene cluster. So either both of those pools have to increase in size dramatically, or there has to be a significant collapsing of that overlap. I suspect the latter is more likely to be true. People have studied chemistry-first projects for 80 years, from a huge array of different perspectives. They’ve studied it from the perspective of biofouling, from chemical ecology, from drug discovery in every possible disease target area, using an enormous array of different isolation strategies and different source organisms and so on and so on. So the idea that, you know, greater than 95% of the chemistry remains unfound seems unlikely to me. That’s I mean, that’s my controversial personal perspective. 

DAN:  Okay. Yeah. 

ROGER:  So I think what’s probably more likely to happen is that we will improve our ability to make relationships between structure and gene cluster and that we will see now, fairly soon, a fairly rapid collapse in the unknown fraction on either side there between both gene clusters and products. How we do that is still quite tricky. You can’t easily do it by forward prediction from the gene cluster. Not to the discrete compound, in many cases. 

DAN:  In many cases. Sometimes you can.

ROGER:  Sometimes you can.

DAN:  But also it’s a question of labor too, because you know, I spend a good chunk of – have spent a good chunk of my life staring at gene clusters and trying to understand what the chemical products are, and sometimes you get pretty close. But there isn’t enough time, probably in the rest of my lifetime anyway, to come anywhere near tackling that. So you’ve got to do it computationally. Right? So where do we start with that?

ROGER:  Well, I think one of the areas where we need to continue to invest effort is this question of chemical constitution. So that question would be much easier to answer if we had a ready mechanism for describing the chemical landscape of any sample set. So if you have 100 organisms, and you ferment them under one set of conditions each and you extract them all, you end up with some pool of natural products. And if we could describe even the number of unique molecules that are in that set, and how those unique molecules are distributed within the set, then it would be a lot easier to start to make connections, or at least hypotheses, about the relationship between a compound or family of compounds and corresponding gene cluster or family of gene clusters, which have sort of co-occurrence. 

DAN:  Right, right, right.

ROGER:  That are found that way. And so relatively simple changes like that, I think, will make a really big difference in the way that we’re able to approach these kinds of system-wide problems. And it’s not just constitutional analysis as a problem, we need corresponding improvements in our ability to recognize whether or not gene clusters which don’t look that similar actually produce similar products. So I think we need advancement in sort of both areas. And I suspect that no single change will resolve the problem. But that stepwise changes on both sides of the fence will mean that at some point, the whole thing suddenly collapses and becomes very much more straightforward.
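
One simple way to turn the co-occurrence idea into a hypothesis generator is sketched below in Python: record which strains contain each compound and each gene cluster family, then score every compound and cluster pair by the Jaccard overlap of their strain sets. The scoring choice, the threshold, and all names and data here are assumptions made for illustration, not a method described in the conversation.

```python
def jaccard(a, b):
    """Overlap of two strain sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_links(compound_strains, cluster_strains, min_score=0.5):
    """Return (score, compound, cluster) tuples with score >= min_score, best first."""
    links = []
    for compound, c_strains in compound_strains.items():
        for cluster, g_strains in cluster_strains.items():
            score = jaccard(c_strains, g_strains)
            if score >= min_score:
                links.append((score, compound, cluster))
    return sorted(links, reverse=True)


# Invented presence data: which strains contain each compound or gene cluster family.
compound_strains = {
    "compound_X": {"s1", "s2", "s3"},
    "compound_Y": {"s4"},
}
cluster_strains = {
    "GCF_1": {"s1", "s2", "s3"},
    "GCF_2": {"s3", "s4", "s5"},
    "GCF_3": {"s4"},
}

links = rank_links(compound_strains, cluster_strains)
```

Here `links` pairs `compound_X` with `GCF_1` and `compound_Y` with `GCF_3`, both with a score of 1.0. A high score is only a hypothesis to test, and the approach depends on first knowing which unique molecules are in each sample, which is the constitution problem Roger raises.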

Obviously, I’m much more familiar with the, sort of, chemical analytics and identification side of that problem. And I see all sorts of exciting areas and new developments: new developments in mass spectrometry hardware and software, new developments in NMR methods, ultrafast methods for acquisition. Things like this new SMART technology from UCSD, which is doing pattern recognition in comparing spectra to one another. And community-based resources which provide large, high-quality, well-curated datasets upon which to base those sorts of tools. So I think you need all three of those really. And from there, yeah, eventually we’ll know the answer.

DAN:  How long will that take? Obviously, it depends on how many people are working on it and the right people in the right situations. But what do you think?

ROGER:  I’d be surprised if it was still a problem in 10 years. I think that the field – I would say that the field of natural products is undergoing a really exciting Renaissance, as tools and opportunities in other research areas come to bear on the field of natural products. I feel that natural products science, perhaps, lagged in innovation during the tail end of the, sort of, structure-first individual compound discovery phase. And that there was a bit of a plateau there where many groups knew how to do “isolate and elucidate”-type projects. And so, now, we’re seeing this sort of new generation of tools come along, which provide a much broader view on natural product science. And I think that that’s opening up opportunities across the board. And because of that, and because of the rate at which that’s changing, I think that it’s a really exciting time to be a natural product scientist. It’s, it’s like the Wild West, it’s really great.

DAN:  What do you think about changes in technology? What changes in technology need to happen in order to really start unifying all this data? Are there changes? Or is this something we can, you know, launch into tomorrow and just roll up our sleeves and you force your grad students to work on it for another week? Or do we need some kind of specific advance to start working there?

ROGER:  I think there are certain areas where we need centralized efforts at unification. So some obvious scenarios, and some of these have already been or are already under development. So a really great case in point is the MIBiG database which describes all of the biosynthetic gene clusters that are in the published literature. So that’s run by Marnix Medema and his team at Wageningen. And they have been extremely successful there because they brought together a consortium of researchers in the field. And they were able to define standards for the minimum information that should be reported. And to develop a schema by which you can report that in a reasonably standard way. And then the MIBiG repository is able to accept that standardized data. And they put a lot of energy behind the scenes into making sure they keep on top of the literature and that they capture gene clusters as they’re published and then curate and enter those which are not deposited.

DAN:  Invaluable resource. 

ROGER:  It’s an amazing resource, and it’s a really touchstone example of how consortia like that can build tools which are of high quality and very stable. Some data types are easier to manage than others. NMR is actually – both NMR and mass spectrometry are a little bit more difficult to handle. The number of variables is quite high. Mass spectrometry, in particular, the number of different ways in which you can analyze the same sample is frighteningly large. And so there are two problems there. One is that even though your molecule may be deposited in a repository, if the acquisition conditions are very different than the ones under which you analyze your sample, then the comparison may not be that relevant, but also the amount of information you need to capture and the ways in which you choose to standardize that are… that’s quite tricky. And again, industry has put a lot of effort into trying to sort that out. But it means that it remains quite difficult to relate databases and data sources of different types to one another, which is ultimately what we need to do as a field.

What we’d really like is a central repository, which has all the structures and their corresponding gene clusters, and their MS and MS-MS data under a huge range of different acquisition conditions and their raw NMR data, and their bioactivity data and so on and so on. That data set does not exist and will be hard to build. I think we can do some of that in pieces. And some of those relationships are already being built. But it is going to be the challenge of our age, I think.

DAN:  Yeah, yeah. Definitely an aspirational goal, and something that needs to happen. But…

ROGER:  Yeah, that will be quite difficult. I would say the other place where we’re going to see large and wholesale change is in data analysis strategies. So we’ve seen in the last 15 years, enormous advancements in hardware, both on the NMR side and the mass spec side. So the single biggest advancement in NMR was the development of cryoprobe technology and the improvement in sensitivity of those instruments. And that’s been followed by developments in pulse sequences and improvements in the ways in which you acquire data. Mass spectrometry has also had an enormous, sort of, sea change in terms of the hardware. So when I was a grad student, the mass spectrometer was in the basement behind – what do you call those things – like, a counter. So when I was a grad student, the mass spectrometer was behind a counter in the basement, and it was run by a dedicated team. It was the size of a hotel room, and it was always broken.

DAN:  I remember. Yeah.

ROGER:  And now we have an ion mobility QTOF instrument in the lab, which our undergrads run.  And the difference there in reliability, accuracy, sensitivity, all those things have changed beyond all recognition. What has not yet kept pace are our data analysis strategies. I think everyone who does, particularly, mass spectrometry – if you do a lot of mass spectrometry, everybody will bemoan the challenges associated with data analysis, and particularly, with jumping between instruments or jumping between methods. There is an enormous amount to do there. I think there is – I still – I think we still throw away a huge amount of valuable information in mass spectrometric datasets because of our inability to extract all the value from them. And I think that’s where we’re going to see big change.

ALISON:  Yeah, just to follow up on that, like how… What other data analytic approaches could people use? Like, how could people innovate in that area?

ROGER:  Well, I think that again, when I was a graduate student, the mass spectrometer was the last thing you used. So you would purify a metabolite. You would solve its structure by NMR. And when you had it clean, and you were pretty sure you knew what it was, you’d get a mass back in order to verify the formula and satisfy the requirements for submission to the journal. Now, because of the ready availability of data from mass spectrometers, it’s been used as a sort of front end tool, so it’s being used as the first stage in discovery.

There are so many different ways that you can look at even a simple mass spectrometry experiment and derive information from it. So, the accurate mass, the isotope distribution, the collisional cross-sectional area if you’re doing ion mobility, the retention time if you standardize those, the MS-MS or MSn fragmentation patterns, and, in principle, the isotope patterns of those fragments, depending on how you acquire the data. The data goes down and down in layers and layers and layers. And this comes back to this question about whether you know about the existing canon of structures or not.

So, if you have that huge array of different pieces of information from a single mass spec experiment, and you have a fixed pool of things which are known, then the question becomes: “Is this metabolite any of these previously known metabolites?” and there is enough information there that I think we ought to be able to do quite a good job of answering that. It requires development in lots of areas; we would like to do a better job of understanding gas phase reaction mechanisms.
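Editor's sketch: the question "Is this metabolite any of these previously known metabolites?" can be framed as a filter over a library of knowns. The Python below is a minimal sketch that uses only two of the layers Roger lists, accurate mass and standardized retention time. The library entries, the observed values, and the tolerances (5 ppm, 0.2 minutes) are hypothetical and chosen purely for illustration; a real workflow would also compare isotope patterns, collisional cross sections, and MS-MS spectra.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Known:
    name: str
    mz: float      # expected m/z of the ion
    rt_min: float  # standardized retention time, in minutes


def ppm_error(observed, expected):
    """Mass error in parts per million relative to the expected m/z."""
    return (observed - expected) / expected * 1e6


def candidates(obs_mz, obs_rt, library, ppm_tol=5.0, rt_tol=0.2):
    """Return library entries within both tolerances, best mass match first."""
    hits = [
        k for k in library
        if abs(ppm_error(obs_mz, k.mz)) <= ppm_tol
        and abs(obs_rt - k.rt_min) <= rt_tol
    ]
    return sorted(hits, key=lambda k: abs(ppm_error(obs_mz, k.mz)))


# Hypothetical library of previously known metabolites (made-up values).
library = [
    Known("known_1", 500.0000, 5.0),
    Known("known_2", 500.0100, 5.1),  # mass too far from the observation
    Known("known_3", 500.0005, 9.0),  # mass fits, retention time does not
]

hits = candidates(obs_mz=500.0010, obs_rt=5.05, library=library)
print([k.name for k in hits])
```

An empty result does not prove that a metabolite is new. As Roger notes earlier, library data acquired under very different conditions may simply not be comparable.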

We would like to do a better job of predicting MS-MS spectra, so on and so on – interpreting fragments, all this sort of stuff. So I think there is a huge opportunity there. But it requires developments on the software, on the technical side, and also developments in basic science. I think we don’t know enough about gas phase chemistry yet. We don’t know enough about fragmentation mechanisms, and prediction of those, the way those things work. And so I think there are opportunities across the spectrum from those who are doing very fundamental research, at the sort of basic building blocks of how mass spectrometers operate, and how analytes are produced, right through to applications and how you use that information in the most efficient and the most successful way. So, yeah, there’s room for everyone.

ALISON:  So if I can just see if I understood that. Improvement in data analysis would look like just having a better understanding of how these, I guess, the mapping of the masses and the ions to structure in a fundamental way. Like that we don’t understand so much about the chemistry – about the way that these molecules can break down – to interpret the spectra and that innovations in that area would be really helpful.

ROGER:  Yeah, I think that’s exactly right. I think understanding what happens to molecules during the process of analysis is an area where … we’re still not – we don’t have a perfect understanding. And then there are practical issues as well. We are still not particularly good at differentiating real analytes from noise, and thinking about best practices for experimental design, and which is the best instrument for a particular experiment? Is it worth doing replicates? What do you do about variations in concentration? How do you design the experiment so that you have the clearest possible picture of the chemistry that you’re trying to analyze? I think that, at the moment, there are still a very wide array of different approaches. And I suspect that we will start to see a sort of collapsing of those approaches as time goes on, as some strategies win out as the most accurate and the most informative. And it will be interesting to see that development, I don’t know where that’s going to go, but it’ll be interesting to see what happens.

DAN:  So I guess the last thing I wanted to just ask is: Now that, you know, you’ve got all this infrastructure built, surely, there was a reason that you did this in the first place. So what are you going to use all of this to actually do in your research? [long pause] Or are you an infrastructure guy now? 

ROGER:  No, no. 

DAN:  I’ve never known you to be!

ROGER:  We’ve now made a commitment to supporting the infrastructure, which is also a bit daunting! But no, we absolutely built it in order to pursue fundamental questions in natural product science. I would love to know, you know, what are all the natural products that we can possibly find? Does the building of this data set allow you to say anything about what the scope of natural products diversity is likely to be in the environment? Can we use this broad-scale view of the chemistry in order to start to make predictions about environmental roles of these compounds? Can we perform studies such as looking at the distributions of different classes of compounds? So for example, if you see Class A, do you always see Class B and Class C? And if so, do they play some complementary role in the environment? So you can imagine using the data in a very broad sense like that. I could envisage projects where you say, “Okay, I see, this group of organisms has this set of molecules. And we know that to be successful in the environment, organisms need molecules that do these kinds of things. And I know that some of the molecules I’ve identified do some of these jobs. Therefore, some of the molecules which don’t yet have jobs must most likely play one of these other roles.” So if we’re still missing a siderophore, we should look carefully at these as being candidate siderophores. Or for cell-cell signaling or for, you know, other roles here. So I could imagine that once the set is complete, you can start asking much broader questions about how chemistry is diversified in nature, and when it is not diversified and why? Where are the holes? Which bits of chemical space are you never going to find in nature? And are any of those bits of chemical space relevant to human health? You can imagine that compounds found in nature are produced by organisms for their own benefit. But that’s not necessarily relevant to many of the diseases that we seek to treat.

So, it may be that molecules which are natural product-like, which would actually be very beneficial in hard-to-hit target areas, are never going to be found in a classical natural products program because they are selected against because of their lack of function. So knowing about the full sweep of chemistry which is out there allows you to identify obvious holes and then use semi-synthesis or synthetic methods in order to target and go after these sort of new areas of chemistry in a very directed way, which is still inspired by nature, but maybe builds on what nature has to offer. So, lots to do.

DAN:  Roger, those are fantastic questions, and I’m looking forward to you generating the answers. 

ROGER:  Well, let’s see. I think it’s gonna be a worldwide effort. But it’ll be fun to see how people use the data in different ways.

DAN:  All right. Thanks so much for talking to us. 

ROGER:  It’s my pleasure. Thanks very much.

DAN:  I’m Dan Udwary, and you’ve been listening to Natural Prodcast, a podcast produced by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility located at Lawrence Berkeley National Lab. You can find links to transcripts, more information on this episode, and our other episodes at naturalprodcast.com.

Special thanks, as always, to my co-host, Alison Takemura. [woohoo] If you like Alison, and want to hear more science from her, check out her podcast, Genome Insider. She talks to lots of great scientists outside of secondary metabolism, and if you like what we’re doing here, you’ll probably enjoy Genome Insider too. So, check it out.

My intro and outro music are by Jahzzar.

Please help spread the word by leaving a review of Natural Prodcast on Apple podcasts, Google, Spotify, or wherever you got the podcast. If you have a question, or want to give us feedback, tweet us @JGI, or to me @danudwary. If you want to record and send us a question that we might play on air, email us at jgi-comms@lbl.gov. And because we’re a User Facility, if you’re interested in partnering with us, we want to hear from you! We have projects in genome sequencing, DNA synthesis, transcriptomics, metabolomics, and natural products in plants, fungi, and microorganisms. If you want to collaborate, let us know! Find out more at jgi.doe.gov/user-programs.

Thanks, and see you next time! 



JGI is a DOE Office of Science User Facility managed by Lawrence Berkeley National Laboratory

© 1997-2023 The Regents of the University of California