Srinivas Aluru, recipient of the 2025 IEEE Computer Society (CS) Charles Babbage Award, is a luminary in parallel computational biology, having championed its formation as a discipline and led much of the early work that established it. He received the award “for pioneering contributions to the field of parallel computational biology.” Yet despite these accolades and his extensive influence on the field, Aluru reflects on his career with humility, citing insatiable curiosity and a passion for collaboration as key drivers of his success.
“When I started working in the field, it was largely an intellectual curiosity, and I was hoping it would become important someday as the data sets grew, but what I didn’t realize is how quickly that was about to happen,” Aluru shared. “None of this work would be possible without collaborators in life sciences who contribute leadership and analysis from that domain. I look at this award as recognition of all our collective work; my work is inseparably intertwined with the contributions of many others.”
The IEEE Computer Society recently sat down with Aluru to celebrate his receipt of the 2025 Charles Babbage Award and find out more about the inspiration behind his work. What follows offers insights into that conversation, shedding light on Aluru as a collaborator, educator, and industry pioneer.
When I was a young assistant professor at New Mexico State University [Las Cruces, N.M., U.S.A.] in early 1997, I heard about the first conference in computational molecular biology being organized in Santa Fe, New Mexico, just a few hours’ drive from where I was. I did not know anything about this field, but I was mostly interested in parallel algorithms research, and in that area we are always looking for large and complex problems to solve.
So, because the conference was close by, I thought, “Why don’t I take my car and just drive there and see what I can learn?” I had not even taken biology in high school, so I really had no understanding of the field at all. But I decided to check it out of curiosity. What I heard there really transformed me and led me to say, “Hey, why don’t I work in this field?”
All of these wonderful people were talking about sequential algorithms, but I wanted to work on parallel algorithms. So, I started digging into the literature to see what was out there, and at that time most of the work was really at the interface of chemistry and biology, modeling proteins.
Initially, I started by reading about sequential algorithms and trying to parallelize them. But very soon I migrated to large and complex problems that could not be solved serially, and there are fascinating problems in biology that can be solved with that type of expertise. It was really a field with a lot of unexplored and very important problems, and I knew it would have a lot of staying power.
The advice I got from just about everybody was, “Don’t make the switch before earning tenure. You were previously working in spatial data structures and computational physics. And now you’re going to be working on something completely different, and it takes time to establish oneself.”
But the pull of this field was so great that I ignored all of their advice. I said, “I’m going to do what I want to do, and I’m happy to take the consequences.”
The reason I moved to Iowa State was that they were, at that time, just developing a Bioinformatics and Computational Biology Ph.D. Program. They were second in the nation to do so and were looking for faculty in those areas across the campus. It’s interesting because I graduated from Iowa State in 1994, and my research was in parallel algorithms and computational physics. Five years later I returned to pursue research in the completely different area of computational biology.
For me, it is important to solve technical problems that require a certain depth in the computer science contributions. I try to choose problems that require deep intellectual progress in computer science to solve. But I’m very careful in choosing them, because ultimately the quality and importance of the problems we choose dictate how far we can go with them.
It was a Ph.D. student of mine. I was at Iowa State University at the time, and he walked into my office and said he would like to work with me, but that he also liked another professor very much. Rather than choose one of us, he asked if the two of us could get together, come up with a joint project, and mentor him jointly.
The other professor is a well-known plant biologist and a maize geneticist. I made an appointment with him and asked if there was potential for a collaborative project. He shared that work was underway on sequencing the maize genome, but he hadn’t yet seen the fragments assembled into longer pieces of the genome, which he would need for his work.
I was a bit disappointed, because this was a few years after the human genome was sequenced; the human genome is 3 billion base pairs assembled from 30 million fragments, and he was talking about assembling 3 million fragments, about a factor of 10 smaller. I thought maybe it was just a question of running the software that had already been built for the human genome, but while I was walking back to my office, I began to question why, if it was so easy, no one else had done it.
I started looking more into it, and I found a very interesting difference between the human genome and the maize genome. The 30 million human fragments that needed assembly were sampled all over the genome, whereas for the maize genome, only the gene-rich portions were targeted. On top of that, the maize genome is highly repetitive, which trips up assembly, because DNA fragments that come from different parts of the genome look alike and are very hard to tell apart. Mathematically, the data constitute non-uniform sampling, whereas previous software addressed uniform sampling.
As it turns out, just before this I had worked on a different problem with the same underlying abstraction, and it solved the same non-uniform sampling problem in a very efficient way. So, I adapted that technique, worked quickly with my graduate students to tailor the software, and we did the assembly in a month.
My collaborator on this project did preliminary testing and was very satisfied with the results, and then we made the assembly publicly available. The response was amazing. Maize geneticists wrote to us saying that the success rate of their research had gone up tremendously with access to this assembly (one reported going from below 15% to 70%).
Shortly after the success of these pilot projects, the National Science Foundation, Department of Energy, and U.S. Department of Agriculture created a US $30 million effort to sequence the entire maize genome. At that point, we became sought-after collaborators.
We ended up winning the project and began efforts to develop assemblies for the entire genome when another fortuitous thing happened. At that time, IBM was developing its Blue Gene/L supercomputer. That supercomputer was massively parallel, but it had a limited amount of memory per node, and IBM was looking for killer applications to promote the technology. It was a year before the machine was released for sale, but a unit was under development at IBM’s Rochester facility, which was very close to Iowa State, where I was at the time. They gave us, along with a few others, early access to try compelling applications in different areas and to report performance bottlenecks, because they were still fine-tuning the machine before it went into mass production. So luckily, I had early access, and I ended up using it for this maize project.
Ultimately, the algorithmic contributions that went into this were recognized with a best paper award at the IEEE CS International Parallel and Distributed Processing Symposium (IPDPS), the conference where I will be giving my Charles Babbage Award keynote. IBM also nominated us for the Computerworld Honors Program, which identifies and awards innovative uses of information technology for the benefit of society, and we ended up being named a finalist for the 21st Century Achievement Award in the area of energy, environment, and agriculture.
In 2018, I worked with a collaborator on the topic of microbial genomics. One of the fascinating things about microbes is their ability to exchange genes when they meet, without sexual reproduction, a process called horizontal gene transfer.
Since that discovery in 1928, researchers had been questioning whether there was clear species differentiation in the microbial world. With the frequent exchange of genes, maybe there was no concept of a species. It was a longstanding debate, with several influential papers arguing that there are species boundaries and several equally influential papers arguing that there aren’t. That debate had been going on for almost a century.
A collaborator I ran into at Georgia Tech works in this field and had come up with a way to measure the distance between two microbial genomes. What he wanted to know was whether, using this measure, we could compute the pairwise distances between all microbial genomes sequenced to date. That information would help us identify whether there are species boundaries or not.
The problem is that there are 100,000-plus microbial genomes sequenced to date, and analyzing every pair of them would take a lot of work. My group worked with him, and we developed a computationally efficient way of measuring the distance between a pair of genomes, and then we parallelized it so that we could take all the microbial genomes sequenced to date and measure the distance between every pair, giving us both a distance metric and a way of computing it at scale to determine if there are species boundaries. We found conclusive evidence that there are species boundaries.
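To give a flavor of the all-pairs computation Aluru describes, here is a minimal, illustrative sketch in Python. It uses a toy k-mer Jaccard distance and standard multiprocessing rather than the group’s actual method (FastANI computes average nucleotide identity differently), and the genome names and sequences are hypothetical.

```python
# Illustrative sketch only: a toy k-mer Jaccard distance computed over all
# genome pairs in parallel. This is not the FastANI algorithm; inputs and
# names below are hypothetical placeholders.
from itertools import combinations
from multiprocessing import Pool


def kmer_set(sequence: str, k: int = 16) -> frozenset:
    """Collect all overlapping k-mers of a genome sequence."""
    return frozenset(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def pair_distance(args):
    """Jaccard distance between two genomes' k-mer sets (0 = identical)."""
    (name_a, kmers_a), (name_b, kmers_b) = args
    shared = len(kmers_a & kmers_b)
    union = len(kmers_a | kmers_b)
    return name_a, name_b, 1.0 - (shared / union if union else 0.0)


if __name__ == "__main__":
    # Hypothetical inputs; a real run would load 100,000+ assembled genomes.
    genomes = {
        "genome_A": "ACGTACGTGGTACCTTGACA" * 50,
        "genome_B": "ACGTACGTGGAACCTTGACA" * 50,
        "genome_C": "TTGACAGGCATCGATCGGCA" * 50,
    }
    indexed = [(name, kmer_set(seq)) for name, seq in genomes.items()]
    with Pool() as pool:  # distribute the all-pairs work across CPU cores
        for a, b, dist in pool.map(pair_distance, combinations(indexed, 2)):
            print(f"{a} vs {b}: distance = {dist:.3f}")
```

In this toy setup, the pairwise comparisons are independent, which is what makes the problem amenable to the kind of parallelization Aluru describes.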
Our results were published in Nature Communications in late 2018, and this work already has 3,768 citations. The software that we developed for species identification and classification is available on GitHub, has more than 100,000 downloads, and is being used for species classification and in several interesting studies.
When I started in this field, biologists would take experimental data, and more or less, they would look at it manually, and now no one can look at anything manually. Now, all of these large-scale data sets that they generate have to be run through computer programs to call out what needs the attention of the biologists. So, it’s really important to have the computing, the computer processing, done right. Otherwise, valuable insights are lost.
Almost every biology investigator now is confronted with large-scale data sets, no matter what they’re working on. And parallel computing becomes essential to be able to analyze such data sets. So, the future of the field is very bright, and almost everything that’s done in biology these days uses some form of parallel computing.
Parallel computational biology provides answers, not just data. Consider that if I give you the parts list for what is under the hood of your car, it doesn’t really give you much information about the car. You really need to know how those parts fit together, how they work with each other, the function they achieve together, and so forth.
Biologists are grappling with those kinds of specifics. They can do experiments to try to find out, so they can guess a function, but it’s painstaking and literally costs hundreds of millions of dollars. That’s where computing comes in; I sometimes joke that our job is really to develop a computational hypothesis so that life scientists can narrow their focus to what is important, study that, and accelerate their way to a Nobel Prize.
Computer science is a unique field that is instrumental to the advancement of research and applications in virtually every other field. If you ask a mechanical engineer if they need to know chemistry, they may say yes, to some extent, but it’s not vital. But ask anyone if they need computing, and they say yes. In fact, if you look at the research advances in any field, computing provides necessary support. It’s definitely true for all areas of science and engineering, but increasingly, it’s true even in the arts, humanities, music, urban planning, and all kinds of areas.
The cutting-edge research in many fields is happening where the field meets computer science. So, it’s a great time to be a computer scientist, because you can really work on all kinds of problems in all kinds of fields and go wherever your interest takes you.
Overall, my advice to the next generation is twofold:
Even though the impact of my work is really felt in biology, much of my research constitutes foundational advances in parallel computing. In this field, there is a preference toward publishing in the life sciences, because that’s where your clients are, and if they read and use the work, you have wider dissemination. But I have always argued that it’s very important to also publish in computer science and engineering conferences and journals (until recently, I served as editor-in-chief of the IEEE/ACM Transactions on Computational Biology and Bioinformatics) so that the computer science advances are shared openly.
That’s how others in computing can read and understand the technical aspects of the work, see what they would like to work on, and identify what additional contributions they can make. I have long argued that our community needs to know about the underlying algorithms in each of these contributions, so that people don’t end up reinventing the wheel and know where to start when they want to work on something.
Much of my work constitutes foundational advances in parallel computing, and it is fun and worth reading even for people outside the field, because DNA sequences are modeled as strings, and the interconnectedness of these sequences, and of the biological entities that interact with each other, is modeled as graphs.
It opens up a lot of new fundamental problems dealing with strings, graphs, and so forth. These problems are rooted in biology, but they’re genuine computer science problems in their own right. One can learn about the importance of new problems, formulate them, and solve them.
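As a small, self-contained illustration of that string-and-graph abstraction (not drawn from Aluru’s own work), the sketch below treats a few hypothetical DNA reads as strings and connects them into a toy overlap graph, the kind of object genome assembly problems give rise to.

```python
# Minimal illustration: DNA reads as strings, linked by suffix-prefix overlaps
# into an overlap graph. Reads and parameters here are made up for the example.
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


# Hypothetical short reads standing in for sequenced DNA fragments.
reads = {"r1": "ACGTTGCA", "r2": "TGCAGGTT", "r3": "GGTTACGT"}

# Edges of the overlap graph: (read, read) -> overlap length.
overlap_graph = {}
for u, v in permutations(reads, 2):
    length = suffix_prefix_overlap(reads[u], reads[v])
    if length > 0:
        overlap_graph[(u, v)] = length

print(overlap_graph)  # {('r1', 'r2'): 4, ('r2', 'r3'): 4, ('r3', 'r1'): 4}
```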
My group sometimes accidentally ended up solving long-standing theory problems that we would not have solved if we had been focused on them in the first place. We were working on applications, and in a few instances, the application context led us to interesting ways to solve the problems. So, it’s important to be curious and creative at all stages of your career.
Srinivas Aluru is Regents’ Professor in the School of Computational Science and Engineering and the Senior Associate Dean in the College of Computing at Georgia Institute of Technology. From 2016 to 2024, he served as Executive Director of the Institute for Data Engineering and Science (IDEaS), a campuswide interdisciplinary research institute. Previously, he held faculty positions at Iowa State University (1999-2013), Indian Institute of Technology Bombay (2009-2014), New Mexico State University (1996-1999), and Syracuse University (1994-1996). He received his B.Tech. degree in Computer Science from the Indian Institute of Technology Madras in 1989, and his M.S. and Ph.D. degrees in Computer Science from Iowa State University in 1991 and 1994, respectively.
Aluru is known for his pioneering work in parallel computational biology, spanning both fundamental algorithms and compelling applications. He led genome assembly efforts for maize, the first economically important crop sequenced in the United States, on the IBM Blue Gene/L supercomputer. This work led to the discovery of 350 novel genes, and the assembly algorithms were recognized with an IPDPS best paper award in 2006. His group was the first to develop parallel algorithms for inference and analysis of genome-scale networks, leveraging experimental data deposited in public repositories by researchers worldwide. During the early years of big data research, Aluru led an NSF-NIH project to comprehensively develop parallel string and graph algorithms that underpin modern genomics. This work won both research and software reproducibility awards at Supercomputing conferences. More recently, his group developed FastANI, a method for estimating pairwise distances between microbial genomes. The software has been downloaded more than 100,000 times and is being used for species classification. To promote education and research training in the field, Aluru edited the first comprehensive handbook of computational molecular biology in 2005. He is a past chair of ACM SIGBIO (2015-2021) and served as editor-in-chief of the IEEE/ACM Transactions on Computational Biology and Bioinformatics (2021-2024).