
Life (science) in the cloud: Researchers dealing with petabytes of data become cloud experts


Chris Dagdigian

When Chris Dagdigian graduated with a degree in biotechnology in 1995, the industry was poised to explode with innovation and possibility. There was only one problem: the explosion was being recorded using pen and paper. “Until that point, scientists were documenting everything in notebooks,” he recalls. “As the research started to blossom, data volumes were starting to exceed what could reasonably be tracked in a handwritten lab book.”

As an entry-level scientist and low man on the totem pole at his company, it fell to Dagdigian to try to figure out digital data collection for his colleagues. “I had a great boss who saw an opportunity to improve the company’s infrastructure by allocating a significant part of his research budget to buy servers and storage, and he told me to figure it all out,” he says. “That’s how I went from being an early-career scientist, to a self-taught software developer, and eventually to an IT person specializing in scientific computing.”

What happened next was almost laughably predictable. The data exceeded its container yet again. “Digital was much better than pen and paper, but in very short order, instead of bulging notebooks everywhere, we had massive shelves full of disk drives,” he says. “The amount of data that science produces is unlike anything else in the world. IT changes fast, but science changes much, much faster. Within a couple years we realized that on-site digital storage had its own set of problems and wasn’t going to be a good long-term solution.”

Twin revolutions

In 2002, Dagdigian and a few colleagues went out on their own to found BioTeam, a science IT consulting company focused on rationalizing scientific data processing and storage. Serendipitously, they did this just before the arrival of another revolution: the move to cloud computing. AWS got off the ground shortly after BioTeam came into being. “I’ve been using Amazon EC2 since it was in private beta,” Dagdigian recalls. “AWS’s value was clear from the beginning: they were extremely reliable and maximally flexible. Even today, they have more building block primitives than anyone else, which gives you the ability to make it as off-the-shelf or as customized as a company needs, depending on what they’re trying to do.”

The other game changer AWS brought to the table was its focus on enabling collaboration. “Back in the day, if two labs were collaborating, they were also overnighting hard drives across the country,” Dagdigian says. “Physically shifting petabytes of data around is not tenable in the long term. When data volumes are that high, once you have the data in one place, it’s far easier to bring people to the data than the other way around.”

But while getting the model right was important, Dagdigian realized that to truly serve his customers, he would need to become an expert in this emerging field himself. “When I first started, I was self-taught,” he says. “I used my personal credit card to start an AWS account and fooled around with every service I thought was interesting. Today, thankfully, there is significantly more support for those looking to learn.”

Learning from others

To more fully develop his skills, in 2013 Dagdigian turned to the nascent AWS Certification program. “There’s real value in learning what other people are doing with the service,” he says. “Plus, you learn about things you might not have thought were interesting or necessary. In the beginning, like most scientists, I was completely focused on what mattered in the short term. For me, that was data storage and processing. In those early trainings for my Solutions Architect certifications, I had to learn about security and compliance, which I hadn’t been focused on but which are completely essential for this kind of work.”

Early on, Dagdigian found that certifications were essential for something else as well: credibility. “When we started out, all our customer acquisitions came through word of mouth, so trust wasn’t an issue,” he explains. “But as time went on and more and more people found us, it was reassuring for them to be able to instantly understand the expertise we had. The exams are quite difficult. You definitely can’t pass them unless you’ve been hands-on and really know your stuff.”

Expanding the future

Over the next several years, helped by the innovations AWS enabled, BioTeam grew from four employees to 40. And while staying current with the technology is always challenging, the difference the cloud has made for Dagdigian’s clients is undeniable. “It’s an order of magnitude jump in both directions,” he says. “Computing against a complex dataset used to take weeks and cost upwards of $30,000. With the advanced High Performance Computing environments available in AWS today, I’m seeing people do the same thing in just days for more like $5,000. It’s incredible. Even better, products like AWS ParallelCluster can be trained to be licensing-aware. That means clients doing advanced work in data-intensive fields like computational chemistry can make sure they’re optimizing their software spend when they’re working with potentially expensive third-party software such as Schrödinger.”

At one point, Dagdigian had passed every AWS Certification available, but that was back when the AWS universe was considerably smaller. Today there are 12 different AWS Certifications available, in specialties from Advanced Networking to Machine Learning, and multiple people on his team have certifications across the spectrum of AWS services. “It’s been remarkable to see the explosion of possibilities within AWS over the past 10 years, and it’s allowed us to serve our customers at a level we couldn’t have dreamed of when we started,” he notes. “But it’s becoming increasingly challenging for a single person to maintain certifications in every area.” Then there’s the fact that certifications require renewal every three years, a feature Dagdigian also appreciates. “Things change too quickly in this industry to do it any other way,” he says. “Maintaining your certification is just as important as getting it in the first place, because the rate of change at AWS is almost as crazy as it is in life science.”

Learn how you can improve your AWS skills and prepare for AWS Certification.
