Ajay- Describe how you started using R. What are some of the benefits you noticed on moving to R?
Jeff- I began using R in an internship while working on my undergraduate degree. I was given some unformatted R code and asked to modularize it and then wrap it up into an R package for distribution alongside a publication.
To be honest, as a Computer Science student with training more heavily emphasizing the big high-level languages, R took some getting used to for me. It wasn’t until after I concluded that initial project and began using R to do my own data analysis that I began to realize its potential and value. It was the first scripting language which really made interactive use appealing to me — the experience of exploring a dataset in R was unlike anything I’d been taught in my University courses.
Upon gaining familiarity with the syntax and basics of the language, I began to see the immense value in the vast array of R packages which had already been created and made publicly available. I found repeatedly that many of the “niche” functions I’d been coding myself in more traditional languages had already been developed and shared freely on CRAN or Bioconductor.
Ajay- Describe your work in computational biology using R.
Jeff- I work in the Quantitative Biomedical Research Center (QBRC) at UT Southwestern Medical Center. My group analyzes and mines massive biological datasets, many of which now come from sequencing technologies (DNA-seq, RNA-seq, etc.) that generate many gigabytes of data with each experiment. Unfortunately, due to the sheer volume of that data, R is often unfit for the initial analysis and pre-processing. However, once the data has been processed and reduced, we use R to identify statistically and (hopefully) biologically interesting trends or anomalies.
Personally, most of my research lately has focused on reconstructing the interactions between genes based on the patterns and behaviors we can observe. Thankfully, most of the data we work with here fits in memory, so I use R almost exclusively when doing analysis in this area. My most recent work was on “Ensemble Network Aggregation” (ENA), the package for which is now available on CRAN.
Ajay- Describe your work in web applications using R.
Jeff- I was initially tasked with developing software packages which encapsulated the new statistical methodologies being developed within the group (which, at the time, were largely focused on microarray data). I continued developing R packages and began investigating how I might be able to integrate my prior experience with web development into these projects. We ended up developing a handful of different web applications which typically required that we use R to precompute any data which ultimately made its way (statically) into the application.
More recently, we’ve been developing sites which take advantage of dynamic or real-time R analysis, such as our upcoming release of the Lung Cancer Explorer — a tool which allows for the interactive exploration of lung cancer data within a browser. We went to great lengths to develop the IT and software infrastructure that would allow us to interact with R remotely for these applications.
I’ve been taking some time on the side to play with RStudio’s new Shiny application which, like most everything else that group has put out, represents a massive leap forward in this space. We’ve already begun looking at how we can supplement or replace some of our in-house systems with Shiny to start taking advantage of some of its capabilities.
Ajay- What is Trestle Technology focused on?
Jeff- Initially I was doing a lot of web development and helping small and medium-sized businesses integrate and automate various software systems. Once R got added to my resume, however, I started finding more interesting work helping start-ups get their IT and analytics infrastructures off the ground.
My hope is to continue living at the intersection of data and software development and grow this company in that space. It’s quite difficult to find groups doing good analysis and proper software development under one roof — especially in Academia. I thoroughly enjoy the process of enriching data analysis tools with more comprehensive, user-friendly interfaces which allow for more efficient exploration of the underlying datasets.
Ajay- What do you do for relaxing when not working with a computer?
Jeff- I really am a nerd at heart, so much of my “relaxation time” is spent writing code of some sort. When I do pull away from the computer, I enjoy spending time with my wife, being involved in my church, playing soccer or volleyball, or dabbling in photography.
Ajay- Compare R and Python. What are some of the ways you think R can be further improved?
Jeff- I must confess that I’m a fairly late-comer to the Python world. I had tinkered with Perl and Python a few years back and ended up feeling more comfortable with Perl for my needs at that point, so I’ve used it for most of my scripting. Only recently have I started revisiting Python and getting introduced to some of the neat tools available. To me, Python offers a more intuitive object-oriented framework than either Perl or R provides, which helps me stay more organized as my “scripts” inevitably grow into “software.”
Aside from OO, I still feel there’s much room for improvement in R’s “big data” capabilities. The community has certainly been making huge strides in bypassing memory limitations (with packages like ff) and speeding up the code (with code compilation, etc.). Nonetheless, when I leave R for any part of my data analysis, it is typically because of performance concerns and a desire to avoid having to nest any C++ code in my R scripts (though the recent improvements in the Rcpp and devtools packages are making that a much less painful process).
Jeffrey D Allen is a computational biologist at UT Southwestern Medical Center at Dallas. You can find him on Stack Overflow or GitHub and contact him on LinkedIn: http://www.linkedin.com/in/jeffreydallen1
About Trestle Technology
Trestle Technology, LLC was founded in 2010. Our primary goal is to bridge the gap between the technical and the familiar.
Historically, new developments at the cutting edge of technology have only been available to those trained in the latest high-tech implementations. The methods and hardware required to solve the problems people faced on a daily basis existed; they just weren’t easily accessible to the public.
Nowadays, computational resources which were previously unfathomable are available to any and all for just pennies per hour, yet many people are unnecessarily limited because they either lack the knowledge required to solve a problem or lack the resources required to implement a solution.
We at Trestle Technology aim to bridge this gap: bringing the latest and greatest developments in modern hardware and software to the average computer user at the touch of a button.
Data Mining, Keystroke Dynamics, Web Development, Rich Internet Applications