Tag Archives: inference for R
Okay, over the weekend I created a website for a few of my favourite things.
It’s at https://rforanalytics.wordpress.com/
Graphical User Interfaces for R
Jerry Rubin said: “Don’t trust anyone over thirty.”
I don’t trust anyone not using at least one R GUI. Here’s a list of the top 10.
Code Enhancers for R
Here is a list of the top 5 code enhancers and editors for R.
R Commercial Software
A list of companies making and/or selling R software and/or services. Hint: there are almost 5 (unless I missed someone).
R Graphs Resources
R’s famous graphing capabilities and equally famous learning curve can be made a bit more humane using some of these resources.
Because that’s what I do (all I do, as per my cat), and I am pretty good at it.
Using R from other Software
R can be used successfully from a lot of analytical software, including some surprising ones that praise its great library of 3000 packages.
(To be continued: as I find more stuff I will keep it there. Some ideas: database access from R, prominent R consultants, prominent R packages, famous R interviewees ;) )
PS: The quote from Jerry Rubin seems funny for a while. I turn 34 this year.
R RANT: While the European R Core leadership, led by the Great Dane Peter Dalgaard, focuses on the small picture and is virtually handing the whole commercial side to Prof Nie and David Smith at Revo Computing, other smaller package developers have refused to be treated as cheap R and D developers for enterprise software. How are the book sales coming along, Prof Peter? Any plans to write another R book, or are you done with writing your version of Mathematica (Ref-Newton)? Running the R Core project team must be so hard; I recommend the Tarantino movie “Inglorious B…” for Herr Doktors. -END
I believe that individual R package creators like Prof Harrell (Hmisc) or Hadley Wickham (plyr) deserve a share of the royalties or REVENUE that Revolution Computing, or ANY software company that uses R, earns.
On this note, some updated news on Rattle, the data mining tool created by Dr Graham Williams. Once again R development is being taken ahead by the Down Under chaps while the Big Guys thrash out the road map across the Pond.
Data Mining Resources
Rattle is a free and open source data mining toolkit written in the statistical language R using the Gnome graphical interface. It runs under GNU/Linux, Macintosh OS X, and MS/Windows. Rattle is being used in business, government, research and for teaching data mining in Australia and internationally. Rattle can be purchased on DVD (or made available as a downloadable CD image) as a standalone installation for $450USD ($560AUD).
The free and open source book, The Data Mining Desktop Survival Guide (ISBN 0-9757109-2-3), simply explains the otherwise complex algorithms and concepts of data mining, with examples to illustrate each algorithm using the statistical language R. The book is being written by Dr Graham Williams, based on his 20 years of research and consulting experience in machine learning and data mining. An electronic PDF version is available for a small fee from Togaware ($40AUD/$35USD to cover costs and ongoing development).
- The Data Mining Software Repository makes available a collection of free (as in libre) open source software tools for data mining
- The Data Mining Catalogue lists many of the free and commercial data mining tools that are available on the market.
- The Australasian Data Mining Conferences are supported by Togaware, which also hosts the web site.
- Information about the Pacific Asia Knowledge Discovery and Data Mining series of conferences is also available.
- A Data Mining course is taught at the Australian National University.
- See also the Canberra Analytics Practise Group.
- A Data Mining Course was held at the Harbin Institute of Technology Shenzhen Graduate School, China, 6 December – 13 December 2006. This course introduced the basic concepts and algorithms of data mining from an applications point of view and introduced the use of R and Rattle for data mining in practice.
- A Data Mining Workshop was held over two days at the University of Canberra, 27-28 November, 2006. This course introduced the basic concepts and algorithms for data mining and the use of R and Rattle.
Using R for Data Mining
The open source statistical programming language R (based on S) is in daily use in academia and in business and government. We use R for data mining within the Australian Taxation Office. Rattle is used by those wishing to interact with R through a GUI.
R is memory based, so on 32-bit CPUs you are limited to smaller datasets (perhaps 50,000 to 100,000 rows, depending on what you are doing). Deploying R on 64-bit multiple-CPU (AMD64) servers running GNU/Linux with 32GB of main memory provides a powerful platform for data mining.
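The dataset-size limit described above comes down to simple arithmetic. The following is only a rough sketch in Python, not anything from the Rattle documentation: the function names are hypothetical, and it assumes every column is a double-precision number (8 bytes), that an analysis holds about three copies of the data in memory at once, and that a 32-bit process can use roughly 2 GiB.

```python
def numeric_table_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Raw size of a rows x cols table of double-precision values."""
    return rows * cols * bytes_per_value


def fits_in_memory(rows: int, cols: int, copies: int = 3,
                   budget_bytes: int = 2 * 1024**3) -> bool:
    """Check whether `copies` simultaneous copies of the table fit in the budget.

    Both defaults are assumptions for illustration: 3 working copies and a
    2 GiB budget standing in for a 32-bit process.
    """
    return copies * numeric_table_bytes(rows, cols) <= budget_bytes


# A hypothetical table: 100,000 rows by 100 numeric columns.
print(numeric_table_bytes(100_000, 100))   # raw bytes for one copy
print(fits_in_memory(100_000, 100))        # within the assumed 2 GiB budget?
print(fits_in_memory(10_000_000, 100))     # a table 100 times larger
```

Real limits depend on column types, the algorithm used, and the operating system, so the numbers here only indicate the order of magnitude.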
R is open source, thus providing assurance that there will always be the opportunity to fix and tune things to suit our specific needs, rather than rely on having to convince a vendor to fix or tune their product to suit our needs.
Also, by being open source, we can be sure that the code will always be available, unlike some of the data mining products that have disappeared (e.g., IBM’s Intelligent Miner).
See the earlier interview:
Interview Paul van Eikeren Inference for R
Here is an interview with Paul van Eikeren, President and CEO of Blue Reference, Inc. Paul heads up a startup company addressing the need of information workers to have easier-cheaper-faster access to high-end data mining, analysis and reporting capabilities from software like R, S-PLUS, MATLAB, SAS, SPSS, Python and Ruby. His recent product Inference for R has been causing waves within the analytical fraternity across both R users and SAS users, especially given the fact that it is quite well designed, has a great GUI, and is priced rather reasonably.
A few weeks ago, rumour had it that the SAS Institute was buying out the Inference for R product (note the merger and acquisition question below).
Rather curious to know about this company, I happened to meet Ben Hincliffe at the http://www.analyticbridge.com site (which, with 5000 members, has the largest number of data analytics members, and many business intelligence members as well). Ben, who recently authored a guest post for Sandro at Data Mining Blog, then put across my request for an interview with Paul, the CEO of Blue Reference. Existing products from Blue Reference include additional analytical packages like Inference for MATLAB.
Paul is an extremely seasoned person with years in the analytical fraternity and a PhD from MIT. Here is Paul’s vision for his company and analytics product development.
Ajay: Describe your career journey. What advice would you give to today’s young people about following careers in science?
Paul: I have been blessed with an extremely productive and diversified career journey. After receiving undergraduate and graduate degrees in chemistry, I taught chemistry and carried out research as a college professor for 14 years. I spent the next 12 years heading R&D teams at three different startup companies focused on the application of novel processing technology for use in drug discovery and development. And using that wealth of acquired experience, I have had the good fortune to successfully co-found and develop, with my son Josh, two startup companies (IntelliChem and Blue Reference) directed at the use of informatics to drive more efficient and effective Research, Development, Manufacturing and Operations.
In my journey I have had the opportunity to counsel many young people regarding their career choices. I have offered two principal pieces of advice: one, for the right person, science represents an outstanding opportunity for a productive and satisfying career; and two, a science education provides an outstanding stepping stone to careers in other fields. A study disclosed in a recent Wall Street Journal article (Sarah E. Needleman, “Doing the Math to Find the Good Jobs,” 26 January 2009) revealed that mathematicians land the top spot in the new rankings of the best occupations. Science-linked occupations took 7 out of the top 20 spots.
These ratings suggest that the problem solving and innovation aspects of scientific occupations are much less stressful than other occupations, which leads to high job satisfaction. But does one have to be a genius to have a successful career in science? An interesting read on this subject is the book by Robert Weisberg (Creativity: Beyond the Myth of the Genius), in which he dispels the myth of genius being the result of a genetic gift. Weisberg argues, convincingly, that a genius exhibits three elements: (1) a basic intellectual capacity; (2) a high level of motivation/determination, which enables the genius to remain focused; and (3) immersion in their chosen field, typically represented by over 10,000 hours of study/practice/experience. It turns out that the latter element is the principal differentiator, and fortunately, it is something one has control over.
Ajay: Describe the journey that Blue Reference has made leading to its current product line, including Inference for R.
Paul: The Inference product suite represents a natural extension beyond the Electronic Laboratory Notebook (ELN) product we developed at our previous company, IntelliChem. ELNs are used by scientists and technicians to document research, experiments and procedures performed in a laboratory. The ELN is a fully electronic replacement of the paper notebook. IntelliChem (sold to Symyx in 2004) was a leader in deployment of ELNs at global pharmaceutical companies.
After seeing the successful adoption of ELNs in the laboratory, we saw an opportunity to improve upon the utility of ELN documents and the data contained therein. Essentially, we developed Inference to be a platform for enabling MS Office documents with powerful, flexible, and transparent analytic capabilities – what we call “dynamic documents” or “document mashups”. Executable code from high-level scripting languages like R, MATLAB, and .NET, is combined with data and explanatory text in the document canvas to transform it from a static record into an analytic application.
The pharmaceutical industry, in cooperation with the FDA, has begun to look at ways to implement quality by design (QbD) practices as an alternative to quality by end-testing. QbD comprises a systematic application of predictive analytics to the drug R&D process such that development timelines and costs are reduced while drug safety and efficacy is improved.
Statistical modeling and analysis plays a key role in QbD as a tool for identifying critical quality attributes and confining their variability to a specified design space. Dynamic documents fit nicely into this paradigm, and we’re currently using Inference as a platform to develop an enterprise solution for QbD. You can visit http://www.InferenceForQbD.com for more information about our QbD product.
Along the way, we recognized the need for Inference outside of the pharmaceutical industry. The Inference for R, Inference for MATLAB, and Inference for .NET versions are meant to serve users of these technical computing languages who have analysis, publishing, reporting, collaboration, and reproducible research needs that are best served by a document-centric environment. By using Microsoft Word, Excel and PowerPoint as the “front end,” we can serve the 500 million users that use Microsoft Office as their principal desktop productivity application.
Ajay: What is the pricing strategy for Inference for MATLAB and Inference for R, and how do you see the current recession as an opportunity for analytical products?
Paul: Our strategy is to reach out to the market of Microsoft Office users who would benefit from easy access to data mining and predictive analytics capabilities within their principal desktop productivity tool. Accordingly, we have offered the Inference product at the low price of $199 for a single-user, one-year subscription. Additionally, because it is implemented on top of an existing installation of Microsoft Office, the costs of training, support and maintenance are expected to be minimal.
Ajay: Your product seems to be a nice fit, where both open source and proprietary packages from Microsoft (.NET) work together to give the customer a nice solution. Do you believe it is possible for big companies and big open source communities to work together to create software rather than just be at loggerheads?
Paul: Absolutely. We’re seeing momentum build for open source analytic solutions as the economy impacts companies, both small and large. We saw this take place in the back office with implementation of Linux and Apache Web servers, and now we’re starting to see it in the front office. Smart IT teams are looking for creative ways to stretch their resources, forcing them to look beyond established, but expensive, software products.
We’ve encountered concrete evidence of this in the financial industry. Fresh on the heels of the credit crisis, investment banks and hedge funds have begun to realize that their risk models and supporting software infrastructure are inadequate. In response, quantitative finance and risk analysts are increasingly turning to the open source R statistical computing environment for improved predictive analytics.
R has a core group of devotees in academia that drive innovation, making it a comprehensive venue for development of leading-edge data analysis methods. In order to leverage these tools, banks need a way for R to play nicely with their existing personnel and IT infrastructure. This is where Inference for R produces real value. It transforms MS Office into a platform for the development, distribution, and maintenance of R-based quantitative tools, enabling production-level predictive analytics.
Commercial distributions of R address issues of scalability and support, which might otherwise be subjects of concern. For example, REvolution Computing distributes an optimized, validated and supported distribution of R, providing peace of mind to corporate IT. REvolution also offers Enterprise R, a distribution of R for 64-bit, high performance computing.
Ajay: Please name any successful customer testimonials for Inference for R.
Paul: We have been working with the director of quantitative analytics at a large international bank. He reported that he has successfully distributed R applications to his team of research analysts and portfolio managers based on Inference in Excel. Use of this strategy eliminated the need to code complex models in Visual Basic for Applications, which is time consuming and error prone.
Ajay: Also, are there any issues with licensing and IP when mixing open source code and proprietary code?
Paul: The licensing issues with open source R pertain to distributing R. There are no licensing restrictions on using R. Accordingly, we do not distribute R. Rather, our customers install R separately and Inference recognizes the installation.
Ajay: So R is free and I can get Open Office for free. What are the five specific uses where Inference for R can score an edge over this and make me pay for the solution?
Paul: R is free, and many R enthusiasts would argue that all you need for R is a Linux operating system like Ubuntu, a text editor such as Emacs, and R’s command line interface. For some highly-skilled R users this is sufficient; for the new and average R user this is a nightmare.
Many people think that the largest fraction of the cost of implementing new software is the cost of the license. In actuality, and especially in the corporate world, it is the cost of training, user support, software maintenance, and switching the user base to the new software. Free open source software does not help here. Hence there is a strong ROI argument to be made for building new software applications on top of existing systems that have worked well.
Additionally, successful implementation of open source software like R requires a baseline of integration with existing systems. The fact is that Microsoft operating systems dominate the business world, as does Microsoft Office. If one is serious about using R to address the analytic needs of big business, tight integration with these systems is imperative.
Ajay: Any plans for a web hosted SaaS version for Inference for R soon?
Paul: The natural progression of Inference for R to SaaS will coincide with the next release of Office (Office 2010 or Office 14), which we expect to be largely SaaS enabled.
Ajay: Name some alliances and close partners working with Blue Reference, and what we can expect from you in terms of product launches in 2009.
Paul: We have created a product development consortium in partnership with ‘top ten’ global pharmaceutical companies. The consortium is guiding the development of an enterprise solution for Quality by Design (QbD), using Inference for R as the platform.
We are working with several consulting firms specializing in IT solutions for specialized markets like risk management and predictive analytics.
We are also working with several technology partners who have complementary products and where integration of their products with Inference provides clear and significant value to customers.
Ajay: Any truth to the rumors of an acquisition by a BIG analytics company?
Paul: Our business strategy is centered on growth through partnerships with others. Acquisition is one means to execute that strategy.
Ajay: How do you see this particular product (for R) shaping up over the years?
Paul: R’s success can be attributed, in large part, to the support of its loyal open source community. Its enthusiastic use in academia bodes very well for its growth as a cutting-edge analytics tool. It is just a matter of time before commercial analytic solutions powered by R become de rigueur. We’re happy to be at the tip of the spear.
Ajay: Any Asia plans for Blue Reference, or are you still happy with the Oregon location? How do you plan to interact with graduate schools and academia for your products?
Paul: Although we don’t have a major private university in our backyard, Oregon State University has opened a campus here. And, we’ve been in dialogue with the global Academic community from day one. Over 100 academic institutions around the world use Inference through our academic licensing program. Inference is a great tool for preparing dynamic lessons and publishing reproducible research.
Our Central Oregon location is home to a growing high-tech sector that we’ve been a part of for decades. We’ve had success building large and profitable companies here. Bend attracts Silicon Valley types who come here for vacation and don’t want to leave – they just can’t seem to resist the quality of life and bountiful recreational opportunities that this area offers. It’s a good mix of work and play.
Paul van Eikeren is President and CEO of Blue Reference, Inc. He is responsible for guiding the strategic direction of the company through novel products and services development, partnerships and alliances in the realm of application of informatics to faster-cheaper-better research, development, manufacturing and operations. Van Eikeren is a successful serial entrepreneur, which includes the co-founding of IntelliChem with his son Josh and its ultimate sale to Symyx Technologies. He has headed up R&D at several startup companies focused on drug discovery and development, including Sepracor Inc., Argonaut Technologies, Inc., and Bend Research, Inc. He served as Professor of Chemistry and Biochemistry at Harvey Mudd College of Science and Engineering. He is author/co-author and inventor/co-inventor of over 50 scientific articles and patents directed at the application of chemical, biochemical and computational technologies. Van Eikeren holds a BA degree in Chemistry from Columbia University and a PhD in Chemistry from MIT.
Ajay: To know more, I recommend checking out the free evaluation at http://inferenceforr.com/, especially if you need to rev up your MS Office installation with greater graphics and analytics juice.