Tag Archives: r commander
I noticed this article some time back by the most excellent hacker John Myles White (author of Machine Learning for Hackers).
Professor John Fox, whom we have interviewed here as the creator of R Commander, talked about this at useR! 2008: http://www.statistik.uni-dortmund.de/useR-2008/slides/Fox.pdf
I also noticed that the R Project is stuck on SVN (yes or no? Comment, please) while part of the rest of the world has moved on to Git. See http://en.wikipedia.org/wiki/Git_%28software%29
Is Git really that good compared to SVN? http://stackoverflow.com/questions/871/why-is-git-better-than-subversion
Maybe. I think that with 5000 packages and more, the R Project needs more presence on GitHub and should at least consider Git for the distributed, international project that R is becoming.
The R Users of New Delhi met for the second time on Dec 15, 2012. We meet on the third Saturday of every month.
We talked about epidemiology using the epicalc package (we have one doctor and one biostatistician), cloud computing (we have two IT guys) and business analytics. We also discussed the GUIs R Commander, Rattle and Deducer for beginners and for people transitioning to R from other analytics software. We also discussed the R for SAS and SPSS Users books and the R for Data Mining book. The free book on R for epidemiology ( http://cran.r-project.org/doc/contrib/Epicalc_Book.pdf ) was mentioned. Not bad for one hour.
We are currently unfunded and unsponsored. I hope to get some sponsors to give away R books (excluding my own) to encourage users and group members. The only catch to joining this meetup group: you either need to attend (and be local) or present something (if you are not in Delhi).
I have been trying to get this group to go from Vector to Matrix level to get a bigger sponsorship from Revolution, but I am constrained by meeting in a public cafe. That is due to change, since we managed to get one sponsor for a meeting place in Noida (a business school batchmate who owns his office).
Deadlines for applications are:
- March 31, 2013 for Matrix and Array level groups.
- September 30, 2013 for Vector level groups.
2013 Sponsorship Levels
The size of the annual grant depends on the size of your group.
|Level||For groups that are:||Requirements||Annual Grant ($USD)|
|Vector||Just getting started||A group name, group webpage, and a focus on R. (Here are some tips on starting up a new R user group.)||$100|
|Matrix||Smaller but established||3 meetings in last 6 months with 30 attendees or more.||$500|
|Array||Larger and groups||3 meetings in last 6 months with 60 attendees or more.||$1000|
I was interviewed about moving from Excel to R in Human Resources (HR) at http://www.hrtecheurope.com/blog/?p=5345
“There is a lot of data out there and it’s stored in different formats. Spreadsheets have their uses but they’re limited in what they can do. The spreadsheet is bad when getting over 5000 or 10000 rows – it slows down. It’s just not designed for that. It was designed for much higher levels of interaction.
In the business world we really don’t need to know every row of data, we need to summarise it, we need to visualise it and put it into a powerpoint to show to colleagues or clients.”
And a more recent interview with my fellow IIML mate, and editor at Analytics India Magazine
AIM: Which R packages do you use the most and which ones are your favorites?
AO: I use R Commander and Rattle a lot, and I use the dependent packages. I use car for regression, forecast for time series, and many packages for specific graphs. I have not mastered ggplot, though I do use it sometimes. Overall I am waiting for Hadley Wickham to come up with an updated book on his ecosystem of packages, as they are very formidable, completely comprehensive and easy to use in my opinion, so much so that I can get by with the occasional copy-and-paste code.
A surprising review at R-bloggers.com / Intelligent Trading
The good news is that many of the large companies do not view R as a threat, but as a beneficial tool to assist their own software capabilities.
After assisting and helping R users navigate through the dense forest of various GUI interface choices (in order to get R up and running), Mr. Ohri continues to handhold users through step by step approaches (with detailed screen captures) to run R from various simple to more advanced platforms (e.g. CLOUD, EC2) in order to gather, explore, and process data, with detailed illustrations on how to use R’s powerful graphing capabilities on the back-end.
Do you want to write a review too? You can visit the site here
I love GUIs (graphical user interfaces), whether they are Tcl/Tk based, GTK based or even Qt based. As a researcher, they help me code faster; as a consultant, they help me move projects faster from startup to handover; and as an R instructor, they help me get people to learn R faster.
I wish Python had some GUIs though ;)
From the open-access Journal of Statistical Software:
JSS Special Volume 49: Graphical User Interfaces for R
Pedro M. Valero-Mora, Ruben Ledesma
Vol. 49, Issue 1, Jun 2012
Submitted 2012-06-03, Accepted 2012-06-03
Ya-Shan Cheng, Chien-Yu Peng
Vol. 49, Issue 2, Jun 2012
Submitted 2010-12-31, Accepted 2011-06-29
Joris J. Snellenburg, Sergey Laptenok, Ralf Seger, Katharine M. Mullen, Ivo H. M. van Stokkum
Vol. 49, Issue 3, Jun 2012
Submitted 2011-01-20, Accepted 2011-09-16
Marcel Austenfeld, Wolfram Beyschlag
Vol. 49, Issue 4, Jun 2012
Submitted 2011-01-05, Accepted 2012-02-20
Byron C. Wallace, Issa J. Dahabreh, Thomas A. Trikalinos, Joseph Lau, Paul Trow, Christopher H. Schmid
Vol. 49, Issue 5, Jun 2012
Submitted 2010-11-01, Accepted 2012-12-20
Bei Huang, Dianne Cook, Hadley Wickham
Vol. 49, Issue 6, Jun 2012
Submitted 2011-01-20, Accepted 2012-04-16
John Fox, Marilia S. Carvalho
Vol. 49, Issue 7, Jun 2012
Submitted 2010-12-26, Accepted 2011-12-28
Vol. 49, Issue 8, Jun 2012
Submitted 2011-02-28, Accepted 2011-09-08
Stefan Rödiger, Thomas Friedrichsmeier, Prasenjit Kapat, Meik Michalke
Vol. 49, Issue 9, Jun 2012
Submitted 2010-12-28, Accepted 2011-05-06
Vol. 49, Issue 10, Jun 2012
Submitted 2010-12-17, Accepted 2011-05-11
Vol. 49, Issue 11, Jun 2012
Submitted 2010-12-08, Accepted 2011-07-15
My favorite GUI (or one of them), R Commander, has a relatively new plugin called KMggplot2. Until now Deducer was the only GUI with ggplot features, but the much lighter and more popular R Commander has long been a champion for people wanting to pick up R quickly.
RcmdrPlugin.KMggplot2: Rcmdr Plug-In for Kaplan-Meier Plot and Other Plots by Using the ggplot2 Package
As you can see from the screenshot, it makes ggplot even easier for R newbies and experienced folks alike.
This package is an R Commander plug-in for Kaplan-Meier plot and other plots by using the ggplot2 package.
|Depends:||R (≥ 2.15.0), stats, methods, grid, Rcmdr (≥ 1.8-4), ggplot2 (≥ 0.9.1)|
|Imports:||tcltk2 (≥ 1.2-3), RColorBrewer (≥ 1.0-5), scales (≥ 0.2.1), survival (≥ 2.36-14)|
|Author:||Triad sou. and Kengo NAGASHIMA|
|Maintainer:||Triad sou. <triadsou at gmail.com>|
|CRAN checks:||RcmdrPlugin.KMggplot2 results|
NEWS file for the RcmdrPlugin.KMggplot2 package

Changes in version 0.1-0 (2012-05-18)
o Restructuring implementation approach for efficient maintenance.
o Added options() for storing package specific options (e.g., font size, font family, ...).
o Added a theme: theme_simple().
o Added a theme element: theme_rect2().
o Added a list box for facet_xx() functions in some menus (Thanks to Professor Murtaza Haider).
o Kaplan-Meier plot: added confidence intervals.
o Box plot: added violin plots.
o Bar chart for discrete variables: deleted dynamite plots.
o Bar chart for discrete variables: added stacked bar charts.
o Scatter plot matrix: added univariate plots at diagonal positions (ggplot2::plotmatrix).
o Deleted the dummy data for histograms, which is large in size.

Changes in version 0.0-4 (2011-07-28)
o Fixed "scale_y_continuous(formatter = "percent")" to "scale_y_continuous(labels = percent)" for ggplot2 (>= 0.9.0).
o Fixed "legend = FALSE" to "show_guide = FALSE" for ggplot2 (>= 0.9.0).
o Fixed the DESCRIPTION file for ggplot2 (>= 0.9.0) dependency.

Changes in version 0.0-3 (2011-07-28; FIRST RELEASE VERSION)
o Kaplan-Meier plot: Show no. at risk table on outside.
o Histogram: Color coding.
o Histogram: Density estimation.
o Q-Q plot: Create plots based on a maximum likelihood estimate for the parameters of the selected theoretical distribution.
o Q-Q plot: Create plots based on a user-specified theoretical distribution.
o Box plot / Errorbar plot: Box plot.
o Box plot / Errorbar plot: Mean plus/minus S.D.
o Box plot / Errorbar plot: Mean plus/minus S.D. (Bar plot).
o Box plot / Errorbar plot: 95 percent Confidence interval (t distribution).
o Box plot / Errorbar plot: 95 percent Confidence interval (bootstrap).
o Scatter plot: Fitting a linear regression.
o Scatter plot: Smoothing with LOESS for small datasets or GAM with a cubic regression basis for large data.
o Scatter plot matrix: Fitting a linear regression.
o Scatter plot matrix: Smoothing with LOESS for small datasets or GAM with a cubic regression basis for large data.
o Line chart: Normal line chart.
o Line chart: Line char with a step function.
o Line chart: Area plot.
o Pie chart: Pie chart.
o Bar chart for discrete variables: Bar chart for discrete variables.
o Contour plot: Color coding.
o Contour plot: Heat map.
o Distribution plot: Normal distribution.
o Distribution plot: t distribution.
o Distribution plot: Chi-square distribution.
o Distribution plot: F distribution.
o Distribution plot: Exponential distribution.
o Distribution plot: Uniform distribution.
o Distribution plot: Beta distribution.
o Distribution plot: Cauchy distribution.
o Distribution plot: Logistic distribution.
o Distribution plot: Log-normal distribution.
o Distribution plot: Gamma distribution.
o Distribution plot: Weibull distribution.
o Distribution plot: Binomial distribution.
o Distribution plot: Poisson distribution.
o Distribution plot: Geometric distribution.
o Distribution plot: Hypergeometric distribution.
o Distribution plot: Negative binomial distribution.
Continuing my series on basic data manipulation using R, for people who know analytics and are new to R.
1 Keeping only some variables
Using subset we can keep only the variables we want:
    library(MASS)  # the Sitka89 dataset ships with the MASS package
    Sitka89 <- subset(Sitka89, select = c(size, Time, treat))
This keeps only the variables we have selected (size, Time, treat).

2 Dropping some variables
    Harman23.cor$cov.arm.span <- NULL
This deletes the variable named cov.arm.span in the dataset Harman23.cor.

3 Keeping records based on character condition
    # The built-in Titanic object is a table, so convert it to a data frame first
    Titanic.sub1 <- subset(as.data.frame(Titanic), Sex == "Male")
Note the double equal-to sign.
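
The three subsetting steps above can be tried end to end on built-in data. This is only a minimal sketch: mtcars stands in for your own data frame, and the object names cars, cars_small, titanic_df and males are made up for illustration.

```r
# Illustrative only: mtcars is a built-in data frame standing in for your data.
cars <- mtcars

# 1. Keep only some variables with subset(select = ...)
cars_small <- subset(cars, select = c(mpg, cyl, wt))

# 2. Drop a variable by assigning NULL to it
cars_small$wt <- NULL

# 3. Keep records matching a character condition (note the double ==).
#    The built-in Titanic object is a table, so convert it first.
titanic_df <- as.data.frame(Titanic)
males <- subset(titanic_df, Sex == "Male")

print(names(cars_small))  # remaining columns after the drop
print(nrow(males))        # number of rows where Sex is "Male"
```

Assigning NULL is handy for dropping one column; for several at once, subset(cars, select = -c(wt, cyl)) does the same job in one call.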
4 Keeping records based on date/time condition
    subset(DF, as.Date(Date) >= '2009-09-02' & as.Date(Date) <= '2009-09-04')

5 Converting Date Time Formats into other formats
If the variable dob is "01/04/1977", then the following will convert it into a date-time object:
    z <- strptime(dob, "%d/%m/%Y")
and if the same date is written as 01Apr1977:
    z <- strptime(dob, "%d%b%Y")

6 Difference in Date Time Values and Using Current Time
The difftime function computes the difference between two date-time values:
    difftime(time1, time2, units = 'secs')
or, in full,
    difftime(time1, time2, tz = "", units = c("auto", "secs", "mins", "hours", "days", "weeks"))
For the current system date and time you can use
    Sys.time()
    Sys.Date()
These values can be passed to the difftime function shown above to calculate age or time elapsed.

7 Keeping records based on numerical condition
    # Freq exists only once the Titanic table is converted to a data frame
    Titanic.sub1 <- subset(as.data.frame(Titanic), Freq > 37)
For enhanced usage, you can also use the R Commander GUI with the sub menu Data > Active Dataset.

8 Sorting Data
Sorting a data frame in ascending order by a variable:
    AggregatedData <- sort(AggregatedData, by = ~ Package)
Sorting a data frame in descending order by a variable:
    AggregatedData <- sort(AggregatedData, by = ~ -Installed)
Note that this formula form of sort is not part of base R; it appears to rely on the data frame sort method from the Deducer package, so load Deducer first. In base R, AggregatedData[order(AggregatedData$Package), ] gives the ascending sort.

9 Transforming a Dataset Structure around a single variable
Using the reshape2 package we can use the melt and acast functions:
    library("reshape2")
    tDat.m <- melt(tDat)
    tDatCast <- acast(tDat.m, Subject ~ Item)
If we choose not to use the reshape2 package, we can use the default reshape method in R. Please note that this takes longer to process for bigger datasets.
    df.wide <- reshape(df, idvar = "Subject", timevar = "Item", direction = "wide")

10 Type in Data
Using the scan() function we can type in data in a list.

11 Using diff for lags and cumsum for cumulative sums
We can use the diff function to calculate the difference between successive values of a variable:
    diff(Dataset$X)
The cumsum function gives the cumulative sum:
    cumsum(Dataset$X)
For example:
    > x = rnorm(10, 20)  # 10 normally distributed random numbers with mean 20
    > x
     [1] 20.76078 19.21374 18.28483 20.18920 21.65696 19.54178 18.90592 20.67585
     [9] 20.02222 18.99311
    > diff(x)
    [1] -1.5470415 -0.9289122  1.9043664  1.4677589 -2.1151783 -0.6358585  1.7699296
    [8] -0.6536232 -1.0291181
    > cumsum(x)
     [1]  20.76078  39.97453  58.25936  78.44855 100.10551 119.64728 138.55320
     [8] 159.22905 179.25128 198.24438
    > diff(x, 2)  # second argument is the lag, so this is x[i+2] - x[i]
    [1] -2.4759536  0.9754542  3.3721252 -0.6474195 -2.7510368  1.1340711  1.1163064
    [8] -1.6827413
The diff function can be used as diff(x, lag = 1, differences = 1, ...), where differences is the order of differencing.
Reference: Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

12 Merging Data
The Deducer GUI makes it much simpler to merge datasets.
The simplest syntax for a merge statement is
    totalDataframeZ <- merge(dataframeX, dataframeY, by = c("AccountId", "Region"))

13 Aggregating and group processing of a variable
We can use multiple methods for aggregating and by-group processing of variables. Two functions we explore here are aggregate and tapply.

Referring to the R online manual at [http://stat.ethz.ch/R-manual/R-patched/library/stats/html/aggregate.html]:
    ## Compute the averages for the variables in 'state.x77', grouped
    ## according to the region (Northeast, South, North Central, West) that
    ## each state belongs to
    aggregate(state.x77, list(Region = state.region), mean)

Using tapply:
    ## tapply(Summary Variable, Group Variable, Function)
Reference: [http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm#tapply]

We can also use specialized packages for data manipulation. For additional by-group processing, see the doBy package as well as the plyr package.

doBy contains a variety of utilities, including:
1) Facilities for groupwise computations of summary statistics and other facilities for working with grouped data.
2) General linear contrasts and LSMEANS (least-squares means, also known as population means).
3) HTMLreport for automatic generation of an HTML file from an R script with a minimum of markup.
4) Various other utilities.
It is available at [http://cran.r-project.org/web/packages/doBy/index.html].

plyr, available at [http://cran.r-project.org/web/packages/plyr/index.html], is a set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics.
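
To make the aggregate, tapply and break-down/operate/combine ideas above concrete, here is a minimal sketch in base R only. It does not use doBy or plyr; the choice of the built-in state.x77 and mtcars data, the mpg ~ wt model, and the names region_means, income_by_region, pieces, fits and combined are all assumptions made for illustration.

```r
# Illustrative sketch using built-in datasets; not code from the original post.

# aggregate(): mean of every column of state.x77 within each region
region_means <- aggregate(state.x77, list(Region = state.region), mean)

# tapply(summary variable, group variable, function)
income_by_region <- tapply(state.x77[, "Income"], state.region, mean)

# The pattern that plyr packages up, written out by hand in base R:
pieces   <- split(mtcars, mtcars$cyl)            # break the data into pieces
fits     <- lapply(pieces, function(d) {         # operate on each piece:
  coef(lm(mpg ~ wt, data = d))                   #   fit one model per group
})
combined <- do.call(rbind, fits)                 # put the pieces back together

print(region_means[, c("Region", "Income")])
print(income_by_region)
print(combined)
```

Both aggregate and tapply should report the same per-region income means; aggregate returns a data frame with one row per region, while tapply returns a named array. The combined matrix has one row per cylinder group with that group's intercept and slope.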