In this post, I would like to give a brief introduction to R. R has gained tremendous popularity in the recent past, and many people want to know just what R is without getting into too many details. So here it is.
What is R?
R is a programming environment for statistical analysis. It consists of a programming language, graphics capabilities for visualising data, interfaces to other languages and a debugging environment. R is open source, which means it is free to download, install and use under the GNU General Public License. Although R refers to the complete set of tools for data analysis, in this post "R" refers to the R programming language.
What’s its history?
R originated from the S language and has many similarities with it. S was developed in 1975 at Bell Laboratories as a statistical programming language. Other variations of S were developed later on, including New S and S-PLUS, the latter of which is sold as commercial software. R was started by Robert Gentleman and Ross Ihaka at the University of Auckland in the early 1990s as a tool for teaching statistics, and a core team of developers has coordinated its development since 1997. New features have been added to R continuously ever since.
Why use R?
Many statistical analysis packages are available on the market. Notable examples include SAS, SPSS, STATISTICA, Stata, Minitab, Mathematica and MATLAB. Despite these, R has gained popularity among data miners and statisticians because of the following advantages.
• R is free.
• Data analysis in R is written like a computer program. Hence, unlike analysis done in some point-and-click tools, analysis performed in R is repeatable and documented.
• Because the analysis is code, it is easier to share and communicate than analysis done in many other statistical packages.
• R has excellent graphics and visualisation capabilities built in, and new data visualisation techniques are continuously being added.
• All the standard statistical techniques and models are built right into R.
• R code can be extended and reused by creating packages. There are packages for seemingly every task, from generating an R data analysis report in Word or PDF to cutting-edge machine learning algorithms.
• R has a dedicated following in academia and the general data mining community. The community contributes new packages; more than 2000 packages are available, catering to almost every need of R users. With such a large and diverse community, help on R is always available.
• Being open source, R can integrate easily with other programming languages and platforms.
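To make the "analysis as a program" point concrete, here is a minimal sketch of a scripted analysis. It assumes the `mtcars` sample data set that ships with base R (my choice for illustration, not something specific to this post) and uses only built-in functions.

```r
# mtcars is a sample data set bundled with base R (an illustrative choice).
data(mtcars)

# Descriptive statistics for fuel economy (miles per gallon).
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Simple linear regression: fuel economy explained by car weight.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Welch two-sample t-test: do automatic and manual cars differ in mpg?
test <- t.test(mpg ~ am, data = mtcars)
print(test$p.value)
```

Because every step is written down, rerunning the script reproduces the same numbers, and the script itself documents what was done.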
What is R not?
• R is not a GUI-based data analysis tool. For every data analysis, the user must write code, and the code must be executed in sequence.
• R is not a general-purpose programming language like C, C++, C# or Python. Although R supports many constructs of general-purpose programming languages, such as loops, functions and variables, it is specifically geared towards statistical analysis.
• R is not meant for guided analysis in the manner of OLAP or tools such as Tableau, QlikView and PowerPivot. In all these technologies, the user is presented with a dataset which they can slice, dice or filter. With R, the user is forced to think about the analysis beforehand, since they have to write code for each step of the analysis. This method has both pros and cons. An advantage is that the user has to understand the dataset a little better up front, which makes the analysis more structured. A disadvantage is that the user might miss some obvious patterns in the dataset.
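As a rough illustration of the general-purpose constructs mentioned above (variables, functions and loops), here is a small sketch. The data values and the `to_metres` helper are hypothetical, made up for this example.

```r
# A variable holding a numeric vector (hypothetical sample values).
heights_cm <- c(150, 162, 171, 180, 195)

# A user-defined function with a default argument.
to_metres <- function(cm, digits = 2) {
  round(cm / 100, digits)
}

# An explicit for loop, much as you would write in a C-style language.
total <- 0
for (h in heights_cm) {
  total <- total + h
}
loop_mean <- total / length(heights_cm)

# The more idiomatic R version works on the whole vector at once.
vector_mean <- mean(heights_cm)

print(to_metres(heights_cm))
print(isTRUE(all.equal(loop_mean, vector_mean)))
```

The loop works, but most R code leans on vectorised functions such as `mean()`, which reflects how the language is geared towards data analysis.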
Who uses R, and where?
With recent advances in Big Data technologies, interest in R has increased significantly. While the list is not exhaustive, industries that use R most prominently include media and advertising, finance, e-commerce, academia and bioinformatics.
Some of the applications of R include:
1. Data mining
2. Recommender systems
3. Quantitative finance and automated trading
4. Predictive modelling
5. Statistical modelling
I know C#. Is R similar?
Not really. R is, at heart, a functional programming language, whereas C# is primarily object-oriented. R can have a steep learning curve, and it expects a basic understanding of statistics.
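To give a flavour of the functional style, here is a minimal sketch using base R functions only. The names `square`, `make_multiplier` and `triple` are made up for this example.

```r
# Functions are ordinary values: they can be stored in variables and passed around.
square <- function(x) x^2

# sapply applies a function to each element and collects the results.
squares <- sapply(1:5, square)

# Anonymous functions can be written inline; Filter keeps matching elements.
evens <- Filter(function(x) x %% 2 == 0, 1:10)

# Reduce folds a vector into a single value using a binary function.
total <- Reduce(`+`, squares)

# A function that returns another function (a closure).
make_multiplier <- function(k) function(x) k * x
triple <- make_multiplier(3)

print(squares)    # 1 4 9 16 25
print(evens)      # 2 4 6 8 10
print(total)      # 55
print(triple(7))  # 21
```

If you have used lambdas and LINQ in C#, the ideas will feel familiar, but in R this style is the default way of working rather than an add-on.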
Can you show me some examples of R programs and graphs?
Many resources on the internet show charts generated using R. The links below contain charts along with the code used to generate them.
Also check out the related links below.
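For a quick taste, here is a small sketch that draws two charts with base R graphics. It assumes the built-in `mtcars` data set again (an illustrative choice); the titles and colours are arbitrary.

```r
# mtcars ships with base R; the choice of data set is purely illustrative.
data(mtcars)

# Histogram of fuel economy. hist() draws the chart and also returns the bins.
h <- hist(mtcars$mpg,
          main = "Fuel economy in mtcars",
          xlab = "Miles per gallon",
          col = "lightblue")

# Scatter plot of weight against fuel economy, with a fitted regression line.
plot(mtcars$wt, mtcars$mpg,
     main = "Weight vs. fuel economy",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     pch = 19)
abline(lm(mpg ~ wt, data = mtcars), col = "red", lwd = 2)
```

Run interactively, each call opens a plot window. Run as a script, R writes the charts to a file on its default graphics device instead.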
Where should I get started?
The best way to learn R is to download it and play with it. Apart from that, numerous beginner tutorials are available on the internet to get you started. If you want to try R without installing anything, you can use tryR, where R code can be executed in the browser window.
1. The R language home page, where you can download R binary packages and installers: The Comprehensive R Archive Network (CRAN) http://cran.r-project.org/
2. The place to ask for help on R. Please read the posting guide before sending anything to the mailing lists. R Mailing List page http://www.r-project.org/mail.html
3. A quick intro to R from CRAN itself: An Introduction to R http://cran.r-project.org/doc/manuals/R-intro.pdf
4. A brief intro and quick reference on the R language: icebreaker notes http://www.ms.unimelb.edu.au/~andrewpr/r-users/
5. You can try R and learn it here without installing anything. The code runs in the browser. tryR http://tryr.codeschool.com/
6. Step-by-step tutorials: R-bootcamp http://jaredknowles.com/r-bootcamp/
7. Two-minute video tutorials on R topics. Fun if you are short on time. Twotorials http://www.twotorials.com/
8. A good collection of videos on R for beginner to advanced users: Video tutorials on R http://jeromyanglim.blogspot.co.uk/2010/05/videos-on-data-analysis-with-r.html
9. Step-by-step links for getting started with R: Learning R blog http://jeromyanglim.blogspot.co.uk/2009/06/learning-r-for-researchers-in.html
10. A blog on techniques in R and the latest happenings in the R world: R-bloggers http://www.r-bloggers.com/ Also of interest: http://blog.revolutionanalytics.com/
11. This Stack Exchange discussion contains links to various resources on R: Stack Exchange resources http://stats.stackexchange.com/questions/138/resources-for-learning-r
12. A more comprehensive book on R: Introduction to Probability and Statistics Using R http://cran.r-project.org/web/packages/IPSUR/vignettes/IPSUR.pdf
13. If you prefer more structured learning, this is probably the best option for learning R: Coursera course on Data Science https://www.coursera.org/specialization/#jhudatascience/1?utm_medium=catalogSpec Videos from the previous course are here: http://www.r-bloggers.com/videos-from-courseras-four-week-course-in-r/
14. A wiki page explaining what functional programming is: Functional Programming http://en.wikipedia.org/wiki/Functional_programming