Friday, July 1, 2016

A Short Introduction to R

1.1. Introduction
R is a software package used for data analysis and the graphical representation of data. It can be used as a statistical tool, but it is also a programming language, which makes it highly flexible and easy to customize. Its graphical tools make R a good environment for exploratory data analysis and for creating publication-ready figures (exportable, for example, as .jpg files). All work is done with text commands, which makes R unlike window-style programs such as SPSS that use menus with point-and-click options for predefined statistical procedures.
Learning R is a bit tricky and takes substantial time, but once you are past the first hurdles it is quite convenient to handle. It is not aimed at beginners; it is basically for advanced users for whom the statistical functions of Microsoft Excel are no longer sufficient, for example those who would like to do a Principal Component Analysis. In contrast to SAS and SPSS, which are costly commercial programs for doing statistics, R is free software. It is distributed under the terms of the GNU General Public License (GPL).
The R Development Core Team maintains the base distribution of R. A large amount of further functionality is implemented in add-on packages authored and maintained by a large group of volunteers. The main source of information about the R system is the World Wide Web, with the official home page of the R project at:
http://www.R-project.org
All resources are accessible from this page: the R system itself, a collection of add-on packages, manuals, documentation and more.
1.2. Installing R
The R system is made of two major parts: the base system and add-on packages contributed by users. The core R language is implemented in the base system, whereas implementations of statistical and graphical procedures are organized in the form of packages. A package is a collection of functions, examples and documentation, usually focused on a particular statistical methodology. The R software is distributed by the Comprehensive R Archive Network (CRAN), accessible at
http://CRAN.R-project.org
1.2.1 The Base System and the First Steps
Download the precompiled binary for your operating system and install it on the local machine. Windows users will find the installer linked from the CRAN page given above.
Follow the step-by-step instructions given by the installer and you are done with the installation.
How R is started depends on the operating system. One can start it by clicking on the R symbol created by the installer (Windows) or by typing ‘R’ on the shell (Unix systems).

The user can change the appearance of the prompt by:
> options(prompt = "R> ")
1.2.2 Packages
The base distribution of R comes with the following packages:
Matrix      boot        lattice     mgcv
rpart       survival    KernSmooth  MASS
base        class       cluster     codetools
compiler    datasets    foreign     grDevices
graphics    grid        methods     nlme
nnet        parallel    spatial     splines
stats       stats4      tcltk       tools
utils
These packages implement standard statistical functionality, such as classical tests and linear models, as well as a large collection of high-level plotting functions. Packages that are not supplied with the base distribution can be installed directly from the R prompt.
For Windows users, precompiled versions of the packages are available; they only need to be downloaded and installed. On Unix systems, packages are first compiled locally and then installed.
1.3. Getting Started
R is a command-line based language, where all commands are entered directly. In its simplest form R can be used as a substitute for a pocket calculator. Type 4+3 into the console and press the Enter key. Here is what appears on the screen:

> 4+3
[1] 7
> 
Here the result is 7. The [1] says that the first requested element follows. Here, there is just one element. The > indicates that R is ready for another command.
Other simple operators and functions include
4-3 # Subtraction
4*3 # Multiplication
4/3 # Division
4^3 # Exponentiation
sqrt(3) # Square root
log(3) # Logarithm (to the base e)
One can combine several operators, e.g.
(4-3) * 2
first subtracts 3 from 4 and then multiplies the result by 2.
To exit (quit) R, use the command:
> q()
If commands are stored in an external file, say commands.R in the working directory, they may be executed at any time in an R session with the command
> source("commands.R")
On Windows, Source is also available on the File menu. The function sink,
> sink("record.lis")
will divert all subsequent output from the console to an external file, record.lis. The command
> sink()
restores it to the console once again.
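The use of source() and sink() can be tried out without touching the working directory. The following is a minimal sketch; the script and log files are hypothetical and are created in a temporary directory with tempfile(), and the object names a and b are chosen only for illustration.

```r
# Hypothetical script and log files, created in a temporary directory
script  <- tempfile(fileext = ".R")
logfile <- tempfile(fileext = ".lis")

# Write two commands to the script file
writeLines(c("a <- 4 + 3", "b <- a * 2"), script)

# Execute the commands stored in the file
source(script)

# Divert console output to the log file, print something, then restore
sink(logfile)
print(b)
sink()

# The printed output is now in the file rather than on the console
captured <- readLines(logfile)
captured
```

After sink() is called without arguments, output appears on the console again, so the last line shows what was written to the file.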
1.4. Some Information on R Commands
Like most UNIX-based packages, R is case sensitive, and its syntax is simple. Case sensitive means that in R the capital A and the small a are different symbols and would refer to different variables.
The set of symbols that can be used in R names depends on the operating system and the country in which R is being run. Normally all alphanumeric symbols are allowed (and in some countries this includes accented letters) plus ‘.’ and ‘_’, with the restriction that a name must start with ‘.’ or a letter, and if it starts with ‘.’ the second character must not be a digit.
Separating Commands
Commands are separated either by a new line or by a semicolon. Elementary commands can be grouped together into one compound expression by braces (‘{’ and ‘}’).
Adding comments
A comment starts with a hash mark (‘#’); everything from there to the end of the line is a comment.
If a command is not complete at the end of a line, R gives a different prompt, by default + on the second and subsequent lines, and continues to read input until the command is syntactically complete. Command lines entered at the console are limited to about 4095 bytes (not characters).
R allows recalling and re-executing previous commands. With the vertical arrow keys on the keyboard one can scroll forward and backward through the command history. Once a command is located, one can move the cursor within the command with the horizontal arrow keys, and characters can be removed with the DEL key or added with the other keys.
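The rules on case sensitivity, separators, braces and comments described in this section can be sketched as follows. All object names here (A, a, total, first, second, .hidden) are made up for illustration.

```r
A <- 1; a <- 2        # two commands on one line; A and a are different objects
A == a                # FALSE

# Braces group several commands into one compound expression;
# its value is the value of the last command inside
total <- {
  first  <- 10
  second <- 20
  first + second
}
total

# A name may start with a dot, as long as a digit does not follow it
.hidden <- "ok"
.hidden
```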
1.5. Special Values
NA
In R, the value NA is used to signify missing values. NA stands for “not available”. You will find NA values in text loaded into R or in data loaded from databases (where they replace NULL values).
When you expand the size of a vector, matrix or array beyond what was defined, the new positions will have the value NA:
> s <- c(5,7,9,11)
> s
[1]  5  7  9 11
> length(s) <- 6
> s
[1]  5  7  9 11 NA NA
Inf and -Inf
If a calculation produces a number that is too large to be represented, R returns Inf for a positive number and -Inf for a negative number:
> 3^1250
[1] Inf
> -3^1250
[1] -Inf
Dividing a nonzero number by 0 also returns one of these values:
> 3 / 0
[1] Inf
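Special values can be detected with a few helper functions from base R. The sketch below reuses the vector s from this section and is meant only as an illustration of is.na(), is.infinite(), is.finite() and the na.rm argument of mean().

```r
s <- c(5, 7, 9, 11)
length(s) <- 6            # pads the vector with NA

is.na(s)                  # which positions are missing
mean(s)                   # NA: arithmetic involving NA gives NA
mean(s, na.rm = TRUE)     # 8, the mean of the non-missing values

big <- c(3^1250, -3^1250, 3 / 0)
is.infinite(big)          # TRUE TRUE TRUE
is.finite(c(1, Inf, NA))  # TRUE FALSE FALSE
```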

1.6. Objects
A simple calculation does not produce output that R remembers: the answer is displayed in the console window and then lost. To use an answer in further calculations you need to give it a name and store it as an object in R.
answer <- 3+2
tells R to add 3 and 2 and store the result in an object called answer. To retrieve the value stored in answer, just type the name of the object:
answer
The symbol in the middle, <-, is the assignment symbol. It consists of a “less than” sign and a hyphen, and it looks like an arrow pointing towards “answer”. The symbol means “make the object on the left into the output of the command on the right”.
In earlier versions of R, and in S-Plus, the underscore character was used for assignment, so if you try to use old S-Plus code in R and it doesn’t work, this may be the reason.
Objects can be used in calculations just like the numbers used above.
answer2 <- (5.5+2)^2
answer+answer2
[1] 61.25
You can store the result as another object.
answer3 <- answer2/answer
answer3
[1] 11.25
When you first start R, no objects are stored, but after you have used it for a while there may be several. You can get a list of what’s there by using the ls() function.
ls()
[1] "answer"  "answer2" "answer3"
To remove an object from R’s memory, use the rm() function.
rm(answer2)
Notice that when you type this, R doesn’t ask you if you’re sure or give you any other sort of warning, nor does it let you know whether it has done as you asked. The object you asked it to remove has just gone: you can confirm this by using ls() again.
ls()
[1] "answer"  "answer3"
It’s removed, sure enough. If you try to delete an object that doesn’t exist, you will receive a warning message. You will often notice while using R that when you type in a command the prompt simply appears again; that means there was no error.
1.7. Functions
R is a programming language rather than only a statistical package, and it is used for carrying out statistical analyses. It comes with a great variety of short ready-made pieces of code designed for tasks such as managing data, performing complex mathematical operations, drawing graphs and carrying out statistical analyses ranging from the simple and straightforward to the eye-wateringly complex. These ready-made pieces of code are called functions. The name of each function is followed by a pair of brackets, and to use the more straightforward functions all you have to do is type the name of the function and put the name of the object you’d like the procedure carried out on in the brackets.
The natural log of 15
> log(15)
[1] 2.70805
e raised to the power 5
> exp(5)
[1] 148.4132
Square root of 64
> sqrt(64)
[1] 8
Absolute (i.e. unsigned) value of −5
> abs(-5)
[1] 5
For more complex calculations, make the argument of the function (the bit between the brackets) a calculation itself:
> sin(15+answer)
returns the sine of 15 plus whatever the value of the object “answer” is.
To ensure that complex calculations are done in the right order, use brackets within the function’s brackets:
> exp((x*3)^(1/3))
returns the value of e raised to the power of whatever the value of x is, multiplied by 3 and then raised to the power 1/3.
Functions can also be used when creating new objects:
> P <- 1/sqrt(y)
creates an object called “P” that has the value of 1 divided by the square root of the value of the object y.
So far we have only discussed functions with a single argument between the brackets. To control the way a function operates, you can add further arguments, separated by commas. These extra arguments may modify the way the function is applied, tell it which part of a dataset to use, or specify how it should deal with missing data points. Here is an example: the function round() rounds off a number to a certain number of decimal places. Type the number between the brackets after the function name, and specify how many decimal places to round to by adding a second argument, digits=, using a comma to separate it from the first argument.
> round(19.7564, digits=2)
[1] 19.76
> round(17.4325, digits=1)
[1] 17.4
Most R functions have default values specified for most of their arguments. If you do not specify a number of digits for round(), R returns the number rounded off to no decimal places.
> round(13.7784)
[1] 14
Some other examples:
> logb(15, base=2.5)
[1] 2.955449
Here we have asked for the logarithm of 15 to the base 2.5.
> signif(pi, digits=4)
[1] 3.142
> signif(pi, digits=2)
[1] 3.1
In the above examples the extra argument is named explicitly.
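The difference between positional and named arguments can be sketched as follows. The object names by_position and by_name are made up for illustration; round(), logb() and args() are the base R functions.

```r
# Positional and named forms of the same call give the same result
by_position <- round(19.7564, 2)
by_name     <- round(19.7564, digits = 2)
identical(by_position, by_name)   # TRUE

# Named arguments may be given in any order
logb(base = 2, x = 8)             # logarithm of 8 to the base 2

# args() shows a function's arguments and their defaults
args(round)
```

Naming the arguments makes a call easier to read, and it avoids mistakes when a function has many optional arguments.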
1.8.1. Vectors
A vector is a sequence of data elements of the same basic type. The members of a vector are officially called components.
R operates on named data structures. The simplest such structure is the numeric vector, a single entity consisting of an ordered collection of numbers. To set up a vector named p, consisting of four numbers, namely 11.5, 6.8, 5.2, and 25.8, use the R command
> p <- c(11.5, 6.8, 5.2, 25.8)
This is an assignment statement using the function c(). In this context c() can take an arbitrary number of vector arguments, and its value is a vector obtained by concatenating its arguments end to end.
Assignment can also be done by using the function assign(). An equivalent way of making the same assignment as above is:
> assign("p", c(11.5, 6.8, 5.2, 25.8))
Assignments can also be made in the other direction, using the obvious change in the assignment operator. The same assignment could be made using
> c(11.5, 6.8, 5.2, 25.8) -> p
If an expression is used as a complete command, the value is printed and lost. So if we use the command
> 1/p
the reciprocals of the four values are printed at the terminal
[1] 0.08695652 0.14705882 0.19230769 0.03875969
and the value of p is unchanged. The further assignment
> y <- c(p, 0, p)
creates a vector y with 9 entries consisting of two copies of p with a zero in the middle place.
> y
[1] 11.5  6.8  5.2 25.8  0.0 11.5  6.8  5.2 25.8
1.8.2. Vector Arithmetic
Vectors can be used in arithmetic expressions, in which case the operations are performed element by element. Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector that occurs in the expression; shorter vectors are recycled as often as needed until they match that length.
> p <- 4.5
> q <- 6.25
> p+q
[1] 10.75
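The behaviour with vectors of different lengths is easiest to see in a small example. This is a sketch with made-up vectors long and short.

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is reused as 10, 20, 10, 20 before the addition
long + short      # 11 22 13 24

# A single number is a vector of length one and is reused in the same way
long * 2          # 2 4 6 8
```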
The basic arithmetic operators are +, -, *, / and ^ for raising to a power. In addition, all of the common arithmetic functions are available: log, exp, sin, cos, tan, sqrt, and so on. max and min select the largest and smallest elements of a vector respectively. range(p) is a vector of length two, namely c(min(p), max(p)). length(p) is the number of elements in p, sum(p) gives the total of the elements in p, and prod(p) their product.
Two statistical functions are mean(p), which calculates the sample mean and is the same as sum(p)/length(p), and var(p), which gives sum((p-mean(p))^2)/(length(p)-1), the sample variance.
sort(p) returns a vector of the same size as p with the elements arranged in increasing order. More flexible sorting facilities are also available (see order() or sort.list(), which produce a permutation to do the sorting).
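The identities stated above can be checked directly. The following sketch uses the four-element vector p from section 1.8.1; each pair of lines should print the same value.

```r
p <- c(11.5, 6.8, 5.2, 25.8)

range(p)                      # same as c(min(p), max(p))
c(min(p), max(p))

mean(p)                       # same as sum(p) / length(p)
sum(p) / length(p)

var(p)                        # same as the formula for the sample variance
sum((p - mean(p))^2) / (length(p) - 1)

sort(p)                       # elements in increasing order
order(p)                      # permutation that sorts p
p[order(p)]                   # same as sort(p)
```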
In most cases the user will not be concerned whether the “numbers” in a numeric vector are integers, reals or even complex. Internally, calculations are done as double precision real numbers, or double precision complex numbers if the input data are complex.
To work with complex numbers, supply an explicit complex part. Thus
> sqrt(-17)
[1] NaN
Warning message:
In sqrt(-17) : NaNs produced
gives NaN and a warning, but
> sqrt(-17+0i)
does the computation as complex numbers:
[1] 0+4.123106i
1.8.3. Generating regular sequences
R has a number of facilities for generating commonly used sequences of numbers. For example, 1:20 is the vector c(1, 2, ..., 19, 20). The colon operator (:) has high priority within an expression, so, for example, 2*1:10 is the vector c(2, 4, ..., 18, 20).
> 1:20
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> 2*1:10
 [1]  2  4  6  8 10 12 14 16 18 20
The construction 10:1 generates a sequence backwards.
> 10:1
 [1] 10  9  8  7  6  5  4  3  2  1
The function seq() is a more general facility for generating sequences. It has five arguments, only some of which need to be specified in any one call. The first two arguments, if given, specify the beginning and end of the sequence, so seq(1,20) is the same vector as 1:20.
> seq(1,20)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Arguments can also be given in named form. The first two arguments may be named from=value and to=value; thus seq(1,10), seq(from=1, to=10) and seq(to=10, from=1) are all the same as 1:10.
The next two arguments to seq() may be named by=value and length=value; they specify a step size and a length for the sequence respectively. If neither of these is given, the default by=1 is assumed.
For example
> seq(2, 3, by=.2) -> p
> p
[1] 2.0 2.2 2.4 2.6 2.8 3.0
Similarly
> p1 <- seq(length=6, from=2, by=.2)
> p1
[1] 2.0 2.2 2.4 2.6 2.8 3.0
generates the same vector in p1.
The fifth argument may be named along=vector; it is used to create the sequence 1, 2, ..., length(vector), or the empty sequence if the vector is empty.
A related function is rep(), which, as the name suggests, can be used for replicating an object in various ways. The simplest form is
> p2 <- rep(p, times=3)
> p2
 [1] 2.0 2.2 2.4 2.6 2.8 3.0 2.0 2.2 2.4 2.6 2.8 3.0 2.0 2.2 2.4 2.6 2.8 3.0
which puts three copies of p end-to-end in p2. Another useful version is
> p3 <- rep(p, each=3)
> p3
 [1] 2.0 2.0 2.0 2.2 2.2 2.2 2.4 2.4 2.4 2.6 2.6 2.6 2.8 2.8 2.8 3.0 3.0 3.0
which repeats each element of p three times before moving on to the next.
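A few further forms of seq() and rep() may be useful. This is a small sketch with a made-up vector v; the full names of the arguments are along.with and length.out, which the shorter along= and length= used in this section match by partial matching, and seq_along() is a base R helper for the same purpose.

```r
v <- c("a", "b", "c")

seq(along.with = v)        # 1 2 3
seq_along(v)               # the same sequence, using the dedicated helper
seq_along(character(0))    # an empty sequence for an empty vector

seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00

rep(1:2, times = 3)        # 1 2 1 2 1 2
rep(1:2, each = 3)         # 1 1 1 2 2 2
rep(1:2, times = c(3, 1))  # 1 1 1 2
```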
1.8.4. Logical Vectors
R also supports the manipulation of logical quantities. The elements of a logical vector can have the values TRUE, FALSE, and NA. TRUE and FALSE are often abbreviated as T and F, but T and F are just variables that are set to TRUE and FALSE by default; they are not reserved words and can be overwritten by the user. Hence, you should always use TRUE and FALSE. For example,
> x <- c(1,2,3)
> y <- c(5,6,3)
> x==y
[1] FALSE FALSE  TRUE
The logical operators are <, <=, >, >=, == for exact equality and != for inequality. In addition, if c1 and c2 are logical expressions, then c1 & c2 is their intersection (“and”), c1 | c2 is their union (“or”), and !c1 is the negation of c1.
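The operators listed above all work element by element. The sketch below applies them to the vectors x and y from this section.

```r
x <- c(1, 2, 3)
y <- c(5, 6, 3)

x < y               # TRUE TRUE FALSE
x != y              # TRUE TRUE FALSE
(x > 1) & (y > 3)   # FALSE TRUE FALSE
(x > 2) | (y > 5)   # FALSE TRUE TRUE
!(x == y)           # TRUE TRUE FALSE

# In arithmetic, TRUE counts as 1 and FALSE as 0
sum(x == y)         # 1: number of positions where x and y agree
```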


1.8.5. Character Vector
Character vectors are used frequently in R. A character string is written as a sequence of characters delimited by the double quote character, e.g., "y-values", "Old Calculations".
Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes).
The c() function is used to concatenate character vectors.
The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings. By default the arguments are separated in the result by a single blank character, but this can be changed with the named argument sep.
> pr <- paste(c("X","Y"), 1:10, sep="")
makes pr into the character vector
> pr
 [1] "X1"  "Y2"  "X3"  "Y4"  "X5"  "Y6"  "X7"  "Y8"  "X9"  "Y10"
Note that the shorter vector c("X","Y") is recycled five times to match the sequence 1:10.
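The effect of the separator can be seen in a few short calls. This is a sketch; the collapse argument, which joins all elements of the result into a single string, is part of base R's paste() but is not discussed in the text above.

```r
paste("Old", "Calculations")              # "Old Calculations" (separator is a blank)
paste("y", "values", sep = "-")           # "y-values"
paste(c("X", "Y"), 1:4, sep = "")         # "X1" "Y2" "X3" "Y4"
paste(c("a", "b", "c"), collapse = "+")   # "a+b+c"
```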
1.9. Matrices and arrays
A matrix is a two-dimensional array of numbers. In R, however, a matrix can hold elements of any one type, for example character strings. Matrices and arrays are simply vectors with dimensions:
> x <- 1:9
> dim(x) <- c(3,3)
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
A convenient way to create matrices is to use the matrix function:
> matrix(1:9,nrow=3,byrow=T)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
The byrow=T switch causes the matrix to be filled row-wise rather than column-wise.
The transposition function t (notice the lowercase t as opposed to the uppercase T for TRUE) turns rows into columns and vice versa:
> x <- matrix(1:9,nrow=3,byrow=T)
> rownames(x) <- LETTERS[1:3]
> x
  [,1] [,2] [,3]
A    1    2    3
B    4    5    6
C    7    8    9
The transpose of this matrix is:
> p <- t(x)
> p
     A B C
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
The character vector LETTERS is a built-in variable that contains the capital letters A–Z.
One can join vectors together, column-wise or row-wise, with the cbind and rbind functions.
> cbind(P=1:4,Q=5:8,R=9:12)
     P Q  R
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> rbind(P=1:4,Q=5:8,R=9:12)
  [,1] [,2] [,3] [,4]
P    1    2    3    4
Q    5    6    7    8
R    9   10   11   12
The operator ‘*’ multiplies two matrices element by element, so both matrices must have the same dimensions. (Matrix multiplication in the linear-algebra sense uses the operator %*%.)
> p*x
      A  B  C
[1,]  1  8 21
[2,]  8 25 48
[3,] 21 48 81
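The difference between the element-wise product and the matrix product can be sketched with two small matrices. The names a and b are made up for this illustration.

```r
a <- matrix(1:4, nrow = 2)             # columns are (1, 2) and (3, 4)
b <- matrix(c(0, 1, 1, 0), nrow = 2)   # swaps the two columns when multiplied from the right

a * b      # element-wise product
a %*% b    # matrix product: rows of a times columns of b
```

With these values the element-wise product keeps only the off-diagonal entries of a, while the matrix product returns a with its two columns exchanged.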

1.10. Factors
Statistical data often contain categorical variables that specify some subdivision of the data, such as social class, tumor stage, Tanner stage of puberty, primary diagnosis, etc. Such variables are typically entered using a numeric code, and in R they should be specified as factors.
A factor has a set of levels, say four for concreteness. Internally, a four-level factor consists of two items: (a) a vector of integers between 1 and 4 and (b) a character vector of length 4 containing strings that describe the four levels. Here is an example:
> unique <- c(0,4,1,1,2)
> funique <- factor(unique,levels=0:3)
> levels(funique) <- c("none","more","medium","large")
The first command creates a numeric vector, unique, encoding the levels of five observations. To treat it as a categorical variable, we create a factor funique from it using the function factor. This is called with one argument in addition to unique, namely levels=0:3, which specifies that the input coding uses the values 0–3. The value 4 lies outside this range, so it becomes NA in the factor. The final line changes the level names to the four indicated character strings.
> funique
[1] none   <NA>   more   more   medium
Levels: none more medium large
> as.numeric(funique)
[1]  1 NA  2  2  3
> levels(funique)
[1] "none"   "more"   "medium" "large"
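The level names can also be supplied in the same call that creates the factor. The following is a sketch under the assumption that all codes lie within 0 to 3; the vector codes is hypothetical, and the labels argument of factor() is used in place of a separate levels<- assignment.

```r
codes <- c(0, 3, 1, 1, 2)                       # hypothetical codes, all within 0 to 3
f <- factor(codes, levels = 0:3,
            labels = c("none", "more", "medium", "large"))

levels(f)        # the four level names
as.integer(f)    # internal integer codes: 1 4 2 2 3
table(f)         # number of observations at each level
```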

1.11. Lists
A list is used for combining a collection of objects into a larger composite object. A list is built with the function list.
For example, consider a set of data placed in two vectors as follows:
> A <- c(7900,7090,2680,5170,6300,
+ 4875,6508,7010,6535,6250,6790)
> B <- c(5990,7270,4880,5290,5849,
+ 4640,5160,6995,7595,6005,5331)
Notice how the input lines are broken and continue on the next line. If you press the Enter key while an expression is syntactically incomplete, R assumes that the expression continues on the next line and changes its normal > prompt to the continuation prompt +. If this happens by mistake, either complete the expression on the next line or press ESC (Windows) or Ctrl-C (Unix). The “Stop” button can also be used under Windows.
To combine these individual vectors into a list:
> Totallist <- list(before=A,after=B)
> Totallist
$before
 [1] 7900 7090 2680 5170 6300 4875 6508 7010 6535 6250 6790
$after
 [1] 5990 7270 4880 5290 5849 4640 5160 6995 7595 6005 5331
Named elements may be extracted like this:
>Totallist$before
 [1] 7900 7090 2680 5170 6300 4875 6508 7010 6535 6250 6790
Many of R’s built-in functions compute more than a single vector of values and return their results in the form of a list.
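Besides the $ notation, list components can be reached with double square brackets. The sketch below rebuilds Totallist from this section; rle() is used only as one example of a base R function whose result is a list, and the name runs is made up.

```r
A <- c(7900, 7090, 2680, 5170, 6300, 4875, 6508, 7010, 6535, 6250, 6790)
B <- c(5990, 7270, 4880, 5290, 5849, 4640, 5160, 6995, 7595, 6005, 5331)
Totallist <- list(before = A, after = B)

names(Totallist)           # "before" "after"
Totallist[["after"]]       # same vector as Totallist$after
Totallist[[1]][1:3]        # first three values of the first component
sapply(Totallist, length)  # length of each component

# rle() returns its result as a list with two components
runs <- rle(c(1, 1, 2, 2, 2))
names(runs)                # "lengths" "values"
```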

1.12. Data Frames
A data frame is a two-dimensional, array-like structure in which each column holds the values of one variable and each row holds one set of values, one from each column.
The basic characteristics of a data frame are as follows.
The column names should be non-empty.
The row names should be unique.
The data stored in a data frame can be of numeric, factor or character type.
Each column should contain the same number of data items.
Data frames help in managing tabular data; a data frame is the natural way to represent such data sets in R.
The columns may differ in type, but every column in the data frame must have the same length:
> data.frame(a=c(1,2,3,4,5,6),b=c(1,2,3,4,5))
Error in data.frame(a = c(1, 2, 3, 4, 5, 6), b = c(1, 2, 3, 4, 5)) :
  arguments imply differing number of rows: 6, 5
Here is a simple example of a data frame, showing the top travel countries:
> top_travel_countries <- data.frame(
+ country=c("India","Egypt","Norway","Switzerland",
+ "New Zealand"),
+ rank=c(1,2,18,
+ 15,25)
+ )
Here is what this data frame contains:
> top_travel_countries
      country rank
1       India    1
2       Egypt    2
3      Norway   18
4 Switzerland   15
5 New Zealand   25
Data frames are implemented as lists with class data.frame:
> typeof(top_travel_countries)
[1] "list"
> class(top_travel_countries)
[1] "data.frame"
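Because a data frame is a list of columns, its columns can be extracted like list components, and its rows can be selected with a logical condition. The following is a minimal sketch; the data frame df and its values are made up for illustration.

```r
# Hypothetical example data, not taken from any real ranking
df <- data.frame(
  country = c("India", "Egypt", "Norway"),
  rank    = c(1, 2, 18)
)

df$rank              # a column is extracted like a list component
df[["country"]]      # the same, using list-style indexing
nrow(df); ncol(df)   # 3 rows, 2 columns
df[df$rank < 10, ]   # rows whose rank is below 10
str(df)              # compact summary of the column types
```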

1.12.1 Names and Indexing
R objects can also have names, which helps in writing readable code and self-describing objects. For example, here we create a vector containing the integer sequence
1, 2, 3
which by default has no names.
> x <- 1:3
> names(x)
NULL
> names(x) <- c("foo", "bar", "norf")
> x
 foo  bar norf 
   1    2    3 
> names(x)
[1] "foo"  "bar"  "norf"
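Once a vector has names, its elements can be selected by name as well as by position. A short sketch using the named vector x from this section:

```r
x <- 1:3
names(x) <- c("foo", "bar", "norf")

x["bar"]               # select by name
x[c("foo", "norf")]    # several names at once
x[2]                   # select by position; the name is kept
x[-1]                  # drop the first element
unname(x)              # the values without their names
```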


1.13. Objects and Classes
1.13.1. Description
R possesses a simple generic function mechanism which can be used for an object-oriented style of programming. Method dispatch takes place based on the class of the first argument to the generic function.
1.13.2. Usage
class(x)
class(x) <- value
unclass(x)
inherits(x, what, which = FALSE)

oldClass(x)
oldClass(x) <- value

1.13.3. Arguments
x                      an R object
what, value     a character vector naming classes. value can also be NULL.
which              logical affecting return value: see ‘Details’.
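The functions listed under Usage can be tried on a small object. This is a sketch only; the class name "myclass" and the print method defined for it are made up for illustration.

```r
x <- 1:3
class(x)                          # "integer"

# Assigning a class attribute; "myclass" is a made-up name
class(x) <- "myclass"
inherits(x, "myclass")            # TRUE
inherits(x, c("a", "myclass"), which = TRUE)  # 0 1: position of each match in class(x)

# A print method for the made-up class; dispatch is based on class(x)
print.myclass <- function(x, ...) {
  cat("<myclass object with", length(unclass(x)), "values>\n")
  invisible(x)
}
print(x)

unclass(x)                        # back to a plain integer vector
```

Calling print(x) here runs print.myclass, because the generic function print looks for a method matching the class of its first argument.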


Summary
After completing this chapter, you will know how to get started with R. The chapter covers help and documentation related to R, how to customize R, what the R prompt is, and what the different data types in R are.
The chapter also explains operators, objects and factors, how to create a new function, and what object classes and methods are.