A Short Introduction to R
1.1. Introduction
R is a software package used for data analysis and for the graphical representation of data. It can be used as a statistical tool, and it is also a programming language, which makes it highly flexible and easy to customize. Its graphical tools make R a good environment for exploratory data analysis and for creating publication-ready figures (exportable, for example, as .jpg files). All work is done through typed commands and functions, which makes R unlike menu-driven programs such as SPSS that offer point-and-click options for predefined statistical procedures.
Learning R is a bit tricky and is not really aimed at beginners. It takes substantial time to learn to use R, but once you are past the initial difficulties it is quite comfortable to handle. It is mainly for advanced users for whom the statistical functions of Microsoft Excel are no longer sufficient, for example if you would like to do a Principal Component Analysis. In contrast to SAS and SPSS, which are costly commercial programs for doing statistics, R is free software. It is distributed under the terms of the GNU General Public License (GPL).
The R Development Core Team maintains the base distribution of R. A large amount of further functionality is implemented in add-on packages authored and maintained by a large group of volunteers. The main source of information about the R system is the World Wide Web, with the official home page of the R project at:
http://www.R-project.org
All resources are accessible from this page: the R system itself, a
collection of add-on packages, manuals, documentation and more.
1.2. Installing R
The R system is made of two major parts: the base system and add-on packages contributed by users. The core R language is implemented in the base system, whereas implementations of statistical and graphical procedures are organized in the form of packages. A package is a collection of functions, examples and documentation, usually focused on a particular statistical methodology. The R software is distributed by the Comprehensive R Archive Network (CRAN), accessible at
http://CRAN.R-project.org
1.2.1 The Base System and the First Steps
Download the precompiled binary for your operating system from CRAN and install it on the local machine. Windows users should follow the step-by-step instructions given by the installer to complete the installation.
How R is started depends on the operating system. On Windows, click on the R icon created by the installer; on Unix systems, type 'R' in the shell.

The user can change the appearance of the prompt by:
>options(prompt = "R> ")
1.2.2 Packages
The base distribution of R comes with these packages:
Matrix boot
lattice mgcv
rpart survival
KernSmooth MASS
base class
cluster codetools
compiler datasets
foreign grDevices
graphics grid
methods nlme
nnet parallel
spatial splines
stats stats4
tcltk tools
utils
These packages implement standard statistical functionality, such as classical tests, linear models, and a vast collection of high-level plotting functions. Packages that are not shipped with the base distribution can be installed directly from the R prompt.
For Windows users, precompiled versions of the packages are available; just download and install them. On Unix systems, packages are first compiled locally and then installed.
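As a rough sketch of working with packages from the prompt, the snippet below checks which packages are installed and attaches one of them. The package name MASS is taken from the list above; the install.packages() call is left commented out because it needs an internet connection, and "sandwich" is only an example name chosen for illustration.

```r
# Names of all packages currently installed in the library paths.
installed <- rownames(installed.packages())

# Base packages such as stats are always present.
has_stats <- "stats" %in% installed

# Attach MASS only if it is installed; requireNamespace() returns
# TRUE or FALSE instead of stopping with an error.
if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)
}

# Installing a contributed package (example name, needs a network connection):
# install.packages("sandwich")
```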
1.3. Getting Started
R is a command-line based language in which all commands are entered directly at the console. In its simplest form, R can be used as a substitute for a pocket calculator. Type 4+3 into the console and press the Enter key. Here is what appears on the screen:
> 4+3
[1] 7
>
Here the result is 7. The [1] says, "the first requested element follows". Here, there is just one element. The > indicates that R is ready for another command.
Other simple operators include
4-3 # Subtraction
4*3 # Multiplication
4/3 # Division
4^3 # Exponentiation
sqrt(3) # Square root
log(3) # Logarithm (to the base e)
One can use multiple operators, e.g.
(4-3) * 2
first subtracts 3 from 4 and then multiplies the result by 2.
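A small sketch of how brackets change the order of evaluation; the variable names and numbers are only illustrative.

```r
with_brackets    <- (4 - 3) * 2   # subtraction first, then multiplication
without_brackets <- 4 - 3 * 2     # multiplication binds tighter than subtraction
power            <- 4^3           # exponentiation

print(c(with_brackets, without_brackets, power))
```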
Exit or quit command:
>q()
If commands are stored in an external file, say commands.R in the working directory, they may be executed at any time in an R session with the command
> source("commands.R")
On Windows, Source is also available on the File menu.
The function sink,
> sink("record.lis")
will divert all subsequent output from the console to an external file, record.lis. The command
> sink()
restores it to the console once again.
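The following sketch runs source() and sink() end to end. To avoid touching the working directory it writes to temporary files created with tempfile(); the file contents and the variable names a, b, script and record are made up for the illustration.

```r
# Write two commands to a temporary script file.
script <- tempfile(fileext = ".R")
writeLines(c("a <- 4 + 3", "b <- a * 2"), script)

# Execute the stored commands; a and b exist afterwards.
source(script)

# Divert printed output to a temporary file, then restore the console.
record <- tempfile(fileext = ".lis")
sink(record)
print(b)
sink()

# Read back what was recorded while the output was diverted.
recorded <- readLines(record)
print(recorded)
```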
1.4. Some information on R commands
Like most UNIX-based packages, R is a case-sensitive language with a simple syntax. Case sensitive means that in R capital A and small a are different symbols and refer to different variables.
The set of symbols that can be used in R names depends on the operating system and the country (locale) in which R is being run. Normally all alphanumeric symbols are allowed (and in some countries this includes accented letters) plus '.' and '_', with the restriction that a name must start with '.' or a letter, and if it starts with '.' then the second character cannot be a digit.
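A brief sketch of both rules. The base function make.names() is not mentioned in the text above; it is used here only as a convenient way to test whether a string is already a syntactically valid name, since it returns valid names unchanged.

```r
# Case sensitivity: A and a are two different objects.
A <- 10
a <- 2
difference <- A - a

# Candidate names: ".2z" breaks the rule about a digit after a leading dot.
candidates <- c("x1", ".y", ".2z", "my_var")

# make.names() leaves valid names unchanged and repairs invalid ones.
is_valid <- make.names(candidates) == candidates
print(is_valid)
```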
Separating Commands
Commands are separated either by a new line or by a semicolon (';'). Elementary commands can be grouped together into one compound expression by braces ('{' and '}').
Adding comments
A comment starts with a hashmark ('#'); everything from there to the end of the line is a comment.
If a command is not complete at the end of a line, R gives a different prompt, by default + on the second and subsequent lines, and continues to read input until the command is syntactically complete. Command lines entered at the console are limited to about 4095 bytes (not characters).
R allows recalling and re-executing previous commands. With the vertical arrow keys on the keyboard one can scroll forward and backward through the command history. Once a command is located, the cursor can be moved within the command with the horizontal arrow keys, and characters can be removed with the DEL key or added with the other keys.
1.5. Special Values
NA
In R, NA values are used to signify missing values. NA stands for "not available". You will find NA values in text loaded into R or in data loaded from databases (where they replace NULL values).
When you extend the size of a vector, matrix or array, the new positions will have the value NA:
> s <- c(5,7,9,11)
> s
[1] 5 7 9 11
> length(s) <- 6
> s
[1] 5 7 9 11 NA NA
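A short sketch of how missing values behave in later calculations, continuing with the vector s from above. The functions is.na() and the na.rm argument are standard base R but are not introduced in the text, so treat this as a supplementary illustration.

```r
s <- c(5, 7, 9, 11)
length(s) <- 6               # the two new positions are NA

missing    <- is.na(s)       # TRUE where a value is missing
plain_sum  <- sum(s)         # NA: a missing value propagates
usable_sum <- sum(s, na.rm = TRUE)  # drop the NA values first

print(missing)
print(usable_sum)
```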
Inf and -Inf
If the result of a calculation is a number too large to be represented, R returns Inf for a positive number and -Inf for a negative number:
> 3^1250
[1] Inf
> -3^1250
[1] -Inf
Dividing a non-zero number by 0 also returns one of these values:
> 3 / 0
[1] Inf
1.6. Objects
When we carry out a simple calculation, R does not remember the result: the answer is displayed in the console window, and to use it in further calculations you need to give it a name and store it as an object in R.
answer <- 3+2
tells R to add 3 and 2 and store the result in an object called answer. To retrieve the value stored in answer, just write the name of the object:
answer
The symbol in the middle, <-, is the assignment symbol. It consists of a "less than" sign and a hyphen, and it looks like an arrow pointing towards "answer". The symbol means "make the object on the left into the output of the command on the right".
In early versions of R, and in S-Plus, the underscore character could be used for assignment, so if you try to use old S-Plus code in R you can figure out why it doesn't work.
Objects can be used in calculations just like the numbers used above.
answer2 <- (5.5+2)^2
answer+answer2
[1] 61.25
You can store the result as another object.
answer3 <- answer2/answer
answer3
[1] 11.25
When you first start R, you will not find any objects stored, but once
you start using it for a while there might be several. You can get a list of
what’s there by using the ls() function
ls()
[1] "answer" "answer2" "answer3"
To remove an object from R's memory, use the rm() function.
rm(answer2)
Notice that when you type this it doesn’t ask you if you’re sure, or give
you any other sort of warning, nor does it let you know whether it’s done as
you asked. The object you asked it to remove has just gone: you can confirm
this by using ls() again.
ls()
[1] "answer" "answer3"
It's removed, sure enough. If you try to remove an object that doesn't exist, R issues a warning message. You will often notice while using R that you type in a command and simply get the command prompt back; that means there was no error.
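The steps of this section can be replayed as one short script. The call to exists() is an addition for illustration; it simply reports whether an object of that name is still present.

```r
answer  <- 3 + 2             # store a result under a name
answer2 <- (5.5 + 2)^2
answer3 <- answer2 / answer  # objects can be combined like numbers

rm(answer2)                  # removes the object without asking

# FALSE once the object is gone (only the current environment is searched).
still_there <- exists("answer2", inherits = FALSE)
print(answer3)
print(still_there)
```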
1.7. Functions
R is a programming language, not just a statistical package, although it is mostly used for carrying out statistical analyses. R comes with a wide variety of short, ready-made pieces of code designed for tasks such as managing data, performing complex mathematical operations on data, drawing graphs and carrying out statistical analyses ranging from the simple and straightforward to the eye-wateringly complex. These ready-made pieces of code are called functions. A call to a function ends in a pair of brackets, and to use the more straightforward functions all you have to do is type the name of the function and put the name of the object you'd like the procedure carried out on in the brackets.
The natural log of 15
>log(15)
[1] 2.70805
e raised to the power 5
>exp(5)
[1] 148.4132
Square root of 64
>sqrt (64)
[1] 8
Absolute (i.e. unsigned) value of −5
>abs (-5)
[1] 5
For more complex calculations, the argument of the function (the bit between the brackets) can itself be a calculation:
sin(15+answer)
returns the sine of 15 plus whatever the value of the object "answer" is.
To ensure that complex calculations are carried out in the right order, use brackets within the function's brackets:
exp((x*3)^(1/3))
returns the value of e raised to the power of (x multiplied by 3) raised to the power 1/3.
Functions can be used for creating new objects:
P <- 1/sqrt(y)
creates an object called "P" that has the value of 1 divided by the square root of the value of the object y.
So far we have only discussed functions that have a single argument between the brackets. To control the way a function operates, you can add further arguments, separated by commas. These extra arguments can modify the way the function is applied, tell it which part of a dataset to use, or specify how it should deal with missing data points. Here is an example: the function round() rounds a number to a certain number of decimal places. Type the number between the brackets after the function name, and specify how many decimal places to round to by adding a second argument, digits=, using a comma to separate it from the first argument.
>round(19.7564, digits=2)
[1] 19.76
>round(17.4325, digits=1)
[1] 17.4
Most R functions have default values specified for most of their arguments. If you do not give a number of digits for round(), R returns the number rounded to no decimal places.
>round(13.7784)
[1] 14
Some other examples:
>logb(15, base=2.5)
[1] 2.955449
Here we have specified to calculate the logarithm of 15 to the base 2.5.
>signif(pi, digits=4)
[1] 3.142
>signif(pi, digits=2)
[1] 3.1
In the above examples the arguments are named explicitly.
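A minimal sketch of the three ways of supplying the digits argument of round(): by name, by position, and not at all (falling back to the default). The variable names are made up for the example.

```r
by_name     <- round(19.7564, digits = 2)  # argument given by name
by_position <- round(19.7564, 2)           # same argument given by position
by_default  <- round(19.7564)              # digits falls back to its default, 0

print(c(by_name, by_position, by_default))
```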
1.8.1. Vectors
A vector represents a sequence of data elements of the same basic type. Members of a vector are officially called components.
R operates on named data structures. The simplest such structure is the numeric vector, a single entity consisting of an ordered collection of numbers. To set up a vector named p, consisting of four numbers, namely 11.5, 6.8, 5.2, and 25.8, use the R command
> p <- c(11.5, 6.8, 5.2, 25.8)
This is an assignment statement using the function c(). In this context c() can take an arbitrary number of vector arguments. The value of c() is a vector obtained by concatenating its arguments end to end.
Assignment can also be done using the function assign(). An equivalent way of making the same assignment as above is:
> assign("p", c(11.5, 6.8, 5.2, 25.8))
Assignments can also be made in the other direction, using the obvious change in the assignment operator. The same assignment could be made using
> c(11.5, 6.8, 5.2, 25.8) -> p
If an expression is used as a complete command, its value is printed and lost. So if we use the command
> 1/p
the reciprocals of the four values are printed at the terminal (and the value of p is unchanged):
[1] 0.08695652 0.14705882 0.19230769 0.03875969
The further assignment
> y <- c(p, 0, p)
creates a vector y with 9 entries consisting of two copies of p with a zero in the middle place.
[1] 11.5 6.8 5.2 25.8 0.0 11.5 6.8 5.2 25.8
1.8.2. Vector Arithmetic
Vectors can be used in arithmetic expressions, in which case the operations are performed element by element. Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector in the expression; shorter vectors are recycled as often as needed until they match that length.
>p<-4.5
> q<-6.25
>p+q
[1] 10.75
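The element-by-element rule and the recycling of shorter vectors can be seen in a small sketch. The vector p reuses the numbers from section 1.8.1, while q is a made-up two-element vector.

```r
p <- c(11.5, 6.8, 5.2, 25.8)
q <- c(1, 2)

doubled  <- p * 2   # the single number 2 is applied to every element
recycled <- p + q   # q is reused as 1, 2, 1, 2 to match the length of p

print(doubled)
print(recycled)
```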
The basic arithmetic operators are +, -, *, / and ^ for raising to a power. In addition, all of the common arithmetic functions are available: log, exp, sin, cos, tan, sqrt, and so on. max and min select the largest and smallest elements of a vector respectively. range returns a vector of length two, namely c(min(p), max(p)). length(p) is the number of elements in p, sum(p) gives the total of the elements in p, and prod(p) their product.
The statistical function mean(p) calculates the sample mean, which is the same as sum(p)/length(p), and var(p) gives sum((p-mean(p))^2)/(length(p)-1), the sample variance.
sort(p) returns a vector of the same size as p with the elements arranged in increasing order. There are other, more flexible sorting facilities available (see order() or sort.list(), which produce a permutation to do the sorting).
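The identities quoted above can be checked directly. This sketch reuses the four numbers from section 1.8.1; the names mean_by_hand and var_by_hand are only illustrative.

```r
p <- c(11.5, 6.8, 5.2, 25.8)

# The built-in summaries and their hand-written equivalents.
mean_by_hand <- sum(p) / length(p)
var_by_hand  <- sum((p - mean(p))^2) / (length(p) - 1)

# order() gives the permutation that sorts p; indexing with it sorts the vector.
perm   <- order(p)
sorted <- p[perm]

print(range(p))
print(perm)
print(sorted)
```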
In most cases the user will not be concerned whether the "numbers" in a numeric vector are integers, reals or even complex. Internally, calculations are done as double precision real numbers, or as double precision complex numbers if the input data are complex.
To work with complex numbers, supply an explicit complex part. Thus
sqrt(-17)
gives NaN and a warning message:
[1] NaN
Warning message:
In sqrt(-17) : NaNs produced
But
sqrt(-17+0i)
will do the computation with complex numbers:
[1] 0+4.123106i
1.8.3. Generating regular sequences
R has a number of facilities for generating commonly used sequences of numbers. For example, 1:20 is the vector c(1, 2, ..., 19, 20). The colon operator (:) has high priority within an expression, so, for example, 2*1:10 is the vector c(2, 4, ..., 18, 20).
> 1:20
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> 2*1:10
[1] 2 4 6 8 10 12 14 16 18 20
The construction 10:1 generates a sequence backwards.
> 10:1
[1] 10 9 8 7 6 5 4 3 2 1
The function seq() is a more general facility for generating sequences. It has five arguments, only some of which need to be specified in any one call. The first two arguments, if given, specify the beginning and end of the sequence; seq(1,20) is the same vector as 1:20.
> seq(1,20)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Arguments can also be given in named form. The first two arguments can be named from=value and to=value, so seq(1,10), seq(from=1, to=10) and seq(to=10, from=1) are all the same as 1:10.
The next two arguments to seq() can be named by=value and length=value; they specify a step size and a length for the sequence respectively. If neither of them is given, the default by=1 is assumed.
For example
>seq(2, 3, by=.2) -> p
>p
[1] 2.0 2.2 2.4 2.6 2.8 3.0
Similarly
> p1 <- seq(length=6, from=2, by=.2)
> p1
[1] 2.0 2.2 2.4 2.6 2.8 3.0
generates the same vector in p1.
The fifth argument can be named along=vector. It is normally used as the only argument and creates the sequence 1, 2, ..., length(vector), or the empty sequence if the vector is empty.
A related function is rep(), which, as the name suggests, replicates an object in various ways. The simplest form is
> p2 <- rep(p, times=3)
> p2
[1] 2.0 2.2 2.4 2.6 2.8 3.0 2.0 2.2 2.4 2.6 2.8 3.0 2.0 2.2 2.4 2.6 2.8 3.0
which puts three copies of p end-to-end in p2. Another useful version is
> p3 <- rep(p, each=3)
> p3
[1] 2.0 2.0 2.0 2.2 2.2 2.2 2.4 2.4 2.4 2.6 2.6 2.6 2.8 2.8 2.8 3.0 3.0 3.0
which repeats each element of p three times before moving on to the next.
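The seq() and rep() variants can be compared side by side in a short sketch. It spells out the full argument names length.out and along.with, which the shorter forms length= and along= used above abbreviate; the small input vectors are made up for the example.

```r
# Two ways of describing the same sequence from 2 to 3 in steps of 0.2.
by_step   <- seq(2, 3, by = 0.2)
by_length <- seq(from = 2, by = 0.2, length.out = 6)

# along.with produces the index sequence 1, 2, ..., length(vector).
index <- seq(along.with = c("a", "b", "c"))

# times repeats the whole vector, each repeats every element in turn.
whole_vector <- rep(1:3, times = 2)
each_element <- rep(1:3, each = 2)

print(index)
print(whole_vector)
print(each_element)
```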
1.8.4. Logical Vectors
As well as numerical vectors, R allows manipulation of logical quantities. The elements of a logical vector can have the values TRUE, FALSE, and NA. T and F are just variables that are set to TRUE and FALSE by default, but they are not reserved words and can be overwritten by the user. Hence, you should always use TRUE and FALSE.
For example,
> x <- c(1,2,3)
> y <- c(5,6,3)
> x == y
[1] FALSE FALSE TRUE
The logical operators are <, <=, >, >=, == for exact equality and != for inequality. In addition, if c1 and c2 are logical expressions, then c1 & c2 is their intersection ("and"), c1 | c2 is their union ("or"), and !c1 is the negation of c1.
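A small sketch combining comparisons with the "and", "or" and negation operators, using the same x and y as in the example above; the result names are only illustrative.

```r
x <- c(1, 2, 3)
y <- c(5, 6, 3)

both    <- (x < y) & (x > 1)    # TRUE only where both conditions hold
either  <- (x == y) | (x < 2)   # TRUE where at least one condition holds
negated <- !(x == y)            # flips every element

print(both)
print(either)
print(negated)
```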
1.8.5. Character Vectors
Character vectors are widely used in R. Their elements are character strings, written as a sequence of characters delimited by the double quote character, e.g., "y-values", "Old Calculations".
Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes).
The c() function is used to concatenate character vectors.
The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings. The arguments are by default separated in the result by a single blank character.
> pr <- paste(c("X","Y"), 1:10, sep="")
makes pr into the character vector
> pr
[1] "X1" "Y2" "X3" "Y4" "X5" "Y6" "X7" "Y8" "X9" "Y10"
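The effect of the sep argument can be seen by calling paste() with and without it. The strings here are made up for illustration.

```r
# Default separator: a single blank between the pieces.
spaced <- paste("Old", "Calculations")

# An empty separator glues the pieces together; the shorter vector is recycled.
glued <- paste(c("X", "Y"), 1:4, sep = "")

# Any other string can serve as separator.
joined <- paste(c("X", "Y"), 1:4, sep = "_")

print(spaced)
print(glued)
print(joined)
```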
1.9. Matrices and arrays
A matrix is a two-dimensional array of numbers. In R, a matrix can be made of elements of any one type, for example a matrix of character strings. Matrices and arrays are simply vectors with dimensions:
> x <- 1:9
> dim(x) <- c(3,3)
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
A convenient way to create matrices is to use the matrix function:
> matrix(1:9,nrow=3,byrow=T)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
The byrow=T switch causes the matrix to be filled row-wise rather than column-wise.
The transposition function t (notice the lowercase t as opposed to the uppercase T for TRUE) turns rows into columns and vice versa:
> x <- matrix(1:9,nrow=3,byrow=T)
> rownames(x) <- LETTERS[1:3]
> x
  [,1] [,2] [,3]
A    1    2    3
B    4    5    6
C    7    8    9
The transpose of the matrix is:
> p <- t(x)
> p
     A B C
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
The character vector LETTERS is a built-in variable that contains the capital letters A–Z.
One can join vectors together, column-wise or row-wise, with the cbind and rbind functions.
> cbind(P=1:4,Q=5:8,R=9:12)
     P Q  R
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> rbind(P=1:4,Q=5:8,R=9:12)
  [,1] [,2] [,3] [,4]
P    1    2    3    4
Q    5    6    7    8
R    9   10   11   12
The operator '*' multiplies two matrices element by element, so both matrices must have the same dimensions. (True matrix multiplication uses the %*% operator.)
> p*x
      A  B  C
[1,]  1  8 21
[2,]  8 25 48
[3,] 21 48 81
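To make the difference between the two kinds of product concrete, here is a sketch on a small 2 x 2 matrix. The matrix m is made up for the example.

```r
m <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)

element_wise   <- m * m      # each entry multiplied by the matching entry
matrix_product <- m %*% m    # rows times columns, as in linear algebra

print(element_wise)
print(matrix_product)
```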
1.10. Factors
Statistical data often include categorical variables that specify a subdivision of the data, like social class, tumor stage, Tanner stage of puberty, primary diagnosis, etc. Such variables are typically recorded with a numeric code and are represented as factors in R.
A factor has a set of levels, say four levels for concreteness. Internally, a four-level factor consists of two items: (a) a vector of integers between 1 and 4 and (b) a character vector of length 4 containing the strings that describe the levels. Here is an example:
> unique <- c(0,4,1,1,2)
> funique <- factor(unique,levels=0:3)
> levels(funique) <- c("none","more","medium","large")
The first command creates a numeric vector, unique, encoding the levels of five values. To treat this as a categorical variable, we create a factor funique from it using the function factor. This is called with one argument in addition to unique, namely levels=0:3, which specifies that the input coding uses the values 0–3. Because the value 4 is not among these levels, it becomes NA in the factor. The final line changes the level names to the four indicated character strings.
> funique
[1] none   <NA>   more   more   medium
Levels: none more medium large
> as.numeric(funique)
[1]  1 NA  2  2  3
> levels(funique)
[1] "none"   "more"   "medium" "large"
1.11. Lists
A list is used for combining a collection of objects into a larger composite object. A list is constructed from its elements with the function list.
For example, consider a set of data, and place the data in two vectors as
follows:
> A <- c(7900,7090,2680,5170,6300,
+ 4875,6508,7010,6535,6250,6790)
> B <- c(5990,7270,4880,5290,5849,
+ 4640,5160,6995,7595,6005,5331)
Notice how the input lines are broken and continue on the next line. If the user presses the Enter key while an expression is syntactically incomplete, R assumes that the expression continues on the next line and changes its normal > prompt to the continuation prompt +. If this happens unintentionally, either complete the expression on the next line or press ESC (Windows) or Ctrl-C (Unix). The "Stop" button can also be used under Windows.
To combine these individual vectors into a list:
> Totallist <- list(before=A,after=B)
> Totallist
$before
[1] 7900 7090 2680 5170 6300 4875 6508 7010 6535 6250 6790
$after
[1] 5990 7270 4880 5290 5849 4640 5160 6995 7595 6005 5331
Named elements may be extracted like this:
> Totallist$before
[1] 7900 7090 2680 5170 6300 4875 6508 7010 6535 6250 6790
Many built-in functions in R compute more than a single vector of values and return their results in the form of a list.
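Besides the $ notation, list elements can be reached with double square brackets. The sketch below uses shortened, made-up versions of the two vectors; names() and [[ ]] are standard base R but are not covered in the text above.

```r
A <- c(7900, 7090, 2680)   # shortened example data
B <- c(5990, 7270, 4880)

Totallist <- list(before = A, after = B)

element_names <- names(Totallist)       # names of the components
by_dollar     <- Totallist$after        # extraction with $
by_brackets   <- Totallist[["after"]]   # the same element with [[ ]]

print(element_names)
print(by_brackets)
```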
1.12. Data Frames
A data frame is a two-dimensional array-like structure. Each column holds the values of one variable and each row contains one set of values from each column.
The basic characteristics of a data frame are as follows.
The column names cannot be left empty.
The row names must be unique.
The data stored in a data frame can be of numeric, factor or character type.
Each column should contain the same number of data items.
Data frames help in managing tabular data; a data frame is the natural way to represent such data sets in R. The columns may differ in type, but every column in the data frame must have the same length:
>data.frame(a=c(1,2,3,4,5,6),b=c(1,2,3,4,5))
Error in data.frame(a = c(1, 2, 3, 4, 5, 6), b = c(1, 2, 3, 4, 5)) :
arguments imply differing number of rows: 6, 5
Here is a simple example of a data frame, showing the top travel countries:
> top_travel_countries <- data.frame(
+ country=c("India","Egypt","Norway","Switzerland",
+ "New Zealand"),
+ rank=c(1,2,18,
+ 15,25)
+ )
Here is what this data frame contains:
> top_travel_countries
      country rank
1       India    1
2       Egypt    2
3      Norway   18
4 Switzerland   15
5 New Zealand   25
Data frames are implemented as lists with class data.frame:
>typeof(top_travel_countries)
[1] "list"
>class(top_travel_countries)
[1] "data.frame"
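Because a data frame is a list of equal-length columns, its columns can be extracted with $ and its rows selected with a logical condition. This is a sketch on a shortened, made-up version of the table; stringsAsFactors = FALSE is passed explicitly so that the country column is a character vector in every R version.

```r
travel <- data.frame(
  country = c("India", "Egypt", "Norway"),
  rank    = c(1, 2, 18),
  stringsAsFactors = FALSE
)

n_rows    <- nrow(travel)           # number of rows
ranks     <- travel$rank            # one column, as a plain vector
top_ten   <- travel[travel$rank < 10, ]   # rows whose rank is below 10

print(n_rows)
print(top_ten)
```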
1.12.1 Names and Indexing
R objects can also have names, which helps in writing readable code and self-describing objects. For example, here we create a vector holding the integer sequence 1, 2, 3; by default, it has no names.
> x <- 1:3
> names(x)
NULL
> names(x) <- c("foo", "bar", "norf")
> x
 foo  bar norf
   1    2    3
> names(x)
[1] "foo"  "bar"  "norf"
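Once a vector has names, its elements can be indexed by name as well as by position. A minimal sketch, continuing the example above; unname() is used only to strip the name from the result.

```r
x <- 1:3
names(x) <- c("foo", "bar", "norf")

by_position <- x[2]                  # second element, name kept
by_name     <- x["bar"]              # the same element, selected by name
several     <- x[c("foo", "norf")]   # more than one name at once

print(by_name)
print(several)
print(unname(by_name))
```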
1.13. Objects and Classes
1.13.1. Description
The simple generic functions of R can be used for an object-oriented style of programming. Method dispatch takes place based on the class of the first argument to the generic function.
1.13.2. Usage
class(x)
class(x) <- value
unclass(x)
inherits(x, what, which = FALSE)
oldClass(x)
oldClass(x) <- value
1.13.3. Arguments
x: an R object.
what, value: a character vector naming classes. value can also be NULL.
which: logical affecting return value: see 'Details'.
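A minimal sketch of method dispatch on the class of the first argument. The generic describe(), its methods and the class name "reading" are all hypothetical names invented for this illustration; only class(), inherits(), unclass(), structure() and UseMethod() are real base R functions.

```r
# A hypothetical generic function with two methods.
describe <- function(x, ...) UseMethod("describe")
describe.reading <- function(x, ...) paste("reading of", x$value)
describe.default <- function(x, ...) "something else"

# An object whose class attribute is set to the hypothetical class "reading".
obj <- structure(list(value = 42), class = "reading")

dispatched <- describe(obj)   # uses describe.reading
fallback   <- describe(1)     # no method for numeric, so describe.default

# inherits() tests class membership; which = TRUE reports positions instead.
is_reading <- inherits(obj, "reading")
positions  <- inherits(obj, c("foo", "reading"), which = TRUE)

# unclass() strips the class attribute, leaving a plain list.
plain <- unclass(obj)

print(dispatched)
print(positions)
```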
Summary
After completing this chapter, you will know how to get started with R. The chapter covers installation and first steps, the R prompt and how to customize it, and the different data types in R.
The chapter also explains operators, objects and factors, how functions are used, and what object classes and methods are.