Chapter 5 Vectors
The most basic data type in R is the vector.
As we mentioned previously, if we assign the number `42` to a variable named `x`, R will treat `x` as a vector.
## [1] 42
## [1] 42
- Here, `x` is considered to be a vector of length 1.
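A minimal sketch of the assignment described above. The calls to `length()` and `is.vector()` are added here for illustration and are not part of the original example.

```r
# Assign a single number to x; R stores it as a vector of length 1
x <- 42
x              # prints [1] 42
length(x)      # 1
is.vector(x)   # TRUE
```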
Technically, there are two kinds of vectors in R:
- atomic vectors
- lists
However, it is very common to refer to atomic vectors as simply “vectors” and to refer to lists as “lists”.
Vectors that are homogeneous (all elements have the same type) are technically referred to as atomic vectors in R.
- We will follow the common convention and refer to any atomic vector as simply a vector.
- R will always store data as a “collection”.
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1-Dimension | Atomic Vector | List |
2-Dimensions | Matrix | Data Frame |
>2-Dimensions | Multi-dimensional array | |
There is no “0-dimensional data” in R.
Even a single-valued object is considered to be a “vector” with length 1.
Source: http://adv-r.had.co.nz/Data-structures.html
5.1 Creating vectors in R
5.1.1 The concatenate function c()
The most straightforward way to create vectors in R is to use the concatenate function `c()`.
This links together a group of values into a single vector.
- You can also create a single vector from multiple vectors using `c()`.
Examples of using `c()` to create numeric vectors are:
## [1] 1 2 3
## [1] 1 2 3 4 5
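The output above is consistent with calls along the following lines. This is a sketch, and the variable names `x` and `y` are assumptions.

```r
# Combine individual values into one numeric vector
x <- c(1, 2, 3)
x                 # [1] 1 2 3

# Combine an existing vector with further values
y <- c(x, 4, 5)
y                 # [1] 1 2 3 4 5
```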
You are not limited to using numeric values with `c()`.
For example, you can use `c()` to create a vector of characters or logicals.
## [1] "cat" "dog" "hamster"
## [1] TRUE FALSE TRUE TRUE
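A short sketch of how character and logical vectors like the ones printed above can be built. The names `pets` and `flags` are hypothetical, and `typeof()` is used only to confirm the element type.

```r
# A character vector and a logical vector, both created with c()
pets  <- c("cat", "dog", "hamster")
flags <- c(TRUE, FALSE, TRUE, TRUE)

typeof(pets)    # "character"
typeof(flags)   # "logical"
```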
5.1.2 colon, seq, rep: Creating vectors with specific patterns
It is often very useful to be able to create vectors with specific patterns.
The colon operator `:` can be used to create a sequence of numbers.
The code `from:end` will create a vector of numbers starting at `from` and increasing (or decreasing) by 1 until reaching `end`.
Examples:
## [1] 1 2 3 4 5
## [1] 22 23 24 25 26 27 28
- You can also use the colon operator to create a decreasing sequence of numbers.
## [1] 0 -1 -2 -3 -4 -5
- You can even use a number with a decimal point as the starting or ending number (but this is not done that frequently).
## [1] 2.3 3.3 4.3 5.3 6.3
- Be careful when using something like `a:b-1` when creating a vector. The colon operator is evaluated before subtraction, so `a:b-1` means `(a:b)-1`, not `a:(b-1)`.
## [1] 0 1 2 3 4 5
## [1] 1 2 3 4 5
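The colon examples in this section can be sketched as follows. The specific endpoints are assumptions chosen to match the printed output.

```r
inc  <- 1:5        # 1 2 3 4 5
dec  <- 0:-5       # 0 -1 -2 -3 -4 -5
frac <- 2.3:6.3    # 2.3 3.3 4.3 5.3 6.3

# The colon operator binds more tightly than subtraction
a <- 1:6-1         # same as (1:6) - 1, giving 0 1 2 3 4 5
b <- 1:(6-1)       # same as 1:5, giving 1 2 3 4 5
```

Adding parentheses, as in `1:(6-1)`, makes the intended endpoints explicit.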
The function `seq` is useful for creating vectors that have desired starting and ending values.
`seq` provides more flexibility than the colon operator `:`.
You can use `seq` to create a sequence with an increment other than \(1\).
## [1] 1 3 5 7 9 11
## [1] 1 3 5 7 9
## [1] 1.00 3.54 6.08 8.62
- Use the `length.out` argument in `seq` to create an equally-spaced vector with a given length.
## [1] 1 2 3 4 5 6 7 8 9 10 11
## [1] 1 3 5 7 9 11
# using length.out is convenient
seq(21.5, 48.2, length.out=5) # don't have to work out correct increments
## [1] 21.500 28.175 34.850 41.525 48.200
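A sketch of `seq` calls using the `by` and `length.out` arguments. The argument values are assumptions picked to be consistent with the output printed in this section.

```r
s1 <- seq(1, 11, by = 2)           # 1 3 5 7 9 11
s2 <- seq(1, 10, by = 2)           # 1 3 5 7 9 (stops before exceeding 10)
s3 <- seq(1, 10, by = 2.54)        # 1.00 3.54 6.08 8.62

# length.out fixes the number of points instead of the increment
s4 <- seq(1, 11, length.out = 11)  # 1 2 3 ... 11
s5 <- seq(1, 11, length.out = 6)   # 1 3 5 7 9 11
```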
The `rep()` (replicate) function is very useful for creating vectors that have any kind of repeated pattern.
The basic form of `rep` is `rep(x, times)`.
- `rep` produces a vector which repeats the vector `x` a total of `times` times.
## [1] 7 7 7
## [1] 2 4 6 2 4 6 2 4 6
- Using `rep` inside of `c()`:
## [1] 10 11 12 2 4 6 2 4 6 2 4 6
- Using `rep` with the keyword `each` will repeat each element of `x` `each` times before moving on to the next element of `x`.
## [1] 2 2 2 2 4 4 4 4 6 6 6 6
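The `rep` patterns described above can be sketched as follows. The input values are assumptions that reproduce the printed output.

```r
r1 <- rep(7, times = 3)                   # 7 7 7
r2 <- rep(c(2, 4, 6), times = 3)          # 2 4 6 2 4 6 2 4 6

# rep() can be nested inside c()
r3 <- c(10, 11, 12, rep(c(2, 4, 6), 3))   # 10 11 12 2 4 6 2 4 6 2 4 6

# each repeats every element before moving on to the next one
r4 <- rep(c(2, 4, 6), each = 4)           # 2 2 2 2 4 4 4 4 6 6 6 6
```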
5.2 Subsets of vectors
5.2.1 Extracting vector elements
- You can extract the \(k^{th}\) element of a vector `x` by using square brackets: `x[k]`.
- For example:
## [1] 3
## [1] 100
- You can also extract the subset of elements whose indices are stored in the vector `vec_index` by using `x[vec_index]`.
- For example:
## [1] 1 5
## [1] 5 100 1250
- You can change the value of the \(k^{th}\) element of a vector by assigning to it, for example `x[k] <- value`.
## [1] 1 6 5 100
- You can also update multiple elements of a vector by placing a vector of indices inside brackets `[]`.
## [1] 10 10 10 100
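A sketch of extracting and updating elements by index. The vectors `x` and `y` are assumptions chosen to be consistent with the output shown in this section.

```r
x <- c(1, 3, 5, 100)
second <- x[2]               # 3
fourth <- x[4]               # 100
pair   <- x[c(1, 3)]         # 1 5

y <- c(1, 3, 5, 100, 1250)   # hypothetical longer vector
vec_index <- 3:5
tail_y <- y[vec_index]       # 5 100 1250

x[2] <- 6                    # x is now 1 6 5 100
after_one <- x
x[1:3] <- 10                 # x is now 10 10 10 100
```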
5.2.2 Subsetting with logical expressions
We described above how you can take a subset of a vector by specifying the vector indices that you want for your subset.
You can also subset a vector using a logical expression rather than explicitly specifying the indices you want.
x <- c(10, 2, 21, 15)
y <- x[x > 8] # returns all elements of x greater than 8
z <- x[x > 12] # returns all elements of x greater than 12
y
## [1] 10 21 15
z
## [1] 21 15
- You can think of the expression `x[x > 8]` as doing the following:
## [1] 10 21 15
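One way to picture the two steps behind `x[x > 8]` is sketched below. The intermediate names `keep` and `idx` are hypothetical.

```r
x <- c(10, 2, 21, 15)

# Step 1: the comparison produces one TRUE/FALSE per element
keep <- x > 8          # TRUE FALSE TRUE TRUE

# Step 2: elements where keep is TRUE are retained
by_mask <- x[keep]     # 10 21 15

# Equivalent route through the integer positions of the TRUE values
idx <- which(keep)     # 1 3 4
by_index <- x[idx]     # 10 21 15
```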
5.3 Useful methods for vectors
- The `length` function can tell you how many elements are in your vector:
## [1] 9 8 7 6 5 4 3 2 1 0
## [1] 10
## [1] "integer"
## [1] 45
- R has functions which allow you to compute all the well-known summary statistics from a numeric vector.
## [1] 3
## [1] 2.5
## [1] 1.581139
## [1] 5
## [1] 1
## [1] 3
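The output in this section is consistent with calls like the ones sketched here. The vectors `x` and `v` are assumptions.

```r
x <- 9:0
n_x    <- length(x)   # 10
type_x <- typeof(x)   # "integer"
sum_x  <- sum(x)      # 45

# Common summary statistics of a numeric vector
v <- 1:5
mean(v)     # 3
var(v)      # 2.5
sd(v)       # 1.581139
max(v)      # 5
min(v)      # 1
median(v)   # 3
```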
5.4 Vectors with different data types
As we mentioned before, R vectors are not limited to having numeric elements.
The main restriction for vectors is that they must have elements which are all the same type.
## [1] 1.0 2.5 42.0
## [1] "hello" "world" "biostat607"
## [1] TRUE FALSE FALSE
- You can “create” a vector that has mixed data types, but R will automatically convert the types of some of the elements so that all elements have the same type.
## [1] TRUE FALSE FALSE
x <- c(TRUE, FALSE, 2) ## contains logical and numeric values
print(x) ## R translates logical TRUE/FALSE into numeric 1/0
## [1] 1 0 2
x <- c(1, 2, "3") ## numeric + character
print(x) ## R translates numeric values into characters
## [1] "1" "2" "3"
x <- c(TRUE, 2, "3") ## logical + numeric + character
print(x) ## R translates logical and numeric values into characters
## [1] "TRUE" "2" "3"
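When types are mixed inside `c()`, R converts everything to the most general type present, in the order logical, integer, double, character. A small sketch that checks the result with `typeof()`:

```r
t1 <- typeof(c(TRUE, FALSE, 2))   # "double": logicals become 1/0
t2 <- typeof(c(TRUE, 2L))         # "integer"
t3 <- typeof(c(1, 2, "3"))        # "character"
t4 <- typeof(c(TRUE, 2, "3"))     # "character"
```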
5.4.1 Explicitly changing the data types
- You can convert a vector to another type using `as.logical`, `as.numeric`, or `as.character`.
## [1] FALSE TRUE TRUE TRUE
## [1] 1 0 1 0
## [1] "0" "1" "2" "3"
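A sketch of explicit conversions that would produce output like that shown above. The input vectors are assumptions.

```r
# Numeric to logical: 0 becomes FALSE, any other number becomes TRUE
l <- as.logical(c(0, 1, 2, 3))                   # FALSE TRUE TRUE TRUE

# Logical to numeric: TRUE becomes 1, FALSE becomes 0
n <- as.numeric(c(TRUE, FALSE, TRUE, FALSE))     # 1 0 1 0

# Numeric to character
s <- as.character(0:3)                           # "0" "1" "2" "3"
```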
- Sometimes conversion of a vector does not work
## When a character cannot be converted, it returns NA
## as an invalid number
as.numeric(c("123","12.3","123a"))
## Warning: NAs introduced by coercion
## [1] 123.0 12.3 NA
## [1] TRUE FALSE TRUE NA NA
## Warning: NAs introduced by coercion
## [1] 123 12 123 NA
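The last two results above are consistent with conversions like the following. The inputs are assumptions, and `suppressWarnings()` is used here only to keep the sketch quiet.

```r
# Strings that R does not recognise as logical values become NA
lg <- as.logical(c("TRUE", "FALSE", "T", "yes", "1"))
lg   # TRUE FALSE TRUE NA NA

# as.integer() truncates decimals and returns NA for non-numeric strings
int <- suppressWarnings(as.integer(c("123", "12.3", "123.9", "123a")))
int  # 123 12 123 NA
```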
5.5 Mathematical operations with vectors
- When doing mathematical operations with two vectors of the same length, R will perform addition, subtraction, multiplication, and division element-by-element.
## [1] 11 7 3
## [1] 10 10 0
## [1] 10 25 0
- Multiplying or dividing a vector by a single number multiplies (or divides) each element by that number.
## [1] 30 15 0 -15
## [1] 5.0 2.5 0.0 -2.5
- Adding a single number to a vector (or subtracting it from a vector) adds that number to (or subtracts it from) each element.
## [1] 13 8 3 -2
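A sketch of element-wise arithmetic. The vectors `x`, `y`, and `z` are assumptions chosen to match the output printed above.

```r
x <- c(10, 5, 0)
y <- c(1, 2, 3)
x + y    # 11 7 3
x * y    # 10 10 0
x^y      # 10 25 0

# A single number is applied to every element
z <- c(10, 5, 0, -5)
3 * z    # 30 15 0 -15
z / 2    # 5.0 2.5 0.0 -2.5
z + 3    # 13 8 3 -2
```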
5.5.1 Recycling rules
You can actually add/subtract vectors of different lengths in R.
When doing this, R recycles the values in the shorter vector.
- R will print out a warning message if the length of the longer vector is not a multiple of the length of the shorter vector.
Specifically, when R adds two vectors (say a \(=(a[1], \ldots, a[n_{a}])\) and b \(=(b[1], \ldots, b[n_{b}])\)) with lengths \(n_{a}\) and \(n_{b}\) respectively (with \(n_{a} < n_{b}\)), R returns a vector of length \(n_{b}\) whose \(j^{th}\) element is \[\begin{equation} a[((j - 1) \bmod n_{a}) + 1] + b[j], \qquad j = 1, \ldots, n_{b} \end{equation}\]
We can see an example of this recycling rule in R when we try to add a vector of length 3 with a vector of length 4:
## Warning in c(1, 2, 4) + c(6, 0, 9, 10): longer object length is not a multiple
## of shorter object length
## [1] 7 2 13 11
- You can think of the above code as adding the vector `c(1, 2, 4, 1)` with the vector `c(6, 0, 9, 10)`.
- Note that if we add a vector of length 3 with a vector of length 6 we will get no warning message.
- This is because \(6\) is a multiple of \(3\).
## [1] 7 2 13 11 13 16
The above code adds the vector `c(1, 2, 4, 1, 2, 4)` with the vector `c(6, 0, 9, 10, 11, 12)`.
I personally do not use recycling rules much when the length of **both vectors** is 2 or more.
It’s probably good to be aware of recycling rules if you are getting this type of warning message.
You may find it helpful to use these recycling rules if you are, for example, adding one vector with another vector that has a simple, repeating pattern.
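The recycling rule can be checked by building the recycled index by hand. This is only a sketch; the names `a`, `b`, `idx`, and `manual` are illustrative.

```r
a <- c(1, 2, 4)
b <- c(6, 0, 9, 10, 11, 12)

# Index into a that cycles 1, 2, 3, 1, 2, 3 along the length of b
idx <- ((seq_along(b) - 1) %% length(a)) + 1

manual <- a[idx] + b   # 7 2 13 11 13 16
auto   <- a + b        # R recycles a automatically, no warning since 6 is a multiple of 3
```

Both `manual` and `auto` hold the same values, which is what the element-wise description of recycling predicts.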
5.6 Set operations with vectors
You can also do set operations with vectors.
When working with set operations, you should think of the set associated with a vector as the collection of unique elements from that vector.
For example, consider two vectors `x` and `y`.
The “set” associated with `x` is \(\{1,2,3,4,5\}\) and the “set” associated with `y` is \(\{1,3,5,7,9\}\).
Then, the intersection of `x` and `y` using the `intersect` function in R is \(\{1, 3, 5\}\).
## [1] 1 3 5
- Similarly, the `union` function in R computes the “union” of `x` and `y`: \(\{1,2,3,4,5,7,9\}\)
## [1] 1 2 3 4 5 7 9
- One can compute the “set difference” of two sets using `setdiff`.
  - These are the elements that are in `x` but are not in `y`.
## [1] 2 4
- The operation `x %in% y` returns a logical vector the same length as `x` indicating whether or not each element of `x` belongs to the set of unique elements of `y`.
## [1] TRUE FALSE TRUE TRUE FALSE TRUE
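- The function `match` returns, for each element of `x`, the position of its first match in `y` (or `NA` if there is no match).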
## [1] 1 NA 2 2 NA 4
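The set operations above can be sketched as follows. The definitions of `x` and `y` are assumptions: they are one choice, containing repeated values, that is consistent with all of the output shown in this section.

```r
x <- c(1, 2, 3, 3, 4, 5)   # assumed values; unique elements are 1 2 3 4 5
y <- c(1, 3, 3, 5, 7, 9)   # assumed values; unique elements are 1 3 5 7 9

intersect(x, y)   # 1 3 5
union(x, y)       # 1 2 3 4 5 7 9
setdiff(x, y)     # 2 4
x %in% y          # TRUE FALSE TRUE TRUE FALSE TRUE
match(x, y)       # 1 NA 2 2 NA 4
```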
5.7 NA and is.na(): missing values in R
- Missing data in R is usually represented by the value `NA`. `NA` stands for “Not Available”.
- You can create a vector with NA values by just typing in `NA` for one of the vector elements.
## [1] "double"
- You can type in NA for either numeric or character variables.
- R will automatically convert everything to the appropriate type.
## [1] "character"
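A sketch showing that `NA` takes on the type of the vector it is placed in. The vectors `x` and `s` are assumptions.

```r
x <- c(1, 5, NA, 4, 7)
typeof(x)      # "double"

s <- c("a", NA, "c")
typeof(s)      # "character"

# On its own, NA is a logical value
typeof(NA)     # "logical"
```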
Many of the built-in R functions will return `NA` if the input numeric vector contains any `NA` values.
For example, if we try to compute the standard deviation of the vector `x`:
x <- c(1, 5, NA, 4, 7) # The third element of this vector is NA
mx <- sd(x) # mx will have the value NA
mx
## [1] NA
- You can compute the standard deviation of the non-NA values by including the argument `na.rm = TRUE`
## [1] 2.5
In the function `sd`, the argument `na.rm` is a good example of an argument with a default value.
You can see that `na.rm` has a default value by looking at the function definition for `sd`.
- The default value of `na.rm` is `FALSE`.
  - So, you need to include `na.rm = TRUE` if you want `sd` to ignore missing values.
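A sketch of the effect of `na.rm`, together with one way to inspect the default. Using `args()` and `formals()` here is a choice made for illustration; printing `sd` itself also shows the definition.

```r
x <- c(1, 5, NA, 4, 7)

with_na    <- sd(x)                 # NA, because na.rm defaults to FALSE
without_na <- sd(x, na.rm = TRUE)   # 2.5

# Inspect the argument list and the default value of na.rm
args(sd)
default_na_rm <- formals(sd)$na.rm  # FALSE
```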
5.7.1 The function is.na()
The function `is.na()` is often very useful when you’re working with data that has missing values.
When applied to a vector, `is.na()` will return a vector of logical values with the same length as the input vector.
The \(k^{th}\) element of `is.na(x)` will be `TRUE` if the \(k^{th}\) element of `x` is missing.
- Otherwise, the \(k^{th}\) element of `is.na(x)` will be `FALSE`.
## [1] FALSE FALSE FALSE TRUE FALSE TRUE
- You can also use `is.na()` directly on matrices and data frames.
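A sketch of common uses of `is.na()`. The vector `x` and matrix `m` are assumptions, with `x` chosen so that the missing values sit in positions 4 and 6 as in the output above.

```r
x <- c(3, 1, 4, NA, 5, NA)     # assumed values

miss <- is.na(x)               # FALSE FALSE FALSE TRUE FALSE TRUE
n_missing <- sum(miss)         # 2 missing values
observed  <- x[!miss]          # 3 1 4 5

# is.na() keeps the shape of a matrix
m <- matrix(c(1, NA, 3, 4), nrow = 2)
miss_m <- is.na(m)             # 2 x 2 logical matrix
```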
5.8 Exercises
- Suppose we define the vector `x` as `x <- 1:10`. What is the value of `x[seq(1, 10, by=2)][3]`?
- Suppose `x <- rep(c(1, 5, 10), each=3)`. What is the value of `sum(x[x > 5])`?
- Create a vector called `xvec` that stores the following sequence of numbers: \(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, \ldots\) and keeps repeating this pattern until the last number is \(10\).
  - What is the length of `xvec`?
  - What number is the \(35^{th}\) element of `xvec`?
  - What is the mean value of the numbers contained in `xvec`?
  - How many elements of `xvec` equal \(2\)? How many elements of `xvec` equal \(7\)?
  - What is the sum of all the even numbers contained in `xvec`?