Chapter 5 Vectors
The most basic data type in R is the vector.
As we mentioned previously, if we assign the number 42 to a variable named x, R will treat x as a vector.
## [1] 42
## [1] 42
- Here, x is considered to be a vector of length 1.
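As a quick check of the statement above, the following sketch assigns 42 to x and inspects it. The calls to is.vector() and length() are additions for illustration; they are not part of the original example.

```r
x <- 42        # assign a single number to x
x              # typing the name prints the value
print(x)       # print() shows the same value
is.vector(x)   # TRUE: a single number is still a vector
length(x)      # 1
```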
Technically, there are two kinds of vectors in R:
- atomic vectors
- lists
However, it is very common to refer to atomic vectors as simply “vectors” and to refer to lists as “lists”.
Vectors that are homogeneous (all elements have the same type) are technically referred to as atomic vectors in R.
- It is common to refer to any atomic vector as a vector.
- We will also refer to any atomic vector as a vector.
- R will always store data as a “collection”.
Dimension | Homogeneous | Heterogeneous |
1-Dimension | Atomic Vector | List |
2-Dimensions | Matrix | Data Frame |
>2-Dimensions | Multi-dimensional array |
There is no “0-dimensional data” in R.
Even a single-valued object is considered to be a “vector” with length 1.
5.1 Creating vectors in R
5.1.1 The concatenate function c()
The most straightforward way to create vectors in R is to use the concatenate function c().
This links together a group of values into a single vector.
- You can also create a single vector from multiple vectors using c().
Examples of using c() to create numeric vectors are:
## [1] 1 2 3
## [1] 1 2 3 4 5
You are not limited to using numeric values with c().
For example, you can use c() to create a vector of characters or logicals.
## [1] "cat" "dog" "hamster"
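A minimal sketch of the uses of c() described above. The variable names y, pets, and flags are hypothetical, and the values are chosen to be consistent with the printed output shown in this section.

```r
x <- c(1, 2, 3)                      # numeric vector
y <- c(x, 4, 5)                      # combine an existing vector with new values
pets <- c("cat", "dog", "hamster")   # character vector
flags <- c(TRUE, FALSE, TRUE)        # logical vector

print(x)
print(y)
print(pets)
print(flags)
```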
5.1.2 colon, seq, rep: Creating vectors with specific patterns
It is often very useful to be able to create vectors with specific patterns.
The colon operator : can be used to create a sequence of numbers.
The code from:end will create a vector of numbers starting at from and increasing (or decreasing) by 1 until reaching end.
## [1] 1 2 3 4 5
## [1] 22 23 24 25 26 27 28
- You can also use the colon operator to create a decreasing sequence of numbers.
## [1] 0 -1 -2 -3 -4 -5
- You can even use a number with a decimal point as the starting or ending number (but this is not done that frequently).
## [1] 2.3 3.3 4.3 5.3 6.3
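- Be careful when using something like 1:n-1 when creating a vector. The colon operator is applied before the subtraction, so 1:n-1 means (1:n)-1 rather than 1:(n-1).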
## [1] 0 1 2 3 4 5
## [1] 1 2 3 4 5
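The colon examples above can be sketched as follows. The specific end points, and the choice n <- 6 in the precedence example, are assumptions chosen to be consistent with the printed output; the names a and b are hypothetical.

```r
1:5        # 1 2 3 4 5
22:28      # 22 23 24 25 26 27 28
0:-5       # decreasing sequence: 0 -1 -2 -3 -4 -5
2.3:7      # 2.3 3.3 4.3 5.3 6.3 (stops before exceeding 7)

n <- 6
a <- 1:n-1     # evaluated as (1:n) - 1, giving 0 1 2 3 4 5
b <- 1:(n-1)   # parentheses force 1:5, giving 1 2 3 4 5
print(a)
print(b)
```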
The function seq is a useful function for creating vectors that have desired starting and ending values.
seq provides more flexibility than the colon operator.
You can use seq to create a sequence with increments other than \(1\).
## [1] 1 3 5 7 9 11
## [1] 1 3 5 7 9
## [1] 1.00 3.54 6.08 8.62
- Use the length.out argument in seq to create an equally-spaced vector with a given length.
## [1] 1 2 3 4 5 6 7 8 9 10 11
## [1] 1 3 5 7 9 11
# using length.out is convenient
seq(21.5, 48.2, length.out=5) # don't have to work out correct increments
## [1] 21.500 28.175 34.850 41.525 48.200
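A short sketch of the seq calls discussed above. The arguments are assumptions chosen to be consistent with the printed output in this section, and the names s1 to s4 are hypothetical.

```r
s1 <- seq(1, 11, by = 2)           # 1 3 5 7 9 11
s2 <- seq(1, 10, by = 2)           # 1 3 5 7 9 (the sequence never exceeds 10)
s3 <- seq(1, 10, by = 2.54)        # 1.00 3.54 6.08 8.62
s4 <- seq(1, 11, length.out = 6)   # 6 equally spaced values from 1 to 11

print(s1)
print(s2)
print(s3)
print(s4)
```

With by you fix the increment and let R decide how many values fit; with length.out you fix the number of values and let R work out the increment.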
The rep (replicate) function is very useful for creating vectors that have any kind of repeated pattern.
The basic form of rep is rep(x, times).
- rep(x, times) produces a vector which repeats the vector x the number of times given by times.
## [1] 7 7 7
## [1] 2 4 6 2 4 6 2 4 6
- Using rep inside of c()
## [1] 10 11 12 2 4 6 2 4 6 2 4 6
- Using rep with the keyword each will repeat each element of x each times before moving on to the next element of x
## [1] 2 2 2 2 4 4 4 4 6 6 6 6
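The rep patterns above can be sketched as follows. The arguments are assumptions chosen to be consistent with the printed output, and the names r1 to r4 are hypothetical.

```r
r1 <- rep(7, times = 3)                    # 7 7 7
r2 <- rep(c(2, 4, 6), times = 3)           # 2 4 6 2 4 6 2 4 6
r3 <- c(10, 11, 12, rep(c(2, 4, 6), 3))    # rep used inside c()
r4 <- rep(c(2, 4, 6), each = 4)            # 2 2 2 2 4 4 4 4 6 6 6 6

print(r1)
print(r2)
print(r3)
print(r4)
```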
5.2 Subsets of vectors
5.2.1 Extracting vector elements
- You can extract the \(k^{th}\) element of a vector x by using x[k]
- For example:
## [1] 3
## [1] 100
- You can also extract several elements of a vector at once by placing a vector of indices inside the brackets
- For example:
## [1] 1 5
## [1] 5 100 1250
- You can change the value of the \(k^{th}\) element of a vector x by assigning a new value to x[k]
## [1] 1 6 5 100
- You can also update multiple elements of a vector by placing a vector of indices inside the brackets
## [1] 10 10 10 100
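A minimal sketch of extracting and updating elements as described above. The starting vector c(1, 3, 5, 100) is an assumption chosen to be consistent with most of the printed output in this subsection.

```r
x <- c(1, 3, 5, 100)

x[2]           # second element: 3
x[4]           # fourth element: 100
x[c(1, 3)]     # first and third elements: 1 5

x[2] <- 6      # replace the second element
print(x)       # 1 6 5 100

x[1:3] <- 10   # replace the first three elements at once
print(x)       # 10 10 10 100
```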
5.2.2 Subsetting with logical expressions
We described above how you can take a subset of a vector by specifying the vector indices that you want for your subset.
You can also subset a vector using a logical expression rather than explicitly specifying the indices you want.
x <- c(10, 2, 21, 15)
y <- x[x > 8] # returns all elements of x greater than 8
z <- x[x > 12] # returns all elements of x greater than 12
## [1] 10 21 15
## [1] 21 15
- You can think of the expression x[x > 8] as doing the following:
## [1] 10 21 15
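One way to spell out the two steps behind x[x > 8] is sketched below. The intermediate name keep is hypothetical.

```r
x <- c(10, 2, 21, 15)

keep <- x > 8    # step 1: compare every element with 8
print(keep)      # TRUE FALSE TRUE TRUE

x[keep]          # step 2: keep the elements where the comparison is TRUE
which(keep)      # positions of the TRUE values: 1 3 4
```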
5.3 Useful methods for vectors
- The length function can tell you how many elements are in your vector:
## [1] 9 8 7 6 5 4 3 2 1 0
## [1] 10
## [1] "integer"
## [1] 45
- R has functions which allow you to compute all the well-known summary statistics from a numeric vector.
## [1] 3
## [1] 2.5
## [1] 1.581139
## [1] 5
## [1] 1
## [1] 3
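The sketch below shows calls that would produce output like that shown in this section. The input vectors, and the particular functions and their order, are guesses made to be consistent with the printed values; the name v is hypothetical.

```r
x <- 9:0
print(x)       # 9 8 7 6 5 4 3 2 1 0
length(x)      # 10
typeof(x)      # "integer"
sum(x)         # 45

v <- c(1, 2, 3, 4, 5)
mean(v)        # 3
var(v)         # 2.5
sd(v)          # 1.581139
max(v)         # 5
min(v)         # 1
median(v)      # 3
```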
5.4 Vectors with different data types
As we mentioned before, R vectors are not limited to having numeric elements.
The main restriction for vectors is that they must have elements which are all the same type.
## [1] 1.0 2.5 42.0
## [1] "hello" "world" "biostat607"
- You can “create” a vector that has mixed data types, but R will automatically convert the types of some of the elements so that all elements have the same type.
x <- c(TRUE, FALSE, 2) ## contains logical and numeric values
print(x) ## R translates logical TRUE/FALSE into numeric 1/0
## [1] 1 0 2
x <- c(1, 2, "3") ## numeric + character
print(x) ## R translates numeric values into characters
## [1] "1" "2" "3"
x <- c(TRUE, 2, "3") ## logical + numeric + character
print(x) ## R translates logical and numeric values into characters
## [1] "TRUE" "2" "3"
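To see which type R settles on in each of the cases above, you can ask for the type of the result with typeof(). This is a small supplementary sketch; the last line, which mixes a logical with an integer, is an extra case not shown in the examples above.

```r
typeof(c(TRUE, FALSE, 2))   # "double": logicals become numbers
typeof(c(1, 2, "3"))        # "character": numbers become strings
typeof(c(TRUE, 2, "3"))     # "character"
typeof(c(TRUE, 2L))         # "integer": logical combined with an integer
```

Roughly, R converts towards the most flexible type present: logical, then integer, then double, then character.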
5.4.1 Explicitly changing the data types
- You can convert a vector to another type using conversion functions such as as.numeric, as.logical, or as.character
## [1] 1 0 1 0
## [1] "0" "1" "2" "3"
- Sometimes conversion of a vector does not work.
## When a character string cannot be converted to a number,
## R returns NA for that element and prints a warning
## Warning: NAs introduced by coercion
## [1] 123.0 12.3 NA
## Warning: NAs introduced by coercion
## [1] 123 12 123 NA
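A minimal sketch of explicit conversion, including a case that fails. The input vectors are assumptions chosen to be consistent with the first few printed results above; the names ok and bad are hypothetical.

```r
as.numeric(c(TRUE, FALSE, TRUE, FALSE))   # 1 0 1 0
as.character(0:3)                         # "0" "1" "2" "3"

ok <- as.numeric(c("123", "12.3"))        # both strings look like numbers
print(ok)                                 # 123.0 12.3

# "abc" cannot be read as a number, so that element becomes NA
# and R prints the warning "NAs introduced by coercion"
bad <- as.numeric(c("123", "12.3", "abc"))
print(bad)                                # 123.0 12.3 NA
```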
5.5 Mathematical operations with vectors
- When doing mathematical operations with two vectors of the same length, R will perform addition, subtraction, multiplication, and division element-by-element.
## [1] 11 7 3
## [1] 10 10 0
## [1] 10 25 0
- Multiplying or dividing a vector by a single number multiplies (or divides) each element by that number
## [1] 30 15 0 -15
## [1] 5.0 2.5 0.0 -2.5
- Adding a single number to a vector (or subtracting it from a vector) adds that number to (or subtracts it from) each element
## [1] 13 8 3 -2
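The element-by-element behaviour above can be sketched as follows. The vectors and the particular operators are guesses chosen to be consistent with the printed output; the name z is hypothetical.

```r
x <- c(10, 5, 0)
y <- c(1, 2, 3)

x + y     # 11 7 3
x * y     # 10 10 0
x^y       # 10 25 0 (each element of x raised to the matching element of y)

z <- c(10, 5, 0, -5)
3 * z     # 30 15 0 -15
z / 2     # 5.0 2.5 0.0 -2.5
z + 3     # 13 8 3 -2
```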
5.5.1 Recycling rules
You can actually add/subtract vectors of different lengths in R.
When doing this, R recycles the values in the shorter vector.
- R will print out a warning message if the length of the longer vector is not a multiple of the length of the shorter vector.
Specifically, when R adds two vectors (say a \(=(a[1], \ldots, a[n_{a}])\) and b \(=(b[1], \ldots, b[n_{b}])\)) with lengths \(n_{a}\) and \(n_{b}\) respectively (with \(n_{a} < n_{b}\)), R returns a vector of length \(n_{b}\) whose \(j^{th}\) element is \[\begin{equation} a[((j - 1) \bmod n_{a}) + 1] + b[j], \qquad j = 1, \ldots, n_{b} \end{equation}\]
We can see an example of this recycling rule in R when we try to add a vector of length 3 with a vector of length 4:
## Warning in c(1, 2, 4) + c(6, 0, 9, 10): longer object length is not a multiple
## of shorter object length
## [1] 7 2 13 11
- You can think of the above code as adding the vector c(1, 2, 4, 1) with the vector c(6, 0, 9, 10)
- Note that if we add a vector of length 3 with a vector of length 6 we will get no warning message.
- This is because \(6\) is a multiple of \(3\).
## [1] 7 2 13 11 13 16
The above code adds the vector c(1, 2, 4, 1, 2, 4) with the vector c(6, 0, 9, 10, 11, 12).
I personally do not use recycling rules much when the length of **both vectors** is 2 or more.
It’s probably good to be aware of recycling rules if you are getting this type of warning message.
You may find it helpful to use these recycling rules if you are, for example, adding one vector with another vector that has a simple, repeating pattern.
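To make the recycling rule concrete, the sketch below adds a vector of length 3 to a vector of length 6 in three equivalent ways. The names a, b, and idx are hypothetical.

```r
a <- c(1, 2, 4)
b <- c(6, 0, 9, 10, 11, 12)

# 1. Let R recycle the shorter vector automatically
s1 <- a + b

# 2. Recycle by hand with rep before adding
s2 <- rep(a, length.out = length(b)) + b

# 3. Use the index formula: element j of the sum uses a[((j - 1) mod n_a) + 1]
idx <- ((seq_along(b) - 1) %% length(a)) + 1
s3 <- a[idx] + b

print(s1)   # 7 2 13 11 13 16
print(idx)  # 1 2 3 1 2 3
```

All three give the same vector, which is one way to convince yourself of what R is doing when the lengths differ.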
5.6 Set operations with vectors
You can also do set operations with vectors.
When working with set operations, you should think of the set associated with a vector as the collection of unique elements from that vector.
For example, consider two vectors x and y.
Suppose the “set” associated with x is \(\{1,2,3,4,5\}\) and the “set” associated with y is \(\{1,3,5,7,9\}\).
Then, the intersection of x and y computed using the intersect function in R is \(\{1, 3, 5\}\)
## [1] 1 3 5
- Similarly, the union function in R computes the “union” of x and y: \(\{1,2,3,4,5,7,9\}\)
## [1] 1 2 3 4 5 7 9
- One can compute the “set difference” of two sets using setdiff.
- These are the elements that are in x but are not in y
## [1] 2 4
- The operation x %in% y returns a logical vector the same length as x indicating whether or not each element of x belongs to the set of unique elements of y
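- The function match(x, y) returns, for each element of x, the position of its first match in y (or NA if there is no match)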
## [1] 1 NA 2 2 NA 4
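The set operations above can be sketched as follows. The definitions of x and y are assumptions: they are chosen so that the associated sets and every printed result in this section come out as shown, including the repeated values implied by the last output.

```r
x <- c(1, 2, 3, 3, 4, 5)   # assumed definition; the set is {1, 2, 3, 4, 5}
y <- c(1, 3, 3, 5, 7, 9)   # assumed definition; the set is {1, 3, 5, 7, 9}

intersect(x, y)   # 1 3 5
union(x, y)       # 1 2 3 4 5 7 9
setdiff(x, y)     # 2 4 (in x but not in y)
x %in% y          # TRUE FALSE TRUE TRUE FALSE TRUE
match(x, y)       # 1 NA 2 2 NA 4 (position of the first match in y)
```

Note that setdiff is not symmetric: setdiff(y, x) would instead give the elements of y that are not in x.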
5.7 NA and missing values in R
- Missing data in R is usually represented by the value NA.
- NA stands for “Not Available”
- You can create a vector with NA values by just typing in NA for one of the vector elements.
## [1] "double"
- You can type in NA for either numeric or character variables.
- R will automatically convert everything to the appropriate type.
## [1] "character"
Many of the built-in R functions will return NA if the input numeric vector contains any NA values.
For example, if we try to compute the standard deviation of the following vector, the result is NA:
x <- c(1, 5, NA, 4, 7) # The third element of this vector is NA
mx <- sd(x) # mx will have the value NA
## [1] NA
- You can compute the standard deviation of the non-NA values by including the argument na.rm = TRUE
## [1] 2.5
In the function sd, the argument na.rm is a good example of an argument with a default value.
You can see that na.rm has a default value by looking at the function definition for sd.
- The default value of na.rm is FALSE.
- So, you need to include na.rm = TRUE if you want sd to ignore missing values.
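A short sketch of the behaviour described above, using the same vector x as in this section. The calls to typeof() use made-up vectors to illustrate the earlier point about NA taking on the type of the surrounding elements.

```r
typeof(c(1, NA, 3))       # "double"
typeof(c("a", NA, "c"))   # "character"

x <- c(1, 5, NA, 4, 7)
sd(x)                     # NA, because na.rm defaults to FALSE
sd(x, na.rm = TRUE)       # 2.5, computed from 1, 5, 4, 7

args(sd)                  # shows the signature: function (x, na.rm = FALSE)
```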
5.7.1 The function is.na
The function is.na is often very useful when you’re working with data that has missing values.
When applied to a vector x, is.na(x) will return a vector of logical values with the same length as the input vector.
- The \(k^{th}\) element of is.na(x) will be TRUE if the \(k^{th}\) element of x is missing.
- Otherwise, the \(k^{th}\) element of is.na(x) will be FALSE
- You can also use is.na directly on matrices and data frames.
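A minimal sketch of is.na on a vector and on a matrix. The example data and the names miss and m are hypothetical.

```r
x <- c(1, 5, NA, 4, 7)

miss <- is.na(x)
print(miss)       # FALSE FALSE TRUE FALSE FALSE
sum(miss)         # number of missing values: 1
x[!miss]          # the non-missing values: 1 5 4 7

m <- matrix(c(1, NA, 3, 4), nrow = 2)
is.na(m)          # logical matrix with the same dimensions as m
```

Combining is.na with logical subsetting, as in x[!miss], is a common way to drop missing values before further calculations.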
5.8 Exercises
Suppose we define the vector x as x <- 1:10. What is the value of x[seq(1, 10, by=2)][3]?
Suppose we define the vector x as x <- rep(c(1, 5, 10), each=3). What is the value of sum(x[x > 5])?
Create a vector called xvec that stores the following sequence of numbers: \(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, \ldots\) and keeps repeating this pattern until the last number is \(10\).
- What is the length of xvec?
- What number is the \(35^{th}\) element of xvec?
- What is the mean value of the numbers contained in xvec?
- How many elements of xvec equal \(2\)? How many elements of xvec equal \(7\)?
- What is the sum of all the even numbers contained in xvec?