---
title: 'Chapter 4: apply Functions'
author: "Erin Sovansky Winter"
output:
  html_document:
    theme: cerulean
    highlight: textmate
    fontsize: 8pt
    toc: true
    number_sections: true
    code_download: true
    toc_float:
      collapsed: false
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

#  What are apply functions?
Apply functions are a family of functions in base R that let you repeatedly perform an
action on multiple chunks of data. An apply function is essentially a loop, but it usually
needs less code than an explicit loop and can be faster in some cases.

The apply functions that this chapter will address are apply, lapply, sapply, vapply, tapply, and
mapply. There are so many different apply functions because they are meant to operate on different
types of data. 

#  The apply function
First, let's go over the basic apply function. You can use the help section to get a description
of this function.
```{r, eval=FALSE}
?apply
```
The basic form of the apply function is apply(X, MARGIN, FUN):

* X is an array or matrix (this is the data that you will be performing the function on; a data frame is converted to a matrix first)
* MARGIN specifies whether you want to apply the function across rows (1) or columns (2)
* FUN is the function you want to use
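apply() also passes any extra arguments that come after FUN on to FUN. The sketch below uses this to give mean() its na.rm argument; the matrix m.na is a made-up example, not part of the chapter's data.

```{r}
# Hypothetical 2 x 2 matrix with one missing value in column 1
m.na <- matrix(c(1, NA, 3, 4), nrow = 2)
# na.rm = TRUE is forwarded to mean() for every column
col.means <- apply(m.na, 2, mean, na.rm = TRUE)
col.means
```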

## apply examples
my.matrx is a matrix with 1-10 in column 1, 11-20 in column 2, and 21-30 in column 3. 
my.matrx will be used to show some of the basic uses for the apply function.
```{r}
my.matrx <- matrix(c(1:10, 11:20, 21:30), nrow = 10, ncol = 3)
my.matrx
```

### Example 1: Using apply to find row sums
What if I wanted to summarize the data in my.matrx by finding the sum of each row? The arguments 
are X = my.matrx, MARGIN = 1 (for rows), and FUN = sum.

```{r}
apply(my.matrx, 1, sum)
```
The apply function returned a vector containing the sums for each row.
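For row and column sums specifically, base R also has rowSums() and colSums(), which should give the same totals as the apply() call and are usually faster. A quick check, recreating the same values under the hypothetical name m.check so the chunk runs on its own:

```{r}
# Same values as my.matrx: 1-10, 11-20 and 21-30 in three columns
m.check <- matrix(1:30, nrow = 10, ncol = 3)
row.totals <- apply(m.check, 1, sum)
# rowSums() computes the same totals without an explicit apply()
all.equal(row.totals, rowSums(m.check))
```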

### Example 2: Creating a function in the arguments
What if I wanted to find how many data points (n) are in each column of my.matrx? I can use 
the length function to do this. Because we are working on columns, MARGIN = 2.
```{r}
apply(my.matrx, 2, length)
```
What if instead, I wanted to find n-1 for each column? There isn't a function in R to do this
automatically, so I can create my own function. If the function is simple, you can create it
right inside the arguments for apply. In the arguments I created a function that returns
length - 1.
```{r}
apply(my.matrx, 2, function (x) length(x)-1)
```
As you can see, the function correctly returned a vector of n-1 for each column.
 
### Example 3: Using a function defined outside of apply
If you don't want to write a function inside of the arguments, you can define the function 
outside of apply, and then use that function in apply later. This may be useful if you want to 
have the function available to use later. In this example, a function to find standard error was
created, then passed into an apply function.
```{r}
st.err <- function(x){
  sd(x)/sqrt(length(x))
}
apply(my.matrx,2, st.err)
```

### Example 4: Transforming data
Now for something a little different. In the previous examples, apply was used to summarize
over a row or column. It can also be used to repeat a function on cells within a matrix. In this
example, the apply function is used to transform the values in each cell. Pay attention to the
MARGIN argument. If you set the MARGIN to 1:2 it will have the function operate on each cell.
```{r}
my.matrx2 <- apply(my.matrx,1:2, function(x) x+3)
my.matrx2
```
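For simple arithmetic like this, R's operators already work on every cell of a matrix, so my.matrx + 3 gives the same result without apply(). The sketch below checks that on a copy of the data (m.cells is a hypothetical name); apply() with MARGIN = 1:2 is more useful when the cell-wise function is not already vectorized.

```{r}
# Same layout as my.matrx
m.cells <- matrix(1:30, nrow = 10, ncol = 3)
via.apply <- apply(m.cells, 1:2, function(x) x + 3)
via.vector <- m.cells + 3   # vectorized: adds 3 to every cell
all.equal(via.apply, via.vector)
```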

### Example 5: Vectors?
The previous examples showed several ways to use the apply function on a matrix. But what if I 
wanted to loop through a vector instead? Will the apply function work?

```{r}
vec <- c(1:10)
vec
```
```{r, eval=FALSE}
apply(vec, 1, sum)
```
If you run this code, it will return the error: Error in apply(vec, 1, sum) : dim(X) must have a positive length. 
This didn't work because apply needs data that has dimensions (a matrix or array), and a plain vector has none. If your data is a vector, you need to use lapply, sapply, or vapply instead.

# lapply, sapply, and vapply
lapply, sapply, and vapply are all functions that loop a function through data in a list or
vector. First, try looking up lapply in the help section to see a description of all three 
functions.

```{r, eval=FALSE}
?lapply
```

Here are the arguments for the three functions:

* lapply(X, FUN, ...)
* sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
* vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)

In each case, X is a vector or list, and FUN is the function you want to use. sapply and vapply have extra arguments, but most of them have default values, so you don't need to worry about
them. However, vapply requires one more argument, FUN.VALUE, which we will look at later.

### Example 1: Getting started with lapply
Earlier, we created the vector vec. Let's use that vector to test out the lapply function.
```{r}
lapply(vec, sum)
```
This function didn't add up the values like we may have expected it to. This is because lapply
treats the vector like a list and applies the function to each element separately, so it returns a list of 10 "sums" of one number each.

Let's try using a list instead:
```{r}
A<-c(1:9)
B<-c(1:12)
C<-c(1:15)
my.lst<-list(A,B,C)
lapply(my.lst, sum)
```
This time, the lapply function seemed to work better. The function summed each vector in the list
and returned a list of the 3 sums. 
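In effect, lapply(my.lst, sum) does the same job as an explicit for loop that fills a list one element at a time. A sketch of that equivalence, using the hypothetical names lst and out so the chunk runs on its own:

```{r}
lst <- list(1:9, 1:12, 1:15)

# Explicit loop: pre-allocate a list, then fill it element by element
out <- vector("list", length(lst))
for (i in seq_along(lst)) {
  out[[i]] <- sum(lst[[i]])
}

# lapply() produces the same list in one line
identical(out, lapply(lst, sum))
```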

### Example 2: sapply
sapply works just like lapply, but will simplify the output if possible. This means that instead
of returning a list like lapply, it returns a vector when every result has length 1 (or a matrix when every result has the same length greater than 1).

```{r}
sapply(vec, sum)
```

```{r}
sapply(my.lst, sum)
```
See how these two examples gave the same answers as the lapply examples, but returned a vector instead of a list?
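sapply() only simplifies when it can. The sketch below, with made-up inputs, shows the three outcomes: every result has length 1 (a vector), every result has the same length greater than 1 (a matrix with one column per element), or the lengths differ (the list is returned unchanged).

```{r}
one.each <- sapply(list(1:3, 4:6), sum)                   # lengths all 1 -> vector
two.each <- sapply(list(1:3, 4:6), range)                 # lengths all 2 -> 2 x 2 matrix
mixed    <- sapply(list(1:2, 1:3), function(x) x * 2)     # lengths 2 and 3 -> list
class(one.each)
dim(two.each)
class(mixed)
```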

### Example 3: vapply
vapply is similar to sapply, but it requires you to specify what type of data you are expecting.
The arguments for vapply are vapply(X, FUN, FUN.VALUE).
FUN.VALUE is where you specify the type (and length) of the result you expect from each call to FUN.
I am expecting each call to sum to return a single numeric value, so FUN.VALUE = numeric(1).

```{r}
vapply(vec, sum, numeric(1))
```

```{r}
vapply(my.lst, sum, numeric(1))
```

If your function returns more than one value per element, FUN.VALUE = numeric(1) will make vapply stop with an error. This is useful if you are expecting only one result per subject. The call below, for example, returns 9, 12, and 15 values for the three elements of my.lst, so it fails:
```{r, eval=FALSE}
vapply(my.lst, function(x) x + 2, numeric(1))
```
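The stackoverflow question in the references discusses why vapply is considered safer than sapply. Two behaviours illustrate the idea, sketched here with made-up inputs: vapply stops with an error when a result has the wrong length, and on empty input it still returns a vector of the declared type, while sapply returns an empty list.

```{r}
# vapply() enforces FUN.VALUE: three values where one was promised -> error
res <- tryCatch(
  vapply(list(1:3), function(x) x + 2, numeric(1)),
  error = function(e) "error"
)
res

# On empty input, sapply() cannot know the result type and returns list()
empty.s <- sapply(list(), sum)
# vapply() still returns a (zero-length) numeric vector
empty.v <- vapply(list(), sum, numeric(1))
class(empty.s)
class(empty.v)
```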

### Example 4: Transforming data with sapply
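Like apply, these functions can also be used to transform the data inside the list. Because the three vectors in my.lst have different lengths (9, 12, and 15), sapply cannot simplify the result here, so my.lst2 is still a list.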
```{r}
my.lst2 <- sapply(my.lst, function(x) x*2)
my.lst2
```

### Which function should I use, lapply, sapply, or vapply?

If you are trying to decide which of these three functions to use, I would suggest sapply if possible, because it is the simplest. If you do not want your results to be simplified, use lapply. If you want to specify the type of result you are expecting, use vapply.


# tapply

Sometimes you may want to apply a function to your data separately for each group defined 
by a factor. In that case, you should use tapply. Let's take a look at the information for tapply.

```{r, eval=FALSE}
?tapply
```
The arguments for tapply are tapply(X, INDEX, FUN). The only new argument is INDEX, which is the 
factor you want to use to separate the data.

### Example 1: Means split by condition
First, let's create data with a grouping variable for indexing. The data frame tdata is created by adding a column of group labels (1 or 2) to my.matrx and converting the result to a data frame. 

```{r}
tdata <- as.data.frame(cbind(c(1,1,1,1,1,2,2,2,2,2), my.matrx))
colnames(tdata)
```
Now let's use column 1 (V1) as the index and find the mean of column 2 (V2) for each group.

```{r}
tapply(tdata$V2, tdata$V1, mean)
```
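Conceptually, tapply() splits X into groups defined by INDEX and then applies FUN to each group. A sketch of the same calculation written with split() and sapply(), using hypothetical vectors x and g that mirror tdata$V2 and tdata$V1:

```{r}
x <- 1:10                    # values (like tdata$V2)
g <- rep(c(1, 2), each = 5)  # group labels (like tdata$V1)

by.tapply <- tapply(x, g, mean)
# split() returns a list with one vector per group; sapply() then takes each mean
by.split <- sapply(split(x, g), mean)

by.tapply
by.split
```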

### Example 2: Combining functions
You can use tapply to do some quick summary statistics on a variable split by condition. In this 
example, I created a function that returns a vector of both the mean and standard deviation. You 
can create a function like this for any apply function, not just tapply.
```{r}
# Named elements label the output; "group.summary" avoids masking base R's summary()
group.summary <- tapply(tdata$V2, tdata$V1, function(x) c(mean = mean(x), sd = sd(x)))
group.summary
```

# mapply
The last apply function I will cover is mapply.
```{r, eval=FALSE}
?mapply
```
The arguments for mapply are mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE).
First you list the function, followed by the vectors you are using. The rest of the arguments
have default values, so they don't need to be changed for now.
When you have a function that takes 2 arguments, the first vector goes into the first argument
and the second vector goes into the second argument.

### Example 1: Understanding mapply
In this example, 1:9 is specifying the value to repeat, and 9:1 is specifying how many times
to repeat. This order is based on the order of arguments in the rep function itself.
```{r}
mapply(rep, 1:9, 9:1)
```
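A smaller version of the same call makes the pairing explicit: mapply() calls rep(1, 3), rep(2, 2), and rep(3, 1). The sketch also shows MoreArgs, which supplies an argument that stays the same in every call; because each result then has length 2, mapply() simplifies the output to a matrix. The names pairs and fixed are just for illustration.

```{r}
pairs <- mapply(rep, 1:3, 3:1)
# Equivalent to calling rep() on each pair of elements by hand
identical(pairs, list(rep(1L, 3L), rep(2L, 2L), rep(3L, 1L)))

# MoreArgs: times = 2 is used in every call, so each result has length 2
fixed <- mapply(rep, 1:3, MoreArgs = list(times = 2))
fixed
```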

### Example 2: Creating a new variable
Another use for mapply would be to create a new variable. For example, using tdata, I could
divide one column by another column to create a new value. This would be useful for creating a 
ratio of two variables, as shown in the example below. (For simple arithmetic like this, the vectorized tdata$V2 / tdata$V4 gives the same result.)

```{r}
tdata$V5 <- mapply(function(x, y) x/y, tdata$V2, tdata$V4)
tdata$V5
```

### Example 3: Saving data into a premade vector
When using an apply family function to create a new variable, one option is to create a vector ahead of time with its size pre-allocated. I created a numeric vector of length 10 using the vector function, whose arguments are vector(mode, length). Inside mapply I created a function to multiply two variables together. Assigning with new.vec[] <- fills the existing vector with the results of mapply; a plain new.vec <- would simply replace the pre-allocated vector with mapply's output. (Pre-allocation matters most in for loops, since mapply builds its result for you.)

```{r}
new.vec <- vector(mode = "numeric", length = 10)
new.vec[] <- mapply(function(x, y) x*y, tdata$V3, tdata$V4)
new.vec
```

# Using apply functions on real datasets
This last section gives a few examples of using apply functions on real data. It loads the
MASS package, which contains many publicly available datasets. The state data used below are
also built into base R (in the datasets package), so MASS is not strictly required. If you do not
have MASS installed, you can uncomment the install line in the code below.

```{r}
#install.packages("MASS")
library(MASS)
```

Load the state dataset. It contains information about all 50 US states.
```{r}
data(state)
```
Let's look at the data we will be using: the state.x77 matrix.
```{r}
head(state.x77)
str(state.x77)
```
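All the data in state.x77 are numeric, which is necessary here because functions such as mean and sd require numeric data. (If you give apply a data frame, it is converted to a matrix first, so a single non-numeric column would turn every value into text.)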

### Example 1: using apply to get summary data
You can use apply to find measures of central tendency and dispersion for each column.
```{r}
apply(state.x77, 2, mean)
apply(state.x77, 2, median)
apply(state.x77, 2, sd)
```

### Example 2: Saving the results of apply

In this example, I created one function that gives the mean and SD, and another that gives the min, median, and max. Then I saved the results as objects that could be used later.
```{r}
state.summary<- apply(state.x77, 2, function(x) c(mean(x), sd(x))) 
state.summary
state.range <- apply(state.x77, 2, function(x) c(min(x), median(x), max(x)))
state.range
```
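If you name the values that the function returns, apply() uses those names as row labels, which makes saved results like state.summary easier to read. A sketch, using the hypothetical name state.summary2:

```{r}
state.summary2 <- apply(state.x77, 2, function(x) c(mean = mean(x), sd = sd(x)))
# Rows are labelled "mean" and "sd"; columns keep the state.x77 variable names
state.summary2
```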

### Example 3: Using mapply to compute a new variable
In this example, I want to find the population density for each state. In order to do this, I 
want to divide population by area. state.area is a separate object from state.x77, but 
that is fine as long as the vectors are the same length and the data is in the same order. Both
are ordered alphabetically by state, so mapply can be used. Population is given in thousands and
area in square miles, so the result is thousands of people per square mile.
```{r}
population <- state.x77[, "Population"]  # the Population column (in thousands), named by state
area <- state.area
pop.dens <- mapply(function(x, y) x/y, population, area)
pop.dens
```

### Example 4: Using tapply to explore population by region
In this example, I want to find out some information about the population of states split by
region. state.region is a factor with four levels: Northeast, South, North Central, and West.
For each region, I want the minimum, median, and maximum populations.

```{r}
region.info <- tapply(population, state.region, function(x) c(min(x), median(x), max(x)))
region.info
```
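Because the function returns three values per region, region.info is a list with one element per region. One way to turn it into a table with one row per region is do.call(rbind, ...). A sketch, with names added to the returned values so the columns are labelled (pop, region.list, and region.table are hypothetical names):

```{r}
pop <- state.x77[, "Population"]   # population estimates, in thousands
region.list <- tapply(pop, state.region,
                      function(x) c(min = min(x), median = median(x), max = max(x)))
# rbind() stacks the per-region vectors into a 4 x 3 matrix, one row per region
region.table <- do.call(rbind, region.list)
region.table
```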

# References
Here are some sources I used to help me create this chapter:

Datacamp tutorial on apply functions: https://www.datacamp.com/community/tutorials/r-tutorial-apply-family

r-bloggers: Using apply, sapply, and lapply in R: https://www.r-bloggers.com/using-apply-sapply-lapply-in-r/

stackoverflow: Why is vapply safer than sapply?: http://stackoverflow.com/questions/12339650/why-is-vapply-safer-than-sapply


<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-98878793-1', 'auto');
  ga('send', 'pageview');

</script>
A Language, not a Letter: Learning Statistics in R