This is an introductory post about using apply(), sapply() and lapply(), best suited for people relatively new to R or unfamiliar with these functions. We will also cover tapply() and mapply(). These functions are part of base R, so they are available in any installation, including the r-essentials package you get when you install R with Anaconda. The apply collection can be viewed as a substitute for loops: the functions take a list, matrix or array as input and apply a function to it.

apply() is the most basic function of the collection. Usage: apply(data_frame, 1, function, arguments_to_function_if_any). The second argument, 1, stands for rows; if it is 2, the function is applied to columns. So to call a function for each row of an R data frame, we use apply() with 1 as the second argument. In general-purpose code it is good practice to name the first three arguments (X, MARGIN and FUN) if ... is passed through. If each call to the function returns a vector of length n, apply() returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1.

If I see this file in R, I have:

      V1 V2  V3 V4  V5 V6 V7
    1 14 25  83 64 987 45 78
    2 15 65 789 32  14 NA NA
    3 14 67  89 14  NA NA NA

If I want the maximum value in each column, I use apply(df,2,max), and this is the result:

    V1  V2  V3  V4  V5  V6  V7
    15  67 789  64  NA  NA  NA

Here apply() returns a vector with the maximum for each column and conveniently uses the column names as names for this vector as well. V5, V6 and V7 come out as NA because max() returns NA when its input contains an NA. If R doesn't find names for the dimension over which apply() runs, it returns an unnamed object instead. To calculate the number of NAs in the entire data.frame, I can use sum(is.na(df)).

lapply() returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. It is useful for operations on list objects. apply() and lapply() differ in what they return: apply() returns a vector, array or list depending on what the function returns, while the output of lapply() is always a list.

sapply() is a user-friendly version and wrapper of lapply() that by default returns a vector or matrix or, if simplify = "array", an array if appropriate, by applying simplify2array(). sapply(x, f, simplify = FALSE, USE.NAMES = FALSE) is the same as lapply(x, f). sapply() is often more convenient than lapply() because it stores the values directly in a vector when it can; the difference is in the shape of the output, not in speed. mapply() is a multivariate version of sapply(): it applies FUN to the first elements of each argument, the second elements, the third elements, and so on.

We can summarize the difference between apply(), sapply() and lapply():

    apply():  applies a function to the rows or columns or both.
    lapply(): applies a function to all the elements of the input and returns a list.
    sapply(): applies a function to all the elements of the input and returns a vector or matrix where possible.

We can pass a built-in or a user-defined function to lapply() or sapply(). An R function is created by using the keyword function. For example, we create a function, below_average(), that takes a vector of numerical values and returns a vector that contains only the values that are strictly above the average (despite what its name suggests). Built-in functions work the same way: the var function computes the sample variance of a numeric input vector. For a string example, we construct a matrix with the names of famous movies. The names are in upper case.

tapply() computes a measure (mean, median, min, max, etc.) or a function for each factor variable in a vector. For instance, it can measure the average of, or group, data based on a characteristic. tapply() is a quick way to perform this computation. To understand how it works, let's use the iris dataset. Related functions include the convenience functions sweep() and aggregate().

An if statement in R consists of three elements: the keyword if, a condition, and the task to carry out if this condition is true. When many such conditions pile up, it may be more appropriate to match values in a lookup table. To do this, you can use the match() function or the %in% operator.
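
The column-maximum example from the text can be reproduced with a small script. This is a minimal sketch that assumes the printed values were read into a data frame called df. The na.rm = TRUE variant and the row-wise call are additions for illustration and are not part of the original example.

```r
# Rebuild the data frame printed in the text (values copied from that printout).
df <- data.frame(
  V1 = c(14, 15, 14),
  V2 = c(25, 65, 67),
  V3 = c(83, 789, 89),
  V4 = c(64, 32, 14),
  V5 = c(987, 14, NA),
  V6 = c(45, NA, NA),
  V7 = c(78, NA, NA)
)

# MARGIN = 2 applies FUN to each column; a column containing NA yields NA.
col_max <- apply(df, 2, max)

# Arguments after FUN are passed on to it; na.rm = TRUE ignores the NAs.
col_max_no_na <- apply(df, 2, max, na.rm = TRUE)

# MARGIN = 1 applies FUN to each row instead.
row_max <- apply(df, 1, max, na.rm = TRUE)

# Total number of missing values in the whole data frame.
n_missing <- sum(is.na(df))

print(col_max)
print(col_max_no_na)
print(row_max)
print(n_missing)
```

Passing na.rm = TRUE through apply() is usually simpler than filtering the NAs beforehand, since every argument after FUN is forwarded unchanged.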
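
The difference between lapply() and sapply(), and the use of a user-defined function, can be sketched as follows. The movie titles and the scores list are hypothetical illustration data, and a plain character vector stands in for the matrix mentioned in the text. The body of below_average() is an assumption based on the description given above.

```r
# Hypothetical movie titles in upper case (illustrative data, not from the text).
movies <- c("SPIDERMAN", "BATMAN", "VERTIGO", "CHINATOWN")

# lapply() always returns a list with one element per input element.
movies_list <- lapply(movies, tolower)

# sapply() simplifies the same result to a character vector.
movies_vec <- sapply(movies, tolower, USE.NAMES = FALSE)

# With simplify = FALSE and USE.NAMES = FALSE, sapply() matches lapply().
same_as_lapply <- sapply(movies, tolower, simplify = FALSE, USE.NAMES = FALSE)

# User-defined function as described in the text: despite its name, it keeps
# the values strictly above the mean of its input.
below_average <- function(x) {
  x[x > mean(x)]
}

# Hypothetical numeric vectors; the results differ in length per element.
scores <- list(a = c(1, 2, 3, 10), b = c(5, 5, 5, 5))
above <- lapply(scores, below_average)

print(movies_vec)
print(above)
```

Because the two results in above have different lengths, sapply() could not simplify them to a vector or matrix here and would also return a list.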
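
Grouped summaries with tapply() and element-wise calls with mapply() might look like the sketch below. It uses the iris dataset that ships with R. The choice of columns and the small adding function are illustrative assumptions.

```r
# iris is part of the datasets package: 150 flowers, 50 per species.
data(iris)

# tapply(X, INDEX, FUN): split X by the factor INDEX, then apply FUN per group.
median_width <- tapply(iris$Sepal.Width, iris$Species, median)

# Sample variance of sepal length per species, using var().
var_length <- tapply(iris$Sepal.Length, iris$Species, var)

# mapply() walks several vectors in parallel: FUN(1, 4), FUN(2, 5), FUN(3, 6).
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

print(median_width)
print(var_length)
print(sums)
```

The result of tapply() is named after the levels of the grouping factor, so individual groups can be picked out by name, for example median_width["setosa"].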
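
Matching values in a lookup table with match() and %in% can be sketched like this. The lookup table and the codes are hypothetical and exist only to show the mechanics.

```r
# Hypothetical lookup table mapping short codes to full names.
lookup <- data.frame(
  code = c("se", "ve", "vi"),
  name = c("setosa", "versicolor", "virginica"),
  stringsAsFactors = FALSE
)

# Hypothetical input; "xx" has no entry in the lookup table.
codes <- c("vi", "se", "se", "xx")

# match() returns the position of each code in the lookup, or NA if absent.
idx <- match(codes, lookup$code)
full_names <- lookup$name[idx]

# %in% returns TRUE or FALSE for membership.
known <- codes %in% lookup$code

print(full_names)
print(known)
```

Indexing with the positions from match() replaces a chain of if/else branches with a single vectorised step, and unknown codes surface as NA instead of falling through silently.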