Managing Data
There are many ways to edit the variables in a dataset. We can create a new variable and add it to a data frame with the assignment operator <-. If we are not satisfied with how an existing variable is coded, we can recode it by assigning new values in the same way. The example below creates two new variables:
testData <- data.frame(x = c(1:10), y = c(11:20)) # dataframe created
testData
##     x  y
## 1   1 11
## 2   2 12
## 3   3 13
## 4   4 14
## 5   5 15
## 6   6 16
## 7   7 17
## 8   8 18
## 9   9 19
## 10 10 20
testData$sum <- testData$x + testData$y # variable that adds the first two columns
testData$product <- testData$x * testData$y # variable that multiplies the first two columns
testData
##     x  y sum product
## 1   1 11  12      11
## 2   2 12  14      24
## 3   3 13  16      39
## 4   4 14  18      56
## 5   5 15  20      75
## 6   6 16  22      96
## 7   7 17  24     119
## 8   8 18  26     144
## 9   9 19  28     171
## 10 10 20  30     200
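The same operator can recode a variable that already exists. The sketch below is an illustration, not part of the original example: it rebuilds the data frame under the hypothetical name recodeData, rescales y, and caps x at an arbitrarily chosen value of 8.

```r
# recodeData is a hypothetical name used only for this illustration
recodeData <- data.frame(x = 1:10, y = 11:20)

recodeData$y <- recodeData$y / 10        # overwrite a whole column with rescaled values
recodeData$x[recodeData$x > 8] <- 8      # replace only the values that meet a condition
recodeData
```

Assigning to an existing column name replaces that column, while indexing with a logical condition changes only the matching rows.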
After attach() is called on a dataset, its variables can be referred to without the dataset name. Calling detach() undoes this, so the dataset name is required again. New variables must still be assigned with the dataset name, because attach() only makes the existing columns readable. The code from above is repeated below with attach() and detach():
testData <- data.frame(x = c(1:10), y = c(11:20)) # dataframe created
testData
##     x  y
## 1   1 11
## 2   2 12
## 3   3 13
## 4   4 14
## 5   5 15
## 6   6 16
## 7   7 17
## 8   8 18
## 9   9 19
## 10 10 20
attach(testData) # for omitting the dataset name
testData$sum <- x + y # a variable that adds the first two columns
testData$product <- x * y # a variable that multiplies the first two columns
testData
##     x  y sum product
## 1   1 11  12      11
## 2   2 12  14      24
## 3   3 13  16      39
## 4   4 14  18      56
## 5   5 15  20      75
## 6   6 16  22      96
## 7   7 17  24     119
## 8   8 18  26     144
## 9   9 19  28     171
## 10 10 20  30     200
detach(testData) # again, for using the dataset name
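attach() copies the columns as they exist at the time of the call, so columns added afterwards are not visible by name until the data frame is attached again. One commonly used alternative is with(), which evaluates an expression inside the data frame without changing the search path. A minimal sketch, using the hypothetical name withData so that it does not disturb the example above:

```r
# withData is a hypothetical name used only for this illustration
withData <- data.frame(x = 1:10, y = 11:20)

# with() looks up x and y inside withData for the duration of one expression
withData$sum <- with(withData, x + y)
withData$product <- with(withData, x * y)
withData
```

Because nothing is attached, there is nothing to detach afterwards.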
If a numeric column really represents groups (for example, values below or above a cutoff), we can add a new column that labels the group of each row as text. For example:
attach(testData)
testData$overUnder[product < 100] <- "Bad" # label rows with a product under 100
testData$overUnder[product >= 100] <- "Good" # label rows with a product of 100 or more
testData
##     x  y sum product overUnder
## 1   1 11  12      11       Bad
## 2   2 12  14      24       Bad
## 3   3 13  16      39       Bad
## 4   4 14  18      56       Bad
## 5   5 15  20      75       Bad
## 6   6 16  22      96       Bad
## 7   7 17  24     119      Good
## 8   8 18  26     144      Good
## 9   9 19  28     171      Good
## 10 10 20  30     200      Good
detach(testData)
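The column created above holds plain text (character) values. If a true factor is wanted, one option is to build the labels with ifelse() and wrap them in factor(). This is a sketch on a rebuilt copy of the data under the hypothetical name factorData; the level order is an assumption chosen for illustration.

```r
# factorData is a hypothetical name used only for this illustration
factorData <- data.frame(x = 1:10, y = 11:20)
factorData$product <- factorData$x * factorData$y

# ifelse() picks a label for each row; factor() stores the labels as a factor
factorData$overUnder <- factor(
  ifelse(factorData$product < 100, "Bad", "Good"),
  levels = c("Bad", "Good")
)

table(factorData$overUnder) # number of rows in each level
```

A factor records its set of levels explicitly, which is what functions such as table() and most modelling functions use to group rows.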
This section shows how to sort and merge data:
Sorting
testData$dollarAmount <- c(25, 60, 37, 57, 18, 36, 47, 37, 47, 80)
attach(testData)
sortedA <- testData[order(dollarAmount),] # sorting in ascending order
sortedA2 <- sort(testData$dollarAmount) # sort() returns only the sorted values, not the full rows
sortedA
##     x  y sum product overUnder dollarAmount
## 5   5 15  20      75       Bad           18
## 1   1 11  12      11       Bad           25
## 6   6 16  22      96       Bad           36
## 3   3 13  16      39       Bad           37
## 8   8 18  26     144      Good           37
## 7   7 17  24     119      Good           47
## 9   9 19  28     171      Good           47
## 4   4 14  18      56       Bad           57
## 2   2 12  14      24       Bad           60
## 10 10 20  30     200      Good           80
sortedA2
## [1] 18 25 36 37 37 47 47 57 60 80
sortedD <- testData[order(-dollarAmount),] # sorting in descending order
sortedD2 <- sort(testData$dollarAmount, decreasing = TRUE) # sort() returns only the sorted values, here in descending order
sortedD
##     x  y sum product overUnder dollarAmount
## 10 10 20  30     200      Good           80
## 2   2 12  14      24       Bad           60
## 4   4 14  18      56       Bad           57
## 7   7 17  24     119      Good           47
## 9   9 19  28     171      Good           47
## 3   3 13  16      39       Bad           37
## 8   8 18  26     144      Good           37
## 6   6 16  22      96       Bad           36
## 1   1 11  12      11       Bad           25
## 5   5 15  20      75       Bad           18
sortedD2
## [1] 80 60 57 47 47 37 37 36 25 18
detach(testData)
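order() also accepts several keys, which are applied from left to right to break ties. The sketch below rebuilds the two relevant columns under the hypothetical name sortData and sorts by overUnder first, then by dollarAmount in descending order within each group.

```r
# sortData is a hypothetical name used only for this illustration
sortData <- data.frame(
  overUnder = rep(c("Bad", "Good"), times = c(6, 4)),
  dollarAmount = c(25, 60, 37, 57, 18, 36, 47, 37, 47, 80)
)

# first key: overUnder (ascending); second key: dollarAmount (descending)
sortedBoth <- sortData[order(sortData$overUnder, -sortData$dollarAmount), ]
sortedBoth
```

The minus sign only works for numeric keys; for other types, order() has a decreasing argument.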
Merging
testDataset2 <- data.frame(numbers = c(1), moreNumbers = c(11))
testDataset2
##   numbers moreNumbers
## 1       1          11
totalSet <- merge(testData, testDataset2) # merging two datasets; with no shared column names, merge() returns every combination of rows
totalSet
##     x  y sum product overUnder dollarAmount numbers moreNumbers
## 1   1 11  12      11       Bad           25       1          11
## 2   2 12  14      24       Bad           60       1          11
## 3   3 13  16      39       Bad           37       1          11
## 4   4 14  18      56       Bad           57       1          11
## 5   5 15  20      75       Bad           18       1          11
## 6   6 16  22      96       Bad           36       1          11
## 7   7 17  24     119      Good           47       1          11
## 8   8 18  26     144      Good           37       1          11
## 9   9 19  28     171      Good           47       1          11
## 10 10 20  30     200      Good           80       1          11
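The two data frames above share no column names, so every row of one is paired with every row of the other. More often the datasets share a key column, and merge() matches rows on it. The sketch below uses two small hypothetical data frames (orders and regions, with a shared id column) that are not part of the example above.

```r
# orders and regions are hypothetical data frames used only for this illustration
orders <- data.frame(id = c(1, 2, 3), dollarAmount = c(25, 60, 37))
regions <- data.frame(id = c(1, 3, 4), region = c("East", "West", "North"))

# keep only the ids that appear in both data frames
matched <- merge(orders, regions, by = "id")
matched

# keep every row of orders; region is NA where no match exists
allOrders <- merge(orders, regions, by = "id", all.x = TRUE)
allOrders
```

The by argument names the key explicitly, and all.x = TRUE keeps unmatched rows from the first data frame.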
Note: rbind() and cbind() can be used to add a row or column to a dataset. Information about these functions and the use of conditional operators can be found on the Basic Syntax page.
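As a brief illustration of the note above, the sketch below adds one row and then one column to a small hypothetical data frame named bindData. The values are made up for the example.

```r
# bindData is a hypothetical name used only for this illustration
bindData <- data.frame(x = 1:3, y = 11:13)

# rbind() appends rows; the new row must have the same column names
withRow <- rbind(bindData, data.frame(x = 4L, y = 14L))

# cbind() appends columns; the new column must have one value per row
withCol <- cbind(withRow, z = withRow$x + withRow$y)
withCol
```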
R has powerful indexing features for creating a subset of the data. The following code demonstrates how to select a row and how to delete a variable in a data frame.
totalSet[2,] # gets all of row 2
##   x  y sum product overUnder dollarAmount numbers moreNumbers
## 2 2 12  14      24       Bad           60       1          11
totalSet$x <- NULL # this excludes or removes the entire column
totalSet
##     y sum product overUnder dollarAmount numbers moreNumbers
## 1  11  12      11       Bad           25       1          11
## 2  12  14      24       Bad           60       1          11
## 3  13  16      39       Bad           37       1          11
## 4  14  18      56       Bad           57       1          11
## 5  15  20      75       Bad           18       1          11
## 6  16  22      96       Bad           36       1          11
## 7  17  24     119      Good           47       1          11
## 8  18  26     144      Good           37       1          11
## 9  19  28     171      Good           47       1          11
## 10 20  30     200      Good           80       1          11
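Indexing can also keep or drop several columns at once without modifying the original data frame. The sketch below uses a small hypothetical data frame named indexData; the column names mirror the example above, but the object itself is only for illustration.

```r
# indexData is a hypothetical name used only for this illustration
indexData <- data.frame(x = 1:3, y = 11:13, sum = c(12, 14, 16), product = c(11, 24, 39))

keepSome <- indexData[, c("x", "product")]                              # keep columns by name
dropSome <- indexData[, !(names(indexData) %in% c("sum", "product"))]   # drop columns by name
firstRows <- indexData[1:2, ]                                           # keep rows 1 and 2, all columns

names(keepSome)
names(dropSome)
```

Unlike assigning NULL, these forms return a new data frame and leave indexData unchanged.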
Sometimes you may want to subset the data by selecting only the rows that match certain values. Suppose, for instance, that you wanted only the rows in the above example where dollarAmount was equal to $25, $57, or $80. Then you could use the %in% operator as shown below.
totalSet[totalSet$dollarAmount %in% c(25, 57, 80), ]
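To see what %in% does on its own, it can help to store its result first: it returns one TRUE or FALSE per row, and that logical vector is what selects the rows. The sketch below repeats the idea on a hypothetical data frame named inData that holds the same dollarAmount values; negating the vector with ! selects the remaining rows instead.

```r
# inData is a hypothetical name used only for this illustration
inData <- data.frame(
  id = 1:10,
  dollarAmount = c(25, 60, 37, 57, 18, 36, 47, 37, 47, 80)
)

keep <- inData$dollarAmount %in% c(25, 57, 80) # TRUE where the value is in the set
keep

inData[keep, ]  # rows whose dollarAmount is 25, 57, or 80
inData[!keep, ] # all other rows
```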