Ordered Picking Value from 2nd Column
Introduction
In this article, we look at a dataset with two columns: the first holds colon-separated field names and the second holds the matching colon-separated values. The goal is to pick the value that belongs to the ‘AD’ field from the second column. The catch is that the order of the fields differs from row to row, so the position of ‘AD’ is not fixed. We will solve this in R.
Problem Description
The data has two columns, X1 and X2. X1 lists field names separated by colons, and X2 lists the corresponding values in the same order. The order of the fields differs from row to row, so ‘AD’ can appear at any position. Here’s an example:
X1 X2
GT:GQ:GQX:DPI:AD:DP 0/1:909:12:125:93,26:119
GT:GQ:GQX:DPI:AD 0/1:909:12:125:35,24
GT:GQ:GQX:DP:DPF:AD 0/1:57:3:11:130:8,3
GT:AD:DP:GQ:PL 0/1:211,31:242:99:138,0,7251
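For each row, we want the value in X2 that sits at the same position as ‘AD’ in X1. In the first row, ‘AD’ is the fifth field, so the value we want is 93,26.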
Solution
To solve this in R, we first split both columns on the colon separator with strsplit, which breaks a character string into substrings. We then use grep with the pattern “AD” to find the position of ‘AD’ among the field names in X1, and use that position to pick the matching substring from X2. Note that grep matches substrings, so this relies on ‘AD’ being the only field name that contains “AD”, which holds for the data shown here.
Here’s how it can be achieved:
mapply(`[`, strsplit(d$X2, ":"), sapply(strsplit(d$X1,":"), grep, pattern="AD"))
# [1] "93,26" "35,24" "8,3" "211,31"
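Because grep matches substrings, a row with a second field name containing "AD" (for example a key such as ADF, used here purely as an illustration) would yield two positions instead of one. The following is a minimal sketch of an exact-match variant, not part of the original solution. The helper name pick_field is hypothetical, and it swaps grep for match so that only a key equal to "AD" is selected and rows without the key give NA.

```r
# Example data, copied from the article's data frame `d`.
d <- data.frame(
  X1 = c("GT:GQ:GQX:DPI:AD:DP", "GT:GQ:GQX:DPI:AD",
         "GT:GQ:GQX:DP:DPF:AD", "GT:AD:DP:GQ:PL"),
  X2 = c("0/1:909:12:125:93,26:119", "0/1:909:12:125:35,24",
         "0/1:57:3:11:130:8,3", "0/1:211,31:242:99:138,0,7251"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the value whose key equals `field` exactly.
pick_field <- function(keys, values, field = "AD") {
  k <- strsplit(keys, ":", fixed = TRUE)
  v <- strsplit(values, ":", fixed = TRUE)
  # match() gives the position of the first exact match, or NA if absent
  idx <- vapply(k, function(x) match(field, x), integer(1))
  # Indexing with an NA position returns NA for that row
  mapply(function(vals, i) vals[i], v, idx)
}

ad <- pick_field(d$X1, d$X2)
print(ad)
# [1] "93,26"  "35,24"  "8,3"    "211,31"
```

On the article's data this returns the same result as the grep version. The difference only shows up when field names overlap or a row lacks the field.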
Code Explanation
The mapply function is a multivariate version of sapply. It applies a function to the first elements of each of its arguments, then to the second elements, and so on, and simplifies the result to a vector where possible.
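As a small illustration of how mapply walks over its arguments in parallel, the sketch below uses made-up toy vectors (not the article's data) and the same `[` function used in the solution.

```r
# Toy inputs, invented for illustration only
vals <- list(c("a", "b", "c"), c("x", "y"))
pos  <- c(3, 1)

# mapply calls `[`(vals[[1]], pos[1]), then `[`(vals[[2]], pos[2])
picked <- mapply(`[`, vals, pos)
print(picked)
# [1] "c" "x"
```

Each call pairs one element of vals with the element of pos at the same index, which is the behaviour the solution relies on.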
Here’s how it works in this case:
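strsplit(d$X2, ":") splits the values in the X2 column into substrings on the colon separator, giving one character vector per row.
sapply(strsplit(d$X1, ":"), grep, pattern="AD") splits the field names in X1 the same way and applies grep to each row’s names, returning the position of ‘AD’ in that row.
`[` is R’s subsetting function. mapply calls it with each row’s values and that row’s ‘AD’ position, which returns the value at that position.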
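The one-liner can also be unrolled into named intermediate steps, which makes each stage easy to inspect. This is a sketch of the same approach; the variable names keys, values, pos and result are chosen here for illustration.

```r
# Example data, copied from the article's data frame `d`.
d <- data.frame(
  X1 = c("GT:GQ:GQX:DPI:AD:DP", "GT:GQ:GQX:DPI:AD",
         "GT:GQ:GQX:DP:DPF:AD", "GT:AD:DP:GQ:PL"),
  X2 = c("0/1:909:12:125:93,26:119", "0/1:909:12:125:35,24",
         "0/1:57:3:11:130:8,3", "0/1:211,31:242:99:138,0,7251"),
  stringsAsFactors = FALSE
)

keys   <- strsplit(d$X1, ":")  # list of field-name vectors, one per row
values <- strsplit(d$X2, ":")  # list of value vectors, one per row

# Position of the field name matching "AD" in each row
pos <- sapply(keys, grep, pattern = "AD")
print(pos)
# [1] 5 5 6 2

# `[`(x, i) is the same as x[i]: take the value at that position
result <- mapply(`[`, values, pos)
print(result)
# [1] "93,26"  "35,24"  "8,3"    "211,31"
```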
Data
The code above uses a data frame d with two columns, X1 and X2. It can be created as follows:
d <- structure(list(X1 = c("GT:GQ:GQX:DPI:AD:DP", "GT:GQ:GQX:DPI:AD",
"GT:GQ:GQX:DP:DPF:AD", "GT:AD:DP:GQ:PL"), X2 = c("0/1:909:12:125:93,26:119",
"0/1:909:12:125:35,24", "0/1:57:3:11:130:8,3", "0/1:211,31:242:99:138,0,7251"
)), class = "data.frame", row.names = c(NA, -4L))
Conclusion
We have picked the value of the ‘AD’ field from the second column in each row, even though its position varies from row to row. The problem is simple to state, but the solution combines several R functions: strsplit to split the strings, grep to locate ‘AD’, sapply to apply grep to every row, and mapply to pick the matching value from each row.
If you have any questions or need further clarification, please ask.
Last modified on 2024-01-06