Picking a Value from the 2nd Column When Field Order Varies

Introduction

In this article, we look at a dataset with two columns of colon-separated strings: the first column lists field names and the second lists the corresponding values. We need to pick the value that belongs to the field ‘AD’ from the second column, but the position of ‘AD’ differs from row to row. We will solve this in R.

Problem Description

The data has two columns, X1 and X2. X1 holds colon-separated field names and X2 holds the matching values in the same order. The order and number of fields vary by row, so ‘AD’ does not sit at a fixed position. Here’s an example:

X1                      X2
GT:GQ:GQX:DPI:AD:DP     0/1:909:12:125:93,26:119
GT:GQ:GQX:DPI:AD        0/1:909:12:125:35,24
GT:GQ:GQX:DP:DPF:AD     0/1:57:3:11:130:8,3
GT:AD:DP:GQ:PL          0/1:211,31:242:99:138,0,7251

For these four rows, the values to extract are 93,26, then 35,24, then 8,3, and finally 211,31.

Solution

We first split each string on the colon with strsplit, which breaks a character string into substrings at a given separator and returns one character vector per row. Splitting X1 gives the field names for each row, and grep with the pattern "AD" returns the position of ‘AD’ among them. Splitting X2 gives the values for each row, from which we take the element at that position.

Here’s how it can be achieved, with d being the data frame defined in the Data section:

mapply(`[`, strsplit(d$X2, ":"), sapply(strsplit(d$X1,":"), grep, pattern="AD"))
# [1] "93,26"  "35,24"  "8,3"    "211,31"
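One caveat: grep treats "AD" as a regular expression, so it matches any field name that contains those letters. The sample data has no such field, but a name like "ADF" would also match. The following is a minimal sketch of an exact-match variant that uses match() instead of grep. The helper pick_field and the third row containing "ADF" are hypothetical and added only for illustration.

```r
# Small self-contained data set. The third row is hypothetical: it adds a
# field name ("ADF") that contains "AD" as a substring.
d <- data.frame(
  X1 = c("GT:GQ:GQX:DPI:AD:DP", "GT:AD:DP:GQ:PL", "GT:ADF:AD:DP"),
  X2 = c("0/1:909:12:125:93,26:119", "0/1:211,31:242:99:138,0,7251",
         "0/1:5,2:8,3:11"),
  stringsAsFactors = FALSE
)

keys <- strsplit(d$X1, ":", fixed = TRUE)  # field names per row
vals <- strsplit(d$X2, ":", fixed = TRUE)  # field values per row

# Hypothetical helper: match() compares whole strings, so "ADF" is not
# mistaken for "AD". A missing field yields NA.
pick_field <- function(k, v, field) v[match(field, k)]

ad <- mapply(pick_field, keys, vals, MoreArgs = list(field = "AD"))
ad
```

Using match() also guarantees exactly one position per row, which keeps the result a plain character vector.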

Code Explanation

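The mapply function is a multivariate version of sapply: it calls a function once per position, passing the corresponding elements of each of its arguments. Here the function is `[`, R’s subsetting operator, so each call takes one row’s vector of values and one position and returns the value at that position.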
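To make the parallel behaviour concrete, here is a small sketch on toy inputs. The variable names are illustrative and not part of the original solution.

```r
# mapply() walks its arguments in parallel: the i-th call receives
# the i-th element of every argument.
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
sums

# `[` is an ordinary function, so it can be passed to mapply() too.
# Each call extracts one element from one vector.
letters_list <- list(c("a", "b", "c"), c("d", "e", "f"))
positions <- c(3, 1)
picked <- mapply(`[`, letters_list, positions)
picked
```

The first call adds the vectors pairwise; the second takes the third element of the first vector and the first element of the second.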

Here’s how it works in this case:

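strsplit(d$X2, ":") splits each X2 string on the colon separator and returns a list with one character vector of values per row.
sapply(strsplit(d$X1,":"), grep, pattern="AD") splits each X1 string the same way and, for each row, returns the position of the field name that matches ‘AD’.
mapply then pairs each row’s values with that row’s position and extracts the element found there.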
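The same one-liner can be unpacked so that each intermediate result is visible. This is a sketch that rebuilds d with data.frame() so the block runs on its own; the names vals and idx are introduced here for illustration.

```r
d <- data.frame(
  X1 = c("GT:GQ:GQX:DPI:AD:DP", "GT:GQ:GQX:DPI:AD",
         "GT:GQ:GQX:DP:DPF:AD", "GT:AD:DP:GQ:PL"),
  X2 = c("0/1:909:12:125:93,26:119", "0/1:909:12:125:35,24",
         "0/1:57:3:11:130:8,3", "0/1:211,31:242:99:138,0,7251"),
  stringsAsFactors = FALSE
)

# Step 1: one character vector of values per row.
vals <- strsplit(d$X2, ":")

# Step 2: position of "AD" among the field names of each row.
idx <- sapply(strsplit(d$X1, ":"), grep, pattern = "AD")
idx
# [1] 5 5 6 2

# Step 3: take element idx[i] from vals[[i]] for every row i.
ad <- mapply(`[`, vals, idx)
ad
# [1] "93,26"  "35,24"  "8,3"    "211,31"
```

Printing idx before the final step is a quick way to check that every row produced exactly one position.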

Data

Let’s create a data frame d with two columns X1 and X2:

d <- structure(list(X1 = c("GT:GQ:GQX:DPI:AD:DP", "GT:GQ:GQX:DPI:AD", 
"GT:GQ:GQX:DP:DPF:AD", "GT:AD:DP:GQ:PL"), X2 = c("0/1:909:12:125:93,26:119", 
"0/1:909:12:125:35,24", "0/1:57:3:11:130:8,3", "0/1:211,31:242:99:138,0,7251"
)), class = "data.frame", row.names = c(NA, -4L))

Conclusion

We have picked the ‘AD’ value from the second column of every row, even though its position varies. The problem is simple, but the solution combines several base R functions: strsplit to split the strings, grep to locate ‘AD’ among the field names, sapply to apply grep to every row, and mapply to extract the matching value from each row.

If you have any questions or need further clarification, please ask.

Last modified on 2024-01-06