Extracting Probe Names from HTAFeatureSet Objects in R Using oligo Package

Working with HTAFeatureSet objects in R: Extracting Probe Names

As a technical blogger, I often encounter questions from readers who are working with bioinformatics data, particularly those using the oligo package in R. In this article, we will delve into how to extract probe names from an HTAFeatureSet object.

Introduction to HTAFeatureSet objects

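HTAFeatureSet is an S4 class used by the oligo package (and defined in its companion package oligoClasses) to hold probe-level data from Affymetrix Human Transcriptome Arrays (HTA). An object of this class is typically created by reading CEL files with oligo::read.celfiles(), and it stores the raw probe intensities together with the sample annotation. The oligo package provides a convenient interface for working with these objects.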

When you run Robust Multi-array Average (RMA) on an HTAFeatureSet object with oligo::rma(), the resulting ExpressionSet object holds a matrix with probe set names as rows and array names as columns. The goal in this article is a matrix with the same layout, rows labeled by probe name and columns by array name, built from the raw intensities, that is, before the expression data are normalized.
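As a rough illustration of why RMA output has fewer rows than the raw data, the sketch below applies median polish, the summarization step RMA is usually described as using, to a toy matrix in base R. The probe and array names and all values are made up, and oligo::rma() also background-corrects and quantile-normalizes before summarizing, so this is only a sketch of the idea and not a replacement for it.

```r
# Toy log2 intensities: 4 probes (rows) from one hypothetical probe set,
# measured on 3 arrays (columns). All values are invented.
log_pm <- matrix(c(8.1, 8.4, 9.0,
                   7.9, 8.2, 8.8,
                   8.5, 8.9, 9.3,
                   8.0, 8.3, 8.9),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

# Median polish splits the matrix into overall, row (probe) and
# column (array) effects plus residuals.
fit <- medpolish(log_pm, trace.iter = FALSE)

# One summarized value per array: overall level plus that array's effect.
probeset_expr <- fit$overall + fit$col
print(probeset_expr)
```

Four probe-level rows collapse into a single probe set value per array, which is why the normalized matrix is labeled by probe set rather than by individual probe.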

The Problem with HTAFeatureSet objects

Unfortunately, the intensity matrix stored in an HTAFeatureSet object is not labeled with probe names. Its columns carry the array names, but its rows are identified only by a numeric feature index. This can be frustrating when working with data from high-throughput arrays, as we often need to identify specific probes or genes of interest.

Solution: Extracting Probe Names

Fortunately, there is a solution to this problem. The oligo package contains a function called stArrayPmInfo() that extracts the probe mapping for the array, including the probe index and name. To my knowledge it is an internal helper rather than part of the exported interface, so it is called with three colons below, and its behavior may change between oligo releases.

Here’s an example code snippet:

probe_df <- oligo:::stArrayPmInfo(your_HTAFeatureSet, target = "core")

In this example, your_HTAFeatureSet is the HTAFeatureSet object holding your data. The target argument sets the summarization level; "core" requests the mapping for core probes.

Understanding the output

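The stArrayPmInfo() function returns a data frame (probe_df) with two columns, referred to here as Probe Index and Probe Name. The actual column names in your oligo version may differ, so check them with names(probe_df). The first column is the probe index, which identifies each probe's row in the raw intensity matrix, and the second is the name that probe is assigned to at the chosen target level.

For example (the values below are illustrative placeholders, not real array output):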

   Probe Index     Probe Name
1         10          GSE12345_a
2         20          GSE12345_b
3         30          GSE12345_c

In this output, the first row corresponds to a probe with index 10, which has name “GSE12345_a”.
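Before using the mapping, it helps to check its shape. The sketch below does this on a small stand-in data frame built in base R; the column names probe_index and probe_name and all values are hypothetical, chosen only to mirror the two-column layout described above. With a real probe_df you would substitute its actual column names.

```r
# Hypothetical stand-in for the mapping returned for an array.
probe_df_toy <- data.frame(
  probe_index = c(10L, 20L, 30L, 40L, 50L),
  probe_name  = c("setA", "setA", "setB", "setB", "setB"),
  stringsAsFactors = FALSE
)

# Column names and types
str(probe_df_toy)

# How many probes map to each name
probes_per_name <- table(probe_df_toy$probe_name)
print(probes_per_name)

# Each probe index should appear only once in the mapping
has_duplicate_index <- anyDuplicated(probe_df_toy$probe_index) > 0
print(has_duplicate_index)
```

In this toy mapping several probes share one name, so the names alone are not unique row identifiers; that matters when they are later used as row names.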

Using the extracted data

Once we have extracted the probe names using the stArrayPmInfo() function, we can use them to create our desired matrix. Here’s an example:

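probe_matrix <- Biobase::exprs(your_HTAFeatureSet)[probe_df[[1]], , drop = FALSE]
rownames(probe_matrix) <- probe_df[[2]]

In this code snippet, exprs() retrieves the intensity matrix from your_HTAFeatureSet, and indexing it with the first column of probe_df keeps only the rows for the mapped probes, in the same order as the mapping. The second line assigns the second column of probe_df as row names, so each row of probe_matrix is labeled with its probe name while the columns keep the array names. The columns of probe_df are selected by position because their exact names may vary; this assumes the first column holds indices that are valid row positions in the intensity matrix.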
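The labeling step can be tried without any array data. The sketch below mimics it in base R: raw_toy stands in for the raw intensity matrix and map_toy for the probe mapping. Both objects, their column names, and all values are hypothetical and only illustrate the row selection and renaming.

```r
# Stand-in for the raw intensity matrix: 6 features x 2 arrays,
# with rows identified only by their numeric index.
raw_toy <- matrix(c(100, 110,
                    200, 210,
                    300, 310,
                    400, 410,
                    500, 510,
                    600, 610),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(as.character(1:6), c("array1", "array2")))

# Stand-in for the probe mapping: which rows to keep and what to call them.
map_toy <- data.frame(index = c(2L, 4L, 5L),
                      name  = c("setA", "setA", "setB"),
                      stringsAsFactors = FALSE)

# Select the mapped rows by position, keeping matrix shape.
probe_matrix_toy <- raw_toy[map_toy$index, , drop = FALSE]

# Label each selected row with its name from the mapping.
rownames(probe_matrix_toy) <- map_toy$name
print(probe_matrix_toy)
```

Rows are selected by subsetting rather than by reshaping the data with matrix(), because reshaping would only rearrange the values and would not pick out the mapped probes. R matrices allow repeated row names, so several probes can carry the same name, but name-based lookups such as probe_matrix_toy["setA", ] then return only the first match.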

Conclusion

In this article, we explored how to extract probe names from an HTAFeatureSet object in R using the oligo package. We discussed the limitations of the HTAFeatureSet object itself and introduced a function called stArrayPmInfo() that provides information about the array probe mapping.

We also demonstrated how to use the extracted mapping to build a matrix of raw intensities with probe names as rows and array names as columns, which can be used for analysis before the expression data are normalized. By following these steps, you should be able to work with HTAFeatureSet objects more effectively and extract valuable information from your high-throughput array data.


Last modified on 2023-08-27