[ reference nth column without referencing the column name ]

I have a list of 1000 dfs, each of which has the same first 9 column headers, but the 10th column is the sample name, which is different for all 1000 dfs. I am trying to delete the rows with 0 in the 10th column, but I'm not sure what to put for the column name. Using df$V10 isn't giving me the desired results, and I can't use the actual column header name because it is different for every df.

This is what I am using:

> names(t[[2]])
 [1] "CHROM"        "POS"          "ID"           "REF"          "ALT"          "QUAL"        
 [7] "FILTER"       "INFO"         "FORMAT"       "s_SRR1198016"

> names(t[[3]])
 [1] "CHROM"        "POS"          "ID"           "REF"          "ALT"          "QUAL"        
 [7] "FILTER"       "INFO"         "FORMAT"       "s_SRR1267825"

> t0 <- lapply(t, function(x) x[!(x$V10==0),])

And the result:

> head(t0[[1]])
 [1] CHROM        POS          ID           REF          ALT          QUAL         FILTER      
 [8] INFO         FORMAT       s_SRR1198015
<0 rows> (or 0-length row.names)

But I know that there are non-zero entries in the 10th column. Any suggestions for this R novice?

Answer 1

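Rows and columns can be subset with a numeric, logical, or character (name) index. x$V10 looks for a column literally named "V10"; no such column exists, so it returns NULL, the comparison with 0 yields a zero-length logical vector, and no rows are selected. Because the data.frames in the OP's list have different names for the 10th column, we can use the numeric index for that column instead.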

 lapply(t, function(x) x[x[,10]!=0,])
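
As a rough illustration of the same idea, here is a small self-contained sketch. The toy data, the sample names s_A and s_B, and the helper keep_nonzero are all hypothetical and only stand in for the real list; the toy frames have 3 columns, so the position is passed explicitly. Wrapping the comparison in which() is an optional extra that drops rows where the value is NA, which a plain logical index would return as all-NA rows.

```r
# Toy stand-ins for the real list of data frames (hypothetical data).
dfs <- list(
  data.frame(CHROM = c("1", "1", "2"), POS = c(10, 20, 30), s_A = c(0, 3, 5)),
  data.frame(CHROM = c("1", "2", "2"), POS = c(15, 25, 35), s_B = c(2, 0, NA))
)

# There is no column named "V10", so $V10 returns NULL.
is.null(dfs[[1]]$V10)

# Hypothetical helper: keep rows whose value in column `col` is non-zero.
# The column is selected by position, so its name does not matter.
# which() turns the logical test into row numbers and skips NA results.
keep_nonzero <- function(x, col = 10) {
  x[which(x[[col]] != 0), , drop = FALSE]
}

# The toy frames have only 3 columns, so filter on column 3 here.
filtered <- lapply(dfs, keep_nonzero, col = 3)
```

For the list in the question, the call would presumably be lapply(t, keep_nonzero), relying on the default col = 10.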