Tiling Array Analysis

From Shiu Lab
Jump to: navigation, search

===Prerequisites=== Generation of ratios of signal intensities between two polypoids (or similar data), installation of tiling array package and its requisites.

===Goal=== By looking at the difference in signal intensities (between two polypoids), we wish to find regions that exhibit similar behavior (either gene retention or loss)

===First step=== We give the data generated in the previous phase (Generation of ratios) to the segment function of tilingArray. The output of the segment function will be breakpoints for the segments that exist in the given data that will maximize the likelihood with the Schwartz Criterion. The following is the code that we have used. In this case, the maximum number of segments is set to 1,000, and the maximum size of each segment is set to 500. The work has also been divided up into intervals of length 10,000. To lessen the effect of artificial breakpoints (at every 10,000), the function is also run on a series of data that has been slid by a length of 5,000 units (i.e. 5,001-15,000; 15,001-25,000; etc.). Also, borderline cases have been appended so that each series will cover the entire length of the provided data. So the first series consists of intervals (1-10,000; 10,001-20,001; ... ; 3,030,001-3,039,991) while the second series consists of intervals (5,001-15,000; 15,001-25,000; ...; 3,035,001-3,039,991).

The specified cutoffs for the number of segments, the length of the intervals, and the maximum size of each segment are not completely arbitrary, as this given configuration will utilize much of the RAM available in the machines that have been used.

library("tilingArray")
maxseg = 1000
maxk = 500
bks = 0
bks1 = 0
bks2 = 0
BIC1 = 0
BIC2 = 0

## Check for inclusion/exclusion on the endpoints !!!!!! 

## Start cases 

zz <- ratioz[1:5000,]
zz[is.nan(zz)] <- 0 

s1 = segment(zz, maxk = maxk, maxseg = maxseg)
BI = which.max(logLik(s1, penalty = "BIC"))
BIC2 <- c(BIC2, BI )
bks2 <- c(bks2, (s1@breakpoints[[BI]] ))

for(i in 1:303)
{
zz<-ratioz[(1+10000*(i-1)):(10000*i),]
zz[is.nan(zz)] <- 0
s1 = segment(zz, maxk = maxk, maxseg = maxseg)
BIC1 <- c(BIC1, which.max(logLik(s1, penalty = "BIC")) )
bks1 <- c(bks1, (s1@breakpoints[[which.max(logLik(s1, penalty = "BIC")) ]] + 10000*(i-1) ))

zz<-ratioz[(5001+10000*(i-1)):(10000*i+5000),]
zz[is.nan(zz)] <- 0
s1 = segment(zz, maxk = maxk, maxseg = maxseg)
BIC2 <- c(BIC2, which.max(logLik(s1, penalty = "BIC")) )
bks2 <- c(bks2, (s1@breakpoints[[which.max(logLik(s1, penalty = "BIC")) ]] + 5000 + 10000*(i-1) ))

}
## End cases

zz<-ratioz[(1+10000*(303)):(3039991),]
zz[is.nan(zz)] <- 0
s1 = segment(zz, maxk = maxk, maxseg = maxseg)
bks1 <- c(bks1, (s1@breakpoints[[which.max(logLik(s1, penalty = "BIC"))]] + 10000*(303) ))
BIC1 <- c(BIC1, which.max(logLik(s1, penalty = "BIC")) )

zz <- ratioz[3035001:3039991,]
zz[is.nan(zz)] <- 0
s1 = segment(zz, maxk = maxk, maxseg = maxseg)
bks2 <- c(bks2, (s1@breakpoints[[which.max(logLik(s1, penalty = "BIC"))]] + 5000 + 10000*(303) ))
BIC2 <- c(BIC2, which.max(logLik(s1, penalty = "BIC")) )

===Second step=== We compare the endpoints that result from the original data with the endpoints that result from the data that has been slided 5,000 units. We keep the endpoints that occur in both data sets, and throw away the rest.

===Results=== We generated a table that incorporated the segments data with the tiling array information. The tiling array analysis has currently been performed on two sets of data. They will be available soon.

Defects

A major defect came in the way of hardware limitations, as we were forced into choosing a configuration of parameters that did not cause memory overflow. Also, we noticed that the segment function only gave us valid number of segments when a multiple-column data was provided - so instead of providing it with a data consisting of one-column averages, we gave it a four-column data with each column corresponding to an individual difference.

References

Tiling Array Documentation [1]