Scripp Image Analysis Paper

out at the Scripps Genomics Core). Analysis of the samples (~40 million reads per sample replicate) was performed using the Tuxedo protocol, which comprises Bowtie, TopHat and the Cufflinks suite. Preprocessing involved trimming reads on the basis of sequence quality using the FastX-Toolkit [64]: bases with low quality scores were trimmed, and RNA-Seq analysis was performed on the trimmed samples. TopHat was used for alignment to the Mus musculus genome build NCBIM37 [65]. Cufflinks was run on the aligned data to obtain RPKM values, and these values were used to calculate the fold change between the HPC and PFC. Cufflinks measures transcript abundances in fragments per kilobase of exon per million fragments mapped (FPKM), which for single-end reads is equivalent to RPKM. For single-end reads, as in this case, Cufflinks uses a Gaussian distribution to estimate the fragment length distribution.
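The abundance unit and the fold-change step described above can be illustrated with a minimal Python sketch. It is not the Cufflinks implementation, which estimates abundances with a statistical model. The function names, the pseudocount parameter and the example numbers are all hypothetical and serve only to show the arithmetic.

```python
import math


def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    For single-end reads each fragment is one read, so this equals RPKM.
    """
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


def log2_fold_change(value_a, value_b, pseudocount=1.0):
    """log2 ratio of two abundances.

    The pseudocount is an assumption of this sketch (not taken from the
    source); it avoids division by zero for unexpressed transcripts.
    """
    return math.log2((value_a + pseudocount) / (value_b + pseudocount))


# Hypothetical transcript: 2 kb of exon, 40 million mapped reads per library.
hpc = fpkm(fragments=800, exon_length_bp=2_000, total_mapped_fragments=40_000_000)
pfc = fpkm(fragments=200, exon_length_bp=2_000, total_mapped_fragments=40_000_000)
print(hpc, pfc, log2_fold_change(hpc, pfc, pseudocount=0.0))
```

With these made-up inputs the transcript has a fourfold higher abundance in HPC than in PFC, that is, a log2 fold change of 2 when no pseudocount is applied.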

These data are used in DESeq [66], an R Bioconductor package, to identify genes differentially expressed between HPC and PFC. DESeq provides various statistical tests for determining differentially expressed genes in gene expression data [67]. The inputs for DESeq are raw counts obtained from HTSeq. DESeq takes into account the total size of each library when calculating fold change and significance (p-value and adjusted p-value). The transcript biotypes were obtained from the Ensembl GTF annotation file (Mus musculus genome build NCBIM37). Using the annotation file, we identified 34,379 transcripts from HPC and 32,909 transcripts from PFC. BLAST analysis of this dataset against the EMBL database containing 2,057 lncRNAs led to the identification of 1,982 lncRNAs from HPC and 1,936 lncRNAs from PFC (Fig 1, from Kadakkuzha et al., submitted to Genome
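The library-size adjustment mentioned above can be sketched in Python. This is an illustration of the median-of-ratios idea behind DESeq's size-factor estimation, not the R package itself; DESeq additionally models dispersion and runs a negative binomial test, which are omitted here. The function names and the toy count matrix are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped so that the
    geometric mean across samples is defined.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy matrix: the second library is sequenced twice as deeply as the first.
raw = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(raw)
print(factors)
print(normalize(raw, factors)[0])
```

In this toy case the second sample's size factor is twice the first, so the first three genes have equal normalized counts in both samples and would show no fold change.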
