ProbeAlign is a program for annotating ncRNA structures by incorporating structure probing information from high-throughput sequencing.
An Example Output of ProbeAlign
*ProbeAlign_Mark 32344,32429 110.921 6.72681e-11
(((((((| ((( | ||||| ))) (((((|| ||))))) | |(((((||| | |))))))))))))|
- ProbeAlign_Mark: Sequence ID of the target (as given in the FASTA file).
- 32344,32429: Start and end positions of the subject sequence in the target.
- 110.921: Score.
- 6.72681e-11: P-value.
- line 2: Structure consensus of the query.
- line 3: Sequence consensus of the query. An uppercase letter means the nucleotide has frequency >= 50%; a lowercase letter means the nucleotide has frequency < 50%.
- line 4: The matches between the query and the subject.
- line 5: The subject sequence.
- line 6: The (0,1)-form of the reactivities for the subject. When the alignment uses the simplified structure similarity function, 1 means reactivity >= cutoff and 0 means reactivity < cutoff. When the alignment uses the protocol-specific structure similarity function, 1 means unpaired probability >= paired probability and 0 means unpaired probability < paired probability. '=' means the reactivity for the base is undefined.
- The dataset for benchmarking ProbeAlign and CMsearch is available here.
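To make the output fields above concrete, here is a minimal Python sketch that parses a hit header line and produces the (0,1)-form for both structure similarity functions. It is an illustration, not ProbeAlign's own code: the names `HitHeader`, `parse_header`, `encode_simplified` and `encode_protocol_specific` are hypothetical, and it assumes that the leading '*' in the header is a record marker rather than part of the sequence ID, and that an undefined reactivity is represented as `None`.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class HitHeader(NamedTuple):
    target_id: str   # sequence ID of the target in the FASTA file
    start: int       # start position of the subject in the target
    end: int         # end position of the subject in the target
    score: float
    p_value: float


def parse_header(line: str) -> HitHeader:
    """Parse a header such as '*ProbeAlign_Mark 32344,32429 110.921 6.72681e-11'.

    Assumption: the leading '*' marks a record and is not part of the ID.
    """
    name, span, score, p_value = line.split()
    start, end = (int(x) for x in span.split(","))
    return HitHeader(name.lstrip("*"), start, end, float(score), float(p_value))


def encode_simplified(reactivities: Sequence[Optional[float]], cutoff: float) -> str:
    """(0,1)-form for the simplified structure similarity function.

    1 if reactivity >= cutoff, 0 otherwise; None (undefined) becomes '='.
    """
    return "".join(
        "=" if r is None else ("1" if r >= cutoff else "0") for r in reactivities
    )


def encode_protocol_specific(probs: Sequence[Optional[Tuple[float, float]]]) -> str:
    """(0,1)-form for the protocol-specific structure similarity function.

    Each entry is (unpaired_prob, paired_prob), or None if undefined.
    1 if unpaired >= paired, 0 otherwise.
    """
    return "".join(
        "=" if p is None else ("1" if p[0] >= p[1] else "0") for p in probs
    )


if __name__ == "__main__":
    print(parse_header("*ProbeAlign_Mark 32344,32429 110.921 6.72681e-11"))
    print(encode_simplified([0.1, None, 0.9, 0.5], cutoff=0.5))  # 0=11
    print(encode_protocol_specific([(0.7, 0.3), (0.2, 0.8), None]))  # 10=
```

Note that a reactivity exactly equal to the cutoff, or equal unpaired and paired probabilities, encodes as 1, matching the ">=" in the description above.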
Preprocessing of FragSeq Data
- Retrieve the two read-mapping files, one for undiff and one for d5np, from GEO.
- Find all the starting positions of the reads in the mouse genome (mm9). Detect candidate reactive regions by extending each of these reactive points by 300 bp in both the 5' and 3' directions; overlapping regions are merged.
- Use FragSeq_v0.0.1 to compute the reactivities in each region. Regions with more than 3 reactivities are kept for the homology search.
- To compute the p-values, the 10 longest reactive regions from both cell types are used to estimate the Gamma distribution parameters for each family. Because FragSeq reactivities are sparse, only the largest score within each window of query length is used in the fitting.
- Search hits that fall in repeat regions (as annotated in Repbase) are removed.
- The profiles can be downloaded here. The targets and their reactivities can be downloaded here.
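The region-level steps of the preprocessing (extending read starts, merging overlaps, keeping regions with enough reactivities, and dropping hits in repeats) are ordinary interval operations. The Python sketch below illustrates them under stated assumptions; it is not the pipeline that was actually used. The function names are hypothetical, coordinates are treated as closed intervals on a single chromosome, "more than 3 reactivities" is read as more than 3 positions with a defined reactivity, and the repeat intervals stand in for a Repbase-derived annotation.

```python
from bisect import bisect_left, bisect_right
from typing import Iterable, List, Sequence, Tuple

Interval = Tuple[int, int]  # closed interval [start, end] on one chromosome


def reactive_regions(points: Iterable[int], flank: int = 300) -> List[Interval]:
    """Extend each read start by `flank` bp on both sides and merge overlaps."""
    intervals = sorted((max(0, p - flank), p + flank) for p in points)
    merged: List[Interval] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def keep_dense_regions(
    regions: Sequence[Interval], reactive_positions: Iterable[int], min_count: int = 3
) -> List[Interval]:
    """Keep regions containing more than `min_count` positions with a reactivity."""
    positions = sorted(reactive_positions)
    return [
        (s, e)
        for s, e in regions
        # Binary search counts the sorted positions falling inside [s, e].
        if bisect_right(positions, e) - bisect_left(positions, s) > min_count
    ]


def remove_repeat_hits(
    hits: Sequence[Interval], repeats: Sequence[Interval]
) -> List[Interval]:
    """Drop hits that overlap any repeat interval."""
    return [
        (h_start, h_end)
        for h_start, h_end in hits
        if not any(h_start <= r_end and r_start <= h_end for r_start, r_end in repeats)
    ]


if __name__ == "__main__":
    regions = reactive_regions([1000, 1200, 5000])
    print(regions)  # [(700, 1500), (4700, 5300)]
```

Sorting the intervals first lets a single pass merge all overlaps, and sorting the reactive positions once lets each region's count be found by binary search rather than a full scan.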
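The p-value step fits a Gamma distribution to the largest score in each query-length window. The sketch below assumes the windows are consecutive and non-overlapping, and it uses a method-of-moments fit (mean = k·theta, variance = k·theta²) as a simple stand-in. ProbeAlign's actual fitting procedure is not described here and may differ, for example by using maximum likelihood. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def window_maxima(scores: Sequence[float], query_len: int) -> List[float]:
    """Largest score in each consecutive, non-overlapping window of query length.

    Assumption: windows tile the scanned region; the last one may be shorter.
    """
    return [max(scores[i:i + query_len]) for i in range(0, len(scores), query_len)]


def fit_gamma_moments(samples: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments Gamma fit; returns (shape k, scale theta).

    Solves mean = k * theta and variance = k * theta**2.
    """
    xs = [float(x) for x in samples]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)  # population variance
    if mean <= 0 or var == 0:
        raise ValueError("need positive mean and non-zero variance")
    return mean * mean / var, var / mean


if __name__ == "__main__":
    maxima = window_maxima([1, 5, 2, 3, 3, 9, 4], query_len=3)
    print(maxima)  # [5, 9, 4]
    k, theta = fit_gamma_moments(maxima)
    print(k, theta)
```

Assuming the p-value is the upper tail of the fitted distribution, it could then be computed with SciPy as `scipy.stats.gamma.sf(score, k, scale=theta)`.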
Scan Results for FragSeq Data
- The predicted RNA sequences are available here.
- The corresponding alignments are available here.
Please contact Shaojie Zhang if you have any suggestions.