dynamic positional Burrows-Wheeler transform (d-PBWT)

Ahsan Sanaullah, Degui Zhi, and Shaojie Zhang

d-PBWT

The following benchmarking code implements the d-PBWT. Two programs are provided. The first, dpbwt_fullindel.cpp, inserts all haplotypes in the input VCF into a d-PBWT in random order, then deletes haplotypes at random until a user-specified number remain in the d-PBWT. Benchmarks for all operations are written to the output file.

The second, dpbwt_update.cpp, inserts all haplotypes in the input VCF into a d-PBWT in random order, except for a user-specified number of them. It then performs that user-specified number of insertions and deletions on the d-PBWT and outputs the benchmarks of these operations. The benchmarked insertions and deletions are interspersed randomly. The inserted haplotypes are haplotypes from the input VCF that have not yet been inserted into the d-PBWT. The deleted haplotypes are chosen randomly from those in the d-PBWT.
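The random interleaving of benchmarked operations described above can be sketched as follows. This is only an illustration, not the code in dpbwt_update.cpp: the names Op and makeSchedule and the seed parameter are hypothetical, and the sketch assumes the user-specified number n means n insertions and n deletions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical operation tag; the values mirror the output field
// (insertion = 1, deletion = 0).
enum class Op { Deletion = 0, Insertion = 1 };

// Build a schedule of n insertions and n deletions in random order.
// Assumption: n of each operation; the seed makes the order repeatable
// on a given standard library implementation.
std::vector<Op> makeSchedule(std::size_t n, unsigned seed) {
    std::vector<Op> schedule(n, Op::Insertion);
    schedule.insert(schedule.end(), n, Op::Deletion);
    std::mt19937 rng(seed);
    std::shuffle(schedule.begin(), schedule.end(), rng);
    return schedule;
}
```

A driver would walk the schedule and, for each entry, either insert the next not-yet-inserted haplotype or delete a randomly chosen haplotype from the d-PBWT, timing each operation.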

The output of these programs has the following fields: real time taken per operation (seconds), CPU time taken per operation (seconds), number of u and v pointers updated per operation (this does not count a haplotype's own u and v pointers), and operation type (1 for insertion, 0 for deletion). The fields are delimited by tabs.
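For downstream analysis, one line of this output could be read as in the sketch below. The names BenchRecord, parseBenchLine, and meanRealSeconds are hypothetical and not part of the benchmark programs. The sketch assumes each line holds exactly the four fields in the order listed above, with no header row.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One row of the benchmark output (assumed field order, see above).
struct BenchRecord {
    double realSeconds;         // real time taken by the operation
    double cpuSeconds;          // CPU time taken by the operation
    long long pointersUpdated;  // u and v pointers updated
    bool isInsertion;           // true for 1 (insertion), false for 0 (deletion)
};

// Parse one tab-delimited line; returns nullopt if the line is malformed.
// Stream extraction skips whitespace, so tabs act as separators.
std::optional<BenchRecord> parseBenchLine(const std::string& line) {
    std::istringstream in(line);
    BenchRecord r{};
    int op = -1;
    if (!(in >> r.realSeconds >> r.cpuSeconds >> r.pointersUpdated >> op)) {
        return std::nullopt;
    }
    if (op != 0 && op != 1) {
        return std::nullopt;  // last field must be 0 or 1
    }
    r.isInsertion = (op == 1);
    return r;
}

// Mean real time per operation over a non-empty set of records.
double meanRealSeconds(const std::vector<BenchRecord>& records) {
    assert(!records.empty());
    double sum = 0.0;
    for (const BenchRecord& r : records) sum += r.realSeconds;
    return sum / static_cast<double>(records.size());
}
```

Reading the file line by line with std::getline and passing each line to parseBenchLine would collect the records, which can then be split by isInsertion to compare insertion and deletion costs.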

Compile with -std=c++17 or higher. The following command may be used:
g++ -O3 -std=c++17 -o exeinx.exe inx.cpp

Where inx is in1 or in2.

After compilation, you can run the program on the sample vcfs using the following commands:
./exein1.exe -i example.vcf -o example1Time.txt -n 1
./exein2.exe -i example.vcf -o example2Time.txt -n 4

Long Match Query

The following benchmarking code implements the PBWT and d-PBWT and performs the single-sweep or triple-sweep long match query. Compile with -std=c++17 or higher. With g++, the following command may be used:
g++ -O3 -std=c++17 -o exelmq_xswp_xpbwt.exe lmq_xswp_xpbwt.cpp

Where xswp is 3swp or 1swp and xpbwt is dpbwt or pbwt.

After compilation, you can run the program on the sample vcfs using the following command:
./exelmq_xswp_xpbwt.exe -i example.vcf -q query_example.vcf -m -L 3

To generate a random order and use only one input file, with the last two haplotypes as query haplotypes:
./exelmq_xswp_xpbwt.exe -i example.vcf -n 2 -L 3 -g order.txt

To use the previously generated order, with the last 5 haplotypes in the shuffled order as query haplotypes:
./exelmq_xswp_xpbwt.exe -i example.vcf -n 5 -L 3 -r order.txt
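
The idea behind the last two commands, shuffling the haplotypes and treating the last few in the shuffled order as queries, can be sketched as below. This is an illustration under assumptions, not the program's own code: makeRandomOrder, splitPanelAndQueries, and the seed parameter are hypothetical names, and the format of order.txt is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Return a random permutation of the haplotype indices 0..n-1.
// The seed parameter is an assumption made for repeatability.
std::vector<std::size_t> makeRandomOrder(std::size_t n, unsigned seed) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Split a shuffled order into panel haplotypes (all but the last k)
// and query haplotypes (the last k).
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
splitPanelAndQueries(const std::vector<std::size_t>& order, std::size_t k) {
    assert(k <= order.size());
    auto mid = order.end() - static_cast<std::ptrdiff_t>(k);
    return {std::vector<std::size_t>(order.begin(), mid),
            std::vector<std::size_t>(mid, order.end())};
}
```

Because the split depends only on the order, saving that order once makes it possible to rerun with a different number of query haplotypes drawn from the same shuffle, as in the second command above.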