TESTFILES_README.txt
12/4/2012
Beth Martin
martin91@uw.edu

(I am writing these instructions with the assumption that this testfiles folder is in the same directory where you copied all the scripts/reference folders, and that you are running all the commands from within this testfiles directory.)

Make sure you are running the latest version of python with:

module load python/latest

Start with the bedfile: testcoordinates.bed

It looks like:

chr16 57771150 57771200
chr16 57775593 57775734
chr16 57778300 57778428
chr16 57784733 57784844
chr16 57785167 57785219
chr16 57785547 57785641
chr16 57785846 57785972
chr16 57786436 57786518
chr16 57786684 57786845
chr16 57786983 57787184
chr16 57787295 57787436
chr16 57787851 57787912
chr16 57788836 57788914
chr16 57789025 57789155
chr16 57789251 57789411
chr16 57789747 57789834
chr16 57789898 57789983
chr16 57790262 57790389
chr16 57790720 57790863

The columns are chromosome/targetstart/targetstop - there are NO HEADERS. You DO need to have the "chr" in the chromosome name in order for it to work. This is for the gene KATNB1, with 5bp flanking the coding regions.

Now you will run the shell creation script. See the main README for an explanation of all the options.

format:

python /makeRunMipDesignShellScript2.py testcoordinates.bed 112 112 1 KATNB1 /hg19 /fasta_list_hg19 y /snp132.txt 0 > KATNB1.sh

NOTE: the fasta_list_hg19 will have to be customized to YOUR paths to your fasta files in your genome directory. Using it as is from our server will not work.

actual command line for this test:

python ../makeRunMipDesignShellScript2.py testcoordinates.bed 112 112 1 KATNB1 .. ../hg19 ../fasta_list_hg19 y ../snp132.txt 0 > KATNB1.sh

This will run instantly and you will get 2 files:

testcoordinates.bed.scan112.all_mips.copy_counts.ranked.ranked_list.txt
KATNB1.sh = the shell script

I would review your shell script and just double check that your paths came out okay. Be careful not to add line breaks into the long command lines, though.
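
Since the run that follows takes hours, it may be worth re-checking the bedfile first against the requirements given earlier (three columns, no header line, "chr" in the chromosome name). The following is a minimal Python sketch of such a check. It is not part of the MIP design scripts: the function name check_bed_lines is made up for illustration, it accepts any whitespace between columns, and the test that start is smaller than stop is an added assumption rather than something these instructions state.

```python
def check_bed_lines(lines):
    """Return a list of (line_number, message) problems found in BED lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped, not reported
        fields = text.split()
        if len(fields) != 3:
            problems.append((number, "expected 3 columns, found %d" % len(fields)))
            continue
        chrom, start, stop = fields
        if not chrom.startswith("chr"):
            problems.append((number, "chromosome name lacks the 'chr' prefix"))
        if not (start.isdigit() and stop.isdigit()):
            problems.append((number, "start/stop are not integers (is this a header line?)"))
        elif int(start) >= int(stop):
            # Assumption of this sketch: a target must have start < stop.
            problems.append((number, "start is not smaller than stop"))
    return problems


# Two lines copied from testcoordinates.bed, followed by a header-style line
# of the kind the instructions say must NOT be present.
example = [
    "chr16 57771150 57771200",
    "chr16 57775593 57775734",
    "chrom start stop",
]
for number, message in check_bed_lines(example):
    print("line %d: %s" % (number, message))
```

In this example only the third line is reported. To check the real file, pass an open file handle instead of the list, for example check_bed_lines(open("testcoordinates.bed")).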

chmod 777 KATNB1.sh
to make the shell script executable

./KATNB1.sh
to run it

This is going to take a while to run: at least two hours, depending on how many regions you are designing for and on your system.

You will get a bunch of files:

KATNB1_16_24.design_all_mips_joe2.se
KATNB1_16_24.design_all_mips_joe2.so
KATNB1_17_23.design_all_mips_joe2.se
KATNB1_17_23.design_all_mips_joe2.so
KATNB1_18_22.design_all_mips_joe2.se
KATNB1_18_22.design_all_mips_joe2.so
KATNB1_19_21.design_all_mips_joe2.se
KATNB1_19_21.design_all_mips_joe2.so
KATNB1_20_20.design_all_mips_joe2.se
KATNB1_20_20.design_all_mips_joe2.so

These files are the lists of every mip you could possibly have:

testcoordinates.bed.ext16.lig24.scan112.all_mips
testcoordinates.bed.ext17.lig23.scan112.all_mips
testcoordinates.bed.ext18.lig22.scan112.all_mips
testcoordinates.bed.ext19.lig21.scan112.all_mips
testcoordinates.bed.ext20.lig20.scan112.all_mips
Chr_snps_size.txt

Next, the second batch of commands in the shell script counts the number of times the mip arms show up elsewhere in the genome (you really don't want to capture repetitive regions). This is where people run into problems if the compiled binary file genome_compare doesn't work with their system. There is an uncompiled version in the genome_compare_distrib folder. (This is the slowest part of the shell script.)

This step produces:

KATNB1_16_24.copy_counts.se
KATNB1_16_24.copy_counts.so
KATNB1_17_23.copy_counts.se
KATNB1_17_23.copy_counts.so
KATNB1_18_22.copy_counts.se
KATNB1_18_22.copy_counts.so
KATNB1_19_21.copy_counts.se
KATNB1_19_21.copy_counts.so
KATNB1_20_20.copy_counts.se
KATNB1_20_20.copy_counts.so
testcoordinates.bed.ext16.lig24.scan112.all_mips.copy_counts
testcoordinates.bed.ext17.lig23.scan112.all_mips.copy_counts
testcoordinates.bed.ext18.lig22.scan112.all_mips.copy_counts
testcoordinates.bed.ext19.lig21.scan112.all_mips.copy_counts
testcoordinates.bed.ext20.lig20.scan112.all_mips.copy_counts

The third batch of commands is going to score all the mips, and you will get:

testcoordinates.bed.ext16.lig24.scan112.all_mips.copy_counts.ranked
testcoordinates.bed.ext17.lig23.scan112.all_mips.copy_counts.ranked
testcoordinates.bed.ext18.lig22.scan112.all_mips.copy_counts.ranked
testcoordinates.bed.ext19.lig21.scan112.all_mips.copy_counts.ranked
testcoordinates.bed.ext20.lig20.scan112.all_mips.copy_counts.ranked

(This part goes really fast.)

Next, the last line of the shell script will go through all the ranked files and put together a select list of MIPs:

testcoordinates.bed.picked_mip_probe_arms.testcoordinates.bed.scan112.all_mips.copy_counts.ranked.ranked_list.txt

(There are a bunch of files with "apply_scoring_matrix_in_place4" in the filename and they will all be empty. Just ignore them.)

Open up this ranked list file in an editor or import it into Excel.

You'll see fields for:

mip_pick_count = mip number
rank_score = either 5 (great!), 3 (okay), or -1 (bad, but you have no better options)
chr = chromosome
ext_probe_start = extension arm start in reference sequence
ext_probe_stop = extension arm stop in reference sequence
ext_probe_sequence = extension arm sequence
ext_copy_count = how many times you see this sequence in reference
lig_probe_start = ligation arm start in reference sequence
lig_probe_stop = ligation arm stop in reference sequence
lig_probe_sequence = ligation arm sequence
lig_copy_count = how many times you see this sequence in reference
mip_target_start_position = the start of the sequence that this particular MIP is capturing in reference
mip_target_stop_position = the stop of the sequence that this particular MIP is capturing in reference
mip_target_sequence = the captured sequence
feature_start_position = the start of the sequence that this mip was designed to cover
feature_stop_position = the stop of the sequence that this mip was designed to cover
feature_mip_count = I don't remember what this is
probe_strand = is this MIP on the + strand or - strand
notes = do your mip arms have a snp in them? This is where you'd find that out

Check out the notes field. If you have any mips that have a snp in the arms (and one of these mips does), you will have to run an additional script:

perl ../redesign_mips_with_snps.pl -mips_file testcoordinates.bed.picked_mip_probe_arms.testcoordinates.bed.scan112.all_mips.copy_counts.ranked.ranked_list.txt -genome_dir ../hg19 -snp_file ../snp132.txt

This will take a little while depending on how many chromosomes you have represented in this file. We just have chr16, so it should take only a few minutes.

You'll get:

testcoordinates.bed.picked_mip_probe_arms.testcoordinates.bed.scan112.all_mips.copy_counts.ranked.ranked_list.txt.fixed_snps

Now you will have two mips in the same place to account for the snp in the arm.
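
If you would rather not scan the ranked list by eye, the review described above can be roughed out in Python. This is only a sketch under assumptions that these instructions do not confirm: that the file is tab-delimited, that its first row holds the field names exactly as listed above, that the copy counts are whole numbers, and that an empty notes field means no snp was found in the arms. The function name flag_mips and the max_copy_count threshold are made up for illustration.

```python
import csv


def flag_mips(handle, max_copy_count=1):
    """Yield (mip_pick_count, reasons) for MIPs that deserve a second look.

    Assumes a tab-delimited file whose header row uses the field names
    listed in this README (an assumption, not checked against the scripts).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        reasons = []
        # rank_score of -1 means "bad, but you have no better options".
        if float(row["rank_score"]) < 0:
            reasons.append("rank_score is " + row["rank_score"].strip())
        # An arm seen more than max_copy_count times may sit in a repeat.
        for column in ("ext_copy_count", "lig_copy_count"):
            if int(row[column]) > max_copy_count:
                reasons.append("%s = %s" % (column, row[column].strip()))
        # Assumption: anything written in notes is worth reading (e.g. a snp).
        notes = (row.get("notes") or "").strip()
        if notes:
            reasons.append("notes: " + notes)
        if reasons:
            yield row["mip_pick_count"], reasons


# Hypothetical usage with the ranked list from this test:
# name = ("testcoordinates.bed.picked_mip_probe_arms.testcoordinates.bed."
#         "scan112.all_mips.copy_counts.ranked.ranked_list.txt")
# with open(name) as handle:
#     for mip, reasons in flag_mips(handle):
#         print(mip, "; ".join(reasons))
```

If the header names in your file differ from the list above, the sketch will stop with a KeyError, which is itself a quick way to find out that the layout assumption does not hold.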

If everything looks good at this point, run the script to make the 70mers that you would send to the oligo-making people:

python ../generate_70mers2.py testcoordinates.bed.picked_mip_probe_arms.testcoordinates.bed.scan112.all_mips.copy_counts.ranked.ranked_list.txt.fixed_snps KATNB1_70mers.txt

You'll get:

KATNB1_70mers.txt

It contains the same information as the previous file, with a column added at the end that joins the ligation and extension arms to the mip backbone.

You are done at this point!

If you want to look at your mip designs in the UCSC browser, here is how you make a track:

python ../generate_ucsc_track_colors.py testcoordinates.bed.picked_mip_probe_arms.testcoordinates.bed.scan112.all_mips.copy_counts.ranked.ranked_list.txt.fixed_snps KATNB1_ucsc.bed KATNB1 purple green

You'll get:

KATNB1_ucsc.bed

Go to UCSC's genome browser at http://genome.ucsc.edu/cgi-bin/hgGateway
Make sure you're using the hg19 assembly, and then click on "manage custom tracks".
Under add custom tracks/choose file, upload KATNB1_ucsc.bed.
Then click on the button to go to the genome browser.
Click on the purple and green mip tracks to expand them all the way, and zoom out a bunch so you can see all the mips covering the coding regions.

IF YOU'RE HAVING PROBLEMS, open up your shell script and run each command line manually to see if you're getting the correct output. Then at least you can narrow down your problem. Often it is in a path or filename. The standard error files don't seem to give you much help; often you just don't get any output at all.

Email me if you have questions and I will help if I can!
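
For the troubleshooting advice above, running the shell script one command at a time can also be scripted. The following Python sketch is an illustration only and is not one of the MIP design scripts. It assumes that every command in KATNB1.sh sits on a single line (which is why line breaks must not be added to the long command lines) and that a failing command returns a non-zero exit status, which the scripts here may or may not do. The function names read_commands and run_one_at_a_time are made up.

```python
import subprocess


def read_commands(script_text):
    """Return the non-blank, non-comment lines of a shell script."""
    commands = []
    for line in script_text.splitlines():
        stripped = line.strip()
        # Lines starting with "#" (including a "#!" first line) are skipped.
        if stripped and not stripped.startswith("#"):
            commands.append(stripped)
    return commands


def run_one_at_a_time(commands):
    """Run each command through the shell; stop at the first failure.

    Returns the 1-based number of the failing step, or None if every
    command exited with status 0.
    """
    for index, command in enumerate(commands, start=1):
        print("step %d: %s" % (index, command))
        result = subprocess.run(command, shell=True)
        if result.returncode != 0:
            print("step %d exited with status %d" % (index, result.returncode))
            return index
    return None


# Hypothetical usage from within the testfiles directory:
# with open("KATNB1.sh") as handle:
#     run_one_at_a_time(read_commands(handle.read()))
```

Because a step can finish with status 0 and still write nothing, it is worth checking after each step that the output files listed earlier exist and are not empty.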