Remove absolute filepaths, use conda for more dependencies #5
EthanHolleman wants to merge 11 commits into kernco:master
…m conda installation (seems to be run using "ChromHMM.sh")
Conda chrom hmm
…ify the conda env to use
…her ucsd tool in there and then replace rules that were calling bedToBigBed from a local path with the conda env
… current state since the shell command refers to a local script (run_spp.r) which does not seem to be included in the repo. Not sure if the rule is actually run though
Add run_spp.R script and modify spp_stats rule to use local path
Just tested the pipeline out with some real data in dry-run mode. I modified the … I think all of these should run on any machine except … My … So, since it does not look like the ChromHMM-related rules are run, I was wondering if there are additional parameters that need to be specified? Thanks Colin!
Hi Ethan, thanks for making all these changes! Getting rid of the absolute paths is essential to making the pipeline portable, but it wasn't something I ever had time to finish.
There is one issue with switching to the ChromHMM conda package, which is why I hadn't been using it already. ChromHMM expects to find files in its own installation directory with information such as the genomic positions of TSS and exons. ChromHMM ships with these files for human, mouse and fruit fly, but any other genomes (such as the farm animal ones) need to be added before ChromHMM can be used on them. This is why I had it set up to run ChromHMM from my own directory: that installation of ChromHMM has the necessary files for the farm animal genomes.
I sent a message to the ChromHMM devs letting them know that because of this, it was nearly impossible to integrate ChromHMM into a pipeline in a portable way. They added some arguments in v1.16 (-u and -v) to specify alternate directories for these files, but I never got around to integrating them into the pipeline. I also suggested they add a ChromHMM command to convert a GTF annotation file into the format ChromHMM uses. Unfortunately, they added a command to convert a UCSC browser gene table to their format, but not a GTF. I have some scripts for this, but they're not on GitHub. I can send them to you if you think you could integrate them into the pipeline.
Basically, rules would need to be added to convert the GTF file set as the annotation in the pipeline configuration into the ChromHMM format, and then the ChromHMM rules would need to be changed to use the -u and -v parameters with the location where those converted files were saved.
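To illustrate the kind of GTF conversion step discussed above, here is a minimal Python sketch that pulls transcription start and end sites out of GTF lines. It is not the scripts mentioned in the comment: the function names are hypothetical, and the three-column layout (chromosome, position, strand) is an assumption about ChromHMM's anchor files that should be checked against the ChromHMM manual before use.

```python
def gtf_to_anchors(gtf_lines, feature="gene"):
    """Collect TSS and TES rows from GTF lines.

    Returns two lists of (chrom, position, strand) tuples. Positions are the
    1-based coordinates taken straight from the GTF; whether ChromHMM expects
    0-based or 1-based values is an assumption to verify.
    """
    tss, tes = [], []
    for line in gtf_lines:
        # Skip blank lines and GTF header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != feature:
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        if strand == "+":
            tss.append((chrom, start, strand))
            tes.append((chrom, end, strand))
        elif strand == "-":
            # On the minus strand the gene starts at the larger coordinate.
            tss.append((chrom, end, strand))
            tes.append((chrom, start, strand))
    return tss, tes


def format_anchors(rows):
    """Render rows as tab-delimited text, one anchor per line."""
    return "".join(f"{chrom}\t{pos}\t{strand}\n" for chrom, pos, strand in rows)


if __name__ == "__main__":
    example = [
        'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene_id "g1";\n',
        'chr2\tsrc\tgene\t1000\t3000\t.\t-\t.\tgene_id "g2";\n',
    ]
    tss_rows, tes_rows = gtf_to_anchors(example)
    print(format_anchors(tss_rows), end="")
```

In a pipeline, a rule wrapping something like this would write its output into the directories that the ChromHMM rules then pass via -u and -v.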
Ah, makes total sense for ChromHMM, and it looks like the latest version on conda is 1.14, so that would not help with using … Also, pull-request-wise, should I close this and send another without the ChromHMM changes? I am not sure if GitHub lets you select which commits in a request to merge, if the other changes look alright to you. Let me know what you think. Thanks again Colin!
A recent discovery relating to conda ChromHMM: if you install the package using conda and navigate to the location of the conda environment, you will find two scripts relating to ChromHMM.
Contents of … So it seems those who made ChromHMM available through conda had a similar issue and have therefore included a script to do the downloading, presumably at some point when ChromHMM is run, but I am not 100% sure of this. If that is the case, getting a working conda installation running would involve figuring out where ChromHMM expects these downloaded files to be and then putting our own files there.
Additional Thought
If the above is the case, and the location where conda ChromHMM expects these files can be determined, a super hacky fix could be to replace the …
Hello again! I originally made a pull request called "Use Bioconda ChromHMM distribution", but I made some additional changes that might be useful, so I am wrapping everything up in a new pull request here.
I will try to summarize the changes I have made below. I realized after making the commits that it would have made more sense to refer to "local paths" as absolute paths. The takeaway is that I was trying to refer to paths that were specific to one machine.
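For anyone wanting to look for the remaining machine-specific paths, here is a rough Python sketch of a scan over the rule files. It is only a heuristic and not part of this pull request: the regular expression (paths rooted under `/home/<user>` or `/Users/<user>`), the function names, and the `*.smk` glob are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed heuristic: a path under a user's home directory, such as
# /home/ckern/bin/bedClip. The lookbehind avoids matching the middle of a
# longer relative path such as data/home/x.
HOME_PATH = re.compile(r"(?<![\w./-])/(?:home|Users)/[\w.-]+(?:/[\w.+-]+)*")


def find_absolute_paths(text):
    """Return (line_number, path) pairs for home-directory paths in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in HOME_PATH.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_rules(directory, pattern="*.smk"):
    """Scan every file matching pattern under directory and report hits."""
    report = {}
    for path in sorted(Path(directory).rglob(pattern)):
        hits = find_absolute_paths(path.read_text())
        if hits:
            report[str(path)] = hits
    return report


if __name__ == "__main__":
    # Assumes the rule files live in a directory called Rules.
    for filename, hits in scan_rules("Rules").items():
        for lineno, found in hits:
            print(f"{filename}:{lineno}: {found}")
```

A scan like this will miss absolute paths outside home directories (for example under `/opt` or `/data`), so it is a starting point rather than a guarantee.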
Add chromHMM.yaml
Here I basically just copied the format of the other `yaml` files for conda tools and set `chromhmm` as the dependency so it could be used in the Snakemake rule files.
Use conda ChromHMM in Rules/ChromHMM.smk
It seems that once installed with conda, ChromHMM can be called as `ChromHMM.sh`. I first tested this in just a personal conda environment on Ubuntu, so I am not sure how this may differ on other operating systems.
First, I followed the format used in other rules and added a `conda:` directive pointing at the new environment file to any rule that called ChromHMM.
Next, in the shell commands, I replaced `java -mx10000M -jar /home/ckern/ChromHMM/ChromHMM.jar` wherever it occurred with `ChromHMM.sh`.
Testing ChromHMM changes
I do not have data to fully test the pipeline with these changes, but I did create a small proof-of-concept snakefile named `snake-test`.
I created a clean conda environment with the command `conda create -y --name clean -c conda-forge -c bioconda snakemake-minimal` and then executed this tiny snakefile with the command `snakemake -s snake-test -j 1 --use-conda`. This was just to test that the `chromHMM.yaml` environment file could be used successfully and that ChromHMM could be called in this way.
The snakefile ran successfully and produced `results.txt` with the content I was expecting.
Based on this, I believe ChromHMM calls should execute normally. However, I have not tested on Windows or macOS, or in a full-scale run.
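Since this was only tried on Ubuntu, one cheap check before a full run on another system is to confirm that the wrapper is actually on the `PATH` inside the environment. The Python sketch below does that with the standard library's `shutil.which`; the helper names are hypothetical, and the only thing taken from the testing above is that the conda package exposes a command called `ChromHMM.sh`.

```python
import shutil


def find_executable(name="ChromHMM.sh"):
    """Return the full path to name if it is on PATH, otherwise None."""
    return shutil.which(name)


def describe(name="ChromHMM.sh"):
    """Return a one-line, human-readable status for the executable."""
    path = find_executable(name)
    if path is None:
        return f"{name} not found on PATH"
    return f"{name} found at {path}"


if __name__ == "__main__":
    # Run this inside the activated conda environment being checked.
    print(describe())
```

This only shows that the command resolves; it says nothing about whether ChromHMM can find its annotation files for a given genome.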
Remove absolute paths in Rules/DeployTrackHub.smk
Looks like some absolute filepaths have made it into this file. There were a couple of instances of `/home/ckern/bin/bedClip` being used instead of just `bedClip`. The conda environment used by these rules already specifies `ucsc-bedclip` as a dependency, so with just that change those rules should be good to go.
Similarly, in some rules `bedToBigBed` was being called with `/home/ckern/bin/bedToBigBed`. It is available through conda, so I added it as a dependency to the `bdg2bw.yaml` environment file, along with `conda: '../Env/bdg2bw.yaml'` in the affected rules.
Use bedtools.yaml environment file in a couple rules
There were a couple of rules that seemed to call bedtools but did not specify a conda environment, so I just added `conda: ./Env/bedtools.yaml` to these.
Add note about call to run_spp.R
This looked like another possible absolute path issue. The rule run_spp executes `Rscript /home/ckern/phantompeakqualtools/run_spp.R`. I am not sure if this rule is actually run in the pipeline, but if it is, it would likely fail. I also did not see `run_spp.R` in the `scripts` folder, so I was mostly wondering if this should be marked as deprecated.
Small change for clarity to the README
I noticed that in the `run.sh` script the `--cluster-config` argument has an absolute path specified, so I just added a note that this would have to be changed to run on "your" machine.
That is all, thanks and have a good one!