Update BenchmarkSVs workflow(s) with some new features #199

Open: wants to merge 13 commits into base: main

rickymagner (Contributor)

This PR refactors, improves, and adds new features to the process of benchmarking SV VCFs. At a high level, it includes:

  • Adds an option to use truvari refine and truvari ga4gh to collapse (or "harmonize") similar events in the truth and query sets down to a single event. This can improve benchmarking statistics when calls would otherwise be mismatched because of extreme fuzziness in the calling step. The option includes an alignment step using mafft, as outlined in the truvari documentation for this process, and the Docker image has been updated to include a newer version of truvari as well as mafft. This option requires the input files to be phased.
  • Splits multiallelic sites before running truvari, since the tool expects this. Previously, users were expected to preprocess their inputs this way; the workflow now does it for them as a convenience.
  • The QC tasks were split out into a separate workflow, keeping the benchmarking tasks apart from the QC/counting tasks. Some common tasks were moved to a third file, which both workflows import.
  • The Dockstore YAML file has been updated so that Dockstore can import both workflows.
  • The benchmarking stats now also include a breakdown by HET vs. HOM sites, as well as both combined.
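The multiallelic splitting step in the list above can be illustrated with a minimal Python sketch. This is not the PR's implementation (which is not shown here); in practice the step is commonly done with bcftools norm -m-any. The function names split_multiallelic and recode_gt are hypothetical, and the sketch assumes tab-separated VCF data lines. It recodes only the GT subfield, mapping the chosen ALT allele to 1 and any other ALT allele to 0 (one common convention), copies INFO and other FORMAT subfields unchanged (so Number=A/R fields are not split), and does no left-normalization. Phase separators ("|") are preserved, which matters because the refine option requires phased input.

```python
import re
from typing import List


def recode_gt(gt: str, alt_index: int) -> str:
    """Recode a GT string for one split-out ALT allele (hypothetical helper).

    The chosen ALT (1-based alt_index) becomes 1, every other allele becomes 0,
    and missing alleles ('.') and separators ('/' or '|') are kept as-is, so
    phased genotypes stay phased.
    """
    tokens = re.split(r"([/|])", gt)  # capturing group keeps the separators
    recoded = []
    for tok in tokens:
        if tok in ("/", "|", "."):
            recoded.append(tok)
        elif int(tok) == alt_index:
            recoded.append("1")
        else:
            recoded.append("0")
    return "".join(recoded)


def split_multiallelic(line: str) -> List[str]:
    """Split one tab-separated VCF data line into one line per ALT allele.

    Simplified sketch: only GT is recoded; INFO and other FORMAT subfields are
    copied unchanged, and no left-normalization is done.
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return ["\t".join(fields)]  # already biallelic

    fmt = fields[8].split(":") if len(fields) > 9 else []
    gt_pos = fmt.index("GT") if "GT" in fmt else None

    out = []
    for i, alt in enumerate(alts, start=1):
        # CHROM POS ID REF, the single ALT, then QUAL FILTER INFO (+ FORMAT)
        new = fields[:4] + [alt] + fields[5:9]
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_pos is not None and gt_pos < len(parts):
                parts[gt_pos] = recode_gt(parts[gt_pos], i)
            new.append(":".join(parts))
        out.append("\t".join(new))
    return out


record = "chr1\t100\t.\tA\tG,T\t50\tPASS\t.\tGT\t1|2"
for rec in split_multiallelic(record):
    print(rec)
```

Here the phased genotype 1|2 becomes 1|0 on the G record and 0|1 on the T record, so each output line carries a single ALT allele while keeping its phase.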
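The HET vs. HOM breakdown can be sketched as stratifying the usual match counts by genotype. This sketch assumes the precision and recall definitions used in truvari's summary (precision = TP-comp / (TP-comp + FP), recall = TP-base / (TP-base + FN)) and classifies each record by its own diploid GT. The names gt_class, count_by_class, and stratified_stats are hypothetical, and how the PR actually extracts genotypes from truvari's output is not shown here. Hom-ref and missing genotypes are skipped, so "ALL" means HET plus HOM.

```python
import re
from typing import Dict, Iterable, Optional


def gt_class(gt: str) -> Optional[str]:
    """Return 'HET' or 'HOM' (hom-alt) for a diploid GT; None for hom-ref or missing."""
    alleles = re.split(r"[/|]", gt)
    if len(alleles) != 2 or "." in alleles:
        return None
    if alleles[0] != alleles[1]:
        return "HET"
    return "HOM" if alleles[0] != "0" else None


def count_by_class(gts: Iterable[str]) -> Dict[str, int]:
    """Count genotypes per class; 'ALL' is the sum of HET and HOM."""
    counts = {"HET": 0, "HOM": 0, "ALL": 0}
    for gt in gts:
        cls = gt_class(gt)
        if cls is not None:
            counts[cls] += 1
            counts["ALL"] += 1
    return counts


def stratified_stats(tp_base: Iterable[str], fn: Iterable[str],
                     tp_comp: Iterable[str], fp: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Precision/recall/F1 per genotype class from the GTs of each call category."""
    b, n, c, p = (count_by_class(x) for x in (tp_base, fn, tp_comp, fp))
    stats = {}
    for cls in ("HET", "HOM", "ALL"):
        recall = b[cls] / (b[cls] + n[cls]) if b[cls] + n[cls] else 0.0
        precision = c[cls] / (c[cls] + p[cls]) if c[cls] + p[cls] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        stats[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return stats


stats = stratified_stats(
    tp_base=["0/1", "0/1", "1/1"],  # truth calls that were matched
    fn=["0/1", "1/1"],              # truth calls that were missed
    tp_comp=["0|1", "0|1", "1|1"],  # query calls that were matched
    fp=["1/1"],                     # query calls with no truth match
)
for cls, s in stats.items():
    print(cls, {k: round(v, 3) for k, v in s.items()})
```

With these toy inputs, HET calls score precision 1.0 and recall 2/3, HOM calls score 0.5 on both, and the combined row reports precision 0.75 and recall 0.6.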
