allytrope/ped

ped is a command line tool for filtering pedigree files and converting between pedigree file types.

ped takes a pedigree file as the first positional argument or through stdin. ped can then use a combination of proband(s) and filtering options centered around those proband(s) to pull out a subset of individuals. Lastly, ped writes the result in one of several output formats.

Overview of Options

Input

Pass pedigree file as stdin or positional arg.

Input Options

| Option + arg | Input Type | Description |
| --- | --- | --- |
| (default) | headered TSV | Mandatory fields: child, sire, and dam. Optional: sex. |
| -It | trios/duos | Child, sire, and dam with tab-delimited columns. |
| -Ip | PLINK | PLINK-style .ped. |

Proband Options

| Option + arg | Long-form | Description |
| --- | --- | --- |
| -f | --force-probands | Prevent error if one of the specified probands is not in pedigree. |
| -P <file> | --probands-file <file> | One proband per line. |
| -p <str> | --probands <str> | Comma-delimited string. |

Filtering Options

| Option + arg | Long-form | Description |
| --- | --- | --- |
| -a | --ancestors | Ancestors only + self. |
| -b | --descendants | Descendants only + self. |
| -d <int> | --degree <int> | Maximum degree of relationship. |
| -m | --mates | Keep mates. |
| -n | --intersection | Take the intersection of relatives from all probands. |
| -r <float> | --relationship-coefficient <float> | Minimum coefficient of relationship. |

Output Options

| Option + arg | Output Type | Description |
| --- | --- | --- |
| -Ol | list | One individual per line. |
| -Om | matrix | Coefficients of relationship as a matrix. |
| -Op | PLINK | PLINK-style .ped. |
| -Ot | trios/duos | Child, sire, and dam with tab-delimited columns. (default) |
| -Ow | pairwise | Coefficients of relationship as a pairwise TSV. |

Examples

# Filter pedigree to only include individuals within 4 degrees of relationship of individual "111"
ped pedigree.tsv -p 111 -d 4

# Specify multiple probands as a comma-delimited string
ped pedigree.tsv -p 111,222,333 -d 4

# Find all ancestors that are shared between individuals "111" and "222".
ped pedigree.tsv -p 111,222 -an

# Convert to PLINK-style file
ped pedigree.tsv -Op

By default, ped returns an error if a proband is not in the pedigree. To process anyway, include the flag --force-probands (-f). Combined with -p, this can be written succinctly as:

ped pedigree.tsv -fp 111,222,333 -d 4

Additional uses can be found by combining with other tools:

# Count how many individuals are related to proband (including proband itself)
ped pedigree.tsv -p 111 -d 4 -Ol | wc -l

# Find individuals that are not closely related to proband (the proband itself is also excluded)
ped pedigree.tsv -Ol | grep -Fvxf <(ped pedigree.tsv -p 111 -d 4 -Ol)

# Extract only samples related to proband from BCF file
bcftools view input.bcf -S <(ped pedigree.tsv -p 111 -d 4 -Ol) --force-samples

# Find all ancestors of 333 who are also descendants of 111 (including probands themselves)
# comm requires sorted input, so each list is sorted first
comm -12 <(ped pedigree.tsv -p 333 -a -Ol | sort) <(ped pedigree.tsv -p 111 -b -Ol | sort)

Options in Detail

Input pedigree

The input pedigree file should be in one of the formats described below. It can be specified as a positional argument or through stdin.

By default, ped attempts to parse the header of a TSV file. The file must have an "id", "sire", and "dam" field. A "sex" field is optional. Several alternative column names are also allowed. For example, "Sire", "sire", and "Father" all work to identify the same column. Other fields are not read, and any rows starting with # are skipped.
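The header matching described above can be illustrated with a short Python sketch. It is not ped's implementation (ped is written in Nim). The function name `read_headered_tsv` and the `ALIASES` table are hypothetical, and only "Sire", "sire", and "Father" are alias names confirmed by the text; the rest are guesses for illustration.

```python
import csv
import io

# Hypothetical alias table: only "Sire", "sire", and "Father" are named in the
# text above; every other alias here is an illustrative guess.
ALIASES = {
    "id": {"id", "child", "individual"},
    "sire": {"sire", "father"},
    "dam": {"dam", "mother"},
    "sex": {"sex"},
}


def read_headered_tsv(text):
    """Return {id: (sire, dam, sex)} from a headered TSV string."""
    # Drop blank rows and comment rows before handing the rest to csv.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter="\t")
    header = next(reader)
    # Map each canonical field to its column index; unknown columns are ignored.
    index = {}
    for pos, name in enumerate(header):
        for field, names in ALIASES.items():
            if name.strip().lower() in names:
                index[field] = pos
    missing = {"id", "sire", "dam"} - index.keys()
    if missing:
        raise ValueError(f"missing mandatory column(s): {sorted(missing)}")

    def get(row, field):
        pos = index.get(field)
        return row[pos] if pos is not None and pos < len(row) else ""

    return {
        get(row, "id"): (get(row, "sire"), get(row, "dam"), get(row, "sex"))
        for row in reader
    }
```

Matching case-insensitively against a small alias set keeps the mandatory-field check in one place, and columns outside the table are simply never looked up.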

Otherwise, the -I <format> option can be used to specify one of two other file types. Furthermore, if a file name ends in .ped or .fam, it is assumed to be in a PLINK-style format. This detection does not work for input passed through stdin, in which case -Ip must be specified.

-It

Interprets input as a three-column headerless TSV of trios and duos with columns in the order child, sire, and dam. Equivalent to the output type -Ot.

-Ip

Interprets input as a PLINK-style .ped/.fam. This has the columns family, individual, sire, dam, sex, and phenotype. Additional columns after the phenotype column are also acceptable. However, currently only the individual, sire, dam, and sex columns are read, that is, columns 2 through 5.

Males are encoded as 1, females as 2, and unknown sex as 0. Additionally, all unknown fields must be 0. Equivalent to the output type -Op.
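As a rough illustration of the column layout and encodings above, the following Python sketch reads columns 2 through 5 of PLINK-style rows. The function name `parse_plink_ped` is hypothetical, and splitting on any whitespace is an assumption borrowed from PLINK's own file format rather than something this README states about ped.

```python
def parse_plink_ped(text):
    """Return {individual: (sire, dam, sex)}, using None for unknown values."""
    sex_codes = {"1": "male", "2": "female", "0": None}
    pedigree = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: fields are separated by spaces or tabs.
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 columns: {line!r}")
        # Column 1 (family) and anything after column 5 are ignored.
        _family, individual, sire, dam, sex = fields[:5]
        pedigree[individual] = (
            None if sire == "0" else sire,
            None if dam == "0" else dam,
            sex_codes.get(sex),
        )
    return pedigree
```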

Probands

Probands are the individuals from whom relatives will be determined using the filtering methods. Only one of the following options for specifying probands can be used. Using one will also require either -d <int> or -r <float>.

-f

Without this flag, ped will return an error when one of the probands specified with -p <probands> or -P <probands_file> is not in the pedigree file. With it, ped processes the pedigree anyway.

-P <probands_file>

A file containing a list of probands, one per line. Does not need to be seekable, and so can also take a file through process substitution. -P is incompatible with -p.

-p <probands>

A comma-delimited string of probands, for example -p 111,222,333. -p is incompatible with -P.

Filtering

Filtering options control how individuals are filtered in relation to the proband(s). Thus, using any of these requires either -P <probands_file> or -p <probands>.

-a/-b

The -a flag keeps only ancestors and the proband(s); the -b flag keeps only descendants and the proband(s). These two flags cannot be used together.

-d <int>

This option keeps relatives whose shortest (geodesic) path to a proband has at most <int> edges, on a graph whose edges connect parents and children. It is specified with the option -d <int> or in long form --degree <int>.

Some example values:

| Value | Relatives |
| --- | --- |
| 0 | Self |
| 1 | Parents, children |
| 2 | Grandparents, grandchildren, siblings |
| ... | ... |
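One way to picture the degree filter is a breadth-first search over parent-child edges. The Python sketch below is an illustration under that assumption, not ped's actual code. The function name `relatives_within_degree` and the `parents` mapping are hypothetical.

```python
from collections import deque


def relatives_within_degree(parents, proband, max_degree):
    """Map each relative within max_degree of proband to its degree.

    parents: {child: (sire, dam)}, with None for an unknown parent.
    """
    # Build an undirected graph with one edge per parent-child pair.
    neighbors = {}
    for child, pair in parents.items():
        for parent in pair:
            if parent is not None:
                neighbors.setdefault(child, set()).add(parent)
                neighbors.setdefault(parent, set()).add(child)
    # Breadth-first search reaches every individual by a shortest path first.
    degree = {proband: 0}
    queue = deque([proband])
    while queue:
        current = queue.popleft()
        if degree[current] == max_degree:
            continue  # do not expand past the requested degree
        for nxt in neighbors.get(current, ()):
            if nxt not in degree:
                degree[nxt] = degree[current] + 1
                queue.append(nxt)
    return degree
```

With a child `c1` of parents `p1` and `m`, a sibling `c2`, and grandparents `gp1` and `gp2` above `p1`, a maximum degree of 1 returns only `c1` and the two parents, while a maximum degree of 2 also returns the sibling and grandparents, matching the table.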

-m

This flag will include mates of individuals in the subset that might have otherwise been filtered out. This step occurs after all other filtering. This option is useful when the output will be used to generate a plot.
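A minimal Python sketch of this post-filtering step follows. It assumes that a mate is anyone who shares a recorded child with a member of the subset, regardless of whether that child was kept; ped may define mates differently. The function name `add_mates` is hypothetical.

```python
def add_mates(parents, subset):
    """Return subset plus anyone sharing a recorded child with a member of it.

    parents: {child: (sire, dam)}, with None for an unknown parent.
    """
    result = set(subset)
    for sire, dam in parents.values():
        if sire is None or dam is None:
            continue  # a mate pair needs both parents recorded
        if sire in subset:
            result.add(dam)
        if dam in subset:
            result.add(sire)
    return result
```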

-n

When this flag is included, filtering will take the intersection of relatives for each proband. Without this flag, it is instead the union that is taken.

For example, ped pedigree.tsv -p 111,222 -an will find all ancestors that are shared between individuals 111 and 222.
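The difference between the default union and the -n intersection can be sketched in Python with plain set operations. The helper names `ancestors_with_self` and `combine` are hypothetical and only mirror the behavior described above.

```python
def ancestors_with_self(parents, proband):
    """Return the proband plus all of its recorded ancestors.

    parents: {child: (sire, dam)}, with None for an unknown parent.
    """
    found = {proband}
    stack = [proband]
    while stack:
        for parent in parents.get(stack.pop(), (None, None)):
            if parent is not None and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def combine(sets, intersection=False):
    """Union of per-proband results by default, intersection when requested."""
    sets = list(sets)
    return set.intersection(*sets) if intersection else set.union(*sets)
```

If 111 has parents a and b, and 222 has parents a and c, the union is all five individuals, while the intersection is only the shared ancestor a.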

-r <float>

This option keeps only relatives with a coefficient of relationship greater than or equal to the specified float. While -d <int> uses only the shortest path to determine degree, -r <float> sums the coefficients of all paths.

Some example coefficients:

| Coefficient | Relatives |
| --- | --- |
| 1 | Self |
| 0.5 | Parents, children, full-siblings |
| 0.25 | Grandparents, grandchildren, half-siblings, aunt/uncle, niece/nephew, double cousin |
| ... | ... |
| 0 | All blood relatives (not necessarily all in pedigree) |

While a cousin would have a coefficient of 0.125, a double cousin (being a cousin on both parents' sides) would have that coefficient counted twice, giving 0.125 + 0.125 = 0.25.
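The values in the table can be reproduced with a small Python sketch. Rather than enumerating paths, it uses the standard recursive kinship coefficient and doubles it, which gives the same result as summing over all paths when neither individual is inbred. This is a swapped-in technique for illustration and may not be how ped computes the coefficient. The function name `make_relationship` is hypothetical.

```python
from functools import lru_cache


def make_relationship(parents):
    """Build a coefficient-of-relationship function for a pedigree.

    parents: {child: (sire, dam)}, with None for an unknown parent.
    """

    @lru_cache(maxsize=None)
    def depth(ind):
        # Founders have depth 0; others sit one below their deepest parent.
        known = [p for p in parents.get(ind, (None, None)) if p is not None]
        return 1 + max(map(depth, known)) if known else 0

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0
        if a == b:
            sire, dam = parents.get(a, (None, None))
            return 0.5 * (1.0 + kinship(sire, dam))
        # Recurse through the parents of the deeper individual, which cannot
        # be an ancestor of the other one.
        if depth(a) < depth(b):
            a, b = b, a
        sire, dam = parents.get(a, (None, None))
        return 0.5 * (kinship(sire, b) + kinship(dam, b))

    def relationship(a, b):
        # Assumption: r = 2 * kinship, which holds only without inbreeding.
        return 2.0 * kinship(a, b)

    return relationship
```

For two brothers who have children with two sisters, the children are double cousins: each side contributes 0.125, and the function returns 0.25 for that pair.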

Output

There are five output types, all passed to stdout. They are specified with the -O option as summarized below.

If not specified, the default is the trios/duos output, -Ot. In this case, each line will be a duo or trio, unless the proband is the only relative.

-Om

n x n matrix of coefficients of relationship values. First row and first column list the individual ids. Includes identity of 1.0 along the diagonal.

-Ol

The simplest output; just one individual per row.

-Op

A PLINK-style TSV will have one row for each individual. Each row will have six columns: family, child, sire, dam, sex, and affected. The family id will be assigned "1" and affected status 0. The sex field uses 1 for males and 2 for females. Any missing entries are also filled with 0.

-Ot

Lists duos and trios as a TSV. Also condenses rows so that if an individual has no recorded parent, but is the parent of another, it will not have its own row. This means that there will usually be fewer rows than total individuals. Fields with missing parents are left blank.
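The row-condensing rule can be sketched in Python as follows. This is an illustration only. The function name `to_trios` is hypothetical, and blanking out parents that fall outside the kept subset is an assumption, since the text does not say how ped treats them.

```python
def to_trios(parents, keep):
    """Return tab-delimited child/sire/dam rows for the individuals in keep.

    parents: {child: (sire, dam)}, with None for an unknown parent.
    """
    rows, listed_as_parent, parentless = [], set(), []
    for child in sorted(keep):
        sire, dam = parents.get(child, (None, None))
        # Assumption: parents outside the kept subset are treated as missing.
        sire = sire if sire in keep else None
        dam = dam if dam in keep else None
        if sire is None and dam is None:
            parentless.append(child)
            continue
        listed_as_parent.update(p for p in (sire, dam) if p is not None)
        rows.append((child, sire or "", dam or ""))
    # A parentless individual gets its own row only if no row lists it as a
    # parent, e.g. when the proband is the only individual kept.
    rows.extend((ind, "", "") for ind in parentless if ind not in listed_as_parent)
    return ["\t".join(row) for row in rows]
```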

-Ow

Lists individuals pairwise with their corresponding coefficients of relationship. Includes rows for comparing individuals to themselves (which will always be 1.0).

Installation

The binary can be downloaded from the release page. No dependencies are required this way.

Otherwise, to compile from source, first install Nim and then install docopt with nimble install docopt. Then run:

nim c --define:release ped.nim

ped has been tested on Nim v2.2.
