ped is a command line tool for filtering pedigree files and converting between pedigree file types.
ped takes a pedigree file as the first positional argument or through stdin. ped can then use one or more probands, together with filtering options centered on those probands, to pull out a subset of individuals. Lastly, ped writes the result in one of several output formats.
Pass the pedigree file through stdin or as a positional argument.

| Option + arg | Input Type | Description |
|---|---|---|
| (default) | headered TSV | Mandatory fields: child, sire, and dam. Optional: sex. |
| -It | trios/duos | Child, sire, and dam with tab-delimited columns. |
| -Ip | PLINK | Plink-style .ped. |

| Option + arg | Long-form | Description |
|---|---|---|
| -f | --force-probands | Prevent error if one of the specified probands is not in pedigree. |
| -P <file> | --probands-file <file> | One proband per line. |
| -p <str> | --probands <str> | Comma-delimited string. |

| Option + arg | Long-form | Description |
|---|---|---|
| -a | --ancestors | Ancestors only + self. |
| -b | --descendants | Descendants only + self. |
| -d <int> | --degree <int> | Maximum degree of relationship. |
| -m | --mates | Keep mates. |
| -n | --intersection | Take the intersection of relatives from all probands. |
| -r <float> | --relationship-coefficient <float> | Minimum coefficient of relationship. |

| Option + arg | Output Type | Description |
|---|---|---|
| -Ol | list | One individual per line. |
| -Om | matrix | Coefficients of relationship as a matrix. |
| -Op | PLINK | Plink-style .ped. |
| -Ot | trios/duos | Child, sire, and dam with tab-delimited columns. (default) |
| -Ow | pairwise | Coefficients of relationship as a pairwise TSV. |

```bash
# Filter pedigree to only include individuals at most distance 4 from individual "111"
ped pedigree.tsv -p 111 -d 4

# Specify multiple probands as a comma-delimited string
ped pedigree.tsv -p 111,222,333 -d 4

# Find all ancestors that are shared between individuals "111" and "222"
ped pedigree.tsv -p 111,222 -an

# Convert to PLINK-style file
ped pedigree.tsv -Op
```
By default, ped returns an error if a proband is not in the pedigree. To process anyway, include the flag --force-probands (-f). Combined with -p, this can be written succinctly as:

```bash
ped pedigree.tsv -fp 111,222,333 -d 4
```
Additional uses can be found by combining with other tools:

```bash
# Count how many individuals are related to proband (including proband itself)
ped pedigree.tsv -p 111 -d 4 -Ol | wc -l

# Find individuals that are not within degree 4 of the proband (the proband itself is excluded too)
ped pedigree.tsv -Ol | grep -Fvxf <(ped pedigree.tsv -p 111 -d 4 -Ol)

# Extract only samples related to proband from BCF file
bcftools view input.bcf -S <(ped pedigree.tsv -p 111 -d 4 -Ol) --force-samples

# Find all ancestors of 333 who are also descendants of 111 (including probands themselves)
# comm expects sorted input, so both lists are sorted first
comm -12 <(ped pedigree.tsv -p 333 -a -Ol | sort) <(ped pedigree.tsv -p 111 -b -Ol | sort)
```
The input pedigree file should be in one of the formats described below. It can be specified as a positional argument or through stdin.
By default, ped attempts to parse the header of a TSV file. The file must have an "id", "sire", and "dam" field. A "sex" field is optional. Several alternative column names are also allowed. For example, "Sire", "sire", and "Father" all identify the same column. Other fields are ignored, and any rows starting with # are skipped.
Otherwise, the -I <format> option can be used to specify one of two other file types. Furthermore, if a file name ends in .ped or .fam, the file is assumed to be in a PLINK-style format. This detection does not work for input passed through stdin, in which case -Ip must be specified.
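
The header handling described above can be pictured with a short Python sketch. This is not ped's own parser: the function name `read_headered_tsv` and the alias table are hypothetical, only the three sire spellings come from the text, and case-insensitive matching is an assumption.

```python
import csv

# Hypothetical alias table: only "Sire", "sire", and "Father" are named in the text.
ALIASES = {
    "id": {"id", "child"},
    "sire": {"sire", "father"},
    "dam": {"dam", "mother"},
    "sex": {"sex"},
}

def read_headered_tsv(text):
    """Return one dict per row, keeping only the recognised columns."""
    # Rows starting with '#' are skipped, as are empty lines.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    column = {}  # canonical field name -> column index
    for index, name in enumerate(header):
        for field, names in ALIASES.items():
            # Case-insensitive matching is an assumption of this sketch.
            if name.strip().lower() in names:
                column.setdefault(field, index)
    missing = {"id", "sire", "dam"} - set(column)
    if missing:
        raise ValueError(f"missing mandatory columns: {sorted(missing)}")
    return [{field: row[i] for field, i in column.items()} for row in reader]

example = "# comment\nid\tFather\tdam\tnotes\n333\t111\t222\tfirst-born\n"
print(read_headered_tsv(example))
```

The unrecognised "notes" column is dropped, which mirrors the statement that other fields are not read.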
-It interprets input as a headerless three-column TSV of trios and duos, with columns in the order child, sire, and dam. Equivalent to the output type -Ot.
-Ip interprets input as a PLINK-style .ped/.fam file. This has the columns family, individual, sire, dam, sex, and phenotype. Additional columns after the phenotype column are also accepted. However, only the individual, sire, dam, and sex columns (columns 2 through 5) are currently read.
Males are encoded as 1, females as 2, and unknown sex as 0. Additionally, all unknown fields must be 0. Equivalent to the output type -Op.
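
As an illustration of the PLINK-style layout, the following Python sketch reads columns 2 through 5 of one row and treats 0 as an unknown parent. The names `Individual` and `parse_plink_line` are hypothetical, and splitting on any whitespace is an assumption of the sketch rather than a statement about ped.

```python
from typing import NamedTuple, Optional

class Individual(NamedTuple):
    id: str
    sire: Optional[str]
    dam: Optional[str]
    sex: str  # "1" male, "2" female, "0" unknown

def parse_plink_line(line: str) -> Individual:
    """Read columns 2 through 5 of one PLINK-style row and ignore the rest."""
    fields = line.split()  # assumption: columns are separated by spaces or tabs
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    _family, ind, sire, dam, sex = fields[:5]
    # "0" marks an unknown parent in this format.
    return Individual(
        ind,
        None if sire == "0" else sire,
        None if dam == "0" else dam,
        sex,
    )

print(parse_plink_line("1\t333\t111\t222\t2\t0"))
```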
Probands are the individuals from whom relatives will be determined using the filtering methods. Only one of the following options for specifying probands can be used. Using one will also require either -d <int> or -r <float>.
Without -f (--force-probands), ped returns an error when one of the probands specified with -p <probands> or -P <probands_file> is not in the pedigree file.
-P <probands_file> takes a file containing a list of probands, one per line.
The file does not need to be seekable, so it can also be supplied through process substitution.
-P is incompatible with -p.
-p <probands> takes a comma-delimited string of probands, for example -p 111,222,333.
-p is incompatible with -P.
Filtering options describe how to filter down individuals in relation to the proband(s). Thus, using any of these requires either -P <probands_file> or -p <probands>.
The -a flag keeps only ancestors and the proband(s); the -b flag keeps only descendants and the proband(s).
These two flags cannot be used together.
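
A minimal Python sketch of what "ancestors + self" and "descendants + self" mean on a pedigree. It assumes the pedigree is held as a dict mapping each child to a (sire, dam) pair with None for an unknown parent. The function names are hypothetical and this is not ped's implementation.

```python
def ancestors(parents, proband):
    """Return the proband plus everyone reachable by following parent links."""
    found = {proband}
    stack = [proband]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, (None, None)):
            if parent is not None and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def descendants(parents, proband):
    """Return the proband plus everyone reachable by following child links."""
    # Invert the child -> parents map into parent -> children.
    children = {}
    for child, pair in parents.items():
        for parent in pair:
            if parent is not None:
                children.setdefault(parent, []).append(child)
    found = {proband}
    stack = [proband]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

pedigree = {
    "father": ("gf", "gm"),
    "child": ("father", "mother"),
    "grandchild": ("child", None),
}
print(sorted(ancestors(pedigree, "child")))
print(sorted(descendants(pedigree, "child")))
```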
The option -d <int>, or in long form --degree <int>, keeps relatives whose shortest path to a proband has at most the given number of edges, on a graph in which every edge is a parent-child link. This shortest path is also known as the geodesic path.
Some example values:

| Value | Relatives |
|---|---|
| 0 | Self |
| 1 | Parents, children |
| 2 | Grandparents, grandchildren, siblings |
| ... | ... |
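
The degree values above follow from a breadth-first search over parent-child edges. The Python sketch below assumes a pedigree stored as a dict of child to (sire, dam), with None for an unknown parent. `relatives_within_degree` is a hypothetical name, and the code illustrates the idea rather than how ped computes it.

```python
from collections import deque

def relatives_within_degree(parents, proband, max_degree):
    """Map each relative within max_degree edges of the proband to its degree."""
    # Build an undirected adjacency list from the parent-child links.
    neighbors = {}
    for child, (sire, dam) in parents.items():
        for parent in (sire, dam):
            if parent is not None:
                neighbors.setdefault(child, set()).add(parent)
                neighbors.setdefault(parent, set()).add(child)
    distance = {proband: 0}
    queue = deque([proband])
    while queue:
        current = queue.popleft()
        if distance[current] == max_degree:
            continue  # do not expand past the requested degree
        for nxt in neighbors.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return distance

pedigree = {
    "father": ("gf", "gm"),
    "child": ("father", "mother"),
    "sib": ("father", "mother"),
}
print(relatives_within_degree(pedigree, "child", 2))
```

In this toy pedigree the sibling is reached through a shared parent, which is why siblings sit at degree 2 rather than 1.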
The -m (--mates) flag includes mates of individuals in the subset that might otherwise have been filtered out. This step occurs after all other filtering. It is useful when the output will be used to generate a plot.
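
One way to picture this step is the Python sketch below. It assumes that a "mate" is the other recorded parent of any child in the pedigree, which may differ from ped's exact definition, and `add_mates` is a hypothetical name.

```python
def add_mates(parents, subset):
    """Return the subset plus the co-parents of its members."""
    result = set(subset)
    for sire, dam in parents.values():
        if sire is None or dam is None:
            continue  # no mate is recorded for this child
        # Membership is checked against the original subset, so mates of
        # newly added mates are not pulled in.
        if sire in subset:
            result.add(dam)
        if dam in subset:
            result.add(sire)
    return result

pedigree = {
    "child": ("father", "mother"),
    "half_sib": ("father", "other_mate"),
}
print(sorted(add_mates(pedigree, {"child", "father"})))
```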
When the -n (--intersection) flag is included, filtering takes the intersection of the relatives of each proband. Without this flag, the union is taken instead.
For example, ped pedigree.tsv -p 111,222 -an will find all ancestors that are shared between individuals 111 and 222.
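
The difference between the two modes can be sketched with plain Python sets. The per-proband relative sets below are made-up illustration data and `combine` is a hypothetical helper, not part of ped.

```python
def combine(per_proband, intersection=False):
    """Merge per-proband relative sets by union (default) or intersection."""
    sets = list(per_proband.values())
    if not sets:
        return set()
    if intersection:
        return set.intersection(*sets)
    return set.union(*sets)

# Ancestors + self for two probands (illustrative data only).
relatives = {
    "111": {"111", "A", "B"},
    "222": {"222", "B", "C"},
}
print(sorted(combine(relatives)))
print(sorted(combine(relatives, intersection=True)))
```

Note that in this sketch a proband only survives the intersection if it also appears in every other proband's set.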
The option -r <float> (--relationship-coefficient <float>) keeps only relatives with a coefficient of relationship greater than or equal to the specified float. While -d <int> uses only the shortest path to determine degree, -r <float> sums the coefficients of all paths.
Some example coefficients:

| Coefficient | Relatives |
|---|---|
| 1 | Self |
| 0.5 | Parents, children, full-siblings |
| 0.25 | Grandparents, grandchildren, half-siblings, aunt/uncle, niece/nephew, double cousin |
| ... | ... |
| 0 | All blood relatives (not necessarily all in pedigree) |
While a cousin has a coefficient of 0.125, a double cousin (a cousin on both parents' sides) has that coefficient counted twice, giving 0.25.
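
The path-summing idea can be illustrated with a small Python sketch. It uses the textbook path-counting rule: each path linking two individuals through a common ancestor contributes 0.5 raised to its number of parent-child edges. It assumes nobody in the pedigree is inbred and it enumerates paths by brute force. The function names are hypothetical and this is not necessarily how ped computes the value.

```python
def upward_paths(parents, start):
    """Yield every path from start up through its ancestors, as a tuple of ids."""
    stack = [(start,)]
    while stack:
        path = stack.pop()
        yield path
        for parent in parents.get(path[-1], (None, None)):
            if parent is not None:
                stack.append(path + (parent,))

def relationship(parents, a, b):
    """Sum 0.5 ** edges over all paths joining a and b via a common ancestor."""
    total = 0.0
    paths_b = list(upward_paths(parents, b))
    for path_a in upward_paths(parents, a):
        for path_b in paths_b:
            # The two half-paths must end at the same ancestor and share
            # no other individual.
            if path_a[-1] == path_b[-1] and (set(path_a) & set(path_b)) == {path_a[-1]}:
                edges = (len(path_a) - 1) + (len(path_b) - 1)
                total += 0.5 ** edges
    return total

# Two brothers (b1, b2) have children with two sisters (s1, s2).
pedigree = {
    "b1": ("gf1", "gm1"), "b2": ("gf1", "gm1"),
    "s1": ("gf2", "gm2"), "s2": ("gf2", "gm2"),
    "c1": ("b1", "s1"), "c2": ("b2", "s2"),
    "c3": ("b1", "unrelated"),
}
print(relationship(pedigree, "c1", "c2"))  # double cousins
print(relationship(pedigree, "c3", "c2"))  # ordinary cousins
```

Double cousins are joined by four paths of four edges each, one per shared grandparent, whereas ordinary cousins are joined by only two.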
There are five output types, all passed to stdout. They are specified with the -O option as summarized below.
If not specified, the default is the trios/duos output, -Ot.
In this case, each line will be a duo or trio, unless the proband is the only relative.
-Om outputs an n x n matrix of coefficients of relationship. The first row and first column list the individual ids. The diagonal holds each individual's relationship to itself, 1.0.
-Ol is the simplest output: just one individual per row.
-Op outputs a PLINK-style TSV with one row for each individual.
Each row has six columns: family, child, sire, dam, sex, and affected.
The family id will be assigned "1" and affected status as 0. The sex field uses 1 for males and 2 for females.
Any missing entries are also filled with 0.
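
A Python sketch of how one such row could be assembled from the rules above. `plink_row` is a hypothetical helper and the sex argument is assumed to already be "1", "2", or None.

```python
def plink_row(ind, sire=None, dam=None, sex=None):
    """Build one six-column PLINK-style row, filling missing entries with 0."""
    family = "1"    # the family id is always assigned "1"
    affected = "0"  # the affected status is always 0
    return "\t".join([family, ind, sire or "0", dam or "0", sex or "0", affected])

print(plink_row("333", "111", "222", "2"))
print(plink_row("111", sex="1"))
```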
-Ot lists duos and trios as a TSV. Rows are also condensed: if an individual has no recorded parent but is the parent of another, it does not get its own row. This means that there will usually be fewer rows than total individuals. Fields for missing parents are left blank.
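
The condensing rule can be sketched in Python as follows. The sketch assumes a dict of child to (sire, dam) already restricted to the kept individuals, and it assumes that an individual with no recorded parents who is nobody's parent still gets a row. `trio_rows` is a hypothetical name and the row order here is simply alphabetical.

```python
def trio_rows(parents, individuals):
    """Return tab-delimited child/sire/dam rows, skipping parent-only founders."""
    listed_as_parent = {
        parent
        for child in individuals
        for parent in parents.get(child, (None, None))
        if parent is not None
    }
    rows = []
    for ind in sorted(individuals):
        sire, dam = parents.get(ind, (None, None))
        if sire is None and dam is None and ind in listed_as_parent:
            continue  # already visible as someone's sire or dam
        rows.append("\t".join([ind, sire or "", dam or ""]))
    return rows

pedigree = {"child": ("father", "mother"), "father": ("gf", None)}
for row in trio_rows(pedigree, {"child", "father", "mother", "gf"}):
    print(row)
```

Here four individuals condense to two rows, because the grandfather and mother appear only as parents.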
-Ow lists individuals pairwise with their corresponding coefficients of relationship. It includes rows comparing individuals to themselves, which always have a coefficient of 1.0.
The binary can be downloaded from the release page. No dependencies are required this way.
Otherwise, to compile from source, first install Nim and then run nimble install docopt.
Then run:

```bash
nim c --define:release ped.nim
```
ped has been tested on Nim v2.2.