Homology translation — homology_translate • OmnipathR

Translates identifiers between organisms using orthology data from Ensembl.

Usage

homology_translate(
  d,
  ...,
  target = 10090,
  source = 9606,
  ensembl_orthology_types = c("one2one", "one2many"),
  ensembl_min_orthology_confidence = 1L
)

Arguments

d

Data frame or character vector.

...

Column specification: zero to three arguments, with or without names. Non-standard evaluation (NSE) is supported. Arguments beyond the third are ignored.

The names of the arguments should be column names and the values identifier types, given either as character strings or as symbols.
Unnamed arguments are assumed to be both column names and identifier types, e.g. a column called "uniprot" containing UniProt IDs.
The first column specification describes the source column, which holds identifiers of the source organism. This column must exist in the data and serves as the input of the homology translation. It is removed from the returned data frame.
In case of "uniprot", the source column name can be anything: if the column contains only UniProt IDs, it will be handled accordingly.
In case of "genesymbol", it is enough if the source column name contains the word "genesymbol", e.g. "ligand_genesymbol".
The second column specification describes the target column, with its name and identifier type. If not provided, both the column name and the type will be the same as for the source.
Optionally, a third column can be specified with another identifier type. This is convenient if you want, for example, Gene Symbols along with UniProt IDs.
If no specification is provided, the input is assumed to have a column named either "uniprot" or "genesymbol", or to be a character vector of UniProt IDs or Gene Symbols.
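
The column specification rules above can be illustrated with a small sketch. It assumes a toy data frame with a column named "ligand" (a hypothetical name) holding human UniProt IDs; the target column names "ligand_mouse" and "ligand_mouse_gs" are likewise made up for illustration. The call itself is guarded with if (FALSE), as in the Examples section, because it needs OmnipathR and downloads orthology data.

```r
# Toy input: one column of human UniProt IDs (IFNG and TGFB1).
# The column name "ligand" is an arbitrary choice for this sketch.
d <- data.frame(
  ligand = c("P01579", "P01137"),
  stringsAsFactors = FALSE
)

if (FALSE) {
library(OmnipathR)

# Argument names are column names, values are identifier types:
#   1st: source column "ligand", containing UniProt IDs
#   2nd: target column "ligand_mouse", to be filled with UniProt IDs
#   3rd: extra target column "ligand_mouse_gs", with Gene Symbols
homology_translate(
  d,
  ligand = uniprot,
  ligand_mouse = uniprot,
  ligand_mouse_gs = genesymbol
)
}
```

According to the description above, the source column "ligand" would not be present in the returned data frame.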

target

Character or integer: name or identifier of the target organism (the one we translate to). The default target organism is mouse.

source

Character or integer: name or identifier of the source organism (the one the IDs in the input data belong to). The default source organism is human.
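
As a sketch of how target and source might be set explicitly, the snippet below translates from human to rat. It assumes that the integer identifiers are NCBI Taxonomy IDs, as the defaults 9606 (human) and 10090 (mouse) suggest; 10116 is the NCBI Taxonomy ID of rat. Organism names are also accepted according to the argument descriptions, but the accepted spellings are not listed on this page, so only integers are used here.

```r
# Human UniProt IDs of ULK1 and EGFR, taken from the Examples section.
human_uniprots <- c("O75385", "P00533")

# Organism identifiers (assumed to be NCBI Taxonomy IDs).
human <- 9606L   # Homo sapiens, the default source
rat <- 10116L    # Rattus norvegicus

if (FALSE) {
library(OmnipathR)

# Translate human identifiers to their rat orthologs.
homology_translate(human_uniprots, target = rat, source = human)
}
```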

ensembl_orthology_types

Character vector: use only these orthology relationship types. Possible values are "one2one", "one2many" and "many2many".

ensembl_min_orthology_confidence

Integer: use only orthology relations with at least this level of confidence. In Ensembl the confidence is either 0 or 1, so only these two values make sense. If 0, all orthology records are used; if 1, only the ones with higher confidence.
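
The two Ensembl filters can be combined to make the translation stricter or more permissive than the default. The sketch below is an illustration under the argument descriptions above, not a recommendation; the variable names are hypothetical, and the calls are guarded because they need OmnipathR and download orthology data.

```r
# Human UniProt IDs of ULK1 and EGFR, taken from the Examples section.
human_uniprots <- c("O75385", "P00533")

# Strict: only one-to-one orthologs with high confidence.
strict_types <- "one2one"
strict_confidence <- 1L

# Permissive: all relationship types, no confidence filter.
permissive_types <- c("one2one", "one2many", "many2many")
permissive_confidence <- 0L

if (FALSE) {
library(OmnipathR)

homology_translate(
  human_uniprots,
  ensembl_orthology_types = strict_types,
  ensembl_min_orthology_confidence = strict_confidence
)

homology_translate(
  human_uniprots,
  ensembl_orthology_types = permissive_types,
  ensembl_min_orthology_confidence = permissive_confidence
)
}
```

With one-to-many or many-to-many relationships included, a single source identifier may correspond to several target identifiers.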

Value

Data frame with the translated columns, or a character vector with the translated identifiers.

Examples

if (FALSE) {
library(OmnipathR)

# these proteins are ULK1, IFNG, EGFR, TGFB1, IL1R1
human_uniprots <- c("O75385", "P01579", "P00533", "P01137", "P14778")

# with the default arguments, human (9606) IDs are translated to mouse (10090)
homology_translate(human_uniprots)
}