onconova.terminology.digestors
This module provides classes for digesting various terminology files into standardized CodedConcept and code system objects.
DIGESTORS (module-attribute)
CTCAEDigestor(verbose=True)
Bases: TerminologyDigestor
CTCAEDigestor is a specialized TerminologyDigestor for parsing CTCAE (Common Terminology Criteria for Adverse Events) concepts from a CSV file.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for the digestor ("ctcae"). |
FILENAME | str | Name of the CSV file containing CTCAE data ("ctcae.csv"). |
CANONICAL_URL | str | Canonical URL for the terminology system (empty by default). |
Source code in onconova/terminology/digestors.py
EnsemblExonsDigestor(verbose=True)
Bases: TerminologyDigestor
Processes and normalizes exon data from Ensembl gene annotations.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for the digestor ("ensembl"). |
FILENAME | str | Expected filename for input data ("ensembl_exons.tsv"). |
exons | defaultdict | Stores lists of GeneExon objects keyed by gene name. |
Source code in onconova/terminology/digestors.py
FILENAME (class-attribute, instance-attribute)
LABEL (class-attribute, instance-attribute)
exons (instance-attribute)
GeneExon
Bases: BaseModel
Represents an exon within a gene, including its rank and coding region coordinates.
Attributes:
Name | Type | Description |
---|---|---|
rank | int | The order of the exon within the gene. |
coding_dna_start | int | None | The start position of the coding region in DNA coordinates, if available. |
coding_dna_end | int | None | The end position of the coding region in DNA coordinates, if available. |
coding_genomic_start | int | None | The start position of the coding region in genomic coordinates, if available. |
coding_genomic_end | int | None | The end position of the coding region in genomic coordinates, if available. |
digest()
Adjusts the cDNA positions of exons for each gene by normalizing them to the start of the coding DNA region.
This method iterates through all genes and their associated exons, recalculating the coding_dna_start and coding_dna_end for each exon so that positions are relative to the first coding DNA position in the gene. If an exon does not have a coding_dna_start, it is skipped for normalization. The method returns the updated exons dictionary.
Returns:
Type | Description |
---|---|
dict | A dictionary mapping gene names to lists of exons with updated cDNA positions. |
Source code in onconova/terminology/digestors.py
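The normalization described for digest() can be illustrated with a minimal sketch. It does not reproduce the onconova implementation: ExonSketch is a hypothetical dataclass standing in for GeneExon (which is a BaseModel), normalize_cdna is a hypothetical helper, and the sketch assumes that "the first coding DNA position" is the smallest coding_dna_start in the gene and that normalized coordinates are 1-based.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExonSketch:
    """Hypothetical stand-in for GeneExon; only the cDNA fields are modelled."""
    rank: int
    coding_dna_start: Optional[int] = None
    coding_dna_end: Optional[int] = None


def normalize_cdna(exons_by_gene):
    """Shift cDNA coordinates so each gene's first coding position becomes 1."""
    for gene, exons in exons_by_gene.items():
        starts = [e.coding_dna_start for e in exons if e.coding_dna_start is not None]
        if not starts:
            continue  # gene without any coding exon: nothing to normalize
        # Assumption: the smallest start marks the first coding position,
        # and output coordinates are 1-based.
        offset = min(starts) - 1
        for exon in exons:
            if exon.coding_dna_start is None:
                continue  # exon without a coding start is skipped
            exon.coding_dna_start -= offset
            if exon.coding_dna_end is not None:
                exon.coding_dna_end -= offset
    return exons_by_gene


# Toy data, not taken from Ensembl.
exons = defaultdict(list)
exons["GENE1"] = [
    ExonSketch(rank=1),
    ExonSketch(rank=2, coding_dna_start=201, coding_dna_end=300),
    ExonSketch(rank=3, coding_dna_start=301, coding_dna_end=450),
]
normalized = normalize_cdna(exons)
```

In this toy input the second exon ends up spanning positions 1 to 100 and the third 101 to 250, while the first exon, which has no coding_dna_start, is left unchanged.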
HGNCGenesDigestor(verbose=True)
Bases: TerminologyDigestor
Digestor for HGNC gene terminology data.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for the digestor ("hgnc"). |
FILENAME | str | Expected filename for HGNC data ("hgnc.tsv"). |
CANONICAL_URL | str | Base URL for HGNC gene identifiers. |
Source code in onconova/terminology/digestors.py
HGNCGroupDigestor(verbose=True)
Bases: TerminologyDigestor
Digestor for HGNC gene group terminology.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for this digestor. |
FILENAME | str | Name of the TSV file containing gene group data. |
CANONICAL_URL | str | URL representing the HGNC gene group system. |
Source code in onconova/terminology/digestors.py
ICD10Digestor(verbose=True)
Bases: TerminologyDigestor
ICD10Digestor is a specialized TerminologyDigestor for processing ICD-10 terminology data.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for the digestor ("icd10"). |
FILENAME | str | Name of the file containing ICD-10 data ("icd10.tsv"). |
CANONICAL_URL | str | Canonical URL for the ICD-10 code system. |
Source code in onconova/terminology/digestors.py
ICDO3DifferentiationDigestor(verbose=True)
Bases: TerminologyDigestor
ICDO3DifferentiationDigestor is a specialized TerminologyDigestor for processing ICD-O-3 differentiation concepts.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for this digestor. |
FILENAME | str | Name of the TSV file containing differentiation concepts. |
CANONICAL_URL | str | URL of the HL7 ICD-O-3 differentiation code system. |
Source code in onconova/terminology/digestors.py
ICDO3MorphologyDigestor(verbose=True)
Bases: TerminologyDigestor
Digestor for ICD-O-3 Morphology terminology.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for this digestor. |
FILENAME | str | Name of the TSV file containing ICD-O-3 Morphology data. |
CANONICAL_URL | str | Canonical URL for the ICD-O-3 Morphology code system. |
Source code in onconova/terminology/digestors.py
ICDO3TopographyDigestor(verbose=True)
Bases: TerminologyDigestor
Digestor for ICD-O-3 Topography terminology.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Label for the digestor. |
FILENAME | str | Name of the TSV file containing the terminology data. |
CANONICAL_URL | str | Canonical URL for the ICD-O-3 Topography code system. |
Source code in onconova/terminology/digestors.py
LOINCDigestor(verbose=True)
Bases: TerminologyDigestor
Digestor class for processing LOINC terminology files.
Attributes:
Name | Type | Description |
---|---|---|
FILENAME | str | Name of the main LOINC CSV file. |
LABEL | str | Label for the terminology. |
CANONICAL_URL | str | Canonical URL for the LOINC system. |
LOINC_PROPERTIES | list | List of LOINC property fields to extract. |
Source code in onconova/terminology/digestors.py
CANONICAL_URL (class-attribute, instance-attribute)
FILENAME (class-attribute, instance-attribute)
LABEL (class-attribute, instance-attribute)
LOINC_PROPERTIES (class-attribute, instance-attribute)
NCITDigestor(verbose=True)
Bases: TerminologyDigestor
NCITDigestor is a specialized TerminologyDigestor for parsing and ingesting NCIT (National Cancer Institute Thesaurus) concepts from a TSV file.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for this digestor ("ncit"). |
FILENAME | str | Expected filename containing NCIT data ("ncit.tsv"). |
CANONICAL_URL | str | The canonical URL for the NCIT ontology. |
Source code in onconova/terminology/digestors.py
OncoTreeDigestor(verbose=True)
Bases: TerminologyDigestor
Digestor for the OncoTree terminology.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for the terminology. |
FILENAME | str | Default filename for the OncoTree JSON data. |
CANONICAL_URL | str | Canonical URL for the OncoTree CodeSystem. |
VERSION | str | Version string based on the current date. |
Source code in onconova/terminology/digestors.py
CANONICAL_URL (class-attribute, instance-attribute)
FILENAME (class-attribute, instance-attribute)
LABEL (class-attribute, instance-attribute)
VERSION (class-attribute, instance-attribute)
digest()
Parses the OncoTree JSON file specified by self.file_location, recursively processes its branches, and populates self.concepts with the digested concepts.
Returns:
Type | Description |
---|---|
dict | A dictionary containing the processed concepts from the OncoTree. |
Source code in onconova/terminology/digestors.py
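The recursive branch processing mentioned above can be sketched as a depth-first walk that flattens a nested tree into a dictionary keyed by code. This is only an illustration: digest_branch is a hypothetical helper, and the layout of the input (nodes with "code", "name", and a "children" mapping) is an assumption about the JSON file rather than something this page confirms. The sample tree uses made-up codes.

```python
import json

# Toy tree with made-up codes; the key names are assumptions about the file layout.
SAMPLE = json.loads("""
{
  "code": "ROOT", "name": "Root node",
  "children": {
    "A": {"code": "A", "name": "Branch A", "children": {
      "A1": {"code": "A1", "name": "Leaf A1", "children": {}}
    }},
    "B": {"code": "B", "name": "Branch B", "children": {}}
  }
}
""")


def digest_branch(branch, concepts, parent=None):
    """Record the current node, then recurse into each of its children."""
    code = branch["code"]
    concepts[code] = {"display": branch["name"], "parent": parent}
    for child in branch.get("children", {}).values():
        digest_branch(child, concepts, parent=code)
    return concepts


concepts = digest_branch(SAMPLE, {})
```

Each entry keeps a reference to its parent code, so the hierarchy can still be reconstructed from the flat dictionary.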
SNOMEDCTDigestor(verbose=True)
Bases: TerminologyDigestor
SNOMEDCTDigestor is a specialized TerminologyDigestor for processing SNOMED CT terminology data.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Identifier label for SNOMED CT. |
FILENAME | str | Filename for SNOMED CT concepts data. |
CANONICAL_URL | str | Canonical URL for SNOMED CT system. |
RELATIONSHIPS_FILENAME | str | Filename for SNOMED CT relationships data. |
SNOMED_IS_A | str | SNOMED CT relationship type ID for "is a" relationships. |
SNOMED_DESIGNATION_USES | dict | Mapping of SNOMED CT designation type IDs to usage labels. |
Source code in onconova/terminology/digestors.py
CANONICAL_URL (class-attribute, instance-attribute)
FILENAME (class-attribute, instance-attribute)
LABEL (class-attribute, instance-attribute)
RELATIONSHIPS_FILENAME (class-attribute, instance-attribute)
SNOMED_DESIGNATION_USES (class-attribute, instance-attribute)
SNOMED_IS_A (class-attribute, instance-attribute)
digest()
Processes and updates concept relationships and display names.
This method first calls the parent class's digest method, then processes relationships specific to this class using _digest_relationships(). For each concept in self.concepts, if the length of the concept's display name is greater than the length of its first synonym, the display name is appended to the synonyms list and the display name is replaced with the first synonym. Returns the updated concepts dictionary.
Returns:
Type | Description |
---|---|
dict | The updated concepts dictionary after processing relationships and display names. |
Source code in onconova/terminology/digestors.py
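The display-name rule described for this digest() method can be sketched as follows. ConceptSketch and prefer_shorter_display are hypothetical names, not part of onconova, and the concept codes and names are made up. The description does not say whether the promoted synonym is removed from the synonyms list, so the sketch leaves it in place; the guard for concepts without synonyms is also an addition of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptSketch:
    """Hypothetical stand-in for a coded concept with a display name and synonyms."""
    code: str
    display: str
    synonyms: list = field(default_factory=list)


def prefer_shorter_display(concepts):
    """Swap in the first synonym when it is shorter than the display name."""
    for concept in concepts.values():
        if not concept.synonyms:
            continue  # guard added for the sketch: nothing to compare against
        first = concept.synonyms[0]
        if len(concept.display) > len(first):
            concept.synonyms.append(concept.display)  # keep the long name as a synonym
            concept.display = first
    return concepts


concepts = {
    "c1": ConceptSketch("c1", "Long fully specified name (finding)", ["Short name"]),
    "c2": ConceptSketch("c2", "Brief", ["A much longer synonym"]),
}
prefer_shorter_display(concepts)
```

After the call, concept "c1" is displayed as "Short name" and retains its original long name among the synonyms, while "c2" is unchanged because its display name is already shorter than its first synonym.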
SequenceOntologyDigestor(verbose=True)
Bases: TerminologyDigestor
Digestor for the Sequence Ontology (SO) terminology.
Attributes:
Name | Type | Description |
---|---|---|
LABEL | str | Short label for the terminology. |
FILENAME | str | Filename of the OBO file containing the ontology. |
CANONICAL_URL | str | Canonical URL for the Sequence Ontology. |
OTHER_URLS | list | Alternative URLs for the Sequence Ontology. |
Source code in onconova/terminology/digestors.py
TerminologyDigestor(verbose=True)
A base class for digesting terminology files into CodedConcept objects.
Attributes:
Name | Type | Description |
---|---|---|
PATH | str | The base directory path for external data files. |
FILENAME | str | The name of the file containing terminology data. |
CANONICAL_URL | str | The canonical URL of the terminology. |
OTHER_URLS | list[str] | Additional URLs associated with the terminology. |
LABEL | str | A label identifier for the terminology. |
Methods:
Name | Description |
---|---|
digest | Digests the terminology's concepts and designations. |
_digest_concepts | Reads and processes each row from the file containing concepts. |
_digest_concept_row | Processes a single row (a dict[str, str]) from the concepts file. Returns None. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
verbose | bool | Whether to print progress messages. | True |
Source code in onconova/terminology/digestors.py
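How the attributes and methods listed above might fit together can be shown with a self-contained sketch. DigestorSketch and ToyDigestor are hypothetical classes that only borrow the documented names (PATH, FILENAME, LABEL, file_location, verbose, digest, _digest_concepts, _digest_concept_row); the tab-separated input, its "code" and "display" columns, and the way file_location is built from PATH and FILENAME are assumptions made for illustration, not the actual onconova behaviour.

```python
import csv
import os
import tempfile


class DigestorSketch:
    """Hypothetical base class mirroring the documented attribute and method names."""
    PATH = ""
    FILENAME = ""
    LABEL = ""
    CANONICAL_URL = ""

    def __init__(self, verbose=True):
        self.verbose = verbose
        # Assumption: the input file lives at PATH/FILENAME.
        self.file_location = os.path.join(self.PATH, self.FILENAME)
        self.concepts = {}

    def digest(self):
        """Read the whole file and return the collected concepts."""
        self._digest_concepts()
        if self.verbose:
            print(f"Digested {len(self.concepts)} concepts from {self.LABEL}")
        return self.concepts

    def _digest_concepts(self):
        """Hand every row of the file to the row-level hook."""
        with open(self.file_location, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                self._digest_concept_row(row)

    def _digest_concept_row(self, row):
        raise NotImplementedError("subclasses decide how a row becomes a concept")


class ToyDigestor(DigestorSketch):
    LABEL = "toy"
    FILENAME = "toy.tsv"

    def _digest_concept_row(self, row):
        # The column names are invented for this example.
        self.concepts[row["code"]] = row["display"]


with tempfile.TemporaryDirectory() as tmp:
    ToyDigestor.PATH = tmp
    with open(os.path.join(tmp, ToyDigestor.FILENAME), "w", newline="") as handle:
        handle.write("code\tdisplay\nA1\tFirst concept\nB2\tSecond concept\n")
    result = ToyDigestor(verbose=False).digest()
```

Under this layout a subclass only sets its class attributes and overrides the row-level hook, which matches the shape of the specialised digestors documented on this page.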
CANONICAL_URL (instance-attribute)
FILENAME (instance-attribute)
LABEL (instance-attribute)
OTHER_URLS (class-attribute, instance-attribute)
PATH (class-attribute, instance-attribute)
file_location (instance-attribute)
verbose (instance-attribute)
digest()
Digests the terminology's concepts and designations.
Returns:
Type | Description |
---|---|
dict[str, CodedConcept] | A dictionary with concept codes as keys and CodedConcept objects as values. |