onconova.terminology.digestors

This module provides classes for digesting various terminology files into standardized CodedConcept and code system objects.

DIGESTORS module-attribute

CTCAEDigestor(verbose=True)

Bases: TerminologyDigestor

CTCAEDigestor is a specialized TerminologyDigestor for parsing CTCAE (Common Terminology Criteria for Adverse Events) concepts from a CSV file.

Attributes:

Name Type Description
LABEL str

Identifier label for the digestor ("ctcae").

FILENAME str

Name of the CSV file containing CTCAE data ("ctcae.csv").

CANONICAL_URL str

Canonical URL for the terminology system (empty by default).

Source code in onconova/terminology/digestors.py
def __init__(self, verbose: bool = True) -> None:
    """
    Initialize the TerminologyDigestor.

    Args:
        verbose (bool, optional): Whether to print progress messages. Defaults to True.
    """
    try:
        self.file_location = get_file_location(self.PATH, self.FILENAME)
    except FileNotFoundError:
        # Unzip into DATA_DIR
        zip_file_path = os.environ.get("ONCONOVA_SNOMED_ZIPFILE_PATH", "")
        if not zip_file_path or not os.path.isfile(zip_file_path):
            print(
                "ERROR FILE NOT FOUND:\nPlease download the SNOMEDCT_International_*.zip file from (requires a login and license):\nand specify the location of the zip file with the ONCONOVA_SNOMED_ZIPFILE_PATH variable.\n"
            )
            sys.exit(1)
        with zipfile.ZipFile(zip_file_path) as zip_ref:
            zip_ref.extractall(self.PATH)

        # Move files into TEMP_DIR
        print("• Unpacking SNOMED CT files...")
        temp_dir = os.path.join(os.path.basename(zip_file_path), ".snomed")
        os.makedirs(temp_dir, exist_ok=True)
        snomed_dirs = glob.glob(os.path.join(self.PATH, "SnomedCT_*"))
        for snomed_dir in snomed_dirs:
            for item in os.listdir(snomed_dir):
                src = os.path.join(snomed_dir, item)
                dst = os.path.join(temp_dir, item)
                shutil.move(src, dst)

        # Move description and relationship files
        desc_src_pattern = os.path.join(
            temp_dir,
            "Snapshot",
            "Terminology",
            "sct2_Description_Snapshot-en_INT_*",
        )
        desc_files = glob.glob(desc_src_pattern)
        if desc_files:
            shutil.move(desc_files[0], os.path.join(self.PATH, "snomedct.tsv"))

        rel_src_pattern = os.path.join(
            temp_dir, "Snapshot", "Terminology", "sct2_Relationship_Snapshot_INT_*"
        )
        rel_files = glob.glob(rel_src_pattern)
        if rel_files:
            shutil.move(
                rel_files[0], os.path.join(self.PATH, "snomedct_relations.tsv")
            )

        # Remove TEMP_DIR and extracted SnomedCT_* directories
        print("• Clean-up unnecessary files...")
        shutil.rmtree(temp_dir, ignore_errors=True)
        for snomed_dir in snomed_dirs:
            shutil.rmtree(snomed_dir, ignore_errors=True)
    self.file_location = get_file_location(self.PATH, self.FILENAME)
    self.verbose = verbose
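
The fallback branch in the listing above only proceeds when the ONCONOVA_SNOMED_ZIPFILE_PATH environment variable names an existing file; otherwise the constructor prints an error and exits. A minimal sketch of that precondition, which could be checked before constructing a digestor, is given below. The helper name `resolve_snomed_zip` is hypothetical and not part of onconova; only the environment variable name and the two conditions are taken from the listing.

```python
import os
from typing import Mapping, Optional


def resolve_snomed_zip(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured SNOMED CT zip path, or None if unset or missing.

    Hypothetical helper (not part of onconova) that mirrors the check
    performed in the constructor's FileNotFoundError branch.
    """
    zip_file_path = env.get("ONCONOVA_SNOMED_ZIPFILE_PATH", "")
    # Same two conditions as the listing: variable set, and path is a file
    if not zip_file_path or not os.path.isfile(zip_file_path):
        return None
    return zip_file_path


if resolve_snomed_zip() is None:
    print("ONCONOVA_SNOMED_ZIPFILE_PATH is unset or does not point to a file")
```

Passing the environment as a mapping keeps the check easy to exercise without modifying `os.environ`.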

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

EnsemblExonsDigestor(verbose=True)

Bases: TerminologyDigestor

Processes and normalizes exon data from Ensembl gene annotations.

Attributes:

Name Type Description
LABEL str

Identifier label for the digestor ("ensembl").

FILENAME str

Expected filename for input data ("ensembl_exons.tsv").

exons defaultdict

Stores lists of GeneExon objects keyed by gene name.

Source code in onconova/terminology/digestors.py
def __init__(self, verbose=True):
    super().__init__(verbose)
    self.exons = defaultdict(list)

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

exons instance-attribute

GeneExon

Bases: BaseModel

Represents an exon within a gene, including its rank and coding region coordinates.

Attributes:

Name Type Description
rank int

The order of the exon within the gene.

coding_dna_start int | None

The start position of the coding region in DNA coordinates, if available.

coding_dna_end int | None

The end position of the coding region in DNA coordinates, if available.

coding_genomic_start int | None

The start position of the coding region in genomic coordinates, if available.

coding_genomic_end int | None

The end position of the coding region in genomic coordinates, if available.

coding_dna_end class-attribute instance-attribute

coding_dna_start class-attribute instance-attribute

coding_genomic_end class-attribute instance-attribute

coding_genomic_start class-attribute instance-attribute

rank instance-attribute

digest()

Adjusts the cDNA positions of exons for each gene by normalizing them to the start of the coding DNA region.

This method iterates through all genes and their associated exons, recalculating the coding_dna_start and coding_dna_end for each exon so that positions are relative to the first coding DNA position in the gene. If an exon does not have a coding_dna_start, it is skipped for normalization. The method returns the updated exons dictionary.

Returns:

Type Description
dict

A dictionary mapping gene names to lists of exons with updated cDNA positions.

Source code in onconova/terminology/digestors.py
def digest(self):
    """
    Adjusts the cDNA positions of exons for each gene by normalizing them to the start of the coding DNA region.

    This method iterates through all genes and their associated exons, recalculating the `coding_dna_start` and
    `coding_dna_end` for each exon so that positions are relative to the first coding DNA position in the gene.
    If an exon does not have a `coding_dna_start`, it is skipped for normalization. The method returns the updated
    exons dictionary.

    Returns:
        (dict): A dictionary mapping gene names to lists of exons with updated cDNA positions.
    """
    super().digest()
    for gene, exons in self.exons.items():
        # Adjust the cDNA position from the position in the gene reference sequence to the position in the cDNA
        gene_coding_dna_region_start = min(
            [exon.coding_dna_start for exon in exons if exon.coding_dna_start]
            or [0]
        )
        if gene_coding_dna_region_start:
            for exon in exons:
                if exon.coding_dna_start:
                    exon.coding_dna_start = (
                        exon.coding_dna_start - gene_coding_dna_region_start + 1
                    )
                if exon.coding_dna_end:
                    exon.coding_dna_end = (
                        exon.coding_dna_end - gene_coding_dna_region_start + 1
                    )
    return self.exons
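
The normalization performed by digest() can be illustrated in isolation. The sketch below reproduces only the arithmetic from the listing above; the `Exon` dataclass is a simplified stand-in for GeneExon (which is a pydantic BaseModel), and the function name `normalize_cdna`, the gene name, and the coordinates are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Exon:
    """Simplified stand-in for GeneExon (assumption: only these fields matter here)."""

    rank: int
    coding_dna_start: Optional[int] = None
    coding_dna_end: Optional[int] = None


def normalize_cdna(exons_by_gene: Dict[str, List[Exon]]) -> Dict[str, List[Exon]]:
    """Shift cDNA positions so the first coding position of each gene becomes 1."""
    for exons in exons_by_gene.values():
        # Smallest available start in the gene; falls back to 0 if none is set
        region_start = min(
            [e.coding_dna_start for e in exons if e.coding_dna_start] or [0]
        )
        if not region_start:
            continue  # no coding positions to normalize for this gene
        for exon in exons:
            if exon.coding_dna_start:
                exon.coding_dna_start = exon.coding_dna_start - region_start + 1
            if exon.coding_dna_end:
                exon.coding_dna_end = exon.coding_dna_end - region_start + 1
    return exons_by_gene


# Illustrative data: exon 1 is non-coding, exons 2 and 3 carry coding positions
genes = {"GENE1": [Exon(1), Exon(2, 101, 150), Exon(3, 151, 300)]}
normalize_cdna(genes)
print(genes["GENE1"][1])  # start and end become 1 and 50
print(genes["GENE1"][2])  # start and end become 51 and 200
```

Exons without coding positions are left untouched, which matches the truthiness checks in the original method.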

HGNCGenesDigestor(verbose=True)

Bases: TerminologyDigestor

Digestor for HGNC gene terminology data.

Attributes:

Name Type Description
LABEL str

Identifier label for the digestor ("hgnc").

FILENAME str

Expected filename for HGNC data ("hgnc.tsv").

CANONICAL_URL str

Base URL for HGNC gene identifiers.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

HGNCGroupDigestor(verbose=True)

Bases: TerminologyDigestor

Digestor for HGNC gene group terminology.

Attributes:

Name Type Description
LABEL str

Identifier label for this digestor.

FILENAME str

Name of the TSV file containing gene group data.

CANONICAL_URL str

URL representing the HGNC gene group system.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

ICD10Digestor(verbose=True)

Bases: TerminologyDigestor

ICD10Digestor is a specialized TerminologyDigestor for processing ICD-10 terminology data.

Attributes:

Name Type Description
LABEL str

Identifier label for the digestor ("icd10").

FILENAME str

Name of the file containing ICD-10 data ("icd10.tsv").

CANONICAL_URL str

Canonical URL for the ICD-10 code system.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

ICDO3DifferentiationDigestor(verbose=True)

Bases: TerminologyDigestor

ICDO3DifferentiationDigestor is a specialized TerminologyDigestor for processing ICD-O-3 differentiation concepts.

Attributes:

Name Type Description
LABEL str

Identifier label for this digestor.

FILENAME str

Name of the TSV file containing differentiation concepts.

CANONICAL_URL str

URL of the HL7 ICD-O-3 differentiation code system.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

ICDO3MorphologyDigestor(verbose=True)

Bases: TerminologyDigestor

Digestor for ICD-O-3 Morphology terminology.

Attributes:

Name Type Description
LABEL str

Identifier label for this digestor.

FILENAME str

Name of the TSV file containing ICD-O-3 Morphology data.

CANONICAL_URL str

Canonical URL for the ICD-O-3 Morphology code system.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

ICDO3TopographyDigestor(verbose=True)

Bases: TerminologyDigestor

Digestor for ICD-O-3 Topography terminology.

Attributes:

Name Type Description
LABEL str

Label for the digestor.

FILENAME str

Name of the TSV file containing the terminology data.

CANONICAL_URL str

Canonical URL for the ICD-O-3 Topography code system.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

LOINCDigestor(verbose=True)

Bases: TerminologyDigestor

Digestor class for processing LOINC terminology files.

Attributes:

Name Type Description
FILENAME str

Name of the main LOINC CSV file.

LABEL str

Label for the terminology.

CANONICAL_URL str

Canonical URL for the LOINC system.

LOINC_PROPERTIES list

List of LOINC property fields to extract.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

LOINC_PROPERTIES class-attribute instance-attribute

digest()

Processes and digests terminology data by invoking parent digest logic, extracting part codes, and compiling answer lists.

Returns:

Type Description
list

A list of digested concepts.

Source code in onconova/terminology/digestors.py
def digest(self):
    """
    Processes and digests terminology data by invoking parent digest logic,
    extracting part codes, and compiling answer lists.

    Returns:
        (list): A list of digested concepts.
    """
    super().digest()
    self._digest_part_codes()
    self._digest_answer_lists()
    return self.concepts
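
The override above follows a common pattern: run the parent's digest() first, then apply extra passes that add to the same collection. The toy classes below sketch that call order only. They are not the onconova classes; the class names and the string placeholders are hypothetical, and the real `_digest_part_codes` and `_digest_answer_lists` bodies are not shown in this section.

```python
from typing import List


class ToyDigestor:
    """Stand-in for a base digestor that fills `concepts` from a main file."""

    def __init__(self) -> None:
        self.concepts: List[str] = []

    def digest(self) -> List[str]:
        self.concepts.append("main-file concept")
        return self.concepts


class ToyLoincLikeDigestor(ToyDigestor):
    """Extends the base pass with two additional passes, as in the listing."""

    def digest(self) -> List[str]:
        super().digest()             # base pass over the main file
        self._digest_part_codes()    # extra pass 1 (placeholder body)
        self._digest_answer_lists()  # extra pass 2 (placeholder body)
        return self.concepts

    def _digest_part_codes(self) -> None:
        self.concepts.append("part code")

    def _digest_answer_lists(self) -> None:
        self.concepts.append("answer list")


result = ToyLoincLikeDigestor().digest()
print(result)  # ['main-file concept', 'part code', 'answer list']
```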

NCITDigestor(verbose=True)

Bases: TerminologyDigestor

NCITDigestor is a specialized TerminologyDigestor for parsing and ingesting NCIT (National Cancer Institute Thesaurus) concepts from a TSV file.

Attributes:

Name Type Description
LABEL str

Identifier label for this digestor ("ncit").

FILENAME str

Expected filename containing NCIT data ("ncit.tsv").

CANONICAL_URL str

The canonical URL for the NCIT ontology.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

OncoTreeDigestor(verbose=True)

Bases: TerminologyDigestor

Digestor for the OncoTree terminology.

Attributes:

Name Type Description
LABEL str

Identifier label for the terminology.

FILENAME str

Default filename for the OncoTree JSON data.

CANONICAL_URL str

Canonical URL for the OncoTree CodeSystem.

VERSION str

Version string based on the current date.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

VERSION class-attribute instance-attribute

digest()

Parses the OncoTree JSON file specified by self.file_location, recursively processes its branches, and populates self.concepts with the digested concepts.

Returns:

Type Description
dict

A dictionary containing the processed concepts from the OncoTree.

Source code in onconova/terminology/digestors.py
def digest(self):
    """
    Parses the OncoTree JSON file specified by `self.file_location`, recursively processes its branches,
    and populates `self.concepts` with the digested concepts.

    Returns:
        (dict): A dictionary containing the processed concepts from the OncoTree.
    """
    self.concepts = {}
    with open(self.file_location) as file:
        self.oncotree = json.load(file)
    # And recursively add all its children
    for branch in self.oncotree["TISSUE"]["children"].values():
        self._digest_branch(branch)
    return self.concepts
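
The helper `_digest_branch` is not shown on this page. As a rough sketch of the recursive walk that `digest()` describes, the example below assumes each tree node carries `code`, `name`, and a `children` dictionary; the `Concept` dataclass and the function names `digest_branch` and `digest_tree` are hypothetical stand-ins, not the library's own classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    """Hypothetical stand-in for the library's CodedConcept."""
    code: str
    display: str
    parent: Optional[str] = None


def digest_branch(branch: dict, concepts: dict, parent: Optional[str] = None) -> None:
    """Register one node, then recurse into its children (assumed to be a dict)."""
    code = branch["code"]
    concepts[code] = Concept(code=code, display=branch["name"], parent=parent)
    for child in branch.get("children", {}).values():
        digest_branch(child, concepts, parent=code)


def digest_tree(tree: dict) -> dict:
    """Walk every top-level branch below the TISSUE root, as digest() does."""
    concepts: dict = {}
    for branch in tree["TISSUE"]["children"].values():
        digest_branch(branch, concepts)
    return concepts


# Tiny hand-written tree; the codes and names are illustrative only.
tree = {
    "TISSUE": {
        "code": "TISSUE",
        "name": "Tissue",
        "children": {
            "LUNG": {
                "code": "LUNG",
                "name": "Lung",
                "children": {
                    "NSCLC": {"code": "NSCLC", "name": "Non-Small Cell Lung Cancer", "children": {}},
                },
            },
        },
    }
}
concepts = digest_tree(tree)
```

In this sketch the root node itself is skipped, matching the loop in `digest()` that starts from the root's children.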

SNOMEDCTDigestor(verbose=True)

Bases: TerminologyDigestor

SNOMEDCTDigestor is a specialized TerminologyDigestor for processing SNOMED CT terminology data.

Attributes:

Name Type Description
LABEL str

Identifier label for SNOMED CT.

FILENAME str

Filename for SNOMED CT concepts data.

CANONICAL_URL str

Canonical URL for SNOMED CT system.

RELATIONSHIPS_FILENAME str

Filename for SNOMED CT relationships data.

SNOMED_IS_A str

SNOMED CT relationship type ID for "is a" relationships.

SNOMED_DESIGNATION_USES dict

Mapping of SNOMED CT designation type IDs to usage labels.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

RELATIONSHIPS_FILENAME class-attribute instance-attribute

SNOMED_DESIGNATION_USES class-attribute instance-attribute

SNOMED_IS_A class-attribute instance-attribute

digest()

Processes and updates concept relationships and display names.

This method first calls the parent class's digest method, then processes relationships specific to this class using _digest_relationships(). For each concept in self.concepts, if the length of the concept's display name is greater than the length of its first synonym, the display name is appended to the synonyms list and the display name is replaced with the first synonym. Returns the updated concepts dictionary.

Returns:

Type Description
dict

The updated concepts dictionary after processing relationships and display names.

Source code in onconova/terminology/digestors.py
def digest(self):
    """
    Processes and updates concept relationships and display names.

    This method first calls the parent class's `digest` method, then processes relationships
    specific to this class using `_digest_relationships()`. For each concept in `self.concepts`,
    if the length of the concept's display name is greater than the length of its first synonym,
    the display name is appended to the synonyms list and the display name is replaced with the
    first synonym. Returns the updated concepts dictionary.

    Returns:
        (dict): The updated concepts dictionary after processing relationships and display names.
    """
    super().digest()
    self._digest_relationships()
    for code, concept in self.concepts.items():
        if len(concept.display) > len(concept.synonyms[0]):
            self.concepts[code].synonyms.append(concept.display)
            self.concepts[code].display = concept.synonyms[0]
    return self.concepts
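
The display-name rule can be tried in isolation. The sketch below uses a hypothetical `Concept` dataclass in place of CodedConcept and a made-up code `C1`. It also adds a guard for an empty synonyms list, which the method above does not have, so that is an assumption of the sketch rather than library behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical stand-in for the library's CodedConcept."""
    code: str
    display: str
    synonyms: list = field(default_factory=list)


def prefer_shorter_display(concepts: dict) -> dict:
    """Swap in the first synonym when it is shorter than the display name."""
    for concept in concepts.values():
        # Guard added for this sketch: skip concepts without synonyms.
        if concept.synonyms and len(concept.display) > len(concept.synonyms[0]):
            longer = concept.display
            concept.display = concept.synonyms[0]
            concept.synonyms.append(longer)
    return concepts


concepts = {
    "C1": Concept(
        code="C1",
        display="Myocardial infarction (disorder)",
        synonyms=["Heart attack", "Myocardial infarction"],
    )
}
prefer_shorter_display(concepts)
```

As in the method above, the first synonym stays in the synonyms list after it becomes the display name, and the former display name is appended at the end.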

SequenceOntologyDigestor(verbose=True)

Bases: TerminologyDigestor

Digestor for the Sequence Ontology (SO) terminology.

Attributes:

Name Type Description
LABEL str

Short label for the terminology.

FILENAME str

Filename of the OBO file containing the ontology.

CANONICAL_URL str

Canonical URL for the Sequence Ontology.

OTHER_URLS list

Alternative URLs for the Sequence Ontology.

CANONICAL_URL class-attribute instance-attribute

FILENAME class-attribute instance-attribute

LABEL class-attribute instance-attribute

OTHER_URLS class-attribute instance-attribute

TerminologyDigestor(verbose=True)

A base class for digesting terminology files into CodedConcept objects.

Attributes:

Name Type Description
PATH str

The base directory path for external data files.

FILENAME str

The name of the file containing terminology data.

CANONICAL_URL str

The canonical URL of the terminology.

OTHER_URLS list[str]

Additional URLs associated with the terminology.

LABEL str

A label identifier for the terminology.

Methods:

Name Description
digest

Digests the terminology's concepts and designations.

_digest_concepts

Reads and processes each row from the file containing concepts.

_digest_concept_row

Processes a single row (dict[str, str]) from the concepts file; returns None.

Parameters:

Name Type Description Default
verbose bool

Whether to print progress messages. Defaults to True.

True

Source code in onconova/terminology/digestors.py
def __init__(self, verbose: bool = True) -> None:
    """
    Initialize the TerminologyDigestor.

    Args:
        verbose (bool, optional): Whether to print progress messages. Defaults to True.
    """
    try:
        self.file_location = get_file_location(self.PATH, self.FILENAME)
    except FileNotFoundError:
        # Unzip into DATA_DIR
        zip_file_path = os.environ.get("ONCONOVA_SNOMED_ZIPFILE_PATH", "")
        if not zip_file_path or not os.path.isfile(zip_file_path):
            print(
                "ERROR FILE NOT FOUND:\nPlease download the SNOMEDCT_International_*.zip file from (requires a login and license):\nand specify the location of the zip file with the ONCONOVA_SNOMED_ZIPFILE_PATH variable.\n"
            )
            sys.exit(1)
        with zipfile.ZipFile(zip_file_path) as zip_ref:
            zip_ref.extractall(self.PATH)

        # Move files into TEMP_DIR
        print("• Unpacking SNOMED CT files...")
        temp_dir = os.path.join(os.path.basename(zip_file_path), ".snomed")
        os.makedirs(temp_dir, exist_ok=True)
        snomed_dirs = glob.glob(os.path.join(self.PATH, "SnomedCT_*"))
        for snomed_dir in snomed_dirs:
            for item in os.listdir(snomed_dir):
                src = os.path.join(snomed_dir, item)
                dst = os.path.join(temp_dir, item)
                shutil.move(src, dst)

        # Move description and relationship files
        desc_src_pattern = os.path.join(
            temp_dir,
            "Snapshot",
            "Terminology",
            "sct2_Description_Snapshot-en_INT_*",
        )
        desc_files = glob.glob(desc_src_pattern)
        if desc_files:
            shutil.move(desc_files[0], os.path.join(self.PATH, "snomedct.tsv"))

        rel_src_pattern = os.path.join(
            temp_dir, "Snapshot", "Terminology", "sct2_Relationship_Snapshot_INT_*"
        )
        rel_files = glob.glob(rel_src_pattern)
        if rel_files:
            shutil.move(
                rel_files[0], os.path.join(self.PATH, "snomedct_relations.tsv")
            )

        # Remove TEMP_DIR and extracted SnomedCT_* directories
        print("• Clean-up unnecessary files...")
        shutil.rmtree(temp_dir, ignore_errors=True)
        for snomed_dir in snomed_dirs:
            shutil.rmtree(snomed_dir, ignore_errors=True)
    self.file_location = get_file_location(self.PATH, self.FILENAME)
    self.verbose = verbose
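
The fallback branch locates the snapshot files with a glob pattern and renames the first match. A minimal, self-contained sketch of that step follows; the helper name `promote_first_match` and the `00000000` release suffix are hypothetical, and the matches are sorted here for determinism, which the constructor above does not do.

```python
import glob
import os
import shutil
import tempfile


def promote_first_match(pattern: str, destination: str) -> bool:
    """Move the first file matching `pattern` to `destination`.

    Returns False when nothing matches, mirroring the `if desc_files:` check.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        return False
    shutil.move(matches[0], destination)
    return True


with tempfile.TemporaryDirectory() as root:
    # Recreate the directory layout the constructor expects after unpacking.
    terminology_dir = os.path.join(root, ".snomed", "Snapshot", "Terminology")
    os.makedirs(terminology_dir)
    fake = os.path.join(
        terminology_dir, "sct2_Description_Snapshot-en_INT_00000000.txt"
    )
    with open(fake, "w") as handle:
        handle.write("id\tterm\n")

    target = os.path.join(root, "snomedct.tsv")
    moved = promote_first_match(
        os.path.join(terminology_dir, "sct2_Description_Snapshot-en_INT_*"),
        target,
    )
    target_exists = os.path.isfile(target)
    source_gone = not os.path.exists(fake)
```

Returning a boolean makes the "no file matched" case visible to the caller, whereas the constructor above skips the move silently and relies on the final `get_file_location` call to fail.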

CANONICAL_URL instance-attribute

FILENAME instance-attribute

LABEL instance-attribute

OTHER_URLS class-attribute instance-attribute

PATH class-attribute instance-attribute

file_location instance-attribute

verbose instance-attribute

digest()

Digests the terminology's concepts and designations.

Returns:

Type Description
dict[str, CodedConcept]

A dictionary with concept codes as keys and CodedConcept objects as values.

Source code in onconova/terminology/digestors.py
def digest(self) -> dict[str, CodedConcept]:
    """
    Digests the terminology's concepts and designations.

    Returns:
        dict[str, CodedConcept]: A dictionary with concept codes as keys
            and CodedConcept objects as values.
    """
    self.designations = defaultdict(list)
    self.concepts = {}
    self._digest_concepts()
    for code, synonyms in self.designations.items():
        self.concepts[code].synonyms = synonyms
    return self.concepts
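
To illustrate how `digest()` combines concepts with their collected designations, here is a toy digestor that follows the same three steps. The `Concept` dataclass, the `ToyDigestor` class, the in-memory `ROWS`, and the `code`/`term` column names are all assumptions made for this sketch; the real `_digest_concepts` reads from `self.file_location` and is not shown on this page.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical stand-in for the library's CodedConcept."""
    code: str
    display: str
    synonyms: list = field(default_factory=list)


class ToyDigestor:
    # In-memory rows instead of a file; column names are assumed.
    ROWS = [
        {"code": "A", "term": "alpha"},
        {"code": "A", "term": "alpha-2"},
        {"code": "B", "term": "beta"},
    ]

    def digest(self) -> dict:
        """Same flow as above: reset state, read rows, attach synonyms."""
        self.designations = defaultdict(list)
        self.concepts = {}
        self._digest_concepts()
        for code, synonyms in self.designations.items():
            self.concepts[code].synonyms = synonyms
        return self.concepts

    def _digest_concepts(self) -> None:
        for row in self.ROWS:
            self._digest_concept_row(row)

    def _digest_concept_row(self, row: dict) -> None:
        """First term seen becomes the display; later terms become designations."""
        code = row["code"]
        if code not in self.concepts:
            self.concepts[code] = Concept(code=code, display=row["term"])
        else:
            self.designations[code].append(row["term"])


concepts = ToyDigestor().digest()
```

Because synonyms are assigned by looking up `self.concepts[code]`, a designation recorded for a code that never became a concept would raise a `KeyError`, so row handlers need to create the concept before recording designations for it.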