`onconova.research.schemas.analysis`

`AnalysisMetadata` ¶

Bases: Schema

Schema representing metadata for an analysis performed on a cohort.

Attributes:

Name	Type	Description
`cohortId`	`str`	The ID of the cohort for which the analysis was performed.
`analyzedAt`	`datetime`	The datetime at which the analysis was performed.
`cohortPopulation`	`int`	The effective number of valid patient cases in the cohort used for the analysis.

`analyzedAt` `class-attribute` `instance-attribute` ¶

`cohortId` `class-attribute` `instance-attribute` ¶

`cohortPopulation` `class-attribute` `instance-attribute` ¶

`AnalysisMetadataMixin` ¶

Mixin class that provides metadata handling for analysis objects.

Attributes:

Name	Type	Description
`metadata`	`AnalysisMetadata \| None`	Metadata for the Kaplan-Meier curve, including cohort information and analysis timestamp.

Methods:

Name	Description
`add_metadata`	Cohort) -> Self: Populates the metadata attribute with information from the provided cohort, such as cohort ID, analysis time, and population size.

`metadata` `class-attribute` `instance-attribute` ¶

`add_metadata(cohort)` ¶

Adds metadata information to the analysis based on the provided cohort.

Parameters:

Name	Type	Description	Default
`cohort` ¶	`Cohort`	The cohort object containing data to populate metadata fields.	required

Returns:

Type	Description
`Self`	The instance of the analysis with updated metadata.

Source code in onconova/research/schemas/analysis.py

def add_metadata(self, cohort: Cohort):
    """
    Adds metadata information to the analysis based on the provided cohort.

    Args:
        cohort (Cohort): The cohort object containing data to populate metadata fields.

    Returns:
        (Self): The instance of the analysis with updated metadata.
    """
    self.metadata = AnalysisMetadata(
        cohortId=str(cohort.id),
        analyzedAt=datetime.now(),
        cohortPopulation=cohort.valid_cases.count(),
    )
    return self

`CategorizedSurvivals` ¶

Bases: Schema, AnalysisMetadataMixin

Schema for categorizing progression free survival (PFS) data within a cohort based on therapy-related groupings.

Attributes:

Name	Type	Description
`survivals`	`Dict[str, List[float]]`	A dictionary mapping category names (e.g., drug combinations or therapy classifications) to lists of progression free survival values.

`survivals` `instance-attribute` ¶

`calculate(cohort, therapyLine, categorization)` `classmethod` ¶

Calculates survival statistics for a given cohort based on therapy line and categorization.

Parameters:

Name	Type	Description	Default
`cohort` ¶	`Cohort`	The cohort of patients to analyze.	required
`therapyLine` ¶	`str`	The therapy line to consider for the analysis.	required
`categorization` ¶	`str`	The categorization method, either "drugs" or "therapies".	required

Returns:

Type	Description
`CategorizedSurvivals`	An instance of the class with calculated survivals based on the specified categorization.

Notes

If categorization is "drugs", survivals are calculated by combination therapy.
If categorization is "therapies", survivals are calculated by therapy classification.

Source code in onconova/research/schemas/analysis.py

@classmethod
def calculate(cls, cohort: Cohort, therapyLine: str, categorization: str) -> "CategorizedSurvivals":
    """
    Calculates survival statistics for a given cohort based on therapy line and categorization.

    Args:
        cohort (Cohort): The cohort of patients to analyze.
        therapyLine (str): The therapy line to consider for the analysis.
        categorization (str): The categorization method, either "drugs" or "therapies".

    Returns:
        (CategorizedSurvivals): An instance of the class with calculated survivals based on the specified categorization.

    Notes:
        - If categorization is "drugs", survivals are calculated by combination therapy.
        - If categorization is "therapies", survivals are calculated by therapy classification.
    """
    if categorization == "drugs":
        return cls(
            survivals=cls._calculate_by_combination_therapy(cohort, therapyLine)
        )
    elif categorization == "therapies":
        return cls(
            survivals=cls._calculate_by_therapy_classification(cohort, therapyLine)
        )
    else:
        raise ValueError(f'Expected categorization to be either `drugs` or `therapies`, but got {categorization}')

`Distribution` ¶

Bases: Schema, AnalysisMetadataMixin

Represents a statistical distribution of trait counts within a cohort.

Attributes:

Name	Type	Description
`items`	`List[CohortTraitCounts]`	The entries in the distribution, each representing a category and its associated counts and percentage.

`items` `class-attribute` `instance-attribute` ¶

`calculate(cohort, property)` `classmethod` ¶

Calculates the distribution of a specified property within a cohort.

Parameters:

Name	Type	Description	Default
`cohort` ¶	`Cohort`	The cohort to analyze.	required
`property` ¶	`str`	The property to calculate distribution for. Supported properties include: - "age" - "ageAtDiagnosis" - "gender" - "neoplasticSites" - "vitalStatus"	required

Returns:

Type	Description
`Distribution`	An instance of Distribution containing items with category, counts, and percentage for each property value.

Raises:

Type	Description
`KeyError`	If the specified property is not supported.

Source code in onconova/research/schemas/analysis.py

@classmethod
def calculate(cls, cohort: Cohort, property: str) -> "Distribution":
    """
    Calculates the distribution of a specified property within a cohort.

    Args:
        cohort (Cohort): The cohort to analyze.
        property (str): The property to calculate distribution for. Supported properties include:
            - "age"
            - "ageAtDiagnosis"
            - "gender"
            - "neoplasticSites"
            - "vitalStatus"

    Returns:
        (Distribution): An instance of Distribution containing items with category, counts, and percentage for each property value.

    Raises:
        KeyError: If the specified property is not supported.
    """
    property_info = dict(
        age={"lookup": "age", "anonymization": anonymize_age},
        ageAtDiagnosis={
            "lookup": "age_at_diagnosis",
            "anonymization": anonymize_age,
        },
        gender={"lookup": "gender__display"},
        neoplasticSites={
            "lookup": "neoplastic_entities__topography_group__display"
        },
        vitalStatus={"lookup": "vital_status"},
    ).get(property)
    return Distribution(
        items=[
            CohortTraitCounts(
                category=category, counts=count, percentage=percentage
            )
            for category, (count, percentage) in (cohort.get_cohort_trait_counts(
                cohort.valid_cases.all(),
                str(property_info["lookup"]),
                anonymization=property_info.get("anonymization"),
            ) if property_info else {}).items()
        ]
    )

`KaplanMeierCurve` ¶

Bases: Schema, AnalysisMetadataMixin

Schema representing a Kaplan-Meier survival curve, including survival probabilities and confidence intervals.

Attributes:

Name	Type	Description
`months`	`List[float]`	List of time points (in months) for survival probability estimates.
`probabilities`	`List[float]`	Survival probabilities at each time point.
`lowerConfidenceBand`	`List[float]`	Lower bound of the survival probability confidence interval at each time point.
`upperConfidenceBand`	`List[float]`	Upper bound of the survival probability confidence interval at each time point.

`lowerConfidenceBand` `class-attribute` `instance-attribute` ¶

`months` `class-attribute` `instance-attribute` ¶

`probabilities` `class-attribute` `instance-attribute` ¶

`upperConfidenceBand` `class-attribute` `instance-attribute` ¶

`calculate(survivals, confidence_level=0.95)` `classmethod` ¶

Performs Kappler-Maier analysis to estimate survival probabilities and 95% confidence intervals and initializes a Kaplan-Meier curve.

Parameters:

Name	Type	Description	Default
`survivals` ¶	`List[float \| None]`	Array containing the number of months survived for each patient.	required
`confidence_level` ¶	`float`	Confidence level for the confidence interval (0.95 default).	`0.95`

Returns:

Name	Type	Description
`KaplanMeierCurve`	`KaplanMeierCurve`	Instance containing the computed survival curve and confidence bands.

Raises:

Type	Description
`ValueError`	If the input survivals list is empty or contains only None values.

Notes:

Uses the analytical Kaplan-Meier estimator 1_ and computes the asymptotic 95%
confidence intervals 2_ using the log-log approach 3_.

References:

.. [1]  https://en.wikipedia.org/wiki/Kapla-Meier_estimator
.. [2]  Fisher, Ronald (1925), Statistical Methods for Research Workers, Table 1
.. [3]  Borgan, Liestøl (1990). Scandinavian Journal of Statistics 17, 35-41

Source code in onconova/research/schemas/analysis.py

@classmethod
def calculate(
    cls,
    survivals: List[float | None],
    confidence_level: float = 0.95,
) -> "KaplanMeierCurve":
    """
    Performs Kappler-Maier analysis to estimate survival probabilities and 95% confidence intervals
    and initializes a Kaplan-Meier curve.

    Args:
        survivals (List[float | None]): Array containing the number of months survived for each
            patient.
        confidence_level (float): Confidence level for the confidence interval (0.95 default).

    Returns:
        KaplanMeierCurve: Instance containing the computed survival curve and confidence bands.

    Raises:
        ValueError: If the input survivals list is empty or contains only None values.

    Notes:

        Uses the analytical Kaplan-Meier estimator 1_ and computes the asymptotic 95%
        confidence intervals 2_ using the log-log approach 3_.

    References:

        .. [1]  https://en.wikipedia.org/wiki/Kapla-Meier_estimator
        .. [2]  Fisher, Ronald (1925), Statistical Methods for Research Workers, Table 1
        .. [3]  Borgan, Liestøl (1990). Scandinavian Journal of Statistics 17, 35-41
    """
    # Remove None values and convert to floats
    _survivals = [float(m) for m in survivals if m is not None]
    # Check that there are values in the array left
    if len(_survivals) == 0:
        raise ValueError("The input argument cannot be empty or None")

    # Round months to integers
    _survivals = [round(m) for m in _survivals]

    # Generate the axis of survived months
    max_month = int(max(_survivals))
    survival_axis = list(range(0, max_month + 1))

    # Determine the number of alive patients along the axis
    alive = [sum(m >= month for m in _survivals) for month in survival_axis]

    # Determine the number of death events along the axis
    events = [sum(m == month for m in _survivals) for month in survival_axis]

    # Truncate the axis regions where nothing more happens
    valid_indices = [i for i, a in enumerate(alive) if a > 0]
    survival_axis = [float(survival_axis[i]) for i in valid_indices]
    alive = [alive[i] for i in valid_indices]
    events = [events[i] for i in valid_indices]

    # Evaluate the KM survival probability estimator
    est_survival_prob = []
    cumulative_product = 1.0

    for e, a in zip(events, alive):
        cumulative_product *= 1 - e / a
        est_survival_prob.append(cumulative_product)

    # Evaluate its standard deviation
    cumulative_sum = 0.0
    std = []
    for p, e, a in zip(est_survival_prob, events, alive):
        if a == 0 or a - e == 0:
            std.append(0.0)
            continue
        increment = e / (a * (a - e))
        if increment <= 0:
            std.append(0.0)
            continue
        cumulative_sum += increment
        std_val = math.sqrt(cumulative_sum / math.log(p) ** 2)
        std.append(std_val)

    # Set the normal inverse CDF value for confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

    # Compute the 95%-confidence intervals
    confidence_bands = {
        "lower": [p ** math.exp(+z * s) for p, s in zip(est_survival_prob, std)],
        "upper": [p ** math.exp(-z * s) for p, s in zip(est_survival_prob, std)],
    }

    return cls(
        # Return the Kaplan-Meier curve
        months=survival_axis,
        probabilities=est_survival_prob,
        lowerConfidenceBand=confidence_bands["lower"],
        upperConfidenceBand=confidence_bands["upper"],
    )

`OncoplotDataset` ¶

Bases: Schema, AnalysisMetadataMixin

Schema representing the dataset required for generating an Oncoplot visualization.

Attributes:

Name	Type	Description
`genes`	`List[str]`	List of the most frequently encountered gene names.
`cases`	`List[str]`	List of patient case identifiers.
`variants`	`List[OncoplotVariant]`	List of variant records included in the Oncoplot.

`cases` `class-attribute` `instance-attribute` ¶

`genes` `class-attribute` `instance-attribute` ¶

`variants` `class-attribute` `instance-attribute` ¶

`calculate(cases)` `classmethod` ¶

Calculates and returns an analysis summary for the given patient cases.

This method performs the following steps:

Retrieves all GenomicVariant objects associated with the provided cases.
Identifies the top 25 most frequently occurring genes among these variants.
Filters variants to include only those associated with the top genes.
Annotates each variant with relevant fields such as pseudoidentifier, gene name, HGVS expression, and pathogenicity.
Constructs and returns an instance of the class with:
- The list of top genes.
- The pseudoidentifiers of the cases.
- A validated list of variant data for plotting or further analysis.

Parameters:

Name	Type	Description	Default
`cases` ¶	`QuerySet[PatientCase]`	A queryset of patient cases to analyze.	required

Returns:

Type	Description
`OncoplotDataset`	An instance of the class containing the analysis results.

Source code in onconova/research/schemas/analysis.py

@classmethod
def calculate(cls, cases: QuerySet[PatientCase]) -> "OncoplotDataset":
    """
    Calculates and returns an analysis summary for the given patient cases.

    This method performs the following steps:

    1. Retrieves all GenomicVariant objects associated with the provided cases.
    2. Identifies the top 25 most frequently occurring genes among these variants.
    3. Filters variants to include only those associated with the top genes.
    4. Annotates each variant with relevant fields such as pseudoidentifier, gene name, HGVS expression, and pathogenicity.
    5. Constructs and returns an instance of the class with:
        - The list of top genes.
        - The pseudoidentifiers of the cases.
        - A validated list of variant data for plotting or further analysis.

    Args:
        cases (QuerySet[PatientCase]): A queryset of patient cases to analyze.

    Returns:
        (OncoplotDataset): An instance of the class containing the analysis results.
    """
    variants = GenomicVariant.objects.filter(case__in=cases)
    genes = [
        gene
        for gene, _ in Counter(
            variants.values_list("genes__display", flat=True)
        ).most_common(25)
    ]
    variants = (
        variants.filter(genes__display__in=genes)
        .annotate(
            pseudoidentifier=F("case__pseudoidentifier"),
            gene=F("genes__display"),
            hgvs_expression=Coalesce(F("protein_hgvs"), F("dna_hgvs"), Value("?")),
        )
        .values("pseudoidentifier", "gene", "hgvs_expression", "is_pathogenic")
    )
    return cls(
        genes=genes,
        cases=list(cases.values_list("pseudoidentifier", flat=True)),
        variants=[OncoplotVariant.model_validate(variant) for variant in variants],
    )

`OncoplotVariant` ¶

Bases: Schema

Schema representing a variant entry for an oncoplot analysis.

Attributes:

Name	Type	Description
`gene`	`str`	The gene symbol associated with the variant.
`caseId`	`str`	Unique identifier for the case, can be provided as 'caseId' or 'pseudoidentifier'.
`hgvsExpression`	`str`	HGVS expression describing the variant, can be provided as 'hgvsExpression' or 'hgvs_expression'.
`isPathogenic`	`Optional[bool]`	Indicates if the variant is pathogenic, can be provided as 'isPathogenic' or 'is_pathogenic'.

`caseId` `class-attribute` `instance-attribute` ¶

`gene` `instance-attribute` ¶

`hgvsExpression` `class-attribute` `instance-attribute` ¶

`isPathogenic` `class-attribute` `instance-attribute` ¶

`TherapyLineCasesDistribution` ¶

Bases: Distribution

Represents the distribution of cases in a cohort based on inclusion in a specific therapy line.

`calculate(cohort, therapyLine)` `classmethod` ¶

Calculates the distribution of cases in a cohort based on inclusion in a specified therapy line.

Parameters:

Name	Type	Description	Default
`cohort` ¶	`Cohort`	The cohort containing valid cases to analyze.	required
`therapyLine` ¶	`str`	The label of the therapy line to filter cases by.	required

Returns:

Type	Description
`TherapyLineCasesDistribution`	A Distribution object containing counts and percentages for cases included and not included in the specified therapy line.

Notes:

- The percentages are rounded to four decimal places.
- Assumes `cohort.valid_cases` is a queryset-like object supporting `count()` and `filter()` methods.

Source code in onconova/research/schemas/analysis.py

@classmethod
def calculate(cls, cohort: Cohort, therapyLine: str) -> "TherapyLineCasesDistribution":
    """
    Calculates the distribution of cases in a cohort based on inclusion in a specified therapy line.

    Args:
        cohort (Cohort): The cohort containing valid cases to analyze.
        therapyLine (str): The label of the therapy line to filter cases by.

    Returns:
        (TherapyLineCasesDistribution): A Distribution object containing counts and percentages for cases included and not included in the specified therapy line.

    Notes:

        - The percentages are rounded to four decimal places.
        - Assumes `cohort.valid_cases` is a queryset-like object supporting `count()` and `filter()` methods.
    """
    total = cohort.valid_cases.count()
    included = cohort.valid_cases.filter(therapy_lines__label=therapyLine).count()
    not_included = total - included
    return TherapyLineCasesDistribution(
        items=[
            CohortTraitCounts(
                category=f"Included in {therapyLine}",
                counts=included,
                percentage=round(included / total * 100.0, 4),
            ),
            CohortTraitCounts(
                category="Not included",
                counts=not_included,
                percentage=round(not_included / total * 100.0, 4),
            ),
        ]
    )

`TherapyLineResponseDistribution` ¶

Bases: Distribution

Represents the distribution of treatment responses for a specific therapy line within a cohort.

`calculate(cohort, therapyLine)` `classmethod` ¶

Calculates the distribution of treatment responses for a specified therapy line within a given cohort.

Parameters:

Name	Type	Description	Default
`cohort` ¶	`Cohort`	The cohort containing valid cases to analyze.	required
`therapyLine` ¶	`str`	The label of the therapy line to filter cases.	required

Returns:

Type	Description
`TherapyLineResponseDistribution`	An object representing the distribution of treatment responses, including counts and percentages for each response category.

Notes:

- Filters cases in the cohort by the specified therapy line.
- Annotates each case with its most recent treatment response during the therapy line period.
- Aggregates and calculates the percentage distribution of response categories.
- Categories with no response are labeled as "Unknown".

Source code in onconova/research/schemas/analysis.py

@classmethod
def calculate(cls, cohort: Cohort, therapyLine: str) -> "TherapyLineResponseDistribution":
    """
    Calculates the distribution of treatment responses for a specified therapy line within a given cohort.

    Args:
        cohort (Cohort): The cohort containing valid cases to analyze.
        therapyLine (str): The label of the therapy line to filter cases.

    Returns:
        (TherapyLineResponseDistribution): An object representing the distribution of treatment responses, including counts and percentages for each response category.

    Notes:

        - Filters cases in the cohort by the specified therapy line.
        - Annotates each case with its most recent treatment response during the therapy line period.
        - Aggregates and calculates the percentage distribution of response categories.
        - Categories with no response are labeled as "Unknown".
    """
    values = (
        cohort.valid_cases.filter(therapy_lines__label=therapyLine)
        .annotate(
            response=Subquery(
                TreatmentResponse.objects.annotate(
                    therapy_line_period=Subquery(
                        TherapyLine.objects.select_properties("period") # type: ignore
                        .filter(case_id=OuterRef("case_id"), label=therapyLine)
                        .values_list("period", flat=True)[:1]
                    )
                )
                .filter(
                    case_id=OuterRef("id"),
                    therapy_line_period__contains=F("date"),
                )
                .order_by("-date")
                .values_list("recist__display", flat=True)[:1]
            )
        )
        .values_list("response", flat=True)
    )
    return TherapyLineResponseDistribution(
        items=[
            CohortTraitCounts(
                category=key or "Unknown",
                counts=count,
                percentage=round(count / values.count() * 100.0, 4),
            )
            for key, count in Counter(values).items()
        ]
    )

2025-10-172025-10-17runner

onconova.research.schemas.analysis

AnalysisMetadata ¶

analyzedAt class-attribute instance-attribute ¶

cohortId class-attribute instance-attribute ¶

cohortPopulation class-attribute instance-attribute ¶

AnalysisMetadataMixin ¶

metadata class-attribute instance-attribute ¶

add_metadata(cohort) ¶

cohort ¶

CategorizedSurvivals ¶

survivals instance-attribute ¶

calculate(cohort, therapyLine, categorization) classmethod ¶

cohort ¶

therapyLine ¶

categorization ¶

Distribution ¶

items class-attribute instance-attribute ¶

calculate(cohort, property) classmethod ¶

cohort ¶

property ¶

KaplanMeierCurve ¶

lowerConfidenceBand class-attribute instance-attribute ¶

months class-attribute instance-attribute ¶

probabilities class-attribute instance-attribute ¶

upperConfidenceBand class-attribute instance-attribute ¶

calculate(survivals, confidence_level=0.95) classmethod ¶

survivals ¶

confidence_level ¶

OncoplotDataset ¶

cases class-attribute instance-attribute ¶

genes class-attribute instance-attribute ¶

variants class-attribute instance-attribute ¶

calculate(cases) classmethod ¶

cases ¶

OncoplotVariant ¶

caseId class-attribute instance-attribute ¶

gene instance-attribute ¶

hgvsExpression class-attribute instance-attribute ¶

isPathogenic class-attribute instance-attribute ¶

TherapyLineCasesDistribution ¶

calculate(cohort, therapyLine) classmethod ¶

cohort ¶

therapyLine ¶

TherapyLineResponseDistribution ¶

calculate(cohort, therapyLine) classmethod ¶

cohort ¶

therapyLine ¶

`onconova.research.schemas.analysis`

`AnalysisMetadata` ¶

`analyzedAt` `class-attribute` `instance-attribute` ¶

`cohortId` `class-attribute` `instance-attribute` ¶

`cohortPopulation` `class-attribute` `instance-attribute` ¶

`AnalysisMetadataMixin` ¶

`metadata` `class-attribute` `instance-attribute` ¶

`add_metadata(cohort)` ¶

`cohort` ¶

`CategorizedSurvivals` ¶

`survivals` `instance-attribute` ¶

`calculate(cohort, therapyLine, categorization)` `classmethod` ¶

`cohort` ¶

`therapyLine` ¶

`categorization` ¶

`Distribution` ¶

`items` `class-attribute` `instance-attribute` ¶

`calculate(cohort, property)` `classmethod` ¶

`cohort` ¶

`property` ¶

`KaplanMeierCurve` ¶

`lowerConfidenceBand` `class-attribute` `instance-attribute` ¶

`months` `class-attribute` `instance-attribute` ¶

`probabilities` `class-attribute` `instance-attribute` ¶

`upperConfidenceBand` `class-attribute` `instance-attribute` ¶

`calculate(survivals, confidence_level=0.95)` `classmethod` ¶

`survivals` ¶

`confidence_level` ¶

`OncoplotDataset` ¶

`cases` `class-attribute` `instance-attribute` ¶

`genes` `class-attribute` `instance-attribute` ¶

`variants` `class-attribute` `instance-attribute` ¶

`calculate(cases)` `classmethod` ¶

`cases` ¶

`OncoplotVariant` ¶

`caseId` `class-attribute` `instance-attribute` ¶

`gene` `instance-attribute` ¶

`hgvsExpression` `class-attribute` `instance-attribute` ¶

`isPathogenic` `class-attribute` `instance-attribute` ¶

`TherapyLineCasesDistribution` ¶

`calculate(cohort, therapyLine)` `classmethod` ¶

`cohort` ¶

`therapyLine` ¶

`TherapyLineResponseDistribution` ¶

`calculate(cohort, therapyLine)` `classmethod` ¶

`cohort` ¶

`therapyLine` ¶