
onconova.research.compilers

DATASET_ROOT_FIELDS module-attribute

AggregationNode(key, annotation_nodes=list(), nested_aggregation_nodes=list(), aggregated_model=None, aggregated_model_parent_related_name=None) dataclass

Represents an aggregation node with a key and an associated list of annotation nodes and/or nested aggregation nodes.

Attributes:

key (str): The unique identifier for the aggregation.

annotation_nodes (List[AnnotationNode]): The annotations associated with the aggregation.

nested_aggregation_nodes (List[AggregationNode]): The nested aggregations associated with the aggregation.

aggregated_model (Model): The Django model that the aggregation operates on.

aggregated_model_parent_related_name (str): The related name linking the aggregated model to its parent; used to filter the subquery to the related objects.

aggregated_model class-attribute instance-attribute

aggregated_subquery property

annotation_nodes class-attribute instance-attribute

annotations property

Returns a dictionary of annotations for the aggregation.

The returned dictionary maps the key of each annotation node to its corresponding Django ORM expression.

If the aggregation node has nested aggregation nodes, their annotations are also included in the returned dictionary.
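The merging behaviour can be pictured in plain Python. This is a minimal sketch, not the onconova implementation: `Node` and `Annotation` are hypothetical stand-ins for `AggregationNode` and `AnnotationNode`, strings take the place of Django ORM expressions, and it assumes that nested annotations are merged flat into the parent's dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    # Hypothetical stand-in for AnnotationNode; a string replaces the Django expression.
    key: str
    expression: str


@dataclass
class Node:
    # Hypothetical stand-in for AggregationNode (Django model fields omitted).
    key: str
    annotation_nodes: List[Annotation] = field(default_factory=list)
    nested_aggregation_nodes: List["Node"] = field(default_factory=list)

    @property
    def annotations(self) -> Dict[str, str]:
        # Start with this node's own annotations, keyed by annotation key.
        result = {a.key: a.expression for a in self.annotation_nodes}
        # Assumption: nested annotations are merged flat, recursively.
        for nested in self.nested_aggregation_nodes:
            result.update(nested.annotations)
        return result


dose = Node("doses", [Annotation("dose_value", "dose_expr")])
therapy = Node("therapy_lines", [Annotation("label", "label_expr")], [dose])
print(therapy.annotations)  # {'label': 'label_expr', 'dose_value': 'dose_expr'}
```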

key instance-attribute

nested_aggregation_nodes class-attribute instance-attribute

subquery property

Returns a subquery that can be used to create a Django ORM expression which will annotate a queryset with the aggregated results of the annotation nodes.

The returned subquery aggregates the annotations of the annotation nodes and returns a single value of type JSONB which contains the aggregated results.

The subquery is constructed by annotating the aggregated model with a JSONB object that contains the aggregated results of the annotation nodes. It is then filtered to include only the related objects reached through aggregated_model_parent_related_name.

The subquery is annotated with a single field named related_json_object which contains the JSONB object with the aggregated results.

If the aggregation node has nested aggregation nodes, their annotations are also included in the returned JSONB object.

Raises:

AttributeError: If the aggregated model or its parent related name is not set, since the subquery cannot be constructed without them.

AttributeError: If the aggregation node has no annotations, since there is nothing to aggregate.
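Conceptually, the subquery works per outer row: it keeps only the related objects that point back at the current parent, turns each one into a JSON object of annotation values, and collapses them into a single value. The plain-Python analogue below is only a sketch of that idea; it assumes the collapsed value is a JSON array of per-object dictionaries, and the row data, `parent_field` and `aggregate_related` are hypothetical. In the real code this work happens inside PostgreSQL through the ORM.

```python
import json
from typing import Dict, List

# Hypothetical related rows, each pointing at a parent case via "case_id".
related_rows = [
    {"id": 1, "case_id": 10, "label": "1L", "intent": "curative"},
    {"id": 2, "case_id": 10, "label": "2L", "intent": "palliative"},
    {"id": 3, "case_id": 11, "label": "1L", "intent": "palliative"},
]


def aggregate_related(
    parent_id: int, rows: List[Dict], parent_field: str, keys: List[str]
) -> List[Dict]:
    # Keep only rows related to the current parent (the ORM would use OuterRef).
    related = [row for row in rows if row[parent_field] == parent_id]
    # One JSON-like object per related row, restricted to the annotation keys.
    return [{key: row[key] for key in keys} for row in related]


print(json.dumps(aggregate_related(10, related_rows, "case_id", ["label", "intent"])))
# [{"label": "1L", "intent": "curative"}, {"label": "2L", "intent": "palliative"}]
```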

add_annotation_node(key, expression)

Adds an annotation node to the current aggregation node.

The annotation node is constructed from the given key and expression.

The added annotation node is included in the annotations of the current aggregation node.

Parameters:

key (str): The key to use for the annotation node. Required.

expression (Expression): The expression to use for the annotation node. Required.

Returns:

None

Source code in onconova/research/compilers.py
def add_annotation_node(self, key: str, expression: Expression) -> None:
    """
    Adds an annotation node to the current aggregation node.

    The annotation node is constructed from the given key and expression.

    The added annotation node is included in the annotations of the
    current aggregation node.

    Args:
        key: The key to use for the annotation node.
        expression: The expression to use for the annotation node.

    Returns:
        None
    """
    self.annotation_nodes.append(AnnotationNode(key, expression))

add_nested_aggregation_node(node)

Adds a nested aggregation node to the current aggregation node.

The added nested aggregation node is included in the annotations of the current aggregation node.

Parameters:

node (AggregationNode): The nested aggregation node to add. Required.

Returns:

None

Source code in onconova/research/compilers.py
def add_nested_aggregation_node(self, node: "AggregationNode") -> None:
    """
    Adds a nested aggregation node to the current aggregation node.

    The added nested aggregation node is included in the annotations of the
    current aggregation node.

    Args:
        node: The nested aggregation node to add.

    Returns:
        None
    """
    self.nested_aggregation_nodes.append(node)
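The two methods can be exercised together to build a small tree. The classes below are simplified stand-ins that reuse the method bodies shown above but drop the Django-specific fields; the keys and string "expressions" are hypothetical. The `=list()` defaults in the documented signature most likely correspond to `field(default_factory=list)`, which gives every node its own lists, as the last assertion checks.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AnnotationNode:
    key: str
    expression: Any  # a Django Expression in the real module; a string here


@dataclass
class AggregationNode:
    # Simplified stand-in: aggregated_model and its related name are omitted.
    key: str
    annotation_nodes: List[AnnotationNode] = field(default_factory=list)
    nested_aggregation_nodes: List["AggregationNode"] = field(default_factory=list)

    def add_annotation_node(self, key: str, expression: Any) -> None:
        self.annotation_nodes.append(AnnotationNode(key, expression))

    def add_nested_aggregation_node(self, node: "AggregationNode") -> None:
        self.nested_aggregation_nodes.append(node)


# Hypothetical keys, for illustration only.
therapy_lines = AggregationNode("therapy_lines")
therapy_lines.add_annotation_node("label", "label_expr")

cycles = AggregationNode("cycles")
cycles.add_annotation_node("cycle_count", "count_expr")
therapy_lines.add_nested_aggregation_node(cycles)

print([n.key for n in therapy_lines.annotation_nodes])  # ['label']
print([n.key for n in therapy_lines.nested_aggregation_nodes])  # ['cycles']
```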

AnnotationCompiler(rules)

Compiles a list of dataset rules into an aggregation tree and generates the corresponding Django ORM annotations.

The tree is built by grouping rules by their resource models and creating an AggregationNode for each group. Annotation nodes are then added to the corresponding AggregationNode. The tree is built recursively by processing child rules for each node.

Parameters:

rules (List[DatasetRule]): A list of dataset rules. Required.

Source code in onconova/research/compilers.py
def __init__(self, rules: List[DatasetRule]):
    """
    Initializes the AnnotationCompiler with a list of dataset rules.

    Args:
        rules: A list of dataset rules
    """
    self.rules = [DatasetRuleProcessor(rule) for rule in rules]
    self.aggregation_nodes: List[AggregationNode] = self._build_aggregation_tree(
        self.rules
    )
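The first step of building the tree, grouping rules by resource, can be sketched as follows. This covers only the flat top level; the recursion into child rules and the mapping from resource names to Django models are left out. `Rule` and `group_by_resource` are hypothetical, and the resource and field names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    # Hypothetical, simplified rule: which resource and which field to export.
    resource: str
    field: str


def group_by_resource(rules: List[Rule]) -> Dict[str, List[str]]:
    # One group (a future AggregationNode) per resource, in first-seen order.
    groups: Dict[str, List[str]] = defaultdict(list)
    for rule in rules:
        groups[rule.resource].append(rule.field)
    return dict(groups)


rules = [
    Rule("PatientCase", "age"),
    Rule("TherapyLine", "label"),
    Rule("PatientCase", "gender"),
]
print(group_by_resource(rules))
# {'PatientCase': ['age', 'gender'], 'TherapyLine': ['label']}
```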

aggregation_nodes instance-attribute

rules instance-attribute

generate_annotations()

Generates the Django ORM annotations for the dataset.

The annotations are generated by traversing the aggregation tree and building a dictionary of annotations. The dictionary contains the annotations in one of three forms.

Case 1: PatientCase properties at root of dataset. The key is the name of the property and the value is the Django ORM expression for the property.

Case 2: Nested resources. The key is the name of the nested resource and the value is the subquery for the nested resource.

Case 3: Simple annotations. The key is the name of the annotation and the value is the Django ORM expression for the annotation.

Returns:

tuple[dict, list]: A tuple of two elements: a dictionary of annotations and a list of field names.

Source code in onconova/research/compilers.py
def generate_annotations(self) -> Tuple[Dict[str, Expression], List[str]]:
    """
    Generates the Django ORM annotations for the dataset.

    The annotations are generated by traversing the aggregation tree
    and building a dictionary of annotations. The dictionary contains
    the annotations in one of three forms.

    Case 1: PatientCase properties at root of dataset. The key is the name
    of the property and the value is the Django ORM expression for the
    property.

    Case 2: Nested resources. The key is the name of the nested resource and
    the value is the subquery for the nested resource.

    Case 3: Simple annotations. The key is the name of the annotation and the
    value is the Django ORM expression for the annotation.

    Returns:
        (tuple[dict, list]): A tuple of two elements. The first element is a dictionary of annotations and the second element is a list of field names.
    """
    annotations = {}
    queryset_fields = ["pseudoidentifier"]
    for aggregation_node in self.aggregation_nodes:
        # Case 1: PatientCase properties at root of dataset
        if not aggregation_node.key:
            for annotation_node in aggregation_node.annotation_nodes:
                if annotation_node.key not in DATASET_ROOT_FIELDS:
                    annotations[annotation_node.key] = annotation_node.expression

                if annotation_node.key not in queryset_fields:
                    queryset_fields.append(annotation_node.key)
        elif aggregation_node.annotations:
            aggregation_node.key = aggregation_node.key + "_resources"
            annotations[aggregation_node.key] = aggregation_node.aggregated_subquery
            queryset_fields.append(aggregation_node.key)
    # Remove duplicates while preserving order
    queryset_fields = list(dict.fromkeys(queryset_fields))
    return annotations, queryset_fields
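The branching in generate_annotations can be traced with plain data. In this sketch `Node` is a hypothetical stand-in whose flat `annotations` dict replaces the root node's annotation nodes, `DATASET_ROOT_FIELDS` holds assumed contents, and strings replace ORM expressions. One deliberate difference: the original reassigns aggregation_node.key, so a second call would append "_resources" again; the sketch builds the suffixed key in a local variable so repeated calls agree.

```python
from typing import Dict, List, Optional, Tuple

# Assumed contents; the real DATASET_ROOT_FIELDS is defined in the module.
DATASET_ROOT_FIELDS = ["pseudoidentifier", "age"]


class Node:
    # Hypothetical stand-in holding only what the branching logic inspects.
    def __init__(self, key: Optional[str], annotations: Dict[str, str], subquery: str = ""):
        self.key = key
        self.annotations = annotations
        self.aggregated_subquery = subquery


def generate(nodes: List[Node]) -> Tuple[Dict[str, str], List[str]]:
    annotations: Dict[str, str] = {}
    fields = ["pseudoidentifier"]
    for node in nodes:
        if not node.key:
            # Case 1: PatientCase properties at the root of the dataset.
            for key, expression in node.annotations.items():
                if key not in DATASET_ROOT_FIELDS:
                    annotations[key] = expression  # computed values need an annotation
                if key not in fields:
                    fields.append(key)  # root fields are selected either way
        elif node.annotations:
            # Case 2: nested resources, exposed under "<key>_resources".
            key = f"{node.key}_resources"
            annotations[key] = node.aggregated_subquery
            fields.append(key)
    return annotations, fields


root = Node(None, {"age": "age_expr", "overall_survival": "os_expr"})
lines = Node("therapy_lines", {"label": "label_expr"}, subquery="lines_subquery")
annotations, fields = generate([root, lines])
print(annotations)  # {'overall_survival': 'os_expr', 'therapy_lines_resources': 'lines_subquery'}
print(fields)  # ['pseudoidentifier', 'age', 'overall_survival', 'therapy_lines_resources']
```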

AnnotationNode(key, expression) dataclass

Represents an annotation node with a key and an associated expression.

Attributes:

key (str): The unique identifier for the annotation.

expression (Expression): The Django ORM expression associated with the annotation.

expression instance-attribute

key instance-attribute

DatasetRuleProcessingError

Bases: RuntimeError

DatasetRuleProcessor(rule)

Processes individual dataset rules and extracts necessary query information.

Source code in onconova/research/compilers.py
def __init__(self, rule: DatasetRule):
    self.schema_field = rule.field
    # Get the schema specified by the rule
    schema = self._get_schema(rule.resource.value)
    self.resource_model = self._get_orm_model(schema)
    # Resolve the parent model through which this resource links to a PatientCase
    if self.resource_model == PatientCase:
        self.parent_model = None
    elif hasattr(self.resource_model, "case"):
        self.parent_model = PatientCase
    else:
        self.parent_model = next(
            (
                field.related_model
                for field in self.resource_model._meta.get_fields()
                if field.related_model and hasattr(field.related_model, "case")
            )
        )
    # Get other values
    self.model_field_name = self._get_model_field_name(schema)
    self.model_field = self._get_model_field(
        self.resource_model, self.model_field_name
    )
    self.value_transformer = self._get_transformer(rule.transform)
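The parent-model resolution in the constructor follows three rules, which the sketch below reproduces with plain classes. The model classes are illustrative stand-ins, and the hypothetical `related_models` list replaces Django's `_meta.get_fields()` / `field.related_model` lookup. As in the original, `next()` is called without a default, so a resource with no path to a case raises StopIteration.

```python
from typing import List, Optional


class PatientCase:
    pass


class TherapyLine:
    case = None  # has a direct `case` relation
    related_models: List[type] = [PatientCase]  # hypothetical stand-in for _meta.get_fields()


class SystemicTherapy:
    # No `case` attribute; reaches the case through TherapyLine.
    related_models: List[type] = [TherapyLine]


def resolve_parent(model: type) -> Optional[type]:
    if model is PatientCase:
        return None  # the root resource has no parent
    if hasattr(model, "case"):
        return PatientCase  # attached directly to a case
    # Otherwise: the first related model that itself has a `case` relation.
    return next(m for m in model.related_models if hasattr(m, "case"))


print(resolve_parent(SystemicTherapy).__name__)  # TherapyLine
```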

annotation_key property

Returns a unique key used in dataset query annotations.

field_annotation property

Returns the Django ORM annotation for this dataset field.

model_field instance-attribute

model_field_name instance-attribute

parent_model instance-attribute

query_lookup_path property

Generates the Django ORM lookup path for querying the dataset field.

related_model_annotation_key property

Determines the Django ORM lookup for related models.

resource_model instance-attribute

schema_field instance-attribute

value_transformer instance-attribute

QueryCompiler(cohort, rules)

Compiles a dataset query based on user-defined rules.

QueryCompiler takes a cohort and a list of rules as input; its compile() method returns a QuerySet representing the dataset for that cohort.

Attributes:

cohort (Cohort): The cohort to generate the dataset for.

rule_compiler (AnnotationCompiler): The compiler built from the user-defined rules (List[DatasetRule]) passed to the constructor.

Source code in onconova/research/compilers.py
def __init__(self, cohort, rules: List[DatasetRule]):
    self.cohort = cohort
    self.rule_compiler = AnnotationCompiler(rules)

cohort instance-attribute

rule_compiler instance-attribute

compile()

Compiles a QuerySet based on the rules provided.

Returns:

QuerySet: The dataset for the cohort.

Source code in onconova/research/compilers.py
def compile(self) -> QuerySet:
    """
    Compiles a QuerySet based on the rules provided

    Returns:
        QuerySet: The dataset for the cohort
    """
    annotations, queryset_fields = self.rule_compiler.generate_annotations()
    return self.cohort.valid_cases.annotate(**annotations).values(*queryset_fields)
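compile() annotates the cohort's valid cases with the generated expressions and then projects them with values(), so each row becomes a dictionary restricted to the selected fields. The toy class below mimics only that shape in memory; `ToyQuerySet`, the row data and the lambda "expressions" are hypothetical. Real Django evaluates expressions in SQL, and values() returns a lazy QuerySet of dicts rather than a list.

```python
from typing import Any, Callable, Dict, List


class ToyQuerySet:
    """Hypothetical in-memory stand-in for a Django QuerySet."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = rows

    def annotate(self, **exprs: Callable[[Dict[str, Any]], Any]) -> "ToyQuerySet":
        # Each annotation adds a computed key to every row.
        return ToyQuerySet(
            [{**row, **{key: fn(row) for key, fn in exprs.items()}} for row in self.rows]
        )

    def values(self, *fields: str) -> List[Dict[str, Any]]:
        # Keep only the requested keys, in the requested order.
        return [{f: row[f] for f in fields} for row in self.rows]


valid_cases = ToyQuerySet([
    {"pseudoidentifier": "P-1", "age": 54},
    {"pseudoidentifier": "P-2", "age": 61},
])
annotations = {"age_in_months": lambda row: row["age"] * 12}
fields = ["pseudoidentifier", "age_in_months"]
print(valid_cases.annotate(**annotations).values(*fields))
# [{'pseudoidentifier': 'P-1', 'age_in_months': 648}, {'pseudoidentifier': 'P-2', 'age_in_months': 732}]
```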

construct_dataset(cohort, rules)

Compiles a QuerySet based on the rules provided.

Parameters:

cohort (Cohort): The cohort to generate the dataset for. Required.

rules (List[DatasetRule]): The user-defined rules for generating the dataset. Required.

Returns:

QuerySet: The dataset for the cohort.

Source code in onconova/research/compilers.py
def construct_dataset(cohort, rules: List[DatasetRule]) -> QuerySet:
    """
    Compiles a QuerySet based on the rules provided

    Args:
        cohort (onconova.cohorts.models.Cohort): The cohort to generate the
            dataset for
        rules (List[DatasetRule]): The user-defined rules for generating the
            dataset

    Returns:
        QuerySet: The dataset for the cohort
    """
    return QueryCompiler(cohort, rules).compile()