Convergence

class hinteval.cores.evaluation_metrics.convergence.Specificity(model_name: Literal['bert-base', 'roberta-large'] = 'bert-base', batch_size: int = 256, checkpoint: bool = False, checkpoint_step: int = 1, force_download=False, enable_tqdm=False)

Class for evaluating the specificity of hints using neural network models such as BERT and RoBERTa [27].

checkpoint

Whether checkpointing is enabled.

Type: bool

checkpoint_step

Step interval for checkpointing.

Type: int

enable_tqdm

Whether the tqdm progress bar is enabled.

Type: bool
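As a rough illustration of what a checkpoint step interval can mean, the sketch below scores texts in batches and writes progress to disk every `checkpoint_step` batches, resuming from the file if it already exists. It is a generic pattern under stated assumptions, not hinteval's implementation: `score_batch`, `evaluate_with_checkpoints`, and the JSON file layout are all hypothetical.

```python
import json
import math
import tempfile
from pathlib import Path


def score_batch(batch):
    """Hypothetical placeholder for a model call: scores a text by its length."""
    return [float(len(text)) for text in batch]


def evaluate_with_checkpoints(texts, batch_size, checkpoint_step, checkpoint_path):
    """Score texts in batches, saving progress every `checkpoint_step` batches.

    Assumed behaviour for illustration only; hinteval may checkpoint differently.
    """
    path = Path(checkpoint_path)
    # Resume from earlier progress if a checkpoint file already exists.
    results = json.loads(path.read_text()) if path.exists() else []

    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    done = math.ceil(len(results) / batch_size)  # batches already scored
    for step, batch in enumerate(batches[done:], start=done + 1):
        results.extend(score_batch(batch))
        # Save at every `checkpoint_step`-th batch and after the last batch.
        if step % checkpoint_step == 0 or step == len(batches):
            path.write_text(json.dumps(results))
    return results


with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_file = Path(tmp_dir) / "scores.json"
    texts = ["a", "bb", "ccc", "dddd", "eeeee"]
    scores = evaluate_with_checkpoints(
        texts, batch_size=2, checkpoint_step=2, checkpoint_path=checkpoint_file
    )
    print(scores)
```

A larger `checkpoint_step` writes to disk less often, at the cost of repeating more work after an interruption.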

See also

NeuralNetworkBased: Class for evaluating convergence between question and hints using neural network models such as BERT and RoBERTa.
LlmBased: Class for evaluating convergence between question and hints using large language models such as LLaMA-3-8b and LLaMA-3-70b.

evaluate(instances: List[Instance], **kwargs) → List[List[float]]

Evaluates the specificity of the hints in the given instances using the specified neural network model [29].

Parameters:

instances (List[Instance]) – List of instances to evaluate.
**kwargs – Additional keyword arguments.

Returns:

A list with one entry per instance; each entry is the list of specificity scores for that instance's hints.

Return type:

List[List[float]]

Notes

This function stores the scores as Metric objects within the metrics attribute of the Hint, with names based on the model, such as “convergence-specificity-bert-base”.

Examples

>>> from hinteval.cores import Instance, Question, Hint, Answer
>>> from hinteval.evaluation.convergence import Specificity
>>>
>>> specificity = Specificity(model_name='bert-base')
>>> instance_1 = Instance(
...     question=Question('What is the capital of Austria?'),
...     answers=[Answer('Vienna')],
...     hints=[Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')])
>>> instance_2 = Instance(
...     question=Question('Who was the president of USA in 2009?'),
...     answers=[Answer('Barack Obama')],
...     hints=[Hint('He was the first African-American president in U.S. history.')])
>>> instances = [instance_1, instance_2]
>>> results = specificity.evaluate(instances)
>>> print(results)
# [[1], [1]]
>>> classes = [instance.hints[0].metrics['convergence-specificity-bert-base'].metadata['description'] for instance in instances]
>>> print(classes)
# ['specific', 'specific']
>>> metrics = [f'{metric_key}: {metric_value.value}' for
...            instance in instances
...            for hint in instance.hints for metric_key, metric_value in
...            hint.metrics.items()]
>>> print(metrics)
# ['convergence-specificity-bert-base: 1', 'convergence-specificity-bert-base: 1']
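The nested return value can be read as one inner list per instance. The sketch below assumes, based on the example above, that each inner list holds one score per hint in the order the hints were given. It uses plain lists as stand-ins for Instance and Hint objects, and `flatten_scores` is a hypothetical helper, not part of hinteval.

```python
# Plain-Python stand-ins: one list of hint texts per instance.
hints_per_instance = [
    ["This city, once home to Mozart and Beethoven, is the capital of Austria."],
    ["He was the first African-American president in U.S. history."],
]
# Same shape as the value printed by the example above.
results = [[1], [1]]


def flatten_scores(hints_per_instance, results):
    """Return (instance_index, hint_index, score) rows from the nested result.

    Assumes results[i][j] is the score of hint j of instance i.
    """
    rows = []
    for i, (hints, scores) in enumerate(zip(hints_per_instance, results)):
        for j, (_hint, score) in enumerate(zip(hints, scores)):
            rows.append((i, j, score))
    return rows


rows = flatten_scores(hints_per_instance, results)
print(rows)
```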

release_memory()

Releases the memory used by the class instance.

This method deletes the instance of the class and triggers garbage collection to free up memory.

Examples

>>> from hinteval.evaluation.convergence import Specificity
>>>
>>> specificity = Specificity(model_name='bert-base')
>>> specificity.release_memory()

class hinteval.cores.evaluation_metrics.convergence.NeuralNetworkBased(model_name: Literal['bert-base', 'roberta-large'] = 'bert-base', batch_size: int = 256, checkpoint: bool = False, checkpoint_step: int = 1, force_download=False, enable_tqdm=False)

Class for evaluating convergence between question and hints using neural network models such as BERT and RoBERTa.

checkpoint

Whether checkpointing is enabled.

Type: bool

checkpoint_step

Step interval for checkpointing.

Type: int

enable_tqdm

Whether the tqdm progress bar is enabled.

Type: bool

See also

Specificity: Class for evaluating the specificity of hints using neural network models such as BERT and RoBERTa.
LlmBased: Class for evaluating convergence between question and hints using large language models such as LLaMA-3-8b and LLaMA-3-70b.

evaluate(instances: List[Instance], **kwargs) → List[List[float]]

Evaluates the convergence between question and hints of the given instances using the specified neural network model.

Parameters:

instances (List[Instance]) – List of instances to evaluate.
**kwargs – Additional keyword arguments.

Returns:

A list with one entry per instance; each entry is the list of convergence scores for that instance's hints.

Return type:

List[List[float]]

Notes

This function stores the scores as Metric objects within the metrics attribute of the Hint, with names based on the model, such as “convergence-nn-bert-base”.

Examples

>>> from hinteval.cores import Instance, Question, Hint, Answer
>>> from hinteval.evaluation.convergence import NeuralNetworkBased
>>>
>>> neural_network = NeuralNetworkBased(model_name='bert-base')
>>> instance_1 = Instance(
...     question=Question('What is the capital of Austria?'),
...     answers=[Answer('Vienna')],
...     hints=[Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')])
>>> instance_2 = Instance(
...     question=Question('Who was the president of USA in 2009?'),
...     answers=[Answer('Barack Obama')],
...     hints=[Hint('He was named the 2009 Nobel Peace Prize laureate')])
>>> instances = [instance_1, instance_2]
>>> results = neural_network.evaluate(instances)
>>> print(results)
# [[1.0], [1.0]]
>>> metrics = [f'{metric_key}: {metric_value.value}' for
...            instance in instances
...            for hint in instance.hints for metric_key, metric_value in
...            hint.metrics.items()]
>>> print(metrics)
# ['convergence-nn-bert-base: 1.0', 'convergence-nn-bert-base: 1.0']
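The metric names quoted in the Notes of the three evaluators in this section share a pattern: a fixed prefix per evaluator followed by the model name. The sketch below builds such a key and uses it for a lookup. `metric_key` and `METRIC_PREFIXES` are hypothetical helpers, the dictionary is a stand-in for `hint.metrics`, and applying the pattern to models other than those quoted is an assumption.

```python
# Prefixes copied from the metric names quoted in this section.
METRIC_PREFIXES = {
    "Specificity": "convergence-specificity",
    "NeuralNetworkBased": "convergence-nn",
    "LlmBased": "convergence-llm",
}


def metric_key(evaluator_name: str, model_name: str) -> str:
    """Hypothetical helper: compose the key used to look up a stored metric."""
    return f"{METRIC_PREFIXES[evaluator_name]}-{model_name}"


# Stand-in for `hint.metrics`, holding plain values instead of Metric objects.
hint_metrics = {"convergence-nn-bert-base": 1.0}

key = metric_key("NeuralNetworkBased", "bert-base")
value = hint_metrics[key]
print(key, value)
```

Building the key from the evaluator and model name avoids hard-coding the string in several places when switching between models.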

release_memory()

Releases the memory used by the class instance.

This method deletes the instance of the class and triggers garbage collection to free up memory.

Examples

>>> from hinteval.evaluation.convergence import NeuralNetworkBased
>>>
>>> neural_network = NeuralNetworkBased(model_name='bert-base')
>>> neural_network.release_memory()

class hinteval.cores.evaluation_metrics.convergence.LlmBased(model_name: Literal['llama-3-8b', 'llama-3-70b'] = 'llama-3-8b', together_ai_api_key: str = None, checkpoint: bool = False, checkpoint_step: int = 1, enable_tqdm=False)

Class for evaluating convergence between question and hints using large language models such as LLaMA-3-8b and LLaMA-3-70b [30].

checkpoint

Whether checkpointing is enabled.

Type: bool

checkpoint_step

Step interval for checkpointing.

Type: int

enable_tqdm

Whether the tqdm progress bar is enabled.

Type: bool

See also

Specificity: Class for evaluating the specificity of hints using neural network models such as BERT and RoBERTa.
NeuralNetworkBased: Class for evaluating convergence between question and hints using neural network models such as BERT and RoBERTa.

evaluate(instances: List[Instance], **kwargs) → List[List[float]]

Evaluates the convergence between question and hints of the given instances using the specified large language model [32].

Parameters:

instances (List[Instance]) – List of instances to evaluate.
**kwargs – Additional keyword arguments.

Returns:

A list with one entry per instance; each entry is the list of convergence scores for that instance's hints.

Return type:

List[List[float]]

Notes

This function stores the scores as Metric objects within the metrics attribute of the Hint, with names based on the model, such as “convergence-llm-llama-3-8b”.

This function also stores the candidate answers in the metadata of the Question. In addition, it stores the per-candidate scores for each hint in the metadata of that hint's Metric object, under the 'scores' key.

Examples

>>> from hinteval.cores import Instance, Question, Hint, Answer
>>> from hinteval.evaluation.convergence import LlmBased
>>>
>>> llm = LlmBased(model_name='llama-3-8b', together_ai_api_key='your_api_key')
>>> instance_1 = Instance(
...     question=Question('What is the capital of Austria?'),
...     answers=[Answer('Vienna')],
...     hints=[Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')])
>>> instance_2 = Instance(
...     question=Question('Who was the president of USA in 2009?'),
...     answers=[Answer('Barack Obama')],
...     hints=[Hint('He was the first African-American president in U.S. history.')])
>>> instances = [instance_1, instance_2]
>>> results = llm.evaluate(instances)
>>> print(results)
# [[0.91], [1.0]]
>>> metrics = [f'{metric_key}: {metric_value.value}' for
...            instance in instances
...            for hint in instance.hints for metric_key, metric_value in
...            hint.metrics.items()]
>>> print(metrics)
# ['convergence-llm-llama-3-8b: 0.91', 'convergence-llm-llama-3-8b: 1.0']
>>> scores = [hint.metrics['convergence-llm-llama-3-8b'].metadata['scores'] for inst in instances for hint in inst.hints]
>>> print(scores[0])
# {'Salzburg': 1, 'Graz': 0, 'Innsbruck': 0, 'Linz': 0, 'Klagenfurt': 0, 'Bregenz': 0, 'Wels': 0, 'St. Pölten': 0, 'Eisenstadt': 0, 'Sankt Johann impong': 0, 'Vienna': 1}
>>> print(scores[1])
# {'George W. Bush': 0, 'Bill Clinton': 0, 'Jimmy Carter': 0, 'Donald Trump': 0, 'Joe Biden': 0, 'Ronald Reagan': 0, 'Richard Nixon': 0, 'Gerald Ford': 0, 'Franklin D. Roosevelt': 0, 'Theodore Roosevelt': 0, 'Barack Obama': 1}
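The printed values suggest how the per-candidate scores might relate to the overall convergence score. The sketch below is one reading that reproduces both numbers in the example: it assumes a score of 1 means the candidate still fits the hint and 0 means the hint rules it out, and it takes the convergence as the share of candidates that are not wrongly kept. This formula is inferred from the example output only and is not taken from the hinteval source; `convergence_from_candidate_scores` is a hypothetical helper.

```python
def convergence_from_candidate_scores(scores, correct_answer):
    """Share of candidates that the hint does not wrongly keep, rounded to 2 places.

    Assumption: 1 = candidate still fits the hint, 0 = ruled out by the hint.
    """
    wrongly_kept = sum(
        1
        for candidate, fits in scores.items()
        if fits == 1 and candidate != correct_answer
    )
    return round(1 - wrongly_kept / len(scores), 2)


# Candidate scores copied verbatim from the example output above.
austria_scores = {
    'Salzburg': 1, 'Graz': 0, 'Innsbruck': 0, 'Linz': 0, 'Klagenfurt': 0,
    'Bregenz': 0, 'Wels': 0, 'St. Pölten': 0, 'Eisenstadt': 0,
    'Sankt Johann impong': 0, 'Vienna': 1,
}
president_scores = {
    'George W. Bush': 0, 'Bill Clinton': 0, 'Jimmy Carter': 0, 'Donald Trump': 0,
    'Joe Biden': 0, 'Ronald Reagan': 0, 'Richard Nixon': 0, 'Gerald Ford': 0,
    'Franklin D. Roosevelt': 0, 'Theodore Roosevelt': 0, 'Barack Obama': 1,
}

austria_convergence = convergence_from_candidate_scores(austria_scores, 'Vienna')
president_convergence = convergence_from_candidate_scores(president_scores, 'Barack Obama')
print(austria_convergence, president_convergence)
```

Under this reading, the first hint scores below 1.0 because it also fits one incorrect candidate (Salzburg), while the second hint fits only the correct answer.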

release_memory()

Releases the memory used by the class instance.

This method deletes the instance of the class and triggers garbage collection to free up memory.

Examples

>>> from hinteval.evaluation.convergence import LlmBased
>>>
>>> llm = LlmBased(model_name='llama-3-8b', together_ai_api_key='your_api_key')
>>> llm.release_memory()