Convergence

class hinteval.cores.evaluation_metrics.convergence.Specificity(model_name: Literal['bert-base', 'roberta-large'] = 'bert-base', batch_size: int = 256, checkpoint: bool = False, checkpoint_step: int = 1, force_download=False, enable_tqdm=False)

Class for evaluating the specificity of hints using neural network models such as BERT and RoBERTa [27].

checkpoint

Whether checkpointing is enabled.

Type:

bool

checkpoint_step

Step interval for checkpointing.

Type:

int

enable_tqdm

Whether the tqdm progress bar is enabled.

Type:

bool
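As a rough illustration of what a step interval for checkpointing usually means, the sketch below saves partial results every `checkpoint_step` batches so that an interrupted run can resume. This is not hinteval's internal implementation; `run_with_checkpoints`, `score_batch`, and the JSON file layout are hypothetical names chosen for the example.

```python
import json
import os
import tempfile


def score_batch(batch):
    # Hypothetical stand-in for a model call: score each item by its length.
    return [float(len(item)) for item in batch]


def run_with_checkpoints(items, batch_size, checkpoint_step, path):
    """Score items in batches, saving progress every `checkpoint_step` batches."""
    results = []
    # Resume from an earlier run if a checkpoint file already exists.
    if os.path.exists(path):
        with open(path) as f:
            results = json.load(f)
    start = len(results)
    for step, i in enumerate(range(start, len(items), batch_size), start=1):
        results.extend(score_batch(items[i:i + batch_size]))
        # Persist everything scored so far at the chosen step interval.
        if step % checkpoint_step == 0:
            with open(path, "w") as f:
                json.dump(results, f)
    return results


checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
scores = run_with_checkpoints(["a", "bb", "ccc", "dddd"], 2, 1, checkpoint_path)
print(scores)
```

A larger `checkpoint_step` writes to disk less often, at the cost of redoing more work after an interruption.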

See also

NeuralNetworkBased

Class for evaluating convergence between question and hints using neural network models such as BERT and RoBERTa.

LlmBased

Class for evaluating convergence between question and hints using large language models such as LLaMA-3-8b and LLaMA-3-70b.

evaluate(instances: List[Instance], **kwargs) → List[List[float]]

Evaluates the specificity of the hints in the given instances using the specified neural network model [29].

Parameters:
  • instances (List[Instance]) – List of instances to evaluate.

  • **kwargs – Additional keyword arguments.

Returns:

List of specificity scores for each instance.

Return type:

List[List[float]]

Notes

This function stores the scores as Metric objects within the metrics attribute of the Hint, with names based on the model, such as “convergence-specificity-bert-base”.
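The naming pattern in the note above can be sketched as a small helper. The function `metric_key` is hypothetical and only mirrors the example name given here; that other model names follow the same pattern is an assumption.

```python
def metric_key(family: str, model_name: str) -> str:
    """Build a metric name such as 'convergence-specificity-bert-base'."""
    return f"convergence-{family}-{model_name}"


# Name taken from the note above.
print(metric_key("specificity", "bert-base"))

# Assumed by analogy for the other supported model name.
print(metric_key("specificity", "roberta-large"))
```

Building the key from the model name avoids hard-coding it when switching between models.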

Examples

>>> from hinteval.cores import Instance, Question, Hint, Answer
>>> from hinteval.evaluation.convergence import Specificity
>>>
>>> specificity = Specificity(model_name='bert-base')
>>> instance_1 = Instance(
...     question=Question('What is the capital of Austria?'),
...     answers=[Answer('Vienna')],
...     hints=[Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')])
>>> instance_2 = Instance(
...     question=Question('Who was the president of USA in 2009?'),
...     answers=[Answer('Barack Obama')],
...     hints=[Hint('He was the first African-American president in U.S. history.')])
>>> instances = [instance_1, instance_2]
>>> results = specificity.evaluate(instances)
>>> print(results)
# [[1], [1]]
>>> classes = [instance.hints[0].metrics['convergence-specificity-bert-base'].metadata['description'] for instance in instances]
>>> print(classes)
# ['specific', 'specific']
>>> metrics = [f'{metric_key}: {metric_value.value}' for
...            instance in instances
...            for hint in instance.hints for metric_key, metric_value in
...            hint.metrics.items()]
>>> print(metrics)
# ['convergence-specificity-bert-base: 1', 'convergence-specificity-bert-base: 1']

See also

NeuralNetworkBased

Class for evaluating convergence between question and hints using neural network models such as BERT and RoBERTa.

LlmBased

Class for evaluating convergence between question and hints using large language models such as LLaMA-3-8b and LLaMA-3-70b.

release_memory()

Releases the memory used by the class instance.

This method deletes the instance of the class and triggers garbage collection to free up memory.
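The general pattern described here, dropping the last reference to a large object and then triggering garbage collection, can be sketched in plain Python. `HeavyEvaluator` is a hypothetical stand-in, not a hinteval class, and the sketch does not claim to reproduce what `release_memory()` does internally.

```python
import gc
import weakref


class HeavyEvaluator:
    """Hypothetical stand-in for an evaluator that holds a large model."""

    def __init__(self):
        self.weights = [0.0] * 1_000_000


evaluator = HeavyEvaluator()
probe = weakref.ref(evaluator)  # lets us observe whether the object is still alive

del evaluator  # drop the last strong reference
gc.collect()   # ask the collector to reclaim anything left in reference cycles

print(probe() is None)
```

The weak reference is only there to make the effect observable; it does not keep the object alive.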

Examples

>>> from hinteval.evaluation.convergence import Specificity
>>>
>>> specificity = Specificity(model_name='bert-base')
>>> specificity.release_memory()

class hinteval.cores.evaluation_metrics.convergence.NeuralNetworkBased(model_name: Literal['bert-base', 'roberta-large'] = 'bert-base', batch_size: int = 256, checkpoint: bool = False, checkpoint_step: int = 1, force_download=False, enable_tqdm=False)

Class for evaluating convergence between question and hints using neural network models such as BERT and RoBERTa.

checkpoint

Whether checkpointing is enabled.

Type:

bool

checkpoint_step

Step interval for checkpointing.

Type:

int

enable_tqdm

Whether the tqdm progress bar is enabled.

Type:

bool

See also

Specificity

Class for evaluating the specificity of hints using neural network models such as BERT and RoBERTa.

LlmBased

Class for evaluating convergence between question and hints using large language models such as LLaMA-3-8b and LLaMA-3-70b.

evaluate(instances: List[Instance], **kwargs) → List[List[float]]

Evaluates the convergence between question and hints of the given instances using the specified neural network model.

Parameters:
  • instances (List[Instance]) – List of instances to evaluate.

  • **kwargs – Additional keyword arguments.

Returns:

List of convergence scores for each instance.

Return type:

List[List[float]]
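Judging from the examples on this page, the outer list follows the order of `instances` and each inner list holds one score per hint. Under that assumption, the nested result can be flattened into (instance index, hint index, score) rows. The helper `flatten_scores` is hypothetical and the `results` values below are made up for illustration.

```python
from typing import List, Tuple


def flatten_scores(results: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Turn nested per-instance scores into (instance, hint, score) rows."""
    rows = []
    for instance_idx, hint_scores in enumerate(results):
        for hint_idx, score in enumerate(hint_scores):
            rows.append((instance_idx, hint_idx, score))
    return rows


# Illustrative values: two hints for the first instance, one for the second.
results = [[1.0, 0.0], [1.0]]
for row in flatten_scores(results):
    print(row)
```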

Notes

This function stores the scores as Metric objects within the metrics attribute of the Hint, with names based on the model, such as “convergence-nn-bert-base”.

Examples

>>> from hinteval.cores import Instance, Question, Hint, Answer
>>> from hinteval.evaluation.convergence import NeuralNetworkBased
>>>
>>> neural_network = NeuralNetworkBased(model_name='bert-base')
>>> instance_1 = Instance(
...     question=Question('What is the capital of Austria?'),
...     answers=[Answer('Vienna')],
...     hints=[Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')])
>>> instance_2 = Instance(
...     question=Question('Who was the president of USA in 2009?'),
...     answers=[Answer('Barack Obama')],
...     hints=[Hint('He was named the 2009 Nobel Peace Prize laureate')])
>>> instances = [instance_1, instance_2]
>>> results = neural_network.evaluate(instances)
>>> print(results)
# [[1.0], [1.0]]
>>> metrics = [f'{metric_key}: {metric_value.value}' for
...            instance in instances
...            for hint in instance.hints for metric_key, metric_value in
...            hint.metrics.items()]
>>> print(metrics)
# ['convergence-nn-bert-base: 1.0', 'convergence-nn-bert-base: 1.0']

See also

Specificity

Class for evaluating the specificity of hints using neural network models such as BERT and RoBERTa.

LlmBased

Class for evaluating convergence between question and hints using large language models such as LLaMA-3-8b and LLaMA-3-70b.

release_memory()

Releases the memory used by the class instance.

This method deletes the instance of the class and triggers garbage collection to free up memory.

Examples

>>> from hinteval.evaluation.convergence import NeuralNetworkBased
>>>
>>> neural_network = NeuralNetworkBased(model_name='bert-base')
>>> neural_network.release_memory()

class hinteval.cores.evaluation_metrics.convergence.LlmBased(model_name: Literal['llama-3-8b', 'llama-3-70b'] = 'llama-3-8b', together_ai_api_key: str = None, checkpoint: bool = False, checkpoint_step: int = 1, enable_tqdm=False)

Class for evaluating convergence between question and hints using large language models such as LLaMA-3-8b and LLaMA-3-70b [30].

checkpoint

Whether checkpointing is enabled.

Type:

bool

checkpoint_step

Step interval for checkpointing.

Type:

int

enable_tqdm

Whether the tqdm progress bar is enabled.

Type:

bool

See also

Specificity

Class for evaluating the specificity of hints using neural network models such as BERT and RoBERTa.

NeuralNetworkBased

Class for evaluating convergence between question and hints using neural network models such as BERT and RoBERTa.

evaluate(instances: List[Instance], **kwargs) → List[List[float]]

Evaluates the convergence between question and hints of the given instances using the specified large language model [32].

Parameters:
  • instances (List[Instance]) – List of instances to evaluate.

  • **kwargs – Additional keyword arguments.

Returns:

List of convergence scores for each instance.

Return type:

List[List[float]]

Notes

This function stores the scores as Metric objects within the metrics attribute of the Hint, with names based on the model, such as “convergence-llm-llama-3-8b”.

This function also stores the candidate answers in the metadata of the Question. In addition, it stores the per-candidate scores for each hint in the metadata of the corresponding Metric, under the 'scores' key.

Examples

>>> from hinteval.cores import Instance, Question, Hint, Answer
>>> from hinteval.evaluation.convergence import LlmBased
>>>
>>> llm = LlmBased(model_name='llama-3-8b', together_ai_api_key='your_api_key')
>>> instance_1 = Instance(
...     question=Question('What is the capital of Austria?'),
...     answers=[Answer('Vienna')],
...     hints=[Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')])
>>> instance_2 = Instance(
...     question=Question('Who was the president of USA in 2009?'),
...     answers=[Answer('Barack Obama')],
...     hints=[Hint('He was the first African-American president in U.S. history.')])
>>> instances = [instance_1, instance_2]
>>> results = llm.evaluate(instances)
>>> print(results)
# [[0.91], [1.0]]
>>> metrics = [f'{metric_key}: {metric_value.value}' for
...        instance in instances
...        for hint in instance.hints for metric_key, metric_value in
...        hint.metrics.items()]
>>> print(metrics)
# ['convergence-llm-llama-3-8b: 0.91', 'convergence-llm-llama-3-8b: 1.0']
>>> scores = [hint.metrics['convergence-llm-llama-3-8b'].metadata['scores'] for inst in instances for hint in inst.hints]
>>> print(scores[0])
# {'Salzburg': 1, 'Graz': 0, 'Innsbruck': 0, 'Linz': 0, 'Klagenfurt': 0, 'Bregenz': 0, 'Wels': 0, 'St. Pölten': 0, 'Eisenstadt': 0, 'Sankt Johann impong': 0, 'Vienna': 1}
>>> print(scores[1])
# {'George W. Bush': 0, 'Bill Clinton': 0, 'Jimmy Carter': 0, 'Donald Trump': 0, 'Joe Biden': 0, 'Ronald Reagan': 0, 'Richard Nixon': 0, 'Gerald Ford': 0, 'Franklin D. Roosevelt': 0, 'Theodore Roosevelt': 0, 'Barack Obama': 1}
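The per-candidate dictionaries above are consistent with one simple reading of the score: one minus the share of candidates that are wrong answers yet still marked 1 for the hint. The sketch below reproduces 0.91 and 1.0 under that assumption. It is inferred from the example output, not taken from the library's source, and `convergence_from_scores` is a hypothetical name.

```python
def convergence_from_scores(scores: dict, answer: str) -> float:
    """Assumed formula: 1 - (wrong candidates still marked 1) / (all candidates)."""
    wrong_kept = sum(value for name, value in scores.items() if name != answer)
    return round(1 - wrong_kept / len(scores), 2)


# Dictionaries copied verbatim from the example output above.
austria = {'Salzburg': 1, 'Graz': 0, 'Innsbruck': 0, 'Linz': 0, 'Klagenfurt': 0,
           'Bregenz': 0, 'Wels': 0, 'St. Pölten': 0, 'Eisenstadt': 0,
           'Sankt Johann impong': 0, 'Vienna': 1}
usa = {'George W. Bush': 0, 'Bill Clinton': 0, 'Jimmy Carter': 0, 'Donald Trump': 0,
       'Joe Biden': 0, 'Ronald Reagan': 0, 'Richard Nixon': 0, 'Gerald Ford': 0,
       'Franklin D. Roosevelt': 0, 'Theodore Roosevelt': 0, 'Barack Obama': 1}

print(convergence_from_scores(austria, 'Vienna'))
print(convergence_from_scores(usa, 'Barack Obama'))
```

Under this reading, the first hint loses a little because it also fits Salzburg, while the second hint rules out every wrong candidate.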

See also

Specificity

Class for evaluating the specificity of hints using neural network models such as BERT and RoBERTa.

NeuralNetworkBased

Class for evaluating convergence between question and hints using neural network models such as BERT and RoBERTa.

release_memory()

Releases the memory used by the class instance.

This method deletes the instance of the class and triggers garbage collection to free up memory.

Examples

>>> from hinteval.evaluation.convergence import LlmBased
>>>
>>> llm = LlmBased(model_name='llama-3-8b', together_ai_api_key='your_api_key')
>>> llm.release_memory()