Evaluates the specificity of the hints of the given instances using the specified neural network model [29].
Parameters:
instances (List[Instance]) – List of instances to evaluate.
**kwargs – Additional keyword arguments.
Returns:
List of specificity scores for each instance.
Return type:
List[List[float]]
Notes
This function stores the scores as Metric objects within the metrics attribute of the Hint, with names based on the model, such as “convergence-specificity-bert-base”.
Examples
>>> from hinteval.cores import Instance, Question, Hint, Answer
>>> from hinteval.evaluation.convergence import Specificity
>>>
>>> specificity = Specificity(model_name='bert-base')
>>> instance_1 = Instance(
...     question=Question('What is the capital of Austria?'),
...     answers=[Answer('Vienna')],
...     hints=[Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')])
>>> instance_2 = Instance(
...     question=Question('Who was the president of USA in 2009?'),
...     answers=[Answer('Barack Obama')],
...     hints=[Hint('He was the first African-American president in U.S. history.')])
>>> instances = [instance_1, instance_2]
>>> results = specificity.evaluate(instances)
>>> print(results)
# [[1], [1]]
>>> classes = [sent.hints[0].metrics['convergence-specificity-bert-base'].metadata['description'] for sent in instances]
>>> print(classes)
# ['specific', 'specific']
>>> metrics = [f'{metric_key}: {metric_value.value}' for
...     instance in instances
...     for hint in instance.hints for metric_key, metric_value in
...     hint.metrics.items()]
>>> print(metrics)
# ['convergence-specificity-bert-base: 1', 'convergence-specificity-bert-base: 1']
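The return value is nested: one inner list per instance, holding one score per hint. The following is a minimal sketch of post-processing such a result with plain Python lists. It assumes the outer order matches `instances` and the inner order matches each instance's `hints`, as the example output suggests. The helper `mean_score_per_instance` is hypothetical and not part of hinteval.

```python
from statistics import mean
from typing import List, Optional


def mean_score_per_instance(results: List[List[float]]) -> List[Optional[float]]:
    """Hypothetical helper (not part of hinteval).

    Averages the per-hint scores of each instance. An instance with no
    hints has no scores, so it is mapped to None rather than to 0.
    """
    return [mean(scores) if scores else None for scores in results]


# Same nested shape as the value returned by evaluate():
# outer list = instances, inner list = hints of that instance.
example_results = [[1.0, 0.0], [1.0], []]
averages = mean_score_per_instance(example_results)
print(averages)
```

Returning `None` for an instance without hints keeps "no hints" distinguishable from "all hints scored 0".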
Evaluates the convergence between the question and the hints of the given instances using the specified neural network model.
Parameters:
instances (List[Instance]) – List of instances to evaluate.
**kwargs – Additional keyword arguments.
Returns:
List of convergence scores for each instance.
Return type:
List[List[float]]
Notes
This function stores the scores as Metric objects within the metrics attribute of the Hint, with names based on the model, such as “convergence-nn-bert-base”.
Examples
>>> from hinteval.cores import Instance, Question, Hint, Answer
>>> from hinteval.evaluation.convergence import NeuralNetworkBased
>>>
>>> neural_network = NeuralNetworkBased(model_name='bert-base')
>>> instance_1 = Instance(
...     question=Question('What is the capital of Austria?'),
...     answers=[Answer('Vienna')],
...     hints=[Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')])
>>> instance_2 = Instance(
...     question=Question('Who was the president of USA in 2009?'),
...     answers=[Answer('Barack Obama')],
...     hints=[Hint('He was named the 2009 Nobel Peace Prize laureate')])
>>> instances = [instance_1, instance_2]
>>> results = neural_network.evaluate(instances)
>>> print(results)
# [[1.0], [1.0]]
>>> metrics = [f'{metric_key}: {metric_value.value}' for
...     instance in instances
...     for hint in instance.hints for metric_key, metric_value in
...     hint.metrics.items()]
>>> print(metrics)
# ['convergence-nn-bert-base: 1.0', 'convergence-nn-bert-base: 1.0']
Evaluates the convergence between the question and the hints of the given instances using the specified large language model [32].
Parameters:
instances (List[Instance]) – List of instances to evaluate.
**kwargs – Additional keyword arguments.
Returns:
List of convergence scores for each instance.
Return type:
List[List[float]]
Notes
This function stores the scores as Metric objects within the metrics attribute of the Hint, with names based on the model, such as “convergence-llm-llama-3-8b”.
This function also stores the candidate answers in the metadata of the Question. In addition, for each hint it stores the per-candidate scores under the 'scores' key in the metadata of that hint's Metric object, as the example shows.
Examples
>>> from hinteval.cores import Instance, Question, Hint, Answer
>>> from hinteval.evaluation.convergence import LlmBased
>>>
>>> llm = LlmBased(model_name='llama-3-8b', together_ai_api_key='your_api_key')
>>> instance_1 = Instance(
...     question=Question('What is the capital of Austria?'),
...     answers=[Answer('Vienna')],
...     hints=[Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')])
>>> instance_2 = Instance(
...     question=Question('Who was the president of USA in 2009?'),
...     answers=[Answer('Barack Obama')],
...     hints=[Hint('He was the first African-American president in U.S. history.')])
>>> instances = [instance_1, instance_2]
>>> results = llm.evaluate(instances)
>>> print(results)
# [[0.91], [1.0]]
>>> metrics = [f'{metric_key}: {metric_value.value}' for
...     instance in instances
...     for hint in instance.hints for metric_key, metric_value in
...     hint.metrics.items()]
>>> print(metrics)
# ['convergence-llm-llama-3-8b: 0.91', 'convergence-llm-llama-3-8b: 1.0']
>>> scores = [hint.metrics['convergence-llm-llama-3-8b'].metadata['scores'] for inst in instances for hint in inst.hints]
>>> print(scores[0])
# {'Salzburg': 1, 'Graz': 0, 'Innsbruck': 0, 'Linz': 0, 'Klagenfurt': 0, 'Bregenz': 0, 'Wels': 0, 'St. Pölten': 0, 'Eisenstadt': 0, 'Sankt Johann impong': 0, 'Vienna': 1}
>>> print(scores[1])
# {'George W. Bush': 0, 'Bill Clinton': 0, 'Jimmy Carter': 0, 'Donald Trump': 0, 'Joe Biden': 0, 'Ronald Reagan': 0, 'Richard Nixon': 0, 'Gerald Ford': 0, 'Franklin D. Roosevelt': 0, 'Theodore Roosevelt': 0, 'Barack Obama': 1}
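The example output suggests how a per-hint convergence score may relate to the per-candidate `scores` dictionary: the first hint also fits one wrong candidate out of eleven and scores 0.91, while the second fits only the correct answer and scores 1.0. The sketch below reproduces those two numbers under an assumed formula, 1 minus the share of candidates that the hint wrongly fits. This formula is inferred from the example only and is not confirmed as the library's implementation. The function `convergence_from_scores` is hypothetical, and so is its treatment of a hint that does not fit the correct answer.

```python
from typing import Dict


def convergence_from_scores(scores: Dict[str, int], correct_answer: str) -> float:
    """Hypothetical reconstruction (not hinteval's confirmed formula).

    scores maps each candidate answer to 1 if the hint fits it, else 0.
    """
    # Assumption: a hint that does not fit the correct answer scores 0.
    if scores.get(correct_answer) != 1:
        return 0.0
    # Count the wrong candidates that the hint fails to rule out.
    wrong_kept = sum(
        value for candidate, value in scores.items() if candidate != correct_answer
    )
    # Assumed formula: 1 minus the share of wrongly fitting candidates.
    return round(1 - wrong_kept / len(scores), 2)


# Dictionaries copied from the example output above.
austria = {'Salzburg': 1, 'Graz': 0, 'Innsbruck': 0, 'Linz': 0, 'Klagenfurt': 0,
           'Bregenz': 0, 'Wels': 0, 'St. Pölten': 0, 'Eisenstadt': 0,
           'Sankt Johann impong': 0, 'Vienna': 1}
president = {'George W. Bush': 0, 'Bill Clinton': 0, 'Jimmy Carter': 0,
             'Donald Trump': 0, 'Joe Biden': 0, 'Ronald Reagan': 0,
             'Richard Nixon': 0, 'Gerald Ford': 0, 'Franklin D. Roosevelt': 0,
             'Theodore Roosevelt': 0, 'Barack Obama': 1}

print(convergence_from_scores(austria, 'Vienna'))          # 0.91
print(convergence_from_scores(president, 'Barack Obama'))  # 1.0
```

Under this reading, a hint converges better the fewer wrong candidates it leaves standing, which is why the Mozart and Beethoven hint loses a little for also fitting Salzburg.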