Dataset

class hinteval.cores.dataset.dataset.Dataset(name: str = None, url: str = None, version: str = None, description: str = None, metadata: Dict[str, str | int | float] = None)

A class to represent a dataset, including its subsets and associated metadata.

name

The name of the dataset.

Type:

str

url

The URL where the dataset can be accessed.

Type:

str

version

The version of the dataset.

Type:

str

description

A description of the dataset.

Type:

str

metadata

Additional metadata about the dataset.

Type:

dict[str, Union[str, int, float]]

add_subset(subset: Subset)

Adds a subset to the dataset.

Parameters:

subset (Subset) – The subset to be added to the dataset.

Raises:

ValueError – If the subset name already exists in the dataset.

Examples

>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> subset = Subset(name='training_set')
>>> dataset.add_subset(subset)
>>> print(dataset["training_set"].name)
# training_set

See also

remove_subset

Removes a subset from the dataset.

get_subset

Retrieves a subset by name.
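
Because add_subset raises ValueError for a duplicate name, replacing an existing subset takes two steps. The following is a minimal sketch that uses only the method names documented on this page (get_subsets_name, remove_subset, add_subset). The helper add_or_replace_subset and the FakeDataset stand-in are hypothetical: they are not part of hinteval and exist only so that the snippet runs without the library installed.

```python
from types import SimpleNamespace


def add_or_replace_subset(dataset, subset):
    """Add `subset`, first removing any existing subset with the same name."""
    if subset.name in dataset.get_subsets_name():
        dataset.remove_subset(subset.name)
    dataset.add_subset(subset)


class FakeDataset:
    """Hypothetical stand-in exposing the documented method names only."""

    def __init__(self):
        self._subsets = {}  # subset name -> subset object

    def get_subsets_name(self):
        return list(self._subsets)

    def add_subset(self, subset):
        if subset.name in self._subsets:
            raise ValueError(f"Subset {subset.name!r} already exists.")
        self._subsets[subset.name] = subset

    def remove_subset(self, name):
        if name not in self._subsets:
            raise ValueError(f"Subset {name!r} does not exist.")
        del self._subsets[name]


dataset = FakeDataset()
first = SimpleNamespace(name="training_set", tag="old")
second = SimpleNamespace(name="training_set", tag="new")

add_or_replace_subset(dataset, first)
add_or_replace_subset(dataset, second)  # a plain add_subset would raise ValueError here
print(dataset.get_subsets_name())
```

With a real Dataset, the same helper should apply unchanged as long as the subset object exposes a name attribute, as Subset does.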

classmethod available_datasets(show_info=False, update=False) → Dict

Retrieves a list of available datasets to download.

Parameters:
  • update (bool, optional) – Whether to update the dataset list from the remote source (default is False).

  • show_info (bool, optional) – Whether to show the information about the available datasets (default is False).

Returns:

A dictionary of available datasets.

Return type:

dict

Raises:

Exception – If the list of datasets cannot be loaded from the remote source.

Examples

>>> from hinteval import Dataset
>>>
>>> available_datasets = Dataset.available_datasets()
>>> print(available_datasets['triviahg']['description'])
# TriviaHG is an extensive dataset crafted specifically for hint generation in question answering.

See also

download_and_load_dataset

Loads a dataset from a local cache or downloads it if not available locally.
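
The returned dictionary can be searched like any other mapping. The sketch below assumes that every entry is itself a dictionary with a ‘description’ key, which the example above shows only for ‘triviahg’. The helper find_datasets and the ‘demo_set’ entry are hypothetical, and the sample catalog is hard-coded so that nothing is downloaded.

```python
def find_datasets(catalog, keyword):
    """Return the names of entries whose description mentions `keyword`."""
    keyword = keyword.lower()
    return sorted(
        name
        for name, info in catalog.items()
        if keyword in str(info.get("description", "")).lower()
    )


# Sample data shaped like the result of Dataset.available_datasets().
catalog = {
    "triviahg": {
        "description": "TriviaHG is an extensive dataset crafted specifically "
                       "for hint generation in question answering."
    },
    "demo_set": {"description": "A hypothetical entry used only in this sketch."},
}

matches = find_datasets(catalog, "hint generation")
print(matches)
```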

classmethod download_and_load_dataset(name, force_download=False)

Loads a dataset from a local cache or downloads it if not available locally.

Parameters:
  • name (str) – The name of the dataset to load.

  • force_download (bool, optional) – Whether to force download the dataset even if it already exists locally (default is False).

Returns:

A new Dataset object initialized from the loaded data.

Return type:

Dataset

Raises:
  • FileNotFoundError – If the dataset name does not exist in the available datasets.

  • Exception – If the dataset fails to download.

Examples

>>> from hinteval import Dataset
>>>
>>> dataset = Dataset.download_and_load_dataset('triviahg', force_download=True)
>>> print(dataset.description)
# TriviaHG is an extensive dataset crafted specifically for hint generation in question answering.

See also

available_datasets

Retrieves a list of available datasets to download.

load_json

Loads a Dataset instance from a JSON file.

load

Loads a Dataset instance from a file.

get_subset(name: str)

Retrieves a subset by name.

Parameters:

name (str) – The name of the subset to retrieve.

Returns:

The subset associated with the given name.

Return type:

Subset

Raises:

ValueError – If the subset name does not exist in the dataset.

Examples

>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> subset = Subset(name='training_set')
>>> dataset.add_subset(subset)
>>> subset = dataset.get_subset('training_set')
>>> # subset = dataset['training_set']
>>> print(subset.name)
# training_set

See also

add_subset

Adds a subset to the dataset.

remove_subset

Removes a subset from the dataset.

get_subsets_name

Retrieves the names of all subsets in the dataset.

get_subsets

Retrieves all subsets in the dataset.

get_subsets()

Retrieves all subsets in the dataset.

Returns:

A list of all subsets in the dataset.

Return type:

list[Subset]

Examples

>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> training = Subset(name='training_set')
>>> validation = Subset(name='validation_set')
>>> dataset.add_subset(training)
>>> dataset.add_subset(validation)
>>> subsets = dataset.get_subsets()
>>> print([subset.name for subset in subsets])
# ['training_set', 'validation_set']

See also

add_subset

Adds a subset to the dataset.

remove_subset

Removes a subset from the dataset.

get_subset

Retrieves a subset by name.

get_subsets_name

Retrieves the names of all subsets in the dataset.

get_subsets_name()

Retrieves the names of all subsets in the dataset.

Returns:

A list of all subset names in the dataset.

Return type:

list[str]

Examples

>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> training = Subset(name='training_set')
>>> validation = Subset(name='validation_set')
>>> dataset.add_subset(training)
>>> dataset.add_subset(validation)
>>> subset_names = dataset.get_subsets_name()
>>> print(subset_names)
# ['training_set', 'validation_set']

See also

add_subset

Adds a subset to the dataset.

remove_subset

Removes a subset from the dataset.

get_subset

Retrieves a subset by name.

get_subsets

Retrieves all subsets in the dataset.

classmethod load(path)

Loads a Dataset instance from a file.

Parameters:

path (str) – The path of the file from which to load the Dataset instance.

Returns:

A new Dataset object initialized from the loaded file.

Return type:

Dataset

Raises:
  • FileNotFoundError – If the specified file path does not exist.

  • Exception – If the specified file is corrupted.

Examples

>>> from hinteval import Dataset
>>>
>>> dataset_loaded = Dataset.load('./dataset.pickle')
>>> print(dataset_loaded.name)
# Hint_Dataset

See also

store_json

Stores the Dataset instance as a JSON file.

store

Stores the Dataset instance.

classmethod load_json(path)

Loads a Dataset instance from a JSON file.

Parameters:

path (str) – The path of the JSON file from which to load the Dataset instance.

Returns:

A new Dataset object initialized from the JSON file.

Return type:

Dataset

Raises:
  • FileNotFoundError – If the specified file path does not exist.

  • KeyError – If required keys are missing in the JSON file.

Examples

>>> from hinteval import Dataset
>>>
>>> dataset_loaded = Dataset.load_json('./dataset.json')
>>> print(dataset_loaded.name)
# Hint_Dataset

See also

store_json

Stores the Dataset instance as a JSON file.

store

Stores the Dataset instance.

prepare_dataset(fill_question_types=True, fill_entities=False, batch_size: int = 256, spacy_pipeline: Literal['en_core_web_sm', 'en_core_web_lg', 'en_core_web_md', 'en_core_web_trf'] = 'en_core_web_sm', qc_model_force_download=False, enable_tqdm=False)

Prepares the dataset by detecting question types for questions and entities for questions, hints, and answers.

Parameters:
  • fill_question_types (bool, optional) – Whether to detect question types (default is True).

  • fill_entities (bool, optional) – Whether to detect entities (default is False).

  • batch_size (int, optional) – The batch size for processing (default is 256).

  • spacy_pipeline ({'en_core_web_sm', 'en_core_web_lg', 'en_core_web_md', 'en_core_web_trf'}, optional) – The spaCy pipeline to use for entity recognition (default is ‘en_core_web_sm’).

  • qc_model_force_download (bool, optional) – Whether to force download the question classification model (default is False).

  • enable_tqdm (bool, optional) – Whether to enable tqdm progress bar (default is False).

Examples

>>> from hinteval.cores import Subset, Instance
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset("Hint_Dataset")
>>> subset = Subset(name="training_set")
>>> dataset.add_subset(subset)
>>> instance = Instance.from_strings("What is the capital of Austria?",
...                 ["Vienna"],
...                 ["This city, once home to Mozart and Beethoven."])
>>> subset.add_instance(instance, "q_1")
>>> dataset.prepare_dataset(fill_question_types=True, fill_entities=True,
...                         batch_size=64,
...                         spacy_pipeline='en_core_web_sm',
...                         qc_model_force_download=False,
...                         enable_tqdm=False)
>>> print(instance.question)
# {
#     "question": "What is the capital of Austria?",
#     "question_type": {
#         "major": "LOC:LOCATION",
#         "minor": "other:Other location"
#     },
#     "entities": [
#         {
#             "entity": "Austria",
#             "ent_type": "GPE",
#             "start_index": 23,
#             "end_index": 30,
#             "metadata": {}
#         }
#     ],
#     "metrics": {},
#     "metadata": {}
# }

See also

utils.identify_functions.identify_entities

Function to detect entities in instances.

utils.identify_functions.identify_question_type

Function to detect question types for questions.
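
The structure printed in the example above can be inspected with the standard library. The sketch below assumes that the printed form is valid JSON and that a question type string such as "LOC:LOCATION" is a short code and a label separated by a colon; both are readings of the sample output, not documented guarantees. It also confirms that start_index and end_index slice the entity text out of the question.

```python
import json

# Copy of the output printed by the example above.
printed = """
{
    "question": "What is the capital of Austria?",
    "question_type": {
        "major": "LOC:LOCATION",
        "minor": "other:Other location"
    },
    "entities": [
        {
            "entity": "Austria",
            "ent_type": "GPE",
            "start_index": 23,
            "end_index": 30,
            "metadata": {}
        }
    ],
    "metrics": {},
    "metadata": {}
}
"""

record = json.loads(printed)

# Split the coarse type into its code and label (assumed "CODE:LABEL" layout).
major_code, major_label = record["question_type"]["major"].split(":", 1)

# Every entity span should reproduce the entity text (end_index is non-inclusive).
spans_ok = all(
    record["question"][e["start_index"]:e["end_index"]] == e["entity"]
    for e in record["entities"]
)
print(major_code, major_label, spans_ok)
```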

remove_subset(name: str)

Removes a subset from the dataset.

Parameters:

name (str) – The name of the subset to be removed.

Raises:

ValueError – If the subset name does not exist in the dataset.

Examples

>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> subset = Subset(name='training_set')
>>> dataset.add_subset(subset)
>>> print(dataset.get_subsets_name())
# ['training_set']
>>> dataset.remove_subset('training_set')
>>> # del dataset['training_set']
>>> print(dataset.get_subsets_name())
# []

See also

add_subset

Adds a subset to the dataset.

get_subset

Retrieves a subset by name.

store(path)

Stores the Dataset instance.

Parameters:

path (str) – The file path at which to store the Dataset instance.

Examples

>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> training = Subset(name='training_set')
>>> validation = Subset(name='validation_set')
>>> dataset.add_subset(training)
>>> dataset.add_subset(validation)
>>> dataset.store('./dataset.pickle')

See also

load_json

Loads a Dataset instance from a JSON file.

load

Loads a Dataset instance from a file.

store_json(path)

Stores the Dataset instance as a JSON file.

Parameters:

path (str) – The file path at which to store the JSON representation of the Dataset instance.

Examples

>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> training = Subset(name='training_set')
>>> validation = Subset(name='validation_set')
>>> dataset.add_subset(training)
>>> dataset.add_subset(validation)
>>> dataset.store_json('./dataset.json')

See also

load_json

Loads a Dataset instance from a JSON file.

load

Loads a Dataset instance from a file.

to_dict()

Converts the Dataset instance into a dictionary.

Returns:

A dictionary representation of the Dataset instance.

Return type:

dict[str, Any]

Examples

>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> subset1 = Subset(name='training_set')
>>> subset2 = Subset(name='validation_set')
>>> dataset = Dataset(name='example_dataset')
>>> dataset.add_subset(subset1)
>>> dataset.add_subset(subset2)
>>> dataset_dict = dataset.to_dict()
>>> print(dataset_dict)
# {
#     'name': 'example_dataset',
#     'version': None,
#     'description': None,
#     'url': None,
#     'metadata': {},
#     'subsets': {
#         'training_set': {'name': 'training_set', ...},
#         'validation_set': {'name': 'validation_set', ...}
#     }
# }

See also

from_dict

Creates a Dataset instance from a dictionary.

store_json

Stores the Dataset instance as a JSON file.

store

Stores the Dataset instance.
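
A dictionary in this layout can be processed without hinteval. The sketch below assumes that each entry under ‘subsets’ carries an ‘instances’ mapping like the one accepted by Subset.from_dict; the elided parts of the output above do not confirm this. The helper instances_per_subset is hypothetical and the sample data is written by hand.

```python
import json


def instances_per_subset(dataset_dict):
    """Map each subset name to the number of instances it holds."""
    return {
        name: len(subset.get("instances", {}))
        for name, subset in dataset_dict.get("subsets", {}).items()
    }


# Hand-written sample in the to_dict() layout ('instances' is an assumption).
dataset_dict = {
    "name": "example_dataset",
    "version": None,
    "description": None,
    "url": None,
    "metadata": {},
    "subsets": {
        "training_set": {
            "name": "training_set",
            "metadata": {},
            "instances": {
                "q1": {"question": {"question": "What is the capital of France?"},
                       "answers": [{"answer": "Paris"}], "hints": []},
                "q2": {"question": {"question": "What is the capital of Austria?"},
                       "answers": [{"answer": "Vienna"}], "hints": []},
            },
        },
        "validation_set": {"name": "validation_set", "metadata": {}, "instances": {}},
    },
}

counts = instances_per_subset(dataset_dict)
print(counts)

# The structure contains only JSON-compatible values, so it survives a round trip.
round_trip_ok = json.loads(json.dumps(dataset_dict)) == dataset_dict
print(round_trip_ok)
```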

class hinteval.cores.dataset_core.Answer(answer: str, entities: List[Entity] = None, metrics: Dict[str, Metric] = None, metadata: Dict[str, str | int | float] = None)

A class to represent an answer with associated entities, metrics, and optional metadata.

answer

The text of the answer.

Type:

str

entities

A list of Entity instances associated with the answer.

Type:

list[Entity]

metrics

A dictionary of Metric instances associated with the answer, keyed by their names.

Type:

dict[str, Metric]

metadata

Optional additional metadata about the answer.

Type:

dict[str, Union[str, int, float]]

classmethod from_dict(data)

Creates an Answer instance from a dictionary.

Parameters:

data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate an Answer object.

Returns:

A new instance of Answer initialized from the provided dictionary.

Return type:

Answer

Examples

>>> from hinteval.cores import Answer
>>>
>>> data = {
...     'answer': 'Vienna',
...     'entities': [{'entity': 'Vienna', 'ent_type': 'LOCATION',
...                   'start_index': 0, 'end_index': 6}],
...     'metrics': {'familiarity': {'name': 'familiarity', 'value': 1.0}},
...     'metadata': {'source': 'https://en.wikipedia.org/wiki/Austria'}
... }
>>> answer = Answer.from_dict(data)
>>> print(answer.answer)
# Vienna

See also

to_dict

Converts the Answer instance into a dictionary.

Raises:

KeyError – If required keys are missing in the dictionary.

to_dict()

Converts the Answer instance into a dictionary.

Returns:

A dictionary representation of the Answer instance including all its attributes.

Return type:

dict[str, Any]

Examples

>>> from hinteval.cores import Entity, Metric, Answer
>>>
>>> answer = Answer(
...     "Vienna",
...     [Entity("Vienna", "LOCATION", 0, 6)],
...     {"familiarity": Metric("familiarity", 1.0)},
...     {"source": "https://en.wikipedia.org/wiki/Austria"}
... )
>>> print(answer.to_dict())
# {
#     'answer': 'Vienna',
#     'entities': [
#         {'entity': 'Vienna', 'ent_type': 'LOCATION', 'start_index': 0, 'end_index': 6, 'metadata': {}}
#     ],
#     'metrics': {'familiarity': {'name': 'familiarity', 'value': 1.0, 'metadata': {}}},
#     'metadata': {'source': 'https://en.wikipedia.org/wiki/Austria'}
# }

See also

from_dict

Creates an Answer instance from a dictionary.

class hinteval.cores.dataset_core.Entity(entity: str, ent_type: str, start_index: int, end_index: int, metadata: Dict[str, str | int | float] = None)

A class to represent an entity with its type and position in the text, along with optional metadata.

entity

The textual representation of the entity.

Type:

str

ent_type

The type of the entity, e.g., ‘PERSON’, ‘LOCATION’.

Type:

str

start_index

The start index of the entity in the text.

Type:

int

end_index

The end index of the entity in the text, non-inclusive.

Type:

int

metadata

Optional additional metadata about the entity.

Type:

dict[str, Union[str, int, float]]

classmethod from_dict(data: Dict[str, Any])

Creates an Entity instance from a dictionary.

Parameters:

data (dict[str, Any]) – A dictionary containing keys ‘entity’, ‘ent_type’, ‘start_index’, ‘end_index’, and ‘metadata’.

Returns:

A new instance of Entity initialized from the provided dictionary.

Return type:

Entity

Examples

>>> from hinteval.cores import Entity
>>>
>>> data = {'entity': 'Lionel Messi', 'ent_type': 'PERSON', 'start_index': 0,
...         'end_index': 12, 'metadata': {'familiarity': 1.0}}
>>> entity = Entity.from_dict(data)
>>> print(entity.entity, entity.metadata['familiarity'])
# Lionel Messi 1.0

See also

to_dict

Converts the Entity instance into a dictionary.

Raises:

KeyError – If any of the required keys (‘entity’, ‘ent_type’, ‘start_index’, ‘end_index’) are missing in the dictionary.

to_dict()

Converts the Entity instance into a dictionary.

Returns:

A dictionary representing the Entity with keys ‘entity’, ‘ent_type’, ‘start_index’, ‘end_index’, and ‘metadata’.

Return type:

dict[str, Any]

Examples

>>> from hinteval.cores import Entity
>>>
>>> entity = Entity("Lionel Messi", "PERSON", 0, 12, {"familiarity": 1.0})
>>> print(entity.to_dict())
# {'entity': 'Lionel Messi', 'ent_type': 'PERSON', 'start_index': 0, 'end_index': 12, 'metadata': {'familiarity': 1.0}}

See also

from_dict

Creates an Entity instance from a dictionary.
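
Because end_index is non-inclusive, the pair (start_index, end_index) works directly as a Python slice. The sketch below operates on plain dictionaries in the layout described above and needs no hinteval import. The helpers locate_entity and check_entity are hypothetical; the list of required keys is taken from the Raises note of from_dict.

```python
REQUIRED_ENTITY_KEYS = ("entity", "ent_type", "start_index", "end_index")


def locate_entity(text, surface, ent_type):
    """Build an entity dictionary for the first occurrence of `surface`."""
    start = text.find(surface)
    if start == -1:
        raise ValueError(f"{surface!r} does not occur in the text")
    return {
        "entity": surface,
        "ent_type": ent_type,
        "start_index": start,
        "end_index": start + len(surface),  # non-inclusive end
    }


def check_entity(text, data):
    """Return a list of problems with an entity dictionary (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_ENTITY_KEYS if key not in data]
    if problems:
        return problems
    span = text[data["start_index"]:data["end_index"]]
    if span != data["entity"]:
        problems.append(f"span {span!r} does not match entity {data['entity']!r}")
    return problems


text = "What is the capital of Austria?"
entity = locate_entity(text, "Austria", "GPE")
print(entity["start_index"], entity["end_index"])  # 23 30
print(check_entity(text, entity))                  # []
```

Running such a check before calling Entity.from_dict is one way to catch offsets that drifted after the text was edited.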

class hinteval.cores.dataset_core.Hint(hint, source: str = None, entities: List[Entity] = None, metrics: Dict[str, Metric] = None, metadata: Dict[str, str | int | float] = None)

A class to represent a hint associated with questions and answers, including sources, entities, metrics, and optional metadata.

hint

The text of the hint.

Type:

str

source

The source from which the hint was derived.

Type:

str, optional

entities

A list of Entity instances related to the hint.

Type:

list[Entity]

metrics

A dictionary of Metric instances associated with the hint, keyed by their names.

Type:

dict[str, Metric]

metadata

Optional additional metadata about the hint.

Type:

dict[str, Union[str, int, float]]

classmethod from_dict(data)

Creates a Hint instance from a dictionary.

Parameters:

data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate a Hint object.

Returns:

A new instance of Hint initialized from the provided dictionary.

Return type:

Hint

Examples

>>> from hinteval.cores import Hint
>>>
>>> data = {
...     'hint': 'This city, once home to Mozart and Beethoven, is famous for its music and culture.',
...     'source': 'https://en.wikipedia.org/wiki/Vienna',
...     'entities': [
...         {'entity': 'Mozart', 'ent_type': 'PERSON', 'start_index': 24, 'end_index': 30, 'metadata': {'url': 'https://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart'}},
...         {'entity': 'Beethoven', 'ent_type': 'PERSON', 'start_index': 35, 'end_index': 44, 'metadata': {'url': 'https://en.wikipedia.org/wiki/Ludwig_van_Beethoven'}}
...     ],
...     'metrics': {'relevance': {'name': 'relevance', 'value': 0.9}},
...     'metadata': {}
... }
>>> hint = Hint.from_dict(data)
>>> print(hint.hint)
# This city, once home to Mozart and Beethoven, is famous for its music and culture.

See also

to_dict

Converts the Hint instance into a dictionary.

Raises:

KeyError – If required keys are missing in the dictionary.

to_dict()

Converts the Hint instance into a dictionary.

Returns:

A dictionary representation of the Hint instance including all its attributes.

Return type:

dict[str, Any]

Examples

>>> from hinteval.cores import Entity, Metric, Hint
>>>
>>> hint = Hint(
...     "This city, once home to Mozart and Beethoven, is famous for its music and culture.",
...     "https://en.wikipedia.org/wiki/Vienna",
...     [Entity("Mozart", "PERSON", 24, 30, {"url": "https://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart"}),
...      Entity("Beethoven", "PERSON", 35, 44, {"url": "https://en.wikipedia.org/wiki/Ludwig_van_Beethoven"})],
...     {"relevance": Metric("relevance", 0.9)}
...     )
>>> print(hint.to_dict())
# {
#     'hint': 'This city, once home to Mozart and Beethoven, is famous for its music and culture.',
#     'source': 'https://en.wikipedia.org/wiki/Vienna',
#     'entities': [
#         {'entity': 'Mozart', 'ent_type': 'PERSON', 'start_index': 24, 'end_index': 30, 'metadata': {'url': 'https://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart'}},
#         {'entity': 'Beethoven', 'ent_type': 'PERSON', 'start_index': 35, 'end_index': 44, 'metadata': {'url': 'https://en.wikipedia.org/wiki/Ludwig_van_Beethoven'}}
#     ],
#     'metrics': {'relevance': {'name': 'relevance', 'value': 0.9, 'metadata': {}}},
#     'metadata': {}
# }

See also

from_dict

Creates a Hint instance from a dictionary.
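
Once hints are available as dictionaries, they can be ordered by any metric they carry. The sketch below assumes numeric metric values (the Metric class also allows strings) and skips hints that lack the requested metric. The helper rank_hints and the sample scores are hypothetical.

```python
def rank_hints(hints, metric_name):
    """Return the hints that carry `metric_name`, highest value first."""
    scored = [hint for hint in hints if metric_name in hint.get("metrics", {})]
    return sorted(
        scored,
        key=lambda hint: hint["metrics"][metric_name]["value"],
        reverse=True,
    )


# Hand-written sample in the Hint.to_dict() layout.
hints = [
    {"hint": "It is a major European city.",
     "metrics": {"relevance": {"name": "relevance", "value": 0.4}}},
    {"hint": "This city, once home to Mozart and Beethoven, is famous for its music and culture.",
     "metrics": {"relevance": {"name": "relevance", "value": 0.9}}},
    {"hint": "A hint that has not been scored yet.", "metrics": {}},
]

ranked = rank_hints(hints, "relevance")
print([hint["metrics"]["relevance"]["value"] for hint in ranked])
```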

class hinteval.cores.dataset_core.Instance(question: Question, answers: List[Answer], hints: List[Hint], metadata: Dict[str, str | int | float] = None)

A class to represent a question-and-answer instance along with associated hints and metadata.

question

The Question object representing the primary question.

Type:

Question

answers

A list of Answer objects providing possible answers to the question.

Type:

list[Answer]

hints

A list of Hint objects providing additional context or clues for the question.

Type:

list[Hint]

metadata

Optional additional metadata about the instance.

Type:

dict[str, Union[str, int, float]]

answers_from_strings(answers: List[str])

Updates the answers of the instance using a list of string representations.

Parameters:

answers (list[str]) – A list of strings where each string represents an answer.

Examples

>>> from hinteval.cores import Instance
>>>
>>> instance = Instance.from_strings(
...     "What is the capital of France?",
...     ['Lyon'],
...     ["This city is also known as the City of Lights."]
... )
>>> instance.answers_from_strings(["Paris"])
>>> print([answer.answer for answer in instance.answers])
# ['Paris']

See also

from_dict

Creates an Instance object from a dictionary.

from_strings

Creates an Instance object from strings representing the question, answers, and hints.

classmethod from_dict(data)

Creates an Instance object from a dictionary.

Parameters:

data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate an Instance object.

Returns:

A new Instance object initialized from the provided dictionary.

Return type:

Instance

Examples

>>> from hinteval.cores import Instance
>>>
>>> data = {
...     'question': {'question': 'What is the capital of France?'},
...     'answers': [{'answer': 'Paris'}],
...     'hints': [{'hint': 'This city is also known as the City of Lights.'}]
... }
>>> instance = Instance.from_dict(data)
>>> print(instance.question.question)
# What is the capital of France?

See also

to_dict

Converts the Instance object into a dictionary.

from_strings

Creates an Instance object from strings representing the question, answers, and hints.

Raises:

KeyError – If required keys are missing in the dictionary.

classmethod from_strings(question: str, answers: List[str], hints: List[str])

Creates an Instance object from strings representing the question, answers, and hints.

Parameters:
  • question (str) – The text of the primary question.

  • answers (list[str]) – A list of strings representing the answers.

  • hints (list[str]) – A list of strings representing the hints.

Returns:

A new Instance object populated with the converted question, answers, and hints.

Return type:

Instance

Examples

>>> from hinteval.cores import Instance
>>>
>>> instance = Instance.from_strings(
...     "What is the capital of France?",
...     ["Paris"],
...     ["This city is also known as the City of Lights."]
... )
>>> print(instance.question.question)
# What is the capital of France?

See also

from_dict

Creates an Instance object from a dictionary.

to_dict

Converts the Instance object into a dictionary.
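
The strings passed to from_strings correspond to the minimal dictionary shown in the from_dict example. The sketch below builds that dictionary by hand; instance_dict_from_strings is a hypothetical helper, and nothing here is meant to describe how from_strings is implemented inside hinteval.

```python
def instance_dict_from_strings(question, answers, hints):
    """Build the minimal dictionary layout used in the from_dict example."""
    return {
        "question": {"question": question},
        "answers": [{"answer": answer} for answer in answers],
        "hints": [{"hint": hint} for hint in hints],
    }


data = instance_dict_from_strings(
    "What is the capital of France?",
    ["Paris"],
    ["This city is also known as the City of Lights."],
)
print(data["question"]["question"])
print([answer["answer"] for answer in data["answers"]])
```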

hints_from_strings(hints: List[str])

Updates the hints of the instance using a list of string representations.

Parameters:

hints (list[str]) – A list of strings where each string represents a hint.

Examples

>>> from hinteval.cores import Instance
>>>
>>> instance = Instance.from_strings(
...     "What is the capital of France?",
...     ["Paris"],
...     []
... )
>>> instance.hints_from_strings(["This city is also known as the City of Lights.", "It is a major European city."])
>>> print([hint.hint for hint in instance.hints])
# ['This city is also known as the City of Lights.', 'It is a major European city.']

See also

from_dict

Creates an Instance object from a dictionary.

from_strings

Creates an Instance object from strings representing the question, answers, and hints.

question_from_string(question: str)

Updates the question of the instance using a string representation.

Parameters:

question (str) – The text of the question.

Examples

>>> from hinteval.cores import Instance
>>>
>>> instance = Instance.from_strings(
...     "What is the capital of Italy?",
...     ["Paris"],
...     ["This city is also known as the City of Lights."]
... )
>>> instance.question_from_string("What is the capital of France?")
>>> print(instance.question.question)
# What is the capital of France?

See also

from_dict

Creates an Instance object from a dictionary.

from_strings

Creates an Instance object from strings representing the question, answers, and hints.

to_dict()

Converts the Instance object into a dictionary.

Returns:

A dictionary representation of the Instance object including all its attributes.

Return type:

dict[str, Any]

Examples

>>> from hinteval.cores import Instance, Question, Answer, Hint
>>>
>>> question = Question("What is the capital of France?")
>>> answers = [Answer("Paris")]
>>> hints = [Hint("This city is also known as the City of Lights.")]
>>> instance = Instance(question, answers, hints)
>>> print(instance.to_dict())
# {
#     'question': {'question': 'What is the capital of France?', ...},
#     'answers': [{'answer': 'Paris', ...}],
#     'hints': [{'hint': 'This city is also known as the City of Lights.', ...}],
#     'metadata': {}
# }

See also

from_dict

Creates an Instance object from a dictionary.

from_strings

Creates an Instance object from strings representing the question, answers, and hints.

class hinteval.cores.dataset_core.Metric(name: str, value: str | int | float, metadata: Dict[str, str | int | float] = None)

A class used to represent a Metric, which includes a name, a value, and optional metadata.

name

The name of the metric.

Type:

str

value

The value associated with the metric.

Type:

Union[str, int, float]

metadata

The metadata associated with the metric.

Type:

dict[str, Union[str, int, float]]

classmethod from_dict(data: Dict[str, Any])

Creates a Metric instance from a dictionary.

Parameters:

data (dict[str, Any]) – A dictionary containing the keys ‘name’, ‘value’, and ‘metadata’.

Returns:

A new instance of Metric initialized with the data from the dictionary.

Return type:

Metric

Examples

>>> from hinteval.cores import Metric
>>>
>>> data = {'name': 'readability', 'value': 0.4, 'metadata': {'model': 'bert-base'}}
>>> metric = Metric.from_dict(data)
>>> print(metric.name, metric.value, metric.metadata)
# readability 0.4 {'model': 'bert-base'}

See also

to_dict

Converts the Metric instance into a dictionary.

Raises:

KeyError – If the ‘name’, ‘value’, or ‘metadata’ keys are missing in the dictionary.

to_dict()

Converts the Metric instance into a dictionary.

Returns:

A dictionary representation of the Metric instance with keys ‘name’, ‘value’, and ‘metadata’.

Return type:

dict[str, Any]

Examples

>>> from hinteval.cores import Metric
>>>
>>> metric = Metric("readability", 0.4, {"model": "bert-base"})
>>> print(metric.to_dict())
# {'name': 'readability', 'value': 0.4, 'metadata': {'model': 'bert-base'}}

See also

from_dict

Creates a Metric instance from a dictionary.
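
Elsewhere on this page, questions, answers, and hints store their metrics in a dictionary keyed by metric name. The sketch below builds such a mapping from a list of metric dictionaries in the to_dict() layout and rejects duplicate names. The helper metrics_by_name is hypothetical, and the ‘familiarity’ entry is sample data.

```python
def metrics_by_name(metric_dicts):
    """Key a list of metric dictionaries by their 'name' field."""
    keyed = {}
    for metric in metric_dicts:
        name = metric["name"]
        if name in keyed:
            raise ValueError(f"duplicate metric name: {name}")
        keyed[name] = metric
    return keyed


metrics = [
    {"name": "readability", "value": 0.4, "metadata": {"model": "bert-base"}},
    {"name": "familiarity", "value": 1.0, "metadata": {}},
]

keyed = metrics_by_name(metrics)
print(sorted(keyed))
print(keyed["readability"]["value"])
```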

exception hinteval.cores.dataset_core.NotComparableException(operator, name_1, name_2)

class hinteval.cores.dataset_core.Question(question: str, question_type: Dict[str, str] = None, entities: List[Entity] = None, metrics: Dict[str, Metric] = None, metadata: Dict[str, str | int | float] = None)

A class to represent a structured question with associated types, entities, metrics, and optional metadata.

question

The text of the question.

Type:

str

question_type

A dictionary describing the type of the question, for example with ‘major’ and ‘minor’ keys as produced by Dataset.prepare_dataset.

Type:

dict[str, str]

entities

A list of Entity instances associated with the question.

Type:

list[Entity]

metrics

A dictionary of Metric instances associated with the question, keyed by their names.

Type:

dict[str, Metric]

metadata

Optional additional metadata about the question.

Type:

dict[str, Union[str, int, float]]

classmethod from_dict(data)

Creates a Question instance from a dictionary.

Parameters:

data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate a Question object.

Returns:

A new instance of Question initialized from the provided dictionary.

Return type:

Question

Examples

>>> from hinteval.cores import Question
>>>
>>> data = {
...     'question': 'What is the capital of Austria?',
...     'question_type': {'type': 'LOC:LOCATION'},
...     'entities': [{'entity': 'Austria', 'ent_type': 'LOCATION',
...                   'start_index': 23, 'end_index': 30, 'metadata': {'familiarity': 1.0}}],
...     'metrics': {'readability': {'name': 'readability', 'value': 0.8}},
...     'metadata': {'source': 'https://en.wikipedia.org/wiki/Austria'}
... }
>>> question = Question.from_dict(data)
>>> print(question.question)
# What is the capital of Austria?

See also

to_dict

Converts the Question instance into a dictionary.

Raises:

KeyError – If required keys are missing in the dictionary.

to_dict()

Converts the Question instance into a dictionary.

Returns:

A dictionary representation of the Question instance including all its attributes.

Return type:

dict[str, Any]

Examples

>>> from hinteval.cores import Entity, Metric, Question
>>>
>>> question = Question(
...     "What is the capital of Austria?",
...     {"major": "LOC:LOCATION"},
...     [Entity("Austria", "LOCATION", 23, 30, {"familiarity": 1.0})],
...     {"readability": Metric("readability", 0.8)},
...     {"source": "https://en.wikipedia.org/wiki/Austria"}
... )
>>> print(question.to_dict())
# {
#     'question': 'What is the capital of Austria?',
#     'question_type': {'major': 'LOC:LOCATION'},
#     'entities': [
#         {'entity': 'Austria', 'ent_type': 'LOCATION', 'start_index': 23, 'end_index': 30, 'metadata': {'familiarity': 1.0}}
#     ],
#     'metrics': {'readability': {'name': 'readability', 'value': 0.8, 'metadata': {}}},
#     'metadata': {'source': 'https://en.wikipedia.org/wiki/Austria'}
# }

See also

from_dict

Creates a Question instance from a dictionary.

class hinteval.cores.dataset_core.Subset(name: str = 'entire', metadata: Dict[str, str | int | float] = None)

A class to represent a subset of instances, typically used for managing and organizing a collection of instances with associated metadata.

name

The name of the subset.

Type:

str

metadata

Optional additional metadata about the subset.

Type:

dict[str, Union[str, int, float]]

add_instance(instance: Instance, q_id: str = None)

Adds an instance to the subset.

Parameters:
  • instance (Instance) – The instance to be added to the subset.

  • q_id (str, optional) – The unique identifier for the instance (default is None, which generates a new random unique ID).

Raises:

ValueError – If the provided q_id already exists in the subset.

Examples

>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> subset.add_instance(instance, "q1")
>>> # subset["q1"] = instance
>>> print(subset["q1"].answers[0].answer)
# Paris

See also

get_instance

Retrieves an instance by its unique identifier.

remove_instance

Removes an instance from the subset.
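
When q_id is omitted the library generates a random identifier; callers who prefer readable, predictable keys can choose their own and avoid the ValueError raised for duplicates. The sketch below picks the first unused identifier from a list such as the one returned by get_instance_ids. The helper next_free_id and the "q" prefix are hypothetical conventions, not part of hinteval.

```python
def next_free_id(existing_ids, prefix="q"):
    """Return the first '<prefix><n>' (n = 1, 2, ...) not present in `existing_ids`."""
    taken = set(existing_ids)
    number = 1
    while f"{prefix}{number}" in taken:
        number += 1
    return f"{prefix}{number}"


print(next_free_id(["q1", "q2"]))  # q3
print(next_free_id(["q1", "q3"]))  # q2
print(next_free_id([]))            # q1
```

With a real Subset, the call would look like subset.add_instance(instance, next_free_id(subset.get_instance_ids())).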

classmethod from_dict(data)

Creates a Subset instance from a dictionary.

Parameters:

data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate a Subset object.

Returns:

A new Subset object initialized from the provided dictionary.

Return type:

Subset

Examples

>>> from hinteval.cores import Subset
>>>
>>> data = {
...     'name': 'training_set',
...     'metadata': {},
...     'instances': {
...         'q1': {
...             'question': {'question': 'What is the capital of France?'},
...             'answers': [{'answer': 'Paris'}],
...             'hints': [],
...             'metadata': {}
...         },
...         'q2': {
...             'question': {'question': 'What is the capital of Austria?'},
...             'answers': [{'answer': 'Vienna'}],
...             'hints': [],
...             'metadata': {}
...         }
...     }
... }
>>> subset = Subset.from_dict(data)
>>> print(subset.name)
# training_set
>>> for instance_id in subset.get_instance_ids():
...     print(instance_id, subset[instance_id].question.question)
# q1 What is the capital of France?
# q2 What is the capital of Austria?

Raises:

KeyError – If required keys are missing in the dictionary.

See also

to_dict

Converts the Subset instance into a dictionary.

get_instance(q_id: str)

Retrieves an instance by its unique identifier.

Parameters:

q_id (str) – The unique identifier of the instance to retrieve.

Returns:

The instance associated with the given q_id.

Return type:

Instance

Raises:

ValueError – If the q_id does not exist in the subset.

Examples

>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> subset.add_instance(instance, "q1")
>>> instance = subset.get_instance("q1")
>>> # instance = subset["q1"]
>>> print(instance.question.question)
# What is the capital of France?
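
get_instance raises ValueError for an unknown q_id, so callers that expect misses may want a lookup that returns a fallback instead. A minimal sketch, not part of hinteval: get_instance_or_default is a hypothetical helper, and _StandInSubset is a toy stand-in that mimics only the documented lookup error so the snippet runs without the library installed.

```python
class _StandInSubset:
    """Toy stand-in (NOT hinteval's Subset): mimics only the documented lookup error."""

    def __init__(self, instances):
        self._instances = dict(instances)

    def get_instance(self, q_id):
        # Documented behaviour: an unknown q_id raises ValueError.
        if q_id not in self._instances:
            raise ValueError(f"q_id {q_id!r} does not exist")
        return self._instances[q_id]


def get_instance_or_default(subset, q_id, default=None):
    """Hypothetical helper: return the instance, or `default` if q_id is unknown."""
    try:
        return subset.get_instance(q_id)
    except ValueError:
        return default


subset = _StandInSubset({"q1": "instance A"})
found = get_instance_or_default(subset, "q1")
absent = get_instance_or_default(subset, "q9")  # unknown ID falls back to None
print(found, absent)
```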

See also

add_instance

Adds an instance to the subset.

remove_instance

Removes an instance from the subset.

get_instances

Retrieves all instances in the subset.

get_instance_ids

Retrieves all instance identifiers in the subset.

get_instance_ids()

Retrieves all instance identifiers in the subset.

Returns:

A list of all instance identifiers in the subset.

Return type:

list[str]

Examples

>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance_1 = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> instance_2 = Instance.from_strings("What is the capital of Austria?", ["Vienna"], [])
>>> subset["q1"] = instance_1
>>> subset["q2"] = instance_2
>>> ids = subset.get_instance_ids()
>>> print(ids)
# ['q1', 'q2']

See also

get_instance

Retrieves an instance by its unique identifier.

get_instances

Retrieves all instances in the subset.

get_instances()

Retrieves all instances in the subset.

Returns:

A list of all instances in the subset.

Return type:

list[Instance]

Examples

>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance_1 = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> instance_2 = Instance.from_strings("What is the capital of Austria?", ["Vienna"], [])
>>> subset["q1"] = instance_1
>>> subset["q2"] = instance_2
>>> instances = subset.get_instances()
>>> print([instance.question.question for instance in instances])
# ['What is the capital of France?', 'What is the capital of Austria?']

See also

get_instance

Retrieves an instance by its unique identifier.

get_instance_ids

Retrieves all instance identifiers in the subset.

remove_instance(q_id: str)

Removes an instance from the subset.

Parameters:

q_id (str) – The unique identifier of the instance to be removed.

Raises:

ValueError – If the q_id does not exist in the subset.

Examples

>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> subset.add_instance(instance, "q1")
>>> print(len(subset))
# 1
>>> subset.remove_instance("q1")
>>> # del subset["q1"]
>>> print(len(subset))
# 0

See also

get_instance

Retrieves an instance by its unique identifier.

add_instance

Adds an instance to the subset.

to_dict()

Converts the Subset instance into a dictionary.

Returns:

A dictionary representation of the Subset instance.

Return type:

dict[str, Any]

Examples

>>> from hinteval.cores import Subset, Instance
>>>
>>> instance1 = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> instance2 = Instance.from_strings("What is the capital of Austria?", ["Vienna"], [])
>>> subset = Subset(name='training_set')
>>> subset.add_instance(instance1, "q1")
>>> subset.add_instance(instance2, "q2")
>>> subset_dict = subset.to_dict()
>>> print(subset_dict)
# {
#     'name': 'training_set',
#     'metadata': {},
#     'instances': {
#         'q1': {
#             'question': {'question': 'What is the capital of France?', ...},
#             'answers': [{'answer': 'Paris', ...}],
#             'hints': [],
#             'metadata': {}
#         },
#         'q2': {
#             'question': {'question': 'What is the capital of Austria?', ...},
#             'answers': [{'answer': 'Vienna', ...}],
#             'hints': [],
#             'metadata': {}
#         }
#     }
# }
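
The dictionary returned by to_dict can be written to disk and later passed back to from_dict. A minimal sketch using the standard json module, assuming every value in the dictionary is JSON-serializable (the source does not state this). The dictionary below is hand-written in the shape shown above, with the elided fields left out, rather than produced by hinteval.

```python
import json

# Hand-written stand-in for the output of subset.to_dict(); elided fields omitted.
subset_dict = {
    "name": "training_set",
    "metadata": {},
    "instances": {
        "q1": {
            "question": {"question": "What is the capital of France?"},
            "answers": [{"answer": "Paris"}],
            "hints": [],
            "metadata": {},
        }
    },
}

# Serialize to a JSON string, then parse it back into a plain dictionary.
text = json.dumps(subset_dict, indent=2)
restored = json.loads(text)
print(restored == subset_dict)
```

If the round trip preserves the dictionary, the restored value can be handed to Subset.from_dict in place of the original.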

See also

from_dict

Creates a Subset instance from a dictionary.