Dataset¶
- class hinteval.cores.dataset.dataset.Dataset(name: str = None, url: str = None, version: str = None, description: str = None, metadata: Dict[str, str | int | float] = None)¶
A class to represent a dataset, including its subsets and associated metadata.
- name¶
The name of the dataset.
- Type:
str
- url¶
The URL where the dataset can be accessed.
- Type:
str
- version¶
The version of the dataset.
- Type:
str
- description¶
A description of the dataset.
- Type:
str
- metadata¶
Additional metadata about the dataset.
- Type:
dict[str, Union[str, int, float]]
- add_subset(subset: Subset)¶
Adds a subset to the dataset.
- Parameters:
subset (Subset) – The subset to be added to the dataset.
- Raises:
ValueError – If the subset name already exists in the dataset.
Examples
>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> subset = Subset(name='training_set')
>>> dataset.add_subset(subset)
>>> print(dataset["training_set"].name)
# training_set
See also
remove_subset – Removes a subset from the dataset.
get_subset – Retrieves a subset by name.
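The uniqueness rule for subset names can be pictured with a small stand-in container. This is a minimal sketch, not hinteval's implementation: `NamedCollection` and its internal dictionary are hypothetical, and only the documented behaviour is mirrored (a ValueError for a duplicate or unknown name, plus `[]` and `del` access).

```python
class NamedCollection:
    """Hypothetical stand-in for a container keyed by unique names."""

    def __init__(self):
        self._items = {}  # name -> item, kept in insertion order

    def add(self, name, item):
        # Reject a name that is already present instead of overwriting it.
        if name in self._items:
            raise ValueError(f"'{name}' already exists")
        self._items[name] = item

    def get(self, name):
        if name not in self._items:
            raise ValueError(f"'{name}' does not exist")
        return self._items[name]

    def remove(self, name):
        if name not in self._items:
            raise ValueError(f"'{name}' does not exist")
        del self._items[name]

    def names(self):
        return list(self._items)

    # Let collection[name] and del collection[name] reuse the methods above.
    __getitem__ = get
    __delitem__ = remove


collection = NamedCollection()
collection.add("training_set", object())

try:
    collection.add("training_set", object())
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected, collection.names())
```

Raising on a duplicate name, rather than silently replacing the entry, keeps an accidental second `add` from discarding data that was already registered.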
- classmethod available_datasets(show_info=False, update=False) → Dict¶
Retrieves the datasets that are available to download.
- Parameters:
update (bool, optional) – Whether to update the dataset list from the remote source (default is False).
show_info (bool, optional) – Whether to show the information about the available datasets (default is False).
- Returns:
A dictionary of available datasets.
- Return type:
dict
- Raises:
Exception – If the dataset list cannot be loaded from the remote source.
Examples
>>> from hinteval import Dataset
>>>
>>> available_datasets = Dataset.available_datasets()
>>> print(available_datasets['triviahg']['description'])
# TriviaHG is an extensive dataset crafted specifically for hint generation in question answering.
See also
download_and_load_dataset – Loads a dataset from a local cache or downloads it if not available locally.
- classmethod download_and_load_dataset(name, force_download=False)¶
Loads a dataset from a local cache or downloads it if not available locally.
- Parameters:
name (str) – The name of the dataset to load.
force_download (bool, optional) – Whether to force download the dataset even if it already exists locally (default is False).
- Returns:
A new Dataset object initialized from the loaded data.
- Return type:
Dataset
- Raises:
FileNotFoundError – If the dataset name does not exist in the available datasets.
Exception – If the dataset fails to download.
Examples
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset.download_and_load_dataset('triviahg', force_download=True)
>>> print(dataset.description)
# TriviaHG is an extensive dataset crafted specifically for hint generation in question answering.
See also
available_datasets – Retrieves the datasets that are available to download.
load_json – Loads a Dataset instance from a JSON file.
load – Loads a Dataset instance from a file.
- get_subset(name: str)¶
Retrieves a subset by name.
- Parameters:
name (str) – The name of the subset to retrieve.
- Returns:
The subset associated with the given name.
- Return type:
Subset
- Raises:
ValueError – If the subset name does not exist in the dataset.
Examples
>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> subset = Subset(name='training_set')
>>> dataset.add_subset(subset)
>>> subset = dataset.get_subset('training_set')
>>> # subset = dataset['training_set']
>>> print(subset.name)
# training_set
See also
add_subset – Adds a subset to the dataset.
remove_subset – Removes a subset from the dataset.
get_subsets_name – Retrieves the names of all subsets in the dataset.
get_subsets – Retrieves all subsets in the dataset.
- get_subsets()¶
Retrieves all subsets in the dataset.
- Returns:
A list of all subsets in the dataset.
- Return type:
list[Subset]
Examples
>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> training = Subset(name='training_set')
>>> validation = Subset(name='validation_set')
>>> dataset.add_subset(training)
>>> dataset.add_subset(validation)
>>> subsets = dataset.get_subsets()
>>> print([subset.name for subset in subsets])
# ['training_set', 'validation_set']
See also
add_subset – Adds a subset to the dataset.
remove_subset – Removes a subset from the dataset.
get_subset – Retrieves a subset by name.
get_subsets_name – Retrieves the names of all subsets in the dataset.
- get_subsets_name()¶
Retrieves the names of all subsets in the dataset.
- Returns:
A list of all subset names in the dataset.
- Return type:
list[str]
Examples
>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> training = Subset(name='training_set')
>>> validation = Subset(name='validation_set')
>>> dataset.add_subset(training)
>>> dataset.add_subset(validation)
>>> subset_names = dataset.get_subsets_name()
>>> print(subset_names)
# ['training_set', 'validation_set']
See also
add_subset – Adds a subset to the dataset.
remove_subset – Removes a subset from the dataset.
get_subset – Retrieves a subset by name.
get_subsets – Retrieves all subsets in the dataset.
- classmethod load(path)¶
Loads a Dataset instance from a file.
- Parameters:
path (str) – The file path to load the Dataset instance.
- Returns:
A new Dataset object initialized from the loaded file.
- Return type:
Dataset
- Raises:
FileNotFoundError – If the specified file path does not exist.
Exception – If the specified file is corrupted.
Examples
>>> from hinteval import Dataset
>>>
>>> dataset_loaded = Dataset.load('./dataset.pickle')
>>> print(dataset_loaded.name)
# Hint_Dataset
See also
store_json – Stores the Dataset instance as a JSON file.
store – Stores the Dataset instance.
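The `.pickle` extension in the example suggests, although this section does not state it, that `store` and `load` rely on Python's `pickle` module. Under that assumption, the round trip looks roughly like the sketch below. It uses a plain dictionary as a hypothetical stand-in for a Dataset and a temporary directory instead of a real path.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a Dataset; no hinteval object is used here.
payload = {"name": "Hint_Dataset", "subsets": {"training_set": {}}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.pickle")

    # Store: serialize the object to a binary file.
    with open(path, "wb") as handle:
        pickle.dump(payload, handle)

    # Load: read the same file back into a new object.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

    # Opening a missing path raises FileNotFoundError, the same
    # exception type documented for load.
    try:
        with open(os.path.join(tmp_dir, "missing.pickle"), "rb"):
            pass
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(restored["name"], missing_raises)
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from trusted sources. The JSON variants avoid that risk.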
- classmethod load_json(path)¶
Loads a Dataset instance from a JSON file.
- Parameters:
path (str) – The file path to load the JSON representation of the Dataset instance.
- Returns:
A new Dataset object initialized from the JSON file.
- Return type:
Dataset
- Raises:
FileNotFoundError – If the specified file path does not exist.
KeyError – If required keys are missing in the JSON file.
Examples
>>> from hinteval import Dataset
>>>
>>> dataset_loaded = Dataset.load_json('./dataset.json')
>>> print(dataset_loaded.name)
# Hint_Dataset
See also
store_json – Stores the Dataset instance as a JSON file.
store – Stores the Dataset instance.
- prepare_dataset(fill_question_types=True, fill_entities=False, batch_size: int = 256, spacy_pipeline: Literal['en_core_web_sm', 'en_core_web_lg', 'en_core_web_md', 'en_core_web_trf'] = 'en_core_web_sm', qc_model_force_download=False, enable_tqdm=False)¶
Prepares the dataset by detecting question types for questions and entities for questions, hints, and answers.
- Parameters:
fill_question_types (bool, optional) – Whether to detect question types (default is True).
fill_entities (bool, optional) – Whether to detect entities (default is False).
batch_size (int, optional) – The batch size for processing (default is 256).
spacy_pipeline ({'en_core_web_sm', 'en_core_web_lg', 'en_core_web_md', 'en_core_web_trf'}, optional) – The spaCy pipeline to use for entity recognition (default is ‘en_core_web_sm’).
qc_model_force_download (bool, optional) – Whether to force download the question classification model (default is False).
enable_tqdm (bool, optional) – Whether to enable tqdm progress bar (default is False).
Examples
>>> from hinteval.cores import Subset, Instance
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset("Hint_Dataset")
>>> subset = Subset(name="training_set")
>>> dataset.add_subset(subset)
>>> instance = Instance.from_strings("What is the capital of Austria?",
...                                  ["Vienna"],
...                                  ["This city, once home to Mozart and Beethoven."])
>>> subset.add_instance(instance, "q_1")
>>> dataset.prepare_dataset(fill_question_types=True, fill_entities=True,
...                         batch_size=64,
...                         spacy_pipeline='en_core_web_sm',
...                         qc_model_force_download=False,
...                         enable_tqdm=False)
>>> print(instance.question)
# {
#     "question": "What is the capital of Austria?",
#     "question_type": {
#         "major": "LOC:LOCATION",
#         "minor": "other:Other location"
#     },
#     "entities": [
#         {
#             "entity": "Austria",
#             "ent_type": "GPE",
#             "start_index": 23,
#             "end_index": 30,
#             "metadata": {}
#         }
#     ],
#     "metrics": {},
#     "metadata": {}
# }
See also
utils.identify_functions.identify_entities – Function to detect entities in instances.
utils.identify_functions.identify_question_type – Function to detect question types for questions.
- remove_subset(name: str)¶
Removes a subset from the dataset.
- Parameters:
name (str) – The name of the subset to be removed.
- Raises:
ValueError – If the subset name does not exist in the dataset.
Examples
>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> subset = Subset(name='training_set')
>>> dataset.add_subset(subset)
>>> print(dataset.get_subsets_name())
# ['training_set']
>>> dataset.remove_subset('training_set')
>>> # del dataset['training_set']
>>> print(dataset.get_subsets_name())
# []
See also
add_subset – Adds a subset to the dataset.
get_subset – Retrieves a subset by name.
- store(path)¶
Stores the Dataset instance.
- Parameters:
path (str) – The file path to store the Dataset instance.
Examples
>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> training = Subset(name='training_set')
>>> validation = Subset(name='validation_set')
>>> dataset.add_subset(training)
>>> dataset.add_subset(validation)
>>> dataset.store('./dataset.pickle')
- store_json(path)¶
Stores the Dataset instance as a JSON file.
- Parameters:
path (str) – The file path to store the JSON representation of the Dataset instance.
Examples
>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> dataset = Dataset('Hint_Dataset')
>>> training = Subset(name='training_set')
>>> validation = Subset(name='validation_set')
>>> dataset.add_subset(training)
>>> dataset.add_subset(validation)
>>> dataset.store_json('./dataset.json')
- to_dict()¶
Converts the Dataset instance into a dictionary.
- Returns:
A dictionary representation of the Dataset instance.
- Return type:
dict[str, Any]
Examples
>>> from hinteval.cores import Subset
>>> from hinteval import Dataset
>>>
>>> subset1 = Subset(name='training_set')
>>> subset2 = Subset(name='validation_set')
>>> dataset = Dataset(name='example_dataset')
>>> dataset.add_subset(subset1)
>>> dataset.add_subset(subset2)
>>> dataset_dict = dataset.to_dict()
>>> print(dataset_dict)
# {
#     'name': 'example_dataset',
#     'version': None,
#     'description': None,
#     'url': None,
#     'metadata': {},
#     'subsets': {
#         'training_set': {'name': 'training_set', ...},
#         'validation_set': {'name': 'validation_set', ...}
#     }
# }
See also
from_dict – Creates a Dataset instance from a dictionary.
store_json – Stores the Dataset instance as a JSON file.
store – Stores the Dataset instance.
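A dictionary in the shape shown for `to_dict` contains only strings, `None`, and nested dictionaries, so it can be written with the standard `json` module. The sketch below round-trips such a dictionary. Whether `store_json` writes exactly this layout is an assumption, and the subset bodies are trimmed to the three keys documented for `Subset.to_dict`.

```python
import json

# Dictionary in the shape shown for Dataset.to_dict; subset contents are
# trimmed placeholders, not real hinteval output.
dataset_dict = {
    "name": "example_dataset",
    "version": None,
    "description": None,
    "url": None,
    "metadata": {},
    "subsets": {
        "training_set": {"name": "training_set", "metadata": {}, "instances": {}},
        "validation_set": {"name": "validation_set", "metadata": {}, "instances": {}},
    },
}

# Serialize to JSON text and parse it back.
text = json.dumps(dataset_dict, indent=2)
restored = json.loads(text)

# None becomes JSON null and returns as None; key order is preserved.
subset_names = list(restored["subsets"])
print(subset_names)
```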
- class hinteval.cores.dataset_core.Answer(answer: str, entities: List[Entity] = None, metrics: Dict[str, Metric] = None, metadata: Dict[str, str | int | float] = None)¶
A class to represent an answer with associated entities, metrics, and optional metadata.
- answer¶
The text of the answer.
- Type:
str
- metrics¶
A dictionary of Metric instances associated with the answer, keyed by their names.
- Type:
dict[str, Metric]
- metadata¶
Optional additional metadata about the answer.
- Type:
dict[str, Union[str, int, float]]
- classmethod from_dict(data)¶
Creates an Answer instance from a dictionary.
- Parameters:
data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate an Answer object.
- Returns:
A new instance of Answer initialized from the provided dictionary.
- Return type:
Answer
Examples
>>> from hinteval.cores import Answer
>>>
>>> data = {
...     'answer': 'Vienna',
...     'entities': [{'entity': 'Vienna', 'ent_type': 'LOCATION',
...                   'start_index': 0, 'end_index': 6}],
...     'metrics': {'familiarity': {'name': 'familiarity', 'value': 1.0}},
...     'metadata': {'source': 'https://en.wikipedia.org/wiki/Austria'}
... }
>>> answer = Answer.from_dict(data)
>>> print(answer.answer)
# Vienna
See also
to_dict – Converts the Answer instance into a dictionary.
- Raises:
KeyError – If required keys are missing in the dictionary.
- to_dict()¶
Converts the Answer instance into a dictionary.
- Returns:
A dictionary representation of the Answer instance including all its attributes.
- Return type:
dict[str, Any]
Examples
>>> from hinteval.cores import Entity, Metric, Answer
>>>
>>> answer = Answer(
...     "Vienna",
...     [Entity("Vienna", "LOCATION", 0, 6)],
...     {"familiarity": Metric("familiarity", 1.0)},
...     {"source": "https://en.wikipedia.org/wiki/Austria"}
... )
>>> print(answer.to_dict())
# {
#     'answer': 'Vienna',
#     'entities': [
#         {'entity': 'Vienna', 'ent_type': 'LOCATION', 'start_index': 0, 'end_index': 6}
#     ],
#     'metrics': {'familiarity': {'name': 'familiarity', 'value': 1.0}},
#     'metadata': {'source': 'https://en.wikipedia.org/wiki/Austria'}
# }
See also
from_dict – Creates an Answer instance from a dictionary.
- class hinteval.cores.dataset_core.Entity(entity: str, ent_type: str, start_index: int, end_index: int, metadata: Dict[str, str | int | float] = None)¶
A class to represent an entity with its type and position in the text, along with optional metadata.
- entity¶
The textual representation of the entity.
- Type:
str
- ent_type¶
The type of the entity, e.g., ‘PERSON’, ‘LOCATION’.
- Type:
str
- start_index¶
The start index of the entity in the text.
- Type:
int
- end_index¶
The end index of the entity in the text, non-inclusive.
- Type:
int
- metadata¶
Optional additional metadata about the entity.
- Type:
dict[str, Union[str, int, float]]
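Because `end_index` is non-inclusive, `start_index` and `end_index` can be passed directly to a Python slice to recover the entity text. The sketch below checks this against the question and indices from the `prepare_dataset` example. `find_span` is a hypothetical helper written for this illustration and is not part of hinteval.

```python
def find_span(text, mention):
    """Hypothetical helper: return (start_index, end_index) of the first match."""
    start = text.find(mention)
    if start == -1:
        raise ValueError(f"'{mention}' not found in text")
    # end_index is non-inclusive, so it is start plus the mention length.
    return start, start + len(mention)


question = "What is the capital of Austria?"
entity = {"entity": "Austria", "ent_type": "GPE", "start_index": 23, "end_index": 30}

# Slicing with the stored indices yields the entity text.
span = question[entity["start_index"]:entity["end_index"]]
computed = find_span(question, "Austria")
print(span, computed)
```

With this convention, `end_index - start_index` equals the length of the entity text.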
- classmethod from_dict(data: Dict[str, Any])¶
Creates an Entity instance from a dictionary.
- Parameters:
data (dict[str, Any]) – A dictionary containing keys ‘entity’, ‘ent_type’, ‘start_index’, ‘end_index’, and ‘metadata’.
- Returns:
A new instance of Entity initialized from the provided dictionary.
- Return type:
Entity
Examples
>>> from hinteval.cores import Entity
>>>
>>> data = {'entity': 'Lionel Messi', 'ent_type': 'PERSON', 'start_index': 0,
...         'end_index': 12, 'metadata': {'familiarity': 1.0}}
>>> entity = Entity.from_dict(data)
>>> print(entity.entity, entity.metadata['familiarity'])
# Lionel Messi 1.0
See also
to_dict – Converts the Entity instance into a dictionary.
- Raises:
KeyError – If any of the required keys (‘entity’, ‘ent_type’, ‘start_index’, ‘end_index’) are missing in the dictionary.
- to_dict()¶
Converts the Entity instance into a dictionary.
- Returns:
A dictionary representing the Entity with keys ‘entity’, ‘ent_type’, ‘start_index’, ‘end_index’, and ‘metadata’.
- Return type:
dict[str, Any]
Examples
>>> from hinteval.cores import Entity
>>>
>>> entity = Entity("Lionel Messi", "PERSON", 0, 12, {"familiarity": 1.0})
>>> print(entity.to_dict())
# {'entity': 'Lionel Messi', 'ent_type': 'PERSON', 'start_index': 0, 'end_index': 12, 'metadata': {'familiarity': 1.0}}
See also
from_dict – Creates an Entity instance from a dictionary.
- class hinteval.cores.dataset_core.Hint(hint, source: str = None, entities: List[Entity] = None, metrics: Dict[str, Metric] = None, metadata: Dict[str, str | int | float] = None)¶
A class to represent a hint associated with questions and answers, including sources, entities, metrics, and optional metadata.
- hint¶
The text of the hint.
- Type:
str
- source¶
The source from which the hint was derived.
- Type:
str, optional
- metrics¶
A dictionary of Metric instances associated with the hint, keyed by their names.
- Type:
dict[str, Metric]
- metadata¶
Optional additional metadata about the hint.
- Type:
dict[str, Union[str, int, float]]
- classmethod from_dict(data)¶
Creates a Hint instance from a dictionary.
- Parameters:
data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate a Hint object.
- Returns:
A new instance of Hint initialized from the provided dictionary.
- Return type:
Hint
Examples
>>> from hinteval.cores import Hint
>>>
>>> data = {
...     'hint': 'This city, once home to Mozart and Beethoven, is famous for its music and culture.',
...     'source': 'https://en.wikipedia.org/wiki/Vienna',
...     'entities': [
...         {'entity': 'Mozart', 'ent_type': 'PERSON', 'start_index': 24, 'end_index': 30, 'metadata': {'url': 'https://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart'}},
...         {'entity': 'Beethoven', 'ent_type': 'PERSON', 'start_index': 35, 'end_index': 44, 'metadata': {'url': 'https://en.wikipedia.org/wiki/Ludwig_van_Beethoven'}}
...     ],
...     'metrics': {'relevance': {'name': 'relevance', 'value': 0.9}},
...     'metadata': {}
... }
>>> hint = Hint.from_dict(data)
>>> print(hint.hint)
# This city, once home to Mozart and Beethoven, is famous for its music and culture.
See also
to_dict – Converts the Hint instance into a dictionary.
- Raises:
KeyError – If required keys are missing in the dictionary.
- to_dict()¶
Converts the Hint instance into a dictionary.
- Returns:
A dictionary representation of the Hint instance including all its attributes.
- Return type:
dict[str, Any]
Examples
>>> from hinteval.cores import Entity, Metric, Hint
>>>
>>> hint = Hint(
...     "This city, once home to Mozart and Beethoven, is famous for its music and culture.",
...     "https://en.wikipedia.org/wiki/Vienna",
...     [Entity("Mozart", "PERSON", 24, 30, {"url": "https://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart"}),
...      Entity("Beethoven", "PERSON", 35, 44, {"url": "https://en.wikipedia.org/wiki/Ludwig_van_Beethoven"})],
...     {"relevance": Metric("relevance", 0.9)}
... )
>>> print(hint.to_dict())
# {
#     'hint': 'This city, once home to Mozart and Beethoven, is famous for its music and culture.',
#     'source': 'https://en.wikipedia.org/wiki/Vienna',
#     'entities': [
#         {'entity': 'Mozart', 'ent_type': 'PERSON', 'start_index': 24, 'end_index': 30, 'metadata': {'url': 'https://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart'}},
#         {'entity': 'Beethoven', 'ent_type': 'PERSON', 'start_index': 35, 'end_index': 44, 'metadata': {'url': 'https://en.wikipedia.org/wiki/Ludwig_van_Beethoven'}}
#     ],
#     'metrics': {'relevance': {'name': 'relevance', 'value': 0.9, 'metadata': {}}},
#     'metadata': {}
# }
See also
from_dict – Creates a Hint instance from a dictionary.
- class hinteval.cores.dataset_core.Instance(question: Question, answers: List[Answer], hints: List[Hint], metadata: Dict[str, str | int | float] = None)¶
A class to represent a question-and-answer instance along with associated hints and metadata.
- hints¶
A list of Hint objects providing additional context or clues for the question.
- Type:
list[Hint]
- metadata¶
Optional additional metadata about the instance.
- Type:
dict[str, Union[str, int, float]]
- answers_from_strings(answers: List[str])¶
Updates the answers of the instance using a list of string representations.
- Parameters:
answers (list[str]) – A list of strings where each string represents an answer.
Examples
>>> from hinteval.cores import Instance
>>>
>>> instance = Instance.from_strings(
...     "What is the capital of France?",
...     ['Lyon'],
...     ["This city is also known as the City of Lights."]
... )
>>> instance.answers_from_strings(["Paris"])
>>> print([answer.answer for answer in instance.answers])
# ['Paris']
See also
from_dict – Creates an Instance object from a dictionary.
from_strings – Creates an Instance object from strings representing the question, answers, and hints.
- classmethod from_dict(data)¶
Creates an Instance object from a dictionary.
- Parameters:
data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate an Instance object.
- Returns:
A new Instance object initialized from the provided dictionary.
- Return type:
Instance
Examples
>>> from hinteval.cores import Instance
>>>
>>> data = {
...     'question': {'question': 'What is the capital of France?'},
...     'answers': [{'answer': 'Paris'}],
...     'hints': [{'hint': 'This city is also known as the City of Lights.'}]
... }
>>> instance = Instance.from_dict(data)
>>> print(instance.question.question)
# What is the capital of France?
See also
to_dict – Converts the Instance object into a dictionary.
from_strings – Creates an Instance object from strings representing the question, answers, and hints.
- Raises:
KeyError – If required keys are missing in the dictionary.
- classmethod from_strings(question: str, answers: List[str], hints: List[str])¶
Creates an Instance object from strings representing the question, answers, and hints.
- Parameters:
question (str) – The text of the primary question.
answers (list[str]) – A list of strings representing the answers.
hints (list[str]) – A list of strings representing the hints.
- Returns:
A new Instance object populated with the converted question, answers, and hints.
- Return type:
Instance
Examples
>>> from hinteval.cores import Instance
>>>
>>> instance = Instance.from_strings(
...     "What is the capital of France?",
...     ["Paris"],
...     ["This city is also known as the City of Lights."]
... )
>>> print(instance.question.question)
# What is the capital of France?
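The relationship between the string form accepted by `from_strings` and the dictionary form accepted by `from_dict` can be sketched without hinteval. `instance_dict_from_strings` below is a hypothetical helper. It only builds the minimal dictionary shape used in the `from_dict` example and makes no claim about how `from_strings` is implemented.

```python
def instance_dict_from_strings(question, answers, hints):
    """Hypothetical helper: wrap plain strings in the from_dict example's shape."""
    return {
        "question": {"question": question},
        "answers": [{"answer": text} for text in answers],
        "hints": [{"hint": text} for text in hints],
    }


data = instance_dict_from_strings(
    "What is the capital of France?",
    ["Paris"],
    ["This city is also known as the City of Lights."],
)
print(data["question"]["question"])
```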
- hints_from_strings(hints: List[str])¶
Updates the hints of the instance using a list of string representations.
- Parameters:
hints (list[str]) – A list of strings where each string represents a hint.
Examples
>>> from hinteval.cores import Instance
>>>
>>> instance = Instance.from_strings(
...     "What is the capital of France?",
...     ["Paris"],
...     []
... )
>>> instance.hints_from_strings(["This city is also known as the City of Lights.", "It is a major European city."])
>>> print([hint.hint for hint in instance.hints])
# ['This city is also known as the City of Lights.', 'It is a major European city.']
See also
from_dict – Creates an Instance object from a dictionary.
from_strings – Creates an Instance object from strings representing the question, answers, and hints.
- question_from_string(question: str)¶
Updates the question of the instance using a string representation.
- Parameters:
question (str) – The text of the question.
Examples
>>> from hinteval.cores import Instance
>>>
>>> instance = Instance.from_strings(
...     "What is the capital of Italy?",
...     ["Paris"],
...     ["This city is also known as the City of Lights."]
... )
>>> instance.question_from_string("What is the capital of France?")
>>> print(instance.question.question)
# What is the capital of France?
See also
from_dict – Creates an Instance object from a dictionary.
from_strings – Creates an Instance object from strings representing the question, answers, and hints.
- to_dict()¶
Converts the Instance object into a dictionary.
- Returns:
A dictionary representation of the Instance object including all its attributes.
- Return type:
dict[str, Any]
Examples
>>> from hinteval.cores import Instance, Question, Answer, Hint
>>>
>>> question = Question("What is the capital of France?")
>>> answers = [Answer("Paris")]
>>> hints = [Hint("This city is also known as the City of Lights.")]
>>> instance = Instance(question, answers, hints)
>>> print(instance.to_dict())
# {
#     'question': {'question': 'What is the capital of France?', ...},
#     'answers': [{'answer': 'Paris', ...}],
#     'hints': [{'hint': 'This city is also known as the City of Lights.', ...}],
#     'metadata': {}
# }
See also
from_dict – Creates an Instance object from a dictionary.
from_strings – Creates an Instance object from strings representing the question, answers, and hints.
- class hinteval.cores.dataset_core.Metric(name: str, value: str | int | float, metadata: Dict[str, str | int | float] = None)¶
A class used to represent a Metric, which includes a name, a value, and optional metadata.
- name¶
The name of the metric.
- Type:
str
- value¶
The value associated with the metric.
- Type:
Union[str, int, float]
- metadata¶
The metadata associated with the metric.
- Type:
dict[str, Union[str, int, float]]
- classmethod from_dict(data: Dict[str, Any])¶
Creates a Metric instance from a dictionary.
- Parameters:
data (dict[str, Any]) – A dictionary containing the keys ‘name’, ‘value’, and ‘metadata’.
- Returns:
A new instance of Metric initialized with the data from the dictionary.
- Return type:
Metric
Examples
>>> from hinteval.cores import Metric
>>>
>>> data = {'name': 'readability', 'value': 0.4, 'metadata': {'model': 'bert-base'}}
>>> metric = Metric.from_dict(data)
>>> print(metric.name, metric.value, metric.metadata)
# readability 0.4 {'model': 'bert-base'}
See also
to_dict – Converts the Metric instance into a dictionary.
- Raises:
KeyError – If the ‘name’, ‘value’, or ‘metadata’ keys are missing in the dictionary.
- to_dict()¶
Converts the Metric instance into a dictionary.
- Returns:
A dictionary representation of the Metric instance with keys ‘name’, ‘value’, and ‘metadata’.
- Return type:
dict[str, Any]
Examples
>>> from hinteval.cores import Metric
>>>
>>> metric = Metric("readability", 0.4, {"model": "bert-base"})
>>> print(metric.to_dict())
# {'name': 'readability', 'value': 0.4, 'metadata': {'model': 'bert-base'}}
See also
from_dict – Creates a Metric instance from a dictionary.
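The `to_dict` / `from_dict` pairing shared by the classes in this module follows a common round-trip pattern. The sketch below illustrates it with `MetricSketch`, a hypothetical stand-in dataclass that only mirrors the documented Metric fields and is not hinteval's implementation. Treating `metadata` as optional is an assumption of the sketch, made because the nested metric dictionaries in the Answer and Question examples omit that key.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union

Scalar = Union[str, int, float]


@dataclass
class MetricSketch:
    """Hypothetical stand-in mirroring the documented Metric fields."""

    name: str
    value: Scalar
    metadata: Dict[str, Scalar] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MetricSketch":
        # Indexing with [] raises KeyError when 'name' or 'value' is missing.
        # Assumption of this sketch: 'metadata' defaults to an empty dict.
        return cls(data["name"], data["value"], dict(data.get("metadata", {})))

    def to_dict(self) -> Dict[str, Any]:
        # Copy metadata so callers cannot mutate the instance through the result.
        return {"name": self.name, "value": self.value, "metadata": dict(self.metadata)}


original = MetricSketch("readability", 0.4, {"model": "bert-base"})
round_tripped = MetricSketch.from_dict(original.to_dict())

try:
    MetricSketch.from_dict({"name": "readability"})
    missing_key_raises = False
except KeyError:
    missing_key_raises = True

print(round_tripped == original, missing_key_raises)
```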
- exception hinteval.cores.dataset_core.NotComparableException(operator, name_1, name_2)¶
- class hinteval.cores.dataset_core.Question(question: str, question_type: Dict[str, str] = None, entities: List[Entity] = None, metrics: Dict[str, Metric] = None, metadata: Dict[str, str | int | float] = None)¶
A class to represent a structured question with associated types, entities, metrics, and optional metadata.
- question¶
The text of the question.
- Type:
str
- question_type¶
A dictionary mapping question-type levels (for example, ‘major’ and ‘minor’) to their labels.
- Type:
dict[str, str]
- metrics¶
A dictionary of Metric instances associated with the question, keyed by their names.
- Type:
dict[str, Metric]
- metadata¶
Optional additional metadata about the question.
- Type:
dict[str, Union[str, int, float]]
- classmethod from_dict(data)¶
Creates a Question instance from a dictionary.
- Parameters:
data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate a Question object.
- Returns:
A new instance of Question initialized from the provided dictionary.
- Return type:
Question
Examples
>>> from hinteval.cores import Question
>>>
>>> data = {
...     'question': 'What is the capital of Austria?',
...     'question_type': {'type': 'LOC:LOCATION'},
...     'entities': [{'entity': 'Austria', 'ent_type': 'LOCATION',
...                   'start_index': 23, 'end_index': 30, 'metadata': {'familiarity': 1.0}}],
...     'metrics': {'readability': {'name': 'readability', 'value': 0.8}},
...     'metadata': {'source': 'https://en.wikipedia.org/wiki/Austria'}
... }
>>> question = Question.from_dict(data)
>>> print(question.question)
# What is the capital of Austria?
See also
to_dict – Converts the Question instance into a dictionary.
- Raises:
KeyError – If required keys are missing in the dictionary.
- to_dict()¶
Converts the Question instance into a dictionary.
- Returns:
A dictionary representation of the Question instance including all its attributes.
- Return type:
dict[str, Any]
Examples
>>> from hinteval.cores import Entity, Metric, Question
>>>
>>> question = Question(
...     "What is the capital of Austria?",
...     {"major": "LOC:LOCATION"},
...     [Entity("Austria", "LOCATION", 23, 30, {"familiarity": 1.0})],
...     {"readability": Metric("readability", 0.8)},
...     {"source": "https://en.wikipedia.org/wiki/Austria"}
... )
>>> print(question.to_dict())
# {
#     'question': 'What is the capital of Austria?',
#     'question_type': {'major': 'LOC:LOCATION'},
#     'entities': [
#         {'entity': 'Austria', 'ent_type': 'LOCATION', 'start_index': 23, 'end_index': 30, 'metadata': {'familiarity': 1.0}}
#     ],
#     'metrics': {'readability': {'name': 'readability', 'value': 0.8, 'metadata': {}}},
#     'metadata': {'source': 'https://en.wikipedia.org/wiki/Austria'}
# }
See also
from_dict – Creates a Question instance from a dictionary.
- class hinteval.cores.dataset_core.Subset(name: str = 'entire', metadata: Dict[str, str | int | float] = None)¶
A class to represent a subset of instances, typically used for managing and organizing a collection of instances with associated metadata.
- name¶
The name of the subset.
- Type:
str
- metadata¶
Optional additional metadata about the subset.
- Type:
dict[str, Union[str, int, float]]
- add_instance(instance: Instance, q_id: str = None)¶
Adds an instance to the subset.
- Parameters:
instance (Instance) – The instance to be added to the subset.
q_id (str, optional) – The unique identifier for the instance (default is None, which generates a new random unique ID).
- Raises:
ValueError – If the provided q_id already exists in the subset.
Examples
>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> subset.add_instance(instance, "q1")
>>> # subset["q1"] = instance
>>> print(subset["q1"].answers[0].answer)
# Paris
See also
get_instance – Retrieves an instance by its unique identifier.
remove_instance – Removes an instance from the subset.
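The `q_id` handling described for `add_instance` (use the caller's identifier, or generate a unique one when it is `None`) can be sketched as follows. `add_with_id` is a hypothetical helper operating on a plain dictionary. The use of `uuid.uuid4` is an assumption, since this section does not say how hinteval generates identifiers.

```python
import uuid


def add_with_id(instances, instance, q_id=None):
    """Hypothetical helper: store `instance` under `q_id`, generating one if absent."""
    if q_id is None:
        # Assumption: uuid4 stands in for whatever scheme hinteval uses.
        q_id = uuid.uuid4().hex
        while q_id in instances:  # regenerate on the (unlikely) collision
            q_id = uuid.uuid4().hex
    elif q_id in instances:
        # An explicit identifier must not overwrite an existing entry.
        raise ValueError(f"q_id '{q_id}' already exists")
    instances[q_id] = instance
    return q_id


instances = {}
first_id = add_with_id(instances, "instance A", "q1")
generated_id = add_with_id(instances, "instance B")
print(first_id, len(instances))
```

Returning the identifier lets the caller keep track of generated ids, which would otherwise only be discoverable by listing the container's keys.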
- classmethod from_dict(data)¶
Creates a Subset instance from a dictionary.
- Parameters:
data (dict[str, Any]) – A dictionary containing all necessary attributes to instantiate a Subset object.
- Returns:
A new Subset object initialized from the provided dictionary.
- Return type:
Subset
Examples
>>> from hinteval.cores import Subset
>>>
>>> data = {
...     'name': 'training_set',
...     'metadata': {},
...     'instances': {
...         'q1': {
...             'question': {'question': 'What is the capital of France?'},
...             'answers': [{'answer': 'Paris'}],
...             'hints': [],
...             'metadata': {}
...         },
...         'q2': {
...             'question': {'question': 'What is the capital of Austria?'},
...             'answers': [{'answer': 'Vienna'}],
...             'hints': [],
...             'metadata': {}
...         }
...     }
... }
>>> subset = Subset.from_dict(data)
>>> print(subset.name)
# training_set
>>> for instance_id in subset.get_instance_ids():
...     print(instance_id, subset[instance_id].question.question)
# q1 What is the capital of France?
# q2 What is the capital of Austria?
- Raises:
KeyError – If required keys are missing in the dictionary.
See also
to_dict – Converts the Subset instance into a dictionary.
- get_instance(q_id: str)¶
Retrieves an instance by its unique identifier.
- Parameters:
q_id (str) – The unique identifier of the instance to retrieve.
- Returns:
The instance associated with the given q_id.
- Return type:
Instance
- Raises:
ValueError – If the q_id does not exist in the subset.
Examples
>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> subset.add_instance(instance, "q1")
>>> instance = subset.get_instance("q1")
>>> # instance = subset["q1"]
>>> print(instance.question.question)
# What is the capital of France?
See also
add_instance – Adds an instance to the subset.
remove_instance – Removes an instance from the subset.
get_instances – Retrieves all instances in the subset.
get_instance_ids – Retrieves all instance identifiers in the subset.
- get_instance_ids()¶
Retrieves all instance identifiers in the subset.
- Returns:
A list of all instance identifiers in the subset.
- Return type:
list[str]
Examples
>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance_1 = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> instance_2 = Instance.from_strings("What is the capital of Austria?", ["Vienna"], [])
>>> subset["q1"] = instance_1
>>> subset["q2"] = instance_2
>>> ids = subset.get_instance_ids()
>>> print(ids)
# ['q1', 'q2']
See also
get_instance – Retrieves an instance by its unique identifier.
get_instances – Retrieves all instances in the subset.
- get_instances()¶
Retrieves all instances in the subset.
- Returns:
A list of all instances in the subset.
- Return type:
list[Instance]
Examples
>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance_1 = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> instance_2 = Instance.from_strings("What is the capital of Austria?", ["Vienna"], [])
>>> subset["q1"] = instance_1
>>> subset["q2"] = instance_2
>>> instances = subset.get_instances()
>>> print([instance.question.question for instance in instances])
# ['What is the capital of France?', 'What is the capital of Austria?']
See also
get_instance – Retrieves an instance by its unique identifier.
get_instance_ids – Retrieves all instance identifiers in the subset.
- remove_instance(q_id: str)¶
Removes an instance from the subset.
- Parameters:
q_id (str) – The unique identifier of the instance to be removed.
- Raises:
ValueError – If the q_id does not exist in the subset.
Examples
>>> from hinteval.cores import Subset, Instance
>>>
>>> subset = Subset()
>>> instance = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> subset.add_instance(instance, "q1")
>>> print(len(subset))
# 1
>>> subset.remove_instance("q1")
>>> # del subset["q1"]
>>> print(len(subset))
# 0
See also
get_instance – Retrieves an instance by its unique identifier.
add_instance – Adds an instance to the subset.
- to_dict()¶
Converts the Subset instance into a dictionary.
- Returns:
A dictionary representation of the Subset instance.
- Return type:
dict[str, Any]
Examples
>>> from hinteval.cores import Subset, Instance
>>>
>>> instance1 = Instance.from_strings("What is the capital of France?", ["Paris"], [])
>>> instance2 = Instance.from_strings("What is the capital of Austria?", ["Vienna"], [])
>>> subset = Subset(name='training_set')
>>> subset.add_instance(instance1, "q1")
>>> subset.add_instance(instance2, "q2")
>>> subset_dict = subset.to_dict()
>>> print(subset_dict)
# {
#     'name': 'training_set',
#     'metadata': {},
#     'instances': {
#         'q1': {
#             'question': {'question': 'What is the capital of France?', ...},
#             'answers': [{'answer': 'Paris', ...}],
#             'hints': [],
#             'metadata': {}
#         },
#         'q2': {
#             'question': {'question': 'What is the capital of Austria?', ...},
#             'answers': [{'answer': 'Vienna', ...}],
#             'hints': [],
#             'metadata': {}
#         }
#     }
# }
See also
from_dict – Creates a Subset instance from a dictionary.