Miscellaneous

hinteval.utils.identify_functions.identify_entities(texts: List[Instance | Question | Answer | Hint], batch_size, spacy_pipeline: Literal['en_core_web_sm', 'en_core_web_lg', 'en_core_web_md', 'en_core_web_trf'], enable_tqdm)

Runs named entity recognition on the given texts using a spaCy pipeline [1] and stores the detected entities on each object in place.

Parameters:
  • texts (list of Union[Instance, Question, Answer, Hint]) – The Instance, Question, Answer, or Hint objects on which to run entity recognition.

  • batch_size (int) – The batch size for processing texts.

  • spacy_pipeline ({'en_core_web_sm', 'en_core_web_lg', 'en_core_web_md', 'en_core_web_trf'}) – The spaCy pipeline to use for entity recognition.

  • enable_tqdm (bool) – Whether to show a tqdm progress bar.

Raises:
  • ValueError – If any item in texts is not an instance of Instance, Question, Answer, or Hint.

Examples

>>> from hinteval.cores import Instance
>>> from hinteval.utils.identify_functions import identify_entities
>>>
>>> instance = Instance.from_strings("What is the capital of Austria?", ["Vienna"], ["This city is famous for its music and culture."])
>>> identify_entities([instance], batch_size=64, spacy_pipeline='en_core_web_sm', enable_tqdm=True)
>>> print(instance.question.entities)
# [{entity='Austria', ent_type='GPE', start_index=23, end_index=30, metadata={}}]
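
The start_index and end_index values in the output above can be read as character offsets into the question text. The following is a minimal sketch that checks this reading with plain Python; the dictionary is only a stand-in built from the printed output, not the library's actual entity class.

```python
# Values copied from the example output above. The dict layout is an
# assumption for illustration, not the library's entity type.
question = "What is the capital of Austria?"
entity = {"entity": "Austria", "ent_type": "GPE", "start_index": 23, "end_index": 30}

# Slicing with the reported offsets recovers the entity text,
# which implies that end_index is exclusive.
span = question[entity["start_index"]:entity["end_index"]]
print(span)  # Austria
```

Treating the offsets as a half-open range means they can be passed straight to a Python slice without adjustment.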

References

hinteval.utils.identify_functions.identify_question_type(texts: List[Question], batch_size, force_download, enable_tqdm)

Classifies each given question with a pre-trained question type classifier [2] and stores the result in the question's question_type attribute.

Parameters:
  • texts (list of Question) – The Question instances to classify.

  • batch_size (int) – The batch size for processing texts.

  • force_download (bool) – Whether to force a download of the question classification model files.

  • enable_tqdm (bool) – Whether to show a tqdm progress bar.

Raises:
  • ValueError – If any item in texts is not an instance of the Question class.

Examples

>>> from hinteval.cores import Question
>>> from hinteval.utils.identify_functions import identify_question_type
>>>
>>> questions = [Question("What is the capital of Austria?")]
>>> identify_question_type(questions, batch_size=64, force_download=True, enable_tqdm=True)
>>> print(questions[0].question_type)
# {'major': 'LOC:LOCATION', 'minor': 'city:City'}
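
Both labels in the output above appear to follow a "code:description" pattern. As a rough sketch, they could be split into their parts as shown below; split_label is a hypothetical helper written for this illustration and is not part of hinteval.

```python
from typing import Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split a 'code:description' label at the first colon (hypothetical helper)."""
    code, _, description = label.partition(":")
    return code, description


# Dictionary copied from the example output above.
question_type = {"major": "LOC:LOCATION", "minor": "city:City"}

major_code, major_name = split_label(question_type["major"])
minor_code, minor_name = split_label(question_type["minor"])
print(major_code, minor_code)  # LOC city
```

Using str.partition rather than str.split keeps the helper safe for labels without a colon: the whole label is returned as the code and the description is empty.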

References