Identifies entities in the given texts using spaCy [1].
Parameters:
texts (list of Union[Instance, Question, Answer, Hint]) – A list of Instance, Question, Answer, or Hint objects to perform entity recognition on.
batch_size (int) – The batch size for processing texts.
spacy_pipeline ({'en_core_web_sm', 'en_core_web_lg', 'en_core_web_md', 'en_core_web_trf'}) – The spaCy pipeline to use for entity recognition.
enable_tqdm (bool) – Whether to show a tqdm progress bar.
Raises:
ValueError – If any item in texts is not an instance of the supported classes.
Examples
>>> from hinteval.cores import Instance
>>> from hinteval.utils.identify_functions import identify_entities
>>>
>>> instance = Instance.from_strings("What is the capital of Austria?", ["Vienna"], ["This city is famous for its music and culture."])
>>> identify_entities([instance], batch_size=64, spacy_pipeline='en_core_web_sm', enable_tqdm=True)
>>> print(instance.question.entities)
# [{entity='Austria', ent_type='GPE', start_index=23, end_index=30, metadata={}}]
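The start_index and end_index values in the sample output look like zero-based character offsets into the question text, with an exclusive end. The sketch below checks that reading against the sample values only. It uses a plain dict as a stand-in for the entity object and does not call hinteval or spaCy, so treat the offset convention as an assumption rather than documented behavior.

```python
# Assumption: start_index/end_index are zero-based character offsets with an
# exclusive end, as the sample output suggests. The dict is a stand-in for
# the entity object printed in the example, not a hinteval type.
question = "What is the capital of Austria?"
entity = {
    "entity": "Austria",
    "ent_type": "GPE",
    "start_index": 23,
    "end_index": 30,
}

# Slice the question text with the reported offsets.
span = question[entity["start_index"]:entity["end_index"]]
print(span)  # Austria
```

If the convention holds, slicing the original text with the stored offsets recovers the entity string, which is handy for highlighting or masking entities in place.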
Identifies question types for the given questions using a pre-trained classifier [2].
Parameters:
texts (list of Question) – A list of Question instances to perform question type classification on.
batch_size (int) – The batch size for processing texts.
force_download (bool) – Whether to force a download of the question classification model files.
enable_tqdm (bool) – Whether to show a tqdm progress bar.
Raises:
ValueError – If any item in texts is not an instance of the Question class.
Examples
>>> from hinteval.cores import Question
>>> from hinteval.utils.identify_functions import identify_question_type
>>>
>>> questions = [Question("What is the capital of Austria?")]
>>> identify_question_type(questions, batch_size=64, force_download=True, enable_tqdm=True)
>>> print(questions[0].question_type)
# {'major': 'LOC:LOCATION', 'minor': 'city:City'}
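In the sample output, each question type value pairs a short code with a readable name, separated by a colon. The sketch below splits such values into their two parts. The helper split_label is hypothetical and not part of hinteval, and the "code:Name" pattern is inferred from this single example, so other labels may not follow it.

```python
def split_label(label: str) -> tuple[str, str]:
    """Split a 'code:Name' label into (code, name).

    Hypothetical helper; assumes one colon separates the two parts,
    as in the sample output. Without a colon, name is an empty string.
    """
    code, _, name = label.partition(":")
    return code, name


# Sample value copied from the example output above.
question_type = {"major": "LOC:LOCATION", "minor": "city:City"}

major_code, major_name = split_label(question_type["major"])
minor_code, minor_name = split_label(question_type["minor"])
print(major_code, minor_code)  # LOC city
```

Separating the code from the name makes it easier to group questions by coarse class (the major code) before looking at the finer minor class.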