Familiarity

class hinteval.cores.evaluation_metrics.familiarity.WordFrequency(method: Literal['include_stop_words', 'exclude_stop_words'] = 'include_stop_words', spacy_pipeline: Literal['en_core_web_sm', 'en_core_web_lg', 'en_core_web_md', 'en_core_web_trf'] = 'en_core_web_sm', checkpoint: bool = False, checkpoint_step: int = 1, force_download=False, enable_tqdm=False)

Class for evaluating familiarity of Question, Hint, or Answer based on word frequency analysis on Common Crawl.

checkpoint

Whether checkpointing is enabled.

Type:

bool

checkpoint_step

Step interval for checkpointing.

Type:

int

enable_tqdm

Whether the tqdm progress bar is enabled.

Type:

bool

See also

Wikipedia

Class for evaluating familiarity of Question, Hint, or Answer using the number of views of the corresponding Wikipedia pages.

evaluate(sentences: List[Question | Hint | Answer], **kwargs) → List[float]

Evaluates the familiarity of the given Question, Hint, or Answer using word frequency analysis on Common Crawl.

Parameters:
  • sentences (List[Union[Question, Hint, Answer]]) – List of sentences to evaluate.

  • **kwargs – Additional keyword arguments.

Returns:

List of familiarity scores for each sentence.

Return type:

List[float]

Notes

This function stores the scores as Metric objects within the metrics attribute of the Question, Hint, or Answer, with names based on the method and the spaCy pipeline, such as “familiarity-freq-include_stop_words-sm”.
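
The naming pattern can be sketched as a small helper. This is only an illustration inferred from the example names shown on this page (“-sm” for en_core_web_sm); the function word_frequency_metric_name is hypothetical and is not part of hinteval.

```python
def word_frequency_metric_name(method: str, spacy_pipeline: str) -> str:
    """Build the metric name that the examples on this page suggest.

    Hypothetical helper, not a hinteval API. It assumes the name ends with
    the last underscore-separated token of the spaCy pipeline name.
    """
    suffix = spacy_pipeline.rsplit("_", 1)[-1]  # 'en_core_web_sm' -> 'sm'
    return f"familiarity-freq-{method}-{suffix}"


name = word_frequency_metric_name("include_stop_words", "en_core_web_sm")
print(name)
```

A name built this way could then be used to look up a single score after evaluate() has run, for example sentence.metrics[name].value, assuming metrics behaves like a dictionary as the Examples suggest.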

Examples

>>> from hinteval.cores import Question, Hint
>>> from hinteval.evaluation.familiarity import WordFrequency
>>>
>>> word_frequency = WordFrequency(method='include_stop_words')
>>> sentence_1 = Question('What is the capital of Austria?')
>>> sentence_2 = Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')
>>> sentences = [sentence_1, sentence_2]
>>> results = word_frequency.evaluate(sentences)
>>> print(results)
# [1.0, 1.0]
>>> metrics = [f'{metric_key}: {metric_value.value}' for sent in sentences for metric_key, metric_value in
...    sent.metrics.items()]
>>> print(metrics)
# ['familiarity-freq-include_stop_words-sm: 1.0', 'familiarity-freq-include_stop_words-sm: 1.0']

See also

Wikipedia

Class for evaluating familiarity of Question, Hint, or Answer using the number of views of the corresponding Wikipedia pages.

release_memory()

Releases the memory used by the class instance.

This method deletes the instance of the class and triggers garbage collection to free up memory.
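
As a rough illustration of the general Python pattern this describes (dropping references, then forcing a garbage-collection pass), consider the sketch below. HeavyResource and drop_and_collect are hypothetical stand-ins; this is not hinteval's implementation of release_memory().

```python
import gc
import weakref


class HeavyResource:
    """Hypothetical stand-in for an object that holds a large model."""

    def __init__(self):
        self.payload = [0] * 100_000
        self.cycle = self  # reference cycle: reference counting alone cannot free it


def drop_and_collect() -> bool:
    """Return True if the resource is gone after del + gc.collect()."""
    resource = HeavyResource()
    probe = weakref.ref(resource)  # observes the object without keeping it alive
    del resource                   # drop the only strong reference in this scope
    gc.collect()                   # collect the cycle left behind by reference counting
    return probe() is None


print(drop_and_collect())
```

The weak reference is used only to observe whether the object still exists; in ordinary use, calling release_memory() on the instance is all that is needed.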

Examples

>>> from hinteval.evaluation.familiarity import WordFrequency
>>>
>>> word_frequency = WordFrequency(spacy_pipeline='en_core_web_sm')
>>> word_frequency.release_memory()

class hinteval.cores.evaluation_metrics.familiarity.Wikipedia(spacy_pipeline: Literal['en_core_web_sm', 'en_core_web_lg', 'en_core_web_md', 'en_core_web_trf'] = 'en_core_web_sm', checkpoint: bool = False, checkpoint_step: int = 1, enable_tqdm=False)

Class for evaluating familiarity of Question, Hint, or Answer using the number of views of corresponding Wikipedia pages [33].

checkpoint

Whether checkpointing is enabled.

Type:

bool

checkpoint_step

Step interval for checkpointing.

Type:

int

enable_tqdm

Whether the tqdm progress bar is enabled.

Type:

bool

See also

WordFrequency

Class for evaluating familiarity of Question, Hint, or Answer based on word frequency analysis on Common Crawl.

evaluate(sentences: List[Question | Hint | Answer], **kwargs) → List[float]

Evaluates the familiarity of the given Question, Hint, or Answer using the number of views of corresponding Wikipedia pages [35].

Parameters:
  • sentences (List[Union[Question, Hint, Answer]]) – List of sentences to evaluate.

  • **kwargs – Additional keyword arguments.

Returns:

List of familiarity scores for each sentence.

Return type:

List[float]

Notes

This function stores the scores as Metric objects within the metrics attribute of the Question, Hint, or Answer, with names based on the spaCy pipeline, such as “familiarity-wikipedia-sm”.

This function also stores the number of views for each entity as Entity objects within the entities attribute of the Question, Hint, or Answer.

Examples

>>> from hinteval.cores import Question, Hint
>>> from hinteval.evaluation.familiarity import Wikipedia
>>>
>>> wikipedia = Wikipedia(spacy_pipeline='en_core_web_trf')
>>> sentence_1 = Question('What is the capital of Austria?')
>>> sentence_2 = Hint('This city, once home to Mozart and Beethoven, is the capital of Austria.')
>>> sentences = [sentence_1, sentence_2]
>>> results = wikipedia.evaluate(sentences)
>>> print(results)
# [1.0, 1.0]
>>> metrics = [f'{metric_key}: {metric_value.value}' for sent in sentences for metric_key, metric_value in
...    sent.metrics.items()]
>>> print(metrics)
# ['familiarity-wikipedia-trf: 1.0', 'familiarity-wikipedia-trf: 1.0']
>>> entities = [f'{entity.entity}: {entity.metadata["wiki_views_per_month"]}' for sent in sentences for entity in
...    sent.entities]
>>> print(entities)
# ['austria: 248144', 'mozart: 233219', 'beethoven: 224128', 'austria: 248144']
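
The entity listing above can be post-processed with ordinary Python. The following sketch collects view counts into a dictionary and picks the least-viewed entity. So that it runs without hinteval or network access, it uses SimpleNamespace objects as stand-ins that mimic only the entities, entity, and metadata attributes used in the example; views_by_entity and make_entity are hypothetical helpers, and the numbers are copied from the output above.

```python
from types import SimpleNamespace


def views_by_entity(sentences):
    """Map each entity name to its monthly Wikipedia views (duplicates collapse)."""
    views = {}
    for sent in sentences:
        for entity in sent.entities:
            views[entity.entity] = entity.metadata["wiki_views_per_month"]
    return views


def make_entity(name, monthly_views):
    # Stand-in for an Entity object; only the attributes used above are modeled.
    return SimpleNamespace(entity=name, metadata={"wiki_views_per_month": monthly_views})


# Stand-ins for the evaluated Question and Hint from the example.
question = SimpleNamespace(entities=[make_entity("austria", 248144)])
hint = SimpleNamespace(entities=[
    make_entity("mozart", 233219),
    make_entity("beethoven", 224128),
    make_entity("austria", 248144),
])

views = views_by_entity([question, hint])
least_viewed = min(views, key=views.get)
print(views)
print(least_viewed)
```

With real results, the same function should work on the sentences list passed to evaluate(), assuming the attribute layout matches the example.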

See also

WordFrequency

Class for evaluating familiarity of Question, Hint, or Answer based on word frequency analysis on Common Crawl.

release_memory()

Releases the memory used by the class instance.

This method deletes the instance of the class and triggers garbage collection to free up memory.

Examples

>>> from hinteval.evaluation.familiarity import Wikipedia
>>>
>>> wikipedia = Wikipedia(spacy_pipeline='en_core_web_sm')
>>> wikipedia.release_memory()