Semantic similarity

From Wikipedia, the free encyclopedia

Semantic similarity is a concept whereby a set of documents, or terms within term lists, are assigned a metric based on the likeness of their meaning (semantic content).

According to some authors, semantic similarity differs from semantic relatedness because relatedness also covers relations such as antonymy and meronymy, while similarity does not [1]. However, much of the literature uses these terms interchangeably, along with terms like semantic distance. In essence, semantic similarity, semantic distance, and semantic relatedness all ask, "How much does term A have to do with term B?"

The many automatic measures of semantic similarity/relatedness usually answer this question with a number, typically between -1 and 1 or between 0 and 1, where 1 signifies extremely high similarity/relatedness and 0 signifies little to none.

An intuitive way of displaying terms according to their semantic similarity is to group closely related terms together and space more distantly related ones farther apart. This is common, if sometimes subconscious, practice in mind maps and concept maps.

Concretely, this can be achieved, for instance, by defining a topological similarity, using ontologies to define a distance between words, or by using statistical means to correlate words and textual contexts from a suitable text corpus (co-occurrence). A naive topological metric for terms arranged as nodes in a directed acyclic graph, such as a hierarchy, is the minimal number of edges separating the two term nodes.
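
The two approaches mentioned above can be illustrated with a minimal Python sketch. It is not a reference implementation: the toy hierarchy PARENTS, the function names, and the choice of cosine similarity over sentence-level co-occurrence counts are all assumptions made for illustration, and real systems typically use large ontologies and corpora with more refined weighting.

```python
from collections import Counter, deque
from math import sqrt

# Hypothetical toy hierarchy (child -> list of parents); not a real ontology.
PARENTS = {
    "dog": ["mammal"],
    "cat": ["mammal"],
    "mammal": ["animal"],
    "sparrow": ["bird"],
    "bird": ["animal"],
    "animal": [],
}


def edge_distance(parents, a, b):
    """Minimal number of edges between terms a and b, ignoring edge direction.

    Returns None if either term is unknown or no path exists.
    """
    # Build an undirected adjacency map from the child -> parents table.
    adjacency = {}
    for child, parent_list in parents.items():
        adjacency.setdefault(child, set())
        for parent in parent_list:
            adjacency.setdefault(parent, set())
            adjacency[child].add(parent)
            adjacency[parent].add(child)
    if a not in adjacency or b not in adjacency:
        return None
    # Breadth-first search finds the shortest path in an unweighted graph.
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the words that share a sentence with it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            context = vectors.setdefault(word, Counter())
            for j, other in enumerate(words):
                if i != j:
                    context[other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

For example, edge_distance(PARENTS, "dog", "cat") counts the two edges through "mammal", while "dog" and "sparrow" are four edges apart through "animal". Because co-occurrence counts are never negative, the cosine score in this sketch falls between 0 and 1, matching the second of the ranges described above.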

References

  1. Evgeniy Gabrilovich and Shaul Markovitch (2007). "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis" (PDF). Retrieved on 2007-09-18.
