This post will explore why you shouldn’t expect TF-IDF to substitute for a comprehensive optimization strategy and what the true benefits of using it for SEO are.

TF-IDF: What Kind of Beast Is That?

For a human brain, it doesn’t take any math to tell what my article is about. It’s about TF-IDF, right?

But when relevancy is evaluated (and, most importantly, compared across several articles) by a machine, we need a numeric representation to see that:

- Article A is about TF-IDF (as opposed to, say, link building).
- Article A is more about TF-IDF than article B.

Could we simply count the number of times our keyword, TF-IDF, appears in each document? No, because that would ignore the size of the documents.

Could we compare the count of our keyword to the total number of words? This is what we call keyword density, a widely used content optimization metric of the past.

But relying on keyword density makes me think that the verb “to be” (not “TF-IDF”) is the most prominent one in this article. Is there a way to adjust my calculations for the fact that some words appear more frequently in speech in general?

This is where TF-IDF comes into play, letting us see how the frequency of “TF-IDF” in this article compares to its average frequency across other documents on the Web. Thus, we’re able to pay less attention to all the commonly used words and pick out the very specific topic of a particular piece of content.

To put the formula for my calculations simply (disclaimer: I’m purposefully oversimplifying here for the sake of conveying the basic idea), we’re taking:

- Term Frequency = (count of the term) / (total word count in the document).
- Inverse Document Frequency = log ((number of docs) / (docs containing keyword)).

When multiplied by Inverse Document Frequency, Term Frequency gets lower for commonly used words and higher for unique topic-identifying terms.

Back to our example, the verb “to be” is used in each and every article in English. But very few articles mention “TF-IDF”, “keywords”, “content” and other important subtopics I’m covering in my article. So, TF-IDF for these terms gets higher and… voilà! The machine knows what my article is about.

Generally, TF-IDF is used when we need a machine to identify the topics of a huge set of documents.
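To make the simplified Term Frequency and Inverse Document Frequency definitions from this post concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the formula any search engine or SEO tool actually uses: the toy corpus and the function names are hypothetical, the natural logarithm is assumed (the post doesn’t name a base), and documents are assumed to be lowercased and split on spaces already.

```python
import math
from collections import Counter


def term_frequency(term, tokens):
    """TF = (count of the term) / (total word count in the document)."""
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)


def inverse_document_frequency(term, corpus):
    """IDF = log((number of docs) / (docs containing the term))."""
    containing = sum(1 for tokens in corpus if term in tokens)
    if containing == 0:
        # Assumption: a term found in no document simply scores 0.
        return 0.0
    # Assumption: natural log; another base only rescales the scores.
    return math.log(len(corpus) / containing)


def tf_idf(term, tokens, corpus):
    """TF-IDF = TF multiplied by IDF."""
    return term_frequency(term, tokens) * inverse_document_frequency(term, corpus)


# Hypothetical toy corpus: three tiny "articles", already tokenized.
corpus = [
    "tf-idf is a metric and tf-idf is useful".split(),
    "link building is a tactic".split(),
    "content is king".split(),
]
article = corpus[0]

for term in ("is", "tf-idf"):
    density = term_frequency(term, article)
    score = tf_idf(term, article, corpus)
    print(f"{term}: keyword density = {density:.3f}, TF-IDF = {score:.3f}")
```

In this toy corpus “is” and “tf-idf” have exactly the same keyword density in the first article, but “is” appears in every document, so its IDF is log(1) = 0 and its TF-IDF drops to zero, while “tf-idf” keeps a positive score. That is the “to be” effect described in the post, in miniature.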