Thursday February 09, 2012 6:37:56 am (Pacific)

Webmaster Wednesdays :: Concept :: Understanding LSI

It’s been a while since I did a Webmaster Wednesdays post, so I thought it would be useful to walk you through something some people figured out about a year ago: if you want better search engine results, you need to understand what Latent Semantic Indexing (LSI) is, why it’s important, and how to apply it.

So let’s start with definitions.

Regular keyword searches approach a document collection with a kind of accountant mentality: a document contains a given word or it doesn’t, with no middle ground. We create a result set by looking through each document in turn for certain keywords and phrases, tossing aside any documents that don’t contain them, and ordering the rest based on some ranking system. Each document stands alone in judgement before the search algorithm – there is no interdependence of any kind between documents, which are evaluated solely on their contents.

Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn’t understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.

(Emphasis mine.) Source.
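
To make the contrast in that quote concrete, here is a minimal Python sketch. It is my own illustration, not anything a search engine actually runs: the sample documents and the function names (keyword_search, overlap) are made up for this example. Real LSI goes further and applies a matrix technique called singular value decomposition to a term-document matrix; this sketch only covers the "words in common" intuition described above.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_search(docs, query):
    """'Accountant' search: keep only documents containing every query word."""
    wanted = tokenize(query)
    return [name for name, text in docs.items() if wanted <= tokenize(text)]

def overlap(a, b):
    """Share of distinct words two documents have in common (Jaccard similarity)."""
    wa, wb = tokenize(a), tokenize(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

# Hypothetical three-page "collection" used only for illustration.
docs = {
    "a": "used car prices and car engine repair",
    "b": "engine repair tips for your used automobile",
    "c": "chocolate cake recipe with frosting",
}

# Keyword search for "car" only finds page a, even though page b is on topic.
print(keyword_search(docs, "car"))

# Looking at shared words, pages a and b are close; a and c share nothing.
print(overlap(docs["a"], docs["b"]))
print(overlap(docs["a"], docs["c"]))
```

The keyword search throws away the automobile page entirely, while the overlap score still sees that it sits near the car page. That is the gap LSI-style indexing is meant to close.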

“So, Tinu,” you might be thinking. “What the hell does all that technical gibberish mean?”

It means that search engines have been learning how words relate to each other, and taking that into account when they rank your site. Not only that, they’re looking at the relationship between your document on the web and other documents, and drawing conclusions from the way the terms in them relate.

The old school way a search engine would rank sites had more to do with the word itself. Either your site was about cars, or it wasn’t about cars. Either it was linked by other sites that had to do with cars or it wasn’t.

But like any other technology which hopes to remain useful, search methods had to mature, if for no other reason than search engine spam.

So now, your friendly neighborhood search engine spider and its supporting technology can also realize that car and automobile are referencing the same thing, and assign additional value to a correlation accordingly.
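
How can software "realize" that two different words point at the same thing without knowing what either one means? One simplified way to picture it, sketched below in Python, is to compare the company each word keeps. This is an assumption-laden toy, not the actual method of any search engine: the three sample documents and the helper names (context_vectors, cosine) are invented for illustration, and it uses plain co-occurrence counts rather than the matrix math real LSI relies on.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a string and split it into words."""
    return re.findall(r"[a-z]+", text.lower())

def context_vectors(docs):
    """Map each word to counts of the other words that share a document with it."""
    vectors = defaultdict(Counter)
    for text in docs:
        words = set(tokenize(text))
        for word in words:
            for other in words - {word}:
                vectors[word][other] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical mini-collection: "car" and "automobile" never share a page.
docs = [
    "car engine wheels",
    "automobile engine wheels",
    "cake sugar flour",
]

vecs = context_vectors(docs)

# Both words appear alongside "engine" and "wheels", so they score close to 1.
print(cosine(vecs["car"], vecs["automobile"]))

# "car" and "cake" share no neighbors, so they score 0.
print(cosine(vecs["car"], vecs["cake"]))
```

Notice that "car" and "automobile" never appear on the same page here. The connection comes entirely from the words around them, which is the kind of pattern the quoted definition is talking about.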

Of course, no one but the search engine companies themselves knows the exact make-up of the pattern they use to rank sites. But from the results they give, and the fact that we know they use LSI, we can draw the conclusion that they look at a number of things more deeply, and probably take into account things that didn’t matter before.

We’re talking:

  • The text in the link (anchor text linking)
  • Words on the page that are related to the desired keyword’s theme
  • The link it came from
  • The link it is pointing to and whether it mentions some of the same information
  • How that page relates to some of the other pages in the search engine’s index, not just for the number of times a phrase is repeated, but for the overall theme of the page

So how does that work? I’ll tell you in the next tip. Promise.

Tinu Abayomi-Paul is the CEO of Leveraged Promotion, a member of the Network Solutions Social Web Advisory Board, and Editor of Women Grow Business. Her website promotion company specializes in reputation management, and building traffic systems for business. You can find her on Google+ and Twitter.
