LSI: The New High In CyberSearch?
Latent Semantic Indexing – the new buzzword in ‘Search’ – has been quietly and insidiously spreading its tentacles through the great big Google www world. Webmasters still can’t decide whether it counts for Page Rank or not. Here’s what could be good about it from the other side – for the millions who see that Google Search Bar as their entry into a world of information.
What exactly is LSI?
It is, pure and simple, yet another information retrieval system. However, it goes beyond SEO or Search Engine Optimisation in that it gives you more than just the millions of pages that contain the word you just typed into the Search Bar. Have you ever been totally frustrated when you’ve wanted information on a particular subject and you had to dig through hundreds of pages to find what you wanted? LSI aims to make search more relevant. When you type in a keyword, the Google spiders look at all the web pages that contain that particular word and also analyse them for words that are semantically similar. So what you get as a result are pages that are more relevant to the subject you are interested in, rather than just the ones that contain your ‘search’ word. While you may not notice much of a difference just yet when you are searching for simple terms, you most definitely will when you are looking for something more complex.
For webmasters, however, how this works as far as Page Rank goes is not yet clear. SEO and all the other complex factors that go into getting your web site up there on Page 1 are too important to discard, and it might be quite a while before LSI kicks in and becomes the yardstick. Most of the webmasters I know and work with might look at a middle ground combining both – but they are not about to throw out all their SEO maneuverings just yet. They can’t afford to – not if they are playing the Page Rank game in real earnest.
A magic wand it isn’t
If you think LSI will get you into some kind of realm of artificial intelligence, banish the thought. It is, and was designed to be, a mathematical technique. However, given the way it functions, one can be forgiven for thinking otherwise. As one expert puts it, it takes the whole search operation from a common accountant mentality to a new level of matrix algebra. It’s a powerful algorithm that computes similarity values and ranks the results by them, so what you get is page indexing that goes beyond merely searching for a term: a stage of analysis comes before the search begins.
Adding another dimension
It’s like moving from 2-D to 3-D. It retrieves documents based on similar content – and those similarities are determined by the content on all the relevant pages. So what sometimes took you hours to plough through, with numerous permutations and combinations of search words or phrases, will be done behind the scenes of that search bar and presented to you. What it does is correlate semantically similar words over thousands or maybe millions of related documents and then come up with a set of content words that are likely to be relevant.
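To make the matrix algebra a little less mysterious, here is a minimal sketch of the textbook LSI recipe in Python with NumPy. It is an illustration under stated assumptions, not a description of what Google actually runs: the five tiny ‘pages’, the choice of two themes (k = 2) and the helper name lsi_scores are all made up for the example.

```python
import numpy as np

# Hypothetical toy corpus: three pages about music, two about cars.
docs = [
    "song lyrics music",
    "music mp3 download",
    "song music album",
    "car engine repair",
    "engine oil car",
]

# Build the term-document count matrix A (rows = terms, columns = documents).
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[index[word], j] += 1

# Singular value decomposition, keeping only the k strongest "themes".
k = 2  # assumed number of themes for this toy example
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
doc_vecs = (np.diag(s_k) @ Vt[:k, :]).T  # each row: one document in theme space


def lsi_scores(query):
    """Cosine similarity between a query and every document in theme space."""
    q = np.zeros(len(vocab))
    for word in query.split():
        if word in index:
            q[index[word]] += 1
    q_vec = U_k.T @ q  # fold the query into the same k-dimensional space
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    return doc_vecs @ q_vec / np.where(norms == 0, 1.0, norms)


scores = lsi_scores("lyrics")
for doc, score in zip(docs, scores):
    print(f"{score:5.2f}  {doc}")
```

Only the first page contains the word ‘lyrics’, yet all three music pages should score close to 1 while the two car pages score close to 0, because the decomposition groups pages by the words they share rather than by the exact search term.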
When Google bought Applied Semantics, it was a foregone conclusion that it would only be a matter of time before that company’s software, CIRCA, would be put to use in the retrieval of information. This application extracts and organises information and almost mimics human thought. What it has done for cyber search is to go beyond keywords to keyword themes.
Tilde Tactics
How do you get access to this new dimension of searching? It’s easy. Look at the little-used key on your keyboard to the left of your ‘1’. That squiggly symbol on the top is called a ‘tilde’. That’s the magic key to get you there. All you need to do is to put that little symbol in front of your search word, like so: ~song. Do it both ways and see the difference. The first time, search without the tilde, and what you get are pages that contain the word ‘song’. Then add the tilde before it. See the difference? Now, you might just have pages listed that don’t contain the word ‘song’. The results could include documents that have the words ‘music’, ‘lyrics’, ‘MP3’, etc. (Look at the words in bold and you’ll see the keywords that are being picked up.)
Bringing back the joy of writing
What does this mean for someone who is an online writer? HOPE. At the present moment, probably nothing more. However, the fact that more and more people will search in a more focussed way means that even if your sites are way down as far as Page Rank goes, chances are that if they are doing an LSI search, you will get read. And what is most welcome is the fact that you don’t need to stuff all those keywords into the copy. As long as the relevant words and phrases occur naturally, those invisible spiders will find them and present them to the person who is looking for them. So Content might emerge from the clutches of SEO to remain king, making it easier to search and easier and more satisfying to write. Will the webmasters welcome it? That’s something we’ll have to wait to find out.
Comments
Wow! This is good news to me! Thanks for bringing this information to our attention. I've tried some searches using the ~ and saw some of my writing come up on the 1st page. Awesome.
Very interesting and possibly very potent! Thank you very much!
I am surprised that Google has not made more progress in coming up with ways to describe similarity searches versus absolute, keyword-only searches. There is some capability to do that with boolean searches, but that is beyond most users' ability, and even those like myself who can figure it out don't want to go through the effort.
It would seem that Google could make it a lot easier by letting you pick from the initial results and tell Google that, out of the first 20 results they served up, what you are looking for is similar to three of the results and should totally exclude similarities to two or three of the other results.
With Google's computing capabilities they should then be able to construct a refined boolean search to provide what you are looking for even though you have not been able to provide a concise explicit search.
Shalini - this is a great hub. Theme based search has been a long time in coming and its impact for legit businesses is welcome. The implications are for site transparency and visibility based on valuable, original, on topic content. Keep us posted on any new information. Thanks. Steve
Another remarkably informative hub. Whenever I read your work, I learn something that is relevant to me. You also have a knack of writing about complicated subjects in a very understandable and accessible way. Very good work! Now I'm going to go play with my tilde~.
First SEO and now LSI. I have always learnt something new from your hubs. You have so much to offer and I am glad through hub pages I am receiving so much from you. Great hub.
Me also like countrywomen, thanks for sharing this. We are missing out a lot on Google search by just using the simple stuff but not the advanced ones. Thanks Shalini.
Hi Shalini ...
Albert Einstein, Thomas Edison, Isaac Newton, Michael Faraday, Stephen Hawking ... all these men had something in common with you. A "curious mind." :)
Wow, Shalini, had no idea this existed. You've sure opened my eyes, my friend. Great hub!
Your last paragraph, on hope for the online writer, is intriguing. One does not have to stuff all those keywords? Maybe. But one has to stick to one's guns in terms of what works. And so far LSI to me means several categories of keywords, from main to peripheral, to be used in a set-length piece with an eye for overall keyword density. But just because there are more keywords one could use does not always translate into natural phrasing. Good hub though.
Shalini, great, interesting information, well put so that the layman can easily understand it. Well, I guess I must try sear~ch~ing!
Thanks Shalini...
I just want to know how to implement LSI, or how to use LSI for my site, so my site is more searchable...
Thanks
sumosalesman 3 years ago
Fascinating piece... it's great to be brought up to speed on what may be the cutting edge of search in the months to come.