Can spiders index your website? On-site search engine optimization.
The foundation for getting your site into a search engine's index is search-engine-friendly code. After all, if the robot cannot crawl your pages, the search engine cannot include them in its database.
Unfortunately, many websites use technologies or architectures that make them hostile to search engine crawlers. A search engine robot is really just an automated web browser that must interpret your page's HTML code the same way a regular browser does.
But search robots are remarkably unsophisticated. Many people believe that even the most advanced search engine spiders are roughly as capable as a version 2.0 web browser. This means a spider cannot understand many web technologies and cannot read some parts of your pages. This is especially harmful if those parts include some or all of your page's links. If the spider can't read your links, it can't crawl all the pages of your site.
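The effect of unreadable links compounds across a site. As a minimal Python sketch, assume a hypothetical five-page site whose links are already known and tagged as either readable by the spider (a plain A HREF) or unreadable (for example, a link that only a script creates). A breadth-first crawl that follows only the readable links shows which pages are never reached. The page names are invented, and a real spider discovers links by fetching and parsing HTML rather than reading them from a prebuilt map.

```python
from collections import deque

# Hypothetical site map: each page lists (target, readable) pairs, where
# readable=False stands for a link the spider cannot interpret.
SITE = {
    "index.html": [("about.html", True), ("products.html", False)],
    "about.html": [("contact.html", True)],
    "products.html": [("widget.html", True)],
    "contact.html": [],
    "widget.html": [],
}

def crawl(site, start):
    """Breadth-first crawl that follows only links the spider can read."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target, readable in site.get(page, []):
            if readable and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

if __name__ == "__main__":
    found = crawl(SITE, "index.html")
    print("indexed:", sorted(found))
    print("never reached:", sorted(set(SITE) - found))
```

Here products.html is linked only through an unreadable link, so neither it nor widget.html is reached, even though the link from products.html to widget.html is an ordinary one.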
As a search engine marketing consultant, I have often been asked to evaluate new sites shortly after they launched. Search engine optimization is frequently neglected during development, when designers are focused on navigation, usability, and branding. As a result, many sites launch with built-in problems, and correcting them afterward is much harder than avoiding them at the design stage.
Many companies turn to SEO only after their site fails to appear in the search engine listings.
That is a shame, because search engines are perhaps the most important source of traffic for small businesses. Almost 85% of Internet users find websites through search engines, so a site that is not search engine friendly loses much of its value.
In this article, I'll give an overview of some of the key things that can prevent a search engine crawler from indexing your site. This list is by no means exhaustive, but it highlights the most common obstacles that keep spiders from indexing a site.
For example, imagine that your site uses a script to redirect visitors to a specific page.
This script uses the goToPage() function to add a referral code to the end of the URL before sending visitors to the page.
Spiders will either ignore the contents of the SCRIPT tag or read the script's contents as if they were visible text. Either way, the spider never follows the link, so the destination page may never be crawled.
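The sketch below imitates a very basic spider with Python's standard html.parser module: it records plain A HREF targets, skips javascript: pseudo-URLs and fragment-only links, and discards whatever is inside SCRIPT tags. The sample page and the goToPage() call in it are hypothetical stand-ins, not the author's script.

```python
from html.parser import HTMLParser

class SimpleSpider(HTMLParser):
    """Collects link targets the way a very basic spider might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # "javascript:" links and bare "#" fragments don't name another page,
        # so a spider that can't run scripts has nothing to follow.
        if href.lower().startswith("javascript:") or href.startswith("#"):
            return
        self.links.append(href)

    # SCRIPT contents are passed to handle_data(), which this class does not
    # override, so the script text is simply discarded.

def extract_links(html):
    spider = SimpleSpider()
    spider.feed(html)
    spider.close()
    return spider.links

# Hypothetical page: two links go through script, one is a plain A HREF.
PAGE = """
<html><body>
<script>
  function goToPage(page) { window.location = page + "?ref=partner42"; }
</script>
<a href="javascript:goToPage('products.html')">Products</a>
<a href="#" onclick="goToPage('services.html')">Services</a>
<a href="about.html">About us</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(PAGE))  # ['about.html']
```

Only about.html is found. A browser that runs the script can reach products.html and services.html, but this spider cannot.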
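DHTML menus can cause the same problem: because they rely on JavaScript, a spider that ignores the SCRIPT tag may never see the links they contain.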
If you have a dynamic site that uses technologies such as ASP, PHP, ColdFusion, or JSP, there is a good chance that your URLs include a query string: a question mark followed by one or more name/value pairs.
This can be a problem because many search engine spiders do not index links that include query strings, even if the page the link points to contains nothing but standard HTML. The URL by itself is the barrier.
Why? Most search engines deliberately choose not to index query string links because interpreting them requires extra bookkeeping. A spider keeps a list of all the pages it has indexed and tries to avoid indexing the same page twice during a single visit to the site. It does this by comparing every new URL against the list of URLs it has already seen.
Now, suppose the spider finds a second URL on your site that contains the same name/value pairs as the first, but in a different order.
This URL leads to the same page as the first URL, even though the two URLs are not identical.
To determine that this URL leads to the same page, the spider must parse the query string and store each name/value pair. Then, whenever it sees a URL with the same base page, it must compare that URL's name/value pairs against every query string it has previously recorded.
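Here is a minimal Python sketch of that bookkeeping, assuming that parameter order never changes which page a URL returns (true for the article's example, but not guaranteed for every application). It normalizes each URL by sorting its name/value pairs with urllib.parse and keeps the normalized keys in a set. The example.com URLs and parameter names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def canonical_url(url):
    """Key under which URLs that differ only in the order of their
    query-string name/value pairs compare equal (any #fragment is dropped)."""
    parts = urlsplit(url)
    # Split the query string into (name, value) pairs and sort them.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return base + ("?" + urlencode(pairs) if pairs else "")

class SeenList:
    """The spider's record of pages it has already visited."""

    def __init__(self):
        self._seen = set()

    def is_new(self, url):
        key = canonical_url(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

if __name__ == "__main__":
    seen = SeenList()
    print(seen.is_new("http://www.example.com/product.asp?id=12&color=red"))  # True
    print(seen.is_new("http://www.example.com/product.asp?color=red&id=12"))  # False
```

Storing the sorted form in a set lets the spider check each URL with a single lookup instead of comparing it against every earlier query string, but it still has to parse and sort every URL it encounters, and that is the extra work that makes some engines avoid query strings altogether.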
Keep in mind that query strings can be much longer than a short example suggests. I've seen query strings that were 200 characters long and contained a dozen different name/value pairs.
So indexing pages through query strings means a lot of extra work for the robot.
Some robots, such as Googlebot, will crawl URLs that have a limited number of name/value pairs in the query string. Other spiders ignore all URLs that contain query strings.
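The two spider behaviors can be expressed as a simple crawl policy. In this Python sketch, max_pairs is a hypothetical parameter: 0 models a spider that ignores every URL with a query string, and a small positive number models one that tolerates a few name/value pairs. The article does not say what limit Googlebot or any other spider actually uses, so the values below are invented.

```python
from urllib.parse import urlsplit, parse_qsl

def should_crawl(url, max_pairs):
    """Would a spider following this hypothetical policy fetch url?"""
    query = urlsplit(url).query
    if not query:
        return True   # plain URL: every spider will follow it
    if max_pairs == 0:
        return False  # spider that ignores all query-string URLs
    pairs = parse_qsl(query, keep_blank_values=True)
    return len(pairs) <= max_pairs

if __name__ == "__main__":
    url = "http://www.example.com/catalog.php?cat=5&item=12&sort=price"
    print(should_crawl(url, 0))  # False: strict spider skips it
    print(should_crawl(url, 3))  # True: tolerant spider accepts up to 3 pairs
```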
Flash is impressive, far more eye-catching than plain HTML. It is dynamic and cutting-edge. Unfortunately, spiders use trailing-edge technology. Remember: roughly speaking, a search engine spider is equivalent to a version 2.0 web browser. Spiders are simply unable to interpret newer technologies such as Flash.
So even though Flash animations may wow your visitors, they are invisible to search engines. If you use Flash to spruce up your site a bit but write most of your pages in standard HTML, this won't be a problem. But if you've built your entire site in Flash, you'll have a hard time getting it indexed.
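A basic way to see what a spider can index is to collect only the text that appears in the HTML itself. In this Python sketch, built on the standard html.parser module, text inside SCRIPT and STYLE tags is skipped, and an embedded Flash movie contributes nothing because its content lives in a separate .swf file that the parser never opens. The page contents and file names are hypothetical.

```python
from html.parser import HTMLParser

class TextSpider(HTMLParser):
    """Collects the words a basic spider could index from a page's HTML."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # OBJECT/PARAM tags carry only attributes, so a Flash movie adds no text.
        if not self._skip_depth:
            self.words.extend(data.split())

def indexable_words(html):
    spider = TextSpider()
    spider.feed(html)
    spider.close()
    return spider.words

FLASH_PAGE = """<html><body>
<object data="intro.swf" type="application/x-shockwave-flash" width="800" height="600">
  <param name="movie" value="intro.swf">
</object>
</body></html>"""

HTML_PAGE = "<html><body><h1>Blue Widgets</h1><p>Hand-made widgets shipped worldwide.</p></body></html>"

if __name__ == "__main__":
    print(indexable_words(FLASH_PAGE))  # []
    print(indexable_words(HTML_PAGE))
```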
Did I mention that search engine spiders use weak technology? That's right: they're so low-tech that they don't support frames either. If you use frames, the search engine will be able to crawl your front page, which contains the FRAME tags, but it won't be able to find the individual pages those FRAME tags load, which may make up the rest of your site.
In this case, you can at least work around the problem by including a NOFRAMES section on the front page of your site. This section is invisible to anyone using a browser that supports frames, but nothing prevents you from placing content in it that search engines can index.
If you include a NOFRAMES section, take care to put real content there. At a minimum, you should include standard hypertext links (A HREFs) pointing to your individual frame pages.
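To make the point concrete, the Python sketch below models a deliberately crude, frame-blind spider with a regular expression that finds only ordinary A HREF links and ignores FRAME SRC attributes. Run against a hypothetical frameset page, it finds nothing until a NOFRAMES section with standard links is added. The page names are invented, and a regular expression is not a robust HTML parser; it is used here only because it mimics an early spider's limited view of the page.

```python
import re

# Crude A HREF matcher: it knows nothing about <frame src="...">, so pages
# loaded only through FRAME tags are never discovered.
A_HREF = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

def followable_links(html):
    """Return the link targets a frame-blind spider could follow."""
    return A_HREF.findall(html)

FRAMES_ONLY = """<html>
<frameset cols="20%,80%">
  <frame src="menu.html">
  <frame src="content.html">
</frameset>
</html>"""

WITH_NOFRAMES = """<html>
<frameset cols="20%,80%">
  <frame src="menu.html">
  <frame src="content.html">
  <noframes><body>
    <p>Blue widgets, hand-made and shipped worldwide.</p>
    <a href="menu.html">Site menu</a>
    <a href="content.html">Our widgets</a>
  </body></noframes>
</frameset>
</html>"""

if __name__ == "__main__":
    print(followable_links(FRAMES_ONLY))    # []
    print(followable_links(WITH_NOFRAMES))  # ['menu.html', 'content.html']
```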
Surprisingly often, people include a NOFRAMES section that says something like "This site uses frames. Please upgrade your browser." If you'd like to experiment, search Google for "requires frames." You'll find about 160,000 pages, all of which include the text "this site requires frames." Each of these sites has limited search engine visibility.
With www or without www?
My website address is www.keyrelevance.com, but can people reach it if they leave off the "www." in the address bar? For most server configurations the answer is yes, but for some it is no. Make sure your site works both with and without the www.
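A quick way to check is to request both forms of the address. This Python sketch builds the www and non-www variants of a URL with urllib.parse and fetches each with urllib.request.urlopen, which follows redirects, so a redirect from one form to the other also counts as working. The function names are hypothetical, the sketch assumes the address contains no user name or password, and the fetch function can be swapped out for testing.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

def host_variants(url):
    """Return [www form, non-www form] of url (assumes no user:pass@ part)."""
    parts = urlsplit(url)
    netloc = parts.netloc
    bare = netloc[4:] if netloc.lower().startswith("www.") else netloc
    return [urlunsplit(parts._replace(netloc=host)) for host in ("www." + bare, bare)]

def http_status(url):
    """Fetch url and return the final HTTP status; urlopen follows redirects."""
    with urlopen(url, timeout=10) as resp:
        return resp.status

def check_both(url, fetch=http_status):
    """Map each host variant to its status code, or to an error message."""
    results = {}
    for variant in host_variants(url):
        try:
            results[variant] = fetch(variant)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            results[variant] = f"error: {exc}"
    return results
```

For example, check_both("http://www.keyrelevance.com/") makes two real requests; if either entry comes back as an error, that form of the address does not work.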
This article has looked at some of the more common reasons a site fails to get indexed. Other factors, such as how you structure your site's page hierarchy, also affect how many of your pages a search engine indexes.
Each of these problems has a solution, and in future articles I will touch on each to help you get more pages indexed.
If you are currently redesigning your site, I encourage you to take these points into account before you launch it. While each of these search barriers can be removed, it's better to start with search-engine-friendly development than to fix hundreds of pages after a project has launched.