Tuesday, December 2, 2008
A little while back I designed a simple social networking website for a client, using UrlRewriter.net to rewrite their dynamic URLs into readable static URLs. After a couple of months I realised that Google was not indexing these pages, so I started doing a bit of digging.

What I found was that Google had updated the browser information (the user-agent string) sent by its crawler, so that the crawler now identifies itself as a Mozilla-type browser. Previously, ASP.NET treated the Google crawler as a generic browser.
Since the change, ASP.NET matches Google's crawlers to its mozilla.browser definition, because the word Mozilla appears in the crawler's browser information. The problem with using the mozilla.browser definition for a crawler is that it does not declare the standard features the crawler needs in order to index all web pages, so our pages were not being indexed correctly.

To correct this, we simply create a new .browser definition that closely matches the browser type Google is using, so that ASP.NET associates this definition with the Googlebot's browser information.

To do this, create a sub-directory called App_Browsers in your web application's root folder. Create a new file in this folder called genericmozilla5.browser and copy the XML below into it.

<browsers>
  <browser id="GenericMozilla5" parentID="Mozilla">
    <identification>
      <userAgent match="Mozilla/5\.(?'minor'\d+).*[Cc]ompatible; ?(?'browser'.+); ?\+?(http://.+)\)" />
    </identification>
    <capabilities>
      <capability name="majorversion" value="5" />
      <capability name="minorversion" value="${minor}" />
      <capability name="browser" value="${browser}" />
      <capability name="Version" value="5.${minor}" />
      <capability name="activexcontrols" value="true" />
      <capability name="backgroundsounds" value="true" />
      <capability name="cookies" value="true" />
      <capability name="css1" value="true" />
      <capability name="css2" value="true" />
      <capability name="ecmascriptversion" value="1.2" />
      <capability name="frames" value="true" />
      <capability name="javaapplets" value="true" />
      <capability name="javascript" value="true" />
      <capability name="jscriptversion" value="5.0" />
      <capability name="supportsCallback" value="true" />
      <capability name="supportsFileUpload" value="true" />
      <capability name="supportsMultilineTextBoxDisplay" value="true" />
      <capability name="supportsMaintainScrollPositionOnPostback" value="true" />
      <capability name="supportsVCard" value="true" />
      <capability name="supportsXmlHttp" value="true" />
      <capability name="tables" value="true" />
      <capability name="vbscript" value="true" />
      <capability name="w3cdomversion" value="1.0" />
      <capability name="xml" value="true" />
      <capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" />
    </capabilities>
  </browser>
</browsers>
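
To see what the userAgent pattern above actually captures, here is a minimal sketch in Python rather than ASP.NET. The pattern is a translation: .NET named groups `(?'name'...)` are rewritten as `(?P<name>...)` and the character class is written as `[Cc]`. The Googlebot and Firefox user-agent strings and the `identify` helper are assumptions for illustration, not part of the original definition.

```python
import re

# Python translation of the userAgent pattern from the .browser file.
# .NET writes named groups as (?'name'...); Python's re module needs (?P<name>...).
PATTERN = re.compile(
    r"Mozilla/5\.(?P<minor>\d+).*[Cc]ompatible; ?(?P<browser>.+); ?\+?(http://.+)\)"
)

# User-agent string Googlebot is commonly reported to send (assumed here).
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# An ordinary desktop browser string (assumed sample); it has no
# "compatible; ...; +http://..." part, so it should not match.
FIREFOX_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.4) "
    "Gecko/2008102920 Firefox/3.0.4"
)


def identify(user_agent):
    """Return the values the pattern would capture, or None if it does not match."""
    match = PATTERN.search(user_agent)  # unanchored search anywhere in the string
    if match is None:
        return None
    return {
        "browser": match.group("browser"),        # feeds ${browser}
        "version": "5." + match.group("minor"),   # feeds 5.${minor}
    }


print(identify(GOOGLEBOT_UA))  # {'browser': 'Googlebot/2.1', 'version': '5.0'}
print(identify(FIREFOX_UA))    # None
```

The captured `minor` and `browser` groups are what the `${minor}` and `${browser}` placeholders in the capabilities section refer to.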



Note: It may take up to a few months for the changes to take effect, depending on how well your website is indexed.
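
While waiting, one rough way to check the server side is to request a page yourself using a crawler-style user-agent string and look at what comes back. The sketch below assumes Python's standard urllib; the user-agent string and the helper names `build_request` and `fetch_as` are illustrative assumptions. It only shows what your server returns for that user-agent string, not what Google has actually indexed.

```python
import urllib.request

# User-agent string Googlebot is commonly reported to send (assumed here).
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a GET request that presents the given user-agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_as(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Fetch the URL and return (status code, response body as bytes).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, which is
    itself a useful signal that the page fails for this user agent.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

For example, calling `fetch_as` on one of your rewritten URLs before and after adding the .browser file, and comparing the status code and body with what a normal browser receives, gives a quick sanity check.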

posted on Tuesday, December 2, 2008 12:31:49 PM (South Africa Standard Time, UTC+02:00)