Blog Disclaimer

The posts published on this blog are collected from different blogs and websites written by various well-known bloggers and writers. I have only collected these posts; they are not written by me. All content provided on this blog is for informational purposes only and is not used for any commercial purpose. These posts are kept here purely for further reference and later review. At the end of every post the visitor can find a link to the original source, and I request all visitors to read the original post by clicking on that source link.

August 31, 2011

Google's Real Time Search Is Now Offline

Google's real-time search, launched in 2009 and expanded with significant features in 2010, has gone offline as Google's deal with Twitter expired.

Google sent this explanation:
Since October of 2009, we have had an agreement with Twitter to include their updates in our search results through a special feed, and that agreement expired on July 2.
While we will not have access to this special feed from Twitter, information on Twitter that's publicly available to our crawlers will still be searchable and discoverable on Google.

Twitter sent this:
Since October 2009, Twitter has provided Google with the stream of public tweets for incorporation into their real-time search product and other uses. That agreement has now expired. We continue to provide this type of access to Microsoft, Yahoo!, NTT Docomo, Yahoo! Japan and dozens of other smaller developers. And, we work with Google in many other ways.

For its part, Google said:
 Twitter has been a valuable partner for nearly two years, and we remain open to exploring other collaborations in the future.

Google has temporarily disabled google.com/realtime. They are exploring how to incorporate Google+ into this functionality, so stay tuned.

Some other features removed by Google are the Wonder Wheel and Image Search by Dimension.
As part of the initial stage of the Google redesign that they'll be rolling out over the next few months, Google has removed the Wonder Wheel search tool.

Google said they made the change for three reasons:

•    Focus: Whether you’re searching, emailing or looking for a map, the only thing you should be concerned about is getting what you want. Our job is to provide the tools and features that will get you there quickly and easily. With the design changes in the coming weeks and months, we’re bringing forward the stuff that matters to you and getting all the other clutter out of your way. Even simple changes, like using bolder colors for actionable buttons or hiding navigation buttons until they’re actually needed, can help you better focus on only what you need at the moment.
•    Elasticity: In the early days, there was pretty much just one way to use Google: on a desktop computer with an average-sized monitor. Over a decade later, all it takes is a look around one’s home or office at the various mobile devices, tablets, high-resolution monitors and TVs to see a plethora of ways to access the web. The new design will soon allow you to seamlessly transition from one device to another and have a consistent visual experience. We aim to bring you this flexibility without sacrificing style or usefulness.
•    Effortlessness: Our design philosophy is to combine power with simplicity. We want to keep our look simple and clean, but behind the seemingly simple design, use new technologies like HTML5, WebGL and the latest, fastest browsers to make sure you have all the power of the web behind you.

Source : http://googleblog.blogspot.com/

10 Status Code Definitions

Each Status-Code is described below, including a description of which method(s) it can follow and any metainformation required in the response.

10.1 Informational 1xx

This class of status code indicates a provisional response, consisting only of the Status-Line and optional headers, and is terminated by an empty line. There are no required headers for this class of status code. Since HTTP/1.0 did not define any 1xx status codes, servers MUST NOT send a 1xx response to an HTTP/1.0 client except under experimental conditions.
A client MUST be prepared to accept one or more 1xx status responses prior to a regular response, even if the client does not expect a 100 (Continue) status message. Unexpected 1xx status responses MAY be ignored by a user agent.
Proxies MUST forward 1xx responses, unless the connection between the proxy and its client has been closed, or unless the proxy itself requested the generation of the 1xx response. (For example, if a proxy adds a "Expect: 100-continue" field when it forwards a request, then it need not forward the corresponding 100 (Continue) response(s).)

10.1.1 100 Continue

The client SHOULD continue with its request. This interim response is used to inform the client that the initial part of the request has been received and has not yet been rejected by the server. The client SHOULD continue by sending the remainder of the request or, if the request has already been completed, ignore this response. The server MUST send a final response after the request has been completed. See section 8.2.3 for detailed discussion of the use and handling of this status code.

10.1.2 101 Switching Protocols

The server understands and is willing to comply with the client's request, via the Upgrade message header field (section 14.42), for a change in the application protocol being used on this connection. The server will switch protocols to those defined by the response's Upgrade header field immediately after the empty line which terminates the 101 response.
The protocol SHOULD be switched only when it is advantageous to do so. For example, switching to a newer version of HTTP is advantageous over older versions, and switching to a real-time, synchronous protocol might be advantageous when delivering resources that use such features.

10.2 Successful 2xx

This class of status code indicates that the client's request was successfully received, understood, and accepted.

10.2.1 200 OK

The request has succeeded. The information returned with the response is dependent on the method used in the request, for example:
GET an entity corresponding to the requested resource is sent in the response;
HEAD the entity-header fields corresponding to the requested resource are sent in the response without any message-body;
POST an entity describing or containing the result of the action;
TRACE an entity containing the request message as received by the end server.
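For illustration (not part of the specification text), here is a minimal Python sketch, using only the standard-library http.client module, that issues a GET and checks for a 200 response; the host and path are hypothetical:

    import http.client

    # Hypothetical host and path, used purely for illustration.
    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/index.html")
    resp = conn.getresponse()

    if resp.status == 200:
        # For GET, the message-body is an entity corresponding to the resource.
        body = resp.read()
        print("200 OK,", len(body), "bytes received")
    conn.close()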

10.2.2 201 Created

The request has been fulfilled and resulted in a new resource being created. The newly created resource can be referenced by the URI(s) returned in the entity of the response, with the most specific URI for the resource given by a Location header field. The response SHOULD include an entity containing a list of resource characteristics and location(s) from which the user or user agent can choose the one most appropriate. The entity format is specified by the media type given in the Content-Type header field. The origin server MUST create the resource before returning the 201 status code. If the action cannot be carried out immediately, the server SHOULD respond with 202 (Accepted) response instead.
A 201 response MAY contain an ETag response header field indicating the current value of the entity tag for the requested variant just created, see section 14.19.
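As a sketch of the behavior described above (the endpoint and JSON payload are hypothetical, not part of the specification), a client creating a resource can check for 201 and read the Location and ETag headers:

    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    # Hypothetical collection URI and payload.
    conn.request("POST", "/widgets",
                 body='{"name": "sample"}',
                 headers={"Content-Type": "application/json"})
    resp = conn.getresponse()

    if resp.status == 201:
        print("Created at:", resp.getheader("Location"))
        print("Entity tag:", resp.getheader("ETag"))
    conn.close()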

10.2.3 202 Accepted

The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place. There is no facility for re-sending a status code from an asynchronous operation such as this.
The 202 response is intentionally non-committal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent's connection to the server persist until the process is completed. The entity returned with this response SHOULD include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.

10.2.4 203 Non-Authoritative Information

The returned metainformation in the entity-header is not the definitive set as available from the origin server, but is gathered from a local or a third-party copy. The set presented MAY be a subset or superset of the original version. For example, including local annotation information about the resource might result in a superset of the metainformation known by the origin server. Use of this response code is not required and is only appropriate when the response would otherwise be 200 (OK).

10.2.5 204 No Content

The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.
If the client is a user agent, it SHOULD NOT change its document view from that which caused the request to be sent. This response is primarily intended to allow input for actions to take place without causing a change to the user agent's active document view, although any new or updated metainformation SHOULD be applied to the document currently in the user agent's active view.
The 204 response MUST NOT include a message-body, and thus is always terminated by the first empty line after the header fields.

10.2.6 205 Reset Content

The server has fulfilled the request and the user agent SHOULD reset the document view which caused the request to be sent. This response is primarily intended to allow input for actions to take place via user input, followed by a clearing of the form in which the input is given so that the user can easily initiate another input action. The response MUST NOT include an entity.

10.2.7 206 Partial Content

The server has fulfilled the partial GET request for the resource. The request MUST have included a Range header field (section 14.35) indicating the desired range, and MAY have included an If-Range header field (section 14.27) to make the request conditional.
The response MUST include the following header fields:
      - Either a Content-Range header field (section 14.16) indicating
        the range included with this response, or a multipart/byteranges
        Content-Type including Content-Range fields for each part. If a
        Content-Length header field is present in the response, its
        value MUST match the actual number of OCTETs transmitted in the
        message-body.
      - Date
      - ETag and/or Content-Location, if the header would have been sent
        in a 200 response to the same request
      - Expires, Cache-Control, and/or Vary, if the field-value might
        differ from that sent in any previous response for the same
        variant
If the 206 response is the result of an If-Range request that used a strong cache validator (see section 13.3.3), the response SHOULD NOT include other entity-headers. If the response is the result of an If-Range request that used a weak validator, the response MUST NOT include other entity-headers; this prevents inconsistencies between cached entity-bodies and updated headers. Otherwise, the response MUST include all of the entity-headers that would have been returned with a 200 (OK) response to the same request.
A cache MUST NOT combine a 206 response with other previously cached content if the ETag or Last-Modified headers do not match exactly, see 13.5.4.
A cache that does not support the Range and Content-Range headers MUST NOT cache 206 (Partial) responses.
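For illustration only, here is a minimal Python sketch of a byte-range request against a hypothetical URL; a server that honors the Range header replies with 206 and a Content-Range field as described above, while a server that ignores it simply returns 200 with the full entity:

    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    # Request only the first 1024 bytes of a (hypothetical) large file.
    conn.request("GET", "/large-file.bin",
                 headers={"Range": "bytes=0-1023"})
    resp = conn.getresponse()

    if resp.status == 206:
        print("Content-Range:", resp.getheader("Content-Range"))
        chunk = resp.read()
        print("Received", len(chunk), "bytes")
    elif resp.status == 200:
        # The server ignored the Range header and sent the full entity.
        print("Server does not support ranges; full body returned")
    conn.close()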

10.3 Redirection 3xx

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. A client SHOULD detect infinite redirection loops, since such loops generate network traffic for each redirection.
      Note: previous versions of this specification recommended a
      maximum of five redirections. Content developers should be aware
      that there might be clients that implement such a fixed
      limitation.

10.3.1 300 Multiple Choices

The requested resource corresponds to any one of a set of representations, each with its own specific location, and agent-driven negotiation information (section 12) is being provided so that the user (or user agent) can select a preferred representation and redirect its request to that location.
Unless it was a HEAD request, the response SHOULD include an entity containing a list of resource characteristics and location(s) from which the user or user agent can choose the one most appropriate. The entity format is specified by the media type given in the Content-Type header field. Depending upon the format and the capabilities of the user agent, selection of the most appropriate choice MAY be performed automatically. However, this specification does not define any standard for such automatic selection.
If the server has a preferred choice of representation, it SHOULD include the specific URI for that representation in the Location field; user agents MAY use the Location field value for automatic redirection. This response is cacheable unless indicated otherwise.

10.3.2 301 Moved Permanently

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.
The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).
If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.
      Note: When automatically redirecting a POST request after
      receiving a 301 status code, some existing HTTP/1.0 user agents
      will erroneously change it into a GET request.
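For illustration (not part of the specification text), a small Python sketch using the standard-library http.client module, which does not follow redirects on its own, so the 301 and its Location header can be inspected directly; the host and path are hypothetical:

    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/old-page")   # hypothetical moved resource
    resp = conn.getresponse()

    if resp.status == 301:
        print("Moved permanently to:", resp.getheader("Location"))
        # Per the rules above, a client may repeat GET/HEAD automatically,
        # but must confirm with the user before redirecting other methods.
    conn.close()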

10.3.3 302 Found

The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.
The temporary URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).
If the 302 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.
      Note: RFC 1945 and RFC 2068 specify that the client is not allowed
      to change the method on the redirected request.  However, most
      existing user agent implementations treat 302 as if it were a 303
      response, performing a GET on the Location field-value regardless
      of the original request method. The status codes 303 and 307 have
      been added for servers that wish to make unambiguously clear which
      kind of reaction is expected of the client.

10.3.4 303 See Other

The response to the request can be found under a different URI and SHOULD be retrieved using a GET method on that resource. This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource. The new URI is not a substitute reference for the originally requested resource. The 303 response MUST NOT be cached, but the response to the second (redirected) request might be cacheable.
The different URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).
      Note: Many pre-HTTP/1.1 user agents do not understand the 303
      status. When interoperability with such clients is a concern, the
      302 status code may be used instead, since most user agents react
      to a 302 response as described here for 303.

10.3.5 304 Not Modified

If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code. The 304 response MUST NOT contain a message-body, and thus is always terminated by the first empty line after the header fields.
The response MUST include the following header fields:
      - Date, unless its omission is required by section 14.18.1
If a clockless origin server obeys these rules, and proxies and clients add their own Date to any response received without one (as already specified by [RFC 2068], section 14.19), caches will operate correctly.
      - ETag and/or Content-Location, if the header would have been sent
        in a 200 response to the same request
      - Expires, Cache-Control, and/or Vary, if the field-value might
        differ from that sent in any previous response for the same
        variant
If the conditional GET used a strong cache validator (see section 13.3.3), the response SHOULD NOT include other entity-headers. Otherwise (i.e., the conditional GET used a weak validator), the response MUST NOT include other entity-headers; this prevents inconsistencies between cached entity-bodies and updated headers.
If a 304 response indicates an entity not currently cached, then the cache MUST disregard the response and repeat the request without the conditional.
If a cache uses a received 304 response to update a cache entry, the cache MUST update the entry to reflect any new field values given in the response.
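A conditional GET that can produce a 304 might look like the following sketch (the URL and entity tag are hypothetical; in practice the validator comes from a previously cached response being revalidated):

    import http.client

    # Hypothetical entity tag taken from a cached 200 response.
    cached_etag = '"abc123"'

    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/index.html",
                 headers={"If-None-Match": cached_etag})
    resp = conn.getresponse()

    if resp.status == 304:
        # No message-body; the cached entity is still fresh.
        print("Not modified; reuse the cached copy")
    elif resp.status == 200:
        print("Entity changed; new ETag:", resp.getheader("ETag"))
    conn.close()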

10.3.6 305 Use Proxy

The requested resource MUST be accessed through the proxy given by the Location field. The Location field gives the URI of the proxy. The recipient is expected to repeat this single request via the proxy. 305 responses MUST only be generated by origin servers.
      Note: RFC 2068 was not clear that 305 was intended to redirect a
      single request, and to be generated by origin servers only.  Not
      observing these limitations has significant security consequences.

10.3.7 306 (Unused)

The 306 status code was used in a previous version of the specification, is no longer used, and the code is reserved.

10.3.8 307 Temporary Redirect

The requested resource resides temporarily under a different URI. Since the redirection MAY be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.
The temporary URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s) , since many pre-HTTP/1.1 user agents do not understand the 307 status. Therefore, the note SHOULD contain the information necessary for a user to repeat the original request on the new URI.
If the 307 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

10.4 Client Error 4xx

The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents SHOULD display any included entity to the user.
If the client is sending data, a server implementation using TCP SHOULD be careful to ensure that the client acknowledges receipt of the packet(s) containing the response, before the server closes the input connection. If the client continues sending data to the server after the close, the server's TCP stack will send a reset packet to the client, which may erase the client's unacknowledged input buffers before they can be read and interpreted by the HTTP application.

10.4.1 400 Bad Request

The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.

10.4.2 401 Unauthorized

The request requires user authentication. The response MUST include a WWW-Authenticate header field (section 14.47) containing a challenge applicable to the requested resource. The client MAY repeat the request with a suitable Authorization header field (section 14.8). If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user SHOULD be presented the entity that was given in the response, since that entity might include relevant diagnostic information. HTTP access authentication is explained in "HTTP Authentication: Basic and Digest Access Authentication" [43].
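As a sketch (hypothetical URL and credentials, shown with the Basic scheme only for illustration), a client reacting to a 401 by retrying with an Authorization header:

    import base64
    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/private")
    resp = conn.getresponse()

    if resp.status == 401:
        print("Challenge:", resp.getheader("WWW-Authenticate"))
        resp.read()  # drain the body before reusing the connection

        # Hypothetical credentials.
        token = base64.b64encode(b"user:secret").decode("ascii")
        conn.request("GET", "/private",
                     headers={"Authorization": "Basic " + token})
        retry = conn.getresponse()
        print("After authenticating:", retry.status, retry.reason)
    conn.close()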

10.4.3 402 Payment Required

This code is reserved for future use.

10.4.4 403 Forbidden

The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.

10.4.5 404 Not Found

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.

10.4.6 405 Method Not Allowed

The method specified in the Request-Line is not allowed for the resource identified by the Request-URI. The response MUST include an Allow header containing a list of valid methods for the requested resource.
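A short sketch (hypothetical resource) showing how a client can read the Allow header when a method is rejected:

    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    conn.request("DELETE", "/read-only-resource")  # hypothetical
    resp = conn.getresponse()

    if resp.status == 405:
        # The Allow header lists the methods the resource does accept.
        print("Allowed methods:", resp.getheader("Allow"))
    conn.close()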

10.4.7 406 Not Acceptable

The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request.
Unless it was a HEAD request, the response SHOULD include an entity containing a list of available entity characteristics and location(s) from which the user or user agent can choose the one most appropriate. The entity format is specified by the media type given in the Content-Type header field. Depending upon the format and the capabilities of the user agent, selection of the most appropriate choice MAY be performed automatically. However, this specification does not define any standard for such automatic selection.
      Note: HTTP/1.1 servers are allowed to return responses which are
      not acceptable according to the accept headers sent in the
      request. In some cases, this may even be preferable to sending a
      406 response. User agents are encouraged to inspect the headers of
      an incoming response to determine if it is acceptable.
If the response could be unacceptable, a user agent SHOULD temporarily stop receipt of more data and query the user for a decision on further actions.

10.4.8 407 Proxy Authentication Required

This code is similar to 401 (Unauthorized), but indicates that the client must first authenticate itself with the proxy. The proxy MUST return a Proxy-Authenticate header field (section 14.33) containing a challenge applicable to the proxy for the requested resource. The client MAY repeat the request with a suitable Proxy-Authorization header field (section 14.34). HTTP access authentication is explained in "HTTP Authentication: Basic and Digest Access Authentication" [43].

10.4.9 408 Request Timeout

The client did not produce a request within the time that the server was prepared to wait. The client MAY repeat the request without modifications at any later time.

10.4.10 409 Conflict

The request could not be completed due to a conflict with the current state of the resource. This code is only allowed in situations where it is expected that the user might be able to resolve the conflict and resubmit the request. The response body SHOULD include enough information for the user to recognize the source of the conflict. Ideally, the response entity would include enough information for the user or user agent to fix the problem; however, that might not be possible and is not required.
Conflicts are most likely to occur in response to a PUT request. For example, if versioning were being used and the entity being PUT included changes to a resource which conflict with those made by an earlier (third-party) request, the server might use the 409 response to indicate that it can't complete the request. In this case, the response entity would likely contain a list of the differences between the two versions in a format defined by the response Content-Type.

10.4.11 410 Gone

The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.

10.4.12 411 Length Required

The server refuses to accept the request without a defined Content-Length. The client MAY repeat the request if it adds a valid Content-Length header field containing the length of the message-body in the request message.

10.4.13 412 Precondition Failed

The precondition given in one or more of the request-header fields evaluated to false when it was tested on the server. This response code allows the client to place preconditions on the current resource metainformation (header field data) and thus prevent the requested method from being applied to a resource other than the one intended.

10.4.14 413 Request Entity Too Large

The server is refusing to process a request because the request entity is larger than the server is willing or able to process. The server MAY close the connection to prevent the client from continuing the request.
If the condition is temporary, the server SHOULD include a Retry-After header field to indicate that it is temporary and after what time the client MAY try again.

10.4.15 414 Request-URI Too Long

The server is refusing to service the request because the Request-URI is longer than the server is willing to interpret. This rare condition is only likely to occur when a client has improperly converted a POST request to a GET request with long query information, when the client has descended into a URI "black hole" of redirection (e.g., a redirected URI prefix that points to a suffix of itself), or when the server is under attack by a client attempting to exploit security holes present in some servers using fixed-length buffers for reading or manipulating the Request-URI.

10.4.16 415 Unsupported Media Type

The server is refusing to service the request because the entity of the request is in a format not supported by the requested resource for the requested method.

10.4.17 416 Requested Range Not Satisfiable

A server SHOULD return a response with this status code if a request included a Range request-header field (section 14.35), and none of the range-specifier values in this field overlap the current extent of the selected resource, and the request did not include an If-Range request-header field. (For byte-ranges, this means that the first-byte-pos of all of the byte-range-spec values were greater than the current length of the selected resource.)
When this status code is returned for a byte-range request, the response SHOULD include a Content-Range entity-header field specifying the current length of the selected resource (see section 14.16). This response MUST NOT use the multipart/byteranges content-type.

10.4.18 417 Expectation Failed

The expectation given in an Expect request-header field (see section 14.20) could not be met by this server, or, if the server is a proxy, the server has unambiguous evidence that the request could not be met by the next-hop server.

10.5 Server Error 5xx

Response status codes beginning with the digit "5" indicate cases in which the server is aware that it has erred or is incapable of performing the request. Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. User agents SHOULD display any included entity to the user. These response codes are applicable to any request method.

10.5.1 500 Internal Server Error

The server encountered an unexpected condition which prevented it from fulfilling the request.

10.5.2 501 Not Implemented

The server does not support the functionality required to fulfill the request. This is the appropriate response when the server does not recognize the request method and is not capable of supporting it for any resource.

10.5.3 502 Bad Gateway

The server, while acting as a gateway or proxy, received an invalid response from the upstream server it accessed in attempting to fulfill the request.

10.5.4 503 Service Unavailable

The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
      Note: The existence of the 503 status code does not imply that a
      server must use it when becoming overloaded. Some servers may wish
      to simply refuse the connection.
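As a rough sketch (hypothetical host and path; real code would also cap total wait time and handle date-formatted Retry-After values), a client can honor a delta-seconds Retry-After on a 503 like this:

    import http.client
    import time

    def fetch_with_retry(host, path, max_attempts=3):
        """Retry a GET on 503, honoring a delta-seconds Retry-After header."""
        for attempt in range(max_attempts):
            conn = http.client.HTTPConnection(host)
            conn.request("GET", path)
            resp = conn.getresponse()
            if resp.status != 503:
                data = resp.read()
                conn.close()
                return resp.status, data
            retry_after = resp.getheader("Retry-After")
            conn.close()
            # Fall back to a short default delay if none is given.
            delay = int(retry_after) if retry_after and retry_after.isdigit() else 5
            time.sleep(delay)
        return 503, b""

    # Hypothetical host and path.
    status, body = fetch_with_retry("www.example.com", "/busy-endpoint")
    print(status)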

10.5.5 504 Gateway Timeout

The server, while acting as a gateway or proxy, did not receive a timely response from the upstream server specified by the URI (e.g. HTTP, FTP, LDAP) or some other auxiliary server (e.g. DNS) it needed to access in attempting to complete the request.
      Note to implementors: some deployed proxies are known to
      return 400 or 500 when DNS lookups time out.

10.5.6 505 HTTP Version Not Supported

The server does not support, or refuses to support, the HTTP protocol version that was used in the request message. The server is indicating that it is unable or unwilling to complete the request using the same major version as the client, as described in section 3.1, other than with this error message. The response SHOULD contain an entity describing why that version is not supported and what other protocols are supported by that server.


Source : http://www.w3.org

What is reputation management?

What is reputation management? Specific to SEO, and in layman’s terms, it’s about removing negative listings from search results for specific keywords.
All of what’s been detailed here has been tested within the last 3 months (the age of Panda) and has worked for me.

Example scenarios where you need reputation management

To further clarify my definition, let’s use practical scenarios.
Scenario 1 – Oops, I made a mistake
You’re drunk, it’s late and you decide to leave a stupid comment on a blog post. Unfortunately, that post is on a very strong site and now shows up in searches for your name. Turns out that you’re looking for a new job, and you’re sure that a quick Google enquiry will bring up a moment of idiocy that is sure to diminish your chances of getting hired.
Scenario 2 – That annoying “scam” word
You’re a new start up, and despite all of your kindness, great customer service and big smiles…you now have the word scam associated with your brand. Get mad. Now take deep breaths and stay with me :)
Scenario 3 – The disgruntled customer (aka Trolls)
Seems like your customers genuinely have some complaints, and they decide to vent their frustration on blogs and forums and even make a YouTube video detailing how much they dislike you.
Scenario 4 – The enemy
Someone genuinely dislikes you, knows a bit about SEO and spends 3 hours a day trying to ruin your reputation whilst eating Quavers (Cheetos for North Americans) in their basement yelling at their mom to bring down their laundry because they ran out of fresh pants (NA – underwear) last week due to a Star Trek TV marathon.
There are plenty of other reasons why you’ll need to manage your reputation online, but these are some of the reasons why you’ll need to fight back.
*Keep in mind that I am only dealing with Google in this post, as I haven’t tested any methods for other search engines or social media platforms.

Influencing search suggest

Search suggest: As you type your query into the Google search bar, you’ll notice that Google will suggest words based on your query.
How do those words get there? / How can I change it?
There are a couple of theories – based on search volume and sheer volume of listings. There are three guys that I absolutely respect and trust 100% on this, and I encourage you to read up on how this works…because I can’t/won’t tell you ;)
  1. Brent Payne
  2. Rishi Lakhani
  3. Martin MacDonald
Please note that this activity isn’t exactly legal, so proceed with caution and at your own risk.

Making “scam” happen – Typical culprits

So explain why my brand has scam associated with it?
There are people and companies out there who purposely optimize pages for your {brand} + {scam} for a variety of reasons. Usually the mention of your brand + scam in autocomplete (Google suggest) tends to have search volume, but it can also happen because there is indexed content optimized for those keywords. I wish I could explain how this works in depth, but this is a very touchy subject and it’s best left for those who know the law inside out.


This is the best post I’ve ever read on Ripoffreport.com by Seomoz: The anatomy of a rip off report
How easy is it to create fake “scam” reports?
Very easy. A friend of mine created an article on scaminformer.com which was automatically approved, indexed, and ranking on page 3 for a fake company name – all within 15 minutes!

Tactics that work – Even after the Panda crashed the party

The biggest mistake that companies make is to put all of their energy into taking down negative listings. You’ve got to think differently: unless the negative listing is worthy of a DMCA takedown, you need to put up more listings to push it down. Please note that this isn’t a short process, nor is it universally applicable – you’ll need patience and persistence to make this happen.
In many cases you can push up other listings to knock down the negative listing; otherwise, you need to get as many listings as possible on that search results page that YOU CONTROL.
How do I know that this works? I’ve done it recently, and struggled with so many different white hat tactics that just didn’t work. Once I found some white hat SEO tactics that did work, I felt I needed to share my experiences here.
The tactics are covered below in discussion order, so you can skip to whichever you like:

Exact Match Domains – Individual Reputation Management

The best thing you can do to combat negative listings is to buy the exact match domain name for your personal name. Make sure you buy the appropriate TLD.
Example: In the UK, buy .co.uk or .org.uk
Does someone own your brand+scam.com/co.uk/net? If not, go buy it and hold on to it so no one else can get it.
Get some good content on that domain and make sure you’re perfectly optimized for it. I’m going to repeat this, you need GOOD content – you need to make sure your pages are perfectly optimized, then you need to get some links. Start off with directory links and see how you get on – if directory links don’t bring you to page 1, you’ll need to either get better links or better content.
The only time saver is to get WordPress and install Yoast’s SEO plugin for your SEO needs.
Actions:
  1. Buy exact match keyword domain
  2. Populate with good content
  3. Optimize for search
  4. Implement schema.org markup – see schema section below
  5. Build directory links first – slow and steady (No more than 300 links per month)

Press releases

This is perfect if you’ve got something newsworthy. If you don’t, find an interesting angle and get it out there. I recommend using MarketWire.
Why a press release?
They are powerful, and tend to get picked up by loads of sites that are connected to Press syndication software such as MarketWire. They also allow for anchor text, and can flood a search result if done properly. Sometimes, websites will take the information from your release and not attribute links back to you – if this occurs you should pick up the phone and make sure you get credit.
Actions:
  1. Get a newsworthy angle, find data to support it
  2. Write a professional press release
  3. Submit through service like MarketWire
  4. Get on the phone / send emails to bloggers, journalists etc…

Schema Microdata

If you have no idea what Schema.org is, you need to read up on this now. This is the future of webpage markup that allows you to specifically declare what your page is about beyond html.
Before I get lynched by professional SEOs: I conducted preliminary tests on a few non-competitive queries (<1,000 local search volume), and I believe that the schema markup was the sole reason for their jump in the ranks.
In one case specifically, I implemented the schema.org /Person markup for an individual and jumped 4 positions. No jokes here, get out there and do it on every domain you control – not only because of what I’ve seen, but because it’s just good practice.
If you want me to investigate further and blog about schema microdata influencing rankings, click here to vote ;)
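As a rough sketch (the person's details below are hypothetical), the schema.org /Person markup discussed above is plain HTML with itemscope, itemtype and itemprop attributes; the small Python helper here just renders such a block:

    def person_microdata(name, job_title, url):
        """Render a minimal schema.org /Person microdata block (hypothetical data)."""
        return (
            '<div itemscope itemtype="http://schema.org/Person">\n'
            '  <span itemprop="name">{0}</span>\n'
            '  <span itemprop="jobTitle">{1}</span>\n'
            '  <a itemprop="url" href="{2}">{2}</a>\n'
            '</div>'
        ).format(name, job_title, url)

    # Hypothetical example values.
    print(person_microdata("Jane Example", "Consultant", "http://www.example.com/"))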

Editorial content from your site optimized for negative keyword

Get creative here, and use the negative word + your brand as a title for a blog post/article etc..
Example: www.example.com/scam-in-the-xyz-industry.html
Don’t want to use your own site? No problem, start a new blog – either way, just get it done.

Exact match anchor text

Yes, that’s right, build some links with exact match anchor text to the listings already present for the search query you’re trying to clean up. Don’t overdo it, but you definitely do need to build specific anchor text.
I hate to say this, but directories and press releases can do this easily for you. Just try your very best to submit these to decent directory sites.
Now that we’re in the Panda phase, I have tried article marketing and it just doesn’t work anymore. It was crap before, but now it seems to be almost completely devalued. About 1 out of every 10 directory links seems to work (try to choose a less populated category), and steer clear of automated submission programs like SEnuke, articledemon.com and XRumer. A quick word for those who are new to SEO: these products might seem like a godsend, promising high rankings, blah blah blah…do you really think that Google doesn’t know about this stuff? Using these types of programs will reverse your hard work; you’ve been warned.

Social Media Profiles

Regardless of whether you’re a business or an individual, strong social media profiles can be interlinked and fed low quality links without risk of being penalized (unless you absolutely abuse it – please don’t do that).
Start here:
I could explain, but why re-invent the wheel? Here’s an excerpt from Rand Fishkin on the SEOmoz blog:
“#6: Leverage Lower Quality Links for Social Profiles, Higher Quality for Self-Managed Domains
I’d never suggest buying crappy links, but if you must or if you have other links you control that are of questionable quality or you think search engines might consider low value or even manipulative, don’t point these to your newly registered domains or the sites you own. Instead, point them at the powerful, high authority social profile pages you’ve created and let the engines decide what/whether to count them.
This works particularly well for nofollow links from comments, wikis and other social participation forms on the web. I’m not sure whether the nofollows directly get counted or if the pages get scraped and re-published in some followed format, but time after time I’ve seen examples of nofollows seemingly doing the heavy lifting to get social profile pages ranking.
If you own some old, neglected sites that are questionable in quality and rankings from the engines’ point of view, you could try testing these by pointing them to other social profile pages (and observing/testing the impact on those URLs’ rankings) before pointing them at your own profiles. Better to be safe than sorry, and there have been plenty of cases where aggressive SEOs have gone too far with linking to social profiles and had either the search engines penalize the pages or even the site administrators pull down the profile, wasting hours or days of work.”

Blog / forum activity

Yes, you should comment on blogs and forums and leave a link in your signature. It works, just please don’t be spammy about it. There are genuine blogs and forums out there that you need to interact with – because they are part of your industry. This way, you can read their content, engage in the community and add backlinks to the listings you need to push up.
Please don’t discriminate against nofollow links, as they are necessary as well.

Wikipedia

If you notice that Wikipedia ranks within the first 3 pages for your negative listing’s query, you’ll need to leverage its power. If the article is a stub (incomplete), do some research and fill it out (images, resources, links, text). If the article is well written and complete, build links to it until you see movement.
This principle of building links to strong results (strong domains) to push them up the ladder applies to every SERP. News, Government and University sites are prime candidates for this type of activity….

Tactics in the grey area

Super huge disclaimer

I do not recommend any of the following, and neither does Distilled.
The only reason I’m adding this section is to inform you of tactics that others can use against you – you need to understand what the possibilities are in order to fight them. Without further ado:
1) www.CC.CC
This is a free domain service that creates your own custom cc.cc domain name and automatically 301 redirects to the following profile pages:
Blogger, Gmail, Google sites, Tumblr, Bandcamp, Hotmail, Facebook, Twitter, Myspace
2) Social Media mentions through fiverr.com
3) PPC for uncompetitive verticals
Got some money to burn? Adding 3 PPC ads might just be enough to drop that negative listing below the fold…
4) Pushing an EMD to become a “brand” now that Google is using large sitelinks
5) Flooding
This involves pushing out tons of content – and I mean tons (advertorials, blog posts, guest blog posts, articles, etc…) until those negative articles are lost.
6) Youtube videos – Youtube SEO
Specifically for search results that contain video results on pages 1 to 3.
7) Image search
If image results appear, it’s probably because the query is known to generate click throughs on image search. Take a wild guess…
8) Mechanical Turk
If you like what you see in this blog post, why not learn more from the Distilled team and other industry experts at our SearchLove conferences held in London and New York.

Posted by David Sottimano

Why You Might Want to Make Some PPC Bid Reductions

An important part of a pay-per-click campaign manager’s job is to find the best bid for a specific ad, keyword and budget combination. This is a subtlety of PPC management that is often overlooked or misunderstood. People spend a considerable amount of time doing ad optimization and building negative keyword lists, but they may not test incremental bid levels to see how they affect click volume.

Bigger (Bidding) Isn’t Always Better

Frequently I see very high bids set where they don’t make sense; generally people tell me they set them that high “to get the most clicks.” The rationale is that if you bid higher, your ad will be placed higher and you’ll get more clicks. This may be the case for some, but it’s rarely the best idea. If you have a limited budget and you bid higher, you may get fewer clicks because you will exhaust your budget faster, as happened with this campaign in mid-July. (Note: I didn’t do this ;) )

Testing bid changes can result in huge savings if done properly. In most cases you have a specific budget to spend and need to get a certain level of return on ad spend. Whether your desired result is traffic or conversions, you want to get the most for the lowest cost possible. If you have a bid set high enough to allow your ad to display in the first few positions for relevant keywords, undoubtedly you will get some clicks. But, at a lower bid you may be able to get more clicks if your budget is limited. Or, with a lower bid you may be able to reduce your cost per acquisition (CPA) without dramatically decreasing your conversion volume. The principle is simple but the execution isn’t.

The Sweet Spot

The “sweet spot” is the ad position where you get the most value. If you are optimizing based on conversions, this will be the position where you get an acceptable level of conversions for the lowest CPA. If you bid higher, you may get more conversions at an acceptable CPA but the CPA may be high enough that those additional conversions aren’t worthwhile.
For instance, let’s say you sell a product that costs $200 and you have a 2% conversion rate*. You may increase your bid and get additional clicks that are still profitable for you, and the increased revenue may be tempting. However, those extra 5 conversions actually come at a very high price.
If you knew you could get a high volume of conversions at a $75 CPA you may decrease your bid. You would optimize your campaign to maximize the number of clicks you get at that lower CPC to increase the conversion volume. But, if you don’t know that you can get a large percentage of your conversions for a much lower cost, you may leave your bid at the higher level. This is why testing is so important.
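To make the arithmetic concrete, here is a small illustrative calculation in Python. The $75 CPA and the 5 extra conversions echo the scenario above; the click volumes and CPCs are hypothetical numbers chosen only to reproduce them, not real campaign data:

    # Hypothetical figures, for illustration only.
    product_price = 200.00        # revenue per conversion
    conv_rate = 0.02              # 2% conversion rate

    # Lower bid: cheaper clicks, fewer of them.
    low_clicks, low_cpc = 1000, 1.50
    # Higher bid: more clicks, but each click costs more.
    high_clicks, high_cpc = 1250, 2.40

    def summary(clicks, cpc):
        cost = clicks * cpc
        conversions = clicks * conv_rate
        return cost, conversions, cost / conversions   # blended CPA

    low_cost, low_conv, low_cpa = summary(low_clicks, low_cpc)
    high_cost, high_conv, high_cpa = summary(high_clicks, high_cpc)

    # The extra conversions bought by raising the bid:
    extra_conv = high_conv - low_conv                  # 5 conversions
    incremental_cpa = (high_cost - low_cost) / extra_conv

    print("Blended CPA at the lower bid:  $%.2f" % low_cpa)    # $75.00
    print("Blended CPA at the higher bid: $%.2f" % high_cpa)   # $120.00
    print("Incremental CPA of the extra %.0f conversions: $%.2f"
          % (extra_conv, incremental_cpa))                     # $300.00

With these assumed numbers the campaign is still profitable overall at the higher bid, but each of the extra conversions costs $300 against $200 of revenue, which is exactly the trap the paragraph above describes.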
To find the “sweet spot” where your cost per click is low but you are still generating an acceptable level of clicks you need to test many bid levels to find where you get your desired actual CPC and conversion rate. The best way to do this is using AdWords Campaign Experiments (ACE) so you only impact a percentage of your impressions with the bid changes and so you can easily undo the changes if need be. Once you’ve created your experiment you need to determine how much you are going to reduce the bids. There are two possible strategies:
  1. Start with small bid decreases to minimize the chance of a major shift in traffic. This tactic will require longer to test as you’ll need to run a test on the small bid change, report on the results then run another test with the bid decreased slightly more and report on those results and continue repeating the test until you find the bid at which you are generating the most clicks for the lowest CPC. This is the best strategy if you can’t risk a major decrease in click volume.
  2. Start with a large bid decrease to see an immediate impact, then if click volume decreases dramatically begin testing progressively higher bids until you find the sweet spot. This is the best strategy if you have a higher tolerance for changes in click volume and want to see faster results.

Things to Consider

  • CTR. One concern I hear regularly when I recommend decreased bids is that the CTR will go down, resulting in a decrease in Quality Score. AdWords normalizes CTR by ad position, recognizing that CTR will be lower in lower ad positions; therefore, you are not penalized as long as your CTR is in line with what others are getting in that position.
  • Impression Share. If you decrease your bids too low, you may end up not getting any clicks because your bid isn’t high enough to participate in the auction. Keep a close eye on your impression share to make sure you aren’t losing too much with your decreased bids.
  • Ad Copy. Different ad copy works better in different ad positions. You will want to test different ad variations in different positions to see what works best in your market.
  • High Ad Positions Don’t Equal More Conversions. I have experienced ads in the top 1 or 2 positions having lower conversion rates, albeit higher traffic overall, than ads in lower positions. This may be due to a host of reasons, including people just clicking the first result without really looking at what the ad is for. As you test ad position, you need to keep a close eye on conversion rate and bounce rate to see which position is optimal for your site.

If you’ve been following my posts on PPC, you’ve no doubt noticed a trend: testing, testing, testing. You should always be testing something in your ad campaigns, and you should test every facet of your campaign to make consistent improvements in performance. Never be satisfied with “good enough” just because you are meeting your goals. When you meet your goals, create new, more difficult goals and optimize until you reach them.
*For the sake of keeping the math simple we will assume a constant conversion rate although it does fluctuate and should be tracked along with CPC and position changes.


Written By Pamela Lund

August 30, 2011

Google INFOGRAPHIC

(See the original source for the full infographic.)

August 23, 2011

Know Your Competitors

Who are your competitors? It’s one of the first questions we ask prospective clients. It sounds easy enough to answer, but you’d be surprised. Through this one little question we’re able to learn a bit more about the client and their space, get an idea of who they feel they’re up against, and then determine whether or not their idea at all matches reality. Sometimes it matches pretty well and other times, well, other times their idea is simply adorable. ;)
One thing we all know for sure is that times have changed. It used to be that your competitors were the businesses in your area that sold the same product or performed the same service as you. But thanks to the Internet and to the way many search engines display information, this is no longer the case. In order to outwit your competition and dominate sales, you first have to understand who they are. Because you can’t be out for blood until you know whose blood you’re out for, right?
Right.
Below are the five types of competitors you’ll have to consider and plan for.

1. Brick and Mortar Competition

Typically, these are the competitors you have a pretty good handle on. They’re the faces that you fake smile at when you run into them at the local grocery store or whose weekly fliers you slyly throw in the trash so that others can’t see them. If you’re Staples, your brick and mortar competitors are mom and pops like Joe’s Office Supplies, as well as local chains of OfficeMax, OfficeDepot, etc. They’re the stores that both sell a similar product AND that are located in your area. But you know these guys.

2. Competition That Ranks For Your Keywords

Thanks to the Internet, it’s not just the local companies offering similar services that you need to worry about. You also have to be aware of your search competitors – the businesses that are stealing customers by ranking for the keywords you want to be found for. Thanks to the Internet, it doesn’t matter that Jenny’s Computer Depot is a one-woman shop run out of a basement in Idaho, if Jenny is ranking on the first page for your search terms when you’re banished to the second, she’s probably stealing your customers and your retirement fund.
For example, let’s go back to our Staples example.
If you’re Staples and you want to rank for [hanging file folders], your competition isn’t just the brick and mortar guys that exist 15 miles from your storefront.

Thanks to the Internet, you’re also up against:
  • Amazon
  • The Container Store
  • Smead
  • Sams Club
  • DiscountOfficeItems.com
  • Walmart
  • And many, many others
There doesn’t need to be a Container Store within 200 miles of your storefront. If their Web site is showing up above yours in the search results, that’s a direct competitor. And you need to create an SEO plan to handle that.
While I was doing some research for this post I stumbled upon TopRank’s recent post entitled In Search, Your Competition Isn’t Who You Think, where Lee Odden gives some great tips on how to overtake your search competitors. I’d give it a read.

3. Competitors Whose MEDIA Ranks For Your Keyword

Did you see that Shopping One Box listed above the results in the last screen shot? No? Well, here it is again.

Meet your other competitors – the businesses who go through the side door while everyone is trying to push through the front. This is a big reason why it’s so important to not only create digital assets related to your brand, but to optimize them for search. Because with the engines looking for this content and placing it directly into the search results, it creates a new kind of search competitor. One where you either have the goods to compete or you don’t. Obviously, this doesn’t just apply for Shopping results, you’ll also want to look at businesses stealing your search thunder via News, Blogs, Images, Video, etc. If someone is ranking above you, you want to know about it.

4. Companies Google Says Are Your Competition

If Google’s the one ranking your business against your peers (quite literally, actually), then it makes sense to understand who Google thinks you’re similar to, no? Just some food for thought.

5. Share of Buzz Competitors

Thanks to social media, there’s also another nagging competitor to think about – the businesses who are embarrassing you in the Share of Voice department. These are the businesses that sell similar products or services as you but who seem to be involved in every darn social conversation. People are tweeting their stuff, sharing it on Facebook, and referencing them 24/7 while your brand pretty much bobs up and down in the sea of obscurity.
Am I trying to intimidate you by showing you just how much competition there is out there today? No, I’m not. But it’s important that you’re aware of it, and that when you take a look at your competition or create systems to monitor their actions, you’re looking at the right sites and in the right direction. Because times have changed and you need to make sure you’re focusing on the proper SEO strategy to help you succeed in a land where customers have more options and more distractions than ever before.


by Lisa Barone - the Chief Branding Officer of Outspoken Media
Source : Outspoken Media

August 22, 2011

Guidelines for Spying on Your PPC Competitors

In the corporate world, we prefer to use the softer, more professional reference to competitive intelligence gathering. Whatever you prefer to call it, keeping track of competitors is something we all do in one way or another.

The Value Of PPC Competitive Intelligence

I think the first two things that usually come to mind for PPC competitive analysis are keywords and bids. We all want to know what keywords our competitors are using and how much they are paying for them. This certainly would be important data to have, but the real value of competitive intelligence comes when you know not only what your competitors are doing right now, but what they are likely to do and when they are likely to do it.
It is much more interesting to go beyond the obvious tactical questions and try to answer larger, more strategically important questions, such as:
  • How big is our opportunity?
  • How much are our competitors investing in paid search?
  • Are we investing enough in paid search?
I like to delineate our three primary uses for competitive intelligence: benchmarking and forecasting, tracking day-to-day competitor tactics, and forensics.
Benchmarking/Forecasting
By collecting and analyzing information about your competitors, you can start to develop models to understand what your competitors are investing in PPC, individually and as a group.
In some cases, you can find direct information by reading SEC filings, press releases and even their Twitter feeds. Once you establish your current place in the competitive pecking order, you can develop forecast models to plan for your own future PPC ad spends.
Tracking day-to-day tactics
In highly competitive and seasonally-active markets, it is important to know what your competitors are doing and when they do it. It is never a good idea to have your boss or your client make you aware that a key competitor has been running a promotion for the past two weeks that you could have known about within hours or days.
With the right PPC tracking programs in place, you should know very quickly when competitors change ads, significantly change bids (drastic shifts in their or your average ad positions), offer special price promotions and develop new landing pages. The more quickly you can discover these changes, the more quickly you can counter their moves.
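To make that kind of tracking concrete, here is a minimal sketch of an ad-change detector, assuming you capture your own daily SERP snapshots; the data shapes and function name are illustrative, not taken from any particular PPC tool.

# Minimal sketch: flag competitor ad-copy changes between two SERP snapshots.
# Assumes you capture the snapshots yourself; the dictionary format here is
# a hypothetical example, not any tool's actual output.

def detect_ad_changes(yesterday, today):
    """Each snapshot maps an advertiser's display domain to its ad text."""
    changes = []
    for domain, ad_text in today.items():
        previous = yesterday.get(domain)
        if previous is None:
            changes.append((domain, "new advertiser"))
        elif previous != ad_text:
            changes.append((domain, "ad copy changed"))
    for domain in yesterday:
        if domain not in today:
            changes.append((domain, "ad no longer showing"))
    return changes

yesterday = {"example-store.com": "Free Shipping On File Folders"}
today = {"example-store.com": "20% Off File Folders This Week"}
print(detect_ad_changes(yesterday, today))
# [('example-store.com', 'ad copy changed')]

Run daily, even something this simple surfaces promotions and copy tests long before a boss or client points them out.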
Forensics
In the wild-west of the Internet, there are always cowboys who play fast and loose with the rules, and a constant influx of new competitors who may or may not actually know what the rules are.
A good competitive intelligence program will allow you to identify and quickly address things like ad double-serving, non-authorized use of your trademarks, ads that violate editorial policies, websites where your content has been completely copied, suspicious clicking patterns from a competitor’s IP address, and so on.
The sooner you identify bad behavior and report it to Google, Microsoft, and/or your legal team, the sooner you can dampen the impact of a competitor’s improper PPC advertising.

Studying Your Competitors

There is plenty of information you can glean from search results pages through simple observation and by asking yourself a few smart questions along these lines:
Ads
  • How many ads are they running?
  • How quickly do they react to ad copy changes you make?
  • How often do they put new ads into rotation?
Landing Pages
  • How many different landing pages do they use?
  • What sort of offers/calls to action do they use that you don’t?
  • How often do they update landing pages?
  • How good are their SEO and page construction on their website/landing pages?
Tracking
  • Are they tracking clicks? Ads? Keywords?
  • What sort of analytics do they use?
These are just a few starter questions that will help you gauge your competitors’ level of sophistication and aggressiveness.
The more time you spend and the deeper you go with your competitive intelligence program development, the more questions you’ll develop. It doesn’t take long before you can start to make predictions about your competitors’ behavior and how they are likely to respond to your own campaign adjustments.

Manual vs. Automated Data Collection

While it is possible to collect plenty of data manually, you will quickly realize that tools can greatly improve not only the accuracy of your data collection, but also the volume and breadth.
For example, your repeat searches on Google from the same IP address will give you much different results than others will typically see in the SERPs.
Keyword research and competitive analysis tools can range anywhere from free to over $10K per month, and there are plenty of them out there. The table below lists a few tools we’ve used from time to time to collect data and some of their key features and approximate costs.
(I apologize upfront to any tool vendors not on the list, or if I’ve misstated any of your current features. Please feel free to correct the record in the comments section below.)
PPC Competitive Analysis Tools
All tools have their unique methodologies for gathering and reporting data, so results can be wildly different from tool to tool.
Though each tool claims to be better than all the others, we don’t trust any particular data point, but instead, use the data for approximations and to see how the data changes over time. We typically triangulate by using more than one tool for any specific task, especially bid and market share information, and we use our own campaigns as a control.
I’d say AdGooroo is our favorite intelligence tool because of the way it visually presents the data on competitors using rank order charts and data tables. It is also the only tool we know of that collects display ad intelligence, too, a very powerful feature.
SpyFu has a few nice features that show when advertisers change ads, and charts that show comparative overlap of keyword spaces. Keyword Competitor has some near real-time reporting, and while it is a relatively new tool, it is one worth evaluating.

Ethical Considerations

As with all competitive intelligence activities, it is easy to lose sight of right and wrong, and even of what is legal or illegal. There are many shades of gray in this marketplace.
For example, a strict reading of the Google AdWords Terms of Service could mean that simply clicking on a competitor’s ad is a prohibited activity, as excerpted below:
“4 Prohibited Uses; License Grant; Representations and Warranties. Customer shall not, and shall not authorize any party to: (a) generate automated, fraudulent or otherwise invalid impressions, inquiries, conversions, clicks or other actions…”
Would a single click on a competitor’s ad be deemed invalid and illegal? Certainly if a competitor clicks on one of my ads, I’d consider that invalid. Is it illegal based on AdWords TOS? I don’t suppose Google is interested in shutting down every advertiser who has ever clicked on competitor’s ad, but at the same time, they certainly would be within their rights to do so, and likely reserve it for abusive behavior such as repeated clicks by advertisers maliciously doing so to drive up advertising costs for a competitor.
The ethical dilemma is clear. If it is a generally accepted practice in our market to occasionally click on a competitor's ads for the purpose of competitive tracking, knowing it is not allowed, but knowing that Google won't likely enforce this policy, is it ethical to do it?
That is a risk assessment and decision each of us needs to make, of course, but again an area where I believe the golden rule applies. Do unto others as you would have them do unto you.

Source : http://searchengineland.com/

August 18, 2011

On Page Optimization For Multiple Keywords

When you began learning SEO, someone probably told you not to optimize a webpage for more than two or three queries. At the 101 level, treating each keyword with the same reverence makes a lot of sense. It is easier to teach and execute. Now, at the 303 level, we want to wring out every last drop of SEO copy goodness we can get.
In this article I will lay out several steps on how to do this. How formally you execute them depends on your situation. If you are unfamiliar with the topic or working on a single page, you may want to stick close to this script. However, if you know the subject you are writing about well, some of this can be done more intuitively or you can combine steps.

Relevance vs. Authority

Optimizing one page for multiple keywords is about relevance and authority. Let’s begin with a little theory.
I like to divide SEO by relevance and authority. It’s simple and easy for clients to grasp that relevance is how germane a document is to a search query, while authority gives a page the strength necessary to secure a top ranking.
The combination of relevance and authority is an attempt to measure quality, and over the years search engines have developed lots of ways to determine quality.
They also employ algorithms like Query Deserves Freshness, Query Deserves Diversity, geo-location, search history and more to improve search results.
However, the foundation of ranking documents in search is still relevance plus authority.
This tutorial is about creating on-page relevance for many keywords. It doesn’t matter whether you’re executing a Web search, news, image, shopping or any other type of search. Search engines need identifiable signals within the HTML code and text to match a document to a query.

Understanding Relevance

What demonstrates relevance? While the presence of a keyword or phrase within a document is the most obvious signal, there is more to it than sticking a keyword onto a page a few times.
  • The highest relevance comes from exact matches. If your query is Major League Baseball No-Hitters, then that is exactly what search engines try to find.
  • They will look for all the words on the same page, first in sub-groups like Major League Baseball then as individual words: no-hitters, league, baseball and major.
  • Finally, search engines will look for any of the words on the page.
By the time you get to that last bullet, your page might still be relevant for Major League Baseball No-Hitters, but it will be at the bottom of the list and unlikely to rank.
At this point you might think, why bother? Or, because the topic is optimizing a webpage for multiple keywords, the light bulb inside your head may be going off.
  • If you can write phrases into your page that incorporate the most important keywords on the page you may rank for several long tail keywords.
  • If you can work additional search queries into the copy you might rank for them too.
  • If you can use single words or short phrases that are parts of queries you may earn additional rankings from unexpected queries.

Create A Vocabulary

During a conversation about baseball one might hear words like bat, ball, bases, diamond, hit, run, earned run average, percent on base, team, league, series and dozens more.
Gather these words and use them in your writing. I believe natural language patterns create relevance in search engine algorithms.
But even if you are skeptical, including words that get used together frequently may net additional long tail matches. It also makes it easier to write interesting copy. Good places to look for your vocabulary are:
  • Keyword tool suggestions
  • Alternative search query suggestions in ranking results
  • The dictionary and thesaurus
  • Top ranking pages
  • Wikipedia and other authoritative sites
  • Books on the topic
As you build your vocabulary, be certain to go beyond nouns and include frequently used verbs, adjectives and adverbs.

Selecting Multiple Keywords

Select the top one to three keywords using your normal keyword research and selection process. These will be your page’s high priority keywords. Once you have these you can add medium and low priority keywords to your list.
When optimizing one page for many keywords, priority refers to importance within a page. If a word is important enough to your entire website or business, it may be better optimized as a high priority keyword on its own webpage.
Then again, a medium priority phrase on a page with lots of off-site links or authority may be easier to rank for than by placing the same search query as a high priority keyword on a separate page with lower authority.
If you are unsure, test both options with different key phrases to see how the search engines react. Some old school SEO practitioners will tell you authority usually trumps relevance.
To select the medium priority keywords, use your vocabulary to generate a list of words and short phrases. Order these by search volume or rate them by how likely someone might search for each. Because you will have longer phrases that do not appear in keyword tools, this can come down to a judgment call.
Move obvious long tail keywords into the low priority list. Now, go through the leftover words and phrases from top to bottom and ask whether or not you can write a section about it. If you answer yes, add the query to the medium priority list. Otherwise, assign it a low priority.
This is where it helps to understand keyword difficulty. Medium and low priority keywords ought to be easier to rank for than the high priority keywords.
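As a rough illustration of the triage described above, the sketch below buckets candidate phrases into medium and low priority. The thresholds and the can_write_section flag are my own assumptions standing in for the judgment calls the article describes.

# Rough sketch of the keyword triage above. Inputs and thresholds are
# illustrative; can_write_section stands in for the human judgment of
# whether a whole section could be written about the phrase.

def assign_priorities(high_priority, candidates):
    """candidates: list of (phrase, monthly_volume, can_write_section)."""
    medium, low = [], []
    # Work through the remaining phrases by search volume, highest first.
    for phrase, volume, can_write_section in sorted(
            candidates, key=lambda c: c[1], reverse=True):
        if len(phrase.split()) >= 4:   # treat obvious long tail as low priority
            low.append(phrase)
        elif can_write_section:
            medium.append(phrase)
        else:
            low.append(phrase)
    return {"high": high_priority, "medium": medium, "low": low}

print(assign_priorities(
    ["no-hitters"],
    [("knuckleball pitchers", 1300, True),
     ("pitchers who threw no hitters in the 1970s", 90, False)]))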

Keyword Placement

Now you are ready to write and optimize your webpage. In SEO 101, you want to get all your keywords into the title tag, the #h1 headline, image alt tags and, of course, the actual body text.
Obviously, this will not be possible and you will have to make some choices. Relevance is generated by a combination of multiple signals. If you cannot perfect one thing, strengthen something else.

Title Tag

It is difficult to get three complete keywords into the title tag within 70 characters. How are you going to get even more keywords in there?
Either create a longer title or leave some key phrases out. Yes, shorter titles are more relevant. Yes, keywords at the front of titles are more relevant. But while search engines may only display the first 70 or so characters, they will index much longer titles.
  • Set a maximum length of 200 characters (This is the number I suggest, but search engines can index longer titles.)
  • Include the high priority keywords in the first 70 characters
  • Continue the title tag using medium priority keywords
  • If you cannot get all your medium priority keywords into the title, use ones that include words that appear frequently throughout the medium priority keyword list
  • Include individual words contained in the medium priority list
  • Massage it all together to make the title meaningful and readable
Don’t worry too much about how your title tag will appear in a browser or on the search engines. If a page appears in the SERPs, search engines will display an excerpt of the title that contains the keyword.
As for the appearance in the browser, the title does not appear until someone is on the page. At that point, a clean headline is far more important.
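Here is a small sketch of those title-tag rules. The 70 and 200 character limits come from the article itself; the separator, function name and warning are my own simplifications.

# Sketch: high priority keywords in the first 70 characters, medium
# priority phrases appended after that, hard stop at 200 characters.

MAX_TITLE = 200
VISIBLE = 70

def build_title(high_priority, medium_priority, separator=" | "):
    title = separator.join(high_priority)
    if len(title) > VISIBLE:
        print("warning: high priority keywords exceed the visible %d characters" % VISIBLE)
    for phrase in medium_priority:
        candidate = title + separator + phrase
        if len(candidate) > MAX_TITLE:
            break
        title = candidate
    return title

print(build_title(
    ["Major League Baseball No-Hitters"],
    ["Perfect Games", "Knuckleball Pitchers Who Threw No-Hitters"]))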

#H1 Headline

In the on-page headline, I suggest sticking to the top tier keywords. It is more important to have a clear, readable headline than to stuff it full of words and phrases you want to rank for. Incorporate individual words that appear frequently within your medium priority list.

On-Page Content

I am terrifically keen on #h2 sub-headlines when it comes to multiple keyword SEO. Not because search engines consider the #h2 tag to be a strong ranking signal, but because it affords the opportunity to write entire sections about medium priority keywords, complete with image and optimized alt tag.

Knuckleball Pitchers Who Threw No-Hitters

Try to create a section of on-page content for every first and second tier keyword. It does not always work, but each time you can, get that keyword into an #h2 and write optimized content.

Knuckleball Pitchers Who Threw No Hitters

<img src="knuckleball-pitchers-threw-no-hitters.jpg" alt="Knuckleball pitchers who threw no hitters" class="align-left" /> Lorem ipsum dolor sit knuckleball pitchers who threw no hitters amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation pitching ullamco laboris threw nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in no hitter voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non no hitters thrown by knuckleball pitchers proident, sunt in culpa qui officia deserunt mollit anim id est knuckleballs laborum.
After you finish writing the first draft of your page, revisit it and insert medium priority keywords where they fit naturally.
Just as you would with basic SEO copywriting, use different forms of the words (singulars/plurals, past/present/future tense, adverbs, etc.)

Third Tier Keywords

Revisit your entire page to make minor rewrites to sprinkle in the third tier keywords where they will make sense and sound natural.

Break-up Your Keywords & Revisit Your Copy

Break apart all of your keywords into two word phrases and single words. Go through your copy again, this time rewriting to insert these.
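A quick sketch of that break-up step, purely for illustration:

# Break each keyword phrase into two-word phrases and single words,
# as described above, so you can look for natural places to work them in.

def break_up(keywords):
    singles, pairs = set(), set()
    for phrase in keywords:
        words = phrase.lower().split()
        singles.update(words)
        pairs.update(" ".join(words[i:i + 2]) for i in range(len(words) - 1))
    return sorted(pairs), sorted(singles)

pairs, singles = break_up(["knuckleball pitchers who threw no hitters"])
print(pairs)   # ['knuckleball pitchers', 'no hitters', 'pitchers who', 'threw no', 'who threw']
print(singles) # ['hitters', 'knuckleball', 'no', 'pitchers', 'threw', 'who']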

Revisit Your Vocabulary

Yet another pass. This time look for opportunities to replace generic words like pronouns with specific terms. For example, "he went around the bases" might become "the base runner rushed around the diamond."

Smooth It Out

This is the final pass. Read your SEO copy out loud. If any of it trips you up or sounds robotic, rewrite it until your entire page sounds natural. Try to keep all your SEO goodness, but be willing to sacrifice some of it in favor of readability.


Source : Search Engine Land

August 16, 2011

Google Local Search Glossary

The following is a collection of terms and definitions from a number of Google’s patent filings on Local Search. It’s possible that not everything discussed in these patent applications has been incorporated into Google Local search – but the interesting thing about many of these patent filings is exploring whether or not they may have been.
Search query categorization for business listings search
Local Search Result with Pasta category
Category Classification Component – Finds appropriate categories for searchers’ queries. May use yellow page business listings or a category classification model automatically trained from different possible training data sources.

Category Classification Model – Based on training data from sources such as yellow page listings, categorized business web sites, consumer reports information, restaurant guides, query traffic data, and advertisement traffic data. Uses statistics to associate search queries with relevant business categories.
Directory listings – Business information may be taken from yellow page type directory listings, such as those compiled by various phone companies. These listings include business categories as well as business names associated with each of those categories.
Miscellaneous Pre-Classified Business Data – From sources like consumer reports information, restaurant guides, or web-based directory listings. Web pages about a specific business contain words fitting into specific categories which may be used to modify categories that a business appears within. Example: a business listed on a page with words on it about “Italian Restaurants” may be placed in an “Italian Restaurant” category.
Query traffic data – Searchers' selections from searches may be used by the classification component to classify businesses when their query terms are ambiguous. Example: someone searches for "films" and they receive business listings from a "theater" category and a "photographic film" category. If they select listings from the "photographic film" category, the classification component may modify the probability that the query shows "photographic film" category results.
Advertisement accompanying a boats Miami search
Advertisement traffic data – When a searcher selects a displayed advertisement, that may indicate that the advertisement was relevant to the search query. The search query and the category of the selected advertisement may be considered training data that can be used to modify or initially train the category classification model in a manner similar to the training performed for query traffic data.
Scoring local search results based on location prominence
Location Prominence – A system may identify a first document associated with a geographic location within a geographical area and identify a second document associated with a geographic location outside the geographical area. The system may also assign a first score to the first document based on a first scoring function and assign a second score to the second document based on a second and possibly different scoring function. These scoring functions can be related to distances, and to other scoring factors. Location prominence may refer to a score generated for a document based on one or more factors unrelated to the geographical area with which the document is associated, the searches performed by users, and/or the search queries provided by the users. Location prominence may use factors that are intended to convey the “best” documents for the geographical area rather than documents based solely on their distance from a particular location within the geographical area.
The location prominence score may be based on a set of factors that are unrelated to the geographical area over which the user is searching. This set may include one or more (a combination) of the following factors:
  1. A score associated with an authoritative document;
  2. The total number of documents referring to a business associated with the document;
  3. The highest score of documents referring to the business;
  4. The number of documents with reviews of the business;
  5. The number of information documents that mention the business (such as Dine.com, Citysearch, and Zagat.com).
The set of factors may include additional or different factors.
These factors may possibly be combined with distance scores in some instances.
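The patent filing lists these factors but not how they are weighted or combined, so the sketch below simply assumes a weighted sum over normalized factor values to show the general idea; the weights are arbitrary, not Google's.

# Illustrative combination of the location prominence factors listed above.

def location_prominence(authority_score, referring_docs_score,
                        best_referring_doc_score, review_docs_score,
                        info_mentions_score, weights=(0.4, 0.2, 0.2, 0.1, 0.1)):
    # Each argument is assumed to already be normalized to [0, 1];
    # the weights are an arbitrary illustrative choice.
    factors = (authority_score, referring_docs_score, best_referring_doc_score,
               review_docs_score, info_mentions_score)
    return sum(w * f for w, f in zip(weights, factors))

# A listing referenced and reviewed by many documents scores higher
# than one that is barely mentioned anywhere.
print(location_prominence(0.8, 0.9, 0.9, 0.7, 0.6))   # 0.81
print(location_prominence(0.5, 0.1, 0.4, 0.05, 0.1))  # 0.315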
Map segment showing a centerpoint for Washington, DC, near the White House.
Centerpoint – When scoring local search results, the search engine may identify a location within the geographical area. It may be associated with the location of city hall, downtown, or a geographic center of the area, or based upon some other centerpoint using geographic information. The local search engine identifies all business listings and/or web pages within a predetermined radius of the identified location. The local search engine may then identify those business listings and/or web pages that match the search query. The identified business listings and/or web pages are assigned distance scores according to their distance from the identified location and ranked based on their scores.
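As a rough illustration of that radius-and-distance approach, the sketch below uses a haversine great-circle distance; the 30-mile radius and the scoring-by-raw-distance are simplifying assumptions on my part.

# Sketch of the centerpoint approach: pick a center, keep listings within
# a fixed radius, and rank them by distance (closest first).

from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3959.0

def miles_between(a, b):
    """Great-circle distance between two (lat, lon) points in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))

def rank_by_distance(centerpoint, listings, radius_miles=30.0):
    """listings: list of (name, (lat, lon)). Returns (distance, name) pairs in range."""
    in_range = []
    for name, location in listings:
        d = miles_between(centerpoint, location)
        if d <= radius_miles:
            in_range.append((d, name))
    return sorted(in_range)

dc_center = (38.8977, -77.0365)  # near the White House, as in the map example
listings = [("Pizza A", (38.90, -77.03)), ("Pizza B", (39.29, -76.61))]
print(rank_by_distance(dc_center, listings))  # only Pizza A falls inside the radius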
Postal Codes – A geographical area might be identified by a set of postal codes allocated to it. The system may determine a postal code associated with a document, check whether that postal code is included in the set allocated to the geographical area, score the document with a first scoring function when it is, and score it with a second scoring function when it is not.
Latitude and longitude coordinates – A geographical area might instead be identified by a set of latitude and longitude coordinates. The system may determine a latitude and longitude coordinate associated with a document, check whether that coordinate falls within the set associated with the geographical area, score the document with a first scoring function when it does, and score it with a second scoring function when it does not.
Combination Scores – A score may be assigned to a document based on a combination of two or more of: a score associated with another document identified as authoritative for it, the total number of documents referring to a business associated with the document, the highest score among the documents referring to the business, the total number of documents with reviews of the business, and the number of information documents that mention the business. That combined score may then be used to rank the document.
Broad Area – May be identified as being associated with the search query. Is intended to refer to any geographic location that is specified as an incomplete postal address (i.e., less than a full postal address). Any geographic location that is identified by less than a street name and street number can be considered a broad area. A broad area may include a city, a zip code, a street, a city block, a state, a country, a district, a county, a metropolitan area, a large area (e.g., Lake Tahoe area), a combination of areas (e.g., Sunnyvale and Mountain View), etc. When a search query includes information regarding a geographical area, then the broad area may be identified from the search query.
Zcodes – If a search query includes the phrase “Mountain View,” then the broad area may be identified as “Mountain View.” A set of “zcodes” may be identified that correspond to the broad area. These could be postal codes, such as a U.S. Postal Service zip code in the United States, or something similar to a zip code outside the United States. A set of zcodes corresponding to a broad area may include those zip codes that have been allocated to the geographical area associated with the broad area.
Example:
For the Mountain View example above, assume that the set of zcodes includes the zip codes 94039, 94040, 94041, 94042, and 94043. To compress space, the zcode sets may be stored as a series of ranges. In the case of Mountain View, the zcode set may be stored as 94039:5, which corresponds to the zip code range of 94039 to 94043. If a zip code is unallocated to any other broad area, then it may be added to the range of a surrounding or adjacent zcode set. For example, if the zip code 94044 is unallocated, then it may be added to the Mountain View zcode set.
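The range compression is easy to sketch. The start:count encoding below mirrors the 94039:5 example from the patent; the helper names are my own.

# Collapse consecutive zip codes into start:count ranges, and check
# membership with a simple range test.

def compress_zcodes(zcodes):
    """Collapse sorted numeric zip codes into compressed start:count strings."""
    ranges = []
    for z in sorted(zcodes):
        if ranges and z == ranges[-1][0] + ranges[-1][1]:
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + 1)
        else:
            ranges.append((z, 1))
    return ["%05d:%d" % r for r in ranges]

def in_zcode_set(zipcode, compressed):
    for entry in compressed:
        start, count = entry.split(":")
        if int(start) <= zipcode < int(start) + int(count):
            return True
    return False

mountain_view = compress_zcodes([94039, 94040, 94041, 94042, 94043])
print(mountain_view)                       # ['94039:5']
print(in_zcode_set(94041, mountain_view))  # True
print(in_zcode_set(94301, mountain_view))  # False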
Top and left sides of map, with zoom and direction slider showing.  Boundaries of maps can be latitude and longitude coordinates.
Map Boundaries – The entire visible map area within a map window. If a search query doesn't include information regarding a geographical area, then the broad area may be identified in another way. If the user is accessing a map, the entire visible map area within the map window may be considered the broad area. So, a search for a business type or category while looking at Google Maps might use a broad area associated with the map. As the user zooms in or out, moves the map, and/or provides an identifier relating to a geographical area of interest, the broad area changes to match what is shown within the map window. The latitude and longitude of the map window may define the broad area.
Search Area – Associated with the broad area, a location within the broad area may be determined. This location may be associated with the location of city hall, a downtown area, a geographic center, or some other location within the broad area. A circle with a predetermined radius (e.g., 30 miles, 45 miles, 90 miles, etc.) may effectively be drawn around this location. The area of this circle may constitute the search area.
Relevant Documents – A relevant set of identified documents may be determined based on the search query, possibly based upon whether the documents contain the term(s) of the search query in their title, content, and/or category string. When the query includes multiple terms, documents that contain the terms as a phrase, include all of the terms but not necessarily together, contain fewer than all of the terms, or contain synonyms of the terms may be included in the relevant set.
Broad Area Relevant Documents – A determination is made for documents in the relevant set as to whether they fall within the broad area. If they do not, then a distance score may be calculated for those documents. The distance score associated with a document may be determined based on the distance the postal address and/or the latitude and longitude coordinate associated with the document is from the location within the broad area (e.g., the location representing the middle of the search area). If the document is within the broad area, then a location prominence score associated with the document may be determined.
Additional Scoring Factors – In addition to the scoring factors above for location prominence, it’s possible for other scoring factors to be used also. Examples:
  • Numeric scores of the reviews (e.g., how many stars or thumbs up/down),
  • Some function (e.g., an average) of all the scores of the reviews,
  • Type of document containing the review (e.g., a restaurant blog, Zagat.com, Citysearch, or Michelin),
  • Types of language used in the reviews (e.g., noisy, friendly, dirty, best),
  • Derived from user logs, such as what businesses users frequently click on to get detailed information and/or for what businesses they obtain driving directions,
  • Financial data about the businesses, such as the annual revenue associated with the business and/or how many employees the business has,
  • Number of years the business has been around or how long the business has been in the various listings, and
  • Others.
Methods and systems for endorsing local search results
Local Search Endorsements – Users associated with each other in a social network can create and share personalized lists of local search results and/or advertisements through their endorsements of local search results and/or ads. Those endorsements can be used to personalize the search engine’s ranking of local search results by letting users re-rank results for the people endorsing them, and for the people who trust those endorsers.
Local Search Endorsement Entries – Entries made in a social network including information associated with an endorsed local article. These can include a particular local search query, one or more article identifiers for local articles and/or ads that the user has endorsed for the local search query, and the kind of endorsement for each of the endorsed local articles and/or ads.
Methods and systems for improving a search ranking using location awareness
Location Awareness – Uses some combination of location score and topical score to order documents related to a query to improve search rankings for that query. It may also include selecting a set of documents from the group of documents, determining a distance score for each document in the set of documents using a document location associated with the document and the location associated with the query, and ordering the set of documents as a function of both the topical scores of the set of documents and the distance scores of the set of documents.
Location Sensitivity – A location component may analyze the query to determine a keyword, or a query topic. A location sensitivity of the identified topic or query is determined. Some topics are location sensitive, and some aren’t. Different topics, query types, users, geographic locales, etc. may influence a different determination of location sensitivity. The amount or extent to which geographically-based search results are relevant to the topic and a relevant geographic range for the topic may be decided by examining such things as user behavior (e.g., user selection behavior, such as mouseover or click through) of search results presented to the user. Examples of location sensitivity:
Topic: A topic, such as “pizza,” may be strongly associated with local documents or web pages (high location sensitivity), whereas a topic like “travel plans” may be less location sensitive.
Scale of default map on a search for pizza in Newark, Delaware.
Scale of default map on a search for travel plans in Newark, Delaware.
Query Types: Certain query types (e.g., commercial queries) may have different location sensitivity.
User Specific: Some users may specify a more local focus for their desired search results than other users, or may be determined to have a more local focus based, at least in part, on browsing history, search history, or transactional or other kinds of available data.
Location differences: One location, such as Manhattan, N.Y., might be more location sensitive compared to another geographic area, such as Camas County, Idaho (the most sparsely populated county in Idaho).
Specificity of Query: The specificity of a location term provided or inferred (e.g., a location specified by a user or a search query), such as a zip code versus a city versus a street address, may affect location sensitivity, as would information, such as a user specified maximum distance (“I’m willing to travel 30 miles to . . . “).
Example:
When a user types in search queries, such as “infinity auto” and “pizza,” a location component may determine associated topics of “car/automobile” and “restaurant.” The location component may determine the sensitivity of the topics “car/automobile” and “restaurant” to location-based search results. It may determine that users are generally more location sensitive for the topic “pizza” than for the topic “automobiles/cars,” so that users may generally be interested in documents on the topic of “automobiles/cars” that are farther away from their location, whereas users may generally only be interested in documents on the topic of “pizza” that are nearer to their location. Location sensitivity can be determined relatively, or can also be mapped to a distance (e.g., users are generally interested in documents with a distance of up to 50 miles for “automobiles/cars,” but only 5 miles for “pizza”).
Document Identification – The search engine looks for previously indexed relevant documents in a search database in response to a query. This document data can include a universal resource locator (URL) that provides a link to a document, web page, or to a location from which a document or web page can be retrieved or otherwise accessed by the user, data indicating one or more locations with which documents are associated, and data corresponding to the text of the documents.
Topic Score – Various information retrieval and other techniques used by conventional search engines are used to determine the relevance of a document, such as text information, link information and link structure, personalized information, etc. This topical score is generated from various sources and signals other than location information. A topic score is also used to find advertisements relevant to a target document.
Locations of pizza shops around a centerpoint, mostly based upon a distance score.
Distance Score – One or more locations is determined to be associated with each of the identified documents, and a distance score is calculated for each based, at least in part, on the distance between the location(s) associated with the document and the location associated with the search query. This distance could be based upon such things as:
  • Straight-line distance
  • Driving distance
  • Estimated driving time
Combined relevance score – The topical scores and distance scores could be merged to yield a combined relevance score for a document. Because the combined relevance score considers both the topical score and the distance score, ordering documents by the combined score may yield a different order than ordering them by topical score alone or by distance score alone. The patent application illustrates this with three documents, A, B, and C, whose topical and distance scores trade off against one another, so that the document ranked first on the combined score is not necessarily the most topically relevant or the closest one.
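The patent does not publish the combining function, so the sketch below assumes a simple weighted blend of a topical score and a distance-derived closeness score, just to show how the combined ordering can differ from ordering by topical relevance alone.

# Illustrative blend of topical relevance and distance; weights and the
# distance-to-closeness conversion are assumptions, not the patent's formula.

def combined_relevance(topical, distance_miles, max_distance=50.0, w_topic=0.6):
    # Convert distance into a closeness score in [0, 1]; closer is better.
    closeness = max(0.0, 1.0 - distance_miles / max_distance)
    return w_topic * topical + (1.0 - w_topic) * closeness

docs = {"A": (0.70, 2.0), "B": (0.90, 40.0), "C": (0.80, 25.0)}
ranked = sorted(docs, key=lambda d: combined_relevance(*docs[d]), reverse=True)
print(ranked)  # ['A', 'C', 'B'] -- B is the most topically relevant but ranks last because it is far away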
Location extraction
Location Extraction – During a Web search, the search terms may indicate the name of a geographic area, and a local search might be done when that geographic area is unambiguous enough.
Ambiguous Search Query – The names of some geographic areas correspond to common words (e.g., Mobile), and it can be hard to tell if a searcher was referring to a location in their search.
Unambiguous Search Query – A user-provided query clearly shows an intent for local search documents. A geographic reference may still not be completely unambiguous if it is hard to tell which geographic location was being requested, as may happen when a search includes a city name shared by more than one city.
Results shown when City name is ambiguous
Unambiguous City – If there are two cities with the same names in different states, this process may decide that the one with the largest population should be labeled as an unambiguous city. (The same may be done with counties.) Alternatively, a look at the searcher’s IP address may inform the search engine of which city was the one used in a query. Sometimes a searcher will be asked to choose which state they meant.
Blacklist – A blacklist may be maintained for unambiguous city names that, when combined with one or more other words, mean something other than their respective cities. For example, assume that the city of Orlando, Florida is an unambiguous city. When Orlando appears in a search query with the word Bloom, however, the user likely desires information associated with the actor “Orlando Bloom” and not information concerning flower shops in the city of Orlando. If the city name together with one or more other search terms of the query appear on the blacklist, then a regular web search may be performed based on the search term(s) of the query.
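A tiny sketch of that blacklist check, with an illustrative entry drawn from the Orlando Bloom example above; the data structures and function name are my own assumptions, not Google's.

# If an otherwise unambiguous city name appears alongside blacklisted terms,
# skip the local interpretation and fall back to a regular web search.

BLACKLIST = {("orlando", "bloom")}  # illustrative entry from the example above

def should_run_local_search(query_terms, unambiguous_cities):
    terms = [t.lower() for t in query_terms]
    for city in unambiguous_cities:
        if city in terms:
            others = [t for t in terms if t != city]
            if any((city, other) in BLACKLIST for other in others):
                return False  # blacklisted combination: do a regular web search
            return True       # treat the query as a local search for this city
    return False

print(should_run_local_search(["orlando", "florists"], {"orlando"}))  # True
print(should_run_local_search(["orlando", "bloom"], {"orlando"}))     # False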
Authoritative document identification
Authoritative Document – The identification of a document or web page (URL) that is associated with a business at a location. This system determines documents that are associated with a location, identifies a group of signals associated with each of the documents, and determines authoritativeness of the documents for the location based on the signals.
Candidate Documents – Documents associated with a particular location. They may be analyzed to identify snippets of text (where a snippet of text may be defined as a portion of a document or the entire document) that include information associated with the location, such as a full or partial address of the location, a full or partial telephone number associated with the location, and/or a full or partial name of a business associated with the location. Links from these may point to the authoritative document. Other signals may be viewed to determine which candidate document is the authoritative document amongst the group of candidates, such as domain names, business name used in anchor text, etc.
Document segmentation based on visual gaps
Document Segmentation – A document may be segmented based on a visual model of the document. The visual model is determined according to an amount of visual white space or gaps that are in the document. The visual model is used to identify a hierarchical structure of the document, which may then be used to segment the document.
Listings on a Web page, with addresses, and visual gaps between them.
Geographic Signals – Information related to a geographic location, such as a full or partial mailing address or telephone number, or the name of a business. A page may be filled with different geographical signals, which are segmented from each other by visual gaps. Example: a web page may include a list of restaurants in a particular neighborhood and a short synopsis or review of each restaurant. Or, a page may be filled with multiple reviews of the same restaurant, and segmentation may be used to separate those.
Indexing documents according to geographical relevance
Indexing by Geographical Relevance – Indexing documents relevant to a geographical area by indexing, for each document, multiple location identifiers that collectively define an aggregate geographic region. When creating the index, the search engine may determine a set of geographical areas surrounding a geographical area relevant to a document and associate references to the set of geographical areas with the document index.
Geographical Regions – With some local search engines, the local geographic region of interest is a region defined by a certain distance or radius from a starting location, such as a certain number of miles from a zip code or street address. Ideally, the local search engine should efficiently locate and return relevant results in the desired geographic region.
Location Identifiers – Documents in a database may each be associated with a geographical region. The region may be specified by a location identifier associated with the document. Location identifiers might be derived from a model of the Earth’s surface using a hierarchical grid, such as the well known Hierarchical Triangular Mesh (HTM) model.
Geographically Relevant Documents – Any document that, in some manner, has been determined to have particular relevance to a geographical location. Business listings, such as yellow page listings, for example, may each be considered to be a geographically relevant document that is relevant to the geographic region defined by the address of the business. Other documents, such as web pages, may also have particular geographical relevance. Example: a business may have a home page, may be the subject of a document that comments on or reviews the business, or may be mentioned by a web page that in some other way relates to the business. The particular geographic location for which a document is associated may be determined from postal address or from other geographic signals.
Aggregate Geographic Region – A local search engine efficiently indexes documents relevant to a geographical area by indexing, for each document, multiple location identifiers that collectively define an aggregate geographic region. When the index is used to respond to individual search queries, the aggregate geographic region may be efficiently searched by merely adding a location identifier to the search query.
Classification of ambiguous geographic references
Ambiguous Geographic References – Partial geographic information is associated with a document, which makes it difficult to classify as belonging to a specific geographical location.
Geo-Relevance Profile – A geographic location may be associated with a string of text in a document by looking at a geo-relevance profile that contains that geographic information. A geo-relevance profile is built by looking at a number of documents relating to a business at a specific location.
Known Geographic Signals – A known geographic signal may include, for example, a complete address that unambiguously specifies a geographic location. The geographic signal can be located by, for example, pattern matching techniques that look for sections of text that are in the general form of an address. For example, the location classifier engine may look for zip codes as five-digit integers located near a state name or state abbreviation, and street names as a series of numerals followed by a string that includes a word such as "street," "st.," "drive," etc. In this manner, the location classifier may locate the known geographic signals as sections of text that unambiguously reference geographic addresses.
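As an illustration of that kind of pattern matching, the sketch below uses simplified regular expressions for a zip code near a state abbreviation and for a street-number-plus-street-type string. The patent does not spell out its actual patterns at this level of detail, so these are assumptions.

# Simplified pattern matching for address-like "known geographic signals".

import re

ZIP_NEAR_STATE = re.compile(r"\b([A-Z]{2})\s+(\d{5})(?:-\d{4})?\b")
STREET = re.compile(r"\b(\d+)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+"
                    r"(Street|St\.|Drive|Dr\.|Avenue|Ave\.|Road|Rd\.)\b")

def find_geographic_signals(text):
    signals = []
    signals += [("zip", m.group(0)) for m in ZIP_NEAR_STATE.finditer(text)]
    signals += [("street", m.group(0)) for m in STREET.finditer(text)]
    return signals

sample = "Visit us at 123 Castro Street, Mountain View, CA 94041."
print(find_geographic_signals(sample))
# [('zip', 'CA 94041'), ('street', '123 Castro Street')]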
Known Geographic Regions – Documents that are determined to be associated with valid geographic signals are assumed to be documents that correspond to a known geographic region.
Training Text for Geographical Location Associations – Text selected as training text associated with a document could be chosen a number of ways. Examples: A fixed window (e.g., a 100 term window) around each geographic signal may be selected as the training text. The whole document may be selected. Or, documents with multiple geographic signals may be segmented based on visual breaks in the document and the training text taken from the segments.
Location Identifier Fields – Collected Information based upon types of geographic signals which are filled with text selected for each geographic signal. An example may be zip codes corresponding to the geographic signals.
Zip Codes – Postal codes, which can be used as a geographic signal. They tend to be particularly useful to use as an identifier for a geographic location because zip codes that are close to one another numerically tend to correspond to locations that are close to one another geographically.
Histograms – A way of mapping the occurrence of strings in text selections relative to the location identifiers for which the terms or phrases occur. The histogram can also be referred to as the geo-relevance profile of the term/phrase. Example: a histogram is created for the bi-gram "capitol hill." It might include three dominant peaks: a large peak centered in the vicinity of zip code 20515, which corresponds to the "Capitol Hill" area in Washington, D.C.; a relatively small peak centered in the vicinity of zip code 95814, which corresponds to the "Capitol Hill" area in Sacramento, Calif.; and a moderate peak centered in the vicinity of zip code 98104, which corresponds to the "Capitol Hill" area in Seattle, Wash. While references to "capitol hill" may involve other places, the histogram illustrates that overall, "capitol hill" tends to be used when referring to one of these three locations. Washington, D.C., which corresponds to the largest peak, can be interpreted as the most likely geographic region intended by a person using the phrase "capitol hill."
Two examples of histograms showing the number of occurrences of the phrases 'Capitol Hill' and 'Bay Area' relative to different geographic regions.
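A geo-relevance profile of this sort can be sketched as a simple count of phrase occurrences per zip code; the snippet data below is made up for illustration.

# Count how often a phrase appears in training text tied to each zip code,
# then read the peaks off the resulting histogram.

from collections import Counter

def geo_relevance_profile(phrase, training_snippets):
    """training_snippets: list of (zip_code, text) pairs."""
    profile = Counter()
    for zip_code, text in training_snippets:
        profile[zip_code] += text.lower().count(phrase.lower())
    return profile

snippets = [
    ("20515", "Lunch spots on Capitol Hill near the House office buildings."),
    ("20515", "Capitol Hill staffers recommend these Capitol Hill cafes."),
    ("98104", "Seattle's Capitol Hill neighborhood guide."),
]
profile = geo_relevance_profile("capitol hill", snippets)
print(profile.most_common())  # [('20515', 3), ('98104', 1)]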
Statistically Significant Spikes – Certain terms or phrases may appear to be relevant to a particular geographic location based upon their proximity to geographic location information in the training text. If certain phrases tend to be tied to certain locations in a way that appears meaningful, based upon the number of occurrences over the data collected from the training text, their appearance could be said to be statistically significant.
Local item extraction
Confidence Scores – When a system identifies a document that includes an address and locates business information, that system may assign a confidence score to the business information, where the confidence score relates to a probability that the business information is associated with the address. The system determines whether to associate the business information with the address based on the assigned confidence score.
Local Item Extraction – When looking at a document, the system attempts to assign a location and confidence scores to that document by looking at the business information on the page, at the terms that precede the address to see if any are a business name, and, if there are telephone numbers, at whether or not those numbers are associated with that business. Landmarks associated with the business may also be identified and assigned a confidence score.
Business Information – A business name (also referred to as a “title”), a telephone number associated with the address, other information related to a business.
Yellow Pages Data – Information commonly associated with a business that is taken from a telecom directory. Some addresses may not have associated yellow pages data or possibly incorrect yellow pages data. Businesses with associated yellow pages data may be used as part of a training set used to extract location information from pages that don’t have associated yellow pages data. The documents in the training set may be analyzed to collect features regarding how to recognize business information in a document when the document includes an address.
Training Set Features – These could include such things as a distance that a candidate term is from a reference point (e.g., the address in the document), characteristics of the candidate term, boundary information associated with the candidate term, and/or punctuation information associated with the candidate term. The particular features that are useful to determine a title may differ from those features that are useful to determine a telephone number. The features may differ still for determining other types of business information.
Landmarks – Information about the location of a business, such as a postal address. This information is tied to attributes of the landmarks (such as business name, telephone number, business hours, or a link to a web site or a map) in a document. In other implementations, the above processing may apply to other landmarks and attributes, such as finding the price (attribute) or a product identification number (attribute) associated with a product (landmark).
Assigning geographic location identifiers to web pages
Geographic Location Identifier – May be a partial or complete postal address, telephone number, area code, etc., or any other suitable value associated with a physical geographic position, such as longitude and latitude. The geographic location identifier may be based on links, such as hyperlinks, that connect the nodes in the collection of documents – based upon the relevancy of the web documents to each other.
Geographic Relevancy Criteria – Geographic location identifiers included within web pages may be assigned to other web pages that may or may not contain that information, if certain relevancy criteria are met. This means that web pages that either do not include geographic descriptive information, or include unrefined or incomplete geographic location information, could be searched or identified based on an assigned geographic location identifier. Document relevancy may be determined based on several factors, such as relative distance between documents, terminology used, and local or web site determination. Example: a home page for a Web site doesn't contain any address information, but the site has that information on an "About us" page, a "contact" page, and a "directions" page – if certain criteria defined in the patent application are met, then the home page is seen by the search engine as being relevant for the address information on those other pages.
Forward or Outbound Link – A link originating from a first page and leading to a second page may be called a forward or outbound link relative to the first page and indicate that the first page is a linking document.
Backlink – A link from a first page to a second page may be characterized as a backlink from the second page to the first page. A link originating from the second page and leading to the first page may be called an inbound link relative to the first page and indicate that the first page is a linked document.

