Secure SEO

User Vote SEO Conundrum

July 2nd, 2009

I happened across an SEO conundrum related to what is actually the best ranked search result. Today I was looking up the acronym iirc - because I always forget what it means, even though I’ve seen it a thousand times and probably looked it up a half dozen times already. It’s just not a phrase I use often so I can’t retain it - too many things going on in my noggin I guess. So anyway, I searched Yahoo today and here’s what I saw:

User vote SEO conundrum

As you can see, the top three results don’t actually tell me what I need to know while I stay on the SERP, while the fourth clearly tells me exactly the information I’m looking for. Now here’s the conundrum. If Yahoo, or any of the search engines for that matter, relies on users clicking a link to register a vote for a particular search result, you’d see the first three rank extremely high - because the user knows the information is there behind the links, even though it would save the user time and clearly make them less frustrated if they could simply get the information they were looking for right on the results page.

So I’d say that for the user’s benefit the fourth link is by far the best, because I really didn’t want to click through, I just wanted the information. But the first three links, I think, are far better for the websites in question. In fact, the fourth link giving away the information is extremely bad for everyone else in that list, because it reduces the overall likelihood of a click-through for the other websites as well. Interesting problem in a way.

SEO RSS feeds

April 26th, 2006

Well, here is our third blog post. Looks like we are off to a good start. Don’t worry, there is much more to come that is equally technically relevant and interesting. Unfortunately my equipment is in storage, so much of the development work I would be doing has to wait for at least a few more weeks. But to those people who wondered about the real world relevance of ETag disclosure, don’t fret, there is more to come.

Anyway, I thought as a nice gesture, for anyone who was not already very well in the know of which sites to be looking at for SEO resources, it might be nice for me to link to the RSS feeds that I personally find the most relevant and interesting. So here’s a list of SEO RSS feeds:

I hope you like the SEO RSS feeds and if you have any more to post that are worth reading please drop me a line and I’ll either add them or aggregate them into a bigger list somewhere on the site. I’ll have some more interesting information to post here in the coming weeks, so hold tight.

Apache Information Disclosure Issues or, “How to detect cloaking”

April 7th, 2006

Well, we made it to our second SEO blog post without a major hitch. This one is about an Apache issue that I was talking about that is probably one of the nastier issues out there as far as detecting SEO (Search Engine Optimization) IP cloaking from the search engine’s perspective. I doubt things will roll this fast and furious once we get some of these initial projects out of the way but thus far I am cranking away.

Anyway, on to the problem. Again, putting on my black hat, I would assume based on the fact that there are so many SEO companies out there that one or two of them may be IP cloaking. Call me crazy. For anyone not in the know, IP cloaking is where you give a search engine (like Google or Yahoo, etc.) spam and real users legitimate content, or vice versa depending on the application. All this for the eventual goal of raising natural search ranking as opposed to paid advertising. Eventually I’m going to build an ROI tool to show people why natural search is so valuable, but I digress.

Well, there are really a ton of ways to do IP cloaking but the most common under Apache are using mod_rewrite or using a ScriptAlias. First you provide a link to a search engine and then you direct it to a script to deliver different content depending on IP matching (there are lots of problems with this technique beyond this, which I’ll go into in another blog post).
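To make the "IP matching" step concrete, here is a minimal sketch of the decision such a script makes. Everything in it is an assumption for illustration: the function names are hypothetical, and the networks are reserved documentation ranges (RFC 5737), not real search engine addresses.

```python
import ipaddress

# Hypothetical crawler ranges. These are reserved documentation networks
# (RFC 5737), NOT real search engine addresses.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_crawler(client_ip: str) -> bool:
    """Return True if client_ip falls inside any listed crawler network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in CRAWLER_NETWORKS)


def select_content(client_ip: str) -> str:
    """Pick which page variant a cloaking script would serve."""
    return "crawler-page" if is_crawler(client_ip) else "visitor-page"
```

The point is only that the decision hinges on the visitor's address, which is exactly why the response has to come from a script rather than a plain file.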

Okay, so what? Google and Yahoo see something different than everyone else and they can’t tell that they’ve been duped, right? Well, sorta. While I was playing around with some server headers I came across something odd when connecting to scripts versus normal HTML files:

Normal file headers under Apache 2.0:



HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:46:54 GMT
Server: Apache 2
Last-Modified: Fri, 07 Apr 2006 07:52:33 GMT
ETag: "1b0979-777-a5636e40"
Accept-Ranges: bytes
Content-Length: 1911
Connection: close
Content-Type: text/html; charset=ISO-8859-1


CGI Script headers under Apache 2.0:



HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:26:37 GMT
Server: Apache 2
Content-Length: 2616
Connection: close
Content-Type: text/html; charset=ISO-8859-1


Well, that’s kinda interesting I guess, but the fact that the file is named “.cgi” would probably tip you off before anything else so it’s not that interesting. But then I attempted cloaking the file with something like this:



ScriptAlias /cloak.html "/usr/local/www/htdocs/cloak.cgi"


Which would give the user the appearance that they were going to an HTML file while they were actually visiting a dynamic page. This is where it gets interesting. Here is the resultant header:

ScriptAliased file headers under Apache 2.0:



HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:32:47 GMT
Server: Apache 2
Connection: close
Content-Type: text/html; charset=ISO-8859-1


Notice anything different between that header and the normal file’s? I’ll give you a hint: it’s the ETag. In particular, it’s nonexistent on CGI scripts altogether. Why’s that? The ETag header, as defined by RFC 2616, provides the current value of the entity tag for the requested variant. In English, that means it gives you a value unique to the file being requested, which Apache derives by default from the file’s location on the drive (its inode number), its size, and its last-modified date. A script’s output has no single file behind it to derive such a value from, so Apache sends none. Okay, that’s pretty interesting but let’s come back to it in a second.
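As a rough illustration, the ETag from the normal file can be pulled apart into its pieces. This is a minimal sketch that assumes Apache's default inode-size-mtime layout with hexadecimal fields; the function name is hypothetical, and a server configured with a different FileETag setting would not fit this shape.

```python
def parse_apache_etag(header_value: str) -> dict:
    """Split an ETag of the assumed form "inode-size-mtime" into its fields."""
    inode_hex, size_hex, mtime_hex = header_value.strip().strip('"').split("-")
    return {
        "inode": int(inode_hex, 16),  # where the file lives on the drive
        "size": int(size_hex, 16),    # file size in bytes
        "mtime_field": mtime_hex,     # derived from the last-modified time
    }


fields = parse_apache_etag('"1b0979-777-a5636e40"')
print(fields["size"])  # 1911
```

The size field, 0x777, is 1911 in decimal, which matches the Content-Length of the normal file above.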

Now what about mod_rewrite? Mod_rewrite is the cloaker’s tool of choice because of its flexibility. Let’s say you wanted to send any URLs with the word “seo” in them to a script, e.g. www.whatever.com/seo or www.whatever.com/blah/seo/blah etc. You’d use mod_rewrite simply because it is easy and scalable. Here’s an example that would do just that:

Example .htaccess file with mod_rewrite:



RewriteEngine on
RewriteBase /
RewriteRule seo /cloak.html


In the example above I am re-writing to an HTML file (the same HTML file as the very first example) not a CGI script. Now, this is a pretty good cloaking technique because again it is scalable, however it suffers a different but similar flaw to what we saw before. Here’s an example:

Mod_rewrite to original HTML file headers on Apache 2.0:

HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:46:15 GMT
Server: Apache 2
Last-Modified: Fri, 07 Apr 2006 08:52:33 GMT
ETag: "1b0979-777-a5636e40;2bd1c700"
Accept-Ranges: bytes
Content-Length: 1911
Connection: close
Content-Type: text/html; charset=ISO-8859-1


Let’s look at those two ETag signatures side by side:



ETag: "1b0979-777-a5636e40"
ETag: "1b0979-777-a5636e40;2bd1c700"
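The same comparison can be done mechanically. This is only a sketch built on the two values above: the function names are hypothetical, and treating everything after the semicolon as a "route" marker is an assumption drawn from this one observation, not documented Apache behavior.

```python
def split_etag(header_value: str) -> tuple:
    """Return (base, suffix) for an ETag; suffix is '' when there is no ';'."""
    base, _, suffix = header_value.strip().strip('"').partition(";")
    return base, suffix


def same_file_different_route(etag_a: str, etag_b: str) -> bool:
    """True when both ETags share a base but differ in what follows the ';'."""
    base_a, suffix_a = split_etag(etag_a)
    base_b, suffix_b = split_etag(etag_b)
    return base_a == base_b and suffix_a != suffix_b


direct = '"1b0979-777-a5636e40"'
rewritten = '"1b0979-777-a5636e40;2bd1c700"'
print(same_file_different_route(direct, rewritten))  # True
```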


It looks like Apache has told us two things. It has told us that the original file is the same, and it has told us that it is accessing it in a different way (in this case via mod_rewrite). But wait, there’s more. What if we use mod_rewrite to access a CGI script (the most common application for mod_rewrite for SEO cloaking anyway)? Let’s check it out:

Mod_rewrite forwarding to a CGI script headers:

HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:28:11 GMT
Server: Apache 2
Content-Length: 1911
Connection: close
Content-Type: text/html; charset=ISO-8859-1


Okay, but does that really help us? I mean, there’s no ETag at all, right? Well, yes, and that’s the exact point. Because there is no ETag in the header and there is one for a confirmed normal file, you can tell that the page is dynamically created using mod_rewrite or a ScriptAlias. But now you’re asking, “What if you don’t know if it normally has the ETag at all, or more specifically what if the entire htdocs directory is dynamic?” How about trying a file that is always there and lives outside of the htdocs directory? The Apache logo that is included with the base install inside the /icons directory definitely qualifies. By getting /icons/apache_pb.gif we see the following:

GET /icons/apache_pb.gif HTTP/1.0

HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:32:37 GMT
Server: Apache 2
Last-Modified: Tue, 21 Apr 2004 14:35:21 GMT
ETag: "1818d7-916-a64a7c40"
Accept-Ranges: bytes
Content-Length: 2326
Connection: close
Content-Type: image/gif


That’s even true if the .htaccess file would seem to disallow it with something extremely restrictive like the next example, which tries to make anything with a slash in it redirect to cloak.cgi:



RewriteEngine on
RewriteBase /
RewriteRule "/" /cloak.cgi


The reason is that the /icons directory lives outside the directory that the .htaccess file controls. So unless the webmaster takes specific action to remove the /icons directory, remove the alias that points to it in httpd.conf, or otherwise add cloaking to all the files on the system, there is a high risk of cloak detection.

And there you have it folks. Using a static file as a baseline, a search engine can tell what else on your system is dynamically built and is therefore more likely to be cloaking - thereby raising red flags. I tested this under Apache 2.x primarily but it should work on all forms of Apache that use the ETag header (versions 1.3.23 and later). Black-hat SEOs beware. Your mod_rewrites are vulnerable to information disclosure and the search engines of the world can tell what you are doing if this is ever implemented as a detection mechanism. I wonder what Matt Cutts and Jeremy Zawodny will think of this.
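For the curious, the baseline check boils down to something like the following. It is a minimal sketch under stated assumptions, not what any search engine actually runs: the function names are hypothetical, and the two header blocks are the /icons/apache_pb.gif response and the mod_rewrite-to-CGI response captured above.

```python
BASELINE_RAW = """HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:32:37 GMT
Server: Apache 2
Last-Modified: Tue, 21 Apr 2004 14:35:21 GMT
ETag: "1818d7-916-a64a7c40"
Accept-Ranges: bytes
Content-Length: 2326
Connection: close
Content-Type: image/gif
"""

CANDIDATE_RAW = """HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:28:11 GMT
Server: Apache 2
Content-Length: 1911
Connection: close
Content-Type: text/html; charset=ISO-8859-1
"""


def parse_headers(raw: str) -> dict:
    """Parse a raw response header block into a dict keyed by lowercase name."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def looks_dynamic(baseline: dict, candidate: dict) -> bool:
    """Flag the candidate when a known static file on the same server
    carries an ETag but the candidate page does not."""
    return "etag" in baseline and "etag" not in candidate


baseline = parse_headers(BASELINE_RAW)
candidate = parse_headers(CANDIDATE_RAW)
print(looks_dynamic(baseline, candidate))  # True
```

A missing ETag only says the page is dynamic, not that it is cloaking, so this is a red flag to follow up on rather than proof.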

Now, back to work!

IP/Header Cloaking, Redirect Tools and Apache Issues, oh my!

April 3rd, 2006

Well, here it is, my first blog post, and boy is it going to be a crappy one. I don’t have much to update other than I have gotten parts of the site up and running and it’s going pretty smoothly other than a mis-spelled domain name I accidentally purchased. Oops. Me and my spelling!

Anyway, I am working on a few SEO projects that are probably going to be worth your while to read once I get them working. Unfortunately, my machines are still in storage from the move so I am borrowing James’ equipment for the time being. He’s being a good sport about me tweaking the web server beyond all recognition and logging millions of packets, despite the fact I am getting the impression he’d rather tell me to go jump off a bridge because he’s too busy. So for that, thank you James (minus the bridge part)!

So let me put on my black hat here while I write out this partial list of the projects I am working on:

IP/Header Cloaking: Okay, I know, cloaking has been done a thousand times before. Well, that’s true. But never like this. I’ve accrued some amazing data over the last few days, but there isn’t enough of it and it’s incomplete in terms of what I am logging. So I imagine this will be a two-phase project. The first phase will be a proof of concept using data in aggregate. The second will move on to more types of data through increased logging infrastructure as well as a better range of logging nodes. Stay tuned on this one.

Redirect tools: Once upon a time I invented a tool to do logging of redirect holes found in sites. After much ado I am resurrecting that project to do better logging, increase detection engine performance, and add a DB backend. Stay tuned for this one, although it will have to wait for my machines to come out of storage so I can get my old code. Not that it would take a while to re-write, but it’s only a month away, so I’ll wait and work on other things in the meantime.

Apache issues: Randomly I came across a problem the other day with Apache that could cause some blackhat SEOs some issues if the search engines out there ever started implementing what I found. I would release it now but I want to do some more tests before I consider it ready for prime-time. Thus far it seems to be working though. I’ll probably have to wait for my machines to come out of storage for this one too, although I might be able to test on another box I have at my disposal. I haven’t yet decided where this one falls on my priority list.

I’m also pondering writing a program to spam a bunch of different tools to see which ones actually drive traffic. I have a feeling our “we aren’t evil” friends are basically lying when it comes to privacy, but we’ll see. That one is definitely on the back-burner since it’s 100% a theory and would require me putting a lot of spyware on a VMware install somewhere. It’ll stay a theory at least until I get a lot more time on my hands - which I doubt will ever happen. Whelp, that’s it for now. I’ll keep this up to date with more info when I have it.