Secure SEO

Archive for the 'Security' Category

SEOktoberfest

Wednesday, October 14th, 2009

Well, I was planning on releasing tons of information upon my return from SEOktoberfest, but frankly, I was sworn to secrecy. So you’ll have to know that I walked away with tons of knowledge, but that you’re getting none of it. I know, I’m an ass! But anyway, for my part of it, it was a lot of fun. I took my knowledge of security and applied it to the real blackhat way of thinking of SEO - not greyhat, real blackhat. I was told it was a success, but in reality, I think I could have done a much better job. Thinking about it in retrospect, I could have tailored my speech to ways of thinking of traffic in a much more whitehat way, but still apply it to SEO. Alas, maybe next year!

The fun never stopped though. I met a lot of great people - German Playmates - had a massage for the first time ever - and in general it was great to see Germany through the eyes of a VIP. Huge thanks to everyone I met there. There were lots of really amazing and helpful people, not the least of which were guys like Brent Csutoras, Quadzilla and Mediadonis. There’s even a movie too, which I am suspiciously almost entirely absent from. I promise I was there - but I ended up talking to people almost the entire time. Now you’re thinking either I’ve lost my libido completely, or those guys were really that interesting to talk to. I tend to believe it’s the latter, personally! Thanks to everyone for putting it on!

eBay Domain Split

Monday, September 14th, 2009

I’ve worked in the industry a long, long time, and so when I joined eBay many years ago and they told me they wanted to split the view-item pages in half to prevent fraud, I was pretty sure at that point that I’d have to quit my job or be fired. You see, telling the Senior VP that the idea is totally full of holes from a technical perspective, when they’ve already committed to the project well before you were hired, is pretty much the same as taking a suicide pill. Thankfully, after a few weeks of heavy duty research I was able to come in armed with enough paperwork to choke an elephant, which proved that approximately 22% of listings would be messed up in some way or another once that project launched. Thankfully they saw it for what it was - a bad idea.

However, I eventually left eBay a few years later and bad ideas prevailed, and yes, they did end up launching the project anyway. Even as my friends who were still there at eBay told me that they were doing it, I was warning them what a bad idea it was. And now they know why it was such a bad idea. Splitting the domains is bad in so many ways, I can hardly count them. It’s bad from a security perspective, because it breaks printing for those people who want to print out their auctions as proof of what they purchased. That matters because a listing can change after the sale, since its images are hosted by third parties.

It’s bad from an SEO perspective because the content lives on separate domains (ebaydesc.com). It’s bad from a UI perspective because JavaScript isn’t particularly good at working across domains. It’s bad for people who try to suppress portions of content through the use of browser plugins (think Noscript and Request Policy) and on and on… I hate to say this, but I really did tell them so! To all those eBay fans out there who are hating this, all I can say is I did my best to fight for you on this one!

List of HTTP Headers

Tuesday, September 8th, 2009

I recently polled a log file we had created containing north of half a million user requests. It’s a very interesting log, but it’s also extremely difficult to parse through the data in any meaningful way without the use of relational databases to make sense of it all. However, one way to slice the data is to look at just the HTTP headers themselves. These are the Apache sanitized versions, not the raw data, and each line below shows a count followed by the header name. You can quickly see some interesting patterns here: lots of typos from malformed robots, hackers and so on:

1 HTTP_ACCEPTS
1 HTTP_ACCEPT_APPLICATION
1 HTTP_ACCEPT_FONT
1 HTTP_ACCPROXYWS
1 HTTP_ACUNETIX_PRODUCT
1 HTTP_ACUNETIX_SCANNING_AGREEMENT
1 HTTP_ACUNETIX_USER_AGREEMENT
1 HTTP_ADCENTRIA_IM
1 HTTP_ADCENTRIA_IM_S
1 HTTP_CONNECTIONS
1 HTTP_CONTENT_ENCODING
1 HTTP_CONTENT_TRANSFER_ENCODING
1 HTTP_ENCODING_VERSION
1 HTTP_EOF
1 HTTP_EVE_TRUSTED
1 HTTP_EVE_TRUSTME
1 HTTP_EXPIRES
1 HTTP_EXTENSION
1 HTTP_FORWARDED_FOR_IP
1 HTTP_HTTP_FORWARDED
1 HTTP_HTTP_FORWARDED_FOR
1 HTTP_HTTP_FORWARDED_FOR_IP
1 HTTP_HTTP_PROXY_CONNECTION
1 HTTP_HTTP_VIA
1 HTTP_HTTP_X_FORWARDED
1 HTTP_IDENT_USER
1 HTTP_JOPSPFFRZP
1 HTTP_KVWJPTJQFH
1 HTTP_MATERNA_COUNTRY
1 HTTP_MINE
1 HTTP_MKSIHRFHUI
1 HTTP_NONNECTION
1 HTTP_NPFREFR
1 HTTP_N_FORWARDED_FOR
1 HTTP_PNP
1 HTTP_PQ_VERSION
1 HTTP_PYFGOEWUEQ
1 HTTP_REMOTE_ADDR
1 HTTP_REMOTE_HOST
1 HTTP_REMOVED_HEADER
1 HTTP_SOAPACTION
1 HTTP_TYPE
1 HTTP_XCCEPT_ENCODING
1 HTTP_XUBNHKKQFV
1 HTTP_X_ACCEPT_ENCODING
1 HTTP_X_APN_ID
1 HTTP_X_BMI_CA_UPSDOMAIN
1 HTTP_X_CATEGORY
1 HTTP_X_CF_NODEBUG
1 HTTP_X_COOL_JOBS_CONTACT
1 HTTP_X_DHL_USER
1 HTTP_X_DISCARD
1 HTTP_X_FINCH_IDENTITY
1 HTTP_X_FORWARD_FOR
1 HTTP_X_GGSNIP
1 HTTP_X_GOOGLE_COUNTRY
1 HTTP_X_HSP_IDENTITY
1 HTTP_X_I2P_DESTB32
1 HTTP_X_I2P_DESTB64
1 HTTP_X_IMSI
1 HTTP_X_KIELIKOODI
1 HTTP_X_LOOP_103_1031486416
1 HTTP_X_LOOP_16205_1249272000
1 HTTP_X_NAS_IP
1 HTTP_X_NOKIA_MSISDN
1 HTTP_X_NOKIA_MUSICSHOP
1 HTTP_X_NOKIA_PREPAIDIND
1 HTTP_X_POLICY
1 HTTP_X_PROXY_ISSUES_CONTACT
1 HTTP_X_PTAG
1 HTTP_X_SCANSAFE
1 HTTP_X_SCANSAFE_DATA
1 HTTP_X_SGSNIP
1 HTTP_X_SGSN_IP
1 HTTP_X_SHINDIG_DOS
1 HTTP_X_SKYFIRE_CLIENT_IP
1 HTTP_X_SKYFIRE_CLIENT_PLATFORM
1 HTTP_X_SKYFIRE_CLIENT_VERSION
1 HTTP_X_SKYFIRE_FORWARDED_FOR
1 HTTP_X_SKYFIRE_USER_ID
1 HTTP_X_SOPHOS_WSA_CLIENTIP
1 HTTP_X_SOPHOS_WSA_USER
1 HTTP_X_SOURCE_ID
1 HTTP_X_S_UNIQUE_ID
1 HTTP_X_UP_BEARER_TYPE
1 HTTP_X_UP_CALLING_LINE_ID
1 HTTP_X_UP_TELSTRA_UID
1 HTTP_X_USERNAME
1 HTTP_X_WAP_CLIENTID
1 HTTP_X_WAP_CLIENT_SDU_SIZE
1 HTTP_X_WAP_GATEWAY
1 HTTP_X_WAP_MSISDN
1 HTTP_X_WAP_NETWORK_CLIENT_IP
1 HTTP_X_WAP_SESSION_ID
1 HTTP_X_YAHOO_PROXY
1 HTTP_YWZTJJBZTR
1 HTTP__EEP_ALIVE
1 HTTP__HTTP_EVE_TRUSTED
2 HTTP_ACROBAT_VERSION
2 HTTP_BEARER_INDICATION
2 HTTP_CADCEKPASS
2 HTTP_CALLED_STATION_ID
2 HTTP_COS_NAME
2 HTTP_GRANOLA
2 HTTP_HTTP
2 HTTP_REFER
2 HTTP_SWF_HDR_MSG
2 HTTP_USERIP
2 HTTP_XID
2 HTTP_X_ACCEPT_PROGRESSIVE
2 HTTP_X_ASID
2 HTTP_X_GACELA_PROXY_ID
2 HTTP_X_HD_BC
2 HTTP_X_IGOOGLE_REQUEST
2 HTTP_X_LEOTRACE_EXTENSION_USER_ID
2 HTTP_X_MMS_PREPAID_FLAG
2 HTTP_X_RATPROXY_LOOP
2 HTTP_X_SSL_REQUEST
2 HTTP_X_S_DISPLAY_INFO
2 HTTP_X_TICKCOUNT
2 HTTP_X_UP_BEAR_TYPE
2 HTTP_X_XUTHENTICATED_USER
3 HTTP_19_PROFILE
3 HTTP_ACCESS_KEY
3 HTTP_HARMONY_TESTXX
3 HTTP_HTTP_CLIENT_IP
3 HTTP_OPT
3 HTTP_WHO
3 HTTP_X_FEEDLY
3 HTTP_X_FORWARDED_SERVER
3 HTTP_X_MSP_MSISDN
3 HTTP_X_OPENPGP_AGENT
3 HTTP_X_OPENPGP_DIGEST_ALGO
3 HTTP_X_OPENPGP_SIG
3 HTTP_X_OPENPGP_SIG_FIELDS
3 HTTP_X_OPENPGP_TYPE
3 HTTP_X_OPENPGP_VERSION
3 HTTP_X_SKYFIRE_SCREEN
3 HTTP_X_SKYFIRE_VERSION
3 HTTP_X_SWEB_DATA
3 HTTP_X_USER_TRACKING
3 HTTP_X_WELLO_VERSION
4 HTTP_ACCEPT_ENCODE
4 HTTP_ACCEPT_RUBBISH_
4 HTTP_ACCEPT_XNCODING
4 HTTP_AGENT
4 HTTP_BACKEND
4 HTTP_BEWOOPI_PRX_ENABLED
4 HTTP_DRM_VERSION
4 HTTP_HTTP_X_FORWARDED_FOR
4 HTTP_MSISDN
4 HTTP_NPSKIPPROCESSING
4 HTTP_UA_LANGUAGE
4 HTTP_XXXXXXXXXX
4 HTTP_X_AOL_AUTH
4 HTTP_X_DCMGUID
4 HTTP_X_EGZ
4 HTTP_X_FIRELOGGER
4 HTTP_X_JPHONE_COLOR
4 HTTP_X_JPHONE_DISPLAY
4 HTTP_X_JPHONE_MSNAME
4 HTTP_X_JPHONE_REGION
4 HTTP_X_JPHONE_SMAF
4 HTTP_X_MSP_AG
4 HTTP_X_MSP_CLID
4 HTTP_X_MSP_SESSION_ID
4 HTTP_X_MSP_WAP_CLIENT_ID
4 HTTP_X_MSTMP
4 HTTP_X_OPERATOR_DOMAIN
4 HTTP_X_ORANGE_ID
4 HTTP_X_ORANGE_ROAMING
4 HTTP_X_OS_PREFS
4 HTTP_X_PROCESSANDTHREAD
4 HTTP_X_SKYFIRE_PHONE
4 HTTP_X_TINYPROXY
4 HTTP_X_UP_SUBSCRIBER_COS
4 HTTP_X_UP_UPLINK
5 HTTP_MIME_VERSION
5 HTTP_WSER_AGENT
5 HTTP_X_CLIENTIP
5 HTTP_X_FCCKV2
5 HTTP_X_FILTERED
5 HTTP_X_KRONOS_SECURE_CLIENT_CONNECTION
5 HTTP_X_MDS_FORWARDED_FOR
5 HTTP_X_PL_X
5 HTTP_X_REQUEST_IDENTIFIER
5 HTTP_X_UP_TPD_ELID
5 HTTP_X_WAP_PERSONALIZATION
5 HTTP_X_WAP_PROFILE_DIFF
6 HTTP_APN
6 HTTP_RJUEPSSUOS
6 HTTP_X_LOGDIGGER
7 HTTP_NNCOECTION
7 HTTP_NROXY_CONNECTION
7 HTTP_ORACLE_ECID
7 HTTP_X_NOKIA_WIA_ACCEPT_ORIGINAL
7 HTTP___________
8 HTTP_DEPTH
8 HTTP_FRONT_END_HTTPS
8 HTTP_X_ACCOUNT_ID
8 HTTP_X_AUTHENTICATED_USER
8 HTTP_X_FCCK
8 HTTP_X_JPHONE_UID
8 HTTP_X_LOOP_2897_1250000363
8 HTTP_X_NOKIA_GID
8 HTTP_X_PALM_CARRIER
8 HTTP_X_PROFILE_ID
8 HTTP_X_UP_FORWARDED_FOR
9 HTTP_AXXEPT_ENCODING
9 HTTP_DLWEB
9 HTTP_MAX_SIZE
9 HTTP_USERNAME
9 PATH_INFO
9 PATH_TRANSLATED
9 REDIRECT_REQUEST_METHOD
10 HTTP_AAAAAAAAAAAAAAA
10 HTTP_X_NOKIA_MAXDOWNLINKBITRATE
10 HTTP_X_NOKIA_MAXUPLINKBITRATE
10 HTTP_X_UP_DEVCAP_ACCEPT_LANGUAGE
10 HTTP_X_UP_DEVCAP_IMMED_ALERT
10 HTTP_X_UP_DEVCAP_MSIZE
10 HTTP_X_WISP
11 HTTP_MUMMEL
11 HTTP_PORT
11 HTTP_PROTOCOL
11 HTTP_TM_USER_MSISDN
11 HTTP_X_WAP_PROXY_COOKIE
12 HTTP_X_EBO_UA
12 HTTP_X_UP_DEVCAP_CHARSET
12 HTTP_X_UP_DEVCAP_SMARTDIALING
12 HTTP_X_UP_DEVCAP_ZONE
13 HTTP_X_COMPRESSION
14 HTTP_ALLOWAUTOREDIRECT
14 HTTP_CNEONCTION
14 HTTP_CONTENTTYPE
14 HTTP_KEEPALIVE
14 HTTP_OAS_IP
14 HTTP_X_CACHEBUSTER
15 HTTP_X_GWA_METHOD
15 HTTP_X_NOKIA_CONNECTION_MODE
15 HTTP_X_REAL_IP
16 HTTP_A_IM
16 HTTP_DWEB_CLIENT
16 HTTP_X_D_FORWARDER
16 HTTP_X_MSISDN
16 HTTP_X_NOKIA_LOCALSOCKET
16 HTTP_X_NOKIA_REMOTESOCKET
17 HTTP_NOVINET
17 HTTP_WEFERER
17 HTTP_X_XXXXX
17 redirect-carefully
18 HTTP_AVAIL_DICTIONARY
18 HTTP_WSHOST
18 HTTP_WSIP
18 HTTP_________________
19 HTTP_OSUVA_ISTUNTOID
19 HTTP_X_XXXXXXXX
20 HTTP_ACCEPT_ENCODXNG
21 HTTP_X_NOKIA_BEARER
21 HTTP_X_NOKIA_IPADDRESS
22 HTTP_X_SDCH
22 HTTP_X_UP_DEVCAP_CC
22 HTTP_X_UP_DEVCAP_QVGA
23 HTTP_X_CNECTION
23 HTTP_X_UP_DEVCAP_MULTIMEDIA
23 HTTP_X_UP_DEVCAP_SCREENCHARS
23 HTTP_X_UP_DEVCAP_SOFTKEYSIZE
23 HTTP_X_UP_DEVCAP_TITLEBAR
24 HTTP_X_DEVICE_ACCEPT_CHARSET
24 HTTP_X_UP_DEVCAP_MAX_PDU
25 HTTP_X_DEVICE_ACCEPT_ENCODING
25 HTTP_X_FEEDBURNER_URI
25 HTTP_X_XXXXXXXXXXXXXXXXX
27 HTTP_X_NOKIA_GATEWAY_ID
27 HTTP_X_REQUESTED_WITH
28 HTTP_X_SINA_PROXYUSER
29 HTTP_REALIP
29 HTTP_X_REALIP
29 HTTP_X_VIRTUAL_IP
30 HTTP_IF_NONE_MATCH
30 HTTP_X_DEVICE_ACCEPT
31 HTTP_REFRESH_CACHE
31 HTTP_X_MOBILE_GATEWAY
31 HTTP_X_NETWORK_TYPE
31 HTTP_X_NOVARRA_DEVICE_TYPE
32 HTTP_WAP_CONNECTION
32 HTTP_XXXXXXX
34 HTTP_SSSSSSS
35 HTTP_X_UP_DEVCAP_ISCOLOR
35 HTTP_X_XORWARDED_FOR
36 PHP_AUTH_PW
36 PHP_AUTH_USER
37 HTTP_SURROGATE_CAPABILITY
37 HTTP_X_AUDIOCAST_UDPPORT
39 HTTP_ACCEPT_ENCODIND
39 HTTP_UA_COLOR
39 HTTP_UA_PIXELS
39 HTTP_UA_VOICE
39 HTTP_X_P2P_PEERDIST
40 HTTP_CLIENTID
40 HTTP_UA_OS
40 HTTP_X_NETWORK_INFO
41 HTTP_CONEX_O
41 REDIRECT_SERVER_SOFTWARE
42 HTTP_X_SLIPSTREAM_USERNAME
42 HTTP_X_UP_DEVCAP_NUMSOFTKEYS
44 HTTP_X_VIA
45 HTTP_ICY_METADATA
45 HTTP_X_PSP_BROWSER
45 HTTP_X_PSP_PRODUCTCODE
46 HTTP_TRANSLATE
46 HTTP_X_SAUCER
46 HTTP_X_TEACUP
47 HTTP_FORWARDED
47 HTTP_ORIGIN
47 HTTP_X_FORWARDED_HOST
47 HTTP_X_UP_SUBNO
52 HTTP_X_PS3_BROWSER
53 HTTP_X_UP_DEVCAP_SCREENDEPTH
53 HTTP_X_UP_DEVCAP_SCREENPIXELS
58 HTTP_X_NAI_ID
58 HTTP_X_OPENID_ANTI_PHISHING
59 HTTP_X_NOKIA_MUSICSHOP_BEARER
59 HTTP_X_NOKIA_MUSICSHOP_VERSION
61 HTTP_CUDA_CLIIP
67 HTTP_PROXY_AGENT
67 HTTP_X_VERMEER_CONTENT_TYPE
86 HTTP_X_YQL_DEPTH
86 HTTP_YAHOOREMOTEIP
86 HTTP_YAHOOREMOTEIPSIG
87 HTTP_X_TM_VIA
98 HTTP_REFERRER
103 HTTP_MT_PROXY_ID
106 HTTP_XXXXXXXXXXXXXXX
108 HTTP_X_ORIGINAL_USER_AGENT
137 HTTP_X_PAGEVIEW
138 HTTP_X_DEVICE_USER_AGENT
148 HTTP_CONTENT_FILTER_HELPER
151 HTTP_X_NOVINET
155 HTTP_X_FLASH_VERSION
158 HTTP_X_PROXY_ID
160 HTTP_X_MCPROXYFILTER
179 HTTP_PROFILE
193 HTTP_X_CEPT_ENCODING
248 HTTP_X_LORI_TIME_1
344 REDIRECT_nokeepalive
373 nokeepalive
380 HTTP________
390 HTTP_X_OPERAMINI_PHONE_UA
395 HTTP_X_ICAP_VERSION
421 HTTP_X_OPERAMINI_PHONE
422 HTTP_X_OPERAMINI_FEATURES
426 HTTP_CLIENT_IP
452 HTTP_X_WAP_PROFILE
457 HTTP_X_IMFORWARDS
475 HTTP_X_CC_LIST
516 HTTP_DATE
621 HTTP_X_COMING_FROM
626 HTTP_FORWARDED_FOR
684 HTTP________________
717 HTTP_X_PURPOSE
770 HTTP_COOKIE2
867 HTTP_MAX_FORWARDS
1201 HTTP_X_CLIENT_IP
1728 HTTP_X_AUTOPAGER
1956 HTTP_RANGE
2361 CONTENT_LENGTH
3673 HTTP_X_MOZ
9726 CONTENT_TYPE
12309 HTTP_X_BLUECOAT_VIA
14935 HTTP_UA_CPU
17016 HTTP_X_FORWARDED_FOR
23107 HTTP_PRAGMA
24218 HTTP_VIA
28111 HTTP_PROXY_CONNECTION
28267 REDIRECT_QUERY_STRING
34096 HTTP_TE
36899 HTTP_COOKIE
37229 HTTP_FROM
37484 HTTP_IF_MODIFIED_SINCE
39785 QUERY_STRING
54222 HTTP_CACHE_CONTROL
122689 HTTP_KEEP_ALIVE
175368 HTTP_REFERER
199443 HTTP_ACCEPT_CHARSET
270123 HTTP_ACCEPT_LANGUAGE
350423 HTTP_ACCEPT_ENCODING
391676 HTTP_CONNECTION
407171 HTTP_ACCEPT
505521 HTTP_USER_AGENT
526665 HTTP_HOST
526784 REMOTE_ADDR
526784 REMOTE_PORT
526784 REQUEST_METHOD
526784 REQUEST_TIME
526784 REQUEST_URI
526784 SERVER_PROTOCOL

Who knows, someone might get some value out of looking at this slice of data. If there are specific items you want more information about, just drop me a note.
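For anyone who wants to build a similar tally from their own logs, here is a minimal sketch in Python. It is not the script used for the list above. The sample requests are made up, and the name mapping is only an approximation of Apache-style sanitizing (uppercase, non-alphanumeric characters turned into underscores, an HTTP_ prefix on everything except Content-Type and Content-Length).

```python
import re
from collections import Counter

# Headers that CGI exposes without the HTTP_ prefix.
UNPREFIXED = {"CONTENT_TYPE", "CONTENT_LENGTH"}

def cgi_name(header_name):
    """Approximate the sanitized variable name for a raw header name."""
    name = re.sub(r"[^A-Za-z0-9]", "_", header_name.strip()).upper()
    return name if name in UNPREFIXED else "HTTP_" + name

def tally_headers(requests):
    """Count how many requests carried each header (once per request)."""
    counts = Counter()
    for headers in requests:
        counts.update({cgi_name(name) for name in headers})
    return counts

# Hypothetical sample: each request is a dict of raw header names to values.
sample = [
    {"Host": "example.com", "User-Agent": "Mozilla/5.0", "Accept-Encoding": "gzip"},
    {"Host": "example.com", "User-Agent": "bot", "Accept-Encodxng": "gzip"},
    {"Host": "example.com", "X-Forwarded-For": "192.0.2.7"},
]

# Print rarest first, the same ordering as the list above.
for name, count in sorted(tally_headers(sample).items(), key=lambda kv: (kv[1], kv[0])):
    print(count, name)
```

Sorting by count puts the one-off typos at the top, which is where the malformed robots tend to show up.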

Apache Information Disclosure Issues or, “How to detect cloaking”

Friday, April 7th, 2006

Well, we made it to our second SEO blog post without a major hitch. This one is about an Apache issue that I was talking about that is probably one of the nastier issues out there as far as detecting SEO (Search Engine Optimization) IP cloaking from the search engine’s perspective. I doubt things will roll this fast and furious once we get some of these initial projects out of the way but thus far I am cranking away.

Anyway, on to the problem. Again, putting on my black hat, I would assume based on the fact that there are so many SEO companies out there that one or two of them may be IP cloaking. Call me crazy. For anyone not in the know, IP cloaking is where you give a search engine (like Google or Yahoo, etc…) spam and real users legitimate content, or vice versa depending on the application. All this for the eventual goal of raising natural search ranking as opposed to paid advertising. Eventually I’m going to build an ROI tool to show people why natural search is so valuable, but I digress.

Well, there are really a ton of ways to do IP cloaking but the most common under Apache are using mod_rewrite or using a ScriptAlias. First you provide a link to a search engine and then you direct it to a script to deliver different content depending on IP matching (there are lots of problems with this technique beyond this, which I’ll go into in another blog post).
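To make the IP matching step concrete, here is a minimal sketch of the decision such a script makes. Everything in it is an assumption for illustration: the function name is made up, and the network list uses reserved documentation ranges, not real search engine addresses.

```python
import ipaddress

# Hypothetical crawler ranges; 192.0.2.0/24 is a reserved documentation
# network, not a real search-engine address block.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def choose_variant(remote_addr, crawler_networks=CRAWLER_NETWORKS):
    """Return which page variant an IP-matching script would serve."""
    addr = ipaddress.ip_address(remote_addr)
    if any(addr in net for net in crawler_networks):
        return "crawler"
    return "visitor"

print(choose_variant("192.0.2.10"))    # address inside the listed range
print(choose_variant("198.51.100.5"))  # address outside it
```

The point to take away is that the decision happens in a script, so the response is generated dynamically no matter what the URL looks like.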

Okay, so what? Google and Yahoo see something different than everyone else and they can’t tell that they’ve been duped, right? Well, sorta. While I was playing around with some server headers I came across something odd when connecting to scripts versus normal HTML files:

Normal file headers under Apache 2.0:

HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:46:54 GMT
Server: Apache 2
Last-Modified: Fri, 07 Apr 2006 07:52:33 GMT
ETag: "1b0979-777-a5636e40"
Accept-Ranges: bytes
Content-Length: 1911
Connection: close
Content-Type: text/html; charset=ISO-8859-1


CGI Script headers under Apache 2.0:

HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:26:37 GMT
Server: Apache 2
Content-Length: 2616
Connection: close
Content-Type: text/html; charset=ISO-8859-1


Well, that’s kinda interesting I guess, but the fact that the file is named “.cgi” would probably tip you off before anything else so it’s not that interesting. But then I attempted cloaking the file with something like this:



ScriptAlias /cloak.html "/usr/local/www/htdocs/cloak.cgi"


Which would give the user the appearance that they were going to an HTML file while they were actually visiting a dynamic page. This is where it gets interesting. Here is the resultant header:

ScriptAliased file headers under Apache 2.0:

HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:32:47 GMT
Server: Apache 2
Connection: close
Content-Type: text/html; charset=ISO-8859-1


Notice anything different between that header and the normal file’s? I’ll give you a hint: it’s the ETag. In particular, it’s non-existent on CGI scripts altogether. Why’s that? The ETag header, as defined by RFC 2616, provides the current value of the entity tag for the requested variant. In English, that means it gives you a unique value for the file being requested, which Apache by default builds from the file’s inode number, its size and its last-modified time. A script’s output has no file on disk to describe, so there is nothing to build the tag from. Okay, that’s pretty interesting, but let’s come back to it in a second.
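As a quick sanity check on what is inside that ETag, here is a minimal Python sketch that splits it into its hex fields. It assumes Apache’s default three-part layout of inode, size and modification time. The function name is made up, and the time field is left as a raw number because its exact encoding isn’t something I’d vouch for here.

```python
def decode_etag(etag):
    """Split an Apache-style inode-size-mtime ETag into integer fields."""
    inode_hex, size_hex, mtime_hex = etag.strip().strip('"').split("-")
    return {
        "inode": int(inode_hex, 16),
        "size": int(size_hex, 16),
        "mtime_raw": int(mtime_hex, 16),  # left undecoded on purpose
    }

fields = decode_etag('"1b0979-777-a5636e40"')
print(fields["size"])  # 1911, the Content-Length of the normal file above
```

The middle field, 0x777, is 1911 in decimal, which lines up with the Content-Length of the normal file, so the tag really is describing a file on disk.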

Now what about mod_rewrite? Mod_rewrite is the cloaker’s tool of choice because of its flexibility. Let’s say you wanted to send any URLs with the word “seo” in them to a script, e.g. www.whatever.com/seo or www.whatever.com/blah/seo/blah etc…. You’d use mod_rewrite simply because it is easy and scalable. Here’s an example that would do just that:

Example .htaccess file with mod_rewrite:

RewriteEngine on
RewriteBase /
RewriteRule seo /cloak.html


In the example above I am re-writing to an HTML file (the same HTML file as the very first example) not a CGI script. Now, this is a pretty good cloaking technique because again it is scalable, however it suffers a different but similar flaw to what we saw before. Here’s an example:

Mod_rewrite to original HTML file headers on Apache 2.0:

HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:46:15 GMT
Server: Apache 2
Last-Modified: Fri, 07 Apr 2006 08:52:33 GMT
ETag: "1b0979-777-a5636e40;2bd1c700"
Accept-Ranges: bytes
Content-Length: 1911
Connection: close
Content-Type: text/html; charset=ISO-8859-1


Let’s look at those two ETag signatures side by side:

ETag: "1b0979-777-a5636e40"
ETag: "1b0979-777-a5636e40;2bd1c700"


It looks like Apache has told us two things. It has told us that the original file is the same, and it has told us that it is accessing it in a different way (in this case via mod_rewrite). But wait, there’s more. What if we use mod_rewrite to access a CGI script (the most common application for mod_rewrite for SEO cloaking anyway)? Let’s check it out:

Mod_rewrite forwarding to a CGI script headers:

HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:28:11 GMT
Server: Apache 2
Content-Length: 1911
Connection: close
Content-Type: text/html; charset=ISO-8859-1


Okay, but does that really help us? I mean, there’s no ETag at all, right? Well, yes, and that’s the exact point. Because there is no ETag in the header and there is one for a confirmed normal file, you can tell that the page is dynamically created using mod_rewrite or a ScriptAlias. But now you’re asking, “What if you don’t know if it normally has the ETag at all, or more specifically what if the entire htdocs directory is dynamic?” How about trying a file that is always there and lives outside of the htdocs directory? The Apache logo that is included with the base install inside the /icons directory definitely qualifies. By getting /icons/apache_pb.gif we see the following:

GET /icons/apache_pb.gif HTTP/1.0

HTTP/1.1 200 OK
Date: Fri, 07 Apr 2006 08:32:37 GMT
Server: Apache 2
Last-Modified: Tue, 21 Apr 2004 14:35:21 GMT
ETag: "1818d7-916-a64a7c40"
Accept-Ranges: bytes
Content-Length: 2326
Connection: close
Content-Type: image/gif


That’s even true if the .htaccess file would seem to disallow that with something extremely restrictive like the next example, which tries to make anything with a slash in it redirect to cloak.cgi:



RewriteEngine on
RewriteBase /
RewriteRule "/" /cloak.cgi


The reason is that the .htaccess file lives outside of that directory, so its rules never apply there. So unless the webmaster takes specific action to remove the /icons directory, remove the Alias for it in httpd.conf, or otherwise add cloaking to all the files on the system, there is a high risk of cloak detection.

And there you have it folks. Using a static file as a baseline, a search engine can tell what else on your system is dynamically built and is therefore more likely to be cloaking - thereby raising red flags. I tested this under Apache 2.x primarily but it should work on all forms of Apache that use the ETag header (versions 1.3.23 and later). Black-hat SEOs beware. Your mod_rewrites are vulnerable to information disclosure and the search engines of the world can tell what you are doing if this is ever implemented as a detection mechanism. I wonder what Matt Cutts and Jeremy Zawodny will think of this.
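The baseline comparison can be sketched in a few lines of Python. This is a rough illustration and not a real detection engine. The function name and the labels it returns are made up, and the rules are nothing more than the observations from this post (no ETag on script output, an extra “;” suffix after a rewrite).

```python
def classify(page_headers, baseline_headers):
    """Compare a page's response headers with a known-static baseline."""
    page = {k.lower(): v for k, v in page_headers.items()}
    base = {k.lower(): v for k, v in baseline_headers.items()}
    if "etag" not in base:
        return "inconclusive"      # server may not emit ETags at all
    etag = page.get("etag")
    if etag is None:
        return "dynamic"           # static files get an ETag here, this page does not
    if ";" in etag:
        return "rewritten-static"  # extra suffix, as in the mod_rewrite example
    return "static"

# Baseline taken from the /icons/apache_pb.gif response shown earlier.
baseline = {"ETag": '"1818d7-916-a64a7c40"', "Content-Type": "image/gif"}

print(classify({"Content-Type": "text/html"}, baseline))
print(classify({"ETag": '"1b0979-777-a5636e40;2bd1c700"'}, baseline))
print(classify({"ETag": '"1b0979-777-a5636e40"'}, baseline))
```

The header dictionaries could come from any HTTP client, for example a HEAD request made with Python’s http.client module. A “dynamic” result doesn’t prove cloaking on its own; it only says the page is worth a closer look.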

Now, back to work!

IP/Header Cloaking, Redirect Tools and Apache Issues, oh my!

Monday, April 3rd, 2006

Well, here it is, my first blog post, and boy is it going to be a crappy one. I don’t have much to update other than I have gotten parts of the site up and running and it’s going pretty smoothly other than a misspelled domain name I accidentally purchased. Oops. Me and my spelling!

Anyway, I am working on a few SEO projects that are probably going to be worth your while to read once I get them working. Unfortunately, my machines are still in storage from the move so I am borrowing James’ equipment for the time being. He’s being a good sport about me tweaking the web server beyond all recognition and logging millions of packets, despite the fact I am getting the impression he’d rather tell me to go jump off a bridge because he’s too busy. So for that, thank you James (minus the bridge part)!

So let me put on my black hat here while I write out this partial list of the projects I am working on:

IP/Header Cloaking: Okay, I know, cloaking has been done a thousand times before. Well, that’s true. But never like this. I’ve gotten some amazing data accrued over the last few days, but it’s both not enough and also incomplete in terms of what I am logging. So I imagine this will be a two-phase project. The first phase will be a proof of concept of data in aggregate. The second will move on to more types of data through increased logging infrastructure as well as a better range of logging nodes. Stay tuned on this one.

Redirect tools: Once upon a time I invented a tool to do logging of redirect holes found in sites. After much ado I am resurrecting that project to do better logging, increased detection engine performance, and a DB backend. Stay tuned for this one, although it will have to wait for my machines to come out of storage so I can get my old code. Not that it would take a while to re-write, but it’s only a month away, so I’ll wait and work on other things in the meantime.

Apache issues: Randomly I came across a problem the other day with Apache that could cause some blackhat SEOs some issues if the search engines out there ever started implementing what I found. I would release it now but I want to do some more tests before I consider it ready for prime-time. Thus far it seems to be working though. I’ll probably have to wait for my machines to come out of storage for this one too, although I might be able to test on another box I have at my disposal. I haven’t decided yet where this one falls on my priority list.

I’m also pondering writing a program to spam a bunch of different tools to see which ones actually drive traffic. I have a feeling our “we aren’t evil” friends are basically lying when it comes to privacy, but we’ll see. That one is definitely on the back-burner since it’s 100% a theory and would require me putting a lot of spyware on a VMware install somewhere. It’ll stay a theory at least until I get a lot more time on my hands - which I doubt will ever happen. Whelp, that’s it for now. I’ll keep this up to date with more info when I have it.