Archive for the 'Search Engine Optimisation' Category

Google Webmaster Accounts – Your Permanent Record

Tuesday, August 24th, 2010

In case it hadn’t crossed your mind, your Google Webmaster Tools account (and most likely your accounts on other Google services) is a permanent record of your activity.

Delete a site on Webmaster Tools and it’s not really gone. Don’t believe me? Try attaching your Google Account to show stats for a site on somewhere like Digital Point and you’ll magically see site names that you deleted long ago.

What do you think happens when you keep doing dodgy stuff and all your accounts are connected on Google Webmaster Tools (or Analytics)?

I wonder how long it will be before there is a market for Google Accounts in “good standing”. I’m certainly not the only person who’s noticed that launching almost identical sites on different accounts produces different results.

Posted in Google, Search Engine Optimisation | 12 Comments

Does Blackhat SEO still work?

Monday, December 7th, 2009

If anyone tells you blackhat SEO doesn’t work, get them to comment on this.

Posted in Black Hat, Search Engine Optimisation | 13 Comments

Get A Free Link In 30 Seconds

Tuesday, June 9th, 2009

Quick heads up, if you want a free link from http://www.further.co.uk/blog all you have to do is Tweet an SEO Tip to the #fseo hashtag on Twitter.

Full details here: http://www.further.co.uk/blog/Tweet-SEO-Tips-Get-A-Link-From-Us-144

Doesn’t get much easier than that.

I’m doing a lot of blogging over there at the moment (one reason why there are fewer updates here). So if you want more solid advice (with less blackhat), I’d recommend:

How Much Is An SEO Site Audit Worth?

SEO Keyword Selection And Calculating Value

SEO For Misspellings

The Google Sitelinks Guide

Banned In Google – The Complete Guide

Posted in Search Engine Optimisation | 9 Comments

Using Twitter To Power Spam

Tuesday, March 3rd, 2009

Good afternoon and a happy square root day to you. (C’mon it’s no more made up than Valentine’s Day).

Despite my initial reservations, I’m actually finding Twitter moderately useful for content and link discovery; the trick is really just following the right people and ditching time wasters. I’m not going to bore you with a lecture on how Twitter is the next big thing. In fact, I’m pretty sure we’re fast approaching the point at which Gartner’s Hype Cycle predicts a crash of interest and disillusionment.

[Image: Twitter on Gartner’s Hype Cycle]

Well, maybe, maybe not – argue it amongst yourselves, it’s not what I really want to talk about. I want to talk about…

Twitter and Spam
Although I’ve only really talked about parasite hosting indirectly, when looking at ranking factors to do with age and trust, I think it’s a point briefly worth mentioning.

I saw Quadzilla posted today about parasite hosting on Twitter. Hopefully the technique hasn’t eluded you: aside from other methods of finding places to parasite host, all you need to look for are trusted domains that allow you to post content with little moderation. Even a basic search for Viagra shows that the #2 position is essentially a parasite hosted page on the hotfroguk directory (thanks Ryan for your dedication in trawling Viagra results).

As Quadzilla rightly points out, with Twitter being almost totally unmoderated, the sad fact is it’s going to get bombed to hell over the next 12 months by blackhat SEOs and then Google will do something about it and game over.

There are however (slightly) more legitimate uses for Twitter if you’ve got your heart set on some easy rankings.

Twitter and content generation
Content generation can be a tricky game: you can plain scrape it (not really generation :P), scrape it and spin it, use synonym replacement or Markov chaining, or, if you’re really smart, come up with your own way to do it.

There are several problems inherent in content generation, whether it’s duplicate content, poor quality or your algorithm getting skewed by internet randomness. I’ve seen a lot of people trying to generate websites based on data they can pull from keyword trends or “hot” trends. The problem is that most of the services give you the information you need after the fact. The news has come, the search spike has been and your content generation system has given you a crummy bit of content which now has to compete with established sites with real content. Oh, and there’s the fact that nobody cares anymore.

Twitter, on the other hand, is instant. It’s not uncommon for me to discover new “hot” things on Twitter hours before mainstream news sites (i.e. authoritative sites) publish them (and days before Seth Godin makes an informed-in-hindsight comment).

Without spoon feeding, I put this to you: Why not let tweeting twits find your content for you? There are many ways you can do this:

1) There are lovely people that get this information for you. For instance: http://twitturly.com/ will give you the most tweeted links. There’s all your early breaking generic news for you, just set your cURL bot to follow those tinyurls and discover the source and scrape away.

2) If you’re in a niche, find everyone who tweets in that niche, use cURL to crawl the links they tweet, log them to a database, use a little intelligent keyword selection to make sure they’re relevant, then repost.
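As a rough idea of what the “intelligent keyword selection” in option 2 might look like, here is a minimal sketch that scores a page title against a list of niche keywords. The function names, the keyword list and the threshold are all made up for illustration; a real filter would want something smarter than a plain substring count.

```php
<?php
// Hypothetical helper: counts how often the niche keywords appear in a piece of text.
// Names, keywords and threshold are assumptions for illustration only.
function keywordScore(string $text, array $keywords): int
{
    $text = strtolower($text);
    $score = 0;
    foreach ($keywords as $keyword) {
        // Plain substring count, so "seo" would also match inside "seoul"
        $score += substr_count($text, strtolower($keyword));
    }
    return $score;
}

// Treat the text as relevant once it reaches the minimum score
function isRelevant(string $text, array $keywords, int $minScore = 1): bool
{
    return keywordScore($text, $keywords) >= $minScore;
}

$keywords = array('seo', 'link building');
var_dump(isRelevant('10 SEO tips for link building', $keywords));
var_dump(isRelevant('Best pizza in town', $keywords));
```

Run against the titles you have logged to your database, something like this lets you discard anything that never mentions your niche before you spend time on it.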

Then of course, ping the world with your new content, break some captchas and submit to a list of social sites and drop a few links here and there. Aside from services such as Google Blog Search, which work on an almost exclusively chronological basis, you stand a good chance of getting a healthy amount of visitors since you’re one of the first few to get content up.

Added note for clarity: I’m talking about scraping titles/content from URLs you have followed from tweets – not tweets themselves. The majority of the links to new breaking / interesting stories will come inside a very small window. So if you can post this content up while there is still interest / searches and before someone has link dominance, you should even be able to give the duplicate content penalty the slip, even if you’ve 100% scraped – so you’re on a winner – you could even retweet it (:

Oh, don’t forget to jam it full of ads or something. Who cares? It’s all automated. Think of it at least as a weekend project, but don’t break Twitter, it’s growing on me (:

Posted in Black Hat, Scripting, Search Engine Optimisation, Social Marketing, Splogs | 5 Comments

CURL Page Scraping Script

Tuesday, December 16th, 2008

Using cURL and page scraping for specific data is one of the most important things I do when creating databases. I’m not just talking about scraping pages and reposting here, either.

You can use cURL to grab the HTML of any viewable page on the web and then, most importantly, take that data and pick out the bits you need. This is the basis for link analysis scripts, training scripts and compiling databases from sources around the web; there are almost limitless things you can do.

I’m providing a simple PHP class here, which will use cURL to grab a page then pull out any information between user specified tags, into an array. So for instance, in our example you can grab all of the links from any web page.

The class is quite simple – I had to get rid of the lovely indentation to make it fit nicely onto the blog, but it’s fairly well commented.

In a nutshell, it does this:

1) Goes to specified URL

2) Uses cURL to grab the HTML of the URL

3) Takes the HTML and scans for every instance of the start and end tags you provide (e.g. <a href= and </a>)

4) Returns these in an array for you.

Download taggrab.class.zip

<?php

class tagSpider
{

// set variable to hold curl instance (the methods below use $this->ch)
var $ch;

// this is where we dump the html we get
var $html; 

// set for binary type transfer
var $binary; 

// this is the url we are going to do a pass on
var $url;


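// constructor: automatically executed when the class is instantiated, to clear variables
// (a method named after the class is no longer treated as a constructor in PHP 8)
function __construct()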
{
$this->html = "";
$this->binary = 0;
$this->url = "";
}



// takes url passed to it and.. can you guess?
function fetchPage($url)
{


// set the URL to scrape
$this->url = $url;

// only fetch when a non-empty URL was passed in
if (!empty($this->url)) {

// start cURL instance
$this->ch = curl_init ();

// this tells cUrl to return the data
curl_setopt ($this->ch, CURLOPT_RETURNTRANSFER, 1);

// set the url to download
curl_setopt ($this->ch, CURLOPT_URL, $this->url); 

// follow redirects if any
curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, true); 

// tell cURL if the data is binary data or not
curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary); 

// grabs the webpage from the internets
$this->html = curl_exec($this->ch); 

// closes the connection
curl_close ($this->ch); 
}

}


// function takes html, puts the data requested into an array
function parse_array($beg_tag, $close_tag)

{
// match data between specified tags (tags are escaped so they are matched literally)
preg_match_all('~' . preg_quote($beg_tag, '~') . '.*' . preg_quote($close_tag, '~') . '~siU', $this->html, $matching_data);

// return data in array
return $matching_data[0];
}


}
?>

So that is your basic class, which should be fairly easy to follow (you can ask questions in comments if needed).
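If you want to see what the matching step does without hitting a live site, here is a small standalone sketch you can run against a hard-coded string. The function name extractBetween() is made up for this example, and escaping the tags with preg_quote() is an addition of the sketch so that the tags are treated as literal text rather than as part of the regular expression.

```php
<?php
// Standalone sketch of the matching step, so it can be run without a network call.
// extractBetween() is a hypothetical name, not part of the tagSpider class.
function extractBetween(string $html, string $startTag, string $endTag): array
{
    // s: dot also matches newlines, i: case-insensitive, U: ungreedy,
    // so each match stops at the first closing tag it finds
    $pattern = '~' . preg_quote($startTag, '~') . '.*' . preg_quote($endTag, '~') . '~siU';
    preg_match_all($pattern, $html, $matches);
    return $matches[0];
}

$html = '<p><a href="/one">One</a> and <a href="/two">Two</a></p>';
$links = extractBetween($html, '<a href=', '</a>');
print_r($links);
```

The U (ungreedy) modifier is the important part: without it the pattern would swallow everything from the first opening tag to the last closing tag on the page as one big match, instead of giving you each link as its own array entry.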

To use this, we need to call it from another PHP file to pass the variables we need to it.

Below is tag-example.php which demonstrates how to pass the URL, start/end tag variables to the class and pump out a set of results.

Download tag-example.zip

<?php

// Include our tag grab class
require("taggrab.class.php"); // class for spider

// Enter the URL you want to run
$urlrun="http://www.techcrunch.com/";

// Specify the start and end tags you want to grab data between
$stag="<a href=";
$etag="</a>";

// Make a title spider
$tspider = new tagSpider();

// Pass URL to the fetch page function
$tspider->fetchPage($urlrun);

// Enter the tags into the parse array function
$linkarray = $tspider->parse_array($stag, $etag); 

echo "<h2>Links present on page: ".$urlrun."</h2><br />";
// Loop to pump out the results
foreach ($linkarray as $result) {

echo $result;

echo "<br/>";
}

?>

So this code will pass the Techcrunch website to the class, looking for any standard a href links. It will then simply echo these out. You could use this in conjunction with SearchStatus Firefox Plugin to quickly see what links Techcrunch is showing bots and what they are following and nofollowing.
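If you would rather have the script sort the followed and nofollowed links for you, one possible approach is to hand the HTML to PHP’s DOMDocument and check each link’s rel attribute. This is only a sketch: classifyLinks() is a hypothetical helper that is not part of the class above, it assumes the DOM extension is enabled, and it only looks at rel on the link itself (not at a page-level robots meta tag).

```php
<?php
// Sketch: split the links in a chunk of HTML into followed and nofollowed.
// classifyLinks() is a hypothetical helper, not part of the tagSpider class.
function classifyLinks(string $html): array
{
    $result = array('follow' => array(), 'nofollow' => array());
    if (trim($html) === '') {
        return $result; // nothing to parse
    }

    // Real-world HTML is rarely valid, so collect parser warnings quietly
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href === '') {
            continue; // skip anchors without a destination
        }
        // rel can hold several space-separated values, e.g. "external nofollow"
        $rel = preg_split('/\s+/', strtolower($anchor->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        $key = in_array('nofollow', $rel, true) ? 'nofollow' : 'follow';
        $result[$key][] = $href;
    }
    return $result;
}

$html = '<a href="/about">About</a> <a rel="external nofollow" href="http://example.com/">Ad</a>';
print_r(classifyLinks($html));
```

In practice you would pass in the HTML that the spider fetched rather than a hard-coded string.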

You can view a working example of the code here.

As I said, there’s so much you can do from a base like this, so have a think. I might post some proper tutorials on extracting data methodically, saving it to a database then manipulating it to get some interesting results.

Enjoy.

Edit: You’ll of course need the cURL library installed on your server for this to work!
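If you are not sure whether your host has it, a quick check along these lines should tell you. The helper name curlAvailable() is made up for the example.

```php
<?php
// Hypothetical helper: reports whether PHP's cURL extension is available.
function curlAvailable(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curlAvailable()) {
    // curl_version() returns an array describing the underlying libcurl
    $info = curl_version();
    echo "cURL is installed (libcurl " . $info['version'] . ")\n";
} else {
    echo "cURL is not installed; install or enable PHP's cURL extension first\n";
}
```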

Posted in Grey Hat, Research & Analytics, Scripting, Search Engine Optimisation | 21 Comments