
Archive for the 'Grey Hat' Category

How to make a Twitter bot with no coding

Wednesday, September 23rd, 2009

As usual, lazy-man post overview:

With this post you can learn to make a Twitter bot that will automatically retweet users talking about keywords that you specify. You can achieve this with (just about) no coding whatsoever.

Why would you want to do this? Lots of reasons I guess, ranging from spammy to fairly genuine. Normally giving somebody a retweet is enough to make them follow you, and it keeps your profile active, so you can semi-automate accounts and use the bot as an aid for making connections. That or you can spam the sh*t out of Twitter, whatever takes your fancy really.

Here we go.

Step 1: Make your Twitter Bot account
Head over to Twitter.com and create a new account for your bot. You shouldn’t really need much help at this stage. Try to pick a nice name and a cute avatar. Or something.

Step 2: Find conversations you want to Retweet
Okay, we’ve got our Twitter account and now we need to scan Twitter for conversations to possibly retweet. To do this, we’re going to use Twitter Search. In this example we’re going to search for “SEO Tips”, but to stop our bot retweeting itself you want to add your bot’s name as a negative keyword. So search for SEO Tips -botname, like this:

[Screenshot: the Twitter search query]

So my bot is called “DigeratiTestBot”. Hit search now, muffin.



Step 3: Getting the feed
The next thing you need to do is get the feed of the results, which isn’t quite as simple as you’d think. Twitter, being a bit of a prude, doesn’t like bots and services like Feedburner or Pipes interacting with it directly, so you’re going to need to repurpose the feed or it’s game over for you.

After you’ve done your search you need to get the feed location (top right), so copy the URL of the “Feed for this query” link.

[Screenshot: the “Feed for this query” link]

Store that in a safe place, we’ll need it in a second.



Step 4: Making the feed accessible
Okay, so there’s a teeny-tiny bit of code, but this is all, I promise! You’re going to need to republish the feed so it can be accessed later on, but don’t worry – it’s a piece of cake. All we’re going to do is screen scrape the whole feed results page onto our own server.

Make a file called “myfeed.php” and put this in it:

<?php
// URL of the Twitter search feed we saved in Step 3
$url = "http://search.twitter.com/search.atom?q=seo+tips+-yourbotname";
// start a cURL session and ask for the page back as a string
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
// republish the feed from our own server
echo $curl_scraped_page;
?>

The only bit you need to change is this line:

$url = "http://search.twitter.com/search.atom?q=seo+tips+-yourbotname";

Replace it with the Twitter feed URL that we carefully saved and stored in a safe place earlier. If you’ve already lost that URL, please proceed back to Step 3 and consider yourself a fail.

So, having completed this and uploaded your myfeed.php to your domain, you can now access the real-time Twitter results feed by accessing http://www.yourdomain.com/myfeed.php.

Step 5: Yahoo Pipes!
Now comes the fun bit: we’re going to set up most of the mechanism for our bot in Yahoo Pipes. You’ll need a Yahoo account, so if you don’t have one, get one, log in and click “Create a Pipe” at the top of the screen.

This will give you a blank canvas, so let’s MacGyver us up a god damn Twitter Bot!

Add “Fetch Feed” block from “Sources”
Then in the “URL” field, enter the URL of the feed we repurposed, http://www.yourdomain.com/myfeed.php.

[Screenshot: the Fetch Feed block]

Add “Filter” block from “Operators”
Leave the settings as “Block” and “all” then add the following rules:
item.title CONTAINS RT.*RT
item.title CONTAINS @
item.twitter:lang DOES NOT CONTAIN EN


(You click the little green + to add more rules.) Once you’ve done that, drag a line between the bottom of the “Fetch Feed” box and the top of the “Filter” box to connect them. Hey presto.
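If you’re curious what that Filter block is actually doing under the hood, here’s a rough PHP equivalent. The sample tweets are invented, and I’ve modelled it as “drop the item if any rule trips” since that’s what the rules are evidently for; flip the logic if your block really is set to “all”:

```php
<?php
// Rough PHP equivalent of the Pipes "Filter" block - drops an item
// if any rule matches. Sample items are invented for illustration.
function keep_item($title, $lang) {
    if (preg_match('/RT.*RT/', $title)) return false; // already a retweet chain
    if (strpos($title, '@') !== false)  return false; // mentions another user
    if (stripos($lang, 'en') === false) return false; // not flagged as English
    return true;
}

$items = array(
    array('title' => 'Ten quick SEO tips for beginners', 'lang' => 'en'),
    array('title' => 'RT @foo RT @bar seo tips',         'lang' => 'en'),
    array('title' => 'Consejos de SEO',                  'lang' => 'es'),
);

foreach ($items as $item) {
    if (keep_item($item['title'], $item['lang'])) {
        echo $item['title'] . "\n"; // only the first sample survives
    }
}
```

The RT.*RT rule is there to stop chains of retweets of retweets, which just look like bot noise.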

[Screenshot: Fetch Feed connected to Filter]

Add “Loop” block from “Operators”

Add a “String Builder” block from “String” and drag it ONTO the “Loop” block you just added


In the String Builder block you just put inside the Loop block, add these 3 items:
item.author.uri
item.y:published.year
item.content.content

Check the “assign results to” radio button and change the value to item.title

Great, now drag a connection between your Filter and Loop blocks. It should look like this now:

[Screenshot: Filter connected to Loop]

Add “Regex” block from “Operators”
Add these two rules:
item.title REPLACE http://twitter.com/ WITH RT @
item.title REPLACE 2009 WITH (space character)

Extra points for anyone who writes “(space character)” literally instead of using an actual space. Also, don’t miss the trailing slash on twitter.com/
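To see why those two odd-looking rules work, here’s the whole Loop + Regex transformation in plain PHP with made-up sample values. The published year always sits between the author URI and the content in the string we built, so replacing “2009” with a space is exactly what separates the username from the tweet text:

```php
<?php
// What the Loop + String Builder + Regex blocks do to one item.
// The sample author and content values are invented for illustration.
$author_uri = 'http://twitter.com/DigeratiTestBot';  // item.author.uri
$year       = '2009';                                // item.y:published.year
$content    = 'Great SEO tips over at example.com';  // item.content.content

// String Builder: glue the three pieces together into item.title
$title = $author_uri . $year . $content;

// Regex rule 1: "http://twitter.com/" -> "RT @"
$title = str_replace('http://twitter.com/', 'RT @', $title);

// Regex rule 2: "2009" -> a single space
$title = str_replace('2009', ' ', $title);

echo $title . "\n";
// -> RT @DigeratiTestBot Great SEO tips over at example.com
```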



Drag a connection between Loop Block and Regex Block, then a connection between Regex and Pipe Output blocks.

Finished! Should look something like this:

[Screenshot: the finished pipe]

All you need to do now is Save your pipe (name it whatever you like) and Run Pipe (at the top of the screen).

Once you run your pipe, you’ll get an output screen something like this:

[Screenshot: the pipe output screen]

What you need to do here is save the URL of your pipe’s RSS feed and keep it in a safe place. If you didn’t lose your RSS feed from Step 3, then I’d suggest keeping it in the same place as that.



Step 6: TwitterFeed
Almost there, comrades. All we need to do now is whack our feed into our TwitterBot account, which is made really easy with TwitterFeed.com. Get yourself over there and sign up for an account.

To set up your bot in TwitterFeed:

1) I suggest not using OAuth, as it will make it easier to use multiple Twitter accounts. Click the “Having Oauth Problems?” link, enter the username and password for your TwitterBot account and hit “test account details”.

2) Name your feed whatever you like and then enter the URL of your Yahoo Pipes RSS that we carefully saved earlier, then hit “test feed”.

3) Important: Click “Advanced Settings”, as we need to change some stuff here:

Post Frequency: Every 30 mins
Updates at a time: 5
Post Content: Title Only
Post Link: No (uncheck)

Then hit “Create Feed”

[Screenshot: TwitterFeed advanced settings]

All done!

Have fun and please, don’t buy anything from those losers peddling $20 “automate this” Twitter scripts. If you really need one, just make it yourself, or if you don’t know how, leave a comment here and I’ll show you.

Bosh.

Posted in Advertising, Black Hat, Blogging, Grey Hat, Scripting, Social Marketing, White Hat | 115 Comments »

CURL Page Scraping Script

Tuesday, December 16th, 2008

Using cURL and page scraping for specific data is one of the most important things I do when creating databases. I’m not just talking about scraping pages and reposting them, either.

You can use cURL to grab the HTML of any viewable page on the web and then, most importantly, take that data and pick out the bits you need. This is the basis for link analysis scripts, training scripts and compiling databases from sources around the web; there are almost limitless things you can do.

I’m providing a simple PHP class here, which will use cURL to grab a page and then pull any information between user-specified tags into an array. So, for instance, in our example you can grab all of the links from any web page.

The class is quite simple – I had to get rid of the lovely indentation to make it fit nicely onto the blog, but it’s fairly well commented.

In a nutshell, it does this:

1) Goes to specified URL

2) Uses cURL to grab the HTML of the URL

3) Takes the HTML and scans for every instance of the start and end tags you provide (e.g. <a> </a>)

4) Returns these in an array for you.

Download taggrab.class.zip

<?php

class tagSpider
{

// set variable to hold curl instance
var $ch;

// this is where we dump the html we get
var $html; 

// set for binary type transfer
var $binary; 

// this is the url we are going to do a pass on
var $url;


// automatically executed on class call to clear variables
function tagSpider()
{
$this->html = "";
$this->binary = 0;
$this->url = "";
}



// takes url passed to it and.. can you guess?
function fetchPage($url)
{


// set the URL to scrape
$this->url = $url;

if (isset($this->url)) {

// start cURL instance
$this->ch = curl_init ();

// this tells cUrl to return the data
curl_setopt ($this->ch, CURLOPT_RETURNTRANSFER, 1);

// set the url to download
curl_setopt ($this->ch, CURLOPT_URL, $this->url); 

// follow redirects if any
curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, true); 

// tell cURL if the data is binary data or not
curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary); 

// grabs the webpage from the internets
$this->html = curl_exec($this->ch); 

// closes the connection
curl_close ($this->ch); 
}

}


// function takes html, puts the data requested into an array
function parse_array($beg_tag, $close_tag)

{
// match data between the specified tags
preg_match_all("($beg_tag.*$close_tag)siU", $this->html, $matching_data); 

// return data in array
return $matching_data[0];
}


}
?>

So that is your basic class, which should be fairly easy to follow (you can ask questions in comments if needed).

To use this, we need to call it from another PHP file to pass the variables we need to it.

Below is tag-example.php which demonstrates how to pass the URL, start/end tag variables to the class and pump out a set of results.

Download tag-example.zip

<?php

// Include our tag grab class
require("taggrab.class.php"); // class for spider

// Enter the URL you want to run
$urlrun="http://www.techcrunch.com/";

// Specify the start and end tags you want to grab data between
$stag="<a href=";
$etag="</a>";

// Make a title spider
$tspider = new tagSpider();

// Pass URL to the fetch page function
$tspider->fetchPage($urlrun);

// Enter the tags into the parse array function
$linkarray = $tspider->parse_array($stag, $etag); 

echo "<h2>Links present on page: ".$urlrun."</h2><br />";
// Loop to pump out the results
foreach ($linkarray as $result) {

echo $result;

echo "<br/>";
}

?>

So this code will pass the TechCrunch website to the class, looking for any standard a href links, and then simply echo these out. You could use this in conjunction with the SearchStatus Firefox plugin to quickly see which links TechCrunch is showing bots and which they are following and nofollowing.
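As a quick example of taking this further, here’s a sketch (not part of the class) that pulls just the href URLs out of the anchor tags parse_array() returns. The helper function name and the sample anchors are my own:

```php
<?php
// Sketch: extract just the href value from each anchor tag that
// parse_array() hands back. The sample anchors are invented.
function extract_hrefs($anchors) {
    $urls = array();
    foreach ($anchors as $a) {
        // grab whatever sits between href=" (or href=') and the next quote
        if (preg_match('/href=["\']([^"\']+)["\']/i', $a, $m)) {
            $urls[] = $m[1];
        }
    }
    return $urls;
}

$anchors = array(
    '<a href="http://www.techcrunch.com/about/">About</a>',
    '<a href="http://example.com/" rel="nofollow">Example</a>',
);

print_r(extract_hrefs($anchors));
// Array ( [0] => http://www.techcrunch.com/about/ [1] => http://example.com/ )
```

From there it’s a short hop to dumping the URLs into a database table instead of printing them.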

You can view a working example of the code here.

As I said, there’s so much you can do from a base like this, so have a think. I might post some proper tutorials on extracting data methodically, saving it to a database then manipulating it to get some interesting results.

Enjoy.

Edit: You’ll of course need the cURL library installed on your server for this to work!

Posted in Grey Hat, Research & Analytics, Scripting, Search Engine Optimisation | 21 Comments »

Blogs Worth Reading

Monday, December 15th, 2008

I’ve never done a round-up of the blogs I read before, which I guess is a bit selfish. So, in no particular order (and this isn’t a complete list) some of my favourite blogs, if you’re looking for some inspiration.

Dark SEO Programming is run by Harry. As he puts it, “SEO Tools. I make ‘em”. A great guy if you need help with coding and somewhat of a captcha guru, with a sense of humour. Definitely worth keeping up with. I wouldn’t be surprised if this guy starts making big Google waves in the next few years.

Ask Apache is a blog I absolutely love. Great, detailed tutorials on script optimisation, advanced SEO and mod_rewrite. AskApache’s blog posts are the kind of ones that live in your bookmarks, rather than your RSS Reader.

Andrew Girdwood is a great chap from BigMouthMedia I met last year (although I very much doubt he remembers that). Andrew seems to be a vigilante web bug hunter. What I like about his blog is that he is usually the first to find weird things with Google that are going down. This usually gets my brain rolling in the right direction of my next nefarious plan. ^_^

Blackhat SEO Blog run by busin3ss is always worth checking out. He was even kind enough to give me a pre-release copy of YACG mass installer to review (it’s coming soon – I’m still playing!). Apart from his excellent tools, his blog features the darker side of link building, which of course, interests me greatly.

Kooshy is a blog run by a guy I know, who.. Well I think he wants to remain anonymous (at least a little). He’s just got started again after closing down his last blog and moving Internet personas (doesn’t the mystery just rivet you?). Anyway, get in early, I think we can expect some good stuff from here. He’s already done a cool post on Pimpin’ Duplicate Content For Links.

Jon Waraas is run by.. Can you guess? Jon has something that a lot of even really smart Internet entrepreneurs are missing: good old fashioned elbow grease. This guy is a workaholic and it pays off in a big way. Apart from time-saving posts on loads of different ways to monetise your site, build backlinks and flush out your competitors, I get quite a lot of inspiration from his constant stream of effort and ideas. I could definitely take a leaf out of his work ethic book.

Blue Hat SEO is becoming one of the usual suspects really. If you’re here, you probably already know about Eli. Being part of my “let’s only do a post every few months club”, I love Eli’s blog because there is absolutely no fluff. He gets straight down to the business of overthrowing Wikipedia, exploiting social media and answering specific SEO questions. You’ll struggle to find higher quality out there.

SEO Book is probably the most “famous” blog I’m going to mention here. Aaron started off at a disadvantage because, to be honest, I thought he was a massive waste of space for quite a while. (I guess that’s what happens when you take your SEO youth on Sitepoint listening to the people with xx,xxx posts on there). I bought his SEO Book and for me, at least, it was way too fluffy. I’m pleased he’s started an SEO training service now as it represents much better value. I’m sure he was making a lot of money from his SEO Book, but perhaps milked it too long (like I probably would have). Anyway, I kept with his blog and I’ve been impressed with his attitude and posts. He’s done some really cool stuff, like the SEO Mindmap and more recently, a keyword strategy flowchart which would be useful for those looking for a more structured search approach. He’s also written about algorithm weightings for different types of keywords and of course has some useful SEO Tools.

Slightly Shady SEO – Great name, great blog. Although XMCP will probably take it as an insult, I’ve always regarded Slightly Shady as the blog most similar to mine on this list. Maybe it’s because I wish I’d written some of the posts he has, before he did, hehe. Again, a no BS approach to effective SEO; whether he’s writing about Google’s User Data Empire, hiding from it, or site automation, it’s all gravy.

The Google Cache is a great blog for analytical approaches to SEO. There are some awesome posts on Advanced Whitehat SEO and using proxies with search position trackers. I like.

SEOcracy is run by a lovely database overlord called Rob. Rob’s a cool guy, he was kind enough to donate some databases to include in the Digerati Blackbox a while back. Most of his databases are stashed away in his content club now, which is well worth a look in. He’s also done some enlightening posts on keyword research, stuffing website inputs and Google Hacking.

This is all I’ve got time for now, apologies if I’ve missed you. There may be a Part II in the near future.

Posted in Affiliate Marketing, Approved Services, Black Hat, Blogging, Digerati News, Google, Grey Hat, Marketing Insights, Research & Analytics, Search Engine Optimisation, Social Marketing, Splogs, Viral Marketing, White Hat, Yahoo | 7 Comments »

1,147 DoFollow Blogs & Forums

Tuesday, August 19th, 2008

I thought such a big update was worth a post. Pretty happy with the DoFollow search engine now – well over 1,000 blogs & forums in the index. So, get link building…

Or, if you’re smart, write an app to interface with the DoFollow search engine and do it all for you (:

Oh….My…..



Posted in Blogging, Community Sites, Digerati News, Google, Grey Hat, White Hat | 12 Comments »

How To Make Money With An Automated Blog & AutoStumble

Wednesday, August 13th, 2008

Welcome to another “how to” post. If you follow the recipe here, you’ll be onto Stage 3 = Profit in no time. This ties in quite nicely with the Blackhat SEO Tools post and the AutoStumble post, for those who haven’t read them. It’s a little blackhat, but nothing to lose sleep over (hah, as if!) and this is really, really, reaaallyy easy stuff. Sitting comfortably? Let us begin..

What is the end goal?

The end-game of this post is to have a fully automated blog, which generates shitloads of traffic via StumbleUpon, referrals and Google Blogsearch. In the process, you’ll also gain loads of subscribers and generate some nice easy revenue. Once it’s built, the entire thing is just about hands free.

What you need before you begin…
To complete this project you will need:

1) Nice clean installation of WordPress

2) The Digerati Blackhat SEO Tool Set

3) A registered copy of AutoStumble

Lets get started
I’m going to assume you know the basics of setting up a WordPress blog. If not, you can get more detail from Making Money With a Video Blog or, if you’re totally new, check out the official WordPress documentation. So yeah, if you’re that new, please RTFM.

Once you’ve got your WordPress blog installed and running, do the basics, such as setting the permalinks to the post title so you get those little extra keywords in the URL. You’ll also need to find yourself a theme. As discussed in Making Money With a Video Blog, the layout is really, really important for getting clicks on your ads. You could start with a template like ProSense, although I’ve found the click-through ratio to be pretty low, but at least it’s quick. Ideally, have a hunt around so you meet the criteria of showing your content above the fold, centrally, with your ads nicely surrounded and blended in. The key here is to experiment and see what works well for you.

Plugins FTW
There’s a whole crapload of plugins that will make your life a lot easier. We’ll start off with the important one, FeedWordpress, which is part of the Digerati Blackhat SEO Tool Set if you don’t already have it.

Upload the feedwordpress folder to your wp-content/plugins directory as usual. You’ll also need to move the two files from the feedwordpress “Magpie” subfolder into your “wp-includes” directory, which will overwrite some default WordPress files. Don’t miss that step…

Once you’re installed and you’ve activated the plugin via your WordPress dashboard, you’ll have a new option on your main navigation.

[Screenshot: the new Syndication menu item]

Just like that. So give that a click and then go into the “Syndication Options” menu. From here you’ll be able to configure FeedWordpress to do your bidding.

You should get an option screen like this:

[Screenshot: the FeedWordpress syndication options]
So lets run through these options.

1) The first thing you want to change is the “Check For New Posts” option. You’ll want to set this to “automatic”. This will go sniff your RSS feeds at an interval you specify to grab new content. You can leave it on every 10 minutes for now.

2) Make sure the next 3 boxes are checked; this will keep your feed information bang up to date.

3) You should set syndicated posts to be published immediately. This will allow you to get your content live ASAP, which is always a plus.

4) Permalinks. This is basically: when somebody clicks on the post, do they go to the original website that you er… borrowed the content from, or do they go to a scraped version on your site? For this example (for which I’ll give the gonadless among you an ethical loophole later), set it to “this website”.

5) I always set FeedWordpress to create new categories. I never display categories in the menu, but it gives the post a few more keywords and a bit more relevance for search. So, if someone else has gone to the effort of writing a tag, it would just be wasteful of you not to use it!

Okay, that’s set up… What exactly are we scraping?
To be honest, I’m not a big fan of people scraping content that people have sweated over. However, one thing I don’t mind doing is thieving from thieves.

You’re on the hunt for “disposable” content – generally not text based. Think along the lines of Flash games, funny videos, funny pictures, hypnomagical-optical-illusions – that kind of thing. The Internet is awash with blogs that showcase this stuff. Check out Google blogsearch and try a search like funny pictures blog. There’s hundreds of the leeching bastards showcasing other people’s pictures, videos, games and hypnomagical-optical-illusions on their websites. They can hardly call it “their” content. With this ethical pebble tossed aside, we can go and grab some content.

There’s loads of ways you can hunt down potential content. You’re on the lookout for RSS feeds with this rich media. So you could try; Google Blogsearch, Technorati, MyBlogLog – basically any site that lets you search the blogosphere.

Once you’ve got the location of about a dozen or so RSS feeds, you can go to your Syndication menu again and “add a new syndicated site”. It’s a simple matter: paste in the RSS feed location and hit syndicate. Once you’ve added them all, hit “update”. Boom, shake the room, you’ve probably got a couple of hundred “new posts”.

New posts, no traffic
You want, of course, to set up your WordPress RSS feed. Something like Feedburner is dead easy to set up and will get Google interested off the bat. Make sure you have a nice big RSS button and offer e-mail subscription (Feedburner does this) for those who don’t have a clue what the hell RSS is.

The cool thing about services like Google Blogsearch is that they’re pretty much chronologically sorted. So as long as you have a steady stream of posts, you’re guaranteed at least a trickle of traffic from long-tail searches.

Hot potato, grab and switch
If you really want to get some serious traffic, you’re going to need some “pillar” posts – content that you know for sure is strong. The easiest way to do this is to keep an eye on sites like Digg and Reddit. Check out what’s going hot on there, what’s new and what’s viral. Probably the easiest thing to do is subscribe to the Digg Offbeat / Comedy RSS. This will give you constant updates on what’s upcoming.

Due to the differences in the types of people, there doesn’t tend to be as much overlap between hubs such as Digg, Reddit & StumbleUpon as you might first think. I’ve seen things go viral on Reddit and then take two or three days to make it onto the frontpage of Digg. So, you can grab content that’s going hot from one of these hubs (your proverbial “hot potato”) and put it in front of the nose of another audience.

Here’s where AutoStumble comes in
This is probably the easiest way to use AutoStumble. Grab your hot potato content from Digg and do a manual post on your blog. Submit this page to StumbleUpon.

AutoStumble costs £20 and is a desktop application which allows you to pool hundreds of StumbleUpon votes with other users automatically. I.e., this is your quick way of getting your content to go viral on StumbleUpon. Once you’ve purchased and downloaded AutoStumble, it is simply a matter of pasting in the URL you want to go viral on StumbleUpon and hitting “AutoStumble”.

A few hundred votes later. Voila. You have traffic.

The value of StumbleUpon traffic
1) The most I’ve had is just over 70,000 unique visitors over a 3 day spike from StumbleUpon. So firstly, you can generate a fairly decent bit of green from your initial CPM ad impressions and clicks on things like Adsense. (StumbleUpon users don’t tend to be as picky about clicking on ads as Diggers).

2) With this volume of traffic, you’ll likely find a few people who really like your content. You’ll get RSS / Email subscribers who will be a permanent addition to your monthly traffic (and revenue).

3) A lot of these social sites are populated with pretty tech savvy people. A lot of these people run their own blogs, forums, websites – or at least add content somewhere themselves on the web. If you get 10,000 visitors from StumbleUpon, you can expect a decent amount of lovely natural links from around the web. Links mean better website authority, better rankings, better traffic and better revenue. The value for me at least, is really long-term.

Making things easy for yourself
You’ll probably want to install some extra plugins such as:

  • WordPress Automatic Update – This will update your WordPress installation as well as plugins. Generally, it will save you a lot of time.
  • Clean Archives Reloaded – I use this on my archive page. It’s a nice way to lay out all of your blog posts with clean anchor text to improve relevance with some internal linking.
  • Sitemap Generator – I don’t really bother with Sitemaps, but for those who do – saves you generating one from scratch.

Don’t forget, if you’re going to be switching content onto platforms like Digg or Reddit, make sure you have their native vote button included in the post! You want to make it as easy as possible to grab all of the votes you can. Again, personally – I don’t bother with the generic social bookmarking plugins for WordPress, as I find nobody actually seems to use them.

Oh, and before anyone chirps in trying to be clever saying “(sniffle) won’t duplicate content be an issue?” No, it won’t, fucktard! Get back in your hole. Aside from the dupe content filters being primarily built on shit, you’ll be posting mostly rich media, and Google’s not too great at working out the exact content of pictures and videos… Yet. Yes, it will probably change one day in the future, and we’ll all look back on this post and laugh. At the moment, it’s not something they do well, so, well… Ching..Ching.

Taking it one step further
This whole project should take you less than 30 minutes, from sitting down at your computer to having a fully automated blog posting and promotion system set up. If you like the idea, it would be worth packaging everything I’ve mentioned here into your own custom install file, so you can deploy new sites in under 15 minutes.

If you’re going to do this, you may as well make your cookie cutter solution as good as it can be. Hopefully, if you’re thinking down the right road you can come up with some of your own ideas to improve on these techniques (there are loads).

Why not look at only showing social voting buttons, from sites you know that your visitors actually use? Here’s some code.
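The idea is simple enough to sketch: peek at the visitor’s HTTP referrer and only print the vote button for the site they came from. The site list and button markup below are just my stand-ins, so swap in whatever your stats say your visitors actually use:

```php
<?php
// Sketch: decide which vote button to show based on where the
// visitor came from. Site list and button HTML are my own stand-ins.
function pick_button($referer) {
    $sites = array(
        'digg.com'        => '<a href="#">Digg this</a>',
        'reddit.com'      => '<a href="#">Upvote on Reddit</a>',
        'stumbleupon.com' => '<a href="#">Stumble it</a>',
    );
    foreach ($sites as $host => $button) {
        if (strpos($referer, $host) !== false) {
            return $button;
        }
    }
    return ''; // unknown referrer: show nothing
}

echo pick_button('http://www.reddit.com/r/funny/');
// -> <a href="#">Upvote on Reddit</a>
```

In your theme you’d hook it up with something like echo pick_button($_SERVER['HTTP_REFERER']); inside the post loop.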

Enjoy.

Posted in Adsense, Advertising, Black Hat, Blogging, Google, Grey Hat, Search Engine Optimisation, Social Marketing, Splogs, Viral Marketing | 33 Comments »