
Archive for the 'Scripting' Category

How to make a Twitter bot with no coding

Wednesday, September 23rd, 2009

As usual, lazy-man post overview:

With this post you can learn to make a Twitter bot that will automatically retweet users talking about keywords that you specify. You can achieve this with (just about) no coding whatsoever.

Why would you want to do this? Lots of reasons I guess, ranging from spammy to fairly genuine. Normally giving somebody a ReTweet is enough to make them follow you, and it keeps your profile active, so you can semi-automate accounts and use it as an aid for making connections. That or you can spam the sh*t out of Twitter, whatever takes your fancy really.

Here we go.

Step 1: Make your Twitter Bot account
Head over to Twitter.com and create a new account for your bot. Shouldn’t really need much help at this stage. Try to pick a nice name and cute avatar. Or something.

Step 2: Find conversations you want to Retweet
Okay, we’ve got our Twitter account and we’re going to need to scan Twitter for conversations to possibly retweet. To do this, we’re going to use Twitter Search. In this example, we’re going to search for “SEO Tips”, but to stop our bot Retweeting itself you want to add a negative keyword of your botname. So search for SEO Tips -botname, like this:

[Screenshot: Twitter Search with the bot’s name as a negative keyword]

So my bot is called “DigeratiTestBot”. Hit search now, muffin.

Step 3: Getting the feed
The next thing you need to do is get the feed results, which isn’t quite as simple as you’d think, you see. Twitter, being a bit of a prude, doesn’t like bots and services like Feedburner or Pipes interacting with it, so you’re going to need to repurpose the feed or it’s game over for you.

After you’ve done your search you need to get the feed location (top right), so copy the URL of the “Feed for this query” link.

[Screenshot: the “Feed for this query” link]

Store that in a safe place, we’ll need it in a second.

Step 4: Making the feed accessible
Okay, so there’s a teeny-tiny bit of code, but this is all, I promise! You’re going to need to republish the feed so it can be accessed later on, but don’t worry – it’s a piece of cake. All we’re going to do is screen scrape the whole feed results page onto our own server.

Make a file called “myfeed.php” and put this in it:

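<?php
// Twitter search feed to republish; swap in the feed URL you saved in Step 3
$url = "http://search.twitter.com/search.atom?q=seo+tips+-yourbotname";
$ch = curl_init($url);
// return the response as a string instead of printing it straight away
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
// echo the feed so anything requesting myfeed.php gets the same XML
echo $curl_scraped_page;
?>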

The only bit you need to change is:

$url = "http://search.twitter.com/search.atom?q=seo+tips+-yourbotname";

which needs to be replaced with the Twitter feed URL that we carefully saved and stored in a safe place earlier. If you’ve already lost that URL, please proceed back to Step 3 and consider yourself a fail.

So, having completed this and uploaded your myfeed.php to your domain, you can now access the real-time Twitter results feed by accessing http://www.yourdomain.com/myfeed.php.
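If you end up making more than one bot, the feed URL can be put together rather than copied by hand each time. This is only a sketch: buildSearchFeedUrl() is a name I made up, and it assumes the feed URL always has the same shape as the one in myfeed.php above.

```php
<?php
// Hypothetical helper: build the search feed URL from a phrase and a bot name.
// The endpoint is copied from the myfeed.php example; it is an assumption that
// every search feed URL follows this pattern.
function buildSearchFeedUrl($keywords, $botName)
{
    // the minus sign marks the bot's own name as a negative keyword
    $query = $keywords . ' -' . $botName;

    // urlencode() turns spaces into "+" and leaves "-" alone
    return 'http://search.twitter.com/search.atom?q=' . urlencode($query);
}

echo buildSearchFeedUrl('seo tips', 'yourbotname'), "\n";
// prints: http://search.twitter.com/search.atom?q=seo+tips+-yourbotname
```

You could drop the result straight into the $url line of myfeed.php.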

Step 5: Yahoo Pipes!
Now comes the fun bit: we’re going to set up most of the mechanism for our bot in Yahoo Pipes. You’ll need a Yahoo account, so if you don’t have one, get one, log in and click “Create a Pipe” at the top of the screen.

This will give you a blank canvas, so let’s MacGyver us up a god damn Twitter Bot!

Add “Fetch Feed” block from “Sources”
Then in the “URL” field, enter the URL of the feed we repurposed, http://www.yourdomain.com/myfeed.php.

[Screenshot: the Fetch Feed block]

Add “Filter” block from “Operators”
Leave the settings as “Block” and “all” then add the following rules:
item.title CONTAINS RT.*RT
item.title CONTAINS @
item.twitter:lang DOES NOT CONTAIN EN

(You click the little green + to add more rules.) Once you’ve done that, drag a line between the bottom of the “Fetch Feed” box and the top of the “Filter” box to connect them. Hey presto.

[Screenshot: Fetch Feed connected to Filter]
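If you are wondering what those three rules actually do, here is roughly the same logic as plain PHP. It is a sketch, not how Pipes works internally: the $item array with 'title' and 'lang' keys is my stand-in for item.title and item.twitter:lang, the function names are made up, and I am assuming “all” means every rule has to match before an item is blocked (with “any”, a single match is enough).

```php
<?php
// Sketch of the Step 5 Filter block. $item is a hypothetical array with
// 'title' and 'lang' keys standing in for item.title and item.twitter:lang.
function filterRuleHits(array $item)
{
    return array(
        // rule 1: title CONTAINS RT.*RT (an RT followed later by another RT)
        preg_match('/RT.*RT/', $item['title']) === 1,
        // rule 2: title CONTAINS @
        strpos($item['title'], '@') !== false,
        // rule 3: lang DOES NOT CONTAIN EN (case-insensitive here, an assumption)
        stripos($item['lang'], 'en') === false,
    );
}

// $mode is 'all' (the setting the post says to leave alone) or 'any'
function isBlocked(array $item, $mode = 'all')
{
    // array_filter() with no callback keeps only the rules that came back true
    $hits = count(array_filter(filterRuleHits($item)));
    return $mode === 'all' ? $hits === 3 : $hits > 0;
}

$tweet = array('title' => 'Ten SEO tips from @someone', 'lang' => 'en');
var_dump(isBlocked($tweet, 'all')); // only one of the three rules hits
var_dump(isBlocked($tweet, 'any'));
```

Running a few sample titles through both modes is a quick way to see which tweets would make it to the next block.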

Add “Loop” block from “Operators”

Add a “String Builder” from “String” and drag it ONTO the “Loop” block you just added

In the String Builder block you just put inside the Loop block, add these 3 items:
item.author.uri
item.y:published.year
item.content.content

Check the radio box of “assign results to” and change this to item.title

Great, now drag a connection between your Filter and Loop blocks. Should look like this now:

[Screenshot: Filter connected to Loop]

Add “Regex” block from “Operators”
Add these two rules:
item.title REPLACE http://twitter.com/ WITH RT @
item.title REPLACE 2009 WITH (space character)

Extra points for anyone who writes “(space character)” instead of using a space. Also don’t miss the trailing slash from twitter.com/

Drag a connection between Loop Block and Regex Block, then a connection between Regex and Pipe Output blocks.

Finished! Should look something like this:

[Screenshot: the finished pipe]
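The String Builder and Regex blocks are easier to follow with a worked example. The sketch below does the same two jobs in PHP with plain string replacement instead of Pipes’ regex engine. buildRetweetTitle() is a made-up name, and the three arguments stand in for item.author.uri, item.y:published.year and item.content.content.

```php
<?php
// Sketch of the Loop/String Builder and Regex blocks from Step 5.
// The arguments stand in for item.author.uri, item.y:published.year and
// item.content.content; the function name is invented for this example.
function buildRetweetTitle($authorUri, $year, $content)
{
    // String Builder: glue the three pieces together with no separator
    $title = $authorUri . $year . $content;

    // Regex rule 1: swap the profile URL prefix for "RT @"
    $title = str_replace('http://twitter.com/', 'RT @', $title);

    // Regex rule 2: swap the year for a space, which splits username from tweet
    return str_replace($year, ' ', $title);
}

echo buildRetweetTitle('http://twitter.com/someuser', '2009', 'Five quick SEO tips'), "\n";
// prints: RT @someuser Five quick SEO tips
```

So the year is only there as a marker between the username and the tweet text. Note that str_replace() swaps every occurrence, so in this sketch a tweet that mentions the year itself would lose it too.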

All you need to do now is Save your pipe (name it whatever you like) and Run Pipe (at the top of the screen).

Once you run your pipe, you’ll get an output screen something like this:

[Screenshot: the pipe output screen]

What you need to do here is save the URL of your pipe’s RSS feed and keep it in a safe place. If you didn’t lose your RSS feed from Step 3, then I’d suggest keeping it in the same place as that.

Step 6: TwitterFeed
Almost there, comrades. All we need to do now is whack our feed into our TwitterBot account, which is made really easy with TwitterFeed.com. Get yourself over there and sign up for an account.

To set up your bot in TwitterFeed:

1) I suggest not using OAuth, as it will make it easier to use multiple Twitter accounts. Click the “Having Oauth Problems?” link, enter the username and password for your TwitterBot account and hit test account details.

2) Name your feed whatever you like and then enter the URL of your Yahoo Pipes RSS that we carefully saved earlier, then hit “test feed”.

3) Important: Click “Advanced Settings”, as we need to change some stuff here:

Post Frequency: Every 30mins
Updates at a time: 5
Post Content: Title Only
Post Link: No (uncheck)

Then hit “Create Feed”

[Screenshot: TwitterFeed advanced settings]

All done!

Have fun and please, don’t buy anything from those losers who are peddling $20 “automate this” Twitter scripts. If you really need to do it, just make it yourself, or if you don’t know how, leave a comment here and I’ll show you how.

Bosh.

Posted in Advertising, Black Hat, Blogging, Grey Hat, Scripting, Social Marketing, White Hat | 115 Comments

Using Twitter To Power Spam

Tuesday, March 3rd, 2009

Good afternoon and a happy square root day to you. (C’mon it’s no more made up than Valentine’s Day).

Despite my initial reservations, I’m actually finding Twitter moderately useful for content and link discovery; the trick is just following the right people and ditching time wasters. I’m not going to bore you with a lecture on how Twitter is the next big thing. In fact, I’m pretty sure we’re fast approaching the point at which Gartner’s Hype Cycle predicts a crash of interest and disillusionment.

[Image: Twitter in Gartner’s Hype Cycle]

Well, maybe, maybe not – argue it amongst yourselves, it’s not what I really want to talk about. I want to talk about…

Twitter and Spam
Although I’ve only really talked about parasite hosting indirectly, when looking at ranking factors to do with age and trust, I think it’s a point worth briefly mentioning.

I saw Quadzilla posted today about parasite hosting on Twitter. Hopefully that hasn’t eluded you. Aside from other methods of finding places to parasite host, all you need to look for are trusted domains that allow you to post content with little moderation. Even a basic search for Viagra shows that the #2 position is essentially a parasite hosted page on the hotfroguk directory (thanks Ryan for your dedication in trawling Viagra results).

As Quadzilla rightly points out, with Twitter being almost totally unmoderated, the sad fact is it’s going to get bombed to hell over the next 12 months by blackhat SEOs and then Google will do something about it and game over.

There are however (slightly) more legitimate uses for Twitter if you’ve got your heart set on some easy rankings.

Twitter and content generation
Content generation can be a tricky game: you can plain scrape it (not really generation :P), scrape it and spin it, use synonym replacement or Markov chaining, or, if you’re really smart, come up with your own way to do it.

There are several problems inherent in content generation, whether it’s duplicate content, poor quality or your algorithm getting skewed by internet randomness. I’ve seen a lot of people trying to generate websites based on data they can pull from keyword trends or “hot” trends. The problem is that most of the services give you the information you need after the fact. The news has come, the search spike has been and your content generation system has given you a crummy bit of content which now has to compete with established sites with real content. Oh, and the fact nobody cares anymore.

Twitter, on the other hand, is instant. It’s not uncommon for me to discover new “hot” things on Twitter hours before mainstream news (i.e. authoritative sites) publish it (and days before Seth Godin makes an informed-in-hindsight comment).

Without spoon feeding, I put this to you: Why not let tweeting twits find your content for you? There are many ways you can do this:

1) There are lovely people that get this information for you. For instance: http://twitturly.com/ will give you the most tweeted links. There’s all your early breaking generic news for you, just set your cURL bot to follow those tinyurls and discover the source and scrape away.

2) If you’re in a niche, find everyone who tweets in that niche, use cURL to crawl all of the links they tweet, log them to a database, use a little intelligent keyword selection to make sure they’re relevant, then repost.

Then of course, ping the world with your new content, break some captchas and submit to a list of social sites and drop a few links here and there. Aside from services such as Google Blog Search, which work on an almost exclusively chronological basis, you stand a good chance of getting a healthy amount of visitors since you’re one of the first few to get content up.

Added note for clarity: I’m talking about scraping titles/content from URLs you have followed from tweets – not tweets themselves. The majority of the links to new breaking / interesting stories will come inside a very small window. So if you can post this content up while there is still interest / searches and before someone has link dominance, you should even be able to give the duplicate content penalty the slip, even if you’ve 100% scraped – so you’re on a winner – you could even retweet it (:
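For the link discovery part, the two building blocks are pulling URLs out of tweet text and following a short URL to see where it ends up. Here is a rough sketch of both. The function names are mine, the URL pattern is deliberately simple (it will happily swallow trailing punctuation), and resolveFinalUrl() assumes cURL is installed and the short URL answers header-only requests.

```php
<?php
// Pull every http/https link out of a tweet's text (a deliberately simple pattern)
function extractLinks($tweetText)
{
    preg_match_all('~https?://[^\s<>"]+~i', $tweetText, $found);
    return $found[0];
}

// Tally links across a batch of tweets, most tweeted first
function countLinks(array $tweets)
{
    $counts = array();
    foreach ($tweets as $text) {
        // array_unique() so one tweet repeating a link only counts once
        foreach (array_unique(extractLinks($text)) as $link) {
            $counts[$link] = isset($counts[$link]) ? $counts[$link] + 1 : 1;
        }
    }
    arsort($counts);
    return $counts;
}

// Follow a short URL's redirects and return the final address (needs cURL and network)
function resolveFinalUrl($shortUrl)
{
    $ch = curl_init($shortUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    curl_setopt($ch, CURLOPT_NOBODY, true); // ask for headers only, skip the body
    curl_exec($ch);
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $finalUrl;
}

$tweets = array(
    'look at this http://tinyurl.com/abc right now',
    'again: http://tinyurl.com/abc',
    'something else http://example.com/x',
);
print_r(countLinks($tweets));
```

The tally gives you the “most tweeted” ordering, and resolveFinalUrl() is what you would run on the top few before deciding whether a source is worth a look.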

Oh, don’t forget to jam it full of ads or something. Who cares? It’s all automated. Think of it at least as a weekend project, but don’t break Twitter, it’s growing on me (:

Posted in Black Hat, Scripting, Search Engine Optimisation, Social Marketing, Splogs | 5 Comments

CURL Page Scraping Script

Tuesday, December 16th, 2008

Using cURL and page scraping for specific data is one of the most important things I do when creating databases. I’m not just talking about scraping pages and reposting here, either.

You can use cURL to grab the HTML of any viewable page on the web and then, most importantly, take that data and pick out the bits you need. This is the basis for link analysis scripts, training scripts and compiling databases from sources around the web; there are almost limitless things you can do.

I’m providing a simple PHP class here, which will use cURL to grab a page then pull out any information between user specified tags, into an array. So for instance, in our example you can grab all of the links from any web page.

The class is quite simple – I had to get rid of the lovely indentation to make it fit nicely onto the blog, but it’s fairly well commented.

In a nutshell, it does this:

1) Goes to specified URL

2) Uses cURL to grab the HTML of the URL

3) Takes the HTML and scans for every instance of the start and end tags you provide (e.g. )

4) Returns these in an array for you.

Download taggrab.class.zip

<?php

class tagSpider
{

// set variable to hold curl instance (fetchPage() uses $this->ch)
var $ch;

// this is where we dump the html we get
var $html; 

// set for binary type transfer
var $binary; 

// this is the url we are going to do a pass on
var $url;


// constructor: automatically executed on new tagSpider() to clear variables
function __construct()
{
$this->html = "";
$this->binary = 0;
$this->url = "";
}



// takes url passed to it and.. can you guess?
function fetchPage($url)
{


// set the URL to scrape
$this->url = $url;

if (isset($this->url)) {

// start cURL instance
$this->ch = curl_init ();

// this tells cUrl to return the data
curl_setopt ($this->ch, CURLOPT_RETURNTRANSFER, 1);

// set the url to download
curl_setopt ($this->ch, CURLOPT_URL, $this->url); 

// follow redirects if any
curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, true); 

// tell cURL if the data is binary data or not
curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary); 

// grabs the webpage from the internets
$this->html = curl_exec($this->ch); 

// closes the connection
curl_close ($this->ch); 
}

}


// function takes html, puts the data requested into an array
function parse_array($beg_tag, $close_tag)

{
// match data between specified tags (preg_quote makes the tags match literally)
preg_match_all("~" . preg_quote($beg_tag, "~") . ".*" . preg_quote($close_tag, "~") . "~siU", $this->html, $matching_data); 

// return data in array
return $matching_data[0];
}


}
?>

So that is your basic class, which should be fairly easy to follow (you can ask questions in comments if needed).
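If you want to poke at the matching step without fetching anything, here is a stand-alone sketch that runs the same kind of pattern on a string. grabBetween() is a made-up name and not part of the download. It escapes the tags with preg_quote() so they are matched literally, and uses the same s, i and U flags (dot matches newlines, case-insensitive, ungreedy).

```php
<?php
// Stand-alone version of the matching step, run on a string instead of a fetched page.
function grabBetween($html, $begTag, $closeTag)
{
    // preg_quote() makes the tags match literally; "~" is the pattern delimiter
    $pattern = '~' . preg_quote($begTag, '~') . '.*' . preg_quote($closeTag, '~') . '~siU';
    preg_match_all($pattern, $html, $matches);

    // index 0 holds the full matches, tags included
    return $matches[0];
}

$html = '<p><b>one</b> and <B>two</B></p>';
print_r(grabBetween($html, '<b>', '</b>'));
```

The U flag is the important one: without it the pattern would grab everything from the first start tag to the last end tag as one big match, rather than each pair separately.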

To use this, we need to call it from another PHP file to pass the variables we need to it.

Below is tag-example.php which demonstrates how to pass the URL, start/end tag variables to the class and pump out a set of results.

Download tag-example.zip

<?php

// Include our tag grab class
require("taggrab.class.php"); // class for spider

// Enter the URL you want to run
$urlrun="http://www.techcrunch.com/";

// Specify the start and end tags you want to grab data between
$stag="<a href=";
$etag="</a>";

// Make a title spider
$tspider = new tagSpider();

// Pass URL to the fetch page function
$tspider->fetchPage($urlrun);

// Enter the tags into the parse array function
$linkarray = $tspider->parse_array($stag, $etag); 

echo "<h2>Links present on page: ".$urlrun."</h2><br />";
// Loop to pump out the results
foreach ($linkarray as $result) {

echo $result;

echo "<br/>";
}

?>

So this code will pass the Techcrunch website to the class, looking for any standard a href links. It will then simply echo these out. You could use this in conjunction with SearchStatus Firefox Plugin to quickly see what links Techcrunch is showing bots and what they are following and nofollowing.
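If it is the follow/nofollow split you are after, you can also do it in code rather than by eye. This sketch swaps the tag matching for PHP’s DOM extension (so it assumes that extension is enabled), takes the HTML as a string, and only looks at the rel attribute on each link. splitLinksByNofollow() is a name I invented for the example.

```php
<?php
// Sort a page's links into followed and nofollowed using PHP's DOM extension.
function splitLinksByNofollow($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = array('follow' => array(), 'nofollow' => array());
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href === '') {
            continue; // skip anchors without a destination
        }
        // rel can hold several space-separated values, e.g. "external nofollow"
        $rel = preg_split('/\s+/', strtolower(trim($anchor->getAttribute('rel'))));
        $links[in_array('nofollow', $rel, true) ? 'nofollow' : 'follow'][] = $href;
    }
    return $links;
}

$html = '<a href="/about">About</a> <a rel="external nofollow" href="http://example.com/">Ad</a>';
print_r(splitLinksByNofollow($html));
```

To run it on a live page you would pass in the HTML that the class has already fetched.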

You can view a working example of the code here.

As I said, there’s so much you can do from a base like this, so have a think. I might post some proper tutorials on extracting data methodically, saving it to a database then manipulating it to get some interesting results.

Enjoy.

Edit: You’ll of course need the cURL library installed on your server for this to work!

Posted in Grey Hat, Research & Analytics, Scripting, Search Engine Optimisation | 21 Comments