Author Archive
A Sneak Preview of the London Pro SEO Seminar 2010
Posted by willcritchlow
I’d like to invite you to the London Pro SEO seminar we are running on the 25th and 26th October.
With a 98% satisfaction rating and 95% of attendees recommending it to others, we were blown away by the positive feedback we received last year. Seasoned pros like Rob from Easynet Connect called it the "highlight of the year" and we have plenty more where that came from (see for example: "a must attend event for any SEO professional" –PricewaterhouseCoopers). This year we are aiming to put together something even better.
Just like last year, we expect it to sell out, so if you are already sold on coming, I would recommend booking now.

The Details:
Where: The Congress Centre in London’s West End
When: October 25th and 26th
Price: £699 +VAT
(plus optional super-exclusive breakfast with the speakers @ £149 <– this is very nearly sold out so be super-quick if you want to come to this)
Book: now!
If you are an SEOmoz PRO member, you can get access to special pricing by using the code in the discount store - making it a steal at £499 +VAT / person.
Get your ticket now!

If you haven’t been before and don’t know anyone who has, then I figure I need to do a little more to convince you. Here are three big reasons to come:
#1: just look at this schedule
The full schedule (note: of course with two months to go, it’s possible some of the details of this could change) will look something like this:
- Dueling laptops: Live site reviews
- Speakers: Rand Fishkin, Tom Critchlow and Stephen Pavlovich (of Conversion Factory)
- Live and in-person, a simultaneous review of on- and off-site factors as well as conversion rate tips for members of the audience
- Site architecture and faceted navigation
- Speaker: Duncan Morris
- With design patterns like “mega nav” and filtered searching becoming more popular, how can you offer all the options you want without descending into duplicate content madness?
- Head to head: Reputation management in a real-time world
- Speakers: Will Critchlow vs. Rand Fishkin
- Although the big vote will be saved for the end of the second day (see below), we thought there might be a few bragging rights to be won on one of the trickier topics in the line-up. Reputation management is rapidly changing and difficult. Presenting about it is hard both because of those factors and because case studies tend to be highly secret. We’ll be egging each other on to share our best ideas here.
- Advanced linkbuilding
- Speaker: Wiep Knol
- One of the world’s leading linkbuilders letting us inside real campaigns and showing us what actually works. ‘Nuff said.
- Overcoming Twitter’s cannibalisation of the link graph
- Speaker: Rand Fishkin
- Rand has written about the problems introduced for SEOs by the changing user behaviour that has us all creating content and links on social media sites rather than on our own blogs. What should you do about it? Good question - let’s hope Rand has the answer.
- How to hire SEOs
- Speaker: Tom Critchlow
- You might have noticed Distilled growing quite a bit recently. Tom and Duncan have been honing the interview process, whittling candidates down and learning where to advertise for the best results. Whether you are looking for a job or hiring, you will learn the tricks of the trade that have worked for us.
- SEO vs. Google
- Speaker: Dave Naylor of Bronco
- With Google increasingly ranking their own features and properties within organic search, you might find the SERPs you care about so full of universal stuff, you can’t find the organic results. If you find yourself in that situation you need the kind of creativity that only Dave can bring.
- Sexing up your reports
- Speaker: Will Critchlow
- Doesn’t sound that interesting, but if you have a boss or clients, I promise this will be one of the more actionable sessions of the two days.
- Integrating development and SEO
- Speaker: Alex Craven of Bloom Media
- Bloom build and run websites for some of the largest online brands in the UK. Alex is going to cover the details of how they integrate SEO deep into that process, deal with customer sign-off and change requests without screwing things up and generally make us all better at our jobs.
- Data journalism
- Speaker: Russell Smith of The BBC(!)
- This is what your pansy infographics would be like if they had the might of the BBC behind them. When you’re under that kind of scrutiny, you learn a load of lessons. The shortcut to learning those lessons is to get out of bed on the second day and come hear Russell speak.
- Keyword research - the ultimate process
- Speaker: Richard Baxter of SEOgadget
- I think this is the session I am personally most excited about. Rich is truly a king of the keywords. He has never even told me the full details of his in-depth keyword research process, but he’s going to “give away the farm” (his words).
- Understanding your competitors’ keyword, link and content strategies
- Speaker: Sam Crocker of Distilled
- Being surrounded by these kinds of industry veterans could be pretty intimidating. If you saw Sam speak at SMX London, you know that he’s not scared. Backed by the insights of the full Distilled consulting crew, there’s going to be some great stuff in this one.
- Top 10 tips
- Email
- Speaker: Tamara Gielen
- We thought we should break up the SEO geekfest a little with some nuggets from related fields. The first of these comes from Tamara who brings email marketing insights from eBay and is going to share tips and tricks you can use in SEO.
- CRO
- Speaker: Stephen Pavlovich
- Next up is Stephen whose definitive CRO guide on the SEOmoz blog was one of the most popular of the year so far.
- Design (TBC)
- The maths of SEO
- Speaker: Ben Hendrickson of SEOmoz
- I introduced him last year as the smartest guy in the room and I fully anticipate that being true again this year. Ben is the guy behind the moz-metrics and he is going to be walking us through Latent Dirichlet Allocation and (hopefully) how an understanding of it can make you better at your job.
- Building the perfect analytics account
- Speaker: Will Critchlow
- I have given dozens of presentations on analytics, but it’s only at our own event that I get the freedom to pull everything together and present my views on the metrics that are important, and those that aren’t, the hacks you need and the ones you don’t.
- How lessons from sales can make you a better SEO
- Speaker: Caitlin Krumdieck of Distilled
- Caitlin is Head of Sales at Distilled, but this session won’t have a single second of pitching in it (if I can keep her under control). Instead, Caitlin will be sharing her skills and insights into how sales skills can help you win budget, convince sceptical development teams and build links. I think I used to do OK running new business at Distilled. Caitlin is better than me. You’ll want to hear her speak.
- SEO in competitive niches
- Travel
- Speaker: Richard Baxter
- Alongside the keyword insights he’s going to be sharing, we thought it was only appropriate that Rich went back to his roots and shared some of his thoughts and experiences of the uber-competitive travel sector.
- Three other sessions with exact details TBC but including insights from Jane Copland of Ayima, Patrick Altoft of Branded3 as well as Martin MacDonald from Seatwave (MOGmartin here on SEOmoz)
- Big budget linkbuilding: head to head
- Speakers: Will Critchlow vs. Rand Fishkin
- What can you do to build links if you have lots of money? You can buy them, obviously… Is that going to be all we come up with? You have to come along to find out. You have to figure Rand’s going to bring his A game - he must really want to win one by now…
#2 See Rand get whupped (again)(*)
One of the things we do to get the quality as high as possible is to look at the list of subjects, pick the subject that we thought might be hard to make interesting but that we felt was important to cover and make it a presentation-off (hence the Reputation Management session). By making it competitive, we make sure we put in twice as much effort.
Rand and I have now gone head to head twice. The first time I beat Rand, he helped me make the announcement to a packed lecture hall that my first child was on the way by giving me a present of a babygro saying "my dad beat Rand Fishkin":

My daughter loves wearing it and wore it when she met Rand for the first time some months later:

(*) yeah, I’m 2-0, but only in London - the first US test comes in just over a week at the Seattle seminar. I’m nowhere near as confident about either of them as my bluster makes it sound.
#3 The focus on adding value for experts
If you feel like you know most of the stuff being said at most conferences (or could even give most of the presentations), then the Pro seminar is for you. I learnt loads last year and envisage learning even more this year. I encourage our speakers (none of whom are slouches when it comes to this stuff) to do new research, learn new things and come up with interesting stuff to say that no-one has heard before.
I’m pretty sure that people who came last year will back me up on this in the comments below.
Bonus #4 Networking and fun
Exclusive breakfast with the speakers (if you book early enough), a party for everyone at The Circus (no, not clowns on unicycles - seriously - check out their photo gallery), no expo hall, no multiple tracks, loads of chances to pick the brains of the top minds in SEO - both speakers and attendees.
Were you paying attention? Here’s the details again.
The Details:
Where: The Congress Centre in London’s West End
When: October 25th and 26th
Price: £699 +VAT
(plus optional super-exclusive breakfast with the speakers @ £149 <– this is very nearly sold out so be super-quick if you want to come to this)
Book: now!
If you are an SEOmoz PRO member, you can get access to special pricing by using the code in the discount store - making it a steal at £499 +VAT / person.
Get your ticket now!
A big part of the value of the event is the quality of the other attendees you get to meet, so if you’re coming, please do spread the word (why not start with Twitter!). I hope to see you there.
If you can’t make it out to London, I understand that there are still a handful of tickets available for the Seattle seminar where both Tom and I will be speaking in a little over a week. We’re in town for almost a week so hopefully we’ll see some of the rest of you there.
In the meantime, normal service will be resumed with some less promotional posts.
API and Dataset Cheatsheet - Building Quick & Dirty Tools
Posted by willcritchlow
I recently wrote a post on hacking together a linkbuilding tool where I set myself a challenge of learning a bunch of new technologies in 2 hours in order to be able to build a basic linkbuilding tool. I learnt just enough YQL, xpath, Python and Google App Engine to do the job. Since then I’ve put this to use in at least one tool that’s actually helping me and my team do our jobs better.
Inspired by this (and encouraged by Kate Morris, a recent addition to the Distilled team), I started putting together a cheatsheet of the basic YQL and xpath I had learnt. In the end, it turned into that plus inspiration of APIs and datasets that could make great starting points for tools (either for research or for creating linkworthy content):
Download it: API and data cheatsheet
Or link to it: API and datasource cheatsheet [PDF]:
<a href="http://www.seomoz.org/blog/api-and-dataset-cheatsheet-building-quick-dirty-tools">API and datasource cheatsheet</a> [<a href="http://www.distilled.co.uk/blog/wp-content/uploads/2010/05/api-data-cheatsheet.pdf">PDF</a>]
Or tweet it!
I wanted to create the kind of thing that I’d find useful to have around for inspiration and quick memory-jogs. So I focused on three areas:
Sources
APIs
I have been enjoying digging through Programmable Web to find great APIs that do cool things. The two I’m currently most excited about are:
- Face.com - just for pure awesomeness. I haven’t actually tried it yet, but a face recognition API? Are you kidding me?
- Alchemy - for the time-saving ability of extracting visible text from a page. This is the kind of thing I don’t want to have to code myself for sure.
Data sources
In addition to tools that do cool things, sometimes you need input data. Some of the APIs are designed to give you data, others manipulate data, but sometimes you just need that raw data. In addition to having one of the coolest names around (maybe I’m just a sucker for chimps), infochimps, which catalogues data sets around the web, is perhaps also one of the coolest sites on the web. It has everything from the 1,000 most frequently used English words to Trst Rank for Twitter users [data] (check out their big datasets if you really want to get your hadoop on).
Magic
As I discussed in my last post, I’m not a developer. My code is testament to that. I therefore love stuff that makes my life easier. Re-using work that other smart people did was cheating at school, but is a hugely valuable life skill when you are actually trying to get real stuff done. There are a small number of bits of syntax for YQL and xpath that I keep needing to look up, so I included them in the cheatsheet.
Horsepower
You could do all this stuff yourself. Or you could get a computer to do it. The final column outlines the tools I have used for different kinds of tasks:
- Mozenda: best for one-off site scraping and rapid proof-of-concept
- 80legs: best for rapid development of well-defined tasks
- Google App Engine: best for combinations of ease-of-use and flexibility. Great for accessing APIs. Better for beginners than:
- Amazon Web Services: best for experts and production code
Sometimes things just have to be done by humans, but that doesn’t mean it necessarily has to be you doing it. I have included some links to my favourites, but Rand’s post on outsourceable SEO tasks is the place to start reading for an introduction.
Inspiration
One of the sources of inspiration for this post has been reading on DataWrangling about the work of Peter Skomoroch who is a research scientist at LinkedIn (and whose delicious links are included in the cheatsheet). I love this presentation on the creation of TrendingTopics.org:
If you liked this, I’d love a tweet or a link: API and datasource cheatsheet [PDF]:
<a href="http://www.seomoz.org/blog/api-and-dataset-cheatsheet-building-quick-dirty-tools">API and datasource cheatsheet</a> [<a href="http://www.distilled.co.uk/blog/wp-content/uploads/2010/05/api-data-cheatsheet.pdf">PDF</a>]
How I Get Things Done - And How You Can Too
Posted by willcritchlow
Warning - this is a more personal post and one that isn’t about hardcore SEO tactics. Despite this, I think that I have some useful lessons to share and I hope you find it useful. For thoughts on similar principles but more from an SEO project perspective rather than an individual efficiency angle, you could read project management for SEO.
How I get things done as founder of an SEO agency
After becoming a dad 6 weeks ago, I have been trying desperately to squeeze efficiency gains out of my day. Just before my daughter was born, I was stretching my day out and was regularly at my desk by 8am and still there well after 7pm at night plus working in the evenings and at weekends. Many of you may work even longer hours than this. I don’t think it’s uncommon among business owners. In a previous life, I worked in management consulting and the hours were brutal. I always wanted to build a company that didn’t rely on long hours, but somehow even (generally) succeeding at that didn’t stop me working long hours.
As a general rule, I was fine with it and not coping too badly.
However
I don’t want to be the kind of dad that is never home. I want to be there for bathtime.
But I also don’t want to compromise my ambition. I don’t want Distilled to suffer and I don’t want to hold up or let down my team or our clients.
So, I’m left with finding a few hours a day of ‘efficiency savings’. I need to get better at what I do and more efficient at how I do it. It’s not like I wasn’t trying before, but now it’s serious. Since I’ve been putting so much effort into it, I thought I would also share with you some of my biggest wins and the tools, tips and tricks that help me get things done. I hope they are helpful whatever your role in the SEO process - certainly I have been sharing some of this stuff with the team here at Distilled and I also think that there is a lot of scope to use these same techniques and ideas in in-house roles.
Overarching principles
Everything is based on Getting Things Done (GTD). To my mind, this is all about your mind. Specifically, it’s about finding the things your mind is bad at, and seeking ways to fix it. Here are some key ways my mind is broken - maybe yours is too:
- I’m easily distracted by did you see this kitten?
- I can remember phone numbers from 15 years ago and forget to return a phone call despite a post-it note stuck on my monitor
- ‘Background thinking’ tasks (like trying to remember a list of priorities) make me massively less efficient at whatever I’m actually doing
As a result, I like the fact that GTD builds habits and systems that enable me to:
- Get all the trivial stuff out of my head…
- …and into a trusted system where I know I will find it again…
- and delineate my time away from distractions of new information (like email)
If you have never tried it, it’s hard to describe how powerful a trusted system is. If you trust yourself to work from your todo list and therefore you manage even your most important tasks that way, you free up so much of your mental capacity. I have been guilty in the past of having a todo list, but then using post-its for important stuff. But then I stopped checking my todo list because I never got through the post-its. Similarly, if you try to empty your inbox, but leave really important email in there so that you don’t miss it, you’re doing it wrong.
Email - inbox zero
Many of you will have heard of the concept of inbox zero. I highly recommend watching Merlin Mann’s talk at Google on the subject:
While I’m recommending Merlin’s stuff, do your colleagues a favour and have a read of his article about writing good email and check out his talk on time and attention (something I need to do more thinking about).
I only really got the inbox zero thing when I realised a few crucial elements:
- Getting your inbox to zero is not the same as having no email left to deal with (that would be crazy)
- There is huge value to separating the emails you are working on from the place new emails arrive
- Move tasks out of email onto your todo list when you can
Oh, and keyboard shortcuts are your friend (gmail is excellent for this).
Working on the right things
There is a danger with efficiency systems that you just work faster and faster on trivial things. I have two ways of trying to make sure I’m working on important stuff:
- A weekly review: which involves running through all the stuff I want to achieve and adding tasks to my todo list (balancing the trivial urgent things that tend to accumulate on there). At the same time, I try to clear out the cruft - removing irrelevant things and modifying deadlines where needed
- Daily prioritisation: I have recently started ending my working day with a checklist (checklists rock - see Tom’s post) - the last item on which is "list 5 tasks to do tomorrow"
Apparently [I haven't found a solid reference - link available if you supply one] Charles Schwab once happily paid a fortune for the following efficiency advice:
Each evening, pick six things to do tomorrow. Then do those things.
Supposedly it changed his life. I started with five things (because I mis-remembered that advice, actually) but even so, it’s already making a difference after only a couple of weeks. My record isn’t perfect - only on special days have I managed to do all 5 things, but it keeps me focussed on the things I want to get done.
Delegation and management
It’s all well and good having your own todo list in order. But hopefully you don’t do everything yourself. You probably delegate things, ask others to do things and rely on other people managing their own time and todo list.
Teams work best when everyone takes responsibility for their own performance and their own job - but that doesn’t mean you should just assume they will. I used to think that following up on things you had asked others to do was micro-management. But I’m coming round to the view that actually, that’s just management.
Just as with managing my own tasks, I have found that there are really two critical steps to doing this effectively:
- Note them down somewhere I trust
- Review that list
Building both habits is hard. In the past, I thought it best to build #1 first (as there is no list to check until you are writing things on it). In a similar way to the "important task" dilemma described above, however, I found myself never putting things on the list because I wasn’t in the habit of checking it. So recently I have flipped this on its head. I’ll let the noting down take care of itself but I will religiously check the list at the end of every day.
When I say list, I actually mean two things:
- A list in Remember The Milk (see below)
- A tag in gmail (‘followup’) - also more on this below
My tools
My basic toolkit is pretty simple:
- Laptop
- iPhone
- Moleskine notebook and nice pen [this is an area where I find that if I make it fun to write notes, I'm more likely to write them]:

At Distilled, we recently switched over to Google Apps for our domain which means that we finally have gmail. Suffice it to say I’m a big fan - the search is revolutionary to those used to Thunderbird / Outlook (especially as the IMAP access means we aren’t forcing the web interface on anyone). My little tweaks so far:
- Turn on keyboard shortcuts (then use "?" to view a list of available shortcuts)
- Turn on labs
- Turn on "send and archive". A small tweak, but an awesome one.
- Turn off unread counts for inbox - avoid having it bug you to check new email
- I would use "multiple inboxes" to show my followup and starred email on the same page but it currently conflicts with:
- Rememberthemilk addon (I use the Chrome version) - more on RTM below
- Create a follow-up filter so that when you email yourname+followup@example.com it is tagged for followup later
Most of the software I use is either web-based or has applications for most desktop OSs and mobile platforms. I’m on Windows 7 (and iPhone). Screenshots below from the iPhone:
Remember The Milk

I have tried a few todo-list-management tools and nothing beats Remember The Milk in my opinion. I don’t use all its coolness, but do rely on:
- A few lists:
- Work
- Personal
- Delegated
- Projects
- GTD - a "meta" list to remind me to run through my checklists weekly etc.
- Some tags - particularly tagging tasks I have delegated with the name of the delegatee - making it easy to group them for follow-up
- The ability to email tasks into the system
- The gmail addon mentioned above
Evernote

I use Evernote (which essentially syncs text, photo and voice notes across devices) for two main tasks:
- Capturing stuff to process later (quick notes, web pages, photos) - especially on the notes front this is handy if I don’t have pen and paper on me - I almost always have my phone
- Managing my daily list - again because I have it with me all the time, this is where I keep my ‘daily’ list of 5 things along with my end of day checklist
I also have it set up so that I can email stuff into it much like rememberthemilk.
Incidentally, I’m only scratching the surface of what you can do with Evernote. @swerveball pointed me to this excellent article from Ruud Hein on implementing GTD with Evernote but I have to admit I’m a bit scared of it.
Supporting tools
There are a few tools that aren’t strictly for getting things done, but nonetheless help me be more efficient:
Salesforce
We use Salesforce for keeping track of contacts, prospects and leads (as well as for managing much of the contract process - but that’s another post). I use the iPhone app to keep access to thousands of contacts wherever I am. I particularly love the fact that the screen above has a button to ‘clone’ Rand.
Readitlater

I mentioned earlier that I am easily distracted. One of the tools I use to manage that is Read It Later - which is a combination of Firefox plugin and iPhone app (integrating with Echofon for Twitter on the iPhone) that means I can save interesting articles to read later. This means that (most days) I don’t get overwhelmed by the stream of great stuff to read and can capture it in one place to read when it’s convenient to me rather than when I should be working on that document.
Dropbox
In the interests of being able to work everywhere and grab important documents even when mobile, I am increasingly loving Dropbox. That link is clean, this one gets me an extra 256Mb of space - I hope no-one minds that little semi-affiliate link-drop [Update - apparently if you use my referral link, not only do I get more space, but you get more space on your free account too - which is super-cool]. Synchronising files between computers and having a mobile app - it’s pretty damn close to magic.
My checklists
Finally, I have mentioned checklists a couple of times. Here are the two most important ones I use:
Daily
- Review internal emails - gmail search: (label:starred OR label:inbox) AND (from:distilled.co.uk OR from:distilledconsulting.com)
- Follow up delegated tasks
- Remember The Milk list
- Gmail label:followup
- Delay urgent starred / todo items that can’t happen today
- List 5 things to do tomorrow
Weekly
- Review open projects and create next tasks
- Clear inboxes –> todo list
- Rememberthemilk
- Evernote
- Notebook
- Review all todo tasks
- remove unnecessary ones
- create / adjust deadlines where necessary
Incidentally, it’s not all about working smarter. I wish it were. There is still a large degree to which it’s sacrifice and hard work. I am writing this post at 11.30pm while my daughter sleeps beside me. I sleep a lot less than before (even factoring in the baby effect) but somehow (at least for now), it’s all OK.
Also, if you are waiting for an email reply from me, sorry! None of this makes me perfect, but imagine how bad I’d be without it!
Futuristic Ways of Creating Automated Link Building Tools
Posted by willcritchlow
Rand recently asked you all for feedback about improving the blog. The two areas that you asked us to write about more were linkbuilding and tools. In a shameless populist move, I thought I’d write a post about tools for automating (bits of) linkbuilding.
Recently, I keep coming across ways that we are actually living in the future. I don’t mean the jet-pack-wearing-holidaying-in-space-hover-car future, more the holy cow, you can actually run select * from internet where… kind of future.
Yes, I know it’s not as cool.
I believe that technical skills are important in SEO. There are plenty of non-technical roles in SEO agencies or teams (especially on the creative side of things) but if you have ambitions to lead teams, set strategy and run SEO projects, you kinda need to understand how the internet works under the covers. For me, that means knowing how to build stuff - even though I’m not a developer and should never be let near production code, I like to understand the concepts and principles. To keep on top of things, this means occasionally getting my hands dirty and building stuff. It’s fun. I can highly recommend it.
In order to bring you something useful and actionable, I decided to pick something simple that my team wanted, something to help with linkbuilding, but also something I could put together relatively quickly. I chose to build a prototype tool for monitoring the web for mentions of a website that don’t link to that website. Hopefully it’s pretty clear how this could be helpful - but just to give one example - if you are running a PR campaign, you may well get coverage that doesn’t link to you, but if you just drop the journalist a line straight after publishing, they can often get a link included. For those of you who think better in pictures, here is a diagram of what I mean, with my limited drawing skillz:

I recently wrote about some moderately technical tools (e.g. Mozenda, Smartsheet) in my post on data visualization techniques. The tools I’m going to cover today are even more technical and advanced - but they are also infinitely more flexible. I don’t want to scare you into thinking this is something you can’t do yourself though. I learnt all the techniques and tools below and finished my mini-project from scratch in 2 hours. In fact, it’ll take me longer to write the post than it did to do everything in it. If I can do it, so can you.
At 1pm UK time on Thursday 15th April, I tweeted this:
Why those particular tools? Well:
- xpath allows you to navigate and select elements and attributes from an XML document (including HTML). This gives you a really simple way of pulling information out of HTML pages
- How this helps my mini-project: I get a straightforward way of pulling all links out of a page in order to check whether the page in question links to you
- YQL (Yahoo! Query Language) is the select * from internet where … magic I referred to earlier. It provides an API that you can use to grab pages, RSS feeds and a whole load of other cool stuff
- How this helps my mini-project: with one line of code, I can grab RSS feeds of mentions (for my proof-of-concept, I used a Google Alerts RSS feed) as well as grabbing the pages referenced in order to run xpath on them (did I mention that YQL supports xpath?)
- Google App Engine allows you to deploy web applications without worrying about most of the usual environment, server and configuration issues. It is also a way of dropping buzzwords into your conversation by deploying your newly scalable application to the cloud. FTW
- How this helps my mini-project: I didn’t want to assume any prerequisites like having servers at your disposal, but I also didn’t have time to set anything up from scratch. App Engine is free for small-scale use, and I went from not having an account to deploying my code in under 2 hours
- Python is one of the two programming languages supported by Google App Engine (the other being Java). I know a tiny bit of Java from years ago, whereas I didn’t even know Python gives indentation semantic meaning before I started my project
- How this helps my mini-project: I needed some kind of programming language to enable me to build loops, display the output etc. and I needed to pick one that I (a) didn’t already know and (b) could be used with App Engine
Getting Going
Before I start, let me warn you to read the disclaimer at the end of this post: I built a prototype / proof of concept here and you should definitely not rely on my code. Use at your own risk!
My ‘specification’ for the project was:
- Grab mentions from a Google Alerts RSS feed (I chose to hardcode ‘SEOmoz‘ [RSS link] into my proof-of-concept)
- For each mention, see if there is a link to any page on http://www.seomoz.org
- Output a list of mentions that don’t link
Pretty simple, right?
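To make the three steps of that specification concrete, here is a minimal sketch in plain Python. It is not the code from the post: it assumes the feed is simple RSS 2.0 with one `<link>` per `<item>`, it takes the page fetcher as a parameter rather than calling YQL, and every name in it (`LinkCollector`, `mention_urls`, `links_to`, `unlinked_mentions`, the sample feed and pages) is made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def mention_urls(rss_xml):
    """Step 1: return the <link> of each <item> (assumes RSS 2.0)."""
    root = ET.fromstring(rss_xml)
    links = (item.findtext("link") for item in root.iter("item"))
    return [link for link in links if link]


def links_to(page_html, site_prefix):
    """Step 2: does the page contain a link starting with site_prefix?"""
    collector = LinkCollector()
    collector.feed(page_html)
    return any(href.startswith(site_prefix) for href in collector.hrefs)


def unlinked_mentions(rss_xml, fetch_page, site_prefix):
    """Step 3: list the mentions whose page does not link to the site."""
    return [url for url in mention_urls(rss_xml)
            if not links_to(fetch_page(url), site_prefix)]


# Hypothetical sample data standing in for the alerts feed and the web.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>A</title><link>http://example.com/a</link></item>
<item><title>B</title><link>http://example.com/b</link></item>
</channel></rss>"""

SAMPLE_PAGES = {
    "http://example.com/a": '<p>We like <a href="http://www.seomoz.org/blog">SEOmoz</a>.</p>',
    "http://example.com/b": "<p>SEOmoz is mentioned here without a link.</p>",
}

print(unlinked_mentions(SAMPLE_FEED, SAMPLE_PAGES.get, "http://www.seomoz.org"))
# prints: ['http://example.com/b']
```

Passing the fetcher in as a function keeps the sketch runnable without any network access; a real version would swap `SAMPLE_PAGES.get` for something that actually downloads each page.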
With the clock ticking, I started by downloading the install files for App Engine and Python while reading up on YQL.
The Python download was taking a while, so I spent the first half hour building the YQL queries I needed on the console.
To grab the Google Alerts, I used:
select * from feed where url='http://www.google.com/alerts/feeds/02091889458087148316/10137124638087203861'
and for each page in that list, I could grab the list of links using:
select * from html where url='<target URL>' and xpath="//a[starts-with(@href,'http://www.seomoz.org')]"
The xpath there probably needs a bit of explaining - I built it using a combination of the basic xpath documentation linked above and the ever-awesome stackoverflow. You can consider it in three sections:
- //a means select all ‘a’ (anchor) elements (i.e. links)
- //a[@href] means select all links that have an href attribute (the square brackets filter the links rather than selecting the attribute itself)
- //a[starts-with(@href,'http://www.seomoz.org')] means select all links whose href attribute starts with http://www.seomoz.org
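For anyone who prefers to see the same filter outside YQL, here is a minimal sketch in Python using only the standard library. It is not the code from my prototype: the names PrefixLinkFinder, links_to_target and mentions_without_links are made up for illustration, and it assumes you already have the HTML of each mentioning page in hand.

```python
from html.parser import HTMLParser

# Same prefix as in the xpath expression above.
TARGET_PREFIX = "http://www.seomoz.org"


class PrefixLinkFinder(HTMLParser):
    """Collect href values of <a> tags that start with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(self.prefix):
                self.matches.append(value)


def links_to_target(html_text, prefix=TARGET_PREFIX):
    """Return every matching href found in one page of HTML."""
    finder = PrefixLinkFinder(prefix)
    finder.feed(html_text)
    return finder.matches


def mentions_without_links(pages, prefix=TARGET_PREFIX):
    """pages maps URL -> HTML text; return the URLs with no matching link."""
    return [url for url, html_text in pages.items()
            if not links_to_target(html_text, prefix)]
```

Feeding it a dictionary of page URL to page HTML gives back the mentions that don't link, which is the third bullet of the specification.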
By the time I’d cracked that, my downloads had finished and I set about getting my environment ready using the App Engine quick start guide.
I also had a lucky break at about this point. I discovered that there is a YQL library for Python. Holy awesome batman! I figured it was going to be pretty easy to build something in Python to query the YQL API, but I didn’t realise it was going to be as easy as yql.Public().execute(query). Sweet!
It took me a while to work out how to import third party libraries into my App Engine environment (turns out you just grab the source code and include the folder in your application’s root folder). My time was running out by this point. I was about halfway through my two hours and I hadn’t yet written a single line of code.
Writing Python Code
I’m not the right person to teach you how to write Python code. Especially because about 10 minutes before the end of my challenge, I realised I didn’t know how to create an if statement. My approach to learning Python is not to be recommended; but there are loads of great tutorials out there. I really wish I could step through and explain my code line-by-line, but honestly? I’d probably just expose my horrific lack of knowledge.
[Want a link from SEOmoz? Understand Python? Write up an explanation for beginners, drop me a line and I'll link to it here. For bonus points, you could show how to improve my code. Update: Peter Coles has kindly taken the time to go through my code (improving and) explaining things - if you’re interested, I suggest you read his explanation of my Python code. Thanks Peter!] In the meantime, working through the code has to be (as my university lecturers used to say) left as an exercise for the interested reader.
All you really need to know is that in 33 lines of code (at the time of writing), I built my basic prototype. You can see the resulting code over at Google code.
The Outcome
With time running out, I clicked ‘deploy’ and…..
…. huh. That was easy.
OK, so I get time-outs / server errors from time to time and it’s really only a proof-of-concept at the moment (see below) but I still think it justified my tweet exactly two hours after the previous one:
Huge Caveats
My prototype is essentially just a proof-of-concept. Among loads of other things, note that it doesn’t have:
- Any error-handling
- Much testing
- Any documentation (including comments)
And that it does have:
- Hardcoded variables
- Massive limitations even given the hardcoding (only grabbing 10 results, for example)
- No way of automating it or doing anything other than running it manually (though App Engine does provide simple ways of extending into this)
In its current form, it’s not really useful for anything, but hopefully it will become interesting soon. If you want to build anything off it (or the ideas contained in it), I’d love to hear about it (but please bear in mind that it really is the definition of non-production-ready code, so if you do use it, you do so entirely at your own risk!).
I still think that it has been a useful learning exercise for me and I hope it presents you with some food for thought (unfortunately not real food like Rand’s recent post).
Please share your ideas
I’d love to hear your thoughts for similar small tools that help us all do our jobs better. I’d also love to see someone take this and turn it into a more fully-functional tool (if you do that, let me know and you’ll likely get a link from here!).
Data Visualization Techniques
Posted by willcritchlow
Rob and Duncan are currently in Seattle, with this week full of interviews of SEO consultants for our US office. Since the announcement in February, we have been working flat out with a bunch of new clients and dealing endlessly with the US immigration service. With people on the ground, I guess we’re now officially participating in the American dream, so to celebrate I’m going to spell Visualization with a z throughout this post. I can’t guarantee full American spelling for everything I’m afraid - muscle memory is a powerful thing.
Anyone who has heard me speak will know about my love of data. Heck, I’ve even given talks on Excel ninjas. However, this post isn’t so much about the data (and that’s the last mention of Excel, I promise). This post is about the visualization.
I expect that everyone in SEO has spent at least some time recently thinking about data visualization techniques. They are great ways for content and data sites to get links and branding benefit and are also loads of fun. Tom’s resource for information visualization and infographics is a great place to start if you don’t really know what I’m talking about.
Last week, I was approached by the FT to pull together some data for them about the use of the web (and social media in particular) across the UK’s political parties as we approach the election. As I started thinking about how I wanted to shape this, I realised that I wanted to produce a visualization for the web as well and that the process I was using might be interesting to you guys. Hence, my top tips for data visualizations with bits and pieces of real world examples:
7 Data Visualization Secrets
1. Gather data (intelligently)
Over the weekend, I had a bit of a think about what kind of data I wanted to be able to visualize. Thinking about Twitter, for example, I wanted to know things like the most influential (and least influential) Twitterers in each party, who was doing things really well and who was making a pig’s ear of it, who could I compare unfavourably to some comedy joke accounts and how did the best of them compare to the Prime Minister’s wife’s pretty impressive performance.
In order to answer any of these questions, I needed data, and lots of it. Obviously, had I been working on this on a weekday, I’d have looked around for the newest recruit in the Distilled office and asked for the data on my desk by the end of the day. Without that option at the weekend, I fired up Mozenda to grab Twittering MPs, their grader ranks, retweetranks, and tweetranks along with follower counts, number of tweets and profile information. It took me about half an hour to gather all this information!



Tip #1: use tools like Mozenda to mash up your own data with multiple sources of public data to get unique insights.
If you haven’t played with Mozenda yet, I highly recommend it - with a simple user interface for creating robust crawlers, it’s a superb tool for any SEO.
2. Delegate additional research
There are some things that even the best scraping engine in the world can’t gather for you. For example, I wanted to cross-reference the data I’d gathered against the cabinet and shadow cabinet. Only a human can do this reliably. For this, I recommend using a virtual assistant service for cheap data gathering (I use timesvr - in the US, you could use mechanical turk for this kind of thing).
I discovered an awesome service the other day - Smartsheet integrates with Google Apps and has an integration with Mechanical Turk that enables you to easily populate tabular spreadsheet data using cheap human resource. Unbelievably useful and powerful.
3. Use great design
I’m not a designer. My design sense is about as well-tuned as my singing. I think this makes me appreciate the importance and value of design even more. Since I’m not the expert here, I’m just going to tell you what works for me when getting other people to make things look pretty:
- Wireframes are your friend: although I hate paper for almost everything, I used to always sketch ideas on paper. Recently I have been a late convert to the power of drawing wireframes on the computer. I am, however, definitely sold. Choose your weapon - I’m currently liking MockingBird but have also seen cool stuff from Balsamiq, gliffy, Pencil (a Firefox plugin - thanks Simon Lilly) and Mockflow.
- Pay attention to the users of your data: carefully consider the width, colour scheme and any associated links in the embed code to make the most of embedding opportunities
- Get professionals involved early: don’t lock your limited-design-skills-self in a darkened room only to emerge with something that even a pro couldn’t make look pretty. When you’re at the wireframe / outline stage, show what you have to a designer and get feedback before kicking off the final data collection and design phases
- Brief as well as possible: provide a few examples of the style you are looking for and visual elements you particularly like. Include comments about anything you don’t like in the examples you provide. Try not to be that guy who says I just don’t like it - can’t quite put my finger on why…
The example wireframe that follows is for entertainment only. Any relationship to infographics, real or imagined, is coincidental:
If you are including graph-based data, choose your charts carefully (tip: pie charts are often bad). I found this neat flow-chart for choosing what style of graph to use the other day - from Advanced Presentations by Design by Andrew Abela:

4. Consider interactivity for widgets
Any time you are working with data online, you have opportunities to provide your users with interactivity. Sometimes, static infographics are plenty enough to get links and sometimes you will get significantly more if you are providing a widget that allows people to offer their visitors interesting functionality.
You don’t always have to build this yourself. We recently started working with Tableau Software whose business intelligence software has a kick-ass free, public version that is really cool for just dropping in data and creating widgets for embedding. Here’s a subset of the UK politicians on Twitter data:
5. Quirky is at least as important as correct
You all read the internet. You know the power of random facts, cute animals, in-jokes and comedy references. It’s generally not enough to present just the raw facts - interesting comparisons and strong imagery improve the shareability of any piece. We are all wired to remember (and therefore to repeat) comparisons better than plain numbers.
I’m still working on which elements of my infographic might make for quirky comparisons. For example, did you know that an Oscar is the same height as an adult pygmy marmoset monkey? From a client’s recent Oscars infographic:

Source: LocateTV
6. Know who your targets are
Finishing on a couple of strong SEO points, if your goals are improved rankings, you are doing this primarily for links (and if you are doing it for branding purposes, the sharing is critical). So you need to know who your targets are and find a way to reach them. If your target market happens to overlap with Reddit, StumbleUpon etc. then they are obviously going to be great, but don’t forget to drop people in your niche a line as well.
Bonus tip: don’t forget the infographic fans.
7. Provide the embed code (with a link)
You want to provide the embed code for two reasons:
- to make it easy for non-tech-savvy bloggers to share your content
- to make sure (as far as possible) that you get a link out of it
If you can style and include the link in a relevant way (especially if it links to more data or more information) you increase the chance that the people embedding your content will embed the link along with it. If you want to go even further, you could provide your graphic under a Creative Commons Attribution license.
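As a rough illustration of what embed code with a link can look like, here is a small Python sketch that generates the HTML snippet you would offer to bloggers. The function name embed_code, the markup layout and the default width are all assumptions for the example, not a recommended standard.

```python
from html import escape


def embed_code(image_url, source_url, title, width=500):
    """Build a copy-and-paste HTML snippet: the graphic plus a credit link."""
    src = escape(source_url, quote=True)
    img = escape(image_url, quote=True)
    alt = escape(title, quote=True)
    return (
        # The image itself links back to the source page...
        f'<a href="{src}"><img src="{img}" width="{int(width)}" alt="{alt}" /></a>\n'
        # ...and a visible, relevant text link sits underneath it.
        f'<p>Source: <a href="{src}">{escape(title)}</a></p>'
    )
```

Putting the link in a visible credit line that points to more data gives the embedder a reason to leave it in place.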
Please keep the comments for discussions of techniques and ideas, not for politics. Any political comments included above are for amusement only and may or may not reflect the political views of the author, or anyone else.
Technorati Tags
data, visualization, graphics, infographics
How to Get the Most Out of Your SEO
Posted by willcritchlow
The good news is that tomorrow (Wednesday 24th Feb), at 8.30am PST (11.30am EST / 4.30pm GMT), I am going to be joined on the next Distilled conference call by Richard Baxter as we discuss "how to get the most from your SEO". The even better news is that it is totally free (as long as you register in time).
If you would like to join us on the call, simply register on the Distilled site and you will be sent instructions to join the conference (which will be handled by gotomeeting / gotowebinar).
Previous calls have been more technical and have been essentially presentations that I have delivered with a slide-deck. I did one on SEOmoz tools and one on how to be an Excel ninja - both videos are available on the Distilled site.
This one is going to be a little different. Rich should need little introduction. With a strong background in in-house travel SEO followed by founding his agency, SEOgadget, he is not only a true guru of keyword research and large site architecture, but also has experience on both sides of the client / agency relationship. He also spoke at the London PRO training seminar last October (thanks to foliovision for the photo):

Rich and I plan to let you into a relaxed chat. We might pull up the occasional website or slide but fundamentally, it’ll be a little like sitting in on a live whiteboard Friday (on a Wednesday, without a whiteboard, or Rand!).
The conversation is likely to be pretty free-flowing - in many ways it will lead on from my WBF conversation with Rand about choosing an SEO consultant - but I can’t guarantee exactly what we will talk about! We are intending to cover:
- the best tasks to keep in-house vs. outsource
- combining SEO effectively with PPC, PR and marketing
- integrating SEO into other processes (e.g. development, business development)
- how to get the most from your agency
- how to keep an eye on your agency and avoid bans and penalties
- how to be a great SEO client and get even more out of your agency
We hope to have you there. We will be taking questions - both on Twitter (hashtag: #optimalSEO) and via the chat interface in gotowebinar, but if you have anything you’d specifically like us to cover, feel free to use the comments below to chime in.
First Touch Tracking in Google Analytics
Posted by willcritchlow
It’s time for a quick mid-week geek-out - I wanted to collect together a bunch of resources I have written on first touch tracking in Google Analytics including (for the first time that I’m aware of), the technical implementation details:
- Following a Whiteboard Friday recorded when I was in Seattle a month ago or so
- I hurriedly wrote a post explaining general principles of how to get past last touch attribution in Google Analytics
- which it turned out was (quite reasonably) not enough for some of our readers (and I really wasn’t happy with the level of detail)
- so I have done my best to go above and beyond with detailed instructions on How to do First Touch Tracking in Google Analytics
- while digging around in GA and JavaScript to put this all together (a combination I probably shouldn’t have been doing) I discovered an interesting difficulty if you are going to migrate from the now-deprecated _setVar to _setCustomVar and wrote a post about it over on Search Engine Land.
If you’re the kind of person that unwraps your birthday presents early, you can skip to the punchline, grab the code you need to get first touch tracking working from Google code (don’t forget to read the instructions!). Here’s the meat from my detailed post:
Include the following code anywhere above the Google Analytics code script in your page code:
<script type="text/javascript"
src="http://attributiontrackingga.googlecode.com/svn/trunk/distilled.FirstTouch.js">
</script>
Move your GA code above any Website Optimizer code or anything from Google that might write a visitor (__utma) cookie and look for:
var pageTracker = _gat._getTracker("UA-XXXXXXX-X");
pageTracker._trackPageview();
In between those two lines, you want to put the following line:
distilledFirstTouch(pageTracker);
Oh, and don’t forget all of this is provided as is, with no warranty. I hope it will help you out, but only you are responsible for changes you make to your website and tracking code. Be extremely careful with live profiles and remember that you will need to do something different if you already use custom variables.
That’s all for now folks. Enjoy your analytics, don’t forget to drop comments with improvements, tips, tricks, abuse for writing such a short post etc. and if you need a primer on Excel to make the most of your new-found data, you can check out the recording of my conference call on how to be an Excel ninja (sign up for future calls here).
To distract you from this spectacularly short post, here are some really big things found on the internet this week:
Check out the depth of the ocean:
As well as the size of the earth:
I also recommend watching this one.
Technorati Tags
analytics, ga, firsttouch, google, javascript, code
How To Get Past Last-Touch Attribution With Google Analytics
Posted by willcritchlow
In last week’s Whiteboard Friday "Kill the Head or Chase the Tail", Rand and I started by discussing how to gain true insight into what kind of keywords are leading people to discover your brand and ultimately driving conversions for your business (clue: it’s probably not branded search phrases, despite what your analytics reports are telling you). Today, I’m going to demonstrate one way of measuring this more accurately in Google Analytics.
The problem is well described by the ever-excellent Avinash Kaushik in his post entitled Measuring Upper Funnel Keywords (although nominally about paid search, his description applies perfectly well to natural search except you aren’t paying for traffic in the same way). It can be summarised by thinking about all those reports we have all seen showing branded search terms being the best-converting. While this is true in the sense that the individual finally converted after searching for the brand, it’s clearly not the way they found out about your services. For the purposes of setting strategy, you need to understand in better detail your "visitor acquisition" channels that eventually lead to conversions. Sam’s superb post on SEOmoz’s conversion rate lessons from 2009 touches on this in point 2.
Enter multi-touch analytics tracking.
Most analytics packages use last-touch attribution by default meaning that conversions are allocated to the most recent source of a visit for that visitor. We are interested here in first-touch attribution or even multi-touch attribution models to understand how visitors are influenced over time by repeated visits to the site. If you are interested in analytics packages that can track multiple touches ‘out of the box’, I recommend reading John Santangelo’s YOUmoz post on Google Analytics alternatives.
First-touch tracking in Google Analytics
Patrick at Blogstorm has written about over-riding last click attribution (something I also discussed in my presentation Analytics Every SEO Should Know that Scott linked to from the Whiteboard Friday). But this method only works when you can specify the exact URL of the landing page including parameters as it relies on the utm_nooverride parameter. This works fine for email and PPC traffic, but doesn’t help with tracking organic search traffic.
For this, we need a slightly more involved method.
In my presentation, I touched on the function setVar and a custom function called superSetVar, but in the updates announced in October last year, the GA team released a new function called setCustomVar that is now the best functionality to use. For this purpose we want to track variables at the visitor level.
In your GA tracking code, you want to check for the presence of the __utma cookie which will be present only if the user is a returning visitor. If it is not present, use the JavaScript variable document.referrer to set a visitor-level custom variable (named something like "original referrer") and use location.pathname to set a second visitor-level custom variable (named something like "original landing page"). Take care not to re-use custom variable slots you are using elsewhere in your analytics.
You will probably then want to add a filter to your analytics profile to convert the raw referrer into referring keywords using a filter like this one for getting detailed PPC keyword information (obviously not filtering only PPC traffic). You might also want to pull out the original source (which you can work out from the referrer and landing page) into a separate variable.
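The logic in the last two paragraphs can be sketched as follows. This is Python purely to show the decision-making; the real thing has to run as JavaScript in the page and as a filter inside GA. The function names are hypothetical, and the assumption that the search phrase lives in a `q` query parameter will not hold for every search engine.

```python
from urllib.parse import urlparse, parse_qs


def first_touch_values(cookie_names, referrer, pathname):
    """Return the values to store on a first visit, or None for a returning visitor."""
    # The __utma cookie is only present if the visitor has been here before.
    if "__utma" in cookie_names:
        return None
    return {
        "original referrer": referrer,
        "original landing page": pathname,
    }


def keyword_from_referrer(referrer, param="q"):
    """Pull the search phrase out of a raw referrer URL, or return None."""
    if not referrer:
        return None
    values = parse_qs(urlparse(referrer).query).get(param)
    return values[0] if values else None
```

For a referrer such as http://www.google.com/search?q=seo+tools the second function gives back the phrase "seo tools", which is the kind of value you would want to report conversions against.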
With this all set up, you will be able to run conversion reports by original keyword for a given original source and see conversion information based on first click attribution. I would expect that you would see the long-tail contributing far more than it does in the standard reports and branded search much less (not zero of course - there will still be first-touch branded searches driven by PR, offline marketing etc.).
Multi-touch attribution modelling
If you are feeling especially hardcore, you can dig even deeper into this whole mess by attempting to capture multiple touch-points. The idea here is that you want to give attribution for conversions not only to first- and last-touches but also give so-called assists to touch-points along the way (e.g. a conversion path could look like long-tail keyword > head keyword > branded search > direct visit - under this scenario, you might want to give the head and branded searches some attribution for the conversion).
This becomes especially important if you have different departments contributing to the marketing - you would like to be able to give some credit to the departments that bring the visitor in, some to the channels that keep the visitor returning and to the channel that finally converts them.
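To make the idea of assists concrete, here is a minimal sketch of one possible way to split the credit for a single conversion. The 40% / 40% / 20% weighting is an arbitrary assumption chosen for the example, and attribute_conversion is a made-up name; pick whatever model suits your business.

```python
from collections import defaultdict


def attribute_conversion(path, first_weight=0.4, last_weight=0.4):
    """Split one conversion's credit across the touch-points in path.

    The first and last touches get fixed shares; whatever is left over
    is divided equally between the assists in the middle.
    """
    if not path:
        return {}
    if len(path) == 1:
        return {path[0]: 1.0}

    credit = defaultdict(float)
    middle = path[1:-1]
    remainder = 1.0 - first_weight - last_weight
    if middle:
        credit[path[0]] += first_weight
        credit[path[-1]] += last_weight
        for touch in middle:
            credit[touch] += remainder / len(middle)
    else:
        # No assists: share the leftover between first and last touch.
        credit[path[0]] += first_weight + remainder / 2
        credit[path[-1]] += last_weight + remainder / 2
    return dict(credit)
```

With the example path above (long-tail keyword > head keyword > branded search > direct visit), the head and branded searches each pick up roughly a tenth of the conversion instead of nothing.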
I haven’t set this up with the new GA functions, but the basic process would involve something similar to the superSetVar function for the new setCustomVar. The idea here would be to stuff repeat visit information into the custom variables. This information is almost certainly unusable via the interface and you will likely need to export to Excel and play there (most likely with Pivot Tables - you all know how much I love them - it’s a little while since we ran a conference call (that link is to a recording of the one I did on Excel) but I’m planning the next one so go and sign up if you aren’t already on that mailing list).
If you’re hardcore enough to really want this information, you can probably work out the details! If anyone has done it and wants to write up detailed instructions, I’ll happily update this post with a link to your explanation.
View-through conversions
The missing piece of the puzzle if you are doing multi-touch attribution modelling is giving ‘assists’ to branding events such as the viewing of a display advert (without a clickthrough). Rich, our PPC guru at Distilled, wrote an introduction to Google’s viewthrough conversion metric.
There are all kinds of privacy concerns in extending this further - but the information is out there to gather this kind of data across whole platforms (e.g. understanding search funnels that led to your site in the end). The signs are there that we are going to get ever more information like this - particularly out of Google who are obviously always looking for ways to persuade their customers to spend in areas outside (the generally cheaper) branded search!
I love analytics and statistics, so I’d love to hear your favourite tips and tricks in the comments.
I’m sure future conference calls in my schedule will involve analytics tips and tricks so go ahead and sign up if you’d like to hear when they are running. You also might be interested in a post I wrote about integrating Google Website Optimizer with Google Analytics on SearchEngineLand.
Linkfromdomain - A Linkbuilding Tip For Use at Bing.com
Posted by willcritchlow
Bing recently came out of beta in the UK and we are seeing the beginnings of the advertising campaign to promote it.
For SEOs, however, there is a more immediate opportunity with Bing than hoping it gathers some market share from Google(*). Linkfromdomain is a search operator that is unique to Bing. It returns the pages that are linked-to from a domain. There are obviously other ways of getting this information in raw form (maybe including Linkscape one day, but certainly including Xenu for mid-sized sites), but for large sites especially, it can be really hard to gather it in any kind of usable form.
The usage of linkfromdomain is to search on Bing for something like:
- linkfromdomain:ox.ac.uk (returns pages linked from the Oxford University site - more on this below)
- linkfromdomain:ox.ac.uk intitle:broadband (filters to broadband in the title)
- linkfromdomain:ox.ac.uk wimax (searches for wimax anywhere on the linked-to page)
The set of results is generally returned in a similar ordering to a regular search query - with a combination of highly relevant and more powerful results first. Unfortunately linkfromdomain does not support searches for sub-domains (even www.), so you have to search for linkfromdomain:exampledomain.com.
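If you want to generate a batch of these searches, a small sketch like this builds the search URLs. The helper name linkfromdomain_url is made up, and stripping a leading www. is only a convenience for the restriction mentioned above (it does not handle other sub-domains).

```python
from urllib.parse import urlencode


def linkfromdomain_url(domain, terms=""):
    """Build a Bing search URL for a linkfromdomain query."""
    # linkfromdomain does not accept sub-domains, so drop a leading www.
    if domain.startswith("www."):
        domain = domain[len("www."):]
    query = f"linkfromdomain:{domain} {terms}".strip()
    return "http://www.bing.com/search?" + urlencode({"q": query})
```

Calling linkfromdomain_url("ox.ac.uk", "wimax") gives the same query string as the wimax search used in the worked example below, minus the extra form parameters Bing adds.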
How do you use this for SEO?
This is a linkbuilding tip post - the idea being two-fold:
- suppose you have a powerful target website (such as an educational institution) and you are seeking ways of getting links from them, this gives you tools for finding techniques, content types and targets for those links (more on this below but it’s very effective for building highly trusted links)
- sometimes the "one-step-removed" linkbuilding model can work superbly well for identifying linkbuilding targets. If I were running a cooking blog (wait, I do - it took superhuman effort not to drop a shameless link there), it might be a good idea to look at something like this as a superb linkbuilding target list
The information contained in the second approach is typically findable through other means (or the targets are likely to appear on your radar in other ways) and there is a lot of searching through chaff to find wheat. I wanted to run through a worked example today to show you how powerful method #1 can be:
Worked example
I had to pick a niche and a target for my worked example. I decided to imagine I was linkbuilding for a technical but not-specifically-web-related company. I’m trying to get links from trusted authoritative domains so I start with big educational institutions.
As some of you may know, I studied at the University of Cambridge (ending with a year at the Statslab). I don’t want them getting link requests from all you lot, so I picked Oxford (**).
I’m pretending my imaginary client works in some area of telecoms and has resources and technical papers on subjects like wimax and spectrum usage.
First up, wimax:
- Start with the linkfromdomain search: http://www.bing.com/search?q=linkfromdomain%3Aox.ac.uk+wimax&go=&form=QBRE&filt=all&qs=n
- Pick out an interesting-looking resource: http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470510285.html
- Dive over to the labs tools to find ox.ac.uk links to this page: http://www.seomoz.org/labs/backlinks?uri=http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470510285.html&linktype=page
- This highlights this kind of page: http://www.conted.ox.ac.uk/courses/details.php?id=O08C835H6J
It turns out that conted.ox.ac.uk is a goldmine for linkbuilders. It’s the Continuing Education section of the Oxford University site and seems to be very generous with linking out. I might suggest that my client gives a talk or writes a resource for a CPD course. At the very least, it might be worth creating some content to target this kind of page.
Tip: I find it best to look for links to pages that aren’t homepages because it’s typically easier to find where the link originates from. Bing doesn’t have an effective link: operator meaning that we have to use Yahoo, Linkscape or similar. Because we are then not using the same index, it can be tricky to track down the link found by linkfromdomain.
Another example starting with spectrum auctions - sometimes it’s funny where this kind of research can take you:
- A different linkfromdomain search: http://www.bing.com/search?q=linkfromdomain%3Aox.ac.uk+spectrum+auctions&filt=all&first=11&FORM=PORE
- A different kind of interesting-looking (and, as it happens, off-topic) resource: http://www.independent.co.uk/arts-entertainment/books/reviews/dead-aid-by-dambisa-moyo-1519875.html
- A demonstration of using Yahoo! instead of labs tools to find the source of the link: http://uk.search.yahoo.com/search?p=link%3Ahttp%3A%2F%2Fwww.independent.co.uk%2Farts-entertainment%2Fbooks%2Freviews%2Fdead-aid-by-dambisa-moyo-1519875.html+site%3Aox.ac.uk&ei=UTF-8&fr=moz35
- Leads to a new strategy - find a way to write about this college and drop them a line: http://www.sant.ox.ac.uk/news/media.html
(Incidentally, I found a very similar opportunity on the Cambridge site, but no, I’m not going to tell you about it.)
In an unexpected turn of events, I also found some pretty active blogs writing about my target subject matter on ox.ac.uk URLs. Even I’m not mean enough to fill up those guys’ inboxes with outreach from you lot just because they picked the wrong university.
(*) I don’t know about anyone else, but I am rooting for a more balanced search market (particularly in the UK, where Google has a ~90% market share). I think competition is good for consumers and for businesses.
(**) seriously, we don’t get on (US folks, think of the relationship between Duke and UNC) but I’m not encouraging anyone to spam Oxford University. Really. I’m not. Even though the varsity match is this week.
There are some other great resources on linkfromdomain - I really liked PPC blog’s tip about expired and for sale domains.
Rand has also written about the uses of linkfromdomain for finding spam you are linking to as well as teasing you with the fact that he "gave up" a similar tip to my worked example above at SMX Advanced.
Technorati Tags
linkbuilding, bing, linkfromdomain
Advanced Link Analysis Charts
Posted by willcritchlow
Bored of sorting massive lists of links in all kinds of different directions to understand the link profile of a new site?
Struggle to understand how to gather actual insights about link profiles from lists of thousands of links and persuade management of the actions needed?
Don’t panic. Help is at hand.
I’m going to share some data visualisation tips today that I reckon I could use to beat up on Rand in a presentation-off (umm, again). We have recently been doing some deep dives into clients’ and prospects’ link profiles which gave me an excuse to mash up some Linkscape API data in Excel. I’ve used Linkscape data, but you could use any link analysis tool you like as long as you can get some metric to sort the linking domain by (I have used domain mozTrust in most of the examples below). Equally, I’ve used Excel, but you can use any data analysis package you like. If you want to use Excel, you will need the Data Analysis Toolpak (for the histogram function).
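If Excel isn't your thing, the histogram step (counting linking domains per band of domain mozTrust) is easy to sketch in Python. This is an illustration rather than the process I used: the function name, the 0.5 bin width and any sample numbers you feed it are assumptions.

```python
from collections import Counter


def dmt_histogram(dmt_values, bin_width=0.5):
    """Count linking domains per domain mozTrust band.

    Returns a dict mapping the lower edge of each band to the number
    of linking domains that fall into it.
    """
    counts = Counter()
    for value in dmt_values:
        # Floor each value to the lower edge of its band.
        band_start = int(value // bin_width) * bin_width
        counts[band_start] += 1
    return dict(sorted(counts.items()))
```

Run it once for all links and once per target page, and you have the series needed for the charts in this post.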
I’ll get into how to make the charts in a minute, but first I’m going to just show you some pretty pictures:
Impress the boss
This one is of questionable use (I think there are better ways of actually visualising the data) but it’s pretty, and bosses like pretty (allegedly). This is a surface chart of number of linking domains by domain mozTrust shown across 4 data points - all links, links to the homepage and links to the next two strongest pages:
The bit of insight this does give us at a glance is that the vast majority of the site’s very low DmT links go to the homepage and that the most trusted domains linking to the site (DmT >= …) don’t link to the homepage or the next two strongest pages.
The same chart just showing links to the homepage compared to all links, which shows the top end a little more clearly:
Gathering insights
I think this data is actually easier to see as a line chart like this (locations A and B are the top two strongest pages on the site after the homepage):

What we can just about see here are some bumps at the top end of the DmT scale in the light blue line, which is the same bit of insight I mentioned above.
Drilling down
Diving into this data to show only the top end of the DmT scale, we get:
And we see that although the homepage and these top two location pages are the most powerful pages on the site, they are not the ones with the links from the biggest / most trusted sites. This is an area for further examination that would be hard to discover by looking at endless lists of links.
This is just an example of the kind of insight you can gather. I’m showing off tools and techniques here rather than specific insights. I’ll leave you to do your own playing to discover interesting things about your clients and competitors. I didn’t know what I was going to find when I started diving into the data for this site. You likely won’t know either, but graphs are great discovery tools. Sometimes, of course, you find nothing of interest:
Comparing just the top two pages doesn’t give us any very meaningful insights, except that the big links out at 6.5-7 DmT to location A probably explain why it’s more powerful than B. It might be more insightful with finer-grained bins.
Equally, I haven’t yet learnt to understand the meaning that I am sure is buried in charts like this one:
This is the number of links to a whole site by the mR of the linking page. Like the mythical guys who can understand network traffic by watching LEDs blink on routers, I’d love to be able to look at this kind of chart and really understand things. The closest I’ve got so far is that I think these charts should look roughly smooth in the absence of manipulation. If we assume that the difficulty of acquiring a link is roughly correlated to its strength and that we get links at a rate inversely proportional to their difficulty, then I think this chart should look roughly like a Poisson distribution:
Which this one does, so I’m happy.
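If you would rather not rely purely on eyeballing, here is a rough Python sketch of the comparison. It fits a Poisson curve to the bucket counts and returns the expected count per bucket, so you can plot the two lines on top of each other. Everything here is an assumption for illustration: the function name `poisson_expected` is hypothetical, the counts are made up, and treating the integer mR bucket as a Poisson count is just my loose heuristic from above, not a proper statistical test.

```python
import math

def poisson_expected(counts):
    """Expected links per bucket if the buckets followed a Poisson shape.

    counts[k] is the observed number of links in integer bucket k
    (e.g. links from pages with mR between k and k+1).
    """
    total = sum(counts)
    # Estimate the Poisson mean as the link-weighted average bucket index.
    lam = sum(k * c for k, c in enumerate(counts)) / total
    return [
        total * math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(len(counts))
    ]

# Made-up bucket counts for illustration only (not the site charted above).
observed = [120, 260, 270, 190, 100, 40, 15, 5]
expected = poisson_expected(observed)

for k, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"mR {k}: observed {obs:4d}, Poisson fit {exp:7.1f}")
```

Big gaps between the two columns in particular buckets are the kind of "not smooth" bumps that would make me go and look at the underlying links.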
Persuading management / bosses
The next thing that some of these charts help with is making the case to management when you know something is true, but they need more persuading. This next example takes two different sites (neither of them is the site above) that are in different industries but have remarkably similar link characteristics at the macro level (don’t ask me how I found these sites - I am just that sad). The spider chart shows how similar they are:
However, if we dig in a little further, we find quite a difference behind the scenes:
The red site seems to have loads more decent links (mR 4, 5, 6) than the blue site. So how does the blue site end up with similar domain metrics?
It’s all about the relatively small number of very powerful links the blue site has. Zooming in on mR 6 & 7 links:
If you were just to look at this chart, you might imagine that the red site was getting more juice passed via these links than the blue site is. However, you’d be being fooled by the logarithmic scale. In terms of total juice passed by just these mR 6 and 7 links, the actual story is:
In other words, the blue site is competing almost purely on the basis of the big mR 7 links it has that the red site doesn’t. That’s kinda interesting in terms of strategy generation, isn’t it?
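To see why the logarithmic scale is so misleading, here is a minimal Python sketch of the arithmetic. It assumes each link passes juice in proportion to `base ** mR`. The function name `relative_juice`, the base of 8 and the link counts are all assumptions I have made up for illustration; the real base of the mR scale and the real numbers for these two sites are not given in this post.

```python
def relative_juice(counts_by_mr, base=8.0):
    """Total 'juice' on a linear scale, assuming each link passes base ** mR.

    counts_by_mr maps an mR bucket (e.g. 6 or 7) to a number of links.
    The base is an assumption; swap in whatever you believe the scale to be.
    """
    return sum(count * base ** mr for mr, count in counts_by_mr.items())

# Made-up link counts for illustration only (not the sites in the charts).
red = {6: 40, 7: 1}    # lots of mR 6 links, hardly any mR 7
blue = {6: 10, 7: 12}  # fewer links overall, but more at mR 7

red_total = relative_juice(red)
blue_total = relative_juice(blue)

print(f"red:  {sum(red.values())} links, relative juice {red_total:,.0f}")
print(f"blue: {sum(blue.values())} links, relative juice {blue_total:,.0f}")
```

With these invented numbers the red site has nearly twice as many links, but the blue site comes out well ahead once the scale is made linear, because every step up the scale multiplies the weight of a link by the base.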
How do you do this analysis?
Pretty much everything in this post was generated using the histogram function in Excel running over Linkscape API data. It’s pretty straightforward with the online help. The only gotchas I noticed that you might need to know about were:
- Align the ‘bins’ (which are the x-axis values on most of the charts above) either with mR / mT intervals (e.g. 1, 2, 3, 4, …) or go much more granular (e.g. 0.1, 0.2, 0.3, ….). Anything in between tends to generate artifacts
- The bin range has to be on the same sheet as the data - if you try to pull in a bin range from another sheet, it fails silently
- If you want to do the surface chart, you need to do some interpolation between your points. In the examples above, I just did a linear interpolation (i.e. drawing a straight line between the different page levels) - so if the homepage has 100 mR 2 links and the next page has 50 mR 2 links, I just created 10 imaginary pages with 55, 60, 65, 70… mR 2 links to spread the surface out far enough to see it. This may not be the best way of doing things. I’d love to hear from anyone who has a better method
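If you are not using Excel, the same two steps (bucketing with aligned bins, and linear interpolation for the surface chart) can be sketched in a few lines of Python. This is only a sketch under my own assumptions: the function names `histogram_by_metric` and `interpolate_rows` are hypothetical, the scores are made up, and the buckets here include their lower edge, whereas Excel's histogram tool counts values up to and including each bin value, so numbers that sit exactly on a boundary can land one bucket apart.

```python
import math

def histogram_by_metric(values, step=1.0, max_value=10.0):
    """Count values into buckets whose edges are exact multiples of `step`."""
    n_bins = int(round(max_value / step))
    counts = [0] * n_bins
    for v in values:
        # The tiny epsilon stops float error (e.g. 0.3 / 0.1 = 2.999...) from
        # pushing a value into the bucket below; clamping keeps the index
        # inside the list, so v == max_value lands in the last bucket.
        idx = int(math.floor(v / step + 1e-9))
        counts[max(0, min(idx, n_bins - 1))] += 1
    edges = [round(i * step, 10) for i in range(n_bins + 1)]
    return edges, counts

def interpolate_rows(row_a, row_b, n_between=9):
    """Straight-line interpolation between two pages' histogram rows.

    Returns row_a, then n_between imaginary rows, then row_b.
    """
    steps = n_between + 1
    return [
        [a + (b - a) * k / steps for a, b in zip(row_a, row_b)]
        for k in range(steps + 1)
    ]

# Made-up scores for illustration only.
scores = [0.4, 1.2, 1.9, 2.0, 2.5, 3.1, 3.3, 5.8]
edges, counts = histogram_by_metric(scores, step=1.0, max_value=10.0)

# One column: 100 links on the homepage row, 50 on the next page's row.
surface = interpolate_rows([100], [50], n_between=9)
print(counts)
print([row[0] for row in surface])
```

Nine in-between rows give the 95, 90, 85… spacing of 5 links per row from the example above; use a different `n_between` if you want the surface stretched more or less.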
Thanks to foliovision for the photo from the ProSEO seminar.







