A Good Bad Back-link Removal Experience
Links are the ties that bind the web together. Some links are beneficial for a website while others are not. It is the job of a good search engine optimization specialist to know the difference between the two and weed out the bad ones. That's much easier to write or read than it is to do. Finding bad links and having them removed is a multi-step process that most often has to be conducted link by link by stinking-rotten link.
This experience started with a find made by DAM SEO Bria Jordan. She was working to fix random (Google or Bing) crawl errors found on a large network of websites relating to consumer financial services. She sent me an instant message containing an incoming link reference she couldn't figure out. The link originated from a financial services marketing company in the United States. That company runs a bot that seeks out and scrapes content on the web based on keywords found on our client's pages or in the titles of their blog posts. The scraper, in this case, populates RSS feeds referencing articles or pages that relate to consumer financial services. As far as our investigation could tell, they are not reusing our client's content, just referencing it in what appears on their site to be a news feed. The content they scraped was limited to the titles of articles rather than the copy itself.
Most of the time I wouldn't be overly concerned about this type of practice. If the website running the scraper was reputable, the incoming link might be beneficial to our client. Even if the website was less than reputable, Google and Bing are both smart enough to understand that we (or our clients) did not code these links, and our websites should not be punished for something we have little to no control over. They would likely ignore the link. In this case, however, the scraper was replacing the dashes found in our URLs with unicode symbols, generating faulty URLs that led visitors and search spiders to a custom 404 error page. That concerns me.
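If I were patching this on our side rather than theirs, one approach would be to normalize those unicode look-alike dashes back to plain hyphens and redirect before the request ever reaches the 404 page. Below is a minimal sketch, assuming a Flask-style application; the framework and the list of look-alike characters are illustrative, not what our sites actually run.

```python
# Minimal sketch: catch request paths containing unicode dash look-alikes,
# rewrite them to plain ASCII hyphens, and redirect permanently so both
# visitors and spiders land on the real page instead of the custom 404.
from flask import Flask, redirect, request

app = Flask(__name__)

# Illustrative set of characters a scraper might substitute for "-"
DASH_LOOKALIKES = {
    "\u2010",  # hyphen
    "\u2011",  # non-breaking hyphen
    "\u2013",  # en dash
    "\u2014",  # em dash
    "\u2212",  # minus sign
}

@app.before_request
def normalize_dashes():
    path = request.path
    if any(ch in path for ch in DASH_LOOKALIKES):
        fixed = "".join("-" if ch in DASH_LOOKALIKES else ch for ch in path)
        # A 301 keeps spiders pointed at the real page rather than the error page
        return redirect(fixed, code=301)
```

The permanent redirect matters: it signals that the clean URL is the one worth indexing, instead of letting the mangled version accumulate 404 hits.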
We build fail-safes into the websites we work on. If a user enters the wrong URL or is directed to a page that does not exist, our sites produce a page designed to tell the user an error occurred and help them find what they were looking for. As an SEO and webmaster, I see the custom 404 page as a sort of safety net that's there to catch visitors if they somehow fall off the side of the site. It is a very useful component of a good website, one you hope will be used infrequently. Having Google and Bing's spiders directed to a 404 error page over and over again is not a good thing. I'm honestly not able to absolutely and positively say it's a bad thing, but I can say with strict authority it's not good.
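For what it's worth, the wiring for that safety net is simple. Here is a minimal sketch, again assuming a Flask-style application and a hypothetical 404.html template:

```python
# Minimal sketch of the "safety net" page: send the real 404 status so
# spiders know the page is missing, while showing human visitors a page
# that points them back into the site. The template name is hypothetical.
from flask import Flask, render_template

app = Flask(__name__)

@app.errorhandler(404)
def page_not_found(error):
    # Returning 200 here would create a "soft 404", which search engines
    # treat as a bigger problem than the missing page itself.
    return render_template("404.html"), 404
```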
Google and Bing are machines of sorts. Their algorithms follow a logical series of measurements to arrive at whatever conclusions they arrive at. That series of measurements is composed of hundreds to thousands of weighted factors, depending on the search query. If a page is identified to Google or Bing's spiders over and over again across multiple properties, but the link to that page always results in a 404, eventually Google and/or Bing might see less value in that page, even if the page owner has nothing to do with the broken link.
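One way to see how often the spiders are actually being bounced into that 404 is to count the 404 responses served to crawler user agents in the access logs. A rough sketch, assuming combined-format logs and the standard Googlebot and Bingbot user-agent strings:

```python
# Rough sketch: tally 404 responses served to search engine crawlers,
# grouped by requested path, from a combined-format access log.
import re
from collections import Counter

LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
CRAWLERS = ("Googlebot", "bingbot")

def crawler_404s(log_path):
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LOG_LINE.search(line)
            if not match:
                continue
            if match.group("status") == "404" and any(
                bot in match.group("agent") for bot in CRAWLERS
            ):
                hits[match.group("path")] += 1
    return hits.most_common(20)
```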
So, cause for worry. I was able to figure out what the link was, where it came from and how it was generating a faulty reference. I also confirmed that the same scraper was being used on sites representing other American cities, some of which share names with the Canadian cities where our client has branches. In those cases, blog and page titles were scraped and used to populate the "news feeds" of their sites. Suddenly I had SERIOUS cause for worry: a self-propagating problem. It could very quickly get out of control, with more of our SEO staff tasked with chasing and eliminating bad links by the gross rather than the dozen we expect to deal with each day.
We might even have to resort to a Google tool of last resort and specifically denounce or disavow the incoming links. The thing is, a tool of last resort should only be used when no other solution is possible; that's why it's thought of as a last resort. On principle, I dislike considering last resorts. Nevertheless, there is this growing problem with few options available.
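For reference, the disavow tool takes a plain text file uploaded through Google Search Console: one URL or domain per line, with # marking comments. Something like the example below, where the domains are placeholders rather than the actual offenders:

```
# Last-resort disavow file: placeholder domains, not the real ones.
# "domain:" lines drop every link from that host; bare URLs drop a single page.
domain:scraper-network-example.com
domain:another-city-feed-example.net
http://scraper-network-example.com/news/some-scraped-post-title
```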
The first option is reason. Every website has a webmaster. Webmasters talk to each other. We understand the environments we each work in. We're buddies. If you're a webmaster and you have a problem, go to another webmaster. Theoretically, that's the fastest way to get stuff done, so that's what I did. Sadly, in the working world of webmasters, theory and reality are two sides of the same fruit salad.
I wrote a nice, polite, and rather pointed letter to the webmasters of the scraper site that is crawling our client's site and producing really bad back-links to it. I expected no reply, but it's the first step in trying to have dodgy incoming links removed. I made a vague mention of blocking IPs and causing visible errors on their site, and expected nothing to come of it but the reportable experience of having done it in the first place.
I expected to be exercising my second option this morning: tracing and blocking the scraper's IP address from accessing several dozen websites. I'd thought through the process last night and worried about the number of incidents we might see in a couple of weeks if we didn't deal with it right now. Each block would have to be installed individually. That's like, two person-days of work to allocate. Major bummer.
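For context, the per-site block itself is trivial; it's repeating it across dozens of properties that eats the time. A minimal sketch, again assuming a Flask-style application and a placeholder address rather than the scraper's real IP:

```python
# Minimal sketch: refuse requests from a known scraper address before any
# page logic runs. The block is easy; rolling it out site by site is the
# two person-days. The address below is a placeholder.
from flask import Flask, abort, request

app = Flask(__name__)

BLOCKED_IPS = {"203.0.113.42"}  # placeholder, not the scraper's real address

@app.before_request
def block_scraper():
    if request.remote_addr in BLOCKED_IPS:
        abort(403)  # the visible error my letter vaguely warned about
```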
Imagine my surprise to find a polite and well-worded reply saying content from the URLs I noted will be restricted from their scraper. I was so delighted I decided to write a blog post about it, because this is something that almost never ever happens ever. Today it did.