www.standardsandgrudges.com

Tuesday 31 January, 2006

Google VS. Traditional Media

Filed under: World News — Steven A. Stehling @ 14:54

It’s well known that the traditional print media industry is struggling to find a profitable model in the internet age, but sometimes its efforts are counterproductive. A group of global newspapers is complaining that the Google News aggregator is violating their copyright and damaging their ability to compete for online advertising. The heart of their complaint is that Google News takes the first few sentences of an article and displays them with a headline and a link back to the news source. They want to call that a copyright violation? It’s well established that individuals and companies may quote short excerpts of copyrighted material. If Google were displaying the entire text of an article without a link back to the news source, then I could see grounds for a complaint. These news media companies need to take advantage of services like Google News. Do you think the average person is going to visit the Alabama International Journal Gazette website? No, they won’t. But they might if they find a search match on Google News. Google News delivers traffic to news media sites, and more traffic means more advertising opportunity. It’s up to the websites to take advantage of that traffic and maximize the chances that their content will appear in Google News searches.

But fine, they don’t want Google News to publicize their articles. That’s a simple fix, and it doesn’t take the courts. There are two things you can do. First, you could simply ask Google not to crawl and index your website. Google may honor that request, but it may not: the Google spider is a complex program, and implementing too many filters may affect its efficiency. Second, if Google doesn’t honor that request or you simply don’t want to take that route, don’t let Google crawl your site at all. Set your server configuration to deny access to the Google spider, either for the entire site or for just some portions of it. Just have your IT staff make it happen. If I, a lowly internet amateur, can do it, then educated IT professionals can too.
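“Asking” a crawler to stay away is normally done with a robots.txt file at the root of the site. Crawlers that follow the Robots Exclusion Protocol read it before fetching pages, but obeying it is up to the crawler. Here is a minimal Python sketch, assuming a hypothetical robots.txt that names only Googlebot and hypothetical /articles/ and /photos/ directories; it uses the standard library’s urllib.robotparser to show which URLs such a file asks Googlebot to skip.

```python
"""Check which URLs a robots.txt file asks Googlebot to skip.

The robots.txt body, the /articles/ and /photos/ paths and the
"ExampleBot" crawler name are hypothetical; parsing uses Python's
standard urllib.robotparser module.
"""
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask only Googlebot to stay out of the article
# and photo directories. Crawlers not named here are allowed everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /articles/
Disallow: /photos/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "https://example.com/articles/story.html"),
        ("Googlebot", "https://example.com/index.html"),
        ("ExampleBot", "https://example.com/articles/story.html"),  # hypothetical other crawler
    ]
    for agent, url in checks:
        verdict = "allowed" if rp.can_fetch(agent, url) else "disallowed"
        print(agent, url, verdict)
```

The file only expresses a request, so nothing stops a crawler that ignores it; that is the gap a server-side block closes.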

Here’s what I would do. I would put all articles and photography in subdirectories of the web server. Chances are, this is already done. I would then set the .htaccess file in those subdirectories to deny access to Google’s domains. You can find the domains the Google spider operates from by checking the server statistics for requests with the Googlebot user agent, and then block those domains. There’s nothing Google could do to get around this (except change their spider domain), because your server controls access to the content. The Google spider requests the content in the subdirectory, and the web server refuses it. If Google changes their domain, you simply add the new domain to the block list.

I would block Google only from those subdirectories, because you still want your website to appear in normal Google searches. Most search referrals on the internet come through Google, so if you block Google from your entire site, users who are specifically searching for your site won’t find it there, and you’re in effect preventing a majority of internet users from ever finding it.
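On Apache, the rule described above would live in an .htaccess file in each protected subdirectory. To make the logic explicit, here is a minimal Python sketch of the same rule written as WSGI middleware. It is not the author’s configuration: the directory names, the “googlebot” user-agent token and the “.googlebot.com” hostname suffix are assumptions, and the reverse-DNS lookup stands in for “blocking the domains it operates from.”

```python
"""Deny the Google crawler access to selected subdirectories (WSGI sketch).

Hypothetical assumptions: articles live under /articles/, photos under
/photos/, and the crawler is recognised by "googlebot" in its User-Agent
or by a reverse-DNS hostname ending in .googlebot.com.
"""
import socket
from typing import Callable, Iterable, Optional

PROTECTED_PREFIXES = ("/articles/", "/photos/")  # hypothetical subdirectories
BLOCKED_AGENT_TOKEN = "googlebot"                 # matched case-insensitively
BLOCKED_HOST_SUFFIXES = (".googlebot.com",)       # assumed crawler domain(s) from the logs


def reverse_dns(ip: str) -> Optional[str]:
    """Return the hostname for an IP address, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def is_crawler(user_agent: str, hostname: Optional[str]) -> bool:
    """True if the User-Agent or the resolved hostname identifies the crawler."""
    if BLOCKED_AGENT_TOKEN in user_agent.lower():
        return True
    return hostname is not None and hostname.lower().endswith(BLOCKED_HOST_SUFFIXES)


class DenyCrawlerMiddleware:
    """Answer 403 Forbidden to the crawler, but only inside protected paths."""

    def __init__(self, app, resolve: Callable[[str], Optional[str]] = reverse_dns):
        self.app = app
        self.resolve = resolve  # injectable so the rule can be tested without DNS

    def __call__(self, environ, start_response) -> Iterable[bytes]:
        path = environ.get("PATH_INFO") or "/"
        if path.startswith(PROTECTED_PREFIXES):
            agent = environ.get("HTTP_USER_AGENT", "")
            ip = environ.get("REMOTE_ADDR", "")
            hostname = self.resolve(ip) if ip else None
            if is_crawler(agent, hostname):
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Forbidden\n"]
        # Everything else, including the home page, is served normally so
        # the site still shows up in ordinary Google searches.
        return self.app(environ, start_response)
```

To use it, wrap the existing WSGI application object, for example `application = DenyCrawlerMiddleware(application)`. Limiting the check to the protected prefixes mirrors the post’s advice: only the article and photo directories are closed to the crawler, and the DNS lookup is skipped for every other page.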

The fact that these news media companies have decided to go to the courts says a lot about how little they understand internet technology.

All text and watermarked images are licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.