Trashwiki.org talk:Robots.txt

Moved from the community portal.

I've been considering hiding (parts of) Trashwiki from Google. Companies are googling for their names and might not be happy to find themselves on Trashwiki. Maybe it's a good idea to leave the Main Page open for Googlebot? guaka 13:03, 16 March 2009 (UTC)

I can hardly imagine a store owner googling for his store's name, and even if that happens, results from the relatively small Trashwiki would be far down the list. Furthermore, even if 1 or 2 stores find their way to Trashwiki, not much would be lost (it is not even clear that a store owner would manage his dumpsters more strictly after seeing the information on Trashwiki). Hiding from Google will only reduce traffic to the site, so I would say screw any hiding! --Sigurdas 15:01, 16 March 2009 (UTC)
You'd be very surprised. I'm not running Google Analytics, but Google Webmasters gives me some info, and in the short time we've existed we have reached some top 10 hits for specific terms that consumers and shop owners are definitely googling for. Hitchwiki never really reached top spots on Google, but is still widely popular among hitchhikers. The info on Trashwiki is just slightly more sensitive and could actually lead to fewer possibilities for dumpster diving. Trashwiki will also spread without Google. Maybe we can have a very specific robots.txt that will only allow some pages to be indexed (such as the Main Page). guaka 15:20, 16 March 2009 (UTC)
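
For reference, a "main page only" robots.txt along the lines suggested above might look like the sketch below. The paths are assumptions for illustration, not Trashwiki's actual URL layout, and the `Allow` directive and the `$` end-of-URL anchor are extensions that Googlebot honors but that not every crawler implements.

```
# Hypothetical sketch: the paths below are assumed, not Trashwiki's real URLs.
# Applies to every crawler that has no more specific group of its own.
User-agent: *
# Allow the bare site root ("$" anchors the match to the end of the URL).
Allow: /$
# Allow the main page itself (assumed path; matched as a prefix).
Allow: /Main_Page
# Block everything else.
Disallow: /
```

Note that robots.txt only stops compliant crawlers from fetching pages; pages that are already indexed can stay in search results until they are removed separately.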

I talked about it with Erga and apparently it has become harder to dumpster dive in New York. Of course it's unclear whether that's related to Trashwiki, but it might be, and if so, it goes against the entire goal of Trashwiki. So I'm really considering adapting robots.txt one of these days and leaving only the main page open to spider bots. guaka 12:16, 17 March 2009 (UTC)

I would support the idea of hiding some Trashwiki pages and leaving only the main one. Some shops or supermarket chains could be sensitive about specific information on them and may decide to take action against dumpster diving. So it's better to be safe. After all, the main page has all the keywords and will definitely attract enough attention to the project from those who are interested in freeganism. --LinasD 15:23, 17 March 2009 (UTC)

I've talked to a bunch more people about it. I'm about to move Trashwiki to another server so I'll combine that with an adaptation of our robots.txt. Feel free to help. guaka 01:23, 18 March 2009 (UTC)

Update

Google stopped crawling most of the site, but pages that were already indexed are still there. I'm removing all cities and countries one by one. Since it's nice to preserve this project for eternity, I also want to see if it's possible to still let the archive.org crawler in :) guaka 13:12, 7 April 2009 (UTC)
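
One way to let the archive.org crawler in while keeping other bots out is a separate group for that crawler, since a compliant crawler follows the most specific `User-agent` group that matches it. The sketch below assumes the crawler identifies itself as `ia_archiver`, the token historically associated with the Internet Archive's crawls; that token and the path rules are assumptions that should be checked before relying on them.

```
# Hypothetical sketch: the "ia_archiver" token is an assumption to verify.
# An empty Disallow means nothing is blocked for this crawler.
User-agent: ia_archiver
Disallow:

# Every other crawler: only the bare site root is allowed.
User-agent: *
Allow: /$
Disallow: /
```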