Are you using 'Disallow' in robots.txt and think a search engine won't list your page?

Albert Wiersch
Site Admin
Posts: 3355
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Post by Albert Wiersch » Wed May 16, 2018 11:13 am

If you're using Disallow in a robots.txt file and you think that means a search engine won't list your page, then this is a good read:
https://yoast.com/prevent-site-being-indexed/

If you really want a page to stay out of a search engine's results, then you should use a robots meta tag (noindex) or the X-Robots-Tag HTTP header, while still allowing the search engine to crawl the page so it can see the meta tag or header.
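
For anyone who wants to see what those two signals look like, here is a minimal Python sketch. The helper names (noindex_headers, add_noindex_meta) are made up for illustration, and the second one assumes the page contains a plain <head> tag with no attributes. How you actually attach headers depends on your server or framework.

```python
# Hypothetical helpers, not part of any library.

def noindex_headers(base_headers=None):
    """Return a copy of the response headers with X-Robots-Tag: noindex added."""
    headers = dict(base_headers or {})
    headers["X-Robots-Tag"] = "noindex"
    return headers


def add_noindex_meta(html):
    """Insert a robots noindex meta tag right after the first plain <head> tag.

    Assumes a literal "<head>" with no attributes; the HTML is returned
    unchanged if that marker is not found.
    """
    marker = "<head>"
    tag = '<meta name="robots" content="noindex">'
    if marker not in html:
        return html
    return html.replace(marker, marker + tag, 1)


headers = noindex_headers({"Content-Type": "text/html"})
page = add_noindex_meta("<html><head><title>Private</title></head></html>")
```

Either signal should be enough on its own. The header form is handy for non-HTML files such as PDFs, where there is nowhere to put a meta tag.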

Here's a good Google link about it:
https://support.google.com/webmasters/answer/93710

Which says:
Google wrote: Important! For the noindex directive to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex directive, and the page can still appear in search results, for example if other pages link to it.
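
To make the quoted warning concrete, here is a small sketch using Python's standard urllib.robotparser to check whether a crawler is even allowed to fetch a page. The robots.txt content, the example.com URLs and the function name noindex_can_be_seen are assumptions made up for illustration, and real crawlers may interpret robots.txt a little differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks everything under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def noindex_can_be_seen(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots.txt lets the crawler fetch the URL.

    Only a crawler that can fetch the page is able to read a noindex
    meta tag or X-Robots-Tag header on it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = noindex_can_be_seen(ROBOTS_TXT, "https://example.com/private/page.html")
open_page = noindex_can_be_seen(ROBOTS_TXT, "https://example.com/public/page.html")
print(blocked, open_page)
```

The first URL is disallowed, so a noindex on that page would never be read and the page could still be listed if other sites link to it. The second URL can be crawled, so a noindex there can take effect.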
Albert Wiersch

MikeGale
Rank VI - Professional
Posts: 704
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate

Post by MikeGale » Thu May 17, 2018 4:22 am

That's interesting:

"If you don't want me to read your page, then I must read your page".

Am I being too harsh?

Post by Albert Wiersch » Thu May 17, 2018 7:55 am

MikeGale wrote (Thu May 17, 2018 4:22 am):
That's interesting:

"If you don't want me to read your page, then I must read your page".

Am I being too harsh?

Maybe a little too harsh. :)

More like "if you don't want me to index your page then I must read it to see the noindex instruction".
Albert Wiersch

Lou
Rank V - Professional
Posts: 265
Joined: Fri Jul 29, 2005 5:55 pm
Location: CO

Post by Lou » Thu May 17, 2018 5:22 pm

MikeGale wrote (Thu May 17, 2018 4:22 am):
Am I being too harsh?

I think not. There are exceptions of course, but generally anyone I would not want reading my site won't follow any rules or even look for them. More likely, "not" is a come-on.
Lou
Say what you will about Sisyphus. He always has work.

Post by MikeGale » Thu May 17, 2018 6:16 pm

Makes me think of the way DNT is not obeyed by just about anybody.

Imagine if there were:
  • Do Not Track (with a richer instruction including things like "if any of my data will end up at XXX, YYY...; first party re-identification allowed...")
  • Do not collect data
  • Do not de-anonymise
  • Do not advertise at me
  • or whatever makes sense
Then many of the GDPR provisions would be redundant.

(It appears to me that Google is claiming something called "controller" status under these rules. Among other things, that appears to allow them to claim exclusive rights to server logs on any site feeding data to them. That's what I see in press write-ups. Anybody got better insight into this insane-sounding situation?)
