Are you using 'Disallow' in robots.txt and think a search engine won't list your page?

Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Post by Albert Wiersch »

If you're using Disallow in a robots.txt file and think that means a search engine won't list your page, then this is a good read:
https://yoast.com/prevent-site-being-indexed/

If you really want a page to stay out of a search engine's results, then you should use a robots meta tag or the X-Robots-Tag HTTP header with noindex, while allowing the search engine to crawl the page so it can see the meta tag or header.
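
For anyone who wants to see what those two signals look like, here is a minimal Python sketch. The `build_response` helper and the sample page are made up for illustration; only the meta tag and the header name/value are the standard ones. It assumes the page contains a literal `<head>` tag.

```python
# Two standard ways to ask search engines not to list a page.
# Both only help if crawlers are allowed to fetch the page.

# 1. A meta tag placed in the HTML <head>.
NOINDEX_META = '<meta name="robots" content="noindex">'

# 2. An HTTP response header (also usable for non-HTML files such as PDFs).
NOINDEX_HEADER = ("X-Robots-Tag", "noindex")


def build_response(body_html, noindex):
    """Hypothetical helper: return (headers, html), adding both signals if noindex is True."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if noindex:
        headers[NOINDEX_HEADER[0]] = NOINDEX_HEADER[1]
        # Assumes a literal "<head>" tag; only the first occurrence is touched.
        body_html = body_html.replace("<head>", "<head>" + NOINDEX_META, 1)
    return headers, body_html


page = "<html><head><title>Private</title></head><body>Hello</body></html>"
headers, html = build_response(page, noindex=True)
print(headers["X-Robots-Tag"])  # noindex
print(html)
```

Either signal on its own is normally enough; the sketch sets both only to show each one.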

Here's a good Google link about it:
https://support.google.com/webmasters/answer/93710

Which says:
Google wrote: Important! For the noindex directive to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex directive, and the page can still appear in search results, for example if other pages link to it.
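
A rough way to check for that conflict is Python's standard `urllib.robotparser`. This is only a sketch: the function name, the example.com URL, and the two robots.txt samples are made up, and real crawlers may interpret robots.txt a bit differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def crawler_can_see_noindex(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots.txt lets the crawler fetch the URL (and so read its noindex)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the page: any noindex on it is never seen.
blocking = """User-agent: *
Disallow: /private/
"""

# Hypothetical robots.txt that blocks nothing: the crawler can read the noindex.
allowing = """User-agent: *
Disallow:
"""

url = "https://example.com/private/page.html"
print(crawler_can_see_noindex(blocking, url))  # False
print(crawler_can_see_noindex(allowing, url))  # True
```

If the first case describes a page you want kept out of the results, the Disallow rule is working against you.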
Albert Wiersch, CSS HTML Validator Developer • Download CSS HTML Validator FREE Trial
MikeGale
Rank VI - Professional
Posts: 726
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate

Post by MikeGale »

That's interesting:

"If you don't want me to read your page, then I must read your page".

Am I being too harsh?
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Post by Albert Wiersch »

MikeGale wrote: Thu May 17, 2018 4:22 am That's interesting:

"If you don't want me to read your page, then I must read your page".

Am I being too harsh?
Maybe a little too harsh. :)

More like "if you don't want me to index your page then I must read it to see the noindex instruction".
Lou
Rank V - Professional
Posts: 297
Joined: Fri Jul 29, 2005 5:55 pm
Location: CO

Post by Lou »

MikeGale wrote: Thu May 17, 2018 4:22 am Am I being too harsh?
I think not. There are exceptions, of course, but generally anyone I would not want reading my site won't follow any rules or even look for them. More likely, a "not" is a come-on.
Lou
Say what you will about Sisyphus. He always has work.
MikeGale
Rank VI - Professional
Posts: 726
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate

Post by MikeGale »

Makes me think of the way DNT is not obeyed by just about anybody.

Imagine if there were:
  • Do Not Track (with a richer instruction including things like "if any of my data will end up at XXX, YYY...; first party re-identification allowed...")
  • Do not collect data
  • Do not de-anonymise
  • Do not advertise at me
  • or whatever makes sense
Then many of the GDPR provisions would be redundant.

(It appears to me that Google is claiming something called "controller" status under these rules. Among other things, that appears to allow them to claim exclusive rights to server logs on any site feeding data to them. That's what I see in press write-ups. Does anybody have better insight into this insane-sounding situation?)