    What do you do with a drunken Googlebot?

    I made a post to my blog about this over the weekend, but I want to open it up to a larger discussion here. Is there anything you do to keep spiders from behaving badly on your web sites? Something less severe than just banning their subnet at the firewall, of course. :-)

    For example, for the random 404's that Google normally insists on bothering me with:

    66.249.64.235 - - [26/May/2016:11:23:55 -0400] "GET /yrjclqajwyshc.html HTTP/1.1"
    66.249.64.10 - - [27/May/2016:11:15:02 -0400] "GET /ysveybimgdu.html HTTP/1.1"
    66.249.64.3 - - [02/Jun/2016:10:20:53 -0400] "GET /iqswwijkbkk.html HTTP/1.1"
    66.249.64.243 - - [03/Jun/2016:10:11:18 -0400] "GET /qfmtujzxykv.html HTTP/1.1"
    

    I added the following to my site's .htaccess file so that it gives a 204 response (No Content) instead of logging an error:

    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^[a-z]{8,16}\.html$ http://www.google.com/ [R=204,L,CO=google:stop_your_404_probing:impossiblystupid.com]
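
To see which request paths a pattern like the one in the rule above would catch, here is a minimal Python sketch. It assumes the dot before `html` is meant literally and that the per-directory rule sees the path without its leading slash. `PROBE_PATTERN` and `looks_like_probe` are names made up for illustration; the first four sample paths come from the log excerpt.

```python
import re

# Assumption: same character class and length bounds as the RewriteRule,
# with the dot escaped so it only matches a literal ".".
PROBE_PATTERN = re.compile(r"^[a-z]{8,16}\.html$")

def looks_like_probe(request_path: str) -> bool:
    """Return True if the path has the shape of the random probe URLs."""
    # In .htaccess context the leading slash is stripped before matching.
    return PROBE_PATTERN.match(request_path.lstrip("/")) is not None

sample_paths = [
    "/yrjclqajwyshc.html",
    "/ysveybimgdu.html",
    "/iqswwijkbkk.html",
    "/qfmtujzxykv.html",
    "/index.html",        # only 5 letters, too short to match
    "/About-Us.html",     # capitals and a hyphen, does not match
]

for path in sample_paths:
    print(path, looks_like_probe(path))
```

A real page such as `/changelog.html` has the same shape, which is why the rule only fires when the `RewriteCond %{REQUEST_FILENAME} !-f` guard finds no file on disk.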
    

    Any other tips or tricks you use to keep the clutter in your log files to a minimum?

    I am Impossibly Stupid. Hailed by @jarland as an "incessantly belligerent buffoon." Available for parties. Book early to avoid disappointment.

    Comments

    • edited June 2016

      Googlebot is not drunk. Those requests are done on purpose.

      Google is testing your site for a proper HTTP response code to non-existing files/documents.

    • rds100 Member

      You are complaining about one request per day?


    • globalRegisters said: Google is testing your site for a proper HTTP response code to non-existing files/documents.

      To what end?

    • @Abdussamad said:

      globalRegisters said: Google is testing your site for a proper HTTP response code to non-existing files/documents.

      To what end?

      I'm not sure. Maybe testing for redirections?

      This is nothing new; Googlebot has been making these odd filename requests for years.

    • Damian Member

      This reminds me of people who claim that their VPS is generating extreme load because Google is crawling their site.

    • ricardo Member
      edited June 2016

      What globalregisters said.

      Google for 'soft 404'. Much easier for Google to check whether you return a 404 when it's nearly certain you should, rather than attempting to guess by the wordage of your page and (non-404) HTTP response.

      What it basically means is that Google has a lower confidence level about the content that is served, because it can't be sure it's some fancy (and possibly temporary) error page or a document that's useful to satisfy a user's query.
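
The probing idea described above can be sketched in a few lines of Python. This is only a guess at how a crawler might reason, not Google's documented behavior: `random_probe_path` and `classify_probe` are hypothetical names, and treating every 2xx answer as a possible soft 404 is an assumption.

```python
import random
import string

def random_probe_path(rng: random.Random, length: int = 12) -> str:
    """Build a path that almost certainly does not exist on the site."""
    letters = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"/{letters}.html"

def classify_probe(status: int) -> str:
    """Interpret the status code a site returned for a made-up path."""
    if status in (404, 410):
        return "hard 404"            # the site clearly says "nothing here"
    if 200 <= status < 300:
        return "possible soft 404"   # success reported for a made-up URL
    if 300 <= status < 400:
        return "redirect"            # need to follow it to learn more
    return "other"

rng = random.Random()
print(random_probe_path(rng))
for status in (404, 200, 204, 301, 500):
    print(status, classify_probe(status))
```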

    • @globalRegisters said:
      Googlebot is not drunk. Those requests are done on purpose.

      Never said they had no purpose . . . for Google. For me, they're just an annoying "error" that was getting logged, so I decided to change that and thought I'd share.

      Google is testing your site for a proper HTTP response code to non-existing files/documents.

      I'd argue that 204 is more proper than 404 here.


    • @rds100 said:
      You are complaining about one request per day?

      No, I'm opening a discussion about all kinds of log entries that get generated by poorly written spiders. If the error threshold that gets your attention is higher than mine, I still welcome you to share the techniques you use to stop them in their tracks.


    • impossiblystupid said: No, I'm opening a discussion about all kinds of log entries that get generated by poorly written spiders. If the error threshold that gets your attention is higher than mine, I still welcome you to share the techniques you use to stop them in their tracks.

      For Googlebot and all other robots.txt-obeying spiders:

      User-agent: *
      Disallow: /
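
For anyone who wants to check what those two directives do before deploying them, here is a small sketch using Python's standard `urllib.robotparser`. It assumes the rules are written on two separate lines; `ROBOTS_TXT` and `is_allowed` are illustrative names, and the probe path is one of the examples from the original post.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule from the post, written as two lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Ask the standard-library parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

# With the blanket rule, every path is off limits to obedient spiders.
print(is_allowed(ROBOTS_TXT, "Googlebot", "/yrjclqajwyshc.html"))
# With no rules at all, the same path is allowed.
print(is_allowed("", "Googlebot", "/yrjclqajwyshc.html"))
```

Keep in mind that this blocks all crawling by well-behaved bots, not just the random probes.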

    • edited June 2016

      @impossiblystupid said:
      I'd argue that 204 is more proper than 404 here.

      You can take whatever stand you want.

      If you don't care about Google's opinion of your site, then serve whatever suits you.
      The fact remains that Googlebot wants to see a 404 in this situation.

    • @globalRegisters said:

      @impossiblystupid said:
      I'd argue that 204 is more proper than 404 here.

      You can take whatever stand you want.

      It shouldn't be about taking a stand, but deciding to do the correct thing. It is wrong for Google to be making up random URLs it has every reason to think will not lead to content. It is right to respond to them with a "no content" result.

      The fact remains that Googlebot wants to see a 404 in this situation.

      Do you have a documented reference for that "fact"? I can see why they might not want a 200 response (i.e., a soft 404). But a 204 should be seen as an even better response than a 404 to a request for content that is known to not exist.


    • Clouvider Member, Provider

      Isn't that the webmaster tools verification file?

      They probe it to see if a particular account/token is still allowed access.


    • Rallias Member, Provider

      impossiblystupid said: But a 204 should be seen as an even better response than a 404 to a request for content that is known to not exist.

      Except the 200 series of status codes indicates an acceptable response, while the 400 series indicates that the client was somehow wrong to make the request it did.

    • impossiblystupid said: Do you have a documented reference for that "fact"? I can see why they might not want a 200 response (i.e., a soft 404). But a 204 should be seen as an even better response than a 404 to a request for content that is known to not exist.

      The RFC would seem to disagree with you; regarding 204 it states:

      The 204 (No Content) status code indicates that the server has successfully fulfilled the request and that there is no additional content to send in the response payload body.

      Whereas 404 states:

      The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

      I get what you're saying... Google knows the request is not valid so if your server knows that Google knows, 204 should be its response. But your server doesn't know and Google (or anything, for that matter) is just picking a random URL so your server should respond appropriately, with a 404.

      I interpret 204 to be something like this: You have a WYSIWYG editor with a user-configurable toolbar, the configuration of said toolbar is stored server-side. When the user makes a change, a request is placed to the server to store that information; if the request succeeds that's 204--everything worked but the server has nothing else to say. If it fails, that's 4xx or 5xx, depending on why. (The above-linked RFC also provides a similar example for 204).
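
The toolbar scenario above can be sketched with Python's built-in `http.server`. Everything named here (`SketchHandler`, the `/toolbar` path, `TOOLBAR_SETTINGS`, `run_demo`) is made up for illustration; it is a toy under those assumptions, not how any particular editor works.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store standing in for server-side toolbar settings.
TOOLBAR_SETTINGS = {}

class SketchHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        if self.path != "/toolbar":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        TOOLBAR_SETTINGS["layout"] = self.rfile.read(length).decode("utf-8")
        # Saved successfully and nothing more to say: 204 with no body.
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        # No GET resources exist in this sketch, so every path is a 404.
        self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

def request_status(port, method, path, body=None):
    """Send one request to the local server; return (status, payload)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, path, body=body)
    response = conn.getresponse()
    payload = response.read()
    conn.close()
    return response.status, payload

def run_demo():
    """Start the server on a free port, save a setting, then probe a bogus URL."""
    server = HTTPServer(("127.0.0.1", 0), SketchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        saved = request_status(port, "PUT", "/toolbar", body=b"bold,italic")
        missing = request_status(port, "GET", "/yrjclqajwyshc.html")
    finally:
        server.shutdown()
        server.server_close()
    return saved, missing
```

Calling `run_demo()` returns the pair of results: the save gets a 204 with an empty payload, while the request for a page that was never defined gets a 404 with the server's default error page.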

    • edited June 2016

      @Clouvider said:
      Isn't that the webmaster tools verification file?

      They probe it to see if a particular account/token is still allowed access.

      No it's not. What you are referring to is a file to verify you are in control of the website content for a domain.

      This is a request for a web page that should not exist on the site and thus is expecting a 404.

    • quadhost Member, Provider

      I'd respond with a 404 not a 204.


    • @JustAMacUser said:
      The RFC would seem to disagree with you

      I was aware of the RFC definitions before I picked that response code. It doesn't really disagree with me, either; its wording is, at best, poorly chosen. I mean, I am successfully fulfilling the request without additional content, therefore a 204. I did find the "current representation" for Google's request (a lot of nothing :-), and I'm fully disclosing that it is nothing, so it's not a 404.

      Google knows the request is not valid so if your server knows that Google knows, 204 should be its response. But your server doesn't know

      But it does now. I put the directive there myself! I get that the fallback position should not be a 200 (in most cases, anyway), but there are many other completely valid ways to handle such a URL than just kicking out a 404.

      a request is placed to the server to store that information; if the request succeeds that's 204

      It certainly can be, and maybe should be. But I'll wager that for modern web services, you actually get a lot more 200 responses to that than 204's.

      To me it comes down to this: is Google broken or operating as designed when it sends these bad requests? If it is broken, it should of course get a 404 back so they can fix their spider (and/or I can fix my server). By all accounts, though, it is operating as intended when it intentionally spiders nonexistent URLs, so it really should be getting a 2XX response of some kind on a well-run server.

      By extension, if we look at "common" files that are expected to exist on most servers (like robots.txt or favicon.ico or any of the newer types that Apple is assuming everyone should start using), what is the "proper" response for them? They're nothing special, so they result in a 404 like anything else that isn't found. But not having them isn't really an error. And the solution to people who don't want them (and don't want the pseudo-error logged) should not be to create empty files all over the place and then return a 200. If I know what they want and I know I don't have it, what is a better response than a 204?


    • impossiblystupid said: If I know what they want and I know I don't have it, what is a better response than a 204?

      I see where you're coming from. At the same time I think the answer most people would respond to your question with is 404.

    • sin Member
      edited June 2016

      I just let the Googlebots do their thing because I get a lot of good traffic from Google and don't really want to mess it up.

    • joepie91 Member, Provider
      edited September 2016

      impossiblystupid said: I was aware of the RFC definitions before I picked that response code. It doesn't really disagree with me, either; its wording is, at best, poorly chosen. I mean, I am successfully fulfilling the request without additional content, therefore a 204.

      The intention of a 204 status code is generally to indicate that you've successfully fulfilled a non-idempotent request (think, e.g., adding an item through an API), but there's nothing more to send back other than "yeah, that worked".

      A 204 is explicitly not meant to indicate that "I don't have anything for this URL" - that's what a 404 is for. A 204 is meant when you do have something for the URL, it just doesn't have a response payload associated with it. Hence its use in non-idempotent requests, where that "something" is a request handler.

    • @joepie91 said:
      A 204 is explicitly not meant to indicate that "I don't have anything for this URL" - that's what a 404 is for.

      I don't see that as being the clear intent of the RFC, as I noted in my reply to JustAMacUser. You're welcome to quote and parse the specific wording you think supports your case. I maintain that if the client isn't making a request in good faith, it is perfectly reasonable for my server to respond with not an error, but "I see what you're trying to do, and . . ."

      [Image: Willy Wonka meme, "You get nothing"]

      A 204 is meant when you do have something for the URL, it just doesn't have a response payload associated with it.

      That is exactly what I have.


    • Sadly there isn't a response code for "I know what you're doing but disagree with your methodology."

    • jar Provider
      edited September 2016

      @Damian said:
      This reminds me of people who claim that their VPS is generating extreme load because Google is crawling their site.

      Totally happens. Load up Wordpress with a crappy theme filled with more ajax than anything ever should, install 150 plugins, 2 or more of which should be for ecommerce, 3 for security. Should blow a couple of CPU cores on every page load ;)

      I wish I couldn't say that I've worked on this theoretical site many many times.

    • JustAMacUser said: Sadly there isn't a response code for "I know what you're doing but disagree with your methodology."

      A 418 might be an appropriate response for an impasse.

    • Razza Member
      edited September 2016

      impossiblystupid said: I maintain that if the client isn't making a request in good faith, it is perfectly reasonable for my server to respond with not an error

      OK, I don't understand your logic. If a page doesn't exist, it's a 404, not a 204, as 2xx codes are for success.

      You don't have to be an arse with your Willy Wonka meme, as @joepie91 was only trying to point out your flawed logic.

    • joepie91 Member, Provider
      edited September 2016

      impossiblystupid said: I don't see that as being the clear intent of the RFC

      That's because 1) the RFCs were not very detailed to begin with (in part because HTTP is meant to be generically usable), 2) they were not correctly implemented everywhere, and 3) the de facto implementations mostly decided the "real" meaning of HTTP status codes.

      The specifications are quite clear about the intent of the 404 status code:

      The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address.

      Logically speaking, and according to the usual rule of specifications that "more specific rules trump less specific rules", that means that for a non-existent resource, you are to use 404 unless it should be a 410 instead - and the 204 status code is meant for cases where ...

      The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation.

      ... but excluding the cases where the resource is not found, since that is already covered by 404. Further, "fulfilled" here means "processing the request by serving the requested resource". If a resource is not found, you cannot do that.
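
The precedence argued for here can be written down as a small decision function. This Python sketch is one reading of the post, not text from the RFC; the function name, the three sets, and the example paths other than the probe URL are hypothetical.

```python
def choose_status(path, served_paths, bodiless_handlers, removed_paths):
    """Pick a status code for a request path under the rule of thumb above."""
    if path in served_paths:
        return 200  # resource exists and has a payload
    if path in bodiless_handlers:
        return 204  # a handler exists, but it has no payload to return
    if path in removed_paths:
        return 410  # known to be permanently gone
    return 404      # nothing matches the request URI

# Illustrative data only.
SERVED = {"/", "/index.html"}
BODILESS = {"/api/ping"}
REMOVED = {"/old-blog.html"}

for path in ["/index.html", "/api/ping", "/old-blog.html", "/ysveybimgdu.html"]:
    print(path, choose_status(path, SERVED, BODILESS, REMOVED))
```

Under this reading a 204 is only reachable when something is registered for the path, so the random probe URL falls through to 404.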

      impossiblystupid said: By extension, if we look at "common" files that are expected to exist on most servers (like robots.txt or favicon.ico or any of the newer types that Apple is assuming everyone should start using), what is the "proper" response for them? They're nothing special, so they result in a 404 like anything else that isn't found. But not having them isn't really an error.

      Yes, it is. The client requested a resource that does not exist. That's a client error.

      impossiblystupid said: To me it comes down to this: is Google broken or operating as designed when it sends these bad requests? If it is broken, it should of course get a 404 back so they can fix their spider (and/or I can fix my server). By all accounts, though, it is operating as intended when it intentionally spiders nonexistent URLs, so it really should be getting a 2XX response of some kind on a well-run server.

      It's not your job to try and determine the intentions of a client. The whole point of how HTTP is designed, is that it's to be both server-neutral and client-neutral - stateless messages that provide only objective information, and it's up to the receiving party to interpret it according to its own expectations.

      Hence, you return a 404 for "not found", and let Googlebot worry about what a 404 means for its purpose.


      Look, you can try to redefine status codes and "well, technically" your way through it, but the reality is that you are going to break shit, because you're violating the expectations the clients have. Just stick with the spec, and when the spec is unclear, work on understanding what the spirit of the specification is, and how it is commonly implemented. There's really no discussion to be had here.

    • Since Googlebot is drunk, drink with it!

    • joepie91 Member, Provider

      @JustAMacUser said:
      Sadly there isn't a response code for "I know what you're doing but disagree with your methodology."

      There is! Well, sort of:

      The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions.

    • 204 says no content. If you serve any HTML, you're doing it wrong. A 204 should be a white page only. 404 is the correct usage for not finding a client-specified page.

    • @JustAMacUser said:
      Sadly there isn't a response code for "I know what you're doing but disagree with your methodology."

      There must be, unless you're suggesting that the server just hang and wait for the timeout.

      @ricardo said:
      A 418 might be an appropriate response for an impasse.

      That's still an error that gets logged on my end. The only error I should see is one I can fix, or one that is legitimately a problem for the end user.

      @Razza said:
      OK, I don't understand your logic. If a page doesn't exist, it's a 404, not a 204, as 2xx codes are for success.

      Exactly. It is by all measures a success if a request comes in for a URL where no content is expected, and I return just that. There are all sorts of things you can reasonably return other than a 404 when a "bad" request is received, the most common probably being a redirect.

      You don't have to be an arse with your Willy Wonka meme, as @joepie91 was only trying to point out your flawed logic.

      It's just a graphical depiction of what my server is telling Google, not any sort of attack on joepie.

      @joepie91 said:
      The specifications are quite clear about the intent of the 404 status code:

      The server has not found anything matching the Request-URI.

      Exactly why 404 doesn't apply here. I did find a match for Google's fabricated URLs: no content.

      and the 204 status code is meant for cases where ...

      The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation.

      Exactly the situation outlined in my original post (here's your nothing, and have a cookie). So either you're agreeing that 204 is the correct response, or you're suggesting that some other "more specific" 2xx code is appropriate. I'm all ears regarding what you think is the correct non-error result to give.

      ... but excluding the cases where the resource is not found, since that is already covered by 404. Further, "fulfilled" here means "processing the request by serving the requested resource". If a resource is not found, you cannot do that.

      Again, I did find the resource. It was nothing. I returned that nothing.

      Let me ask you to respond to what you think is the correct way to deal with robots.txt like I mentioned previously. If I have nothing to tell the spiders, I can either leave it returning a 404 error, create an otherwise empty file returning a 200, or give a 204 like I do for these other requests. What do you think is the best practice for all parties involved?

      Yes, it is. The client requested a resource that does not exist. That's a client error.

      But Google isn't fixing their error. In fact, they intentionally wrote their spider to behave badly like that. I have my RFC-compliant workaround that keeps Google's problems out of my log file. What do you suggest is the better way to deal with the drunken Googlebot?

      It's not your job to try and determine the intentions of a client.

      Yes, it is. If I have a client coming in (here's another fabrication Google loves to do) looking for a /m/ or /mobile/ site, I absolutely can determine what their intention is in relation to my server, and consequently redirect them back to / if I have a responsive site (or a subdomain, or whatever else might reasonably fulfill their request). Everything a web server does beyond serving up static files is about trying to determine the intentions of a client.
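
As a rough illustration of the redirect idea in the paragraph above, here is a Python sketch of a routing function. The `/m/` and `/mobile/` prefixes come from the post; `route`, `MOBILE_PREFIXES`, and the choice of a 301 with a 404 fallback are assumptions made for the example.

```python
from typing import Optional, Set, Tuple

# Prefixes mentioned in the post; treated here as "client wants the mobile site".
MOBILE_PREFIXES = ("/m/", "/mobile/")

def route(path: str, existing: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (status, redirect target) for a request path."""
    if path in existing:
        return 200, None
    for prefix in MOBILE_PREFIXES:
        # Match "/m", "/m/", and anything underneath "/m/".
        if path == prefix.rstrip("/") or path.startswith(prefix):
            return 301, "/"  # responsive site: send them to the main page
    return 404, None

pages = {"/"}
for path in ["/", "/m/", "/mobile/news", "/music.html"]:
    print(path, route(path, pages))
```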

      Hence, you return a 404 for "not found", and let Googlebot worry about what a 404 means for its purpose.

      But I did find the resource they requested! It was nothing. I returned it with the proper 204 code. They're welcome to figure out what that means for their purpose, too.

      Look, you can try to redefine status codes and "well, technically" your way through it, but the reality is that you are going to break shit, because you're violating the expectations the clients have. Just stick with the spec, and when the spec is unclear, work on understanding what the spirit of the specification is, and how it is commonly implemented. There's really no discussion to be had here.

      I thought so, too, when it ended months ago. Yet here we are again. :-) I again refer you to the robots.txt example if you need something more concrete to consider; I'm not trying to "redefine" anything.


    • joepie91 Member, Provider

      If you want to pretend that you've found a resource that doesn't exist, then there's no discussion to be had here. This is precisely the "well, technically"ing that I was referring to. I have no intention of discussing here (since I already know what the correct answer is, and your arguments are nonsense), I'm just trying to explain it to you.

      If you want to ignore that, that's your call.

      impossiblystupid said: Let me ask you to respond to what you think is the correct way to deal with robots.txt like I mentioned previously. If I have nothing to tell the spiders, I can either leave it returning a 404 error, create an otherwise empty file returning a 200, or give a 204 like I do for these other requests. What do you think is the best practice for all parties involved?

      A 404.

    • @impossiblystupid said:

      @JustAMacUser said:
      Sadly there isn't a response code for "I know what you're doing but disagree with your methodology."

      There must be, unless you're suggesting that the server just hang and wait for the timeout.

      I was actually making a joke, but in this particular case the correct response to a request for a resource that is not in existence is 404.

    • rincewind Member
      edited September 2016

      Lol. Waiting for the day when AI bots start trolling each other #FutureofLET

    • @rincewind said:
      Lol. Waiting for the day when AI bots start trolling each other #FutureofLET

      Oh god, please tell me you haven't let the luggage into cyberspace?

    • @joepie91 said:
      If you want to pretend that you've found a resource that doesn't exist, then there's no discussion to be had here. This is precisely the "well, technically"ing that I was referring to. I have no intention of discussing here (since I already know what the correct answer is, and your arguments are nonsense), I'm just trying to explain it to you.

      And from my point of view, it is you who is pulling the "well, technically" card. I'm not in any way "pretending" to find a resource that doesn't exist. Whether it's an empty robots.txt file or any of Google's fabricated URLs (or all sorts of other similar files that clients just assume will exist), I have specifically implemented the process of finding them, and it turns out they lack any useful content, so I respond appropriately.

      impossiblystupid said: Let me ask you to respond to what you think is the correct way to deal with robots.txt like I mentioned previously. If I have nothing to tell the spiders, I can either leave it returning a 404 error, create an otherwise empty file returning a 200, or give a 204 like I do for these other requests. What do you think is the best practice for all parties involved?

      A 404.

      We simply have different standards for how a web site should be properly run. If I can fix an error, I do. I don't know why that approach seems to rub people the wrong way.


    • joepie91 Member, Provider

      impossiblystupid said: And from my point of view, it is you who is pulling the "well, technically" card.

      No. I'm telling you what clients expect.

      impossiblystupid said: I'm not in any way "pretending" to find a resource that doesn't exist. Whether it's an empty robots.txt file or any of Google's fabricated URLs (or all sorts of other similar files that clients just assume will exist), I have specifically implemented the process of finding them, and it turns out they lack any useful content, so I respond appropriately.

      I don't think you understand what "resource" means here...

      impossiblystupid said: We simply have different standards for how a web site should be properly run.

      Again - I'm telling you what clients expect. This isn't really a point of discussion. You can either implement it, or choose to break shit instead.

    • SplitIce Member, Provider

      Redirect your site to 127.0.0.1 and give Googlebot a ride home.

    • @impossiblystupid said:

      @joepie91 said:
      If you want to pretend that you've found a resource that doesn't exist, then there's no discussion to be had here. This is precisely the "well, technically"ing that I was referring to. I have no intention of discussing here (since I already know what the correct answer is, and your arguments are nonsense), I'm just trying to explain it to you.

      And from my point of view, it is you who is pulling the "well, technically" card. I'm not in any way "pretending" to find a resource that doesn't exist. Whether it's an empty robots.txt file or any of Google's fabricated URLs (or all sorts of other similar files that clients just assume will exist), I have specifically implemented the process of finding them, and it turns out they lack any useful content, so I respond appropriately.

      Mate - very simple. There is a difference between an empty set and an undefined set. 2xx indicates an empty set, a valid file that has been processed and served. 404 indicates there is no set defined, not just no content but no content container. If someone tells you to go get a box in the corner would you return after not finding it and tell them it was empty? Only if it was a practical joke.

    • 204 means there is no content to return, or nothing served apart from the header.

      https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

      The 204 response MUST NOT include a message-body, and thus is always terminated by the first empty line after the header fields

      ^^^^^

      If you serve anything other than a blank page with that 204, you're using it wrong. You should be using a 404. The client requested something that doesn't exist.
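
The "no message-body" requirement quoted above is easy to check mechanically. The Python sketch below works on raw response bytes so it needs no network access; `split_response`, `is_well_formed_204`, and both sample responses are invented for illustration.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status code, body bytes)."""
    # Headers end at the first empty line; everything after it is the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    # The status line looks like: HTTP/1.1 204 No Content
    status = int(status_line.split(" ", 2)[1])
    return status, body

def is_well_formed_204(raw: bytes) -> bool:
    """True only for a 204 that ends right after its header block."""
    status, body = split_response(raw)
    return status == 204 and body == b""

# Made-up responses: one header-only 204, one that wrongly carries HTML.
good = b"HTTP/1.1 204 No Content\r\nSet-Cookie: google=stop_your_404_probing\r\n\r\n"
bad = b"HTTP/1.1 204 No Content\r\n\r\n<html>Nothing here</html>"

print(is_well_formed_204(good))
print(is_well_formed_204(bad))
```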

    • @impossiblystupid said:
      Any other tips or tricks you use to keep the clutter in your log files to a minimum?

      I would disable the logging altogether if I can't deal with the log itself.

      Sorry for my bad English

    • k0nsl Member, Member without signature

      Jeez...what a thick ****.

    • @joepie91 said:

      impossiblystupid said: And from my point of view, it is you who is pulling the "well, technically" card.

      No. I'm telling you what clients expect.

      But you had just previously said "It's not your job to try and determine the intentions of a client." Are you changing your tune on that point?

      I don't think you understand what "resource" means here...

      And I don't think you understand the usefulness of a null return value. Your JavaScript coding advice must be horrible! :-)

      Again - I'm telling you what clients expect. This isn't really a point of discussion. You can either implement it, or choose to break shit instead.

      Absolutely nothing should break if a client gets back a 204 response for a resource instead of a 404 or a 200 or a 301. That is especially true if the URL was completely fabricated in the first place, so the client should have no expectation on what gets returned.

      @mycosys said:
      If someone tells you to go get a box in the corner would you return after not finding it and tell them it was empty? Only if it was a practical joke.

      Or if it was a crazy drunk who was constantly asking me to get their box. Or a child. I would absolutely pantomime giving them an invisible box. My mind is still flexible to that kind of lateral thinking.

      @exussum said:
      If you serve anything other than a blank page with that 204.

      The original post makes it clear that I'm doing just that, so I don't know what you're going on about here. You could have even tested it directly with one of the example URLs. You'll find you get back a big fat Content-Length: 0.


    • Google is pretty lenient with status codes either way, it's either going to see "content that it will potentially index from a 200 status code, or a 30* redirect pointing to another potentially indexable resource", otherwise it's not going to do much else with it. If it continually sees errors of some kind, it may affect the crawl budget of your site.

      One way to 'end the argument' is to verify your site in Webmaster tools and see if there are errors being reported wrt your 204 responses. I suspect not. TBH, the consensus would be to serve a 404 but in practical terms, I don't think it's going to make a difference serving a 204.

    • daily Member
      edited September 2016

      @exussum said:
      204 says no content. If you serve any HTML, you're doing it wrong. A 204 should be a white page only. 404 is the correct usage for not finding a client-specified page.

      This. Strange this was ignored by the OP. (edit: It wasn't, my bad.)

      If you are sending even a page that notes "this page or content does not exist", then it isn't a 204, it is a 404.

      impossiblystupid said: You could have even tested it directly with one of the example URLs. You'll find you get back a big fat Content-Length: 0.

      Can't really when we don't have the website you're talking about.

    • @daily said:

      impossiblystupid said: You could have even tested it directly with one of the example URLs. You'll find you get back a big fat Content-Length: 0.

      Can't really when we don't have the website you're talking about.

      It's the one in my signature. Or the directives given in the original post could just be dropped into your own site config to test the results.

