Channel: Recent Questions - Stack Overflow

Google Cloud Run: Random 504 HTTP Errors


We're currently running a Java 11 application with Jetty 10 on Cloud Run and have run into an unusual issue. While the service generally operates smoothly, a small fraction of POST requests (roughly 1,000 of the 3,000,000 processed in the last month) fail mysteriously. These requests fail with either a 503 (malformed response) or a 504 (gateway timeout) error, most of them 504s. Strangely, the failing requests seem to sit idle without ever being picked up for execution by the instance they were routed to; eventually, they end up being executed successfully on a different instance. Moreover, all the failed requests share the same instance ID, and, even more bizarrely, that same instance also served some requests successfully in between the failures.
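One way to double-check that clustering is to export the request logs and count the 503/504 responses per instance ID. A minimal sketch, assuming the log entries have already been parsed into a hypothetical `RequestLogEntry` type (not a Cloud Logging API class) that holds only the instance ID and the HTTP status:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified request-log entry. The fields are assumptions;
// real entries would have to be parsed out of a Cloud Logging export first.
final class RequestLogEntry {
    final String instanceId;
    final int status;

    RequestLogEntry(String instanceId, int status) {
        this.instanceId = instanceId;
        this.status = status;
    }
}

final class FailureByInstance {
    // Counts 503 and 504 responses per instance ID (map sorted by instance ID).
    static Map<String, Integer> countGatewayErrors(List<RequestLogEntry> entries) {
        Map<String, Integer> counts = new TreeMap<>();
        for (RequestLogEntry e : entries) {
            if (e.status == 503 || e.status == 504) {
                counts.merge(e.instanceId, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Share of all 503/504 responses that landed on the single worst instance.
    static double shareOfWorstInstance(List<RequestLogEntry> entries) {
        int total = 0;
        int worst = 0;
        for (int count : countGatewayErrors(entries).values()) {
            total += count;
            worst = Math.max(worst, count);
        }
        return total == 0 ? 0.0 : (double) worst / total;
    }
}
```

A share close to 1.0 would confirm that the failures are tied to one instance rather than spread across the fleet.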

Regarding the 503 malformed-response errors, we've consulted Google's troubleshooting documentation and ruled out memory issues and application-level timeouts. We haven't identified any downstream network bottlenecks either, and our request rate, averaging 15 requests per second, remains well below Google's specified limits.

For the 504 gateway timeout errors, the error message indicates that the request reached the maximum request timeout, yet these requests never seem to actually reach the instance: they don't produce any logs indicating that they were executed.
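To back up the observation that these 504s never reach the instance, the request logs can be correlated with the application logs by trace ID. Cloud Run adds an `X-Cloud-Trace-Context` header whose value starts with the trace ID; assuming the application logs that ID as soon as a request enters its code, a 504 whose trace ID never appears in the application logs is a candidate for a request that was never handed to the container. A minimal sketch, assuming both log sources were already exported and parsed; `GatewayEntry` and `UnservedRequests` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified Cloud Run request-log entry: trace ID plus status.
final class GatewayEntry {
    final String traceId;
    final int status;

    GatewayEntry(String traceId, int status) {
        this.traceId = traceId;
        this.status = status;
    }
}

final class UnservedRequests {
    // Extracts TRACE_ID from an "X-Cloud-Trace-Context" value of the form
    // "TRACE_ID/SPAN_ID;o=OPTIONS". Returns null for a missing header.
    static String traceIdFromHeader(String header) {
        if (header == null || header.isEmpty()) {
            return null;
        }
        int end = header.length();
        int slash = header.indexOf('/');
        if (slash >= 0) {
            end = slash;
        }
        int semicolon = header.indexOf(';');
        if (semicolon >= 0 && semicolon < end) {
            end = semicolon;
        }
        return header.substring(0, end);
    }

    // Returns the trace IDs of 504 responses that never showed up in the
    // application's own logs, i.e. requests that may never have reached the code.
    static List<String> find504sWithoutAppLogs(List<GatewayEntry> gatewayLog,
                                               Set<String> traceIdsSeenByApp) {
        List<String> unserved = new ArrayList<>();
        for (GatewayEntry e : gatewayLog) {
            if (e.status == 504 && !traceIdsSeenByApp.contains(e.traceId)) {
                unserved.add(e.traceId);
            }
        }
        return unserved;
    }
}
```

In a Jetty/servlet setup, the header value would be read with `HttpServletRequest.getHeader("X-Cloud-Trace-Context")` and logged as the very first step of a filter, so that any request reaching application code leaves a trace.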

Below are some related screenshots.

Liveness probe followed by 200 OK and 504 Gateway Timeout

504 HTTP errors

To address these issues, we've taken several steps:

  1. Configured readiness and liveness probes to determine instance health, which have generally functioned well. On one "faulty" instance, the liveness probe detected a problem after about an hour, which resulted in the termination of that instance.
  2. Monitored CPU and memory utilization, both of which appear healthy, with CPU consistently under 50% and no occurrences of out-of-memory errors.
  3. Ensured proper closure of resources, such as Google Cloud file storage and Redis client connections, to prevent resource leaks.
  4. Confirmed compliance with Google Cloud API quotas, ensuring that we're not hitting any limits.
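Since the liveness probe took about an hour to flag the faulty instance, one option is to back the liveness endpoint with an explicit stall check. A minimal sketch, assuming a hypothetical `StallDetector` that the application would call at the start and end of every request (for example from a servlet filter) and consult from its liveness endpoint. It only sees requests that reach application code, so it cannot detect requests stuck before that point; it only makes stalled in-flight work visible sooner. The threshold has to be longer than the slowest legitimate request, otherwise a single long request running alone would be flagged.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical stall detector a liveness endpoint could consult. It reports
// "unhealthy" when requests are in flight but none has completed (and no new
// busy period has started) for longer than the threshold.
// Simplified: the two counters are not updated atomically together.
final class StallDetector {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicLong lastProgressMillis;
    private final long stallThresholdMillis;
    private final LongSupplier clockMillis;

    StallDetector(long stallThresholdMillis, LongSupplier clockMillis) {
        this.stallThresholdMillis = stallThresholdMillis;
        this.clockMillis = clockMillis;
        this.lastProgressMillis = new AtomicLong(clockMillis.getAsLong());
    }

    // Call when application code starts handling a request.
    void requestStarted() {
        if (inFlight.getAndIncrement() == 0) {
            // A new busy period starts now; don't count the preceding idle time.
            lastProgressMillis.set(clockMillis.getAsLong());
        }
    }

    // Call when a request completes, ideally from a finally block.
    void requestFinished() {
        lastProgressMillis.set(clockMillis.getAsLong());
        inFlight.decrementAndGet();
    }

    // What the liveness endpoint would report (e.g. 200 if true, 503 if false).
    boolean isHealthy() {
        if (inFlight.get() == 0) {
            return true;
        }
        return clockMillis.getAsLong() - lastProgressMillis.get() <= stallThresholdMillis;
    }
}
```

In production the clock would simply be `System::currentTimeMillis`, e.g. `new StallDetector(60_000, System::currentTimeMillis)`; injecting it keeps the logic testable.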

Cloud Run Configuration

  - CPU allocation: CPU is always allocated
  - Startup CPU boost: Enabled
  - Concurrency: 80
  - Request timeout: 1800 seconds
  - Execution environment: Second generation
  - Autoscaling: Enabled
  - Min instances: 12
  - Max instances: 100
  - CPU limit: 4
  - Memory limit: 8 GB
  - Session affinity: Enabled
  - HTTP/2: Disabled
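As a rough sanity check on the load, Little's law (in-flight requests = arrival rate × mean latency) says that 12 minimum instances at a concurrency of 80 offer 12 × 80 = 960 concurrent slots, so at an average of 15 requests per second the mean request latency would have to reach 960 / 15 = 64 seconds before even the minimum instances were saturated. This ignores bursts and uneven distribution across instances, so it is only an average-case bound. A small sketch of that arithmetic (the class name is illustrative only):

```java
// Back-of-the-envelope capacity check using Little's law (L = lambda * W).
final class CapacityCheck {
    // Concurrent request slots offered by the minimum number of instances.
    static int minConcurrentSlots(int minInstances, int concurrencyPerInstance) {
        return minInstances * concurrencyPerInstance;
    }

    // Mean latency in seconds at which the given arrival rate would occupy
    // every slot: W = L / lambda, with L = slots.
    static double saturatingLatencySeconds(double requestsPerSecond, int slots) {
        return slots / requestsPerSecond;
    }
}
```

With the values above, `saturatingLatencySeconds(15.0, minConcurrentSlots(12, 80))` evaluates to 64.0.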

Furthermore, we use a VPC connector configured to route only traffic destined for private IPs through the VPC, and we remain within all specified thresholds.

We would really appreciate any insight into what might be causing these issues.

Best regards.

