I have a Python script that downloads an image from a URL and uploads it to AWS S3. The script works perfectly when I run it on my local machine. However, when I deploy and run the same script on an AWS EC2 instance, I encounter a ReadTimeout error.
The error I'm receiving is as follows:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.net-a-porter.com', port=443): Read timed out. (read timeout=100)
Below is the relevant part of my code:
import requests
import tempfile
import os


def upload_image_to_s3_from_url(self, image_url, filename, download_timeout=120):
    """Downloads an image from the given URL to a temporary file, uploads
    it to AWS S3, and returns the S3 file URL."""
    try:
        headers = {
            "User-Agent": (
                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
            ),
            "Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
        }

        # Request the image
        response = requests.get(
            image_url, timeout=download_timeout, stream=True, headers=headers
        )
        response.raise_for_status()

        # Determine the content type, defaulting to image/jpeg
        content_type = response.headers.get("Content-Type", "image/jpeg")

        # Stream the response content into a temporary file
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            for chunk in response.iter_content(chunk_size=8192):
                tmp_file.write(chunk)

        # Now that we have the image locally, upload it to S3 with the correct content type
        file_url = self.upload_image_to_s3(tmp_file.name, filename, content_type)

        # Delete the temporary file (required since delete=False)
        os.unlink(tmp_file.name)

        return file_url
    except requests.RequestException as e:
        raise Exception(f"Failed to download or upload image. Error: {e}")


# Example URL causing issues
image_url = "https://www.net-a-porter.com/variants/images/1647597326276381/in/w1365_a3-4_q60.jpg"
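For context, the upload_image_to_s3 helper it calls is a thin boto3 wrapper, roughly like this (simplified sketch: the bucket name and the returned URL format are placeholders, not my real config):

import boto3

def upload_image_to_s3(self, local_path, filename, content_type):
    # Simplified sketch of my helper; "my-bucket" is a placeholder.
    s3 = boto3.client("s3")
    s3.upload_file(
        local_path,
        "my-bucket",
        filename,
        ExtraArgs={"ContentType": content_type},
    )
    return f"https://my-bucket.s3.amazonaws.com/{filename}"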
This issue occurs when trying to download an image from www.net-a-porter.com. The timeout is set to 120 seconds, which I assumed would be more than enough.
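One detail from the requests docs that may matter: a scalar timeout applies separately to the connect and read phases, and the read timeout fires when no bytes arrive for that long, which matches the traceback above. requests also accepts a (connect, read) tuple, which is how I separated the two phases while testing (values here are illustrative):

import requests

image_url = "https://www.net-a-porter.com/variants/images/1647597326276381/in/w1365_a3-4_q60.jpg"

# (connect timeout, read timeout) in seconds; the values are illustrative.
# The read timeout fires when no data is received for that long, which is
# exactly the error I see on EC2.
response = requests.get(image_url, timeout=(10, 120), stream=True)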
What I've tried so far:
- Increasing the timeout duration
- Changing the User-Agent in the request headers (see the sketch after this list)
- Running the script at different times of the day to rule out server load issues
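As an example of the header change, one of the variants I tested looked like this (the exact User-Agent string is illustrative; I tried several):

import requests

image_url = "https://www.net-a-porter.com/variants/images/1647597326276381/in/w1365_a3-4_q60.jpg"

# One header variant I tested; the User-Agent string is illustrative.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
}
response = requests.get(image_url, headers=headers, timeout=120, stream=True)
response.raise_for_status()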
Any insights or suggestions on how to resolve this issue would be greatly appreciated.