I'm having trouble extracting text from images, even after preprocessing them. Here are the details of my situation:
I'm using Tesseract OCR to extract text from images. The issue is that I'm not getting any results, even though I preprocess the images first. Could you please help me identify the problem?
I've implemented the following logic:

    import cv2
    import pytesseract
    import re


    def preprocess_image(img_path, save_path=None):
        try:
            img = cv2.imread(img_path)
            if img is None:
                raise Exception(f"Error loading image: {img_path}")
            img_scaled = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), None, fx=3.0, fy=3.0)
            _, binary = cv2.threshold(
                cv2.morphologyEx(img_scaled, cv2.MORPH_CLOSE,
                                 cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))),
                0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
            if save_path:
                cv2.imwrite(save_path, binary)
            return binary
        except Exception as e:
            print(f"Error in image preprocessing: {str(e)}")
            return None


    def extract_text(image_path, save_path=None):
        try:
            preprocessed_img = preprocess_image(image_path, save_path)
            if preprocessed_img is None:
                raise Exception("Image preprocessing failed")
            custom_config = r'--oem 3 --psm 6'
            text = pytesseract.image_to_string(preprocessed_img, config=custom_config)
            return text.strip()
        except Exception as e:
            print(f"Error extracting text: {str(e)}")
            return ""


    def extract_simple(text):
        if len(text) < 4:
            pattern = re.compile(r'\d+')
            matches = pattern.findall(text)
            return f"Bolt {matches[0]} kr" if matches else "Bolt - kr"
        else:
            return "Bolt - kr"


    # Usage example:
    image_path = "area_1.jpg"  # Path to the image you want to process

    # Extract text from the image
    extracted_text = extract_text(image_path)

    # Process the extracted text
    processed_text = extract_simple(extracted_text)

    # Print the processed text
    print("Processed Text:")
    print(processed_text)

What I tried: running Tesseract OCR on the images after preprocessing them. What I expected: extracted text that accurately matches the content of the images. What actually happened: I got no text extraction results at all, even with the preprocessing in place.
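To separate the OCR step from the post-processing step, here is a minimal sketch that calls `extract_simple` on hand-written strings, with no image or Tesseract involved. The sample strings are hypothetical stand-ins, not actual Tesseract output. It suggests that any input of four or more characters falls through to the default "Bolt - kr", so the raw return value of `extract_text` may be worth printing with `repr()` before it is processed.

```python
import re


def extract_simple(text):
    # Same logic as in the question: digits are only searched for
    # when the input is shorter than 4 characters.
    if len(text) < 4:
        matches = re.findall(r'\d+', text)
        return f"Bolt {matches[0]} kr" if matches else "Bolt - kr"
    return "Bolt - kr"


# Hypothetical sample strings (assumptions, not real OCR output).
samples = ["12", "", "Bolt 12 kr", "12\n34"]

for sample in samples:
    # repr() makes empty strings and embedded newlines visible.
    print(f"{sample!r:>14} -> {extract_simple(sample)}")
```

If the raw OCR string turns out to be non-empty but longer than three characters, the "missing" result may come from the length check rather than from Tesseract itself.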