Welcome to Ed2Ti Blog.

My journey on IT is here.

Programing - Python

PDF OCR with Python

OCR (Optical Character Recognition) is the process of converting scanned images, PDFs, or other documents into editable text. In Python, there are several libraries available for OCR, including PyOCR, Tesseract OCR, and OCRopus. In this post, we will use the PyOCR, Pillow, and pdf2image libraries to perform OCR on a PDF file.

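First, we need to install the required libraries. You can install PyOCR, Pillow, and pdf2image using pip:

pip install pyocr pillow pdf2image

PyOCR is only a wrapper, so an OCR engine such as Tesseract must also be installed on the system, and pdf2image relies on the Poppler utilities to read PDF files.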

Next, we will write the code to perform OCR on the PDF file. Here is some sample code:

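import sys
import pyocr
import pyocr.builders
from pdf2image import convert_from_path

# Path of the PDF file
pdf_path = 'example.pdf'

# Convert each PDF page to a PIL Image object
pages = convert_from_path(pdf_path)

# Pick the first OCR tool that PyOCR can find (for example, Tesseract)
tools = pyocr.get_available_tools()
if not tools:
    sys.exit("No OCR tool found. Install Tesseract first.")
tool = tools[0]

# Prefer English if it is installed; otherwise use the first language found
langs = tool.get_available_languages()
lang = 'eng' if 'eng' in langs else langs[0]

for page in pages:
    # Each page is already a PIL Image, so it can be passed directly
    txt = tool.image_to_string(
        page,
        lang=lang,
        builder=pyocr.builders.TextBuilder()
    )
    print(txt)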

In this code, we first convert the PDF file to PIL Image objects using the pdf2image library. Then, we loop through each page of the PDF and perform OCR using the PyOCR library. Finally, we print the extracted text from each page.

Note that the OCR accuracy depends on the quality of the scanned image, the language of the text, and the font used in the document. Therefore, you may need to experiment with different OCR engines, languages, and settings to get the best results for your specific use case.
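One setting that is often worth experimenting with is the image itself. The sketch below is only an illustration, not part of the original code: it assumes a hypothetical helper named preprocess_for_ocr and an arbitrary threshold of 128, and it uses Pillow to turn a page into a plain black-and-white image before it is handed to the OCR tool.

```python
from PIL import Image


def preprocess_for_ocr(image, threshold=128):
    """Return a black-and-white copy of a PIL image (hypothetical helper)."""
    # Convert to 8-bit grayscale so every pixel is a single brightness value
    gray = image.convert("L")
    # Binarize: pixels brighter than the threshold become white, the rest black
    return gray.point(lambda value: 255 if value > threshold else 0)


if __name__ == "__main__":
    # Small synthetic image used only to show the call; a real page would
    # come from convert_from_path() as in the code above
    sample = Image.new("RGB", (4, 4), (230, 230, 230))
    cleaned = preprocess_for_ocr(sample)
    print(cleaned.mode, cleaned.getpixel((0, 0)))
```

In the loop above, preprocess_for_ocr(page) could be passed to tool.image_to_string in place of page. Rendering the pages at a higher resolution, for example convert_from_path(pdf_path, dpi=300), is another setting to try; whether either change helps depends on the document.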

College: Trebas Institute
Professor: Iyad Koteich

It's impressive how quickly and easily you can start your own API solution using Flask and Python.

In this post, I want to share with you my experience using tools and concepts to create a simple API solution.

Tools:

  • Python (https://www.python.org/)
  • Flask (https://flask.palletsprojects.com/en/2.2.x/)
  • SQLite (https://www.sqlite.org/)
  • Postman (https://web.postman.co/)

In this RESTful API I'm using the HTTP verbs:
* POST - To insert information into a database.
* GET - To retrieve information from a database.
* PUT (I'm not using it yet) - To update information in a database.
* DELETE (I'm not using it yet) - To delete information from a database.

The next objective is to create a full CRUD API.
All the code is in my Git repository: https://github.com/ed2ti/python_flask_db
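To give an idea of how the POST and GET verbs map to Flask routes and SQLite queries, here is a minimal sketch. It is not the code from the repository: the /items route, the items table, its columns, and the database file name are all assumptions made up for this illustration.

```python
import os
import sqlite3
import tempfile

from flask import Flask, jsonify, request

# Hypothetical database file and table; not taken from the repository
DB_PATH = os.path.join(tempfile.gettempdir(), "flask_api_sketch.db")

app = Flask(__name__)


def get_connection():
    # Open a new SQLite connection whose rows can be read like dictionaries
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn


def init_db():
    # Create the example table the first time the app starts
    conn = get_connection()
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
        )
    conn.close()


@app.route("/items", methods=["POST"])
def create_item():
    # POST: insert information into the database
    data = request.get_json(silent=True) or {}
    name = data.get("name")
    if not name:
        return jsonify({"error": "name is required"}), 400
    conn = get_connection()
    with conn:  # commits the INSERT on success
        cursor = conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
        new_id = cursor.lastrowid
    conn.close()
    return jsonify({"id": new_id, "name": name}), 201


@app.route("/items", methods=["GET"])
def list_items():
    # GET: retrieve information from the database
    conn = get_connection()
    rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
    conn.close()
    return jsonify([dict(row) for row in rows])


init_db()

if __name__ == "__main__":
    app.run(debug=True)
```

With the app running, a tool such as Postman can send a POST request with a JSON body like {"name": "book"} to /items and then a GET request to the same URL to read the stored rows back.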

Today we wrote a small Python program to convert Celsius to Fahrenheit.

# ********* #
# College : Trebas Institute 
# Professor: Iyad Koteich
# Class : Edward
# Day: 03/10/2022
# ********* #

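def convert(gc):
    # Convert degrees Celsius (gc) to degrees Fahrenheit (gf)
    gf = (gc * 9 / 5) + 32
    return gf

def good(gf):
    # Good weather here means strictly between 60 and 90 degrees Fahrenheit
    if 60 < gf < 90:
        print(f"{gf} Fahrenheit is good weather")
    else:
        print(f"{gf} Fahrenheit is not good weather")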

def trebas():
    print("College : Trebas Institute") 
    print("Professor: Iyad Koteich")
    print("Class : Edward")
    print("Day: 03/10/2022")
    print("")

trebas()

# Keep asking for values until the user enters 0
while True:
    option = int(input("Value in Celsius: "))
    if option != 0:
        gf = convert(option)
        good(gf)
    else:
        print("Exiting")
        break
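A quick way to check the formula is to try it on a few well-known temperatures. The sketch below repeats the convert function so it can run on its own; is_good_weather is a hypothetical helper added for this example that returns True or False instead of printing.

```python
def convert(gc):
    # Same formula as in the program above: Celsius to Fahrenheit
    return (gc * 9 / 5) + 32


def is_good_weather(gf):
    # Hypothetical helper: same 60 to 90 range as good(), but returns a bool
    return 60 < gf < 90


if __name__ == "__main__":
    for celsius in (0, 20, 100):
        fahrenheit = convert(celsius)
        print(celsius, fahrenheit, is_good_weather(fahrenheit))
```

Water freezes at 0 Celsius (32 Fahrenheit) and boils at 100 Celsius (212 Fahrenheit), so those two values are easy reference points, and 20 Celsius (68 Fahrenheit) falls inside the good weather range.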