Tuesday 21 September 2021

Extract Text From PDF File Using Python

 To achieve what we want which is text extraction first of all we need to have the python package called PyPDF.

Installation of PyPDF

Open your terminal as administrator and type the command below.

pip install PyPDF2


Here is a glance of the PDF we input to the code for extraction.



Now let's see the python code to extract text from the pdf.


import PyPDF2 
    
pdfFileObject = open('mynote.pdf', 'rb') 
    
pdfReader = PyPDF2.PdfFileReader(pdfFileObject) 
    
print("Number of pages in the pdf : ",pdfReader.numPages)

print()
print("******************")
    
pageObject = pdfReader.getPage(0) 
    
print(pageObject.extractText()) 
    
pdfFileObject.close() 



Below is a snap of the output.




Let's look into the code in line by line.


pdfFileObject = open('mynote.pdf', 'rb')

Here we open the mynote.pdf in binary mode and saved the file object as pdfFileObject.


pdfReader=PyPDF2.PdfFileReader(pdfFileObject)

Now here we create an object of PdfFileReader class of PyPDF2 module.
Then we pass the pdf file object and get a pdf reader object.


print("Number of pages in the pdf : ",pdfReader.numPages)

numPages gives the exact number of pages in PDF file. In our case , it is 4.


pageObject = pdfReader.getPage(0) 

Here we create an object of PageObject class of PyPDF2 module.
PDF reader object has function getPage() which takes the page number starting from index 0.
Then it returns the page object.

print(pageObject.extractText())

Page object has the function extractText() to extract text from our PDF file.


pdfFileObject.close()

Finally we close the PDF file object.

Monday 6 September 2021

Zipping Files and Folders using Python

This is the python code to zip a file:

import zipfile
zip_file = zipfile.ZipFile('temp.zip','w')
zip_file.write('test.txt',compress_type=zipfile.ZIP_DEFLATED)
zip_file.close()


Before the code is run : 







After the code is run :



You can see 'temp.zip' is created which is the zipped file of 'test.txt'  .



Now let's look into the code line by line. 


Line 1 :

 zipfile is a class of zipfile module for reading and writing zip files.


Line 2 : 

Here zip file object is created(zip_file), 'temp.zip' is the name of the zip file you want to make. 

zipfile.ZipFile is the class for reading and writing ZIP files.

We use 'w' because here the system wants to create a new zip file.


Line 3 : 

'test.txt' is the file you want to zip.

ZIP_DEFLATED is the numeric constant for the usual ZIP compression method.


Line 4 :

zip_file.close() is used to close the archive file before exiting the program



Next is the python code to zip a folder:

import shutil
try:
    shutil.make_archive('new','zip','test')
except Exception as e:
    print(e)
else:
    print('Zipping Done')


Before the code is run :




After the code is run :



You can see 'new'  is created which is the zipped file of the folder 'test' .



Now let's look into the code line by line. 


Line 1 :

shutil module offers a number of high-level operations on files and collections of files.


Line 3 :

shutil.make_archive create an archive file(such as zip or tar) and return its name.

'new' is the name of the zip file you want to make.

'test' is the name of the folder you want to zip.


If you folder is zipped successfully you'll be prompted as 'Zipping Done'.
 


Introduction to the Python Calendar Module

 The Calendar module is built into Python 3. But for some reason it is installed by default. We can install it to  - Windows Administrator u...