Tuesday 21 September 2021

Extract Text From PDF File Using Python

To extract text from a PDF file, we first need the Python package PyPDF2.

Installation of PyPDF2

Open your terminal as administrator and type the command below.

pip install PyPDF2


Here is a glimpse of the PDF we pass to the code for extraction.



Now let's see the Python code to extract text from the PDF.


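import PyPDF2

# Note: PdfFileReader, numPages, getPage and extractText are the PyPDF2 1.x
# names; they were deprecated in PyPDF2 2.x and removed in 3.0.

# Open the PDF in binary read mode
pdfFileObject = open('mynote.pdf', 'rb')

# Create a reader object from the open file
pdfReader = PyPDF2.PdfFileReader(pdfFileObject)

print("Number of pages in the pdf : ",pdfReader.numPages)

print()
print("******************")

# Pages are indexed from 0, so this is the first page
pageObject = pdfReader.getPage(0)

print(pageObject.extractText())

# Close the file once we are done with it
pdfFileObject.close()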



Below is a screenshot of the output.




Let's go through the code line by line.


pdfFileObject = open('mynote.pdf', 'rb')

Here we open mynote.pdf in binary read mode and save the file object as pdfFileObject.


pdfReader = PyPDF2.PdfFileReader(pdfFileObject)

Here we create an object of the PdfFileReader class from the PyPDF2 module.
We pass it the PDF file object and get back a PDF reader object.


print("Number of pages in the pdf : ",pdfReader.numPages)

numPages gives the number of pages in the PDF file. In our case, it is 4.


pageObject = pdfReader.getPage(0)

The PDF reader object has a getPage() method, which takes a page number starting from index 0.
It returns the matching page, an object of the PageObject class of the PyPDF2 module.

print(pageObject.extractText())

The page object has an extractText() method, which extracts the text from that page of our PDF file.


pdfFileObject.close()

Finally, we close the PDF file object.
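
Newer PyPDF2 releases renamed the names used above: PdfFileReader became PdfReader, numPages and getPage() were replaced by the pages list, and extractText() became extract_text(). Below is a minimal sketch of the same extraction with those names, assuming PyPDF2 2.x or later is installed. The function name extract_all_text is made up for this illustration, and it reads every page rather than only the first.

```python
def extract_all_text(pdf_path):
    """Return the extracted text of every page in the PDF, one string per page."""
    # Imported inside the function so PyPDF2 is only needed when it is called.
    # Assumption: PyPDF2 2.x or later, where PdfReader is available.
    from PyPDF2 import PdfReader

    # The with block closes the file for us, so no explicit close() is needed.
    with open(pdf_path, 'rb') as pdf_file:
        reader = PdfReader(pdf_file)
        # reader.pages behaves like a list of page objects, indexed from 0.
        return [page.extract_text() for page in reader.pages]
```

Calling extract_all_text('mynote.pdf') would return a list with one entry per page, so len() of the result plays the role of numPages.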

Monday 6 September 2021

Zipping Files and Folders using Python

This is the Python code to zip a file:

import zipfile
zip_file = zipfile.ZipFile('temp.zip','w')
zip_file.write('test.txt',compress_type=zipfile.ZIP_DEFLATED)
zip_file.close()


Before the code is run : 







After the code is run :



You can see that 'temp.zip' has been created; it is the zipped version of 'test.txt'.



Now let's look into the code line by line. 


Line 1 :

This line imports zipfile, the standard-library module for reading and writing ZIP files.


Line 2 : 

Here the zip file object (zip_file) is created. 'temp.zip' is the name of the zip file you want to make.

zipfile.ZipFile is the class for reading and writing ZIP files.

We use 'w' because we want to create a new zip file. If a file with that name already exists, it is overwritten.


Line 3 : 

'test.txt' is the file you want to zip.

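ZIP_DEFLATED is the numeric constant for the usual ZIP compression method. Without compress_type, the file would be stored uncompressed, because the archive's default is ZIP_STORED.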


Line 4 :

zip_file.close() closes the archive file. This must be done before the program exits, otherwise essential records are not written to the archive.
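
The explicit close() on line 4 can also be left to a with block, which closes the archive even if writing fails part-way. Here is a minimal sketch of that variant. The function name zip_single_file is made up for this illustration, and arcname is set so that only the file name, not its full path, is stored in the archive.

```python
import os
import zipfile


def zip_single_file(source_path, archive_path):
    """Zip one file into a new archive and return the names stored in it."""
    # 'w' creates a new archive; the with block closes it automatically.
    with zipfile.ZipFile(archive_path, 'w') as archive:
        archive.write(
            source_path,
            arcname=os.path.basename(source_path),  # store the bare file name
            compress_type=zipfile.ZIP_DEFLATED,
        )
    # Reopen the archive in read mode to confirm what was written.
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()
```

For the example above, zip_single_file('test.txt', 'temp.zip') would create temp.zip and return the list of names it contains.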



Next is the Python code to zip a folder:

import shutil
try:
    shutil.make_archive('new','zip','test')
except Exception as e:
    print(e)
else:
    print('Zipping Done')


Before the code is run :




After the code is run :



You can see that 'new.zip' has been created; it is the zipped version of the folder 'test'.



Now let's look into the code line by line. 


Line 1 :

The shutil module offers a number of high-level operations on files and collections of files.


Line 3 :

shutil.make_archive creates an archive file (such as zip or tar) and returns its name.

'new' is the name of the zip file you want to make, without the extension; '.zip' is added automatically.

'test' is the name of the folder you want to zip.


If your folder is zipped successfully, the message 'Zipping Done' is printed. If something goes wrong, the error is printed instead.
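
Since make_archive returns the name of the archive it created, that return value can be used to check the result. The sketch below does this under stated assumptions: zip_folder and list_archive are hypothetical helper names, not part of shutil.

```python
import shutil
import zipfile


def zip_folder(folder, base_name):
    """Zip the contents of folder and return the path of the archive created."""
    # make_archive adds the '.zip' extension to base_name itself.
    return shutil.make_archive(base_name, 'zip', folder)


def list_archive(archive_path):
    """Return the names stored in a ZIP archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()
```

With the folder from the example, zip_folder('test', 'new') would return the path to new.zip, and list_archive() on that path shows which files ended up inside.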
 


Sunday 13 June 2021

Useful DOS Commands For A Software Developer.

Assoc - displays the full list of file name extensions and their program associations.

It gives you a view of all the file associations your computer knows about.

For example, assoc .py shows the association for the .py file extension.



Cipher - can be used to encrypt or decrypt data on NTFS drives.

This tool also lets you securely erase deleted data by overwriting it.


Driverquery - displays all the drivers installed on your computer.

You'll see a list of all the drivers along with their name, type, and other information.




fc - this can be used to identify differences in text between two files.




Powercfg - allows the user to view and modify the power plans and settings. Powercfg /? lists the available options.



Systeminfo - displays a detailed configuration overview of your computer.

For example, it shows information about the OS configuration, security information, product ID and hardware properties.



    

sfc /scannow - to run this command, first launch CMD as an administrator.

Entering this command checks the integrity of all protected system files.

If a problem is found, the damaged files are repaired from backup copies of the system files.





schtasks - enables an administrator to create, delete, query, change, run and end scheduled tasks.



chkdsk - to run this command, first launch CMD as an administrator.

It can be used to scan an entire disk for errors.




attrib - used to display or change the file attributes for a file or folder.




robocopy - makes copies of files and folders.



cls - clears the screen.

It clears out previously typed commands and their output in cmd.


before cls command :



after cls command:
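
The commands in this list can also be run from a Python script when you want to capture their output. The following is only a rough sketch: run_command is a hypothetical helper, and the commands named in its comments exist only on Windows. The exit code is returned rather than checked, because some commands, fc for example, use a non-zero code to report a normal result such as "files differ".

```python
import subprocess


def run_command(args):
    """Run a command and return (exit code, standard output as text)."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout


# On Windows, programs such as systeminfo or driverquery can be passed directly:
#     run_command(["systeminfo"])
# Commands built into cmd.exe, such as assoc, have to go through the shell:
#     run_command(["cmd", "/c", "assoc", ".py"])
```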











Introduction to the Python Calendar Module

 The Calendar module is built into Python 3. But for some reason it is installed by default. We can install it to  - Windows Administrator u...