How to Work With a PDF in Python

Whether it is an ebook, a digitally signed agreement, a password-protected document, or a scanned document such as a passport, the preferred file format is usually PDF (Portable Document Format). Originally developed by Adobe, PDF is a file format used to present and exchange documents reliably, and it uses the file extension .pdf. PDF is now an open standard maintained by the International Organization for Standardization (ISO).

Python has a relatively simple syntax, which makes it approachable even for those who are just starting to learn the language. Popular Python libraries make it easy to extract information from a PDF, rotate pages, split a PDF into separate documents, or add watermarks.

So why use Python to process PDFs? Processing a PDF often falls under text analytics, and many Python libraries and frameworks are designed for text analytics, which makes it convenient to work with PDFs in Python. You can also extract information from a PDF and feed it into Natural Language Processing or other Machine Learning models.

History of PyPDF, PyPDF2, and PyPDF4

The first PyPDF package was released in 2005, and its last official release came in 2010. About a year later, a company named Phaseit sponsored a fork of PyPDF called PyPDF2, which stayed compatible with the original package and worked well for several years.

A later series of releases appeared under the name PyPDF3, and the project was subsequently renamed PyPDF4. The biggest difference between PyPDF and the later versions is that the later versions support Python 3.

Development of PyPDF2 later stalled. However, because PyPDF4 is not fully backward compatible with PyPDF2, PyPDF2 is still the suggested choice. An alternative package is pdfrw, created by Patrick Maupin, which can do most of what PyPDF2 does, with a few exceptions such as encryption, decryption, and some types of decompression.

Some common libraries in Python

Let us look into some of the libraries Python offers to handle PDFs:

PDFMiner

PDFMiner is a tool for extracting information from PDF documents. It lets you analyze text data and obtain the exact location of text on a page, along with information such as fonts and lines. It can also be used as a PDF transformer and a PDF parser.

PyPDF2

PyPDF2 is a pure-Python library that allows users to split, merge, crop, encrypt, and transform PDFs. You can also add custom data, viewing options, and passwords to documents.

Tabula-py

It is a Python wrapper around tabula-java that reads tables from PDF files and converts them into pandas DataFrames or into CSV, TSV, or JSON files.

Slate

It is a Python package that simplifies text extraction and depends on the PDFMiner package.

PDFQuery

It is a lightweight Python wrapper that needs minimal code to extract data from PDFs.

Xpdf

It is an open source PDF viewer that also includes an extractor, a converter, and other utilities.

Of the libraries mentioned above, PyPDF2 is the most commonly used for operations such as extraction, merging, and splitting.

Installing PyPDF2

If you're using Anaconda, you can install PyPDF2 using pip or conda. To install PyPDF2 using pip, run the following command in the command line:

pip install PyPDF2

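The module name is case-sensitive, so type PyPDF2 exactly as shown when you import it. Installation is quick because PyPDF2 has no dependencies. Note that the examples in this article use the PdfFileReader, PdfFileWriter, and PdfFileMerger classes from older PyPDF2 releases; PyPDF2 3.0 removed these names in favor of PdfReader, PdfWriter, and PdfMerger, so install a release earlier than 3.0 to run the code as written.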
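To confirm that the installation worked and that the name is spelled with the right capitalization, a quick check with the standard library can help. This is a minimal sketch; the helper name is_installed is made up for illustration and is not part of PyPDF2.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this exact name can be imported."""
    return importlib.util.find_spec(module_name) is not None


# Module names are case-sensitive: 'PyPDF2' and 'pypdf2' are different names.
for name in ("PyPDF2", "pypdf2"):
    print(name, "->", is_installed(name))
```

If the first line reports False, the package is not installed in the interpreter that is running the script.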

Extracting Document Information from a PDF in Python

PyPDF2 can extract metadata and some text from existing PDF files. The metadata you can extract includes:

  • Author
  • Creator
  • Producer
  • Subject
  • Title
  • Number of Pages

To follow along, use an existing PDF on your system, or go to Leanpub and download a book sample.
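Before handing a file to a PDF library, it can be useful to confirm that it is a PDF at all. PDF files normally begin with the signature %PDF-, so a rough check only needs the first five bytes. The sketch below is illustrative: looks_like_pdf is a hypothetical helper, and the byte strings are made-up sample data rather than real documents.

```python
import io


def looks_like_pdf(fileobj):
    """Return True if the binary stream starts with the '%PDF-' signature."""
    header = fileobj.read(5)
    fileobj.seek(0)  # rewind so the stream can still be passed to a reader
    return header == b'%PDF-'


print(looks_like_pdf(io.BytesIO(b'%PDF-1.4 sample bytes')))  # True
print(looks_like_pdf(io.BytesIO(b'PK sample bytes')))        # False
```

With a real file, open it in binary mode ('rb') and pass the file object to the helper.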

The code for extracting the document information from the PDF:

# get_doc_info.py
from PyPDF2 import PdfFileReader

def getinfo(path):
    # Keep the file open while the metadata is being read
    with open(path, 'rb') as f:
        pdf = PdfFileReader(f)
        information = pdf.getDocumentInfo()
        numberofpages = pdf.getNumPages()
        print(information)
        print('Number of pages:', numberofpages)
        author = information.author
        creator = information.creator
        producer = information.producer
        subject = information.subject
        title = information.title

if __name__ == '__main__':
    path = 'reportlab-sample.pdf'
    getinfo(path)

Running the program prints the document information, which is displayed like a dictionary of the PDF's metadata fields.

Here, we first import PdfFileReader from the PyPDF2 package. The PdfFileReader class is used to read a PDF file and extract information from it through accessor methods.

Then we define our own function, getinfo, which takes the path to a PDF file as its argument and calls getDocumentInfo(). This returns an instance of DocumentInformation, from which we read attributes such as the author, creator, producer, subject, and title.

getNumPages() returns the number of pages in the document.
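The DocumentInformation object behaves like a dictionary whose keys are PDF-style names such as /Author and /Title. If you want to pass the metadata on to other code, a plain dictionary with friendlier keys can be easier to work with. The following is a sketch under that assumption; clean_metadata is a hypothetical helper and the sample values are invented.

```python
def clean_metadata(info):
    """Turn a mapping with PDF-style keys ('/Author') into a plain dict ('author')."""
    cleaned = {}
    for key, value in info.items():
        # Drop the leading slash and normalize the case of each key
        cleaned[key.lstrip('/').lower()] = str(value)
    return cleaned


# Invented sample data shaped like the metadata of a small PDF
sample = {'/Author': 'Jane Doe', '/Title': 'Sample Report', '/Producer': 'ReportLab'}
print(clean_metadata(sample))
# {'author': 'Jane Doe', 'title': 'Sample Report', 'producer': 'ReportLab'}
```

In the getinfo function above, the same helper could be called on the information object instead of the sample dictionary.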

PDFMiner is a better choice when you want to extract the text of a PDF file; it is a powerful tool designed specifically for that task.

We have learned to extract information from a PDF. Now let’s learn how to rotate its pages.

Rotating pages in PDF

We often receive PDFs whose pages are in landscape orientation instead of portrait. Some documents even arrive upside down, which can happen when a document is scanned or mailed. With PyPDF2, we can rotate pages clockwise or counterclockwise as needed.

The code for rotating pages is as follows:

# rotate_pages.py
from PyPDF2 import PdfFileReader, PdfFileWriter
def rotate(pdf_path):
    pdf_write = PdfFileWriter()
    pdf_read = PdfFileReader(pdf_path)
    # Rotate page 90 degrees to the right
    page1 = pdf_read.getPage(0).rotateClockwise(90)
    pdf_write.addPage(page1)
    # Rotate page 90 degrees to the left
    page2 = pdf_read.getPage(1).rotateCounterClockwise(90)
    pdf_write.addPage(page2)
    # Add a page in normal orientation
    pdf_write.addPage(pdf_read.getPage(2))
    with open('rotate_pages.pdf', 'wb') as fh:
        pdf_write.write(fh)
if __name__ == '__main__':
    path = 'mldocument.pdf'
    rotate(path)

The output of the code will be as follows:

[Figure: Rotating pages output in Python]

Here we first import PdfFileReader and PdfFileWriter so that we can write out a new PDF file. Then we declare a function, rotate, that takes the path to the PDF to be modified. Within the function, we create a read object, pdf_read, and a write object, pdf_write.

Then we use getPage() to grab the pages. The first two pages are rotated 90 degrees clockwise and 90 degrees counterclockwise using rotateClockwise() and rotateCounterClockwise() respectively, and the results are stored in page1 and page2.

We call addPage() after each rotation to add the rotated page to the write object. The last page we add is the third page of the input, without any rotation.

Lastly, we call write() with a file-like object to write out the new PDF. The final PDF contains three pages: the first two are in landscape mode, rotated in opposite directions, and the third is in its normal orientation.
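Page rotations in a PDF are expressed in steps of 90 degrees, and a counterclockwise turn of 90 degrees leaves a page in the same orientation as a clockwise turn of 270 degrees. The sketch below shows that arithmetic on its own, without PyPDF2; net_rotation is a hypothetical helper in which positive angles mean clockwise and negative angles mean counterclockwise.

```python
def net_rotation(*angles):
    """Combine rotations into a single clockwise angle of 0, 90, 180 or 270."""
    total = 0
    for angle in angles:
        if angle % 90 != 0:
            raise ValueError('rotation must be a multiple of 90 degrees')
        total += angle
    return total % 360


print(net_rotation(90))            # 90
print(net_rotation(-90))           # 270
print(net_rotation(90, 180, 180))  # 90
```

A helper like this can be handy for deciding how far to rotate a page that has already been turned once.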

Now we will learn to merge different PDFs into one.

Merging PDFs

We often need to merge two PDFs into a single one. For example, suppose you are working on a project report that you need to print and bind into a book, with a cover page followed by the report itself. You have two different PDFs and want to merge them into one, and Python makes this simple.

The code for merging two PDF documents with PdfFileReader and PdfFileWriter is given below:

# pdf_merging.py
from PyPDF2 import PdfFileReader, PdfFileWriter
def pdfmerger(paths, output):
    pdfwrite = PdfFileWriter()
    for path in paths:
        pdfread = PdfFileReader(path)
        for page in range(pdfread.getNumPages()):
            # Add each page to the writer object
            pdfwrite.addPage(pdfread.getPage(page))
    # Write out the merged PDF
    with open(output, 'wb') as out:
        pdfwrite.write(out)
if __name__ == '__main__':
    paths = ['document-1.pdf', 'document-2.pdf']
    pdfmerger(paths, output='merged.pdf')

Here we create a function pdfmerger() that takes a list of input paths and a single output path. We create a PdfFileReader object for each input path, loop over its pages, and add each page to the write object. Finally, write() saves the write object’s contents to disk.

PyPDF2 makes merging simpler by providing the PdfFileMerger class.

Code for merging documents using PdfFileMerger:

# pdf_merger2.py

import glob
from PyPDF2 import PdfFileMerger

def merger(output_path, input_paths):
    pdfmerge = PdfFileMerger()

    # append() adds every page of the given file to the merger
    for path in input_paths:
        pdfmerge.append(path)

    with open(output_path, 'wb') as fileobj:
        pdfmerge.write(fileobj)

if __name__ == '__main__':
    paths = glob.glob('d-1.pdf')
    paths.sort()
    merger('d-2.pdf', paths)

PdfFileMerger is simpler because we don’t need to loop over the pages of each document ourselves. Here, we create the object pdfmerge and loop through the PDF paths; append() adds the whole document each time. Finally, we write the result out.
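The merged document follows the order of the input paths, and a plain sort() orders names character by character, so a file numbered 10 lands before a file numbered 2. One possible way around this is a natural sort key, sketched below with invented file names; natural_key is a hypothetical helper.

```python
import re


def natural_key(path):
    """Split a name into text and integer chunks so 'doc-2' sorts before 'doc-10'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', path)]


# Invented file names, used only to show the ordering
paths = ['doc-10.pdf', 'doc-2.pdf', 'doc-1.pdf']
print(sorted(paths))                   # ['doc-1.pdf', 'doc-10.pdf', 'doc-2.pdf']
print(sorted(paths, key=natural_key))  # ['doc-1.pdf', 'doc-2.pdf', 'doc-10.pdf']
```

In the merger script, paths.sort(key=natural_key) could replace the plain paths.sort() call.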

Let’s perform the opposite of merging now!

Splitting PDFs

The PyPDF2 package can split a single PDF into multiple PDFs. Suppose we have a set of scanned documents in a single PDF and need to separate the pages into different PDFs; we can use Python to select the pages we want and get the work done.

Code for splitting a single PDF into multiple PDFs:

# pdf_splitter.py
import os
from PyPDF2 import PdfFileReader, PdfFileWriter
def splitpdf(path):
    fname = os.path.splitext(os.path.basename(path))[0]
    pdf = PdfFileReader(path)
    for page in range(pdf.getNumPages()):
        pdfwrite = PdfFileWriter()
        pdfwrite.addPage(pdf.getPage(page))
        outputfilename = '{}_page_{}.pdf'.format(
            fname, page+1)
        with open(outputfilename, 'wb') as out:
            pdfwrite.write(out)
        print('Created: {}'.format(outputfilename))
if __name__ == '__main__':
    path = 'document-1.pdf'
    splitpdf(path)

Here we import PdfFileReader and PdfFileWriter from PyPDF2. Then we create a function called splitpdf() that accepts the path of the PDF we want to split.

The first line of the function extracts the name of the input file without its directory or extension. Then we open the PDF and create a read object. Using the read object’s getNumPages(), we loop over all the pages.

Inside the for loop, we create a new PdfFileWriter instance for each page of the input PDF and add that single page to it. We also build a unique filename from the original filename, the word ‘page’, and the page index plus 1.

Once the script has run, each page of the input PDF will have been saved as a separate PDF.
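The script above writes out every page. To split off only selected pages, one option is to turn a human-friendly specification such as 1-3,5 into the zero-based indexes that getPage() expects. The sketch below covers only that conversion and does not touch PyPDF2; parse_pages and its specification format are assumptions made for illustration.

```python
def parse_pages(spec, num_pages):
    """Convert a 1-based spec such as '1-3,5' into sorted 0-based page indexes."""
    indexes = set()
    for chunk in spec.split(','):
        chunk = chunk.strip()
        if not chunk:
            continue
        first, _, last = chunk.partition('-')
        start = int(first)
        end = int(last) if last else start
        if start == 0 or start > end or end > num_pages:
            raise ValueError('page range out of bounds: ' + chunk)
        # Pages are numbered from 1 for people, from 0 for getPage()
        indexes.update(range(start - 1, end))
    return sorted(indexes)


print(parse_pages('1-3,5', num_pages=6))  # [0, 1, 2, 4]
```

The resulting list could replace range(pdf.getNumPages()) in the for loop of splitpdf().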

Now let us learn how to add a watermark to a PDF and keep it secured.

Adding Overlays/Watermarks

An image or text superimposed on selected pages of a PDF document is called a watermark. A watermark helps protect intellectual property such as images and PDFs. Watermarks are also called overlays.

PyPDF2 allows us to watermark documents. We just need a PDF that contains our watermark text, image, or signature.

Code for adding a watermark to a PDF:

# watermarker.py
from PyPDF2 import PdfFileWriter, PdfFileReader
def watermark(inputpdf, outputpdf, watermarkpdf):
    # Use a distinct name so the reader does not shadow the function
    watermarkreader = PdfFileReader(watermarkpdf)
    watermarkpage = watermarkreader.getPage(0)
    pdf = PdfFileReader(inputpdf)
    pdfwrite = PdfFileWriter()
    for page in range(pdf.getNumPages()):
        pdfpage = pdf.getPage(page)
        pdfpage.mergePage(watermarkpage)
        pdfwrite.addPage(pdfpage)
    with open(outputpdf, 'wb') as fh:
        pdfwrite.write(fh)
if __name__ == '__main__':
    watermark(inputpdf='document-1.pdf',
              outputpdf='watermarked_w9.pdf',
              watermarkpdf='watermark.pdf')

The output of the code will look like:

[Figure: Adding overlays/watermarks output in Python]

The watermark() function takes three arguments:

  1.  inputpdf: The path of the PDF that is to be watermarked.
  2.  outputpdf: The path where the watermarked PDF will be saved.
  3.  watermarkpdf: The PDF which contains the watermark.

First, we extract the page that contains the watermark image or text from the watermark PDF, and then open the PDF that we want to watermark.

From inputpdf we create a read object, and we create a write object, pdfwrite, to write out the watermarked PDF. Then we iterate over the pages.

Next, we call each page object’s mergePage() to apply the watermark, and add the page to the write object pdfwrite.

When the loop terminates, the watermarked PDF is written out to disk.
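The script above stamps every page, while a watermark is often wanted only on selected pages. The loop can be sketched as a small function that merges the watermark only on chosen page indexes. The function watermark_selected and the Fake classes are hypothetical; the stand-ins mimic just the getNumPages(), getPage(), mergePage(), and addPage() calls used in this article so that the sketch runs without a real PDF.

```python
def watermark_selected(reader, writer, watermark_page, targets):
    """Copy every page to writer, merging the watermark only on 0-based indexes in targets."""
    for index in range(reader.getNumPages()):
        page = reader.getPage(index)
        if index in targets:
            page.mergePage(watermark_page)
        writer.addPage(page)


# Stand-ins for the PyPDF2 objects, used only to make the sketch runnable
class FakePage:
    def __init__(self):
        self.marked = False

    def mergePage(self, other):
        self.marked = True


class FakeReader:
    def __init__(self, count):
        self.pages = [FakePage() for _ in range(count)]

    def getNumPages(self):
        return len(self.pages)

    def getPage(self, index):
        return self.pages[index]


class FakeWriter:
    def __init__(self):
        self.pages = []

    def addPage(self, page):
        self.pages.append(page)


writer = FakeWriter()
watermark_selected(FakeReader(3), writer, FakePage(), targets={0, 2})
print([page.marked for page in writer.pages])  # [True, False, True]
```

In practice, the reader and writer would be PdfFileReader and PdfFileWriter objects, and the watermark page would come from the watermark PDF.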

Encrypting a PDF

A PDF can have an owner password, which gives its holder administrator-like rights over the document, and a user password, which is required to open the document. PyPDF2 supports both.

PyPDF2 does not let you set individual permissions on a PDF file, but it does let you set the owner password and the user password.

Code to add a password and encryption to a PDF:

# pdf_encrypt.py
from PyPDF2 import PdfFileWriter, PdfFileReader
def encryption(inputpdf, outputpdf, password):
    pdfwrite = PdfFileWriter()
    pdfread = PdfFileReader(inputpdf)
    for page in range(pdfread.getNumPages()):
        pdfwrite.addPage(pdfread.getPage(page))
    pdfwrite.encrypt(user_pwd=password, owner_pwd=None,
                      use_128bit=True)
    with open(outputpdf, 'wb') as fh:
        pdfwrite.write(fh)
if __name__ == '__main__':
    encryption(inputpdf='document-1.pdf',
                  outputpdf='document-1-encrypted.pdf',
                  password='twofish')

We declare a function named encryption() with three arguments: the input PDF path, the output PDF path, and the password we want to set.

Then we create a read object, pdfread, and a write object, pdfwrite. We loop over all the pages and add them to the write object, since we need to encrypt the entire document.

Finally, we call encrypt(), which accepts three parameters: the user password, the owner password, and whether to use 128-bit encryption. If use_128bit is set to False, the PDF is encrypted with 40-bit encryption instead. If the owner password is None, it is set to the user password automatically.
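The defaulting rules described above can be written out as a small function. This is only an illustration of the described behavior, not PyPDF2's internal code; encryption_settings and its dictionary keys are invented for the example.

```python
def encryption_settings(user_pwd, owner_pwd=None, use_128bit=True):
    """Return the effective settings implied by the arguments passed to encrypt()."""
    return {
        'user_pwd': user_pwd,
        # A missing owner password falls back to the user password
        'owner_pwd': user_pwd if owner_pwd is None else owner_pwd,
        # 128-bit keys by default, 40-bit when use_128bit is False
        'key_bits': 128 if use_128bit else 40,
    }


print(encryption_settings('twofish'))
# {'user_pwd': 'twofish', 'owner_pwd': 'twofish', 'key_bits': 128}
```

Passing a separate owner password keeps the two roles distinct, which is usually what you want when other people only need to open the file.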

Reading the Table data from PDF

If you want to work with table data in a PDF, you can use tabula-py to read the tables. Because tabula-py is a wrapper around tabula-java, a Java runtime must also be installed. To install tabula-py, run:

pip install tabula-py

Code to read a table from a PDF using tabula-py:

import tabula

# Read the PDF file that contains the table data.
# read_pdf loads each table it finds into a pandas DataFrame;
# tabula-py 2.0 and later return them as a list of DataFrames.
dfs = tabula.read_pdf("document.pdf")

# Print the first 5 rows of the first table
print(dfs[0].head())

If your PDF file contains multiple tables:

dfs = tabula.read_pdf("document.pdf", multiple_tables=True)

If you want to extract information from a specific part of a specific page, pass area as (top, left, bottom, right), measured in points, together with the page number:

tabula.read_pdf("document.pdf", area=(126,149,212,462), pages=1)
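The area values are measured in PDF points, and there are 72 points to an inch, so it can be easier to measure the region in inches and convert. The helper below is a small sketch; area_from_inches is a hypothetical name and not part of tabula-py.

```python
POINTS_PER_INCH = 72  # PDF user space uses 72 points per inch


def area_from_inches(top, left, bottom, right):
    """Convert a region given in inches to (top, left, bottom, right) in points."""
    if bottom <= top or right <= left:
        raise ValueError("bottom must be below top and right beyond left")
    return tuple(float(v) * POINTS_PER_INCH for v in (top, left, bottom, right))


# A region starting 1.75 in from the top and 2 in from the left edge
print(area_from_inches(1.75, 2, 3, 6.5))
```

The resulting tuple can be passed directly as the area argument, for example tabula.read_pdf("document.pdf", area=area_from_inches(1.75, 2, 3, 6.5), pages=1).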

If you want the output in JSON format:

tabula.read_pdf("offense.pdf", output_format="json")
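With output_format="json", read_pdf returns parsed JSON rather than DataFrames. The sketch below assumes that each table is a dictionary whose "data" key holds rows of cells, and that each cell carries its string under "text". That layout, the table_rows helper and the hand-written sample are illustrative assumptions, so check them against the output you actually get.

```python
def table_rows(table):
    """Return the cell text of one JSON table as a list of rows."""
    return [
        [cell.get("text", "") for cell in row]
        for row in table.get("data", [])
    ]


# Hand-written sample in the assumed layout (not real tabula output)
sample = [
    {
        "data": [
            [{"text": "Name"}, {"text": "Qty"}],
            [{"text": "Pen"}, {"text": "3"}],
        ]
    }
]

for table in sample:
    print(table_rows(table))
```

In practice the sample list would be replaced by the value returned from tabula.read_pdf(..., output_format="json").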

Exporting PDF into Excel

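To export the table data from a PDF, tabula-py can write CSV, TSV or JSON files directly with convert_into(). It has no Excel output format, so for an .xlsx file, read the tables into DataFrames first and save them with pandas.

# Write every table in the PDF to a CSV file
tabula.convert_into("document.pdf", "document_testing.csv", output_format="csv", pages="all")

# Save the first table as an Excel file (pandas needs openpyxl for .xlsx)
dfs = tabula.read_pdf("document.pdf", pages="all")
dfs[0].to_excel("document_testing.xlsx", index=False)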

Let us sum up what we have learned in the article:

  • Extraction of data from a PDF
  • Rotate pages in a PDF
  • Merge PDFs into one PDF
  • Split a PDF into many PDFs
  • Add watermarks or overlays in a PDF
  • Add password or encryption to a PDF
  • Read tables from a PDF
  • Exporting PDF into Excel or CSV

As you have seen, PyPDF2 is one of the most useful tools available in Python, and tabula-py complements it for table data. The features of PyPDF2 make life easier whether you are working on a large project or just want to make some quick changes to your PDF documents. To learn more about such libraries and frameworks, KnowledgeHut offers a Python Certification Course for programmers, developers, junior and senior software engineers, and anybody else who wants to learn Python.

Priyankur Sarkar

Data Science Enthusiast

Priyankur Sarkar loves to play with data, draw insights from it, and turn those insights into business growth. He is an electronics engineer with versatile experience both as an individual contributor and as a team lead, and has actively worked towards building Machine Learning capabilities for organizations.

Suggested Blogs

Top 10 Python IDEs and Code Editors

Over the years, Python language has evolved enormously with the contribution of developers. Python is one of the most popular programming languages. It was designed primarily for server-side web development, software development, evaluation, scripting and artificial intelligence. For this feature Python encloses certain code editors and IDEs that are used for software development say, Python itself. If you are new to programming, learning Python is highly recommended as it is fast, efficient and easy to learn. Python interpreters are available on various operating systems such as Windows, Linux, Mac OS. This article provides a look into code editors and IDEs along with their features, pros and cons and talks about which are the best suited for writing Python codes. But first let us see what are code editors and IDEs. What is a Code Editor? A code editor is built for editing and modifying source code. A standalone text editor is used for writing and editing computer programs. Excellent ones can execute code as well as control a debugger as well as interact with source control systems. Compared to an IDE, a good dedicated code editor is usually smaller and quicker, but is less functional. Typically they are optimized for programming languages. One major feature of a text editor is that they are designed to modify various files and work with whatever language or framework you choose. What is IDE? IDE (Integrated Development Environment) understands the code significantly better than a text editor. It is a program exclusively built for software development. It is designed with a set of tools that all work together:  Text editor  Compiler Build automation Debugging Libraries, and many more to speed up the work.  These tools integrate: An editor designed to frame codes with text formatting, auto-completionetc., build, execution, debugging tools, file management and source and version control. 
It reduces manual efforts and combines all the equipment in a typical framework. IDE comes with heavy files. Hence, the downloads and installation is quite tedious. IDE requires expertise along with a lot of patience.  How does an IDE and Code editor differ from each other? An IDE is distinctive from code editors in the following ways: Integrated build process:The user does not have to write his own scripts to build apps in an IDE.  File management: IDE has an integrated file management system and deployment tool. It provides support to other framework as well. On the other hand, a Text editor is a simple editor where source code can be edited and it has no other formatting or compiling options. Development Environment: An IDE is mainly used for development purposes as it provides comparatively better features than a text editor. It allows you to write, compile and debug the entire script.  Syntax Highlighting:The editor displays the text message and puts the source code in different colours to improve its readability. Even error messages are displayed in different colours so that the user understands where he has written the wrong code.  Auto completion:It identifies and inserts a common code for the user instantly. This feature acts as an assistance for the programmer. The code suggestion automatically gets displayed.  Debugger: This tool helps the programmer to test and debug the source code of the main program.  Although IDEs have far better features than a Text editor one major significance of Text editor is that it allows modifying all types of files rather than specifying any definite language or types. Features For a good software development, we need code editors and IDEs which help the developer to automate the process of editing, compiling, testing, debugging and much more. Some of the features of these editors are listed below: Good user interface: They allow users to interact and run programs easily. 
Incredibly fast: Although these IDEs need to import heavy libraries, compile and debug, they offer fast compilation and run time.  Syntax stylizing: Codes are colorized automatically and syntax is highlighted.    Debugging tool: Itruns the code, set breakpoints, examine the variables. Provides good language syntax: IDEs usually work on a specific language but the others are designed for multi-language support. Code editors are designed with multi-language support.  Good source and version control environment: IDEs come with source control feature to keep a track of changes made in source code and other text files during the development of any software. Intelligent code completion:This feature speeds up the coding process by automatically suggesting for incomplete codes. It reduces typos and other common mistakes. Why do we need a good coding environment? For a good software development one seeks a better coding environment. Although features vary from app to app, a definite set of features is required for one. There are many other things involved such as source code control, extension tools, language support etc. Listed below are the core features which make a good coding environment : Retrieve files: All the codes written in an IDE get saved. Also, the programmer can retrieve his code file at the same state where the work is left off. Run within the environment: It should be able to compile and run within the environment where the codes are written. No external file shall be needed to be downloaded for the execution of the programs.  Good Debugging Tool: An IDE or editor should be able to diagnose and  troubleshoot the programmer’s works and highlight the lines with errors if any. A pop-up window should display the error message. This way the programmer can keep a track of his errands and diagnose them.   Automatic formatting tool: Indentation is done automatically as soon as the programmer moves onto the next line. It keeps the code clean and readable. 
Quick highlighting: keywords, variables and symbols are highlighted. This feature keeps the code clean and easy to understand. Also, pops up the variables making them easy to spot. This makes it a whole lot easier to pick out portions of code than simply looking at a wall of undifferentiated text. Some of the IDEs and code editors There are various Python IDEs and text editors. Some of the IDEs and text editors along with their features and pros and cons are mentioned below: IDLEKey Features: It is an open source IDE entirely written in Python. It is mainly supported by WINDOWS, LINUX, MAC OS etc.. IDLE is a decent IDE for learning because it is lightweight and quite simple to use. IDLE is installed by default as soon as installation of Python is complete. This makes it easier to get started in Python. IDLE features include the Python shell window(interactive interpreter), auto-completion, syntax highlighting, smart indentation, and a basic integrated debugger. It is however not suitable for the completion of larger projects and best suitable for educational purposes only.  Pros A cross-platform where a developer can search within any window, search through multiple files and replace within the windows editor  Supports syntax highlighting, auto code completion, smart indentation and editable configurations Includes Python shell with highlighter Powerful Integrated Debugger with continuous breakpoints, global view, and local spaces Improves the performance  Call stack visibility Increases the flexibility for developers Cons Used for programming just for beginners Limited to handle normal usage issues. Supports basic design  Large software development cannot be handled  Sublime text Key Features: It is a source code editor, supported on all platforms. It is a very popular cross-platform  and a better text editor. It possesses a built-in support for Python for code editing and packages to extend the syntax and editing features. 
All Sublime Text packages are written in Python and also a Python API. Installation of the packages often requires you to execute scripts directly in Sublime Text. it is designed to support huge programming and markup languages. Additional functions can be applied by the user with the help of plugins.  Pros More reliable for developers and is cross-platform Supports GOTO anything to access files  Generates wide index of each method, class, and function. AllowsUser interface toolkit Easy navigation to words or symbols Multiple selections to change things at one time Offers command palette to sort, edit and modify the syntax and maintain the indentation.  Offers powerful API and package ecosystem Great performance Highly customizable Allows split editing and instant project switch  Better compatibility with language grammar Custom selection on specific projects Cons Not free Installation of extensions is quite tricky Does not support for direct executing or debugging code from within the editor Less active GIT plugin AtomKey Features: It is an open source code editor developed by Github. It is supported on all platforms. It has features similar to that of Python. It has a framework based on atom shells which help to achieve cross platform functionality. With a sleek interface, file system browser, and marketplace for extensions, it offers a framework for creating desktop applications using JavaScript, HTML, CSS . Extensions can be installed when Atom is running.It enables support for third party packages. Its major feature is that although it is a code editor,it can also be used as an IDE. It is also used for educational purposes. Atom is being improvised day by day, striving to make the user experience rewarding and not remain confined to beginners use only.  
Pros Cross-platform  Smooth editing Improves performance of its users Offers built-in package manager and file system browser Faster scripting  Offers smart auto-completion  Smart and flexible Supports multiple pane features Easy navigation across an application Simple to use Allows user interface customization Full support from GitHub Quick access to data and information Cons For beginners only Tedious for sorting configurations and plugins Clumsy tabs reduce performance  Slow loading Runs on JavaScript process  Built on Electron, does not run as a native application VimKey Features: Categorized as a stable open source code editor, VI and VIM are modal editors. As it is supported on almost every platform such as: Windows, LINUX, MAC OS, IOS, Android, UNIX, AmigaOS, MorphOS etc. it is highly configurable. Because of its modal mode of operation, it differs from most other text editors. It possesses three basic modes: insert mode, normal or command mode and command line mode. It is easily customized by the addition of extensions and configuration which makes it easily adaptable for Python development.  Pros Free and easily accessible Customizable and persistent  Has a multi-level undo tree  Extensions are added manually Configuration file is modified Multi-buffers support simultaneous file editing Automated indentation  Good user interface Recognition and conversion of file formats Exclusive libraries including wide range of languages Comes with own scripting language with powerful integration, search and replace functionality Extensive system of plugins Allows debugging and refactoring  Provides two different modes to work: normal and editing mode Strings in VIM can be saved and reused  Cons Used as a text editor only No different color for the pop-up option Not good for beginners PyDev Key Features: It is also categorized as an open source IDE mainly written with JAVA.Since it is an eclipse plugin, the Java IDE is transformed into Python IDE. 
Its integration with Django gives a Python framework. It also has keyword auto-completion, good debugging tool, syntax highlighting and indentation. Pros Free open source Robust IDE feature set Auto-completion of codes and analysis Smart indentation Interactive console shortcuts Integrated with Django configuration  Platform independent Cons: User interface is not great  Visual studioKey Features: It is categorized as an IDE, is a full-featured IDE developed by Microsoft. It is compatible with Windows and Mac OS only and comes with free as well as paid versions. It has its own marketplace for extensions. PTVS(Python Tools for Visual Studio) offers various features as in coding for Python development, IntelliSense, debugging, refactoring etc. Pros Easy and less tedious installation for development purposes Cons Spacious files  Not supported on Linux Visual studio code Key Features: VS code is a code editor and is way more different from VS. It is a free open source code editor developed by Microsoft can be run on platforms such as Windows, Linux and Mac OS X.  It has a full-featured editor that is highly configurable with Python compatibility for software development. Python tools can be added to enable coding in Python.VS code is integrated with Git which promotes it to perform operations like push, commit directly from the editor itself. It also has electron framework for Node JS applications running on the Blink browser engine. It is enclosed with smart code completion with function definition, imported modules and variable types. Apart from these, VS code also comes with syntax highlighting, a debugging console and proprietary IntelliSense code auto completion. After installing Python, VS code recognizes Python files and libraries immediately.  
Pros Free and available on every platform  Small, light-weight but highly extensible Huge compatibility Has a powerful code management system Enables debugging from the editor Multi-language support  Extensive libraries Smart user interface and an acceptable layout Cons Slow search engine Tedious launch time Not a native app just like Atom WingKey Features: Wing is also one of the powerful IDEs today and comes with a lot of good features. It is an open source IDE used commercially. It also is constituted with a strong framework and has a strong debugger and smart editor for Python development making it fast, accurate and fun to perform. It comes with a 30 day trial version. It supports text driven development with unit test, PyTest and Django testing framework.  Pros Open source Find and go-to definition Customizable and extensible Auto-code completion Quick Troubleshoot  Source browser shows all the variables used in the script Powerful debugger  Good refactoring  Cons Not capable of supporting dark themes Wing interface is quite intimidating Commercial version is expensive Python-specific IDEs and Editors Anaconda - Jupyter NotebooksKey Features: It is also an open source IDE with a server-client structure, used to create and edit the codes of a Python. Once it is saved, you can share live code equations, visualizations and text. It has anaconda distribution i.e., libraries are preinstalled so downloading the anaconda itself does the task. It supports Python and R language which are installed by default at installation.  This IDE is again used for data science learning. Quite easy to use, it is not just used as an editor but also as an educational tool or presentation. It supports numerical simulation, machine  learning visualization and statistical modelling. 
Pros Free Open source  Good user interface Server-client structure Educational tool- Data science, Machine learning  Supports numerical simulation  Enables to create, write, edit and insert images Combines code, text and images Integrated libraries - Matplotlib, NumPy, Pandas Multi-language support Auto code completion Cons Sometimes slow loading is experienced Google Colaboratory Key Features: It is the simplest web IDE used for Python. It gives a free GPU access. Instead of downloading heavy files and tedious launch time, one can directly update the files from Colab to the drive. All you need to do is log in to your google account and open Colab. There is no need for extra setup. Unlike other IDEs no files are required to download. Google provides free computation resources with Colaboratory. It is designed for creating machine learning models. For compilation and execution, all you need to do is to update Python package and get started.   Pros Available to all Code can be run without any interruption Highly user interactive No heavy file downloads Integrated libraries Multi-language support Updated in google drive Update the Python package for execution  Runs on cloud Comments can be added in cells Can import Jupiter or IPython notebooks Cons  All colaboratory files are to be stored in google drive Install all specific libraries No access to unsaved files once the session is over Pycharm Key Features: Developed by Jet Brains and one of the widely used full-featured Python IDE, this is a cross-platform IDE for Python programming and  is well-integrated with Python console and IPython Notebook. It is supported by Windows, Linux, Mac OS and other platforms as well. It has massive productivity and saves ample amount of time. It comes with smart code navigation, code editor, good debugging tool, quick refactoring etc. and supports Python web development frameworks such as Angular JS, JavaScript, CSS, HTML  and live editing functions. 
The paid version offers advanced features such as full database management and a multitude Framework than the community version such as Django, Flask, Google App, Engine, Pyramid and web2py. Pros Great supportive community Brilliant performance. Amazing editing tools Robust debugging tool Smart code navigation Quick and safe refactoring  Built in developer tools Error detection and fix up suggestions Customizable interface Available in free and paid version Cons Slow loading  Installation is quite difficult and may hang up in between SpyderKey Features: It is an open source IDE supported on all platforms. Ranked as one of the best Python compilers, it supports syntax highlighting, auto completion of codes just like Pycharm. It offers an advanced level of editing, debugging, quick diagnose, troubleshoot and many data exploration features. To get started with Spyder, one needs to install anaconda distribution which is basically used in data science and machine learning. Just like Pycharm it has IntelliSense auto-completion of code. Spyder is built on a structured and powerful framework which makes it one of the best IDE used so far. It is most commonly used for scientific development. Pros Free open source IDE Quick troubleshoot Active framework Smart editing and debugging Syntax is automatically highlighted Auto completion of codes Good for data science and machine learning Structured framework Integrates common Python data science libraries like SciPy, NumPy, and Matplotlib Finds and eliminates bottlenecks Explores and edits variables directly from GUI  Performs well in multi-language editor and auto completion mode Cons Spyder is not capable to configure a specific warning Too many plugins degrades its performance ThonnyKey Features: Thonny is another IDE best suited for beginners for Python development and provides a good virtual environment. It is supported on all platforms. It gives a simple debugger with F5, F6 and F7 keys for debugging. 
Also, Thonny supports highlighting errors, good representation of function calls, auto code completion and smart indentation. It even allows the developers to configure their code and shell commands. by default,  in Thonny Python is pre-installed as it downloads with its own version of Python.  Pros Simple Graphical user interface.  Free open source IDE Best for beginners Simple debugger with F5, F6, F7 Keys Tackles issues with Python interpreters Highlights syntax error Auto-completion of code Good representation of function calls User can change reference mode easily Step through expression evaluation Reply and resolve to comments Cons Interface is not that good for developers Confined to text editing No template support Slow plugin creation Too basic IDE for software development Which Python IDE is right for you? Requirements vary from programmer to programmer. It is one’s own choice to pick the right tool that is best suited for the task at hand. Beginners need to use a simple tool with few customizations whereas experts require tools with advanced features to bring new updates. Few suggestions are listed below:- Beginners should start with IDLE and Thonny as they do not have complex features and are pretty easy to learn. For data science learners Jupyter Notebooks and Google Colaboratory is preferred. Generally, large scale enterprises prefer the paid versions of IDEs like PyCharm, Atom, Sublime Text etc. in order to get extensive service support from the company. Also, they provide easy finance options and manpower. On the other hand, middle and small scale enterprises tend to look for open source tools which provides them with excellent features. Some of such IDEs are Spyder, Pydev, IDLE and Visual Studio. Conclusion Today, Python stands out as one of the most popular programming languages worldwide. IDE being a program dedicated to software development has made it easier for developers to build, execute, and debug their codes. 
Code editors can only be used for editing codes whereas an IDE is a feature rich editor which has inbuilt text editor, compiler, debugging tool and libraries. Different IDEs and code editors are detailed in this article along with their merits and demerits. Some are suitable for beginners because of their lightweight nature and simplicity like IDLE, Thonny whereas experts require advance featured ones for building software.  For learning purposes say data science, machine learning Jupyter and Google Colaboratory are strongly recommended. Again there are large scale enterprises who prefer PyCharm, Atom, Sublime Text for software development. On the other hand, small scale enterprises prefer Spyder, Pydev, IDLE and Visual Studio. Hence,the type of IDE or code editor that should be used completely depends upon the requirement of the programmer . To gain more knowledge about Python tips and tricks, check our Python tutorial and get a good hold over coding in Python by joining the Python certification course. 
Rated 4.5/5 based on 19 customer reviews
9901
Top 10 Python IDEs and Code Editors

Over the years, Python language has evolved enormo... Read More

Scala Vs Kotlin

Ever-changing requirements in coding have always been happening, ones that cause programmers to change their minds about using the appropriate programming language and tools to code. Java has been there for a long time, a really long time, 24 years ago. It is relatively easy to use, write, compile, debug, and learn than other programming languages. However, its certain inhibitions like slow performance, unavailability of any support for low-level programming, possessing poor features in GUI 4, and having no control over garbage collection is putting Java developers in a dilemma on choosing an alternative to Java, such as JetBrains’ programming language, Kotlin, presently an officially supported language for Android development or Scala, an all-purpose programming language supporting functional programming and a strong static type system. Today, we will discuss how developers can decide to choose Scala or Kotlin as an alternative to Java. We will briefly talk about Scala and Kotlin separately and talk about their application before moving forward to looking at the differences, advantages, and disadvantages of both and finally have you decide which one of these two suits your requirements. User’s requirement Before we begin, here is a question for the readers, ‘What are you looking for in the next programming language that you will use?’ It is an obvious question because the programming purposes drive the actual basis and need of developing a language. Do you need a language that strives to better Java or use a language that lets you do things that aren’t possible in Java? If it is the first reason, then Scala might be the best one for you, otherwise, it is a simplified programming language like Kotlin. Now let us first briefly discuss Scala and Kotlin individually. 
ScalaDeveloped by Martin Odersky, the first version of Scala was launched in the year 2003 and is a classic example of a  general-purpose, object-oriented computer language, offering a wide range of functional programming language features and a strong static type system. Inspired from Java itself, Scala, as the name suggests, is highly scalable and this very feature sets Scala apart from other programming languages. When we say that Scala is inspired from Java, that means developers can code Scala in the same way they do for Java. Additionally, Scala makes it possible to use numerous Java and libraries within itself as well. It is designed to be able to use an elegant, concise and type-safe method to express common programming patterns. Scala is a very popular programming language amongst developers and rising up its ranks in the world of technology. Although Scala comes with a number of plus points, there are some which make it a bit ineffective. Here are the strengths and weaknesses of Scala. Strengths: Full Support for Pattern Matching, Macros, and Higher-Kinded Types Has a very flexible code syntax Gets a bigger Community Support Enables overloading operators Weaknesses: Slow in compilation Challenging Binary Compilation Not so proficient in the Management of Null SafetyKotlin Developed by JetBrains, Kotlin was released on February 2012 as an open-source language. Until now, there have been two released versions with the latest one being Kotlin 1.2, the most stable version that was released on November 28, 2017. Since Kotlin is extremely compatible with Java 6 the latest version of Java on Android, it has gained critical acclaim on Android worldwide and additionally, it offers various key features that are prepared only for Java 8 and not even Java 6 developers have access to that. Kotlin provides seamless and flawless interoperability with Java. That means, developers can easily call Java codes from Kotlin and same goes the other way around. 
The built-in null safety feature avoids showing the NullPointerException (NPE) that makes developing android apps easy and joyful, something every android programmer wants. Below mentioned are the key pointers on the strengths and weaknesses of Kotlin. Strengths Takes a Functional Programming Approach and Object-Oriented Programming style(OOP) Style  Has Higher-Order Functions Short, Neat, and Verbose-Free Expression  Supported by JetBrains and Google. Weaknesses: More limited Pattern Matching Additional Runtime Size Initial Readability of Code Shortage of Official Support Smaller Support Community. Ease of learning: Scala vs Kotlin Scala is a powerful programming language packed with superior features and possesses a flexible syntax. It is not an easy language to learn and is a nightmare for newcomers. Kotlin, on the other hand, has been reported to have been an easy-to-learn language for many Java developers as getting started with Kotlin is relatively easy and so is writing codes. Even though it is a comparatively easier language to learn and code with, Kotlin lacks the solid set of features that is common in Scala. It might take less time to learn a programming language, but the most important thing to look for is a comprehensive array of features. Scala, even though a very difficult language to learn, is cherished by the developers as it lets them do things that cannot be done in Kotlin Here are the major differences between Scala and Kotlin: ScalaKotlinType inferenceEfficientImmutabilityExtension FunctionsSingleton objectMassive InteroperabilityConcurrency controlLessens Crashes at RuntimeString interpolationSmart Cast FunctionHigher-order functionSafe and ReliableCase classes and Pattern matching Lazy computationLow adoption costRich collection setMaking the appropriate choice of languageNow, whether you may like a programming language or not, if that very language helps you get the best out of your job, then you will have to live with it. 
These are the facts about getting the best results. The outcome is the main factor in you deciding the appropriate language for your job. Kotlin is the only option for Android development as Android doesn’t use JVM, so any old JVM-compatible language will not work in Android. Kotlin has it all what it takes to compile, debug, and run the software on Android because of which it is in-built into Android Studio. However, Kotlin is not so usable outside Android development. If you are one of the developers who like working with Eclipse for your IDE, then Scala IDE is better than the Kotlin Plugin even if you can make Eclipse work with both the languages with limitations. Scala IDE is more advanced than the Kotlin plugin and is easier to set up. Some developers found it quite difficult to make the Kotlin plugin work. This case is quite the same with NetBeans. Kotlin is still getting there but is already popular amongst Java developers as it offers an easier transition than Scala. Kotlin is still maturing, but many Java people find adopting it is an easier transition than Scala is.  Scala, however, is for developers who are focused more on discovering new ideas while Kotlin is for those who want to get results. Kotlin stresses fast compilation but is more restrictive while Scala gives a lot of flexibility. Go for Scala if you breathe functional programming! It has more appropriate features for this type of programming than Kotlin does. Scala supports currying and partial application, the methods of breaking down functions requiring multiple arguments offering more flexibility. Go for the one that is the most appropriate one for your work, style of working and what you are aiming at. Think before you leap. The Outcome At the end of the day, all that matters is what you want to use the language for. 
Scala suits projects that combine functional and OOP styles and where programmers need to handle lots of data or complex modelling. Kotlin becomes the best choice when you want something less frustrating than Java for developing apps, because it makes app development less cumbersome and more enjoyable. It is like a better-looking version of Java with less lengthy code.
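
The currying and partial application mentioned above can be illustrated with a small sketch. It is written in Python rather than Scala, purely as an illustration of the two ideas; the names `volume`, `curry3`, `curried_volume`, and `base_2_by_3` are hypothetical and come from neither language's libraries. Currying turns a function of several arguments into a chain of one-argument functions, while partial application fixes some arguments up front and leaves the rest for later.

```python
from functools import partial


def volume(length, width, height):
    """Ordinary function that takes all three arguments at once."""
    return length * width * height


def curry3(func):
    """Curry a three-argument function into a chain of one-argument functions."""
    return lambda a: lambda b: lambda c: func(a, b, c)


# Currying: supply the arguments one at a time.
curried_volume = curry3(volume)
curried_result = curried_volume(2)(3)(4)

# Partial application: fix the first two arguments, supply the last one later.
base_2_by_3 = partial(volume, 2, 3)
partial_result = base_2_by_3(4)

print(curried_result, partial_result)
```

Both calls compute the same value as `volume(2, 3, 4)`; the difference lies only in how and when the arguments are supplied.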

Xcode vs Swift

Xcode and Swift are two different products developed by Apple for macOS, iOS, iPadOS, watchOS, and tvOS. Xcode is an integrated development environment (IDE) for macOS containing a suite of tools for developing software for macOS, iOS, iPadOS, watchOS, and tvOS, whereas Swift is a general-purpose, multi-paradigm, compiled programming language developed for iOS, macOS, watchOS, tvOS, Linux, and z/OS. So it is clear that they cannot be compared directly with each other. Rather, the two work together: Swift 5.1, the default version of Swift, is included in Xcode 11. In this article, we will go through what Xcode and Swift are in general and cover their features, strengths, and weaknesses, followed by how Swift is compatible with Xcode.

Xcode

Xcode was first released in 2003 as version 1, with the latest stable release being version 10.2.1, released on 17 April 2019. It can be downloaded from the Mac App Store and is free to use for macOS Mojave users. Registered developers may download preview releases and previous versions of the suite via the Apple Developer website.

Overview of the major features

Support: Xcode supports source code in programming languages such as C, C++, Objective-C, Objective-C++, Java, AppleScript, Python, Ruby, ResEdit (Rez), and Swift, along with a variety of programming models including Cocoa, Carbon, and Java. In addition, third parties provide support for GNU Pascal, Free Pascal, Ada, C#, Perl, and D.
Capability: Xcode can build fat binary files that include code for multiple architectures in the Mach-O executable format.
Known as universal binary files, these allow an application to run on both PowerPC and Intel-based (x86) platforms, and can include both 32-bit and 64-bit code.
Compiling and debugging: Xcode uses the iOS SDK to compile and debug applications for iOS that run on ARM architecture processors.
GUI tool: Xcode includes the GUI tool Instruments, which runs on top of DTrace, a dynamic tracing framework designed by Sun Microsystems and released as part of OpenSolaris.

Advantages and disadvantages of Xcode

Xcode is designed by Apple and will only work with Apple operating systems: macOS, iOS, iPadOS, watchOS, and tvOS. Since its release in 2003, Xcode has improved significantly, and the latest version, Xcode 10.2.1, has all the features needed to perform continuous integration. Let us have a look at the pros of using Xcode:

Equipped with a well-designed and easy-to-use UI creator
Excellent code completion
Lets a developer learn profiling and heap analysis in a natural way
Its simulator lets you easily test your app while you build it, in an environment that simulates your iPhone
The App Store has a wide audience willing to pay for apps
Now, the cons:

Clunky and outdated Objective-C is frustrating if you are used to a modern language
No support for tabbed work environments makes it difficult to work with multiple windows
Hardly any information can be found online to solve problems, due to a previous Apple NDA on Xcode development
Exporting your app onto a device is a complicated process
Works only with Apple operating systems
The App Store approval process can be annoyingly lengthy

Swift

Swift was launched at Apple's 2014 Worldwide Developers Conference as a general-purpose, multi-paradigm, compiled programming language for iOS, macOS, watchOS, tvOS, Linux, and z/OS. As a newcomer to these operating systems, Swift builds on the best parts of C and Objective-C without being held back by C compatibility. It adopts safe programming patterns and adds modern features, making programming easier and more flexible. Creating the base for Swift took Apple quite some time, spent advancing its existing compiler, debugger, and framework infrastructure. Automatic Reference Counting was used to simplify memory management. The framework stack, built on the solid base of Cocoa and Foundation, has undergone significant changes and has been modernized and standardized throughout.

Developers who have worked with Objective-C will find Swift quite familiar. Swift adopts the readability of Objective-C's named parameters and the power of its dynamic object model. Developers can use Swift to access the existing Cocoa frameworks and to mix and match with Objective-C code. On this common ground, Swift offers many new features and unifies the object-oriented and procedural portions of the language. The idea is to create the best possible language for a wide range of uses, from desktop and mobile apps to systems programming and scaling up to cloud services.
Swift was designed to make it easy for developers to write and maintain correct programs. Swift code is intended to be safe, fast, and expressive. Swift offers a host of features that give developers the control needed to make code easy to read and write. Furthermore, Apple made Swift easy to understand, to help developers avoid mistakes while coding and keep their code organised, with modules that provide namespaces and eliminate headers. Swift also borrows some features from other languages, one of them being named parameters written in a clean syntax that makes APIs much easier to read and maintain.

Here are some of the additional features of Swift:

Multiple return values and tuples
Generics
Short and quick iteration over a collection or range
Structs that support extensions, methods, and protocols
Functional programming patterns
Advanced control flow
Powerful error handling

These features are designed to work together, resulting in a powerful but fun-to-use language.

Advantages and disadvantages of Swift

Pros of using the Swift programming language:

Easy to read and maintain: Swift code reads close to natural English and borrows syntax from other programming languages, which makes the language more expressive.
Scalable: Users can add more features to Swift, making it a scalable programming language.
Going forward, Swift, not Objective-C, is the language Apple is relying on.
Concise: Swift does not require long lines of code, which favours developers who want a concise syntax and speeds up development and testing.
Safety and improved performance: Swift is reported to be almost 40% better than Objective-C in speed and performance, and bugs are easier to tackle, which leads to safer programming.
Cross-device support: The language works across a wide range of Apple platforms, such as iOS, macOS, tvOS, and watchOS.
Automatic memory management: Swift uses Automatic Reference Counting to help prevent memory leaks and optimize the application's performance.

Cons of Swift:

Compatibility issues: Updated versions of Swift have been found to be a bit unstable with newer Apple releases, leading to a few issues. Switching to a newer version of Swift is the fix, but that is costly.
Speed issues: This is relevant to the earlier versions of the Swift programming language.
Fewer developers: The number of Swift developers is limited, as Swift is a new programming language.
Delay in uploading apps: Apps written in Swift can be uploaded to the App Store only after iOS 8 and Xcode 6 are released. The estimated time for release is reported to be September-October 2014.

Conclusion

Having discussed both Xcode and Swift, it is clear that they cannot be compared with each other. In fact, they complement each other to deliver impressive results without any headaches. Apple relies heavily on both, and together Xcode and Swift make a strong combination of a robust application and a user-friendly programming language.