5.1. Count Number of Workbooks and Rows and

Columns in Each Workbook In some cases, you may know the contents of the workbooks you’re dealing with; however, sometimes you didn’t create them so you don’t yet know their contents. Unlike CSV files, Excel workbooks can contain multiple worksheets, so if you’re unfamiliar with the workbooks, it’s important to get some descriptive information about them before you start processing them.

To count the number of workbooks in a folder, the number of worksheets in each workbook, and the number of rows and columns in each worksheet, type the following code into a text editor and save the file as 12excel_introspect_all_workbooks.py1:

>>> #!/usr/bin/env python3
>>> import glob
>>> import os
>>> import sys
>>> from xlrd import open_workbook
>>> input_directory = sys.argv[1]
>>> workbook_counter = 0
>>> for input_file in glob.glob(os.path.join(input_directory, '*.xls*')):
>>>
>>>     workbook = open_workbook(input_file)
>>>     print('Workbook: %s' %os.path.basename(input_file))
>>>
>>>     print('Number of worksheets: %d' %workbook.nsheets)
>>>     for worksheet in workbook.sheets():
>>>
>>>         print('Worksheet name:', worksheet.name,'\tRows:',\
>>>
>>>         worksheet.nrows, '\tColumns:',worksheet.ncols)
>>>
>>> workbook_counter += 1
>>> print('Number of Excel workbooks: %d' %(workbook_counter))
Number of Excel workbooks: 1

Lines 2 and 3 import Python’s built-in glob and os modules, respectively, so we can use their functions to identify and parse the pathnames of the files we want to process.

Line 8 uses Python’s built-in glob and os modules to create the list of input files that we want to process and applies a for loop over the list of input files. This line enables us to iterate over all of the workbooks we want to process.

Lines 10 to 14 print information about each workbook to the screen. Line 10 prints the name of the workbook. Line 11 prints the number of worksheets in the workbook. Lines 13 and 14 print the names of the worksheets in the workbook and the number of rows and columns in each worksheet.

To run the script, type the following on the command line and hit Enter:

python 12excel_introspect_all_workbooks.py “C::raw-latex:`\Users`:raw-latex:`\Clinton`:raw-latex:`\Desktop`

You should then see the output shown in Figure 1-12 printed to your screen.

_images/12.png

Figure 1-12. Output of Python script for processing multiple workbooks

The output shows that the script processed three workbooks. It also shows the names of the threesales_2013.xlsx workbooks (e.g., ), the names of the three worksheets in each workbook (e.g., ), and the number of rows and columns in each worksheet (e.g., 7 rows and 5 columns). january_2013

Printing some descriptive information about files you plan to process is useful when you’re less familiar with the files. Understanding the number of files and the number of rows and columns in each file gives you some idea about the size of the processing job as well as the consistency of the file layouts.