1.1. Excel Files¶
Microsoft Excel is ubiquitous. We use Excel to store data on customers, inventory, and employees. We use it to track operations, sales, and financials. The list of ways people use Excel in business is long and diverse. Because Excel is such an integral tool in business, knowing how to process Excel files in Python will enable you to add Python into your data processing workflows, receiving data from other people and sharing results with them in ways they’re comfortable with.
Unlike Python’s csv module, there is not a standard module in Python for processing Excel files (i.e., files with the or extension). To complete the .xls .xlsx examples in this section, you need to have the xlrd and xlwt packages. The xlrd and xlwt packages enable Python to process Excel files on any operating system, and they have strong support for Excel dates. If you installed Anaconda Python, then you already have the packages because they’re bundled into the installation.
A few words on terminology: when I refer to an “Excel file” that’s the same thing as an “Excel workbook.” An Excel workbook contains one or more Excel worksheets.In this lesson, I’ll be using the words “file” and “workbook” interchangeably, and I’ll refer to the individual worksheets within a workbook as worksheets.
We’ll go through each of the examples here in base Python, so you can see every logical step in the data processing, and then using pandas, so you can have a (usually) shorter and more concise example—though one that’s a bit more abstract—if you want to copy and modify it for use in your work.
To get started with the examples in this lesson, we need to create an Excel workbook:
Open Microsoft Excel.
Add three separate worksheets to the workbook and name them , , and . Then add the data as shown in Figure 1- 1, Figure 1-2, and Figure 1-3, respectively. march_2013 january_2013 february_2013
Save the workbook as sales_2013.xlsx .
Figure 1-1. Worksheet 1: january_2013
Figure 1-2. Worksheet 2: february_2013
Figure 1-3. Worksheet 3: march_2013