pyexcel - Let you focus on data, instead of file formats

Author:C.W.
Source code:http://github.com/pyexcel/pyexcel.git
Issues:http://github.com/pyexcel/pyexcel/issues
License:New BSD License
Development:0.6.0
Released:0.5.3
Generated:Sep 22, 2017

Note

The documentation of pyexcel v0.6.0 is under review and rewrite. If you have missed some information, please read v0.5.3

Introduction

pyexcel provides single application programming interface(API) to read, write and manipulate data in different excel file formats, in different storage media(disk, memory, database) and in different python data structures. Its loosely coupled architecture makes it extremely extensible.

The idea originated from the common usability problem when an excel file driven web applications is delivered for non-technical users: such as team assistant, human resource administrator. The fact is that not all people know the difference among various excel formats: csv, xls, xlsx. Instead of training those people about file formats, this library helps web developers to handle most of the excel file formats by providing a common programming interface. To add a specific excel file format to you application, all you need is to install an extra pyexcel plugin. No code change to your application. No issues with excel file formats any more. Looking at the community, this library and its associated ones try to become a small and easy to install alternative to Pandas.

Installation

You can install it via pip:

$ pip install pyexcel

or clone it and install it:

$ git clone https://github.com/pyexcel/pyexcel.git
$ cd pyexcel
$ python setup.py install

For individual excel file formats, please install them as you wish:

A list of file formats supported by external plugins
Package name Supported file formats Dependencies Python versions
pyexcel-io csv, csvz [#f1]_, tsv, tsvz [#f2]_   2.6, 2.7, 3.3, 3.4, 3.5, 3.6 pypy
pyexcel-xls xls, xlsx(read only), xlsm(read only) xlrd, xlwt same as above
pyexcel-xlsx xlsx openpyxl same as above
pyexcel-ods3 ods pyexcel-ezodf, lxml 2.6, 2.7, 3.3, 3.4 3.5, 3.6
pyexcel-ods ods odfpy same as above
Dedicated file reader and writers
Package name Supported file formats Dependencies Python versions
pyexcel-xlsxw xlsx(write only) XlsxWriter Python 2 and 3
pyexcel-odsr read only for ods, fods lxml same as above
pyexcel-htmlr html(read only) lxml,html5lib same as above

In order to manage the list of plugins installed, you need to use pip to add or remove a plugin. When you use virtualenv, you can have different plugins per virtual environment. In the situation where you have multiple plugins that does the same thing in your environment, you need to tell pyexcel which plugin to use per function call. For example, pyexcel-ods and pyexcel-odsr, and you want to get_array to use pyexcel-odsr. You need to append get_array(..., library=’pyexcel-odsr’).

Footnotes

[1]zipped csv file
[2]zipped tsv file

For compatibility tables of pyexcel-io plugins, please click here

Plugin compatibility table
pyexcel pyexcel-io pyexcel-text pyexcel-handsontable pyexcel-pygal pyexcel-gantt
0.6.0+ 0.4.0+ 0.2.6+ 0.0.1 0.0.1 0.0.1
0.5.0+ 0.4.0+ 0.2.6+ 0.0.1 0.0.1 0.0.1
0.4.0+ 0.3.0+ 0.2.5      
a list of support file formats
file format definition
csv comma separated values
tsv tab separated values
csvz a zip file that contains one or many csv files
tsvz a zip file that contains one or many tsv files
xls a spreadsheet file format created by MS-Excel 97-2003 [#f1]_
xlsx MS-Excel Extensions to the Office Open XML SpreadsheetML File Format. [#f2]_
xlsm an MS-Excel Macro-Enabled Workbook file
ods open document spreadsheet
fods flat open document spreadsheet
json java script object notation
html html table of the data structure
simple simple presentation
rst rStructured Text presentation of the data
mediawiki media wiki table
[f1]quoted from whatis.com. Technical details can be found at MSDN XLS
[f2]xlsx is used by MS-Excel 2007, more information can be found at MSDN XLSX

Usage

Suppose you want to process the following excel data :

Here are the example usages:

>>> import pyexcel as pe
>>> records = pe.iget_records(file_name="your_file.xls")
>>> for record in records:
...     print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26
>>> pe.free_resources()

Tutorial

Indices and tables