pyexcel-io - Let you focus on data, instead of file formats

Author:
  1. Wang
Source code:

http://github.com/pyexcel/pyexcel-io.git

Issues:

http://github.com/pyexcel/pyexcel-io/issues

License:

New BSD License

Released:

0.5.6

Generated:

Jan 11, 2018

Introduction

pyexcel-io provides one application programming interface(API) to read and write data in different excel formats. It makes information processing involving excel files a simple task. The data in excel files can be turned into an ordered dictionary with least code. This library focuses on data processing using excel files as storage media hence fonts, colors and charts were not and will not be considered.

It was created due to the lack of uniform programming interface to access data in different excel formats. A developer needs to use different methods of different libraries to read the same data in different excel formats, hence the resulting code is cluttered and unmaintainable. This is a challenge posed by users who do not know or care about the differences in excel file formats. Instead of educating the users about the specific excel format a data processing application supports, the library takes up the challenge and promises to support all known excel formats.

All great work have done by individual library developers. This library unites only the data access API. With that said, pyexcel-io also bring something new on the table: “csvz” and “tsvz” format, new format names as of 2014. They are invented and supported by pyexcel-io.

Installation

You can install pyexcel-io via pip:

$ pip install pyexcel-io

or clone it and install it:

$ git clone https://github.com/pyexcel/pyexcel-io.git
$ cd pyexcel-io
$ python setup.py install

For individual excel file formats, please install them as you wish:

A list of file formats supported by external plugins
Package name Supported file formats Dependencies Python versions
pyexcel-io csv, csvz [1], tsv, tsvz [2]   2.6, 2.7, 3.3, 3.4, 3.5, 3.6 pypy
pyexcel-xls xls, xlsx(read only), xlsm(read only) xlrd, xlwt same as above
pyexcel-xlsx xlsx openpyxl same as above
pyexcel-ods3 ods pyexcel-ezodf, lxml 2.6, 2.7, 3.3, 3.4 3.5, 3.6
pyexcel-ods ods odfpy same as above
Dedicated file reader and writers
Package name Supported file formats Dependencies Python versions
pyexcel-xlsxw xlsx(write only) XlsxWriter Python 2 and 3
pyexcel-xlsxr xlsx(read only) lxml same as above
pyexcel-odsr read only for ods, fods lxml same as above
pyexcel-htmlr html(read only) lxml,html5lib same as above

In order to manage the list of plugins installed, you need to use pip to add or remove a plugin. When you use virtualenv, you can have different plugins per virtual environment. In the situation where you have multiple plugins that does the same thing in your environment, you need to tell pyexcel which plugin to use per function call. For example, pyexcel-ods and pyexcel-odsr, and you want to get_array to use pyexcel-odsr. You need to append get_array(…, library=’pyexcel-odsr’).

Footnotes

[1]zipped csv file
[2]zipped tsv file

After that, you can start get and save data in the loaded format. There are two plugins for the same file format, e.g. pyexcel-ods3 and pyexcel-ods. If you want to choose one, please try pip uninstall the un-wanted one. And if you want to have both installed but wanted to use one of them for a function call(or file type) and the other for another function call(or file type), you can pass on “library” option to get_data and save_data, e.g. get_data(.., library=’pyexcel-ods’)

Note

pyexcel-text is no longer a plugin of pyexcel-io but a direct plugin of pyexcel

Plugin compatibility table
pyexcel-io xls xlsx ods ods3 odsr xlsxw
0.5.1 0.5.0 0.5.0 0.5.0 0.5.0 0.5.0 0.5.0
0.4.x 0.4.x 0.4.x 0.4.x 0.4.x 0.4.x 0.4.x
0.3.0+ 0.3.0+ 0.3.0 0.3.0+ 0.3.0+ 0.3.0 0.3.0
0.2.2+ 0.2.2+ 0.2.2+ 0.2.1+ 0.2.1+   0.0.1
0.2.0+ 0.2.0+ 0.2.0+ 0.2.0 0.2.0   0.0.1

API

get_data(afile[, file_type, streaming]) Get data from an excel file source
save_data(afile, data[, file_type]) Save data to an excel file source

Indices and tables