Extend pyexcel-io Tutorial

You are welcome to extend pyexcel-io to read and write more tabular formats. No. 1 rule, your plugin must have a prefix ‘pyexcel_’ in its module path. For example, pyexcel-xls has ‘pyexcel_xls’ as its module path. Otherwise, pyexcel-io will not load your plugin.

On github, you will find two examples in examples folder. This section explains its implementations to help you write yours.

Note

No longer, you will need to do explicit imports for pyexcel-io extensions. Instead, you install them and manage them via pip.

Simple Reader for a yaml file

Suppose we have a yaml file, containing a dictionary where the values are two dimensional array. The task is to write a reader plugin to pyexcel-io so that we can use get_data() to read yaml file out.

sheet 1:
- - 1
  - 2
  - 3
- - 2
  - 3
  - 4
sheet 2:
- - A
  - B
  - C

Implement IReader

First, let’s impolement reader interface as below. Three implementations are required:

  1. content_array attribute, is expected to be a list of NamedContent
  2. read_sheet function, read sheet content by its index.
  3. close function, to clean up any file handle
class YourReader(IReader):
    def __init__(self, file_name, file_type, **keywords):
        self.file_handle = open(file_name, "r")
        self.native_book = yaml.load(self.file_handle)
        self.content_array = [
            NamedContent(key, values)
            for key, values in self.native_book.items()
        ]

    def read_sheet(self, sheet_index):
        two_dimensional_array = self.content_array[sheet_index].payload
        return YourSingleSheet(two_dimensional_array)

    def close(self):
        self.file_handle.close()

Implement ISheet

YourSingleSheet makes this simple task complex in order to show case its inner workings. Two abstract functions require implementation:

  1. row_iterator: should return a row: either content arry or content index as long as
    column_iterator understands
  2. column_iterator: should return cell values one by one.
class YourSingleSheet(ISheet):
    def __init__(self, your_native_sheet):
        self.two_dimensional_array = your_native_sheet

    def row_iterator(self):
        yield from self.two_dimensional_array

    def column_iterator(self, row):
        yield from row

Plug in pyexcel-io

Last thing is to register with pyexcel-io about your new reader. relative_plugin_class_path meant reference from current module, how to refer to YourReader. locations meant the physical presence of the data source: “file”, “memory” or “content”. “file” means files on physical disk. “memory” means a file stream. “content” means a string buffer. stream_type meant the type of the stream: binary for BytesIO and text for StringIO.

IOPluginInfoChainV2(__name__).add_a_reader(
    relative_plugin_class_path="YourReader",
    locations=["file"],
    file_types=["yaml"],
    stream_type="text",
)

Test your reader

Let’s run the following code and see if it works.

if __name__ == "__main__":
    data = get_data("test.yaml")
    print(data)

You would see these in standard output:

$ python custom_yaml_reader.py
OrderedDict([('sheet 1', [[1, 2, 3], [2, 3, 4]]), ('sheet 2', [['A', 'B', 'C']])])

A writer to write content in yaml

Now for the writer, let’s write a pyexcel-io writer that write a dictionary of two dimentaional arrays back into a yaml file seen above.

Implement IWriter

Two abstract functions are required:

  1. create_sheet creates a native sheet by sheet name, that understands how to code up the native sheet. Interestingly, it returns your sheet.
  2. close function closes file handle if any.
class MyWriter(IWriter):
    def __init__(self, file_name, file_type, **keywords):
        self.file_name = file_name
        self.content = {}

    def create_sheet(self, name):
        array = []
        self.content[name] = array
        return MySheetWriter(array)

    def close(self):
        with open(self.file_name, "w") as f:
            f.write(yaml.dump(self.content, default_flow_style=False))

Implement ISheetWriter

It is imagined that you will have your own sheet writer. You simply need to figure out how to write a row. Row by row write action was already written by ISheetWrier.

class MySheetWriter(ISheetWriter):
    def __init__(self, sheet_reference):
        self.native_sheet = sheet_reference

    def write_row(self, data_row):
        self.native_sheet.append(data_row)

    def close(self):

Plug in pyexcel-io

Like the reader plugin, we register a writer.

IOPluginInfoChainV2(__name__).add_a_writer(
    relative_plugin_class_path="MyWriter",
    locations=["file"],
    file_types=["yaml"],
    stream_type="text",
)

Test It

Let’s run the following code and please examine mytest.yaml yourself.

if __name__ == "__main__":
    data_dict = {
        "sheet 1": [[1, 3, 4], [2, 4, 9]],
        "sheet 2": [["B", "C", "D"]],
    }

    save_data("mytest.yaml", data_dict)

And you shall find a file named ‘mytest.yaml’:

$ cat mytest.yaml
sheet 1:
- - 1
  - 3
  - 4
- - 2
  - 4
  - 9
sheet 2:
- - B
  - C
  - D

Other pyexcel-io plugins

Get xls support

Here’s what is needed:

>>> from pyexcel_io import save_data
>>> data = [[1,2,3]]
>>> save_data("test.xls", data)

And you can also get the data back:

>>> from pyexcel_io import get_data
>>> data = get_data("test.xls")
>>> data['pyexcel_sheet1']
[[1, 2, 3]]

Other formats

As illustrated above, you can start to play with pyexcel-xlsx, pyexcel-ods and pyexcel-ods3 plugins.