Extend pyexcel-io Tutorial¶
You are welcome to extend pyexcel-io to read and write more tabular formats. No. 1 rule, your plugin must have a prefix ‘pyexcel_’ in its module path. For example, pyexcel-xls has ‘pyexcel_xls’ as its module path. Otherwise, pyexcel-io will not load your plugin.
On github, you will find two examples in examples folder. This section explains its implementations to help you write yours.
Note
No longer, you will need to do explicit imports for pyexcel-io extensions. Instead, you install them and manage them via pip.
Simple Reader for a yaml file¶
Suppose we have a yaml file, containing a dictionary where the values are two dimensional array. The task is to write a reader plugin to pyexcel-io so that we can use get_data() to read yaml file out.
sheet 1:
- - 1
- 2
- 3
- - 2
- 3
- 4
sheet 2:
- - A
- B
- C
Implement IReader
First, let’s impolement reader interface as below. Three implementations are required:
- content_array attribute, is expected to be a list of NamedContent
- read_sheet function, read sheet content by its index.
- close function, to clean up any file handle
class YourReader(IReader):
def __init__(self, file_name, file_type, **keywords):
self.file_handle = open(file_name, "r")
self.native_book = yaml.load(self.file_handle)
self.content_array = [
NamedContent(key, values)
for key, values in self.native_book.items()
]
def read_sheet(self, sheet_index):
two_dimensional_array = self.content_array[sheet_index].payload
return YourSingleSheet(two_dimensional_array)
def close(self):
self.file_handle.close()
Implement ISheet
YourSingleSheet makes this simple task complex in order to show case its inner workings. Two abstract functions require implementation:
- row_iterator: should return a row: either content arry or content index as long as
- column_iterator understands
- column_iterator: should return cell values one by one.
class YourSingleSheet(ISheet):
def __init__(self, your_native_sheet):
self.two_dimensional_array = your_native_sheet
def row_iterator(self):
yield from self.two_dimensional_array
def column_iterator(self, row):
yield from row
Plug in pyexcel-io
Last thing is to register with pyexcel-io about your new reader. relative_plugin_class_path meant reference from current module, how to refer to YourReader. locations meant the physical presence of the data source: “file”, “memory” or “content”. “file” means files on physical disk. “memory” means a file stream. “content” means a string buffer. stream_type meant the type of the stream: binary for BytesIO and text for StringIO.
IOPluginInfoChainV2(__name__).add_a_reader(
relative_plugin_class_path="YourReader",
locations=["file"],
file_types=["yaml"],
stream_type="text",
)
Test your reader
Let’s run the following code and see if it works.
if __name__ == "__main__":
data = get_data("test.yaml")
print(data)
You would see these in standard output:
$ python custom_yaml_reader.py
OrderedDict([('sheet 1', [[1, 2, 3], [2, 3, 4]]), ('sheet 2', [['A', 'B', 'C']])])
A writer to write content in yaml¶
Now for the writer, let’s write a pyexcel-io writer that write a dictionary of two dimentaional arrays back into a yaml file seen above.
Implement IWriter
Two abstract functions are required:
- create_sheet creates a native sheet by sheet name, that understands how to code up the native sheet. Interestingly, it returns your sheet.
- close function closes file handle if any.
class MyWriter(IWriter):
def __init__(self, file_name, file_type, **keywords):
self.file_name = file_name
self.content = {}
def create_sheet(self, name):
array = []
self.content[name] = array
return MySheetWriter(array)
def close(self):
with open(self.file_name, "w") as f:
f.write(yaml.dump(self.content, default_flow_style=False))
Implement ISheetWriter
It is imagined that you will have your own sheet writer. You simply need to figure out how to write a row. Row by row write action was already written by ISheetWrier.
class MySheetWriter(ISheetWriter):
def __init__(self, sheet_reference):
self.native_sheet = sheet_reference
def write_row(self, data_row):
self.native_sheet.append(data_row)
def close(self):
Plug in pyexcel-io
Like the reader plugin, we register a writer.
IOPluginInfoChainV2(__name__).add_a_writer(
relative_plugin_class_path="MyWriter",
locations=["file"],
file_types=["yaml"],
stream_type="text",
)
Test It
Let’s run the following code and please examine mytest.yaml yourself.
if __name__ == "__main__":
data_dict = {
"sheet 1": [[1, 3, 4], [2, 4, 9]],
"sheet 2": [["B", "C", "D"]],
}
save_data("mytest.yaml", data_dict)
And you shall find a file named ‘mytest.yaml’:
$ cat mytest.yaml
sheet 1:
- - 1
- 3
- 4
- - 2
- 4
- 9
sheet 2:
- - B
- C
- D
Other pyexcel-io plugins¶
Get xls support
Here’s what is needed:
>>> from pyexcel_io import save_data
>>> data = [[1,2,3]]
>>> save_data("test.xls", data)
And you can also get the data back:
>>> from pyexcel_io import get_data
>>> data = get_data("test.xls")
>>> data['pyexcel_sheet1']
[[1, 2, 3]]
Other formats¶
As illustrated above, you can start to play with pyexcel-xlsx, pyexcel-ods and pyexcel-ods3 plugins.