Componentized tutorial | PyF, flow-based python programming

Componentized tutorial

The componentized way of working with PyF lets you describe your network in XML, so that PyF itself takes care of the glue code.

The code is now extremely simple since everything lies inside the XML definition:

from pyf.componentized import Manager
from pyf.componentized import ET

# networkxml is a string holding the XML network definition
manager = Manager(ET.fromstring(networkxml))
# run the process named "main" from the definition
output_files = manager.process("main")
# paths are temporary folders
print([filename for path, filename in output_files])

The networkxml variable contains this XML code:

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <process name="main">
    <node type="producer" name="source1" from="code">
      <code>
        <![CDATA[
class User(object):
    def __init__(self, name, email, level):
        self.name = name
        self.email = email
        self.level = level
def get_source():
    for index in range(0, 10):
        yield User(
                "John%04d" % index,
                "john%04d@some-where.com" % index,
                ['high', 'low', 'high', 'high', 'low'][index%5])
        ]]>
      </code>
      <children>
        <node type="adapter" name="filter1" pluginname="simple_filter">
          <expression>item.level == "high"</expression>
          <children>
            <node type="adapter" name="adapter1" pluginname="compute_attributes">
              <attributes>
                <!-- in a compute_attributes plugin, the attribute is evaluated as python code -->
                <!-- the original object is not changed, to make sure other branches are not impacted:
                the object is wrapped (adapted) in a proxy object that supports the new attribute
                -->
                <attribute name="numeric_level">int(10)</attribute>
              </attributes>
              <children><link name="csvoutput"/></children>
            </node>
          </children>
        </node>
        <node type="adapter" name="filter2" from="code">
          <code>
            <![CDATA[
def low_or_med(items):
    for item in items:
        if item.level == "low" or item.level == "med":
            yield item
        else:
            yield Ellipsis
            ]]>
          </code>
          <children>
            <node type="adapter" name="adapter2" from="code">
              <code>
                <![CDATA[
def set_numeric_level(items):
    for item in items:
        item.numeric_level = 0
        yield item
              ]]>
              </code>
              <children><link name="csvoutput"/></children>
            </node>
          </children>
        </node>
      </children>
    </node>
    <node type="consumer" pluginname="csvwriter" name="csvoutput">
      <encoding>UTF-8</encoding>
      <!-- in real life you want to have some kind of unique file name
      <target_filename>%Y%m%d-%H%M%S-user_levels.csv</target_filename>
      -->
      <target_filename>user_levels.csv</target_filename>
      <target_directory>./</target_directory>
      <delimiter>;</delimiter>
      <columns>
        <column>name</column>
        <column>email</column>
        <column attribute="numeric_level">level</column>
      </columns>
    </node>
  </process>
</config>
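
To see how the nesting of node, children and link elements lays out the network, the definition can be walked with the standard library's ElementTree. This is only a sketch for inspecting the structure, not how PyF builds the network: sample_xml is a trimmed-down stand-in for the definition above, the describe helper is made up for illustration, and reading a link as a reference to a node declared elsewhere by name is an interpretation of the example.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the network definition above (illustrative only).
sample_xml = """
<config>
  <process name="main">
    <node type="producer" name="source1" from="code">
      <children>
        <node type="adapter" name="filter1" pluginname="simple_filter">
          <children><link name="csvoutput"/></children>
        </node>
        <node type="adapter" name="filter2" from="code">
          <children><link name="csvoutput"/></children>
        </node>
      </children>
    </node>
    <node type="consumer" pluginname="csvwriter" name="csvoutput"/>
  </process>
</config>
"""


def describe(node, depth=0):
    """Return one indented line per node or link, depth-first (hypothetical helper)."""
    # A node comes from a plugin or from inline code; anything else is a link.
    origin = node.get("pluginname") or node.get("from") or "link"
    lines = ["%s%s (%s)" % ("  " * depth, node.get("name"), origin)]
    children = node.find("children")
    if children is not None:
        for child in children:
            lines.extend(describe(child, depth + 1))
    return lines


process = ET.fromstring(sample_xml).find("process")
outline = []
for top_level in process.findall("node"):
    outline.extend(describe(top_level))
print("\n".join(outline))
```

The outline makes the shape visible at a glance: one producer fanning out into two branches, both of which link to the single consumer declared at the top level of the process.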

So, let’s see what we have here:

  • Our producer is made from code (without using a plugin).

    Hint

    To get more information about nodes from code, see pyf.componentized.components.CodeComponent

  • Under our producer, we have two filters:
    • filter2, which uses a code component (yielding items or Ellipsis),
    • filter1, which uses the SimpleFilter plugin.
  • To adapt our objects, two approaches are shown here:
    • adapter2 uses a code component (modifying and yielding items),
    • adapter1 uses the ComputeAttributes plugin, which does not modify the object but adds a getter on an adapter proxy object.
  • Finally, our output is handled by a CSVWriter plugin.
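
The two branches can be mimicked with plain Python generators to show the ideas at work: a filter that yields Ellipsis in place of rejected items, an adapter that mutates items, and a proxy that adds an attribute without touching the original. This is a rough sketch, not PyF's implementation. AttributeProxy, high_only, drop_ellipsis and add_numeric_level are names invented here, and the way Ellipsis placeholders are skipped downstream is an assumption.

```python
class User(object):
    def __init__(self, name, email, level):
        self.name = name
        self.email = email
        self.level = level


class AttributeProxy(object):
    """Hypothetical wrapper: adds attributes, leaves the wrapped object untouched."""

    def __init__(self, wrapped, **extra):
        self._wrapped = wrapped
        self._extra = extra

    def __getattr__(self, name):
        # Called only when normal lookup fails: try the extra attributes,
        # then fall back to the wrapped object.
        if name in self._extra:
            return self._extra[name]
        return getattr(self._wrapped, name)


def get_source():
    levels = ['high', 'low', 'high', 'high', 'low']
    for index in range(10):
        yield User("John%04d" % index,
                   "john%04d@some-where.com" % index,
                   levels[index % 5])


def high_only(items):
    # Stand-in for the simple_filter expression: item.level == "high"
    for item in items:
        if item.level == "high":
            yield item


def low_or_med(items):
    # Same logic as filter2 in the XML: rejected items become Ellipsis.
    for item in items:
        yield item if item.level in ("low", "med") else Ellipsis


def drop_ellipsis(items):
    # Assumption: placeholders are skipped before reaching the next node.
    return (item for item in items if item is not Ellipsis)


def add_numeric_level(items):
    # Stand-in for compute_attributes: wrap instead of mutate.
    for item in items:
        yield AttributeProxy(item, numeric_level=int(10))


def set_numeric_level(items):
    # Same logic as adapter2 in the XML: mutate in place.
    for item in items:
        item.numeric_level = 0
        yield item


users = list(get_source())  # both branches see the same objects
branch1 = list(add_numeric_level(high_only(users)))
branch2 = list(set_numeric_level(drop_ellipsis(low_or_med(users))))
rows = [(u.name, u.email, u.numeric_level) for u in branch1 + branch2]
print(len(branch1), len(branch2))
```

Because the first branch wraps its items, the "high" users themselves never gain a numeric_level attribute, while the "low" users in the second branch are modified directly.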

Because we print it, the program outputs the following. The network manager's .process() method in fact returns each filename together with its path, relative to your process; the example prints only the filenames:

['user_levels.csv']
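
Since the example unpacks each result as a (path, filename) pair, the full location of every file can be rebuilt with os.path.join. The sketch below uses a hand-written stand-in list instead of a real manager.process() result; the /tmp/pyf-run path is a placeholder, not something PyF is known to produce.

```python
import os.path

# Stand-in for the value returned by manager.process("main") (placeholder data).
output_files = [("/tmp/pyf-run", "user_levels.csv")]

# What the tutorial prints: filenames only.
filenames = [filename for path, filename in output_files]
# Where the files actually live: path and filename joined.
full_paths = [os.path.join(path, filename) for path, filename in output_files]

print(filenames)
print(full_paths)
```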