- pyf.dataflow.components.all_true(sources, out)
A function that checks that all the sources are true. For each row, if every source yields a true value, it returns True, otherwise False. Useful at the end of a dataflow chain to verify that everything worked and to synchronize chain sizes.
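As a rough illustration (not pyf's actual component code, which also manages the `out` port and component wiring), the per-row logic can be sketched as a plain Python generator:

```python
def all_true(sources):
    """Sketch: yield True for each row where every source is true.
    The real pyf component also handles its `out` port; omitted here."""
    for row in zip(*sources):
        yield all(row)

# Each row is checked across all sources; the third row fails
# because the first source yields False there.
statuses = list(all_true([[True, True, False], [True, True, True]]))
```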
- pyf.dataflow.components.all_true_longest(sources, out)
A function that checks that all the sources still flowing are true. If a source has no more items (it is too short), it is treated as True. If all the sources are true, it returns True, otherwise False. Useful at the end of a dataflow chain to verify that everything worked.
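The "exhausted sources count as True" behavior maps naturally onto `itertools.zip_longest` with a `True` fill value. A hedged sketch (again omitting pyf's component wiring):

```python
from itertools import zip_longest

def all_true_longest(sources):
    """Sketch: like all_true, but a source that runs out of items
    counts as True for the remaining rows."""
    for row in zip_longest(*sources, fillvalue=True):
        yield all(row)

# The second source is shorter; its missing third item is assumed True:
statuses = list(all_true_longest([[True, False, True], [True, True]]))
```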
- pyf.dataflow.components.bufferize(source, out, chunk_size=20)
Groups items into buffers. Useful to write N items at once. IN: source, an iterator. OUT: an iterator of groups. kwarg chunk_size: size of the groups.
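The grouping behavior can be sketched as a standard chunking generator (a plain-Python approximation, not pyf's implementation):

```python
def bufferize(source, chunk_size=20):
    """Sketch: group items from `source` into lists of up to chunk_size."""
    buf = []
    for item in source:
        buf.append(item)
        if len(buf) == chunk_size:
            yield buf
            buf = []
    if buf:           # flush the last, possibly smaller, group
        yield buf

groups = list(bufferize(range(7), chunk_size=3))
```

Note the final group may be smaller than `chunk_size` when the source length is not a multiple of it.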
- pyf.dataflow.components.inc(value, out, step=1)
Increments each value by step. Useful to add a column to a table when passing lists as values and a list as step (list addition concatenates).
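A minimal sketch of the increment logic. The "add a column" trick works because in Python `list + list` concatenates, so the same `value + step` expression covers both numbers and list rows (this is my reading of the description, not pyf's verbatim code):

```python
def inc(values, step=1):
    """Sketch: add `step` to every value. Because list + list
    concatenates in Python, passing list rows and a list step
    appends a column to each row."""
    for value in values:
        yield value + step

numbers = list(inc([1, 2, 3]))                        # plain increment
rows = list(inc([[1, 'a'], [2, 'b']], step=['x']))    # add a column
```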
- pyf.dataflow.components.splitm(source, out, size=3)
Splits a data source into n sources (n given by the size kwarg).
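The split semantics are not spelled out here; one common reading in dataflow graphs is that each of the n branches sees the full stream, which `itertools.tee` provides. Treat this as an assumption, not pyf's documented behavior:

```python
from itertools import tee

def splitm(source, size=3):
    """Speculative sketch: produce `size` independent iterators over
    one source. Whether pyf duplicates or distributes items is an
    assumption; this version duplicates the stream via itertools.tee."""
    return tee(source, size)

a, b, c = splitm(iter([1, 2, 3]))
```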
- pyf.dataflow.components.status_lookup(sources, out, buffer_num_getters=None, clear_each=10, fillvalue=True)
A function that yields statuses (True/False) based on the statuses coming from its sources.
It uses a list of buffer getters (functions) to know whether any result is left in the buffer of a particular source.
Every clear_each-th iteration it checks the buffers and consumes them.
This component is especially useful to synchronize sources so that they consume their own sources in a pseudo-synchronous way (useful in tree-shaped flows).
About performance: with clear_each set to 1 you get very low memory consumption, but the loop runs more slowly. For relatively small records, a value between 10 and 100 is recommended.
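A very rough, speculative sketch of the polling behavior described above. The real component's buffer protocol is not documented here, so the getter handling below (calling each getter drains its buffer) is an assumption made purely to illustrate the `clear_each` trade-off:

```python
from itertools import zip_longest

def status_lookup(sources, buffer_num_getters=None, clear_each=10,
                  fillvalue=True):
    """Speculative sketch: combine per-row statuses, padding exhausted
    sources with `fillvalue`, and poll the buffer getters every
    `clear_each` rows so buffered sources get consumed. Lower
    clear_each means more frequent polling: less buffered memory,
    more per-row overhead."""
    getters = buffer_num_getters or []
    for i, row in enumerate(zip_longest(*sources, fillvalue=fillvalue), 1):
        if i % clear_each == 0:
            for getter in getters:   # assumed protocol: calling the
                getter()             # getter checks/drains the buffer
        yield all(row)

polls = []
statuses = list(status_lookup([[True, True, True, False]],
                              buffer_num_getters=[lambda: polls.append(1)],
                              clear_each=2))
```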
- pyf.dataflow.components.sum(sources, out)
Yields the sum of the data sources for each row. Can also be used to concatenate lists.
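A sketch of the row-wise summing behavior (renamed `sum_rows` to avoid shadowing the builtin; the pyf component itself is named `sum`). List rows concatenate for the same reason as in `inc`: `list + list` joins them:

```python
from functools import reduce
from operator import add

def sum_rows(sources):
    """Sketch of the `sum` component: yield the sum across all
    sources for each row. List rows concatenate, since list + list
    joins them."""
    for row in zip(*sources):
        yield reduce(add, row)

totals = list(sum_rows([[1, 2, 3], [10, 20, 30]]))    # numeric sums
joined = list(sum_rows([[[1, 'a']], [[2, 'b']]]))     # list concatenation
```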