computation graphs
map
reduce
join
sort
MapReduce
This is a library for running calculations on tables with computational graphs.
A computational graph is a predefined sequence of operations that can then be applied to different data sets.
A table is a sequence of dictionaries, where each dictionary is a row of the table
and each dictionary key is a column of the table.
For simplicity, we can assume that all rows in the input tables contain the same set of keys.
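As a small illustration of this definition, a table can be written as a plain Python list of dictionaries. The column names (`doc_id`, `text`) are made up for this sketch and are not part of the library.

```python
# A table: each dictionary is a row, each key is a column.
# The column names below are hypothetical, chosen only for illustration.
table = [
    {'doc_id': 1, 'text': 'hello world'},
    {'doc_id': 2, 'text': 'hello graph'},
]

# Check the simplifying assumption: every row has the same set of keys.
columns = set(table[0])
same_columns = all(set(row) == columns for row in table)
print(sorted(columns), same_columns)
```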
Computational graphs let you separate the description of a sequence of operations from its execution. This means you can run the operations in a different environment (for example, describe a graph in a Python interpreter and then execute it on a GPU), or run them independently and in parallel on multiple machines of a computing cluster to process a large amount of input data in a reasonable time.
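The idea of separating description from execution can be sketched in a few lines of plain Python. `TinyPipeline` and `add_length` below are hypothetical names invented for this sketch; they are not the library's API and only show the principle.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict
Step = Callable[[Iterator[Row]], Iterator[Row]]


class TinyPipeline:
    """Hypothetical stand-in for a computational graph: it only stores steps."""

    def __init__(self) -> None:
        self._steps: List[Step] = []

    def add(self, step: Step) -> 'TinyPipeline':
        # Describing the pipeline does not touch any data yet.
        self._steps.append(step)
        return self

    def run(self, rows: Iterable[Row]) -> List[Row]:
        # Execution happens only here, on whatever input is supplied.
        stream: Iterator[Row] = iter(rows)
        for step in self._steps:
            stream = step(stream)
        return list(stream)


def add_length(rows: Iterator[Row]) -> Iterator[Row]:
    # Adds a 'length' column computed from the 'text' column.
    for row in rows:
        yield {**row, 'length': len(row['text'])}


pipeline = TinyPipeline().add(add_length)

# The same description is applied to two different inputs.
first = pipeline.run([{'text': 'abc'}])
second = pipeline.run([{'text': 'hello'}, {'text': ''}])
```

Because the pipeline holds only the list of steps, the same object can be reused on any number of inputs.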
A computational graph consists of data entry points and operations on them.
- graph_from_iter
graph = Graph.graph_from_iter('input')
- rows_from_file
iter_of_rows = Graph.rows_from_file(filename, parser)
- graph_copy
another_graph = Graph.graph_copy(graph)
- Map - an operation that takes one row and returns one row
- Reduce - an operation that takes rows grouped by keys and returns some rows
- Join - an operation that joins two graphs into one
- Sort - an operation that sorts rows by keys
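The meaning of the four operations can be illustrated on plain lists of dictionaries using only the standard library. This is a sketch of the semantics, not the library's implementation: the column names are made up, the join shown is an inner join on one key, and it combines two tables rather than two graphs.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {'word': 'b', 'count': 1},
    {'word': 'a', 'count': 2},
    {'word': 'b', 'count': 3},
]

# Map: one row in, one row out.
mapped = [{**row, 'count': row['count'] * 10} for row in rows]

# Sort: order rows by the given key column.
key = itemgetter('word')
ordered = sorted(mapped, key=key)

# Reduce: consume each group of rows sharing a key and emit rows per group.
# itertools.groupby only groups adjacent rows, so the sort comes first.
reduced = [
    {'word': word, 'total': sum(r['count'] for r in group)}
    for word, group in groupby(ordered, key=key)
]

# Join: combine rows from two tables that share a key value (inner join).
labels = [{'word': 'a', 'label': 'first'}, {'word': 'b', 'label': 'second'}]
label_by_word = {row['word']: row for row in labels}
joined = [
    {**row, **label_by_word[row['word']]}
    for row in reduced
    if row['word'] in label_by_word
]
```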
After describing the graph, you need to run it.
graph.run(input=lambda: iter([{'key': 'value'}]))
or
graph.run(input=lambda: Graph.rows_from_file(filename, parser))
You can run a graph any number of times with different inputs.
pip install compgraph
Create a graph from a stack of operations, then run it on your own data.
graph = (
    Graph().operation1(...)
    .operation2(...)
    .operation3(...)
)
result = graph.run()
You can run an example with python run_TASK from the 'examples' directory.
Unarchive 'extract_me.tgz' first.
With the default resources, a run can take some time.
- word_count
Constructs a graph that counts words in a text
python run_word_count [OUTPUT_FILENAME] [INPUT_FILENAME]
- inverted_index
Constructs a graph that calculates tf-idf for every word/document pair and keeps the top N (3 by default)
python run_inverted_index [OUTPUT_FILENAME] [INPUT_FILENAME] [-n INT]
- pmi
Constructs a graph that gives, for every document, the top N (10 by default) words ranked by pointwise mutual information
python run_pmi [OUTPUT_FILENAME] [INPUT_FILENAME] [-n INT]
- yandex_maps
Constructs a graph that measures average speed in km/h depending on the weekday and hour
python run_yandex_maps [OUTPUT_FILENAME] [INPUT_TIME_FILENAME] [INPUT_LENGTH_FILENAME] [--graphic PATH]
You can create a heatmap image with the --graphic/-g option
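To give a feel for what the word_count example above computes, here is a minimal standard-library sketch of the same split, sort, and reduce idea. It does not use the library, the column names and the `split_words` helper are hypothetical, and the splitting step emits several rows per input row as a simplification.

```python
import re
from itertools import groupby
from operator import itemgetter

# Hypothetical input table: one row per document.
docs = [
    {'doc_id': 1, 'text': 'Hello, world'},
    {'doc_id': 2, 'text': 'hello graph'},
]


def split_words(rows):
    # Lowercase the text and emit one row per word.
    for row in rows:
        for word in re.findall(r'\w+', row['text'].lower()):
            yield {'doc_id': row['doc_id'], 'word': word}


# Sort by word so that equal words are adjacent, then count each group.
word_rows = sorted(split_words(docs), key=itemgetter('word'))
counts = [
    {'word': word, 'count': sum(1 for _ in group)}
    for word, group in groupby(word_rows, key=itemgetter('word'))
]
```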
pytest compgraph