Merge pull request #122 from zacbrannelly/docs/complete-docs
DOC: Document all code & Update Website
ucokzeko authored Jun 28, 2019
2 parents 3e45779 + 41b36a9 commit d5bf38a
Showing 28 changed files with 2,132 additions and 169 deletions.
144 changes: 87 additions & 57 deletions README.md
@@ -1,86 +1,117 @@
# Surround
<p align="center">
<img src="./docs/source/temp_logo_hq.png" width="500">
</p>

Surround is a lightweight framework for serving machine learning pipelines in Python. It is designed to be flexible, easy to use and to assist data scientists by focusing them on the problem at hand rather than writing glue code. Surround began as a project at the Applied Artificial Intelligence Institute to address the following problems:
Surround is a lightweight framework for serving machine learning pipelines in Python. It is designed to be flexible, easy to use and to assist data scientists by focusing them on the problem at hand rather than writing glue code. Surround began as a project at the [Applied Artificial Intelligence Institute](https://a2i2.deakin.edu.au) to address the following problems:

* The same changes were required again and again to refactor code written by data scientists to make it ready for serving e.g. no standard way to run scripts, no standard way to handle configuration and no standard pipeline architecture.
* Existing model serving solutions focus on serving the model rather than serving an end-to-end solution. Our machine learning projects require multiple models and glue code to tie these models together.
* Existing serving approaches do not allow for the evolution of a machine learning pipeline without re-engineering the solution i.e. using a cloud API for the first release before training a custom model much later on.
* Code was commonly being commented out to run other branches as experimentation was not a first class citizen in the code being written.

**Note:** Surround is currently under heavy development!
### Used in projects by:

## Components
Here are some components in this library that you can use to build a Surround solution. ![Surround diagram](docs/flow-diagram.png)

1. Surround
A group of many stages (or it can be 1 stage) to transform data into more meaningful data. You can set the order of stages directly on your implementation or via a config file. The config file allows you to define more than 1 pipeline implementation and then you can switch between them easily.

2. Surround Data
A sharable object between stages that holds necessary information for each stage. A stage will read some information from Surround Data, process it, then put back new information that will be used by other stage(s). When you extend this class, you can add as many variables as you need to help you transform input data into output data. But there are 4 core variables that are ready for you to utilise:
* **stage_metadata** is information that can be used to identify a stage.
* **execution_time** is recorded time to complete a process.
* **errors** is information to identify failure of a stage.
* **warnings** is information when transformation is not 100% right.

3. Stage
An implementation of data transformation. Here is where Surround Data is modified to achieve the result that you need. Each stage should perform only one set of related actions. The first stage can be where you prepare the data to be processed, and the last stage can be where you populate the data to be sent back to the user.
* **operate** is a function that you need to override when you extend stage class. It should contain data transformation implementation.

4. Runner (optional) ***Implementation is coming later***
An interface to connect Surround to/from data. If you need to have this now, please look at [file adapter](examples/file-adapter) and [web server](examples/web-server) examples for implementation.
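
The component descriptions above can be illustrated with a minimal, self-contained sketch. It does not import Surround and is not Surround's implementation: the names `PipelineData`, `PrepareText`, `Uppercase` and `run_pipeline` are hypothetical, and stopping at the first error is an assumption made for the example. It only shows the idea of stages that read from and write to one shared data object carrying the four core variables.

```python
import time


class PipelineData:
    """Shared state passed between stages (a stand-in for Surround Data)."""

    def __init__(self):
        self.stage_metadata = []  # names of the stages that have run
        self.execution_time = []  # seconds taken by each stage
        self.errors = []          # failures reported by a stage
        self.warnings = []        # non-fatal problems reported by a stage
        self.text = None          # example of a user-defined variable


class Stage:
    """Base class: subclasses override operate() to transform the data."""

    def operate(self, data, config):
        raise NotImplementedError


class PrepareText(Stage):
    """First stage: prepare the data to be processed."""

    def operate(self, data, config):
        data.text = config.get("greeting", "hello")


class Uppercase(Stage):
    """Second stage: transform what the previous stage produced."""

    def operate(self, data, config):
        if not data.text:
            data.errors.append("no text to transform")
            return
        data.text = data.text.upper()


def run_pipeline(stages, data, config):
    """Run stages in order, record timing, and stop at the first error (assumed policy)."""
    for stage in stages:
        start = time.perf_counter()
        stage.operate(data, config)
        data.execution_time.append(time.perf_counter() - start)
        data.stage_metadata.append(type(stage).__name__)
        if data.errors:
            break
    return data


if __name__ == "__main__":
    result = run_pipeline([PrepareText(), Uppercase()], PipelineData(), {"greeting": "hello"})
    print(result.text)
    print(result.stage_metadata)
```

Changing the order of the list passed to `run_pipeline` changes the pipeline without touching any stage, which is the kind of switching the config file is described as enabling.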

## When to use Surround?

* You want a flexible way to serve a pipeline in Python without writing C/C++ code.
* You have multiple models (custom or pre-trained) from different frameworks that need to be combined into a single Surround solution.
* You want to use existing intelligent APIs (AWS Rekognition, Google Cloud AI, Cognitive Services) as part of your Surround implementation.
* You have pre or post processing steps that aren't part of your models but need to be deployed as part of your Surround implementation.
* You need to package up your dependencies for running Surround as an offline solution on another machine.
<img src="./docs/source/a2i2_logo.PNG" width="300">

## Installation

Tested on Python 3.6.5

* Clone this repository
* Navigate to the root directory
* `python3 setup.py install`
### Prerequisites
- [Python](https://www.python.org/) 3+ (Tested on 3.6.5)
- [Docker](https://www.docker.com/) (required for running in containers)
- [Tornado](https://www.tornadoweb.org/en/stable/) (optional, needed if serving via Web)

To run the tests: `python3 setup.py test`
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install the latest stable version:
```
$ pip3 install surround
```

## A Simple Example
## Simple usage

A short explanation is provided in the hello-world example's [README](examples/hello-world/) file.
```python
from surround import Stage, SurroundData, Surround
import logging
from surround import SurroundData, Validator, Estimator, Assembler

class HelloWorld(Estimator):
    def estimate(self, surround_data, config):
        surround_data.text = "Hello world"

    def fit(self, surround_data, config):
        print("No training implemented")

class ValidateData(Validator):
    def validate(self, surround_data, config):
        if surround_data.text:
            raise ValueError("'text' is not None")

class BasicData(SurroundData):
    text = None

class HelloStage(Stage):
    def operate(self, data, config):
        data.text = "hello"

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    data_state = BasicData()
    surround = Surround([HelloStage()])
    output = surround.process(data_state)
    print(data_state.text)
    data = BasicData()
    assembler = Assembler("Hello world example", ValidateData(), HelloWorld())
    assembler.run(data)
    print("Text is '%s'" % data.text)
```

## Command Line
You can use command line to generate a new project.
## Command Line Usage
Surround comes with a range of command line tools to help you create and run Surround pipelines.

- To list more information: `surround -h`
- To generate project: `surround init <path-to-dir>`
- Use `-p` or `--project-name` to specify name: `surround init <path-to-dir> -p sample`
- Use `-d` or `--description` to specify description: `surround init <path-to-dir> -p sample -d sample-project`
- To run a Surround project: `surround run <project doit task>`. Note the Surround project must have a `dodo.py` file with tasks for this command to work.
To get more information on these tools, run the following command:
```
$ surround -h
```

### Generating projects
For example you can use the sub-command ``init`` to generate a new project:
```
$ surround init <path-to-dir> --project-name sample --description "Sample description" --require-web
```

This creates a new folder, named after the project, in `path-to-dir` (the current directory if left blank). The folder contains the collection of scripts and folders typically needed for a Surround project. For more information on what is generated, see our [Getting Started](https://surround.readthedocs.io/getting-started.html) guide.

### Running projects
You can then test the generated pipeline using the `run` sub-command in the root of the project like so:
```
$ surround run batch_local
```

This will execute the pipeline locally in batch mode. If you want to run the pipeline in a container then use the following:
```
$ surround run build
$ surround run batch
```

If you would like to serve your pipeline via Web endpoints (this requires passing `--require-web` when generating the project), you can use:
```
$ surround run web
```
By default, the web server accepts input data as JSON via HTTP POST to the endpoint `http://localhost:8080/estimate` in the following format:
```
{ "message": "this data will be processed by the pipeline" }
```
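
As a rough illustration of calling this endpoint from Python, the sketch below builds and sends the POST request using only the standard library. The helper names `build_estimate_request` and `post_estimate` are hypothetical, and because the response format is not described here, the body is returned as raw text; it assumes a pipeline is already being served locally via `surround run web`.

```python
import json
import urllib.request

# Default endpoint quoted in the README text.
ESTIMATE_URL = "http://localhost:8080/estimate"


def build_estimate_request(message, url=ESTIMATE_URL):
    """Build (but do not send) a JSON POST request carrying the message."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_estimate(message, url=ESTIMATE_URL, timeout=10):
    """Send the request and return the raw response body as text."""
    request = build_estimate_request(message, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the web server to be running locally.
    print(post_estimate("this data will be processed by the pipeline"))
```

Building the request separately from sending it makes the payload easy to inspect without a running server.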

To see a full list of the available tasks just run the following command:
```
$ surround run
```

For more information on different run modes and when/how they should be used see both our [About](https://surround.readthedocs.io/about.html) and [Getting Started](https://surround.readthedocs.io/getting-started.html) pages.

## Surround pipeline architecture
The following diagram describes how data flows through a Surround pipeline depending on the mode used when running.

<img src="./docs/source/pipeline_flow_diagram.png" width="500">

For a more in-depth description of this diagram, see the [About](https://surround.readthedocs.io/about.html) page on our website.

## Examples

See the [examples](https://github.com/dstil/surround/tree/master/examples) directory.
See the [examples](https://github.com/dstil/surround/tree/master/examples) directory for useful examples on how Surround can be utilized.

## Full Documentation
See [our website](https://surround.readthedocs.io/) for an in-depth explanation of Surround (in the About page), a Getting Started Guide, and full documentation of the API.

## Contributing

@@ -91,6 +122,5 @@ For guidance on setting up a development environment and how to make a contribut

Surround is released under a [BSD-3](https://opensource.org/licenses/BSD-3-Clause) license.

## Release (only for admin)
1. Tag repo with a version that you want this to be released with.
2. Push to tag to master.
## Project Status
Surround is currently under **heavy** development. Please submit any issues you encounter or suggestions you may have; they are very much appreciated!
17 changes: 17 additions & 0 deletions docs/building-docs.md
@@ -0,0 +1,17 @@
# Building the documentation using Sphinx

## Dependencies
From the `docs` directory, install the dependencies listed in `docs/requirements.txt` using pip:
```
$ pip3 install -r requirements.txt
```

## Building HTML
Use `sphinx-build` to render the documents as HTML:
```
$ sphinx-build source/ source/_build
```

This will place all of the HTML files in `docs/source/_build`.

**NOTE**: `_build` is ignored by Git.
2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,2 @@
sphinx
sphinx-argparse
1 change: 1 addition & 0 deletions docs/source/.gitignore
@@ -0,0 +1 @@
_build
Binary file added docs/source/a2i2_logo.PNG