Commit

Merge pull request #2 from scbirlab/fixes
Fixes
eachanjohnson authored Oct 11, 2024
2 parents ea8ab81 + dcfdcec commit fd52979
Showing 11 changed files with 372 additions and 60 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/python-package.yml
@@ -12,12 +12,12 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v3
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
47 changes: 30 additions & 17 deletions README.md
@@ -8,8 +8,6 @@ Cleaning, collating, and augmenting chemical datasets.

- [Installation](#installation)
- [Command-line usage](#command-line-usage)
- [Example](#example)
- [Other commands](#other-commands)
- [Python API](#python-api)
- [Documentation](#documentation)

@@ -33,30 +31,47 @@ pip install -e .

## Command-line usage

**schemist** provides command-line utilities to ... The tools complete specific tasks which
can be easily composed into analysis pipelines, because the TSV table output goes to
`stdout` by default so they can be piped from one tool to another.

To get a list of commands (tools), do
**schemist** provides command-line utilities. The list of commands can be checked like so:

```bash
schemist --help
$ schemist --help
usage: schemist [-h] [--version] {clean,convert,featurize,collate,dedup,enumerate,react,split} ...

Tools for cleaning, collating, and augmenting chemical datasets.

options:
-h, --help show this help message and exit
--version, -v show program's version number and exit
Sub-commands:
{clean,convert,featurize,collate,dedup,enumerate,react,split}
Use these commands to specify the tool you want to use.
clean Clean and normalize SMILES column of a table.
convert Convert between string representations of chemical structures.
featurize Convert between string representations of chemical structures.
collate Collect disparate tables or SDF files of libraries into a single table.
dedup Deduplicate chemical structures and retain references.
enumerate Enumerate bio-chemical structures within length and sequence constraints.
react React compounds in silico in indicated columns using a named reaction.
split Split table based on chosen algorithm, optionally taking account of chemical structure during splits.
```
And to get help for a specific command, do
Each command is designed to work on large data files in a streaming fashion, so that the entire file is not held in memory at once. One caveat is that the scaffold-based splits are very slow with tables of millions of rows.
All commands (except `collate`) take from the input table a named column with a SMILES, SELFIES, amino-acid sequence, HELM, or InChI representation of compounds.
The tools complete specific tasks which
can be easily composed into analysis pipelines, because the TSV table output goes to
`stdout` by default so they can be piped from one tool to another.
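The streaming, pipe-friendly design described above can be illustrated with a minimal Python sketch. This is not schemist's implementation: the function names `stream_tsv`, `add_length`, and `write_tsv`, and the `smiles` column name, are hypothetical and chosen only for illustration. The sketch shows how a tool can read a TSV table row by row, add a column, and write the result without holding the whole table in memory.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO


def stream_tsv(source: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one row at a time, so the full table is never loaded at once."""
    yield from csv.DictReader(source, delimiter="\t")


def add_length(rows: Iterable[Dict[str, str]],
               column: str = "smiles") -> Iterator[Dict[str, str]]:
    """Hypothetical per-row step: append the string length of `column`."""
    for row in rows:
        yield {**row, "length": str(len(row[column]))}


def write_tsv(rows: Iterable[Dict[str, str]], sink: TextIO) -> None:
    """Write rows as TSV, emitting the header when the first row arrives."""
    writer = None
    for row in rows:
        if writer is None:
            writer = csv.DictWriter(sink, fieldnames=list(row),
                                    delimiter="\t", lineterminator="\n")
            writer.writeheader()
        writer.writerow(row)


# Small in-memory demonstration; a real tool would use sys.stdin and sys.stdout.
demo_input = io.StringIO("id\tsmiles\na\tCCO\nb\tc1ccccc1\n")
demo_output = io.StringIO()
write_tsv(add_length(stream_tsv(demo_input)), demo_output)
print(demo_output.getvalue(), end="")
```

Because each stage is a generator, rows flow through one at a time. Replacing the in-memory buffers with `sys.stdin` and `sys.stdout` would let a script like this sit in a shell pipeline, reading the TSV output of one tool and passing its own output to the next.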
To get help for a specific command, do
```bash
schemist <command> --help
```
For the Python API, [see below](#python-api).
## Example


## Other commands


## Python API
@@ -66,8 +81,6 @@ For the Python API, [see below](#python-api).
>>> import schemist as sch
```


## Documentation
Full API documentation is at [ReadTheDocs](https://schemist.readthedocs.org).
8 changes: 6 additions & 2 deletions docs/source/index.md
@@ -4,7 +4,7 @@
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/schemist)
![PyPI](https://img.shields.io/pypi/v/schemist)

Cleaning, collating, and augmenting chemical datasets.
Organizing and processing tables of chemical structures.

```{toctree}
:maxdepth: 2
@@ -16,6 +16,10 @@ python
modules
```

## Issues, problems, suggestions

Add to the [issue tracker](https://www.github.com/schemist/issues).

## Source

`GitHub <https://github.com/scbirlab/schemist>`_
View source at [GitHub](https://github.com/scbirlab/schemist).
17 changes: 17 additions & 0 deletions docs/source/installation.md
@@ -0,0 +1,17 @@
# Installation

## The easy way

Install the pre-compiled version from PyPI:

```bash
$ pip install schemist
```

## From source

Clone the [repository](https://www.github.com/schemist), then `cd` into it. Then run:

```bash
pip install -e .
```
7 changes: 7 additions & 0 deletions docs/source/modules.rst
@@ -0,0 +1,7 @@
schemist
========

.. toctree::
:maxdepth: 4

schemist
109 changes: 109 additions & 0 deletions docs/source/schemist.rst
@@ -0,0 +1,109 @@
schemist package
================

Submodules
----------

schemist.cleaning module
------------------------

.. automodule:: schemist.cleaning
:members:
:undoc-members:
:show-inheritance:

schemist.cli module
-------------------

.. automodule:: schemist.cli
:members:
:undoc-members:
:show-inheritance:

schemist.collating module
-------------------------

.. automodule:: schemist.collating
:members:
:undoc-members:
:show-inheritance:

schemist.converting module
--------------------------

.. automodule:: schemist.converting
:members:
:undoc-members:
:show-inheritance:

schemist.features module
------------------------

.. automodule:: schemist.features
:members:
:undoc-members:
:show-inheritance:

schemist.generating module
--------------------------

.. automodule:: schemist.generating
:members:
:undoc-members:
:show-inheritance:

schemist.io module
------------------

.. automodule:: schemist.io
:members:
:undoc-members:
:show-inheritance:

schemist.rest\_lookup module
----------------------------

.. automodule:: schemist.rest_lookup
:members:
:undoc-members:
:show-inheritance:

schemist.splitting module
-------------------------

.. automodule:: schemist.splitting
:members:
:undoc-members:
:show-inheritance:

schemist.tables module
----------------------

.. automodule:: schemist.tables
:members:
:undoc-members:
:show-inheritance:

schemist.typing module
----------------------

.. automodule:: schemist.typing
:members:
:undoc-members:
:show-inheritance:

schemist.utils module
---------------------

.. automodule:: schemist.utils
:members:
:undoc-members:
:show-inheritance:

Module contents
---------------

.. automodule:: schemist
:members:
:undoc-members:
:show-inheritance:
55 changes: 55 additions & 0 deletions docs/source/usage.md
@@ -0,0 +1,55 @@
# Usage

**schemist** has a variety of utilities which can be used through the command-line or the [Python API](#python-api).

## Command-line usage

**schemist** provides command-line utilities. The list of commands can be checked like so:

```bash
$ schemist --help
usage: schemist [-h] [--version] {clean,convert,featurize,collate,dedup,enumerate,react,split} ...

Tools for cleaning, collating, and augmenting chemical datasets.

options:
-h, --help show this help message and exit
--version, -v show program's version number and exit
Sub-commands:
{clean,convert,featurize,collate,dedup,enumerate,react,split}
Use these commands to specify the tool you want to use.
clean Clean and normalize SMILES column of a table.
convert Convert between string representations of chemical structures.
featurize Convert between string representations of chemical structures.
collate Collect disparate tables or SDF files of libraries into a single table.
dedup Deduplicate chemical structures and retain references.
enumerate Enumerate bio-chemical structures within length and sequence constraints.
react React compounds in silico in indicated columns using a named reaction.
split Split table based on chosen algorithm, optionally taking account of chemical structure during splits.
```
Each command is designed to work on large data files in a streaming fashion, so that the entire file is not held in memory at once. One caveat is that the scaffold-based splits are very slow with tables of millions of rows.
All commands (except `collate`) take from the input table a named column with a SMILES, SELFIES, amino-acid sequence, HELM, or InChI representation of compounds.
The tools complete specific tasks which
can be easily composed into analysis pipelines, because the TSV table output goes to
`stdout` by default so they can be piped from one tool to another.
To get help for a specific command, do
```bash
schemist <command> --help
```
For the Python API, [see below](#python-api).
## Python API
You can access the underlying functions of `schemist` to help custom analyses or develop other tools.
```python
>>> import schemist as sch
```
6 changes: 3 additions & 3 deletions pyproject.toml
@@ -28,15 +28,15 @@ classifiers = [
]

dependencies = [
"carabiner-tools[pd]",
"carabiner-tools[pd]>=0.0.3.post1",
"datamol",
"descriptastorus",
"descriptastorus==2.6.1",
"nemony",
"openpyxl==3.1.0",
"pandas",
"rdkit",
"requests",
"selfies"
"selfies",
]

[project.urls]