
update enrichment analysis #15

Open. Wants to merge 40 commits into base: main


Collaborator

@enryH enryH commented Nov 25, 2024

No description provided.

- line length restriction to 100 characters
maybe the dataset is not the best one to test enrichment analysis (few diff. reg. protein groups)
how many proteins/genes need to be rejected to
be considered valid? Before it was at least 2 genes.
- will be tested by building docs (maybe add a unittest later)
not sure if the example is the best.
@@ -308,7 +388,8 @@ def run_enrichment(
        num_background = num_background[0]
    else:
        num_background = 0
    if method == "fisher" and num_foreground > 1:
Collaborator Author


To discuss: should we mistrust enrichments that are based on single hits?

Contributor


I am not sure they would be significant but filtering them would reduce the number of tests, which may be good in terms of performance.
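
For reference, a minimal sketch of the trade-off discussed in this thread. It assumes a one-sided Fisher exact test on a 2x2 table of annotated and unannotated members in foreground and background. The names `fisher_greater` and `terms` are hypothetical and not taken from acore, and all counts are made up for illustration.

```python
from math import comb


def fisher_greater(fg_hit, bg_hit, fg_miss, bg_miss):
    """One-sided Fisher exact p-value for over-representation in the foreground.

    Computes P(X >= fg_hit) for a hypergeometric X with fixed table margins.
    """
    total = fg_hit + bg_hit + fg_miss + bg_miss
    annotated = fg_hit + bg_hit          # all members carrying the term
    foreground = fg_hit + fg_miss        # size of the foreground set
    upper = min(annotated, foreground)   # largest possible foreground hit count
    tail = sum(
        comb(annotated, k) * comb(total - annotated, foreground - k)
        for k in range(fg_hit, upper + 1)
    )
    return tail / comb(total, foreground)


# Hypothetical per-term counts: (term, fg_hit, bg_hit, fg_miss, bg_miss)
terms = [
    ("term_a", 1, 0, 9, 990),
    ("term_b", 1, 40, 9, 950),
    ("term_c", 4, 6, 6, 984),
]

# Filter analogous to `num_foreground > 1`: skip terms with a single foreground hit.
tested = [t for t in terms if t[1] > 1]

# Nominal (uncorrected) p-values for all terms, including the single-hit ones.
pvalues = {name: fisher_greater(*counts) for name, *counts in terms}
```

In this toy table the filter reduces the number of tests from 3 to 1. Note that `term_a`, a single hit on a term with no background annotation, still gets a nominal p-value of 0.01 (10/1000), so whether single hits can reach significance depends on how rare the term is and on the multiple-testing correction applied afterwards.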

@enryH enryH requested a review from albsantosdel November 26, 2024 13:59
_rejected, _pvals_corrected, _, _ = multitest.multipletests(p[mask], alpha, method)
pval_corrected[mask] = _pvals_corrected
rejected = np.full(p.shape, np.nan)
rejected[mask] = _rejected
Collaborator Author


ToDo: add a test that ensures that 1.0 and 0.0 are interpreted as bool.
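
A possible shape for the test mentioned in this ToDo, sketched with NumPy only. It assumes `rejected` is a float array holding 1.0, 0.0 and NaN (NaN for entries excluded by the mask), as in the diff above. The helper `to_nullable_bool` is hypothetical and not part of acore.

```python
import numpy as np


def to_nullable_bool(rejected):
    """Map a float array of 1.0/0.0/NaN to an object array of True/False/None."""
    rejected = np.asarray(rejected, dtype=float)
    flags = np.full(rejected.shape, None, dtype=object)
    tested = ~np.isnan(rejected)
    flags[tested] = rejected[tested].astype(bool)
    return flags


def test_rejected_is_interpreted_as_bool():
    rejected = np.array([1.0, 0.0, np.nan])
    # 1.0 and 0.0 compare equal to True and False ...
    assert rejected[0] == True and rejected[1] == False  # noqa: E712
    # ... but a plain cast also turns NaN (entry was not tested) into True.
    assert rejected.astype(bool).tolist() == [True, False, True]
    # Keeping untested entries as None avoids reporting them as rejected.
    assert to_nullable_bool(rejected).tolist() == [True, False, None]
```

The second assertion is the reason a dedicated test may be worthwhile: casting the whole array to `bool` silently marks masked-out entries as rejected, since NaN is non-zero.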

@enryH enryH changed the title Gsea update enrichment analysis Nov 28, 2024
acore/enrichment_analysis.py (3 outdated review threads, resolved)
Collaborator Author

enryH commented Dec 3, 2024

double check docstrings of module (for rendering)

enryH added 2 commits December 3, 2024 16:47
- prepare to define a common return type by specifying message TYPE_COLS_MSG
enryH added 17 commits December 4, 2024 14:29
sphinx deals differently with parsing the strings...
…orrectly

otherwise numpy notation should work fine
…imes

- default parameter is rejected, so it is easier to not mistype parameter name
- output should be tsv due to how pandas return is constructed for now
- enrichment only separate for up- and down regulated protein groups
- keep hint on to do for PTM dataset
@albsantosdel
Contributor

Hi, I checked the documentation notebook and the table showing the functional PCA loadings shows multiple GOs per row when there should be only one GO per row. I wonder if you are annotating each protein with all the associated terms (separated by ;)?

Screenshot 2024-12-18 at 15 35 31

GO term annotations had several GO terms concatenated using `;`, which are now split up. Performance can be improved.
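
One way to split `;`-concatenated annotations into one term per row is sketched below with pandas. This is an illustration under assumptions, not the code in this PR: the column names `identifier` and `annotation`, the helper `split_annotations`, and the term labels are all hypothetical.

```python
import pandas as pd

# Hypothetical annotation table with several terms concatenated per protein.
annotations = pd.DataFrame(
    {
        "identifier": ["P1", "P2"],
        "annotation": ["term_a;term_b", "term_c"],
    }
)


def split_annotations(df, column="annotation", sep=";"):
    """Return a long-format copy with exactly one annotation term per row."""
    return (
        df.assign(**{column: df[column].str.split(sep)})  # string -> list of terms
        .explode(column)                                  # one row per list element
        .reset_index(drop=True)
    )


long_format = split_annotations(annotations)
```

`Series.str.split` followed by `DataFrame.explode` keeps the splitting vectorized, which may help with the performance concern mentioned in the commit message.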
@enryH enryH marked this pull request as ready for review December 27, 2024 12:48
annotation field contains identifier again