Below are image and audio (narrated mp3) samples extracted from the matlok datasets. These samples give an overview of how the images look and how the mp3s are structured, with a question and an answer taken from each image's knowledge graph text box.
Welcome to the matlok multimodal python copilot training datasets. This is an overview of our training and fine-tuning datasets found below:
- ~2.35M unique source code rows
- ~1.7M instruct alpaca yaml text rows
- ~923K png knowledge graph images with alpaca text descriptions
- ~410K mp3s for ~2 years of continuous audio playtime
- requires ~1.2 TB of storage on disk
Please reach out if you find an issue or want help with a similar dataset. We want to make it easier to create and share large datasets: hello@matlok.ai
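To explore any of these datasets without downloading everything, streaming with the Hugging Face datasets library works well. Here is a minimal sketch, assuming the datasets library is installed; the repository id below is an assumption for illustration, so check the matlok hub page for the exact dataset names:

```python
# A minimal sketch of streaming a few rows from one of the matlok datasets.
# The repository id below is an assumption for illustration only.
from datasets import load_dataset

ds = load_dataset(
    "matlok/python-image-copilot-training-using-class-knowledge-graphs",  # assumed id
    split="train",
    streaming=True,  # stream instead of downloading hundreds of GB
)

# Print the column names of the first row to see how the image and
# alpaca text pair up in each record.
for row in ds.take(1):
    print(sorted(row.keys()))
```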
These are knowledge graphs created for training generative ai models on how to write python CLIP transformer code by understanding an overview of the following (a minimal drawing sketch follows this list):
- classes
- base classes for inheritance and polymorphism
- global functions
- imports
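As a rough illustration, here is a minimal sketch of how one of these knowledge graph pngs could be drawn with networkx and matplotlib. The node names and alpaca text are hypothetical examples; this is not the exact pipeline used to build the datasets:

```python
# A minimal sketch of rendering a class knowledge graph png with an
# alpaca text box. Hypothetical example nodes, not the dataset pipeline.
import matplotlib.pyplot as plt
import networkx as nx

# Build a tiny knowledge graph: a class, its base class, and two methods.
graph = nx.DiGraph()
graph.add_edge("CLIPConfig", "PretrainedConfig", label="inherits")
graph.add_edge("CLIPConfig", "from_text_vision_configs", label="method")
graph.add_edge("CLIPConfig", "to_dict", label="method")

pos = nx.spring_layout(graph, seed=42)
nx.draw(graph, pos, with_labels=True, node_color="lightblue",
        node_size=2500, font_size=8)
nx.draw_networkx_edge_labels(
    graph, pos, edge_labels=nx.get_edge_attributes(graph, "label"))

# Attach an alpaca-style instruction/response text box to the figure.
alpaca_text = (
    "### Instruction:\nHow to use the CLIPConfig class\n\n"
    "### Response:\nCLIPConfig stores the configuration for a CLIP model..."
)
plt.gcf().text(0.02, 0.02, alpaca_text, fontsize=7,
               bbox=dict(facecolor="white", edgecolor="black"))
plt.savefig("clip_config_knowledge_graph.png", dpi=150, bbox_inches="tight")
```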
Here are samples from the python copilot class image knowledge graph dataset (304 GB). Each image pairs a networkx class knowledge graph, saved as a png, with an alpaca text box teaching how to use the class:
Here are samples from the python copilot base class inheritance and polymorphism image knowledge graph dataset (135 GB). Each image pairs a networkx inheritance graph, saved as a png, with an alpaca text box teaching how to use the base class(es):
How to use the transformers/src/transformers/models/clip/configuration_clip.py CLIPConfig inherited base class(es)
How to use the transformers/src/transformers/models/clip/tokenization_clip_fast.py CLIPTokenizerFast inherited base class(es)
Here are samples from the python copilot global functions image knowledge graph dataset (130 GB). Each image pairs a networkx function graph, saved as a png, with an alpaca text box teaching how to use the module's global functions:
How to use the transformers/src/transformers/models/clip/convert_clip_original_pytorch_to_hf.py global functions
Here are samples from the python copilot imports image knowledge graph dataset (211 GB). Each image pairs a networkx import graph, saved as a png, with an alpaca text box teaching how to use the module's imports:
How to use the transformers/src/transformers/models/clip/configuration_clip.py imports like the CLIPConfig class
How to use the transformers/src/transformers/models/clip/configuration_clip.py imports like the CLIPTextConfig class
How to use the transformers/src/transformers/models/clip/configuration_clip.py imports like the CLIPVisionConfig class
How to use the transformers/src/transformers/models/clip/tokenization_clip_fast.py imports like the CLIPTokenizerFast class
Below are extracted question and answer mp3 samples. Each mp3 is a recording of either the alpaca question or the alpaca answer, and the question mp3s use a different speaker voice than the answer mp3s.
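For a sense of how a question/answer pair maps onto two mp3 files, here is a minimal sketch using gTTS. This is an assumption-heavy illustration, not the pipeline used to build the dataset; gTTS accent variants stand in for the two distinct speaker voices:

```python
# A hypothetical illustration of rendering an alpaca question/answer pair
# to two mp3s with distinct voices. Not the actual dataset pipeline.
from gtts import gTTS

question = "How do I use the CLIPConfig inherited base class(es)?"
answer = "CLIPConfig inherits from PretrainedConfig, so ..."

# Different gTTS accents approximate the two distinct speakers.
gTTS(question, lang="en", tld="co.uk").save("question.mp3")
gTTS(answer, lang="en", tld="com").save("answer.mp3")
```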
Note: mobile browsers can have issues playing the mp3s; when the markdown fails to render the Listen link, a confusing ? icon appears instead. Sorry!