matlok-ai/python-copilot-image-and-audio-examples
Multimodal Datasets for Training Python Copilots from Source Code Analysis

Below are image and audio (narrated mp3) samples extracted from the matlok datasets. These samples show how the images look and how the mp3s are structured, with a question and an answer drawn from the image's knowledge graph text box.

Welcome to the matlok multimodal python copilot training datasets. This is an overview of the training and fine-tuning datasets listed below:

  • ~2.35M unique source code rows
  • ~1.7M instruct alpaca yaml text rows
  • ~923K png knowledge graph images with alpaca text description
  • ~410K mp3s for ~2 years of continuous audio playtime
  • ~1.2 TB of storage required on disk

Please reach out if you find an issue or want help with a similar dataset. We want to make it easier to create and share large datasets: hello@matlok.ai
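The instruct rows mentioned above follow the alpaca schema (instruction/input/output). Here is a minimal sketch of what one row might look like and how it could be flattened into a training prompt; the field names follow the standard alpaca layout, and the example values are illustrative, not taken from the datasets:

```python
# Illustrative alpaca-style instruct row (values are made up, not from the
# matlok datasets; the instruction/input/output schema is the standard
# alpaca layout).
row = {
    "instruction": "How do I use the CLIPConfig class?",
    "input": "",
    "output": "CLIPConfig stores the configuration for a CLIP model ...",
}

def to_prompt(row: dict) -> str:
    """Flatten an alpaca row into a single training prompt string."""
    parts = [f"### Instruction:\n{row['instruction']}"]
    if row.get("input"):  # the input field is optional and often empty
        parts.append(f"### Input:\n{row['input']}")
    parts.append(f"### Response:\n{row['output']}")
    return "\n\n".join(parts)

prompt = to_prompt(row)
```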

Python Copilot Training using Knowledge Graph Images

These knowledge graphs were created for training generative AI models on how to write python CLIP transformer code by providing an overview of:

  • classes
  • base classes for inheritance and polymorphism
  • global functions
  • imports
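The exact extraction pipeline is not described in this README, but the building blocks above (classes, base classes, module-level functions, and imports) can be collected from a Python source file with the standard `ast` module. A minimal sketch; the `summarize` helper and the sample source are illustrative:

```python
import ast

# Tiny illustrative source file (not from the real dataset).
SOURCE = '''
import math
from typing import List

def helper(x):
    return math.sqrt(x)

class Base:
    pass

class Child(Base):
    def method(self):
        return helper(4)
'''

def summarize(source: str) -> dict:
    """Collect graph building blocks: classes with their base classes,
    module-level functions, and imports."""
    tree = ast.parse(source)
    info = {"classes": {}, "functions": [], "imports": []}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            # record base-class expressions for inheritance edges
            info["classes"][node.name] = [ast.unparse(b) for b in node.bases]
        elif isinstance(node, ast.FunctionDef):
            info["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            info["imports"].extend(a.name for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            info["imports"].append(node.module)
    return info

summary = summarize(SOURCE)
```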

Class - Knowledge Graph Images

Here are samples from the python copilot class image knowledge graph dataset (304 GB). Each image is a networkx graph saved as a PNG with an alpaca text box, intended to teach how to use the software:

How to use the transformers/src/transformers/models/clip/configuration_clip.py CLIPConfig class

How to use the transformers/src/transformers/models/clip/configuration_clip.py CLIPOnnxConfig class

How to use the transformers/src/transformers/models/clip/tokenization_clip.py CLIPTokenizer class
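The generation details are not published here, but the described format (a networkx graph rendered to a PNG with an alpaca text box) can be sketched as follows. The node names, alpaca text, and the `save_png` helper are all illustrative assumptions, not the dataset's actual pipeline:

```python
import networkx as nx

# Toy class-usage graph in the spirit of the dataset images (node names
# are illustrative, not taken from the real dataset).
graph = nx.DiGraph()
graph.add_edge("configuration_clip.py", "CLIPConfig")
graph.add_edge("CLIPConfig", "text_config")
graph.add_edge("CLIPConfig", "vision_config")

ALPACA_TEXT = (
    "### Instruction:\nHow to use the CLIPConfig class\n\n"
    "### Response:\nCLIPConfig stores the model configuration ..."
)

def save_png(g: nx.DiGraph, text: str, path: str) -> None:
    """Draw the graph and overlay the alpaca text box, then save as a PNG.
    matplotlib is imported lazily so building the graph itself carries no
    plotting dependency."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 6))
    nx.draw_networkx(g, ax=ax, node_color="lightblue")
    # alpaca text box anchored to the bottom-left corner of the axes
    ax.text(0.01, 0.01, text, transform=ax.transAxes, fontsize=8,
            va="bottom", bbox=dict(boxstyle="round", fc="wheat"))
    ax.set_axis_off()
    fig.savefig(path, dpi=150)
    plt.close(fig)

# Example usage:
# save_png(graph, ALPACA_TEXT, "clip_config_graph.png")
```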

Base Class - Inheritance and Polymorphism Knowledge Graph Images

Here are samples from the python copilot base class inheritance and polymorphism image knowledge graph dataset (135 GB). Each image is a networkx graph saved as a PNG with an alpaca text box, intended to teach how to use the software:

How to use the transformers/src/transformers/models/clip/configuration_clip.py CLIPConfig inherited base class(es)

How to use the transformers/src/transformers/models/clip/tokenization_clip_fast.py CLIPTokenizerFast inherited base class(es)

Global Functions - Knowledge Graph Images

Here are samples from the python copilot global functions image knowledge graph dataset (130 GB). Each image is a networkx graph saved as a PNG with an alpaca text box, intended to teach how to use the software:

How to use the transformers/src/transformers/models/clip/convert_clip_original_pytorch_to_hf.py global functions

How to use the transformers/src/transformers/models/clip/tokenization_clip.py global functions

Imports - Knowledge Graph Images

Here are samples from the python copilot imports image knowledge graph dataset (211 GB). Each image is a networkx graph saved as a PNG with an alpaca text box, intended to teach how to use the software:

How to use the transformers/src/transformers/models/clip/configuration_clip.py imports like the CLIPConfig class

How to use the transformers/src/transformers/models/clip/configuration_clip.py imports like the CLIPTextConfig class

How to use the transformers/src/transformers/models/clip/configuration_clip.py imports like the CLIPVisionConfig class

How to use the transformers/src/transformers/models/clip/tokenization_clip_fast.py imports like the CLIPTokenizerFast class

Audio Training Examples - Alpaca Questions and Answers

Below are extracted question and answer mp3 samples. Each mp3 is a recording of either the alpaca question or the answer, and the question mp3s use a different speaker voice than the answer mp3s.

Note: some mobile browsers have trouble playing the mp3s and render a confusing ? icon instead of the Listen link, sorry!

Each sample below has a question mp3 and an answer mp3 that share the same base name:

  • run_clip.mp3
  • run_clip.Transform.mp3
  • run_generation_contrastive_search.mp3
  • run_generation.mp3
  • checkpointing.mp3
  • fully_sharded_data_parallel.mp3
  • fully_sharded_data_parallel.FullyShardedDataParallel.mp3
  • convert-hf-to-gguf.QwenModel.mp3
  • engine.DeepSpeedEngine.mp3
  • flash_mixtral_modeling.MixtralModel.mp3
  • flash_mixtral_modeling.MixtralLayer.mp3
  • flash_mixtral_modeling.MixtralAttention.mp3
  • flash_mixtral_modeling.BlockSparseMoE.mp3
  • flash_llama_modeling.FlashLlamaAttention.mp3
  • flash_llama_modeling.FlashLlamaLayer.mp3
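Since each question/answer pair shares a base name, matching them back up after download is a small grouping step. A sketch, assuming a question/ and answer/ folder layout (that layout is an assumption for illustration; only the shared base names come from the samples above):

```python
from pathlib import PurePosixPath

# Hypothetical file listing; the question/ and answer/ folders are an
# assumed layout, only the base names come from the samples above.
files = [
    "question/run_clip.mp3",
    "answer/run_clip.mp3",
    "question/flash_llama_modeling.FlashLlamaAttention.mp3",
    "answer/flash_llama_modeling.FlashLlamaAttention.mp3",
]

def pair_mp3s(paths):
    """Group mp3 paths by base name into {name: {"question": ..., "answer": ...}}."""
    pairs = {}
    for p in paths:
        path = PurePosixPath(p)
        role = path.parent.name  # "question" or "answer"
        pairs.setdefault(path.name, {})[role] = p
    return pairs

pairs = pair_mp3s(files)
```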

Thanks for reading, listening, and your time!
