Preprocessing Mammography Images for the ResNet Algorithm: Teknofest 2022 Health Artificial Intelligence Competition
Read this in other languages: Turkish
I would like to share the data preprocessing project I developed for the Health Artificial Intelligence Competition at the previous Teknofest event. Within the scope of the project, I wrote preprocessing code specifically for processing and analyzing medical images stored as DICOM (.dcm) files.
The project implements a sequence of image preprocessing steps that prepare healthcare data for deep learning. The code converts .dcm files to .png format, crops out unwanted text annotations on the images so they do not interfere with deep learning algorithms, resizes the images for the ResNet algorithm, and reduces differences between color channels. To maintain data integrity, it also renames the images and arranges the files into folders.
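As an illustration of the first step, here is a minimal sketch of a .dcm to .png conversion. It assumes the `pydicom`, `numpy` and `Pillow` packages are available; the function names `to_uint8` and `dcm_to_png` are hypothetical and are not taken from the repository, and min-max scaling is only one possible way to map raw pixel values to 8 bits.

```python
import numpy as np


def to_uint8(pixels: np.ndarray) -> np.ndarray:
    """Min-max scale raw pixel values to the 0-255 range (assumed scaling rule)."""
    pixels = pixels.astype(np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:  # flat image: avoid dividing by zero
        return np.zeros(pixels.shape, dtype=np.uint8)
    return np.round((pixels - lo) / (hi - lo) * 255.0).astype(np.uint8)


def dcm_to_png(dcm_path: str, png_path: str) -> None:
    """Read one DICOM file and save it as an 8-bit grayscale PNG."""
    # Imported here so that to_uint8 can be used without these packages.
    import pydicom
    from PIL import Image

    ds = pydicom.dcmread(dcm_path)
    pixels = ds.pixel_array
    # MONOCHROME1 stores inverted intensities (high value = dark), so flip it.
    if getattr(ds, "PhotometricInterpretation", "") == "MONOCHROME1":
        pixels = pixels.max() - pixels
    Image.fromarray(to_uint8(pixels)).save(png_path)
```

A call such as `dcm_to_png("test/example.dcm", "example.png")` would write one converted image; both paths are placeholders.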
I developed the algorithm on a smaller dataset and then applied it to the entire dataset. The complete dataset consists of 16,000 .dcm files.
Key Features:
- .dcm files were converted to .png format.
- The text written on the images, such as the imaging angle information, was cropped out.
- Outlier images were removed to enhance the algorithm's training and learning processes.
- The dimensions of the images were normalized to ensure data integrity.
- Differences between color channels were equalized to improve image quality.
- Data management was facilitated through customized naming and file organization.
- The code automates the data preprocessing steps and makes the data more suitable for analysis.
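The cropping, outlier, resizing and channel steps listed above could look roughly like the sketch below. Everything here is an assumption made for illustration: the annotation text is assumed to sit in a fixed strip along one edge, 224 x 224 is used because it is a common ResNet input size, the channel step is read as copying one grayscale plane into three identical channels, and the outlier rule (mean brightness outside arbitrary bounds) is a placeholder rather than the criterion used in the project.

```python
import numpy as np

TARGET_SIZE = 224  # common ResNet input side length (assumed for this sketch)


def crop_margins(img, top=0.0, bottom=0.0, left=0.0, right=0.0):
    """Drop a fraction of the image from each side, e.g. a strip with burned-in text."""
    h, w = img.shape[:2]
    r0, r1 = int(h * top), h - int(h * bottom)
    c0, c1 = int(w * left), w - int(w * right)
    return img[r0:r1, c0:c1]


def is_outlier(img, low=5.0, high=250.0):
    """Hypothetical rule: flag images that are almost entirely dark or bright."""
    mean = float(img.mean())
    return mean < low or mean > high


def resize_nearest(img, size=TARGET_SIZE):
    """Nearest-neighbour resize of a 2-D image to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols]


def to_three_channels(gray):
    """Copy one grayscale plane into three identical channels."""
    return np.stack([gray, gray, gray], axis=-1)


def preprocess(gray, top=0.0):
    """Crop the top strip, resize, and expand to an H x W x 3 array."""
    return to_three_channels(resize_nearest(crop_margins(gray, top=top)))
```

Nearest-neighbour resizing keeps the sketch dependency-free; an interpolating resize from an imaging library would usually give smoother results.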
- 📁 teknofest_data_preprocessing
  - 📁 test ➜ Raw data folder
  - 📁 Data ➜ Target folder for classified data
  - 📄 siralama.xlsx ➜ Excel table information
Note: The folders for classes A, B, C, and D, into which the data will be sorted, must be created in the Data folder beforehand. Run the preprocessing algorithm only after these folders exist.
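A minimal sketch of that preparation step, using only the Python standard library, is shown below. The helper names and the `labels` dictionary are hypothetical: the dictionary stands in for whatever class information the Excel table holds, since its columns are not described here.

```python
from pathlib import Path
import shutil

CLASSES = ("A", "B", "C", "D")


def make_class_folders(data_dir):
    """Create the A, B, C and D folders inside data_dir if they are missing."""
    data_dir = Path(data_dir)
    for name in CLASSES:
        (data_dir / name).mkdir(parents=True, exist_ok=True)


def sort_into_classes(src_dir, data_dir, labels):
    """Copy each PNG in src_dir into the folder of its class.

    labels maps a file stem (hypothetical key) to one of "A".."D".
    Files without a valid label are left in place and their names returned.
    """
    src_dir, data_dir = Path(src_dir), Path(data_dir)
    skipped = []
    for png in sorted(src_dir.glob("*.png")):
        label = labels.get(png.stem)
        if label not in CLASSES:
            skipped.append(png.name)
            continue
        shutil.copy2(png, data_dir / label / png.name)
    return skipped
```

Calling `make_class_folders("Data")` once before running the preprocessing satisfies the requirement in the note; copying rather than moving keeps the source files untouched.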
Excel table of raw data
Raw data
Contents of raw data (.dcm files) folder
Data after the data preprocessing algorithm
Classification of processed data
Classified and processed data
Representation of processed data and labels
Detailed representation of data
Information of data after preprocessing algorithm
With this project, I have taken a step towards better management and processing of healthcare data.
I have shared my code on GitHub, aiming to create a resource that other colleagues working in this field can benefit from. I hope it will be useful for those who are undertaking similar work.