The implementation for the paper "LASCA: A Large-Scale Stable Customer Segmentation Approach to Credit Risk Assessment".
Create the running environment with conda 23.5.2 and Python 3.7.16:
conda create -n lasca python==3.7.16
conda activate lasca
Install the requirements for running LASCA:
pip install -r requirements.txt
Because the real-world datasets contain a significant amount of personal and commercially sensitive information, their contents are not publicly disclosed.
To simulate a real-world dataset, we provide mock user data in the data/demo directory. This includes two CSV files: df1_demo.csv and df2_demo.csv.
df1_demo.csv contains the pre-binning result (100 pre-bins over 3 months), and df2_demo.csv contains the user data (scores of 5000 users over 8 months).
Both CSV files are randomly generated and serve as the input user data for LASCA.
Note that this demo dataset is intended only for academic research and does not represent any real business situation.
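Before running LASCA, it can be useful to confirm that the demo files are in place. The following is a minimal sketch using only the Python standard library. It assumes each CSV begins with a header row, which this README does not confirm, and `summarize_csv` is a hypothetical helper that is not part of the LASCA code base.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file.

    Assumption: the first row of the file is a header row.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first row, treated as the header
        row_count = sum(1 for _ in reader)  # every remaining row is counted as data
    return header, row_count


if __name__ == "__main__":
    demo_dir = Path("data/demo")  # directory named in this README
    for name in ("df1_demo.csv", "df2_demo.csv"):
        path = demo_dir / name
        if path.exists():
            columns, rows = summarize_csv(path)
            print(f"{name}: {rows} rows, columns = {columns}")
        else:
            print(f"{name}: not found at {path}")
```

Run the script from the repository root so that the relative path data/demo resolves correctly.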
Run phase 1 of LASCA, high-quality dataset construction (HDC). This phase takes the user dataset as input and outputs the solutions dataset for data-driven optimization:
python experiments.py run_hdc
Run phase 2 of LASCA, reliable data-driven optimization (RDO). This phase takes the collected solutions dataset as input and outputs the optimized binning solutions:
python experiments.py run_rdo
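Because RDO consumes the solutions dataset that HDC produces, the two phases need to run in order. The following is a minimal sketch of a wrapper that runs both commands in sequence and stops if the first one fails. It assumes the script is started from the repository root, where experiments.py lives. `build_command` and `run_pipeline` are hypothetical helper names, not part of the repository.

```python
import subprocess
import sys

# Sub-command names taken from this README, in the order they must run.
PHASES = ("run_hdc", "run_rdo")


def build_command(phase, script="experiments.py"):
    """Build the argument list for one phase, using the current interpreter."""
    return [sys.executable, script, phase]


def run_pipeline(phases=PHASES):
    """Run each phase in order; check=True raises CalledProcessError on failure."""
    for phase in phases:
        subprocess.run(build_command(phase), check=True)
```

Calling `run_pipeline()` from an activated lasca environment would then execute HDC followed by RDO. Using `sys.executable` keeps both phases on the same interpreter as the wrapper.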
LASCA
├─ core
│ ├─ main.py
│ ├─ model.py
│ ├─ optimizer.py
│ └─ task.py
├─ data
│ └─ demo
│    ├─ hdc.csv
│    ├─ df1_demo.csv
│    └─ df2_demo.csv
├─ utils
│ ├─ logger.py
│ ├─ metric.py
│ └─ utils.py
├─ experiments.py
├─ README.md
└─ requirements.txt
Notes for the project structure:
- The files in the folder core are the main components of the algorithms.
- The files in the folder utils are some useful functions for the implementation of LASCA.
- The files in the folder data are the user datasets of each task. Since the three real-world user datasets are classified, a demo user dataset is provided.
This paper has been accepted by the SIGKDD 2024 conference. Should you find our work beneficial to your studies or work, we kindly request that you acknowledge our contributions by citing our work:
Yongfeng Gu, Yupeng Wu, Huakang Lu, Xingyu Lu, Hong Qian, Jun Zhou, and Aimin Zhou. 2024. LASCA: A Large-Scale Stable Customer Segmentation Approach to Credit Risk Assessment. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24).