LASCA

This repository contains the implementation of the paper "LASCA: A Large-Scale Stable Customer Segmentation Approach to Credit Risk Assessment".

Setup

Create the runtime environment using conda 23.5.2 and Python 3.7.16:

conda create -n lasca python==3.7.16
conda activate lasca

Install the requirements for running LASCA:

pip install -r requirements.txt

Real-world user datasets

Because the real-world datasets contain a large amount of personal and commercially sensitive information, their contents are not publicly disclosed.

To simulate a real-world dataset, we provide mock user data in the data/demo directory, consisting of two CSV files: df1_demo.csv and df2_demo.csv. df1_demo.csv holds the pre-binning result (100 pre-bins over 3 months), and df2_demo.csv holds the user data (scores of 5000 users over 8 months). Both files are randomly generated and serve as the input user data for LASCA. Note that this demo dataset is intended for academic research only and does not represent any real business situation.
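For readers who want data of the same shape without the demo files, the following is a minimal sketch of how comparable random inputs could be produced. It is not the script used to create the files in data/demo: the row layout, the score range of [0, 1), and the equal-width pre-binning are all assumptions made for illustration, and the function names are hypothetical.

```python
import random


def make_mock_scores(n_users=5000, n_months=8, seed=0):
    """Return one row per user: [user_id, score_month_1, ..., score_month_n].

    Assumption: scores are uniform in [0, 1); the real files may differ.
    """
    rng = random.Random(seed)
    return [[uid] + [rng.random() for _ in range(n_months)]
            for uid in range(n_users)]


def prebin_counts(rows, n_bins=100, n_months=3):
    """Count users per equal-width score bin for the first n_months months.

    Assumption: pre-bins are equal-width intervals over [0, 1).
    """
    counts = [[0] * n_months for _ in range(n_bins)]
    for row in rows:
        for m in range(n_months):
            score = row[1 + m]
            b = min(int(score * n_bins), n_bins - 1)  # clamp the top edge
            counts[b][m] += 1
    return counts


scores = make_mock_scores()
prebins = prebin_counts(scores)
print(len(scores), len(scores[0]) - 1)  # number of users, number of months
print(len(prebins), len(prebins[0]))    # number of pre-bins, number of months
```

The two tables could then be written out with the standard csv module if file-based input is needed.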

Run LASCA

Run phase 1 of LASCA: high-quality dataset construction (HDC). This phase takes the user dataset as input and outputs the solutions dataset used for data-driven optimization.

python experiments.py run_hdc

Run phase 2 of LASCA: reliable data-driven optimization (RDO). This phase takes the collected solutions dataset as input and outputs the optimized binning solutions.

python experiments.py run_rdo
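Since phase 2 consumes the solutions dataset produced by phase 1, the two commands are normally run in that order. As a convenience, they could be chained from Python as in the sketch below. The helper names (phase_command, run_pipeline) are hypothetical and not part of the repository; only the two subcommands run_hdc and run_rdo come from the instructions above.

```python
import subprocess
import sys

# Phase order matters: HDC builds the solutions dataset that RDO consumes.
PHASES = ("run_hdc", "run_rdo")


def phase_command(phase, script="experiments.py"):
    """Build the command line for one phase, using the current interpreter."""
    return [sys.executable, script, phase]


def run_pipeline(run=subprocess.run):
    """Run both phases in order; check=True stops at the first failing phase.

    The `run` parameter exists only so the runner can be swapped out,
    for example in a dry run.
    """
    for phase in PHASES:
        run(phase_command(phase), check=True)
```

Calling run_pipeline() from the repository root, inside the activated lasca environment, would execute both phases back to back.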

The project structure

LASCA
├─ core
│  ├─ main.py
│  ├─ model.py
│  ├─ optimizer.py
│  └─ task.py
├─ data
│  └─ demo
│     ├─ hdc.csv
│     ├─ df1_demo.csv
│     └─ df2_demo.csv
├─ utils
│  ├─ logger.py
│  ├─ metric.py
│  └─ utils.py
├─ experiments.py
├─ README.md
└─ requirements.txt

Notes for the project structure:

The files in the core folder are the main components of the algorithm.
The files in the utils folder are helper functions used by the LASCA implementation.
The files in the data folder are the user datasets for each task. Because the three real-world user datasets are confidential, a demo user dataset is provided instead.

Citation

This paper has been accepted at the SIGKDD 2024 conference. If you find our work useful in your studies or work, please cite it:

Yongfeng Gu, Yupeng Wu, Huakang Lu, Xingyu Lu, Hong Qian, Jun Zhou, and Aimin Zhou. 2024. LASCA: A Large-Scale Stable Customer Segmentation Approach to Credit Risk Assessment. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24).
