Output of Network is NaN #4

Mrvolcan · 2021-04-11T17:19:15Z

Hi,

I am trying to reproduce your results using the code and data provided.
The code runs without errors; however, the output is NaN and the loss stays constant instead of decreasing.
The BART reconstruction seems to work fine.
Could it be a version issue? I am using Python 3.7 with your requirements file. Hardware: NVIDIA RTX 3090, 24 GB RAM. Some packages were not listed in the requirements, so I had to find versions that work, e.g. "gast" (version 2.2) and skimage. I have attached the requirements file I am using; could you comment on whether it is correct?
Do you have any idea where to look for the error?
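One place to look is the input pipeline. If the k-space or sensitivity maps already contain NaN or Inf values (for example from a division by zero during preprocessing), the network output will be NaN regardless of the model, because NaN propagates through every arithmetic operation. A minimal NumPy sketch of such a check, using a hypothetical helper `report_nonfinite` that is not part of the repository:

```python
import numpy as np


def report_nonfinite(name, array):
    """Print and return (num_nan, num_inf) for a real or complex array.

    For complex input, np.isnan / np.isinf are True when either the real
    or the imaginary part is NaN / Inf.
    """
    array = np.asarray(array)
    n_nan = int(np.count_nonzero(np.isnan(array)))
    n_inf = int(np.count_nonzero(np.isinf(array)))
    print(f"{name}: shape={array.shape}, dtype={array.dtype}, NaN={n_nan}, Inf={n_inf}")
    return n_nan, n_inf


# Example: a small complex k-space array with one corrupted sample.
kspace = np.zeros((4, 4), dtype=np.complex64)
kspace[0, 0] = np.nan
report_nonfinite("kspace", kspace)
```

Running this on each input array (k-space batch, sensitivity maps) before training shows whether the NaNs already exist in the data or only appear inside the network.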

I would like to use the code later for preclinical MRI data (single coil). Could you comment on the following:

Do I need to normalize the data or do any other preprocessing? Right now it is complex k-space data.
Should I train each region and image type separately (e.g. head T1, abdomen T2), or can I train them together?
If the image sizes differ, can I zero-fill the images to the same matrix size, or does the original MR data need to have the same matrix size?
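For the normalization and matrix-size questions, one common approach (an assumption here, not necessarily what this repository expects) is to scale each k-space array to a fixed maximum magnitude and to zero-pad k-space symmetrically to a common matrix size. Note that zero-padding k-space interpolates the image onto a finer grid with the same field of view, whereas zero-padding the image enlarges the field of view. The sketch assumes k-space is stored with the DC component at the array center; `normalize_kspace` and `zero_pad_kspace` are hypothetical names.

```python
import numpy as np


def normalize_kspace(kspace):
    """Scale complex k-space so its largest magnitude is 1 (assumed scheme).

    Returns the scaled array and the scale factor, so the original
    intensities can be restored after reconstruction.
    """
    scale = np.max(np.abs(kspace))
    if scale == 0:
        raise ValueError("k-space is all zeros; cannot normalize")
    return kspace / scale, scale


def zero_pad_kspace(kspace, target_shape):
    """Center k-space (last two axes) inside a zero array of target_shape.

    Assumes DC is at the array center. The result has the same field of
    view on a finer image grid; no new information is added.
    """
    ny, nx = kspace.shape[-2:]
    ty, tx = target_shape
    if ty < ny or tx < nx:
        raise ValueError("target_shape must be at least as large as kspace")
    out = np.zeros(kspace.shape[:-2] + (ty, tx), dtype=kspace.dtype)
    y0 = (ty - ny) // 2
    x0 = (tx - nx) // 2
    out[..., y0:y0 + ny, x0:x0 + nx] = kspace
    return out


# Example: normalize a small array, then pad it from 2x3 to 4x5.
k = np.full((2, 3), 3 + 4j, dtype=np.complex64)
k_norm, k_scale = normalize_kspace(k)
k_padded = zero_pad_kspace(k_norm, (4, 5))
```

Keeping the returned scale factor makes it possible to undo the normalization for quantitative comparisons.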

Thanks!

requirements.txt

fharman · 2021-06-15T09:46:52Z

Hi @Mrvolcan,

I have the same error and could not resolve it. If you manage to fix it, please share the solution. I also could not get the BART reconstruction to work; it gives a NaN metric value as well.

Best,

duancaohui · 2021-06-22T01:51:02Z

I ran into the same problem as above.
I carefully checked the data, network architecture, and training parameters:

Training data:
The knee dataset: http://old.mridata.org/fullysampled/knees
20 subjects, split into training, validation, and test subsets. The training set consists of 4800 images.
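Since the 20 subjects are divided into training, validation, and test subsets, the split is best done per subject rather than per slice, so that no slice from a training knee ends up in the validation or test set. A minimal sketch with a hypothetical helper `split_by_subject`, hypothetical subject IDs, and an assumed 15/2/3 split (the post does not state the per-subset counts):

```python
import random


def split_by_subject(subject_ids, n_val, n_test, seed=0):
    """Split subject IDs (not slices) into train/val/test lists.

    All slices of one subject then stay in the same subset. The seed
    makes the split reproducible across runs.
    """
    ids = sorted(subject_ids)
    if n_val + n_test >= len(ids):
        raise ValueError("not enough subjects left for training")
    rng = random.Random(seed)
    rng.shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Hypothetical IDs for the 20 knee subjects.
subjects = [f"knee_{i:02d}" for i in range(20)]
train_ids, val_ids, test_ids = split_by_subject(subjects, n_val=2, n_test=3)
```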

Network architecture:
I used the default code to construct the network:

im_out_place = mri_model.unroll_fista(
    ks_place,
    sense_place,
    is_training=True,
    verbose=True,
    do_hardproj=FLAGS.do_hard_proj,
    num_summary_image=FLAGS.num_summary_image,
    resblock_num_features=FLAGS.feat_map,
    num_grad_steps=FLAGS.num_grad_steps,
    conv=FLAGS.conv,
    do_conjugate=FLAGS.do_conjugate,
    activation=FLAGS.activation
)

Training parameters:
I used the same training parameters as in the article:

batch_size = 2
adam_beta1 = 0.9
adam_beta2 = 0.999
learning rate = 0.001
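For reference, the Adam hyperparameters above enter the update as shown in this minimal NumPy sketch of one Adam step (Kingma and Ba's formulation, with an assumed eps = 1e-8). It is an illustration only, not the TensorFlow optimizer the repository uses, and `adam_step` is a hypothetical name.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count.

    m and v are the running first and second moment estimates; the
    bias-corrected versions compensate for their zero initialization.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# First step from zero moments: the update is approximately lr * sign(grad).
p, m, v = adam_step(np.array([1.0, 1.0]), np.array([0.5, -2.0]),
                    np.zeros(2), np.zeros(2), t=1)
```

If a single gradient is NaN, m and v become NaN and every later update is NaN as well, which is one possible explanation for outputs that stay NaN once they first appear.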

Error results:
As training progresses, the reconstructed image suddenly becomes abnormal and the error becomes very large, as follows:

Results at step 33: