Skip to content
/ GFR Public

Genomic Functional Regression in R [R Package - dev version]

License

Notifications You must be signed in to change notification settings

frahik/GFR

Repository files navigation

Genomic Functional Regression Logo

Genomic Functional Regression analysis in R | Development version 0.9.13

LGPL, Version 3.0 Status of the Repo:  Initial development is in progress, but there has not yet been a stable, usable release suitable for the public Dowloads from the CRAN CRAN

[Last README update: 2018-05-22]


Table of contents

News of this version (0.9.13)

Revision 13

  • The BFR function now can obtain the PCC (Percentage of Correct Classification) to the Ordinal or binary data (BFR(..., response_type = 'ordinal')).
  • forceClean parameter from cleanDat() has changed to noConfirm.

Revision 12

  • The package includes a better BFR shiny app (use runBFRInterface() ).

Revision 10

  • The package now inclues a shiny app.

Past revisions

  • Rename from BGFRA -> BFR -> GFR
  • Initial development is in progress, but there has not yet been a stable, usable release suitable for the public; this is a pre-release, be careful.

Instructions for proper implementation

Installation

To complete installation of dev version of GFR from GitHub, you have to install Rtools Software and a few packages first.

install.packages('devtools')
devtools::install_github('frahik/GFR')

Demostration examples

Package Web version

The package includes a shiny app to very easy use of the package, includes the most essencial parts, to use this,

library(GFR)
runInterface()

Availabe data

Three data sets are available inside the package, to use it use data() function,

rm(list = ls())
library(GFR)
data("Wheat_GFR")
head(Wheat_GFR) # Load from data Wheat_GFR
##   Response    Line     Env
## 1 1.587324 3827768 Drought
## 2 3.140629 6176013 Drought
## 3 3.145934 4905617 Drought
## 4 0.984776 6931494 Drought
## 5 2.936291 6932344 Drought
## 6 1.882823 6935856 Drought
paste0('Dimension of the Bands matrix: ', dim(Wheat_Bands)) # Load from data Wheat_GFR
## [1] "Dimension of the Bands matrix: 300"
## [2] "Dimension of the Bands matrix: 250"
paste0('Number of wavelenths:', length(Wheat_Wavelengths)) # Load from data Wheat_GFR
## [1] "Number of wavelenths:250"
data("WheatI_GFR")
head(WheatI_GFR) # Load from data WheatI_GFR
##   Line Response       Env
## 1    1 6.862352 Irrigated
## 2    2 6.844494 Irrigated
## 3    3 6.290179 Irrigated
## 4    4 7.026786 Irrigated
## 5    5 6.013394 Irrigated
## 6    6 6.639137 Irrigated
paste0('Dimension of the Bands matrix: ',dim(WheatI_Bands)) # Load from data WheatI_GFR
## [1] "Dimension of the Bands matrix: 976"
## [2] "Dimension of the Bands matrix: 250"
paste0('Number of wavelenths:',length(WheatI_Wavelengths)) # Load from data WheatI_GFR
## [1] "Number of wavelenths:250"
data("Maize_GFR")
head(Maize_GFR) # Load from data Maize_GFR
##        Line Env Trait Response
## 1 CKDHL0002 EBU Yield     6.65
## 2 CKDHL0003 EBU Yield     6.10
## 3 CKDHL0004 EBU Yield     5.07
## 4 CKDHL0005 EBU Yield     6.55
## 5 CKDHL0007 EBU Yield     6.82
## 6 CKDHL0008 EBU Yield     6.88
paste0('Dimension of the Bands matrix: ',dim(Maize_Bands)) # Load from data Maize_GFR
## [1] "Dimension of the Bands matrix: 2781"
## [2] "Dimension of the Bands matrix: 48"
paste0('Number of wavelenths:',length( Maize_Wavelengths)) # Load from data Maize_GFR
## [1] "Number of wavelenths:48"

Fitting a model

To more simple way to fit a model is with one environment and one trait data,

data("Wheat_GFR")
data <- Wheat_GFR[which(Wheat_GFR$Env == 'Drought'), ]

fm <- BFR(data, nIter = 1000, burnIn = 300, verbose = F)

plot(fm)

Predictive model with a K-Folds Cross-validation

To do a predictive model with a cross-validation, only we need to provide a list object with the type and the number of crossvalidation to do, in the package is available two cross-validatin types, in the following code, we see the CV K-Folds,

data("Wheat_GFR")
data <- Wheat_GFR[which(Wheat_GFR$Env == 'Drought'), ]
Crossvalidation_list <- list(Type = 'KFold', nFolds = 3)

pm <- BFR(data, nIter = 1000, burnIn = 300, set_seed = 10, CrossValidation = Crossvalidation_list, verbose = F)
## Warning in cor(Tab_i$y_p, Tab_i$y_o, use = "pairwise.complete.obs"): the
## standard deviation is zero
summary(pm)
##          Fold     Env Trait Pearson SE_Pearson   MSEP SE_MSEP   Time
## 1           1 Drought        0.1775         NA 0.2653      NA 0.1500
## 2           2 Drought            NA         NA 0.6432      NA 0.1600
## 3           3 Drought       -0.0001         NA 0.4550      NA 0.1800
## 4 Average_all Drought        0.0887     0.0725 0.4545  0.1091 0.1633
boxplot(pm)

Generate automatically a linear predictor (Only Multi-Environment example)

For more advanced predictions, we also provide a eta generator, for the estimations, this function automatically prepare the data and do a lot of validations, this is the recomended way to do a proper analysis,

library(GFR)
data("Wheat_GFR")
CrossV <- list(Type = 'KFold', nFolds = 3)
ETA2 <- ETAGenerate(Wheat_GFR, datasetID = 'Line', priorType = 'BayesB', Bands = Wheat_Bands,
                    Wavelengths = Wheat_Wavelengths, method = 'Alternative2', basisType = 'Bspline.Basis', nBasis = 21)


pm2 <- BFR(ETA = ETA2, data, nIter = 1000, burnIn = 300, set_seed = 10, CrossValidation = CrossV, verbose = F)
summary(pm2)
##           Fold              Env Trait Pearson SE_Pearson   MSEP SE_MSEP
## 1            1        Irrigated        0.2519         NA 0.1672      NA
## 2            1          Drought        0.3727         NA 0.3692      NA
## 3            1 ReducedIrrigated        0.0914         NA 0.1330      NA
## 4            2 ReducedIrrigated        0.3414         NA 0.1868      NA
## 5            2          Drought        0.6947         NA 0.2161      NA
## 6            2        Irrigated        0.1901         NA 0.3198      NA
## 7            3          Drought        0.5841         NA 0.3702      NA
## 8            3 ReducedIrrigated        0.1342         NA 0.1391      NA
## 9            3        Irrigated       -0.3217         NA 0.5997      NA
## 10 Average_all        Irrigated        0.0401     0.1818 0.3622  0.1266
## 11 Average_all          Drought        0.5505     0.0945 0.3185  0.0512
## 12 Average_all ReducedIrrigated        0.1890     0.0772 0.1529  0.0170
##    Time
## 1  1.46
## 2    NA
## 3    NA
## 4  1.24
## 5    NA
## 6    NA
## 7  1.71
## 8    NA
## 9    NA
## 10 1.47
## 11   NA
## 12   NA
plot(pm2)

Generate automatically a linear predictor (Multi-Trait & Multi-Environment example)

Also, you can add some special terms, but be carefull,

data("Maize_GFR")
CrossV <- list(Type = 'RandomPartition', NPartitions = 3, PTesting = .25)
ETA3 <- ETAGenerate(Maize_GFR, basisType = 'Bspline.Basis', Bands = Maize_Bands, Wavelengths = Maize_Wavelengths, priorType = 'BRR', method = 'Simple', nBasis = 21)
ETA3$Design
## [1] "Bayes-MultiBands"
ETA3$Basis
## [1] "Bspline.Basis"
ETA3$Prior
## [1] "BRR"
ETA3$Method
## [1] "Simple"
pm3 <- BFR(ETA = ETA3, data, nIter = 1000, burnIn = 300, set_seed = 10, CrossValidation = CrossV, verbose = F)
summary(pm3)
##           Fold Env Trait Pearson SE_Pearson     MSEP SE_MSEP    Time
## 1            1 EBU   ASI  0.0713         NA   9.7606      NA 30.3300
## 2            1 KAK   ASI  0.1738         NA  12.6603      NA      NA
## 3            1 KTI   ASI -0.1371         NA  19.8582      NA      NA
## 4            1 EBU    PH  0.1953         NA  77.2690      NA      NA
## 5            1 KAK    PH  0.1504         NA 201.2580      NA      NA
## 6            1 KTI    PH  0.1461         NA 266.3032      NA      NA
## 7            1 EBU Yield  0.2388         NA  23.3576      NA      NA
## 8            1 KAK Yield  0.2385         NA  10.2770      NA      NA
## 9            1 KTI Yield  0.2846         NA  16.8525      NA      NA
## 10           2 EBU   ASI  0.0757         NA   8.6373      NA 29.0400
## 11           2 KAK   ASI -0.1204         NA  14.0191      NA      NA
## 12           2 KTI   ASI -0.1624         NA  18.7926      NA      NA
## 13           2 EBU    PH  0.1275         NA  79.0484      NA      NA
## 14           2 KAK    PH  0.0071         NA 143.2196      NA      NA
## 15           2 KTI    PH -0.0440         NA 224.1392      NA      NA
## 16           2 EBU Yield  0.1901         NA   9.2023      NA      NA
## 17           2 KAK Yield  0.1832         NA  14.5796      NA      NA
## 18           2 KTI Yield  0.2896         NA  20.2985      NA      NA
## 19           3 EBU   ASI -0.0877         NA  29.2658      NA 31.6400
## 20           3 KAK   ASI -0.0743         NA  13.6334      NA      NA
## 21           3 KTI   ASI -0.2539         NA  20.6632      NA      NA
## 22           3 EBU    PH  0.1998         NA  93.8359      NA      NA
## 23           3 KAK    PH  0.0689         NA 139.9538      NA      NA
## 24           3 KTI    PH  0.1504         NA 245.1025      NA      NA
## 25           3 EBU Yield  0.2450         NA   7.5862      NA      NA
## 26           3 KAK Yield  0.3698         NA  11.2297      NA      NA
## 27           3 KTI Yield  0.5140         NA  17.5302      NA      NA
## 28 Average_all EBU   ASI  0.0198     0.0537  15.8879  6.6968 30.3367
## 29 Average_all KAK   ASI -0.0070     0.0914  13.4376  0.4043      NA
## 30 Average_all KTI   ASI -0.1845     0.0355  19.7713  0.5418      NA
## 31 Average_all EBU    PH  0.1742     0.0234  83.3844  5.2509      NA
## 32 Average_all KAK    PH  0.0755     0.0415 161.4771 19.9128      NA
## 33 Average_all KTI    PH  0.0842     0.0641 245.1816 12.1718      NA
## 34 Average_all EBU Yield  0.2247     0.0174  13.3820  5.0096      NA
## 35 Average_all KAK Yield  0.2638     0.0553  12.0288  1.3047      NA
## 36 Average_all KTI Yield  0.3627     0.0756  18.2271  1.0540      NA

Handmade linear predictor

If you are a expert and know what are you doing, you can generate the ETA manually,

CrossV <- list(Type = 'KFold', nFolds = 3)
ETA4 <- list(Env = list(X = model.matrix(~0+as.factor(Wheat_GFR$Env)), model = 'FIXED'),
             Line = list(X = model.matrix(~0+as.factor(Wheat_GFR$Line)), model = 'BRR'),
             Bands = list(X = Bspline.Basis(Wheat_Bands, Wheat_Wavelengths, nBasis = 23), model = 'BayesA'))
pm4 <- BFR(data = Wheat_GFR, ETA = ETA4, nIter = 1000, burnIn = 300, CrossValidation = CrossV, set_seed = 10, verbose = F)
summary(pm4)
##           Fold              Env Trait Pearson SE_Pearson   MSEP SE_MSEP
## 1            1        Irrigated        0.0829         NA 0.1649      NA
## 2            1          Drought        0.2781         NA 0.3881      NA
## 3            1 ReducedIrrigated        0.5140         NA 0.0760      NA
## 4            2 ReducedIrrigated        0.3857         NA 0.1803      NA
## 5            2          Drought        0.7182         NA 0.2142      NA
## 6            2        Irrigated        0.1544         NA 0.3431      NA
## 7            3          Drought        0.5427         NA 0.3707      NA
## 8            3 ReducedIrrigated        0.2257         NA 0.1787      NA
## 9            3        Irrigated       -0.4164         NA 0.7249      NA
## 10 Average_all        Irrigated       -0.0597     0.1795 0.4110  0.1652
## 11 Average_all          Drought        0.5130     0.1279 0.3243  0.0553
## 12 Average_all ReducedIrrigated        0.3751     0.0834 0.1450  0.0345
##    Time
## 1  0.32
## 2    NA
## 3    NA
## 4  0.50
## 5    NA
## 6    NA
## 7  0.59
## 8    NA
## 9    NA
## 10 0.47
## 11   NA
## 12   NA
plot(pm4, select = 'MSEP')

boxplot(pm4, select = 'MSEP')

Clean the working directory

If you don’t need anymore the files that the BFR() function creates, you can clean the directory with the following function provided in the package,

cleanDat(noConfirm = TRUE)
##     ETA_Bands_parBayesB.dat   ETA_Bands_ScaleBayesA.dat 
##                        TRUE                        TRUE 
##          ETA_Bands_varB.dat               ETA_Env_b.dat 
##                        TRUE                        TRUE 
##      ETA_EnvxTrait_varB.dat ETA_EnvxTraitxLine_varB.dat 
##                        TRUE                        TRUE 
##      ETA_Line_parBayesB.dat           ETA_Line_varB.dat 
##                        TRUE                        TRUE 
##  ETA_LinexEnv_parBayesB.dat       ETA_LinexEnv_varB.dat 
##                        TRUE                        TRUE 
##     ETA_LinexTrait_varB.dat             ETA_Trait_b.dat 
##                        TRUE                        TRUE 
##                      mu.dat                    varE.dat 
##                        TRUE                        TRUE

How to cite this package

First option, by the article paper

(Comming soon)

Second option, by the manual package

(Comming soon)

Contributions

If you have any suggestions or feedback, I would love to hear about it. Feel free to report new issues in this link, also if you want to request a feature/report a bug, or make a pull request if you can contribute.

Authors

  • Francisco Javier Luna-Vázquez (Author, Maintainer)
  • Osval Antonio Montesinos-López (Author)
  • Abelardo Montesinos-López (Author)
  • José Crossa (Author)
  • Gustavo de los campos (Author)

About

Genomic Functional Regression in R [R Package - dev version]

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published