-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathSampling Discrete Mode.Rmd
262 lines (212 loc) · 10.6 KB
/
Sampling Discrete Mode.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
---
title: "Sampling Discrete Mode"
author: "Xianbin Cheng"
date: "July 8, 2019"
output: html_document
---
# Objective
* This is an instruction for the discrete mode.
# Method
###1. Load libraries and source R code.
```{r, warning = FALSE, message = FALSE}
source(file = "Sampling_libraries.R")
source(file = "Sampling_contamination.R")
source(file = "Sampling_contamination_3d.R")
source(file = "Sampling_visualization.R")
source(file = "Sampling_assay_prep.R")
source(file = "Sampling_plan.R")
source(file = "Sampling_plan_3d.R")
source(file = "Sampling_assay_3d.R")
source(file = "Sampling_assay.R")
source(file = "Sampling_outcome_3d.R")
source(file = "Sampling_outcome.R")
source(file = "Sampling_iteration.R")
source(file = "Sampling_tuning_3d.R")
source(file = "Sampling_analysis.R")
```
###2. List important parameters.
**Sampling contamination**
* `c_hat` = the estimated mycotoxin concentration in the container
* `x_lim` = the limits of the x-axis
* `y_lim` = the limits of the y-axis
* `z_lim` = the limits of the z-axis
* `dis_level` = a list of parameters associated with contamination level
+ `type` = "constant"
- `args` = a constant value (ppb)
+ `type` = "Gamma"
- `args` = a list of the mode (ppb) and the lower bound (ppb), assuming contamination level follows $lb+Gamma(\alpha, \theta=\frac {mode-20}{\alpha-1})$
* `n_affected` = the number of affected kernels near the contamination spot
* `covar_mat` = covariance matrix of `x` and `y`, which defines the spread of contamination. Assume the spread follows a 2D normal distribution with var(X) = 0.25, var(Y) = 0.25 and cov(X, Y) = 0
**Sampling strategies**
* `method_sp` = sampling strategy (`srs`, `strs`, `ss`)
+ `ss` = probe system
* `n_sp` = the number of sampling points
* `d` = inner diameter of the probe (m)
* `L` = length of the probe (m). We assume it's fully inserted to the corn
+ `L` = 5, 6, 8, 10, 12 feet
+ Depend on the container type. Remember to convert it to meters.
* `rho` = average density of a kernel (g/cm3)
* `m_kbar` = average mass of a kernel (g)
* `homogeneity` = degree to which kernels are ground
+ FGIS requires at least 60% of the ground sample passes a U.S. Standard No. 20 sieve
* `container` = container of corn
+ "truck"
- `depth_ft` = the depth of corn inside the truck.
- There are two sampling patterns for trucks, depending on whether depth is higher than or lower than 4 ft.
+ "barge"
+ "hopper"
- `compartment` = 2 or 3,
- `type` = `open_top` or `trough`
**Sample Assay**
* `method_det` = method of detection
+ "ELISA": LOD = 1 ng/g (Helica Total Aflatoxins ELISA kit)
* `Mc` = maximum concentration limit of mycotoxin (ng/g or ppb)
* `tox` = type of mycotoxin
+ `AF` = aflatoxin
- Minimum sample size:
- 908 g (2 lbs) for trucks
- 1362 g (3 lbs) for railcars (hopper)
- 4540 g (10 lbs) for barges, sublots and composite samples
- 4540 g is the recommended submitted sample size
+ `FM` = fumonisin
+ `ZEN` = zearalenone
+ `OTA` = Ochratoxin A
+ `DEN` = Deoxynivalenol
**Iteration**
* `n_iter` = the number of iterations
```{r}
# Pre-generate healthy kernel concentrations to save time
conc_neg = rpert(n = 10^6, min = 0, mode = 0.7, max = 19.99, shape = 80)
```
```{r}
## Contamination
#n_contam = 100
c_hat = 10
x_lim = c(0, 3)
y_lim = c(0, 2)
z_lim = c(0, 1.5)
lims = list(xlim = x_lim, ylim = y_lim, zlim = z_lim)
dis_level = list(type = "constant", args = 40000)
#dis_level = list(type = "Gamma", args = list("mode"= 40000, "lb" = 20))
spread = "discrete"
n_affected = 0
covar_mat = make_covar_mat(spread = spread, varx = 0.0009, vary = 0.0009, varz = 0.0009,
covxy = 0, covxz = 0, covyz = 0)
# Sampling
method_sp = "ss"
container = "truck"
d = 0.1
sp_radius = d / 2
L = get_Lprobe(container = container, lims = lims)
m_kbar = 0.3
rho = 1.28
homogeneity = 0.6
#n_sp = 5
#depth_ft = 5
#compartment = 2
#type = "open_top"
# Assaying
method_det = "ELISA aflatoxin"
tox = "AF"
Mc = 20
# Iteration
n_iter = 10
# Wrap arguments into one single list
ArgList_default = list(c_hat = c_hat, lims = lims, spread = spread, covar_mat = covar_mat,
n_affected = n_affected, dis_level = dis_level, method_sp = method_sp,
sp_radius = sp_radius,container = container, L = L, rho = rho, m_kbar = m_kbar,
conc_neg = conc_neg, tox = tox, Mc = Mc, method_det = method_det, verbose = FALSE, homogeneity = homogeneity)
```
###3. Set a parameter for tuning and a vector of values to tune over.
* For example, if you want to tune the parameter `n_contam`, just assign the character "n_contam" to the variable `param_name`, and then assign a vector of tuning values to the variable `vals`.
* Parameters available for tuning:
+ Continuous parameters: `n_contam`, `n_affected`, `dis_level`, `d`
+ Categorical parameters: `method_sp`, `method_det`, `container`
* Some parameters have a finite range of values. Going beyond the allowable range may not produce an error, but it will definitely produce weird results.
* `seed` = the random seed for contamination locations
* `n_seed` = the number of seeds.
```{r}
param_name = "c_hat"
vals = c(10, 20, 30)
seed = 5
n_seed = 10
```
###4. Tune the parameter and produce the following results:
* `c_true` = true contamination level (ng/g)
* `decision` = a number that indicates lot decision.
+ 5 = Accept lot. Mean sample concentration < LOD
+ 6 = Reject lot. Mean sample concentration >= Mc
+ 7 = Accept lot
* In diagnostics mode
+ `mean_raw` = mean contamination level of raw sample
+ `mean_work` = mean contamination level of work portion
+ `mean_test` = mean contamination level of test portion
* `param` = the tuning parameter
```{r, warning = FALSE}
# Iterate the model over tuning parameters. Each set of parameters is iterated n_seed x n_iter times
results = map(.x = vals, .f = tune_param, Args = ArgList_default, n_seed = n_seed, n_iter = n_iter, param = param_name)
```
5. Clean up values of probability of detection and probability of acceptance.
```{r, warning = FALSE}
results_cleaned = metrics_dis_n(data = results, Mc = Mc)
```
# Visualization
###1. Run the model once for visualization purposes.
```{r}
# Remove unnecessary arguments
ArgList_vis = ArgList_default
ArgList_vis[c("Mc", "method_det", "verbose")] = NULL
ArgList_vis$seed = NaN
# Produce intermediate outputs
one_iteration = do.call(what = sim_intmed, args = ArgList_vis)
```
```{r, eval = FALSE, warning = FALSE, echo = FALSE, out.width = "50%"}
overlay_draw(method_sp = method_sp, data = one_iteration[["contam_sp_xy"]]$combined , spread = spread, xlim = lims$xlim, ylim = lims$ylim)
overlay_draw_probe(data = one_iteration[["contam_sp_xy"]]$combined, lims = lims, L = L)
sample_dist(raw = one_iteration$contam_sp_xy$raw, work = one_iteration$sample$work, test = one_iteration$sample$test, Mc = Mc)
assay_draw(data = one_iteration$sample$test, Mc = Mc, method_det = method_det, spread = spread)
```
###2. Visualize the probability of detection and acceptance.
```{r}
plot_metrics_dis(df = results_cleaned, xlab = "Input aflatoxin concentration (ppb)", verbose = TRUE)
```
# Appendix
* This is for the purpose of debugging only. The following functions are important intermediate functions. You may figure out which step is malfunctioning by running through these functions line by line.
* For more info on the model framework, please refer to the function call graph in the slides "Schemes.pptx".
```{r, eval = FALSE}
# These functions produce intermediate data
contam_xy = sim_contam_new(c_hat = c_hat, rho = rho, m_kbar = m_kbar, conc_neg = conc_neg,
lims = lims, spread = spread, n_affected = n_affected,
covar = covar_mat, dis_level = dis_level, seed = seed)
sp_xy = sim_plan_new(method_sp = method_sp, spread = spread, lims = lims,
radius = sp_radius, container = container)
dist_contam_sp = calc_dist(df_contam = contam_xy, df_sp = sp_xy, spread = spread,
method_sp = method_sp)
contam_sp_xy = gen_sim_data_new(df_contam = contam_xy, df_sp = sp_xy, dist = dist_contam_sp, spread = spread, L = L, rho = rho, m_kbar = m_kbar, sp_radius = sp_radius, conc_neg = conc_neg)
sample_dis = get_sample_dis(data = contam_sp_xy$raw, container = container, m_kbar = m_kbar, tox = tox, homogeneity = homogeneity)
decision = lot_decision_new(data = sample_dis$test, Mc = Mc, spread = spread, method_det = method_det)
words(x = decision)
# This function produces the dataframe "contam_sp_xy" and the distance dataframe
test1 = sim_intmed(c_hat = c_hat, lims = lims, spread = spread, covar_mat = covar_mat,
n_affected = n_affected, dis_level = dis_level, method_sp = method_sp,
container = container, sp_radius = sp_radius, L = L, rho = rho,
m_kbar = m_kbar, tox = tox, conc_neg = conc_neg, seed = seed, homogeneity = homogeneity)
# This function produces
test2 = sim_outcome_new(c_hat = c_hat, lims = lims, spread = spread, method_sp = method_sp,
method_det = method_det, covar_mat = covar_mat, n_affected = n_affected,
dis_level = dis_level, sp_radius = sp_radius, container = container,
L = L, rho = rho, m_kbar = m_kbar, conc_neg = conc_neg, Mc = Mc,
tox = tox, verbose = TRUE, seed = seed, homogeneity = homogeneity)
# This function produces a sim_outcome_new() function with all the input parameters loaded
test3 = gen_sim_outcome_new(c_hat = c_hat, lims = lims, spread = spread,
method_sp = method_sp, method_det = method_det, covar_mat = covar_mat,
n_affected = n_affected, dis_level = dis_level, sp_radius = sp_radius,
container = container, L = L, rho = rho, m_kbar = m_kbar, Mc = Mc,
conc_neg = conc_neg, tox = tox, verbose = TRUE, seed = seed, homogeneity = homogeneity)
# First layer of iteration: This function iterates the sim_outcome_new() for n_iter times with the same contamination locations
test4 = sim_iterate(n_iter = n_iter, Args = ArgList_default, seed = seed)
# Second layer of iteration: Iterate the 1st layer with different seeds
test5 = sim_iterate2(n_seed = n_seed, n_iter = n_iter, Args = ArgList_default)
# Tuning layer
test6 = tune_param(Args = ArgList_default, n_seed = n_seed, n_iter = n_iter, param = param_name, val = 30)
```