readreplace modifies the dataset currently in memory by making replacements that are specified in an external dataset, the replacements file.

The list of differences saved by the SSC program cfout is designed for later use by readreplace. After the addition of a new variable to the cfout differences file that holds the new (correct) values, the file can be used as the readreplace replacements file.

readreplace is available through SSC: type ssc install readreplace in Stata to install.
ARCED's modification: Fix for Stata 17. Stata program to make replacements specified in an external dataset
The certification script of readreplace is cscript/readreplace.do. If you are new to certification scripts, you may find this Stata Journal article helpful. See this guide for more on readreplace testing.
net install readreplace, all replace from(https://raw.githubusercontent.com/ARCED-Foundation/readreplace/master)
Converted automatically from SMCL:
log html readreplace.sthlp readreplace.md
The help file looks best when viewed in Stata as SMCL.
Title

readreplace -- Make replacements that are specified in an external dataset

Syntax

readreplace using filename, id(varlist) variable(varname) value(varname) [options]

    options              Description
    -------------------------------------------------------------------------
    Main
    * id(varlist)        variables for matching observations with the
                         replacements specified in the using dataset
    * variable(varname)  variable in the using dataset that indicates the
                         variables to replace
    * value(varname)     variable in the using dataset that stores the new
                         values

    Import
      insheet            use insheet to import filename; the default
      use                use use to load filename
      excel              use import excel to import filename
      import(options)    options to specify to the import command
    -------------------------------------------------------------------------
    * id(), variable(), and value() are required.
Description
readreplace modifies the dataset currently in memory by making replacements that are specified in an external dataset, the replacements file.
The list of differences saved by the SSC program cfout is designed for later use by readreplace. After the addition of a new variable to the cfout differences file that holds the new (correct) values, the file can be used as the readreplace replacements file.
readreplace changes the contents of existing variables by making replacements that are specified in a separate dataset, the replacements file. The replacements file should be long by replacement such that each observation is a replacement to complete. Replacements are described by a variable that contains the name of the variable to change, specified to option variable(), and a variable that stores the new value for the variable, specified to option value(). The replacements file should also hold variables shared by the dataset in memory that indicate the subset of the data for which each change is intended; these are specified to option id(), and are used to match observations in memory to their replacements in the replacements file.
Below, an example replacements file is shown with three variables: uniqueid, to be specified to id(), Question, to be specified to variable(), and CorrectValue, to be specified to value().
    +--------------------------------------+
    | uniqueid     Question   CorrectValue |
    |--------------------------------------|
    |      105     district             13 |
    |      125          age              2 |
    |      138       gender              1 |
    |      199     district             34 |
    |        2   am_failure              3 |
    +--------------------------------------+
For each observation of the replacements file, readreplace essentially runs the following replace command:
replace Question_value = CorrectValue_value if uniqueid == uniqueid_value
That is, the effect of readreplace here is the same as these five replace commands:
    replace district = 13 if uniqueid == 105
    replace age = 2 if uniqueid == 125
    replace gender = 1 if uniqueid == 138
    replace district = 34 if uniqueid == 199
    replace am_failure = 3 if uniqueid == 2
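As a minimal sketch of the same replacements made through readreplace, the example replacements file could be typed in by hand and saved as a Stata dataset. The file name replacements.dta is hypothetical, and firstEntry is assumed to contain uniqueid, district, age, gender, and am_failure.

```stata
* Build the example replacements file by hand (illustrative only).
clear
input uniqueid str10 Question CorrectValue
105 "district"   13
125 "age"         2
138 "gender"      1
199 "district"   34
  2 "am_failure"  3
end
save replacements, replace

* Load the dataset to correct, then apply the replacements.
* Option use tells readreplace that the replacements file is a Stata dataset.
use firstEntry, clear
readreplace using replacements.dta, id(uniqueid) variable(Question) value(CorrectValue) use
```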
The variable specified to value() may be numeric or string; either is accepted.
The replacements file may be one of the following formats:
    o Comma-separated data. This is the default format, but you may specify option insheet; either way, readreplace will use insheet to import the replacements file. You can also specify any options for insheet to option import().
    o Stata dataset. Specify option use to readreplace, passing any options for use to import().
    o Excel file. Specify option excel to readreplace, passing any options for import excel to import().
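For the Excel case, a call might look like the sketch below. The workbook name correctedValues.xlsx is hypothetical, and the sketch assumes its first row holds the column names, which is why the import excel option firstrow is passed through import().

```stata
* Illustrative only: replacements stored in an Excel workbook whose
* first row contains the variable names uniqueid, Question, CorrectValue.
use firstEntry, clear
readreplace using correctedValues.xlsx, id(uniqueid) variable(Question) value(CorrectValue) excel import(firstrow)
```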
readreplace may be employed for a variety of purposes, but it was designed to be used as part of a data entry process in which data is entered two times for accuracy. After the second entry, the two separate entry datasets need to be reconciled. cfout can compare the first and second entries, saving the list of differences in a format that is useful for data entry teams. Data entry operators can then add a new variable to the differences file for the correct value. Once this variable has been entered, load either of the two entry datasets, then run readreplace with the new replacements file.
The GitHub repository for readreplace is here. Previous versions may be found there: see the tags.
Remarks for promoting storage types
readreplace will change variables' storage types in much the same way as replace, promoting storage types according to these rules:
1. Storage types are only promoted; they are never compressed.
2. The storage type of float variables is never changed.
3. If a variable of integer type (byte, int, or long) is replaced with a noninteger value, its storage type is changed to float or double according to the current set type setting.
4. If a variable of integer type is replaced with an integer value that is too large or too small for its current storage type, it is promoted to a longer type (int, long, or double).
5. When needed, str# variables are promoted to a longer str# type or to strL.
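Because these rules follow the behavior of replace, they can be explored with replace alone. The sketch below uses a throwaway one-observation dataset. The variable names x and s are hypothetical, and describe is called after each step so that the resulting storage type can be inspected.

```stata
clear
set obs 1

* Rule 4: 1000 does not fit in a byte, so x should be promoted to int.
generate byte x = 1
replace x = 1000
describe x

* Rule 3: a noninteger value should promote x to float
* (or to double, depending on the current -set type- setting).
replace x = 1.5
describe x

* Rule 5: a longer string should promote s from str1 to a longer str# type.
generate str1 s = "a"
replace s = "abc"
describe s
```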
Examples

Make the changes specified in correctedValues.csv

    . use firstEntry
    . readreplace using correctedValues.csv, id(uniqueid) variable(question) value(correctvalue)

Same as the previous readreplace command, but specifies option case to insheet to import the replacements file

    . use firstEntry
    . readreplace using correctedValues.csv, id(uniqueid) variable(Question) value(CorrectValue) import(case)

Same as the previous readreplace command, but loads the replacements file as a Stata dataset

    . use firstEntry
    . readreplace using correctedValues.dta, id(uniqueid) variable(Question) value(CorrectValue) use
Stored results

readreplace stores the following in r():

    Scalars
      r(N)         number of real changes

    Macros
      r(varlist)   variables replaced

    Matrices
      r(changes)   number of real changes by variable
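A short sketch of how these results might be inspected after a run, reusing the file and variable names from the first example. The matrix name changes is hypothetical.

```stata
use firstEntry, clear
readreplace using correctedValues.csv, id(uniqueid) variable(question) value(correctvalue)

* Show everything readreplace left in r().
return list

* Copy the matrix first so that it survives later commands that reset r().
matrix changes = r(changes)

display "Total real changes: " r(N)
display "Variables replaced: `r(varlist)'"
matrix list changes
```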
Authors

Ryan Knight
Matthew White
For questions or suggestions, submit a GitHub issue or e-mail researchsupport@poverty-action.org.
Also see
Help: [D] generate
User-written: cfout, bcstats, mergeall