| Title: | DDI with R |
|---|---|
| Description: | Useful functions for various DDI (Data Documentation Initiative) related inputs and outputs. Converts data files to and from DDI, SPSS, Stata, SAS, R and Excel, including user declared missing values. |
| Authors: | Adrian Dusa [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-3525-9253>) |
| Maintainer: | Adrian Dusa |
| License: | GPL (>= 3) |
| Version: | 0.19.18 |
| Built: | 2026-05-29 13:11:23 UTC |
| Source: | https://github.com/dusadrian/DDIwR |
Build a dataset-level dictionary of missing values and their recoded targets for internal SPSS-style normalization or export-oriented Stata/SAS mapping.
buildDictionary(dataset, to = c("SPSS", "Stata", "SAS"), start = -91)
dataset |
A data frame. |
to |
Target software for the recoding dictionary. |
start |
Starting code for generated SPSS-style numeric missings. |
A data frame with columns such as label, old, new, count,
and n_variables.
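Below is a minimal base R sketch of a dataset-level dictionary. It is not buildDictionary() itself. The helper sketch_dictionary() is hypothetical and makes two simplifying assumptions. First, it expects the declared missing codes of each variable to be already available as a named list. Second, it returns only the old, new and n_variables columns, without label or count.

```r
# Hypothetical sketch, not DDIwR's buildDictionary(): collect the declared
# missing codes of all variables and give each distinct code one new target,
# either SPSS-style numbers counting down from `start`, or Stata-style
# extended letters "a", "b", ...
sketch_dictionary <- function(na_codes, to = c("SPSS", "Stata"), start = -91) {
  to <- match.arg(to)
  # na_codes: named list, one vector of declared missing codes per variable
  old <- sort(unique(unlist(na_codes, use.names = FALSE)), decreasing = TRUE)
  # number of variables in which each code is declared
  n_variables <- vapply(old, function(code) {
    sum(vapply(na_codes, function(v) code %in% v, logical(1)))
  }, integer(1))
  new <- if (to == "SPSS") start - seq_along(old) + 1 else letters[seq_along(old)]
  data.frame(old = old, new = new, n_variables = n_variables,
             stringsAsFactors = FALSE)
}

sketch_dictionary(list(A = -92, C = c(-91, -92)), to = "Stata")
```

In this example, -91 maps to "a" and -92 maps to "b". The n_variables column records that -92 is declared in two variables, which is exactly the information a harmonized, dataset-wide mapping needs.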
This function converts (or transfers) between R, Stata, SPSS, SAS, Excel and DDI XML files. Unlike the regular import / export functions from packages haven or rio, this function uses the DDI standard as an exchange platform and facilitates a consistent conversion of the missing values.
convert(from, to = NULL, declared = TRUE, chartonum = FALSE, recode = TRUE, encoding = "UTF-8", csv = NULL, n_max = -1L, skip = 0L, ...)
from |
A path to a file, or a data.frame object |
to |
Character, the name of a software package or a path to a specific file |
declared |
Logical, return the resulting dataset as a declared object |
chartonum |
Logical, recode character categorical variables to numerical categorical variables |
recode |
Logical, recode missing values |
encoding |
The character encoding used to read a file |
csv |
Complex argument, see the Details section |
n_max |
Integer, maximum number of rows to import from SPSS, Stata or SAS files |
skip |
Integer, number of rows to skip when importing from SPSS, Stata or SAS files |
... |
Additional parameters passed to other functions, see the Details section |
When the argument to specifies a certain statistical package
("R", "Stata", "SPSS", "SAS", "XPT") or "Excel", the name of the
destination file will be identical to the one in the argument from,
with an automatically added software specific extension.
SPSS portable files (with the extension ".por") can only be read, not
written.
The argument to can also be specified as a path to a specific file,
in which case the software package is determined from its file extension.
The following extensions are currently recognized: .xml for DDI,
.rds for R, .dta for Stata, .sav for SPSS, .xpt for SAS, and
.xlsx for Excel.
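The extension lookup described above fits in a few lines of base R. The helper software_from_path() is hypothetical and only mirrors the list of recognized extensions. Treating extensions case-insensitively is an assumption, not documented DDIwR behavior.

```r
# Hypothetical helper, not part of DDIwR: infer the target software from the
# extension of a destination path. Case-insensitive matching is an assumption.
software_from_path <- function(path) {
  ext <- tolower(tools::file_ext(path))
  lookup <- c(xml = "DDI", rds = "R", dta = "Stata", sav = "SPSS",
              xpt = "SAS", xlsx = "Excel")
  if (!ext %in% names(lookup)) {
    stop("Unrecognized file extension: '", ext, "'")
  }
  unname(lookup[ext])
}

software_from_path("output/test.dta")
```

An unknown extension such as ".csv" raises an error instead of silently falling back to a default.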
Additional parameters can be passed via the three dots argument ... to the
respective functions from packages haven and readxl. For instance, the
function write_dta() has an additional argument called version when
writing a Stata file.
The formal arguments n_max and skip are used only when importing
foreign data files through the direct ReadStat readers, namely SPSS (.sav,
.zsav, .por), Stata (.dta) and SAS (.sas7bdat, .xpt). They can be
used to read only the first n_max rows or to import large files in batches
by skipping the first skip rows.
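Reading a file in batches only requires a schedule of skip and n_max values. The helper batch_plan() below is a hypothetical base R sketch and assumes the total number of rows is known in advance. The commented loop shows how its output could be passed to convert(). The file name "large.sav" is a placeholder.

```r
# Hypothetical helper, not part of DDIwR: split `total_rows` rows into
# consecutive batches of at most `batch_size` rows, as (skip, n_max) pairs.
batch_plan <- function(total_rows, batch_size) {
  # no rows: return an empty plan instead of calling seq() with a bad range
  if (total_rows <= 0) return(data.frame(skip = integer(0), n_max = integer(0)))
  skip <- seq(0, total_rows - 1, by = batch_size)
  n_max <- pmin(batch_size, total_rows - skip)
  data.frame(skip = skip, n_max = n_max)
}

plan <- batch_plan(total_rows = 250, batch_size = 100)

# Possible use with convert(), with "large.sav" as a placeholder file:
# for (i in seq_len(nrow(plan))) {
#   chunk <- convert("large.sav", n_max = plan$n_max[i], skip = plan$skip[i])
# }
```

For 250 rows in batches of 100, the plan skips 0, 100 and 200 rows and reads 100, 100 and 50 rows respectively.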
The most important of these is the argument user_na of the function
read_sav(). It defaults to FALSE in package haven, but package DDIwR uses
it with the value TRUE. It can be deactivated by explicitly specifying
user_na = FALSE in function convert().
The same three dots argument is used to pass additional parameters to other
functions in this package, for instance exportCodebook() when writing
to a DDI file. One of its arguments, embed (activated by default), controls
whether the data is embedded in the XML file. Deactivating it creates a CSV
file in the same directory, using the same file name as the XML file.
When converting from DDI, if the dataset is not embedded in the XML file, the
CSV file is expected to be found in the same directory as the DDI Codebook,
and it should have the same file name as the XML file. The path to the CSV
file can be provided via the csv argument. Additional formal
parameters of the function read.csv() can
be passed via the same three dots ... argument. Alternatively, the
csv argument can also be an R data frame.
When converting to DDI, if the argument embed is set to FALSE, users
have the option to save the data in a separate CSV file (the default) or not
to save the data at all, by setting csv to FALSE.
When the DDI .xml file is created, unique IDs are generated for all variables that do not already have one in their attributes. These IDs are useful for referencing purposes in newer versions of the DDI Codebook.
The argument chartonum signals recoding character categorical
variables, and employs the function recodeCharcat().
This only makes sense when recoding to Stata, which does not allow allocating
labels for anything but integer variables.
If the argument to is left to NULL, the data is (invisibly) returned
to the R environment. Conversion to R, either in the working space or as
a data file, will result (by default) in a data frame containing declared
labelled variables, as defined in package declared.
The current version reads and creates DDI Codebook version 2.6, with future
versions to extend the functionality for DDI Lifecycle versions 3.x and link
to the future package DDI4R for the UML model based version 4. It
extends the standard DDI Codebook by offering the possibility to embed a
serialized version of the R dataset into the XML file containing the
Codebook, within a notes child of the fileDscr component. This type of
generated codebook is unique to this package and automatically detected when
converting to another statistical software. This will likely be replaced with
a time insensitive text version.
Converting to SAS is experimental, and it relies on the same package
haven, which uses the ReadStat C library. The safest way to convert, which
also consistently converts the missing values, is to export the data to a
CSV file, create a setup file with function setupfile(), and run its
commands manually.
Converting data from SAS is possible, however reading the metadata is also
experimental (the current version of haven only partially imports the
metadata). Either specify the path to the catalog file using the argument
catalog_file from the function read_sas(),
or have the catalog file in the same directory as the data set, with the same
file name and the extension .sas7bcat.
The argument recode controls how missing values are treated. If the
input file has SPSS like numeric codes, they will be recoded to extended
(a-z) missing types when converting to Stata or SAS. If the input has Stata
like extended codes, they will be recoded to SPSS like numeric codes.
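The two recoding directions are sketched below. The sketch assumes a dictionary given as a named character vector, where the names are the numeric codes and the values are the letters. Base R has no tagged missing values, so Stata extended missings are written here as the strings ".a", ".b" and so on. The helpers to_stata_tags() and to_spss_codes() are hypothetical and do not reproduce DDIwR's internals.

```r
# Hypothetical helpers, not part of DDIwR. `dict` maps numeric missing codes
# (stored as names) to extended missing letters; extended missings are
# represented as the strings ".a", ".b", ... because base R has no tagged NA.
to_stata_tags <- function(x, dict) {
  out <- as.character(x)
  hit <- out %in% names(dict)
  out[hit] <- paste0(".", dict[out[hit]])
  out
}

to_spss_codes <- function(tags, dict) {
  # reverse direction: ".a", ".b", ... back to the numeric codes
  rev_dict <- setNames(names(dict), paste0(".", dict))
  out <- tags
  hit <- tags %in% names(rev_dict)
  out[hit] <- rev_dict[tags[hit]]
  as.numeric(out)
}

dict <- c("-91" = "a", "-92" = "b")
x <- c(1, 2, -91, 5, -92)
tags <- to_stata_tags(x, dict)
```

Here tags is c("1", "2", ".a", "5", ".b"), and to_spss_codes(tags, dict) recovers the original numeric vector.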
Missing values are harmonized across the entire dataset by default when
exporting to Stata or SAS. This automatically builds a dataset-level missing
value dictionary when possible. Use harmonize = FALSE via the three
dots argument to deactivate this behavior. The alias
harmonise = FALSE is also accepted.
The character encoding is usually passed to the corresponding functions
from package haven. It can be set to NULL to revert to the
default of that package.
Converting to SPSS works with numerical and character labelled vectors, with or without labels. Date/Time variables are only partially supported by package haven. Such a variable is exported as is when it has no labels and no missing values. If labels and missing values are declared, the variable is automatically coerced to numeric, and users may have to make the proper settings in SPSS.
An invisible R data frame, when the argument to is NULL.
Adrian Dusa
DDI - Data Documentation Initiative, see the DDI Alliance website.
setupfile,
getCodebook,
declared
## Not run:
# Assuming an SPSS file called test.sav is located in the working directory
# The following command imports the file into the R environment:
test <- convert("test.sav")
# The following command will extract the metadata in a DDI Codebook and
# produce a test.xml file in the same directory
convert("test.sav", to = "DDI")
# The data may be saved separately from the DDI file, using:
convert("test.sav", to = "DDI", embed = FALSE)
# To produce a Stata file:
convert("test.sav", to = "Stata")
# To produce an R file:
convert("test.sav", to = "R")
# To produce an Excel file:
convert("test.sav", to = "Excel")
## End(Not run)
addChildren() adds one or more children to a standard DDI Codebook element
(see makeElement), anyChildren() checks if an element has any
children at all, hasChildren() checks if the element has specific children,
indexChildren() returns the positions of the children among all containing
children, and getChildren() extracts them. For attributes and content,
there are dedicated functions to add*(), remove*() and change*().
addChildren(children, to, overwrite = TRUE, ...)
anyChildren(element)
getChildren(xpath, from, ...)
hasChildren(element, name)
indexChildren(element, name)
removeChildren(name, from, overwrite = TRUE, ...)
addContent(content, to, overwrite = TRUE)
makePath(xpath, from, overwrite = TRUE, ...)
moveChild(xpath, from, to, overwrite = TRUE, ...)
replaceChild(xpath, with, overwrite = TRUE, ...)
changeContent(content, to, overwrite = TRUE)
removeContent(from, overwrite = TRUE)
addAttributes(attrs, to, overwrite = TRUE)
anyAttributes(element)
changeAttributes(attrs, from, overwrite = TRUE)
hasAttributes(element, name)
removeAttributes(name, from, overwrite = TRUE)
children |
A standard element of class |
to |
A standard element of class |
overwrite |
Logical, overwrite the original object in the parent frame. |
... |
Other arguments, mainly for internal use. |
element |
A standard element of class |
xpath |
Character, an xpath to a DDI Codebook element. Indexed segments
are supported using square brackets, e.g. |
from |
A standard element of class |
name |
Character, name(s) of specific child element / attribute. |
content |
Character, the text content of a DDI element. |
with |
A standard element of class |
attrs |
A list of specific attribute names and values. |
Although an XML list generally allows for multiple contents, sometimes spread
between the children elements, it is preferable to maintain a single content
(possibly separated with carriage return characters for separate lines).
XPath resolution accepts indexed segments using [n]. When an index is
missing, the first matching element is selected.
Arguments are unique, and can be changed by simply referring to their names.
Elements can be repeated. For example, dataDscr contains one var element
per dataset variable. When multiple var elements exist, referring only to
the name is ambiguous. Use indexed xpaths like var[3] to target a specific
instance, or use indexChildren() to list all positions for iteration.
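Indexed xpath segments can be split into element names and positions as in the sketch below. A segment without [n] defaults to the first match, as described above. The helper parse_xpath() is hypothetical and is not how DDIwR resolves xpaths internally.

```r
# Hypothetical helper, not part of DDIwR: split an xpath such as
# "dataDscr/var[3]/labl" into element names and 1-based indices,
# using index 1 (the first match) when a segment has no [n].
parse_xpath <- function(xpath) {
  segments <- strsplit(xpath, "/", fixed = TRUE)[[1]]
  segments <- segments[nzchar(segments)]
  has_index <- grepl("\\[[0-9]+\\]$", segments)
  index <- rep(1L, length(segments))
  index[has_index] <- as.integer(
    sub("^.*\\[([0-9]+)\\]$", "\\1", segments[has_index])
  )
  name <- sub("\\[[0-9]+\\]$", "", segments)
  data.frame(name = name, index = index, stringsAsFactors = FALSE)
}

parse_xpath("dataDscr/var[3]/labl")
```

The result pairs each element name with the position to select: dataDscr at 1, var at 3 and labl at 1.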
When adding more than one child, the children should be grouped into a list.
Functions addContent, changeContent, removeContent, addAttributes,
changeAttributes, and removeAttributes accept either a standard DDI element
or a character xpath. When an xpath is provided, the target element is
resolved and replaced in the root element.
An invisible standard DDI element. Functions any*() and has*()
return a logical (vector).
Adrian Dusa
Create a DDI Codebook version 2.6, XML file structure.
exportCodebook(codeBook, to = "", OS = "", indent = 2, ...)
codeBook |
A standard element of class |
to |
either a character string naming a file or a connection open for writing ("" indicates output to the console) |
OS |
The target operating system, for the eol - end of line character(s) |
indent |
Indent width, in number of spaces |
... |
Other arguments, mainly for internal use |
The information object is a codeBook DDI element having at least two
main children:
fileDscr, with the data provided as a sub-component named datafile
dataDscr, having as many components as the number of variables in the
(meta)data.
For the moment, only DDI codebook version 2.6 is exported, and DDI Lifecycle is planned for future releases.
A small number of required DDI specific elements and attributes have generic
default values, if not otherwise specified in the codeBook list object. For
the current version, these are: monolang, xmlang, IDNo, titl,
agency, URI (for the holdings element), distrbtr, abstract and
level (for the otherMat element).
The codeBook object is exported as provided, and it is the user's
responsibility to test its validity against the XML schema. Most of these
arguments help create the mandatory element stdyDscr, which cannot be
harvested from the dataset. If this element is not already present, providing
any of these arguments via the three dots ... gate signals its automatic
creation and inclusion, using the values provided.
Argument xmlang expects a two-letter ISO 639-1 language code, for instance
"en" to indicate English, or "ro" to indicate Romanian. The original
DDI Codebook attribute is called xml:lang, which had to be renamed in this
R function because the colon is not allowed in a syntactic R argument name.
A logical argument monolang signals whether the document is monolingual, in which
case the attribute xmlang is placed a single time for the entire document
in the codeBook element. For multilingual documents, xmlang should be
placed in the attributes of various other (child) elements, for instance
abstract, or the study title, name of the distributing institution,
variable labels etc.
The argument OS can be one of: "windows" (default), "Windows", "Win" or "win"; "MacOS", "Darwin", "Apple", "Mac" or "mac"; "Linux" or "linux".
The end of line separator changes only when the target OS is different from the running OS.
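The sketch below shows how the OS argument could be normalized and mapped to line endings. The helpers os_family() and eol_for() are hypothetical. Using "\r\n" for Windows and "\n" for all other systems is an assumption, consistent with modern macOS using Unix line endings.

```r
# Hypothetical helpers, not part of DDIwR. Assumption: Windows uses "\r\n"
# and every other supported system uses "\n".
os_family <- function(OS) {
  OS <- tolower(OS)
  if (OS %in% c("windows", "win")) return("windows")
  if (OS %in% c("macos", "darwin", "apple", "mac")) return("macos")
  if (OS == "linux") return("linux")
  stop("Unknown OS: ", OS)
}

eol_for <- function(OS) {
  if (os_family(OS) == "windows") "\r\n" else "\n"
}

# e.g. joining two lines of XML with Windows line endings into one string
paste0(c("<codeBook>", "</codeBook>"), collapse = eol_for("Windows"))
```

Comparing os_family() of the target with that of the running system would tell whether any conversion is needed at all.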
The argument indent controls how many spaces will be used in the XML
file, to indent the different sub-elements.
An XML file containing DDI Codebook version 2.6 metadata.
Adrian Dusa
https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation.html
## Not run:
exportCodebook(codeBook, to = "codebook.xml")
# using a namespace
exportCodebook(codeBook, to = "codebook.xml", xmlns = "ddi")
## End(Not run)
Extract a list containing the variable labels, value labels and any available information about missing values.
getCodebook(from = NULL, encoding = "auto", ignore = NULL, ...)
from |
A path to a file, or a data frame object |
encoding |
The character encoding used to read a file |
ignore |
Character, ignore DDI elements when reading from an XML file |
... |
Additional arguments for this function (internal use only) |
This function extracts the metadata from an R dataset. Alternatively, it can read an XML file containing a DDI Codebook version 2.6, or an SPSS or Stata file. It returns a list containing the variable labels, value labels and information about the missing values.
If the input is a dataset, it will extract the variable level metadata (labels, missing values etc.). From a DDI XML file, it will import all metadata elements, the most expensive being the data description.
It additionally attempts to automatically detect a type for each variable:
cat: |
categorical variable using numeric values |
catchar: |
categorical variable using character values |
catnum: |
categorical variable for which numerical summaries can be calculated (e.g. a 0...10 Likert response scale) |
num: |
numerical |
numcat: |
numerical variable with few enough values (e.g. number of children) for which a table of frequencies is possible in addition to numerical summaries |
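The classification above can be illustrated with a simple heuristic. This is not the rule getCodebook() applies. The helper guess_type() is hypothetical, and both the threshold of 10 distinct values and the "more distinct values than labels" test for catnum are arbitrary assumptions.

```r
# Hypothetical heuristic, not DDIwR's rule: classify a variable into
# cat / catchar / catnum / num / numcat from its values and value labels.
# The threshold `max_distinct` is an arbitrary assumption.
guess_type <- function(x, labels = NULL, max_distinct = 10) {
  if (is.character(x)) return("catchar")
  n_distinct <- length(unique(x[!is.na(x)]))
  if (!is.null(labels)) {
    # labelled numeric variable: "catnum" when values outnumber the labels,
    # e.g. a 0...10 scale labelled only at its ends
    if (n_distinct > length(labels)) return("catnum")
    return("cat")
  }
  # unlabelled numeric variable: frequency table only if few distinct values
  if (n_distinct <= max_distinct) "numcat" else "num"
}

guess_type(0:10, labels = c(Low = 0, High = 10))
```

A 0...10 scale labelled only at its ends is classified as catnum. A fully labelled variable such as c(1, 2, 1) with labels Yes = 1 and No = 2 is classified as cat.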
Apart from UTF-8, other encodings such as latin1 or windows-1252 might be
necessary when reading from SPSS or DDI XML files, and the argument also
accepts bytes for multi-byte encodings. To use the encoding specified in the
file, set encoding = NULL. The default is encoding = "auto", which tries to
detect the encoding automatically.
For the moment, only DDI Codebook is supported, but DDI Lifecycle is planned to be implemented.
An R list roughly equivalent to a DDI Codebook, containing all variables, their corresponding variable labels and value labels, and (if applicable) missing values if imported and found.
Adrian Dusa
x <- data.frame(
  A = declared(
    c(1:5, -92),
    labels = c(Good = 1, Bad = 5, NR = -92),
    na_values = -92
  ),
  C = declared(
    c(1, -91, 3:5, -92),
    labels = c(DK = -91, NR = -92),
    na_values = c(-91, -92)
  )
)
getCodebook(from = x)
catgry elements for a particular variable
Utility function to create the catgry elements, as well as all
necessary sub-elements (e.g. catValu, labl, varFormat) along with their
associated XML attributes.
makeCategories(metadata)
metadata |
A list of two or three components: |
A list of standard catgry DDI elements.
Adrian Dusa
notes element for the dataset
Create the notes element to embed a serialized, gzipped version of the data
in the fileDscr section of the codeBook.
makeDataNotes(data)
data |
An R dataframe. |
A standard notes DDI element.
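The round trip behind such a notes element (serialize, compress, store as text, then reverse) can be shown with base R alone. The helpers encode_data() and decode_data() are hypothetical. The text encoding DDIwR actually uses is not documented here, so hex encoding is an assumption, chosen because it needs no extra packages.

```r
# Sketch of the idea only, not DDIwR's makeDataNotes(): serialize a data
# frame, gzip-compress the bytes, and turn them into plain text that can sit
# inside an XML element. Hex encoding is an assumption (base R only).
encode_data <- function(data) {
  bytes <- memCompress(serialize(data, connection = NULL), type = "gzip")
  paste(as.character(bytes), collapse = "")  # two hex digits per byte
}

decode_data <- function(text) {
  starts <- seq(1, nchar(text), by = 2)
  hex <- substring(text, starts, starts + 1)
  bytes <- as.raw(strtoi(hex, base = 16L))
  unserialize(memDecompress(bytes, type = "gzip"))
}

df <- data.frame(a = 1:3, b = c("x", "y", "z"))
txt <- encode_data(df)
```

decode_data(txt) returns a data frame identical to df. A more compact encoding such as base64 would shorten the text but would require an additional package.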
Adrian Dusa
Creates a standard DDI element.
makeElement(name, children = NULL, attributes = NULL, content = NULL, fill = FALSE, ...)
name |
Character, a DDI Codebook element name. |
children |
A list of standard DDI codebook elements. |
attributes |
A vector of named values. |
content |
Character scalar. |
fill |
Logical, fill the element with arbitrary values for its mandatory children and attributes |
... |
Other arguments, see Details. |
The structure of a DDI element in R follows the usual structure of
an XML node, as returned by the function as_list() from package xml2,
with one additional (first) component named ".extra" to accommodate any other
information that is not part of the DDI element.
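The toy constructor below mimics that layout, assuming the xml2::as_list() conventions: children are named list components, XML attributes are stored as R attributes, and text content is an unnamed component. A ".extra" component comes first. The function toy_element() is hypothetical and is not a replacement for makeElement(). Storing the element name inside ".extra" and using the single class "DDI" are also assumptions.

```r
# Hypothetical mimic of the structure described above, not makeElement():
# a ".extra" first component, named children, XML attributes kept as R
# attributes (as xml2::as_list() does) and text content as an unnamed part.
toy_element <- function(name, children = list(), attrs = NULL, content = NULL) {
  el <- c(list(.extra = list(name = name)), children)
  if (!is.null(content)) el <- c(el, list(content))
  attributes(el) <- c(attributes(el), as.list(attrs))
  class(el) <- "DDI"
  el
}

titl <- toy_element("titl", attrs = c(ID = "T1"), content = "Study title")
titlStmt <- toy_element("titlStmt", children = list(titl = titl))
```

Keeping XML attributes as R attributes leaves the list components free for children and content, which is why ".extra" can be reserved for anything that is not part of the DDI element.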
In the DDI Codebook, most elements and their attributes are optional, but some are mandatory. In case of attributes, some become mandatory only if the element itself is present. The mandatory elements need to be present in the final version of the Codebook, to pass the validation against the XML schema.
By activating the argument fill, this function creates DDI elements
containing all mandatory (sub)elements and (their) attributes, filled with
arbitrary values that can be changed later on. Some recommended elements are
also filled, as expected by the CESSDA Data Catalogue profile for DDI
Codebook.
By default, the Codebook is assumed to have a single language for all
elements. The argument monolang can be deactivated through the "..."
gate, in which case the appropriate elements receive a default
argument xmlang = "en". For other languages, that argument can also be
provided through the "..." gate.
One such DDI Codebook element is the stdyDscr (Study Description), with the
associated mandatory children, for instance title, ID number, distributor,
citation, abstract etc.
The complete list of elements for which default values are added is: "IDNo", "titl", "titlStmt", "distrbtr", "distStmt", "holdings", "citation", "abstract", "stdyInfo", "stdyDscr", "prodDate", "software", "prodStmt", "docDscr" and "otherMat".
A standard list element of class "DDI" with reserved component names.
Adrian Dusa
addChildren
getChildren
showDetails
stdyDscr <- makeElement("stdyDscr", fill = TRUE)
# easier to extract with:
getChildren("citation/titlStmt/titl", from = stdyDscr)
Recodes a character categorical variable to a numerical categorical variable.
recodeCharcat(x, ...)
x |
A character categorical variable |
... |
Other internal arguments |
For this function, a categorical variable is something other than a base
factor. It should be an object of class "declared" with a specific
attribute called "labels" that stores the value labels.
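The sketch below recodes a character categorical vector to non-overlapping numbers while keeping the labels attached to their values. It works on a plain character vector plus a named label vector instead of a "declared" object. The helper char_to_num() is hypothetical, and its numbering rule is an assumption, not recodeCharcat()'s algorithm: values that already look numeric keep their value, and the others receive 1, 2, ... while skipping numbers already in use.

```r
# Hypothetical sketch, not recodeCharcat(): numeric-looking values keep their
# number, other values get 1, 2, ... skipping numbers already in use, and the
# value labels are moved to the new codes.
char_to_num <- function(x, labels) {
  as_num <- function(v) suppressWarnings(as.numeric(v))
  values <- unique(c(x, unname(labels)))
  numeric_like <- !is.na(as_num(values))
  # candidate codes, excluding numbers that are already used by some value
  candidates <- setdiff(seq_len(length(values) + sum(numeric_like)),
                        as_num(values[numeric_like]))
  codes <- setNames(as_num(values), values)
  codes[!numeric_like] <- candidates[seq_len(sum(!numeric_like))]
  list(
    x = unname(codes[x]),
    labels = setNames(unname(codes[unname(labels)]), names(labels))
  )
}

r <- char_to_num(c(letters[1:5], "-91"),
                 labels = c(Good = "a", Bad = "e", NR = "-91"))
```

Here the values "a" to "e" become 1 to 5, the missing code "-91" stays -91, and the labels become Good = 1, Bad = 5 and NR = -91.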
A numeric categorical variable of the same class as the input.
Adrian Dusa
x <- declared(
  c(letters[1:5], -91),
  labels = c(Good = "a", Bad = "e", NR = -91),
  na_values = -91
)
recodeCharcat(x)
A function to recode all missing values to either SPSS or Stata types, uniformly (re)using the same codes across all variables.
recodeMissings(dataset, to = c("SPSS", "Stata", "SAS"), dictionary = NULL, start = -91, ...)
dataset |
A data frame |
to |
Software to recode missing values for |
dictionary |
A named vector, with corresponding Stata missing codes to SPSS missing values |
start |
Starting code for generated SPSS-style numeric missings. |
... |
Other internal arguments |
Package DDIwR uses numeric declared missing codes as its internal R
representation. Recoding to "Stata" or "SAS" is therefore best viewed as
an export-oriented temporary representation, not as a preferred internal
storage strategy for declared vectors in R.
When no dictionary is provided, export-oriented recoding can be performed either by scanning the entire dataset for a harmonized mapping, or by recoding each variable independently. The package now defaults to the faster per-variable strategy in export preparation. Supplying a dictionary keeps the harmonized cross-dataset behavior.
When a dictionary is not provided, it is automatically constructed from the available data and metadata, using negative numbers starting from -91 and up to 27 letters starting with "a".
If the dataset mixes variables with SPSS-style and Stata-style missing values, new codes that differ from the existing ones are used, unless otherwise specified in a dictionary.
For the SPSS type of missing values, the resulting variables are coerced to a declared labelled format.
Unlike SPSS, Stata does not allow labels for character values. Character
values and their labels therefore cannot both be transported from SPSS to
Stata; it is either one or the other. If labels are more important to
preserve than the original values (especially the information about the
missing values), the argument chartonum replaces all character values with
suitable, non-overlapping numbers and adjusts the labels accordingly.
If no labels are found in the metadata, the original values are preserved.
A data frame with all missing values recoded consistently.
Adrian Dusa
# x <- data.frame(
#   A = declared(
#     c(1:5, -92),
#     labels = c(Good = 1, Bad = 5, NR = -92),
#     na_values = -92
#   ),
#   B = labelled(
#     c(1:5, tagged_na('a')),
#     labels = c(DK = tagged_na('a'))
#   ),
#   C = declared(
#     c(1, -91, 3:5, -92),
#     labels = c(DK = -91, NR = -92),
#     na_values = c(-91, -92)
#   )
# )
# xrec <- recodeMissings(x, to = "Stata")
# attr(xrec, "dictionary")
# Supply a dictionary to harmonize missing meanings across variables
# dictionary <- data.frame(
#   old = c(-91, -92, "a"),
#   new = c("c", "d", "c")
# )
# recodeMissings(x, to = "Stata", dictionary = dictionary)
# recodeMissings(x, to = "SPSS")
# dictionary$new <- c(-97, -98, -97)
# recodeMissings(x, to = "SPSS", dictionary = dictionary)
# recodeMissings(x, to = "SPSS", start = 991)
# recodeMissings(x, to = "SPSS", start = -8)
Search function to return elements that contain a certain word or regular expression pattern.
searchFor(x, where = c("everywhere", "title", "description", "attributes", "examples"), ...)
x |
Character, either word(s) or a regular expression. |
where |
Character, in which section(s) to search for. |
... |
Other arguments to be passed to the grepl() function. |
Character vector of DDI element names.
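The search principle (apply grepl() to selected text fields of each element and return the names that match) can be shown on a toy catalog. The element names are real DDI Codebook elements, but the titles and descriptions below are made-up placeholders, not the package's internal schema. The function search_catalog() is hypothetical and supports only the "title" and "description" sections.

```r
# Toy catalog: real DDI element names, placeholder titles and descriptions.
catalog <- list(
  titl     = list(title = "Title",    description = "Full title of the work"),
  abstract = list(title = "Abstract", description = "Summary of the study content"),
  catgry   = list(title = "Category", description = "A single category of a variable")
)

# Hypothetical helper, not searchFor(): return the names of the elements
# whose selected fields match the pattern; ... is passed on to grepl().
search_catalog <- function(x, catalog, where = c("title", "description"), ...) {
  hits <- vapply(catalog, function(el) {
    any(grepl(x, unlist(el[where]), ...))
  }, logical(1))
  names(catalog)[hits]
}

search_catalog("summary", catalog, ignore.case = TRUE)
```

Passing ignore.case = TRUE through the dots mirrors how extra arguments reach grepl().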
Adrian Dusa
Creates a setup file, based on a list of variable and value labels.
setupfile(obj, file = "", type = "all", csv = NULL, recode = TRUE, OS = "", stringnum = TRUE, ...)
obj |
A data frame, or a list object containing the metadata, or a path to a data file or to a directory where such objects are located, for batch processing |
file |
Character, the (path to the) setup file to be created |
type |
The type of setup file, can be: "SPSS", "Stata", "SAS", "R", or "all" (default) |
csv |
The original dataset, used to create the setup file commands, or a path to the directory where the .csv files are located, for batch processing |
recode |
Logical, recode missing values to extended .a-.z range |
OS |
The target operating system, for the eol - end of line character(s) |
stringnum |
Logical, recode string variables to numeric |
... |
Other arguments, see Details below |
When a path to a metadata directory is specified for the argument obj,
the next argument file is silently ignored and all created setup files
are saved in a directory called "Setup Files" that is created in the
working directory (if not already found).
The argument file expects the name of the final setup file being
saved on the disk. If not specified, the name of the object provided for the
obj argument will be used as a filename.
If file is specified, the argument type is automatically
determined from the file's extension, otherwise when type = "all", the
function produces one setup file for each supported type.
If batch processing multiple files, the function will inspect all files in
the provided directory, and retain only those with the extension .R or
.r or DDI versions with the extension .xml or .XML (it will
subsequently generate an error if the .R files do not contain an object list,
or if the .xml files do not contain a DDI structured metadata file).
If the metadata directory contains a subdirectory called "data" or
"Data", it will match the name of the metadata file with the name of the
.csv file (their names have to be exactly the same, regardless of
their extension).
The csv argument can provide a data frame object produced by reading
the .csv file, or a path to the directory where the .csv files are
located. If the user doesn't provide something for this argument, the
function will check the existence of a subdirectory called data in the
directory where the metadata files are located.
In batch mode, the code starts with the argument delim = ",", but if
the .csv file is delimited differently it will also try hard to find other
delimiters that will match the variable names in the metadata file. At the
initial version 0.1-0, the automatically detected delimiters include ";"
and "\t".
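The delimiter search described above can be sketched as follows. The helper detect_delimiter() is hypothetical. It assumes the header line has already been read as a single string, and it accepts a delimiter only when the split header yields exactly the variable names from the metadata.

```r
# Hypothetical sketch, not setupfile()'s code: try "," first, then ";" and
# "\t", keeping the first delimiter whose split header matches the
# variable names expected from the metadata.
detect_delimiter <- function(header_line, varnames,
                             candidates = c(",", ";", "\t")) {
  for (delim in candidates) {
    fields <- strsplit(header_line, delim, fixed = TRUE)[[1]]
    # drop surrounding spaces / line endings and optional double quotes
    fields <- gsub("^\"|\"$", "", trimws(fields))
    if (length(fields) == length(varnames) && setequal(fields, varnames)) {
      return(delim)
    }
  }
  NA_character_
}

detect_delimiter("id;age;sex", c("id", "age", "sex"))
```

The call returns ";". When no candidate matches, the helper returns NA instead of guessing.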
The argument OS (case insensitive) can be one of: "Windows" (default) or "Win"; "MacOS", "Darwin", "Apple" or "Mac"; "Linux".
The end of line character(s) changes only when the target OS is different from the running OS.
A setup file to complement the imported raw dataset.
Adrian Dusa
## Not run:
# IMPORTANT:
# make sure to set the working directory to a directory with
# read/write permissions
# setwd("/path/to/read/write/directory")
setupfile(codeBook)
# if the csv data file is available
setupfile(codeBook, csv = "/path/to/csv/file.csv")
# generating a specific type of setup file
setupfile(codeBook, file = "codeBook.do") # type = "Stata" also works
# other types of possible utilizations, using paths to specific files
# an XML file containing a DDI metadata object
setupfile("/path/to/the/metadata/file.xml", csv = "/path/to/csv/file.csv")
# or in batch mode, specifying entire directories
setupfile("/path/to/the/metadata/directory", csv = "/path/to/csv/directory")
## End(Not run)
Describe what a DDI element is
showDetails(x, ...)
showDescription(x, ...)
showAttributes(x, name = NULL, ...)
globalAttributes()
showExamples(x, ...)
showRelations(x, ...)
showLineages(x, ...)
x |
Character, a DDI Codebook element name. |
... |
Other arguments, mainly for internal use. |
name |
Character, print only a specific element (name) |
All arguments having predefined values such as "(Y | N) : N" are mandatory if the element is used
Adrian Dusa
showDetails("codeBook")
showAttributes("catgry")
showExamples("abstract")
showLineages("titl")
Attempts a minimal validation of a DDI Codebook element, by searching for mandatory elements and attributes.
testValid(element, monolang = TRUE)
element |
A standard element of class |
monolang |
Logical, the codebook file is monolingual |
This function currently attempts a minimal check for the most essential
mandatory elements, such as the stdyDscr. A bare version of this element,
filled with arbitrary default values, can be produced with the function
makeElement(), by activating its argument fill.
It also checks for chained expectations, that is element X is mandatory only
if the parent element is present.
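Chained expectations can be checked with a small table of (parent, child) rules: a child is reported missing only when its parent exists. The sketch below uses plain nested named lists in place of DDI elements. The helpers find_all() and check_chained() are hypothetical, and the three rules are illustrative, not the package's full rule set.

```r
# Illustrative rules: each child is required only inside an existing parent.
rules <- data.frame(
  parent = c("stdyDscr", "citation", "titlStmt"),
  child  = c("citation", "titlStmt", "titl"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect every component named `name`, at any depth.
find_all <- function(tree, name) {
  found <- list()
  for (i in seq_along(tree)) {
    node <- tree[[i]]
    if (identical(names(tree)[i], name)) found <- c(found, list(node))
    if (is.list(node)) found <- c(found, find_all(node, name))
  }
  found
}

# Hypothetical helper: report a missing child only where its parent exists.
check_chained <- function(tree, rules) {
  problems <- character(0)
  for (i in seq_len(nrow(rules))) {
    for (node in find_all(tree, rules$parent[i])) {
      if (!rules$child[i] %in% names(node)) {
        problems <- c(problems, sprintf("'%s' is missing from '%s'",
                                        rules$child[i], rules$parent[i]))
      }
    }
  }
  problems
}

bad <- list(codeBook = list(stdyDscr = list(citation = list(titlStmt = list()))))
check_chained(bad, rules)
```

The call reports that titl is missing from titlStmt. A codeBook without any stdyDscr produces no problems at all, which is exactly the conditional behavior described above.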
Future versions will implement more functionality for recommended elements and attributes, with the intention to provide a 1:1 validation as offered by the "CESSDA Metadata Validator".
To ease the validation of DDI Codebook XML files, the argument monolang
is activated by default. This means a single attribute xmlang in the main
codeBook element. For multi-language codebooks, an error is flagged if this
attribute is missing where appropriate.
A character vector of validation problems found.
Adrian Dusa
Update an XML file containing a DDI Codebook.
updateCodebook(xmlfile, with, ...)
xmlfile |
A path to a DDI Codebook XML document. |
with |
An R object containing a root |
... |
Other internal arguments. |
This function replaces entire Codebook sections. Any such section present in the R object will replace the corresponding section from the XML document.
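The difference between replacing whole sections and merging them can be seen with plain nested lists standing in for Codebook sections. The helper replace_sections() is hypothetical and only illustrates the documented replace semantics. utils::modifyList() is shown for contrast because it merges nested lists recursively, which is not what this function does.

```r
# Hypothetical helper: every top-level section present in `with` replaces
# the whole corresponding section of `original`; nothing is merged.
replace_sections <- function(original, with) {
  for (section in names(with)) original[[section]] <- with[[section]]
  original
}

original <- list(docDscr = list(a = 1, b = 2), stdyDscr = list(c = 3))
update   <- list(docDscr = list(a = 10))

replace_sections(original, update)$docDscr   # only a = 10 remains
utils::modifyList(original, update)$docDscr  # recursive merge keeps b = 2
```

With replace semantics, any part of a section that should survive must be present in the R object passed as with.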
Adrian Dusa
Rebuilds the internal schema object, from a (newer) XML Schema codebook.xsd file.
updateSchema(xsd = NULL, return = FALSE)
xsd |
A path to the Codebook XML Schema file. |
return |
Return an R object representing the schema instead of updating the internal one. |
Releasing a new stable version of the DDI Codebook takes about 10 years. There are numerous elements and attributes that have to work together, and most importantly the Codebook has to obey the backward compatibility rule. Until a new version is released, the Codebook schema is incrementally modified by the DDI Alliance, in their GitHub repository.
This function is intended to update the internal DDI Codebook schema object, parsing the latest version of the Codebook XML Schema file.
Unless the codebook.xsd file is provided by the user, the function will attempt to read it from the DDI Alliance GitHub repository, located at: https://github.com/ddialliance/ddi-c_2
Adrian Dusa