finding “C_cov”

I was playing with R’s var() function the other day.
Not sure whether it simply returns population variance by dividing over n or with Bessel’s correction by dividing over (n-1), I quickly entered some test data.

> rate <- c(9.34, 8.50, 7.62, 6.93, 6.60)
> var(rate)
[1] 1.27272
> sum((rate - mean(rate))^2)/length(rate)
[1] 1.018176
> sum((rate - mean(rate))^2)/(length(rate) -1)
[1] 1.27272

So looks like it indeed applies Bessel’s correction, which is great – then I was curious to see how var() is implemented.

I know I can normally see a bit of code by simply print function’s name so I start by:

> var
function (x, y = NULL, na.rm = FALSE, use)
{
    if (missing(use))
        use <- if (na.rm)
            "na.or.complete"
        else "everything"
    na.method <- pmatch(use, c("all.obs", "complete.obs", "pairwise.complete.obs",
        "everything", "na.or.complete"))
    if (is.na(na.method))
        stop("invalid 'use' argument")
    if (is.data.frame(x))
        x <- as.matrix(x)
    else stopifnot(is.atomic(x))
    if (is.data.frame(y))
        y <- as.matrix(y)
    else stopifnot(is.atomic(y))
    .Call(C_cov, x, y, na.method, FALSE)
}
<bytecode: 0x1e38b30>
<environment: namespace:stats>

so far so good with some basic validations and argument parsing until I ran to last line where it goes on to call C_cov.

Tried to print C_cov but no luck this time:

> C_cov
Error: object 'C_cov' not found

Not much experience with how this gets implemented I begin my exploration with function getAnywhere() and it return something seemingly confusing:

> getAnywhere(C_cov)
4 differing objects matching ‘C_cov’ were found
in the following places
  namespace:stats
Use [] to view one of them

Following the instruction and tried to display all 4:

> getAnywhere(C_cov)[1:4]
$name
[1] "cov"

$address
<pointer: 0xe87af0>
attr(,"class")
[1] "RegisteredNativeSymbol"

$dll
DLL name: stats
Filename: /usr/lib/R/library/stats/libs/stats.so
Dynamic lookup: FALSE

$numParameters
[1] 4

So it seems to suggest its real name is “cov” and is a symbol in dll /usr/lib/R/library/stats/libs/stats.so.
Of course it is not a real dll in strict sense probably the author who created this comes from window background, for us we would probably call it dso to be accurate.

So I dived into stats’s source code and I want to understand how the name “C_cov” connects with “cov” or should I expect to see “C_cov” in the source of stats package?

A quick grep shows there is no definition but only some similar reference:

$package/R-3.1.1/src/library/stats>
$grep -nr "C_cov" *
R/cor.R:150:    .Call(C_cov, x, y, na.method, method == "kendall")
R/cor.R:154:            .Call(C_cov, Rank(na.omit(x)), NULL, na.method,
R/cor.R:163:            .Call(C_cov, Rank(dropNA(x, nas)), Rank(dropNA(y, nas)),
R/cor.R:169:    .Call(C_cov, x, y, na.method, method == "kendall")
R/cor.R:184:    .Call(C_cov, x, y, na.method, FALSE)

And there are cov() as a c function at the same time:

package/R-3.1.1/src/library/stats/src/cov.c
 50 SEXP cov(SEXP x, SEXP y, SEXP na_method, SEXP kendall)
 51 {
 52     return corcov(x, y, na_method, kendall, FALSE);
 53 }

A grep against “C_” shows me this is probably a convention for C/Fortran symbols which gets registered in the namespace file:

package/R-3.1.1/src/library/stats/src/NAMESPACE
# Refer to all C/Fortran routines by their name prefixed by C_
useDynLib(stats, .registration = TRUE, .fixes = "C_")
...

And NAMESPACE file is normally loaded and parsed by base system’s loadNamespace():

 170 loadNamespace <- function (package, lib.loc = NULL,
 171                            keep.source = getOption("keep.source.pkgs"),
 172                            partial = FALSE, versionCheck = NULL)
 173 {
...
 339         if (! packageHasNamespace(package, package.lib)) {
 340             hasNoNamespaceError <-
...
 360         else parseNamespaceFile(package, package.lib, mustExist = FALSE)
...
1060 ## NB this needs a decorated name, foo_ver, if appropriate
1061 packageHasNamespace <- function(package, package.lib) {
1062     namespaceFilePath <- function(package, package.lib)
1063         file.path(package.lib, package, "NAMESPACE")
1064     file.exists(namespaceFilePath(package, package.lib))
1065 }

And by calling parse() on the NAMESPACE file we turned all the contents into expression objects and then we call the corresponding useDynLib:

1194                useDynLib = {
1195
1196                    ## This attempts to process as much of the
1197                    ## information as possible when NAMESPACE is parsed
1198                    ## rather than when it is loaded and creates
1199                    ## NativeRoutineMap objects to handle the mapping
1200                    ## of symbols to R variable names.
1201
1202                    ## The name is the second element after useDynLib
1203                    dyl <- as.character(e[2L])
...
1244                        ## Deal with any prefix/suffix pair.
1245                        fixes <- c("", "")
1246                        idx <- match(".fixes", names(symNames))
1247                        if(!is.na(idx)) {
1248                            ## Take .fixes and treat it as a call,
1249                            ## e.g. c("pre", "post") or a regular name
1250                            ## as the prefix.
1251                            if(symNames[idx] != "") {
1252                                e <- parse(text = symNames[idx], srcfile = NULL)[[1L]]
1253                                if(is.call(e))
1254                                    val <- eval(e)
1255                                else
1256                                    val <- as.character(e)
1257                                if(length(val))
1258                                    fixes[seq_along(val)] <- val
1259                            }
1260                            symNames <- symNames[-idx]
1261                        }
1262
1263                        ## Deal with a .registration entry. It must be
1264                        ## .registration = value and value will be coerced
1265                        ## to a logical.
1266                        useRegistration <- FALSE
1267                        idx <- match(".registration", names(symNames))
1268                        if(!is.na(idx)) {
1269                            useRegistration <- as.logical(symNames[idx])
1270                            symNames <- symNames[-idx]
1271                        }
1272
1273                        ## Now merge into the NativeRoutineMap.
1274                        nativeRoutines[[ dyl ]] <<-
1275                            if(dyl %in% names(nativeRoutines))
1276                                mergeNativeRoutineMaps(nativeRoutines[[ dyl ]],
1277                                                       useRegistration,
1278                                                       symNames, fixes)
1279                            else
1280                                nativeRoutineMap(useRegistration, symNames,
1281                                                 fixes)
1282                    }
1283                },

And most importantly it checks .fix in our useDynlib() directive and actually allows us to put both prefix and suffix.
In our case we provide it as a single string rather than a list so suffix defaults to “”.

The critical part is when we actually call nativeRoutineMap to apply the fixes, and we can guess it must have applied to all symbols fetched from the dso, which led us to the code:

 280         assignNativeRoutines <- function(dll, lib, env, nativeRoutines) {
 281             if(length(nativeRoutines) == 0L) return(NULL)
 282
 283             if(nativeRoutines$useRegistration) {
 284                 ## Use the registration information to register ALL the symbols
 285                 fixes <- nativeRoutines$registrationFixes
 286                 routines <- getDLLRegisteredRoutines.DLLInfo(dll, addNames = FALSE)
 287                 lapply(routines,
 288                        function(type) {
 289                            lapply(type,
 290                                   function(sym) {
 291                                       varName <- paste0(fixes[1L], sym$name, fixes[2L])
 292                                       if(exists(varName, envir = env))
 293                                           warning(gettextf("failed to assign RegisteredNativeSymbol for %s to %s since %s is already defined in the %s namespace",
 294                                                            sym$name, varName, varName, sQuote(package)),
 295                                                   domain = NA)
 296                                       else
 297                                           assign(varName, sym, envir = env)
 298                                   })
 299                        })
 300
 301             }

so the line paste0() is what we have been looking for and the sym$name is the raw name, in this case,”cov”.

So the routines are returned from function call getDLLRegisteredRoutines.DLLInfo which is pretty straight forward and end up fetching all 4 classes of symbols:

1269 SEXP attribute_hidden
1270 R_getRegisteredRoutines(SEXP dll)
1271 {
1272     DllInfo *info;
1273     SEXP ans, snames;
1274     int i;
1275     const char * const names[] = {".C", ".Call", ".Fortran", ".External"};
1276
1277     if(TYPEOF(dll) != EXTPTRSXP &&
1278        R_ExternalPtrTag(dll) != install("DLLInfo"))
1279   error(_("R_getRegisteredRoutines() expects a DllInfo reference"));
1280
1281     info = (DllInfo *) R_ExternalPtrAddr(dll);
1282     if(!info) error(_("NULL value passed for DllInfo"));
1283
1284
1285     PROTECT(ans = allocVector(VECSXP, 4));
1286
1287     SET_VECTOR_ELT(ans, 0, R_getRoutineSymbols(R_C_SYM, info));
1288     SET_VECTOR_ELT(ans, 1, R_getRoutineSymbols(R_CALL_SYM, info));
1289     SET_VECTOR_ELT(ans, 2, R_getRoutineSymbols(R_FORTRAN_SYM, info));
1290     SET_VECTOR_ELT(ans, 3, R_getRoutineSymbols(R_EXTERNAL_SYM, info));
1291
1292     PROTECT(snames = allocVector(STRSXP, 4));
1293     for(i = 0; i < 4; i++)
1294   SET_STRING_ELT(snames, i, mkChar(names[i]));
1295     setAttrib(ans, R_NamesSymbol, snames);
1296     UNPROTECT(2);
1297     return(ans);
1298 }

Also one may notice the critical line that links the varName (which is now C_ prefixed) to the structure sym which actually establishes the link between “C_cov” and the info in the correct env.

So what environment is that, tracing backward a little and we see:

 386         ns <- makeNamespace(package, version = version, lib = package.lib)
...
 418         env <- asNamespace(ns)
 419         ## save the package name in the environment
 420         assign(".packageName", package, envir = env)

So env is the nothing but the namespace!

So let’s list out all the namespaces and try on “stats”:

> loadedNamespaces()
[1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "stats"
[7] "utils"
> ns <- asNamespace("stats")
> class(ns)
[1] "environment"
> grep("cov", ls(ns), value=TRUE)
 [1] "C_cov"            "cov"              "cov2cor"          "covratio"
 [5] "cov.wt"           "vcov"             "vcov.Arima"       "vcov.glm"
 [9] "vcov.lm"          "vcov.mlm"         "vcov.nls"         "vcov.summary.glm"
[13] "vcov.summary.lm"

Indeed we are able to see the “C_cov” symbol and we can also show its class:

> class(ns$C_cov)
[1] "CallRoutine"      "NativeSymbolInfo"

So when we print it it shows exactly the same info we saw earlier:

> ns$C_cov
$name
[1] "cov"

$address
<pointer: 0xe87af0>
attr(,"class")
[1] "RegisteredNativeSymbol"

$dll
DLL name: stats
Filename: /usr/lib/R/library/stats/libs/stats.so
Dynamic lookup: FALSE

$numParameters
[1] 4

attr(,"class")
[1] "CallRoutine"      "NativeSymbolInfo"

Which, not surprisingly is supported by the print method of R class “NativeSymbolInfo”.

And we can even verify by directly apply “nm” on the dso:

nm -C /usr/lib/R/library/stats/libs/stats.so|grep -i cov
00000361c0 t corcov
0000039cc0 t cov

Of course based on how you build R, you might simply get “no symbols” which means symbols has been stripped but code is still in that dso.

Lastly as the nice picture from Advanced R – environmentsshows:
package namespace

Your package “stats” env would be on the search list but the function would have the defining env as the namespace env we just saw.
Let’s do some verification to see if that’s indeed the case:

> search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics"
[4] "package:grDevices" "package:utils"     "package:datasets"
[7] "package:methods"   "Autoloads"         "package:base"
> e <- as.environment("package:stats")
> e
<environment: package:stats>
attr(,"name")
[1] "package:stats"
attr(,"path")
[1] "/usr/lib/R/library/stats"
> environment(e$var)
<environment: namespace:stats>
> identical(environment(e$var), ns)
[1] TRUE

So indeed it links correctly in this way.

Finally a trick to show the symbol once we know it’s in stat package:

> stats::C_cov
Error: 'C_cov' is not an exported object from 'namespace:stats'
> stats:::C_cov
$name
[1] "cov"

$address
<pointer: 0xe87af0>
attr(,"class")
[1] "RegisteredNativeSymbol"

$dll
DLL name: stats
Filename: /usr/lib/R/library/stats/libs/stats.so
Dynamic lookup: FALSE

$numParameters
[1] 4

attr(,"class")
[1] "CallRoutine"      "NativeSymbolInfo"

And the special form “:::” does nothing but call the get() to recursively find the symbol:

> `:::`
function (pkg, name)
{
    pkg <- as.character(substitute(pkg))
    name <- as.character(substitute(name))
    get(name, envir = asNamespace(pkg), inherits = FALSE)
}
<bytecode: 0x15d97f0>
<environment: namespace:base>
> `:::`("stats", "C_cov")
$name
[1] "cov"

$address
<pointer: 0xe87af0>
attr(,"class")
[1] "RegisteredNativeSymbol"

$dll
DLL name: stats
Filename: /usr/lib/R/library/stats/libs/stats.so
Dynamic lookup: FALSE

$numParameters
[1] 4

attr(,"class")
[1] "CallRoutine"      "NativeSymbolInfo"

About codywu2010

a programmer
This entry was posted in Uncategorized. Bookmark the permalink.

1 Response to finding “C_cov”

  1. Adrian Dușa says:

    Thank you very much, this post probably saved me a few hours of work

Leave a comment