buf_read() (and therefore sbuf_read()) is hardened in three
ways without changing its int *len ABI:
- Files larger than INT_MAX - 2 bytes are now rejected with -1
  instead of silently truncating off_t -> int and producing a
  negative *len that the rest of the function then misused as a
  buffer size. Verified: a 3 GiB sparse file used to SIGSEGV in
  buf_read at (*buf)[real_size] = 0; after the fix it returns
  -1 cleanly.
- read()'s ssize_t return is now stored in ssize_t, not
  size_t. On a read() error (returning -1), the previous
  code would index (*buf)[SIZE_MAX], corrupting the heap. The new
  code frees the buffer and returns -1.
- The descriptor test if (fd <= 0) is now if (fd < 0), so that fd 0
  is treated as a valid descriptor (it is the stdin slot, returned
  by open() if stdin has been closed).

d_pass(D_Parser*, D_ParseNode*, int pass_number) now rejects
negative pass_number values explicitly. Previously the bound check
was upper-only (pass_number >= npasses), so a negative argument
fell through to &p->t->passes[pass_number] and a subsequent
pp->kind read landed before the passes[] array — typically a
SIGSEGV, occasionally a silent garbage dispatch. The fix adds the
matching lower bound; the function now calls Rf_error("bad pass number: %d") for any negative argument.
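
The buf_read() and d_pass() fixes above share one idea: a signed value is range-checked on both sides before it is used as a size or an index. The following is a minimal sketch of that idea, not the package's actual code. The names read_fd_checked and pass_index_ok are hypothetical, and the sketch assumes the 2 bytes of headroom below INT_MAX are for two trailing NUL bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for buf_read(): reads all of fd into a heap buffer
 * with two trailing NUL bytes. Returns 0 on success, -1 on failure. */
static int read_fd_checked(int fd, char **buf, int *len) {
  struct stat sb;
  *buf = NULL;
  *len = 0;
  if (fd < 0) return -1;                /* fd 0 is a valid descriptor */
  if (fstat(fd, &sb) != 0) return -1;
  /* Refuse sizes that do not fit the int *len ABI. The 2 bytes of headroom
   * are assumed here to cover the trailing NULs. */
  if (sb.st_size < 0 || sb.st_size > (off_t)(INT_MAX - 2)) return -1;
  size_t size = (size_t)sb.st_size;
  char *p = malloc(size + 2);
  if (p == NULL) return -1;
  ssize_t got = read(fd, p, size);      /* signed, so -1 stays -1 */
  if (got < 0) {
    free(p);                            /* never index with an error code */
    return -1;
  }
  assert((size_t)got <= size);
  p[got] = 0;
  p[got + 1] = 0;
  *buf = p;
  *len = (int)got;
  return 0;
}

/* Index check in the style described for d_pass(): both bounds are tested,
 * not only the upper one. */
static int pass_index_ok(int pass_number, int npasses) {
  return pass_number >= 0 && pass_number < npasses;
}
```

Unlike the real d_pass(), which reports the problem through Rf_error, the sketch only returns a status so that it compiles without the R headers.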
udparse() (and therefore the dparse() legacy wrapper) now returns
NULL when called with a NULL buffer. Previously a NULL buf
was forwarded into exhaustive_parse, where the scanner dereferenced
it (*s inside white_space) and crashed. The new guard at the top
of udparse short-circuits the call before any state is touched.
new_D_Parser() now rejects a negative sizeof_ParseNode_User.
Previously a negative argument was stored verbatim, and the
per-PNode allocation size computed in make_PNode,
    uint l = sizeof(PNode) - sizeof(d_voidp) + sizeof_user_parse_node
underflowed to a small value, so the rest of make_PNode wrote past
the resulting tiny buffer. valgrind on the pre-fix build reported
281 invalid writes of size 8, all rooted in make_PNode (parse.c:965).
After the fix, new_D_Parser calls Rf_error with a clear message;
valgrind reports 0 errors.
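
The size arithmetic described above can be reproduced in isolation. This is a sketch under stated assumptions: node_t and voidp_t are hypothetical stand-ins for PNode and d_voidp, and the checked variant returns -1 where the package calls Rf_error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real PNode and d_voidp live in the parser
 * sources and are larger. */
typedef void *voidp_t;
typedef struct {
  char fixed[64];
  voidp_t user;
} node_t;

/* Pre-fix arithmetic. sizeof yields size_t, so the int operand is converted
 * to an unsigned type before the addition. A negative value therefore wraps
 * and cancels most of the base size instead of producing an error. */
static unsigned int node_alloc_size_unchecked(int sizeof_user) {
  return (unsigned int)(sizeof(node_t) - sizeof(voidp_t) + sizeof_user);
}

/* Post-fix idea: refuse negative sizes before doing any arithmetic.
 * Returns 0 and stores the size in *out, or -1 for a negative argument. */
static int node_alloc_size_checked(int sizeof_user, size_t *out) {
  if (sizeof_user < 0) return -1;
  *out = sizeof(node_t) - sizeof(voidp_t) + (size_t)sizeof_user;
  return 0;
}
```

With a base size of B = sizeof(node_t) - sizeof(voidp_t), calling node_alloc_size_unchecked(4 - B) yields 4: an allocation far smaller than the fixed part of the node, which is how later writes end up past the end of the buffer.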
dparse_sexp() (the entry point used by every per-grammar
dparse_<gram>() wrapper) now reads its input file through
buf_read() instead of sbuf_read() + strlen(). Two bugs are
fixed:
- If the input file could not be read, sbuf_read() returned
  NULL, and the subsequent strlen(NULL) was an immediate
  SIGSEGV. The new path raises
  Rf_error("could not read grammar input file: '<path>'").
- For inputs longer than INT_MAX bytes, the result of strlen()
  (which returns size_t) was narrowed to the int buf_len parameter
  of legacy dparse(), silently truncating to a negative value. The
  new path uses the length returned by buf_read() directly and
  forwards it to udparse() (full unsigned int range). Once
  buf_read() is itself promoted to size_t (separate fix), the path
  will safely handle inputs up to UINT_MAX.

Add udparse(D_Parser*, char *buf, unsigned int buf_len) as a
memory-safe alternative to dparse(D_Parser*, char *buf, int buf_len).
Existing callers of dparse still compile and link unchanged; the
ABI of dparse is preserved. Internally, dparse now rejects
negative buf_len (returns NULL) and forwards positive values to
udparse, which holds the actual parser implementation.
Previously, callers had to cast strlen(buf) to int, which
silently truncated to a wrong (often negative) value when the input
exceeded INT_MAX bytes; that negative length then propagated into
p->end = buf + buf_len, making the parser read random memory before
the buffer.
New code should call udparse(p, buf, (unsigned int)strlen(buf)) to
avoid the cast and safely accept inputs up to UINT_MAX bytes. Both
functions are exported and registered via R_RegisterCCallable and
re-declared in the consumer headers (src/dparse.h, src/dparser.h,
src/dparser2.h, src/dparserPtr.h).
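
The wrapper arrangement described above (a signed legacy entry point that validates and forwards to an unsigned implementation) might look roughly like the sketch below. Everything in it is hypothetical: mini_parser, mini_uparse, mini_parse and length_fits_uint are illustrative names, and the real udparse() returns a parse node rather than the parser itself. The sketch forwards a zero length as well as positive ones, which is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal parser state; the real D_Parser has many more fields. */
typedef struct {
  const char *start;
  const char *end;
} mini_parser;

/* Unsigned entry point holding the implementation. A NULL buffer is rejected
 * before any state is touched. */
static mini_parser *mini_uparse(mini_parser *p, const char *buf,
                                unsigned int buf_len) {
  if (p == NULL || buf == NULL) return NULL;
  p->start = buf;
  p->end = buf + buf_len;   /* cannot point before buf: buf_len is unsigned */
  return p;
}

/* Legacy signed wrapper: same parameter types as before, so existing callers
 * keep compiling. Negative lengths are rejected instead of being used. */
static mini_parser *mini_parse(mini_parser *p, const char *buf, int buf_len) {
  if (buf_len < 0) return NULL;
  return mini_uparse(p, buf, (unsigned int)buf_len);
}

/* Caller-side check to run before narrowing a strlen() result. */
static int length_fits_uint(size_t n) {
  return n <= UINT_MAX;
}
```

A caller would test length_fits_uint(strlen(buf)) first and only then pass (unsigned int)strlen(buf) to the unsigned entry point, so the narrowing cast never changes the value.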
Makevars header order, strict options so that dparser
will compile on older versions of R (#19, #20, #22)
Changed language access to not use SET_TYPEOF (as required by CRAN)
Changed compilation to use strict headers, as requested by CRAN.
Changed the interface so that changes to functions no longer cause segmentation faults in other libraries that have not been recompiled against this library (binary linkage was removed). However, changes to the dparser parsing C structures will likely still cause a segmentation fault. Since the structures have changed little over time, while CRAN frequently requests small changes to the functions, this will probably be sufficient for most cases.
Changed gram.c to handle NULL strings without printing them (as requested by CRAN)
Changed util.c to avoid security warnings for errors/warnings in R (as requested by CRAN)
Parsing errors during dparser() evaluation now give the line number for the error.
dparser2.h that declares functions instead of defining them.
Initialized version string to a zero-length string to fix valgrind issues
Change flags to suppress false positive memory leaks (might be lost errors)
sprintf as indicated by some new Mac M1 checks for rxode2.
system() instead of do.call("system",...)
stderr and stdout
Updated R dparser to use the more recent version of dparser
Applied custom changes to fix un-sanitized behavior
Added a NEWS.md file to track changes to the package.