Introducing Maëlle Salmon, rOpenSci’s new Research Software Engineer

We’re very pleased to be introducing someone who needs no introduction in the R community. Join us in welcoming Maëlle Salmon to rOpenSci as a Research Software Engineer (part time, working from Nancy, France). We’d like to formally introduce her here and share a bit about the kinds of things she’ll be working on. Maëlle did a B.Sc. in Biology with an emphasis on maths and quantitative work, two Masters degrees – one in Ecology and one in Public Health – and a Ph.D. in epidemiological statistics at the Ludwig-Maximilian University in Germany. Her thesis dealt with statistical algorithms for aberration detection in time series of counts of reported cases of infectious diseases. Most recently, Maëlle worked as a data manager and statistician for the CHAI project. Maëlle has contributed six packages to rOpenSci to date, and has written about…

nodbi: the NoSQL Database Connector

DBI What is DBI? DBI is an R package. It defines an interface to relational database management systems (R/DBMS) that other R packages build upon to interact with a specific relational database, such as SQLite or PostgreSQL. NoSQL NoSQL databases are a very broad class of databases that includes document databases such as CouchDB and MongoDB, key-value stores such as Redis, and more. They are generally not row-column relational stores, though some can be. NoSQL is now often read as “not only SQL”. You can imagine how it is relatively straightforward to create a common interface to row-column oriented databases, and DBI is great for that. However, a common interface to NoSQL databases is a bit harder to wrap your head around, for various reasons. One of the most obvious is that they don’t share…

fulltext v1: text-mining scholarly works

The problem Text-mining – the art of answering questions by extracting patterns, data, etc. out of the published literature – is not easy. It’s made incredibly difficult because of publishers. It is a fact that the vast majority of publicly funded research across the globe is published in paywalled journals. That is, taxpayers pay twice for research: once for the grant to fund the work, then again to be able to read it. These paywalls mean that every potential text-miner will have different access: some have access through their university, some may have access through their company, and others may only have access to whatever happens to be open access. On top of that, access to paywalled journals often depends on your IP address – something that is not top of mind for most people. Another hardship with text-mining…

5 Things I Learned Making a Package to Work with Hydrometric Data in R

One of the best things about learning R is that no matter your skill level, there is always someone who can benefit from your experience. Topics in R ranging from complicated machine learning approaches to calculating a mean all find their relevant audiences. This is particularly true when writing R packages. With an ever-evolving R package development landscape (R, GitHub, external data, CRAN, continuous integration, users), there is a strong possibility that you will be taken into regions of the R world that you never knew existed. More experienced developers may not get stuck in these regions and therefore may not think to shine a light on them. The objective of this post is to explore some of those regions of the R world that were highlighted for me when the tidyhydat package was reviewed by rOpenSci. tidyhydat is…

.rprofile: Karthik Ram

Karthik Ram is a Data Scientist at the Berkeley Institute for Data Science and Berkeley Institute for Global Change Biology. He is a co-founder of rOpenSci, a collective to support the development of R-based tools which facilitate open science and access to open data. In this interview, Karthik and I discuss the birth of rOpenSci, tools and life hacks for staying sane while managing the constant stress of work fires, and the importance of saying no. [This interview occurred at the 2017 rOpenSci unconference] KO: What is your name, job title, and how long have you been using R? KR: My name is Karthik Ram. I’m a research scientist at the University of California, Berkeley. I’m an ecologist by training but have been working in the ‘data science’ space for 15 years. My real introduction to R was during my…

Community Call – Writing Packages to Support Research Communities – zoon & greta

Join our Community Call on Tuesday, January 30th (January 31 for our Australian friends). Nick Golding, 2017 rOpenSci Fellow, will talk about two R packages he has developed recently. zoon aims to promote open and reproducible research in ecological modeling by helping researchers share their code in a modular way and produce reproducible research artifacts. Nick has recently been trying to bootstrap a community around this idea and says this is a much harder problem. greta lets you write out and fit statistical models (as you would in Stan or BUGS) but right in R. It uses tensorflow to make models scale to massive data, and is designed to be used and extended by other modeling packages. greta relies on some nice R tricks and lots of thinking about designing APIs for both users and developers. Agenda Welcome (Stefanie Butland, rOpenSci Community Manager,…

.rprofile: Jenny Bryan

Jenny Bryan @JennyBryan is a Software Engineer at RStudio and is on leave from being an Associate Professor at the University of British Columbia. Jenny serves in leadership positions with rOpenSci and Forwards and as an Ordinary member of The R Foundation. KO: What is your name, your title, and how many years have you worked in R? JB: I’m Jenny Bryan, I am a software engineer at RStudio (still getting used to that title), and I am on leave from being an Associate Professor at the University of British Columbia. I’ve been working with R or its predecessors since 1996. I switched to R from S in the early 2000s. KO: Why did you make the switch to R from S? JB: It just seemed like the community was switching over to R and I didn’t have a specific…

Magick 1.6: clipping, geometries, fonts, fuzz, and a bit of history

This week magick 1.6 appeared on CRAN. This release is a big all-round maintenance update with lots of tweaks and improvements across the package. The NEWS file gives an overview of changes in this version. In this post we highlight some changes. library(magick); stopifnot(packageVersion("magick") >= "1.6") If you are new to magick, check out the vignette for a quick introduction. Perfect Graphics Rendering I have fixed a few small rendering imperfections in the graphics device. The native magick graphics device image_graph() now renders images of identical or better quality than the R-base bitmap devices png, jpeg, etc. One issue was that sometimes magick graphics would show a 1px black border around the image. It turned out this was caused by rounding of clipping coordinates. When R calculates the clipping area it often ends up at non-whole values. It is then up to…

Exploratory Data Analysis of Ancient Texts with rperseus

Introduction When I was in grad school at Emory, I had a favorite desk in the library. The desk wasn’t particularly cozy or private, but what it lacked in comfort it made up for in real estate. My books and I needed room to operate. Students of the ancient world require many tools, and when jumping between commentaries, lexicons, and interlinears, additional clutter is additional “friction”, i.e., lapses in thought due to frustration. Technical solutions to this clutter exist, but the best ones are proprietary and expensive. Furthermore, they are somewhat inflexible, and you may have to shoehorn your thoughts into their framework. More friction. Interfacing with the Perseus Digital Library was a popular online alternative. The library includes a catalog of classical texts, a Greek and Latin lexicon, and a word study tool for appearances and references in other…

The Value of Welcome, part 2: How to prepare 40 new community members for an unconference

I’ve raved about the value of extending a personalized welcome to new community members and I recently shared six tips for running a successful hackathon-flavoured unconference. Building on these, I’d like to share the specific approach and (free!) tools I used to help prepare new rOpenSci community members to be productive at our unconference. My approach was inspired directly by my AAAS Community Engagement Fellowship Program (AAAS-CEFP) training. Specifically, 1) one mentor said that the most successful conference they ever ran involved having one-to-one meetings with all participants prior to the event, and 2) prior to our in-person AAAS-CEFP training, we completed an intake questionnaire that forced us to consider things like “what do you hope to get out of this” and “what do you hope to contribute”. A challenge of this year’s unconference was the fact that we were…