fulltext v1: text-mining scholarly works

The problem: Text-mining – the art of answering questions by extracting patterns, data, etc. out of the published literature – is not easy. It’s made incredibly difficult because of publishers. The vast majority of publicly funded research across the globe is published in paywalled journals. That is, taxpayers pay twice for research: once for the grant to fund the work, then again to be able to read it. These paywalls mean that every potential text-miner will have different access: some have access through their university, some may have access through their company, and others may only have access to whatever happens to be open access. On top of that, access to paywalled journals often depends on your IP address – something not generally top of mind for most people. Another hardship with text-mining…

5 Things I Learned Making a Package to Work with Hydrometric Data in R

One of the best things about learning R is that no matter your skill level, there is always someone who can benefit from your experience. Topics in R ranging from complicated machine learning approaches to calculating a mean all find their relevant audiences. This is particularly true when writing R packages. With an ever-evolving R package development landscape (R, GitHub, external data, CRAN, continuous integration, users), there is a strong possibility that you will be taken into regions of the R world that you never knew existed. More experienced developers may not get stuck in these regions and therefore may not think to shine a light on them. The objective of this post is to explore some of those regions in the R world that were highlighted for me when the tidyhydat package was reviewed by rOpenSci. tidyhydat is…

.rprofile: Karthik Ram

Karthik Ram is a Data Scientist at the Berkeley Institute for Data Science and Berkeley Institute for Global Change Biology. He is a co-founder of rOpenSci, a collective to support the development of R-based tools which facilitate open science and access to open data. In this interview, Karthik and I discuss the birth of rOpenSci, tools and life hacks for staying sane while managing the constant stress of work fires, and the importance of saying no. [This interview occurred at the 2017 rOpenSci unconference] KO: What is your name, job title, and how long have you been using R? KR: My name is Karthik Ram. I’m a research scientist at the University of California, Berkeley. I’m an ecologist by training but have been working in the ‘data science’ space for 15 years. My real introduction to R was during my…

Community Call – Writing Packages to Support Research Communities – zoon & greta

Join our Community Call on Tuesday, January 30th (January 31 for our Australian friends). Nick Golding, 2017 rOpenSci Fellow, will talk about two R packages he has developed recently. zoon aims to promote open and reproducible research in ecological modeling by helping researchers share their code in a modular way and produce reproducible research artifacts. Nick has recently been trying to bootstrap a community around this idea and says this is a much harder problem. greta lets you write out and fit statistical models (as in Stan or BUGS), but right in R. It uses TensorFlow to make models scale to massive data, and is designed to be used and extended by other modeling packages. greta relies on some nice R tricks and lots of thinking about designing APIs for both users and developers. Agenda: Welcome (Stefanie Butland, rOpenSci Community Manager,…

.rprofile: Jenny Bryan

Jenny Bryan (@JennyBryan) is a Software Engineer at RStudio and is on leave from being an Associate Professor at the University of British Columbia. Jenny serves in leadership positions with rOpenSci and Forwards and as an Ordinary member of The R Foundation. KO: What is your name, your title, and how many years have you worked in R? JB: I’m Jenny Bryan, I am a software engineer at RStudio (still getting used to that title), and I am on leave from being an Associate Professor at the University of British Columbia. I’ve been working with R or its predecessors since 1996. I switched to R from S in the early 2000s. KO: Why did you make the switch to R from S? JB: It just seemed like the community was switching over to R and I didn’t have a specific…

Magick 1.6: clipping, geometries, fonts, fuzz, and a bit of history

This week magick 1.6 appeared on CRAN. This release is a big all-round maintenance update with lots of tweaks and improvements across the package. The NEWS file gives an overview of changes in this version. In this post we highlight some changes. library(magick); stopifnot(packageVersion('magick') >= 1.6) If you are new to magick, check out the vignette for a quick introduction. Perfect Graphics Rendering: I have fixed a few small rendering imperfections in the graphics device. The native magick graphics device image_graph() now renders images of identical or better quality than the base R bitmap devices png, jpeg, etc. One issue was that sometimes magick graphics would show a 1px black border around the image. It turned out this was caused by rounding of clipping coordinates. When R calculates the clipping area, it often ends up at non-whole values. It is then up to…

Exploratory Data Analysis of Ancient Texts with rperseus

Introduction: When I was in grad school at Emory, I had a favorite desk in the library. The desk wasn’t particularly cozy or private, but what it lacked in comfort it made up for in real estate. My books and I needed room to operate. Students of the ancient world require many tools, and when jumping between commentaries, lexicons, and interlinears, additional clutter is additional “friction”, i.e., lapses in thought due to frustration. Technical solutions to this clutter exist, but the best ones are proprietary and expensive. Furthermore, they are somewhat inflexible, and you may have to shoehorn your thoughts into their framework. More friction. Interfacing with the Perseus Digital Library was a popular online alternative. The library includes a catalog of classical texts, a Greek and Latin lexicon, and a word study tool for appearances and references in other…

The Value of Welcome, part 2: How to prepare 40 new community members for an unconference

I’ve raved about the value of extending a personalized welcome to new community members and I recently shared six tips for running a successful hackathon-flavoured unconference. Building on these, I’d like to share the specific approach and (free!) tools I used to help prepare new rOpenSci community members to be productive at our unconference. My approach was inspired directly by my AAAS Community Engagement Fellowship Program (AAAS-CEFP) training. Specifically, 1) one mentor said that the most successful conference they ever ran involved having one-to-one meetings with all participants prior to the event, and 2) prior to our in-person AAAS-CEFP training, we completed an intake questionnaire that forced us to consider things like “what do you hope to get out of this” and “what do you hope to contribute”. A challenge of this year’s unconference was the fact that we were…

Announcing a New rOpenSci Software Review Collaboration

rOpenSci is pleased to announce a new collaboration with Methods in Ecology and Evolution (MEE), a journal of the British Ecological Society, published by Wiley. Publications destined for MEE that include the development of a scientific R package will now have the option of a joint review process whereby the R package is reviewed by rOpenSci, followed by fast-tracked review of the manuscript by MEE. Authors opting for this process will be recognized via a mark on both web and print versions of their paper. We are very excited for this partnership to improve the rigor of both scientific software and software publications and to provide greater recognition to developers in the fields of ecology and evolution. It is a natural outgrowth of our interest in supporting scientists in developing and maintaining software, and of MEE’s mission…

changes: easy Git-based version control from R

Are you new to version control and always running into trouble with Git? Or are you a seasoned user, haunted by the traumas of learning Git and reliving them whilst trying to teach it to others? Yeah, us too. Git is a version control tool designed for software development, and it is extraordinarily powerful. It didn’t actually dawn on me quite how amazing Git is until I spent a weekend in Melbourne with a group of Git whizzes using Git to write a package targeted toward Git beginners. Whew, talk about total Git immersion! I was taking part in the 2017 rOpenSci ozunconf, in which forty-odd developers, scientists, researchers, nerds, teachers, starving students, cat ladies, and R users of all descriptions form teams to create new R packages fulfilling some new and useful function. Many of the groups used Git for their…