The Friday #rstats PuzzleR : 2018-01-26

Time for another look at what’s new and interesting in the #rstats world with the help of Peter Meissner’s (@marvin_dpr) crossword.r. The answers to last week’s puzzle have been posted (it seemed to make more sense to post the answers a week later rather than the Monday after). There is a dedicated category, puzzler, to make it easier to find these later on, all in one place. That category also has its own RSS feed.
Original Post: The Friday #rstats PuzzleR : 2018-01-26

The Friday #rstats PuzzleR : 2018-01-19

Peter Meissner (@marvin_dpr) released crossword.r to CRAN today. It’s a spiffy package that makes it dead simple to generate crossword puzzles. He also made a super spiffy JavaScript library to pair with it, which can turn crossword model output into an interactive puzzle. I thought I’d combine those two creations with a way to highlight new/updated packages from the previous week, cool/useful packages in general, and some R functions that might come in handy. Think of it as a weekly way to get some R information while having a bit of fun! This was a quick, rough creation and I’ll be changing the styles a bit for next Friday’s release, but Peter’s package is so easy to use that I have absolutely no excuse not to keep this a regular feature of the blog. I’ll release a static, ggplot2 solution…
Original Post: The Friday #rstats PuzzleR : 2018-01-19

Bitcoin (World Map) Bubbles

We’re doing some interesting studies (cybersecurity-wise, not finance-wise) on digital currency networks at work-work and, while I’m loath to create a geo-map from IPv4 geolocation data: we do get (often woefully inaccurate) latitude & longitude data from our geolocation service (I won’t name-and-shame here); there are definite geo-aspects to the prevalence of mining nodes, especially Bitcoin; and I have been itching to play with the nascent nord palette in a cartographical context. So I went on a small diversion to create a bubble plot of geographical Bitcoin node-prevalence. I tweeted out said image and someone asked if there was code, hence this post. You’ll be able to read about the methodology we used to capture the Bitcoin node data that underpins the map below later this year. For now, all I can say is that wasn’t…
Original Post: Bitcoin (World Map) Bubbles

Can’t Stop at 21: Twitter Recipe #22 — Tying Up Loose Threads

NOTE: The likelihood of this recipe being added to the recent practice bookdown book is slim, but I’ll try to keep the same format for the blog post.

Problem: You want to collect all the tweets in a Twitter tweet thread.

Solution: Use a few key functions in rtweet to piece the thread elements back together.

Discussion: In Twitterland, a “thread” is a series of tweets by an author that are in a reply chain to each other, which enables them to be displayed sequentially to form a larger & (ostensibly) more cohesive message. Even with the recent 280 character tweet-length increase, threads are still popular and used daily. They’re very easy to distinguish on Twitter but there is no Twitter API call to collect up all the pieces of these threads. Let’s build a function, get_thread(), that…
Original Post: Can’t Stop at 21: Twitter Recipe #22 — Tying Up Loose Threads
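
The excerpt above stops before get_thread() is shown, so what follows is not that function. It is only a minimal base-R sketch of the underlying idea: once the tweets have been fetched, a reply chain can be put back in order by following each tweet's "replying to" id. The helper name order_thread() is hypothetical, and the column names status_id and reply_to_status_id are assumed to match what the fetching step returns.

```r
# Hypothetical helper (not the post's get_thread()): given already-fetched
# tweets, return the tweets of one reply chain ordered oldest first.
# Assumes character columns `status_id` and `reply_to_status_id`, where
# `reply_to_status_id` is NA for a tweet that is not a reply.
order_thread <- function(tweets, last_id) {
  chain <- character(0)
  current <- last_id
  # Walk backwards from the last tweet until we reach a tweet that is not
  # a reply, a tweet we do not have, or (defensively) a tweet already seen.
  while (!is.na(current) &&
         current %in% tweets$status_id &&
         !(current %in% chain)) {
    chain <- c(current, chain)
    current <- tweets$reply_to_status_id[match(current, tweets$status_id)]
  }
  tweets[match(chain, tweets$status_id), , drop = FALSE]
}

# Made-up data standing in for fetched tweets.
tweets <- data.frame(
  status_id          = c("103", "101", "102", "200"),
  reply_to_status_id = c("102", NA,    "101", NA),
  text               = c("third", "first", "second", "unrelated"),
  stringsAsFactors   = FALSE
)

thread <- order_thread(tweets, last_id = "103")
print(thread$text)
```

Walking backwards from the last tweet keeps the sketch simple; starting from the first tweet would instead require searching for replies to it.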

A bookdown “Hello World” : Twenty-one (minus two) Recipes for Mining Twitter with rtweet

The new year begins with me being on the hook to crank out a book on advanced web-scraping in R by July (more on that in a future blog post). The bookdown package seemed to be the best way to go about doing this but I had only played with the toy/default examples of it and wanted to test out the platform with a “Hello, World”-like example of a “real” book to iron out issues and avoid more refactoring later on than I know I will have to do. I’ve been on an rtweet kick as of late (I have no idea why) and had an e-copy of O’Reilly’s 21 Recipes for Mining Twitter in their synced Dropbox folder (it was a free giveaway a few years ago) and decided to make an rtweet version of it in a…
Original Post: A bookdown “Hello World” : Twenty-one (minus two) Recipes for Mining Twitter with rtweet

R⁶ — Capture Tweets with tweet_shot()

(You can find all R⁶ posts here) A Twitter discussion (“I’m going to keep my eyes out for this one! Would love to have an easy way to embed tweets in Rmd talks!”, Jeff Hollister (@jhollist), December 30, 2017) that spawned from Maëlle’s recent look-back post turned into a quick function for capturing an image of a tweet/thread using webshot, rtweet, magick and glue. Pass in a status id or a Twitter URL and the function will grab an image of the mobile version of the tweet. The ultimate goal is to make a function that builds a tweet using only R and magick. This will have to do until the new year.

    tweet_shot <- function(statusid_or_url, zoom=3) {

      require(glue, quietly=TRUE)
      require(rtweet, quietly=TRUE)
      require(magick, quietly=TRUE)
      require(webshot, quietly=TRUE)

      x <- statusid_or_url[1]

      is_url <- grepl("^http[s]://", x)

      if (is_url) {
        is_twitter <- grepl("twitter",…
Original Post: R⁶ — Capture Tweets with tweet_shot()
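
Since the tweet_shot() excerpt above is cut off, here is a separate, minimal sketch of the "status id or URL" idea it describes: normalising either kind of input to a bare status id. The helper extract_status_id(), its regular expression and the example URLs are all assumptions made for illustration; none of this is taken from the truncated function.

```r
# Hypothetical helper, not part of tweet_shot(): accept either a bare status
# id or a Twitter status URL and return the status id as a character string.
extract_status_id <- function(x) {
  x <- as.character(x[1])

  # Already a bare numeric status id: return it unchanged.
  if (grepl("^[0-9]+$", x)) return(x)

  # Assumed URL shape: http(s)://[subdomain.]twitter.com/<user>/status/<id>
  pattern <- "^https?://([a-z]+\\.)?twitter\\.com/[^/]+/status(es)?/([0-9]+).*$"
  if (!grepl(pattern, x)) {
    stop("not a status id or a Twitter status URL", call. = FALSE)
  }

  # The third capture group holds the numeric id.
  sub(pattern, "\\3", x)
}

# Made-up inputs for illustration only.
id_from_id     <- extract_status_id("1234567890")
id_from_url    <- extract_status_id("https://twitter.com/someuser/status/1234567890")
id_from_mobile <- extract_status_id("https://mobile.twitter.com/someuser/status/1234567890?s=20")
```

Keeping the id as a string avoids the precision loss that large status ids would suffer as doubles.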

2017. Quantified. In. R.

2017 is nearly at an end. We humans seem to need these cycles to help us on our path forward and have, throughout history, used these annual demarcation points as a time of reflection on what was, what is and what shall come next. To that end, I decided it was about time to help quantify a part of the soon-to-be previous annum in R through the fabrication of a reusable template. Said template contains various incantations that will enable the wielder to enumerate their social contributions on StackOverflow, GitHub, Twitter and WordPress through the use of a parameterized R markdown document. The result of one such execution can be found here (for those who want a glimpse of what I was publicly up to in 2017). Want to see where you contributed the most on SO? There’s a vis for…
Original Post: 2017. Quantified. In. R.

New Package swatches is Now on CRAN

It’s been a long time coming, but swatches is now on CRAN. What is “swatches”? First off, swatches has nothing to do with those faux-luxury brand Swiss-made timepieces. swatches is all about color. R/CRAN has plenty of color picking packages. The colourlovers package by @thosjleeper is one of my favs. But, color palettes have been around for ages. Adobe has two: Adobe Color (ACO) and Adobe Swatch Exchange (ASE); GIMP has “GPL”; OpenOffice has “SOC” and KDE has the unimaginative “colors”. So. Many. Formats. Wouldn’t it be great if there were a package that read them all in with a simple read_palette() function? Well, now there is. I threw together a fledgling version of swatches a few years ago to read in ACO files from a $DAYJOB at the time and it cascaded from there. I decided to resurrect it and…
Original Post: New Package swatches is Now on CRAN

R⁶ Series — Random Sampling From Apache Drill Tables With R & sergeant

(For first-timers, R⁶ tagged posts are short & sweet with minimal exposition; R⁶ feed) At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w/o working across billions of rows. I also use Apache Drill for much of my exploratory work. Here’s how to uniformly sample data from Apache Drill using the sergeant package:

    library(sergeant)

    db <- src_drill("sonar")
    tbl <- tbl(db, "dfs.dns.aaaa.parquet")

    summarise(tbl, n=n())
    ## # Source: lazy query [?? x 1]
    ## # Database: DrillConnection
    ##          n
    ##
    ## 1 19977415

    mutate(tbl, r=rand()) %>% filter(r <= 0.01) %>% summarise(n=n())
    ## # Source: lazy query [?? x 1]
    ## # Database: DrillConnection
    ##        n
    ##
    ## 1 199808

    mutate(tbl, r=rand()) %>% filter(r <= 0.50) %>% summarise(n=n())
    ## # Source: lazy query [?? x 1]
    ## # Database: DrillConnection
    ## n
    ##…
Original Post: R⁶ Series — Random Sampling From Apache Drill Tables With R & sergeant
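
The sampling idiom in the excerpt above (attach a uniform random number to each row, then keep rows at or below a threshold) can be tried without a Drill connection. The sketch below is a local base-R analogue on a made-up table, with runif() standing in for Drill's rand(); the table size and seed are arbitrary assumptions. For reference, the excerpt's own counts (199,808 of 19,977,415 rows) work out to almost exactly 1%.

```r
# Local analogue of mutate(tbl, r=rand()) %>% filter(r <= 0.01):
# runif() plays the role of Drill's rand(). All values here are made up.
set.seed(42)      # only so the sketch is repeatable
n_rows <- 100000  # arbitrary table size

tbl <- data.frame(id = seq_len(n_rows))

# Attach one uniform draw from [0, 1] to every row.
tbl$r <- runif(n_rows)

# Keep a row when its draw is at or below the desired sampling fraction.
fraction <- 0.01
sampled <- tbl[tbl$r <= fraction, , drop = FALSE]

cat(nrow(sampled), "of", n_rows, "rows kept\n")
```

Each row is kept independently with probability equal to the threshold, so the sample size is only approximately the requested fraction of the table, which is also what the Drill output shows.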

mqtt Development Log : On DSLs, Rcpp Modules and Custom Formula Functions

I know some folks had a bit of fun with the previous post since it exposed the fact that I left out unique MQTT client id generation from the initial 0.1.0 release of the in-development package (client ids need to be unique). There have been some serious improvements since said post and I thought a (hopefully not-too-frequent) blog-journal of the development of this particular package might be interesting/useful to some folks, especially since I’m delving into some not-oft-blogged (anywhere) topics as I use some new tricks in this particular package.

Thank The Great Maker for C++

I’m comfortable and not-too-shabby at wrapping C/C++ things with an R bow and I felt quite daft seeing this after I had started banging on the mosquitto C interface. Yep, that’s right: it has a C++ interface. It’s waaaaay easier (in my experience) bridging…
Original Post: mqtt Development Log : On DSLs, Rcpp Modules and Custom Formula Functions