The Friday #rstats PuzzleR : 2018-01-19

Peter Meissner (@marvin_dpr) released crossword.r to CRAN today. It’s a spiffy package that makes it dead simple to generate crossword puzzles. He also made a super spiffy JavaScript library to pair with it, which can turn crossword model output into an interactive puzzle. I thought I’d combine those two creations with a way to highlight new/updated packages from the previous week, cool/useful packages in general, and some R functions that might come in handy. Think of it as a weekly way to get some R information while having a bit of fun! This was a quick, rough creation and I’ll be changing the styles a bit for next Friday’s release, but Peter’s package is so easy to use that I have absolutely no excuse not to keep this a regular feature of the blog. I’ll release a static, ggplot2 solution…
Original Post: The Friday #rstats PuzzleR : 2018-01-19

Bitcoin (World Map) Bubbles

We’re doing some interesting studies (cybersecurity-wise, not finance-wise) on digital currency networks at work-work. I’m loath to create a geo-map from IPv4 geolocation data, but three things pushed me to do it anyway: we do get (often woefully inaccurate) latitude & longitude data from our geolocation service (I won’t name-and-shame here); there are definite geo-aspects to the prevalence of mining nodes, especially Bitcoin; and I have been itching to play with the nascent nord palette in a cartographical context. So I went on a small diversion to create a bubble plot of geographical Bitcoin node-prevalence. I tweeted out said image and someone asked if there was code, hence this post. You’ll be able to read about the methodology we used to capture the Bitcoin node data that underpins the map below later this year. For now, all I can say is that wasn’t…
Original Post: Bitcoin (World Map) Bubbles

Can’t Stop at 21: Twitter Recipe #22 — Tying Up Loose Threads

NOTE: The likelihood of this recipe being added to the recent practice bookdown book is slim, but I’ll try to keep the same format for the blog post.

Problem
You want to collect all the tweets in a Twitter thread.

Solution
Use a few key functions in rtweet to piece the thread elements back together.

Discussion
In Twitterland, a “thread” is a series of tweets by an author that are in a reply chain to each other, which enables them to be displayed sequentially to form a larger & (ostensibly) more cohesive message. Even with the recent 280 character tweet-length increase, threads are still popular and used daily. They’re very easy to distinguish on Twitter but there is no Twitter API call to collect up all the pieces of these threads. Let’s build a function, get_thread(), that…
Original Post: Can’t Stop at 21: Twitter Recipe #22 — Tying Up Loose Threads
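
The excerpt cuts off before get_thread() appears, so what follows is not that function. It is only a minimal, offline sketch of the reply-chain idea the Discussion describes: start at the last tweet and follow each "in reply to" pointer back to the first. The helper name collect_thread() and the toy data are hypothetical, and the column names status_id and reply_to_status_id are assumptions about what a tweets data frame looks like.

```r
# Hypothetical helper (not the post's get_thread()): walk a reply chain
# backwards from the last tweet of a thread.
# `tweets` is assumed to be a data frame with character columns
# `status_id` and `reply_to_status_id` (NA for the first tweet in a thread).
collect_thread <- function(tweets, last_id) {
  ids <- character(0)
  current <- last_id
  while (!is.na(current) && current %in% tweets$status_id) {
    ids <- c(current, ids)  # prepend so the result reads oldest-first
    current <- tweets$reply_to_status_id[match(current, tweets$status_id)]
  }
  tweets[match(ids, tweets$status_id), , drop = FALSE]
}

# Toy data standing in for API results: one three-tweet thread plus an
# unrelated tweet.
toy <- data.frame(
  status_id = c("101", "102", "103", "200"),
  reply_to_status_id = c(NA, "101", "102", NA),
  text = c("first", "second", "third", "unrelated"),
  stringsAsFactors = FALSE
)

thread <- collect_thread(toy, "103")
thread$text
```

With live data, each missing parent tweet would have to be fetched before the walk could continue; that part is left out here because it depends on API calls the excerpt does not show.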

A bookdown “Hello World” : Twenty-one (minus two) Recipes for Mining Twitter with rtweet

The new year begins with me being on the hook to crank out a book on advanced web-scraping in R by July (more on that in a future blog post). The bookdown package seemed to be the best way to go about doing this but I had only played with the toy/default examples of it and wanted to test out the platform with a “Hello, World”-like example of a “real” book to iron out issues and avoid more refactoring later on than I know I will have to do. I’ve been on an rtweet kick as of late (I have no idea why) and had an e-copy of O’Reilly’s 21 Recipes for Mining Twitter in their synced Dropbox folder (it was a free giveaway a few years ago) and decided to make an rtweet version of it in a…
Original Post: A bookdown “Hello World” : Twenty-one (minus two) Recipes for Mining Twitter with rtweet

R⁶ — Capture Tweets with tweet_shot()

(You can find all R⁶ posts here)

A Twitter discussion:

"I’m going to keep my eyes out for this one! Would love to have an easy way to embed tweets in Rmd talks!" (Jeff Hollister (@jhollist), December 30, 2017)

that spawned from Maëlle’s recent look-back post turned into a quick function for capturing an image of a tweet/thread using webshot, rtweet, magick and glue. Pass in a status id or a Twitter URL and the function will grab an image of the mobile version of the tweet. The ultimate goal is to make a function that builds a tweet using only R and magick. This will have to do until the new year.

    tweet_shot <- function(statusid_or_url, zoom=3) {

      require(glue, quietly=TRUE)
      require(rtweet, quietly=TRUE)
      require(magick, quietly=TRUE)
      require(webshot, quietly=TRUE)

      x <- statusid_or_url[1]

      # accept both http:// and https:// URLs
      is_url <- grepl("^https?://", x)

      if (is_url) {
        is_twitter <- grepl("twitter",…
Original Post: R⁶ — Capture Tweets with tweet_shot()
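
The function body is truncated above, so here is a small base-R sketch of just the first step it describes: accepting either a bare status id or a Twitter URL and ending up with the id. The helper name extract_status_id() and the example URL are hypothetical. It also assumes the id is the run of digits after "/status/" in the URL, and it makes no claim about how tweet_shot() itself does this.

```r
# Hypothetical helper: normalise "status id or URL" input to a status id.
extract_status_id <- function(statusid_or_url) {
  x <- as.character(statusid_or_url[1])
  if (!grepl("^https?://", x)) {
    return(x)  # not a URL, so assume it is already a status id
  }
  if (!grepl("twitter", x)) {
    stop("Expected a Twitter URL", call. = FALSE)
  }
  # Keep the run of digits that follows "/status/" (assumed URL layout).
  id <- regmatches(x, regexpr("(?<=/status/)[0-9]+", x, perl = TRUE))
  if (length(id) == 0) {
    stop("No status id found in URL", call. = FALSE)
  }
  id
}

extract_status_id("https://twitter.com/someuser/status/1234567890")
extract_status_id("1234567890")
```

Both calls should come back with the same id string, which is what lets the rest of a function like this ignore which form the caller passed in.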

2017. Quantified. In. R.

2017 is nearly at an end. We humans seem to need these cycles to help us on our path forward and have, throughout history, used these annual demarcation points as a time of reflection on what was, what is and what shall come next. To that end, I decided it was about time to help quantify a part of the soon-to-be previous annum in R through the fabrication of a reusable template. Said template contains various incantations that will enable the wielder to enumerate their social contributions on Stack Overflow, GitHub, Twitter and WordPress through the use of a parameterized R markdown document. The result of one such execution can be found here (for those who want a glimpse of what I was publicly up to in 2017). Want to see where you contributed the most on SO? There’s a vis for…
Original Post: 2017. Quantified. In. R.

New Package swatches is Now on CRAN

It’s been a long time coming, but swatches is now on CRAN. What is “swatches”? First off, swatches has nothing to do with those faux-luxury brand Swiss-made timepieces. swatches is all about color. R/CRAN has plenty of color picking packages. The colourlovers package by @thosjleeper is one of my favs. But, color palettes have been around for ages. Adobe has two: Adobe Color (ACO) and Adobe Swatch Exchange (ASE); GIMP has “GPL”; OpenOffice has “SOC” and KDE has the unimaginative “colors”. So. Many. Formats. Wouldn’t it be great if there were a package that read them all in with a simple read_palette() function? Well, now there is. I threw together a fledgling version of swatches a few years ago to read in ACO files from a $DAYJOB at the time and it cascaded from there. I decided to resurrect it and…
Original Post: New Package swatches is Now on CRAN
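
To give a feel for what reading one of these formats involves, here is a rough base-R sketch that pulls hex colors out of GIMP “GPL” palette text. It is not how swatches or read_palette() works. The helper parse_gpl() and the three-color example are made up for illustration, and it assumes the usual text layout: a few header lines, then one "R G B name" row per color.

```r
# Hypothetical helper: extract hex colors from the lines of a GPL palette.
parse_gpl <- function(lines) {
  # Color rows start with three whitespace-separated integers (R, G, B);
  # header lines such as "Name:" or "Columns:" do not match.
  pattern <- "^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9]+"
  rows <- grep(pattern, lines, value = TRUE)
  vapply(rows, function(row) {
    # Take only the first three numbers; the color name may contain digits.
    v <- as.integer(regmatches(row, gregexpr("[0-9]+", row))[[1]][1:3])
    rgb(v[1], v[2], v[3], maxColorValue = 255)
  }, character(1), USE.NAMES = FALSE)
}

# Made-up palette text for illustration.
example_gpl <- c(
  "GIMP Palette",
  "Name: Example",
  "Columns: 3",
  "#",
  "  0   0   0 Black",
  "255 255 255 White",
  "136 192 208 Sky 1"
)

parse_gpl(example_gpl)
```

Each of the other formats listed above needs its own reader, and the binary ones need considerably more care, which is the argument for hiding them all behind a single function.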

R⁶ Series — Random Sampling From Apache Drill Tables With R & sergeant

(For first-timers, R⁶ tagged posts are short & sweet with minimal exposition; R⁶ feed)

At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w/o working across billions of rows. I also use Apache Drill for much of my exploratory work. Here’s how to uniformly sample data from Apache Drill using the sergeant package:

    library(sergeant)

    db <- src_drill("sonar")
    tbl <- tbl(db, "dfs.dns.aaaa.parquet")

    summarise(tbl, n=n())
    ## # Source:   lazy query [?? x 1]
    ## # Database: DrillConnection
    ##          n
    ##
    ## 1 19977415

    mutate(tbl, r=rand()) %>%
      filter(r <= 0.01) %>%
      summarise(n=n())
    ## # Source:   lazy query [?? x 1]
    ## # Database: DrillConnection
    ##        n
    ##
    ## 1 199808

    mutate(tbl, r=rand()) %>%
      filter(r <= 0.50) %>%
      summarise(n=n())
    ## # Source:   lazy query [?? x 1]
    ## # Database: DrillConnection
    ##   n
    ##…
Original Post: R⁶ Series — Random Sampling From Apache Drill Tables With R & sergeant
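
The trick in that pipeline is to give every row a uniform random number and keep the rows at or below the fraction you want. The counts above fit that: 199808 of 19977415 rows is about 1%. For anyone without a Drill instance handy, here is a local base-R sketch of the same idea on a made-up data frame, with runif() standing in for Drill's rand(). The row count, seed and column names are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

n_rows <- 100000
local_tbl <- data.frame(id = seq_len(n_rows))  # made-up stand-in table

# Same idea as mutate(tbl, r = rand()) %>% filter(r <= 0.01):
# tag each row with a uniform draw on [0, 1], keep the rows at or under
# the target fraction.
local_tbl$r <- runif(n_rows)
sampled <- local_tbl[local_tbl$r <= 0.01, , drop = FALSE]

# Fraction of rows kept; it should land close to 0.01.
nrow(sampled) / n_rows
```

Because each row is kept independently, the sample size is itself random and only approximately 1% of the table. That is usually fine for poking at data, but it is not the same as asking for exactly N rows.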

mqtt Development Log : On DSLs, Rcpp Modules and Custom Formula Functions

I know some folks had a bit of fun with the previous post since it exposed the fact that I left out unique MQTT client id generation from the initial 0.1.0 release of the in-development package (client ids need to be unique). There have been some serious improvements since said post and I thought a (hopefully not-too-frequent) blog-journal of the development of this particular package might be interesting/useful to some folks, especially since I’m delving into some not-oft-blogged (anywhere) topics as I use some new tricks in this particular package.

Thank The Great Maker for C++

I’m comfortable and not-too-shabby at wrapping C/C++ things with an R bow and I felt quite daft seeing this after I had started banging on the mosquitto C interface. Yep, that’s right: it has a C++ interface. It’s waaaaay easier (in my experience) bridging…
Original Post: mqtt Development Log : On DSLs, Rcpp Modules and Custom Formula Functions
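
Since client ids came up: here is one hedged way to generate a random id in plain R. It is not what the package does (the excerpt doesn't show that). The helper make_client_id(), the "rmqtt" prefix and the 23-character, letters-and-digits-only shape are all assumptions. I picked that shape because MQTT 3.1.1 only requires brokers to accept ids of that form.

```r
# Hypothetical helper: build a random, alphanumeric MQTT client id.
# The prefix and total length are illustrative defaults, not package API.
make_client_id <- function(prefix = "rmqtt", total_length = 23L) {
  n_random <- total_length - nchar(prefix)
  stopifnot(n_random > 0)
  chars <- c(0:9, letters, LETTERS)  # coerced to a character vector
  paste0(prefix, paste(sample(chars, n_random, replace = TRUE), collapse = ""))
}

id_a <- make_client_id()
id_b <- make_client_id()
c(id_a, id_b)
```

Random ids like these are unique only with very high probability. That is plenty for a throwaway subscriber, but a long-lived client that wants a persistent session would want a stable id instead.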

Inter-operate with ‘MQTT’ Message Brokers With R (a.k.a. Live! BBC! Subtitles!)

Most of us see the internet through the lens of browsers and apps on our laptops, desktops, watches, TVs and mobile devices. These displays are showing us — for the most part — content designed for human consumption. Sure, apps handle API interactions, but even most of that communication happens over ports 80 or 443. But, there are lots of ports out there; 0:65535, in fact (at least TCP-wise). And, all of them have some kind of data, and most of that is still targeted to something for us. What if I told you the machines are also talking to each other using a thin/efficient protocol that allows one, tiny sensor to talk to hundreds — if not thousands — of systems without even a drop of silicon-laced sweat? How can a mere, constrained sensor do that? Well, it doesn’t…
Original Post: Inter-operate with ‘MQTT’ Message Brokers With R (a.k.a. Live! BBC! Subtitles!)