The world’s first protein database for Machine Learning and AI

dSPP is the world's first interactive database of proteins for AI and Machine Learning, and is fully integrated with Keras and TensorFlow. You can access the database at peptone.io/dspp
Original Post: The world’s first protein database for Machine Learning and AI

Taxonomy of Methods for Deep Meta Learning

This post discusses a variety of contemporary Deep Meta Learning methods, in which meta-data is manipulated to generate simulated architectures. Current meta-learning capabilities involve either support for search for architectures or networks inside networks.
Original Post: Taxonomy of Methods for Deep Meta Learning

Golden State Warriors Analytics Exercise

This post outlines a data analysis exercise undertaken by students in a recent University of San Francisco MBA class, in which they were forced to make difficult data science trade-offs between gathering data, preparing the data and performing the actual analysis.
Original Post: Golden State Warriors Analytics Exercise

nanotime 0.2.0

A new version of the nanotime package for working with nanosecond timestamps just arrived on CRAN. nanotime uses the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Thanks to a metric ton of work by Leonardo Silvestri, the package now uses S4 classes internally, allowing for greater consistency of operations on nanotime objects.
Changes in version 0.2.0 (2017-06-22):
- Rewritten in S4 to provide more robust operations (#17 by Leonardo)
- Ensure tz="" is treated as unset (Leonardo in #20)
- Added format and tz arguments to nanotime, format, print (#22 by Leonardo and Dirk)
- Ensure printing respects options()$max.print, ensure names are kept with vector (#23 by Leonardo)
- Correct summary() by defining names (Leonardo in #25 fixing #24)
- Report error on operations that are meaningful for type;…
Original Post: nanotime 0.2.0
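
The nanotime excerpt above says the package relies on bit64 for integer64 arithmetic. A minimal base R sketch of why ordinary doubles fall short at nanosecond resolution follows. It is an illustration only and does not use the nanotime or bit64 API; the variable name ns_since_epoch is made up for this example. A double has a 53-bit significand, so nanosecond counts since 1970 for present-day dates lie well above 2^53 and neighbouring nanoseconds collapse to the same value.

```r
# Nanoseconds since the Unix epoch for 2017-06-22 00:00:00 UTC, held in a double.
# (Hypothetical variable name; 1498089600 is that date as seconds since 1970.)
ns_since_epoch <- 1498089600 * 1e9

# Doubles represent every integer exactly only up to 2^53.
ns_since_epoch > 2^53                     # TRUE

# Adding one nanosecond is lost to rounding.
(ns_since_epoch + 1) == ns_since_epoch    # TRUE

# Near 1.5e18 adjacent doubles are 256 apart, so a step of 256 is visible.
(ns_since_epoch + 256) == ns_since_epoch  # FALSE
```

A 64-bit integer type avoids this rounding because it stores every nanosecond count in that range exactly.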

All the fake data that’s fit to print

charlatan makes fake data. Excited to announce a new package called charlatan. While perusing packages from other programming languages, I saw a neat Python library called faker. charlatan is inspired by and ports many things from Python's https://github.com/joke2k/faker library. In turn, faker was inspired by PHP's faker, Perl's Faker, and Ruby's faker. It appears that the PHP library was the original – nice work PHP.
Use cases
What could you do with this package? Here are some use cases:
- Students in a classroom setting learning any task that needs a dataset
- People doing simulations/modeling that need some fake data
- Generate a fake dataset of users for a database before actual users exist
- Complete missing spots in a dataset
- Generate fake data to replace sensitive real data with before public release
- Create a random set of colors for visualization
- Generate random coordinates for a map
- Get a set of randomly…
Original Post: All the fake data that’s fit to print

Plotting partial pooling in mixed-effects models

In this post, I demonstrate a few techniques for plotting information from a relatively simple mixed-effects model fit in R. These plots can help us develop intuitions about what these models are doing and what "partial pooling" means.
The sleepstudy dataset
For these examples, I'm going to use the sleepstudy dataset from the lme4 package. The outcome measure is reaction time, the predictor measure is days of sleep deprivation, and these measurements are nested within participants; we have 10 observations per participant. I am also going to add two fake participants with incomplete data to illustrate partial pooling.
    library(lme4)
    #> Loading required package: Matrix
    #> Loading required package: methods
    library(dplyr)
    library(tibble)

    # Convert to tibble for better printing. Convert factors to strings
    sleepstudy <- sleepstudy %>%
      as_tibble() %>%
      mutate(Subject = as.character(Subject))

    # Add two fake participants
    df_sleep <- bind_rows(
      sleepstudy,
      data_frame(Reaction = c(286, 288), Days = 0:1, Subject…
Original Post: Plotting partial pooling in mixed-effects models

RcppCCTZ 0.2.3 (and 0.2.2)

A new minor version 0.2.3 of RcppCCTZ is now on CRAN. RcppCCTZ uses Rcpp to bring CCTZ to R. CCTZ is a C++ library for translating between absolute and civil times using the rules of a time zone. In fact, it is two libraries: one for dealing with civil time (human-readable dates and times), and one for converting between absolute and civil times via time zones. The RcppCCTZ page has a few usage examples and details. This version ensures that we set the TZDIR environment variable correctly on the old dreaded OS that does not come with proper timezone information, an issue which had come up while preparing the next (and awesome, trust me) release of nanotime. It also appears that I failed to blog about 0.2.2, another maintenance release, so changes for both are summarised next. Changes in…
Original Post: RcppCCTZ 0.2.3 (and 0.2.2)
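
The distinction between absolute and civil time in the RcppCCTZ excerpt above can be sketched with base R date-time functions. This does not call RcppCCTZ or CCTZ; it is an illustrative sketch under the assumption that the system's time zone database knows "America/New_York", and the variable names are made up for the example. One absolute instant maps to different civil times depending on the zone, and a civil time plus a zone maps back to one instant.

```r
# One absolute time: the Unix epoch, 0 seconds since 1970-01-01 00:00:00 UTC.
instant <- as.POSIXct(0, origin = "1970-01-01", tz = "UTC")

# Explicit format string so the time of day is always printed.
fmt <- "%Y-%m-%d %H:%M:%S"

# The same instant rendered as civil time in two zones.
civil_utc <- format(instant, fmt, tz = "UTC")
civil_nyc <- format(instant, fmt, tz = "America/New_York")

civil_utc  # "1970-01-01 00:00:00"
civil_nyc  # "1969-12-31 19:00:00"

# The other direction: a civil time plus a zone identifies an absolute time.
back <- as.POSIXct("1969-12-31 19:00:00", tz = "America/New_York")
as.numeric(back) == as.numeric(instant)  # TRUE
```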

Large integers in R: Fibonacci number with 1000 digits, Euler Problem 25

The Fibonacci Sequence occurs in nature: The nautilus shell.
Euler Problem 25 takes us back to the Fibonacci sequence and the problems related to working with very large integers. The Fibonacci sequence follows a simple mathematical rule but it can create things of great beauty. This pattern occurs quite often in nature, like the nautilus shell shown in the image. The video by Arthur Benjamin at the end of this post illustrates some of the magic of this sequence.
Large Integers in R
By default, R shows numbers with more than 7 digits in scientific notation, which hides the trailing digits of the result. You can change the number of digits printed with the options function, but R struggles with integers with more than 22 digits. This example illustrates this issue.
    > factorial(24)
    [1] 6.204484e+23
    > options(digits=22)
    > factorial(24)
    [1] 620448401733239409999872
    …
Original Post: Large integers in R: Fibonacci number with 1000 digits, Euler Problem 25
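
One way around the limit described in the excerpt above is to store each decimal digit separately and add with carries, as in schoolbook addition. The following is a minimal base R sketch of that idea applied to Euler Problem 25. It is not necessarily the approach taken in the original post, and the function names add_digits and first_fib_with_digits are hypothetical.

```r
# Add two non-negative integers stored as digit vectors,
# least significant digit first (e.g. 144 is c(4, 4, 1)).
add_digits <- function(a, b) {
  n <- max(length(a), length(b))
  a <- c(a, rep(0, n - length(a)))  # pad the shorter number with zeros
  b <- c(b, rep(0, n - length(b)))
  s <- a + b
  carry <- 0
  for (i in seq_len(n)) {
    s[i] <- s[i] + carry
    carry <- s[i] %/% 10
    s[i] <- s[i] %% 10
  }
  if (carry > 0) s <- c(s, carry)
  s
}

# Index of the first Fibonacci number with at least `digits` digits,
# counting F(1) = F(2) = 1.
first_fib_with_digits <- function(digits) {
  if (digits <= 1) return(1)
  prev <- 1   # F(1)
  curr <- 1   # F(2)
  index <- 2
  while (length(curr) < digits) {
    nxt <- add_digits(prev, curr)
    prev <- curr
    curr <- nxt
    index <- index + 1
  }
  index
}

first_fib_with_digits(3)     # 12, since F(12) = 144
first_fib_with_digits(1000)  # 4782
```

The loop over digits is plain interpreted R, so the 1000-digit case takes noticeably longer than the small one; a dedicated big-integer package would be the usual choice for heavier work.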

Top KDnuggets tweets, Jun 14-20: 5 EBooks to Read Before Getting into A Data Science or Big Data Career

Also 10 Free Must-Read Books for #MachineLearning and #DataScience; #Keras implementation of a simple Neural Net module for relational reasoning; Applying #deeplearning to real-world problems
Original Post: Top KDnuggets tweets, Jun 14-20: 5 EBooks to Read Before Getting into A Data Science or Big Data Career