Sales Analytics: How to Use Machine Learning to Predict and Optimize Product Backorders

Sales, customer service, supply chain and logistics, manufacturing… no matter which department you’re in, you more than likely care about backorders. Backorders are products that are temporarily out of stock, but a customer is permitted to place an order against future inventory. Backorders are both good and bad: Strong demand can drive backorders, but so can suboptimal planning. The problem is that when a product is not immediately available, customers may not have the luxury or patience to wait. This translates into lost sales and low customer satisfaction. The good news is that machine learning (ML) can be used to identify products at risk of backorders. In this article we use the new H2O automated ML algorithm to implement Kaggle-quality predictions on the Kaggle dataset, “Can You Predict Product Backorders?”. This is an advanced tutorial, which can be difficult…
Original Post: Sales Analytics: How to Use Machine Learning to Predict and Optimize Product Backorders

It’s tibbletime v0.0.2: Time-Aware Tibbles, New Functions, Weather Analysis and More

Today we are introducing tibbletime v0.0.2, and we’ve got a ton of new features in store for you. We have functions for converting to flexible time periods with the ~period formula~ and making/calculating custom rolling functions with rollify() (plus a bunch more new functionality!). We’ll take the new functionality for a spin with some weather data (from the weatherData package). However, the new tools make tibbletime useful in a number of broad applications such as forecasting, financial analysis, business analysis and more! We truly view tibbletime as the next phase of time series analysis in the tidyverse. If you like what we do, please connect with us on social media to stay up on the latest Business Science news, events and information! Introduction We are excited to announce the release of tibbletime v0.0.2 on CRAN. Loads of new functionality have been…
Original Post: It’s tibbletime v0.0.2: Time-Aware Tibbles, New Functions, Weather Analysis and More

HR Analytics: Using Machine Learning to Predict Employee Turnover

Employee turnover (attrition) is a major cost to an organization, and predicting turnover is at the forefront of needs of Human Resources (HR) in many organizations. Until now the mainstream approach has been to use logistic regression or survival curves to model employee attrition. However, with advancements in machine learning (ML), we can now get both better predictive performance and better explanations of what critical features are linked to employee attrition. In this post, we’ll use two cutting edge techniques. First, we’ll use the h2o package’s new FREE automatic machine learning algorithm, h2o.automl(), to develop a predictive model that is in the same ballpark as commercial products in terms of ML accuracy. Then we’ll use the new lime package that enables breakdown of complex, black-box machine learning models into variable importance plots. We can’t stress how excited we are to…
Original Post: HR Analytics: Using Machine Learning to Predict Employee Turnover

It’s tibbletime: Time-Aware Tibbles

We are very excited to announce the initial release of our newest R package, tibbletime. As evident from the name, tibbletime is built on top of the tibble package (and more generally on top of the tidyverse) with the main purpose of being able to create time-aware tibbles through a one-time specification of an “index” column (a column containing timestamp information). There are a ton of useful time functions that we can now use such as time_filter(), time_summarize(), tmap(), as_period() and time_collapse(). We’ll walk through the basics in this post. If you like what we do, please follow us on social media to stay up on the latest Business Science news, events and information! As always, we are interested in both expanding our network of data scientists and seeking new clients interested in applying data science to business and finance. If interested, contact us.…
Original Post: It’s tibbletime: Time-Aware Tibbles

alphavantager: An R interface to the Free Alpha Vantage Financial Data API

We’re excited to announce the alphavantager package, a lightweight R interface to the Alpha Vantage API! Alpha Vantage is a FREE API for retrieving real-time and historical financial data. It’s very easy to use, and, with the recent glitch with the Yahoo Finance API, Alpha Vantage is a solid alternative for retrieving financial data for FREE! It’s definitely worth checking out if you are interested in financial analysis. We’ll go through the alphavantager R interface in this post to show you how easy it is to get real-time and historical financial data. In the near future, we have plans to incorporate alphavantager into tidyquant to enable scaling from one equity to many. If you like what you read, please follow us on social media to stay up on the latest Business Science news, events and information! As always, we…
Original Post: alphavantager: An R interface to the Free Alpha Vantage Financial Data API

Tidy Time Series Analysis, Part 4: Lags and Autocorrelation

In the fourth part in a series on Tidy Time Series Analysis, we’ll investigate lags and autocorrelation, which are useful in understanding seasonality and form the basis for autoregressive forecast models such as AR, ARMA, ARIMA, SARIMA (basically any forecast model with “AR” in the acronym). We’ll use the tidyquant package along with our tidyverse downloads data obtained from cranlogs. The focus of this post is using lag.xts(), a function capable of returning multiple lags from an xts object, to investigate autocorrelation in lags among the daily tidyverse package downloads. When using lag.xts() with tq_mutate() we can scale to multiple groups (different tidyverse packages in our case). If you like what you read, please follow us on social media to stay up on the latest Business Science news, events and information! As always, we are interested in both expanding our…
Original Post: Tidy Time Series Analysis, Part 4: Lags and Autocorrelation
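
As a rough illustration of the lag idea described in this excerpt, here is a minimal base R sketch. It is not the post's lag.xts() / tq_mutate() workflow: the series is simulated (hypothetical data with a weekly cycle, not real cranlogs downloads), and lag_cor() is a hypothetical helper that simply correlates the series with a shifted copy of itself.

```r
# Hypothetical daily download counts with a 7-day cycle (simulated data,
# not obtained from cranlogs)
set.seed(42)
n <- 70
downloads <- 100 + 20 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 3)

# Hypothetical helper: Pearson correlation between the series and a copy
# of itself shifted back by k observations
lag_cor <- function(x, k) {
  len <- length(x)
  cor(x[(k + 1):len], x[1:(len - k)])
}

# Correlation at lags 1 through 14
lags <- 1:14
autocorrelations <- sapply(lags, function(k) lag_cor(downloads, k))
names(autocorrelations) <- paste0("lag_", lags)

round(autocorrelations, 2)
```

With a weekly pattern in the data, the correlations at lags 7 and 14 should stand out as strongly positive, which is the kind of seasonality signal the excerpt refers to. Note that this pairwise correlation differs slightly from the estimator used by stats::acf().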

Tidy Time Series Analysis, Part 3: The Rolling Correlation

In the third part in a series on Tidy Time Series Analysis, we’ll use the runCor function from TTR to investigate rolling (dynamic) correlations. We’ll again use tidyquant to investigate CRAN downloads. This time we’ll also get some help from the corrr package to investigate correlations over specific timespans, and the cowplot package for multi-plot visualizations. We’ll end by reviewing the changes in rolling correlations to show how to detect events and shifts in trend. If you like what you read, please follow us on social media to stay up on the latest Business Science news, events and information! As always, we are interested in both expanding our network of data scientists and seeking new clients interested in applying data science to business and finance. If interested, contact us. If you haven’t checked out the previous two tidy time…
Original Post: Tidy Time Series Analysis, Part 3: The Rolling Correlation
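
To make the rolling (dynamic) correlation concept concrete, the following is a small base R sketch. It is a stand-in written for illustration, not TTR's runCor: roll_cor() is a hypothetical helper, and the two input vectors are made-up numbers chosen so that the relationship flips partway through.

```r
# Hypothetical helper: correlation of x and y over a trailing window.
# Positions without a full window are left as NA.
roll_cor <- function(x, y, width) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i >= width) {
      idx <- (i - width + 1):i
      out[i] <- cor(x[idx], y[idx])
    }
  }
  out
}

# Made-up series: y moves with x for the first four points, then against it
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 4, 6, 8, 7, 5, 3, 1)

rc <- roll_cor(x, y, width = 4)
rc
```

The first full window gives a correlation of 1 and the last gives -1, so the rolling series captures a shift in the relationship that a single overall correlation would average away.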

BizSci Package Updates: Formerly timekit… Now timetk :)

We have several announcements regarding Business Science R packages. First, as of this week the R package formerly known as timekit has changed to timetk for time series tool kit. There are a few “breaking” changes because of the name change, and this is discussed further below. Second, the sweep and tidyquant packages have several improvements, which are discussed in detail below. Finally, don’t miss a beat on future news, events and information by following us on social media. The timetk package (formerly timekit) is a relatively new package that is aimed at assisting users with working with time series in R. It helps users switch back and forth between time-based “tibbles” (tidy data frames with dates or date times) and the other time series objects in R (xts, zoo, ts, etc). Equally important, timetk includes functions that…
Original Post: BizSci Package Updates: Formerly timekit… Now timetk 🙂

Tidy Time Series Analysis, Part 2: Rolling Functions

In the second part in a series on Tidy Time Series Analysis, we’ll again use tidyquant to investigate CRAN downloads, this time focusing on rolling functions. If you haven’t checked out the previous post on period apply functions, you may want to review it to get up to speed. Both zoo and TTR have a number of “roll” and “run” functions, respectively, that are integrated with tidyquant. In this post, we’ll focus on the rollapply function from zoo because of its flexibility with applying custom functions across rolling windows. If you like what you read, please follow us on social media to stay up on the latest Business Science news, events and information! As always, we are interested in both expanding our network of data scientists and seeking new clients interested in applying data science to business and finance.…
Original Post: Tidy Time Series Analysis, Part 2: Rolling Functions
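
The idea of applying a custom function across rolling windows can be sketched in base R as follows. This is not zoo's rollapply itself: roll_apply() is a hypothetical helper written for illustration, and the input vector is made up.

```r
# Hypothetical helper: apply `fun` to each trailing window of `width`
# observations; positions without a full window are left as NA.
roll_apply <- function(x, width, fun) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i >= width) {
      out[i] <- fun(x[(i - width + 1):i])
    }
  }
  out
}

x <- c(2, 4, 6, 8, 10, 12)

# Built-in summary function over a 3-point window
roll_mean <- roll_apply(x, 3, mean)
# NA NA 4 6 8 10

# Custom function: spread (max minus min) within each window
roll_spread <- roll_apply(x, 3, function(w) max(w) - min(w))
# NA NA 4 4 4 4
```

Assuming zoo is installed, zoo::rollapply(x, width = 3, FUN = mean, fill = NA, align = "right") should give the same rolling mean; the helper above only shows the mechanics.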

sweep: Extending broom for time series forecasting

We’re pleased to introduce a new package, sweep, now on CRAN! Think of it like broom for the forecast package. The forecast package is the most popular package for forecasting, and for good reason: it has a number of sophisticated forecast modeling functions. There’s one problem: forecast is based on the ts system, which makes it difficult to work within the tidyverse. This is where sweep fits in! The sweep package has tidiers that convert the output from forecast modeling and forecasting functions to “tidy” data frames. We’ll go through a quick introduction to show how the tidiers can be used, and then show a fun example of forecasting GDP trends of US states. If you’re familiar with broom it will feel like second nature. If you like what you read, don’t forget to follow us on social media to…
Original Post: sweep: Extending broom for time series forecasting