In this blog post I will discuss missing data imputation and instrumental variables regression. This is based on a short presentation I will give at my job. You can find the data used here on this website: http://eclr.humanities.manchester.ac.uk/index.php/IV_in_R
The data used is from Wooldridge’s book, Introductory Econometrics: A Modern Approach. You can download the data by clicking here. This is the variable description:
1. inlf: =1 if in labor force, 1975
2. hours: hours worked, 1975
3. kidslt6: # kids < 6 years
4. kidsge6: # kids 6-18
5. age: woman’s age in yrs
6. educ: years of schooling
7. wage: estimated wage from earns., hours
8. repwage: reported wage at interview in 1976
9. hushrs: hours worked by husband, 1975
10. husage: husband’s age
11. huseduc: husband’s years of schooling
12. huswage: husband’s hourly wage, 1975
13. faminc: family income, 1975
14. …
Original Post: Missing data imputation and instrumental variables regression: the tidy approach
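The post works in R with Wooldridge’s data. As a language-neutral illustration of what instrumental variables regression does, here is a minimal two-stage least squares (2SLS) sketch in Python with NumPy. It is not the author’s code: it uses simulated data rather than the variables listed above, and the function name `two_stage_least_squares` and all variable names are hypothetical. The regressor `x` shares a shock with the error term, so OLS is biased, while an instrument `z` that moves `x` but is unrelated to the error recovers the true slope.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS point estimate of beta in y = X @ beta + u, using instruments Z.

    X and Z should both contain a constant column, and Z needs at least
    as many columns as X (the order condition).
    """
    # First stage: regress every column of X on the instruments.
    first_stage_coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage_coef
    # Second stage: regress y on the fitted values of the regressors.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta

# Hypothetical simulated data: x is endogenous because it shares the shock u with y.
rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                        # instrument: drives x, independent of u
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # endogenous regressor
y = 1.0 + 2.0 * x + u                         # true slope is 2

ones = np.ones(n)
X = np.column_stack([ones, x])
Z = np.column_stack([ones, z])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_iv = two_stage_least_squares(y, X, Z)
print("OLS slope: ", round(beta_ols[1], 3))   # pulled upward by the shared shock u
print("2SLS slope:", round(beta_iv[1], 3))    # should be close to the true value of 2
```

Only point estimates are shown. Standard errors from the naive second-stage regression are not valid for 2SLS, which is one reason to use a dedicated IV routine in practice.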
Getting set up
If there is one realisation in life, it is that you will never have enough CPU or RAM available for your analytics. Luckily for us, cloud computing is becoming cheaper every year. One of the more established providers of cloud services is AWS. If you don’t know yet, they provide a free (yes, free) option. Their t2.micro instance is a 1 CPU, 1 GB machine, which doesn’t sound like much, but I am running an RStudio and Docker instance on one of these for a small project. The management console has the following interface: So, how cool would it be if you could start up one of these instances from R? Well, the cloudyr project makes R a lot better at interacting with cloud-based computing infrastructure. With this in mind, I…
Original Post: Interacting with AWS from R
Fuel tax debates
So, there’s currently a vibrant debate on a small New Zealandish corner of Twitter about a petrol tax coming into effect in Auckland today, and the different impacts of such taxes on richer and poorer households. The Government has released analysis from the Stats NZ Household Expenditure Survey showing higher petrol consumption per household among higher-income households (which hence pay more of the new tax). Sam Warburton, an economist with the New Zealand Institute, argues in response that poorer households have older, often larger, less efficient vehicles, leading to higher fuel costs per kilometre. This means a fuel tax will not only result in poor people paying more as a percentage of their income (as any sales tax on basic commodities will do), but also paying more per kilometre. Further, poor people are more likely to live…
Original Post: Spend on petrol by income by @ellis2013nz
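The two sides of the fuel tax debate above come down to simple arithmetic: tax per kilometre is the tax per litre times the litres burned per kilometre, and a household’s total bill also depends on how far it drives. The Python sketch below illustrates this with entirely hypothetical figures (tax rate, fuel efficiency, distances and incomes are made up, not taken from the Household Expenditure Survey or the post).

```python
# All figures below are hypothetical, chosen only to illustrate the arithmetic.
TAX_PER_LITRE = 0.10  # hypothetical tax, dollars per litre

def tax_per_km(tax_per_litre, litres_per_100km):
    """Tax paid per kilometre: tax per litre times litres burned per kilometre."""
    return tax_per_litre * litres_per_100km / 100

def annual_tax(tax_per_litre, litres_per_100km, km_per_year):
    """Total tax paid over a year of driving."""
    return tax_per_km(tax_per_litre, litres_per_100km) * km_per_year

households = {
    # name: (litres per 100 km, km driven per year, annual income in dollars)
    "poorer, older car": (10.0, 10_000, 40_000),
    "richer, newer car": (6.0, 20_000, 120_000),
}

for name, (l100, km, income) in households.items():
    per_km = tax_per_km(TAX_PER_LITRE, l100)
    total = annual_tax(TAX_PER_LITRE, l100, km)
    print(f"{name}: {per_km * 100:.2f} cents/km, ${total:.0f}/year, "
          f"{100 * total / income:.3f}% of income")
```

With these made-up numbers the richer household pays more tax in total, in line with the Government’s point, while the poorer household pays more per kilometre and a larger share of its income, which is Warburton’s point.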
A new RcppArmadillo release 0.8.600.0.0, based on the new Armadillo release 8.600.0 from this week, just arrived on CRAN. It follows our (and Conrad’s) bi-monthly release schedule. We have made interim and release candidate versions available via the GitHub repo (and as usual thoroughly tested them) but this is the real release cycle. A matching Debian release will be prepared in due course. Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language, and is widely used by (currently) 479 other packages on CRAN. A high-level summary of changes follows (which omits the two rc releases leading up to 8.600.0). Conrad did his usual impressive load of upstream changes, but…
Original Post: RcppArmadillo 0.8.600.0.0
This morning I was scrolling through Twitter and noticed Alberto Cairo share this lovely data visualization piece by Adam J. Calhoun about the varying prevalence of punctuation in literature. I thought, “I want to do that!” It also offers me the opportunity to chat about a few of the new options available for tokenizing in tidytext via updates to the tokenizers package. Adam’s original piece explores how punctuation is used in nine novels, including my favorite Pride and Prejudice.
Original Post: Punctuation in literature
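The core measurement in the punctuation post, how often each mark appears in a text, can be sketched without any tokenizing library. The Python below is a hypothetical illustration, not tidytext’s or tokenizers’ API: it counts characters from `string.punctuation` and normalises by a simple whitespace word count, so that novels of different lengths can be compared. Non-ASCII marks such as curly quotes are not in `string.punctuation` and would need to be added to the character set.

```python
from collections import Counter
import string

def punctuation_counts(text):
    """Count each ASCII punctuation character (from string.punctuation) in text."""
    return Counter(ch for ch in text if ch in string.punctuation)

def punctuation_per_1000_words(text):
    """Punctuation marks per 1,000 whitespace-separated words (0.0 for empty text)."""
    n_words = len(text.split())
    n_marks = sum(punctuation_counts(text).values())
    return 1000 * n_marks / n_words if n_words else 0.0

# Opening sentence of Pride and Prejudice (public domain).
opening = ("It is a truth universally acknowledged, that a single man in "
           "possession of a good fortune, must be in want of a wife.")
print(punctuation_counts(opening).most_common())
print(round(punctuation_per_1000_words(opening), 1))
```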
Another fine illusion: in this one, the pairs of horizontal lines are all smooth sine curves, despite the appearance of the jagged zig-zags: It’s really hard, for me at least, to tell that the zig-zags in the light grey region are actually curved. Zooming all the way in may help if you want to check for yourself. That’s all from us at the blog for this week. Next week we’ll be taking a break for the US Independence Day holiday, but we’ll be back the week of Monday July 9, reporting from the useR! conference in Brisbane. In the meantime, have a great weekend!
Original Post: Because it's Friday: Wavy Lines Illusion
The animation below, by Shanghai University professor Guy Abel, shows migration within and between regions of the world from 1960 to 2015. The data and the methodology behind the chart are described in this paper. The curved bars around the outside represent the peak migrant flows for each region; globally, migration peaked during the 2005-2010 period and then declined in 2010-2015, the latest data available. This animated chord chart was created entirely using the R language. The chord plot showing the flows between regions was created using the circlize package; the tweenr package created the smooth transitions between time periods, and the magick package created the animated GIF you see above. You can find a tutorial on making this animation, including the complete R code, at the link below. Guy Abel: Animated Directional Chord Diagrams (via Cal Carrie)
Original Post: Global Migration, animated with R
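The smooth transitions in the migration animation can be thought of as interpolating between the flow tables of two consecutive periods and drawing one chord diagram per intermediate table. The Python below is a hypothetical illustration using linear interpolation, the simplest easing; it does not reproduce tweenr’s functions or options, and the 2×2 flow values are made up.

```python
def tween_frames(start, end, n_frames):
    """Linearly interpolate between two equally shaped flow tables.

    Returns a list of n_frames tables; the first equals start, the last equals end.
    """
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2 to include both endpoints")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # fraction of the way from start to end
        frames.append([
            [a + t * (b - a) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(start, end)
        ])
    return frames

# Hypothetical migrant flows (origin rows, destination columns) for two periods.
period_1 = [[0, 10], [4, 0]]
period_2 = [[0, 20], [8, 0]]

for frame in tween_frames(period_1, period_2, 5):
    print(frame)
```

Each printed table would become one frame of the animation; a non-linear easing would only change how `t` advances from 0 to 1.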
KNIME Fall Summit takes place Nov 6-9 in Austin, Texas. Registration is now open, and KDnuggets readers save 10% on top of early bird rates with code KDNUGGETS!
Original Post: KNIME Fall Summit in Austin, November 6-9, 2018 registrations now open!
This post introduces the prospect of fulfilling the need for a modern graph query language with GSQL.
Original Post: Modern Graph Query Language – GSQL