Probability functions beginner

In this set of exercises, we are going to explore some of the probability functions in R with practical applications. Basic probability knowledge is required. Note: We are going to use random number functions and random process functions in R such as runif. A problem with these functions is that every time you run them you will obtain a different value. To make your results reproducible, you can specify the value of the seed using set.seed(‘any number’) before calling a random function. (If you are not familiar with seeds, think of them as the tracking number of your random numbers.) For this set of exercises we will use set.seed(1); don’t forget to specify it before every random exercise. Answers to the exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please…
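The seeding advice can be checked with a few lines of base R. This is a minimal sketch, not taken from the exercise set itself; the variable names first, second, and third are chosen here purely for illustration.

```r
# Reproducibility sketch: the same seed yields the same draws.
set.seed(1)
first <- runif(3)    # three uniform draws on [0, 1]

set.seed(1)          # reset the seed before drawing again
second <- runif(3)

identical(first, second)   # TRUE: same seed, same numbers

third <- runif(3)    # no reseed, so the generator simply keeps advancing
identical(first, third)    # FALSE: these are the next draws in the stream
```

Resetting the seed immediately before each random call, as the note recommends, is what makes every exercise reproducible on its own, regardless of which exercises were run before it.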
Original Post: Probability functions beginner

A Guide to Instagramming with Python for Data Analysis

I am writing this article to show you the basics of using Instagram in a programmatic way. You can benefit from this if you want to use it in a data analysis, computer vision, or any other cool project you can think of.
Original Post: A Guide to Instagramming with Python for Data Analysis

Tesseract and Magick: High Quality OCR in R

Last week we released an update of the tesseract package to CRAN. This package provides R bindings to Google’s OCR library Tesseract.

    install.packages("tesseract")

The new version ships with the latest libtesseract 3.05.01 on Windows and MacOS. Furthermore, it includes enhancements for managing language data and using tesseract together with the magick package.

Installing Language Data

The new version has several improvements for installing additional language data. On Windows and MacOS you use the tesseract_download() function to install additional languages:

    tesseract_download("fra")

Language data are now stored in rappdirs::user_data_dir('tesseract'), so they persist across updates of the package. To OCR French text:

    french <- tesseract("fra")
    text <- ocr("https://jeroen.github.io/images/french_text.png", engine = french)
    cat(text)

    Très Bien!

Note that on Linux you should not use tesseract_download but instead install languages using apt-get (e.g. tesseract-ocr-fra) or yum (e.g. tesseract-langpack-fra).

Tesseract and Magick

The tesseract developers recommend…
Original Post: Tesseract and Magick: High Quality OCR in R

Update on Our ‘revisit’ Package

On May 31, I made a post here about our R package revisit, which is designed to help remedy the reproducibility crisis in science. The intended user audience includes reviewers of research manuscripts submitted for publication; scientists who wish to confirm the results in a published paper and explore alternate analyses; and members of the original research team itself, while collaborating during the course of the research. The package is documented mainly in the README file, but we now also have a paper on arXiv.org, which explains the reproducibility crisis in detail and how our package addresses it. Reed Davis and I, the authors of the software, are joined in the paper by Prof. Laurel Beckett of the UC Davis Medical School and Dr. Paul Thompson of Sanford Research.
Original Post: Update on Our ‘revisit’ Package

Visualising Water Consumption using a Geographic Bubble Chart

A geographic bubble chart is a straightforward method to visualise quantitative information with a geospatial relationship. Last week I was in Vietnam helping the Phú Thọ Water Supply Joint Stock Company with their data science. They asked me to create … The post Visualising Water Consumption using a Geographic Bubble Chart appeared first on The Devil is in the Data.
Original Post: Visualising Water Consumption using a Geographic Bubble Chart

Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

I often create character variables (i.e. variables with strings of text as their values) in SAS, and they sometimes don’t render as expected. Here is an example involving the built-in data set SASHELP.CLASS. Here is the code:

    data c1;
        set sashelp.class;
        * define a new character variable to classify someone as tall or short;
        if height > 60
            then height_class = 'Tall';
            else height_class = 'Short';
    run;

    * print the results for the first 5 rows;
    proc print data = c1 (obs = 5);
    run;

Here is the result:

    Alfred   M  14  69.0  112.5  Tall
    Alice    F  13  56.5   84.0  Shor
    Barbara  F  13  65.3   98.0  Tall
    Carol    F  14  62.8  102.5  Tall
    Henry    M  14  63.5  102.5  Tall

What happened? Why does the word “Short” render as “Shor”? This occurred because SAS sets the length of a new character…
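For contrast, here is a rough sketch of the same classification in base R. It is not the comparison from the original post, which is cut off in this excerpt; the data frame class5 is a hypothetical stand-in built from the names and heights printed above. R character vectors have no fixed per-variable length, so "Short" is stored in full.

```r
# Names and heights for the first five rows, copied from the SAS output above.
# class5 is a hypothetical stand-in for SASHELP.CLASS, not the real data set.
class5 <- data.frame(
  name   = c("Alfred", "Alice", "Barbara", "Carol", "Henry"),
  height = c(69.0, 56.5, 65.3, 62.8, 63.5)
)

# Same rule as the SAS data step: taller than 60 is "Tall", otherwise "Short".
class5$height_class <- ifelse(class5$height > 60, "Tall", "Short")

print(class5)

# Each string keeps its own length; nothing is truncated to 4 characters.
nchar(class5$height_class)   # 4 5 4 4 4
```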
Original Post: Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

How to build an image recognizer in R using just a few images

Microsoft Cognitive Services provides several APIs for image recognition, but if you want to build your own recognizer (or create one that works offline), you can use the new Image Featurizer capabilities of Microsoft R Server. The process of training an image recognition system requires LOTS of images: millions and millions of them. The process involves feeding those images into a deep neural network, and during that process the network generates “features” from the image. These features might be versions of the image including just the outlines, or maybe the image with only the green parts. You could further boil those features down into a single number, say the length of the outline or the percentage of the image that is green. With enough of these “features”, you could use them in a traditional machine learning model to classify…
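To make the idea of boiling an image down to a single number concrete, here is a small base R sketch of the "percentage of the image that is green" feature. It is only an illustration of that one hand-made feature, not the Image Featurizer in Microsoft R Server; the 2 x 2 image img and the rule for what counts as a green pixel are assumptions made up for this example.

```r
# Hypothetical 2 x 2 RGB image stored as a height x width x channel array,
# with channel values in [0, 1]. Not output from any real featurizer.
img <- array(0, dim = c(2, 2, 3))
img[1, 1, ] <- c(0.1, 0.9, 0.1)  # green pixel
img[1, 2, ] <- c(0.9, 0.1, 0.1)  # red pixel
img[2, 1, ] <- c(0.2, 0.8, 0.3)  # green pixel
img[2, 2, ] <- c(0.1, 0.1, 0.9)  # blue pixel

# Assumed rule: a pixel is "green" when its green channel exceeds both others.
is_green <- img[, , 2] > img[, , 1] & img[, , 2] > img[, , 3]

# Collapse the whole image to one numeric feature: the share of green pixels.
pct_green <- mean(is_green)
pct_green   # 0.5
```

A column of such numbers, one row per image, is the kind of input a traditional classifier could be trained on.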
Original Post: How to build an image recognizer in R using just a few images
