R, the master troll of statistical languages

Warning: what follows is a somewhat technical discussion of my love-hate relationship with the R statistical language, in which I somehow manage to waste 2,400 words talking about a single line of code. Reader discretion is advised.

I’ve been using R to do most of my statistical analysis for about 7 or 8 years now–ever since I was a newbie grad student and one of the senior grad students in my lab introduced me to it. Despite having spent hundreds (thousands?) of hours in R, I have to confess that I’ve never set aside much time to really learn it very well; what basic competence I’ve developed has been acquired almost entirely by reading the inline help and consulting the Oracle of ~~Bacon~~ Google when I run into problems. I’m not very good at setting aside time for reading articles or books or working my way through other people’s code (probably the best way to learn), so the net result is that I don’t know R nearly as well as I should.

That said, if I’ve learned one thing about R, it’s that R is all about flexibility: almost any task can be accomplished in a dozen different ways. I don’t mean that in the trivial sense that pretty much any substantive programming problem can be solved in any number of ways in just about any language; I mean that for even very simple and well-defined tasks involving just one or two lines of code there are often many different approaches.

To illustrate, consider the simple task of selecting a column from a data frame (data frames in R are basically just fancy tables). Suppose you have a dataset that looks like this:
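As a stand-in for that dataset, here is a small made-up data frame. Only the name ice.cream and its first column, flavor, come from the text; the other columns and every value are assumptions chosen purely for illustration.

```r
# Hypothetical stand-in for the ice.cream data frame.
# Only the name 'ice.cream' and the 'flavor' column come from the post;
# 'calories' and 'rating' and all values are invented.
ice.cream <- data.frame(
  flavor   = c("vanilla", "chocolate", "strawberry"),
  calories = c(207, 216, 192),
  rating   = c(3.5, 4.8, 4.1),
  stringsAsFactors = FALSE   # keep flavor as plain strings
)

print(ice.cream)
str(ice.cream)   # one character column, two numeric columns
```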

In most languages, there would be one standard way of pulling columns out of this table. Just one unambiguous way: if you don’t know it, you won’t be able to work with data at all, so odds are you’re going to learn it pretty quickly. R doesn’t work that way. In R there are many ways to do almost everything, including selecting a column from a data frame (one of the most basic operations imaginable!). Here are four of them:
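A minimal sketch of four such idioms, using a made-up ice.cream data frame (only the name and the flavor column come from the text; these are common ways to pull out a column and may not be exactly the four originally shown):

```r
# Hypothetical stand-in data; only 'flavor' is from the post.
ice.cream <- data.frame(
  flavor   = c("vanilla", "chocolate", "strawberry"),
  calories = c(207, 216, 192),
  stringsAsFactors = FALSE
)

by_position <- ice.cream[, 1]          # matrix-style, by column index
by_name     <- ice.cream[, "flavor"]   # matrix-style, by column name
by_dollar   <- ice.cream$flavor        # list-style, with the dollar sign
by_brackets <- ice.cream[["flavor"]]   # list-style, with double brackets

# All four are the same character vector.
same <- identical(by_position, by_name) &&
  identical(by_name, by_dollar) &&
  identical(by_dollar, by_brackets)
print(same)
```

One wrinkle worth knowing: ice.cream["flavor"] (single brackets, no comma) also works, but it returns a one-column data frame rather than a plain vector.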

I won’t bother to explain all of these; the point is that, as you can see, they all return the same result (namely, the first column of the ice.cream data frame, named ‘flavor’).

This type of flexibility enables incredibly powerful, terse code once you know R reasonably well; unfortunately, it also makes for an extremely steep learning curve. You might wonder why that would be–after all, at its core, R still lets you do things the way most other languages do them. In the above example, you don’t have to use anything other than the simple index-based approach (i.e., data[,1]), which is the way most other languages that have some kind of data table or matrix object (e.g., MATLAB, Python/NumPy, etc.) would prefer you to do it. So why should the extra flexibility present any problems?

The answer is that when you’re trying to learn a new programming language, you typically do it in large part by reading other people’s code–and nothing is more frustrating to a newbie when learning a language than trying to figure out why sometimes people select columns in a data frame by index and other times they select them by name, or why sometimes people refer to named properties with a dollar sign and other times they wrap them in a vector or double square brackets. There are good reasons to have all of these different idioms, but you wouldn’t know that if you’re new to R and your expectation, quite reasonably, is that if two expressions look very different, they should do very different things. The flexibility that experienced R users love is very confusing to a newcomer. Most other languages don’t have that problem, because there’s only one way to do everything (or at least, far fewer ways than in R).

Thankfully, I’m long past the point where R syntax is perpetually confusing. I’m now well into the phase where it’s only frequently confusing, and I even have high hopes of one day making it to the point where it barely confuses me at all. But I was reminded of the steepness of that initial learning curve the other day while helping my wife use R to do some regression analyses for her thesis. Rather than explaining what she was doing, suffice it to say that she needed to write a function that, among other things, takes a data frame as input and retains only the numeric columns for subsequent analysis. Data frames in R are actually lists under the hood, so they can have mixed types (i.e., you can have string columns and numeric columns and factors all in the same data frame; R lists basically work like hashes or dictionaries in other loosely-typed languages like Python or Ruby). So you can run into problems if you haphazardly try to perform numerical computations on non-numerical columns (e.g., good luck computing the mean of ‘cat’, ‘dog’, and ‘giraffe’), and hence, pre-emptive selection of only the valid numeric columns is required.
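To make the "lists under the hood" point concrete, here is a small sketch with a made-up ice.cream data frame (the column names other than flavor, and all values, are assumptions):

```r
# Hypothetical stand-in data; only 'flavor' is from the post.
ice.cream <- data.frame(
  flavor   = c("vanilla", "chocolate", "strawberry"),
  calories = c(207, 216, 192),
  stringsAsFactors = FALSE
)

is_a_list <- is.list(ice.cream)        # a data frame is a list of columns
flavor_class   <- class(ice.cream$flavor)     # a string column
calories_class <- class(ice.cream$calories)   # a numeric column

numeric_mean <- mean(ice.cream$calories)

# mean() on strings does not stop with an error; it warns and returns NA.
string_mean <- suppressWarnings(mean(c("cat", "dog", "giraffe")))

print(list(is_a_list, flavor_class, calories_class, numeric_mean, string_mean))
```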

Now, in most languages (including R), you can solve this problem very easily using a loop. In fact, in many languages, you would have to use an explicit for-loop; there wouldn’t be any other way to do it. In R, you might do it like this*:

numeric_cols = rep(FALSE, ncol(ice.cream))
for (i in 1:ncol(ice.cream)) numeric_cols[i] = is.numeric(ice.cream[,i])

We allocate memory for the result, then loop over each column and check whether or not it’s numeric, saving the result. Once we’ve done that, we can select only the numeric columns from our data frame with ice.cream[,numeric_cols].
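Put together as something you can run end to end, the loop approach might look like the sketch below. The data frame is a made-up stand-in (only the name ice.cream and the flavor column come from the text), and seq_len() and drop = FALSE are my own defensive additions rather than part of the two-liner above.

```r
# Hypothetical stand-in data; only 'flavor' is from the post.
ice.cream <- data.frame(
  flavor   = c("vanilla", "chocolate", "strawberry"),
  calories = c(207, 216, 192),
  rating   = c(3.5, 4.8, 4.1),
  stringsAsFactors = FALSE
)

# One logical flag per column, initially all FALSE.
numeric_cols <- logical(ncol(ice.cream))

# seq_len() behaves sensibly even for a data frame with zero columns,
# where 1:ncol(...) would count down from 1 to 0.
for (i in seq_len(ncol(ice.cream))) {
  numeric_cols[i] <- is.numeric(ice.cream[[i]])
}

# drop = FALSE keeps the result a data frame even if only one column survives.
numeric_only <- ice.cream[, numeric_cols, drop = FALSE]
print(numeric_only)
```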

This is a perfectly sensible way to solve the problem, and as you can see, it’s not particularly onerous to write out. But of course, no self-respecting R user would write an explicit loop that way, because R provides you with any number of other tools to do the job more efficiently. So instead of saying “just loop over the columns and check if is.numeric() is true for each one,” when my wife asked me how to solve her problem, I cleverly said “use apply(), of course!”

apply() is an incredibly useful built-in function that implicitly loops over one or more margins of a matrix; in theory, you should be able to do the same work as the above two lines of code with just the following one line:

apply(ice.cream, 2, is.numeric)

Here the first argument is the data we’re passing in, the third argument is the function we want to apply to the data (is.numeric()), and the second argument is the margin over which we want to apply that function (1 = rows, 2 = columns, etc.). And just like that, we’ve cut the length of our code in half!

Unfortunately, when my wife tried to use apply(), her script broke. It didn’t break in any obvious way, mind you (i.e., with a crash and an error message); instead, the apply() call returned a perfectly good vector. It’s just that all of the values in that vector were FALSE. Meaning, R had decided that none of the columns in my wife’s data frame were numeric–which was most certainly incorrect. And because the code wasn’t throwing an error, and the apply() call was embedded within a longer function, it wasn’t obvious to my wife–as an R newbie and a novice programmer–what had gone wrong. From her perspective, the regression analyses she was trying to run with lm() were breaking with strange messages. So she spent a couple of hours trying to debug her code before asking me for help.

Anyway, I took a look at the help documentation, and the source of the problem turned out to be the following: apply() only operates over matrices or vectors, and not on data frames. So when you pass a data frame to apply() as the input, it’s implicitly converted to a matrix. Unfortunately, because matrices can only contain values of one data type, any data frame that has at least one string column will end up being converted to a string (or, in R’s nomenclature, character) matrix. And so now when we apply the is.numeric() function to each column of the matrix, the answer is always going to be FALSE, because all of the columns have been converted to character vectors. So apply() is actually doing exactly what it’s supposed to; it’s just that it doesn’t deign to tell you that it’s implicitly casting your data frame to a matrix before doing anything else. The upshot is that unless you carefully read the apply() documentation and have a basic understanding of data types (which, if you’ve just started dabbling in R, you may well not), you’re hosed.
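The silent conversion is easy to see on a toy example. This sketch uses a made-up ice.cream data frame (only the name and the flavor column come from the text) and calls as.matrix() by hand to show the coercion that apply() performs on a data frame before looping:

```r
# Hypothetical stand-in data; only 'flavor' is from the post.
ice.cream <- data.frame(
  flavor   = c("vanilla", "chocolate", "strawberry"),
  calories = c(207, 216, 192),
  rating   = c(3.5, 4.8, 4.1),
  stringsAsFactors = FALSE
)

# One string column is enough to turn the whole matrix into strings.
as_matrix <- as.matrix(ice.cream)
matrix_type <- typeof(as_matrix)

# So every column fails the is.numeric() test, with no error or warning.
apply_result <- apply(ice.cream, 2, is.numeric)

# With the string column left out, the same call behaves as expected.
apply_numeric_only <- apply(ice.cream[, c("calories", "rating")], 2, is.numeric)

print(matrix_type)
print(apply_result)
print(apply_numeric_only)
```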

At this point I could have–and probably should have–thrown in the towel and just suggested to my wife that she use an explicit loop. But that would have dealt a mortal blow to my pride as an experienced-if-not-yet-guru-level R user. So of course I did what any self-respecting programmer does: I went and googled it. And the first thing I came across was the all.is.numeric() function in the Hmisc package which has the following description:

Tests, without issuing warnings, whether all elements of a character vector are legal numeric values.

Perfect! So now the solution to my wife’s problem became this:

library(Hmisc)
apply(ice.cream, 2, all.is.numeric)

…which had the desirable property of actually working. But it still wasn’t very satisfactory, because it requires loading a pretty large library (Hmisc) with a bunch of dependencies just to do something very simple that should really be doable in the base R distribution. So I googled some more. And came across a relevant Stack Exchange answer, which had the following simple solution to my wife’s exact problem:

sapply(ice.cream, is.numeric)

You’ll notice that this is virtually identical to the apply() approach that failed. That’s no coincidence; it turns out that sapply() is just a variant of apply() that works on lists. And since data frames are actually lists, there’s no problem passing in a data frame and iterating over its columns. So just like that, we have an elegant one-line solution to the original problem that doesn’t invoke any loops or third-party packages.
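Here is the one-liner in context, followed by the column selection it was needed for. The data frame is a made-up stand-in (only the name ice.cream and the flavor column come from the text), and drop = FALSE is my own addition.

```r
# Hypothetical stand-in data; only 'flavor' is from the post.
ice.cream <- data.frame(
  flavor   = c("vanilla", "chocolate", "strawberry"),
  calories = c(207, 216, 192),
  rating   = c(3.5, 4.8, 4.1),
  stringsAsFactors = FALSE
)

# sapply() visits each column as-is, so the column types are preserved.
numeric_cols <- sapply(ice.cream, is.numeric)
print(numeric_cols)

# Keep only the numeric columns; drop = FALSE guards the one-column case.
numeric_only <- ice.cream[, numeric_cols, drop = FALSE]
print(numeric_only)
```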

Now, having used apply() a million times, I probably should have known about sapply(). And actually, it turns out I did know about sapply–in 2009. A Spotlight search reveals that I used it in some code I wrote for my dissertation analyses. But that was 2009, back when I was smart. In 2012, I’m the kind of person who uses apply() a dozen times a day, and is vaguely aware that R has a million related built-in functions like sapply(), tapply(), lapply(), and vapply(), yet still has absolutely no idea what all of those actually do. In other words, in 2012, I’m the kind of experienced R user that you might generously call “not very good at R”, and, less generously, “dumb”.

On the plus side, the end product is undeniably cool, right? There are very few languages in which you could achieve so much functionality so compactly right out of the box. And this isn’t an isolated case; base R includes a zillion high-level functions to do similarly complex things with data in a fraction of the code you’d need to write in most other languages. Once you throw in the thousands of high-quality user-contributed packages, there’s nothing else like it in the world of statistical computing.

Anyway, this inordinately long story does have a point to it, I promise, so let me sum up:

If I had just ignored the desire to be efficient and clever, and had told my wife to solve the problem the way she’d solve it in most other languages–with a simple for-loop–it would have taken her a couple of minutes to figure out, and she’d probably never have run into any problems.
If I’d known R slightly better, I would have told my wife to use sapply(). This would have taken her 10 seconds and she’d definitely never have run into any problems.
BUT: because I knew enough R to be clever but not enough R to avoid being stupid, I created an entirely avoidable problem that consumed a couple of hours of my wife’s time. Of course, now she knows about both apply() and sapply(), so you could argue that in the long run, I’ve probably still saved her time. (I’d say she also learned something about her husband’s stubborn insistence on pretending he knows what he’s doing, but she’s already the world-leading expert on that topic.)

Anyway, this anecdote is basically a microcosm of my entire experience with R. I suspect many other people will relate. Basically what it boils down to is that R gives you a certain amount of rope to work with. If you don’t know what you’re doing at all, you will most likely end up accidentally hanging yourself with that rope. If, on the other hand, you’re a veritable R guru, you will most likely use that rope to tie some really fancy knots, scale tall buildings, fashion yourself a space tuxedo, and, eventually, colonize brave new statistical worlds. For everyone in between novice and guru (e.g., me), using R on a regular basis is a continual exercise in alternately thinking “this is fucking awesome” and banging your head against the wall in frustration at the sheer stupidity (either your own, or that of the people who designed this awful language). But the good news is that the longer you use R, the more of the former and the fewer of the latter experiences you have. And at the end of the day, it’s totally worth it: the language is powerful enough to make you forget all of the weird syntax, strange naming conventions, choking on large datasets, and issues with data type conversions.

Oh, except when your wife is ~~yelling at~~ gently reprimanding you for wasting several hours of her time on a problem she could have solved herself in 5 minutes if you hadn’t insisted that she do it the idiomatic R way. Then you remember exactly why R is the master troll of statistical languages.

* R users will probably notice that I use the = operator for assignment instead of the <- operator even though the latter is the officially prescribed way to do it in R (i.e., a <- 2 is favored over a = 2). That’s because these two idioms are interchangeable in all but one (rare) use case, and personally I prefer to avoid extra keystrokes whenever possible. But the fact that you can do even basic assignment in two completely different ways in R drives home the point about how pathologically flexible–and, to a new user, confusing–the language is.
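For the curious, the place where the two operators most visibly part ways (and presumably the rare case alluded to above, though that is my assumption) is inside a function call. The helper functions below are hypothetical and exist only to give each call its own scope:

```r
# At the top level the two spellings do the same thing.
a = 2
b <- 2
stopifnot(identical(a, b))

# Inside a call, '=' names an argument and creates no variable ...
with_equals <- function() {
  median(x = c(1, 5, 9))
  exists("x", envir = environment(), inherits = FALSE)
}

# ... whereas '<-' creates x in the caller's scope and then passes it along.
with_arrow <- function() {
  median(x <- c(1, 5, 9))
  exists("x", envir = environment(), inherits = FALSE)
}

equals_made_x <- with_equals()
arrow_made_x  <- with_arrow()
print(c(equals = equals_made_x, arrow = arrow_made_x))
```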

46 thoughts on “R, the master troll of statistical languages”

Cosma Shalizi says:

June 8, 2012 at 7:39 am

I am, of course, going to share this story with my students at the end of my “programming through R” class this fall.

But why get rid of the non-numeric columns at all? If you’re doing regression, just give the appropriate column names in the formulas, and pass the whole data frame to lm.

Tal Yarkoni says:

June 8, 2012 at 8:37 am

Of course you’ll share it at the end… because you’re the master troll of statistics instructors. 🙂

There was a bunch of other stuff involved in processing the data; e.g., most of the variables were continuous but the people who collected it had taken the odd step of coding missing responses as a 6 (on an otherwise 5-point scale), so recoding was necessary for numerical columns. Plus she didn’t just want to drop the non-numeric columns; there were a bunch of factors (race, gender, etc.) that went into the regression as well as nominal covariates. So there were principled reasons for having to identify all and only the numeric columns.
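A rough sketch of that recoding step, on made-up survey data (the column names and values here are invented; only the "6 means missing on a 5-point scale" convention comes from the description above):

```r
# Hypothetical survey data: one nominal column and two 5-point items,
# with 6 standing in for a missing response.
survey <- data.frame(
  gender = c("f", "m", "f"),
  q1     = c(1, 6, 3),
  q2     = c(6, 2, 5),
  stringsAsFactors = FALSE
)

# Identify the numeric columns, then recode 6 to NA in those columns only.
numeric_cols <- sapply(survey, is.numeric)
survey[numeric_cols] <- lapply(
  survey[numeric_cols],
  function(col) replace(col, col %in% 6, NA)
)

print(survey)
```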

Thom says:

June 8, 2012 at 8:47 am

I’m glad it isn’t just me … I love R but the apply() family always catches me. My heuristic is never to use apply() type functions without checking ?apply, ?sapply etc. first

I don’t do enough R programming to know the differences automatically … I’ve also recently realized they can often be avoided by rowMeans() and colMeans() …

Alan Parker says:

June 8, 2012 at 2:57 pm

Hilarious. And so true. I laughed a lot. Especially at the “stubborn insistence on pretending that he knows what he’s doing”. I’m with you 100% … unfortunately. The point that using a (gasp) loop is not actually going to cause a thunderbolt to strike us is a good one. We’re never going to be Shalizi or Wickham or Ihaka, so get over it and go for the simple route.

Jake says:

June 8, 2012 at 10:03 pm

Amusingly, apply() is just a wrapper for an R for-loop anyway…

Tal Yarkoni says:

June 8, 2012 at 10:45 pm

Thom, to be honest, I’m a bit glad to hear you have this problem too; makes me feel better about myself if you still struggle with this kind of stuff! But on the other hand, maybe that means the road to gurudom is even longer than I thought…

Alan, I hear you and agree… but then again, at least one of the people who’ve commented in this thread is going to be Shalizi or Wickham or Ihaka, so… 😉

Jake, sure, but unless you’re doing something that really places a premium on CPU cycles, I think most programmers’ energies (well, mine at least) are directed at minimizing the amount of code they write, not the total amount that gets executed. Personally if I need to do anything that involves more than 10 – 20 lines of code and isn’t completely specific to my own project, I’ll usually spend a few minutes searching for existing packages that could save me the trouble. But it’s quite possible I’m just a particularly lazy programmer!

Leo Fernandino says:

June 10, 2012 at 9:30 am

Great post. As a beginner R user, I will take it as a cautionary tale, especially when dispensing advice to a significant other.

zbicyclist says:

June 10, 2012 at 12:29 pm

“actually, it turns out I did know about sapply–in 2009. A Spotlight search reveals that I used it in some code I wrote for my dissertation analyses. But that was 2009, back when I was smart.”

I hate to tell you this, but this problem gets worse over time as your own history gets longer (I wrote my first stat code in 1969, just to give you some perspective).

Luckily, all these “desktop search” functions can help you find your own answers, assuming you digitized them at some point.

Final word of advice: Yeah, I know it’s better (more efficient) to use other structures besides loops in R, but it’s also better to get the code working quickly and accurately. A loop is a wonderful thing.

Kenn says:

August 2, 2012 at 8:34 am

“That’s no coincidence; it turns out that sapply() is just a variant of apply() that works on lists. ”

Actually, sapply and lapply are much more basic than apply. Just look at the code. Apply does looping in R whereas lapply is an internal (C-level) function and sapply is just lapply plus some simplifying. But there is also vapply that is much better than any of these and that your wife should have used. (But I have never figured out how it works).
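For what it’s worth, a minimal sketch of the vapply() version: it works like sapply(), except that the third argument declares what each call must return (here, a single logical value), so a surprise fails loudly instead of silently changing the shape of the result. The data frame is a made-up stand-in; only the name ice.cream and the flavor column come from the post.

```r
# Hypothetical stand-in data; only 'flavor' is from the post.
ice.cream <- data.frame(
  flavor   = c("vanilla", "chocolate", "strawberry"),
  calories = c(207, 216, 192),
  stringsAsFactors = FALSE
)

# FUN.VALUE = logical(1): every call must return exactly one TRUE/FALSE.
numeric_cols <- vapply(ice.cream, is.numeric, FUN.VALUE = logical(1))
print(numeric_cols)

# The difference shows up on edge cases such as a data frame with no columns:
# sapply() hands back an empty list, vapply() an empty logical vector.
empty_sapply <- sapply(data.frame(), is.numeric)
empty_vapply <- vapply(data.frame(), is.numeric, FUN.VALUE = logical(1))
```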

1. dwinsemius says:
  
  April 4, 2018 at 8:40 pm
  
  It’s too bad that the names of `apply` and `sapply/lapply` look so similar. The `apply` function should only be used when the user knows that each row will be coerced to the “lowest common class” as it is passed to the functional third argument. It’s a really dangerous function. All of them do “looping”, but what they loop on varies between `apply` versus `lapply/sapply`.
  
Andrea Mezei says:

September 17, 2012 at 6:07 am

Hi, I read your blog. I’m currently looking for Business Analysts who are specialists in the R language. I work at the American software company Micros Fidelio; I’m the HR Manager. Can you please send me candidates or names of people who are professionals in this?

Thanks

Mezei Andrea

Alejandro says:

September 24, 2012 at 3:01 am

damn dude, you didn’t know about sapply? n00b

John P says:

April 29, 2013 at 2:51 am

R gives the world its ten-thousand-and-first computer language. However, I have found that using R as a standalone language is a bad idea. It’s much, much better to prepare data for R, and to receive data from R, from a scripting language like Perl or Python or Ruby. The extraordinarily limited number of data types, the lack of pointers (references), and a host of other things make this tough sledding for people who are used to languages that can stand on their own.

R’s convenience functions for textual data are hilariously underpowered. It’s nice that R circumvents the bloatedness of old SPSS or SAS programs and it’s also nice that R is so easy to call from all the major scripting languages. However, I can only see R fanatics insisting that this is a full tool all by itself because most apps in the world need statistics as ONE of the outputs of a piece of software. That’s why your, for example, Perl script calls R as a kind of Perl convenience function … and it is VERY convenient for that.

Thom says:

April 29, 2013 at 3:21 am

John P. : “However, I can only see R fanatics insisting that this is a full tool all by itself because most apps in the world need statistics as ONE of the outputs of a piece of software.”

That’s a bit of a straw man – even R fanatics don’t tend to use or advocate R for general programming – just for statistics. Most R fanatics (that I’m aware of) will happily use other languages to call R (e.g., Python) or use R to manage other software (JAGS etc.).

Mind you, lots of R users probably over-use R in the sense that some other language would be more efficient for their task, but that’s because of switching and other costs. However, that’s a general rule of programming (or indeed technology).

Wendell says:

May 7, 2013 at 2:00 pm

Thom: “even R fanatics don’t tend to use or advocate R for general programming – just for statistics.”

Not disputing your general argument, but I just finished reading the book, “Quantitative Corpus Linguistics with R”, in which R is promoted for text processing. I think it was the most mind-numbing misapplication of a programming language I have ever seen.

liberal says:

October 8, 2013 at 6:27 am

Great post. I particularly liked the bit about the multiple ways to select a column from a data frame.

Honestly, though, “there is more than one way to do it” doesn’t make for great language design (IMHO); it makes for steeper learning curves. (Cf perl.)

Pingback: [citation needed]» Blog Archive » The homogenization of scientific computing, or why Python is steadily eating other languages’ lunch
Manolo says:

November 18, 2013 at 4:49 pm

I feel your pain regarding reading R code from other people… but I am going to flip it… I fear the day someone tries to figure out my code. Take for example my function NameClass which I wrote a couple of years ago and I use almost every time I open RStudio.
NameClass <- function(df) {
  nc <- as.data.frame(names(df))
  for (i in 1:dim(nc)[1]) {
    if (class(df[,i])[1] == "labelled") {
      nc[i,2] <- class(df[,i])[2]
    } else {
      nc[i,2] <- class(df[,i])
    }
    names(nc) <- c("var.name","var.class")
  }
  nc$var.name <- as.character(nc$var.name)
  nc$var.class <- as.factor(nc$var.class)
  message("Dataframe contains two variables var.name & var.class")
  return(nc)
}

John Blischak says:

November 19, 2013 at 10:43 am

What a coincidence! I ran into this exact same problem earlier today. I realized I could use `sapply`, but I didn’t investigate why `apply` hadn’t worked. Thanks for the explanation!

Christian Hudon says:

November 22, 2013 at 2:54 pm

You can actually do assignment 3 different ways in R:

a = 1
a <- 1
1 -> a

ZL 'Kai' Burington says:

November 25, 2013 at 6:17 am

As to using <- for assignment. The reason I do it is to avoid confusing assignments with function arguments, where = is the only operator allowed. It just means when I look back on my code I can easily separate the two.

Ken says:

January 10, 2014 at 4:48 am

I have the same experience; God knows how many hours I’ve spent trying to debug an R program. I think 13 years of Matlab programming has had its effects on me.

Pingback: Is Python Becoming the King of the Data Science Forest? - Experfy Insights
JK says:

September 21, 2014 at 4:25 pm

A little late to the party here, but thank you for letting me know it’s not just me. I’ve been programming in various languages for decades, I think I’m pretty good at it, but R wrong-foots me every time. Expectations: violated!

MN says:

October 22, 2014 at 3:14 pm

I cringe now when I see variable.names with periods to represent a space between compound words. The funny thing is that I never had a problem with it until I learned other languages. That is when I realized that the period usually represented a method or attribute being appended to the variable name and not part of the variable name itself.

“When we were young”

Jim Abraham says:

March 28, 2015 at 11:34 pm

As a long-time Perl programmer, it’s bittersweet to see another language get this kind of attention. Perl was the original, or at least the best, “There’s More Than One Way To Do It” language. That was the official motto of Perl. Like R, Perl had context: what you get when you do something depends on what context you’re doing it in. Evaluating a list in a scalar context gives you the number of items in the list, e.g.
@list = (1,2,3);
$num = @list; # scalar context
@list2 = @list; # list2 has all the items of list

The greatest way of working with lists (which is all data frames really are), is the functional way, which is to say, to apply a function to each item in the list. The map-reduce approach is as old as Lisp, if not older, but you hear people (like Google) talking about it as if it’s just occurred to them.

In Perl, as in R, map() or sapply() takes a function (in your case isNumeric), which it “applies” or “maps” to each item in the list.

This is incredibly powerful. Consider the greatest of all examples of the apply principle, the Schwartzian transform, which you can read about on Wikipedia. Here we’ll sort a list of words based upon the length of each word:

@sorted = map { $_->[0] }
          sort { $a->[1] <=> $b->[1] } # use numeric comparison
          map { [$_, length($_)] } # calculate the length of the string
          @unsorted;

Without a functional, map-based approach, this would take a lot more code and a lot of temporary variables. In addition, the list items are addressed in the speediest way possible — you’re certainly not copying each one on to the stack half a dozen times.
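For R readers, a loose sketch of the same decorate-sort-undecorate idea; this is an illustrative translation under my own assumptions, not a line-for-line port, and the word list is made up:

```r
unsorted <- c("banana", "fig", "cherry", "kiwi")

# Decorate: pair each word with its length.
decorated <- lapply(unsorted, function(w) list(word = w, len = nchar(w)))

# Sort on the computed lengths (order() keeps ties in their original order).
lens <- vapply(decorated, function(p) p$len, integer(1))
ordered_pairs <- decorated[order(lens)]

# Undecorate: pull the words back out.
sorted <- vapply(ordered_pairs, function(p) p$word, character(1))
print(sorted)

# Because R's functions are vectorized, the everyday version is one line.
sorted_short <- unsorted[order(nchar(unsorted))]
```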

So this was a great trip down memory lane. As a programmer who also does data analysis, I’m frequently grateful that I had real development experience before I got to R, since R, like Perl before it, is really just a collection of a lot of great ideas stolen from other languages. Like Perl, R has many ways to do it; like Perl, it’s a point of pride for the coder to use the most terse syntax available; like Perl, another’s R code is hard to understand. But R is also like Perl in that the goal is to get the job done: while other languages are busy worrying about purity or philosophy or religious issues, R, like Perl, the “Swiss Army Chainsaw” of languages, is quietly getting the job done. –Jim Abraham

1. Thom says:
  
  March 30, 2015 at 2:34 am
  
  I may have to steal the “Swiss Army chainsaw” description! I also have a feeling that all open source languages will tend towards this over time. You can’t keep something pure as long as you add functionality and especially if it is added by thousands of different people independently. A case in point: formula syntax – several R packages have specialised formula syntax that is unavoidable and couldn’t be anticipated in advance.
  
Stash says:

November 11, 2015 at 3:13 pm

R is BS.

Put on your bigboy pants and use grown up code such as IDL.

Pingback: R versus Python (in portuguese) – CSBL
Pingback: R, the master troll of statistical languages (2012) – Daily Hackers News
Dale Gulledge says:

February 16, 2016 at 7:58 pm

You’ve missed a third assignment operator:

rep(FALSE, ncol(ice.cream)) -> numeric_cols

That fact is completely consistent with your point.

Peter Apps says:

February 28, 2016 at 4:35 am

Warning; pedantry ahead …..

A steep learning curve, which you and one of your commenters use as a figure of speech for something being difficult to learn, is actually generated when something is easy to learn, because a learning curve plots correct responses vs number of trials. If you keep getting it wrong time after time your learning curve is flat.

Oz says:

March 22, 2017 at 12:41 pm

I hate to program in R

Jorge says:

July 26, 2017 at 3:32 pm

The only reason to learn R is that R is becoming popular and I don’t know why (since there are other languages that can do the task).

1. Thom says:
  
  July 26, 2017 at 4:31 pm
  
  For statistical modelling there still aren’t really any practical alternatives (outside commercial products) for many tasks.
  
2. Sir Huddleston Fuddleston says:
  
  July 26, 2017 at 9:16 pm
  
  I am a very old programmer. I’ve used Perl, Java, Scala, R, and Python pretty extensively for data science tasks, which I’ve been doing for genomics analysis for 20 years now. I have to say, the language that I find truly infuriating is Python. I think it’s garbage, and if you’re interested, I could go on at length about it. Like Java, it survives not because it’s well suited to its task (or any task), but because really superb libraries have been written in it. It’s the language of killer apps, in the same way that Windows won the office because of its killer app, Excel, not because it’s not a piece of shit.
  
  R has always been very good at its job. I think people who think Python is great are simply people who know nothing else. Again, happy to go into detail any time. Kids. Get off my lawn.
  
  1. Sir Huddleston Fuddleston says:
    
    July 26, 2017 at 9:20 pm
    
    And another thing: R is not “becoming popular.” It’s like data science in general, or machine learning, or deep learning. What’s happened is that the mainstream media have noticed it. People (like me) have been using R, k-means clustering, SVM, and neural nets for data science tasks (like biomarker discovery and expression analysis) for a very long time.
    
    Kids who jump on the data science bandwagon are IDENTICAL to those people who jumped on the web programming bandwagon in the late 90s. We were inundated with people who thought playing around with HTML and Javascript made them programmers. All those people are gone now, once they discovered that real, rather than toy, programming is *hard*. Not merely hard cognitively — often it’s hard, grinding toil, like data munging or doing endless data transformations before you can even get to the good stuff.
    
    Five years from now, all those data science newbies will have moved on to the next thing, and the people who were interested in data science for *solving hard problems* will still be here. Tech is just a tool. No tool is any better than any other. It just matters what’s appropriate for the job.
    
    1. Thom says:
      
      October 30, 2017 at 6:22 am
      
      There are data evidencing the increased use of R. However, I suspect it is part of a general shift from stats packages to statistical analysis environments and statistical programming. Thus most of the growth is in stats users who are non-programmers discovering python, R etc.
      
  2. dwinsemius says:
    
    April 4, 2018 at 8:52 pm
    
    You have an extremely confused idea about the history of computing. (And I seriously doubt that you are as old as I am.) The Excel program started on the Mac. I got my first copy (version 1.0) in 1985. It was only several years later that it was finally ported to Windows. It was better than Lotus 1-2-3, which was the dominant spreadsheet in business from 1983 to the early 1990s. It was only after Windows was in version 3.1 that it was stable enough for general use. The reason Windows “won” was that it was built on top of PC-DOS which had been adopted by the businesses of the world. And I’m a committed R user, so this rant has nothing to do with the Python vs R discussion.
    
3. Sir Huddleston Fuddleston says:
  
  July 26, 2017 at 9:32 pm
  
  Hey, sorry to go on and on, but I re-read your original post, and I was struck by a conversation I had at a hackathon recently. It was with a 22 year old kid who only knew Python. He said the problem with Perl was that there were too many ways to do something, and Python was great because it limited you to the one “Pythonic” way of doing things.
  
  Since he had no exposure to other languages, he had no idea that many “Pythonic” ways of doing things simply suck. They’re hacks perpetrated on the language to work around its fundamental defects. Take functional programming. In Python, you’re stuck with ersatz workarounds for lambda functions in operating on lists, because, as Guido admitted a while ago (I’m paraphrasing) “Python wasn’t built to be functional, and we’re not wasting so many good minds on making it that way.”
  
  The dude didn’t realize that map/reduce was not a Google idea, that it had been around since Lisp and John McCarthy — and that of all the languages to finally implement functional constructs which had been around since Lisp, Python is the suckiest at it.
  
  Sorry, I kind of got off track there, into a Python rant. I was going to say that I showed this novice the Schwartzian transform in Perl, and it blew his mind. The thing about languages with decent functional constructs, like Ruby, Perl, and R, is, like Larry was fond of saying, more than one way to do it is valuable: you can write “baby language” Perl (or R), and it will work. You can advance in your knowledge of the language, and find more efficient ways to do things.
  
  Python’s “one way to do it” actually hurts adoption because for every goddamn thing, you have to go look up Python’s often non-obvious way to do it. If you don’t find it, you’re screwed.
  
  Anyway, I guess the point I was making is “more than one way to do it is actually a powerful thing”. But like all tools, it can be abused and it’s not for every job. Java libraries are so much more stable partly because the barrier to writing anything in it is so high that you have to put in the time (and also, static typing is good for big projects).
  
  Anyway, anyway, we should discuss this over a beer, if you’re ever in Cambridge, MA. I promise not to post any more.
  
  1. liberal says:
    
    October 29, 2017 at 10:25 am
    
    LOL. If you think that Python is a bad language and Perl and R are good languages, you’re a programmer, not a software engineer.
    
    1. Sir Huddleston Fuddleston says:
      
      October 29, 2017 at 3:04 pm
      
      I gave my reasons. Whereas you sound like a fanboi. More to the point, if you think Perl and R would be chosen by “programmers” vs Python, you don’t seem to realize that a) software engineering involves programming, and b) software engineers qua engineers would be more likely to choose Perl or R over Python, whereas a pure algorithm guy would not. It would appear that English is another language you’re not any good at.
      
Right Arrow Operator says:

October 7, 2017 at 12:40 pm

Regarding the operators of assignment, the differences are described here with examples: https://www.quora.com/Why-do-people-use-the-assignment-operator-instead-of-in-R-Is-there-any-difference-between-them

Pingback: AgricoLab | Out of the coding comfort zone
Hugo Toledo says:

October 8, 2018 at 11:13 pm

This article will never grow old. Thanks from all of us who continue to benefit from reading your wise words.

zednick zytnick says:

June 1, 2023 at 7:13 pm

I know a lot of other languages and I started learning R yesterday. It’s straight up a troll language written by autists. It’s as if someone set out to write a language as indecipherable as C but with their own twisted brain functions built in. Messing with the = sign, not having ++ or even +=, and forcing wonky nesting of if-else if-else structures are just beyond the pale.

