Data languages and salaries

Last month O’Reilly published its annual Data Science Salary Survey. It drew a lot of attention and was the most-read article on R-bloggers for several weeks. This is the second year of the report, an anonymous survey that shows which tools successful data analysts and engineers use and how those tool choices might relate to their salary. It is based on 800 respondents who work in and around the data space, from a variety of industries across 53 countries and 41 U.S. states.

They found that tools from what they describe as Cluster 3 (Python, R, Matlab,…) increase the average data scientist's salary by $1,900 per tool. By contrast, tools in Cluster 1 (SPSS, SQL, Excel, SAS,…) bring salaries down by $1,100 per tool. Specifically, the report states: “The median salary of respondents who use tools from Cluster 1 but not a single tool from the other four clusters is $82k, well below the overall median [which is $98,000]”.
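
To make the per-tool figures concrete, here is a minimal sketch that treats the two quoted effects as simple additive adjustments, holding everything else equal. That additivity is an assumption for illustration only: the report's model includes other variables, and the helper name `tool_salary_adjustment` is hypothetical.

```python
# Per-tool effects quoted from the O'Reilly survey, in US dollars.
# Treating them as purely additive is an illustrative simplification,
# not the report's full model.
CLUSTER3_EFFECT = 1_900   # e.g. Python, R, Matlab
CLUSTER1_EFFECT = -1_100  # e.g. SPSS, SQL, Excel, SAS


def tool_salary_adjustment(n_cluster3: int, n_cluster1: int) -> int:
    """Hypothetical helper: estimated salary shift (USD) from the number
    of tools used in each cluster, all else held equal."""
    if n_cluster3 < 0 or n_cluster1 < 0:
        raise ValueError("tool counts must be non-negative")
    return n_cluster3 * CLUSTER3_EFFECT + n_cluster1 * CLUSTER1_EFFECT


if __name__ == "__main__":
    # Three Cluster 3 tools and two Cluster 1 tools: 3*1900 - 2*1100
    print(tool_salary_adjustment(3, 2))
```

Under this simplification, someone using three Cluster 3 tools and two Cluster 1 tools would be expected to earn about $3,500 more than an otherwise identical respondent using none of them.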

The data was collected from Strata conference attendees, who represent a broad spectrum of data analysts. So I thought: why not use another source and focus on economics? I checked what LinkedIn says about salaries for data software skills in the economics sector in the USA, and these are the results:

*Salaries are in US dollars; the number of jobs is from LinkedIn USA, December 2014.

It seems that the highest salary is reached with the combination R + SQL or with R alone, but the largest number of openings, around 750, is for those who know SQL.

These results match O'Reilly's findings: R seems to be growing, and salaries are growing with it.

Education in the OECD

Economic theory and empirical research stress how important human capital is for economic growth and society’s welfare. The more educated and productive individuals are, the richer and freer societies are. To estimate potential economic growth, then, it is key to measure and compare human capital across countries and regions. Yet human capital is by definition very hard to measure: by human capital, economists mean knowledge, imagination, creativity…
The simplest way to measure human capital is to measure knowledge in a very specific field. This is what the OECD does every year in a survey of adults in 22 countries. The survey includes reading and maths questions, and its results always draw a lot of attention.

The latest results were published in a report in September and proved very interesting. The report compared literacy levels with variables such as educational attainment, unemployment and earnings.
One of the most striking results is how much literacy and maths levels differ from country to country. In fact, the average 18-year-old in Japan, Finland and the Netherlands has higher literacy and maths levels than the average postgraduate in Spain or Italy (postgraduates younger than 35) and, even more surprisingly, the average 16-year-old in Japan has literacy and maths levels similar to those of the average postgraduate in Spain or Italy.

The incentives to pursue tertiary education also vary widely by country (measuring incentives as earnings, without taking education costs into account). The average postgraduate worker in Chile, Brazil and Hungary earns twice the average salary of a worker with upper secondary education.
But if we assume free international labour markets, the best thing a postgraduate worker could do is move to the US, whereas a worker with below upper secondary education should move to Denmark.
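
The comparison in the last two sentences boils down to a ratio within each country and a maximum across countries. The sketch below assumes a hypothetical nested dictionary `earnings[country][level]` holding average earnings in a common currency (e.g. PPP-adjusted dollars) with made-up level names; the actual figures would have to come from the OECD report. Like the post, it ignores education costs.

```python
from typing import Dict

# Hypothetical layout: earnings[country][level] -> average earnings in a
# common currency (e.g. PPP-adjusted dollars). Level names are made up.
Earnings = Dict[str, Dict[str, float]]


def earnings_ratio(earnings: Earnings, country: str,
                   level: str = "postgraduate",
                   baseline: str = "upper_secondary") -> float:
    """Earnings at `level` relative to `baseline` within one country;
    2.0 means twice as much. Education costs are ignored."""
    return earnings[country][level] / earnings[country][baseline]


def best_country(earnings: Earnings, level: str) -> str:
    """Country paying the most at a given education level, assuming a
    worker could move freely between the countries in the table."""
    candidates = [c for c in earnings if level in earnings[c]]
    if not candidates:
        raise ValueError(f"no data for level {level!r}")
    return max(candidates, key=lambda c: earnings[c][level])
```

A ratio of at least 2.0 from `earnings_ratio` corresponds to the "twice the salary" cases mentioned above, and `best_country` picks the destination for each education level.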

Who is happy?

The European Social Survey is an extraordinary data set providing information about the social activities of 42,000 people in 22 European countries. Economists have been using it to analyse social behaviour. This 2006 paper by Benesch, Stutzer and the misbehaving Bruno Frey analyses the relationship between time spent watching TV and self-reported life satisfaction.

Interestingly, even after controlling for the major determinants of life satisfaction (financial satisfaction, feeling of safety, trust in people, social activities), time spent watching TV still has a statistically significant negative impact on happiness, and the more you watch, the unhappier it makes you, in an exponential way. (I think YouTube may have the same negative impact.)

Even though it is not the purpose of the paper, it is interesting to see that the most important factor for life satisfaction is financial stability (the desire to be rich has a negative impact, though), followed by engagement in social activities.

More specifically, according to the regression analysis, the happiest person is a woman, either in her early 30s or retired, who doesn't live abroad but lives on a farm or in a house in the countryside, is self-employed, volunteers in community service, is highly educated and married, lives without children at home and works around 30 to 35 hours a week.

The history of cultural diffusion

Nature published a 5-minute animation about the spread of culture and ideas across history and the world (from 600 BC to the present day), made by following the birth and death places of major figures in history such as Leonardo da Vinci. One can see the cultural activity of the Roman Empire and of the Renaissance in Italy, and the French, American and British cultural and scientific explosion of the XVII and XVIII centuries. It is clearly a bit Eurocentric, but it is interesting and beautiful anyway.


Life expectancy is accelerating

This week The Economist published male life expectancy at birth in the UK since 1971. According to their source, life expectancy in 1971 was 69 years; in 2012 it was 79. I'm not sure why they chose males instead of females (perhaps the data is less volatile for males). In 40 years British males’ life expectancy at birth has increased by 10 years, that's 3 months every year on average! The most interesting thing, though, is that life expectancy has been accelerating: the rate at which it increased in the 1970s was lower than today. I added to the original graph a black dotted line showing the best-fit curve. From it one can estimate that in 1970 life expectancy was increasing by 1.8 months every year, while in 2012 it was increasing by 4.2 months every year. That's an acceleration of roughly one extra month every 17 or 18 years (2.4 months over about 42 years).
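
The arithmetic in this paragraph can be checked with a few lines of Python. This sketch uses only the figures quoted in the paragraph, with the series starting in 1971, and assumes the fitted curve is roughly quadratic, so that the annual gain rises linearly between the two slopes read off the graph.

```python
# Figures quoted in the post (UK males, The Economist chart).
LE_START, LE_END = 69.0, 79.0        # life expectancy at birth, in years
YEAR_START, YEAR_END = 1971, 2012
# Slopes of the fitted trend line, in months gained per calendar year.
SLOPE_1970, SLOPE_2012 = 1.8, 4.2


def average_gain(le0: float, le1: float, y0: int, y1: int) -> float:
    """Average months of life expectancy gained per calendar year."""
    return (le1 - le0) * 12 / (y1 - y0)


def acceleration(s0: float, s1: float, y0: int, y1: int) -> float:
    """Change in the annual gain (months/year) per calendar year,
    assuming the gain rises linearly between the two dates."""
    return (s1 - s0) / (y1 - y0)


def total_gain_years(s0: float, s1: float, y0: int, y1: int) -> float:
    """Years of life expectancy gained if the annual gain rises linearly
    from s0 to s1: the average slope times the number of years."""
    return (s0 + s1) / 2 * (y1 - y0) / 12


if __name__ == "__main__":
    avg = average_gain(LE_START, LE_END, YEAR_START, YEAR_END)
    acc = acceleration(SLOPE_1970, SLOPE_2012, 1970, 2012)
    print(f"average gain: {avg:.2f} months per year")
    print(f"acceleration: {acc:.3f} months/year per year "
          f"(one extra month every {1 / acc:.1f} years)")
    print(f"implied total gain 1970-2012: "
          f"{total_gain_years(SLOPE_1970, SLOPE_2012, 1970, 2012):.1f} years")
```

It prints an average gain of about 2.93 months per year, an acceleration of about 0.057 months per year per year (one extra month roughly every 17.5 years), and an implied total gain of 10.5 years, close to the observed 10.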

There is, of course, a limit here: life expectancy can't increase by 12 months or more per year, because that would mean we were immortal. In any case, we are living longer at an accelerating rate.

Creative destruction

Mancur Olson pointed out many years ago that economies seem to grow much faster after major wars or other societal revolutions. That was the case for Japan, Germany and France after World War II. Olson's story was that wartime destruction and revolution dissolved the old vested interests and let new leaders come to the fore. Wars and revolutions remove the older generations and bring in new generations and technologies.

Another XX century economist, Joseph Schumpeter, argued that economic growth happens through so-called “creative destruction”: the replacement or destruction of old technologies and methods by new and more efficient ones. Understandably, this process is always opposed by the old establishment, and their opposition is likely to succeed because they tend to have a well-organised and powerful lobby. Economists such as William Easterly argue that institutions that defend economic freedom and protect individual economic liberties are the key for “creative destruction” to succeed. Daron Acemoglu goes one step further, though. His point in his new book “Why Nations Fail” is that freedom and individual liberties don’t happen spontaneously; they stem from inclusive institutions, i.e. institutions that represent a broad majority of society and where political power is not held by just a few. In other words, the political contest between evenly matched interest groups ends up settling on the lowest common denominator: individual liberty.

Therefore, from Acemoglu’s point of view, Olson’s observation about regeneration after war only holds IF “inclusive institutions” are in place. Otherwise the new generations will just supplant and replicate the previous extractive groups and protect the old technologies, just as freed American slaves did in Liberia or Mugabe did in Zimbabwe, to name a few.

Big data hype

An interesting lecture on statistics and the Big Data hype, delivered by Berkeley professor Terry Speed. Apparently we are at the end of the upward trend for Big Data, so the excitement will soon be over and expectations of what can be extracted from Big Data will soon be more… realistic. Meanwhile, let’s enjoy it.

Quality of Government

In 2010 the European Commission commissioned a report on the quality of government by region in Europe. The report was prepared by the research team at the Quality of Government Institute of the University of Gothenburg in Sweden.

The primary task of this project was to create data on quality of government (QoG). Although QoG data have proliferated since the mid-1990s, no quantitative measure of quality of government had so far been created at the regional level. By combining national-level expert assessments from the World Bank with the largest QoG survey to date focused on regional variation, they constructed the most complete quantitative estimates of QoG variation for 172 EU regions within 18 countries.

This study is important because numerous academic studies and statements by international organizations have emphasized that only with high-quality government can a country reap the benefits of economic growth and foster social development.

QoG was disaggregated into the following categories or pillars:

1) ‘corruption’,
2) ‘rule of law’,
3) ‘bureaucratic effectiveness’,
4) ‘government voice and accountability’/ or ‘strength of democratic and electoral

The general view is the following

One can see that, as expected, northern Europe scores highest on QoG and, not surprisingly, Italy and Eastern Europe lag behind.

If the EU countries are clustered into three groups, the result is this:

France, Belgium, Portugal and Spain make the middle group while Italy is part of the last group.

By pillar, the results are the following:

From PDF to CSV

One of the most annoying things about gathering data is that sometimes it only comes in PDF format. As you know, copying and pasting data from a PDF into Excel or Stata can be very painful.

Tabula, which came out some time ago, is a useful piece of free software for getting data tables out of countless PDF files. It really is simple to use: load a PDF file into Tabula, which runs on your own computer, highlight the table to extract, and the program does the rest.

Download Tabula here. Find out a little more about it on Source.

Europe's dynamic Historical Atlas

The Centennia Historical Atlas is an impressive program that shows border changes in Europe and the Middle East from the 11th century to the present.
Some dates are not very accurate, but it is still a nice piece of research.

Olympic Games by GDP

The Winter Olympics are about to start in Sochi, Russia, and in two years' time Rio de Janeiro in Brazil will host the Summer Games. Neither country is among the top 50 in terms of GDP per capita, so I wondered whether that is normal. I went back to the archives and managed to plot the following graph. It shows the GDP per capita (from Maddison) of the countries awarded the Olympic Games against the world's average GDP per capita at the time they were awarded.
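
The index in the graph can be written as a simple ratio. A minimal sketch, assuming the index is expressed as a percentage of the world average (100 = exactly average) and that the Maddison figures are supplied by the caller; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def host_income_index(host_gdp_pc: float, world_avg_gdp_pc: float) -> float:
    """Host GDP per capita as a percentage of the world average in the
    year the Games were awarded (100 = exactly average)."""
    if world_avg_gdp_pc <= 0:
        raise ValueError("world average must be positive")
    return 100 * host_gdp_pc / world_avg_gdp_pc


def index_series(
    hosts: Iterable[Tuple[int, float, float]]
) -> List[Tuple[int, float]]:
    """hosts: (award_year, host_gdp_pc, world_avg_gdp_pc) tuples.
    Returns (award_year, index) pairs sorted by year."""
    return sorted((year, host_income_index(h, w)) for year, h, w in hosts)
```

Plotting the output of `index_series` against the award year gives the kind of time series discussed in the next paragraph.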

During the 1980s and early 1990s the countries that hosted an Olympic event were much richer than the average, but after the latest bids the index has fallen to its lowest level since 1950.

Another interesting analysis is to look at the host country's total GDP rather than its GDP per capita. The following graph shows the evolution of the GDP (from the Penn World Tables) of the country that won the Olympic bid, relative to world GDP.

Again, we are at the lowest share. This means that poorer countries are able to host the Olympic Games more than ever before. I don't really know whether that is a good or a bad thing.

Freedom and language

In my last post I looked at the relation between GDP per capita and language, and found that English is around the average. Today I want to show the relation between language and freedom. Freedom here is measured as the average of the following four indices: the Freedom in the World index, the Index of Economic Freedom, the Press Freedom Index and the Democracy Index. All these indices are published by country, and from there we can derive a freedom-by-language measure by weighting each country by its population and language.
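
The post doesn't say how the four indices, which use different scales, were put on a common footing. One common approach, assumed here, is min-max rescaling of each index to 0-100 with higher meaning freer, followed by a simple average. Some indices run the other way (Freedom House's ratings, for instance, go from 1, most free, to 7, least free), so this sketch takes each index's worst and best values explicitly.

```python
from statistics import mean
from typing import Iterable, Tuple


def rescale(value: float, worst: float, best: float) -> float:
    """Map a raw index value onto 0-100, where 100 is the freest end of
    the scale. Passing worst > best handles indices where lower is freer."""
    if worst == best:
        raise ValueError("worst and best must differ")
    return 100 * (value - worst) / (best - worst)


def composite_freedom(scores: Iterable[Tuple[float, float, float]]) -> float:
    """Unweighted mean of rescaled indices; each item is (value, worst, best)."""
    return mean(rescale(v, w, b) for v, w, b in scores)


if __name__ == "__main__":
    # Illustrative country: a rating of 2 on a 1 (best) to 7 (worst) scale,
    # and 70 on a 0-100 index where higher is freer.
    print(composite_freedom([(2, 7, 1), (70, 0, 100)]))
```

The resulting country scores would then be weighted by each language's speakers to get the freedom-by-language measure.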

The following graph shows the result.

English is clearly above the average and above the other big languages like French and Spanish.

GDP per capita by language

Some recent academic papers have found a relation between language and savings and debt. Apparently, some languages encourage people to save and reduce debt while others do the opposite, simply through their grammatical rules and the way they express the future.

It is then sensible to think that language may be linked to GDP. The following graph shows GDP per capita by language. The data was gathered from the IMF, the World Bank and the CIA.

The first (most spoken) language of each country was allotted 80% of the country's population and the second language the remaining 20%. Although that can't be true for every country, I found it a good estimate globally; I couldn't find more detailed information.
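
The 80/20 allocation amounts to a speaker-weighted average per language. Here is a minimal sketch of that calculation; the record layout (`Country`) and the function name are hypothetical, and giving a country's whole population to its first language when no second language is listed is an assumption the post doesn't spell out.

```python
from collections import defaultdict
from typing import Dict, List, Optional, TypedDict


class Country(TypedDict):
    population: float
    value: float                 # e.g. GDP per capita or a freedom score
    first_language: str
    second_language: Optional[str]


def by_language(countries: List[Country], first_share: float = 0.8,
                min_speakers: float = 1_000_000) -> Dict[str, float]:
    """Speaker-weighted average of `value` for each language.

    The first language gets `first_share` of a country's population and
    the second language the rest. With no second language, the whole
    population goes to the first one (an assumption).
    """
    weighted_sum: Dict[str, float] = defaultdict(float)
    speakers: Dict[str, float] = defaultdict(float)
    for c in countries:
        second = c["second_language"]
        shares = [(c["first_language"], first_share if second else 1.0)]
        if second:
            shares.append((second, 1.0 - first_share))
        for lang, share in shares:
            n = c["population"] * share
            weighted_sum[lang] += n * c["value"]
            speakers[lang] += n
    # Keep only languages above the speaker threshold, as in the graph.
    return {lang: weighted_sum[lang] / speakers[lang]
            for lang in speakers if speakers[lang] > min_speakers}
```

Here `value` would be GDP per capita, and the default `min_speakers` mirrors the one-million-speaker cutoff used for the graph.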

The graph only shows languages with more than a million speakers.

English is in the middle of the graph because many poor countries in Africa speak English, and the same happens with French.