Tuesday, February 24, 2015

If only descriptions listed all allophones of each phoneme..

I'm at a workshop and made a little doodle that I thought y'all might like. It expresses my inner desires, should I ever move from grammatical typology to phonological typology.


Phonology is the study of which sounds in languages are used to distinguish meaning; phonemes are the sounds that make a difference in meaning. Phonemes don't have sounds, they are sets of multiple sounds with distinctive boundaries to other phonemes - other sets of sounds. A phone is an individual instance of a sound. Allophones are all the phones that together make up a phoneme. Phonemes typically have one phone that is the most frequent and is used as shorthand for the entire set. For example, in Swedish [s] and [z] are allophones of the same phoneme, but we usually represent this set by /s/ because [s] is the most common one. However, when we compare this to, for example, English, where [s] and [z] are very different animals, in fact so different that swapping them creates a change in meaning - i.e. they are different phonemes - then we cannot compare English /s/ directly to the Swedish case of /s/. They consist of different sets.

Basically the point is this: phonemes don't have sounds, they are sets of several sounds that together form a unit. The boundaries that these sets have to other phonemes in the system are crucial to comparing them appropriately. This means that in order to do typology on them we need to know all the member allophones of a phoneme and/or the crucial boundaries to other phonemes.
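To make the "phonemes are sets" point concrete, here is a minimal sketch in Python. It only encodes the [s]/[z] example from above; the dictionary layout and the helper names (`phoneme_of`, `same_label_same_set`) are my own illustration, not anything from PHOIBLE or from a real description.

```python
# Toy model: a phoneme is a labelled set of allophones (phones).
# The inventories below only encode the [s]/[z] example from the text.
swedish = {"/s/": frozenset({"s", "z"})}
english = {"/s/": frozenset({"s"}), "/z/": frozenset({"z"})}


def phoneme_of(inventory, phone):
    """Return the label of the phoneme whose allophone set contains `phone`."""
    for label, allophones in inventory.items():
        if phone in allophones:
            return label
    return None


def same_label_same_set(inv_a, inv_b, label):
    """True only if both inventories have `label` with identical allophones."""
    return label in inv_a and label in inv_b and inv_a[label] == inv_b[label]


print(phoneme_of(swedish, "z"))  # /s/
print(phoneme_of(english, "z"))  # /z/
print(same_label_same_set(swedish, english, "/s/"))  # False
```

The last line is the whole argument in miniature: the label /s/ is shared, but the sets behind the label are not.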

Most often, however, we don't have this. In order to do something useful anyway, we should not compare two systems directly by just counting whether they have the same phonemes or not. Instead we should take the articulatory features of the phonemes (+voice, +plosive, +rounded etc.) and see how similar two phonemes are in those features: how many edits we have to make in order to get from one phoneme to the other. Ideally we'd also like to rank features, with voicing perhaps being less important than manner, for example.
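The feature comparison described above could be sketched like this. Everything here is an assumption made for illustration: the four-feature bundles are far too small for real work, and the weights (voicing at 0.5, the rest at 1.0) are just one possible way of ranking features, not an established one.

```python
# Hypothetical, heavily simplified feature bundles (True = "+", False = "-").
FEATURES = {
    "p": {"voice": False, "plosive": True, "fricative": False, "labial": True},
    "b": {"voice": True, "plosive": True, "fricative": False, "labial": True},
    "s": {"voice": False, "plosive": False, "fricative": True, "labial": False},
    "z": {"voice": True, "plosive": False, "fricative": True, "labial": False},
}

# Assumed weights: voicing counts for less than the other features.
WEIGHTS = {"voice": 0.5, "plosive": 1.0, "fricative": 1.0, "labial": 1.0}


def feature_distance(a, b, weights=None):
    """Sum the weights of the features on which segments a and b differ.

    Without weights, every differing feature counts as one edit.
    """
    fa, fb = FEATURES[a], FEATURES[b]
    total = 0.0
    for feature in fa:
        if fa[feature] != fb[feature]:
            total += 1.0 if weights is None else weights[feature]
    return total


print(feature_distance("p", "b"))           # 1.0 (one edit: voicing)
print(feature_distance("p", "z"))           # 4.0 (every feature differs)
print(feature_distance("p", "b", WEIGHTS))  # 0.5 (voicing is down-weighted)
```

Two inventories could then be compared by matching up their segments by smallest distance rather than by identical labels, but that matching step is left out of this sketch.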



This is the topic of my doodles today. See Steven Moran's PhD thesis and database for more on this. For more on what a segment is and how it differs from a phoneme, I highly recommend reading pages 8-10 of Steven's thesis.

References
Moran, Steven. 2012. Phonetics Information Base and Lexicon. PhD thesis, University of Washington. (free PDF here)

Moran, Steven & McCloy, Daniel & Wright, Richard (eds.) 2014. PHOIBLE Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://phoible.org, Accessed on 2015-02-23.) 

Saturday, February 21, 2015

New map for Ethnologue and International Mother Language Day!

The 21st of February is International Mother Language Day! Which means that the rest of this post was originally written in Swedish.

Ethnologue has recently produced an interactive map of the living languages in their catalogue, and they posted it again today on their Facebook page in honour of the day. Go and click around on their map here! Always remember that Ethnologue classifies how endangered a language is differently from UNESCO, whose map you can find here.

Copyright © 2015 SIL International

There are now only 10 minutes left of Saturday where I am, so I'm posting this before International Mother Language Day is over.

On the topic of standardising linguistic terminology, pt 2

(Pt 1 is a longer text pointing to different resources for linguistic terminology and to some of the issues that I bring up here; you can read it here. I continuously update that text, so please take care with dates.)

This is a topic that comes up often when one talks to typologists, descriptivists and database-minded people (and frustrated linguistics students), so I thought I'd get back on it again and share some thoughts.

Recently this topic has been popping up very often for me. I've for example been talking about it with Bettina Klimek of Agile Knowledge Engineering and Semantic Web (AKSW) in Leipzig, and just recently with Asya Pereltsvaig of Stanford and the excellent site Languages of the World. This is an interesting topic to me and very dear to my heart. Having read grammars and tried to fill in typological questionnaires, all these issues become much more clear and concrete - it's even one of the reasons we started this blog in the first place. I'm thankful for all the discussions on this topic I've had so far. Please share your thoughts with me so that we may advance the discussion further. In the interest of sharing information, please leave comments on this post on the blog, not on Facebook, Tumblr, Twitter etc.

Basically, the issue is this: linguistics is a scientific field with a lot of terminology, and the terms sometimes contradict each other or overlap in non-optimal ways. There's confusion, more in some areas than in others, but it's extremely hard to find an area that is perfectly free from any terminological controversy. What one author means by "aspect" might not be the same as what another means by it. The major variables that seem to be important, in my personal experience, are

  • what language family does the described language belong to?
  • what geographical area does the described language belong to?
  • was the description in any explicit framework/model/theory?
  • when was it written?
  • in what language was the description written?

Much of linguistic terminology comes from theoretical linguistics and linguistic typology and is often coloured by a certain framework, survey etc. At the same time as we have this great swarm of terminology, we'd also like to be able to do successful comparative work, and we'd like to be sure that we are indeed comparing like with like. We'd also like to, you know, be able to read each other's papers and understand what is going on.

Now, how to get at this? In the natural sciences one strives towards standardisation of terms, so perhaps this is also the way to go for linguistics? But what happens when one's study object (people and languages) changes constantly and when different angles of analysis can have such huge effects? What happens when we haven't studied them all in enough detail yet? How is such a standard to be created and maintained (cf herding cats)? And finally, what happens if those standardised terms are applied uncritically to languages, resulting in a misrepresentation of a language which skews our typology? (This is what I elsewhere call translation grammars from "typologese".)

I'd like to argue that striving for complete universal standardisation of linguistic terms is not only impossible but also destructive to the field. We do indeed need to have certain shared basic assumptions, but for a detailed and pragmatically useful definition that can be realistically applied in a specific study (be it comparative or specific) we must carve out those specifics explicitly every time and always remain critical of previous categories. Otherwise there is a danger that we will only find what we thought to look for and stagnate as a field, and that is sad.

Now, don't get me wrong: I don't think that we can expect every descriptive linguist and every comparativist to take every term apart into its atoms, and I do understand and appreciate the need for shared conventions. If we had to define every term all the time we wouldn't have time for anything else. Science is not only about objective discovery, it is also about interaction with other scientists. This requires us to be able to understand each other's work, compare, repeat studies etc. In order to interact with the rest of the scientific community we adapt our definitions to some sort of norm, this is natural.

In fact, this is very much how we as linguists believe that speakers/signers of languages behave. In many ways these discussions are not that dissimilar to discussions I sometimes have with language prescriptivists who argue that "well if we are to be as linguistically liberal as you say then everyone would create their own language and no-one would understand each other". The point of language is to communicate (or at least that's one of the points) and the point (or at least one of the points) of science is to interact with each other and by shared knowledge reach higher than any one of us could have reached alone. To accomplish those functions we cannot completely diversify into our own separate islands.

That being said, I believe that language doesn't require the explicit regulation that prescriptivists would like in order to fulfil that function, nor do I believe that linguistics needs a collective enterprise of "once and for all standardising all terms so that we can get something done for once!". As a side note, this was actually kind of the aim of GOLD (General Ontology for Linguistic Description):

Originally [GOLD] intended to build a single termset that all could use, but this goal was soon seen to be unattainable: there was too much diversity in terms between linguists and sub-communities in linguistics, and considerable reluctance to change them. An ontology, through which these diverse termsets could be linked, thus made the most sense.

Both ends of the spectrum are problematic. I've read some descriptions that are full of explicit motivations - but in such a foreign framework (for me) that they are extremely hard to get at. I've also read plenty of descriptions with under-defined categories that leave me asking "was this really an article... or was it a determiner.. how can I know?!"

I believe it is possible to strive towards more transparency about the motivations for our analytical choices than is now the case, and I do think this will improve our understanding of linguistic diversity. I don't think entirely individualistic term sets are realistic or good, nor do I think one standard is good. Just as I believe that speakers/signers can maintain a norm with definitions shared enough to be able to communicate well enough, so do I believe that linguists can make good linguistics without explicitly defining a norm. The absence of explicit standards encourages people to re-evaluate, discuss and be critical of previous categories. If there was a consensus and standard, I believe this would happen less often and that our research would suffer from it.

Please do note that internal systematicity is paramount, be it in descriptive or comparative work. The norms of the large field of linguistics might be dynamic and flexible, but the terms you're applying in your specific research cannot be. This might seem obvious, but it deserves to be said.

If you later would like to go on and compare different descriptions (comparative work) or even compare different typological surveys (levelled up typology) then you do need to work through the definitions, at least the most critical ones for your purpose, and investigate what can be compared directly and what cannot.

As for the overwhelming task of the descriptive linguist, you need not re-invent the wheel just because you are to be explicit about your categories. There are plenty of already existing definitions that one can rely on. The key is 1) to be explicit about the fact that you are using just those and not taking them for granted and 2) to be critical when applying them to your specific case. I made a section here on the blog before, which I sometimes update, on the different resources available, with some further discussion of these issues.

Now, we can actually do typology without reading grammars and relying on other people's analysis. We can for example do typology via parallel texts (post here) or use tasks and questionnaires designed for gathering data directly from the speakers/signers. Such work has been pioneered at the Max Planck Institute for Psycholinguistics in Nijmegen, you can see some of their stimulus sets and questionnaires here. These kinds of methods actually go all the way back to the foundation of modern typology and Berlin and Kay and their work on the typology of color words, you can see some of its modern successors here.

Lastly I'd like to bring up a point that I discussed at greater length in the previous post, the issue of keeping comparative and descriptive terms separate. We should not equate terms of description with terms of comparison, the datapoints exist in different systems and therefore have different concerns. A datapoint in a descriptive work must be true to the material gathered and make sense among the other datapoints of that language, whereas the datapoint in a comparative work must work on the parameter of other languages. They cut up the space differently because of this.

One way of looking at it is that comparative categories are tools and not necessarily "true" entities in themselves (see previous post for overview of Dahl, Bybee and Haspelmath's ideas on this). A standardisation of comparative categories might be more possible and less destructive than standardisation of all terminology, but I don't know what this will gain us really? Stagnation is a danger here too.

So, in closing: we do have shared conventions and norms, this is not weird or bad. These norms are however not to be taken for granted, nor are they detailed enough to be used directly in descriptive or comparative work. In order to do interesting comparative and descriptive work we need more detailed and pragmatically applicable categories. These categories need not, in fact must not, be standardised. If they were standardised we would have problems with stagnation of research. Both ends of the spectrum - entirely new terms all the time and a complete standard - are bad for our discipline. We need to have more discussions, all of us, on how we should best proceed.

I'd like to, and I can always, write more on this, but I will stop for now. Next time I'll try and recap the Floyd & Haspelmath-debate for some concrete examples of these issues.

Thursday, February 19, 2015

Come join #lingwiki 2! Living in ACT? Even better!

Interested in improving articles on linguistics on Wikipedia? Join in a collaborative editing session sometime during the 28th-29th of March. It doesn't matter where you are, you can participate online or at one of the meet-ups that will be happening. If you're a linguist living in the ACT area in Australia and want to edit together on Saturday the 28th, get in touch with Hedvig by filling in this form right here!

Don’t know how to edit Wikipedia, don’t know what to write? Don't worry, there will be instructional material provided and suggested topics. If you go to a meet-up you'll get even more help.

Wikipedia is often the first go-to place for an overview of a topic, and as such it is what represents our field to the public, other researchers, other students and also potential future students. Wikipedia is a collaborative enterprise, it is what we make it.

There are many surprisingly good articles on topics in linguistics on Wikipedia, but many are also severely lacking. Either they don’t exist or they are labeled as “stubs”. Stubs are incomplete articles and they can be lacking in different ways. In these sessions we're mainly targeting linguistics stubs (underdeveloped articles), under-documented languages and biographies of important linguists (in particular women and people of minority background). But you’re welcome to work on any part of Wikipedia that interests you. Go have a browse on the WikiProject page for linguistics for more info.

Don’t worry about writing the one and only comprehensive article on one topic that you are an expert in, this is mainly about fixing faults and contributing basic knowledge to the public. By adding information from Glottolog, Omniglot, Glottopedia, Ethnologue or WALS, or making use of other resources available to us, we can already improve many articles significantly. And don’t forget the articles on linguists and linguistic terms. Basically, you have skills as a linguist or linguistics student that are very useful, even if you don’t know everything about everything.

This is all on the initiative of Gretchen McCulloch of the blog All Things Linguistic, Lexicon Valley and McGill University (Canada). You can read Gretchen's report on the last lingwiki here and some info about it on the Humans Who Read Grammars blog here. The local event in ACT is being organised by Hedvig Skirgård.

In order to connect with others around the world who are participating we’ll be using the #lingwiki hashtag on twitter and elsewhere. So if you want to comment about what you’re doing or just spread the news use that tag. Regardless of whether you'll be active on the hashtag, check out the how-to slides here and make sure to fill out the survey afterwards at some point over the weekend so that your edits can get added to the summary list. 

Wikipedia articles have certain rules and conditions, and there are other editors and bots (automated programs) that will go through and check your edits. Make sure to keep a bit of an eye on the articles you’ve worked on for some time afterwards, just in case someone responds to your edits. Often there's an easy fix - don't worry.

Tuesday, February 17, 2015

Illustrating current questions in research on linguistic diversity

In relation to the previous post about the grand challenges of current research in linguistics, I'd like to bring up four questions that are currently the focus of much contemporary research in linguistic diversity, and then illustrate them with maps. They are as follows:
  1. Why are there so many languages in some places and few in others?
    • Why do some languages have more internal variation than others?
  2. Why are there so many language families and isolates in some places and not in others?
  3. Why are languages in certain areas so similar to each other despite not being related to each other, whereas other languages that are in contact are less similar? 
  4. What is the possible design space of language and why do languages cluster in that space the way they do? 
To illustrate these questions I'd like to present you with a series of maps. I really recommend reading this post here about things to think about when you're reading maps of languages.

First some preliminaries:

THE BASIC WORLD MAP PROJECTION OF WORLDMAPPER 
This here is the basic map projection from Worldmapper. The different states of the world are coloured differently, with some suggestions of larger areas being more similar (Central and Southern Africa, Pacific, South America etc). The map projection (how they have chosen to represent a 3D sphere in a 2D map) is similar but not identical to Gall-Peters.


THE WORLD MAP DISTORTED BY POPULATION 
Here is that same map, but in this case the size of each state has been distorted in proportion to the size of the human population in that state. You can have a look at many more on their site, including proportion of internet users or income.


© Copyright Sasi Group (University of Sheffield) and Mark Newman (University of Michigan).

LANGUAGES PER STATE (question 1)
Now, let's have a look at the number of languages per state. This map looks different from the one distorted by population. Most notably Nigeria, Papua New Guinea and Vanuatu are much larger, whereas Europe and the Middle East are smaller. Mexico is bigger too, have a look!


© Copyright Sasi Group (University of Sheffield) and Mark Newman (University of Michigan).

ONE DOT PER LANGUAGE (question 1)
This here is a map from Ethnologue 2009; each dot is one language according to their classification. It complements the above map by, for example, showing WHERE in Australia and Mexico there are many languages.
© 2009 SIL International 

THE GENUS WALS SAMPLE (PROXY FOR PLACE WITH GREAT GENEALOGICAL DIVERSITY) (question 2)
A genus in the World Atlas of Language Structures (WALS) is a group of genealogically related languages with a shared history dating back no longer than 3500-4000 years. This is different from "family": by family, linguists most often mean the very top level of genealogical units. The time depths of families can vary greatly: Proto-Uralic goes back to 7000-2000 BC (estimates vary) whereas Dravidian is only reconstructed to around 500 BC. Using genera as a unit makes for a more even time depth.

Anyway, WALS features a core sample of 200 languages that is supposed to be included in all chapters, facilitating direct comparison. This sample is meant to be balanced so as not to overrepresent any area disproportionately, meaning that if we look at where these languages are we get a hint at where there is great genealogical diversity. The 200-language WALS sample contains 87 families and 175 genera. Each dot on the map is one language and the form and color of the dot corresponds to one genus. There's an interactive version that you can click around on here. Mind you, this is not a perfect illustration, but it gives you an idea.

SIMILAR BECAUSE OF CONTACT? MAP OF LANGUAGES WITH TONE (question 3)
This is from the World Atlas of Language Structures and Maddieson's chapter on tone. This map features 527 languages: 132 with simple tone systems, 88 with complex ones and 307 with no tone. What I want to show you here is that there are areas with great genealogical diversity, such as Mainland South East Asia and West Africa, where the languages are nevertheless structurally similar.
Maddieson, Ian. 2013. Tone. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/13, Accessed on 2014-12-17.)

MULTI-VALUE FEATURES IN APICS AND PERCENTAGES (sub-question to question 1)
So, for the question: Why do some languages have more internal variation than others? This is a hard one, if not impossible, to find illustrations for, simply because we don't have that kind of data. The only kind of proxy illustration that I can think of is the maps of APiCS, where languages can have more than one value. APiCS stands for the Atlas of Pidgin and Creole Language Structures and is a sister project to WALS in CLLD. Unlike WALS, APiCS specifically targets contact languages and features that are interesting for contact languages. APiCS does not, thank god, make any classification into what is and what is not a creole, pidgin etc. It covers 76 languages that are commonly accepted as occurring somewhere on that spectrum. Some say that it under-represents contact languages that are not lexified by Indo-European languages, and that might be true. I'll leave that to other people and another time.

Why do I bring this up in relation to internal variation? Well, you see, in contrast to WALS, languages in APiCS can have more than one value for a feature, for example both SVO and SOV. In addition, each datapoint is given a confidence level (excellent idea). If a language has more than one strategy, WALS had guidelines that directed which one you'd pick as the representative (probably most often the most frequent one), but in APiCS we get all possible constructions. We don't get to know what the variation correlates with: perhaps it's always SVO in subordinate and SOV in main clauses, or perhaps everyone on the south side uses one construction and everyone on the north another. This we cannot always know. However, if we are interested in variation within a language - no matter what it is conditioned by - this is actually useful. And it might be that variation that is conditioned by certain grammatical functions is also a marker of a change in progress, just like regional variation etc. So, it is interesting.
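The difference between the two coding styles can be sketched in a few lines of Python. The language names and percentages below are made up for illustration and are not taken from APiCS, and `dominant_value` is only my guess at what a "pick one representative" guideline amounts to, not the actual WALS procedure.

```python
# Hypothetical multi-value datapoints: language -> {word order: share in %}.
# Names and numbers are made up for illustration, not taken from APiCS.
datapoints = {
    "Language A": {"SVO": 100},
    "Language B": {"SVO": 70, "SOV": 30},
    "Language C": {"SOV": 50, "SVO": 50},
}


def dominant_value(values):
    """Collapse to a single value, as a one-value-per-language coding would.

    Returns None when no single value has the largest share.
    """
    top = max(values.values())
    winners = [value for value, share in values.items() if share == top]
    return winners[0] if len(winners) == 1 else None


def has_internal_variation(values):
    """True if more than one value is attested for the feature."""
    return len(values) > 1


for language, values in datapoints.items():
    print(language, dominant_value(values), has_internal_variation(values))
```

In the single-value view, Language A and Language B look identical (both "SVO"); only the multi-value view shows that one of them varies.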

I wouldn't say that APiCS maps are good at representing where there is more or less internal variation in the world's languages. For one, they only target contact languages, which already makes a huge difference, and we also cannot vouch for how systematic these percentages are across languages and whether they can be directly compared. But as illustrations of internal variation go, this is the first thing I thought of.

Here is feature 11 in APiCS: order of frequency adverb, verb and object.

And yes, the language up there in the middle of Canada that is being so varied is Michif - one of the coolest languages on our planet.

RESTRICTIONS IN DESIGN SPACE (question 4)
In the interest of not overloading your brains I'll go for a very simple illustration for the fourth question about clusters in the design space - a table of word order frequency. If we imagine all the possible orders for what we have analysed as "subject", "object" and "verb" in languages of the world we come up with 6 orders. This is our possible design space in our model of analysing languages. We could instead think about thematic roles, constituents, topic-comment etc - but for the sake of simplicity and familiarity let's just consider the traditional concepts of subjects, verbs and objects. In this case, we're also ignoring any variation and just picking the "dominant" order for each language.
If all orders were equally plausible we'd find an even distribution of languages. We do not. Languages do not spread out evenly in this design space that we've created and that Dryer has surveyed. This might be because of other features of the language, with sets of features that travel together, it might be due to information structure and memory, or it might just be that the set of 7 000 languages we have today is quirky. This, to me, is one of the major questions in contemporary linguistics:

What are the historical, evolutionary, psychological, communicative, cognitive and social restrictions on the nature of language and its distribution in the possible design space?
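The word order design space described above is small enough to enumerate directly. The sketch below lists the six orders and compares a set of counts with what an even spread would look like. Note that `toy_counts` is made up purely to show the shape of the comparison; it is not Dryer's data, and `compare_with_even_spread` is a name I've chosen for illustration.

```python
from itertools import permutations

# The design space: every ordering of subject (S), object (O) and verb (V).
orders = ["".join(p) for p in permutations("SOV")]
print(orders)  # ['SOV', 'SVO', 'OSV', 'OVS', 'VSO', 'VOS']


def compare_with_even_spread(counts):
    """Pair each order's observed count with the count expected if all
    six orders were equally common."""
    total = sum(counts.get(order, 0) for order in orders)
    expected = total / len(orders)
    return {order: (counts.get(order, 0), expected) for order in orders}


# Made-up counts, only to show the shape of the comparison.
toy_counts = {"SOV": 50, "SVO": 40, "VSO": 20, "VOS": 6, "OVS": 3, "OSV": 1}
for order, (observed, expected) in compare_with_even_spread(toy_counts).items():
    print(order, observed, expected)
```

With real counts in place of the toy ones, the gap between the observed and expected columns is exactly the clustering that the question above asks us to explain.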

So.. Professor Martin Hilpert, I give you (originally in French) my seven questions of contemporary linguistic research (or at least the ones I can come up with right now):

Why are there more languages in some places and fewer elsewhere?
Why do some languages have more internal variation than others?
Why are there more language families and isolates in some places?
Why are the languages of some places more similar despite not being related?
Which characteristics of languages are more affected by contact and by being related?
What are the logically possible characteristics of languages?
What are the reasons that the languages of our world are distributed the way they are?

</show-off> (I just felt like writing that in French, it's late in the night here in Canberra so one can go a bit wild and speak French to oneself, it's ok.)

I hope you found these to be interesting questions and good illustrations, though I realise they aren't especially new. But oh well, in any case. Have a nice evening!

P.S. I'd also like to bring up the issue of sign language typology and how it relates to contact linguistics - which is one of the most interesting things in the universe. But let's leave that for a later time, ok?

The Grand Challenges of Current Linguistics: what would you list?

At the SLE conference (Societas Linguistica Europaea) in Poznań 2014 there was a round table discussion on where current linguistics is heading and what the most important problems are that we as a field need to tackle. It was called "Quo vadis linguistics in the 21st century" and the participants were very interesting and experienced researchers. I'd like to share some of my own thoughts and invite you to share yours with us.

The brilliant blog Diversity Linguistics Comment made a longer post where you can read what all the participants presented and what they see as the most important issues right now. It is a very good read, and not that long. Anyone interested in linguistics should read and really take in what these people are saying.

One of the participants didn't contribute a text to that compilation blog post, instead he contributed a video. This is the excellent linguist Martin Hilpert of the University of Neuchâtel. He makes videos regularly on his channel. I really, really recommend watching them. Here is his video for that discussion, watch it. I don't know how I can stress this enough.


I've been asked before to give advice to new students on things to read etc, and I often find that task overwhelming, there is so much to recommend! I think I'll start a new tag of posts here, tips_for_people_interested_in_linguistics. Out of all the things I'd like to recommend, his videos are some of the best and you should totally go check them out.

Now, back to the topic of the video and the discussion that they had in Poland in the European summer of '14. Discussions like this, about what is important in our field, are crucial to the advancement and health of a scientific field. We need to have them and always keep them in the back of our minds. What kind of questions are we trying to answer? Why are those questions interesting? Are there better ways of answering them? What are the achievements that I consider important in my field? Where do those guide me?

I personally think a lot about this, in fact I've had several dips in my academic life where I've been pondering these things almost to the point where I couldn't do anything else. That's when I try and talk to others and ask them how they motivate research, which is a sometimes rewarding and sometimes not rewarding exercise. I really appreciate when non-linguists talk about this with me, or when senior linguists ask me hard questions where I have to motivate why what I'm doing is interesting. It's scary, but it's good. If this is what I am going to be doing then I need to be clear on why. I must say, I am very happy to read about this round table discussion and the points people are bringing up, I sometimes feel a bit alone in my existential moments of critique and problem finding. This gives me hope and makes me interested in continuing.

Studying and doing research in fields of basic research is to a certain extent a luxury. We're adding to the knowledge of humankind for the sake of doing just that, extending what we know about the world and ourselves without necessarily considering if what we do has practical implications for the world and society around us. This is by no means unique to the social sciences, there is plenty of basic research in physics, biology and elsewhere. Sometimes it grieves me that it is more common to ask representatives of the social sciences to justify basic research than it is to ask the same of representatives of the natural sciences, but I try to not let that bother me too much and instead engage in as meaningful a discussion as the context allows.

I am going to try and have a long hard think about these two questions that Hilpert has posed and return with my thoughts to you, and I invite you to do the same. Tell us your thoughts, they need not be long (don't worry) and if you'd like you can be anonymous.

What are the great achievements of your field of research, in your opinion?

What are the current big challenges of our field?

EDIT: See also this post about illustrating Grand Challenges in linguistics

Sunday, February 15, 2015

Typology as a method - not an area of study

I feel like this Sunday deserves a good quote for some additional food-for-thought before the next week sets in. This time it's quite an old quote by Dahl: as he's talking about quirks of Standard Average European, he first declares something that I find to be very true:

I have regarded typology as a method rather than as an area of study in its own right: it is one of several ways to find out about the nature of human language [free PDF of paper here]

As we're pursuing knowledge about what human languages are, what they can be and what limitations there are on that design space, we can ask many questions that can be answered by different methods - one of them is systematic cross-linguistic comparison - typology. Other questions are more suitable to be dealt with by studying the acquisition of language, or the mapping of language use and the brain, or the history of languages etc.

References
Dahl, Östen (1990) Standard Average European as an exotic language. In Toward a typology of European languages, ed. by Johannes Bechert, Giuliano Bernini and Claude Buridant, 3-8. Berlin: Mouton de Gruyter. [free PDF here]

Friday, February 13, 2015

Are these linguistic features of languages really interesting to correlate, or are the similarities muddled by shared family history or contact?

Seán Roberts and James Winters have produced some nice illustrations of the so-called Galton's problem in linguistics. This problem, as it applies to linguistics, can be formulated like so:

How do we know that a set of features in languages are correlated independently from shared genealogy or contact?
Some might ask: why is this even a problem? Well, we are aiming to understand language as the human capacity that we've had for 100 000+ years all over the world, with all its great diversity, and the possible design space of language (what the limits are for what language can be). We also want to know why the languages we have data on (the living ones today and a few dead ones) cluster the way they do in that design space. For that we'd like to know which variables and data points are dependent and which are independent, so that we can understand the reality and what is likely to affect it.

Now, that being said, correlations that are dependent on family or contact are not uninteresting, but if that is the case we'd like to test for that so we know for sure.

They have done several posts on spurious correlations (for example number of babies and word order), and in general lots of very interesting stuff. I would reblog every post they make, but in the interest of not flooding you I recommend that you just start reading their blog as well.

Here's the blog post, go read it!

There are of course also logically dependent features in datasets that one should always be aware of when handling typological datasets with lots of features. One such example is the position of polar question particles and the type of marking of polar questions in WALS. That is, one language cannot be coded both as having no question particles (as the dominant marking of polar questions) and as having sentence-final question particles. That just logically does not work with the way the features are set up. The feature of position of question particles is logically dependent on the existence of polar question particles (as the dominant marking strategy, not the only one). If we mash up these two features of WALS, we can actually see that there are no violations of this logical dependency.
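A check like that mash-up can be sketched in a few lines of Python. This is only a minimal sketch under loud assumptions: the language names, the two dictionaries and the value labels are hypothetical simplifications made up for illustration, not the actual WALS codings or value names.

```python
# Hypothetical, simplified codings. The labels are NOT the real WALS value names.
marking = {  # dominant strategy for marking polar questions
    "Language A": "question particle",
    "Language B": "no question particle",
    "Language C": "question particle",
}
position = {  # position of the polar question particle
    "Language A": "final",
    "Language B": "no question particle",
    "Language C": "initial",
}


def find_violations(marking, position):
    """Return languages coded as having no question particle in `marking`
    that are nevertheless given a particle position in `position`."""
    violations = []
    for lang, strategy in marking.items():
        pos = position.get(lang)
        if pos is None:
            continue  # language is not coded for both features, nothing to check
        if strategy == "no question particle" and pos != "no question particle":
            violations.append(lang)
    return sorted(violations)


print(find_violations(marking, position))  # prints [] for the toy data above
```

An empty list means the two features are coded consistently for every language that has a value for both.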

Now, that being said, WALS and many other databases of typological features are often anthologies of several different surveys by different researchers. Sometimes this can lead to discrepancies. So, if possible, try to compare surveys done by the same person(s). This is actually not a problem in WALS, as far as I can see. The authors generally keep to very different areas that do not overlap enough for this to be a problem.

p.s. if you don't have access to interesting datasets such as those they have been using but would like to play around with correlations, well, first off you actually do have access to quite a large set (see list here), but you can also play with Gap Minder and Google N-Grams. When you're using data from a wide time period, always be aware that the methods and goals of gathering data 200 years ago are not directly comparable with those of data gathered 3 years ago. The kinds of books published in the 1820s are not the same as the kinds of books published in the 2000s (consider for example Harlequin novels, which are actually a large part of book publication nowadays).

Wednesday, February 11, 2015

#lingwiki 2

There will be a second collaborative editing session of linguistics-related topics on Wikipedia. You can read Gretchen's report on the last one here and some info about #lingwiki on our blog here. It's once again being coordinated by the brilliant Gretchen McCulloch from Allthingslinguistic. I'll pass some information from her on to you now:

It will take place the last weekend in March, the 28th-29th. We'll have a peak 3-hour online window between 11pm-2am GMT (that's 7pm-10pm US-EST, 6pm-9pm US-CST and 4pm-7pm US-PST on Saturday, and 7am-10am Singapore time and 10am-1pm Australian EDT on Sunday).
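If your timezone is not in that list, the arithmetic is easy to redo yourself. Here is a minimal Python sketch that converts the start of the peak window. The fixed UTC offsets are my own assumptions: they are the ones implied by the local times quoted above (which correspond to daylight saving time in the US and eastern Australia in late March 2015), and the zone names and the `local_start` helper are just made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Start of the peak window as stated above: 11pm GMT on Saturday 28 March 2015.
start_utc = datetime(2015, 3, 28, 23, 0, tzinfo=timezone.utc)

# Assumed fixed UTC offsets in hours (implied by the local times quoted above).
OFFSETS = {
    "US Eastern": -4,
    "US Central": -5,
    "US Pacific": -7,
    "Singapore": 8,
    "Australian Eastern": 11,
}


def local_start(zone):
    """Return the start of the peak window in the given (assumed) zone."""
    tz = timezone(timedelta(hours=OFFSETS[zone]))
    return start_utc.astimezone(tz)


for zone in OFFSETS:
    print(zone, local_start(zone).strftime("%Y-%m-%d %H:%M"))
```

Add your own offset to `OFFSETS` to get your local start time; the window then runs for three hours from there.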

To create a larger timespan for you personally to edit in, just start earlier, continue later, or a little bit of both, whatever's comfortable for your timezone and preferred sleeping habits! If the peak window doesn't work for you at all, also feel free to edit any time on the weekend of March 28-29.

We'll be using and keeping track of the #lingwiki hashtag. Regardless of whether you'll be active on the hashtag,  check out the how-to slides here and make sure to fill out the survey afterwards at some point over the weekend so that your edits can get added to the summary list. As mentioned before, feel free to get together with a couple people in your local area if you want some company. And if there's interest, we might also set up a google hangout so that we can feel like we're in the same room together.

We'll be reminding everyone again as the dates draw closer.

Monday, February 9, 2015

Exciting research on what happens when deaf people who use different sign languages have to communicate!

Kang-Suk Byun of MPI Nijmegen is currently in India researching cross-signing, it's super cool! Go check out this video here where he explains more.

Sign language is the coolest, sign language typology and contact phenomena are even more the coolest.

When people talk about turn taking

It's been such seriousness here lately, it's time for some scientifically motivated gifs!



Whenever people talk about turn taking (the switching of who's speaking in a conversation) and in particular the acquisition of this skill, I can't help but think of the talking twin babies of the you tubes. They are an excellent example of the fact that turn-taking is something that we as beginner humans do and practice before learning "words". Go watch the video and you'll see what I mean, those are some highly competent conversational partners. Or just try and make a sound or a sign to a baby and watch it evaluate the turns in the conversation and perform very well.

How does turn-taking relate to diversity and description, you might ask? Well, we know far too little about the acquisition of non-WEIRD languages, and this is a very, very important part of understanding language, both at the specific level and the comparative. Do languages have comparable learning curves? What different types of exposure are there? Do all adults talk baby talk to babies? What difference does it make when you learn language from your peers instead of from adults? How do deaf children acquire language compared to hearing children?

And of course:

What does this mean for our understanding of the constraints on the possible design space of language?

Thursday, February 5, 2015

Are you a linguistics student who wants to learn statistics?

I suspect that there are students of linguistics reading this blog, as well as graduated linguists and non-academic language enthusiasts. Well, if you are a linguist and want to learn statistics you might want to check out this book that is available for free online. I love me some freely available good stuff, and even though this is an old work it's still a very good introduction. You should be reading newer literature too, but if you're just looking to get started this is an easy way to get moving. I actually used this book more than the literature I bought and paid for during my introduction to linguistics. It does not have the most recent and updated methods, but many things are still valid and relevant.

WEIRD and LOL-languages

We've heard before about WEIRD languages and communities, i.e. Western, Educated, Industrialised, Rich and Democratic. The point of that label is to highlight the fact that much of the research that has been done in the study of human culture (psychology, linguistics, anthropology, sociology etc) is only representative of a small subset of all humans that live on this planet of ours. Most notably there is currently the "Making Science Less WEIRD"-initiative, you can read more about that and other related items here.

Now, prof. emeritus Östen Dahl has come up with a new useful abbreviation: LOL. It stands for Literate, Official and with Lots of users. I learned this from reading the abstracts of the last linguistics conference at the dept of linguistics at MPI-EVA. Here is an excerpt from that abstract that explains the motivation behind this term:

I think, however, that we may also be led a bit astray by the catchy acronym WEIRD in that the adjectives it encapsulates are not necessarily the most adequate for characterizing the biases that have influenced linguistics. It is true that Western (mainly European) languages have been in the focus for a long time; however, even after the Eurocentric bias has started to lose its grip on the choice of languages to be studied, there remains a bias that can be summed up in the acronym “LOL” for “Literate, Official, and with Lots of users”. Even in typological works, the bias is visible. (...) It turns out that there is a very restricted set of LOL languages which are overrepresented in almost any sample. It can also be shown that these languages as a group have a distinct typological profile.  

Sounds like a very good piece of criticism to me, I'm interested in seeing what's coming next.

Here is some awesome free stuff to read on the topic of WEIRDness, in alphabetical order

Cysouw, Michael (2011a). “Quantitative explorations of the worldwide distribution of rare characteristics, or: the exceptionality of northwestern European languages”. In: Horst Simon & Heike Wiese (Eds.), Expecting the Unexpected: Exceptions in Grammar. Berlin: De Gruyter Mouton. 411-431. [free PDF]

Cysouw, Michael (2011b). “Some more details about the definition of rarity”. 437-441. [free PDF]

Dahl, Östen (1990) Standard Average European as an exotic language. In Toward a typology of European languages, ed. by Johannes Bechert, Giuliano Bernini and Claude Buridant, 3-8. Berlin: Mouton de Gruyter. [free PDF here]

Henrich, Joseph, Heine, Steven J., & Norenzayan, Ara (2010). The weirdest people in the world? Behavioral and Brain Sciences 33, 61-83. doi: 10.1017/S0140525X0999152X [free PDF here]

Majid, A., & Levinson, S. C. (2010). WEIRD languages have misled us, too [Comment on Henrich et al.]. Behavioral and Brain Sciences, 33(2-3), 103 [free PDF here] 


Tuesday, February 3, 2015

Goodbye to Linguistics at MPI-EVA and current research

In Leipzig there is the Max Planck Institute for Evolutionary Anthropology (MPI-EVA). There is a linguistics department there and their director is Bernard Comrie. He is now retiring and the department is closing down. This department has been very, very important in the research field of linguistic typology (systematic cross-linguistic comparison). I cannot begin to list all their contributions, why don't you just have a look here? Perhaps you've heard of WALS?

In honour of the department's history and also to discuss the future of research into linguistic diversity they are organising a closing conference this spring: Diversity Linguistics: Retrospects and Prospects. They're inviting former and current researchers of the department to come and present their work.

If you are interested in linguistics and want to know more about current research in linguistic typology/diversity linguistics then go have a look at the abstracts of this conference. Looking at abstracts of interesting workshops and conferences is generally a good idea if you want to keep updated, even if you can't go there. My interest was for example piqued by, among others, Östen Dahl's talk on "How WEIRD are WALS languages?", Sebastian Drude's talk on "Languages, "languoids", ISO-codes and the Glottolog: Creating reference systems for language diversity and variation", Ulrike Zeshan's talk on "Early pidginisation of incipient signed jargon" and Jeffrey Heath's "Typology and extreme languages". If you look you'll also find that our fellow tumblrer Linguisten is participating; Jan Wohlgemuth will be talking on "A typology of language naming principles". So, much excitement to say the least!!

In 2013 they hosted the biennial conference of the Association for Linguistic Typology (together with the University of Leipzig). In the closing of that conference Johanna Nichols talked about what this department has meant for the field and how we now need to search for a new such forum for typologists to exchange ideas and collaborate. What will happen next remains to be seen, but this is indeed the end of an era.

Also, don't worry: there is still linguistics in the Max Planck Society, primarily at the department of Language and Cognition at the MPI for Psycholinguistics in Nijmegen and the newly started department of Linguistic and Cultural Evolution at the MPI for the Science of Human History in Jena (read more about the Max Planck Society and this newly started department in our post here and on the DLC blog here).

Monday, February 2, 2015

"Just Thinking..." #2: what if some isolates are old creoles?

I was just thinking..
What if some isolates are old creoles? 
If there are really old creoles around, would we not first classify them as isolates? Or as "really weird and simplified members" of their main lexifier's genealogical group? How would we know that they are old creoles if we don't have the sociohistorical context?

I was talking to my friend Abbie Hantgan the other day, she's worked on an isolate language called Bangime [bang1363, dba] and we got to thinking. What if Bangime is an old Dogon lexified creole? What if this is true of more isolates?

Very often we know about the existence of contact languages because we know the socio-historical context of their creation. But that type of knowledge doesn't go back that far, and there is no reason to assume that there weren't contact languages before that. So how do we recognise languages that are old creoles? If we didn't know better perhaps we've been classifying them as isolates, i.e. languages with no living relatives. The oldest creoles we know about today are probably the Portuguese lexified creoles of West Africa (for example Angolar and Cape Verde) that are supposedly from 1500-1550 (Daval-Markussen p.c.).

Bakker, Daval-Markussen, Parkvall and Plag (2011) proposed certain features that are more common in creole languages, and a huge debate ensued (see this for example). Later Michaelis, Haspelmath and Blasi (2013) also showed that there are significant similarities between contact languages.

Soo... what would happen if we compared isolates + contact languages/creoles with the rest of the languages of the world? Are some isolates really similar to contact languages/creoles? The ones I can think of off the top of my head do not fit the "profile", but then again I don't know about every language that has been called an isolate.

Perhaps it's just silly and stupid.. but it'd just be neat, wouldn't it? And it's not like it's difficult to test.
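Just to make "not difficult to test" a bit more concrete, here is one possible and very naive way to start, sketched in Python: for a pair of languages, compute the share of features on which they have the same value, counting only features coded for both. Everything here is hypothetical: the feature names, the values and the `shared_share` helper are made up for illustration, and this is not the method used in any of the studies cited above.

```python
def shared_share(a, b):
    """Share of features coded for both languages on which they agree.

    Returns None when the two languages have no coded features in common."""
    common = [feature for feature in a if feature in b]
    if not common:
        return None
    same = sum(1 for feature in common if a[feature] == b[feature])
    return same / len(common)


# Hypothetical feature codings, invented for this example.
isolate = {"f1": "yes", "f2": "no", "f3": "SVO"}
creole = {"f1": "yes", "f2": "yes", "f3": "SVO", "f4": "no"}

print(shared_share(isolate, creole))
```

A real test would of course need actual data, a comparison against the similarity between non-isolates and creoles, and some control for family and contact (Galton's problem again).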

How many isolates and contact languages are there again? Just as a reference point:



References
Bakker, Peter, Aymeric Daval-Markussen, Mikael Parkvall & Ingo Plag. 2011. Creoles are typologically distinct from non-creoles. Journal of Pidgin and Creole Languages 26(1). 5–42. (PDF here)
Campbell, Lyle (unpublished) Language Isolates and Their History, or, What’s Weird, Anyway? (PDF here)
Hammarström, Harald & Forkel, Robert & Haspelmath, Martin & Nordhoff, Sebastian. (2014) Glottolog 2.3. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://glottolog.org, Accessed on 2015-02-02.)
Hantgan, Abbie (2010) A Grammar of Bangime (Draft) (PDF here)

Lewis, M. Paul, Gary F. Simons, and Charles D. Fennig (eds.). 2014. Ethnologue: Languages of the World, Seventeenth edition. Dallas, Texas: SIL International. Online version: http://www.ethnologue.com.

Michaelis, Susanne Maria & Maurer, Philippe & Haspelmath, Martin & Huber, Magnus (eds.) 2013. Atlas of Pidgin and Creole Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://apics-online.info, Accessed on 2015-02-02.)

Michaelis, Susanne & Martin Haspelmath & Damián Blasi (2013) Grammatical simplicity in a cross-linguistic perspective: APiCS meets WALS. Presentation at the workshop: Creole and pidgin language structure in cross-linguistic perspective

Join the tribe of people making linguistics online better

The people behind Ethnologue recently did a blog post where they talk about the upcoming new edition*. They also write:

We're really grateful for the Ethnologue staff here and for contributors and commenters all around the world who point out many things that need to be fixed--may their tribe increase!

In case you didn't know, you can contribute with feedback to the Ethnologue that they will take into consideration for future editions. Other similar sites also have methods for you to contribute, so I thought I'd do a brief post about how you can contribute to making linguistics on the internets better. I'll cover four different resources that you can contribute to and make a difference.

This is especially relevant if you are doing research on a specific language or set of languages that is not well described (i.e. practically most languages). You can contribute with your specialised in-depth knowledge about what sets your language apart from the neighbours, what name the community themselves prefer to be called etc.

Join the tribe!

1) contribute to Ethnologue (and ISO 639-3)
2) contribute to Glottolog
3) contribute to Wikipedia
4) contribute to Glottopedia

1a) ETHNOLOGUE 
Ethnologue is a catalogue of the world's languages, edited and distributed by the Summer Institute of Linguistics (SIL) International. Ethnologue contains lots of information: besides classifying all known living language varieties (and plenty of dead ones too) into dialects, languages and families, it also contains information about speaker populations, writing systems and much more.

If you want to give feedback on information in Ethnologue you can either email them at Ethnologue_Editor (at) sil.org or become a registered user and submit feedback directly on the page for the language. Each language page has a little feedback section, here is the one for Gambian Wolof for example. It's very easy to become a user, don't be scared that it is complicated.

However, if you want to give feedback on whether a language should be split into several or lumped with others, or suggest new languages not yet described, then you need to give feedback to ISO 639-3.

1b) ISO 639-3
SIL International also maintains an international standard of language names, the ISO 639-3. This is the three-letter code that you commonly see referenced next to language names, for example: Senegalese Wolof [wol], Ikulu [ikl] and Alutor [alr]. The ISO 639-3 is updated every year, usually around the 21st of February (International Mother Language Day). You can contribute to it by submitting a change request. Right now that process is a bit complicated; they plan on making it more effective soon by creating web forms.
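Since the codes are always three lowercase letters, usually written in square brackets, they are easy to pull out of running text. Here is a minimal Python sketch; the pattern and the `extract_codes` helper are my own illustration, and note that it only checks the shape of a code, not whether that code is actually assigned in ISO 639-3.

```python
import re

# Three lowercase letters inside square brackets, e.g. "[wol]".
ISO_CODE = re.compile(r"\[([a-z]{3})\]")


def extract_codes(text):
    """Return all bracketed three-letter codes found in `text`, in order."""
    return ISO_CODE.findall(text)


text = "Senegalese Wolof [wol], Ikulu [ikl] and Alutor [alr]"
print(extract_codes(text))  # prints ['wol', 'ikl', 'alr']
```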

Ethnologue is not exactly the same thing as the ISO 639-3 codes. Yes, they are both edited by SIL International and every entity that is deemed a language in ISO 639-3 is counted as a language in the Ethnologue. However, 639-3 is a set of codes for the standardisation of language names (and thereby also languages), whereas the Ethnologue also makes statements about languages' genealogical relationships to each other, number of speakers/signers, vitality and more. The International Organisation for Standardisation does not govern Ethnologue, only ISO 639-3.

There are other ISO codes that concern languages, but the 639-3 is the only one that is curated by SIL International and that is concerned with what is construed as the "language-level" of analysis at a global level (i.e. not families or dialects, and for the entire world). The other codes are linked to the 639-3, and it is also the most frequently used code set.

2) GLOTTOLOG
Glottolog is another resource that provides classification and meta-information about languages. In Glottolog you can find:
  • classification of language varieties into dialects, languages and families
  • genealogical relationships between languages
  • citations for the classification of varieties and the genealogical relationships between languages
  • bibliographies of languages, i.e. lists of references that treat different languages
  • alternative names for languages
  • codes (glottocodes) for each node in the trees, i.e. not only for the "language-level" but also families, dialects and everything in-between
  • links to ISO 639-3, other CLLD-sites, MultiTree etc.
  • location of languages (dots not polygons, i.e. longitudes and latitudes)
These classifications are not identical to those of Ethnologue and ISO 639-3; in short, it is often the case that Glottolog is more splitting than Ethnologue. Personally I find it very handy that each of these decisions is accompanied by a reference, that way I know why it has been decided the way it has.

There is no information in Glottolog on number of speakers or vitality. However, knowing how many descriptions there are of a language is HIGHLY useful.

Would you like to contribute to Glottolog in some way? Perhaps you know a reference that deals with a language but it is not listed, perhaps you know an alternate name that Glottolog should know, perhaps you disagree with the classification of varieties into languages and dialects? There are two different ways you can contribute: email or GitHub.

E-mail: Each node (language, family, isolate, dialect) has a glottocode and a page. This is the page for the North-Central Atlantic branch of the Atlantic-Congo Family for example and this is the Essin dialect of the language Bayot.  Each of these glottocode-specific pages has a little alarm bell. See picture to the right. Click that thing, it will take you to whatever email client you have set as default on your web browser. Now you can write whatever it is you want to pass on to the Glottolog editors concerning that languoid.

Github: Github is a place online where people can share code. All CLLD-sites are run through Github. This alternative is for people who either don't mind creating a user on Github for doing a few simple things, or for people who already are users of github. You can also use Github to have discussions, for example through "issues". Issues can be tagged, assigned to people etc. There is a repository for Glottolog data, you can submit issues here.

3) WIKIPEDIA
Wikipedia has revolutionised the spread of knowledge, of course also in linguistics. This is often the first place people look for information to orient themselves in a new field. And as such it is plenty annoying when one finds mistakes, but you know what? You can contribute to Wikipedia, we all can!

Earlier this year there was a collaborative effort to improve linguistics on Wikipedia, #lingwiki. If you are new to Wikipedia editing this is a great place to start, read more about it on Gretchen McCulloch's blog and also here on Humans Who Read Grammars. There is information there about what articles need work and how Wikipedia editing works.

Also, have a look at this manual for how to edit wikipedia in general. Not all pages on Wikipedia are protected, meaning you can edit them without being a registered user. However, I strongly recommend that you register as an official user, it is quick and easy and it lets you keep track of what you have going on.

Would you perhaps like to join the next online collaborative Wikipedia edit-a-thon?

4) GLOTTOPEDIA
Glottopedia is Wikipedia for Linguists by Linguists. Glottopedia is what happened when the people at WikiLingua at the University of Trier and Linguipedia at the MPI in Leipzig merged. It's free, it's directed to linguists and you can contribute yourself. There are already lots of articles and it's run by several very prominent scholars of linguistics. Also, our wonderful fellow tumblrer Linguisten (aka Jan Wohlgemuth) is an editor!

Glottopedia works almost exactly like Wikipedia, you can read more about editing Glottopedia here.


Ok, that's it for now. I hope you found this useful and that you will join the tribe of people improving these different resources, it is always needed and appreciated. For the most recent edition of Ethnologue they made nearly 60,000 updates and corrections. It is not easy maintaining a resource on so many entities that is also ever changing. Sometimes helping out with this is easier than you think, even correcting things that you find "obvious" is very helpful.
-------------

* Yes, you read that right. Thought that the most recent edition that came out last year would stick around for a while? Apparently not. Ethnologue says that they've got the production process organised effectively in such a way that they can update much more often. The 17th edition that is currently in effect came out in 2014, the previous ones came out in 2009, 2005, 2000 and 1996 respectively.