The Data Deleters: a bill proposed by Senators Lee and Rubio would wipe people of color off federal maps


High-quality data are now a matter of life and death. We depend on these data to plan safe roads, regulate toxic chemicals, and track infectious diseases. In my own field, microbiology, we’ve seen amazing advances in medical and environmental research from making large, standardized DNA sequencing datasets accessible to everyone.

The same applies to geographic data in the social sciences. Where racial disparities are occurring, high-quality datasets will help to uncover them. Where issues like housing availability are closer to equitable, analysis of Big Data can suggest what policies helped. These connections bind data science to civil rights, in areas from housing discrimination to environmental racism to voter suppression.

Many of us have spoken out against recent assaults on climate change data maintained by the EPA. Fundamental geographic databases crucial for policy analysis are now also under attack.

A stealth anti-science and anti-civil rights provision is embedded in proposed Senate bill S.103, the “Local Zoning Decisions Protection Act of 2017”, and its companion bill H.R. 482 in the House. Section 3 of those bills would supersede all existing laws to ban all federal funding for using or maintaining geospatial databases that track racial disparities in our communities. It would also ban databases that track disparities in access to affordable housing. I confirmed that this language is still present as of 1/30/17.

The key provision, Section 3, is only one sentence long (you can read the official full text of the bills for yourself at the links above):


Notwithstanding any other provision of law, no Federal funds may be used to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.

The provision is absurdly broad and intentionally vague. Using federal funds to do any research at all on racial discrimination could potentially violate it, if that research involves a ‘database’ that records ‘geospatial information’ – a low bar that most research studies would trip over.

We scientists like to hedge, to find nuance.  But this provision is quite simple. There is no rational, non-discriminatory basis for a blanket ban on studying inequality using federal databases. If you thought racial discrimination in housing were minimal, you’d want more data out in the open, not less. You’d want giant billboards up everywhere showing how fair our housing policy was. The only reason to place a blanket ban on researchers building and using geospatial databases of racial disparities is if you know discrimination is happening, and you want to cover it up.  

That is precisely what Senator Mike Lee (R-UT), Senator Marco Rubio (R-FL) and 22 representatives from the House have done in this bill.

I’m sure that, when pressed, some might argue that state databases will take up some of the slack.  Perhaps, but that would still produce a situation worse than the status quo, in which we have both federal and state databases.  It would also make national analyses of racial disparities much harder. Data science benefits greatly from consistently annotated and easily accessible datasets.  The most likely effect of eliminating federal databases of racial inequality would be to have fewer data, and to make those data that are available  harder to access and compare.  This will limit the power of research studies, and potentially cast doubt on their results.

Discriminated against in housing?  Now they can claim there is ‘no scientific data’ to back it up.  Because they deleted it.

Trying to get a federal research grant to study racial disparities in zoning law? You won’t, because that study would require you to build a database, and now funds can’t be used for that purpose.

Fighting a chemical plant in your backyard? Good luck arguing that its placement was discriminatory if your expert witnesses don’t have the data on community racial disparities to build you a map.

This provision has no purpose other than to shield discrimination. Its effect is to literally wipe the concerns of people of color from Federal maps.

As scientists, we will safeguard key datasets from political interference. As citizens, we will resist cowardly attempts to shield discrimination and racism from the bright light of public scrutiny. As decent human beings, we will never support legislation that makes it harder for people in our communities to drink clean water or find a place to live, no matter what the color of their skin.

The House bill is currently in the House Committee on Financial Services. The Senate bill is in the Banking, Housing, and Urban Affairs Committee. Citizens from all over the country have expressed their shock and outrage over these bills on social media, calling for their rejection. Advocates and scientists from every state are now calling their representatives to demand action, and spreading the word online. A list of senators and representatives sponsoring this bill can be found below. If your representatives are on this list, please call them and express your opposition. If they are not, please call your representatives and make them aware that they should oppose this bill. If you live in Utah or Florida, I especially urge you to demand that Senators Lee and Rubio withdraw this bill, and let them know that this betrayal of fundamental American values will not soon be forgotten.

Thanks everyone for your help and support.

Acknowledgements: I would like to thank Dr. Jan Marie Eberth, a professor at the University of South Carolina’s Arnold School of Public Health and Deputy Director of the South Carolina Rural Health Research Center, who alerted me to this bill.

P.S. Here are the Senators and Representatives sponsoring or co-sponsoring the bills (see the official page here).


Sponsor: Sen. Lee, Mike [R-UT] (Salt Lake City Office, Phone: 801-524-5933, Fax: 801-524-5730)

Co-sponsor: Sen. Rubio, Marco [R-FL]* (Phone: Miami office: (305) 418-8553; Orlando office: (407) 254-2573; Tampa office: (813) 287-5035)


Sponsor: Rep. Gosar, Paul A. [R-AZ-4] (Phone: (480) 882-2697)

Cosponsors: (as of 1/31/17. See here for current official cosponsors):

Rep. Biggs, Andy [R-AZ-5] 01/24/2017
Rep. Franks, Trent [R-AZ-8]* 01/12/2017
Rep. McClintock, Tom [R-CA-4]* 01/12/2017
Rep. Rohrabacher, Dana [R-CA-48]* 01/12/2017
Rep. Buck, Ken [R-CO-4]* 01/12/2017
Rep. Webster, Daniel [R-FL-11]* 01/12/2017
Rep. Yoho, Ted S. [R-FL-3]* 01/12/2017
Rep. Blum, Rod [R-IA-1]* 01/12/2017
Rep. King, Steve [R-IA-4]* 01/12/2017
Rep. Massie, Thomas [R-KY-4]* 01/12/2017
Rep. Smith, Jason [R-MO-8]* 01/12/2017
Rep. Joyce, David P. [R-OH-14] 01/20/2017
Rep. Duncan, Jeff [R-SC-3]* 01/12/2017
Rep. Blackburn, Marsha [R-TN-7]* 01/12/2017
Rep. DesJarlais, Scott [R-TN-4]* 01/12/2017
Rep. Duncan, John J., Jr. [R-TN-2]* 01/12/2017
Rep. Babin, Brian [R-TX-36]* 01/12/2017
Rep. Burgess, Michael C. [R-TX-26]* 01/12/2017
Rep. Poe, Ted [R-TX-2]* 01/12/2017
Rep. Sessions, Pete [R-TX-32]* 01/12/2017
Rep. Brat, Dave [R-VA-7]* 01/12/2017
Rep. Grothman, Glenn [R-WI-6]* 01/12/2017



2/3/17 Updated to fix a typo (I put 1/30/16 instead of 1/30/17).  Thanks to Greg Caporaso  for spotting the mistake.

GCMP: Visit to Penn State and First look at Australia Data



We recently had a great trip to Penn State, where we visited with Mónica Medina  and her group. Ryan McMinds, Becky Vega Thurber and I headed out there to work with Mónica’s group on the Global Coral Microbiome Project (GCMP).  Mónica, Joe Pollock, and the whole lab were amazing hosts. We stayed in Mónica’s house and had the chance to spend some time with her lovely tia and wonderful daughters.

The overall project aims to understand the microbes living on reef-building corals, which are thought to play key roles in corals’ resistance or vulnerability to environmental stressors like climate change and algal competition.  We are working with the Earth Microbiome Project to assess bacterial diversity in a large global collection of coral samples. In the meantime, we are moving forward with a subset of samples collected in Australia.

For this project, we have enough preliminary data from our previous work and the literature on coral microbiomes to form fairly specific hypotheses.  So we decided to be fairly formal about framing our key hypotheses, the testable predictions for each hypothesis, and planning ahead of time many of the specific analyses we’d do to test those predictions.

Some of the key questions we’re trying to address include:

  • How do different ‘habitats’ within a coral, such as mucus vs. tissue vs. skeleton, differ in microbial community?  We predicted that the microbial community in the coral surface mucus layer (SML), which is a key interface between the coral and its environment, will be more strongly influenced by local environmental factors than the microbial community within coral tissues. We predicted that the tissue community would be more driven by the evolutionary history of the coral.
  • Have distantly related corals with similar life-history strategies converged on similar microbiomes?  We are testing a number of concrete predictions in this area for features ranging from the abundance of microbial antibiotic production pathways (we predict there will be more in stress-resistant corals) to the extent of inter-individual variability in different types of corals.
  • Can we identify clear cases of co-evolution between corals and their microbes?  A key prediction of the coral holobiont theory is that corals and their bacteria are symbiotic partners that have co-evolved over long periods of time.  This is a challenging idea to test, but our sampling scheme was designed to have enough power to try to address these questions.  We’re first assessing what bacteria, archaea and Symbiodinium lineages are found in most or all of our coral specimens, and will then move on to evolutionary analyses of these groups.
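The "found in most or all of our coral specimens" step in the last question can be illustrated with a minimal sketch. This is not the project's actual pipeline: the function name `core_lineages`, the `min_fraction` parameter, and the toy counts are all assumptions made up for illustration.

```python
from collections import Counter


def core_lineages(samples, min_fraction=0.9):
    """Return the lineages present in at least `min_fraction` of samples.

    `samples` maps a sample id to a dict of lineage -> read count.
    A lineage counts as present in a sample if its count is above zero.
    """
    if not samples:
        return set()
    presence = Counter()
    for counts in samples.values():
        presence.update(lineage for lineage, n in counts.items() if n > 0)
    threshold = min_fraction * len(samples)
    return {lineage for lineage, k in presence.items() if k >= threshold}


# Hypothetical toy data: three coral samples, made-up lineage names and counts.
toy_samples = {
    "coral_A": {"bacterium_X": 120, "bacterium_Y": 3, "symbiont_Z": 40},
    "coral_B": {"bacterium_X": 80, "symbiont_Z": 10},
    "coral_C": {"bacterium_X": 15, "bacterium_Y": 0, "symbiont_W": 22},
}

strict_core = core_lineages(toy_samples, min_fraction=1.0)  # in every sample
loose_core = core_lineages(toy_samples, min_fraction=0.6)   # in most samples
print(strict_core, loose_core)
```

The threshold is a judgment call: a strict "all samples" core is sensitive to sequencing depth, while a looser cutoff lets through lineages that are merely common.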

These are just a few of the ideas we’re kicking around at the moment. Any of these predictions may very well be incorrect, and we’re happy to find that out – mostly we’re excited to have data in hand, and grateful to the large network of coral scientists that helped us get to the point where we can start testing these predictions in a more  definitive way than previously possible. The results from some of these evolutionary and ecological questions will help to inform model-building in later stages of the project, where we are going to test whether incorporating information on microbial diversity can improve models of which coral species are vulnerable to disease and bleaching.

During our visit, we worked with Joe Pollock to start analyzing DNA sequence data for this project, and with Styles Smith to connect these data to ongoing bacterial genome sequencing efforts. Having everyone in the same room proved to be very useful for advancing the project rapidly. We now have Symbiodinium (ITS2) data for most of these samples, and bacterial/archaeal (16S rRNA gene) data for all of them. Although long OTU-picking and beta-diversity runs ate up the first few days, we were able to summarize this sample set into some nice tables for publication; conduct basic quality control, OTU picking, and core diversity analyses; revise our taxonomic annotations of Symbiodinium diversity (more on this later); set up our organizational system; and get a preliminary look at what the data are telling us about four or five of our key predictions.

Right now we’re working on a Dropbox model, with standardized folders (input/output/procedure subfolders) for different sub-analyses, and using IPython notebooks or bash scripts to record analysis procedures.  So far this is working fairly well, although it helps being in the same place to rapidly coordinate what’s happening in each folder, especially if multiple people are contributing to the same analysis. Disambiguating those types of synchronous contributions is a place where a more formal version control platform like GitHub could be advantageous.  For example, the Earth Microbiome Project is coordinating their analysis in this way (see here). We may still go there, but for now the team is small and connected enough that we might avoid the overhead.
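For anyone wanting to copy the folder convention, here is a small sketch of how the input/output/procedure layout could be created consistently for each new sub-analysis. The helper name `make_analysis_folder` and the example analysis name are hypothetical; our actual folders were simply set up by hand in Dropbox.

```python
import tempfile
from pathlib import Path

# The three standardized subfolders described above.
SUBFOLDERS = ("input", "output", "procedure")


def make_analysis_folder(root, analysis_name):
    """Create root/analysis_name with input/output/procedure subfolders.

    Safe to call repeatedly: existing folders are left untouched.
    """
    analysis_dir = Path(root) / analysis_name
    for sub in SUBFOLDERS:
        (analysis_dir / sub).mkdir(parents=True, exist_ok=True)
    return analysis_dir


# Demonstration in a throwaway temporary directory (not a real Dropbox path).
with tempfile.TemporaryDirectory() as tmp:
    created = make_analysis_folder(tmp, "beta_diversity")
    created_names = sorted(p.name for p in created.iterdir())
    print(created_names)
```

Notebooks or bash scripts would then live in `procedure`, reading from `input` and writing to `output`, so that anyone opening a folder knows where to look.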

In any case, many thanks to everyone in Mónica’s group for a great visit. As this analysis matures,  I’ll try to write more on approaches we’re taking to connect our microbiological data to coral functional traits and life-history strategies.

Short Teaching Module: Perspectives on Microbial Community Change in Health and Disease

I recently put together a short interactive teaching module on microbial community change in health and disease.  Students reacted well to the exercise, so I thought I’d share it here (see below for materials). The basic idea is that after a short introduction getting folks excited about the microbiome, students break up into small groups and try to figure out what kinds of community changes might underlie some disease scenarios.  The group then discusses these ideas together, and relates each scenario to a real example.

The main goal of the lesson is to introduce many core ideas in microbial ecology, like alpha-diversity, beta-diversity, richness, evenness, etc. in a very short period of time, using examples that will be relevant to many folks. A secondary goal is to introduce the utility of tables of bacterial abundance across samples for sorting out these different patterns.  A natural follow-on would be to actually convert those tables to electronic form and have students use them in microbial community analysis tools, or write their own Python scripts to quantify these patterns (which could be improved with statistical tests later on).
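As a starting point for that kind of follow-on, here is a minimal sketch of what a student script might compute from one column (or two columns) of an abundance table. It is not part of the lesson materials. The metrics chosen (observed richness, Shannon diversity, Pielou's evenness, and Bray-Curtis dissimilarity as one example of beta-diversity) and the toy counts are my assumptions for illustration.

```python
import math


def richness(counts):
    """Observed richness: number of microbes with a nonzero count."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over microbes that are present."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S); returned as 0.0 when S < 2,
    where J is undefined (a convention chosen for this sketch)."""
    s = richness(counts)
    if s < 2:
        return 0.0
    return shannon(counts) / math.log(s)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (0 = identical, 1 = no overlap)."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical counts for four microbes in two samples.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]

print(richness(even_sample), richness(skewed_sample))  # same richness
print(pielou_evenness(even_sample), pielou_evenness(skewed_sample))
print(bray_curtis(even_sample, skewed_sample))
```

The two toy samples have the same richness but very different evenness, which is exactly the kind of distinction the scenarios are meant to draw out.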


For example, this first scenario illustrates a case where one microbe (the red dots) is present in all the diseased samples, but none of the healthy ones. This might represent a classic bacterial disease caused by a single pathogen, and a likely candidate for fulfillment of Koch’s postulates.
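Once a scenario is in table form, this "present in every diseased sample, absent from every healthy one" pattern is easy to check by script. The sketch below uses made-up counts and a hypothetical helper, `candidate_pathogens`. It only flags the association, which is a first step and not a demonstration of causation.

```python
def candidate_pathogens(table, is_diseased):
    """Return microbes found in all diseased samples and in no healthy ones.

    `table` maps microbe -> {sample id -> count};
    `is_diseased` maps sample id -> True (diseased) or False (healthy).
    """
    diseased = [s for s, sick in is_diseased.items() if sick]
    healthy = [s for s, sick in is_diseased.items() if not sick]
    if not diseased:
        return []
    hits = []
    for microbe, counts in table.items():
        in_all_diseased = all(counts.get(s, 0) > 0 for s in diseased)
        in_no_healthy = all(counts.get(s, 0) == 0 for s in healthy)
        if in_all_diseased and in_no_healthy:
            hits.append(microbe)
    return hits


# Hypothetical toy table loosely modeled on the scenario (counts are invented).
toy_table = {
    "red": {"H1": 0, "H2": 0, "D1": 4, "D2": 7},
    "aqua": {"H1": 9, "H2": 1, "D1": 8, "D2": 2},
    "blue": {"H1": 1, "H2": 9, "D1": 2, "D2": 8},
}
toy_status = {"H1": False, "H2": False, "D1": True, "D2": True}

flagged = candidate_pathogens(toy_table, toy_status)
print(flagged)
```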

Several of the easier scenarios, like this one, also have microbe-microbe interactions embedded in them. For example, in the above scenario, the aqua and blue microbes are strongly negatively correlated in samples from both healthy and diseased patients. These bonus patterns can give groups that quickly get a solid idea about what might be causing disease something more meaty to explore while other groups continue thinking about their scenarios. They also introduce an alternative way of looking at microbial communities that will be important later on. Finally,  these microbe-microbe interactions also help illustrate the utility of tables for spotting and quantifying patterns in microbial communities.


In this case, a table of the microbes across samples isn’t really needed to see the main trend with disease (red dots, bottom row), but might help identify microbe-microbe interactions that are harder to spot visually.  Here, the aqua and blue microbes (2nd and 3rd rows) trade off in abundance across samples.
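A trade-off like this can be quantified with a correlation coefficient across samples. The sketch below computes a plain Pearson correlation by hand on invented counts. A rank-based statistic such as Spearman's, plus a proper significance test, would likely be a better choice for real microbiome data, so treat this as a classroom illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length count vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (spread_x * spread_y)


# Hypothetical counts across six samples: when aqua is high, blue is low.
aqua = [9, 1, 8, 2, 7, 3]
blue = [1, 9, 2, 8, 3, 7]

r_aqua_blue = pearson(aqua, blue)
print(round(r_aqua_blue, 3))
```

A value near -1 indicates the two microbes trade off almost perfectly across samples. A value near 0 would suggest no linear relationship.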

Here’s a handout summarizing the different scenarios from the lesson:



Here are the lesson materials:

Overview and Guide: Perspectives_on_Microbial_Communities_Exercise_Overview_r2

Lesson PowerPoint (via SlideShare): [link]

Handouts: All handout files, including the original Illustrator files (16 MB download), as a single zip: [link]. The files include all of the scenarios without tables, their associated tables alone (basically an OTU table for each scenario), and the diagram with the table. The summary handout image is also included.




US-Indonesia Kavli Frontiers of Science Videos

This summer, I had the amazing opportunity to travel to Makassar, Indonesia to attend the US-Indonesia Kavli Frontiers of Science Forum.  This conference is unique in that it features U.S., Australian, and Indonesian scientists from diverse disciplines.  So for example, you might -purely hypothetically *cough*- be giving a talk on coral microbes, and get some really interesting questions about microbial metabolism and detecting alien life in (very) remote sensing data from Jason Rhodes, from NASA’s Jet Propulsion Laboratory.  I found it to be a very refreshing chance to interact with a broader range of scientists than I might normally.

The conference also had an element of diplomacy to it – it is one of several scientific exchange programs initiated to follow through on the commitments laid out in Obama’s 2009 Cairo speech, which called for greater cultural and scientific exchange with predominantly Muslim countries.

The Amirul Mukminin Mosque (Masjid Amirul Mukminin) in downtown Makassar, seen from our conference hotel.

Certainly our hosts were incredibly kind and generous, and the folks I interacted with both off and on the conference site were very welcoming. It was great to meet Jamaluddin Jompa, who was the first Indonesian scientist to give a keynote at the International Coral Reef Symposium, and many young Indonesian scientists. I also found Bahasa Indonesia to be an incredibly fun and approachable language – though unfortunately I don’t think there will be many opportunities to practice it around here. My fledgling attempts were enough to egg Mónica Medina into including a short paragraph in Indonesian in her opening statement, so I’ll count that as a win.

In any case, I’m revisiting this older trip because the conference has recently posted video from all of the talks.  These are all intended for a broad audience, and are generally quite approachable.  Be warned, they have archives going back several years, and it’s pretty easy to burn an afternoon checking them out.

My talk on coral microbiomes focused on a long-term field experiment studying the effects of nutrient pollution and overfishing on corals and their microbes in the Florida Keys (Vega Thurber and Burkepile labs), and the Global Coral Microbiome Project, which seeks to characterize the microbial diversity of evolutionarily diverse corals from many sites around the globe (Vega Thurber and Medina labs – with help and collaboration from many, many others).

Kim Ritchie’s intro on beneficial microbes in the ocean is here:


A menu with all the talks can be found on the Kavli website [link]. I would particularly recommend the talks by Vikram Ravi on supermassive black hole evolution; Christopher Mores on his experiences running an Ebola clinic; Maxime Aubert on the discovery of the oldest dated cave paintings in Indonesia; and Enid Montague on developing apps to improve hospital visits.  Kiki Vierdayanti also gave a very interesting talk on X-ray emissions from black holes, with some interesting comments on what it’s like doing astrophysics as an Indonesian researcher.

Enjoy the videos, and if you ever have the chance to attend one of these I would strongly encourage it.




Peter Norvig’s IPython notebook on probability

By way of Daniel McDonald, I recently came across Peter Norvig’s IPython notebook exploring probability. Peter Norvig is the director of research at Google, and his courses tend to do a great job of breaking down complex concepts into digestible ideas and clean code.  This notebook is no exception.

The notebook explores basic probability and the Monty Hall problem using some straightforward code for exploring sample spaces.  It then extends this code to deal with some statistical ‘paradoxes’ like the Two Child Problem. Some of the most interesting parts of the discussion hinge on the sample spaces that would be required to make particular results true.
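To give a flavor of the sample-space approach, here is my own minimal sketch. It is not code taken from the notebook, and the helper names `P` and `given` are my choices. It assumes every listed outcome is equally likely, and for Monty Hall it assumes the host always opens an unchosen door hiding a goat and always offers the switch.

```python
from fractions import Fraction
from itertools import product


def P(event, space):
    """Probability that predicate `event` holds, over equally likely outcomes."""
    outcomes = list(space)
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


def given(condition, space):
    """Restrict a sample space to the outcomes satisfying `condition`."""
    return [o for o in space if condition(o)]


# Two Child Problem: outcomes are (older, younger), each 'B' or 'G'.
two_children = list(product("BG", repeat=2))


def both_boys(kids):
    return kids == ("B", "B")


p_older_is_boy = P(both_boys, given(lambda kids: kids[0] == "B", two_children))
p_at_least_one = P(both_boys, given(lambda kids: "B" in kids, two_children))

# Monty Hall: outcomes are (door hiding the car, contestant's first pick).
# Under the stated assumptions, switching wins exactly when the first pick was wrong.
doors = (1, 2, 3)
monty = list(product(doors, doors))
p_switch_wins = P(lambda outcome: outcome[0] != outcome[1], monty)

print(p_older_is_boy, p_at_least_one, p_switch_wins)  # 1/2 1/3 2/3
```

The two Two Child answers differ only in which outcomes were kept in the sample space, which is the point the notebook makes so nicely: state the sample space explicitly and the ‘paradox’ mostly dissolves.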

In any case, if you are interested in statistics and/or Python, this is a good read.   If you like the notebook, you may also be interested in checking out his free online course on programming principles (h/t Justin Kuczynski), or “Artificial Intelligence: A Modern Approach”, his canonical text on artificial intelligence, which has accumulated a few* citations.


Live map of Global Coral Microbiome Project Sample Photos

A live map of samples collected for the Global Coral Microbiome Project is now up on the Vega Thurber lab web page.  Ryan McMinds linked together the more than 3,400 raw sample photos for the project, which are on Flickr, with their overall map coordinates.  (You can also find pretty reef photos and some tourist pics of down time from the expeditions.)
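For readers curious how photos and coordinates might be tied together for a web map, here is one possible sketch that joins photo records to site coordinates and emits GeoJSON, a format many web mapping tools can read. This is an assumption-laden illustration and not a description of how Ryan built the live map: the function `photos_to_geojson`, the record fields, the URLs, and the coordinates are all invented.

```python
import json


def photos_to_geojson(photos, site_coords):
    """Build a GeoJSON FeatureCollection with one point per photo.

    `photos` is a list of dicts with 'sample_id', 'site' and 'url' keys;
    `site_coords` maps a site name to (latitude, longitude).
    """
    features = []
    for photo in photos:
        coords = site_coords.get(photo["site"])
        if coords is None:
            continue  # skip photos whose site has no recorded coordinates
        lat, lon = coords
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "sample_id": photo["sample_id"],
                "photo_url": photo["url"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Invented example records; not real sample ids, URLs, or sampling coordinates.
toy_sites = {"site_1": (-21.0, 55.2)}
toy_photos = [
    {"sample_id": "S001", "site": "site_1", "url": "https://example.org/S001.jpg"},
    {"sample_id": "S002", "site": "unknown_site", "url": "https://example.org/S002.jpg"},
]

collection = photos_to_geojson(toy_photos, toy_sites)
print(json.dumps(collection, indent=2))
```

Keeping the sample id in each feature's properties is what would let a map marker link back to the molecular data for that coral.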

The fidelity of GPS and satellite imagery being what it is, I’ve found the map to be useful in getting a broad sense of where samples were taken in relation to broad geographic features, and also the detailed surroundings of each coral.

Sampling locations around Isle de la Reunion for the Global Coral Microbiome Project

For example, the proximity of some corals to town is easy to see.

A more detailed view of samples from the West coast of Isle de la Reunion.

From these local maps we can drill down directly to the corals themselves.


One main theme in the project overall is to share data with the community.  We thought that we’d start by uploading the raw sample photos to Flickr, and organizing them in ways that make them accessible. This also has added benefits for the team, including making photos easier to share within the group.  Coral identification (especially at the species level) is notoriously challenging.  Sharing photos online enables feedback from the community (feel free to drop us a line in the comments) and from collaborators that specialize in coral systematics.  The photos are also linked to sample ids for molecular data, making it easy to look up the surroundings of a particular coral.  Finally, Ryan has used hierarchies in Lightroom to make it easier to manage tags linked to coral taxonomy and location.

We’re looking forward to using tools like these to make our lives easier as we work to characterize the relationships between corals and their microbes.