Abstract
The past decade has seen an increase in public attention on the role of campaign donations and outside spending. This has led some donors to seek ways of skirting disclosure requirements, such as by contributing through nonprofits that allow for greater privacy. These nonprofits nonetheless clearly aim to influence policy discussions and have a direct impact, in some cases, on electoral outcomes. We develop a technique for identifying nonprofits engaged in political activity that relies not on their formal disclosure, which is often understated or omitted, but on text analysis of their websites. We generate political activity scores for 339,818 organizations and validate our measure through crowdsourcing. Using our measure, we characterize the number and distribution of political nonprofits and estimate how much these groups spend for political purposes.
While political science has focused much of its attention on campaign contributions by political action committees (PACs), recent spending on politically related activity by nonprofits may be greater in magnitude. For example, the Planned Parenthood Action Fund, a nonprofit advocacy group, spent twice as much in the 2012 cycle as Planned Parenthood Votes, its PAC counterpart (Planned Parenthood Federation of America 2012). Similar examples can be found across the political spectrum. Although nonprofit organizations—groups organized under section 501(c) of the U.S. Code—are subject to limits on their political activity, many spend significant amounts of money to promote their political opinions. Anecdotal and journalistic evidence suggests that in the last few years a growing number of donors with political aims have begun channeling donations through nonprofits in lieu of, or in addition to, direct contributions to candidates or PACs. Since donors to nonprofits need not be disclosed, some have referred to this channel of influence as “dark money.” Failing to observe these cash flows threatens the validity of research on special-interest politics.
Formal studies have not been conducted, but there is good reason to think that nonprofits do not adequately disclose their political activities. Evidence from related activities such as lobbying and campaign contributions suggests that political actors often disclose only the minimum required by law and may even introduce errors into their reports to make transparency more difficult (LaPira and Thomas 2014). Legal requirements to disclose vary by the type of activity and the type of nonprofit, and in many cases they are not very strict. The vast majority of nonprofits do not need to disclose their donors, and approval procedures have often been pro forma. In 2013 the Internal Revenue Service (IRS) became concerned about the political activities of nonprofits and sought to initiate reviews, but the agency lacked an effective means of identifying which organizations to review among the many thousands. Its approach generated a scandal when it targeted political nonprofits using keywords (like “tea party”) in organizations’ names (Drawbaugh and Dixon 2014).
Whether or not a nonprofit was originally created with the intention of engaging in political activity, it may over time develop political aims. Since many nonprofits aim to shape the world toward their conception of the common good or the good of their members, it is only natural that at some point they would consider political means of doing so, whether to further their goals or simply to protect themselves against possible political threats. A number of such activities have recently been documented, illustrating the increasing role played by nonprofits in politics. For example, in the 2012 campaign, nonprofits not legally identified as political organizations nonetheless spent more than $300 million in dark money aimed directly at political campaigns (Maguire 2014a). The overall trend suggests that spending by such groups has grown much more rapidly than other forms of political spending in the past decade (Maguire 2014b). New forms of funding for nonprofits have even included foreign governments specifically trying to influence U.S. policy outcomes through their 501(c) grantees. Unlike a lobbyist acting on behalf of foreign entities, these nonprofits do not have to register their funding source (Lipton 2014).
Understanding the role played by such undisclosed funding is naturally difficult since it is not obvious how to identify the relevant actors. Using machine learning algorithms and text analysis, we identify which groups engage in political advocacy. This is challenging insofar as many groups seek to obfuscate or understate the extent to which they operate in a partisan or political manner. Out of 339,818 nonprofits that filed in 2012, we identify those with a political focus by using information released by the IRS as well as a new data set we collected of text scraped from the website of each organization. Although we presume that political organizations strategically choose the text content of their websites, we also presume that our text-mining algorithm can identify subtle clues that nonetheless classify political organizations as such. By calibrating against a subset of known political organizations, we are able to pick up the features that correspond to political activity. We then validate this claim through crowdsourcing: we have independent third-party coders identify whether a random sample of organizations are political.
This analysis allows us to estimate interesting quantities relevant to the U.S. political landscape, such as the aggregate political activity by these statutorily nonpartisan organizations. We present an array of descriptive analyses of these organizations across issue area, type, and geographical location and, given assumptions about general trends, provide estimates of politically adjusted revenue (PAR)—the part of nonprofits’ revenue that is devoted to political activity. Such a comprehensive analysis of the political behavior of nonprofits has not to our knowledge been attempted. Our results suggest that even a conservative approach to estimating the value of nonprofit political activity shows it to be quite substantial. Future research should examine the role of this spending in special-interest politics and in political mobilization.
POLITICAL ACTIVITY BY NONPROFITS
Not surprisingly, given that the role of nonprofits in campaigns has been identified as significant only recently, the existing academic literature does not address the political role of nonprofits directly.1 There is naturally considerable scholarship on the role of money in political campaigns and the strategies and actors involved in generating such funds. In this section, we review some of the reasons why nonprofits play a unique role in political campaigns and issue advocacy, starting from the fact that nonprofits face different disclosure requirements. We also review anecdotal accounts of actors strategically using nonprofits to avoid disclosure, and we consider the possible relationship to political polarization as well as the broad ways in which nonprofits may engage in political work beyond federal election campaigns.
An obvious concern about the role of money in politics is whether it inhibits fair competition or simply allows political actors to express their views. Empirical studies using variation in state laws suggest that the public is interested in passing such rules because contribution limits may indeed promote competitiveness (Stratmann 2010). To the extent that competing parties or interest groups stand to gain differentially from such regulations, however, it is natural that political actors would seek to influence such rules.2 Indeed, the new role played by nonprofits is only one of the latest ways in which the U.S. political landscape has been transformed by the introduction of new forms of giving and spending. Attempts to regulate campaign spending began in the United States in the late nineteenth and early twentieth centuries but only took hold in 1972 with the enactment of the Federal Election Campaign Act of 1971. These rules have been modified repeatedly, including by Congress, the courts, and the Federal Election Commission (FEC). In 1979, for example, the FEC opened the door to “soft money”—money that is not given to an individual federal candidate and for which restrictions on donation size are relaxed.
Another approach to campaign finance involves reporting requirements rather than limits on spending. Such disclosure requirements are important because they may affect voters’ perceptions. A study of soft money and issue advocacy found that voters are not well informed about who is responsible when this money is used to fund advertisements and that being given this information changes their reaction to the advertisement and the election (Magleby and Monson 2004). This suggests that the issue of whether donors are disclosed is substantively important, and indeed the difference between disclosing and nondisclosing groups has recently become part of the public debate. With respect to the political role of nonprofits, most 501(c) organizations are not required to disclose their donors.3 These organizations consist of everything from hospitals to universities to labor unions to think tanks. The Supreme Court ruled in 1958 in favor of such protections of anonymity in NAACP v. Alabama because individuals might fear that disclosure of their political beliefs would lead to personal reprisals.4
More recently, this protection has been used by nonprofits to avoid transparency laws adopted as part of the Bipartisan Campaign Reform Act of 2002. Crossroads GPS, a conservative 501(c)(4) organization, for example, spent $190 million overall as reported to the IRS in 2012, but reported to the FEC that only around $70 million of that was election-related spending (Edsall 2014). The FEC does not require that “educational” activities, or activities meant to “persuade,” be reported, nor does it put any limits on this spending. The role of such spending, conceptualized as “outside lobbying” by Ken Kollman (1998) to reflect its relationship to attempts to influence Congress directly, may be to mobilize group members, to reveal high levels of public support for a measure, or to act as a costly signal. To evaluate such theories, the spending must be observed, but nonprofit organizations may be missed by approaches relying on FEC data or surveys.
Observing this spending is also important to understanding how campaign money contributes to or ameliorates political polarization. Insofar as donors are making a strategic choice to contribute either anonymously or publicly, we should not expect that the subset who make public contributions will be representative of all donors. Although it is clear that a significant amount of money is funneled toward indirect public advocacy, there is no systematic way to measure this activity at this time.
Our estimates of the politicality of nonprofits allow us to compare total nonprofit income by type and geographic region. Although nonprofits have been spending money for political purposes for decades, they spent less than $15 million per cycle until 2008, according to the Center for Responsive Politics (CRP). In the 2008 cycle, nonprofits expanded their involvement in “issue advocacy,” which avoids being labeled as electioneering by carefully avoiding any reference to specific candidates (even when advocacy for a certain perspective on an issue clearly favors one candidate over another). By the CRP’s estimate, in the 2008 cycle nonprofit spending jumped to more than $70 million, and then to almost $300 million in 2012 (Maguire 2014a). Of course, restrictions are placed on nonprofits’ engagement in political behavior, but the boundary between what is allowed and what is not is unclear and continually expanding. Enforcement of these rules is also an issue since the FEC, comprising equal numbers of Democratic and Republican members, has been deadlocked over rulings on constraints on political activity.5
DATA AND METHODS
The first step in identifying political nonprofits is specifying how “political” will be defined, but establishing a comprehensive definition is beyond the scope of this paper. The question has been a concern of political philosophy from at least Confucius and Plato up to contemporary debates about feminism, liberalism, and communitarianism. Since we are primarily concerned with issue advocacy and electioneering, our working definition is based on Supreme Court rulings and the FEC requirements for disclosure, and it focuses on specific observable characteristics related to political activities.
In particular, Buckley v. Valeo defined electioneering advertisement through the so-called magic words test: “express advocacy” is relevant to the FEC only if it uses particular phrases such as “vote for.”6 References to individual candidates thus do not imply that speech falls into the more heavily regulated category of express advocacy. We broaden this definition to include all references to a current political office-holder or candidate. This is important since we are directly looking for the kind of deregulated political speech that tends to fall under the broad auspices of “issue advocacy,” wherein political issues and candidates are discussed but without explicit calls for support through the “magic words” of post-Buckley express advocacy. To pick up this kind of issue advocacy, we therefore also count a nonprofit as political if it refers to proposed or current legislation or regulation.
Finally, we include the most strictly regulated form of speech—that which expressly promotes or opposes political candidates or policies. In essence, this final category picks up what is regulated by the FEC as “political” under current law. Thus, our definition is wider in scope than current law insomuch as it includes electioneering and issue advocacy (in addition to simple express advocacy).
The main methodological goal of this project is to identify 501(c) organizations that are political. Since there is no comprehensive method for attaining a list of the agendas of every nonprofit organization in the United States, we rely on an approach that imputes the politicality of organizations through their descriptions of themselves online. This approach grows out of a substantial literature on the scaling of political texts but is novel owing to the unique challenges of this environment (Grimmer and Stewart 2013).
Unfortunately, we do not have a simple sampling frame of texts to use. There are no well-defined manifestos for each organization, nor does every organization have a simple corpus of writings and publications. To address this problem, we rely on a novel method of collecting texts. Vast quantities of text are available on the Internet about the majority of the organizations in our study population. Our goal is to gain a rich set of texts on which to ground an analysis by matching organizations with their web pages. This matching task is difficult, however, since there are no comprehensive listings of the web presence of nonprofit organizations. We use the Yahoo! BOSS Search API to rapidly perform a query for the name of each organization in our data set and retrieve the fifty best URL results, in JSON format. We then scrape the web text at these URLs and clean the HTML using Beautiful Soup in Python. In general, we use only the top result, but when this is unavailable, we use the second-best result.
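The collection step can be sketched as follows. This is a minimal sketch under stated assumptions, not the production scraper: the search endpoint, its parameters, and the JSON field names are hypothetical stand-ins (the Yahoo! BOSS API has since been discontinued), while the Beautiful Soup cleaning mirrors the step described above.

```python
# Minimal sketch of the collection pipeline. SEARCH_URL and the JSON layout
# are hypothetical stand-ins for the (since-discontinued) Yahoo! BOSS API.
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://api.example.com/search"  # hypothetical search endpoint

def org_text(org_name):
    """Return cleaned text from the best-matching web page for one organization."""
    response = requests.get(SEARCH_URL, params={"q": org_name, "count": 50})
    urls = [r["url"] for r in response.json()["results"]]  # hypothetical layout
    for url in urls[:2]:  # use the top result, falling back to the second-best
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        # Beautiful Soup reduces the raw HTML to its visible text
        return BeautifulSoup(html, "html.parser").get_text(separator=" ")
    return ""
```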
Classifiers are machine learning algorithms that seek to distinguish between two or more classes (in our case, political/nonpolitical). These algorithms take a set of data with ex ante “labels” indicating class membership and learn how best to use other “features” of the data (such as words) to predict these classes. Since we do not have clean training labels (categorizations as political or not) for even a subset of organizations, our labels are almost certainly measured with error, even though we take a supervised learning approach. We adopt two methods to create labels, both of which frequently label political groups as nonpolitical. The first method is simply to examine the names of organizations and label them as political when they include one of a set of keywords.7 This approach is very basic, but it provides a good number of effective matches. Moreover, this approach reflects the controversial method used by the IRS to target groups (predominantly conservative, according to critics) for audits of political activity. This approach gives us 3,255 groups labeled as political, with the remainder tentatively labeled as nonpolitical.
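The name-based rule is simple enough to sketch directly. The stems below are those listed in note 7; `org_names` is a hypothetical list of organization names drawn from the IRS files.

```python
# Name-based labeling: an organization is tentatively political if its name
# contains any of the keyword stems from note 7.
KEYWORD_STEMS = (
    "action fund", "advoca", "politic", "republican", "democrat",
    "conservativ", "liberal", "libertar", "socialis", "communis",
    "constitution", "whig", "federalis", "freedom", "liberty",
    "government", "progressiv", "feminis", "human right",
    "public interest", "national secur",
)

def name_label(name: str) -> int:
    """Return 1 (political) if the lowercased name contains any stem, else 0."""
    lowered = name.lower()
    return int(any(stem in lowered for stem in KEYWORD_STEMS))

train_labels = [name_label(n) for n in org_names]  # org_names is hypothetical
```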
Our second labeling scheme is to use responses to questions asked by the IRS on the tax returns of nonprofits (made available to the public by the IRS). These questions are binary choices as to whether they influence legislation, engage in propaganda, or try to influence public elections. These criteria are incomplete, however, since political nonprofits do much of their political work through direct issue advocacy, which need not be reported to the IRS. That is, only organizations that are political according to our third definitional criterion (express promotion or opposition) would be captured by this scheme. Probably the lion’s share of political nonprofits answer “no” to these questions. Nevertheless, this approach provides us with 8,343 labeled political organizations. In addition, we collect the web pages of the full population of 10,921 political action committees registered with the FEC and label each of these as political to supplement our training data with groups known to be political.
Altogether, we have 435,495 groups in our sample for which we have names and employer identification numbers. Of these groups, we identify 339,818 groups with valid websites.8 Our approach is based on the assumption that groups with website text similar to that of groups labeled political are likely to be political. By adjusting the penalty associated with making certain types of error, we embed knowledge about the direction of errors in our classifier. In other words, we embed an expectation that organizations labeled as nonpolitical are much more likely to be political than organizations labeled as political are likely to be nonpolitical. Thus, our algorithm will produce a range of scores that assign some probability of being political to nonpolitical groups but are more likely to uncover political groups that would otherwise go unidentified. We weight each class of labels (political/nonpolitical) by the inverse of their ubiquity in the initial data. Thus, we take an organization’s use on its website of language similar to that of a known political organization as a strong signal that it should be properly classified as political.
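One simple way to implement such inverse-frequency weighting is sketched below, assuming the 0/1 `train_labels` array from the name-based rule above; this is one standard variant, not necessarily the exact scheme used in the study.

```python
# Inverse-frequency class weights: each class's weight is the total sample
# size over (number of classes times the class count), so the rare
# "political" label carries far more weight per example.
import numpy as np

labels = np.asarray(train_labels)
n_total, n_classes = len(labels), 2
class_weight = {c: n_total / (n_classes * np.sum(labels == c)) for c in (0, 1)}
sample_weight = np.array([class_weight[c] for c in labels])
# sample_weight can then be passed to the classifier during training (see below)
```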
It is useful for understanding our contribution to compare our approach to how the Center for Responsive Politics imputes the politicality of nonprofits. The CRP focuses on the largest groups that disclose federal political expenditures to the FEC and then uses these groups to trace other associated nonprofits.9 We instead develop an index of politicality based not on disclosure to government agencies but on actual public-facing behavior. There are a number of benefits to this approach. First, it provides a broader understanding of the political behavior of nonprofits by considering spending that need not always be reported to the federal government, such as spending on local or state politics, issue advocacy, turnout mobilization, and policy research. Since much of the concern surrounding dark money centers on the paucity of disclosure requirements, it is important to develop tools that do not rely on disclosure. Our machine learning approach contributes to other types of text analysis involving politicians’ speeches and statements, legislation, and the news media.
To simplify, our approach seeks to understand which organizations look most like our labeled political organizations on the basis of what they say. Loading the entire corpus into computer memory is not a feasible solution for data of this size, so our model choice is guided by the availability of appropriate online machine learning algorithms, which need not be trained all in a single call but can instead be called progressively on small portions of the overall data. To compare to common methodologies used by political scientists, consider Wordscores or Correspondence Analysis, two standard tools for capturing latent dimensions of textlike data (Lowe 2008; Greenacre 1984). These methods cannot be divided into steps that utilize only part of the full data and then update iteratively. Instead, training must be performed all at one time. Similarly, traditional support vector machines struggle in classification problems with many observations, despite their ability to deal with high-dimensional feature representations.
Our model consists of a naive Bayes classifier trained iteratively on our labeled training data (Hastie, Tibshirani, and Friedman 2009; Rennie et al. 2003). Naive Bayes classifiers, though somewhat rudimentary, provide simple and rough means that are often sufficient for good classification (Zhang 2004). Naive Bayes provides a basic but eminently scalable solution to text classification. Mathematically, naive Bayes can be seen as a sort of linear regression in log space of labels on word frequencies.10
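To make the regression analogy concrete: for a document with word-count vector x, the multinomial naive Bayes log-posterior is linear in the counts,

```latex
\log P(y \mid x) \;\propto\; \log P(y) + \sum_{w} x_w \log P(w \mid y),
```

so each word's log-probability under a class plays the role of a regression coefficient on that word's frequency.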
Naive Bayes is trained iteratively in mini-batches of 1,000 documents in Python using Scikit-learn (Pedregosa et al. 2011). This generates the predicted probabilities of being political for each organization, which are used prominently in the analyses to follow. Groups receive high predicted probabilities when the text of their website uses language similar to that of a group initially labeled as political. Although this is expressed as a probability, given that our initial training data are imperfect, these predictions should not be interpreted as the probability of an organization being political but rather as an index determining the similarity of its language to that of labeled political organizations. For this reason, we refer to this measure hereafter as a “probability index,” or simply a “measure.”
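A minimal sketch of this training loop follows, built from the texts and labels constructed above. The batch size (1,000), the smoothing value (alpha of 2), and the token definition (alphabetic, length 3 to 17) follow the text and note 10; the variable names are hypothetical placeholders.

```python
# Out-of-core training: HashingVectorizer stores no vocabulary, and
# partial_fit updates the naive Bayes counts one mini-batch at a time.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = HashingVectorizer(
    token_pattern=r"(?u)\b[a-zA-Z]{3,17}\b",  # alphabetic tokens, length 3-17
    alternate_sign=False,  # keep features non-negative, as multinomial NB requires
    norm=None,             # keep raw term counts rather than normalized rows
)
clf = MultinomialNB(alpha=2.0)  # Laplace smoothing of 2, per note 10

for start in range(0, len(web_texts), 1000):  # web_texts/train_labels as above
    X = vectorizer.transform(web_texts[start:start + 1000])
    y = train_labels[start:start + 1000]
    # classes must be declared on the first call; a matching slice of the
    # sample_weight array sketched earlier could also be passed here
    clf.partial_fit(X, y, classes=[0, 1])

# The "probability index": predicted probability of the political class
scores = clf.predict_proba(vectorizer.transform(web_texts))[:, 1]
```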
To extract a binary measurement of politicality, the next section evaluates the accuracy of the measure and derives the appropriate threshold at which to divide political from nonpolitical organizations using crowdsourcing.
VALIDATION
Having generated a measure of whether a nonprofit is likely to be political, we demonstrate that this measure is not the same as our initial training labels and further that this measure accords with what humans reading the text of the website would believe about its political content. To begin we note that, while it makes little difference whether we create our training labels using the IRS-based or the keyword/names-based approach, the fact that the politicality measure we generate is quite different from the initial training labels suggests that it adds value over a more naive approach. The correlation between our measure using IRS labels and our measure using name-based labels is 0.89, while the correlation between the training labels themselves is 0.83. These similarities end, however, when we compare the output of naive Bayes to the training labels. The correlation between training labels and our generated IRS-based measure is 0.25 (and 0.17 for the name-based measures). This means that the measures we produce using naive Bayes are indeed quite different from the initial labels, though still similar to each other. That is, our measure identifies a substantially different set of organizations than do the reporting standards under the existing regulatory regime.
To get a sense of how our naive Bayes model distinguishes between political and nonpolitical organizations we can look at the loadings on words to understand which of them provide the most leverage. The top one hundred most political and nonpolitical words for each model are presented in table 1. The most political words for both the model using name-based labels and the model based on IRS reports refer to what is clearly political in nature: for instance, partisan politics (democratic, republican, conservative, liberal), political institutions (congress, house, senate, fec), and other political actors (committees, pac, obama). Nonpolitical words refer to what is nonpolitical in nature, such as religion (church, ministries, baptist, christ), social societies (fraternal, league, elks, legion, rotary), education (elem, educational, school, scholarship, pta, students), and charitable organizations (museum, grants, grantmaking, volunteer, nonprofit, foundations, foundation, trust).
Next we consider how this approach measures up to human coding. An ideal evaluation would be based on a careful review of each nonprofit’s activities and expenditures. Given resource constraints, our approximation of this was to use crowdsourcing to evaluate whether 408 nonprofits engage in political activities. Contributors were given the name of a nonprofit (taken from the IRS manifest of nonprofit filings), asked to find the website of the organization through an Internet search, and then asked to respond to five questions.11 We first asked if they were able to find a website unique to that organization, and then whether they found a reference to a political issue, to an elected political leader or candidate, or to a political activity such as a get-out-the-vote effort.12
The nonprofits were chosen through a stratified random sample to increase the proportion of political nonprofits evaluated. One-third of this sample was drawn from organizations least likely to be political (index < 0.8), one-third from the moderately political (0.8 to 0.9), and the remainder from those most likely to be political (> 0.9). The evaluation of politicality was undertaken not by the researchers directly but through crowdsourcing, using the CrowdFlower service. Among the advantages of crowdsourcing are scalability, speed, the “wisdom of the crowds” arising from multiple coders, and the inability of researchers to bias the results in their favor.
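A sketch of the stratified draw, assuming a hypothetical pandas DataFrame `orgs` with the politicality index in a `score` column; 136 draws per stratum yield the 408 evaluated organizations.

```python
import pandas as pd

strata = [
    orgs[orgs.score < 0.8],                           # least likely political
    orgs[(orgs.score >= 0.8) & (orgs.score <= 0.9)],  # moderately political
    orgs[orgs.score > 0.9],                           # most likely political
]
# Draw equal-sized random samples from each stratum (136 x 3 = 408)
sample = pd.concat(s.sample(n=136, random_state=0) for s in strata)
```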
On the other hand, especially if not well supervised, crowdsourcing can suffer from a lack of sophistication or attention by contributors.13 Many different algorithms have been proposed for aggregating the results of crowdsourced data. Kenneth Benoit and his colleagues (forthcoming) provide a review of such methods and suggest that for their purposes a simple averaging approach is roughly comparable to more complex methods. Given that CrowdFlower screens responses using an adaptive algorithm, our approach is simply to use majority vote among the evaluators who passed the screening process and evaluated the data. Thus, we label a nonprofit as political if a majority of the CrowdFlower contributors identified any of the three indicators—mentions of political issues, political leaders or candidates, or political activities.
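The aggregation rule itself is a one-liner. In this sketch, `votes` is a hypothetical mapping from each organization to one boolean per trusted contributor, True when that contributor found any of the three indicators.

```python
# Majority vote across trusted contributors: an organization is political if
# more than half of its judgments flagged any political indicator.
political = {org: sum(v) > len(v) / 2 for org, v in votes.items()}
```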
Of the fifty-five people who attempted to contribute to the project, eighteen did not pass the initial screening, which is aimed at removing contributors who simply guess randomly. Those who did pass provided 1,442 trusted judgments. Each organization was reviewed by at least three different individuals. Contributors seem to have been reasonably satisfied with the clarity of the instructions, the pay, and the quality of the test questions: the nineteen contributors who chose to evaluate our study rated it as 4 out of 5 on average. For the following analysis, we generally label as political those nonprofits that receive a rating of 0.99 or higher on our index, but we present results across other possible cutoffs on the theory that the success of one’s intended use depends on a willingness to trade off Type I and Type II error.
We begin with a very simple sanity check: are there more political nonprofits at higher levels of our index? Figure 1 shows that this is indeed the case: the proportion of political organizations increases from 10 to more than 40 percent as our measure increases. This suggests that some political nonprofits are not correctly labeled as such by our measure. We examine this possibility by looking at the operating characteristics of our measure. To provide a baseline we compare our scores against the (name/keyword) labels we used in training, based on self-reporting to the IRS and keyword searches using the organizations’ names, as described earlier.
A good measure not only identifies political groups well (that is, it has few false negatives) but also rarely reports a nonpolitical group as political (that is, it has a low false-positive rate). With this in mind, in figure 2 we present two measures of evaluation for the training labels and our two measures (trained using the labels from name keywords and from self-disclosure to the IRS) and consider how well the measures perform across different cutoff levels for our politicality index. Evaluations of sensitivity (true positives divided by the sum of true positives and false negatives), on the left, suggest that our measure does considerably better than our training labels in picking up organizations that are indeed political. That is, conditional on an organization being political, our algorithm does a substantially better job of predicting this politicality.
Evaluations of specificity (true negatives divided by the sum of true negatives and false positives) appear on the right side of the figure. These results imply that, conditional on an organization being nonpolitical, our algorithm retains most of its power to detect nonpolitical organizations. Given that fewer than 25 percent of all nonprofit organizations are political, guessing that no organizations are political can ensure a high specificity; this is why the initial training labels (which label only half a percent of organizations as political) do well, as seen on the top right. The IRS- and name-based methods are fairly similar on all measures, though the model based on name-based training labels performs slightly better. This similarity is due to the large number of PACs included in the training data as examples of political organizations.
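In symbols, writing TP, FP, TN, and FN for the counts of true positives, false positives, true negatives, and false negatives:

```latex
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}
```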
The key takeaway is that, even with a high threshold, we are able to greatly improve on recall without overly sacrificing specificity. That is, we can identify many political organizations with few false positives. Using common machine learning metrics to balance sensitivity and specificity, we ultimately chose a threshold of 0.99. For the IRS-based model, we correctly classify 85 percent of political organizations, whereas our training labels based on self-disclosure to the IRS identify only 8 percent. We incorrectly classify more groups as political than the training labels do, but only because the IRS rarely labels any group as political. Given the challenges of this classification task, false positives are unavoidable. Our model still correctly identifies nonpolitical groups 57 percent of the time. The IRS labels have very high specificity (about 98 percent), but note that guessing that all nonprofits are nonpolitical will ensure that 100 percent of nonpolitical nonprofits are identified as such, yet the measure will have no utility.
Finally, we note that while the CrowdFlower approach is useful for validating our measure on a small data set, it is not feasible as a replacement for our method. The cost of obtaining three ratings for each organization in the validation set (including some by coders deemed untrustworthy and subsequently discarded) averaged $0.43 per organization. Assuming fixed marginal costs, scaling this to the 339,818 organizations we analyze would cost about $146,121.
HOW MUCH IS SPENT AND BY WHOM?
For the first aggregate look at the politicality of nonprofits, we examine the politicality of strata determined by the subsection of the U.S. Code under which organizations are organized and by program area, operationalized through National Taxonomy of Exempt Entities (NTEE) codes. Examples of these codes are “Educational Institutions,” “Crime, Legal Related,” “Animal Related,” and so forth. There are twenty-six distinct broad classification codes, and each one is associated with a number of subcodes that identify more specifically the type of work a nonprofit does. This taxonomy is self-reported. Approximately one-third of the organizations in our study are not given NTEE codes in the data released by the IRS, and so they are excluded from any subsequent analysis that requires these codings.
Figure 3 provides a sense of the program areas and subsections that most often tend to be political. Each axis is ordered such that, marginalizing over the other axis, the categories become increasingly political as they get farther from the origin in the lower left. Thus, veterans’ organizations and labor unions are the subsections with the highest proportion of political organizations. Likewise, social science research institutes and civil rights, social action, and advocacy organizations have the highest rates of politicality among NTEE classes. Unshaded strata indicate an insufficient sample size (fewer than 50 groups) to demonstrate meaningful patterns in the data.
It is informative to specifically examine the distribution of political groups organized under subsection 4: social welfare organizations. Crucially, organizations dealing with the environment that are organized under this subsection are very likely to be political in nature, as are organizations classified as civil rights, social action, or advocacy groups. This is unsurprising. A conservation group that works to, for instance, maintain parks or trails will find it beneficial to organize under subsection 3 to gain tax advantages for donors (who can deduct contributions from their own taxes). However, such benefits come with increased scrutiny and decreased freedom to take overt political action (such as donating to political candidates’ PACs). It is natural, then, for groups concerned with environmental policy to organize under subsection 4, which allows for broad flexibility to exert political action. The increased politicality of environmental 501(c)(4)s relative to 501(c)(3)s bears out this story. This provides some initial evidence of organizational movement between subsections based on the scope of a nonprofit’s activities.
To get a sense of the geographic distribution of political nonprofits, we examine the states in which organizations are headquartered. This is, of course, a rough measure, given that national organizations are likely to be headquartered in New York City or Washington, D.C., yet are also likely to be interested in policy outcomes throughout the country. We examine this geographical distribution in figure 4. In general, more organizations exist where there are more people, and the figure thus resembles a population map of the United States.
To look at this in a more fine-grained way, figure 5 shows the number of groups in a given state for every 100,000 individuals in that state. A number of features stand out. First, very low-population states in the Great Plains, such as Montana and Wyoming, appear to have a relatively high number of political organizations given their size. The Northeast dominates the map with more organizations per capita than most of the rest of the country. Unsurprisingly, Washington, D.C., also has a large number of organizations relative to its population. These patterns are consistent with nationally oriented nonprofits clustering in the Northeast and Washington, D.C., combined with a base level of nonprofit political activity in every state. That is, certain organizations (such as veterans’ organizations, Elks lodges, and some advocacy groups) tend to have at least one local chapter in each state but add additional registered nonprofits in response to state characteristics other than population.
We use our measure to create an estimate of the political money spent by nonprofits, or politically adjusted revenue. This estimate relies on three key elements. We begin with revenue data provided for each nonprofit by the IRS. We then use our index of the likelihood of each organization being involved in politics, as validated by the crowdsourced evaluations.14 Finally, we need an estimate of the percentage of each nonprofit’s budget that goes toward political activity. We have no particular data from which to make this estimate, nor is there any clear way to do it, so we simply use a figure of 1 percent and invite readers to adjust our estimates according to their prior beliefs. For example, those who believe that nonprofits spend 0.1 percent (one-tenth of one percent) of their budget on political activities can divide the numbers in figure 6 or figure 7 by 10. We expect that some organizations spend less than 1 percent on political activities, while others spend more. For example, 501(c)(3) nonprofits are allowed to spend between 5 and 20 percent of their expenditures on lobbying without jeopardizing their nonprofit status.15 While it is not representative of nonprofits as a whole, Crossroads GPS spent nearly half of its budget on direct political expenditure and much of the remainder on grants to other political organizations. Given that organizations’ budgets are included in our estimate of PAR conditional on being classified as political through our algorithm, we interpret PAR as a conservative estimate of the actual political expenditure of nonprofits.
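Concretely, combining these three elements (the weights follow note 14), organization i contributes

```latex
\mathrm{PAR}_i = R_i \times \mathbf{1}\{s_i > 0.99\} \times 0.4215 \times f,
```

where R_i is IRS-reported revenue, s_i is the politicality index, 0.4215 is the estimated precision of the score at that threshold, and f is the assumed political fraction of the budget (0.01 in our figures).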
Furthermore, we assume that three types of organizations spend none of their budgets on politics. We exclude all NTEE-coded health or general rehabilitation organizations (like hospitals), educational institutions (like university endowments), and human services organizations (like the Red Cross). These organizations probably spend some amount of money on politics, but the sheer amount of revenue they generate makes it likely that they devote a much smaller fraction of it to politics than do other organizations in our sample. Including these groups increases our estimates, but by less than an order of magnitude. Finally, we must also assume that any organization for which we were unable to find a website spends no money on politics. This assumption is quite plausible considering that these organizations tend to be either very small or organized as a very closely held trust (and are thus unlikely to be politically engaged).
Figure 6 shows the breakdown of money across strata defined by program area (operationalized by NTEE code) and the subsection of the tax code under which groups are organized. It is important to note that some strata are relatively sparse and that PAR is most reliable when larger numbers of organizations are aggregated. Compounding this difficulty is that NTEE codes are missing for a large fraction of our sample. Nonetheless, a number of patterns stand out. First, 501(c)(3) organizations consistently have the largest PAR. This is not entirely surprising considering the large number and diversity of these organizations. They tend to have larger budgets than groups organized under different parts of the tax code. Among these groups, the two program areas associated with the highest PAR are “Science and Technology Research Institutes” and “International, Foreign Affairs and National Security.” The latter clearly encompasses a political/policy dimension and is unsurprising. The former consists of many industry and professional associations. One example of an organization of this type imputed to be political is the American Society of Mechanical Engineers (ASME), a 501(c)(3) professional organization for mechanical engineers. We impute a PAR for ASME of $400,000 in 2011, which may seem high prima facie. In that same year, however, ASME disclosed $214,000 in lobbying expenditures (Center for Responsive Politics 2011). This suggests that our estimate is unlikely to be overly high, given that it would also include any state-level lobbying and issue advocacy campaigns that ASME may engage in.
It may be surprising that program areas with high rates of politicality are associated with (generally) lower PAR, but not when we consider that they receive relatively less money than do organizations of other types. To demonstrate that these groups still represent significant political spending, we break down the civil rights, social action, and advocacy label (a program type with high rates of politicality) into its constituent subcodes in figure 7 to identify more specifically the issue areas on which these groups focus. The figure shows the distribution of money among organizations with this NTEE label by issue area. It should be noted that PAR is likely to underrepresent the amount of money these groups devote to politics if they devote more than 1 percent of their budget to politics. Civil liberties advocacy groups, a subcode that subsumes groups like the American Civil Liberties Union as well as Second Amendment advocacy groups, are estimated to have a much higher PAR than other groups with this NTEE label. Since organizations may choose where to place themselves, these categories are not as cleanly discriminating as might be hoped. For instance, many organizations choose to place themselves in the more general civil rights category rather than in a more specific category that might also apply to them. Nevertheless, figure 7 gives some perspective on the relative PAR across issue areas.
Aggregating over all strata provides an estimate of the total PAR by nonprofits of $760 million in 2011. This number may be compared to the Center for Responsive Politics’ estimate of the direct political expenditure of nonprofit organizations of $309 million for the entire 2012 election cycle (that is, 2011 and 2012). It may also be compared to the total amount raised by PACs in the 2012 cycle of $1.4 billion, or the amount raised by candidates’ PACs of $453.3 million (Center for Responsive Politics, n.d.). Although this estimate is not precise, the amount of dark money being spent during modern election cycles is clearly large enough that it must be accounted for to properly assess the role of money in the American political system. It is a striking finding that our conservative estimate of the political activity of nonprofits is in fact higher than extant estimates. That fact can be squared with the CRP’s estimates by understanding the loose regulatory framework in which nonprofits operate. The vast majority of this spending need not be reported.
CONCLUSION
The mechanics of political spending in congressional and presidential races are changing rapidly as laws and court rulings make possible new approaches and organizational forms. Understanding how such activity redraws the political map requires being able to identify the relevant actors involved and the resources they expend, even as such organizations aim to avoid public transparency. Existing approaches based on the names of organizations or self-reported activity vastly underestimate the degree to which organizations involve themselves in politics. We have developed a novel approach to classifying the political status of nonprofits based not merely on their voluntary self-disclosure but also on the text of their websites. This measure provides vital leverage in understanding the scope of this increasingly controversial type of political engagement. In seeking to validate this measure by allowing independent coders to determine whether a random sample of nonprofits are political, we find that, even though our measure is imperfect, it does a significantly better job of determining nonprofits’ political status than using disclosures to the IRS or searches based on nonprofit names.
With these estimates in hand, we provide an overview of the nonprofits involved in political activity by issue area and geography. The relatively recent nature of the changes we describe makes it difficult to know whether the patterns we identify in the last few years will continue. Nevertheless, the political spending of nonprofits is substantial and growing, and we must grapple with it in order to understand interest group politics today.16 Future work ought to identify variation in the type and extent of political engagement among those nonprofits we identify, as well as the relationship of these organizations to other political actors and the strategies of political campaigns.
Acknowledgments
The authors would like to thank the participants in the Russell Sage Foundation “Big Data in Political Economy” conference and two anonymous reviewers for their comments. This research was made possible by funding support from the New York University George Downs Prize.
FOOTNOTES
1. For one recent paper that does address the role of nonprofits, specifically in relation to climate change, see Ramey and Rothenberg (n.d.).
2. For evidence suggesting that firms do not benefit from soft-money contributions, see Ansolabehere et al. (2004).
3. Only 501(c)(3) private foundations are required to disclose their donors.
4. National Association for the Advancement of Colored People v. Alabama, 357 U.S. 449 (1958).
5. For an earlier example of such deadlock, see Salant (2009). For a recent example, see Gold (2013).
6. James L. Buckley, et al. v. Francis R. Valeo, Secretary of the United States Senate, et al., 424 U.S. 1 (1976).
7. We use the following stems as keywords: “action fund,” “advoca,” “politic,” “republican,” “democrat,” “conservativ,” “liberal,” “libertar,” “socialis,” “communis,” “constitution,” “whig,” “federalis,” “freedom,” “liberty,” “government,” “progressiv,” “feminis,” “human right,” “public interest,” and “national secur.”
8. Some smaller organizations have no independent web presence, and thus search results return only third-party websites that republish the data provided in bulk by the IRS.
9. That is, the CRP begins by examining organizations that reported large political expenditures to the FEC (thereby leaving out most spending on issue advocacy, voter mobilization, and state- and local-level efforts). It then notes all grants larger than $25,000 made by or to these political nonprofits. From this, it estimates the indirect funding of politics through the grantees of organizations to provide a sense of the “attributable spending” of these groups. This methodology relies, however, on the disclosure of particular types of political money (to either the IRS or the FEC), which organizations may seek to avoid when such disclosure is not statutorily required. See OpenSecrets.org, “Political Nonprofits: Methodology,” http://www.opensecrets.org/outsidespending/methodology.php (accessed August 9, 2016).
10. Words are alphabetic tokens of length 3 to 17. In our classification model, we use “Laplace smoothing,” which adds a small amount (we use a value of 2) to every token’s frequency count, effectively placing a prior on the informational content of very infrequent words and shrinking their effects.
11. Internal Revenue Service, “Exempt Organizations Business Master File Extract,” available at: http://www.irs.gov/Charities-&-Non-Profits/Exempt-Organizations-Business-Master-File-Extract-EO-BMF (updated March 14, 2016); and IRS, “SOI Tax Stats: Annual Extract of Tax-Exempt Organization Financial Data,” available at: http://www.irs.gov/uac/SOI-Tax-Stats-Annual-Extract-of-Tax-Exempt-Organization-Financial-Data (updated May 8, 2015).
12. See the supplemental appendix (http://bit.ly/1O77GdG) for additional instructions and precise wording.
13. To address this we required that respondents be in the United States, be of the highest competence among CrowdFlower contributors (“highest quality”), and be removed from the set of contributors if they answered questions too quickly. Contributors were also required to maintain at least 66 percent accuracy while participating, as well as to answer six questions as a pretest before their answers were counted as part of the data set. Even after passing this initial screening, contributors could still be judged to be “untrustworthy” based on a proprietary CrowdFlower algorithm that makes use of test questions for which the correct answer is known ex ante. These test questions were based on organizations that were clearly either political or nonpolitical, such as the American Energy Alliance (which describes itself as an “organization that engages in grassroots public policy advocacy and debate concerning energy and environmental policies”) and Grand Ledge Area Youth Football Inc. (a small nonpolitical organization that supports a sports league).
14. That is, for a given nonprofit, PAR is the product of revenue, an indicator for whether the group is political (greater than 0.99 on our politicality score), 0.4215 (the precision of our score), and the fraction of the average political group’s budget that is actually devoted to political aims. This does not estimate any individual nonprofit’s political spending precisely, but we believe that it provides a rough estimate of the aggregate spending of organizations.
15. Internal Revenue Service, “Measuring Lobbying Activity: Expenditure Test,” available at: http://www.irs.gov/Charities-&-Non-Profits/Measuring-Lobbying-Activity:-Expenditure-Test (last updated March 28, 2016).
16. For one direction of future work, see Dimmery and Peterson (2014).