Natural and Quasi-Natural Experiments to Evaluate Cybersecurity Policies

Over the past decade, numerous countries around the world have developed and implemented national cybersecurity strategies. Yet very few of these strategies have been subject to evaluations. As a result, it is difficult to judge the performance of strategies, the programs that comprise them, and the cost-effectiveness of funds spent. Natural and quasi-natural experiments are a promising set of research methods for the evaluation of cybersecurity programs. This paper provides an overview of the methods used for natural or quasi-natural experiments, recounts past studies in other domains where the methods have been used effectively, and then identifies cybersecurity activities or programs for which these methods might be applied for future evaluations (e.g., computer emergency response teams in the EU, cybersecurity health checks in Australia, and data breach notification laws in the United States). 

Benjamin Dean
July 04, 2017

Over the past decade, numerous countries across the world have developed and implemented national cybersecurity strategies. Each strategy comprises a set of objectives and various programs to achieve those objectives. Tens of billions of dollars in taxpayer funds have been diverted from other purposes to pay for these strategies. A number of countries’ recent strategies are reviews of previous ones.  

Unfortunately, there are still no definitive answers to questions such as: Have these strategies achieved their overall objectives? Which programs contributed the most to these objectives? By how much (or little)? Where have funds been most effectively spent? What improvements might be made?  

By most accounts, the cybersecurity situation globally is getting worse, in spite of the many measures being taken. There is a real need to improve assessment and evaluation of cybersecurity policies so as to inform and guide policy change.  

With a new generation of cybersecurity strategies now being rolled out, it is timely to consider what evaluation techniques might be employed at the outset, so as to better track the performance of programs and the cost-effectiveness of funds spent. In doing so, public policies might better address the present state of cybersecurity nationally and globally in the future.  

One promising technique for the evaluation of some cybersecurity programs is the use of natural and quasi-natural experiments. These broad groups of research designs and methods avoid the potentially high cost, possible ethical issues, and the impracticality of randomized control trials in a domain like cybersecurity. At the same time, they provide relatively robust measures of the counterfactual and net social/economic impact of policy decisions.  

This paper will start with a background on national cybersecurity strategies. This will be followed by an explanation of common evaluation techniques with a special emphasis on natural and quasi-natural experiments. Finally, the paper will identify instances in which such research designs and methods might most effectively be used to evaluate certain programs that commonly comprise cybersecurity strategies and how such evaluations might be done in practice.  

Cybersecurity Strategies

Over the past decade, at least 20 countries have developed and implemented national cybersecurity strategies. At least five of them have updated their strategies since their first edition (Australia, Czech Republic, Estonia, Netherlands, and the United Kingdom).

The objectives within these strategies are broadly similar across countries. According to the European Union Agency for Network and Information Security (ENISA) in 2014, the most commonly recurring objectives of the strategies in Europe include: developing cyber defense policies and capabilities, achieving cyber resilience, reducing cybercrime, supporting industry on cybersecurity, and securing critical information infrastructures. These objectives are broadly similar to those in the strategies of non-European countries.  

Countries fund and implement various activities or programs to achieve these objectives. Some activities involve implementing new or revised legislation. Others involve discrete programs such as research and development grants, training, awareness campaigns, or “capacity building” of target segments such as small and medium enterprises.  

Large amounts of public funds are being reallocated from other policy areas, such as education or healthcare, to fund these cybersecurity strategies. For instance, in the United States, an estimated $19 billion is expected to be allocated to cybersecurity measures in the 2017 White House budget proposal.ii While not strictly a national cybersecurity strategy, this level of funding nonetheless amounts to a great deal given that totals have exceeded $10 billion annually for the past five years. Australia’s most recent cybersecurity strategy involves a more modest $57.5 million per annum over four years.iii Some strategies, unfortunately, do not mention how much will be spent. Examples include the Canadian Cybersecurity Strategy of 2010 and the European Union’s Cybersecurity Strategy of 2013.

Program Evaluation: A Primer 

Program evaluation is a mainstay of the evidence-based policymaker’s toolkit. In the review of policy evaluation in innovation and technology, the authors define evaluation as “a process that seeks to determine as systematically and objectively as possible the relevance, efficiency and effect of an activity in terms of its objectives, including the analysis of the implementation and administrative management of such activities.”iv 

Policy decisions should lead to outcomes where the social and economic benefits of a policy intervention outweigh the related costs. In the event that the policy intervention does not deliver a net economic and social benefit at least equal to the long-term bond rate, then public resources are not being used in an optimal way. The task of evaluation methods is to determine these costs and benefits in a statistically robust way.

There are many benefits to be gained from program evaluation. For instance, in the United States in 2013, a Government Accountability Office (GAO) survey found that of the 37 percent of federal managers who had undertaken evaluations in the past, 80 percent reported that “those evaluations contributed to a moderate or greater extent to improving program management or performance and to assessing program effectiveness or value.”vi Referring specifically to cybersecurity policy evaluations, ENISA claims that such evaluations lead to benefits in terms of greater accountability of public action, increased credibility domestically and with international partners, providing an evidence-based input to long-term planning, and providing support for outreach and enhancement of public image, among many others.vii  

In short, program evaluation leads to better policy decisions and improved policy outcomes over time. Given the current state of cybersecurity, and the explicit objective of cybersecurity strategies to improve this state, program evaluation plays an integral role in achieving these objectives over time. 

Yet in the United States, evaluation of public programs is not widely used. The same GAO study found that only 37 percent of federal managers had completed an evaluation of any program, operation, or project. Of those who had undertaken evaluations, the factor most commonly cited as having hindered evaluations to a great or very great extent was lack of resources to implement evaluation findings (33 percent). More attention to and funding for the undertaking of evaluations and implementation of evaluation findings would go a long way towards improving policy outcomes.  

While in-depth financial audits have been undertaken of cybersecurity strategies in the United States and United Kingdom, these serve a different purpose than that of program evaluations. Rather than examining the impacts of the strategies along the stated objectives, these audits tend to focus on “verifying the correctness of financial statements, and assessing how economically, effectively and efficiently the funds are spent.”viii  

The European Commission undertook an extensive ex ante impact assessment of its cybersecurity strategy in 2013. The impact assessment is notable for its clear problem definition, identification of drivers behind the problem, identification of shortcomings of the status quo, clear justification for policy intervention, clear objectives, policy alternatives, and the identification of indicators/metrics to eventually evaluate the progress in achieving the stated objectives. This is a solid foundation on which to conduct future policy evaluation. However, the cost-benefit calculations of the assessment reveal the dearth of robust evidence on which to base policy decisions. Many assumptions were made to estimate even the costs of policy options. The benefits are “extremely difficult to estimate for a number of reasons,” including the difficulty of assessing “to what extent enhanced NIS would mitigate the negative impact of security incidents.”ix There is much to be done to reliably measure what these benefits might be in the future.

Evaluation of cybersecurity strategies has been urged by various organizations in the past. The Business Industry Advisory Committee (BIAC) to the Organisation for Economic Co-operation and Development (OECD) recommended that “efficient national cybersecurity strategies and policies should be periodically evaluated and updated so that improvements can be implemented to face new security threats.”x As far back as 2009, the GAO suggested that there were opportunities to enhance federal cybersecurity in the United States through many measures including, “enhancing independent annual evaluations.” These recommendations were not acted upon. In a recent example, in January 2016 the GAO reported that although the Department of Homeland Security (DHS) had developed metrics for measuring the performance of the National Cybersecurity Protection System (NCPS, also known as EINSTEIN), “they do not gauge the quality, accuracy, or effectiveness of the system’s intrusion detection and prevention capabilities.”xi After spending more than $6 billion in funding over the last ten years, DHS was thus unable to describe the value provided by NCPS.  

The majority of European countries have provisions in their cybersecurity strategies for monitoring and evaluation, and the listed evaluation methods include progress reports, updates, and questionnaires among stakeholders, among others.xii What are not mentioned, however, are evaluation methods advanced enough to estimate the counterfactual (i.e., what would have happened had the policy intervention not been pursued). This is important because without such an estimate, we are unable to determine the impacts of the policy intervention(s) or the cost-effectiveness of funds that have been spent on said intervention(s).

A number of past efforts have attempted to evaluate the cost-effectiveness of cybersecurity approaches by testing them with “Red Cell” attacks. More commonly referred to as “penetration testing” (or simply “pen testing”) in cybersecurity, these exercises involve the “simulation of an attack on a system, network, piece of equipment or other facility, with the objective of proving how vulnerable that system or ‘target’ would be to a real attack.”xiii One notable past example of such an exercise at the national level in the United States was Eligible Receiver in 1997, which involved a “red team” from the NSA infiltrating the Pentagon’s systems.xiv Such exercises are intended to identify and then address existing vulnerabilities in systems and to better develop incident responses or risk mitigation measures in the future. However, such exercises do not provide an understanding of the cost-effectiveness, counterfactual, or net social/economic benefits of the security measures or programs put in place. They thus cannot be used for the evaluation of many of the programs or policies that appear in current cybersecurity strategies.

A number of additional approaches and techniques have been suggested for the evaluation of cybersecurity strategies in the past. In the impact assessment for the EU Cybersecurity Strategy, the final section of the report covers monitoring and evaluation. Of the methods proposed to evaluate the strategy’s three objectives, the recurring methods include surveys of competent authorities, comparative implementation reports, progress reports, and outcome assessments. The monitoring indicators provided only cover input metrics, such as the “number of Member States having appointed a NIS competent authority which is adequately staffed and equipped to carry out EU-level cooperation,” rather than output metrics such as reduction in breach incidents or reductions in economic losses from cybercrime.  

The BIAC to the OECD recommended that evaluation of cybersecurity strategies might be achieved through the compilation of periodic comparisons between the strategies of different nations, the production of country reports to share information about security incidents and the level of damages, and the planning of recurrent cybersecurity risk assessments.xv Unfortunately, these recommended measures lacked specific techniques for evaluation. In the same report, the Civil Society Internet Society Advisory Council (CSISAC) to the OECD suggested that assessment might be achieved by measuring cybersecurity strategies against their impacts on metrics such as “values recognised by democratic societies, such as freedom of expression, privacy, due process, and transparency.”xvi These values, however, are not metrics in the strict sense of the word.xvii

ENISA has previously advocated for a logical framework (log frame) model to evaluate cybersecurity policies. This model involves identifying the program objectives, inputs (financial and personnel), activities, outputs, outcomes and impacts, and final evaluation. The final evaluation then flows back into the future policymaking process. However, none of these approaches advocates the use of a research design that employs the rigor that is required to accurately and reliably determine the net social and economic costs of a cybersecurity policy intervention.

Complicating Factors for Evaluation of National Cybersecurity Strategies 

Inherent in any attempt at evaluating national cybersecurity strategies are the complications that arise from overlapping and sometimes contradictory outcomes at different “levels” of cybersecurity. One could conceive of cybersecurity as existing across an international level, national level, organizational level (broken up into public and private organizations, then again by size of the organization by headcount or revenue), and the individual level. Improving cybersecurity for entities or actors at one level may come at the expense of cybersecurity for those at other levels.  

This has been very clearly demonstrated in the decades-long tug-of-war between national law enforcement/intelligence services and civil liberty/private sector entities over encryption (also known as the “crypto-wars”).xviii The most recent chapter of this story is the FBI and Apple case regarding the phone of one of the perpetrators of the San Bernardino shootings in the United States. On the one hand, permitting law enforcement/intelligence services to insert “backdoors” into encryption standards, so as to permit the decryption of communications by designated “bad actors,” may result in greater overall security (e.g., in the form of thwarted terrorist attacks). On the other hand, such a measure comes at the known expense of the cybersecurity of an individual’s communications and exchanges with private sector entities through, for example, online shopping or banking.xix Thus, implementing measures to improve cybersecurity at one level comes at the expense of cybersecurity at other levels. 

Such conflicts or contradictions can also be seen at the international level. For instance, all countries might benefit on a net basis from international cooperation around increasing cybersecurity and fighting cybercrime. Yet such cooperation does not occur organically, even between countries or regions with commonly held values and interests (e.g., the European Union and United States). To some extent, this divergence is driven by the interests of the nation-states, or organizations within those nation-states such as intelligence agencies, which may have an interest in continuing espionage that is facilitated by poor cybersecurity. 

Finally, it is difficult to simulate relatively more beneficial alternative courses of action based purely on the evaluation of past empirical experience. At a national level, it may make sense to pursue policies that in the past have effectively addressed safety concerns brought on by drastic technological change (e.g., U.S. automobile safety and product liability in the 1950s and 1960s). Likewise, it may be relatively more beneficial to transition away from old cybersecurity paradigms, such as firewalls (“building higher walls”), towards a cloud-based paradigm that removes the need for security at an individual or user level. Yet such efforts may not be considered due to strong interests in maintaining the status quo at an organizational level, particularly in the private sector.

Overcoming contradictions and trade-offs such as these at different levels of cybersecurity policy analysis is the challenge that faces policymakers today. Evaluation of the outcomes of policies at different levels will thus be essential so as to develop policies that maximize net social and economic benefits, across some or all levels, in the future.  

Randomized Controlled Trials 

A number of methodologies and research designs might be deployed to evaluate policy interventions and thereby determine the counterfactual and the net social and economic costs or benefits of a policy option. The gold standard for research design to evaluate policies is the randomized controlled trial.xx 

The simplest form of this method involves randomly allocating a population into two groups: a control group and a treatment group. As the names suggest, the treatment group will receive the intended intervention (in the case of cybersecurity policy, this might be training or a “health check” assessment). The control group will receive nothing. The two groups will be tracked across certain variables. Once the study has ended and the results have been analyzed, the difference between the variables for the two groups will represent the impact of the intervention. The control group represents the counterfactual—what would have happened without the intervention. This allows for an estimation of the effects of the policy intervention, which in turn permits an estimation of its cost-effectiveness. 
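The two steps described above, random allocation and comparison of outcomes, can be sketched in a few lines of Python. This is a minimal illustration rather than a full evaluation design: the function names, the firm names, and the incident counts are all hypothetical and invented for this example, and a real study would add significance testing and a larger sample.

```python
import random
from statistics import mean


def assign_groups(units, seed=None):
    """Randomly split units into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(outcomes, treatment, control):
    """Mean outcome of the treatment group minus that of the control group."""
    treated_mean = mean(outcomes[u] for u in treatment)
    control_mean = mean(outcomes[u] for u in control)
    return treated_mean - control_mean


# Hypothetical firms; names and numbers are invented for illustration only.
firms = ["firm_a", "firm_b", "firm_c", "firm_d", "firm_e", "firm_f"]
treatment, control = assign_groups(firms, seed=1)

# Invented outcomes recorded at the end of the study: incidents per firm.
incidents = {"firm_a": 4, "firm_b": 6, "firm_c": 5,
             "firm_d": 3, "firm_e": 7, "firm_f": 5}

effect = difference_in_means(incidents, treatment, control)
print(f"Estimated effect on incidents per firm: {effect:+.2f}")
```

A negative estimate would suggest fewer incidents among treated firms than among control firms. Dividing the program's cost by the size of that reduction would give one rough measure of cost-effectiveness.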

To date, randomized controlled trials have not been proposed, much less attempted, for cybersecurity policies. This could be due to the various constraints of this research method. Writing on healthcare interventions, which share some characteristics with cybersecurity policy interventions, Sanson-Fisher et al. cite these limitations as the limited availability of sufficient populations, the time available for follow-up, threats to external validity, contamination of the control or treatment groups, cost, and ethical concerns.  

One underexplored set of methods, which overcomes these limitations and provides a more rigorous analysis than simply recording inputs or asking a “competent authority” for progress reports, consists of natural or quasi-natural experiments.xxi

Natural and Quasi-Natural Experiments

A natural experiment involves identifying a random event that affects the subjects of study but is not itself caused by the outcome one intends to measure. Put more formally, in a natural experiment, “the treatment (the independent variable of interest) varies through some naturally occurring or unplanned event that happens to be exogenous to the outcome (the dependent variable of interest).”xxii Naturally occurring events might be extreme weather like a hurricane or outbreak of an epidemic. As such, many studies have been conducted in New Orleans following Hurricane Katrina in 2005. These studies have examined outcomes ranging from recidivism rates to labor markets and wages.xxiii An unplanned event might be mass migration, such as the Mariel boatlift, in which large numbers of Cubans arrived in Florida in 1980.xxiv

Natural experiments compare the outcomes of two groups of subjects that are separated into groups because of the introduction of the exogenous variable. They resemble randomized controlled trials except that the researcher has no control over the random assignment. A key point is that the groups do not self-select into behavior. Their reception of the treatment is random or close to random. Natural experiments require that the two groups be broadly comparable (at least with regard to the characteristics pertinent to the study), along with a way to record relevant metrics.xxv Quasi-natural experiments, by contrast, do not involve random application of a treatment. Instead, a treatment is applied due to social or political factors, such as a change in laws or implementation of a new government program. The recipients of the treatment are thus not randomly but intentionally chosen according to some predetermined criteria. The group that does not receive the treatment in a quasi-natural experiment is called the comparison group instead of the control group.xxvi

Because natural and quasi-natural experiments are conducted in the “real” world, that is, outside of a laboratory or artificial setting, their generalizability and relevance for policy and decisionmaking is enhanced.xxvii However, they suffer from a lower ability to attribute causation to a treatment relative to randomized controlled trials. Moreover, if policy changes constitute responses on the part of political decisionmakers to changes in a variable of interest, analysis of these changes as a natural or quasi-natural experiment may yield biased estimates of the impacts of the policy.xxviii Nevertheless, these elements can be controlled for using statistical methods.

Natural and quasi-natural experiments can be conducted using a number of methods and designs. Shadish et al. identify 18 such methods.xxix The table below provides a non-exhaustive overview of the more well-established methods and some of their pros and cons. As a general though not absolute rule of thumb, as one proceeds down the list, the robustness and generalizability of the results increases at the expense of practicality and/or cost. The methods can be broadly differentiated according to the following characteristics/configurations:  

  • Prospective/retrospective studies, which either look forward on some phenomenon (sometimes termed a “pretest”) or backward (“post-test”) on some past phenomenon.  

  • Use of a control group, a comparison group, or neither. The configuration of such groups can be done in a multitude of ways, though the key difference between a control and comparison group is random selection in the former’s case.  

  • Use of longitudinal (or “panel”) data to study phenomena and their impact over time or use of cross-sectional data to compare one or more groups across specific variables or characteristics at a single point in time.  

Methods for Natural and Quasi-Natural Experiments

This wide range of possible methods demonstrates the flexibility of natural and quasi-natural experiments. This flexibility means that the evaluation method can be customized to accommodate the specific nature of the policy intervention. This flexibility thus makes such research methods promising avenues for evaluation of a variety of cybersecurity policies. Yet each of these methodologies has relative weaknesses and strengths. Some are thus more suitable for evaluating certain types of activities or policies over others. 

Applying Natural or Quasi-Natural Experiments to Cybersecurity Policies 

This section examines some of the methods used for natural or quasi-natural experiments, recounts past studies in other domains where the methods have been used effectively, then identifies cybersecurity activities or programs for which these methods might be applied for future evaluation. 

The examples below are meant to be illustrative of the possibilities for natural or quasi-natural experiments in evaluating cybersecurity policies already in place within major cybersecurity strategies worldwide. It is by no means an exhaustive list.  

There are countless other ways in which these proposals could be reconfigured based on cost and ethical or political constraints to allow for certain methods to be used. There are also countless other programs or activities within these strategies that could be evaluated with such methods. Finally, a number of natural experiments arise in the form of real intrusions into key information systems. Two high-profile past incidents linked to state-sponsored attackers include Moonlight Maze and Titan Rain. These are the “dogs that didn’t bark” in a sense. While they did not aim at disruption, they could have caused disruption. They are thus observationally equivalent to more damaging attacks and could be used as a part of potential control or comparison groups.

Evaluating Computer Emergency Response Teams in the EU

Among many proposed activities, the EU Cybersecurity Strategy calls for the establishment of national computer emergency readiness teams (also sometimes referred to as computer emergency response teams) in countries that do not already have one (e.g., Cyprus, Ireland, and Poland).xxxi It is estimated that each of these teams will cost €2.5 million to set up.xxxii This creates conditions that might permit a natural or quasi-natural experiment to evaluate the impact of this policy option.  

One way to do this would be through a difference-in-differences method. This involves having two groups, monitoring certain characteristics of these groups, then comparing the groups before and after the policy intervention. The change between the two periods for the treatment group reflects the intervention plus any background trend, while the change for the control group reflects the impact of maintaining the status quo. The difference between these two changes is the estimated impact of the intervention.xxxiii
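As a minimal sketch of the arithmetic behind a difference-in-differences estimate, the Python snippet below subtracts the control group's before-and-after change from the treatment group's change. The function name and all figures are hypothetical, invented purely for illustration (e.g., average days of business disruption per incident); the sketch also assumes that both groups would have followed the same trend without the intervention.

```python
from statistics import mean


def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    treat_change = mean(treat_after) - mean(treat_before)
    control_change = mean(control_after) - mean(control_before)
    return treat_change - control_change


# Invented observations: days of disruption per incident, before and after
# the intervention. These are NOT real data.
estimate = diff_in_diff(
    treat_before=[10, 12],    # treatment group mean: 11
    treat_after=[7, 9],       # treatment group mean: 8  (change: -3)
    control_before=[11, 13],  # control group mean: 12
    control_after=[10, 12],   # control group mean: 11 (change: -1)
)
print(estimate)  # -2
```

In this invented example, disruption fell by three days in the treatment group and by one day in the control group, so the estimated effect of the intervention is a reduction of two days.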

This method was used by Colman, Joyce, and Kaestner in a study evaluating the effect of a parental notification law on abortion and birth rates in Texas.xxxiv Shortcomings of this approach include the difficulties in ensuring that the findings can be applied to a certain population and the difficulty in measuring outcomes that only accrue in the long term.xxxv These shortcomings can be limited through adjustments to the experimental design but should be kept front of mind in any attempt to utilize this evaluation method. 

If one wanted to evaluate whether the creation of a computer emergency readiness team achieves its objectives (e.g., reducing the frequency of detected intrusions on government networks annually, reducing the days that companies cannot earn revenue due to an information security failure, etc.), one might stagger the creation of the computer emergency readiness teams over time across three countries. One country would be chosen at random to receive a team first. The remaining two, for a certain time period, would not receive such a team. The order in which the countries receive the intervention could likewise be determined at random. This would create a situation where one country is the treatment group and the two remaining countries are a control group. The output metrics in each of the three countries, such as the number of days that companies are disabled after incurring an information security incident, could be tracked and compared over time to see whether or not the computer emergency readiness team has an impact on incidents as compared with countries that do not possess one. Note that this approach does not preclude the control group from eventually receiving a computer emergency readiness team. It just delays the implementation of the initiative so as to determine the likely net costs or benefits of the intervention.

One could relax the randomization requirement, thereby creating a quasi-natural experiment, if needed. This would reduce the internal validity of the evaluation but, given political constraints, might be the only feasible option. Regardless, the evaluation would certainly yield superior results when compared to the proposed methods in the EU impact assessment for its cybersecurity strategy, which go no further than measuring input metrics through imprecise methods such as surveys of those interested parties who receive the funding. Such an approach could also be used to evaluate similar initiatives in other countries. For example, the United States plans for “the Department of Homeland Security to increase the number of Federal civilian cyber defense teams.”xxxvi

Evaluating Cybersecurity “Health Checks” in Australia

The Australian Cybersecurity Strategy of 2016 calls for two activities to bolster the cyber defenses of Australian companies. The first is the “introduction of national voluntary Cybersecurity Governance ‘health checks’ to enable boards and senior management to better understand their cybersecurity status.” These health checks are to be modelled on the United Kingdom’s FTSE 350 governance health checks. In time, it is intended that they will be available for public and private organizations, tailored to size and sector. The second is to “provide support for small businesses to have their cybersecurity tested by certified practitioners.” 

Scant detail is given on the budgets for these activities, the metrics for their monitoring, and potential methods for evaluation. Fortunately, policy interventions such as these create ripe conditions for natural or quasi-natural experiments.

A cross-sectional comparison method could be used to estimate the impact of these health checks or cybersecurity testing. Such a method simply involves separating the participants into two comparable groups, delivering the intervention to only one of these groups, monitoring the two groups across certain metrics, then measuring the difference between outcomes of the two groups following the intervention.  

The U.S. Department of Housing and Urban Development (HUD) used such a research design to determine the impact of a grant program to encourage resident management of low-income public housing projects in the 1990s.xxxvii The key part of this study was that the participants self-selected into the study by requesting the grant; allocation was not random. This made it a quasi-natural experiment. The researchers thus had to find a comparable group of public housing projects that did not receive the grant, track those housing projects with the same metrics, then determine the difference between the two so as to identify the treatment effect. This requirement could potentially be overcome if recipients of the grant are chosen at random.  

There will be a limited budget available for the two Australian interventions. Presumably, the companies that wish to take advantage of these programs will have to apply for them. One way to implement this intervention would be to open up a limited number of spots for firms and accept applications in excess of the number of spots. A lottery could be held to determine which companies receive the intervention and which do not. This would randomly allocate companies into control and treatment groups. It does not address the issue of self-selection, given that companies have come forward to apply for the program, but this can be controlled for in the subsequent analysis. To monitor the two groups, as a condition of applying for the intervention, applicants could be required to report on a variety of metrics regardless of their ultimate selection for the program. This would allow monitoring and subsequent analysis of the companies that are not chosen for the intervention (the control group). The companies that are randomly chosen would be the treatment group. The difference between the outcomes for the two groups would be the treatment effect.
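The lottery step could be implemented along the following lines. This is a hedged sketch only: the function `run_lottery`, the company identifiers, and the number of spots are hypothetical and do not come from the Australian strategy. It assumes that every applicant, selected or not, has agreed to report the same metrics.

```python
import random


def run_lottery(applicants, spots, seed=None):
    """Draw `spots` applicants at random; the remainder form the control group."""
    if spots >= len(applicants):
        raise ValueError("the lottery needs more applicants than spots")
    rng = random.Random(seed)
    winners = set(rng.sample(applicants, spots))
    treatment = [a for a in applicants if a in winners]
    control = [a for a in applicants if a not in winners]
    return treatment, control


# Ten hypothetical applicants competing for four funded health checks.
applicants = [f"company_{i}" for i in range(1, 11)]
treatment, control = run_lottery(applicants, spots=4, seed=2016)

print("Receive the health check:", treatment)
print("Report metrics only:     ", control)
```

Recording the seed used for the draw makes the allocation reproducible, which allows an auditor to verify that selection was in fact random.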

As with the HUD housing project, the requirement for random allocation of the intervention could be relaxed if a comparable group of companies can be found. This could be accomplished similarly by requiring applicants, regardless of their selection into the program, to report certain metrics over time. The carrot in this scheme for those who participate but do not receive the intervention is that they will receive it at some point in the future, though this would introduce bias into the results that would have to be corrected for.xxxviii Such an approach could also be used to evaluate similar initiatives in other countries. For example, in the United States, there is a plan for the Small Business Administration (SBA), in partnership with the Federal Trade Commission, the National Institute of Standards and Technology (NIST), and the Department of Energy, to offer cybersecurity training to over 1.4 million small businesses and small business stakeholders through 68 SBA district offices, nine NIST manufacturing extension partnership centers, and other regional networks.xxxix 

Mandatory Data Breach Notification Laws in the EU and the United States

In the absence of a federal law, U.S. data breach notification laws differ from state to state. While there are federal laws concerning the protection of consumer data, these differ across industries. This environment creates opportunities for the conduct of natural or quasi-natural experiments. 

Using a difference-in-differences methodology, one could conduct a quasi-natural experiment to determine the impact of mandatory data breach notification laws and regulations in the United States.  

Card and Krueger used such a research design to estimate the impact of changes in the minimum wage in New Jersey.xl In 1992, New Jersey increased its minimum wage from $4.25 to $5.05 per hour. To evaluate the impact of the law on employment, the scholars surveyed fast food restaurants in New Jersey and neighboring Pennsylvania (where the minimum wage was kept constant). Fast food restaurants in New Jersey were thus the treatment group and those in Pennsylvania the comparison group.xli A key part of this study is that, although the participants in each group were not randomly allocated, the two groups were broadly comparable. 

Data breach notification laws change frequently in states across the United States.xlii The changes typically involve tightening the requirements for notification, such as shortening the length of time before a company must notify interested parties of a breach or raising the civil penalties for non-compliance. Estimating the impact of changes to these laws would involve surveying firms in a given industry in a state that changes its law and a similar cohort of firms in the same industry in another state where the laws were broadly similar but did not change, then comparing the change in outcomes between the two groups over a specified period of time. Outcomes that could be measured include the incidence of breaches at the firms themselves or the number of parties affected by data breaches. 
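A difference-in-differences estimate subtracts the change observed in the comparison group from the change observed in the treated group, so that trends common to both states are netted out. The sketch below illustrates the arithmetic only: the breach counts and the function name diff_in_diff are hypothetical, and a real study would also need to test whether the two groups followed similar trends before the law changed.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, comparison_before, comparison_after):
    """Change in the treated group's mean outcome minus the change in the
    comparison group's mean outcome."""
    change_treated = mean(treated_after) - mean(treated_before)
    change_comparison = mean(comparison_after) - mean(comparison_before)
    return change_treated - change_comparison


# Hypothetical data: reported breaches per firm per year.
treated_before = [4, 6, 5, 5]      # firms in the state that tightened its law, before
treated_after = [3, 4, 3, 2]       # the same firms, after
comparison_before = [5, 5, 6, 4]   # firms in a state whose law did not change, before
comparison_after = [4, 5, 5, 4]    # the same firms, after

estimate = diff_in_diff(treated_before, treated_after,
                        comparison_before, comparison_after)
print(f"Difference-in-differences estimate: {estimate:+.2f} breaches per firm")
```

In this made-up example, breaches fall in both states, and only the additional fall in the treated state is attributed to the change in the law.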

Such a design could also be modified to determine the impact of new mandatory data breach notification rules to be introduced as a part of the EU General Data Protection Regulation in 2018. The key here would be to compare cohorts across EU countries with similar cohorts in countries where the rules will not be implemented (e.g., Switzerland or perhaps the United Kingdom).  


There are substantial benefits to program evaluation. There have been consistent calls from numerous organizations for evaluations of cybersecurity policies to be undertaken. Yet, in spite of substantial investment of public funds in cybersecurity strategies over the past decade, very few of the programs that comprise these strategies have been evaluated. 

Natural and quasi-natural experiments provide a cost-effective, robust, and flexible set of methods for the evaluation of the programs and activities that comprise cybersecurity strategies worldwide. The very fact that cybersecurity programs involve the introduction of an exogenous intervention means that, at a very minimum, quasi-natural experiments can be feasibly and affordably undertaken.  

This paper has highlighted three examples where natural or quasi-natural experiments might be used to evaluate existing cybersecurity programs: computer emergency response teams in the EU, “health checks” in Australia, and data breach notification laws in the United States. These are just a few examples of the many possible options that could be pursued if policymakers wish to determine the effectiveness of their cybersecurity policy interventions and thus improve these interventions in the future. Moreover, methods beyond natural or quasi-natural experiments that would be conducive to the evaluation of cybersecurity policies also exist. Simulations, commonly used in war gaming, or permissioned testing of network infrastructure, already commonly used in the form of penetration testing, also hold great potential.  

Given the present state of cybersecurity worldwide, and the strong similarities between policy interventions across countries, there are enormous benefits to be had here. Even marginal improvements in cybersecurity policies would lead to much-needed improvements in overall cybersecurity. Lessons learned in one country could be readily applied to other countries where the same or similar interventions are pursued. With so many countries presently implementing new or revised cybersecurity strategies, now is an ideal time to begin undertaking robust evaluations of the policies that comprise these strategies using natural or quasi-natural experiments.  

Benjamin C. Dean is a former fellow for cybersecurity and Internet governance and alumnus at Columbia SIPA.



i. “Cybersecurity Policy Making at a Turning Point: Analysing a new generation of national cybersecurity strategies for the internet economy” (report, OECD, 2012); “An Evaluation Framework for National Cybersecurity Strategies” (report, ENISA, 27 November 2014).      

ii. White House, “The President’s Budget for Fiscal Year 2017” (Office of Management and Budget, 2016).

iii. Commonwealth of Australia, Australia’s Cybersecurity Strategy: Enabling Innovation, Growth and Prosperity (Canberra: 2016).

iv. G. Papaconstantinou and W. Polt, “Policy Evaluation in Innovation and Technology: An Overview,” in “Policy Evaluation in Innovation and Technology: Towards Best Practices” (report, Organization for Economic Co-operation and Development, 1997).

v. “OECD Studies of SMEs and Entrepreneurship: Thailand,” (report, OECD, 2011).

vi. “Strategies to Facilitate Agencies’ Use of Evaluation in Program Management and Policy Making” (report, GAO, 2013).

vii. “An Evaluation Framework for National Cybersecurity Strategies.”

viii. Ibid.

ix. “Impact Assessment: Proposal for a Directive of the European Parliament and of the Council Concerning measures to ensure a high level of network and information security across the Union” (staff working document, European Commission, 2013).

x. “Cybersecurity Policy Making at a Turning Point: Analysing a new generation of national cybersecurity strategies for the internet economy” (report, OECD, 2012).

xi. “DHS Needs to Enhance Capabilities, Improve Planning, and Support Greater Adoption of Its National Cybersecurity Protection System” (report, GAO, 2016).

xii. “An Evaluation Framework for National Cybersecurity Strategies.”       

xiii. Kevin M. Henry, Penetration Testing: Protecting Networks and Systems (IT Governance Publishing, 2012).

xiv. Scott W. Beidleman, “Defining and deterring cyberwar” (report, U.S. Army War College, 6 January 2009).      

xv. “Cybersecurity Policy Making at a Turning Point: Analysing a new generation of national cybersecurity strategies for the internet economy.”

xvi. Ibid.

xvii. “A standard of measurement” being a commonly accepted definition of “metrics.”

xviii. Abelson et al., “Keys Under Doormats: Mandating insecurity by requiring government access to all data and communications” (paper, Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, 6 July 2015).

xix. Ibid.

xx. R.W. Sanson-Fisher, B. Bonevski, L.W. Green, and C. D’Este, “Limitations of the randomized controlled trial in evaluating population-based health interventions,” American Journal of Preventive Medicine 33, no. 2 (2007), 155-162.

xxi. Dahlia K. Remler and Gregg G. Van Ryzin, Research Methods in Practice: Strategies for Description and Causation (SAGE Publishing, 2015).

xxii. Ibid.

xxiii. David S. Kirk, “A natural experiment of the consequences of concentrating former prisoners in the same neighborhoods,” Proceedings of the National Academy of Sciences of the United States of America 112, no. 22 (2015), 6943-6948.

xxiv. D.G. De Silva, R.P. McComb, Y. Moh, A.R. Schiller, and A.J. Vargas, “The Effect of Migration on Wages: Evidence from a Natural Experiment,” The American Economic Review 100, no. 2 (2010), 321-326.

xxv. Remler and Van Ryzin, Research Methods in Practice: Strategies for Description and Causation.

xxvi. Ibid.

xxvii. Ibid.

xxviii. Timothy Besley and Anne Case, “Unnatural Experiments? Estimating the Incidence of Endogenous Policies,” Economic Journal 110 (November 2000), F672-F694. Consider, for example, the case of police numbers and crime rates. If cities are increasing police numbers, they are likely doing this in response to rising crime rates. A naïve analysis of such an environment might conclude that increasing the number of police officers increases the crime rate. See S.D. Levitt, “Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime,” American Economic Review 87 (1997), 270-290.

xxix. William R. Shadish, Thomas D. Cook, and Donald T. Campbell, Experimental and Quasi-Experimental Designs for Generalized Causal Inference (Boston, MA: Houghton Mifflin, 2002).

xxx. P. Pawlak and P. Petkova, “State-sponsored hackers: hybrid armies?” (report, Institute for Security Studies, January 2015).          

xxxi. “Impact Assessment: Proposal for a Directive of the European Parliament and of the Council Concerning measures to ensure a high level of network and information security across the Union.”

xxxii. Ibid.

xxxiii. Remler and Van Ryzin, Research Methods in Practice: Strategies for Description and Causation.

xxxiv. Silvie Colman and Ted Joyce, “Minors’ Behavioral Responses to Parental Involvement Laws: Delaying Abortion until Age 18,” Perspectives on Sexual and Reproductive Health 41, no. 2 (2009), 119-126.

xxxv. Remler and Van Ryzin, Research Methods in Practice: Strategies for Description and Causation.

xxxvi. White House, “Fact Sheet: The Cybersecurity National Action Plan, Office of the Press Secretary” (Office of Management and Budget, 2016).

xxxvii. Remler and Van Ryzin, Research Methods in Practice: Strategies for Description and Causation.

xxxviii. If companies know that they will receive future interventions for cybersecurity health checks or testing by certified practitioners, they might not purchase these services (when they otherwise might have) and thereby expose themselves to greater risk of information security failure. This would, in turn, influence the results because the control group may be subject to information security failures that they would not have incurred had they purchased the services privately. Moreover, there is a possibility of a “silent graveyard,” that is, companies that go bankrupt due to an information security failure in the meantime and so never receive the treatment later on. In such a scenario, when comparing the control and treatment groups, the treatment effect would be greater than it would have been had the promise of future support not been made.

xxxix. White House, “Fact Sheet: The Cybersecurity National Action Plan, Office of the Press Secretary.”

xl. David Card and Alan Krueger, “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania,” American Economic Review (1994), 772-793.

xli. Card and Krueger also separated their samples into restaurants that were already paying relatively high wages (greater than $5). They found no impact from the raising of the minimum wage.

xlii. Lei Shen and Rebecca Eisner, “United States: New and Proposed U.S. Data Breach Notification Laws, Mayer Brown,” Mondaq, 9 July 2014.