Examination and Verification 2016 Findings Report

This “Examination and Verification 2016 Findings Report” and the accompanying “Results” workbook were submitted on December 1, 2016 to the Mayor, City Council, and the Departments of Sanitation, Correction, and Housing Preservation and Development.

Process and Complications
In December 2015, the New York City Council voted in favor of Intro No. 916-A, a law requiring an agency designated by the Mayor to conduct examinations and verifications of the compliance of certain mayoral agencies with the Open Data Law. The purpose of this law is to improve citywide compliance by creating a more systematic way to locate datasets that may have been inadvertently or purposefully excluded from agencies’ self-reported Open Data compliance plans.

In January 2016, Mayor de Blasio signed Int. No. 916-A and it became Local Law 8 of 2016. He designated the Mayor’s Office of Data Analytics (MODA) to conduct this process. MODA then prepared an Examination and Verification plan, which was approved by the Commissioner of the Department of Investigation.

The Department of Sanitation (DSNY), Department of Correction (DOC), and Department of Housing Preservation and Development (HPD) were the three agencies named for the first round of the Examination and Verification process.

MODA’s plan required these agencies to assemble the following items:

  • Dataset questionnaire
  • Executive Certification Letter
  • Public nominations

The dataset questionnaire familiarized MODA with each agency’s current Open Data footprint, routine information reporting requirements, data management systems, and organizational structure. In the certification letter, an executive at the agency attested to the accuracy and completeness of the information provided.

From October 28 to November 14, 2016, the Open Data team invited the public to suggest additional datasets for consideration. While users always have the option to nominate datasets for publication on the Open Data Portal, this window for public feedback specifically invited the public to participate in the Examination and Verification process.

Based on this information, MODA expected to assemble a comprehensive inventory of eligible datasets within the surveyed agencies that MODA analysts could determine to be “public” or “private.” The “public” datasets that had not previously been disclosed in the agencies’ compliance plans would be named in this report and slated for future publication on the Open Data Portal.

This proved more complicated than the plan anticipated. Creating a list of eligible datasets is not a cut-and-dried process, and the attempt raised questions about the definitions of “public,” “data,” and “dataset.” The way data is represented reflects a series of decisions about the collection, organization, and depiction of digital information; distilling a stable “dataset” within this series is often a complex matter of discretion involving an array of actors.

Consider a few examples:

  • The Department of Sanitation (DSNY) publishes monthly garbage collection statistics in PDFs on its website. These reports are formatted as tables of statistics, which are surfaced from underlying data. Is the “dataset” in question the tables of statistics in the reports, or the unstructured information that is aggregated into reported metrics?
  • The Department of Correction (DOC) maintains the “Inmate Information System,” a jail management technology that is rife with Personally Identifiable Information. This data is aggregated and reported as an indicator in the Mayor’s Management Report (MMR). If a properly de-identified dataset might be contrived, but does not yet exist, is it eligible for publication on Open Data?
  • The Department of Housing Preservation and Development (HPD) conducts building inspections, some of which result in violations issued. The agency maintains an “inspections file” and a “violations file.” Violations data is published on the Open Data Portal; inspections data is not. When a data source represents similar or redundant information to data already on the Portal, should it be published?

Datasets are like waves: it is not always clear where one ends and the other begins.
These questions warrant further consideration and clarification. We outline these challenges and proposals for steps forward in the “Recommendations for Better Citywide Compliance” section of this report.

Summary of Results
Itemized results from the agency surveys can be found in the “Examination and Verification 2016 Results Workbook.” This information is meant to give users a snapshot of the technical environment of the agency and a better understanding of how data from an agency’s data system becomes a usable dataset on the Open Data Portal.

We encourage members of the public to make use of this information and the dataset nomination process (which, under Local Law 109 of 2015, guarantees a formal review and timely response) to help us push the City closer to fulfilling the intention of the Open Data Law.

In summary, we found:

  • All three agencies are in good standing with the Open Data Law.
  • Several data systems may contain data that can be published on the Open Data Portal, but warrant further review.
  • Three dataset nominations received during the public feedback window were referred to HPD. One was determined to be a dataset managed by the Department of Finance, one was determined to not be an existing dataset, and one is under further review.
  • None of the agencies listed any new datasets determined to be public through the FOIL review process.

Results Snapshot

                                            DSNY   DOC   HPD
Total Open Data Datasets                      15    12     9
Currently on Open Data Portal (ODP)           14     9     7
Planned for future release                     1     3     2

Datasets currently on ODP - Automations
Automated                                     14     9     6
Non-automated                                  0     0     1

Datasets currently on ODP - Update Frequency
Biannually                                     0     0     1
Quarterly                                      0     1     0
Monthly                                        7     7     6
Weekly                                         1     0     0
Daily                                          0     1     0
As needed                                      6     0     0

Data associated with MMR indicators*          15    12    11
On ODP or future release                       1     2     5
Public but not on ODP or future release       14     3     3
Private                                        0     7     3

Public requests for data                      10     0     6
Already available on ODP                       6     0     2
Potentially new public data                    2     0     1
Non-public data                                0     0     0
Not agency data                                2     0     3

Recommendations for Better Citywide Compliance
Until this year, the Open Data Law required City agencies to self-submit compliance plans that laid out a timeline for publishing “public datasets.” Following the package of amendments to the Open Data Law passed over the last year, the agency compliance plan is now supplemented by the following legal mandates:

  • Public requests: Local Law 109 of 2015 guarantees timely and thorough responses to all public requests for new datasets on the Open Data Portal.
  • Timely updates: Local Law 110 of 2015 requires all data published on agency websites to be included and kept up-to-date on the Open Data Portal.
  • FOIL responses including data: Local Law 7 of 2016 requires agencies to review responses to Freedom of Information Law (FOIL) requests that include data to determine whether they contain new public datasets that could be published on the Open Data Portal.
  • Examinations and Verifications: Local Law 8 of 2016 requires MODA to examine three mayoral agencies each year to verify that all public datasets have been disclosed.

These statutory measures work together to form a framework for locating data that are, by Local Law 11 of 2012, eligible for publication on the Open Data Portal by the end of 2018. Beyond the specific statutory mandates for Open Data compliance, it is incumbent on MODA, City agencies, and other citywide actors to fill in other identified gaps in the Open Data program.
To that end, the Examination and Verification law requires that the office or agency conducting the examinations and verifications “make recommendations to improve the disclosure and inclusion of all public data sets required to be on the single web portal.” We outline a series of specific recommendations below.

  1. Agencies should make their technical ecosystems more accommodating to Open Data by:
    1. Using automations, rather than manual uploads, to update datasets currently on the Open Data Portal.
    2. Writing Open Data requirements into procurements of new data systems and analytics technologies.
    3. Allocating more resources to Open Data personnel, especially Open Data Coordinators.
  2. The Open Data team should empower Open Data Coordinators by:
    1. Surveying Open Data Coordinators to better understand their roles, priorities, and communication preferences.
    2. Producing documents clarifying the roles and responsibilities of Open Data Coordinators, including guidelines on complying with legal mandates.
  3. MODA should improve the Examination and Verification plan for future years by:
    1. Consulting with the Department of Investigation on potential improvements.
    2. Creating clear guidelines and definitions of “data” and “dataset.”
    3. Creating clear guidelines on determining whether a dataset is “public” or “private.”

Opening data is not a one-off obligation: it is an ongoing, virtuous cycle that creates operational efficiency and communication across silos of government. As Open Data becomes the norm for city data, it makes agencies more aware of the data they have and the data they produce – and spurs better upkeep and disclosure of information. As Open Data shifts into an established, routine process across NYC government, we will continue to look for ways to try out bold new ideas, as well as opportunities to make incremental adjustments to existing processes.

*Note: Agencies were required to report names of datasets associated with each of their MMR indicators. The number of clearly public datasets is totaled in “On ODP or future release.” The number of clearly private datasets is totaled in “Private.” The datasets totaled in “Public but not on ODP or future release” are less definitive. Many do not refer to a specific dataset, but a database or application that requires further investigation as to whether a dataset eligible for publication on Open Data could be produced. For more detail, see the Examination and Verification 2016 Results Workbook.