DataCap For Indiana Elevation Catalog On Filecoin
Introduction
This document outlines the DataCap application for the Indiana Statewide Elevation Catalog, a comprehensive collection of LiDAR data managed by the State of Indiana Geographic Information Office and the Indiana Office of Technology (IOT). The dataset includes digital LiDAR LAS files from the 2011-2013 collection and the NRCS-funded 2016-2020 collection. These files are stored in AWS and are used for analyses and applications across numerous projects. This application requests 300 TiB of DataCap to support the long-term preservation and accessibility of this public dataset on the Filecoin network. The DataCap will allow for the creation of eight replicas of the 35 TiB dataset (280 TiB in total), providing redundancy so that the data remains accessible and safe for future use.
Project Overview
The Indiana Statewide Elevation Catalog is a significant resource comprising digital LiDAR LAS files. These files are crucial for various applications, including environmental monitoring, infrastructure planning, and disaster management. The data is meticulously organized into a tile grid scheme covering the entire state of Indiana, making it easy to access and process. The current AWS storage solution provides accessibility, but migrating this data to the Filecoin network will enhance its long-term preservation and accessibility, leveraging the decentralized storage capabilities of Filecoin. The state's commitment to open data access is exemplified through this initiative, ensuring that researchers, government agencies, and the public can benefit from this valuable resource. The project aims to create a robust, reliable, and decentralized storage solution for the Indiana Statewide Elevation Catalog, making it accessible to a broader audience and securing its availability for future generations.
Background and History
The State of Indiana Geographic Information Office and the Indiana Office of Technology (IOT) manage this series of digital LiDAR LAS files. The series begins with the 2011-2013 collection and includes the NRCS-funded 2016-2020 collection. These datasets are stored as uncompressed LAS files for cloud storage and access on AWS. Each year’s data is organized into a tile grid scheme covering the entire geography of Indiana, and each tile is named after its lower-left coordinate, which supports accurate data management and retrieval. The AWS storage solution has been effective, but transitioning to the Filecoin network is intended to strengthen data preservation and accessibility, keeping this resource available for diverse applications and research. Because the collections span several years, they are important for long-term environmental and infrastructure studies. Maintaining this data and making it accessible reflects Indiana's dedication to open data initiatives.
Data Description
The dataset comprises digital LiDAR LAS files. These files are essential for generating high-resolution elevation models of the state of Indiana. The data is organized into tiles, each representing a specific geographic area, and stored in uncompressed LAS format to maintain data integrity and quality. The file naming convention, based on the lower-left coordinate of each tile, ensures easy identification and retrieval. The data spans several years, offering a comprehensive view of the state's topography over time. Key attributes of the dataset include high spatial resolution, accurate elevation measurements, and comprehensive geographic coverage. These attributes make the data invaluable for applications such as flood risk assessment, infrastructure planning, and natural resource management. The data's format, storage, and organization are optimized for efficient access and processing, ensuring users can effectively utilize this resource for various analytical purposes.
DataCap Request Details
The total DataCap requested is 300 TiB. A single copy of the dataset is expected to be 35 TiB, and the project plans to store eight replicas to ensure data redundancy and availability. Eight replicas of 35 TiB come to 280 TiB, so the request leaves 20 TiB of headroom. A weekly allocation of 100 TiB is requested to facilitate the data transfer and storage process. The on-chain address for the first allocation is f1n4zgam2zn56bgqbv4qiznlz32cpurhy7iuowhwa. This DataCap allocation will enable the decentralized storage of the Indiana Statewide Elevation Catalog on the Filecoin network, making it accessible to a broader audience and supporting its long-term preservation. The requested capacity was calculated from the dataset size, the replication requirements, and the planned duration of storage.
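The arithmetic behind the request can be checked with a short sketch. The figures are taken from this application; the variable names are illustrative, and the estimate of transfer time assumes that each weekly allocation is used in full, which the application does not state.

```python
import math

DATASET_TIB = 35       # size of one copy, from this application
REPLICAS = 8           # planned number of replicas
REQUESTED_TIB = 300    # total DataCap requested
WEEKLY_TIB = 100       # requested weekly allocation

# DataCap consumed if every replica is stored exactly once.
required_tib = DATASET_TIB * REPLICAS

# Part of the request not accounted for by the eight replicas.
headroom_tib = REQUESTED_TIB - required_tib

# Weeks needed to receive the full request,
# assuming each weekly allocation is fully used (an assumption).
weeks_needed = math.ceil(REQUESTED_TIB / WEEKLY_TIB)

print(required_tib, headroom_tib, weeks_needed)  # 280 20 3
```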
Data Type and Justification
This application is for a Public, Open Dataset (Research/Non-Profit). The Indiana Statewide Elevation Catalog is a valuable resource for researchers, government agencies, and the public. Making this data openly accessible promotes scientific research, informs policy decisions, and supports various applications that benefit society. The dataset's non-profit nature aligns with Filecoin's mission to support public goods and open knowledge initiatives. The data is crucial for environmental monitoring, infrastructure planning, and disaster management, all of which have significant societal impacts. By storing this data on Filecoin, the project ensures its long-term availability and accessibility, fostering innovation and collaboration across various fields. The open data nature of this project is a key justification for the DataCap request, as it aligns with the principles of transparency, knowledge sharing, and public benefit.
Data Storage and Retrieval
The data is currently stored on AWS. Migrating it to the Filecoin network will leverage Filecoin’s decentralized storage capabilities, enhancing data preservation and accessibility. The project plans to distribute the data to storage providers through HTTP or FTP servers, shipped hard drives, and IPFS, a combination chosen to accommodate the dataset size and the capabilities of the storage providers. The expected retrieval frequency is yearly, primarily for research and periodic updates. The data is planned to be stored on Filecoin for 1.5 to 2 years. Storage deals will be made in Greater China and other parts of Asia. These regions were selected to diversify the storage locations and support global accessibility. Over this period the data remains available for ongoing research and applications.
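Filecoin measures time in epochs of 30 seconds, so the planned storage period can also be expressed in epochs. The sketch below converts the 1.5 to 2 year range; the helper name `years_to_epochs` and the 365-day year are assumptions made for illustration, not part of the application.

```python
EPOCH_SECONDS = 30                               # Filecoin mainnet epoch duration
EPOCHS_PER_DAY = 24 * 60 * 60 // EPOCH_SECONDS   # 2880 epochs per day


def years_to_epochs(years: float, days_per_year: int = 365) -> int:
    """Convert a storage duration in years to Filecoin epochs.

    days_per_year=365 is an assumption; leap days are ignored.
    """
    return round(years * days_per_year * EPOCHS_PER_DAY)


min_epochs = years_to_epochs(1.5)   # lower bound of the planned period
max_epochs = years_to_epochs(2.0)   # upper bound of the planned period

print(min_epochs, max_epochs)  # 1576800 2102400
```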
Storage Provider Selection
Storage providers were identified through Slack. The providers include:
- f03601451 Hong Kong
- f03609158 Hong Kong
- f02826762 Hong Kong
- f02827135 Hong Kong
- f02827010 Xinjiang
- f02825281 Xinjiang
- f03623232 Hong Kong
- f03610683 Hong Kong
These providers were chosen based on their location, reliability, and capacity. The project aims to make deals directly with these storage providers, adhering to the Fil+ guidelines to ensure data integrity and network alignment. The selection process involved evaluating the providers' track record, infrastructure, and commitment to the Filecoin network. By working with a diverse set of providers, the project aims to enhance data redundancy and ensure resilience against potential failures. The storage providers' locations in Asia reflect the project's goal to distribute the data globally, making it accessible to users in different regions.
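As a rough illustration of how the listed providers are spread across regions, the sketch below tallies them by location. The provider IDs and locations are copied from the list above; the mapping of one full replica to each provider is an assumption made here, since the application does not say how the eight replicas will be assigned.

```python
from collections import Counter

# Provider IDs and locations as listed in this application.
PROVIDERS = [
    ("f03601451", "Hong Kong"),
    ("f03609158", "Hong Kong"),
    ("f02826762", "Hong Kong"),
    ("f02827135", "Hong Kong"),
    ("f02827010", "Xinjiang"),
    ("f02825281", "Xinjiang"),
    ("f03623232", "Hong Kong"),
    ("f03610683", "Hong Kong"),
]

# Number of providers in each region.
by_region = Counter(region for _, region in PROVIDERS)

# Assumption (not stated in the application): one full replica per provider.
replicas_per_provider = {provider_id: 1 for provider_id, _ in PROVIDERS}
total_replicas = sum(replicas_per_provider.values())

# Fraction of replicas held in each region under that assumption.
region_share = {region: count / total_replicas for region, count in by_region.items()}

print(dict(by_region))   # {'Hong Kong': 6, 'Xinjiang': 2}
print(region_share)      # {'Hong Kong': 0.75, 'Xinjiang': 0.25}
```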
Data Preparation and Distribution
The Data Preparer is located in India. The data will be prepared using standard data processing tools and techniques; the specific tooling and technical details are yet to be finalized, but the process will ensure the data is properly formatted for storage on the Filecoin network. Preparation will include data validation, format conversion if necessary, and organization to facilitate easy retrieval and use. The data will then be distributed to storage providers using a combination of HTTP/FTP servers, hard drive shipping, and IPFS, which accommodates the dataset's size and the capabilities of the storage providers. The goal is to make the data readily available on the Filecoin network while maintaining its integrity and quality.
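Because the tooling is not yet finalized, the following is only a sketch of what one validation step might look like, not the project's actual pipeline. It checks that a file begins with the `LASF` signature defined by the LAS format and reads the version bytes from the public header. The function name `check_las_header` is hypothetical, and the sample header is synthetic rather than taken from the catalog.

```python
import struct
from typing import Tuple


def check_las_header(header: bytes) -> Tuple[int, int]:
    """Return the (major, minor) LAS version from the start of a file.

    Raises ValueError if the bytes do not look like a LAS public header.
    """
    if len(header) < 26:
        raise ValueError("header too short to contain a LAS version")
    if header[:4] != b"LASF":
        raise ValueError("missing LASF file signature")
    # Version major and minor are single bytes at offsets 24 and 25.
    major, minor = struct.unpack_from("<BB", header, 24)
    return major, minor


# Synthetic example: signature, 20 zero bytes of other header fields, version 1.4.
sample = b"LASF" + bytes(20) + bytes([1, 4])
print(check_las_header(sample))  # (1, 4)
```

In practice the first 26 bytes of each tile could be read from disk and passed to this function before the file is packaged for storage.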
Data Accessibility and Retrieval
The dataset is confirmed as a public dataset that can be retrieved by anyone on the network. This commitment to open access ensures the broadest possible impact and utilization of the data. The expected retrieval frequency is yearly, primarily for research purposes and periodic updates. The data will be accessible through standard Filecoin retrieval mechanisms, leveraging the network's decentralized storage infrastructure. The project aims to provide clear instructions and documentation to facilitate data retrieval, ensuring users can easily access and utilize the data for their respective applications. The public nature of the dataset aligns with the principles of open science and knowledge sharing, maximizing the societal benefits of this valuable resource. The long-term storage plan ensures that the data remains accessible for future research and applications, supporting ongoing innovation and discovery.
Conclusion
This DataCap application for the Indiana Statewide Elevation Catalog aims to secure the long-term preservation and accessibility of a valuable public dataset. By storing this data on the Filecoin network, the project will enhance its resilience, accessibility, and utility for researchers, government agencies, and the public. The requested DataCap of 300 TiB will enable the creation of eight replicas, ensuring data redundancy and integrity. The project’s commitment to open data principles and strategic distribution plan will maximize the impact of this resource, supporting various applications and fostering innovation. This initiative exemplifies the potential of decentralized storage to support public goods and advance scientific knowledge.