Spatial data analysis on CSC's new Puhti supercomputer

In autumn 2019 CSC's new computing environment was opened to users, including the Puhti supercomputer and the Allas object storage service. The new documentation is available at https://docs.csc.fi

Puhti

Puhti is meant for large-scale data analysis and simulations and provides more computing power than a normal desktop machine. Puhti replaces the older Taito supercomputer. A general introduction to spatial data analysis in the CSC computing environment is given here.

Puhti software

Several GIS applications have been installed on Puhti: Python and R with spatial libraries, FORCE, GDAL, LasTools, mapnik, OrfeoToolbox, PDAL, QGIS, SagaGIS, SNAP, sen2cor, sofi3D, solaris, Zonation.

  • The Python GIS environment (geoconda) has some new libraries, e.g. ArcGIS Python API, xarray and dask, the ncview GUI, and libraries for accessing Allas (boto3, swiftclient).
  • Spatial R libraries are now part of the main R environment in Puhti, but have their own documentation page: GIS R libraries. R-env also includes the aws.s3 library for accessing Allas.
  • The Keras and Tensorflow 2.0 modules for machine learning include the geopandas and rasterio packages, so that spatial data can be read into deep learning models.
  • solaris is a new Python tool for image segmentation using deep learning models.
  • There are currently no plans to install GRASS or Taudem on Puhti. If these or some other software is needed, or you notice any problems, please contact the CSC service desk.

Puhti GIS data

Puhti has a lot of Finnish GIS data available locally, including almost all Paituli data, all SYKE open data, LUKE VMIs and satellite mosaics produced by SYKE and FMI in the Paikkatietoalusta project.

  • The data is in the /appl/data/geo directory.
  • New datasets are available: Sentinel-1 SAR mosaics, Sentinel-2 index mosaics, historical Landsat satellite image mosaics and historical Landsat NDVI mosaics.
  • Puhti also includes NLS infrared orthoimages, which were not available in Taito.
  • The 10m DEM data has been updated to the latest available version.
  • NLS 2m DEM, lidar, infrared orthophotos, all SYKE datasets and satellite mosaics are updated in Puhti automatically every Monday.
  • There are no plans to move the automatically classified lidar data files to Puhti; only the lidar data classified with the help of stereo models is available. If some additional data is needed, or you notice any problems, please contact the CSC service desk.
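To see which datasets are available locally, the data directory can be browsed from Python. The following is only a minimal sketch: the function name list_geo_datasets is hypothetical, and the only detail taken from the text above is the /appl/data/geo path. It assumes each dataset sits in its own subdirectory, which may not hold for every dataset.

```python
from pathlib import Path

# Path taken from the article; only present when running on Puhti.
GEO_DATA_ROOT = "/appl/data/geo"


def list_geo_datasets(root=GEO_DATA_ROOT):
    """Return the sorted names of the subdirectories directly under root.

    Hypothetical helper: it assumes one subdirectory per dataset and
    returns an empty list if root does not exist (e.g. outside Puhti).
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(p.name for p in root_path.iterdir() if p.is_dir())


if __name__ == "__main__":
    for name in list_geo_datasets():
        print(name)
```

Run on a machine without the directory, the script simply prints nothing.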

Allas

Allas is the new object storage service, meant for storing all data during a project's lifetime. Allas can be used from Puhti, cPouta or a local desktop, and files can even be made public so that anybody can access them. Puhti has less local disk space than Taito: 10GB per user, 50GB per project and 1TB in scratch per project. Although these quotas can be increased by contacting the service desk, Allas is the long-term storage for bigger datasets.

  • Here is a migration tutorial for moving data from Taito to Allas and Puhti.
  • There is also a recorded webinar on YouTube about data migration from Taito to Allas and Puhti.
  • GDAL and GDAL-based tools (including several R and Python GIS libraries) support reading data directly from Allas. More info.
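As an illustration of reading directly from Allas, GDAL can address objects in S3-compatible storage through its /vsis3/ virtual file system. The sketch below only builds such a path and the related GDAL configuration options; it does not open any data. The helper names, the bucket and object names, and the a3s.fi endpoint are assumptions for illustration and should be checked against CSC's Allas documentation. Credentials are not handled here.

```python
# Assumption: host name of the Allas S3 endpoint; verify in CSC's documentation.
ALLAS_S3_ENDPOINT = "a3s.fi"


def allas_vsi_path(bucket, key):
    """Build a GDAL /vsis3/ path for an object in an S3-compatible bucket.

    Hypothetical helper; bucket and key are whatever names you used in Allas.
    """
    return f"/vsis3/{bucket}/{key.lstrip('/')}"


def gdal_options_for_allas(endpoint=ALLAS_S3_ENDPOINT):
    """Return GDAL configuration options that point /vsis3/ at the endpoint.

    AWS_S3_ENDPOINT is a GDAL configuration option; it can be set as an
    environment variable or with gdal.SetConfigOption().
    """
    return {"AWS_S3_ENDPOINT": endpoint}


if __name__ == "__main__":
    # Hypothetical bucket and object names.
    print(allas_vsi_path("my-bucket", "dem/tile.tif"))
    print(gdal_options_for_allas())
```

With the options applied and credentials configured, a path like this could be passed to GDAL-based readers such as gdal.Open, in the same way as a local file path.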

Any questions or comments can be sent to servicedesk@csc.fi.