After 14 issues, 17 PRs, and over 40 commits, GSoC has finally come to an end. Here is a short demo of the major features added to Pharo’s libraries:

Most of Phase 3 was spent on creating the DataSet library, tweaking it, and writing blog posts.

The issues and PRs created up to Phase 2 are:

Here is the list of blog posts describing the contributions to Pharo:

  1. Summary of the proposal
  2. Community bonding period
  3. Phase 1 progress
  4. Phase 2 progress
  5. Phase 3 blogs: DataSet library, DataFrame IO comparison and DataFrame Joins.

You can view all the blog posts at this link.

Future work

Improving DataFrameTypeDetector

As described in a previous blog post, improving the DataFrameTypeDetector class will lead to faster load times for CSV files.

DataSet library

Identifying and adding more datasets to the library. Also, modifying the library so that adding a new dataset requires only a minimal amount of code.


Creating data analysis tutorials using the new DataSet library, DataFrame and PolyMath together.

Additional improvements to DataFrame

Refer to the roadmap of the DataFrame library. Features such as database IO, time series, and a spec editor still need to be added to the library.