GSoC 2019 Phase 1 progress
Today (June 24th) marks the completion of phase 1 of GSoC. Here is a summary of what I have been working on for the past few weeks:
PolyMath library
I spent most of my community bonding period, as well as the first week of phase 1, on the PolyMath library: exploring the codebase, fixing bugs, and adding features. My favorite part was adding the t-SNE implementation! Here is a peek at a couple of visualizations:
The visualizations are made with Roassal3; be sure to check it out at https://github.com/ObjectProfile/Roassal3.
Here is the list of issues I created:
- Math-TSNE is incomplete
- PMVector > < operators modify in-place
- PMVector sum is extremely slow
- PMStandardScalar fails when scale = 0
And here is the list of PRs that solved them:
- Implementing the t-SNE algorithm
- Fix PMVector comparison operators
- Removed PMVector sum for speedup
- Removed == method from PMVector
- Refactored PMTSNE to include steps
- Added visualization examples for PMTSNE
- Fixed ZeroDivideError in PMStandardizationScaler
DataFrame library
The main focus for phase 1 was getting the library to work with missing data: initializing with missing values, reading files that contain them, and providing methods to fill them in.
Here is the list of issues created by me:
- Handling missing data - collection of all issues related to missing data
- DataFrame select: fails when no rows are selected
- DataSeries does not support boolean operators with scalars
- Add JSON read/write support
- DataFrameInternal - Using OrderedCollection over Array2D
- DataFrame addRow does not consider key order
PRs which solve some of these issues:
- Boolean operators for DataSeries
- Added support for DataFrame init with missing values
- Added ability to remove nil from DataFrame and Series
- Added DataSeries fillNilsWith method
- Added method to convert missing values from files
- DataFrameTypeDetector now works with nil
Next steps
Some of the features planned for phase 2 include implementing joins, adding JSON support, and a new DataSet library. I will be writing a detailed post after the first evaluation (first week of July). Be sure to star and follow DataFrame and PolyMath on GitHub for updates on the progress! :)