The community bonding period ended on May 27th. Here is a summary of the work I did up to that date. It was suggested that I contribute to PolyMath, the parent library of DataFrame, which is used for scientific computing. Here are the issues I solved:
1. t-SNE implementation in Pharo
t-SNE is a visualization algorithm that lets you explore a high-dimensional input dataset, reveal its clusters, and so on. The existing implementation was incomplete, as mentioned in Issue#115. I worked towards completing the algorithm and had a running version as of commit 7f3a812. We started using GitHub Boards to track the progress, which can be found here. Overall, I spent about 2 weeks learning the details of the algorithm and implementing it. I’ll be writing a separate post detailing the algorithm soon.
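This post doesn't go into the algorithm's internals. As a rough illustration only, here is one core ingredient of t-SNE: the matrix of low-dimensional affinities q_ij, computed with a heavy-tailed Student-t kernel over pairwise distances between the map points. The sketch is plain Python, not Pharo and not the PolyMath code, and the function name `student_t_affinities` is hypothetical.

```python
def student_t_affinities(points):
    """Low-dimensional joint probabilities q_ij used by t-SNE (sketch).

    points: list of coordinate tuples (the map points y_i), at least two.
    Returns an n x n list of lists with q[i][j] proportional to
    1 / (1 + ||y_i - y_j||^2), normalised over all pairs i != j.
    """
    n = len(points)
    kernel = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # q_ii is defined to be 0
            # Squared Euclidean distance between map points i and j.
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            # Student-t kernel with one degree of freedom (heavy-tailed).
            kernel[i][j] = 1.0 / (1.0 + sq_dist)
            total += kernel[i][j]
    # Normalise so that all off-diagonal entries sum to 1.
    return [[value / total for value in row] for row in kernel]


q = student_t_affinities([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)])
for row in q:
    print([round(v, 3) for v in row])
```

Nearby map points get larger q_ij than distant ones. In full t-SNE, these values are compared with the high-dimensional affinities p_ij through the KL divergence, and gradient descent on that divergence moves the map points.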
2. Speeding up PMVector sum
While profiling t-SNE, I discovered that the existing implementation of `sum` in PMVector was very slow. `sum` computes the sum of a PMVector's elements, and it is also used in PMMatrix to calculate the sums of rows and columns. You can read more about the issue here: Issue#125. It was easily fixed by relying on the parent class's `sum`, which sped it up considerably.
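The post doesn't say what made the old implementation slow, so the sketch below doesn't try to reproduce it. It only illustrates the shape of the fix in Python, assuming a vector class that subclasses a generic collection: the vector does not override `sum` and simply inherits the parent's version, and the matrix's row and column sums are built on that same method, so speeding up the one reduction speeds up all three. The classes `Collection`, `Vector`, `Matrix` and the methods `sum_rows` and `sum_columns` are hypothetical stand-ins, not PolyMath's API.

```python
import builtins


class Collection(list):
    """Hypothetical stand-in for a generic parent collection class."""

    def sum(self):
        # A single efficient reduction, implemented once in the parent.
        return builtins.sum(self)


class Vector(Collection):
    """Hypothetical vector type. It does not override sum(), so it
    inherits the parent's implementation instead of re-implementing it."""


class Matrix:
    """Hypothetical row-major matrix whose row and column sums reuse
    Vector.sum, so they benefit from any speed-up of that method."""

    def __init__(self, rows):
        self.rows = [Vector(r) for r in rows]

    def column(self, j):
        return Vector(row[j] for row in self.rows)

    def sum_rows(self):
        return Vector(row.sum() for row in self.rows)

    def sum_columns(self):
        n_cols = len(self.rows[0]) if self.rows else 0
        return Vector(self.column(j).sum() for j in range(n_cols))


m = Matrix([[1, 2, 3], [4, 5, 6]])
print(m.sum_rows())     # [6, 15]
print(m.sum_columns())  # [5, 7, 9]
```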
3. PMVector comparison operators
PMVector had a redundant and possibly incorrect `==` operator, which was removed (Issue#90, PR#108). Also, comparison operators such as `<` modified the source vector in-place, causing unintuitive behavior when working with PMMatrix (Issue#122, PR#123).
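The PolyMath code itself isn't shown here, so this hypothetical Python sketch just contrasts the two behaviors for an element-wise comparison against a scalar: one version overwrites the receiver's elements with booleans, the other returns a new vector and leaves the receiver untouched. The `Vector` class and the `less_than_in_place` method are illustrative names, not PolyMath's API.

```python
class Vector(list):
    """Hypothetical vector type contrasting two ways to implement an
    element-wise comparison against a scalar."""

    def less_than_in_place(self, threshold):
        # Problematic variant: overwrites the receiver's own elements with
        # booleans, so the original numbers are lost after the comparison.
        for i, value in enumerate(self):
            self[i] = value < threshold
        return self

    def __lt__(self, threshold):
        # Non-mutating variant: builds and returns a new Vector of booleans,
        # leaving the receiver untouched.
        return Vector(value < threshold for value in self)


v = Vector([1, 5, 2])
mask = v < 3
print(mask)  # [True, False, True]
print(v)     # [1, 5, 2], unchanged
```

The difference matters, for example, when a matrix stores its rows as vector objects: with the in-place variant, comparing a row would silently change the matrix itself.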
The first two weeks were mostly spent learning about PolyMath and its classes, integrating Roassal and DataFrame with PolyMath, and exploring the DataFrame codebase. Overall, it has been a productive three weeks!