I spent the last two weeks exploring PolyMath, DataFrame and Roassal. These three libraries were developed independently and serve different goals: PolyMath for scientific computing, DataFrame for data analysis, and Roassal for visualization. However, they work together cohesively, thanks to their class structure.
To demonstrate this, here is a piece of code utilising all three libraries:
"Iris plotting in Roassal"
reader := DataFrameCsvReader new.
fileRef := 'iris.csv' asFileReference.
df := DataFrame readFrom: fileRef using: reader.
df_x := df columns: #('Sepal length' 'Sepal width' 'Petal length' 'Petal width').
dataServer := PMMemoryBasedDataServer new.
dataServer data: df_x.
finder := PMClusterFinder new: 3 server: dataServer type: PMEuclideanCluster.
finder minimumRelativeClusterSize: 0.01.
clusters := finder evaluate.
y := DataSeries new name: 'Output'.
df withIndexDo: [ :row :index |
	y add: index -> (finder indexOfNearestCluster: ({row at: 'Sepal length'. row at: 'Sepal width'. row at: 'Petal length'. row at: 'Petal width'} asPMVector)) ].
df addColumn: y.
m := PMMatrix rows: df_x.
m := (PMStandardizationScaler new) fitAndTransform: m.
pca := PMPrincipalComponentAnalyserJacobiTransformation new componentsNumber: 2.
pca fit: m.
reduced := pca transform: m.
transformOutput := Dictionary newFrom: {
	3 -> 'Iris-setosa'.
	2 -> 'Iris-versicolor'.
	1 -> 'Iris-virginica' }.
df column: 'Output' transform: [ :column |
	column collect: [ :number | transformOutput at: number ] ].
df addColumn: (reduced atColumn: 1) named: 'PCA x'.
df addColumn: (reduced atColumn: 2) named: 'PCA y'.
b := RTGrapher new.
ds_setosa := RTData new.
ds_setosa label: 'Iris setosa'.
ds_setosa dotShape circle color: Color red trans.
ds_setosa points: (df select: [:row | ((row at: #Type) = 'Iris-setosa') & ((row at: #Type) = (row at: #Output))]).
ds_setosa interaction popupText: [ :row | 'Actual: ', (row at: #Type) asString, '. Predicted: ', (row at: #Output) asString ].
ds_setosa x: [ :row | row at: 'PCA x' ].
ds_setosa y: [ :row | row at: 'PCA y' ].
b add: ds_setosa.
ds_versicolor := RTData new.
ds_versicolor label: 'Iris versicolor'.
ds_versicolor dotShape circle color: Color blue trans.
ds_versicolor points: (df select: [:row | ((row at: #Type) = 'Iris-versicolor') & ((row at: #Type) = (row at: #Output))]).
ds_versicolor interaction popupText: [ :row | 'Actual: ', (row at: #Type) asString, '. Predicted: ', (row at: #Output) asString ].
ds_versicolor x: [ :row | row at: 'PCA x' ].
ds_versicolor y: [ :row | row at: 'PCA y' ].
b add: ds_versicolor.
ds_virginica := RTData new.
ds_virginica label: 'Iris virginica'.
ds_virginica dotShape circle color: Color green trans.
ds_virginica points: (df select: [:row | ((row at: #Type) = 'Iris-virginica') & ((row at: #Type) = (row at: #Output))]).
ds_virginica interaction popupText: [ :row | 'Actual: ', (row at: #Type) asString, '. Predicted: ', (row at: #Output) asString ].
ds_virginica x: [ :row | row at: 'PCA x' ].
ds_virginica y: [ :row | row at: 'PCA y' ].
b add: ds_virginica.
ds_misclassified := RTData new.
ds_misclassified label: 'Misclassified'.
ds_misclassified dotShape circle color: Color black trans.
ds_misclassified points: (df select: [:row | (row at: #Type) ~= (row at: #Output)]).
ds_misclassified interaction popupText: [ :row | 'Actual: ', (row at: #Type) asString, '. Predicted: ', (row at: #Output) asString ].
ds_misclassified x: [ :row | row at: 'PCA x' ].
ds_misclassified y: [ :row | row at: 'PCA y' ].
b add: ds_misclassified.
b.
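The `transformOutput` dictionary in the listing hard-codes which cluster index stands for which species. Those indices belong to this particular run, so it may be more robust to derive the mapping from the data by majority vote. Here is a minimal sketch using only core Pharo collections. The `pairs` array is a hypothetical hand-made sample (not taken from iris.csv) standing in for the cluster index and the `Type` value of each row, and the variable names are my own:

```smalltalk
| pairs votes mapping |

"Hypothetical sample: clusterIndex -> actualLabel, one association per row."
pairs := {
	3 -> 'Iris-setosa'. 3 -> 'Iris-setosa'.
	2 -> 'Iris-versicolor'. 2 -> 'Iris-versicolor'. 2 -> 'Iris-virginica'.
	1 -> 'Iris-virginica'. 1 -> 'Iris-virginica'. 1 -> 'Iris-versicolor' }.

"Collect the actual labels seen in each cluster."
votes := Dictionary new.
pairs do: [ :pair |
	(votes at: pair key ifAbsentPut: [ Bag new ]) add: pair value ].

"Map every cluster index to its most frequent label."
mapping := Dictionary new.
votes keysAndValuesDo: [ :cluster :labels |
	mapping
		at: cluster
		put: (labels asSet detectMax: [ :label | labels occurrencesOf: label ]) ].

mapping
```

Inspecting `mapping` in a Playground shows one species per cluster index. With real data, the result could take the place of the hand-written dictionary, assuming each cluster is dominated by a different species.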
Here is the resulting plot, exported from Roassal as HTML (go on, hover over the points!):
Note: I clustered first, then applied PCA. Exercise for the reader: try the opposite!
In the last week, I began implementing t-SNE for PolyMath. It is exciting to dissect the paper, working through the nitty-gritty details and translating them into code. By the end of next week, it will be the first paper I have implemented! I am also writing an accompanying post that will explain the math behind the algorithm, which is one of the reasons this week's post is small!
You can track the progress of the implementation at this link: t-SNE project board.