Commit 3eb5d816 authored by Davis King

Added comments about using randomly_subsample() when using compute_mean_squared_distance() with large datasets.

--HG--
extra : convert_revision : svn%3Afdd8eb12-d10e-0410-9acb-85c331704f74/trunk%403815
parent 07093165
@@ -51,8 +51,9 @@ int main()
     // Here we set the kernel we want to use for training. The radial_basis_kernel
     // has a parameter called gamma that we need to determine. As a rule of thumb, a good
     // gamma to try is 1.0/(mean squared distance between your sample points). So
-    // below we are using a similar value.
-    const double gamma = 3.0/compute_mean_squared_distance(samples);
+    // below we are using a similar value computed from at most 2000 randomly selected
+    // samples.
+    const double gamma = 3.0/compute_mean_squared_distance(randomly_subsample(samples, 2000));
     cout << "using gamma of " << gamma << endl;
     trainer.set_kernel(kernel_type(gamma));
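For context, here is a minimal standalone sketch of the heuristic the new comment describes: estimate the mean squared distance between sample points from at most 2000 randomly selected samples and derive gamma from it. Subsampling matters because compute_mean_squared_distance() looks at every pair of the samples it is given, so its cost grows quadratically with the sample count. The gamma expression is taken from the hunk above; the 2D sample_type, the toy grid data, and the rest of the program are illustrative assumptions, not the example file's actual contents.

#include <iostream>
#include <vector>
#include <dlib/svm.h>        // radial_basis_kernel, compute_mean_squared_distance
#include <dlib/statistics.h> // randomly_subsample

using namespace dlib;

int main()
{
    // Column vectors holding 2D samples, as is typical in the dlib examples.
    typedef matrix<double, 2, 1> sample_type;
    typedef radial_basis_kernel<sample_type> kernel_type;

    // Illustrative data: points on a regular grid.
    std::vector<sample_type> samples;
    for (int r = -20; r <= 20; ++r)
    {
        for (int c = -20; c <= 20; ++c)
        {
            sample_type samp;
            samp(0) = r;
            samp(1) = c;
            samples.push_back(samp);
        }
    }

    // The rule of thumb from the comment: gamma near 1.0/(mean squared distance
    // between sample points).  Estimating that distance from at most 2000
    // randomly selected samples keeps the cost bounded on large datasets.
    const double gamma = 3.0/compute_mean_squared_distance(randomly_subsample(samples, 2000));
    std::cout << "using gamma of " << gamma << std::endl;

    // The kernel would then be handed to whatever trainer is in use, e.g.
    // trainer.set_kernel(kernel_type(gamma)) as in the hunk above.
    kernel_type kernel(gamma);

    return 0;
}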
@@ -101,8 +101,9 @@ int main()
     // you should try the same gamma that you are using for training. But if you don't
     // have a particular gamma in mind then you can use the following function to
     // find a reasonable default gamma for your data. Another reasonable way to pick a gamma
-    // is often to use 1.0/compute_mean_squared_distance(samples). This second way has the
-    // bonus of being quite fast.
+    // is often to use 1.0/compute_mean_squared_distance(randomly_subsample(samples, 2000)).
+    // It computes the mean squared distance between 2000 randomly selected samples and often
+    // works quite well.
     const double gamma = verbose_find_gamma_with_big_centroid_gap(samples, labels);
     // Next we declare an instance of the kcentroid object. It is used by rank_features()
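Along the same lines, below is a sketch of how the fast gamma heuristic mentioned in the new comment could slot into the feature-ranking setup this file demonstrates. The kcentroid constructor arguments (tolerance 0.001, dictionary cap of 25) and the 3-feature toy dataset with its labeling rule are assumptions made for illustration; only the gamma expression, verbose_find_gamma_with_big_centroid_gap(), kcentroid, and rank_features() come from the diff and its context lines.

#include <cmath>
#include <iostream>
#include <vector>
#include <dlib/svm.h>        // kcentroid, rank_features, radial_basis_kernel
#include <dlib/statistics.h> // randomly_subsample

using namespace dlib;

int main()
{
    // Illustrative 3-feature samples: the first two features determine the
    // class (distance from the origin), the third varies but carries
    // essentially no label information.
    typedef matrix<double, 3, 1> sample_type;
    typedef radial_basis_kernel<sample_type> kernel_type;

    std::vector<sample_type> samples;
    std::vector<double> labels;
    for (int x = -15; x <= 15; ++x)
    {
        for (int y = -15; y <= 15; ++y)
        {
            sample_type samp;
            samp(0) = x;
            samp(1) = y;
            samp(2) = (x*7 + y*13) % 5;
            samples.push_back(samp);
            labels.push_back(std::sqrt(double(x*x + y*y)) <= 8 ? +1 : -1);
        }
    }

    // The quick way to pick a gamma described in the comment.  The slower
    // alternative shown in the context line above is
    // verbose_find_gamma_with_big_centroid_gap(samples, labels).
    const double gamma = 1.0/compute_mean_squared_distance(randomly_subsample(samples, 2000));

    // A kcentroid for rank_features(): kernel, numerical tolerance, and an
    // upper limit on dictionary vectors (values assumed here).
    kcentroid<kernel_type> kc(kernel_type(gamma), 0.001, 25);

    // Rank the 3 features; the informative first two should score highest.
    std::cout << rank_features(kc, samples, labels) << std::endl;

    return 0;
}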