dlib commit cc2de0e9, authored Nov 23, 2012 by Davis King
Improved the ranking example
parent cc708d04
Showing 1 changed file with 57 additions and 14 deletions

examples/svm_rank_ex.cpp  (+57, -14)
@@ -9,7 +9,7 @@
     In this example, we will create a simple test dataset and show how to learn
-    a ranking function on it.  The purpose of the function will be to give
+    a ranking function from it.  The purpose of the function will be to give
     "relevant" objects higher scores than "non-relevant" objects.  The idea is
     that you use this score to order the objects so that the most relevant
     objects come to the top of the ranked list.
@@ -43,16 +43,17 @@ int main()
     // should rank higher than other vectors.  So what we do is make
     // examples of relevant (i.e. high ranking) and non-relevant (i.e. low
     // ranking) vectors and store them into a ranking_pair object like so:
-    ranking_pair<sample_type> query;
+    ranking_pair<sample_type> data;
     sample_type samp;

     // Make one relevant example.
     samp = 1, 0;
-    query.relevant.push_back(samp);
+    data.relevant.push_back(samp);

     // Now make a non-relevant example.
     samp = 0, 1;
-    query.nonrelevant.push_back(samp);
+    data.nonrelevant.push_back(samp);

     // Now that we have some data, we can use a machine learning method to
     // learn a function that will give high scores to the relevant vectors
@@ -66,17 +67,29 @@ int main()
     // linear_kernel.
     typedef linear_kernel<sample_type> kernel_type;

     // Now make a trainer and tell it to learn a ranking function based on
     // our data.
     svm_rank_trainer<kernel_type> trainer;
-    decision_function<kernel_type> rank = trainer.train(query);
+    decision_function<kernel_type> rank = trainer.train(data);

     // Now if you call rank on a vector it will output a ranking score.  In
     // particular, the ranking score for relevant vectors should be larger
     // than the score for non-relevant vectors.
-    cout << "ranking score for a relevant vector:     " << rank(query.relevant[0]) << endl;
-    cout << "ranking score for a non-relevant vector: " << rank(query.nonrelevant[0]) << endl;
+    cout << "ranking score for a relevant vector:     " << rank(data.relevant[0]) << endl;
+    cout << "ranking score for a non-relevant vector: " << rank(data.nonrelevant[0]) << endl;
+    // These output the following:
+    /*
+        ranking score for a relevant vector:     0.5
+        ranking score for a non-relevant vector: -0.5
+    */

     // If we want an overall measure of ranking accuracy, we can find out
     // how often a non-relevant vector was ranked ahead of a relevant
-    // vector like so.  This is a number between 0 and 1.  A value of 1
-    // means everything was ranked perfectly.
-    cout << "accuracy: " << test_ranking_function(rank, query) << endl;
+    // vector using test_ranking_function().  In this case, it returns a
+    // value of 1, indicating that the rank function outputs a perfect
+    // ranking.
+    cout << "accuracy: " << test_ranking_function(rank, data) << endl;

     // We can also see the ranking weights:
     cout << "learned ranking weights: \n" << rank.basis_vectors(0) << endl;
@@ -87,12 +100,42 @@ int main()
+    // In the above example, our data contains just two sets of objects.
+    // The relevant set and non-relevant set.  The trainer is attempting to
+    // find a ranking function that gives every relevant vector a higher
+    // score than every non-relevant vector.  Sometimes what you want to do
+    // is a little more complex than this.
+    //
+    // For example, in the web page ranking example we have to rank pages
+    // based on a user's query.  In this case, each query will have its own
+    // set of relevant and non-relevant documents.  What might be relevant
+    // to one query may well be non-relevant to another.  So in this case
+    // we don't have a single global set of relevant web pages and another
+    // set of non-relevant web pages.
+    //
+    // To handle cases like this, we can simply give multiple ranking_pair
+    // instances to the trainer.  Each ranking_pair representing the
+    // relevant/non-relevant sets for a particular query.  An example is
+    // shown below (for simplicity, we reuse our data from above to make 4
+    // identical "queries").
     std::vector<ranking_pair<sample_type> > queries;
-    queries.push_back(query);
-    queries.push_back(query);
-    queries.push_back(query);
-    queries.push_back(query);
+    queries.push_back(data);
+    queries.push_back(data);
+    queries.push_back(data);
+    queries.push_back(data);

+    // We train just as before.
     rank = trainer.train(queries);

+    // Now that we have multiple ranking_pair instances, we can also use
+    // cross_validate_ranking_trainer().  This performs cross-validation by
+    // splitting the queries up into folds.  That is, it lets the trainer
+    // train on a subset of ranking_pair instances and tests on the rest.
+    // It does this over 4 different splits and returns the overall ranking
+    // accuracy based on the held out data.
     cout << "cv-accuracy: " << cross_validate_ranking_trainer(trainer, queries, 4) << endl;
 }
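For readers who want to try the example as it reads after this commit, below is a minimal, self-contained sketch that assembles the pieces shown in the diff: building a ranking_pair, training svm_rank_trainer with a linear kernel, scoring vectors, checking accuracy with test_ranking_function(), and cross-validating over several "queries" with cross_validate_ranking_trainer(). It is not a verbatim copy of examples/svm_rank_ex.cpp: the sample_type definition and the final dot-product check are illustrative additions, and that check assumes (as the "learned ranking weights" printout suggests) that the decision_function returned by svm_rank_trainer keeps its weight vector in basis_vectors(0) with no offset.

// A consolidated sketch (not part of the commit) of the post-commit example.
#include <dlib/svm.h>
#include <iostream>
#include <vector>

using namespace dlib;
using namespace std;

int main()
{
    // Each object to be ranked is a 2-dimensional column vector.
    typedef matrix<double,2,1> sample_type;
    typedef linear_kernel<sample_type> kernel_type;

    // One relevant and one non-relevant example, stored in a ranking_pair.
    ranking_pair<sample_type> data;
    sample_type samp;

    samp = 1, 0;                        // the relevant example
    data.relevant.push_back(samp);

    samp = 0, 1;                        // the non-relevant example
    data.nonrelevant.push_back(samp);

    // Learn a function that scores relevant vectors above non-relevant ones.
    svm_rank_trainer<kernel_type> trainer;
    decision_function<kernel_type> rank = trainer.train(data);

    cout << "ranking score for a relevant vector:     " << rank(data.relevant[0]) << endl;
    cout << "ranking score for a non-relevant vector: " << rank(data.nonrelevant[0]) << endl;

    // Overall ranking accuracy and the learned weight vector.
    cout << "accuracy: " << test_ranking_function(rank, data) << endl;
    cout << "learned ranking weights: \n" << rank.basis_vectors(0) << endl;

    // Extra check, not in the commit: with a linear kernel the score is just a
    // dot product with the learned weight vector (assuming the weights sit in
    // basis_vectors(0) with no offset), which is why the scores mirror the weights.
    const sample_type w = rank.basis_vectors(0);
    cout << "dot(w, relevant vector):     " << dot(w, data.relevant[0])    << endl;
    cout << "dot(w, non-relevant vector): " << dot(w, data.nonrelevant[0]) << endl;

    // Multiple "queries": each ranking_pair holds the relevant/non-relevant
    // sets for one query.  As in the example, we just reuse the same data
    // four times to keep things simple.
    std::vector<ranking_pair<sample_type> > queries;
    queries.push_back(data);
    queries.push_back(data);
    queries.push_back(data);
    queries.push_back(data);

    // Train on all the queries at once.
    rank = trainer.train(queries);

    // 4-fold cross-validation over the ranking_pair instances.
    cout << "cv-accuracy: " << cross_validate_ranking_trainer(trainer, queries, 4) << endl;
}

Note that in newer dlib releases test_ranking_function() and cross_validate_ranking_trainer() report two numbers (an ordering accuracy and a mean average precision) rather than the single accuracy value described in the comments above, so the printed output may look slightly different; the calls themselves are unchanged.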