Commit 5b8a4fe3 in serviceRec, authored 4 years ago by 郭羽
Feature engineering optimization
parent 4777e78a
Showing 1 changed file with 4 additions and 2 deletions
spark/featureEng.py
@@ -764,8 +764,6 @@ if __name__ == '__main__':
    expDF = spark.sql(expSql)
    # ratingDF = samplesNegAndUnion(clickDF,expDF)
    ratingDF = clickDF.union(expDF)
    print("pos size:" + str(clickDF.count()), "neg size:" + str(expDF.count()))
    ratingDF = ratingDF.withColumnRenamed("time_stamp", "timestamp")\
        .withColumnRenamed("device_id", "userid")\
        .withColumnRenamed("card_id", "itemid")\
...
@@ -782,6 +780,10 @@ if __name__ == '__main__':
    print("添加label...")  # "Adding label..."
    ratingSamplesWithLabel = addSampleLabel(ratingDF)
    posCount = ratingSamplesWithLabel.filter(F.col("label") == 1).count()
    negCount = ratingSamplesWithLabel.filter(F.col("label") == 0).count()
    print("pos size:" + str(posCount), "neg size:" + str(negCount))
    # Data dictionary
    dataVocab = {}
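
Review note: the two `filter(...).count()` calls in this hunk each trigger a separate Spark job over `ratingSamplesWithLabel`. A possible alternative, sketched here and not part of the commit, is to get both sizes from one grouped aggregation. The helper names `label_counts` and `format_sizes` are hypothetical, and the sketch assumes `label` holds the integers 1 and 0, as the filters in the diff suggest.

```python
def label_counts(df, label_col="label"):
    """Return {label_value: row_count} from a single grouped aggregation.

    `df` is assumed to be a pyspark.sql.DataFrame. Only groupBy/count/collect
    are called on it, so this sketch needs no pyspark import of its own.
    """
    rows = df.groupBy(label_col).count().collect()
    # Index by field name: Row.count is the inherited tuple method,
    # not the aggregated "count" column.
    return {row[label_col]: row["count"] for row in rows}


def format_sizes(counts):
    """Build the same 'pos size / neg size' message the diff prints."""
    pos = counts.get(1, 0)  # assumption: positive samples carry label 1
    neg = counts.get(0, 0)  # assumption: negative samples carry label 0
    return "pos size:" + str(pos) + " neg size:" + str(neg)
```

In the script this would be called as `print(format_sizes(label_counts(ratingSamplesWithLabel)))`. If the DataFrame is reused afterwards, caching it before counting may also avoid recomputing the labeling step, though whether that pays off depends on the data size and the cluster.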
...