郭羽 / serviceRec
Commit ef5bbbd0
authored 3 years ago by 宋柯
Model launch (模型上线)
parent 877cf714
Showing 1 changed file with 13 additions and 5 deletions
spark/featureEngSk.py
@@ -1103,19 +1103,17 @@ if __name__ == '__main__':
    spark = get_spark("SERVICE_FEATURE_CSV_EXPORT_SK")
    spark.sparkContext.setLogLevel("ERROR")
    # Fetch click and exposure data
    # clickDF, expDF, ratingDF, startDay, endDay = get_click_exp_rating_df(trainDays, spark)
    clickDF, expDF, ratingDF, startDay, endDay = get_click_exp_rating_df(trainDays, spark)
    # Item ES feature
    # itemEsFeatureDF = get_item_es_feature_df()
    itemEsFeatureDF = get_item_es_feature_df()
    # Compute item statistical features
    # clickStaticFeatures, expStaticFeatures = getItemStaticFeatures(itemStatisticStartDays + trainDays + 1, startDay, endDay)
    clickStaticFeatures, expStaticFeatures = getItemStaticFeatures(itemStatisticStartDays + trainDays + 1, startDay, endDay)
    # Compute item statistical features for online inference
    predictClickStaticFeatures, predictExpStaticFeatures = getPredictItemStaticFeatures(itemStatisticStartDays)
    predictClickStaticFeatures.show(100, False)
    predictExpStaticFeatures.show(100, False)
    # User profile feature
    userProfileFeatureDF = getUserProfileFeature(spark, addDays(-trainDays - 1, format="%Y-%m-%d"), addDays(-1, format="%Y-%m-%d"))
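The getUserProfileFeature call above takes a date window that starts trainDays + 1 days back and ends yesterday. The addDays helper itself is not part of this diff, so the following is only a minimal sketch of how such a helper might behave, assuming it shifts the current date by a number of days and formats the result with strftime. The name add_days, its default format, and the today parameter are hypothetical and exist only to make the example deterministic.

```python
from datetime import date, timedelta
from typing import Optional


def add_days(n: int, format: str = "%Y%m%d", today: Optional[date] = None) -> str:
    """Hypothetical stand-in for addDays: shift a base date by n days and format it."""
    # Fall back to the real current date when no base date is injected.
    base = today if today is not None else date.today()
    return (base + timedelta(days=n)).strftime(format)


# Assumed values for illustration only: a 30-day training window and a fixed base date.
train_days = 30
base = date(2021, 3, 15)

start_day = add_days(-train_days - 1, format="%Y-%m-%d", today=base)
end_day = add_days(-1, format="%Y-%m-%d", today=base)
print(start_day, end_day)
```

Passing the base date explicitly keeps the window reproducible when a job is re-run for a past day; whether the real helper supports that is not visible here.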
@@ -1221,6 +1219,15 @@ if __name__ == '__main__':
    print("训练数据写入 耗时s:{}".format(time.time() - write_time_start))  # logs the training-data write time in seconds
    # Store features for online prediction
    # card_id | ITEM_NUMERIC_click_count_sum | ITEM_NUMERIC_click_count_avg | ITEM_NUMERIC_click_count_stddev
    predictClickStaticDF = predictClickStaticFeatures.toPandas()
    # card_id | ITEM_NUMERIC_exp_count_sum | ITEM_NUMERIC_exp_count_avg | ITEM_NUMERIC_exp_count_stddev
    predictExpStaticDF = predictExpStaticFeatures.toPandas()
    #
    print("总耗时:{} mins".format((time.time() - start) / 60))  # logs the total run time in minutes
    spark.stop()
\ No newline at end of file
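The column comments in the hunk above describe one row per card_id with a sum, an average, and a standard deviation of click (or exposure) counts. The body of getPredictItemStaticFeatures is not shown in this diff, so the following is a hedged sketch in plain Python rather than Spark of what such an aggregation could look like. It assumes the input is a sequence of (card_id, daily click count) pairs and that the standard deviation is the sample standard deviation; the function name item_click_stats and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Optional, Tuple


def item_click_stats(
    rows: Iterable[Tuple[str, int]],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Group daily click counts by card_id and compute sum, avg, and sample stddev."""
    counts: Dict[str, List[int]] = defaultdict(list)
    for card_id, click_count in rows:
        counts[card_id].append(click_count)

    stats: Dict[str, Dict[str, Optional[float]]] = {}
    for card_id, values in counts.items():
        stats[card_id] = {
            "ITEM_NUMERIC_click_count_sum": float(sum(values)),
            "ITEM_NUMERIC_click_count_avg": float(mean(values)),
            # Sample standard deviation is undefined for a single observation,
            # so this sketch reports None in that case (an assumption).
            "ITEM_NUMERIC_click_count_stddev": stdev(values) if len(values) > 1 else None,
        }
    return stats


# Made-up daily click counts for two cards.
daily_clicks = [("card_a", 2), ("card_a", 4), ("card_a", 6), ("card_b", 5)]
features = item_click_stats(daily_clicks)
for card_id, row in sorted(features.items()):
    print(card_id, row)
```

Items with a single day of data have no defined sample deviation, so whatever consumes the exported features would need to handle a missing value in the stddev column.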