宋柯 / meta_base_code
Commit f46acb3e authored Nov 23, 2020 by litaolemo
update
parent 8757a3d8
Showing 1 changed file with 8 additions and 4 deletions:
task/spark_test.py (+8 -4)
task/spark_test.py @ f46acb3e
@@ -74,12 +74,13 @@ spark.sql("CREATE TEMPORARY FUNCTION is_json AS 'com.gmei.hive.common.udf.UDFJso
 spark.sql("CREATE TEMPORARY FUNCTION arrayMerge AS 'com.gmei.hive.common.udf.UDFArryMerge'")
 task_list = []
-task_days = 8
+task_days = 3
-for t in range(1, task_days):
+for t in range(2, task_days):
     day_num = 0 - t
     now = (datetime.datetime.now() + datetime.timedelta(days=day_num))
     last_30_day_str = (now + datetime.timedelta(days=-30)).strftime("%Y%m%d")
+    tomorrow_str = (datetime.datetime.now() + datetime.timedelta(days=day_num + 1)).strftime("%Y%m%d")
     today_str = now.strftime("%Y%m%d")
     today_str_format = now.strftime("%Y-%m-%d")
     yesterday_str = (now + datetime.timedelta(days=-1)).strftime("%Y%m%d")
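The first hunk narrows the loop and derives one more partition-date string from the same day offset. The helper below is a hypothetical sketch (the name `partition_dates` and the fixed `base` date are assumptions for reproducibility; the script itself uses `datetime.datetime.now()`) that recomputes the strings from the loop body and lists which offsets the loop visits before and after the change.

```python
import datetime


def partition_dates(base, t):
    """Hypothetical helper mirroring the loop body of task/spark_test.py.

    `base` stands in for datetime.datetime.now(); `t` is the loop variable.
    """
    day_num = 0 - t
    now = base + datetime.timedelta(days=day_num)
    return {
        "last_30_day_str": (now + datetime.timedelta(days=-30)).strftime("%Y%m%d"),
        # Mirrors the added line: computed from base (the script's second
        # now() call) plus day_num + 1, i.e. one day after the processed day.
        "tomorrow_str": (base + datetime.timedelta(days=day_num + 1)).strftime("%Y%m%d"),
        "today_str": now.strftime("%Y%m%d"),
        "today_str_format": now.strftime("%Y-%m-%d"),
        "yesterday_str": (now + datetime.timedelta(days=-1)).strftime("%Y%m%d"),
    }


base = datetime.datetime(2020, 11, 23)  # illustrative fixed date

# Offsets visited before and after the change to task_days and range().
old_offsets = list(range(1, 8))  # task_days = 8, range(1, task_days)
new_offsets = list(range(2, 3))  # task_days = 3, range(2, task_days)
print(old_offsets)  # [1, 2, 3, 4, 5, 6, 7]
print(new_offsets)  # [2]

for t in new_offsets:
    print(partition_dates(base, t))
```

With the new values the loop runs once, for the day two days before the run date. Because `tomorrow_str` calls `datetime.datetime.now()` again instead of reusing `now`, it is exactly one day after `today_str` only when both calls fall on the same calendar date; deriving it as `now + datetime.timedelta(days=1)` would avoid that midnight edge case.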
@@ -88,6 +89,9 @@ for t in range(1, task_days):
     new_urser_device_id_sql = r"""
     select t2.device_id as device_id from
     (select device_id from online.ml_device_day_active_status where partition_date = '{today_str}' and active_type in (1,2)) t2
+    LEFT join (
+    select first_device from online.ml_user_history_detail where partition_date = '{tomorrow_str}' and last_active_date = '{today_str}'
+    ) on first_device = t2.device_id
     LEFT JOIN
     (
     select distinct device_id
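This hunk adds a LEFT JOIN against `online.ml_user_history_detail`, and the third hunk then filters on `first_device is not null`, while the existing `spam_pv` and `dev` joins are filtered with `IS NULL`. The sketch below uses Python's built-in `sqlite3` with small made-up tables (all table names and rows are hypothetical, not the real Hive schema) to show the effect: a LEFT JOIN followed by `IS NOT NULL` keeps only matched rows, like an inner join, whereas `IS NULL` keeps only unmatched rows (an anti-join).

```python
import sqlite3

# Hypothetical stand-ins (not the real Hive tables):
#   active   ~ the t2 subquery over online.ml_device_day_active_status
#   history  ~ the subquery over online.ml_user_history_detail added by this commit
#   excluded ~ one of the existing exclusion subqueries (e.g. "dev")
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE active (device_id TEXT)")
cur.execute("CREATE TABLE history (first_device TEXT)")
cur.execute("CREATE TABLE excluded (device_id TEXT)")
cur.executemany("INSERT INTO active VALUES (?)", [("a",), ("b",), ("c",), ("d",)])
cur.executemany("INSERT INTO history VALUES (?)", [("a",), ("b",), ("c",)])
cur.executemany("INSERT INTO excluded VALUES (?)", [("b",)])

left_join_query = """
SELECT t2.device_id
FROM active t2
LEFT JOIN history h ON h.first_device = t2.device_id
LEFT JOIN excluded dev ON dev.device_id = t2.device_id
WHERE dev.device_id IS NULL          -- anti-join: drop excluded devices
  AND h.first_device IS NOT NULL     -- keep only devices found in history
ORDER BY t2.device_id
"""
left_join_rows = [row[0] for row in cur.execute(left_join_query)]

# The same filter written with an inner join on history.
inner_join_query = """
SELECT t2.device_id
FROM active t2
JOIN history h ON h.first_device = t2.device_id
LEFT JOIN excluded dev ON dev.device_id = t2.device_id
WHERE dev.device_id IS NULL
ORDER BY t2.device_id
"""
inner_join_rows = [row[0] for row in cur.execute(inner_join_query)]

print(left_join_rows)   # ['a', 'c']
print(inner_join_rows)  # ['a', 'c']
conn.close()
```

One caveat: if `first_device` can occur more than once in the joined partition, both forms return the matching device once per match; the sample data here has no duplicates.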
@@ -152,8 +156,8 @@ for t in range(1, task_days):
     )dev
     on t2.device_id=dev.device_id
     WHERE spam_pv.device_id IS NULL
-    and dev.device_id is null
-    """.format(today_str=today_str, yesterday_str_format=yesterday_str_format, today_str_format=today_str_format)
+    and dev.device_id is null
+    and first_device is not null
+    """.format(today_str=today_str, yesterday_str_format=yesterday_str_format, today_str_format=today_str_format, tomorrow_str=tomorrow_str)
     print(new_urser_device_id_sql)
     new_urser_device_id_df = spark.sql(new_urser_device_id_sql)
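This hunk also adds `tomorrow_str=tomorrow_str` to the `.format(...)` call. That is required because `str.format` raises `KeyError` for any named placeholder that has no matching keyword argument, while extra keyword arguments are silently ignored. A short demonstration with a trimmed, hypothetical template (the dates are illustrative values):

```python
# Hypothetical, trimmed template in the style of new_urser_device_id_sql.
template = r"""
select first_device from online.ml_user_history_detail
where partition_date = '{tomorrow_str}' and last_active_date = '{today_str}'
"""

missing = None
try:
    # Without tomorrow_str, the '{tomorrow_str}' placeholder cannot be filled.
    template.format(today_str="20201121")
except KeyError as exc:
    missing = exc.args[0]
    print("missing placeholder:", exc)  # missing placeholder: 'tomorrow_str'

# Every placeholder now has a keyword; extra keywords are ignored by format().
sql = template.format(today_str="20201121", tomorrow_str="20201122", unused="ignored")
print(sql)
```

Any literal braces in such a template (for example, in a map or JSON literal inside the SQL) would have to be doubled as `{{` and `}}` to survive `.format`.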