If you are not using nested select, then IN and NOT IN queries work fine.
But if you are looking for nested select, here is a alternate way to solve it :
For IN queries :
[SQL] select id,name from user where id in (select id from students)
[HIVE] select user.id,user.name from user JOIN (select id from students) tmp on (tmp.id = user.id)
For NOT IN queries :
[SQL] select id,name from user where id not in (select id from students)
[HIVE] select user.id,user.name from user LEFT OUTER JOIN students ON (students.id = user.id) where students.id is null
Above will work only when students.id is supposed to be not null. Essentially idea is to have no rows from second table and all rows from first table in left outer join. So, accordingly one can modify where clause. Also in case of multiple joins, you might encounter duplicate rows in result. In that use,
[HIVE] select distinct user.id,user.name from user LEFT OUTER JOIN students ON (students.id = user.id) where students.id is null
Distinct will involve one more step of map reduce in HIVE.
But if you are looking for nested select, here is a alternate way to solve it :
For IN queries :
[SQL] select id,name from user where id in (select id from students)
[HIVE] select user.id,user.name from user JOIN (select id from students) tmp on (tmp.id = user.id)
For NOT IN queries :
[SQL] select id,name from user where id not in (select id from students)
[HIVE] select user.id,user.name from user LEFT OUTER JOIN students ON (students.id = user.id) where students.id is null
Above will work only when students.id is supposed to be not null. Essentially idea is to have no rows from second table and all rows from first table in left outer join. So, accordingly one can modify where clause. Also in case of multiple joins, you might encounter duplicate rows in result. In that use,
[HIVE] select distinct user.id,user.name from user LEFT OUTER JOIN students ON (students.id = user.id) where students.id is null
Distinct will involve one more step of map reduce in HIVE.
Very nice hack, Pratyush!
ReplyDelete