Newer Impala releases support UDFs, so I deployed a 1.2.3 cluster in the test environment.
While running a test UDF, I hit this error:

java.lang.IllegalArgumentException (thrown to indicate that a method has been passed an illegal or inappropriate argument)

This turned out to be a known bug: https://issues.cloudera.org/browse/IMPALA-791

"The current impala 1.2.3 doesn't support String as the input and return types. You'll instead have to use Text or BytesWritable."

In other words, Impala 1.2.3 UDFs do not yet accept String as a parameter or return type; use org.apache.hadoop.io.Text instead.

Text API documentation: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Text.html

The important parts:

Constructor:
- Text(String string) -- construct from a string.

Methods:
- String toString() -- convert the text back to a string.
- void set(String string) -- set to contain the contents of a string.
- void set(Text other) -- copy a text.
- void clear() -- clear the string to empty.

Testing the Text class in Eclipse:

```java
package com.hive.myudf;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.Text;

public class TextTest {

    private static Text schemal = new Text("http://");
    private static Text t = new Text(
            "GET /vips-mobile/router.do?api_key=04e0dd9c76902b1bfc5c7b3bb4b1db92&app_version=1.8.7 HTTP/1.0");

    public static void main(String[] args) {
        // Split the request line into method / path / protocol.
        Pattern p = Pattern.compile("(.+?) +(.+?) (.+)");
        Matcher m = p.matcher(t.toString());
        if (m.matches()) {
            // String concatenation calls Text.toString() implicitly.
            String tt = schemal + "test.test.com" + m.group(2);
            System.out.println(tt);
        } else {
            System.out.println("not match");
        }
        schemal.clear();
        t.clear();
    }
}
```
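Stripped of the Hadoop Text wrapper, the request-line handling above is plain java.util.regex. A minimal standalone sketch (the class and method names here are mine, not from the original code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestLineDemo {
    // Same pattern as in TextTest:
    // group(1) = method, group(2) = path, group(3) = protocol.
    static final Pattern REQUEST_LINE = Pattern.compile("(.+?) +(.+?) (.+)");

    // Rebuild a full URL from the host and the nginx request line;
    // returns null when the line does not look like a request line.
    static String toFullUrl(String host, String requestLine) {
        Matcher m = REQUEST_LINE.matcher(requestLine);
        if (!m.matches()) {
            return null;
        }
        return "http://" + host + m.group(2);
    }

    public static void main(String[] args) {
        String line = "GET /vips-mobile/router.do?api_key=04e0dd9c76902b1bfc5c7b3bb4b1db92&app_version=1.8.7 HTTP/1.0";
        // prints http://test.test.com/vips-mobile/router.do?api_key=04e0dd9c76902b1bfc5c7b3bb4b1db92&app_version=1.8.7
        System.out.println(toFullUrl("test.test.com", line));
    }
}
```

Because the lazy groups only give up at the literal spaces, group(2) captures the whole path including the query string, which is exactly what the UDF prepends the scheme and host to.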
Testing the UDF:
```java
package com.hive.myudf;

import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;

public class UDFNginxParseUrl extends UDF {

    private static final Logger LOG = Logger.getLogger(UDFNginxParseUrl.class);
    // Request line: "<method> <path> <protocol>"
    private static final Pattern REQUEST_LINE = Pattern.compile("(.+?) +(.+?) (.+)");

    private final Text schemal = new Text("http://");

    public UDFNginxParseUrl() {
    }

    public Text evaluate(Text host1, Text urlStr, Text partToExtract) {
        LOG.debug("3args|args1:" + host1 + ",args2:" + urlStr + ",args3:" + partToExtract);
        if (host1 == null || urlStr == null || partToExtract == null) {
            return null;
        }
        Matcher m1 = REQUEST_LINE.matcher(urlStr.toString());
        if (!m1.matches()) {
            return null;
        }
        String realUrl = schemal.toString() + host1.toString() + m1.group(2);
        LOG.debug("realurl:" + realUrl);
        URL url;
        try {
            url = new URL(realUrl);
        } catch (Exception e) {
            LOG.debug("malformed url:" + realUrl, e);
            return null;
        }
        // Compare on the String value: Text.equals("HOST") is always false,
        // because a Text never equals a String.
        if ("HOST".equals(partToExtract.toString())) {
            String rt = url.getHost();
            LOG.debug("get host:" + rt);
            return new Text(rt);
        }
        return null;
    }
}
```
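The HOST branch of evaluate() ultimately delegates to java.net.URL. A hypothetical standalone version of just that step, with no Hive or Hadoop dependencies (the class name is illustrative):

```java
import java.net.URL;

public class HostExtractDemo {
    // Extract the host portion of a full URL; returns null on a
    // malformed URL, mirroring the UDF's catch branch.
    static String extractHost(String fullUrl) {
        try {
            return new URL(fullUrl).getHost();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // prints test.test.com
        System.out.println(extractHost("http://test.test.com/vips-mobile/router.do?api_key=x"));
    }
}
```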
A few things to note:
1. Functions are associated with a database.
2. The jar file is stored in HDFS.
3. Functions are cached by the catalog service.

Originally published on caiguangguang's 51CTO blog: http://blog.51cto.com/caiguangguang/1359312. Please contact the original author for reprint permission.
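The registration workflow these notes imply can be sketched as a single impala-shell statement; the HDFS path and function name below are illustrative assumptions, while the SYMBOL clause names the UDF class from above:

```sql
-- Illustrative only: the jar path and function name are assumptions.
-- The function is created in the current database, the jar must already
-- be in HDFS, and the catalog service caches the definition.
CREATE FUNCTION nginx_parse_url(string, string, string) RETURNS string
LOCATION '/user/impala/udfs/myudf.jar'
SYMBOL='com.hive.myudf.UDFNginxParseUrl';
```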