Hi Everyone,
Recently i come with a requirement about Hadoop Hive integration in Oracle SOA, For same i wrote a blog which you can find on below link.
https://oraclesoaosbbpmapi.blogspot.com/2019/03/hadoop-hive-data-base-fetching-records.html
After successfully configuring hive in weblogic, We faced a new issue which is about connection.
Before going deep about issue and solution let's see how HIVE DB manage connection on server side.
Recently i come with a requirement about Hadoop Hive integration in Oracle SOA, For same i wrote a blog which you can find on below link.
https://oraclesoaosbbpmapi.blogspot.com/2019/03/hadoop-hive-data-base-fetching-records.html
After successfully configuring hive in weblogic, We faced a new issue which is about connection.
Before going deep about issue and solution let's see how HIVE DB manage connection on server side.
- when ever we make a data base connection for HIVE DB, it create an internal session and generates a session id which is valid for a specific period of time which is configured on server level.
- After that period of time Hive create a new session id and process the query.
How weblogic data source manage connection:
- When we create a data source in weblogic, We define a connection pool size by which we can use existing connections which are not in use by any other resource by which weblogic minimize the connection cost.
Issue due to complete different behavior of connection management in both env:
- We run a query around 11AM IST to fetch the data.
- On hive DB we configured a session time out 30 minutes.
- When we run a query again after 11:35AM, It will generate hive session error in place of re-creating a new connection in place of creating new session with hive.
Why no new connection is created:
- When we configured weblogic data source and define connection pool, at that time we configured two properties which are
- Initial capacity: We usually make it as 1.
- Minimum capacity: We usually make it as 1.
- Due to property Minimum Capacity, Weblogic always keep at-least one minimum connection in it's connection pool and when a resource request for a connection it releases same from it's pool with existing session ID which will invalidated after session time.
When you will run an instance, You will see below error message in SOA instance.
Invalid SessionHandle: SessionHandle [af5ff8d7-2f51-4d75-8d4c-52f15e049121]
How to fix this issue:
There are total three solution that i found which will work without any failure.
- Data source connection pool configuration
- Hadoop Session time out configuration, It is not safe and guaranteed to work in all case.
- Data source based HIVE session management.
- Hadoop Session time out configuration, It is not safe and guaranteed to work in all case:
- Increasing the timeout for the Apache Hadoop environment for resolving the behavior: In the Cloudera Manager (CM), navigate to: Hive > Configuration > Search: Session
- Data source based HIVE session management: Add below properties in JDBC Connection string
- jdbc:hive2://IP:10000/default?hive.server2.idle.session.timeout=3600ms
- While configuring about connection string you may get below error if client side session handling is not enabled.
- Test failed: Failed to open new session: org.apache.hive.service.cli.HiveSQLException: java.lang.IllegalArgumentException: Cannot modify hive.server2.idle.session.timeout at runtime. It is not in list of params that are allowed to be modified at runtime
- For same you to make some change in hiveserver2 hive-site.xml file.
- <property> <name>hive.security.authorization.sqlstd.confwhitelist</name> <value>hive.server2.idle.session.timeout</value> </property>Data source connection pool configuration
- Weblogic data source connection pool:
- Click on Hive data source
- Go to connection pool tab
- Search for initial capacity=0
- Search for minimum capacity=0
- Activate the change.
By any one the way, you can fix the issue.
Have question post me here!!!!!!!!!!
Comments
Post a Comment