
David McGinnis
Mar 17, 2020 · 6 min read
Testing a Hive Patch on a Local System
[...] I needed to get a Hive cluster running my code and a Confluent cluster that could output Avro messages in the proper format to test.


David McGinnis
Mar 10, 2020 · 4 min read
A Crash Course in Proper Oozie Usage
[...] focus on best practices such as when and why you should use Oozie, and when to use bundles.

David McGinnis
Feb 25, 2020 · 7 min read
Debugging From The Field: The Case of the Empty Files
A team at a client was using Spark to read and write to a Kafka topic. [...] files that would be written that were completely empty.

David McGinnis
Dec 3, 2019 · 5 min read
Debugging From The Field: The Case of the Ignored Configuration Change
We made the change on a Sunday, but four days later, the number of files had not appreciably changed in the YARN logs directory.


David McGinnis
Nov 26, 2019 · 4 min read
A Crash Course in YARN Log Aggregation
The system which maintains the application logs in HDFS is called the Log Aggregation system and is flexible [...]


David McGinnis
Nov 12, 2019 · 4 min read
Debugging from the Field: Smartsense Activity Explorer Stops Working
This client didn't use the activity explorer much [...] I tried to run one of the paragraphs and immediately [failed].


David McGinnis
Oct 22, 2019 · 4 min read
Running Garbage Collection on Your Cluster
At a high level, [CGC] is merely going through the cluster, taking inventory of the data and processes that run on the cluster...


David McGinnis
Oct 6, 2019 · 5 min read
YARN Capacity Scheduler and Node Labels Part 3
We will discuss how partitions play with YARN queues. Finally, we will return to the example given in the first part of this series.


David McGinnis
Sep 29, 2019 · 4 min read
YARN Capacity Scheduler and Node Labels Part 2
How do we ensure that GPU jobs run on worker nodes with GPUs without buying expensive GPUs for all of our worker nodes?


David McGinnis
Sep 22, 2019 · 5 min read
YARN Capacity Scheduler and Node Labels Part 1
I'm going to explore exactly how YARN works with queues, and the various mechanisms available to control how YARN does this.


David McGinnis
Sep 8, 2019 · 4 min read
Debugging From the Field: Sudden Kerberos Failure in HiveServer2 Instance
A client has a medium-sized kerberized HDP 2.6.0 installation. Hive Interactive is disabled and Hive is set up to use HTTP transport mode.