Monday, January 22, 2018

[Big Data] Hadoop Core

Hadoop in layman's terms
Let's say you have a file that contains the names of all the people who live in your apartment complex, and you want to see how many of them have the same name as you. You write a program that reads the file and outputs the count. Now let's say you want to know how many people in your city have the same name as you. The data is too big to fit on a single computer, so you get some 20 computers and connect them to form a cluster. You install Hadoop on the cluster. You then start writing the names of the people in your city to a file in the Hadoop Distributed File System (HDFS), which breaks your data into chunks and writes them across your 20 computers. You keep appending to the file until you have written all the names in your city. Next, you submit to Hadoop the program you previously wrote for a single computer. Hadoop takes your program and asks your 20 computers to run it in parallel, then asks them to aggregate their results and hands you the final answer.
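The story above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Hadoop is implemented: the function names and the sample names are made up, and the "cluster" is just a list of lists on one machine.

```python
def count_name(names, target):
    """The single-computer program: count entries equal to target."""
    return sum(1 for name in names if name == target)


def split_into_chunks(names, num_nodes):
    """Spread the names over num_nodes lists, one list per (pretend) node."""
    return [names[i::num_nodes] for i in range(num_nodes)]


def count_on_cluster(names, target, num_nodes=20):
    """Run the same program on every chunk, then aggregate the partial counts."""
    chunks = split_into_chunks(names, num_nodes)
    partial_counts = [count_name(chunk, target) for chunk in chunks]
    return sum(partial_counts)


# Hypothetical sample data standing in for the names of a whole city.
city = ["Alice", "Bob", "Alice", "Carol"] * 5
print(count_on_cluster(city, "Alice"))
```

The point of the sketch is that the per-chunk program is unchanged from the single-computer version; only the splitting and the final aggregation are new.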

Some terms:
node: a computer capable of storing data and processing that data
cluster: a group of many nodes

Hadoop provides 2 functionalities:
0. Distributed, fault-tolerant data storage (HDFS)
1. Batch processing on the stored data (MapReduce)

# HDFS
HDFS is a Unix-like distributed file system. It splits large files into fixed-size blocks (128 MB by default in Hadoop 2.x) and stores them on different nodes. By default, HDFS stores each block 3 times on 3 separate nodes so that the data survives a node failure.
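A minimal sketch of the two ideas in that paragraph, splitting and replication. The round-robin placement below is an assumption made for simplicity; real HDFS chooses replica locations with a rack-aware policy. The function and node names are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["n0", "n1", "n2", "n3", "n4"])
for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Because every block lives on three different nodes, losing any single node still leaves two copies of each block it held.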

## Services
0. many data nodes: this service runs on the nodes that store data. Each data node sends heartbeats and block information to the master name node. Clients connect to data nodes to read/write data.
1. master name node: stores metadata about which data block is stored on which data node. Guides clients to read/write data from/to the appropriate data nodes.
2. checkpoint node: creates checkpoints of the name node's metadata. This is not a hot backup for the master name node.

## How hdfs works
Write
0. client connects to the name node and asks which data nodes to write data to
1. name node gives the data node addresses to the client
2. client connects to the data nodes and writes the data
3. data nodes replicate the data to other data nodes, guided by the name node

Read
0. client connects to the name node and asks which data nodes to read from
1. name node gives the data node addresses
2. client connects to the data nodes and reads the data
3. in case of a data node failure, client reads the data block from another data node, guided by the name node
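The write and read steps above can be mimicked with a toy model. Everything here (`DataNode`, `NameNode`, `client_write`, `client_read`, the block id) is a hypothetical in-memory stand-in, not the Hadoop API; it only shows who talks to whom and how a read falls back to another replica.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.alive = True

    def write(self, block_id, data, pipeline=()):
        """Store the block, then forward it to the next node in the pipeline."""
        self.blocks[block_id] = data
        if pipeline:
            pipeline[0].write(block_id, data, pipeline[1:])

    def read(self, block_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blocks[block_id]


class NameNode:
    """Holds only metadata: which data nodes store which block."""

    def __init__(self, data_nodes, replication=3):
        self.data_nodes = data_nodes
        self.replication = replication
        self.block_map = {}

    def allocate(self, block_id):
        start = len(self.block_map) % len(self.data_nodes)
        targets = [self.data_nodes[(start + r) % len(self.data_nodes)]
                   for r in range(self.replication)]
        self.block_map[block_id] = targets
        return targets

    def locate(self, block_id):
        return self.block_map[block_id]


def client_write(name_node, block_id, data):
    targets = name_node.allocate(block_id)           # write steps 0 and 1
    targets[0].write(block_id, data, targets[1:])    # write steps 2 and 3


def client_read(name_node, block_id):
    for node in name_node.locate(block_id):          # read steps 0 and 1
        try:
            return node.read(block_id)               # read step 2
        except ConnectionError:
            continue                                 # read step 3: next replica
    raise IOError("all replicas are unavailable")


nodes = [DataNode(f"dn{i}") for i in range(4)]
name_node = NameNode(nodes)
client_write(name_node, "blk_1", b"hello")
nodes[0].alive = False                               # simulate a node failure
print(client_read(name_node, "blk_1"))
```

Note that the data itself never passes through the name node; the client only asks it for addresses.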


# MapReduce
Clients want to process the data stored in HDFS. A client submits a "Job" to MapReduce, and MapReduce runs it across different nodes in the cluster.

## Services
0. Job tracker: master service that monitors jobs. A job runs as many tasks distributed across several nodes. The job tracker schedules incoming jobs from different clients and retries failed task attempts.
1. Task tracker: runs on the same physical machine as the data node, in its own JVM. The tasks it executes, together with those on other nodes, accomplish the job that a client submits. A task has one or more attempts. Sends heartbeats and task status to the job tracker.

## How Mapreduce works
0. Client submits a job to the job tracker
1. Job tracker assigns tasks to task trackers that are close to the data blocks
2. Task trackers execute the tasks and write the results to HDFS with replication
3. If a task tracker fails, the job tracker assigns its tasks to another task tracker
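Steps 1 and 3 above (data-local assignment and reassignment after a failure) can be sketched as follows. This is a simplified model under assumed names (`assign_tasks`, `reassign_failed`, trackers `t1` to `t3`); the real job tracker scheduler is considerably more involved.

```python
def assign_tasks(block_locations, live_trackers):
    """Give each block's task to a live tracker that holds the block if possible."""
    assignments = {}

    def load(tracker):
        return sum(1 for t in assignments.values() if t == tracker)

    for block_id, holders in block_locations.items():
        local = [t for t in holders if t in live_trackers]
        candidates = local or sorted(live_trackers)  # fall back to any live tracker
        assignments[block_id] = min(candidates, key=load)
    return assignments


def reassign_failed(assignments, block_locations, live_trackers, failed):
    """Move the tasks of a failed tracker to the surviving trackers."""
    survivors = live_trackers - {failed}
    lost = {b: block_locations[b] for b, t in assignments.items() if t == failed}
    updated = dict(assignments)
    updated.update(assign_tasks(lost, survivors))
    return updated


# Hypothetical layout: each block is stored on two task-tracker machines.
block_locations = {"b0": ["t1", "t2"], "b1": ["t2", "t3"], "b2": ["t3", "t1"]}
live = {"t1", "t2", "t3"}

assignments = assign_tasks(block_locations, live)
after_failure = reassign_failed(assignments, block_locations, live, failed="t2")
print(assignments)
print(after_failure)
```

Replication is what makes the reassignment cheap here: the failed tracker's block is also stored elsewhere, so the retried task can still run next to its data.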


# YARN
An abstract framework for distributed processing; MapReduce is one concrete YARN application. YARN divides the duties of the Job Tracker between the Resource Manager and the Application Master. The Task Tracker's role is taken by the Node Manager, which hosts "Containers" in which a map or reduce task can be executed. The number of containers on a node is configurable.


## Map Reduce
Data processing is done in 2 phases.
0. Map phase: apply the map function to input key-value pairs to generate intermediate key-value pairs, then group the intermediate pairs by key. Each group contains one key and one or more values.
Components used during the map phase:
a. InputFormat: Splits the input into chunks for the mappers and creates the RecordReader. The default text format treats each line as one record.
b. RecordReader: Reads input key-value pairs from the chunk handed over by the InputFormat, one pair per line for text input.
c. Mapper: Contains the map function that is applied to input key-value pairs and produces intermediate key-value pairs.
d. Combiner: Performs a local reduction on the intermediate key-value pairs on the map side.
e. Partitioner: Decides which partition each intermediate key-value pair goes to.

1. Reduce phase: apply the reduce function to the grouped intermediate key-value pairs.
Components of the reduce phase:
a. Shuffle: Fetches the partition this reducer is responsible for from the mappers' output.
b. Sort: Sorts the data in a single partition by key.
c. Reducer: Given an intermediate key and a set of values, performs the reduce operation and produces output key-value pairs.
d. RecordWriter: Writes one output key-value pair at a time.
e. OutputFormat: Creates the RecordWriter and defines where and how its output is written.

Component used by both phases:
Writable interface: specifies how data is serialized when written and deserialized when read. For example, integer data is written and read as IntWritable.
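To make the two phases concrete, here is a word count written as a single-process Python sketch. The function names mirror the components listed above but are hypothetical; the partitioner is a deterministic stand-in for Hadoop's hash partitioner, and the shuffle is just an in-memory dictionary per partition.

```python
from collections import defaultdict


def map_fn(_offset, line):
    """Mapper: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def combine(pairs):
    """Combiner: local reduction of one mapper's output."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: pick the reducer for a key (stand-in for a hash partitioner)."""
    return sum(key.encode()) % num_reducers


def reduce_fn(key, values):
    """Reducer: sum all counts seen for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # Map phase: one mapper per split, followed by combiner and partitioner.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        pairs = [kv for offset, line in enumerate(split)
                 for kv in map_fn(offset, line)]
        for key, value in combine(pairs):
            partitions[partition(key, num_reducers)][key].append(value)

    # Reduce phase: each reducer sorts its partition by key, then reduces.
    output = {}
    for part in partitions:
        for key in sorted(part):
            out_key, out_value = reduce_fn(key, part[key])
            output[out_key] = out_value
    return output


result = run_job([["a b a"], ["b c"]])
print(result)
```

The combiner is optional in this flow: removing it changes how many pairs cross the shuffle, not the final counts.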

Misc Notes
0. A MapReduce job can contain only one map phase and only one reduce phase, so a task such as a word counter fits into a single job. To build a pipeline of MR jobs, a framework such as Crunch can be used.
1. You can append data to an HDFS file, but there is no way to modify the existing content of a file stored in HDFS.
2. Hadoop Streaming: runs shell, Python, etc. scripts as jobs. Example:
hadoop jar hadoop-streaming.jar -input input -output outputdir \
-mapper org.apache.hadoop.mapreduce.Mapper -reducer /bin/wc
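As a rough sketch of what a streaming script looks like: Hadoop Streaming feeds records to the script on standard input and reads tab-separated key/value lines from its standard output, and the reducer receives its input sorted by key. The word-count logic and the function names below are illustrative assumptions, written as plain functions over lines so they can be tried without a cluster.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(stage, stdin=sys.stdin, stdout=sys.stdout):
    """Run one stage ('map' or 'reduce') as a stdin-to-stdout filter."""
    run = mapper if stage == "map" else reducer
    for out in run(stdin):
        stdout.write(out + "\n")


# Local dry run: the sort between the stages plays the role of the shuffle.
mapped = sorted(mapper(["a b a", "b c"]))
print(list(reducer(mapped)))
```

Saved as a script that ends with a call such as `main(sys.argv[1])` (a hypothetical wiring), the same file could serve as both the `-mapper` and the `-reducer` command of a streaming job.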



Reference:
0. hadoop just the basics - slides
1. hadoop just the basics - youtube video






Thursday, January 18, 2018

[Book Take Away] The Zen Programmer

Buddha
Siddhartha Gautama, a prince from Nepal, was born about 500 years before Christ. At the age of 26, he learned about death, distress and disease. He left his home to find a remedy for these basic problems. On his journey, he came to understand the four noble truths that express the reality of pain, and he found the eightfold path that would minimize the pain of a human being. He is known as the first "Buddha", which means awakened.

# Buddhism
Teachings of the first Buddha. In Buddhism there is no God.

## Four noble truths
0. There exists dissatisfaction.
1. The root causes of dissatisfaction are desire, hatred and wrong thoughts.
2. If the root causes are gone, pain will cease.
3. The eightfold path helps eliminate the root causes.

## Eightfold path to nirvana
The steps of the eightfold path are not commandments. They can be considered more as "Best Practices".

0. Right view: Understand the four noble truths. See things without prejudice. There is no correct or wrong view. Right view can be considered as the absence of any view.
1. Right intention: Right intention means acting without any desire. Right intention is the absence of any intention.
2. Right speech: Right speech is no speech, or only the amount of speech that is absolutely necessary.
3. Right action: Right action is no action, or action that has the minimum possible impact.
4. Right livelihood: Right livelihood is no livelihood, or a livelihood with the minimum possible impact on the surroundings.
5. Right effort: An effort where mind(thought) is constantly monitored.
6. Right mindfulness: Leading life with full attention on body and mind.
7. Right concentration: Keeping balance between chaos and tranquility of mind.

## Zen Buddhism
A sect of Buddhism. Not thinking about anything is Zen. Once this mind-clearing technique is mastered, everything such as walking, standing, sitting or eating becomes Zen practice.

Some Zen terms:
### Hell: Situation a person creates for oneself and surroundings through wrong thoughts and action.
### Ghost: The thoughts that keep desiring a good-looking sexual partner or comfortable belongings.
Zen does not give a person something. Zen does not make one happy. Zen does not tell a person to do something good. Zen is about clearing out unnecessary stuff and keeping mind and dwelling empty. Zen acknowledges that pain and dissatisfaction exist by default. Zen teaches how to minimize pain.

Why Zen programming?
A programmer's day is a combination of the following:
0. overtime
1. ambitious requirements
2. wrong team
3. high expectation
4. not dealing with life
5. motivation by threat
6. changing requirement
7. greed
8. comparison with others
9. burn out
and so on and so forth. Dealing with so much chaos is not easy. A programmer needs a way to deal with all these stresses. Zen can help her with these problems.

The nature of mind
# Chaos and rational thinking: Chaotic thoughts increase dissatisfaction. Rational thinking sees everything as the same and decreases desire. Chaos in the mind needs to be kept under control by minimizing chaos-making actions and thoughts. Chaotic thinking is sometimes good for creativity, yet it needs to be kept in check.

# Associative thinking: The mind's process of drifting from a hint. Example: you think about an apple, and next your mind automatically thinks about the MacBook and Steve Jobs.

Zen Practices
Ki:  (Breath and vitality) Breathing mindfully.

Kizen: Keeping chaos and rational thinking in balance and being mindful about associative thinking.

Task breakdown: A larger task needs to be broken into smaller chunks.

Reflection: Every once in a while, focus on what you are thinking and what you are doing.

Focus time: Set aside a time slot for uninterrupted work.

Email check: Set aside your less productive hours for non-pressing email.

Chair relaxation: Every once in a while, while sitting on the chair, focus on breathing.

Walking relaxation: Walking while being fully aware of the surrounding.

Sleep: If you are tired take a nap. Go to your car if necessary.

Work without holiday: If you work mindfully, following the above practices, every day will feel like a holiday.

Drink Tea: Be mindful about every step of making a cup of tea. Observe every sip. There should be no thoughts other than you and your cup of tea.

Clean: Keep your desk clean. There will be fewer things to get distracted by.

Defeat mind monkey: Do not roam around websites. Clear desk mindfully. Mindful work helps defeat mind monkey.

Take break: Take a real holiday. Stay away from computers. At work you focus more on the mind. Focus more on the body when on holiday.

Todo list: Make a weekly list of tasks. What matters is how many tasks you accomplish, not how much time you put in (Saif disagrees with this).

Two minute rule: Suppose you are in the middle of something and another task comes in from another source. If the task is non-pressing and can be done in 2 minutes, switch to it from the original task; otherwise write the task down in your todo list and keep doing the original task.

Pomodoro principle: Break your task into 25-minute slots (1 Pomodoro). Take a 5-minute break after each Pomodoro. After 5 consecutive Pomodoros, take a 30-minute break.

Chain: Take a calendar. Put an X mark on the calendar each day you have done the work you want to do regularly. Try to make a chain of Xs. Try your best not to break the chain.

Personalized Kanban: Use your wall and sticky notes to make your own personalized Kanban. In one section, put the work you want to do. In another section, put the work that is in progress. Don't clutter the in-progress section with too many tasks (more than four is too many).

Don't become an extremist: Keep a healthy balance of work and rest.

Frequently made complaints:
0. Others don't treat me well: There will always be someone who treats others nicely, and there will be someone who does not. Accept it.

1. I deserve it: What we think we deserve (such as costly phones) ends up in the earth's cheapest dumping grounds (poor countries of Africa). Beware of what you think you deserve.

2. I had a bad childhood: Many people probably had a worse childhood than you had, yet they were able to get through it and do what they wanted to do in life. So a bad childhood should not be a complaint once you are aware of it. Stop living in the past, act properly now.

3. I know it better: We don't know what will bring good or bad for us. Most of the time everything ends up being the same anyway.

Things to remember:
0. It's your life: No matter what happens to you, it is still your life. Don't forget that every day is a good day.

1. No ego: Some attributes that we think represent us, such as good looks, an attractive figure, knowledge, money etc., are actually pretty volatile. Attaching these things to the self might cause more pain when they go away.

2. Ego makes you do things: You keep doing the things that you think the world expects of you.

3. Ego-less programming: You are not your work. Your work can be made better with the help of others' feedback. Take reviews of your work constructively.

4. Shut up: Only speak up if you absolutely have to. Don't waste your colleague's time with unnecessary chit-chat.

Zen is hard work
Zen is done with body and mind. Do your daily chores with your body and mind.

Career: Sometimes your career might take you down a path that is not the best for you. Feel free to say no to such promotions.

Taking care of body is important: You will regret it when you are old if you don't take care of your body today.

Learn: Everything changes. Always leave room for improvement and learning.

Be aware of environment: Know what is in your surroundings and know how to close the gaps through learning.

Theory needs practice: What you learn, you need to practice.

Don't become job title addict.

Calm down: In order to see things clearly, you need to calm down.

Keep the beginner mindset: Always look at things through a beginner's eyes. Don't have an expert's mindset. Being an expert means a lot of ego. Be aware that you can be wrong, and keep the beginner's mindset.

Work being aware: Know what is needed, what help is available, have some preplanning of work.

Karma:
  Good karma is something that does something good.
  Bad karma is something that does something bad.
  Karma always backfires: good or bad karma alike might end up causing trouble and pain. So, strive for no karma.

Code karma: Good code karma is coding with praise in mind. Bad code karma is coding without care. Strive to avoid both good and bad code karma. Code with awareness, and do only as much as needs to be done.

Buddha Programmer
A person who maintains calm mood and clear sight is a Buddha programmer. She sees good things in her colleagues and speaks up about bad things. She does not look for followers or worshipers. She does not have desire for greatness. She practices for the sake of practice, works for the sake of work, nothing else.

On being a student
0. Listen, learn what your teacher wants to teach you.
1. Keep respect, do your research, and come up with short, concise and easy-to-understand questions for your teacher.
2. Don't go after your teacher's job.
3. If your teacher has some fault, talk with her first.
4. You must give loyalty, honesty and commitment as you are receiving valuable knowledge from your teacher.

On being a teacher
0. Give advice sparingly, only if asked for.
1. Remember people learn on their own, give them time to figure things out by themselves. Only help/answer if a question is asked.
2. If a student becomes less engaged or unwilling to learn from you, simply stop being her teacher, be a colleague.

The best student-teacher relationship is when the teacher becomes a student and the student becomes a teacher.

Hungry ghost
People who work for recognition and get angry when something they desire is not met.

# Ignorance
There will be hungry ghosts and it is necessary to ignore them.

# Confrontation
If there are only a few hungry ghosts in your team and you have at least the same authority as they have and there is no other option, confront. Know that things will worsen and chaos would increase.

# Manipulation
Identify hungry ghosts who are after you. Find what they want. Do your work and give them recognition.

Incompetence
It's hard to find a truly incompetent person. It's more likely that the person is not in the right position.

Zennify your project

# As a team leader: Take care of your team. Before someone gets to your teammate, they should get through you.

# Path of ruin: If leaders work with anger, they pass that anger on to their associates, and it travels down the hierarchy. Eventually it falls on the family members at the bottom.

# It's never that bad: People work out of fear, forgetting to be mindful of the actual outcome of losing a job. It's never that bad. A few months will be tough. This is a part of life; no one can expect sunshine and rainbows all the time. For sunshine, night is necessary; for a rainbow, rain is necessary.

# Laugh when desperate: Stop taking yourself seriously. You are just a guy working at some company. Take things easy. Mindfully try to understand the situation. Think of an action. Do as much as you can. Know that failure occurs often; accept it and learn from it. Don't get desperate when failure occurs.

Ten rules of Zen programmer
0. Focus: Do one task with mindfulness at a time. When you are sleeping only sleep. When you are eating only eat. When you are thinking only think.
1. Keep mind clear: Need to clear mind of every temptation such as social network, emails, news and songs.
2. Keep beginner's mind: See things through a beginner's mind.
3. No ego: Remember you are not important. Do not get proud because you can do something well. Everyone is good at something. Do not attach an idea, a piece of code or appearance to "mine".
4. There is no career goal: Be aware of present moment. Mindfully learn, act, work, speak. Do not wait to go to a higher post. Do things with right intention and mindfulness now.
5. Shut up: Do not speak up if you do not have to. Do not waste others' time.
6. Mindfulness, care and awareness: Listen to the signs of your body. Take rest and breaks. Take care of your body. Remember, no work is beneath you. When someone gives you a task, do it with as much focus as possible. Block all thoughts of hatred towards the task.
7. There is no boss: If your authority is asking for tasks that hurt or might hurt your body (unhealthy overtime), say no.
8. Do something else: Do a thing that is not related to computers.
9. There is nothing special: Remember there is nothing special, you are not special and you are not important. You are not special just because you can craft good code. No one cares who built the pyramids. Everything you possess today, you will eventually lose. Everything will change, decay and go away. Remember this.

What now
Know that
  - Only you can help yourself.
  - Your feelings and thoughts shape your reality.
  - You need time for silence and concentration.
  - Take ten minutes of silence and solitude every morning. Breathe the morning air.

Reference:
https://www.zenprogrammer.org/