Data Science

Statistical Analysis of the Holy Quran (Part 1)

The English and Arabic corpus of the Holy Quran is a rich source for statistical analysis. For instance, the entire test corpus has half a million words and many thousands of distinct words. A rich dataset such as the Holy Quran therefore makes for an exciting journey of data exploration.

How to open large text files (>5 GB) on a Mac?

A month ago, I downloaded a large dataset from Twitter. The .txt file consisted of around 1.5 million tweets in JSON and weighed in at 5.5 GB. I was working on a Sentiment Analysis project at the time, and I wanted to look at the structure of the JSON in order to design a parser for processing the tweets. As I attempted to open the file in Sublime Text 2, my powerful Mac just gave up.
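One way to inspect the structure of a file this size without opening it in an editor is to read only its first few lines. The following is a minimal sketch, not the approach from the post itself: it assumes the file stores one JSON object per line (the excerpt does not say so), and the names `peek_lines` and `top_level_keys` are made up for illustration.

```python
import itertools
import json


def peek_lines(path, n=5, encoding="utf-8"):
    """Return the first n lines of a text file without reading the rest.

    The file object is iterated lazily, so memory use stays small even
    when the file is several gigabytes.
    """
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, n)]


def top_level_keys(line):
    """Parse one JSON object and return its sorted top-level keys.

    Assumption: each line holds one complete JSON object.
    """
    record = json.loads(line)
    return sorted(record.keys())
```

Calling `peek_lines("tweets.txt", 3)` (the file name is hypothetical) and passing each result to `top_level_keys` would show which fields a parser has to handle, while only a few lines are ever loaded.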

Graph Theory 101: Directed and Undirected Graphs

This is a very short introduction to graph theory. We will talk about directed and undirected graphs, the formulas for the maximum possible number of edges in each, and the mathematical proofs that explain why those formulas work. This is my first use of LaTeX on Mr. Geek.
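As a quick illustration of the edge-count formulas mentioned here, the sketch below assumes simple graphs, meaning no self-loops and no parallel edges (the excerpt does not state this). Under that assumption, a graph on n vertices has at most n(n-1)/2 edges if undirected and n(n-1) if directed. The function names are made up for this example, and the brute-force count is only a sanity check, not a proof.

```python
from itertools import combinations, permutations


def max_edges_undirected(n):
    """Maximum edges in a simple undirected graph: n choose 2."""
    return n * (n - 1) // 2


def max_edges_directed(n):
    """Maximum edges in a simple directed graph: every ordered pair."""
    return n * (n - 1)


def count_by_enumeration(n, directed):
    """Count possible edges by listing every vertex pair explicitly.

    Ordered pairs are used for directed graphs, unordered pairs for
    undirected ones; a vertex is never paired with itself.
    """
    vertices = range(n)
    pairs = permutations(vertices, 2) if directed else combinations(vertices, 2)
    return sum(1 for _ in pairs)
```

For small n, the enumeration can be compared against the closed-form values, which shows why the directed count is exactly twice the undirected one: each unordered pair corresponds to two ordered pairs.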

Measuring influence in a group using social network analysis

I have decided to publish the contents of my Complex Networks and Web coursework project here on Mr. Geek. The information contained in this post might be complex for some, but I assure you that this will be a good long read. I have included lots of pictures to make sure you don’t get bored by the swathes of text.