Word2Vec is a novel technique that produces vector representations of the words in a set of documents, in which the meaning of words and the relationships between them are encoded spatially. As a result, words that are related to each other lie closer together in the resulting feature space. Word2Vec is gaining huge traction in the machine learning community and it is definitely worth knowing more about. This blog post illustrates the main characteristics of the method and provides a proof of concept using Clojure libraries.
The second (and last) day of the conference started with presentations from two massive companies: Philip Radley showed how BT is relying on Hadoop to deliver considerably more value to its clients, and Rod Smith (from IBM) defended the position that digital innovation is nowadays driven by real-time insights. He claimed that real time is becoming a critical cornerstone and summarised the three types of data analysis process we have seen over the last few years:
- Traditional: Time spent moving data around rather than analysing it.
- Big Data: Driven by contextual data, more time analysing than driving actionable insights.
- Rapid insights: Just-in-time, quick approximations of solutions.
Strata+Hadoop World is one of the main conferences in the world for Big Data technologies and I was lucky enough to attend it last week. Even though this is my second Strata, I couldn’t help but be amazed at the scale of the conference, with 7 parallel sessions on topics ranging from data science to the future of Hadoop.