I presented my current research in our ISP Seminar series. Gaurav has written a very nice blog post on it (link), summarizing the talk.
I’ll be presenting a talk on the emerging topic of Human Data Interaction. It will be at 5317 Sennott Square, from 1 to 1:30. This is the abstract for the talk:
A data explosion is happening, promising invaluable opportunities for scientific and technological progress, yet this vast potential relies not only on our ability to collect and access this data, but also on our ability to understand it. Sensemaking is the process of building a mental model of the data. It is only after acquiring this intuition that we can apply our mathematical tools (such as Machine Learning methods) in their full power and extract meaning and knowledge out of the raw data. Up until now, this sensemaking process has been done intuitively, usually through conventional visualization techniques (e.g. plotting the data). But the emergence of vast and high-dimensional datasets is raising challenging issues that our current data-analytic approaches cannot address. For example, current datasets are getting so large that asking even the simplest questions of them may take hours or days of computation. Even after accessing the data, the usual visualization techniques may not work due to issues like overplotting. Furthermore, it is not even possible to fully visualize datasets that have hundreds or thousands of dimensions. These issues are providing the motivation for the emergence of a new field: Human Data Interaction (HDI).
Currently, HDI is more about asking the right questions than offering answers. Questions like: Faced with a large, high-dimensional dataset, how do we even find meaningful and interesting questions to ask? How do we answer those questions without waiting for hours and losing our train of thought? How do we explore the data and navigate its large, high-dimensional space? How do we use these explorations to make sense of the data? How can the acquired mental models be used to form new knowledge and make real scientific discoveries? How can we disseminate this knowledge? How do we communicate our mental model of this complex data object to other people? What are the technical difficulties facing big data exploration, and how do we overcome them?
In this talk I will start with examples of successful data exploration tools: GigaPan and Time Machine. These tools experiment with a multitude of human data interaction mechanisms to help with the process of making sense of massive visual datasets, and also with communicating that knowledge to others. Then I will discuss the challenges facing HDI and some of the initial answers that we can borrow from a range of fields, from database systems and human factors to developmental psychology, learning theory and communication theory. In the final section of this talk I will present EVA (Explorable Visual Analytics). EVA is a web-based prototype for understanding and interacting with large and high-dimensional data. We will use EVA to do some hands-on HDI experiments with Census and Twitter datasets.
I’ve recently finished a great course on the history of humanity on Coursera: https://www.coursera.org/course/humankind. It was a fascinating series of lectures provoking deep questions on the main patterns of development throughout history, the meaning of happiness, and even the future of humanity. I strongly recommend it to anybody who is interested in thinking about the major forces shaping our past, and possibly even our future.
An interesting article on the drawbacks of open-office spaces:
Have a look at our yearly CREATE lab retreat with a mysterious floating tree trunk …
Startup Engineering is a nice course currently being offered on Coursera. It provides hands-on experience with tools, programming languages and techniques suitable for web application development. By the end of the course, you should be able to create a micro-crowd-funding website, accepting bitcoins and supporting your own project! I really enjoy the fast pace of the course, which is enhanced by nice lectures and “useful” tutorials!
I strongly recommend the new C++ book by Bjarne Stroustrup to anyone who wants to refresh her knowledge of C++, especially of the recent ISO version, C++11.
Today, Time magazine published a featured story on the Google Earth Engine timelapse of Earth, a terapixel, planetary-scale Time Machine built from 30 years of satellite imagery. CREATE Lab had a major role in the making of this interactive dataset.
Time magazine story: http://world.time.com/timelapse/
Google Earth Engine (interactive player): http://earthengine.google.org