Planet Mozilla Education

September 03, 2016

Benjamin Smedberg (bsmedberg)

Firefox Modularity and WebExtensions

Last week, Andy McKay posted skeptical thoughts about whether Firefox system addons (also known as go-faster addons) should be required to use the WebExtensions API surface. I believe that all Firefox go-faster addons, as well as our Test Pilot experiments and SHIELD studies, should use WebExtensions. The explanation comes in three main areas:

  1. The core limiting factor in our ability to go faster while still shipping high-quality software isn’t the packaging and shipping mechanisms for our code. It’s the modularity and module boundaries of our codebase.
  2. The module structure and API surfaces of the Firefox/gecko codebase are hurting our product quality, hurting our teams and our ability to attract volunteer contributors, and hurting our ability to go faster. But there are some shining counter-examples!
  3. WebExtensions should be the future for module boundaries in the Firefox frontend. It’s worth planning carefully and going slow enough to get good module boundaries first, so that we can go blazingly fast later.
  4. Coda: by sticking to a single model for Firefox addons and builtin system addons, we can focus supporting work more effectively and make life better for both addon authors and Firefox engineers.

A diagram showing the "clustered cost" of Mozilla's module interdependencies from 1998-2000.

There is a long history of academic research about software modularity. I remember the OSCON talk in 2006 when I first heard about Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code: this talk and paper have changed the way I think about software structure and open source and how to organize engineering teams including volunteers. (Incidentally, this seems like another way of saying “everything that’s important to me at work”.)

Quoting one of the followup papers that studies the impact of software modularity on change over time:

Specifically, we show that i) tightly-coupled components are “harder to kill,” in that they have a greater likelihood of survival in subsequent versions of a design; ii) tightly-coupled components are “harder to maintain,” in that they experience more surprise changes to their dependency relationships that are not associated with new functionality; and iii) tightly-coupled components are “harder to augment,” in that the mix of new components added in each version is significantly more modular than the legacy design.

—Alan MacCormack, John Rusnak, and Carliss Y. Baldwin. The Impact of Component Modularity on Design Evolution: Evidence from the Software Industry.

The structure and connectedness of modules in a codebase determine what you can change easily (and therefore quickly) and what has to change slowly. There is no single right design for a software project: you have to design the structure of your modules and interfaces around what may need to change in the future. Even more fundamentally, the shape and structure of APIs and modules determine where teams can be innovative most successfully. Relatedly, module size and structure determine where contributors can be most effective and feel most welcome.

There are some shining success stories at Mozilla where good modular design has allowed teams to work independently and quickly. The devtools debugging API is a great example of carefully designing an API surface which then allowed the team to move quickly and innovate. The team spent a lot of time refactoring guts within the JS engine to expose a debugging API which wasn’t tightly tied to the underlying platform. And since devtools are isolated from the rest of the browser in an iframe, there is a natural separation which allows them to be developed as a unit and independently. The advantages of this model quickly became apparent when the team started reusing the same basic debugging structure for desktop Firefox to debug content in Firefox for Android, or debug the Firefox chrome itself, or even debug content in Chrome!

A lot of the pain and slowness in Firefox development relates to our code not having well-defined, documented, or enforced module boundaries or interfaces. Within the Firefox frontend itself, we have many different kinds of functionality that run in the single technical context of the main browser window. We sometimes pretend that there are relatively clean boundaries between the Firefox frontend and the underlying platform, but this is a myth. There are pervasive implementation dependencies between things like the tab browser in the frontend and the network/docshell/PSM validation in the platform which often have to be modified together.

The lack of clean modules and interfaces manifests itself in many ways. It is notoriously difficult to write tests for Firefox. This is pretty natural, since good tests often focus on the interface boundaries between modules. It also makes it extremely difficult to “mock” unrelated modules cleanly for a test. The pervasiveness of “random oranges” is not because tests are written poorly: it is mostly because tests are very difficult to write well.

Historically, the problems with maintaining and building Firefox addons and the addon ecosystem are directly related to not having an API surface for addons. Because writing an addon usually involved hacking into the structure of the browser.xul DOM and JS, it was natural that addon authors were frequently affected by browser changes. And it was very difficult for anyone, Firefox hacker or addon hacker alike, to know whether or how any particular change would affect addons. The most common complaint from addon authors was that we didn’t have good API documentation; but this is not a problem with the documentation. It’s a problem of not having good APIs. Similarly, a major user complaint was that addons break too often. This is not a problem of addon authors being bad coders; it is more fundamentally a lack of stable API surfaces that would allow anyone to write code that continues working over time.

In order to make addon development fun and productive again, we’ve made big investments in WebExtensions. WebExtensions is a clearly specified, tested, documented, and maintained API surface targeted at addons. The WebExtensions system will allow addon authors to innovate and be creative within a structure that lets Firefox support them over time. Over the next year, we expect basically all Firefox addons to move to WebExtensions. We’ve already seen this energize the addon community and especially promote addon development by authors who gave up developing old-style Firefox addons.

Within Firefox development itself, we need this same combination of structure and innovation. Starting with the new system addons, we need to spend the time up front to build API surfaces that allow teams within Mozilla to experiment and build new systems safely, with the confidence that they aren’t going to break Firefox by accident. We need to have enough foresight to say where we want to experiment, so that we can make sure that we have the right API and module boundaries in place so that experiments can be successful.

If our system addons are still coded in a way that depends on the internal structure of browser.xul, or XPCOM, or the network stack, then they have inherently high testing cost. We need to re-validate them against each release, and be very cautious about pushing changes because we don’t know whether any change could break the browser.

If, on the other hand, we use WebExtensions and have technical guarantees that addons are isolated from other code, we have the ability to push new changes aggressively and actually independently. QA and release teams don’t need to test every change extensively and in all combinations: instead we can use the technical isolation guarantees of the WebExtensions API to allow teams to be more completely independent, the same way we want addon authors to be essentially independent.

As we start developing system addons using WebExtensions, there are going to be APIs which we don’t want to commit to, or that we’re not sure about. I think we will have to come up with APIs that are not exposed to all addons, but only to particular system addons where we understand the risks and experimental nature. We are going to need a way for many teams to be responsible for their own API surfaces, and not route every change through the addons engineering team. We should embrace these challenges as the cost of success.

Having more code use WebExtensions is a way to focus engineering efforts. We know that both addons and system code need better support for profiling and performance monitoring, telemetry, unit testing, localization, and other engineering support functions. By having our system addons use the same basic API layer as external addons, we can build tools that improve life for everyone. The cost of building each of these things well is large, but through shared effort we can make our engineering investment return much greater dividends.

At the end of the day, I believe we should end up in a world where most of the Firefox frontend is made available via system addons and glued together using well-defined API surfaces. What would it be like to have our tabbed browser, location bar, search interface, bookmarks system, session restore, etc. be independently hackable? It would be awesome! It would mean a higher quality product, with more opportunities for volunteer contributors to improve specific areas. It would likely mean shorter edit/build/run/test cycles. It might even give addon authors the ability to completely replace certain components of the browser if they so choose.

I shared a draft of this post with Andy and other tech leads and got some great feedback. In order for discussion to go to one place, please post any replies or thoughts to the firefox-dev mailing list.

September 03, 2016 12:50 PM

July 01, 2016

Mark Finkle (mfinkle)

Leaving Mozilla

I joined Mozilla in 2006 wanting to learn how to build & ship software at a large scale, to push myself to the next level, and to have an impact on millions of people. Mozilla also gave me an opportunity to build teams, lead people, and focus on products. It’s been a great experience and I have definitely accomplished my original goals, but after nearly 10 years, I have decided to move on.

One of the most unexpected joys from my time at Mozilla has been working with contributors and the Mozilla Community. The mentorship and communication with contributors creates a positive environment that benefits everyone on the team. Watching someone get excited and engaged from the process of landing code in Firefox is an awesome feeling.

People of Mozilla, past and present: Thank you for your patience, your trust and your guidance. Ten years creates a lot of memories.

Special shout-out to the Mozilla Mobile team. I’m very proud of the work we (mostly you) accomplished and continue to deliver. You’re a great group of people. Thanks for helping me become a better leader.

It’s a Small World – Orlando All Hands, December 2015

July 01, 2016 12:26 PM

June 09, 2016

Mark Surman (surman)

Making the open internet a mainstream issue

The Internet as a global public resource is at risk. How do we grow the movement to protect it? Thoughts from PDF

Today I’m in New York City at the 13th-annual Personal Democracy Forum, where the theme is “The Tech We Need.” A lot of bright minds are here tackling big issues, like civic tech, data privacy, Internet policy and the sharing economy. PDF is one of the world’s best spaces for exploring the intersection of the Internet and society — and we need events like this now more than ever.

This afternoon I’ll be speaking about the open Internet movement: its genesis, its ebb and why it needs a renaissance. I’ll discuss how the open Internet is much like the environment: a resource that’s delicate and finite. And a resource that, without a strong movement, is spoiled by bad laws and consolidation of power by a few companies.

At its core, the open Internet movement is about more than just technology. It’s about free expression and democracy. That’s why members of the movement are so diverse: Activists and academics. Journalists and hackers.

Photo via Flickr / Stacie Isabella Turk / Ribbonhead

Today, this movement is at an inflection point. The open Internet is increasingly at risk. Openness and freedom online are being eroded by governments creating bad or uninformed policy, and by tech companies that are creating monopolies and walled gardens. This is all compounded by a second problem: Many people still don’t perceive the health of the Internet as a mainstream issue.

In order to really demonstrate the importance of the open Internet movement, I like to use an analogue: The environmental movement. The two have a lot in common. Environmentalists are all about preserving the health of the planet. Forests, not clearcutting. Habitats, not smokestacks. Open Internet activists are all about preserving the health of the Internet. Open source code, not proprietary software. Hyperlinks, not walled gardens.

The open Internet is also like the environmental movement in that it has rhythm. Public support ebbs and flows — there are crescendos and diminuendos. Look at the cadence of the environmental movement: it became mainstream a number of times in a number of places. For example, an early crescendo in the US came in the late 19th century. On the heels of the Industrial Revolution, there’s resistance. Think of Thoreau, of “Walden.” Soon after, Theodore Roosevelt and John Muir emerge as champions of the environment, creating the Sierra Club and the first national parks. National parks, and a conservation movement filled with hikers who use them, both become mainstream — it’s a major victory.

But movements ebb. In the mid-20th century, environmental destruction continues. We build nuclear and chemical plants. We pollute rivers and air space. We coat our food and children with DDT. It’s ugly — and we did irreparable damage while most people just went about their lives. In many ways, this is where we’re at with the Internet today. There is reason to worry that we’re doing damage and that we might even lose what we built without even knowing it.

In reaction, the US environmental movement experiences a second mainstream moment. It starts in the 60s: Rachel Carson releases “Silent Spring,” exposing the dangers of DDT and other pesticides. This is a big deal: Citizens start becoming suspicious of big companies and their impact on the environment. Governments begin appointing environmental ministers. Organizations like Greenpeace emerge and flourish.

For a second time, the environment becomes an issue worthy of policy and public debate. Resting on the foundations built by 1960s environmentalism, things like recycling are a civic duty today. And green business practices are the expectation, not the exception.

The open Internet movement has had a similar tempo. Its first crescendo — its “Walden” moment — was in the 90s. Users carved out and shaped their own spaces online — digital homesteading. No two web pages were the same, and open was the standard. A rough analogue to Thoreau’s “Walden” is John Perry Barlow’s manifesto “A Declaration of the Independence of Cyberspace.” Barlow boldly wrote that governments and centralized power have no place in the digital world.

It’s during this time that the open Internet faces its first major threat: centralization at the hands of Internet Explorer. Suddenly, it seems the whole Web may fall into the hands of Microsoft technology. But there is also a push back and a crescendo — hackers and users rally to create open alternatives like Firefox. Quickly, non-proprietary web standards re-emerge. Interoperability and accessibility become driving principles behind building the Web. The Browser Wars are won: Microsoft’s monopoly over web technology is thwarted.

But then comes inertia. We could be in the open Internet movement’s DDT moment. Increasingly, the Internet is becoming a place of centralization. The Internet is increasingly shaped by a tiny handful of companies, not individuals. Users are transforming from creators into consumers. In the global south, millions of users equate the Internet with Facebook. These developments crystallize as a handful of threats: Centralization. Loss of privacy. Digital exclusion.


It’s a bit scary: Like the environment, the open Internet is fragile. There may be a point of no return. What we want to do — what we need to do — is make the health of the open Internet a mainstream issue. We need to make the health of the Internet an indelible issue, something that spurs on better policy and better products. And we need a movement to make this happen.

This is on us: everyone who uses the internet needs to take notice. Not just the technologists — also the activists, academics, journalists and everyday Internet users who treasure freedom of expression and inclusivity online.

There’s good news: This is already happening. Starting with SOPA and ACTA, a citizen movement for an open Internet started accelerating. We got organized, we rallied citizens and we took stands on issues that mattered. Think of the recent headlines. When Edward Snowden revealed the extent of mass surveillance, people listened. Privacy and freedom from surveillance online were quickly enshrined as rights worth fighting for. The issue gained momentum among policymakers — and in 2015, the USA Freedom Act was passed.

Then there is 2015’s net neutrality victory: Over 3 million comments flooded the FCC protesting fast lanes and slow lanes. Most recently, Apple and the FBI clashed fiercely over encryption. Apple refused to concede, standing up for users’ privacy and security. Tim Cook was applauded, and encryption became a word spoken at kitchen tables and coffee shops.

Of course, this is just the beginning. These victories are heartening, for sure. But even as this new wave of internet activism builds, the threats are becoming worse, more widespread. We need to fuel the movement with concrete action — if we don’t, we may lose the open Web for good. Today, upholding the health of the planet is an urgent and enduring enterprise. So too should upholding the health of the Internet.

A small PS, I also gave a talk on this topic at re:publica in Berlin last month. If you want to watch that talk, the video is on the re:publica site.

The post Making the open internet a mainstream issue appeared first on Mark Surman.

June 09, 2016 05:35 PM

June 02, 2016

Benjamin Smedberg (bsmedberg)

Concert This Sunday: The King of Instruments and the Instrument of Kings

This coming Sunday, I will be performing a concert with trumpeter Kyra Hill as part of the parish concert series. I know that many of the readers of my site don’t live anywhere near Johnstown, Pennsylvania, but if you do, we’d love to have you, and it’ll be a lot of fun.

Sunday, 5-June 2016
2:30 p.m.
Our Mother of Sorrows Church
415 Tioga Street, Johnstown PA

Why you should come

I am proud that almost all of the music in this program was written in the last 100 years: there are compositions dating from 1919, 1929, 1935, 1946, 1964, 2000, and 2004. Unlike much of the classical music world, which got lost around 1880 and never recovered, music for organ has seen an explosion of music and musical styles that continues to the present day. Of course there is the obligatory piece by J.S. Bach, because how could you have an organ concert without any Bach? But beyond that, there are pieces by such modern greats as Alan Hovhaness, Marcel Dupré, Louis Vierne, and Olivier Messiaen.

It’s been a while since I performed a full-length concert; it has been fun to get back in the swing of regular practice and getting pieces up to snuff. I hope you find it as enjoyable to listen to as it has been for me to prepare!

June 02, 2016 08:35 PM

April 25, 2016

Mark Surman (surman)

Firefox and Thunderbird: A Fork in the Road

Firefox and Thunderbird have reached a fork in the road: it’s now the right time for them to part ways on both a technical and organizational level.

In line with the process we started in 2012, today we’re taking another step towards the independence of Thunderbird. We’re posting a report authored by open source leader Simon Phipps that explores options for a future organizational home for Thunderbird. We’ve also started the process of helping the Thunderbird Council chart a course forward for Thunderbird’s future technical direction, by posting a job specification for a technical architect.

In this post, I want to take the time to go over the origins of Thunderbird and Firefox, the process for Thunderbird’s independence and update you on where we are taking this next. For those close to Mozilla, both the setting and the current process may already be clear. For those who haven’t been following the process, I wanted to write a longer post with all the context. If you are interested in that context, read on.

Summary

Much of Mozilla, including the leadership team, believes that focusing on the web through Firefox offers a vastly better chance of moving the Internet industry to a more open place than investing further in Thunderbird—or continuing to attend to both products.

Many of us remain committed Thunderbird users and want to see Thunderbird remain a healthy community and product. But both Firefox and Thunderbird face different challenges, have different goals and different measures of success. Our actions regarding Thunderbird should be viewed in this light.

Success for Firefox means continued relevance in the mass consumer market as a way for people to access, shape and feel safe across many devices. With hundreds of millions of users on both desktop and mobile, we have the raw material for this success. However, if we want Firefox to continue to have an impact on how developers and consumers interact with the Internet, we need to move much more quickly to innovate on mobile and in the cloud. Mozilla is putting the majority of its human and financial resources into Firefox product innovation.

In contrast, success for Thunderbird means remaining a reliable and stable open source desktop email client. While many people still value the security and independence that come with desktop email (I am one of them), the overall number of such people in the world is shrinking. In 2012, around when desktop email first became the exception rather than the rule, Mozilla started to reduce its investment and transitioned Thunderbird into a fully volunteer-run open source project.

Given these different paths, it should be no surprise that tensions have arisen as we’ve tried to maintain Firefox and Thunderbird on top of a common underlying code base and common release engineering system. In December, we started a process to deal with those release engineering issues, and also to find a long-term organizational home for Thunderbird.

The Past

On a technical level, Firefox and Thunderbird have common roots, emerging from the browser and email components of the Mozilla Application Suite nearly 15 years ago. When they were turned into separate products, they also maintained a common set of underlying software components, as well as a shared build and release infrastructure. Both products continue to be intertwined in this manner today.

Firefox and Thunderbird also share common organizational roots. Both were incorporated by the Mozilla Foundation in 2003, and from the beginning, the Foundation aimed to make these products successful in the mainstream consumer Internet market. We believed—and still believe—mass-market open source products are our biggest lever in our efforts to ensure the Internet remains a public resource, open and accessible to all.

Based on this belief, we set up Mozilla Corporation (MoCo) and Mozilla Messaging (MoMo) as commercial subsidiaries of the Mozilla Foundation. These organizations were each charged with innovating and growing a market: one in web access, the other in messaging. We succeeded in making the browser a mass market success, but we were not able to grow the same kind of market for email or messaging.

In 2012, we shut down Mozilla Messaging. That’s when Thunderbird became a purely volunteer-run project.

The Present

Since 2012, we have been doggedly focused on how to take Mozilla’s mission into the future.

In the Mozilla Corporation, we have tried to innovate and sustain Firefox’s relevance in the browser market while breaking into new product categories—first with smartphones, and now in a variety of connected devices.

In the Mozilla Foundation, we have invested in a broader global movement of people who stand for the Internet as a public resource. In 2016, we are focused on becoming a loud and clear champion on open internet issues. This includes significant investments in fuelling the open internet movement and growing a next generation of leaders who will stand up for the web.

These are hard and important things to do—and we have not yet succeeded at them to the level that we need to.

During these shifts, we invested less and less of Mozilla’s resources in Thunderbird, with the volunteer community developing and sustaining the product. MoCo continues to provide the underlying code and build and release infrastructure, but there are no dedicated staff focused on Thunderbird.

Many people who work on Firefox care about Thunderbird and do everything they can to accommodate Thunderbird as they evolve the code base, which slows down Firefox development when it needs to be speeding up. People in the Thunderbird community also remain committed to building on the Firefox codebase. This puts pressure on a small, dedicated group of volunteer coders who struggle to keep up. And people in the Mozilla Foundation feel similar pressure to help the Thunderbird community with donations and community management, which distracts them from the education and advocacy work that’s needed to grow the open internet movement on a global level.

Everyone has the right motivations, and yet everyone is stretched thin and frustrated. And Mozilla’s strategic priorities are elsewhere.

The Future

In late 2015, Mozilla leadership and the Thunderbird Council jointly agreed to:

a) take a new approach to release engineering, as a first step towards putting Thunderbird on the path towards technical independence from Firefox; and

b) identify the organizational home that will best allow Thunderbird to thrive as a volunteer-run project.

Mozilla has already posted a proposal for separating Thunderbird from Firefox release engineering infrastructure. In order to move the technical part of this plan further ahead and address some of the other challenges Thunderbird faces, we agreed to contract for a short period of time with a technical architect who can support the Thunderbird community as they decide what path Thunderbird should take. We have a request for proposals for this position here.

On the organizational front, we hired open source leader Simon Phipps to look at different long-term options for a home for Thunderbird, including: The Document Foundation, Gnome, Mozilla Foundation, and The Software Freedom Conservancy. Simon’s initial report will be posted today in the Thunderbird Planning online forum and is currently being reviewed by both Mozilla and the Thunderbird Council.

With the right technical and organizational paths forward, both Firefox and Thunderbird will have a better chance at success. We believe Firefox will evolve into something consumers need and love for a long time—a way to take the browser into experiences across all devices. But we need to move fast to be effective.

We also believe there’s still a place for stable desktop email, especially if it includes encryption. The Thunderbird community will attract new volunteers and funders, and we’re digging in to help make that happen. We will provide more updates as things progress further.

The post Firefox and Thunderbird: A Fork in the Road appeared first on Mark Surman.

April 25, 2016 12:55 PM

April 20, 2016

Guillermo López (willyaranda)

OpenWapp and its history

OpenWapp is an app for Firefox OS that provides (or provided) access to WhatsApp from devices running Firefox OS.

As you can imagine, Firefox OS needed certain applications in order to go “mainstream”. Everyone was asking for them: “Wait, it doesn’t have WhatsApp?”, “It doesn’t have Instagram?”.

That is where the OpenWapp project came from. But let’s start at the beginning.

It all started with the need for WhatsApp on Firefox OS, and for that we needed:

  1. An intermediate server connecting the phones to WhatsApp (basically, a proxy)
  2. A library on the phone connecting directly to WhatsApp

The background

Some people were already building Firefox OS apps that connected to messaging services. The one that stood out was LoquiIM, an app by Adán that he presented at a hackathon we ran at Mozilla Hispano and that completely blew me away. Completely. It had a very polished interface, the code was very clean, and it could talk to many messaging services, whether ICQ, Facebook Messenger or other first- or third-party Jabber networks. It was, simply, amazing. “The app Firefox OS needs.”

But of course, Loqui had a serious problem: it had no access to WhatsApp.

And here another app appeared, Wassap, created by Luis Iván Cuende:

His idea was simple: use an intermediate server as a proxy to WhatsApp, with a “private” protocol between your phone and his server, which then created a session with WhatsApp and relayed all the data.

The nice thing about this idea is that people had already built such servers, and they were available on GitHub, free for everyone to use. There was one in PHP, and another in Python called Yowsup, which was the most widely used, most up to date and most battle-tested.

So that is what Luis Iván did: he created a Firefox OS app that talked to Yowsup on one of his servers, which in turn talked to WhatsApp to send and receive messages (and everything else needed to make that work). He also wired up the communication between Yowsup and his app over WebSockets, a great idea (WhatsApp essentially keeps a socket open for real-time traffic, so for the web, WebSockets were the perfect fit).

This, of course, had several problems:

  1. If the app went viral, WhatsApp would ask Mozilla to remove it (which is exactly what happened).
  2. If the app went viral, the connections to WhatsApp would not come from different phones but from a single place: Luis Iván’s server, with a single, easily bannable IP, so everyone would end up blocked.
  3. Login, registration and conversation data passed through Luis Iván’s server, so he could even have done malicious things with the data had he wanted to.

So something else had to be done, something a bit crazy, but there was no other option if Firefox OS was going to have WhatsApp: the phone itself had to talk to WhatsApp.

The idea: CoSeMe

Since Luis Iván’s Wassap was using an API very similar to Yowsup’s, but carried over WebSocket (allow me this technical licence; this is the high-level view), the best option was a complete port of Yowsup (written in Python and designed to run on a server) to JavaScript, running as a library on an end client (basically, a web page).

That is how CoSeMe was born: “Conocido Servicio de Mensajería” (“Well-Known Messaging Service”) 😁

The library would be a port of Yowsup, at a specific point in time, to JavaScript, with an API so similar that any application using Yowsup, or an HTTP or WebSockets wrapper around it, could switch to CoSeMe without much trouble.

CoSeMe worked: registration, login, uploading and downloading images, the binary protocol, in-place cryptography (modifying data rather than copying it, because of the limited computing power and memory of the first Firefox OS devices).

Hooking CoSeMe up to Wassap and LoquiIM

Once the library was ready, it was time for both Wassap (the app that was already in the Marketplace and that worked with WhatsApp through the Yowsup proxy) and LoquiIM (many protocols, but no WhatsApp support) to make use of it.

Both Luis Iván and Adán did incredible work in their respective applications, bringing WhatsApp to two different apps, each with its own features, but using the same backend: the CoSeMe library, and with the conviction that, to succeed, Firefox OS urgently needed WhatsApp and a good client.

The app: OpenWapp

Once there were two applications using CoSeMe, it made sense to build a dedicated app, especially since LoquiIM was multi-protocol and Wassap had been pulled from the Marketplace with no further updates.

OpenWapp was built on the code of an existing messaging application and modified to work with WhatsApp: registration, login, contacts, one-to-one and group conversations… Everything WhatsApp offered up until about two years ago, so no blue double check, no multiple group admins, no audio calls, no voice messages in conversations, no document sharing, no conversation encryption, no push notifications…

OpenWapp appeared in the Marketplace on May 22, 2014, as version 1.0.0, and kept being updated until a couple of months ago, tracking the essential changes in the WhatsApp protocol so it could keep working at a basic level: registering with the system, being able to “log in” (invisible to the end user), sending and receiving messages, last-seen status, token updates, uploading and downloading images and videos…

The death


A few days ago, given that there are no developers interested in CoSeMe (because in 90% of cases the decoupling between OpenWapp and CoSeMe is so significant that keeping OpenWapp working really means changing the CoSeMe library), I decided to retire it from the Marketplace.

Basically, I shipped an update with a message thanking the people who used the application and who helped make it an app with 1.5 million downloads over its lifetime, 2,800 reviews, and a rating of about 3.8 stars out of 5.

Acknowledgements

Neither CoSeMe, nor OpenWapp, nor any of this would have been possible without the support the project had in its early days, with code, documentation, design, ideas, help of all kinds and time. As open source, thanks are due to Adán from Loqui for his drive to keep the application going, and to the Loqui contributors, who keep pushing the app forward, adding encryption and other features. Also to Salva de la Puente for the support he has given the project with his incredible technical knowledge. And to Giovanny and Adrián for their reviews in the Marketplace.

And of course, to everyone who has used OpenWapp, opened issues on GitHub, and submitted pull requests to change things.

It has been a journey of about two and a half years. Intense, beautiful, and with the conviction that, with more support, WhatsApp on Firefox OS could have been an official reality, rather than the unofficial one we were.

Happy hacking.

The post OpenWapp and its history appeared first on Pijus Magnificus.

April 20, 2016 11:18 AM

April 16, 2016

Mark Finkle (mfinkle)

Pitching Ideas – It’s Not About Perfect

I realized a long time ago that I was not the type of person who could create, build & polish ideas all by myself. I need collaboration with others to hone and build ideas. More often than not, I’m not the one who starts the idea. I pick up something from someone else – bend it, twist it, and turn it into something different.

Like many others, I have a problem with ‘fear of rejection’, which kept me from shepherding my ideas from beginning to shipping. If I couldn’t finish the idea myself or share it within my trusted circle, the idea would likely die. I had the most success when sharing ideas with others. I have been working to increase the size of the trusted circle, but it still has limits.

Some time last year, Mozilla was doing some annual planning for 2016 and Mark Mayo suggested creating informal pitch documents for new ideas, and we’d put those into the planning process. I created a simple template and started turning ideas into pitches, sending the documents out to a large (it felt large to me) list of recipients. To people who were definitely outside my circle.

The world didn’t end. In fact, it’s been a very positive experience, thanks in large part to the quality of the people I work with. I don’t get worried about feeling the idea isn’t ready for others to see. I get to collaborate at a larger scale.

Writing the ideas into pitches also forces me to get a clear message, define objectives & outcomes. I have 1x1s with a variety of folks during the week, and we end up talking about the idea, allowing me to further build and hone the document before sending it out to a larger group.

I’m hooked! These days, I send out pitches quite often. Maybe too often?

April 16, 2016 07:30 PM

April 06, 2016

Mark Finkle (mfinkle)

Fun with Telemetry: Improving Our User Analytics Story

My last post talks about the initial work to create a real user analytics system based on the UI Telemetry event data collected in Firefox on Mobile. I’m happy to report that we’ve had much forward progress since then. Most importantly, we are no longer using the DIY setup on one of my Mac Minis. Working with the Mozilla Telemetry & Data team, we have a system that extracts data from UI Telemetry via Spark, imports the data into Presto-based storage, and allows SQL queries and visualization via Re:dash.

With data accessible via Re:dash, we can use SQL to focus on improving our analyses:

[Charts: loadurl-types, loadurl-retention-effect, dropoff-rate]

Roberto posted about how we’re using Parquet, Presto and Re:dash to create an SQL based query and visualization system.

April 06, 2016 04:18 AM

February 22, 2016

Mark Finkle (mfinkle)

Fun with Telemetry: DIY User Analytics Lab in SQL

Firefox on Mobile has a system to collect telemetry data from user interactions. We created a simple event and session UI telemetry system, built on top of the core telemetry system. The core telemetry system has been mainly focused on performance and stability. The UI telemetry system is really focused on how people are interacting with the application itself.

Event-based data streams are commonly used to do user data analytics. We’re pretty fortunate to have streams of events coming from all of our distribution channels. I wanted to start doing different types of analyses on our data, but first I needed to build a simple system to get the data into a suitable format for hacking.

One of the best one-stop sources for a variety of user analytics is the Periscope Data blog. There are posts on active users, retention and churn, and lots of other cool stuff. The blog provides tons of SQL examples. If I could get the Firefox data into SQL, I’d be in a nice place.

Collecting Data

My first step is performing a little ETL (well, the E & T parts) on the raw data using the Spark/Python framework for Mozilla Telemetry. I wanted to create two datasets: a clients dataset, with one row per client, and an events dataset, with one row per UI event.
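
To make that extract-and-transform step concrete, here is a minimal PySpark sketch. It is illustrative only: the newline-delimited JSON input and field names such as clientId, os and events are assumptions, not the real Fennec ping schema or Mozilla’s actual Spark framework.

```python
# Illustrative sketch: turn raw UI Telemetry pings (assumed to be
# newline-delimited JSON) into a clients dataset and an events dataset.
# Field names are placeholders, not the real ping schema.
import json

from pyspark import SparkContext

sc = SparkContext(appName="ui-telemetry-etl")

pings = sc.textFile("ui-telemetry/*.json").map(json.loads)

# Clients dataset: one row per client.
clients = (pings
           .map(lambda p: (p["clientId"], p.get("os", "unknown")))
           .distinct())

# Events dataset: one row per UI event, keyed by client and timestamp.
events = pings.flatMap(
    lambda p: [(p["clientId"], e["timestamp"], e["action"], e.get("method"))
               for e in p.get("events", [])])

clients.saveAsTextFile("output/clients")
events.saveAsTextFile("output/events")
```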

Building a Database

I installed Postgres on a Mac Mini (powerful stuff, I know) and created my database tables. I was periodically collecting the data via my Spark scripts and I couldn’t guarantee I wouldn’t re-collect data from the previous jobs. I couldn’t just bulk insert the data. I wrote some simple Python scripts to quickly import the data (clients & events), making sure not to create any duplicates.
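
As a rough sketch of what a duplicate-safe import can look like (the table layout and column names here are assumptions for illustration, not the actual scripts), a unique constraint plus ON CONFLICT DO NOTHING on Postgres 9.5+ does the job:

```python
# Sketch of a duplicate-safe import into Postgres (illustrative; the table
# and column names are assumptions, not the real schema).
import csv

import psycopg2

conn = psycopg2.connect(dbname="telemetry", user="analytics")
cur = conn.cursor()

# A unique constraint lets Postgres reject rows that an earlier Spark
# collection run already imported.
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        client_id TEXT NOT NULL,
        ts        BIGINT NOT NULL,
        action    TEXT NOT NULL,
        method    TEXT,
        UNIQUE (client_id, ts, action)
    )
""")

with open("events.csv") as f:
    for client_id, ts, action, method in csv.reader(f):
        # ON CONFLICT DO NOTHING (Postgres 9.5+) skips duplicates
        # instead of aborting the whole import.
        cur.execute(
            "INSERT INTO events (client_id, ts, action, method) "
            "VALUES (%s, %s, %s, %s) ON CONFLICT DO NOTHING",
            (client_id, int(ts), action, method))

conn.commit()
cur.close()
conn.close()
```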

[Diagram: fennec-telemetry-data]

I decided to start with 30 days of data from our Nightly and Beta channels. Nightly was relatively small (~330K rows of events), but Beta was more significant (~18M rows of events).

Analyzing and Visualizing

Now that I had my data, I could start exploring. There are a lot of analysis/visualization/sharing tools out there. Many are commercial and have lots of features. I stumbled across a few open-source tools, SQLPad among them.

Even though I wanted to use SQLPad as much as possible, I found myself spending most of my time in pgAdmin. Debugging queries, using EXPLAIN to make queries faster, and setting up indexes. It was easier in pgAdmin. Once I got the basic things figured out, I was able to more efficiently use SQLPad. Below are some screenshots using the Nightly data:

[Screenshots: sqlpad-query, sqlpad-chart]

Next Steps

Now that I have Firefox event data in SQL, I can start looking at retention, churn, active users, engagement and funnel analysis. Eventually, we want this process to be automated, data stored in Redshift (like a lot of other Mozilla data) and exposed via easy query/visualization/collaboration tools. We’re working with the Mozilla Telemetry & Data Pipeline teams to make that happen.
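
As one example of the kind of analysis this enables, here is a sketch of a daily active users query in the spirit of the Periscope Data examples, run from Python against the hypothetical events table above (the schema and millisecond timestamps are assumptions):

```python
# Daily active users from the events table sketched earlier (illustrative;
# the schema and millisecond timestamps are assumptions).
import psycopg2

DAU_SQL = """
    SELECT date_trunc('day', to_timestamp(ts / 1000)) AS day,
           COUNT(DISTINCT client_id)                  AS dau
    FROM events
    GROUP BY 1
    ORDER BY 1
"""

conn = psycopg2.connect(dbname="telemetry", user="analytics")
with conn.cursor() as cur:
    cur.execute(DAU_SQL)
    for day, dau in cur.fetchall():
        print(day.date(), dau)
conn.close()
```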

A big thanks to Roberto Vitillo and Mark Reid for the help in creating the Spark scripts, and Richard Newman for double-dog daring me to try this.

February 22, 2016 07:31 PM

Firefox on Mobile: A/B Testing and Staged Rollouts

We have decided to start running A/B Testing in Firefox for Android. These experiments are intended to optimize specific outcomes, as well as inform our long-term design decisions. We want to create the best Firefox experience we can, and these experiments will help.

The system will also allow us to throttle the release of features, called staged rollout or feature toggles, so we can monitor new features in a controlled manner across a large user base and a fragmented device ecosystem. If we need to roll back a feature for some reason, we’d have the ability to do that quickly, without needing people to update their software.

Technical details:

What is Mozilla Switchboard?

Mozilla Switchboard is based on Switchboard, an open source SDK for doing A/B testing and staged rollouts from the folks at KeepSafe. It connects to a server component, which maintains a list of active experiments.

The SDK does create a UUID, which is stored on the device. The UUID is sent to the server, which uses it to “bucket” the client, but the UUID is never stored on the server. In fact, the server does not store any data. The server we are using was ported to Node from PHP and is being hosted by Mozilla.
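
To make the bucketing idea concrete, here is a sketch of how a stable UUID can be mapped deterministically onto experiment buckets. This is not Switchboard’s actual algorithm, just an illustration of the concept:

```python
# Sketch of deterministic UUID bucketing for experiments and staged rollouts.
# Not Switchboard's actual algorithm; it only illustrates the concept.
import hashlib
import uuid

NUM_BUCKETS = 100

def bucket_for(client_uuid):
    """Map a client UUID to a stable bucket in [0, NUM_BUCKETS)."""
    digest = hashlib.sha256(str(client_uuid).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def in_rollout(client_uuid, rollout_percent):
    """Enable a feature for roughly rollout_percent of clients."""
    return bucket_for(client_uuid) < rollout_percent

client_id = uuid.uuid4()  # generated once and stored on the device
if in_rollout(client_id, 25):  # 25% staged rollout
    print("show the new feature")
else:
    print("show the control experience")
```

Because the mapping is deterministic, a client always lands in the same bucket, so ramping a threshold-based rollout from 25% to 100% only ever adds clients to the treatment group and never flips existing ones back.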

We decided to start using Switchboard because it’s simple, open source, has client code for Android and iOS, saves no data on the server and can be hosted by Mozilla.

Planning Experiments

The Mobile Product and UX teams are the primary drivers for creating experiments, but as is common on the Mobile team, ideas can come from anywhere. We have been working with the Mozilla Growth team, getting a better understanding of how to design the experiments and analyze the metrics. UX researchers also have input into the experiments.

Once Product and UX complete the experiment design, Development would land code in Firefox to implement the desired variations of the experiment. Development would also land code in the Switchboard server to control the configuration of the experiment: On what channels is it active? How are the variations distributed across the user population?

Since we use Telemetry to collect metrics on the experiments, the Beta channel is likely our best time period to run experiments. Telemetry is on by default on Nightly, Aurora and Beta; and Beta is the largest user base of those three channels.

Once we decide which variation of the experiment is the “winner”, we’ll change the Switchboard server configuration for the experiment so that 100% of the user base will flow through the winning variation.

Yes, a small percentage of the Release channel has Telemetry enabled, but it might be too small to be useful for experimentation. Time will tell.

What’s Happening Now?

We are trying to be very transparent about active experiments and staged rollouts. We have a few active experiments right now.

You can always look at the Mozilla Switchboard configuration to see what’s happening. Over time, we’ll be adding support to Firefox for iOS as well.

February 22, 2016 01:26 PM

February 17, 2016

Mark Surman (surman)

Help Us Spread the Word: Encryption Matters

Today, the Internet is one of our most important global public resources. It’s open, free and essential to our daily lives. It’s where we chat, play, bank and shop. It’s also where we create, learn and organize.

All of this is made possible by a set of core principles. Like the belief that individual security and privacy on the Internet is fundamental.

Mozilla is devoted to standing up for these principles and keeping the Internet a global public resource. That means watching for threats. And recently, one of these threats to the open Internet has started to grow: efforts to undermine encryption.

Encryption is key to a healthy Internet. It’s the encoding of data so that only people with a special key can unlock it, such as the sender and the intended receiver of a message. Internet users depend on encryption every day, often without realizing it, and it enables amazing things. It safeguards our emails, search queries and medical data. It allows us to safely shop and bank online. And it protects journalists and their sources, human rights activists and whistleblowers.

Encryption isn’t a luxury — it’s a necessity. This is why Mozilla has always taken encryption seriously: it’s part of our commitment to protecting the Internet as a public resource that is open and accessible to all.

Government agencies and law enforcement officials across the globe are proposing policies that will harm user security through weakening encryption. The justification for these policies is often that strong encryption helps bad actors. In truth, strong encryption is essential for everyone who uses the Internet. We respect the concerns of law enforcement officials, but we believe that proposals to weaken encryption — especially requirements for backdoors — would seriously harm the security of all users of the Internet.

At Mozilla, we continue to push the envelope with projects like Let’s Encrypt, a free, automated Web certificate authority dedicated to making it easy for anyone to run an encrypted website. Developed in collaboration with the Electronic Frontier Foundation, Cisco, Akamai and many other technology organizations, Let’s Encrypt is an example of how Mozilla uses technology to make sure we’re all more secure on the Internet.

However, as more and more governments propose tactics like backdoors, technology alone will not be enough. We will also need to get Mozilla’s community — and the broader public — involved. We will need them to tell their elected officials that individual privacy and security online cannot be treated as optional. We can play a critical role if we get this message across.

We know this is a tough road. Most people don’t even know what encryption is. Or, they feel there isn’t much they can do about online privacy. Or, both.

This is why we are starting a public education campaign run with the support of our community around the world. In the coming weeks, Mozilla will release videos, blogs and activities designed to raise awareness about encryption. You can watch our first video today — it shows why controlling our personal information is so key. More importantly, you can use this video to start a conversation with friends and family to get them thinking more about privacy and security online.

If we can educate millions of Internet users about the basics of encryption and its connection to our everyday lives, we’ll be in a good position to ask people to stand up when the time comes. We believe that time is coming soon in many countries around the world. You can pitch in simply by watching, sharing and having conversations about the videos we’ll post over the coming weeks.

If you want to get involved or learn more about Mozilla’s encryption education campaign, visit mzl.la/encrypt. We hope you’ll join us to learn about and support encryption.

[This blog post originally appeared on blog.mozilla.org on February 16, 2016]

The post Help Us Spread the Word: Encryption Matters appeared first on Mark Surman.

February 17, 2016 03:44 PM

February 11, 2016

Mark Surman (surman)

MoFo 2016 Goals + KPIs

Earlier this month, we started rolling out Mozilla Foundation’s new strategy. The core goal is to make the health of the open internet a mainstream issue globally. We’re going to do three things to make this happen: shape the agenda; connect leaders; and rally citizens. I provided an overview of this strategy in another post back in December.


As we start rolling out this strategy, one of our first priorities is figuring out how to measure both the strength and the impact of our new programs. A team across the Foundation has spent the past month developing an initial plan for this kind of measurement. We’ve posted a summary of the plan in slides (here) and the full plan (here).

Preparing this plan not only helped us get clear on the program qualities and impact we want to have, it also helped us come up with a crisper way to describe our strategy. Here is a high level summary of what we came up with:

1. Shape the agenda

Impact goal: our top priority issues are mainstream issues globally (e.g. privacy).
Measures: citations of Mozilla / MLN members, public opinion

2. Rally citizens

Strength goal: rally 10s of millions of people to take action and change how they — and their friends — use the web.
Measures: # of active advocates, list size

Impact goal: people make better, more conscious choices. Companies and governments react with better products and laws.
Measures: per campaign evaluation, e.g. educational impact or did we defeat bad law?

3. Connect leaders

Strength goal: build a cohesive, world class network of people who care about the open internet.
Measures: network strength; includes alignment, connectivity, reach and size

Impact goal: network members shape + spread the open internet agenda.
Measures: participation in agenda-setting, citations, influence evaluation

Last week, we walked through this plan with the Mozilla Foundation board. What we found: it turns out that looking at metrics is a great way to get people talking about the intersection of high level goals and practical tactics. E.g. we need to be thinking about tools other than email as we grow our advocacy work outside of Europe and North America.

If you’re involved in our community or just following along with our plans, I encourage you to open up the slides and talk them through with some other people. My bet is they will get you thinking in new and creative ways about the work we have ahead of us. If they do, I’d love to hear thoughts and suggestions. Comments, as always, welcome on this post and by email.

The post MoFo 2016 Goals + KPIs appeared first on Mark Surman.

February 11, 2016 04:49 PM

February 08, 2016

Mark Surman (surman)

The Internet is a Global Public Resource

One of the things that first drew me to Mozilla was this sentence from our manifesto:

“The Internet is a global public resource that must remain open and accessible to all.”

These words made me stop and think. As they sunk in, they made me commit.

I committed myself to the idea that the Internet is a global public resource that we all share and rely on, like water. I committed myself to stewarding and protecting this important resource. I committed myself to making the importance of the open Internet widely known.

When we say, “Protect the Internet,” we are not talking about boosting Wi-fi so people can play “Candy Crush” on the subway. That’s just bottled water, and it will very likely exist with or without us. At Mozilla, we are talking about “the Internet” as a vast and healthy ocean.

We believe the health of the Internet is an important issue that has a huge impact on our society. An open Internet—one with no blocking, throttling, or paid prioritization—allows individuals to build and develop whatever they can dream up, without a huge amount of money or asking permission. It’s a safe place where people can learn, play and unlock new opportunities. These things are possible because the Internet is an open public resource that belongs to all of us.

Making the Internet a Mainstream Issue

Not everyone agrees that the health of the Internet is a major priority. People think about the Internet mostly as a “thing” other things connect to. They don’t see the throttling or the censorship or the surveillance that are starting to become pervasive. Nor do they see how unequal the benefits of the Internet have become as it spreads across the globe. Mozilla aims to make the health of the Internet a mainstream issue, like the environment.

Consider the parallels with the environmental movement for a moment. In the 1950s, only a few outdoor enthusiasts and scientists were talking about the fragility of the environment. Most people took clean air and clean water for granted. Today, most of us know we should recycle and turn out the lights. Our governments monitor and regulate polluters. And companies provide us with a myriad of green product offerings—from organic food to electric cars.

But this change didn’t happen on its own. It took decades of hard work by environmental activists before governments, companies and the general public took the health of the environment seriously as an issue. This hard work paid off. It made the environment a mainstream issue and got us all looking for ways to keep it healthy.

When it comes to the health of the Internet, it’s like we’re back in the 1950s. A number of us have been talking about the Internet’s fragile state for decades—Mozilla, the EFF, Snowden, Access, the ACLU, and many more. All of us can tell a clear story of why the open Internet matters and what the threats are. Yet we are a long way from making the Internet’s health a mainstream concern.

We think we need to change this, so much so that it’s now one of Mozilla’s explicit goals.

Read Mark Surman’s “Mozilla Foundation 2020 Strategy” blog post.

Starting the Debate: Digital Dividends

The World Bank’s recently released “2016 World Development Report” shows that we’re making steps in the right direction. Past editions have focused on major issues like  “jobs.” This year the report focuses directly on “digital dividends” and the open Internet.

According to the report, the benefits of the Internet, like inclusion, efficiency, and innovation, are unequally spread. They could remain so if we don’t make the Internet “accessible, affordable, and open and safe.” Making the Internet accessible and affordable is urgent. However,

“More difficult is keeping the internet open and safe. Content filtering and censorship impose economic costs and, as with concerns over online privacy and cybercrime, reduce the socially beneficial use of technologies. Must users trade privacy for greater convenience online? When are content restrictions justified, and what should be considered free speech online? How can personal information be kept private, while also mobilizing aggregate data for the common good? And which governance model for the global internet best ensures open and safe access for all? There are no  simple answers, but the questions deserve a vigorous global debate.”

—”World Development Report 2016: Main Messages,” p.3

We need this vigorous debate. A debate like this can help make the open Internet an issue that is taken seriously. It can shape the issue. It can put it on the radar of governments, corporate leaders and the media. A debate like this is essential. Mozilla plans to participate and fuel this debate.

Creating A Public Conversation

Of course, we believe the conversation needs to be much broader than just those who read the “World Development Report.” If we want the open Internet to become a mainstream issue, we need to involve everyone who uses it.

We have a number of plans in the works to do exactly this. They include collaboration with the likes of the World Bank, as well as our allies in the open Internet movement. They also include a number of experiments in a.) simplifying the “Internet as a public resource” message and b.) seeing how it impacts the debate.

Our first experiment is an advertising campaign that places the Internet in a category with other human needs people already recognize: Food. Water. Shelter. Internet. Most people don’t think about the Internet this way. We want to see what happens when we invite them to do so.

The outdoor campaign launches this week in San Francisco, Washington and New York. We’re also running variations of the message through our social platforms. We’ll monitor reactions to see what it sparks. And we will invite conversation in our Mozilla social channels (Facebook & Twitter).

Billboard_Food-Shelter-Water_Red

Billboard_Food-Shelter-Water_Blue

Fueling the Movement

Of course, billboards alone don’t make a movement. That’s not our thinking at all. But we do think experiments and debates matter. Our messages may hit the mark and resonate, or they may tick people off. Either way, our goal is to start a conversation about the health of the Internet and the idea that it’s a global resource that needs protecting.

Importantly, this is one experiment among many.

We’re working to bolster the open Internet movement and take it mainstream. We’re building easy encryption technology with the EFF (Let’s Encrypt). We’re trying to make online conversation more inclusive and open with The New York Times and The Washington Post (Coral Project). And we’re placing fellows and working on open Internet campaigns with organizations like the ACLU, Amnesty International, and Freedom of the Press Foundation (Open Web Fellows Program). The idea is to push the debate on many fronts.

About the billboards, we want to know what you think:

  • Has the time come for the Internet to become a mainstream concern?
  • Is it important to you?
  • Does it rank with other primary human needs?

I’m hoping it does, but I’m also ready to learn from whatever the results may tell us. Like any important issue, keeping the Internet healthy and open won’t happen by itself. And waiting for it to happen by itself is not an option.

We need a movement to make it happen. We need you.

[This blog post originally appeared on blog.mozilla.org on February 8, 2016]

The post The Internet is a Global Public Resource appeared first on Mark Surman.

February 08, 2016 09:43 PM

February 01, 2016

Mark Surman (surman)

Mozilla, Caribou Digital Release Report Exploring the Global App Economy

Mozilla is a proud supporter of research carried out by Caribou Digital, the UK-based think tank dedicated to building sustainable digital economies in emerging markets. Today, Caribou has released a report exploring the impact of the global app economy and international trade flows in app stores. You can find it here.

The findings highlight the app economy’s unbalanced nature. While smartphones are helping connect billions more to the Web, the effects of the global app economy are not yet well understood. Key findings from our report include:

  • Most developers are located in high-income countries. The geography of where app developers are located is heavily skewed toward the economic powerhouses, with 81% of developers in high-income countries — which are also the most lucrative markets. The United States remains the dominant producer, but East Asia, fueled by China, is growing past Europe.
  • App stores are winner-take-all. The nature of the app stores leads to winner-take-all markets, which skews value capture even more heavily toward the U.S. and other top producers. Conversely, even for those lower-income countries that do have a high number of developers — e.g., India — the amount of value captured is disproportionately small relative to the number of developers participating.
  • The emerging markets are the 1% — meaning, they earn 1% of total app economy revenue. 95% of the estimated value in the app economy is captured by just 10 countries, and 69% of the value is captured by just the top three countries. Excluding China, the 19 countries considered low- or lower-income accounted for only 1% of total worldwide value.
  • Developers in low-income countries struggle to export to the global stage. About one-third of developers in the sample appeared only in their domestic market. But this inability to export to other markets was much more pronounced for developers in low-income countries, where 70% of developers were not able to export, compared to high-income countries, where only 29% of developers were not able to export. For comparison, only 3% of U.S. developers did not export.
  • U.S. developers dominate almost all markets. On average, U.S. apps have 30% of the market across the 37 markets studied, and the U.S. is the dominant producer in every market except for China, Japan, South Korea, and Taiwan.

Mozilla is proud to support Caribou Digital’s research and the goal of working toward a more inclusive Internet, rich with opportunity for all users. Understanding the effects of the global app economy, and helping to build a more inclusive mobile Web, are key. We invite you to read the full report here, and Caribou Digital’s blog post here.

[This blog post originally appeared on blog.mozilla.org on February 1, 2016]

The post Mozilla, Caribou Digital Release Report Exploring the Global App Economy appeared first on Mark Surman.

February 01, 2016 09:37 PM

January 28, 2016

Mark Surman (surman)

Inspired by our grassroots leaders

Last weekend, I had the good fortune to attend our grassroots Leadership Summit in Singapore: a hands on learning and planning event for leaders in Mozilla’s core contributor community.


We’ve been doing these sorts of learning / planning / doing events with our broader community of allies for years now: they are at the core of the Mozilla Leadership Network we’re rolling out this year. It was inspiring to see the participation team and core contributor community dive in and use a similar approach.

I left Singapore feeling inspired and hopeful — both for the web and for participation at Mozilla. Here is an email I sent to everyone who participated in the Summit explaining why:

As I flew over the Pacific on Monday night, I felt an incredible sense of inspiration and hope for the future of the web — and the future of Mozilla. I have all of you to thank for that. So, thank you.

This past weekend’s Leadership Summit in Singapore marked a real milestone: it was Mozilla’s first real attempt at an event consciously designed to help our core contributor community (that’s you!) develop important skills like planning and dig into critical projects in areas like connected devices and campus outreach all at the same time. This may not seem like a big deal. But it is.

For Mozilla to succeed, *all of us* need to get better at what we do. We need to reach and strive. The parts of the Summit focused on personality types, planning and building good open source communities were all meant to serve as fuel for this: giving us a chance to hone skills we need.

Actually getting better comes by using these skills to *do* things. The campus campaign and connected devices tracks at the Summit were designed to make this possible: to get us all working on concrete projects while applying the skills we were learning in other sessions. The idea was to get important work done while also getting better. We did that. You did that.

Of course, it’s the work and the impact we have in the world that matter most. We urgently need to explore what the web — and our values — can mean in the coming era of the internet of things. The projects you designed in the connected devices track are a good step in this direction. We also need to grow our community and get more young people involved in our work. The plans you made for local campus campaigns focused on privacy will help us do this. This is important work. And, by doing it the way we did it, we’ve collectively teed it up to succeed.

I’m saying all this partly out of admiration and gratitude. But I’m also trying to highlight the underlying importance of what happened this past weekend: we started using a new approach to participation and leadership development. It’s an approach that I’d like to see us use even more both with our core participation leaders (again, that’s you!) and with our Mozilla Leadership Network (our broader network of friends and allies). By participating so fully and enthusiastically in Singapore, you helped us take a big step towards developing this approach.

As I said in my opening talk: this is a critical time for the web and for Mozilla. We need to simultaneously figure out what technologies and products will bring our values into the future and we need to show the public and governments just how important those values are. We can only succeed by getting better at working together — and by growing our community around the world. This past weekend, you all made a very important step in this direction. Again, thank you.

I’m looking forward to all the work and exploration we have ahead. Onwards!

As I said in my message, the Singapore Leadership Summit is a milestone. We’ve been working to recast and rebuild our participation team for about a year now. This past weekend I saw that investment paying off: we have a team teed up to grow and support our contributor community from around the world. Nicely done! Good things ahead.

The post Inspired by our grassroots leaders appeared first on Mark Surman.

January 28, 2016 03:32 PM

January 05, 2016

Mark Finkle (mfinkle)

Firefox on Mobile: Browser or App?

It seems common for people to have the same expectations for browsers on Mobile as they do on Desktop. Why is that? I’d rather define a set of Mobile-specific expectations for a browser. Mobile is very application-centric, and those applications play a large role in how people use devices. When defining what success means for Firefox on Mobile, we should be thinking about Firefox as an application, not as a browser.

Navigation

Let’s start with navigation. On Desktop, navigation typically starts in a browser. On Mobile, navigation starts on the device home screen. The home screen holds a collection of applications that provide a very task-based workflow. This means you don’t need a browser to do many tasks on Mobile. In fact, a browser is somewhat secondary – it’s where you can end up after starting in a task-specific application. That’s the opposite of Desktop.

One way we started to optimize for this situation is Tab Queues: A way to send content to Firefox, in the background, without leaving your current task/application.

Another way to fit into home screen navigation is to launch favorite websites directly from home screen icons. On Android, Chrome and Firefox have supported this feature for some time, but Google’s Progressive Web Apps initiative will push the concept forward.

If the home screen is the primary way to start navigation, we can add more entry points (icons) for specific Firefox features. We already have a Search activity and we also have access to Logins/Passwords. Both of those could be put on the home screen, if the user chooses, to allow faster access.

Unsurprisingly, a correlation between applications on the home screen and application usage was a key takeaway from a recent comScore study:

“App usage is a reflexive, habitual behavior where those occupying the best home screen real estate are used most frequently.”

Content and Tasks

Creating a path to success means looking for opportunities that we can leverage. Analyst reports point to several situations where browsing is used more than applications on Mobile.

If that is the type of content people access using browsers on Mobile, Firefox should be optimized to handle those tasks and workflows. It’s interesting to think about how we could leverage Firefox to create solutions for these opportunities.

What if we were building a native application that allowed you to subscribe to news, blogs and articles? Would we create a view specific to discovering content? Would we use your browsing history to help recommend content?

What if we were building a native application designed to make researching a topic or product easier? How is that different than a generic tabbed browser?

Some ideas might end up being separate applications themselves, using Firefox as a secondary activity. That keeps Firefox focused on the task of browsing and viewing content, while new applications handle other specific tasks and flows. Those applications might even end up on your home screen, if you want faster access.

Retention and Engagement

Mobile applications, including browsers, struggle with user retention. Studies show that people will try out an application an average of 4.5 times before abandoning it.

Browsers have a larger reach than applications on Mobile, while applications are awesome at engagement. How does a browser increase engagement? Again, we should think like an application.

What if we were building a native application that could save links to content? What other features would we add? Maybe we’d add reminders so people wouldn’t forget about those recently saved, but never viewed, links to content. Browsers don’t do that, but applications certainly do.

What if we were building a native application that allowed people to view constantly changing news, sports or retail content? We could notify (or badge parts of the UI) when new content is available on favorite sites.

Metrics

We should be measuring Firefox as an application, not as a browser. Market share and pageviews, compared to the OS defaults (Safari and Chrome), may not be the best way to measure success. Why should we measure our success only against how the OS defaults view web content? Why not compare Firefox against other applications?

Research tells us that anywhere from 85% to 90% of smartphone time is spent in applications, leaving 10% to 15% of time spent in browsers. Facebook leads the pack at 13%, and the percentages drop off to single digits quickly. There is certainly an opportunity to capitalize on that 10% to 15% slice of the pie. In fact, the slice probably ends up being bigger than 15%.

time-spent-us-apps-2014

Treating Firefox as an application means we don’t take on all applications as a single category. It means we take them on individually, and I think we can create a pretty solid path to success under those conditions.

January 05, 2016 01:35 PM

January 04, 2016

Mark Surman (surman)

How I want to show up this year

As we begin 2016, I have tremendous hope. I feel a clarity and sense of purpose — both in my work and my life — that I haven’t felt for years.

Partly, this feeling flows from the plans we’ve made at Mozilla and the dreams I’ve been dreaming with those I love the most. I invested a great deal in these plans and dreams in 2015. That investment is starting to bear fruit.

This feeling also flows from a challenge I’ve given myself: to fully live my values every day. I’ve gotten rusty at this in the last few years. There are three specific things I want to do more of in 2016:

1. Be present. Listen more, say less. Be vulnerable. Create more space for others.

2. Focus on gratitude and abundance. Build ambitiously and joyfully from what I / we have.

3. Love generously. In all corners of my life. Remember to love myself.

I started to push myself in these areas late last year. As I did, things simply went better. I was happier. People around me were happier. Getting things done started to feel easier and more graceful, even as I/we worked hard and dealt with some painful topics.

Which all adds up to something obvious: how the plans and the dreams work out has a lot to do with how I (and all of us) show up every day.

So, that’s something I want to work on in 2016. I’m hoping the list above will help me stay accountable and on course. And, if you’re someone close to me, I’m hoping you’ll help hold me to it, too.

It’s going to be a good year.

The post How I want to show up this year appeared first on Mark Surman.

January 04, 2016 06:25 PM

December 23, 2015

Guillermo López (willyaranda)

Why I voted for Alberto Garzón

I'm one of the #Garzoners. I voted for Garzón.

In these general elections, in theory some of the most closely contested of our democracy, it seemed you could only vote for one of the four big parties: PP, PSOE, Podemos or Ciudadanos.

And yet there were more options, many more options. Nobody talked about them, not in the debates and barely in the polls, which only ever showed four possibilities. One of them was the left's "outsider", as El Español called him: Alberto Garzón.

Alberto Garzón is a young man, of my generation (only three years older than me), with a social commitment infinitely greater than that of any of the other four candidates. He wasn't hand-picked, or chosen by listing candidates in alphabetical order, or by being the telegenic face of a newly created party. He was elected by a majority of the membership in an open congress.

The title says "Alberto Garzón" and not Izquierda Unida or Unidad Popular. For the first time in my life, I could vote for a list that included the candidate for prime minister, with a real chance that my vote would actually produce a seat (I'm from Burgos; imagine the level of two-party dominance there, with only four seats at stake in such a conservative, ageing region).

So that's what I did.

I did it because I believe Alberto is someone who deserves to be in Congress and to give a voice to social struggle and to the problems of the vast majority. Because, as he said, "parliament is not parliament without social struggle." And if we have to take to the streets to defend what has been taken from us over the last few years, what our parents and grandparents won with so much suffering, then take to the streets we must.

Alberto is my ideal candidate because of what he represents: youth, brilliance, eloquence, experience in social struggle, and a commitment to people rather than to elites.

Alberto Garzón

Defending what belongs to all of us is not a fight for our representatives alone.

There is, however, a serious problem with Alberto Garzón: he belongs to today's Izquierda Unida.

Izquierda Unida either refounds itself or dies. I'm also clear that, as people were saying on Twitter, if Garzón were not heading the list, IU would probably have ended up with no representation in Congress at all (beyond the various regional "mareas", "confluencias" and assorted local mixtures). Either the refounding happens now, bringing in young, capable people who come from the fight for social rights while those entrenched in the organisation step aside, or the whole thing disappears. The visible leader is there, he is recognisable, and he will only grow. But if the current problems are not cut off at the root, everything that has been won will be lost, as very nearly happened only a few days ago. And if that means joining forces with other left-wing parties in order to gain representation and strength, it has to be considered.

I want Alberto to lead that refounding, to give a voice to the real left ("radical", as some call it): the left that defends workers' rights, that fights to keep dignity in public healthcare and education, that believes wealth must be shared far more equitably, that puts everything at the service of the wellbeing and benefit of the people (is that radical? Read the constitution, article 128.1), that believes energy should be produced cleanly and with respect for the environment, that would end every form of animal torture, that believes Spain is a plurinational state whose public institutions must serve the people and be governed by real justice, not the justice of money.

I want real, meaningful change that benefits everyone. And I hope Alberto Garzón and the real left to come can deliver it. That is why they have, and will keep, my vote.

The post Why I voted for Alberto Garzón appeared first on Pijus Magnificus.

December 23, 2015 01:00 PM

December 21, 2015

Mark Surman (surman)

Mozilla Foundation 2020 Strategy

We outlined a vision back in October for the next phase of Mozilla Foundation’s work: fuel the movement that is building the next wave of open into the digital world.

MoFo Strategy Map

Since then, we’ve been digging into the first layer of ‘how do we do this?’ detail. As part of this process, we have asked things like: What issues do we want to focus on first? How do we connect leaders and rally citizens to build momentum? And how does this movement building work fit into Mozilla’s overall strategy? After extensive discussion and reflection, we drafted a Mozilla Foundation 2020 Strategy document to answer these questions, which I’m posting here for comment and feedback. There is both a slide version and a long-form written version.

The first piece of this strategy is to become a louder, more articulate thought leader on the rights of internet users and the health of the internet.

Concretely, that means picking the issues we care about and taking a stance. For the first phase of this movement building work, we are going to focus on:

  1. Online privacy: from surveillance to tracking to security, it’s eroding
  2. Digital inclusion: from zero rating to harassment, it’s not guaranteed
  3. Web literacy: the internet is growing, but web literacy isn’t

We’ll show up on these issues through everything from more frequent blog posts and opinion pieces to a new State of the Web report that we hope to release toward the end of 2016.

The other key pieces of our strategy are growing our ‘leadership network’ and creating a full scale ‘advocacy engine’ that both feed and draw from this agenda. As part of the planning process, we developed a simple strategy map to show how all pieces work together:

A. Shape the agenda. Articulate a clear, forceful agenda. Start with privacy, inclusion and literacy over the next three years. Focus MoFo efforts here first. Impact: online privacy, digital inclusion and web literacy are mainstream social issues globally.

B. Connect leaders. Continue to build a leadership network that gathers and networks people who are motivated by this agenda. Get them doing stuff together, generating new, concrete solutions through things like MozFest and communities of practice. Impact: more people and orgs working alongside Mozilla to shape the agenda and rally citizens.

C. Rally citizens. Build an advocacy group that will rally a global force of 10s of millions of people who take action and change how they — and their friends — use the web. Impact: people make more conscious choices, companies and governments react.

This movement building strategy is meant to complement Mozilla’s product and technology efforts. If we point roughly in the same direction, then Firefox, our emerging work on open connected devices, and our efforts to rally people to a common cause give us a chance to have an impact far bigger than if we pursued any one of these things alone.

While this builds on our past work, it is worth noting that there are some important differences from the initial thinking we had earlier in the year. We started out talking about a ‘Mozilla Academy’ or ‘Mozilla Learning’. And we had universal web literacy as our top line social impact goal. Along the way, we realized that web literacy is one important area where our movement building work can impact the world — but that there are other issues where we want and need to have impact as well. The focus on a rolling agenda setting model in the current strategy reflects that realization.

It’s also worth calling out: a significant portion of this strategy is not new. In fact, the whole approach we used was to look at what’s working and where we have strengths, and then build from there. Much of what we plan to do with the Leadership Network already exists in the form of communities of practice like Hive, Open News, Science Lab and our Open Web Network. These networks become the key hubs that we build the larger network around. Similarly, we have had increasing success with advocacy and fundraising — we are now going to invest much more here to grow further. The only truly new part is the explicit agenda-setting function. Doing more here should have been obvious before, but it wasn’t. We’ve added it into the core of our strategy to both focus and draw on our leadership and advocacy work.

As you’ll see if you look at the planning documents (slides | long form), we are considering the current documents as version 0.8. That means that the broad framework is complete and fixed. The next phase will involve a) engagement with our community and partners re: how this framework can provide the most value and b) initial roll out of key parts of the plan to test our thinking by doing. Plans to do this in the first half of 2016 are detailed in the documents.

At this stage, we really want reactions to this next level of detail. What seems compelling? What doesn’t? Where are there connections to the broader movement or to other parts of Mozilla that we’re not making yet? And, most important, are there places that you want to get involved? There are many ways to offer feedback, including going to the Mozilla Leadership planning wiki and commenting on this blog.

I’m excited about this strategy, and I’m optimistic about how it can make Mozilla, our allies and the web stronger. And as we move into our next phase of engagement and doing, I’m looking forward to talking to more and more people about this work.

The post Mozilla Foundation 2020 Strategy appeared first on Mark Surman.

December 21, 2015 06:53 PM

November 22, 2015

Mark Finkle (mfinkle)

An Engineer’s Guide to App Metrics

Building and shipping a successful product takes more than raw engineering. I have been posting a bit about using Telemetry to learn about how people interact with your application so you can optimize use cases. There are other types of data you should consider too. Being aware of these metrics can help provide a better focus for your work and, hopefully, have a bigger impact on the success of your product.

Active Users

This includes daily active users (DAUs) and monthly active users (MAUs). How many people are actively using the product within a given time span? At Mozilla, we’ve been using these for a long time. From what I’ve read, these metrics seem less important than some of the others, but they do provide an easy-to-measure indicator of activity.

These metrics don’t give a good indication of how much people use the product, though. I have seen a variant metric called DAU/MAU (daily divided by monthly), which gives something like retention or engagement. DAU/MAU rates of 50% are considered very good.
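
As a rough illustration (the event format below is hypothetical, not Mozilla’s actual Telemetry schema), DAU/MAU can be computed from a simple log of daily activity:

    from datetime import date

    def dau_mau(events, day, window=30):
        """DAU/MAU for `day`: users active that day divided by users active
        over the preceding `window` days (inclusive). `events` is an iterable
        of (date, user_id) tuples, one per day a user was active."""
        dau = {uid for d, uid in events if d == day}
        mau = {uid for d, uid in events if 0 <= (day - d).days < window}
        return len(dau) / len(mau) if mau else 0.0

    # Toy data: user "a" is active three times in the month, "b" only once.
    events = [
        (date(2015, 11, 1), "a"), (date(2015, 11, 1), "b"),
        (date(2015, 11, 15), "a"), (date(2015, 11, 30), "a"),
    ]
    print(dau_mau(events, date(2015, 11, 30)))  # 1 daily / 2 monthly = 0.5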

Engagement

This metric focuses on how much people really use the product, typically tracking the duration of session length or time spent using the application. The amount of time people spend in the product is an indication of stickiness. Engagement can also help increase retention. Mozilla collects data on session length now, but we need to start associating metrics like this with some of our experiments to see if certain features improve stickiness and keep people using the application.

We look for differences across various facets like locales and releases, and hopefully soon, across A/B experiments.

Retention / Churn

Based on what I’ve seen, this is the most important category of metrics. There are variations in how these metrics can be defined, but they cover the same goal: Keep users coming back to use your product. Again, looking across facets, like locales, can provide deeper insight.

Rolling Retention: % of new users who return in the next day, week or month
Fixed Retention: % of this week’s new users still engaged with the product over successive weeks
Churn: % of users who leave, divided by the total number of users

Most analysis tools, like iTunes Connect and Google Analytics, use Fixed Retention. Mozilla uses Fixed Retention with our internal tools.

I found some nominal guidance (grain of salt required):
1-week churn: 80% bad, 40% good, 20% phenomenal
1-week retention: 25% baseline, 45% good, 65% great
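
As a sketch of the arithmetic behind those definitions (generic cohort math, not Mozilla’s internal tooling), fixed retention and churn might be computed like this:

    def fixed_retention(cohort, active_by_week):
        """Fixed retention: share of a week's new users (`cohort`, a set of ids)
        still active in each successive week. `active_by_week` is a list of
        sets of ids active in week 1, week 2, ..."""
        if not cohort:
            return []
        return [len(cohort & week) / len(cohort) for week in active_by_week]

    def churn(users_start, users_end):
        """Churn: share of users present at the start of a period who are gone
        by the end of it."""
        if not users_start:
            return 0.0
        return len(users_start - users_end) / len(users_start)

    cohort = {"a", "b", "c", "d"}                    # this week's new users
    weeks = [{"a", "b", "c"}, {"a", "b"}, {"a"}]     # active users, weeks 1-3
    print(fixed_retention(cohort, weeks))            # [0.75, 0.5, 0.25]
    print(churn({"a", "b", "c", "d"}, {"a", "b"}))   # 0.5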

Cost per Install (CPI)

I have also seen this called Customer Acquisition Cost (CAC), but it’s basically the cost (mostly marketing or pay-to-play pre-installs) of getting a person to install a product. I have seen this in two forms: blended – where ‘installs’ are both organic and from campaigns, and paid – where ‘installs’ are only those that come from campaigns. It seems like paid CPI is the better metric.

Lower CPI is better and Mozilla has been using Adjust with various ad networks and marketing campaigns to figure out the right channel and the right messaging to get Firefox the most installs for the lowest cost.

Lifetime Value (LTV)

I’ve seen this defined as the total value of a customer over the life of that customer’s relationship with the company. It helps determine the long-term value of the customer and can help provide a target for reasonable CPI. It’s weird thinking of “customers” and “value” when talking about people who use Firefox, but we do spend money developing and marketing Firefox. We also get revenue, maybe indirectly, from those people.

LTV works hand-in-hand with churn, since the length of the relationship is inversely proportional to the churn. The longer we keep a person using Firefox, the higher the LTV. If CPI is higher than LTV, we are losing money on user acquisition efforts.
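
A common back-of-the-envelope approximation (not necessarily the formula Mozilla uses) is LTV ≈ average revenue per user per period divided by the churn rate for that period; putting it next to CPI shows whether acquisition spend pays for itself. The numbers below are purely illustrative:

    def lifetime_value(arpu_per_month, monthly_churn):
        """Back-of-the-envelope LTV: expected lifetime in months (1 / churn)
        times average revenue per user per month."""
        if monthly_churn <= 0:
            return float("inf")
        return arpu_per_month / monthly_churn

    ltv = lifetime_value(arpu_per_month=0.30, monthly_churn=0.10)  # ~3.00 per user
    cpi = 2.00
    print(f"LTV = {ltv:.2f}, CPI = {cpi:.2f}, margin per user = {ltv - cpi:.2f}")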

Total Addressable Market (TAM)

We use this metric to describe the size of a potential opportunity. Obviously, the bigger the TAM, the better. For example, we feel the TAM (People with kids that use Android tablets) for Family Friendly Browsing is large enough to justify doing the work to ship the feature.

Net Promoter Score (NPS)

We have seen this come up in some surveys and user research. It’s supposed to show how satisfied your customers are with your product. This metric has its detractors, though. Many people consider it of little value, but it’s still used quite a lot.

NPS can be as low as -100 (everybody is a detractor) or as high as +100 (everybody is a promoter). An NPS that is positive (higher than zero) is felt to be good, and an NPS of +50 is excellent.
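
For reference, NPS comes from a 0-10 “how likely are you to recommend us?” question: 9-10 counts as a promoter, 0-6 as a detractor, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch:

    def nps(scores):
        """Net Promoter Score from 0-10 survey answers: %promoters - %detractors,
        on a -100..+100 scale (9-10 promoter, 7-8 passive, 0-6 detractor)."""
        if not scores:
            return 0.0
        promoters = sum(1 for s in scores if s >= 9)
        detractors = sum(1 for s in scores if s <= 6)
        return 100.0 * (promoters - detractors) / len(scores)

    print(nps([10, 9, 8, 7, 6, 3]))  # 2 promoters, 2 detractors, 6 answers -> 0.0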

Go Forth!

If you don’t track any of these metrics for your applications, you should. There are a lot of off-the-shelf tools to help you get started. Level up your engineering game and make a bigger impact on the success of your application at the same time.

November 22, 2015 04:33 PM

November 07, 2015

Guillermo López (willyaranda)

Advertising and tracking: how I protect myself with Firefox

This is a rather long post; you can watch the video if you'd rather not read.

Following up on the post I wrote the other day about advertising and tracking on the internet, focused on the website of the newspaper El Mundo, a friend asked me to write a post explaining how I protect myself from invasive advertising and user tracking with Firefox, my browser.

Disclaimer: some websites may stop working correctly. For example, comments may break on some sites, you may have trouble logging in with Facebook on third-party pages (the ones that let you sign in with your Facebook account), and so on. If a site doesn't work properly, simply disable uBlock on that page, temporarily or permanently, if you think it's worth it.

There are really only three steps I follow whenever I install a fresh Firefox. They give internet users the choice of what data they hand over to third parties, and they keep the intrusive advertising we all suffer from bothering me.

Ad blocker

For a long time this spot belonged to AdBlock (Plus), an add-on created in Firefox's early days that used the full potential of extensions.

Lately, however, a strong contender has appeared: an extension called uBlock Origin (there is a fork, called simply uBlock, but I'm focusing on the original). According to its authors, it is much more efficient than AdBlock (for technical reasons we won't get into here), and therefore faster and lighter on memory.

So we are going to use uBlock Origin.

If you have never installed an extension in Firefox, you are missing out on everything that makes this browser customisable. Extensions can do very cool things and automate a lot: block advertising, download videos from YouTube while you are on the site, fill in passwords automatically, completely change how Firefox looks, tell you whether a site is really secure, redirect you to secure versions of sites…

  1. To install uBlock Origin, visit the extension's page on Mozilla Add-ons. Click here.
  2. Click the + Add to Firefox button.

    Add uBlock Origin to Firefox

  3. Wait for the extension to download.

    uBlock Origin downloading

  4. Click Install.

    uBlock Origin, Install

  5. Done!

    uBlock Origin installed

See how easy that was? You are now protected against most ads and tracking pages. But let's go a bit further and leave Firefox spotless, protected against every kind of tracking and advertising.

Configuring uBlock Origin

The default uBlock Origin installation already enables most of the blocklists that hide advertising, so strictly speaking there is nothing more to do. Still, I'm going to recommend a few extra filter lists that I usually enable.

To do so:

  1. Go to the Tools > Add-ons menu (or press Cmd+Shift+A on Mac or Control+Shift+A on Windows/Linux).
  2. Open uBlock Origin's preferences.

    uBlock Origin preferences

  3. Click Show Dashboard.

    uBlock Origin, Show Dashboard

  4. Once the new tab with uBlock's settings opens, click the 3rd-party filters tab.
  5. Enable the filter lists shown in the image below (click the image to see it larger if you need to).

    uBlock Origin, filter lists to enable

With this, you now have efficient ad and tracker filtering in Firefox. But there is one more step we can take.

Enable Tracking Protection in Firefox

Although uBlock Origin already provides this protection, recent versions of Firefox have it built into the core, which makes it even more efficient than the extension. That means that, besides saving network resources, you also save CPU. All upside.

To enable it, do the following.

  1. In the Firefox address bar, type
    about:config
  2. You will see a big warning, which varies depending on the localisation you are using. In the Spanish (Spain) version it looks like this:

    about:config warning

  3. Click "I'll be careful, I promise!"
  4. Search for the preference
    privacy.trackingprotection.enabled

    and double-click it until its value changes to true (the whole row may turn bold; that is normal)

    Tracking Protection preference in about:config

Note: in future versions of Firefox this option will be available in the Firefox preferences (Tools > Preferences on Linux/Windows, or Firefox > Preferences on Mac), in the Privacy panel. There you would enable the option "Use Tracking Protection".

Enable Tracking Protection in the Privacy panel

Conclusions

Done! You are now even better protected on the internet. You should now check the websites you normally use and make sure they work correctly. If they don't, you can disable uBlock on them if you wish. I'd also suggest disabling it on sites that live off advertising that is not intrusive, does not hand your information to third parties, and whose content you value highly.

Enjoy much safer, more private browsing thanks to Firefox and uBlock Origin. Say NO to advertising and tracking on the internet.

By the way, I'm not explaining how to do any of this in Google Chrome, because if you use that browser you are already exposed, by default, to Google's tracking and to closed source (Chrome is not open; Chromium is, but Google adds things to Chrome… and nobody except Google knows what they are).

The post Advertising and tracking: how I protect myself with Firefox appeared first on Pijus Magnificus.

November 07, 2015 10:59 AM

October 28, 2015

Guillermo López (willyaranda)

Travelling to Iceland in winter

Travelling into wild nature

At the end of February 2015, I took a seven-day trip with my friends around the south of Iceland. Here are some of the tips I gathered, for anyone who finds them useful:

Food

Horse (they are very pretty, but they are also eaten), lamb (not suckling lamb, for the Castilians out there), whale ("it's like a cow in the sea", an Icelander who is practically a naturalised Spaniard told us), shark (it's fermented; we didn't like it), and a kind of creamy yogurt called Skyr.

Conclusion

At the end of the day, it is a place of outrageously beautiful nature. If you keep a few things in mind (the points above are tips), it is one of the best places on Earth, where you see an incredible union of nature that humans have not yet fully tamed. So simply enjoy the nature.

Photos: Volcanoes · Glacier blocks · Glaciers · Lost · Closed road · Glacier in the distance · Waterfalls · Sandstorm

The post Travelling to Iceland in winter appeared first on Pijus Magnificus.

October 28, 2015 08:00 PM

October 26, 2015

Mark Surman (surman)

Fueling a movement

Mozilla was born from the free and open source software movement. And, as a part of this larger movement, Mozilla helped make open mainstream. We toppled a monopoly, got the web back on an open track, and put open source software into the hands of hundreds of millions of people.

It’s time for us to do this again. Which brings me to this blog’s topic: where should Mozilla Foundation focus its effort over the next five years?

big picture mozilla strategy

If you’ve been following my blog, you’ll know the answer we gave to this question back in June was ‘web literacy’. We dug deep into this thinking over the summer and early fall. As we did, we realized: we need to think more broadly. We need to champion web literacy, but we also need to champion privacy and tinkering and the health of the public internet. We need to fully embrace the movement of people who are trying to make open mainstream again, and add fuel to it. Building on a Mozilla strategy town hall talk I gave last week (see: video and slides), this post describes how we came to this conclusion and where we’re headed with our thinking.

Part of ‘digging deep’ into our strategy was taking another look at the world around us. We saw what we already knew: things are getting worse on the web. Monopolies. Silos. Surveillance. Fear. Insecurity. Public apathy. All are growing. However, we also noticed a flip side: there is a new wave of open afoot. Makers. Open data. Internet activism. Hackable connected devices. Open education. This new wave of open is gaining steam, even more so than we already knew.

This sparked our first big insight: Mozilla needs to actively engage on both sides. We need to tackle big challenges like monopolies and walled gardens, but we also need to add fuel and energy to the next wave of open. This is how we had an impact the first time around with Firefox. It’s what we need to do again.

As the Mozilla project overall, there are a number of things we should do to this end. We should build out from the base we already have with Firefox, reigniting our mojo as a populist challenger brand and developer platform. We should build our values and vision into areas like connected devices and online advertising, pushing ourselves to innovate until we find a new product beachhead. And, we should also back leaders and rally citizens to grow the movement of people who are building a digital world that is open, hackable and ours. As our colleagues at Mozilla Corporation drive on the first two fronts, the Mozilla Foundation can lead the way on the third.

Which brings us to the second place we explored over the summer: what we’ve achieved and built in the last five years. In 2010, we kicked off a new era for Mozilla Foundation with the Drumbeat Festival in Barcelona. At the time, we had half a dozen staff and $250k/year in outside revenue. Since then, we’ve grown to 80 staff, $12M/year in outside revenue and 5,000 active community leaders and contributors. We’ve rallied 1.7M supporters and brought in $40M in grants. More importantly: we have built a vibrant network of friends and allies who share our vision of the internet as a global public resource that belongs to all of us.

mofostrategy

As we looked back, we had a second insight: we have built two very powerful new capabilities into Mozilla. They are:

1. A leadership network: Mozilla has gotten good at gathering and connecting people — from executives to young community leaders — who share our cause. We get people working on projects where they teach, organize and hack with peers, helping them learn about open source and become stronger leaders along the way. The now-annual Mozilla Festival in London provides a snapshot of this network in action.

2. An advocacy engine: we’ve become good at rallying activists and citizens to take action for the open internet. This includes everything from attending local ‘teach-ins’ to signing a petition to donating to our shared cause. Our grassroots Maker Party learning campaign and our mass mobilization around issues like net neutrality are examples of this in action.

If our aim is to fuel the movement that’s driving the next wave of open, these capabilities can be incredibly powerful. They give us a way to support and connect the leaders of that movement. And they give us a way to join in common cause with others in incredibly powerful ways.

In the end, this led us to a very simple strategy that the Mozilla Foundation team will focus on over the coming years:

1. Build and connect leaders (leadership network)
+
2. Rally citizens to our common cause (advocacy engine)
=
3. Fuel the movement (to drive the next wave of open)

This strategy is about doubling down on the strengths we’ve built, strengthening the leadership side and investing significantly more on the advocacy side. The Mozilla Foundation board and team have all expressed strong support for this approach. Over the coming week we’re taking this strategy into an operational planning phase that will wrap up in December.

One important note as we dive into detailed planning: it will be critical that we’re concrete and focused on what ‘hills’ we want to take. Web literacy is definitely one of them. Privacy, walled gardens and the economics of the public internet are also on the list. The final list needs to be short (less than five) and, ideally, aligned with where we are aiming Mozilla’s product and innovation efforts. One of our top tasks for 2016 will be to develop a research and thought leadership program across Mozilla that will set a very specific public agenda, defining which topics we will focus our efforts on.

If you’ve been following my blog, you know this is a significant evolution in our strategic thinking. We’ve broadened our focus from web literacy to fueling the movement of which we are part. Universal web literacy is still central to this — both as a challenge that leaders in our network will tackle and as a goal that our large scale advocacy will focus on. However, looking at the world around us, the capabilities we’ve developed and the work we already have in play, I believe this broader approach is the right one. It adds a powerful and focused prong to Mozilla’s efforts to build and protect the internet as a public resource. Which, in the end, is why we’re here.

As always, comments on this post and the larger strategy are very welcome. Please post them here or email me directly.

Also, PS, here are the video and slide links again if you want more detail.

The post Fueling a movement appeared first on Mark Surman.

October 26, 2015 06:26 PM

October 14, 2015

Guillermo López (willyaranda)

Why you should not block ads on the Internet

If this topic interests you, drop by the Mozilla Hispano forum, where we are debating it.

Today, while reading El Mundo, I came across a fairly prominent article from its new magazine, Papel, about advertising on the internet, titled "Por qué no debes bloquear los anuncios en Internet" ("Why you should not block ads on the Internet"), written by Adrián Mediavilla (@adrimedia).

And I'm going to tell you the opposite: why you should block ads on the Internet.

And I'm going to explain it simply with data. Real data. Data anyone can verify and see with their own eyes.

Advertising and ads

If we open El Mundo's front page right now, this is what we find.

Advertising on El Mundo's front page

The moment I arrive, I can't see anything of the site: a full-screen image with advertising pops up and covers everything. We're off to a bad start.

Once it loads, we can finally see some text. All of it squeezed between advertising. Enormous ads, of course.

El Mundo front page, "clean"

So, having seen their great front page, full of information, I open the interview they are featuring. Let's see what's there.

El Mundo interview, with ads

Well, well. More advertising. More intrusive elements. Let's wait for all that advertising to hide itself so we can actually read what we came for.

El Mundo interview, clean

At last! We can finally see some information. We can finally find the content this site is known for: information.

User tracking

But it's not just the advertising. If it were only a matter of showing those annoying banners, it might be tolerable (or not). It's all the information handed to third parties when those images are requested.

There is a fantastic Firefox extension called Lightbeam that lets you visualise, in an incredible way, all the connections a page makes to other sites, whether through images, CSS stylesheets or, most commonly, ads and tracking.

In the case of El Mundo, let's see what it shows:

El Mundo, tracking

I visited one site, elmundo.es, and it requested resources from 30 third-party sites. That means that simply by visiting elmundo.es, up to 30 different sites (belonging to 30 different people or companies, in who knows which parts of the world) get precise information about me: that I visited elmundo.es, at what time, with which browser, on which operating system… plus a pile of other information we give away without knowing it.

If we use an ad blocker, the difference is clear:

El Mundo, third parties with blocking enabled

We have gone from handing information to 30 sites down to 5, among them expansion, marca and elmundo.net, all from the same group. That makes sense, since elmundo.es probably embeds content from its business and sports siblings.
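
As a rough, hypothetical sketch of the same idea (not how Lightbeam actually works), you can approximate the third-party count by fetching a page's HTML and collecting the external hosts it references. This only sees static markup, not resources injected later by scripts, so it will usually undercount compared to Lightbeam's live view:

    from html.parser import HTMLParser
    from urllib.parse import urlparse
    from urllib.request import Request, urlopen

    class ResourceCollector(HTMLParser):
        """Collect the hosts of absolute src/href URLs found in the markup."""
        def __init__(self):
            super().__init__()
            self.hosts = set()

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name in ("src", "href") and value and value.startswith("http"):
                    host = urlparse(value).netloc.lower()
                    if host:
                        self.hosts.add(host)

    def third_party_hosts(page_url):
        """Hosts referenced in page_url's static HTML that are not the site's own."""
        req = Request(page_url, headers={"User-Agent": "Mozilla/5.0"})
        html = urlopen(req, timeout=30).read().decode("utf-8", errors="replace")
        parser = ResourceCollector()
        parser.feed(html)
        own = urlparse(page_url).netloc.lower()
        if own.startswith("www."):
            own = own[4:]
        return sorted(h for h in parser.hosts if own not in h)

    if __name__ == "__main__":
        hosts = third_party_hosts("http://www.elmundo.es/")
        print(len(hosts), "third-party hosts referenced in the static HTML")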

Page size

And if you are still not convinced by not being able to read the information because of the ads, or by your data being sent to roughly 30 different sites, you might want to know that all that extra material is heavy, and it has to be downloaded to your browser in order to be displayed.

Firefox has a developer tool that shows what has been downloaded, how big it is and how long it took to download.

In my case, on a fairly fast fibre connection (300/30), the numbers the site produces without blocking anything are the following:

elmundo-full

13.7 MB of data downloaded, across 685 total requests, with the download taking 3.80 seconds.

Now compare that with what the page produces when blocking ads and trackers:

elmundo-light

4.6 MB, across 324 total requests, with the download taking 1.48 seconds.

The numbers, side by side:

  • Without blocking: 13.7 MB, 685 requests, 3.80 seconds
  • With blocking: 4.6 MB, 324 requests, 1.48 seconds

Data. That's just data.

Conclusions

I don't even want to get into the ethics of having to put up with that advertising, those gigantic, intrusive ads, or of third parties learning what I do on the internet simply because I visit a news page.

I'm just laying out the data. And I think it speaks for itself. So, as long as there is no ethical, non-intrusive advertising that respects users' preferences, I will keep using ad blockers.

If this topic interests you, drop by the Mozilla Hispano forum, where we are debating it.

The post Why you should not block ads on the Internet appeared first on Pijus Magnificus.

October 14, 2015 09:13 PM

October 12, 2015

Guillermo López (willyaranda)

What the hell are you celebrating on "Hispanic Day"?

Which, by the way, is not called Hispanic Day at all; it is the Fiesta Nacional, the national holiday. Unless you want to use the Francoist name, of course.

What are you celebrating? Seriously, what the hell are you celebrating?

Are you celebrating Spain's day? An exaltation of nationalism? Fine. Then don't complain about other nationalisms within Spain.

Are you celebrating the day of a country that spends 800,000 euros on showing off how big it is? On a military parade?

Are you celebrating a day on which the king and queen (democratically elected, as we all know) host a reception for 1,500 people while hundreds of thousands of families live below the poverty line?

Are you celebrating the day of a nation where ten-year-old girls watch a lavish military parade while more than 2 million children live below the poverty line?

Are you celebrating this day knowing that Spain has 22.20% unemployment (the OECD average is 6.8%)?

Are you celebrating it knowing that 48.80% of young people between 15 and 24 are unemployed (the OECD average is 13.70%)?

Are you celebrating this day when public debt stands at 100% of GDP?

Are you celebrating this day when we have handed the banks 95 billion euros as a gift?

Are you celebrating while there are hundreds of evictions from family homes all over the country? (Remember the previous point.)

Are you celebrating while thousands of university students (a very important part of this country's future, remember) have had to drop out of their degrees, or never start them, because there are not enough grants for people who do not have enough money?

Seriously, tell me: what the hell are you celebrating on October 12th? What are you proud of?

The post What the hell are you celebrating on "Hispanic Day"? appeared first on Pijus Magnificus.

October 12, 2015 07:32 PM

October 02, 2015

Mark Finkle (mfinkle)

Fun With Telemetry: URL Suggestions

Firefox for Android has a UI Telemetry system. Here is an example of one of the ways we use it.

As you type a URL into Firefox for Android, matches from your browsing history are shown. We also display search suggestions from the default search provider, and we recently added support for displaying matches from your previously entered search history. When any of these is tapped, the term is used to load a search results page via the default search provider, with one exception: if the term looks like a domain or URL, Firefox skips the search results page and loads the URL directly.

fennec-suggestions-annotated

  1. This suggestion is not really a suggestion. It’s what you have typed. Tagged as user.
  2. This is a suggestion from the search engine. There can be several search suggestions returned and displayed. Tagged as engine.#
  3. This is a special search engine suggestion. It matches a domain, and if tapped, Firefox loads the URL directly. No search results page. Tagged as url
  4. This is a matching search term from your search history. There can be several search history suggestions returned and displayed. Tagged as history.#

Since we only recently added the support for search history, we want to look at how it’s being used. Below is a filtered view of the URL suggestion section of our UI Telemetry dashboard. Looks like history.# is starting to get some usage, and following a similar trend to engine.# where the first suggestion returned is used more than the subsequent items.

It’s also worth pointing out that we do get a non-trivial number of url cases. This should be expected: most search keyword data released by Google shows that navigational keywords are the most heavily used.

An interesting observation is how often people use the user suggestion. Remember, this is not actually a suggestion; it’s what the person has already typed. Pressing “Enter” or “Go” would produce the same outcome. One theory for the high usage of that suggestion is that it provides a clear outcome: Firefox will search for this term. Other ways of triggering the search might feel more ambiguous.
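
To illustrate the kind of aggregation behind that dashboard (the event records below are made up; this is not the actual UI Telemetry schema or pipeline), counting suggestion taps by tag might look like this:

    from collections import Counter

    # Hypothetical suggestion-tap events, each carrying one of the tags
    # described above: "user", "engine.#", "url" or "history.#".
    events = [
        {"method": "suggestion", "extras": "engine.0"},
        {"method": "suggestion", "extras": "user"},
        {"method": "suggestion", "extras": "engine.0"},
        {"method": "suggestion", "extras": "history.0"},
        {"method": "suggestion", "extras": "url"},
    ]

    taps = Counter(e["extras"] for e in events if e["method"] == "suggestion")
    total = sum(taps.values())
    for tag, count in taps.most_common():
        print(f"{tag:>10}: {count} ({100.0 * count / total:.0f}%)")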

telemetry-suggestions

October 02, 2015 02:26 PM

September 18, 2015

Mark Finkle (mfinkle)

Is Ad Blocking Really About The Ads?

Since Apple released iOS 9 with Content Blocking extensions for Safari, there has been a lot of discussion about the ramifications. On one side, you have content providers who earn a living by monetizing the content they create. On the other side, you have consumers viewing that content on their devices, trying to focus on the wonderful content and avoid the annoying advertisements.

But wait, is it really the ads? The web has had advertisements for a long time. Over the years, the way Ad Networks optimize the efficiency of ad monetization has changed. I think it happened slowly enough that we, as consumers, largely didn’t notice some of the downsides. But other people did. They realized that Ad Networks track us in ways that can feel like an invasion of privacy. So those people started blocking ads.

Even more recently, we’ve started to discover that the mechanisms Ad Networks use to serve and track ads also create horrible performance problems, especially on mobile devices. If pages load slowly in a browser, people notice. If pages start consuming more bandwidth, people notice. If you provide a way to quickly and easily improve performance and reduce data usage, people will try it. I’d even posit that people care a lot more about performance and data usage than about privacy.

Even if you don’t care about the privacy implications of tracking cookies and other technologies sites use to identify us online, you might want to turn on Tracking Protection in Firefox anyway for a potential big speed boost. – Lifehacker

Is there a way for Ad Networks to clean up this mess? Some people inside Mozilla think it can be done, but it will take some effort and participation from the Ad Networks. I don’t think anyone has a good plan yet. Maybe the browser could help Ad Networks do things much more efficiently, improving performance and reducing bandwidth. Maybe people could choose how much personal information they want to give up.

If you fix the performance, data usage and privacy issues – will people really care that much about advertisements?

September 18, 2015 04:14 AM

September 01, 2015

Guillermo López (willyaranda)

Ridiculous

There are ways of making a fool of yourself.

And then there is Real Madrid with its goalkeepers.

It is pathetic that such a big institution, with so much economic power and so many contacts, has so many problems in goal.

We went from having San Iker, to trying to push him out (with Mourinho in the middle of it), to Adán, to Diego López, to buying the best goalkeeper of the 2014 World Cup (Keylor Navas), to Kiko Casilla… kicking Iker out (even paying for the privilege), and now the De Gea mess.

It is not just the embarrassment of failing to get things done on time (they call it bureaucracy; no, it is leaving things to the last minute), it is how Madrid's current starting goalkeeper has been treated.

And in the end, it is karma. Damn karma.

The post Ridiculous appeared first on Pijus Magnificus.

September 01, 2015 06:17 PM

August 15, 2015

Mark Finkle (mfinkle)

Random Management: Unblocking Technical Leadership

I’ve been an Engineering Manager for a while now, but for many years I filled a Developer role. I have done a lot of coding over the years, and I still try to do a little coding every now and then. Because of my past as a developer, I could be oppressive to senior developers on my teams. When making decisions, I found myself providing both the management viewpoint and the technical viewpoint. This usually meant I was keeping a perfectly qualified technical person from participating at a higher level of responsibility. This creates an unhealthy technical organization with limited career growth opportunities.

As a manager with a technical background, I found it difficult to separate the two roles, but admitting there was a problem was a good first step. Over the last few years, I have been trying to get better at creating more room for technical people to grow on my teams. It seems to be more about focusing on outcomes for them to target, finding opportunities for them to tackle, listening to what they are telling me, and generally staying out of the way.

Another thing to keep in mind: it’s not just an issue with management. The technical growth track is a lot like a ladder: keep developers climbing or everyone can get stalled. We need to make sure Senior Developers are working on suitable challenges, or they end up taking work away from Junior Developers.

I mentioned this previously, but it’s important to create a path for technical leadership. With that in mind, I’m really happy about the recently announced Firefox Technical Architects Group. It creates challenges for our technical leadership, and roles with more responsibility and visibility. I’m also interested to see whether we get more developers climbing the ladder.

August 15, 2015 10:09 PM

July 27, 2015

Mark Surman (surman)

Mozilla Learning Strategy Slides

Developing a long term Mozilla Learning strategy has been my big focus over the last three months. Working closely with people across our community, we’ve come up with a clear, simple goal for our work: universal web literacy. We’ve also defined ‘leadership’ and ‘advocacy’ as our two top level strategies for pursuing this goal. The use of ‘partnerships and networks’ will also be key to our efforts. These are the core elements that will make up the Mozilla Learning strategy.

Over the last month, I’ve summarized our thinking on Mozilla Learning for the Mozilla Board and a number of other internal audiences. This video is based on these presentations:

As you’ll see in the slides, our goal for Mozilla Learning is an ambitious one: make sure everyone knows how to read, write and participate on the web. In this case, everyone = the five billion people who will be online by 2025.

Our top level thinking on how to do this includes:

1. Develop leaders who teach and advocate for web literacy.

Concretely, we will integrate our Clubs, Hive and Fellows initiatives into a single, world class learning and leadership program.

2. Shift thinking: everyone understands the web / internet.

Concretely, this means we will invest more in advocacy, thought leadership and user education. We may also design ways to encourage web literacy more aggressively in our products.

3. Build a global web literacy network.

Mozilla can’t create universal web literacy on its own. All of our leadership and advocacy work will involve ‘open source’ partners with whom we’ll create a global network committed to universal web literacy.

Process-wise: we arrived at this high level strategy by looking at our existing programs and assets. We’ve been working on web literacy, leadership development and open internet advocacy for about five years now. So, we already have a lot in play. What’s needed right now is a way to focus all of our efforts in a way that will increase their impact — and that will build a real snowball of people, organizations and governments working on the web literacy agenda.

The next phase of Mozilla Learning strategy development will dig deeper on ‘how’ we will do this. I’ll provide a quick intro post on that next step in the coming days.

The post Mozilla Learning Strategy Slides appeared first on Mark Surman.

July 27, 2015 08:03 PM

July 22, 2015

Mark Surman (surman)

Building a big tent (for web literacy)

Building a global network of partners will be key to the success of our Mozilla Learning initiative. A network like this will give us the energy, reach and diversity we need to truly scale our web literacy agenda. And, more important, it will demonstrate the kind of distributed leadership and creativity at the heart of Mozilla’s vision of the web.

As I said in my last two posts, leadership development and advocacy will be the two core strategies we employ to promote universal web literacy. Presumably, Mozilla could do these things on its own. However, a distributed, networked approach to these strategies is more likely to scale and succeed.

Luckily, partners and networks are already central to many of our programs. What we need to do at this stage of the Mozilla Learning strategy process is determine how to leverage and refine the best aspects of these networks into something that can be bigger and higher impact over time. This post is meant to frame the discussion on this topic.

The basics

As a part of the Mozilla Learning strategy process, we’ve looked at how we’re currently working with partners and using networks. There are three key things we’ve noticed:

  1. Partners and networks are a part of almost all of our current programs. We’ve designed networks into our work from early on.
  2. Partners fuel our work: they produce learning content; they host fellows; they run campaigns with us. In a very real way, partners are huge contributors (a la open source) to our work.
  3. Many of our partners specialize in learning and advocacy ‘on the ground’. We shouldn’t compete with them in this space — we should support them.

With these things in mind, we’ve agreed we need to hold all of our program designs up to this principle:

Design principle = build partners and networks into everything.

We are committed to integrating partners and networks into all Mozilla Learning leadership and advocacy programs. By design, we will both draw from these networks and provide value back to our partners. This last point is especially important: partnerships need to provide value to everyone involved. As we go into the next phase of the strategy process, we’re going to engage in a set of deep conversations with our partners to ensure the programs we’re building provide real value and support to their work.

Minimum viable partnership

Over the past few years, a variety of network and partner models have developed through Mozilla’s learning and leadership work. Hives are closely knit city-wide networks of educators and orgs. Maker Party is a loose network of people and orgs around the globe working on a common campaign. Open News and Mozilla Science sit within communities of practice with a shared ethos. Mozilla Clubs are much more like a global network of local chapters. And so on.

As we develop our Mozilla Learning strategy, we need to find a way to both: a) build on the strengths of these networks; and b) develop a common architecture that makes it possible for the overall network to grow and scale.

Striking this balance starts with a simple set of categories for Mozilla Learning partners and networks. For example:

This may not be the exact way to think about it, but it is certain that we will need some sort of common network architecture if we want to build partners and networks into everything. Working through this model will be an important part of the next phase of Mozilla Learning strategy work.

Partners = open source

In theory, one of the benefits of networks is that the people and organizations inside them can build things together in an open source-y way. For example, one set of partners could build a piece of software that they need for an immediate project. Another partner might hear about this software through the network, improve it for their own project and then give it back. The fact that the network has a common purpose means it’s more likely that this kind of open source creativity and value creation takes place.

This theory is already a reality in projects like Open News and Hive. In the news example, fellows and other members of the community post their code and documentation on the Source web page. This attracts the attention of other news developers who can leverage their work. Similarly, curriculum and practices developed by Hive members are shared on local Hive websites for others to pick up and run with. In both cases, the networks include a strong social component: you are likely to already know, or can quickly meet, the person who created a thing you’re interested in. This means it’s easy to get help or start a collaboration around a tool or idea that someone else has created.

One question that we have for Mozilla Learning overall is: can we better leverage this open source production aspect of networks in a more serious, instrumental and high impact way as we move forward? For example, could we: a) work on leadership development with partners in the internet advocacy space; b) have the fellows / leaders involved produce high quality curriculum or media; and c) use these outputs to fuel high impact global campaigns? Presumably, the answer can be ‘yes’. But we would first need to design a much more robust system of identifying priorities, providing feedback and deploying results across the network.

Questions

Whatever the specifics of our Mozilla Learning programs, it is clear that building in partnerships and networks will be a core design principle. At the very least, such networks provide us diversity, scale and a ground game. They may also be able to provide a genuine ‘open source’ style production engine for things like curriculum and campaign materials.

In order to design the partnership elements of Mozilla Learning, there are a number of questions we’ll need to dig into:

A key piece of work over the coming months will be to talk to partners about all of this. I will play a central role here, convening a set of high level discussions. People leading the different working groups will also: a) open up the overall Mozilla Learning process to partners and b) integrate partner input into their plans. And, hopefully, Laura de Reynal and others will be able to design a user research process that lets us get info from our partners in a detailed and meaningful way. More on all this in coming weeks as we develop next steps for the Mozilla Learning process.

The post Building a big tent (for web literacy) appeared first on Mark Surman.

July 22, 2015 03:54 PM

July 02, 2015

Mark Finkle (mfinkle)

Random Thoughts on Engineering Management

I have ended up managing people at the last three places I’ve worked, over the last 18 years. I can honestly say that only in the last few years have I really started to embrace the job of managing. Here’s a collection of thoughts and observations:

Growth: Ideas and Opinions and Failures

Expose your team to new ideas and help them create their own voice. When people get bored or feel they aren’t growing, they’ll look elsewhere. Give people time to explore new concepts, while trying to keep results and outcomes relevant to the project.

Opinions are not bad. A team without opinions is bad. Encourage people to develop opinions about everything. Encourage them to evolve their opinions as they gain new experiences.

“Good judgement comes from experience, and experience comes from bad judgement” – Frederick P. Brooks

Create an environment where differing viewpoints are welcomed, so people can learn multiple ways to approach a problem.

Failures are not bad. Failing means trying, and you want people who try to accomplish work that might be a little beyond their current reach. It’s how they grow. Your job is keeping the failures small, so they can learn from the failure, but not jeopardize the project.

Creating Paths: Technical versus Management

It’s important to have an opinion about the ways a management track is different than a technical track. Create a path for managers. Create a different path for technical leaders.

Management tracks have highly visible promotion paths: organization structure changes, company-wide emails, and inclusion in more meetings and decision making. Technical track promotions are harder to notice unless you also increase the person’s responsibilities and decision-making role.

Moving up either track means more responsibility and more accountability. Find ways to delegate decision making to leaders on the team. Make those leaders accountable for outcomes.

Train your engineers to be successful managers. There is a tradition in software development of using the most senior engineer to fill openings in management. This is wrong. Look for people who have a proclivity for working with people. Give those people management-like challenges and opportunities. Once they (and you) are confident in taking on management, promote them.

Snowflakes: Each Engineer is Different

Engineers, even great ones, have strengths and weaknesses. As a manager, you need to learn these for each person on your team. Some people are very strong at starting new projects, building something from nothing; others are great at finishing, making sure the work is ready to release. Some excel at user-facing code, others love writing back-end services. Leverage your team’s strengths to efficiently ship products.

“A 1:1 is your chance to perform weekly preventive maintenance while also understanding the health of your team” – Michael Lopp (rands)

The better you know your team, the less likely you are to create bored, passionless drones. Don’t treat engineers as fungible, swappable resources. Set them, and the team, up for success. Keep people engaged and passionate about the work.

Further Reading

The Role of a Senior Developer
On Being A Senior Engineer
Want to Know Difference Between a CTO and a VP of Engineering?
Thoughts on the Technical Track
The Update, The Vent, and The Disaster
Bored People Quit
Strong Opinions, Weakly Held

July 02, 2015 03:11 AM

July 01, 2015

Guillermo López (willyaranda)

Modern Greece

When PASOK and ND lied [2] so that Greece could get into the European Union and the Euro, Syriza wasn’t there yet.

When the (first) Greek bailout was requested, Syriza wasn’t yet in government.

When most of that bailout went to paying the interest on the debt (mainly to German and French banks), Syriza wasn’t yet in government.

When the European Union decided that, instead of rescuing the citizens, letting the banks fail and prosecuting their managers, it had to rescue the banks (because those debts were owed to German and French banks), Syriza wasn’t yet in government.

When the PASOK and ND governments asked for a second bailout to keep paying the interest on the initial debt (and on the second loan), Syriza wasn’t yet in government.

When a large share of public money kept going to fund a huge army (with multi-billion purchases from Germany), the product of a “mine is bigger than yours” contest with Turkey, Syriza wasn’t yet in government.

When, in January 2015, the Greeks decided democratically at the ballot box that they could not keep asking for loans forever and that the austerity imposed by the European Union would lead them, sooner or later, to ruin, Syriza began to govern, with the task of restoring rights, ending austerity, and renegotiating the debt with its creditors.

When, six months later, after what look like very tough negotiations, Greece is very close to leaving the European Union, the blame falls on Syriza, as if the last ten years of governments kneeling before the banks, the IMF, and austerity had never existed.

Syriza is guilty of having won the most votes in the last elections and of staying true to its program and its ideology: putting people ahead of their own suffering. Unlike the last ten years, when Syriza was not in government.

The post Modern Greece appeared first on Pijus Magnificus.

July 01, 2015 08:20 AM

June 16, 2015

Guillermo López (willyaranda)

¡VIVA ZAPATA!

Disclaimer: the title may or may not be a defense of Guillermo Zapata, the Ahora Madrid councillor who has had to resign. Or it may be a nod to the film ¡Viva Zapata! You can choose.

Decontextualization (a word repeated a thousand and one times over these last two days) has meant that Ahora Madrid (the “radical left” citizens’ platform that won the mayor’s office) has had its first crisis, a crisis caused by tweets written by one of its councillors a few years ago, in the wake of others written by Nacho Vigalondo, like this one:

The Zapata affair is about tweets telling “jokes” about the Jewish Holocaust, comments of questionable taste for many people, or about Irene Villa and Marta del Castillo, for example (although Irene Villa herself has played it down, and so has Marta’s father).

They are jokes that many people find hard to digest (with every reason in the world), and then there is another group of people who simply throw their hands up in horror even though they have people around them who have said the same kinds of things (I won’t link to it, you see it daily), even while already holding public office.

What we need to do is accept jokes in both directions (criticize them if you like, and don’t feel obliged to respect them if they seem hurtful): jokes about roadside graves and jokes about the Holocaust, as long as they are told from an ironic or humorous point of view.

The problem is that Twitter is very much 140 characters and nothing more. There is barely any context. You can link to a tweet, which is an entity in itself, which has no context (unless it is a reply to another tweet), and you cannot tell what surrounds it.

So then: is it more legitimate to tell jokes (black humor, of dubious intelligence and morality for many), which are just that, jokes, than to steal millions, to keep a second set of books, to pay for your party headquarters with money meant for the victims of terrorism? What is the threshold for a politician’s resignation? A joke? Or robbing the citizens?

Oh, and following MiMesaCojea, just in case I need it in the future:

In light of the reactions provoked by my comment on the social network Twitter, I want to apologize publicly. I am sorry. I made a mistake and I am aware that, with my words, I may have hurt many people.

Those who know me know that many of my friends are black, Arab, gay, lesbian, transgender, Jewish, women, little people, people with a wide range of disabilities, people with rare diseases, Spanish filmmakers, mentally ill, victims of ETA, victims of the GAL, victims of GRAPO and of Al Qaeda, victims of male-chauvinist gender violence, people with erectile dysfunction, people with tumors, drug addicts, the morbidly obese and, in general, people belonging to minority groups. I even know a guy with Kaposi’s sarcoma.

I have always fought for their rights. That is why I regret that my unfortunate comment on the social network Twitter may have given a mistaken impression of who I am. And I regret, above all, any harm it may have caused to the affected group.

Once again, my most sincere apologies.

The post ¡VIVA ZAPATA! appeared first on Pijus Magnificus.

June 16, 2015 07:00 AM

May 27, 2015

Benjamin Smedberg (bsmedberg)

Yak Shaving

Yak shaving tends to be looked down on. I don’t necessarily see it that way. It can be a way to pay down technical debt, or learn a new skill. In many ways I consider it a sign of broad engineering skill if somebody is capable of solving a multi-part problem.

It started so innocently. My team has been working on unifying the Firefox Health Report and Telemetry data collection systems, and there was a bug that I thought I could knock off pretty easily: “FHR data migration: org.mozilla.crashes”. Below are the roadblocks, mishaps, and sideshows that resulted, and I’m not even done yet:

Tryserver failure: crashes

Constant crashes only on Linux opt builds. It turned out this was entirely my fault. The following is not a safe access pattern because of C++ temporary lifetimes:

nsCSubstringTuple str = str1 + str2;
Fn(str);
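// Unsafe: the tuple returned by str1 + str2 only references its operands and any
// intermediate temporaries, and those temporaries are destroyed at the end of the
// full expression, so str can be left pointing at freed memory by the time Fn()
// runs. A safer sketch is to materialize the concatenation into an owned string
// first, e.g. nsAutoCString owned(str1 + str2); Fn(owned);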

Backout #1: talos xperf failure

After landing, the code was backed out because the xperf Talos test detected main-thread I/O. On desktop, this was a simple ordering problem: we always do that I/O during startup to initialize the crypto system; I just moved it slightly earlier in the startup sequence. Why are we initializing the crypto system? To generate a random number. Fixed this by whitelisting the I/O. This involved landing code to the separate Talos repo and then telling the main Firefox tree to use the new revision. Much thanks to Aaron Klotz for helping me figure out the right steps.

Backout #2: test timeouts

Test timeouts if the first test of a test run uses the PopupNotifications API. This wasn’t caught during initial try runs because it appeared to be a well-known random orange. I was apparently changing the startup sequence just enough to tickle a focus bug in the test harness. It so happened that the particular test which runs first depends on the e10s or non-e10s configuration, leading to some confusion about what was going on. Fortunately, I was able to reproduce this locally. Gavin Sharp and Neil Deakin helped get the test harness in order in bug 1138079.

Local test failures on Linux

I discovered that several xpcshell tests which were working fine on tryserver were failing locally on Linux. After some debugging, I discovered that the tests thought I wasn’t using Linux, because the cargo-culted test for Linux was let isLinux = ("@mozilla.org/gnome-gconf-service;1" in Cc). This means that if gconf is disabled at build time or not present at runtime, the tests will fail. I installed GConf2-devel and rebuilt my tree and things were much better.
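One possible alternative, sketched here purely for illustration (it assumes Services.jsm and the app-info service are available to the test, and it is not the fix that actually landed), is to ask the runtime which OS it is rather than probing for an optional component:

Components.utils.import("resource://gre/modules/Services.jsm");
// nsIXULRuntime.OS reports the build target ("Linux", "WINNT", "Darwin", ...),
// so this check keeps working even when gconf is disabled or missing.
let isLinux = Services.appinfo.OS == "Linux";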

Incorrect failure case in the extension manager

While debugging the test failures, I discovered an incorrect codepath in GMPProvider.jsm for clients which are not Windows, Mac, or Linux (Android and the non-Linux Unixes).

Android performance regression

The landing caused an Android startup performance regression, bug 1163049. On Android, we don’t initialize NSS during startup, and the earlier initialization of the addon manager caused us to generate random Sync IDs for addons. I first fixed this by using Math.random() instead of good crypto, but Richard Newman suggested that I just make the Sync ID generation lazy. This appears to work and will land when there is an open tree.

mach bootstrap on Fedora doesn’t work for Android

As part of debugging the performance regression, I built Firefox for Android for the first time in several years. I discovered that mach bootstrap for Android isn’t implemented on Fedora Core. I manually installed packages until it built properly. I have a list of the packages I installed and I’ll file a bug to fix mach bootstrap when I get a chance.

android build-tools not found

A configure check for the android build-tools package failed. I still don’t understand exactly why this happened; it has something to do with a version that’s too new and unexpected. Nick Alexander pointed me at a patch on bug 1162000 which fixed this for me locally, but it’s not the “right” fix and so it’s not checked into the tree.

Debugging on Android (jimdb)

Binary debugging on Android turned out to be difficult. There are some great docs on the wiki, but those docs failed to mention that you have to pass the configure flag --enable-debug-symbols. After that, I discovered that pending breakpoints don’t work by default with Android debugging, and since I was debugging a startup issue, that was a critical failure. I wrote an ask.mozilla.org question and got a custom patch which finally made debugging work. I also had to patch the implementation of DumpJSStack() so that it would print to logcat on Android; this is another bug that I’ll file later when I clean up my tree.

Crash reporting broken on Mac

I broke crash report submission on Mac for some users. Crash report annotations were being truncated from Unicode instead of converted from UTF-8. Because JSON.stringify doesn’t produce ASCII, this was breaking crash reporting when we tried to parse the resulting data. This was an API bug that existed prior to the patch, but I should have caught it earlier. Shoutout to Ted Mielczarek for fixing this and adding automated tests!

Semi-related weirdness: improving the startup performance of Pocket

The Firefox Pocket integration caused a significant startup performance issue on some trees. The details are especially gnarly, but it seems that by reordering the initialization of the addon manager, I was able to turn a performance regression into a win by accident. Probably something to do with I/O wait, but it still feels like spooky action at a distance. Kudos to Joel Maher, Jared Wein and Gijs Kruitbosch for diving into this under time pressure.

Experiences like this are frustrating, but as long as it’s possible to keep the final goal in sight, fixing unrelated bugs along the way might be the best thing for everyone involved. It certainly saves other experts from having to context-switch in to help out. And doing the Android build and debugging was a useful learning experience.

Perhaps, though, I’ll go back to my primary job of being a manager.

May 27, 2015 08:22 PM

May 11, 2015

Benjamin Smedberg (bsmedberg)

Hiring at Mozilla: Beyond Resumés and Interview Panels

The standard tech hiring process is not good at selecting the best candidates, and it introduces unconscious bias along the way. The traditional resumé screen, phone screen, and interview process is almost a dice-roll for a hiring manager. This year, my team has several open positions and we’re trying something different, both in the pre-interview screening process and in the interview process itself.

Hiring Firefox Platform Engineers now!

Earlier this year I attended a workshop for Mozilla managers by the Clayman Institute at Stanford. One of the key lessons is that when we (humans) don’t have clear criteria for making a choice, we tend to alter our criteria to match subconscious preferences (see this article for some examples and more information). Another key lesson is that when humans lack information about a situation, our brain uses its subconscious associations to fill in the gaps.

Candidate Screening

I believe job descriptions are very important: not only do they help candidates decide whether they want a particular job, but they also serve as a guide to the kinds of things that will be required or important during the interview process. Please read the job description carefully before applying to any job!

In order to hire more fairly, I have changed the way I write job descriptions. Previously I mixed up job responsibilities and applicant requirements in one big bulleted list. Certainly every engineer on my team is going to eventually use C++ and JavaScript, and probably Python, and in the future Rust. But it isn’t a requirement that you know all of these coming into a job, especially as a junior engineer. It’s part of the job to work on a high-profile open-source project in a public setting. But that doesn’t mean you must have prior open-source experience. By separating out the job expectations and the applicant requirements, I was able to create a much clearer set of screening rules for incoming applications, and also clearer expectations for candidates.

Resumés are a poor tool for ranking candidates and deciding which candidates are worth the investment in a phone screen or interview. Resumés give facts about education or prior experience, but rarely make it clear whether somebody is an excellent engineer. To combat this, my team won’t be using only resumés as a screening tool. If a candidate matches basic criteria, such as living in a reasonable time zone and having demonstrated expertise in C++, JavaScript, or Python on their resumé or code samples, we will ask each candidate to submit a short written essay (like a blog post) describing their favorite debugging or profiling tool.

Why did I pick an essay about a debugging or profiling tool? In my experience, every good coder has a toolbox, and as coders gain experience they naturally become better toolsmiths. I hope that this essay requirement will be a good way to screen for programmer competence and to gauge expertise.

With resumés, essays, and code samples in hand, Vladan and I will go through the applications and filter them. Each passing candidate will proceed to phone screens, to check for technical skill but more importantly to sell the candidate on the team and match them up with the best position. My goal is to exclude applications that don’t meet the requirements, not to rank candidates against each other. If there are too many qualified applicants, we will select a random sample for interviews. In order to make this possible, we will be evaluating applications in weekly batches.

Interview Process

To the extent possible, the interview format should line up with the job requirements. The typical Mozilla technical interview is five or six 45-minute 1:1 interview sessions. This format heavily favors people who can think quickly on their feet and who are personable. Since neither of those attributes is a requirement for this job, that format is a poor match. Here are the requirements in the job description that we need to evaluate during the interview:

This is the interview format that we came up with to assess the requirements:

During the debrief and decision process, I intend to focus as much as possible on the job requirements. Rather than asking a simple “should we hire this person” question, I will ask interviewers to rate the candidate on each job requirement and responsibility, as well as any desired skillset. By orienting the feedback to the job description I hope that we can reduce the effects of unconscious bias and improve the overall hiring process.

Conclusion

This hiring procedure is experimental. My team and I have concerns about whether candidates will be put off by the essay requirement or an unusual interview format, and whether plagiarism will make the essay an ineffective screening tool. We’re also concerned about keeping the hiring process moving and not introducing too much delay. After the first interview rounds, I plan on evaluating the process and asking candidates to provide feedback about their experience.

If you’re interested, check out my prior post, How I Hire At Mozilla.

May 11, 2015 11:32 AM

April 20, 2015

Benjamin Smedberg (bsmedberg)

Using crash-stats-api-magic

A while back, I wrote the tool crash-stats-api-magic which allows custom processing of results from the crash-stats API. This tool is not user-friendly, but it can be used to answer some pretty complicated questions.

As an example and demonstration, see a bug that Matthew Gregan filed this morning asking for a custom report from crash-stats:

In trying to debug bug 1135562, it’s hard to guess the severity of the problem or look for any type of version/etc. correlation because there are many types of hangs caught under the same mozilla::MediaShutdownManager::Shutdown stack. I’d like a report that contains only those with mozilla::MediaShutdownManager::Shutdown in the hung (main thread) stack *and* has wasapi_stream_init on one of the other threads, please.

To build this report, start with a basic query and then refine it in the tool:

  1. Construct a supersearch query to select the crashes we’re interested in. The only criterion for this query was “signature contains ‘MediaShutdownManager::Shutdown’”. When possible, filter on channel, OS, and version to reduce noise.
  2. After the supersearch query is constructed, choose “More Options” from the results page and copy the “Public API URL” link.
  3. Load crash-stats-api-magic and paste the query URL. Choose “Fetch” to fetch the results. Look through the raw data to get a sense for its structure. Link
  4. The meat of this function is to filter out the crashes that don’t have “wasapi_stream_init” on a thread. Choose “New Rule” and create a filter rule:
    function(d) {
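      // Keep only crashes where some thread has wasapi_stream_init on its stack.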
      var ok = false;
      d.json_dump.threads.forEach(function(thread) {
        thread.frames.forEach(function(frame) {
          if (frame.function && frame.function.indexOf("wasapi_stream_init") != -1) {
            ok = true;
          }
        });
      });
      return ok;
    }

    Choose “Execute” to run the filter. Link

  5. To get the final report we output only the signature and the crash ID for each result. Choose “New Rule” again and create a mapping rule:
    function(d) {
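      // Emit just the crash ID and signature for each matching crash.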
      return [d.uuid, d.signature];
    }

    Link

One of the advantages of this tool is that it is possible to iterate quickly on the data without constantly re-querying, while at the end you can still permalink to the results in bugzilla or email exchanges.

If you need to do complex crash-stats analysis, please try it out! Email me if you have questions, and pull requests are welcome.

April 20, 2015 05:45 PM

April 01, 2015

Joshua Cranmer (jcranmer)

Breaking news

It was brought to my attention recently by reputable sources that the recent announcement of Thunderbird’s increased usage over the past few years produced an internal firestorm within Mozilla. Key figures raised alarm that some of the tech press had interpreted the blog post as a sign that Thunderbird was not, in fact, dead. As a result, they asked Thunderbird community members to make corrections to emphasize that Mozilla was trying to kill Thunderbird.

The primary fear, it seems, is that knowledge that the largest open-source email client was still receiving regular updates would impel its userbase to agitate for increased funding and maintenance of the client to help forestall potential threats to the open nature of email as well as to innovate in the space of providing usable and private communication channels. Such funding, however, would be an unaffordable luxury and would only distract Mozilla from its central goal of building developer productivity tooling. Persistent rumors that Mozilla would be willing to fund Thunderbird were it renamed Firefox Email were finally addressed with the comment, "such a renaming would violate our current policy that all projects be named Persona."

April 01, 2015 07:00 AM

February 17, 2015

Benjamin Smedberg (bsmedberg)

Gratitude Comes in Threes

Today Johnathan Nightingale announced his departure from Mozilla. There are three special people at Mozilla who shaped me into the person I am today, and Johnathan Nightingale is one of them:

Mike Shaver taught me how to be an engineer. I was a full-time musician who happened to be pretty good at writing code and volunteering for Mozilla. There were many people at Mozilla who helped teach me the fine points of programming, and techniques for being a good programmer, but it was shaver who taught me the art of software engineering: to focus on simplicity, to keep the ultimate goal always in mind, when to compromise in order to ship, and when to spend the time to make something impossibly great. Shaver was never my manager, but I credit him with a lot of my engineering success. Shaver left Mozilla a while back to do great things at Facebook, and I still miss him.

Mike Beltzner taught me to care about users. Beltzner was never my manager either, but his single-minded and sometimes pugnacious focus on users and the user experience taught me how to care about users and how to engineer products that people might actually want to use. It’s easy for an engineer to get caught up in the most perfect technology and forget why we’re building any of this at all. Or to get caught up trying to change the world, and forget that you can’t change the world without a great product. Beltzner left Mozilla a while back and is now doing great things at Pinterest.

Perhaps it is just today talking, but I will miss Johnathan Nightingale most of all. He taught me many things, but mostly how to be a leader. I have had the privilege of reporting to Johnathan for several years now. He taught me the nuances of leadership and management; how to support and grow my team and still be comfortable applying my own expertise and leadership. He has been a great and empowering leader, both for me personally and for Firefox as a whole. He also taught me how to edit my own writing and others’, and especially never to bury the lede. Now Johnathan will also be leaving Mozilla, and undoubtedly doing great things on his next adventure.

It doesn’t seem coincidental that this triumvirate were all Torontonians. Early Toronto Mozillians, including my three mentors, built a culture of teaching, leading, and mentoring, and Mozilla is better because of it. My new boss isn’t in Toronto, so it’s likely that I will be traveling there less. But I still hold a special place in my heart for it and hope that Mozilla Toronto will continue to serve as a model of mentoring and leadership for Mozilla.

Now I’m a senior leader at Mozilla. Now it’s my job to mentor, teach, and empower Mozilla’s leaders. I hope that I can be nearly as good at it as these wonderful Mozillians have been for me.

February 17, 2015 10:37 PM

February 15, 2015

Chris Tyler (ctyler)

Initial Current and Temperatures on the HiKey from 96Boards

I was fortunate to receive an early access HiKey board from the 96Boards project at Linaro Connect last week.

This board is powered by an 8-core, 64-bit Cortex-A53 ARMv8-A Kirin 620 SOC from HiSilicon with 1GB of LPDDR3 RAM, a Mali 450MP4 GPU, dual USB, eMMC and micro-SD storage, 802.11g/n, and high- and low-speed expansion connectors with I2C, SPI, DSI, GPIO, and USB interfaces.

So far, this has been an incredible board to work with, despite some teething pains with the pre-release/early access software and documentation (and a few minor quibbles with the design decisions behind the 96Boards Consumer Edition spec and this first board). It's not in the same performance class as the ARMv8 server systems that we have in the EHL at Seneca, but it's a very impressive board for doing ARMv8 porting and optimization work -- which is its intended purpose, along with providing a great board for hacker and maker communities.

I experimented with the board last week and took some readings at home today, and thought I'd share some of my findings on board current draw and temperatures, because it may be useful to those planning alternate power supplies and considering temperatures and airflows for cases:

A couple of other random observations about the board:

I'm looking forward to the release of WiFi drivers and UEFI bootloader support soon, as promised by the 96Boards project.

More notes to follow...

February 15, 2015 02:39 AM

January 16, 2015

Mark Finkle (mfinkle)

Firefox for Android: What’s New in v35

The latest release of Firefox for Android is filled with new features, designed to work with the way you use your mobile device.

Search

Search is the most common reason people use a browser on mobile devices. To help make it easier to search using Firefox, we created the standalone Search application. We have put the features of Firefox’s search system into an activity that can more easily be accessed. You no longer need to launch the full browser to start a search.

When you want to start a search, use the new Firefox Widget from the Android home screen, or use the “swipe up” gesture on the Android home button, which is available on devices with software home buttons.

Once started, just start typing your search. You’ll see your search history, and get search suggestions as you type.

The search results are displayed in the same activity, but tapping on any of the results will load the page in Firefox.

Your search history is shared between the Search and Firefox applications. You have access to the same search engines as in Firefox itself. Switching search engines is easy.

Sharing

Another cool feature is the Sharing overlay. This feature grew out of the desire to make Firefox work with the way you use mobile devices. Instead of forcing you to switch away from applications when sharing, Firefox gives you a simple overlay with some sharing actions, without leaving the current application.

You can add the link to your bookmarks or reading list. You can also send the link to a different device, via Firefox Sync. Once the action is complete, you’re back in the application. If you want to open the link, you can tap the Firefox logo to open the link in Firefox itself.

Synced Tabs

Firefox Sync makes it easy to access your Firefox data across your different devices, including getting to the browser tabs you have open elsewhere. We have a new Synced Tabs panel available in the Home page that lets you easily access open tabs on other devices, making it simple to pick up where you left off.

Long-tap an item to easily add a bookmark or share to another application. You can expand/collapse the device lists to manage the view. You can even long-tap a device and hide it so you won’t see it again!

Improved Error Pages

No one is happy when an error page appears, but in the latest version of Firefox the error pages try to be a bit more helpful. The page will look for WiFi problems and also allow you to quickly search for a problematic address.

January 16, 2015 03:28 PM

January 13, 2015

Joshua Cranmer (jcranmer)

Why email is hard, part 8: why email security failed

This post is part 8 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. Part 3 discusses MIME. Part 4 discusses email addresses. Part 5 discusses the more general problem of email headers. Part 6 discusses how email security works in practice. Part 7 discusses the problem of trust. This part discusses why email security has largely failed.

At the end of the last part in this series, I posed the question, "Which email security protocol is most popular?" The answer to the question is actually neither S/MIME nor PGP, but a third protocol, DKIM. I haven't brought up DKIM until now because DKIM doesn't try to secure email in the same vein as S/MIME or PGP, but I still consider it relevant to discussing email security.

Unquestionably, DKIM is the only security protocol for email that can be considered successful. There are perhaps 4 billion active email addresses [1]. Of these, about 1-2 billion use DKIM. In contrast, S/MIME can count a few million users, and PGP at best a few hundred thousand. No other security protocols have really caught on past these three. Why did DKIM succeed where the others failed?

DKIM's success stems from its relatively narrow focus. It is nothing more than a cryptographic signature of the message body and a smattering of headers, and is itself stuck in the DKIM-Signature header. It is meant to be applied to messages only on outgoing servers and read and processed at the recipient mail server—it completely bypasses clients. That it bypasses clients allows it to solve the problem of key discovery and key management very easily (public keys are stored in DNS, which is already a key part of mail delivery), and its role in spam filtering is strong motivation to get it implemented quickly (it is 7 years old as of this writing). It's also simple: this one paragraph description is basically all you need to know [2].

The failure of S/MIME and PGP to see large deployment is certainly a large topic of discussion on myriad cryptography-enthusiast mailing lists, which often like to partake in proposals for new end-to-end email encryption paradigms, such as the recent DIME proposal. Quite frankly, all of these solutions suffer broadly from at least the same 5 fundamental weaknesses, and I think it unlikely that a protocol will come about that can fix these weaknesses well enough to become successful.

The first weakness, and one I’ve harped about many times already, is UI. Most email security UI is abysmal and generally at best usable only by enthusiasts. At least some of this is endemic to security: while it may seem obvious how to convey what an email signature or an encrypted email signifies, how do you convey the distinctions between sign-and-encrypt, encrypt-and-sign, or an S/MIME triple wrap? The Web of Trust model used by PGP (and many other proposals) is even worse, in that it inherently requires users to take other actions out-of-band of email for it to work properly.

Trust is the second weakness. Consider that, for all intents and purposes, the email address is the unique identifier on the Internet. By extension, that implies that a lot of services are ultimately predicated on the notion that the ability to receive and respond to an email is a sufficient means to identify an individual. However, the entire purpose of secure email, or at least of end-to-end encryption, is subtly based on the fact that other people in fact have access to your mailbox, thus destroying the most natural ways to build trust models on the Internet. The quest for anonymity or privacy also renders untenable many other plausible ways to establish trust (e.g., phone verification or government-issued ID cards).

Key discovery is another weakness, although it's arguably the easiest one to solve. If you try to keep discovery independent of trust, the problem of key discovery is merely picking a protocol to publish and another one to find keys. Some of these already exist: PGP key servers, for example, or using DANE to publish S/MIME or PGP keys.

Key management, on the other hand, is a more troubling weakness. S/MIME, for example, basically works without issue if you have a certificate, but managing to get an S/MIME certificate is a daunting task (necessitated, in part, by its trust model—see how these issues all intertwine?). This is also where it's easy to say that webmail is an unsolvable problem, but on further reflection, I'm not sure I agree with that statement anymore. One solution is just storing the private key with the webmail provider (you're trusting them as an email client, after all), but it's also not impossible to imagine using phones or flash drives as keystores. Other key management factors are more difficult to solve: people who lose their private keys or key rollover create thorny issues. There is also the difficulty of managing user expectations: if I forget my password to most sites (even my email provider), I can usually get it reset somehow, but when a private key is lost, the user is totally and completely out of luck.

Of course, there is one glaring and almost completely insurmountable problem. Encrypted email fundamentally precludes certain features that we have come to take for granted. The lesser known is server-side search and filtration. While there exist some mechanisms to do search on encrypted text, those mechanisms rely on the fact that you can manipulate the text to change the message, destroying the integrity feature of secure email. They also tend to be fairly expensive. It's easy to just say "who needs server-side stuff?", but the contingent of people who do email on smartphones would not be happy to have to pay the transfer rates to download all the messages in their folder just to find one little email, nor the energy costs of doing it on the phone. And those who have really large folders—Fastmail has a design point of 1,000,000 in a single folder—would still prefer to not have to transfer all their mail even on desktops.

The more well-known feature that would disappear is spam filtration. Consider that 90% of all email is spam, and if you think your spam folder is too slim for that to be true, it’s because your spam folder only contains messages that your email provider wasn’t sure were spam. The loss of server-side spam filtering would dramatically increase the cost of spam (per my calculations, a 10% reduction in filtering efficiency would double the amount of server storage: with 90% of mail being spam, a filter that catches all of it leaves only the 10 legitimate messages out of every 100 to store, while one that lets a tenth of the spam through leaves 19), and client-side spam filtering is quite literally too slow [3] and too costly (remember smartphones? Imagine having your email take 10 times as much energy and bandwidth) to be a tenable option. And privacy or anonymity tends to be an invitation to abuse (cf. Tor and Wikipedia). Proposed solutions to the spam problem are so common that there is a checklist containing most of the objections.

When you consider all of those weaknesses, it is easy to be pessimistic about the possibility of wide deployment of powerful email security solutions. The strongest future—all email is encrypted, including metadata—is probably impossible or at least woefully impractical. That said, if you weaken some of the assumptions (say, don't desire all or most traffic to be encrypted), then solutions seem possible if difficult.

This concludes my discussion of email security, at least until things change for the better. I don't have a topic for the next part in this series picked out (this part actually concludes the set I knew I wanted to discuss when I started), although OAuth and DMARC are two topics that have been bugging me enough recently to consider writing about. They also have the unfortunate side effect of being things likely to see changes in the near future, unlike most of the topics I've discussed so far. But rest assured that I will find more difficulties in the email infrastructure to write about before long!

[1] All of these numbers are crude estimates and are accurate to only an order of magnitude. To justify my choices: I assume 1 email address per Internet user (this overestimates the developing world and underestimates the developed world). The largest webmail providers have given numbers that claim to be 1 billion active accounts between them, and all of them use DKIM. S/MIME is guessed by assuming that any smartcard deployment supports S/MIME, and noting that the US Department of Defense and Estonia's digital ID project are both heavy users of such smartcards. PGP is estimated from the size of the strong set and old numbers on the reachable set from the core Web of Trust.
[2] Ever since last April, it's become impossible to mention DKIM without referring to DMARC, as a result of Yahoo's controversial DMARC policy. A proper discussion of DMARC (and why what Yahoo did was controversial) requires explaining the mail transmission architecture and spam, however, so I'll defer that to a later post. It's also possible that changes in this space could happen within the next year.
[3] According to a former GMail spam employee, if it takes you as long as three minutes to calculate reputation, the spammer wins.

January 13, 2015 04:38 AM

January 10, 2015

Joshua Cranmer (jcranmer)

A unified history for comm-central

Several years back, Ehsan and Jeff Muizelaar attempted to build a unified history of mozilla-central across the Mercurial era and the CVS era. Their result is now used in the gecko-dev repository. While being distracted on yet another side project, I thought that I might want to do the same for comm-central. It turns out that building a unified history for comm-central makes mozilla-central look easy: mozilla-central merely had one import from CVS. In contrast, comm-central imported twice from CVS (the calendar code came later), four times from mozilla-central (once with converted history), and imported twice from Instantbird's repository (once with converted history). Three of those conversions also involved moving paths. But I've worked through all of those issues to provide a nice snapshot of the repository [1]. And since I've been frustrated by failing to find good documentation on how this sort of process went for mozilla-central, I'll provide details on the process for comm-central.

The first step and probably the hardest is getting the CVS history in DVCS form (I use hg because I’m more comfortable with it, but there’s effectively no difference between hg, git, or bzr here). There is a git version of mozilla’s CVS tree available, but I’ve noticed after doing research that its last revision is about a month before the revision I need for Calendar’s import. The documentation for how that repo was built is no longer on the web, although we eventually found a copy after I wrote this post on git.mozilla.org. I tried doing another conversion using hg convert to get CVS tags, but that rudely blew up in my face. For now, I’ve filed a bug on getting an official, branchy-and-tag-filled version of this repository, while using the current lack of history as a base. Calendar people will have to suffer missing a month of history.

CVS is famously hard to convert to more modern repositories, and, as I've done my research, Mozilla's CVS looks like it uses those features which make it difficult. In particular, both the calendar CVS import and the comm-central initial CVS import used a CVS tag HG_COMM_INITIAL_IMPORT. That tagging was done, on only a small portion of the tree, twice, about two months apart. Fortunately, mailnews code was never touched on CVS trunk after the import (there appears to be one commit on calendar after the tagging), so it is probably possible to salvage a repository-wide consistent tag.

The start of my script for conversion looks like this:

#!/bin/bash

set -e

WORKDIR=/tmp
HGCVS=$WORKDIR/mozilla-cvs-history
MC=/src/trunk/mozilla-central
CC=/src/trunk/comm-central
OUTPUT=$WORKDIR/full-c-c

# Bug 445146: m-c/editor/ui -> c-c/editor/ui
MC_EDITOR_IMPORT=d8064eff0a17372c50014ee305271af8e577a204

# Bug 669040: m-c/db/mork -> c-c/db/mork
MC_MORK_IMPORT=f2a50910befcf29eaa1a29dc088a8a33e64a609a

# Bug 1027241, bug 611752 m-c/security/manager/ssl/** -> c-c/mailnews/mime/src/*
MC_SMIME_IMPORT=e74c19c18f01a5340e00ecfbc44c774c9a71d11d

# Step 0: Grab the mozilla CVS history.
if [ ! -e $HGCVS ]; then
  hg clone git+https://github.com/jrmuizel/mozilla-cvs-history.git $HGCVS
fi

Since I don’t want to include the changesets that are useless to comm-central history, I trimmed the history by using hg convert to eliminate changesets that don’t change the necessary files. Most of the files are simple directory-wide changes, but S/MIME only moved a few files over, so it requires a more complex way to grab the file list. In addition, I also replaced the % in the usernames with the @ that they are used to appearing with in hg. The relevant code is here:

# Step 1: Trim mozilla CVS history to include only the files we are ultimately
# interested in.
cat >$WORKDIR/convert-filemap.txt <<EOF
# Revision e4f4569d451a
include directory/xpcom
include mail
include mailnews
include other-licenses/branding/thunderbird
include suite
# Revision 7c0bfdcda673
include calendar
include other-licenses/branding/sunbird
# Revision ee719a0502491fc663bda942dcfc52c0825938d3
include editor/ui
# Revision 52efa9789800829c6f0ee6a005f83ed45a250396
include db/mork/
include db/mdb/
EOF

# Add the S/MIME import files
hg -R $MC log -r "children($MC_SMIME_IMPORT)" \
  --template "{file_dels % 'include {file}\n'}" >>$WORKDIR/convert-filemap.txt

if [ ! -e $WORKDIR/convert-authormap.txt ]; then
hg -R $HGCVS log --template "{email(author)}={sub('%', '@', email(author))}\n" \
  | sort -u > $WORKDIR/convert-authormap.txt
fi

cd $WORKDIR
hg convert $HGCVS $OUTPUT --filemap convert-filemap.txt -A convert-authormap.txt

That last command provides us the subset of the CVS history that we need for unified history. Strictly speaking, I should be pulling a specific revision, but I happen to know that there's no need to (we're cloning the only head) in this case. At this point, we now need to pull in the mozilla-central changes before we pull in comm-central. Order is key; hg convert will only apply the graft points when converting the child changeset (which it does but once), and it needs the parents to exist before it can do that. We also need to ensure that the mozilla-central graft point is included before continuing, so we do that, and then pull mozilla-central:

CC_CVS_BASE=$(hg log -R $HGCVS -r 'tip' --template '{node}')
CC_CVS_BASE=$(grep $CC_CVS_BASE $OUTPUT/.hg/shamap | cut -d' ' -f2)
MC_CVS_BASE=$(hg log -R $HGCVS -r 'gitnode(215f52d06f4260fdcca797eebd78266524ea3d2c)' --template '{node}')
MC_CVS_BASE=$(grep $MC_CVS_BASE $OUTPUT/.hg/shamap | cut -d' ' -f2)

# Okay, now we need to build the map of revisions.
cat >$WORKDIR/convert-revmap.txt <<EOF
e4f4569d451a5e0d12a6aa33ebd916f979dd8faa $CC_CVS_BASE # Thunderbird / Suite
7c0bfdcda6731e77303f3c47b01736aaa93d5534 d4b728dc9da418f8d5601ed6735e9a00ac963c4e, $CC_CVS_BASE # Calendar
9b2a99adc05e53cd4010de512f50118594756650 $MC_CVS_BASE # Mozilla graft point
ee719a0502491fc663bda942dcfc52c0825938d3 78b3d6c649f71eff41fe3f486c6cc4f4b899fd35, $MC_EDITOR_IMPORT # Editor
8cdfed92867f885fda98664395236b7829947a1d 4b5da7e5d0680c6617ec743109e6efc88ca413da, e4e612fcae9d0e5181a5543ed17f705a83a3de71 # Chat
EOF

# Next, import mozilla-central revisions
for rev in $MC_MORK_IMPORT $MC_EDITOR_IMPORT $MC_SMIME_IMPORT; do
  hg convert $MC $OUTPUT -r $rev --splicemap $WORKDIR/convert-revmap.txt \
    --filemap $WORKDIR/convert-filemap.txt
done

Some notes about all of the revision ids in the script. The splicemap requires the full 40-character SHA ids; anything less and the thing complains. I also need to specify the parents of the revisions that deleted the code for the mozilla-central import, so if you go hunting for those revisions and are surprised that they don't remove the code in question, that's why.

I mentioned complications about the merges earlier. The Mork and S/MIME import codes here moved files, so that what was db/mdb in mozilla-central became db/mork. There's no support for causing the generated splice to record these as a move, so I have to manually construct those renamings:

# We need to execute a few hg move commands due to renamings.
pushd $OUTPUT
hg update -r $(grep $MC_MORK_IMPORT .hg/shamap | cut -d' ' -f2)
(hg -R $MC log -r "children($MC_MORK_IMPORT)" \
  --template "{file_dels % 'hg mv {file} {sub(\"db/mdb\", \"db/mork\", file)}\n'}") | bash
hg commit -m 'Pseudo-changeset to move Mork files' -d '2011-08-06 17:25:21 +0200'
MC_MORK_IMPORT=$(hg log -r tip --template '{node}')

hg update -r $(grep $MC_SMIME_IMPORT .hg/shamap | cut -d' ' -f2)
(hg -R $MC log -r "children($MC_SMIME_IMPORT)" \
  --template "{file_dels % 'hg mv {file} {sub(\"security/manager/ssl\", \"mailnews/mime\", file)}\n'}") | bash
hg commit -m 'Pseudo-changeset to move S/MIME files' -d '2014-06-15 20:51:51 -0700'
MC_SMIME_IMPORT=$(hg log -r tip --template '{node}')
popd

# Echo the new move commands to the changeset conversion map.
cat >>$WORKDIR/convert-revmap.txt <<EOF
52efa9789800829c6f0ee6a005f83ed45a250396 abfd23d7c5042bc87502506c9f34c965fb9a09d1, $MC_MORK_IMPORT # Mork
50f5b5fc3f53c680dba4f237856e530e2097adfd 97253b3cca68f1c287eb5729647ba6f9a5dab08a, $MC_SMIME_IMPORT # S/MIME
EOF

Now that we have all of the graft points defined, and all of the external code ready, we can pull comm-central and do the conversion. That's not quite it, though—when we graft the S/MIME history to the original mozilla-central history, we have a small segment of abandoned converted history. A call to hg strip removes that.

# Now, import comm-central revisions that we need
hg convert $CC $OUTPUT --splicemap $WORKDIR/convert-revmap.txt
hg strip 2f69e0a3a05a

[1] I left out one of the graft points because I just didn't want to deal with it. I'll leave it as an exercise to the reader to figure out which one it was. Hint: it's the only one I didn't know about before I searched for the archive points [2].
[2] Since I wasn't sure I knew all of the graft points, I decided to try to comb through all of the changesets to figure out who imported code. It turns out that hg log -r 'adds("**")' narrows it down nicely (1667 changesets to look at instead of 17547), and using the {file_adds} template helps winnow it down more easily.
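
For the curious, a minimal sketch of that kind of search, matching the templates used above; the $REPO path is a placeholder for whichever clone you want to comb through:

# List changesets that add files, along with what they added, to hunt for code
# imports (a sketch; $REPO is a placeholder for the clone being searched).
hg log -R $REPO -r 'adds("**")' --template "{node|short} {desc|firstline}\n{file_adds % '  {file}\n'}"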

January 10, 2015 05:55 PM

November 30, 2014

Benjamin Smedberg (bsmedberg)

An Invitation

I’d like to invite my blog readers and Mozilla coworkers to Jesus Christ.

For most Christians, today marks the beginning of Advent, the season of preparation before Christmas. Not only is this a time for personal preparation and prayer while remembering the first coming of Christ as a child, but also a time to prepare the entire world for Christ’s second coming. Christians invite their friends, coworkers, and neighbors to experience Christ’s love and saving power.

I began my journey to Christ through music and choirs. Through these I discovered beauty in the teachings of Christ. There is a unique beauty that comes from combining faith and reason: belief in Christ requires neither superstition nor ignorance of history or science. Rather, belief in Christ’s teachings brought me to a wholeness of understanding the truth in all its forms, and our own place within it.

Although Jesus is known to Christians as priest, prophet, and king, I have a special and personal devotion to Jesus as king of heaven and earth. The feast of Christ the King at the end of the church year is my personal favorite, and it is a particular focus when I perform and compose music for the Church. I discovered this passion during college; every time I tried to plan my own life, I ended up in confusion or failure, while every time I handed my life over to Christ, I ended up being successful. My friends even got me a rubber stamp which said “How to make God laugh: tell him your plans!” This understanding of Jesus as ruler of my life has led to a profound trust in divine providence and personal guidance in my life. It even led to my becoming involved with Mozilla and eventually becoming a Mozilla employee: I was a church organist, and switching careers to become a computer programmer was a leap of faith, given my lack of education.

Making a religious invitation to coworkers and friends at Mozilla is difficult. We spend our time and build our deepest relationships in a setting of email, video, and online chat, where off-topic discussions are typically out of place. I want to share my experience of Christ with those who may be interested, but I don’t want to offend or upset those who aren’t.

This year, however, presents me with a unique opportunity. Most Mozilla employees will be together for a shared planning week. If you will be there, please feel free to find me during our down time and ask me about my experience of Christ. If you aren’t at the work week, but you still want to talk, I will try to make that work as well! Email me.

1. On Jordan’s bank, the Baptist’s cry
Announces that the Lord is nigh;
Awake, and hearken, for he brings
Glad tidings of the King of kings!

2. Then cleansed be every breast from sin;
Make straight the way for God within;
Prepare we in our hearts a home
Where such a mighty Guest may come.

3. For Thou art our Salvation, Lord,
Our Refuge, and our great Reward.
Without Thy grace we waste away,
Like flowers that wither and decay.

4. To heal the sick stretch out Thine hand,
And bid the fallen sinner stand;
Shine forth, and let Thy light restore
Earth’s own true loveliness once more.

5. Stretch forth thine hand, to heal our sore,
And make us rise to fall no more;
Once more upon thy people shine,
And fill the world with love divine.

6. All praise, eternal Son, to Thee
Whose advent sets Thy people free,
Whom, with the Father, we adore,
And Holy Ghost, forevermore.

—Charles Coffin, Jordanis oras prævia (1736); translated from Latin to English by John Chandler, 1837

November 30, 2014 10:19 PM

October 22, 2014

Benjamin Smedberg (bsmedberg)

How I Do Code Reviews at Mozilla

Since I received some good feedback about my prior post, How I Hire at Mozilla, I thought I’d try to continue this as a mini-series about how I do other things at Mozilla. Next up is code review.

Even though I have found new module owners for some of the code I own, I still end up doing 8-12 review/feedback cycles per week. Reviews are only as good as the time you spend on them: I approach reviews in a fairly systematic way.

When I load a patch for review, I don’t read it top-to-bottom. I also try to avoid reading the bug report: a code change should be able to explain itself either directly in the code or in the code commit message which is part of the patch. If bugzilla comments are required to understand a patch, those comments should probably be part of the commit message itself. Instead, I try to understand the patch by unwrapping it from the big picture into the small details:

The Commit Message

Read the Specification

If there is an external specification that this change should conform to, I will read it or the appropriate sections of it. In the following steps of the review, I try to relate the changes to the specification.

Documentation

If there is in-tree documentation for a feature, it should be kept up to date by patches. Some changes, such as Firefox data collection, must be documented. I encourage anyone writing Mozilla-specific features and APIs to document them primarily with in-tree docs, and not on developer.mozilla.org. In-tree docs are much more likely to remain correct and be updated over time.

API Review

APIs define the interaction between units of Mozilla code. A well-designed API that strikes the right balance between simplicity and power is a key component of software engineering.

In Mozilla code, APIs can come in many forms: IDL, IPDL, .webidl, C++ headers, XBL bindings, and JS can all contain APIs. Sometimes even C++ files can contain an API; for example, Mozilla has a mostly-unfortunate pattern of using the global observer service as an API surface between disconnected code.

In the first pass I try to avoid reviewing the implementation of an API. I’m focused on the API itself and its associated doccomments. The design of the system and the interaction between systems should be clear from the API docs. Error handling should be clear. If it’s not perfectly obvious, the threading, asynchronous behavior, or other state-machine aspects of an API should be carefully documented.

During this phase, it is often necessary to read the surrounding code to understand the system. None of our existing tools are very good at this, so I often have several MXR tabs open while reading a patch. Hopefully future review-board integration will make this better!

Brainstorm Design Issues

In my experience, the design review is the hardest phase of a review, the part which requires the most experience and creativity, and provides the most value.

Testing Review

I try to review the tests before I review the implementation.

Code Review

The code review is the least interesting part of the review. At this point I’m going through the patch line by line.

Re-read the Specification

If there is a specification, I’ll briefly re-read it to make sure that it was covered by the code I just finished reading.

Mechanics

Currently, I primarily do reviews in the bugzilla “edit” interface, with the “edit attachment as comment” option. Splinter is confusing and useless to me, and review-board doesn’t seem to be ready for prime-time.

For long or complex reviews, I will sometimes copy and quote the patch in emacs and paste or attach it to bugzilla when I’m finished.

In some cases I will cut off a review after one of the earlier phases: if I have questions about the general approach, the design, or the API surface, I will often try to clarify those questions before proceeding with the rest of the review.

There’s an interesting thread in mozilla.dev.planning about whether it is discouraging to new contributors to mark “review-” on a patch, and whether there are less-painful ways of indicating that a patch needs work without making them feel discouraged. My current practice is to mark r- whenever a patch needs to be revised, but also to thank contributors for their effort so that they feel appreciated, to be as specific as possible about the required changes, and to avoid any words that could be perceived as an insult.

If I haven’t worked with a coder (paid or volunteer) in the past, I will typically ask them to submit an updated patch with any changes for re-review. This lets me make sure that the changes were completed properly and didn’t introduce any new problems. Once I have some experience with a contributor, I will often trust them to make the necessary changes and simply mark “r+ with review comments fixed”.

October 22, 2014 04:00 PM

October 02, 2014

Benjamin Smedberg (bsmedberg)

How I Hire at Mozilla

As a manager, one of my most important responsibilities is hiring. While I’m hiring, I spend a lot of time sifting through resumés, screening candidates, and eventually doing interviews. This is fun and depressing at the same time: you get to meet and learn a lot about some interesting people, but you also have to wade through a lot of terrible resumés and phone screens. I want to share some things that I look for during the hiring process, and some tips for potential job-seekers about how to deal with recruiting:

Read the Job Description

Please read the job description before applying for a job! I put a lot of effort into writing a job description, and I try very hard to describe both the job responsibilities and the necessary and desirable skills. Your cover letter should show some evidence that you’ve thought about applying for this particular job.

Write a Good Cover Letter

Periodically, I see articles advising job-seekers to ditch the cover letter. This is terrible advice. I read cover letters carefully. In fact, the cover letter sometimes gives me a better sense for your skill level and ability to do the job than your resumé.

Grammar and Spelling Matter

Every job I’ve ever posted has required good communication skills, and your cover letter is the first evidence of your communication skills. It doesn’t need to be more than a paragraph or two; I’d love to know why you think that this job is a good fit, and some evidence that you read the job description. Spelling and grammar are important.

I’m Picky

The last time I posted a job position, I screened more than 800 resumés, did almost 100 phone screens, and did five interviews before I found the right person. It took six months. It’s better for Mozilla if I’m too picky and reject a qualified candidate than if I’m not picky enough and accept a bad candidate. Please don’t take rejection as a comment on your personal worth. I’m going to reject many people during this process.

Smart and Gets Things Done

Joel Spolsky is right: I’m primarily looking for somebody who is smart and gets things done. When I’m scanning through resumés, I’m looking for evidence that you’ve gotten things done in the past. If you’re just coming out of school, internship and open-source project experience helps. If you’ve been working, I want to see some description of the things you’ve gotten done. Be specific: “Led multiple projects to completion” is meaningless. You can expect phone-screen questions about your prior projects. Be prepared to talk about what worked, what didn’t, and how you solved problems.

No Assholes

It may seem obvious, but assholes need not apply. I will reject a technically-qualified candidate for even a whiff of assholery. I use reference checks to try to guard against assholes.

Passion Isn’t Sufficient

At Mozilla we get a lot of applicants who are passionate, either about the open web in general or about Firefox. Passion is great, perhaps even necessary, but it’s not sufficient.

Interview Questions Are Based on Your Resumé

In phone screens and interview panels, I try very hard to base my questions on the things that you already know. If your resumé says that you are a master of C++, you should expect that there is at least one person on the interview panel who will grill you on C++ in excruciating detail. If you say that you know statistics, you had better be able to answer detailed questions about statistics.

Don’t overstate your skills. If you have written a couple scripts in Python, you are familiar with Python, not proficient. More than once I’ve rejected people because they claimed to be a master of C++ debugging but didn’t know how a vtable is structured in memory. Knowing vtable layout isn’t usually a job prerequisite, but if you are a master of C++ debugging you’d better know how that works in detail.

If you claim to be a master of both Python and JavaScript, expect me to ask you detailed questions about how Python and JS closures and methods work, how they are different in JS and python, how JS this-binding is different from Python bound methods, and other details of the language. I will be impressed if you can discuss these questions intelligently, and reject your application if you can’t.

I Value Code Reading

I value people who can learn new code quickly. You’re going to need to be comfortable learning new systems, libraries, and languages. My team in particular often touches large swaths of the Mozilla codebase; you will be expected to be able to read and understand new code quickly. You can expect an interview session entirely dedicated to reading and explaining a piece of code that you’ve never seen before. Perhaps it will be reviewing a patch, or trying to find a bug in a piece of code. I’ll try to find a problem in a language that you are already comfortable with, so see above about keeping your resumé honest!

Do You Love Your Tools?

Every good programmer has a toolbox. I don’t care whether you use vim or emacs or Visual Studio or Eclipse or XCode, but I do care that you have an editor and use it productively. I also care that you are proficient using a debugger (or two, or three) and hopefully a profiler (or two, or three). If you can’t tell me why you love your editor or your debugger, it’s likely that you won’t be a successful software engineer.

You need to be proficient in a scripting language. I expect you to be able to use at least one scripting language to process text data, read CSV files, write JSON, and that kind of thing. Mozilla has gravitated toward Python for scripting, and you’ll probably be expected to learn Python if you don’t know it already.

Also, can you type? I’m constantly surprised by applicants who are unable to type quickly and accurately. A significant portion of your job is going to be writing code, and not having mastered the act of typing is probably not a good sign.

Phone Screens

When I conduct a phone-screen, it is usually over Skype video. Please make sure that you have a decent headset and that I will be able to hear you. During a phone screen, I try to include at least one coding question which is typically conducted via etherpad. You should be at a computer with access to the web in order to do this successfully.

By the way, did I mention I’m hiring?

October 02, 2014 08:53 PM

Chris Tyler (ctyler)

You'd be crazy to miss FSOSS 2014


The Free Software and Open Source Symposium (FSOSS) 2014 is around the corner, and it's shaping up to be the best in years. We have well over 30 talks spread over 2 days, covering just about every corner of open source from new and upcoming technologies through business models. We have a keynote from my colleague David Humphrey examining the implications of Heartbleed, as well as keynotes from Chris Aniszczyk (Twitter) and Bob Young (Lulu/Red Hat/TiCats). There are speakers from Canada, the US, Hungary, the UK, Cuba, and India, representing open source communities, academia, entrepreneurs, startups, and companies such as Mozilla, Cisco, AMD, Red Hat, and Rackspace.

Until October 10, registration for this event is just $40 (or, for students and faculty of any school, $20), which includes access to all of the keynotes, talks, and workshops, two lunches, a wine/beer/soft drink reception, a t-shirt, and swag.

Full details can be found at fsoss.ca -- see you October 23/24!

October 02, 2014 03:31 PM

August 06, 2014

Joshua Cranmer (jcranmer)

Why email is hard, part 7: email security and trust

This post is part 7 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. Part 3 discusses MIME. Part 4 discusses email addresses. Part 5 discusses the more general problem of email headers. Part 6 discusses how email security works in practice. This part discusses the problem of trust.

At a technical level, S/MIME and PGP (or at least PGP/MIME) use cryptography essentially identically. Yet the two are treated as radically different models of email security because they diverge on the most important question of public key cryptography: how do you trust the identity of a public key? Trust is critical, as it is the only way to stop an active, man-in-the-middle (MITM) attack. MITM attacks are actually easier to pull off in email, since all email messages effectively have to pass through both the sender's and the recipients' email servers [1], which allows attackers to pull off permanent, long-lasting MITM attacks [2].

S/MIME uses the same trust model that SSL uses, based on X.509 certificates and certificate authorities. X.509 certificates effectively work by providing a certificate that says who you are, signed by another authority. In the original concept (as you might guess from the name "X.509"), the trusted authority was your telecom provider, and the certificates were furthermore intended to be a part of the global X.500 directory—a natural extension of the OSI internet model. The OSI model of the internet never gained traction, and the trusted telecom providers were replaced with trusted root CAs.

PGP, by contrast, uses a trust model that's generally known as the Web of Trust. Every user has a PGP key (containing their identity and their public key), and users can sign others' public keys. Trust generally flows from these signatures: if you trust a user, you know the keys that they sign are correct. The name "Web of Trust" comes from the vision that trust flows along the paths of signatures, building a tight web of trust.

And now for the controversial part of the post, the comparisons and critiques of these trust models. A disclaimer: I am not a security expert, although I am a programmer who revels in dreaming up arcane edge cases. I also don't use PGP at all, and use S/MIME to a very limited extent for some Mozilla work [3], although I made a few abortive attempts to dogfood it in the past. I've attempted to replace personal experience with comprehensive research [4], but most existing critiques and comparisons of these two trust models are about 10-15 years old and predate several changes to CA certificate practices.

A basic tenet of development that I have found is that the average user is fairly ignorant. At the same time, a lot of the defense of trust models, both CAs and Web of Trust, tends to hinge on configurability. How many people, for example, know how to add or remove a CA root from Firefox, Windows, or Android? Even among the subgroup of Mozilla developers, I suspect the number of people who know how to do so are rather few. Or in the case of PGP, how many people know how to change the maximum path length? Or even understand the security implications of doing so?
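
To illustrate just how arcane this is, here is roughly what adding or removing a root looks like from the command line with NSS's certutil; this is a sketch, not part of the original post, and the profile path and nickname are placeholders:

# Adding and removing a CA root in an NSS database (e.g. a Firefox profile).
# A sketch only: the path and nickname are placeholders, and older profiles
# use a dbm-format database without the sql: prefix.
# ("C,," marks the certificate as a trusted CA for SSL server certificates.)
certutil -d sql:/path/to/profile -A -n "Example CA" -t "C,," -i example-ca.pem
certutil -d sql:/path/to/profile -D -n "Example CA"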

Seen in the light of ignorant users, the Web of Trust is a UX disaster. Its entire security model is predicated on having users precisely specify how much they trust other people to trust others (ultimate, full, marginal, none, unknown) and also on having them continually do out-of-band verification procedures and publicly reporting those steps. In 1998, a seminal paper on the usability of a GUI for PGP encryption came to the conclusion that the UI was effectively unusable for users, to the point that only a third of the users were able to send an encrypted email (and even then, only with significant help from the test administrators), and a quarter managed to publicly announce their private keys at some point, which is pretty much the worst thing you can do. They also noted that the complex trust UI was never used by participants, although the failure of many users to get that far makes generalization dangerous [5]. While newer versions of security UI have undoubtedly fixed many of the original issues found (in no small part due to the paper, one of the first to argue that usability is integral, not orthogonal, to security), I have yet to find an actual study on the usability of the trust model itself.

The Web of Trust has other faults. The notion of "marginal" trust, it turns out, is rather broken: if you marginally trust a user who has two keys, both of which signed another person's key, that's the same as fully trusting a user with one key who signed that key. There are several proposals for different trust formulas [6], but none of them have caught on in practice to my knowledge.

A hidden fault lies in its manner of presentation: in sharp contrast to CAs, the Web of Trust appears not to delegate trust, but any practical widespread deployment needs to solve the problem of contacting people with whom you have had no prior contact. Combined with the need to bootstrap new users, this implies that there need to be some keys that have signed a lot of other keys and are essentially default-trusted: in other words, a CA, a fact sometimes lost on advocates of the Web of Trust.

That said, a valid point in favor of the Web of Trust is that it more easily allows people to distrust CAs if they wish to. While I'm skeptical of its utility to a broader audience, the ability to do so is crucial for a not-insignificant portion of the population, and it's important enough to be explicitly called out.

X.509 certificates are most commonly discussed in the context of SSL/TLS connections, so I'll discuss them in that context as well, as the implications for S/MIME are mostly the same. Almost all criticism of this trust model essentially boils down to a single complaint: certificate authorities aren't trustworthy. A historical criticism is that the addition of CAs to the main root trust stores was ad-hoc. Since then, however, the main oligopoly of these root stores (Microsoft, Apple, Google, and Mozilla) have made their policies public and clear [7]. The introduction of the CA/Browser Forum in 2005, with a collection of major CAs and the major browsers as members [8], also helps in articulating common policies. These policies, simplified immensely, boil down to:

  1. You must verify information (depending on certificate type). This information must be relatively recent.
  2. You must not use weak algorithms in your certificates (e.g., no MD5).
  3. You must not make certificates that are valid for too long.
  4. You must maintain revocation checking services.
  5. You must have fairly stringent physical and digital security practices and intrusion detection mechanisms.
  6. You must be [externally] audited every year to verify that you follow the above rules.
  7. If you screw up, we can kick you out.

I'm not going to claim that this is necessarily the best policy or even that any policy can feasibly stop intrusions from happening. But it's a policy, so CAs must abide by some set of rules.
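
Several of these rules are easy to spot-check on any certificate you run into; here is a quick sketch with openssl (example.org is a placeholder host):

# Spot-check a server certificate's signature algorithm and validity window,
# which correspond to rules 2 and 3 above (example.org is a placeholder).
openssl s_client -connect example.org:443 -servername example.org </dev/null 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -E 'Signature Algorithm|Not Before|Not After'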

Another CA criticism is the fear that they may be suborned by national government spy agencies. I find this claim underwhelming, considering that the number of certificates acquired by intrusions that were used in the wild is larger than the number of certificates acquired by national governments that were used in the wild: 1 and 0, respectively. Yet no one complains about the untrustworthiness of CAs due to their ability to be hacked by outsiders. Another attack is that CAs are controlled by profit-seeking corporations, which misses the point because the business of CAs is not selling certificates but selling their access to the root databases. As we will see shortly, jeopardizing that access is a great way for a CA to go out of business.

To understand issues involving CAs in greater detail, there are two CAs that are particularly useful to look at. The first is CACert. CACert is favored by many for its attempt to handle X.509 certificates in a Web of Trust model, so invariably every public discussion about CACert ends up devolving into an attack on other CAs for their perceived capture by national governments or corporate interests. Yet what many of the proponents for inclusion of CACert miss (or dismiss) is the fact that CACert actually failed the required audit, and it is unlikely to ever pass an audit. This shows a central failure of both CAs and Web of Trust: different people have different definitions of "trust," and in the case of CACert, some people are favoring a subjective definition (I trust their owners because they're not evil) when an objective definition fails (in this case, that the root signing key is securely kept).

The other CA of note here is DigiNotar. In July 2011, some hackers managed to acquire a few fraudulent certificates by hacking into DigiNotar's systems. By late August, people had become aware of these certificates being used in practice [9] to intercept communications, mostly in Iran. The use appears to have been caught after Chromium updates failed due to invalid certificate fingerprints. After it became clear that the fraudulent certificates were not limited to a single fake Google certificate, and that DigiNotar had failed to notify potentially affected companies of its breach, DigiNotar was swiftly removed from all of the trust databases. It ended up declaring bankruptcy within two weeks.

DigiNotar indicates several things. One, SSL MITM attacks are not theoretical (I have seen at least two or three security experts advising pre-DigiNotar that SSL MITM attacks are "theoretical" and therefore the wrong target for security mechanisms). Two, keeping the trust of browsers is necessary for commercial operation of CAs. Three, the notion that a CA is "too big to fail" is false: DigiNotar played an important role in the Dutch community as a major CA and the operator of Staat der Nederlanden. Yet when DigiNotar screwed up and lost its trust, it was swiftly kicked out despite this role. I suspect that even Verisign could be kicked out if it manages to screw up badly enough.

This isn't to say that the CA model isn't problematic. But the source of its problems is that delegating trust isn't a feasible model in the first place, a problem that it shares with the Web of Trust as well. Different notions of what "trust" actually means, and the uncertainty introduced as chains of trust get longer, make delegated trust vulnerable to both social engineering and technical attacks. There appears to be an increasing consensus that the best way forward is some variant of key pinning, much akin to how SSH works: once you know someone's public key, you complain if that public key appears to change, even if it appears to be "trusted." This does leave people open to attacks on first use, and the question of what to do when you need to legitimately re-key is not easy to solve.
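
SSH's known_hosts handling is the usual mental model for this kind of pinning; a minimal sketch of what "complain if the key changes" looks like in practice (example.org is a placeholder host):

# Key pinning, SSH-style: the first connection records the host key, and later
# connections refuse to proceed if it changes (example.org is a placeholder).
ssh example.org              # first use: accept and pin the key in ~/.ssh/known_hosts
ssh-keygen -F example.org    # show the pinned entry for the host
ssh-keygen -R example.org    # legitimate re-key: remove the old pin, accepting first-use risk again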

In short, both CAs and the Web of Trust have issues. Whether or not you should prefer S/MIME or PGP ultimately comes down to the very conscious question of how you want to deal with trust—a question without a clear, obvious answer. If I appear to be painting CAs and S/MIME in a positive light and the Web of Trust and PGP in a negative one in this post, it is more because I am trying to focus on the positions less commonly taken to balance perspective on the internet. In my next post, I'll round out the discussion on email security by explaining why email security has seen poor uptake and answering the question as to which email security protocol is most popular. The answer may surprise you!

[1] Strictly speaking, you can bypass the sender's SMTP server. In practice, this is considered a hole in the SMTP system that email providers are trying to plug.
[2] I've had 13 different connections to the internet in the same time as I've had my main email address, not counting all the public wifis that I have used. Whereas an attacker would find it extraordinarily difficult to intercept all of my SSH sessions for a MITM attack, intercepting all of my email sessions is clearly far easier if the attacker were my email provider.
[3] Before you read too much into this personal choice of S/MIME over PGP, it's entirely motivated by a simple concern: S/MIME is built into Thunderbird; PGP is not. As someone who does a lot of Thunderbird development work that could easily break the Enigmail extension locally, needing to use an extension would be disruptive to workflow.
[4] This is not to say that I don't heavily research many of my other posts, but I did go so far for this one as to actually start going through a lot of published journals in an attempt to find information.
[5] It's questionable how well the usability of a trust model UI can be measured in a lab setting, since the observer effect is particularly strong for all metrics of trust.
[6] The web of trust makes a nice graph, and graphs invite lots of interesting mathematical metrics. I've always been partial to eigenvectors of the graph, myself.
[7] Mozilla's policy for addition to NSS is basically the standard policy adopted by all open-source Linux or BSD distributions, seeing as OpenSSL never attempted to produce a root database.
[8] It looks to me that it's the browsers who are more in charge in this forum than the CAs.
[9] To my knowledge, this is the first—and so far only—attempt to actively MITM an SSL connection.

August 06, 2014 03:39 AM

July 25, 2014

Mark Finkle (mfinkle)

Firefox for Android: Collecting and Using Telemetry

Firefox 31 for Android is the first release where we collect telemetry data on user interactions. We created a simple “event” and “session” system, built on top of the current telemetry system that has been shipping in Firefox for many releases. The existing telemetry system is focused more on the platform features and tracking how various components are behaving in the wild. The new system is really focused on how people are interacting with the application itself.

Collecting Data

The basic system consists of two types of telemetry probes: events and sessions.

We add the probes into any part of the application that we want to study, which is most of the application.

Visualizing Data

The raw telemetry data is processed into summaries, one for Events and one for Sessions. In order to visualize the telemetry data, we created a simple dashboard (source code). It’s built using a great little library called PivotTable.js, which makes it easy to slice and dice the summary data. The dashboard has several predefined tables so you can start digging into various aspects of the data quickly. You can drag and drop the fields into the column or row headers to reorganize the table. You can also add filters to any of the fields, even those not used in the row/column headers. It’s a pretty slick library.

uitelemetry-screenshot-crop

Acting on Data

Now that we are collecting and studying the data, the goal is to find patterns that are unexpected or might warrant a closer inspection. Here are a few of the discoveries:

Page Reload: Even in our Nightly channel, people seem to be reloading the page quite a bit. Way more than we expected. It’s one of the Top 2 actions. Our current thinking includes several possibilities:

  1. Page gets stuck during a load and a Reload gets it going again
  2. Networking error of some kind, with a “Try again” button on the page. If the button does not solve the problem, a Reload might be attempted.
  3. Weather or some other frequently updated page, where a Reload shows the current information.

We have started projects to explore the first two issues. The third issue might be fine as-is, or maybe we could add a feature to make updating pages easier? You can still see high uses of Reload (reload) on the dashboard.

Remove from Home Pages: The History page, primarily, and the Top Sites page see high uses of Remove (home_remove) to delete browsing information from the Home pages. People do this a lot; again, it’s one of the Top 2 actions. People will also do this repeatedly, over and over, clearing the entire list manually. Firefox has a Clear History feature, but it must not be very discoverable. We also see people asking for easier ways of clearing history in our feedback, but it wasn’t until we saw the telemetry data that we understood how badly this was needed. This led us to add some features:

  1. Since the History page was the predominant source of the Removes, we added a Clear History button right on the page itself.
  2. We added a way to Clear History when quitting the application. This was a bit tricky since Android doesn’t really promote “Quitting” applications, but if a person wants to enable this feature, we add a Quit menu item to make the action explicit and in their control.
  3. With so many people wanting to clear their browsing history, we assumed they didn’t know that Private Browsing existed. No history is saved when using Private Browsing, so we’re adding some contextual hinting about the feature.

These features are included in Nightly and Aurora versions of Firefox. Telemetry is showing a marked decrease in Remove usage, which is great. We hope to see the trend continue into Beta next week.

External URLs: People open a lot of URLs from external applications, like Twitter, into Firefox. This wasn’t totally unexpected (it’s a common pattern on Android), but the degree to which it happened versus opening the browser directly was somewhat unexpected. Close to 50% of the URLs loaded into Firefox are from external applications. Less so in Nightly, Aurora and Beta, but even those channels are almost 30%. We have started looking into ideas for making the process of opening URLs into Firefox a better experience.

Saving Images: An unexpected discovery was how often people save images from web content (web_save_image). We haven’t spent much time considering this one. We think we are doing the “right thing” with the images as far as Android conventions are concerned, but there might be new features waiting to be implemented here as well.

Take a look at the data. What patterns do you see?

Here is the obligatory UI heatmap, also available from the dashboard:
uitelemetry-heatmap

July 25, 2014 03:08 AM

July 07, 2014

Blake Winton (bwinton)

Figuring out where things are in an image.

People love heatmaps.

They’re a great way to show how much various UI elements are used in relation to each other, and are much easier to read at a glance than a table of click- counts would be. They can also reveal hidden patterns of usage based on the locations of elements, let us know if we’re focusing our efforts on the correct elements, and tell us how effective our communication about new features is. Because they’re so useful, one of the things I am doing in my new role is setting up the framework to provide our UX team with automatically updating heatmaps for both Desktop and Android Firefox.

Unfortunately, we can’t just wave our wands and have a heatmap magically appear. Creating them takes work, and one of the most tedious processes is figuring out where each element starts and stops. Even worse, we need to repeat the process for each platform we’re planning on displaying. This is one of the primary reasons we haven’t run a heatmap study since 2012.

In order to not spend all my time generating the heatmaps, I had to reduce the effort involved in producing these visualizations.

Being a programmer, my first inclination was to write a program to calculate them, and that sort of worked for the first version of the heatmap, but there were some difficulties. To collect locations for all the elements, we had to display all the elements.

Firefox in the process of being customized

Customize mode (as shown above) was an obvious choice since it shows everything you could click on almost by definition, but it led people to think that we weren’t showing which elements were being clicked the most, but instead which elements people customized the most. So that was out.

Next we tried putting everything in the toolbar, or the menu, but those were a little too cluttered even without leaving room for labels, and too wide (or too tall, in the case of the menu).

A shockingly busy toolbar

Similarly, I couldn’t fit everything into the menu panel either. The only solution was to resort to some Photoshop-trickery to fit all the buttons in, but that ended up breaking the script I was using to locate the various elements in the UI.

A surprisingly tall menu panel

Since I couldn’t automatically figure out where everything was, I figured we might as well use a nicely-laid out, partially generated image, and calculate the positions (mostly-)manually.

The current version of the heatmap (Note: This is not the real data.)

I had foreseen the need for different positions for the widgets when the project started, and so I put the widget locations in their own file from the start. This meant that I could update them without changing the code, which made it a little nicer to see what’s changed between versions, but still required me to reload the whole page every time I changed a position or size, which would just have taken way too long. I needed something that could give me much more immediate feedback.

Fortunately, I had recently finished watching a series of videos from Ian Johnson (@enjalot on twitter) where he used a tool he made called Tributary to do some rapid prototyping of data visualization code. It seemed like a good fit for the quick moving around of elements I was trying to do, and so I copied a bunch of the code and data in, and got to work moving things around.

I did encounter a few problems: Tributary wasn’t functional in Firefox Nightly (but I could use Chrome as a workaround), and occasionally trying to move the cursor would change the value slider instead. Even with these glitches it only took me an hour or two to get from set-up to having all the numbers for the final result! And the best part is that since it's all open source, you can take a look at the final result, or fork it yourself!

July 07, 2014 03:53 PM

June 11, 2014

Guillermo López (willyaranda)

Spanish handball's journey through the desert

And what's worse, with no apparent end of the road in sight…

The long list of misfortunes in Spanish handball continues today with the disappearance of the historic BM. Valladolid, following Portland San Antonio, Ciudad Real – Atlético de Madrid and Teka a few years ago, not to mention the descent into the depths by Bidasoa, one of the greats of the 90s.

The disappearance of BM. Valladolid, the first elite team I ever went to see, back in the mid-2000s, with historic (and heartbreaking) matches such as that semifinal against Flensburg (remember, this year's Final Four champion, ahead of powerhouses like Barça, Kiel and Veszprém), is one more jab (or the final blow) in the sorry way handball has been managed in this country over the last five years.

Truly bad management: living in opulence, thinking nothing could touch their salaries, their signings, their egos, their “we're in ASOBAL”, their debts, their players they knew they couldn't pay, hoping everything would sort itself out by kicking the problem upfield and letting everyone else deal with the mess.

So here we are, six years later, in this tragic situation. Flensburg are champions of Europe (deservedly, beating Barça and then Kiel in the final) and BM Valladolid, a standard-bearer for handball, for youth development, and for Castilla y León alongside Ademar (another club whose life hangs by a thread), is dead, kaput. Gone are the days when a group of friends from Aranda, myself among them, played against them.

Meanwhile, the best Spanish players go abroad. Take, for example, the winners of the 2013 World Championship, with a humiliation unbecoming of two great teams, against Denmark (a crushing 35-19, which many of us enjoyed as much as that final against Croatia in Tunisia in 2005): of all the #hispanos players who took part, only 5 remain in ASOBAL. And of course they are at Barça (Víctor Tomás, Ariño, Sarmiento, Strbik, who leaves this year, and Viran).

And so we arrive at the ASOBAL of the last two years, with a level so low that teams which should be in a “third division” (Primera División, in handball nomenclature) can compete in “the elite,” springing surprises on the traditional big clubs, in an extremely tight and unpredictable league (except for first place, Barça, whose players earn three times the entire annual budget of the smaller clubs), yet one in which clubs (even historic ones) die with frightening ease.

We cannot, and must not, let handball fall, and that means the Federation, ASOBAL and many club directors have to sit down and think about where they want the sport they love to go (if they still love it).

Long live handball.

Hispanos

The post Spanish handball's journey through the desert appeared first on Pijus Magnificus.

June 11, 2014 11:14 PM

June 06, 2014

Mark Finkle (mfinkle)

Firefox for Android: Casting videos and Roku support – Ready to test in Nightly

Firefox for Android Nightly builds now support casting HTML5 videos from a web page to a TV via a connected Roku streaming player. Using the system is simple, but it does require you to install a viewer application on your Roku device. Firefox support for the Roku viewer and the viewer itself are both currently pre-release. We’re excited to invite our Nightly channel users to help us test these new features, share feedback and file any bugs so we can continue to make improvements to performance and functionality.

Setup

To begin testing, first you’ll need to install the viewer application to your Roku. The viewer app, called Firefox for Roku Nightly, is currently a private channel. You can install it via this link: Firefox Nightly

Once installed, try loading this test page into your Firefox for Android Nightly browser: Casting Test

When Firefox has discovered your Roku, you should see the Media Control Bar with Cast and Play icons:

casting-onload

The Cast icon on the left of the video controls allows you to send the video to a device. You can also long-tap on the video to get the context menu, and cast from there too.

Hint: Make sure Firefox and the Roku are on the same Wifi network!

Once you have sent a video to a device, Firefox will display the Media Control Bar in the bottom of the application. This allows you to pause, play and close the video. You don’t need to stay on the original web page either. The Media Control Bar will stay visible as long as the video is playing, even as you change tabs or visit new web pages.

fennec-casting-pageaction-active

You’ll notice that Firefox displays an “active casting” indicator in the URL Bar when a video on the current web page is being cast to a device.

Limitations and Troubleshooting

Firefox currently limits casting to HTML5 video in H264 format. This is one of the formats most easily handled by Roku streaming players. We are working on other formats too.

Some web sites hide or customize the HTML5 video controls and some override the long-tap menu too. This can make starting to cast difficult, but the simple fallback is to start playing the video in the web page. If the video is H264 and Firefox can find your Roku, a “ready to cast” indicator will appear in the URL Bar. Just tap on that to start casting the video to your Roku.

If Firefox does not display the casting icons, it might be having a problem discovering your Roku on the network. Make sure your Android device and the Roku are on the same Wifi network. You can load about:devices into Firefox to see what devices Firefox has discovered.

This is a pre-release of video casting support. We need your help to test the system. Please remember to share your feedback and file any bugs. Happy testing!

June 06, 2014 03:45 PM

May 31, 2014

Mark Finkle (mfinkle)

Firefox for Android: Your Feedback Matters!

Millions of people use Firefox for Android every day. It’s amazing to work on a product used by so many people. Unsurprisingly, some of those people send us feedback. We even have a simple system built into the application to make it easy to do. We have various systems to scan the feedback and look for trends. Sometimes, we even manually dig through the feedback for a given day. It takes time. There is a lot.

Your feedback is important and I thought I’d point out a few recent features and fixes that were directly influenced from feedback:

Help Menu
Some people have a hard time discovering features or were not aware Firefox supported some of the features they wanted. To make it easier to learn more about Firefox, we added a simple Help menu which directs you to SUMO, our online support system.

Managing Home Panels
Not everyone loves the Firefox Homepage (I do!), or more specifically, they don’t like some of the panels. We added a simple way for people to control the panels shown in Firefox’s Homepage. You can change the default panel. You can even hide all the panels. Use Settings > Customize > Home to get there.

Home panels

Improve Top Sites
The Top Sites panel in the Homepage is used by many people. At the same time, other people find that the thumbnails can reveal a bit too much of their browsing to others. We recently added support for respecting sites that might not want to be snapshotted into thumbnails. In those cases, the thumbnail is replaced with a favicon and a favicon-influenced background color. The Facebook and Twitter thumbnails show the effect below:

fennec-private-thumbnails

We also added the ability to remove thumbnails using the long-tap menu.

Manage Search Engines
People also like to be able to manage their search engines. They like to switch the default. They like to hide some of the built-in engines. They like to add new engines. We have a simple system for managing search engines. Use Settings > Customize > Search to get there.

fennec-search-mgr

Clear History
We have a lot of feedback from people who want to clear their browsing history quickly and easily. We are not sure if the Settings > Privacy > Clear private data method is too hard to find or too time consuming to use, but it’s apparent people need other methods. We added a quick access method at the bottom of the History panel in the Homepage.

clear-history

We are also working on a Clear data on exit approach too.

Quickly Switch to a Newly Opened Tab
When you long-tap on a link in a webpage, you get a menu that allows you to Open in New Tab or Open in New Private Tab. Both of those open the new tab in the background. Feedback indicates that some people really want to switch to the new tab. We already show an Android toast to let you know the tab was opened. Now we add a button to the toast allowing you to quickly switch to the tab too.

switch-to-tab

Undo Closing a Tab
Closing tabs can be awkward for people. Sometimes the [x] is too easy to hit by mistake or swiping to close is unexpected. In any case, we added the ability to undo closing a tab. Again, we use a button toast.

undo-close-tab

Offer to Setup Sync from Tabs Tray
We feel that syncing your desktop and mobile browsing data makes browsing on mobile devices much easier. Figuring out how to setup the Sync feature in Firefox might not be obvious. We added a simple banner to the Homepage to let you know the feature exists. We also added a setup entry point in the Sync area of the Tabs Tray.

fennec-setup-sync

We’ll continue to make changes based on your feedback, so keep sending it to us. Thanks for using Firefox for Android!

May 31, 2014 03:28 AM

May 27, 2014

Joshua Cranmer (jcranmer)

Why email is hard, part 6: today's email security

This post is part 6 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. Part 3 discusses MIME. Part 4 discusses email addresses. Part 5 discusses the more general problem of email headers. This part discusses how email security works in practice.

Email security is a rather wide-ranging topic, and one that I've wanted to cover for some time, well before several recent events that have made it come up in the wider public knowledge. There is no way I can hope to cover it in a single post (I think it would outpace even the length of my internationalization discussion), and there are definitely parts for which I am underqualified, as I am by no means an expert in cryptography. Instead, I will be discussing this over the course of several posts of which this is but the first; to ease up on the amount of background explanation, I will assume passing familiarity with cryptographic concepts like public keys, hash functions, as well as knowing what SSL and SSH are (though not necessarily how they work). If you don't have that knowledge, ask Wikipedia.

Before discussing how email security works, it is first necessary to ask what email security actually means. Unfortunately, the layman's interpretation is likely going to differ from the actual precise definition. Security is often treated by laymen as a boolean interpretation: something is either secure or insecure. The most prevalent model of security to people is SSL connections: these allow the establishment of a communication channel whose contents are secret to outside observers while also guaranteeing to the client the authenticity of the server. The server often then gets authenticity of the client via a more normal authentication scheme (i.e., the client sends a username and password). Thus there is, at the end, a channel that has both secrecy and authenticity [1]: channels with both of these are considered secure and channels without these are considered insecure [2].

In email, the situation becomes more difficult. Whereas an SSL connection is between a client and a server, the architecture of email is such that email providers must be considered as distinct entities from end users. In addition, messages can be sent from one person to multiple parties. Thus secure email is a more complex undertaking than just porting relevant details of SSL. There are two major cryptographic implementations of secure email [3]: S/MIME and PGP. In terms of implementation, they are basically the same [4], although PGP has an extra mode which wraps general ASCII (known as "ASCII-armor"), which I have been led to believe is less recommended these days. Since I know the S/MIME specifications better, I'll refer specifically to how S/MIME works.

S/MIME defines two main MIME types: multipart/signed, which contains the message text as a subpart followed by data indicating the cryptographic signature, and application/pkcs7-mime, which contains an encrypted MIME part. The important things to note about this delineation are that only the body data is encrypted [5], that it's theoretically possible to encrypt only part of a message's body, and that the signing and encryption constitute different steps. These factors combine to make for a potentially infuriating UI setup.

How does S/MIME tackle the challenges of encrypting email? First, rather than encrypting using recipients' public keys, the message is encrypted with a symmetric key. This symmetric key is then encrypted with each of the recipients' keys and then attached to the message. Second, by only signing or encrypting the body of the message, the transit headers are kept intact for the mail system to retain its ability to route, process, and deliver the message. The body is supposed to be prepared in the "safest" form before transit to avoid intermediate routers munging the contents. Finally, to actually ascertain what the recipients' public keys are, clients typically passively pull the information from signed emails. LDAP, unsurprisingly, contains an entry for a user's public key certificate, which could be useful in large enterprise deployments. There is also work ongoing right now to publish keys via DNS and DANE.
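
The openssl command-line tool makes it easy to see that signing and encrypting really are independent, composable steps; a rough sketch, not taken from the original post, with all file names as placeholders:

# Sign, then encrypt, as two separate S/MIME operations (a sketch; every file
# name here is a placeholder). The encrypt step generates a fresh symmetric key
# for the payload and wraps that key once per recipient certificate.
openssl smime -sign -in body.txt -signer me.pem -inkey me.key -out signed.msg
openssl smime -encrypt -in signed.msg -out encrypted.msg recipient.pem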

I mentioned before that S/MIME's use can present some interesting UI design decisions. I ended up actually testing some common email clients on how they handled S/MIME messages: Thunderbird, Apple Mail, Outlook [6], and Evolution. In my attempts to create a surreptitious signed part to confuse the UI, Outlook decided that the message had no body at all, and Thunderbird decided to ignore all indication of the existence of said part. Apple Mail managed to claim the message was signed in one of these scenarios, and Evolution took the cake by always agreeing that the message was signed [7]. It didn't even bother questioning the signature if the certificate's identity disagreed with the easily-spoofable From address. I was actually surprised by how well people did in my tests—I expected far more confusion among clients, particularly since the will to maintain S/MIME has clearly been relatively low, judging by poor support for "new" features such as triple-wrapping or header protection.

Another fault of S/MIME's design is that it rests on the mistaken belief that composing a signing step and an encryption step is equivalent in strength to a simultaneous sign-and-encrypt. Another page describes this in far better detail than I have room to; note that this flaw is fixed via triple-wrapping (which has relatively poor support). This places yet more burden on the UI to adequately describe all the various minutiae of the differing security guarantees. Considering that users already have a hard time even understanding that just because a message says it's from example@isp.invalid doesn't actually mean it's from example@isp.invalid, trying to develop UI that both adequately expresses the security issues and is understandable to end-users is an extreme challenge.

What we have in S/MIME (and PGP) is a system that allows for strong guarantees, if certain conditions are met, yet is also vulnerable to breaches of security if the message handling subsystems are poorly designed. Hopefully this is a sufficient guide to the technical impacts of secure email in the email world. My next post will discuss the most critical component of secure email: the trust model. After that, I will discuss why secure email has seen poor uptake and other relevant concerns on the future of email security.

[1] This is a bit of a lie: a channel that does secrecy and authentication at different times isn't as secure as one that does them at the same time.
[2] It is worth noting that authenticity is, in many respects, necessary to achieve secrecy.
[3] This, too, is a bit of a lie. More on this in a subsequent post.
[4] I'm very aware that S/MIME and PGP use radically different trust models. Trust models will be covered later.
[5] S/MIME 3.0 did add a provision stating that if the signed/encrypted part is a message/rfc822 part, the headers of that part should override the outer message's headers. However, I am not aware of a major email client that actually handles these kind of messages gracefully.
[6] Actually, I tested Windows Live Mail instead of Outlook, but given the presence of an official MIME-to-Microsoft's-internal-message-format document which seems to agree with what Windows Live Mail was doing, I figure their output would be identical.
[7] On a more careful examination after the fact, it appears that Evolution may have tried to indicate signedness on a part-by-part basis, but the UI was unclear enough that ordinary users would easily be misled.

May 27, 2014 12:32 AM

May 24, 2014

Daniel Le Duc Khoi Nguyen (Libras2909)

New blog on GitHub!

https://greenrecyclebin.github.io

I decided to start blogging again at my new blog (URL above). I will no longer maintain this blog. Thank you everyone who has stopped by/commented. Hope to see you all at my new blog!


May 24, 2014 09:06 AM

May 06, 2014

Guillermo López (willyaranda)

Sherpa Summit 2014

Some classy photos.


On Tuesday, May 6, 2014, I was at the Sherpa Summit, an event organized by the people behind the Sher.pa application (a "personal assistant"), representing Mozilla together with Osoitz and Patxi from Librezale.

At the booth, open to everyone from 9:30 in the morning until 6 in the evening, dozens of people came by asking about the Firefox OS devices we had available in our "exhibition": ZTE Open C, Alcatel OneTouch, LG Fireweb, Geeksphone Keon and Geeksphone Peak. We were missing a tablet to show everyone another screen size and a slightly different homescreen. Stickers, badges and the odd decal were handed out to the faithful devotees of the red panda (fox) who stopped by.

Stand. Firefoxes ready to rock.

At 10:15 in the morning I gave a talk in the Mobile Apps track. Although it ended up fairly focused on HTML5, I showed how to create a small Firefox OS application in under a minute, picking the RTVE mobile site (thanks, Salva!), which is very easy to save (Control-S, as index.html, plus creating a mini-manifest; the rest "just works") and to tweak so that it looks good.

Sherpa 2014, you

27% of HTML5 apps using Wrappers.

Finally, we wrapped up at 6 in the evening, thanking the organizers for letting us come one more year (along with the chance to give a talk!) and Francisco Picolini for his ever-present help at these events, even though this time he stayed in Madrid organizing more important things.

Laster arte! (See you soon!)

You.

You care about HTML5 apps.

The post Sherpa Summit 2014 appeared first on Pijus Magnificus.

May 06, 2014 06:25 PM

April 06, 2014

Boris Zbarsky (bz)

Speech and consequences

I've seen the phrase "freedom of speech does not mean freedom from consequences" a lot recently. This is clearly true. However, it's also clearly true that freedom of speech does in fact mean freedom from some consequences. As a simple example, the First Amendment to the US Constitution and its associated jurisprudence is all about delineating some consequences one must be free of for speech to be considered free.

The question then becomes this: which consequences should one be free of when speaking? I am not a lawyer, and this is not a legal analysis (though some of these consequences are pretty clearly illegal in their own right, even if not readily actionable when performed anonymously), but rather a moral one. I would consider at least the following consequences, involving only private action, to be unacceptable restraints on freedom of speech:

  1. Physical violence against the speaker or their family, friends, or associates.
  2. Threats of such physical violence. This most definitely includes death threats.
  3. Destruction of or damage to the property of the speaker or their family, friends, or associates.
  4. Harassment (bullhorns in the night, incessant phone calls, etc) of the family, friends, or associates of the speaker. I don't feel as absolutely about the speaker him/herself, because the definition of "harassment" is rather vague. While the above examples with bullhorns and phones seem morally repugnant to me as applied to the speaker, there may be other things that I consider harassment but others consider OK as a response to a speaker. There's a lot more gray area here than in items 1-3.

This is not meant to be an exhaustive list; these are the things that came to mind off the top of my head.

It's clear to me that a large number of people out there disagree with me at least about item 2 and item 4 of the list above in practice. They may or may not perform such actions themselves, but they will certainly excuse such actions on the part of others by claiming that freedom of speech does not mean freedom from consequences. For these particular consequences, I do not accept that argument, and I sincerely hope the people involved are simply unaware of the actions they're excusing, instead of actively believing that the consequences listed above are compatible with the exercise of free speech.

April 06, 2014 09:39 PM

April 05, 2014

Joshua Cranmer (jcranmer)

Announcing jsmime 0.2

Previously, I've been developing JSMime as a subdirectory within comm-central. However, after discussions with other developers, I have moved the official repository of record for JSMime to its own repository, now found on GitHub. The repository has been cleaned up and the feature set for version 0.2 has been selected, so that the current tip on JSMime (also the initial version) is version 0.2. This contains the feature set I imported into Thunderbird's source code last night, which is to say support for parsing MIME messages into the MIME tree, as well as support for parsing and encoding email address headers.

Thunderbird doesn't actually use the new code quite yet (as my current tree is stuck on a mozilla-central build error, so I haven't had time to run those patches through a last minute sanity check before requesting review), but the intent is to replace the current C++ implementations of nsIMsgHeaderParser and nsIMimeConverter with JSMime instead. Once those are done, I will be moving forward with my structured header plans which more or less ought to make those interfaces obsolete.

Within JSMime itself, the pieces which I will be working on next will be rounding out the implementation of header parsing and encoding support (I have prototypes for Date headers and the infernal RFC 2231 encoding that Content-Disposition needs), as well as support for building MIME messages from their constituent parts (a feature which would be greatly appreciated in the depths of compose and import in Thunderbird). I also want to implement full IDN and EAI support, but that's hampered by the lack of a JS implementation I can use for IDN (yes, there's punycode.js, but that doesn't do StringPrep). The important task of converting the MIME tree to a list of body parts and attachments is something I do want to work on as well, but I've vacillated on the implementation here several times and I'm not sure I've found one I like yet.

JSMime, as its name implies, tries to work in as pure JS as possible, augmented with several web APIs as necessary (such as TextDecoder for charset decoding). I'm using ES6 as the base here, because it gives me several features I consider invaluable for writing this in JavaScript: Promises, Map, generators, let. This means it can run on an unprivileged web page; I test JSMime using Firefox nightlies and the Firefox debugger where necessary. Unfortunately, it only really works in Firefox at the moment because V8 doesn't support many ES6 features yet (such as destructuring, which is annoying but simple enough to work around, or Map iteration, which is completely necessary for the code). I'm not opposed to changing it to make it work on Node.js or Chrome, but I don't realistically have the time to spend doing it myself; if someone else has the time, please feel free to contact me or send patches.

April 05, 2014 05:18 PM

April 03, 2014

Joshua Cranmer (jcranmer)

If you want fast code, don't use assembly

…unless you're an expert at assembly, that is. The title of this post was obviously meant to be an attention-grabber, but it is much truer than you might think: poorly-written assembly code will probably be slower than what an optimizing compiler produces from well-written high-level code (note that you may need to help the compiler along for things like vectorization). Now why is this?

Modern microarchitectures are incredibly complex. A modern x86 processor will be superscalar, decoding instructions into internal micro-operations to get there. Desktop processors will undoubtedly have multiple instruction issue per cycle, forms of register renaming, branch predictors, and so on. Minor changes (a misaligned instruction stream, a poor ordering of instructions, a bad instruction choice) can kill the ability to take advantage of these features. There are very few people who can accurately predict the performance of a given assembly stream (I myself wouldn't attempt it if the architecture can take advantage of ILP), and those people are disproportionately likely to be working on compiler optimizations. So unless you're knowledgeable enough about assembly to help work on a compiler, you probably shouldn't be hand-coding assembly to make code faster.

To give an example to elucidate this point (and the motivation for this blog post in the first place), I was given a link to an implementation of the N-queens problem in assembly. For various reasons, I decided to use this to start building a fine-grained performance measurement system. This system uses a high-resolution monotonic clock on Linux and runs the function 1000 times to warm up caches and counters and then runs the function 1000 more times, measuring each run independently and reporting the average runtime at the end. This is a single execution of the system; 20 executions of the system were used as the baseline for a t-test to determine statistical significance as well as visual estimation of normality of data. Since the runs observed about a constant 1-2 μs of noise, I ran all of my numbers on the 10-queens problem to better separate the data (total runtimes ended up being in the range of 200-300μs at this level). When I say that some versions are faster, the p-values for individual measurements are on the order of 10^-20, meaning that there is a 1-in-100,000,000,000,000,000,000 chance that the observed speedups could be produced if the programs take the same amount of time to run.
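
For readers who want the shape of that harness, here is a minimal sketch of the same methodology in Python. The original measurement system was not Python, and run_workload is a hypothetical stand-in for the function under test.

    # Minimal sketch of the measurement methodology described above; the actual
    # harness was native code. run_workload is a hypothetical stand-in.
    import statistics
    import time

    def measure(run_workload, warmup=1000, samples=1000):
        for _ in range(warmup):                  # warm up caches and predictors
            run_workload()
        times = []
        for _ in range(samples):                 # measure each run independently
            start = time.perf_counter_ns()
            run_workload()
            times.append(time.perf_counter_ns() - start)
        return statistics.mean(times)            # average runtime for one execution of the system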

The initial assembly version of the program took about 288μs to run. The first C++ version I coded, originating from the same genesis algorithm that the author of the assembly version used, ran in 275μs. A recursive program beat out a hand-written assembly block of code... and when I manually converted the recursive program into a single loop, the runtime improved to 249μs. It wasn't until I got rid of all of the assembly in the original code that I could get the program to beat the derecursified code (at 244μs)—so it's not the vectorization that's causing the code to be slow. Intrigued, I started to analyze why the original assembly was so slow.

It turns out that there are three main things that I think cause the slow speed of the original code. The first one is alignment of branches: the assembly code contains no instructions to align basic blocks on particular branches, whereas gcc happily emits these for some basic blocks. I mention this first as it is mere conjecture; I never made an attempt to measure the effects for myself. The other two causes are directly measured from observing runtime changes as I slowly replaced the assembly with code. When I replaced the use of push and pop instructions with a global static array, the runtime improved dramatically. This suggests that the alignment of the stack could be to blame (although the stack is still 8-byte aligned when I checked via gdb), which just goes to show you how much alignments really do matter in code.

The final, and by far most dramatic, effect I saw involves the use of three assembly instructions: bsf (find the index of the lowest bit that is set), btc (clear a specific bit index), and shl (left shift). When I replaced the use of these instructions with a more complicated expression int bit = x & -x and x = x - bit, the program's speed improved dramatically. And the rationale for why the speed improved won't be found in latency tables, although those will tell you that bsf is not a 1-cycle operation. Rather, it's in minutiae that's not immediately obvious.

The original program used the fact that bsf sets the zero flag if the input register is 0 as the condition to do the backtracking; the converted code just checked if the value was 0 (using a simple test instruction). The compare and the jump instructions are basically converted into a single instruction in the processor. In contrast, the bsf does not get to do this; combined with the intrinsically higher latency of the instruction, it means that empty loops take a lot longer to do nothing. The use of an 8-bit shift value is also interesting, as there is a rather severe penalty for using 8-bit registers in Intel processors as far as I can see.
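
The bit-manipulation replacement is a generic idiom for walking the set bits of a mask; a small sketch (not the program's actual code) makes it concrete.

    # The x & -x idiom in isolation: iterate over the set bits of a mask without
    # a find-first-set instruction. -x is the two's complement of x, so x & -x
    # isolates the lowest set bit.
    def set_bits(mask):
        while mask:
            bit = mask & -mask    # lowest set bit, e.g. 0b10110 -> 0b00010
            yield bit
            mask -= bit           # clear that bit and keep going

    print([bin(b) for b in set_bits(0b10110)])   # ['0b10', '0b100', '0b10000']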

Now, this isn't to say that the compiler will always produce the best code by itself. My final code wasn't above using x86 intrinsics for the vector instructions. Replacing the _mm_andnot_si128 intrinsic with an actual and-not on vectors caused gcc to use other, slower instructions instead of the vmovq to move the result out of the SSE registers for reasons I don't particularly want to track down. The use of the _mm_blend_epi16 and _mm_srli_si128 intrinsics can probably be replaced with __builtin_shuffle instead for more portability, but I was under the misapprehension that this was a clang-only intrinsic when I first played with the code so I never bothered to try that, and this code has passed out of my memory long enough that I don't want to try to mess with it now.

In short, compilers know things about optimizing for modern architectures that many general programmers don't. Compilers may have issues with autovectorization, but the existence of vector intrinsics allow you to force compilers to use vectorization while still giving them leeway to make decisions about instruction scheduling or code alignment which are easy to screw up in hand-written assembly. Also, compilers are liable to get better in the future, whereas hand-written assembly code is unlikely to get faster in the future. So only write assembly code if you really know what you're doing and you know you're better than the compiler.

April 03, 2014 04:52 PM

March 21, 2014

Benjamin Smedberg (bsmedberg)

Using Software Copyright To Benefit the Public

Imagine a world where copyright on a piece of software benefits the world even after it expires. A world where eventually all software becomes Free Software.

The purpose of copyright is “To promote the Progress of Science and useful Arts”. The law gives a person the right to profit from their creation for a while, after which everyone gets to profit from it freely. In general, this works for books, music, and other creative works. The current term of copyright is far too long, but at least once the term is up, the whole world gets to read and love Shakespeare or Walter de la Mare equally.

The same is not true of software. In order to be useful, software has to run. Imagine the great commercial software of the past decade: Excel, Photoshop, Pagemaker. Even after copyright expires on Microsoft Excel 95 (in 2090!), nobody will be able to run it! Hardware that can run Windows 95 will not be available, and our only hope of running the software is to emulate the machines and operating systems of a century ago. There will be no opportunity to fix or improve the software.

What should we reasonably require from commercial software producers in exchange for giving them copyright protection?

The code.

In order to get any copyright protection at all, publishers should be required to make the source code available. This can either happen immediately at release, or by putting the code into escrow until copyright expires. This needs to include everything required to build the program and make it run, but since the same copyright rules would apply to operating systems and compilers, it ought to all just work.

The copyright term for software also needs to be rethought. The goal when setting a copyright term should be to balance the competing desires of giving a software author time to make money by selling software, with the natural rights of people to share ideas and use and modify their own tools.

With a term of 14 years, the following software would be leaving copyright protection around now:

A short copyright term is an incentive to software developers to constantly improve their software, and make the new versions of their software more valuable than older versions which are entering the public domain. It also opens the possibility for other companies to support old software even after the original author has decided that it isn’t worthwhile.

The European Union is currently holding a public consultation to review their copyright laws, and I’ve encouraged Mozilla to propose source availability and a shorter copyright term for software in our official contribution/proposal to that process. Maybe eventually the U.S. Congress could be persuaded to make such significant changes to copyright law, although recent history and powerful money and lobbyists make that difficult to imagine.

Commercial copyrighted software has done great things, and there will continue to be an important place in the world for it. Instead of treating the four freedoms as ethical absolutes and treating non-Free software as a “social problem”, let’s use copyright law to, after a period of time, make all software Free Software.

March 21, 2014 05:43 PM

March 14, 2014

Joshua Cranmer (jcranmer)

Understanding email charsets

Several years ago, I embarked on a project to collect the headers of all the messages I could reach on NNTP, with the original intent of studying the progression of the most common news clients. More recently, I used this dataset to attempt to discover the prevalence of charsets in email messages. In doing so, I identified a critical problem with the dataset: since it only contains headers, there is very little scope for actually understanding the full, sad story of charsets. So I've decided to rectify this problem.

This time, I modified my data-collection scripts to make it much easier to mass-download NNTP messages. The first script effectively lists all the newsgroups, and then all the message IDs in those newsgroups, stuffing the results in a set to remove duplicates (cross-posts). The second script uses Python's nntplib package to attempt to download all of those messages. Of the 32,598,261 messages identified by the first script, I succeeded in obtaining 1,025,586 messages in full or in part. Some messages failed to download due to nntplib crashing (it appears to be unable to handle messages of unbounded length), and I suspect my newsserver connections may have just timed out in the middle of the download at times. Others failed due to expiring before I could download them. All in all, 19,288 of the messages I attempted could not be downloaded.
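
The collection scripts themselves aren't reproduced here, but the two-step process looks roughly like this with Python's nntplib (the server name is a placeholder, and nntplib has since been removed from the standard library in Python 3.12):

    # Rough sketch of the two-step collection described above, not the actual scripts.
    import nntplib

    def collect_message_ids(server="news.example.invalid"):
        ids = set()                              # a set folds away cross-posted duplicates
        with nntplib.NNTP(server) as conn:
            _, groups = conn.list()
            for info in groups:
                _, _, first, last, _ = conn.group(info.group)
                _, overviews = conn.over((first, last))
                ids.update(ov["message-id"] for _, ov in overviews)
        return ids

    def download(server, message_ids):
        with nntplib.NNTP(server) as conn:
            for mid in message_ids:
                try:
                    _, article = conn.article(mid)
                    yield mid, b"\r\n".join(article.lines)
                except nntplib.NNTPError:
                    pass                         # expired or otherwise undownloadable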

Analysis of the contents of messages was hampered by a strong desire to find techniques that would mangle messages as little as possible. Prior experience with Python's message-parsing libraries leads me to believe that they are rather poor at handling some of the crap that comes into existence, and the errors in nntplib suggest they haven't fixed them yet. The only message parsing framework I truly trust to give me the level of finesse I need is the JSMime that I'm writing, but that happens to be in the wrong language for this project. After reading some blog posts by Jeffrey Stedfast, though, I decided I would give GMime a try instead of trying to rewrite ad-hoc MIME parser #N.

Ultimately, I wrote a program to investigate the following questions on how messages operate in practice:

While those were the questions I originally sought answers to, I did come up with others as I worked on my tool, some prompted by information I was already collecting anyway. The tool I wrote primarily uses GMime to convert the body parts to 8-bit text (no charset conversion), as well as to parse the Content-Type headers, which is really annoying to do without writing a full parser. I used ICU to handle charset conversion and detection. RFC 2047 decoding is done largely by hand since I needed very specific information that I couldn't convince GMime to give me. All code that I used is available upon request; the exact dataset is harder to transport, given that it is some 5.6GiB of data.

Other than GMime being built on GObject and exposing a C API, I can't complain much, although I didn't try to use it to do magic. Then again, in my experience (and as this post will probably convince you as well), you really want your MIME library to do charset magic for you, so in doing well for my needs, it's actually not doing well for a larger audience. ICU's C API similarly makes me want to complain. However, I'm now very suspicious of the quality of its charset detection code, which is the main reason I used it. Trying to figure out how to get it to handle charset decoding errors also proved far more annoying than it really should have been.

Some final background regards the biases I expect to crop up in the dataset. As the approximately 1 million messages were drawn from the Python set iterator, I suspect that there's no systematic bias towards or away from specific groups, excepting that the ~11K messages found in the eternal-september.* hierarchy are completely represented. The newsserver I used, Eternal September, has a respectably large set of newsgroups, although it is likely to be biased towards European languages and to under-represent East Asian ones. The less well-connected South America, Africa, and central Asia are going to be almost completely unrepresented. The download process will also be biased away from particularly heinous messages (such as those with exceedingly long lines), since nntplib itself is failing on them.

This being news messages, I also expect that use of 8-bit text will be far more common than would be the case in regular mail messages. On a related note, the use of 8-bit in headers would be commensurately elevated compared to normal email. What would be far less common is HTML. I also expect that undeclared charsets may be slightly more common.

Charsets

Charset data is mostly collected on the basis of individual body parts within messages; some messages have more than one part. Interestingly enough, the 1,025,587 messages yielded only 1,016,765 body parts with some text data, which indicates that either some messages on the server had only headers in the first place or the download process somehow managed to grab only the headers. There were also 393 messages that I identified as having parts with different charsets, which only further illustrates how annoying charsets are in messages.

The aliases in charsets are mostly uninteresting in variance, except for the various labels used for US-ASCII (us - ascii, 646, and ANSI_X3.4-1968 are the less-well-known aliases), as well as the list of charsets whose names ICU was incapable of recognizing, given below. Unknown charsets are treated as equivalent to undeclared charsets in further processing, as there were too few to merit separate handling (45 in all).

For the next step, I used ICU to attempt to detect the actual charset of the body parts. ICU's charset detector doesn't support the full gamut of charsets, though, so charset names not claimed to be detected were instead processed by checking if they decoded without error. Before using this detection, I detect if the text is pure ASCII (excluding control characters, to enable charsets like ISO-2022-JP, and +, if the charset we're trying to check is UTF-7). ICU has a mode which ignores all text in things that look like HTML tags, and this mode is set for all HTML body parts.
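
Restated in Python, that pre-check is something like the following (my paraphrase of the rule just described, not the tool's code):

    # Paraphrase of the ASCII pre-check described above: a part only counts as
    # pure ASCII if every byte is plain printable ASCII (tab/CR/LF allowed).
    # Other control bytes (such as the ESC used by ISO-2022-JP) disqualify it,
    # and '+' disqualifies it when the charset being checked is UTF-7.
    def is_pure_ascii(data: bytes, declared: str) -> bool:
        for byte in data:
            if byte >= 0x80:
                return False
            if byte < 0x20 and byte not in (0x09, 0x0A, 0x0D):
                return False
            if byte == ord('+') and declared.upper() == "UTF-7":
                return False
        return True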

I don't quite believe ICU's charset detection results, so I've collapsed the results into a simpler table to capture the most salient feature. The correct column indicates the cases where the detected result was the declared charset. The ASCII column captures the fraction which were pure ASCII. The UTF-8 column indicates if ICU reported that the text was UTF-8 (it always seems to try this first). The Wrong C1 column refers to an ISO-8859-1 text being detected as windows-1252 or vice versa, which is set by ICU if it sees or doesn't see an octet in the appropriate range. The other column refers to all other cases, including invalid cases for charsets not supported by ICU.

Declared | Correct | ASCII | UTF-8 | Wrong C1 | Other | Total
ISO-8859-1 | 230,526 | 225,667 | 883 | 8,119 | 1,035 | 466,230
Undeclared |  | 148,054 | 1,116 |  | 37,626 | 186,796
UTF-8 | 75,674 | 37,600 |  |  | 1,551 | 114,825
US-ASCII |  | 98,238 | 0 |  | 304 | 98,542
ISO-8859-15 | 67,529 | 18,527 | 0 |  |  | 86,056
windows-1252 | 21,414 | 4,370 | 154 | 3,319 | 130 | 29,387
ISO-8859-2 | 18,647 | 2,138 | 70 | 71 | 2,319 | 23,245
KOI8-R | 4,616 | 424 | 2 |  | 1,112 | 6,154
GB2312 | 1,307 | 59 | 0 |  | 112 | 1,478
Big5 | 622 | 608 | 0 |  | 174 | 1,404
windows-1256 | 343 | 10 | 0 |  | 45 | 398
IBM437 | 84 | 257 | 0 |  |  | 341
ISO-8859-13 | 311 | 6 | 0 |  |  | 317
windows-1251 | 131 | 97 | 1 |  | 61 | 290
windows-1250 | 69 | 69 | 0 | 14 | 101 | 253
ISO-8859-7 | 26 | 26 | 0 | 0 | 131 | 183
ISO-8859-9 | 127 | 11 | 0 | 0 | 17 | 155
ISO-2022-JP | 76 | 69 | 0 |  | 3 | 148
macintosh | 67 | 57 | 0 |  |  | 124
ISO-8859-16 | 0 | 15 |  |  | 101 | 116
UTF-7 | 51 | 4 | 0 |  |  | 55
x-mac-croatian | 0 | 13 |  |  | 25 | 38
KOI8-U | 28 | 2 | 0 |  |  | 30
windows-1255 | 0 | 18 | 0 | 0 | 6 | 24
ISO-8859-4 | 23 | 0 | 0 |  |  | 23
EUC-KR | 0 | 3 | 0 |  | 16 | 19
ISO-8859-14 | 14 | 4 | 0 |  |  | 18
GB18030 | 14 | 3 | 0 | 0 |  | 17
ISO-8859-8 | 0 | 0 | 0 | 0 | 16 | 16
TIS-620 | 15 | 0 | 0 |  |  | 15
Shift_JIS | 8 | 4 | 0 |  | 1 | 13
ISO-8859-3 | 9 | 1 |  |  | 1 | 11
ISO-8859-10 | 10 | 0 | 0 |  |  | 10
KSC_5601 | 3 | 6 | 0 |  |  | 9
GBK | 4 | 2 | 0 |  |  | 6
windows-1253 | 0 | 3 | 0 | 0 | 2 | 5
ISO-8859-5 | 1 | 0 | 0 |  | 3 | 4
IBM850 | 0 | 4 | 0 |  |  | 4
windows-1257 | 0 | 3 | 0 |  |  | 3
ISO-2022-JP-2 | 2 | 0 | 0 |  |  | 2
ISO-8859-6 | 0 | 1 | 0 | 0 |  | 1
Total | 421,751 | 536,373 | 2,226 | 11,523 | 44,892 | 1,016,765

The most obvious thing shown in this table is that the most common charsets remain ISO-8859-1, Windows-1252, US-ASCII, UTF-8, and ISO-8859-15, which is to be expected, given an expected prior bias to European languages in newsgroups. The low prevalence of ISO-2022-JP is surprising to me: it means a lower incidence of Japanese than I would have expected. Either that, or Japanese have switched to UTF-8 en masse, which I consider very unlikely given that Japanese have tended to resist the trend towards UTF-8 the most.

Beyond that, this dataset has caused me to lose trust in the ICU charset detectors. KOI8-R is recorded as being 18% malformed text, with most of that being text ICU believes to be ISO-8859-1 instead. Judging from the results, it appears that ICU has a bias towards guessing ISO-8859-1, which means I don't believe the numbers in the Other column to be accurate at all. For some reason, I don't appear to have decoders for ISO-8859-16 or x-mac-croatian on my local machine, but running some tests by hand appears to indicate that the text really is valid in those declared charsets.

Somewhere between 0.1% and 1.0% of all messages are subject to mojibake, depending on how much you trust the charset detector. The cases of UTF-8 being misdetected as non-UTF-8 could potentially be explained by having very few non-ASCII sequences (ICU requires four valid sequences before it confidently declares text UTF-8); someone who writes a post in English but has a non-ASCII signature (such as myself) could easily fall into this category. Despite this, however, it does suggest that there is enough mojibake around that users need to be able to override charset decisions.

The undeclared charsets are described, in descending order of popularity, by ISO-8859-1, Windows-1252, KOI8-R, ISO-8859-2, and UTF-8, describing 99% of all non-ASCII undeclared data. ISO-8859-1 and Windows-1252 are probably over-counted here, but the interesting tidbit is that KOI8-R is used half as much undeclared as it is declared, and I suspect it may be undercounted. The practice of using locale-default fallbacks that Thunderbird has been using appears to be the best way forward for now, although UTF-8 is growing enough in popularity that using a specialized detector that decodes as UTF-8 if possible may be worth investigating (3% of all non-ASCII, undeclared messages are UTF-8).
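
The "decode as UTF-8 if possible, otherwise fall back" detector mentioned above is only a few lines; the fallback charset below is an arbitrary example rather than a recommendation.

    # Sketch of a UTF-8-first fallback decoder: strict UTF-8 decoding fails loudly
    # on anything that isn't valid UTF-8, so it makes a safe first guess.
    def decode_undeclared(data: bytes, fallback: str = "windows-1252") -> str:
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            return data.decode(fallback, errors="replace")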

HTML

Unsurprisingly (considering I'm polling newsgroups), very few messages contained any HTML parts at all: there were only 1,032 such parts in the total sample size, of which only 552 had non-ASCII characters and were therefore useful for the rest of this analysis. This means that I'm skeptical of generalizing these results to email in general, but I'll still summarize the findings.

HTML, unlike plain text, contains a mechanism to explicitly identify the charset of a message. The official algorithm for determining the charset of an HTML file can be described simply as "look for a <meta> tag in the first 1024 bytes. If it can be found, attempt to extract a charset using one of several different techniques depending on what's present or not." Since doing this fully properly is complicated in library-less C++ code, I opted to look first for a <meta[ \t\r\n\f] production, guess the extent of the tag, and try to find a charset= string somewhere in that tag. This appears to be an approach which is more reflective of how this parsing is actually done in email clients than the proper HTML algorithm. One difference is that my regular expressions also support the newer <meta charset="UTF-8"/> construct, although I don't appear to see any use of this.
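
In Python terms, the sniffing amounts to something like this simplified sketch (not the tool's actual C++ code):

    # Simplified sketch of the <meta> charset sniff described above: look at the
    # first 1024 bytes, find <meta ...> tags, and pull out a charset= value.
    import re

    META_TAG = re.compile(rb"<meta[ \t\r\n\f][^>]*>", re.IGNORECASE)
    CHARSET = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)

    def sniff_html_charset(body: bytes):
        for tag in META_TAG.finditer(body[:1024]):
            found = CHARSET.search(tag.group(0))
            if found:
                return found.group(1).decode("ascii", "replace")
        return None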

I found only 332 parts where the HTML declared a charset. Only 22 parts had both a MIME charset and an HTML charset that disagreed with each other. I neglected to count how many messages had HTML charsets but no MIME charsets, but random sampling appeared to indicate that this is very rare in the data set (the same order of magnitude or less as the cases where they disagreed).

As for the question of who wins: of the 552 non-ASCII HTML parts, only 71 did not have the MIME charset be the detected (valid) charset. Then again, 71 did not have the HTML charset be valid either, which strongly suggests that ICU was detecting the incorrect charset in those cases. Judging from manual inspection of such messages, it appears that the MIME charset ought to be preferred if it exists. There are also a large number of HTML charset specifications saying unicode, which ICU treats as UTF-16, which is most certainly wrong.

Headers

In the data set, 1,025,856 header blocks were processed for the following statistics. This is slightly more than the number of messages, since the headers of contained message/rfc822 parts were also processed. The good news is that 97% (996,103) of the headers were completely ASCII. Of the remaining 29,753 headers, 3.6% (1,058) were UTF-8 and 43.6% (12,965) matched the declared charset of the first body part. This leaves 52.9% (15,730) that did not match that charset, however.

Now, NNTP messages can generally be expected to have a higher 8-bit header ratio, so this is probably exaggerating the setup in most email messages. That said, the high incidence is definitely an indicator that even non-EAI-aware clients and servers cannot blindly presume that headers are 7-bit, nor can EAI-aware clients and servers presume that 8-bit headers are UTF-8. The high incidence of mismatching the declared charset suggests that fallback-charset decoding of headers is a necessary step.

RFC 2047 encoded-words is also an interesting statistic to mine. I found 135,951 encoded-words in the data set, which is rather low, considering that messages can be reasonably expected to carry more than one encoded-word. This is likely an artifact of NNTP's tendency towards 8-bit instead of 7-bit communication and understates their presence in regular email.

Counting encoded-words can be difficult, since there is a mechanism to let them continue in multiple pieces. For the purposes of this count, a sequence of such words counts as a single word, and the Continued column indicates the number of sequences that had more than one element. The 2047 Violation column counts the number of sequences where decoding the words individually does not yield the same result as decoding them as a whole, in violation of RFC 2047. The Only ASCII column counts those words containing nothing but ASCII symbols, where the encoding was thus (mostly) pointless. The Invalid column counts the number of sequences that had a decoder error.

CharsetCountContinued2047 ViolationOnly ASCIIInvalid
ISO-8859-156,35515,6104990
UTF-836,56314,2163,3112,7049,765
ISO-8859-1520,6995,695400
ISO-8859-211,2472,66990
windows-12525,1743,075260
KOI8-R3,5231,203120
windows-125676556800
Big551146280171
ISO-8859-71652603
windows-12511573020
GB2312126356051
ISO-2022-JP10285049
ISO-8859-13784500
ISO-8859-9762100
ISO-8859-471200
windows-1250682100
ISO-8859-5662000
US-ASCII3810380
TIS-620363400
KOI8-U251100
ISO-8859-16221022
UTF-7172183
EUC-KR174409
x-mac-croatian103010
Shift_JIS80003
Unknown7207
ISO-2022-KR70000
GB1803061001
windows-12554000
ISO-8859-143000
ISO-8859-32100
GBK20002
ISO-8859-61100
Total135,95143,3603,3613,33810,096

This table somewhat mirrors the distribution of regular charsets, with one major class of differences: charsets that represent non-Latin scripts (particularly Asian scripts) appear to be over-represented compared to their corresponding use in body parts. The exception to this rule is GB2312, which is far lower than its relative ranking would suggest; I attribute this to people using GB2312 being more likely to use 8-bit headers instead of RFC 2047 encoding, although I don't have direct evidence.

Clearly continuations are common, which is to be relatively expected. The sad part is how few people bother to try to adhere to the specification here: out of 14,312 continuations in languages that could violate the specification, 23.5% of them violated the specification. The mode-shifting versions (ISO-2022-JP and EUC-KR) are basically all violated, which suggests that no one bothered to check if their encoder "returns to ASCII" at the end of the word (I know Thunderbird's does, but the other ones I checked don't appear to).

The number of UTF-8 encoded-words that failed to decode, 26.7%, seems impossibly high to me. A brief check of my code indicates that it is working incorrectly in the face of invalid continuations, which certainly exaggerates the effect but still leaves a value too high for my tastes. Of more note are the elevated counts for the East Asian charsets: Big5, GB2312, and ISO-2022-JP. I am not an expert in charsets, but I believe that Big5 and GB2312 in particular are each a family of almost-but-not-quite-identical charsets, and it may be that ICU is choosing the wrong candidate of each family for these instances.

There is a surprisingly large number of encoded words that encode only ASCII. When searching specifically for the ones that use the US-ASCII charset, I found that these can be divided into three categories. One set comes from a few people who apparently have an unsanitized whitespace (space and LF were the two I recall seeing) in the display name, producing encoded words like =?us-ascii?Q?=09Edward_Rosten?=. Blame 40tude Dialog here. Another set encodes some basic characters (most commonly = and ?, although a few other interpreted characters popped up). The final set of errors were double-encoded words, such as =?us-ascii?Q?=3D=3FUTF-8=3FQ=3Ff=3DC3=3DBCr=3F=3D?=, which appear to be all generated by an Emacs-based newsreader.
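
The double-encoded case is easy to reproduce with Python's email.header module: one pass of RFC 2047 decoding just uncovers another encoded-word.

    # Decoding the double-encoded example from above with the standard library:
    # the first pass yields another encoded-word, which needs a second pass.
    from email.header import decode_header

    raw = "=?us-ascii?Q?=3D=3FUTF-8=3FQ=3Ff=3DC3=3DBCr=3F=3D?="
    (inner, charset), = decode_header(raw)
    print(inner.decode(charset))                    # =?UTF-8?Q?f=C3=BCr?=
    (text, charset2), = decode_header(inner.decode(charset))
    print(text.decode(charset2))                    # für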

One interesting thing when sifting the results is finding the crap that people produce in their tools. By far the worst single instance of an RFC 2047 encoded-word that I found is this one: Subject: Re: [Kitchen Nightmares] Meow! Gordon Ramsay Is =?ISO-8859-1?B?UEgR lqZ VuIEhlYWQgVH rbGeOIFNob BJc RP2JzZXNzZW?= With My =?ISO-8859-1?B?SHVzYmFuZ JzX0JhbGxzL JfU2F5c19BbXiScw==?= Baking Company Owner (complete with embedded spaces), discovered by crashing my ad-hoc base64 decoder (due to the spaces). The interesting thing is that even after investigating the output encoding, it doesn't look like the text is actually correct ISO-8859-1... or any obvious charset for that matter.

I looked at the unknown charsets by hand. Most of them were actually empty charsets (looked like =??B?Sy4gSC4gdm9uIFLDvGRlbg==?=), and all but one of the outright empty ones were generated by KNode and really UTF-8. The other one was a Windows-1252 generated by a minor newsreader.

Another important aspect of headers is how to handle 8-bit headers. RFC 5322 blindly hopes that headers are pure ASCII, while RFC 6532 dictates that they are UTF-8. Indeed, 97% of headers are ASCII, leaving just 29,753 headers that are not. Of these, only 1,058 (3.6%) are UTF-8 per RFC 6532. Deducing which charset they are in is difficult, because the large amount of English text in header names and important control values will greatly skew any charset detector, and there is too little text to give a detector confidence. The only metric I could easily apply was testing Thunderbird's heuristic of "the header blocks are the same charset as the message contents", which only worked 45.2% of the time.

Encodings

While developing an earlier version of my scanning program, I was intrigued to know how often various content transfer encodings were used. I found 1,028,971 parts in all (1,027,474 of which are text parts). The transfer encoding of binary did manage to sneak in, with 57 such parts. Using 8-bit text was very popular, at 381,223 samples, second only to 7-bit at 496,114 samples. Quoted-printable had 144,932 samples and base64 only 6,640 samples. Extremely interesting is the presence of 4 illegal transfer encodings across 5 messages: two of them are obvious typos, and the others appear to be a client mangling header continuations into the transfer encoding.

Conclusions

So, drawing from the body of this data, I would like to make the following conclusions as to using charsets in mail messages:

  1. Have a fallback charset. Undeclared charsets are extremely common, and I'm skeptical that charset detectors are going to get this stuff right, particularly since email can more naturally combine multiple languages than other bodies of text (think signatures). Thunderbird currently uses a locale-dependent fallback charset, which roughly mirrors what Firefox and I think most web browsers do.
  2. Let users override charsets when reading. On a similar token, mojibake text, while not particularly common, is common enough to make declared charsets sometimes unreliable. It's also possible that the fallback charset is wrong, so users may need to override the chosen charset.
  3. Testing is mandatory. In this set of messages, I found base64 encoded words with spaces in them, encoded words without charsets (even UNKNOWN-8BIT), and clearly invalid Content-Transfer-Encodings. Real email messages that are flagrantly in violation of basic spec requirements exist, so you should make sure that your email parser and client can handle the weirdest edge cases.
  4. Non-UTF-8, non-ASCII headers exist. EAI notwithstanding, 8-bit headers are a reality. Combined with a predilection for saying ASCII when text is really ASCII, this means that there is often no good in-band information to tell you what charset is correct for headers, so you have to go back to a fallback charset.
  5. US-ASCII really means ASCII. Email clients appear to do a very good job of only emitting US-ASCII as a charset label if it's US-ASCII. The sample size is too small for me to grasp what charset 8-bit characters should imply in US-ASCII.
  6. Know your decoders. ISO-8859-1 actually means Windows-1252 in practice. Big5 and GB2312 are actually small families of charsets with slightly different meanings. ICU notably disagrees with some of these realities, so be sure to include various charset edge cases in your tests so you know that the decoders are correct.
  7. UTF-7 is still relevant. Of the charsets I found not mentioned in the WHATWG encoding spec, IBM437 and x-mac-croatian are in use only due to specific circumstances that limit their generalizable presence. IBM850 is too rare. UTF-7 is common enough that you need to actually worry about it, as abominable and evil a charset as it is.
  8. HTML charsets may matter—but MIME matters more. I don't have enough data to say if charsets declared in HTML are needed to do proper decoding. I do have enough to say fairly conclusively that the MIME charset declaration is authoritative if HTML disagrees.
  9. Charsets are not languages. The entire reason x-mac-croatian is used at all can be traced to Thunderbird displaying the charset as "Croatian," despite it being pretty clearly not a preferred charset. Similarly most charsets are often enough ASCII that, say, an instance of GB2312 is a poor indicator of whether or not the message is in English. Anyone trying to filter based on charsets is doing a really, really stupid thing.
  10. RFCs reflect an ideal world, not reality. This is most notable in RFC 2047: the specification may state that encoded words are supposed to be independently decodable, but the evidence is pretty clear that more clients break this rule than uphold it.
  11. Limit the charsets you support. Just because your library lets you emit a hundred charsets doesn't mean that you should let someone try to do it. You should emit US-ASCII or UTF-8 unless you have a really compelling reason not to, and those compelling reasons don't require obscure charsets. Some particularly annoying charsets should never be written: EBCDIC is already basically dead on the web, and I'd like to see UTF-7 die as well.

When I have time, I'm planning on taking some of the more egregious or interesting messages in my dataset and packaging them into a database of emails to help create testsuites on handling messages properly.

March 14, 2014 04:17 AM

March 10, 2014

Benjamin Smedberg (bsmedberg)

Use -debugexe to debug apps in Visual Studio

Many people don’t know about how awesome the windows debuggers are. I recently got a question from a volunteer mentee: he was experiencing a startup crash in Firefox and he wanted to know how to get the debugger attached to Firefox before the crash.

On other systems, I’d say to use mach debug, but that currently doesn’t do useful things on Windows. But it’s still pretty simple. You have two options:

Debug Using Your IDE

Both Visual Studio and Visual C++ Express have a command-line option for launching the IDE ready for debugging.

devenv.exe -debugexe obj-ff-debug/dist/bin/firefox.exe -profile /c/builds/test-profile -no-remote

The -debugexe flag tells the IDE to load your Firefox build with the command-line arguments you specify. Firefox will launch with the “Go” command (F5).

For Visual C++ express edition, run WDExpress.exe instead of devenv.exe.

Debug Using Windbg

windbg is the Windows command-line debugger. As with any command-line debugger it has an arcane debugging syntax, but it is very powerful.

Launching Firefox with windbg doesn’t require any flags at all:

windbg.exe obj-ff-debug/dist/bin/firefox.exe -profile /c/builds/test-profile -no-remote

Debugging Firefox Release Builds

You can also debug Firefox release builds on Windows! Mozilla runs a symbol server that allows you to automatically download the debugging symbols for recent prerelease builds (I think we keep 30 days of nightly/aurora symbols) and all release builds. See the Mozilla Developer Network article for detailed instructions.

Debugging official builds can be a bit confusing due to inlining, reordering, and other compiler optimizations. I often find myself looking at the disassembly view of a function rather than the source view in order to understand what exactly is going on. Also note that if you are planning on debugging a release build, you probably want to disable automatic crash reporting by setting MOZ_CRASHREPORTER_DISABLE=1 in your environment.

March 10, 2014 02:27 PM

February 11, 2014

Benjamin Smedberg (bsmedberg)

Don’t Use Mozilla Persona to Secure High-Value Data

Mozilla Persona (formerly called Browser ID) is a login system that Mozilla has developed to make it better for users to sign in at sites without having to remember passwords. But I have seen a trend recently of people within Mozilla insisting that we should use Persona for all logins. This is a mistake: the security properties of Persona are simply not good enough to secure high-value data such as the Mozilla security bug database, user crash dumps, or other high-value information.

The chain of trust in Persona has several attack points:

The Public Key: HTTPS Fetch

When the user submits a login “assertion”, the website (Relying Party or RP) fetches the public key of the email provider (Identity Provider or IdP) using HTTPS. For instance, when I log in as benjamin@smedbergs.us, the site I’m logging into will fetch https://smedbergs.us/.well-known/browserid. This relies on the public key and CA infrastructure of the internet. Attacking this part of the chain is hard because it’s the network connection between two servers. This doesn’t appear to be a significant risk factor to me except for perhaps some state actors.
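
That fetch is just an HTTPS GET of a JSON document at a fixed path; sketched in Python below (this is not Mozilla's verifier code, and the key name printed is only what the support document is expected to contain).

    # Sketch of the Relying Party's first step: fetch the IdP's support document
    # over HTTPS. Not Mozilla's verifier; error handling and caching omitted.
    import json
    import urllib.request

    def fetch_browserid_support(domain: str) -> dict:
        url = "https://%s/.well-known/browserid" % domain
        with urllib.request.urlopen(url) as response:
            return json.load(response)

    doc = fetch_browserid_support("smedbergs.us")
    print(doc.get("public-key"))                 # key used to verify the IdP's signatures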

The Public Key: Attacking the IdP HTTPS Server

Attacking the email provider’s web server, on the other hand, becomes a very high value proposition. If an attacker can replace the .well-known/browserid file on a major email provider (gmail, yahoo, etc) they have the ability to impersonate every user of that service. This puts a huge responsibility on email providers to monitor and secure their HTTPS site, which may not typically be part of their email system at all. It is likely that this kind of intrusion will cause signin problems across multiple users and will be detected, but there is no guarantee that individual users will be aware of the compromise of their accounts.

Signing: Accessing the IdP Signing System

Persona email providers can silently impersonate any of their users just by the nature of the protocol. This opens the door to silent identity attacks by anyone who can access the private key of the identity/email provider. This can either be subverting the signing server, or by using legal means such as subpoenas or national security letters. In these cases, the account compromise is almost completely undetectable by either the user or the RP.

What About Password-Reset Emails?

One common defense of Persona is that email providers already have access to users' accounts via password-reset emails. This is partly true, but it ignores an essential property of these emails: when a password is reset, a user will be aware of the attack the next time they try to log in. Being unable to log in will likely trigger a cautious user to review the details of their account or ask for an audit. Attacks against the IdP, on the other hand, are silent and are not as likely to trigger alarm bells.

Who Should Use Persona?

Persona is a great system for the multitude of lower-value accounts people keep on the internet. Persona is the perfect solution for the Mozilla Status Board. I wish the UI were better and built into the browser: the current UI requires JS, shim libraries, and popup windows, and it is not a great experience. But the tradeoff for not having to store and handle passwords on the server is worth that small amount of pain.

For any site with high-value data, Persona is not a good choice. On bugzilla.mozilla.org, we disabled password reset emails for users with access to security bugs. This decision indicates that Persona should also be considered an unacceptable security risk for these users. Persona as a protocol doesn't have the right security properties.

It would be very interesting to combine Persona with some other authentication system such as client certificates or a two-factor system. This would allow most users to use the simple login system, while providing extra security properties when users start to access high-value resources.

In the meantime, Mozilla should be careful how it promotes and uses Persona; it’s not a universal solution and we should be careful not to bill it as one.

February 11, 2014 04:19 PM

February 01, 2014

Joshua Cranmer (jcranmer)

Why email is hard, part 5: mail headers

This post is part 5 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. Part 3 discusses MIME. Part 4 discusses email addresses. This post discusses the more general problem of email headers.

Back in my first post, Ludovic kindly posted, in a comment, a link to a talk of someone else's email rant. And the best place to start this post is with a quote from that talk: "If you want to see an email programmer's face turn red, ask him about CFWS." CFWS is an acronym that stands for "comments and folded whitespace," and I can attest that the mere mention of CFWS is enough for me to start ranting. Comments in email headers are spans of text wrapped in parentheses, and the folding of whitespace refers to the ability to continue headers on multiple lines by inserting a newline before (but not in lieu of) a space.

I'll start by pointing out that there is little advantage to adding in free-form data to headers which are not going to be manually read in the vast majority of cases. In practice, I have seen comments used for only three headers on a reliable basis. One of these is the Date header, where a human-readable name of the timezone is sometimes included. The other two are the Received and Authentication-Results headers, where some debugging aids are thrown in. There would be no great loss in omitting any of this information; if information is really important, appending an X- header with that information is still a viable option (that's where most spam filtration notes get added, for example).

For a feature of questionable utility in the first place, the impact it has on parsing message headers is enormous. RFC 822 is specified in a manner that is familiar to anyone who reads language specifications: there is a low-level lexical scanning phase which feeds tokens into a secondary parsing phase. As in programming languages, comments and white space are semantically meaningless [1]. Unlike in programming languages, however, comments can be nested, and therefore lexing an email header is not regular [2]. The problems of folding (a necessary evil thanks to the line length limit I keep complaining about) pale in comparison to comments, but it's extra complexity that makes machine-readability more difficult.
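
A tiny sketch shows why nested comments push header lexing beyond regular expressions: you need a depth counter (this ignores quoted strings and backslash escapes, which a real parser must handle).

    # Why CFWS comments defeat a plain regular expression: parentheses nest, so
    # stripping them needs a depth counter. Quoted strings and quoted-pairs are
    # deliberately ignored in this sketch.
    def strip_comments(value: str) -> str:
        out, depth = [], 0
        for ch in value:
            if ch == '(':
                depth += 1
            elif ch == ')' and depth:
                depth -= 1
            elif depth == 0:
                out.append(ch)
        return ''.join(out)

    print(strip_comments("Fri, 21 Nov 1997 09:55:06 -0600 (CST (central))"))
    # prints "Fri, 21 Nov 1997 09:55:06 -0600 " (note the leftover trailing space)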

Fortunately, RFC 2822 made a drastic change to the specification that greatly limited where CFWS could be inserted into headers. For example, in the Date header, comments are allowed only following the timezone offset (and whitespace in a few specific places); in addressing headers, CFWS is not allowed within the email address itself [3]. One unanticipated downside is that it makes reading the other RFCs that specify mail headers more difficult: any version that predates RFC 2822 uses the syntax assumptions of RFC 822 (in particular, CFWS may occur between any listed tokens), whereas RFC 2822 and its descendants all explicitly enumerate where CFWS may occur.

Beyond the issues with CFWS, though, syntax is still problematic. The separation of distinct lexing and parsing phases means that you almost see what may be a hint of uniformity, which turns out to be an ephemeral illusion. For example, the header parameters defined in RFC 2045 for Content-Type and Content-Disposition set a tradition of ;-separated param=value attributes, which has been picked up by, say, the DKIM-Signature or Authentication-Results headers. Except that a close look indicates that Authentication-Results allows two param=value pairs between semicolons. Another side effect was pointed out in my second post: you can't turn a generic 8-bit header into a 7-bit compatible header, since you can't tell, without knowing the syntax of the header, which parts can be specified as RFC 2047 encoded-words and which ones can't.

There's more to headers than their syntax, though. Email headers are structured as a somewhat-unordered list of headers; this genericity gives rise to a very large number of headers, and that's just the list of official headers. There are unofficial headers whose use is generally agreed upon, such as X-Face, X-No-Archive, or X-Priority; other unofficial headers are used for internal tracking such as Mailman's X-BeenThere or Mozilla's X-Mozilla-Status headers. Choosing how to semantically interpret these headers (or even which headers to interpret!) can therefore be extremely daunting.

Some of the headers are specified in ways that would seem surprising to most users. For example, the venerable From header can represent anywhere from 0 mailboxes [4] to an arbitrarily large number, but most clients assume that only one exists. It's also worth noting that the Sender header is (if present) a better indication of message origin as far as tracing is concerned [5], but its relative rarity likely results in filtering applications not taking it into account. The suite of Resent-* headers also experiences similar issues.

Another impact of email headers is the degree to which they can be trusted. RFC 5322 gives some nice-sounding platitudes to how headers are supposed to be defined, but many of those interpretations turn out to be difficult to verify in practice. For example, Message-IDs are supposed to be globally unique, but they turn out to be extremely lousy UUIDs for emails on a local system, even if you allow for minor differences like adding trace headers [6].

More serious are the spam, phishing, etc. messages that lie as much as possible so as to be seen by end-users. Assuming that a message is hostile, the only header that can be actually guaranteed to be correct is the first Received header, which is added by the final user's mailserver [7]. Every other header, including the Date and From headers most notably, can be a complete and total lie. There's no real way to authenticate the headers or hide them from snoopers—this has critical consequences for both spam detection and email security.

There's more I could say on this topic (especially CFWS), but I don't think it's worth dwelling on. This is more of a preparatory post for the next entry in the series than a full compilation of complaints. Speaking of my next post, I don't think I'll be able to keep up my entirely-unintentional rate of posting one entry this series a month. I've exhausted the topics in email that I am intimately familiar with and thus have to move on to the ones I'm only familiar with.

[1] Some people attempt to be too zealous in following RFCs and ignore the distinction between syntax and semantics, as I complained about in part 4 when discussing the syntax of email addresses.
[2] I mean this in the theoretical sense of the definition. The proof that balanced parentheses is not a regular language is a standard exercise in use of the pumping lemma.
[3] Unless domain literals are involved. But domain literals are their own special category.
[4] Strictly speaking, the 0 value is intended to be used only when the email has been downgraded and the email address cannot be downgraded. Whether or not these will actually occur in practice is an unresolved question.
[5] Semantically speaking, Sender is the person who typed the message up and actually sent it out. From is the person who dictated the message. If the two headers would be the same, then Sender is omitted.
[6] Take a message that's cross-posted to two mailing lists. Each mailing list will generate copies of the message which end up being submitted back into the mail system and will typically avoid touching the Message-ID.
[7] Well, this assumes you trust your email provider. However, your email provider can do far worse to your messages than lie about the Received header…

February 01, 2014 03:57 AM

January 24, 2014

Joshua Cranmer (jcranmer)

Charsets and NNTP

Recently, the question of charsets came up within the context of necessary decoder support for Thunderbird. After much hemming and hawing about how to find this out (which included a plea to the IMAP-protocol list for data), I remembered that I actually had this data. Long-time readers of this blog may recall that I did a study several years ago on the usage share of newsreaders. After that, I was motivated to take my data collection to the most extreme way possible. Instead of considering only the "official" Big-8 newsgroups, I looked at all of them on the news server I use (effectively, all but alt.binaries). Instead of relying on pulling the data from the server for the headers I needed, I grabbed all of them—the script literally runs HEAD and saves the results in a database. And instead of a month of results, I grabbed the results for the entire year of 2011. And then I sat on the data.

After recalling Henri Sivonen's pestering about data, I decided to see the suitability of my dataset for this task. For data management reasons, I only grabbed the data from the second half of the year (about 10 million messages). I know from memory that the quality of Python's message parser (which was used to extract data in the first place) is surprisingly poor, which introduces bias of unknown consequence to my data. Since I only extracted headers, I can't identify charsets for anything which was sent as, say, multipart/alternative (which is more common than you'd think), which introduces further systematic bias. The end result is approximately 9.6M messages that I could extract charsets from and thence do further research.

Discussions revealed one particularly surprising tidbit of information. The most popular charset not accounted for by the Encoding specification was IBM437. Henri Sivonen speculated that the cause was some crufty old NNTP client on Windows using that encoding, so I endeavored to build a correlation database to check that assumption. Using the wonderful magic of d3, I produced a heatmap comparing distributions of charsets among various user agents. Details about the visualization may be found on that page, but it does refute Henri's claim when you dig into the data (it appears to be caused by specific BBS-to-news gateways, and is mostly localized in particular BBS newsgroups).

Also found on that page are some fun discoveries of just what kind of crap people try to pass off as valid headers. Some of those User-Agents are clearly spoofs (Outlook Express and family used the X-Newsreader header, not the User-Agent header). There also appears to be a fair amount of mojibake in headers (one of them appeared to be venerable double mojibake). The charsets also have some interesting labels to them: the "big5\n" and the "(null)" illustrate that some people don't double check their code very well, and not shown are the 5 examples of people who think charset names have spaces in them. A few people appear to have mixed up POSIX locales with charsets as well.

January 24, 2014 12:53 AM

January 20, 2014

Mark Finkle (mfinkle)

Firefox for Android: Page Load Performance

One of the common types of feedback we get about Firefox for Android is that it’s slow. Other browsers get the same feedback and it’s an ongoing struggle. I mean, is anything ever really fast enough?

We tend to separate performance into three categories: Startup, Page Load and UX Responsiveness. Lately, we have been focusing on the Page Load performance. We separate further into Objective (real timing) and Subjective (perceived timing). If something “feels” slow it can be just as bad as something that is measurably slow. We have a few testing frameworks that help us track objective and subjective performance. We also use Java, JavaScript and C++ profiling to look for slow code.

To start, we have been focusing on anything that is not directly part of the Gecko networking stack. This means we are looking at all the code that executes while Gecko is loading a page. In general, we want to reduce this code as much as possible. Some of the things we turned up include:

Some of these were small improvements, while others, like the proxy lookups, were significant for “desktop” pages. I’d like to expand on two of the improvements:

Predictive Networking Hints
Gecko networking has a feature called Speculative Connections, where it’s possible for the networking system to start opening TCP connections and even begin the SSL handshake before they are actually needed. We use this feature when we have a pretty good idea that a connection might be opened. We now use the feature in three cases:

Animating the Page Load Spinner
Firefox for Android has used the animated spinner as a page load indicator for a long time. We use the Android animation framework to “spin” an image. Keeping the spinner moving smoothly is pretty important for perceived performance. A stuck spinner doesn’t look good. Profiling showed a lot of time was being taken to keep the animation moving, so we did a test and removed it. Our performance testing frameworks showed a variety of improvements in both objective and perceived tests.

We decided to move to a progressbar, but not a real progressbar widget. We wanted to control the rendering. We did not want the same animation rendering issues to happen again. We also use only a handful of “trigger” points, since listening to true network progress is also time consuming. The result is an objective page load improvement that ranges from ~5% on multi-core, faster devices to ~20% on single-core, slower devices.

[Image: the Firefox for Android throbber and progress bar shown on cnn.com]

The progressbar is currently allowed to “stall” during a page load, which can be disconcerting to some people. We will experiment with ways to improve this too.

Install Firefox for Android Nightly and let us know what you think of the page load improvements.

January 20, 2014 05:22 PM

December 04, 2013

Joshua Cranmer (jcranmer)

Why email is hard, part 4: Email addresses

This post is part 4 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. Part 3 discusses MIME. This post discusses the problems with email addresses.

You might be surprised that I find email addresses difficult enough to warrant a post discussing only this single topic. However, this is a surprisingly complex topic, and one which is made much harder by the presence of a very large number of people purporting to know the answer who then proceed to do the wrong thing [1]. To understand why email addresses are complicated, and why people do the wrong thing, I pose the following challenge: write a regular expression that matches all valid email addresses and only valid email addresses. Go ahead, stop reading, and play with it for a few minutes, and then you can compare your answer with the correct answer.

 

 

 

Done yet? So, if you came up with a regular expression, you got the wrong answer. But that's because it's a trick question: I never defined what I meant by a valid email address. Still, if you're hoping for partial credit, you may be able to get some by correctly matching one of the purported definitions I give below.

The most obvious definition meant by "valid email address" is text that matches the addr-spec production of RFC 822. No regular expression can match this definition, though—and I am aware of the enormous regular expression that is often purported to solve this problem. This is because comments can be nested, which means you would need to solve the "balanced parentheses" language, which is easily provable to be non-regular [2].

Matching the addr-spec production, though, is the wrong thing to do: the production dictates the possible syntax forms an address may have, when you arguably want a more semantic interpretation. As a case in point, the two email addresses example@test.invalid and example @ test . invalid are both meant to refer to the same thing. When you ignore the actual full grammar of an email address and instead read the prose, particularly of RFC 5322 instead of RFC 822, you'll realize that matching comments and whitespace is entirely the wrong thing to do in an email address.

Here, though, we run into another problem. Email addresses are split into local-parts and the domain, the text before and after the @ character; the format of the local-part is basically either a quoted string (to escape otherwise illegal characters in a local-part), or an unquoted "dot-atom" production. The quoting is meant to be semantically invisible: "example"@test.invalid is the same email address as example@test.invalid. Normally, I would say that the use of quoted strings is an artifact of the encoding form, but given the strong appetite for aggressively "correct" email validators that attempt to blindly match the specification, it seems to me that it is better to keep the local-parts quoted if they need to be quoted. The dot-atom production matches a sequence of atoms (spans of text excluding several special characters like [ or .) separated by . characters, with no intervening spaces or comments allowed anywhere.

RFC 5322 only specifies how to unfold the syntax into a semantic value, and it does not explain how to semantically interpret the values of an email address. For that, we must turn to SMTP's definition in RFC 5321, whose semantic definition clearly imparts requirements on the format of an email address not found in RFC 5322. On domains, RFC 5321 explains that the domain is either a standard domain name [3], or it is a domain literal which is either an IPv4 or an IPv6 address. Examples of the latter two forms are test@[127.0.0.1] and test@[IPv6:::1]. But when it comes to the local-parts, RFC 5321 decides to just give up and admit no interpretation except at the final host, advising only that servers should avoid local-parts that need to be quoted. In the context of email specification, this kind of recommendation is effectively a requirement to not use such email addresses, and (by implication) most client code can avoid supporting these email addresses [4].

The prospect of internationalized domain names and email addresses throws a massive wrench into the state of affairs, however. I've talked at length in part 2 about the problems here; the lack of a definitive decision on Unicode normalization means that the future here is extremely uncertain, although RFC 6530 does implicitly advise that servers should accept that some (but not all) clients are going to do NFC or NFKC normalization on email addresses.

At this point, it should be clear that asking for a regular expression to validate email addresses is really asking the wrong question. I did it at the beginning of this post because that is how the question tends to be phrased. The real question that people should be asking is "what characters are valid in an email address?" (and more specifically, the left-hand side of the email address, since the right-hand side is obviously a domain name). The answer is simple: among the ASCII printable characters (Unicode is more difficult), all the characters but those in the following string: " \"\\[]();,@". Indeed, viewing an email address like this is exactly how HTML 5 specifies it in its definition of a format for <input type="email">.

Another, much easier, more obvious, and simpler way to validate an email address relies on zero regular expressions and zero references to specifications. Just send an email to the purported address and ask the user to click on a unique link to complete registration. After all, the most common reason to request an email address is to be able to send messages to that email address, so if mail cannot be sent to it, the email address should be considered invalid, even if it is syntactically valid.

Unfortunately, people persist in trying to write buggy email validators. Some are too simple and ignore valid characters (or valid top-level domain names!). Others are so focused on trying to match the RFC addr-spec syntax that, while they will happily accept most or all addr-spec forms, they also accept email addresses which are very likely to wreak havoc if you pass them to another system to send email; cause various forms of SQL injection, XSS injection, or even shell injection attacks; and which are likely to confuse tools as to what the email address actually is. This can be ameliorated with complicated normalization functions for email addresses, but none of the email validators I've looked at actually do this (which, again, goes to show that they're missing the point).

Which brings me to a second quiz question: are email addresses case-insensitive? If you answered no, well, you're wrong. If you answered yes, you're also wrong. The local-part, as RFC 5321 emphasizes, is not to be interpreted by anyone but the final destination MTA server. A consequence is that it does not specify if they are case-sensitive or case-insensitive, which means that general code should not assume that it is case-insensitive. Domains, of course, are case-insensitive, unless you're talking about internationalized domain names [5]. In practice, though, RFC 5321 admits that servers should make the names case-insensitive. For everyone else who uses email addresses, the effective result of this admission is that email addresses should be stored in their original case but matched case-insensitively (effectively, code should be case-preserving).
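
To make the case-preserving advice concrete, here is a minimal sketch in Python (my own illustration, not code from any mail library or specification): addresses are stored exactly as given, but indexed by a lowercased key for matching.

```python
# A minimal, hypothetical sketch of case-preserving address handling:
# store the address exactly as the user typed it, match it case-insensitively.
class AddressBook:
    def __init__(self):
        self._entries = {}                 # lowercased key -> original spelling

    def add(self, address):
        self._entries.setdefault(address.lower(), address)

    def find(self, address):
        return self._entries.get(address.lower())

book = AddressBook()
book.add("John.Doe@Example.COM")
print(book.find("john.doe@example.com"))   # -> 'John.Doe@Example.COM'
```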

Hopefully this gives you a sense of why email addresses are frustrating and much more complicated than they first appear. There are historical artifacts of email addresses I've decided not to address (the roles of ! and % in addresses), but since they only matter to some SMTP implementations, I'll discuss them when I pick up SMTP in a later part (if I ever do). I've avoided discussing some major issues with the specification here, because they are much better handled as part of the issues with email headers in general.

Oh, and if you were expecting regular expression answers to the challenge I gave at the beginning of the post, here are the answers I threw together for my various definitions of "valid email address." I didn't test or even try to compile any of these regular expressions (as you should have gathered, regular expressions are not what you should be using), so caveat emptor.

RFC 822 addr-spec
Impossible. Don't even try.
RFC 5322 non-obsolete addr-spec production
([^\x00-\x20()\[\]:;@\\,.]+(\.[^\x00-\x20()\[\]:;@\\,.]+)*|"(\\.|[^\\"])*")@([^\x00-\x20()\[\]:;@\\,.]+(\.[^\x00-\x20()\[\]:;@\\,.]+)*|\[(\\.|[^\\\]])*\])
RFC 5322, unquoted email address
.*@([^\x00-\x20()\[\]:;@\\,.]+(\.[^\x00-\x20()\[\]:;@\\,.]+)*|\[(\\.|[^\\\]])*\])
HTML 5's interpretation (see the sketch after this list)
[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*
Effective EAI-aware version
[^\x00-\x20\x80-\x9f()\[\]:;@\\,]+@[^\x00-\x20\x80-\x9f()\[\]:;@\\,]+, with the caveats that a dot does not begin or end the local-part, nor do two dots appear consecutively, the local part is in NFC or NFKC form, and the domain is a valid domain name.
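
For what it's worth, here is a rough sketch of how the HTML 5 pattern above might be used as a pragmatic validator in Python (untested against the full spec, and the sample addresses are invented for illustration); note that it deliberately rejects quoted local-parts and domain literals.

```python
import re

# The HTML 5 email pattern quoted above, compiled as-is.
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*")

for addr in ["example@test.invalid", '"quoted"@test.invalid', "test@[127.0.0.1]"]:
    print(addr, "->", bool(HTML5_EMAIL.fullmatch(addr)))
# example@test.invalid -> True; the quoted local-part and the domain literal
# are rejected, even though both are syntactically legal per RFC 5321/5322.
```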

[1] If you're trying to find guides on valid email addresses, a useful way to eliminate incorrect answers is to apply the following litmus tests. First, if the guide mentions an RFC, but does not mention RFC 5321 (or RFC 2821, in a pinch), you can generally ignore it. If the email address test (not) @ example.com would be valid, then the author has clearly not carefully read and understood the specifications. If the guide mentions RFC 5321, RFC 5322, RFC 6530, and IDN, then the author clearly has taken the time to actually understand the subject matter and their opinion can be trusted.
[2] I'm using "regular" here in the sense of theoretical regular languages. Perl-compatible regular expressions can match non-regular languages (because of backreferences), but even backreferences can't solve the problem here. It appears that newer versions support a construct which can match balanced parentheses, but I'm going to discount that because by the time you're going to start using that feature, you have at least two problems.
[3] Specifically, if you want to get really technical, the domain name is going to be routed via MX records in DNS.
[4] RFC 5321 is the specification for SMTP, and, therefore, it is only truly binding for things that talk SMTP; likewise, RFC 5322 is only binding on people who speak email headers. When I say that systems can pretend that email addresses with domain literals or quoted local-parts don't exist, I'm excluding mail clients and mail servers. If you're writing a website and you need an email address, there is no need to support email addresses which don't exist on the open, public Internet.
[5] My usual approach to seeing internationalization at this point (if you haven't gathered from the lengthy second post of this series) is to assume that the specifications assume magic where case insensitivity is desired.

December 04, 2013 11:24 PM

November 20, 2013

Joshua Cranmer (jcranmer)

Why email is hard, part 3: MIME

This post is part 3 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. This post discusses MIME, the mechanism by which email evolves beyond plain text.

MIME, which stands for Multipurpose Internet Mail Extensions, is primarily dictated by a set of 5 RFCs: RFC 2045, RFC 2046, RFC 2047, RFC 2048, and RFC 2049, although RFC 2048 (which governs registration procedures for new MIME types) has since been superseded by newer RFCs. RFC 2045 covers the format of related headers, as well as the format of the encodings used to convert 8-bit data into 7-bit for transmission. RFC 2046 describes the basic set of MIME types, most importantly the format of multipart/ types. RFC 2047 was discussed in part 2 of this series, as it discusses encoding internationalized data in headers. RFC 2049 describes a set of guidelines for how to be conformant when processing MIME; as you might imagine, these are woefully inadequate for modern processing anyways. In practice, it is only the first three documents that matter for building an email client.

There are two main contributions of MIME, which actually makes it a bit hard to know what is meant when people refer to MIME in the abstract. The first contribution, which is of interest mostly to email, is the development of a tree-based representation of email which allows for the inclusion of non-textual parts in messages. This tree is ultimately how attachments and other features are incorporated. The other contribution is the development of a registry of MIME types for different types of file contents. MIME types have spread far beyond just the email infrastructure: if you want to describe what kind of file a binary blob is, you can refer to it by either a magic header sequence, a file extension, or a MIME type. Searching for terms like MIME libraries will sometimes turn up libraries that actually handle the so-called MIME sniffing process (guessing a MIME type from a file extension or the contents of a file).

MIME types are decomposable into two parts, a media type and a subtype. The type text/plain has a media type of text and a subtype of plain, for example. IANA maintains an official repository of MIME types. There are very few media types, and I would argue that there ought to be fewer. In practice, degradation of unknown MIME types means that there are essentially three "fundamental" types: text/plain (which represents plain, unformatted text and to which unknown text/* types degrade), multipart/mixed (the "default" version of multipart messages; more on this later), and application/octet-stream (which represents unknown, arbitrary binary data). I can understand the separation of the message media type for things which generally follow the basic format of headers+body akin to message/rfc822, although the presence of types like message/partial that don't follow the headers+body format and the requirement to downgrade to application/octet-stream mars usability here. The distinction between image, audio, video and application is petty when you consider that in practice, the distinction isn't going to be able to make clients give better recommendations for how to handle these kinds of content (which really means deciding if it can be displayed inline or if it needs to be handed off to an external client).

Is there a better way to label content types than MIME types? Probably not. X.400 (remember that from my first post?) uses OIDs, in line with the rest of the OSI model, and in my limited experience with other systems that use these OIDs, they are obtuse, effectively opaque identifiers with no inherent semantic meaning. People use file extensions in practice to distinguish between different file types, but not all content types are stored in files (such as multipart/mixed), and MIME types offer a finer granularity than what you can distinguish when guessing the type from the start of a file. My only complaints about MIME types are petty and marginal, not about the idea itself.

No, the part of MIME that I have serious complaints with is the MIME tree structure. This allows you to represent emails in arbitrarily complex structures… onto which the standard view of email as a body with associated attachments maps poorly. The heart of this structure is the multipart media type, for which the most important subtypes are mixed, alternative, related, signed, and encrypted. The last two types are meant for cryptographic security definitions [1], and I won't cover them further here. All multipart types have a format where the body consists of parts (each with their own headers) separated by a boundary string. There is space before the first part and after the last part which consists of semantically-meaningless text, sometimes containing a message like "This is a MIME message." meant to be displayed to the now practically-non-existent crowd of people who use clients that don't support MIME.

The simplest type is multipart/mixed, which means that there is no inherent structure to the parts. Attachments to a message use this type: the type of the message is set to multipart/mixed, a body is added as (typically) the first part, and attachments are added as parts with types like image/png (for PNG images). It is also not uncommon to see multipart/mixed types that have a multipart/mixed part within them: some mailing list software attaches footers to messages by wrapping the original message inside a single part of a multipart/mixed message and then appending a text/plain footer.
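
As a concrete illustration of that structure, here is a minimal sketch using Python's standard email library (the addresses and the attachment bytes are placeholders, not real data): setting a text body and then adding an attachment produces exactly the multipart/mixed layout described above.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "example@test.invalid"
msg["To"] = "example@test.invalid"
msg["Subject"] = "Attachment demo"
msg.set_content("This is the body.")          # starts life as text/plain
msg.add_attachment(b"\x89PNG\r\n\x1a\n...",   # placeholder bytes, not a real PNG
                   maintype="image", subtype="png",
                   filename="picture.png")

print(msg.get_content_type())                 # multipart/mixed
print([p.get_content_type() for p in msg.iter_parts()])
# ['text/plain', 'image/png']
```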

multipart/related is intended to refer to an HTML page [2] where all of its external resources are included as additional parts. Linking all of these parts together is done by use of a cid: URL scheme. Generating and displaying these messages requires tracking down all URL references in an HTML page, which of course means that email clients that want full support for this feature also need robust HTML (and CSS!) knowledge, and future-proofing is hard. Since the primary body of this type appears first in the tree, it also makes handling this datatype in a streaming manner difficult, since the values to which URLs will be rewritten are not known until after the entire body is parsed.

In contrast, multipart/alternative is used to satisfy the plain-text-or-HTML debate by allowing one to provide a message that is either plain text or HTML [3]. It is also the third-biggest failure of the entire email infrastructure, in my opinion. The natural expectation would be that the parts should be listed in decreasing order of preference, so that streaming clients can reject all the data after they find the part they will display. Instead, the parts are listed in increasing order of preference, which was done in order to make the plain text part be first in the list, which helps increase readability of MIME messages for those reading email without MIME-aware clients. As a result, streaming clients are unable to progressively display the contents of multipart/alternative until the entire message has been read.
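
The increasing-preference ordering is easy to see if you build such a message with a standard library. The sketch below uses Python's email package (nothing client-specific); the plain text part does indeed come out first, with the preferred HTML part last.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("Plain text version.")                        # least preferred
msg.add_alternative("<p>HTML version.</p>", subtype="html")   # most preferred

print(msg.get_content_type())                        # multipart/alternative
print([p.get_content_type() for p in msg.iter_parts()])
# ['text/plain', 'text/html'] -- listed in increasing order of preference
```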

Although multipart/alternative states that all parts must contain the same contents (to varying degrees of degradation), you shouldn't be surprised to learn that this is not exactly the case. There was a period in time when spam filterers looked at only the text/plain side of things, so spammers took to putting "innocuous" messages in the text/plain half and displaying the real spam in the text/html half [4] (this technique appears to have died off a long time ago, though). In another interesting case, I received a bug report with a message containing an image/jpeg and a text/html part within a multipart/alternative [5].

To be fair, the current concept of emails as a body with a set of attachments did not exist when MIME was originally specified. The definition of multipart/parallel plays into this a lot (it means what you think it does: show all of the parts in parallel… somehow). Reading between the lines of the specification also indicates a desire to create interactive emails (via application/postscript, of course). Given that email clients have trouble even displaying HTML properly [6], and the fact that interactivity has the potential to be a walking security hole, it is not hard to see why this functionality fell by the wayside.

The final major challenge that MIME solved was how to fit arbitrary data into a 7-bit format safe for transit. The two encoding schemes they came up with were quoted-printable (which retains most printable characters, but emits non-printable characters in a =XX format, where the Xs are hex characters), and base64 which reencodes every 3 bytes into 4 ASCII characters. Non-encoded data is separated into three categories: 7-bit (which uses only ASCII characters except NUL and bare CR or LF characters), 8-bit (which uses any character but NUL, bare CR, and bare LF), and binary (where everything is possible). A further limitation is placed on all encodings but binary: every line is at most 998 bytes long, not including the terminating CRLF.
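
To get a feel for the two encodings, here is a small Python sketch (the sample string is arbitrary): quoted-printable leaves printable ASCII alone and escapes everything else as =XX, while base64 turns every 3 input bytes into 4 ASCII characters.

```python
import base64
import quopri

data = "Héllo, wörld".encode("utf-8")

print(quopri.encodestring(data))   # e.g. b'H=C3=A9llo, w=C3=B6rld'
print(base64.encodebytes(data))    # e.g. b'SMOpbGxvLCB3w7ZybGQ=\n'
# Mostly-ASCII text stays readable under quoted-printable; base64 always
# costs about 33% regardless of the input.
```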

A side-effect of these requirements is that all attachments must be considered binary data, even if they are textual formats (like source code), as end-of-line autoconversion is now considered a major misfeature. To make matters even worse, body text written in scripts that don't use spaces (such as Japanese or Chinese) can sometimes be prohibited from using the 8-bit transfer format due to overly long lines: you can reach the end of a line in as few as 249 characters (UTF-8, non-BMP characters, although Chinese and Japanese typically take three bytes per character). So a single long paragraph can force a message to be entirely encoded in a format with 33% overhead. There have been suggestions for a binary-to-8-bit encoding in the past, but no standardization effort has been made for one [7].

The binary encoding has none of these problems, but no one claims to support it. However, I suspect that violating the maximum line length, or adding 8-bit characters to a quoted-printable part, is likely to make it through the mail system, in part because rejecting such violations either increases your security vulnerabilities or requires more implementation effort. Sending lone CR or LF characters is probably fine so long as one is careful to assume that they may be treated as line breaks. Sending a NUL character I suspect could cause some issues due to lack of testing (but it also leaves room for security vulnerabilities to ignore it). In other words, binary-encoded messages probably already work to a large degree in the mail system. Which makes it extremely tempting (even for me) to ignore the specification requirements when composing messages; small wonder then that blatant violations of specifications are common.

This concludes my discussion of MIME. There are certainly many more complaints I have, but this should be sufficient to lay out why building a generic MIME-aware library by itself is hard, and why you do not want to write such a parser yourself. Too bad Thunderbird has at least two different ad-hoc parsers (not libmime or JSMime) that I can think of off the top of my head, both of which are wrong.

[1] I will be covering this in a later post, but the way that signed and encrypted data is represented in MIME actually makes it really easy to introduce flaws in cryptographic code (and the last time I surveyed major email clients with support for cryptography, all of them had such flaws).
[2] Other types are of course possible in theory, but HTML is all anyone cares about in practice.
[3] There is also text/enriched, which was developed as a stopgap while HTML 3.2 was being developed. Its use in practice is exceedingly slim.
[4] This is one of the reasons I'm minded to make "prefer plain text" do degradation of natural HTML display instead of showing the plain text parts. Not that cleanly degrading HTML is easy.
[5] In the interests of full disclosure, the image/jpeg was actually a PNG image and the HTML claimed to be 7-bit UTF-8 but was actually 8-bit, and it contained a Unicode homograph attack.
[6] Of the major clients, Outlook uses Word's HTML rendering engine, which I recall once reading as being roughly equivalent to IE 5.5 in capability. Webmail is forced to do its own sanitization and sandboxing, and the output leaves something to be desired; Gmail is the worst offender here, stripping out all but inline style. Thunderbird and SeaMonkey are nearly alone in using a high-quality layout engine: you can even send a <video> in an email to Thunderbird and have it work properly. :-)
[7] There is yEnc. Its mere existence does contradict several claims (for example, that adding new transfer encodings is infeasible due to install base of software), but it was developed for a slightly different purpose. Some implementation details are hostile to MIME, and although it has been discussed to death on the relevant mailing list several times, no draft was ever made that would integrate it into MIME properly.

November 20, 2013 07:54 PM

October 24, 2013

Guillermo López (willyaranda)

Espabilad.

Wise up. You have no other choice. It's time for you to wise up.

Wise up, because if you want to do what you want, and you really want to dedicate yourselves to what you have spent several years studying, you have to wise up.

When you finish your degree and hold a simple piece of paper saying you are Computer Engineers, it is just a piece of paper. From that point on it is up to you to prove it. Not before. Now. Now it is time to face what you have been prepared for.

"But nobody prepared us!" That's a lie. Yes, you have been prepared. You have been prepared to be self-sufficient. You have been prepared to think; your minds have been trained for at least five years so that you think and act like people who hold a degree and who are presumed to learn quickly. You have been taught to be engineers.

You have been prepared. But perhaps you have not prepared yourselves. It is easy to blame others when you have not put in much effort of your own. And the good part is that this depends on you. It depends on what you want to do once you have that blessed piece of paper.

Wise up. Invest your time in studying beyond what it takes to scrape a 5.0 on an exam. Invest it in technologies, in knowledge that your degree program is not going to give you. Invest your time in being the best in your field.

Wise up, students, wise up.

The post Espabilad. appeared first on Pijus Magnificus.

October 24, 2013 03:23 PM

October 17, 2013

Mark Finkle (mfinkle)

GeckoView: Embedding Gecko in your Android Application

Firefox for Android is a great browser, bringing a modern HTML rendering engine to Android 2.2 and newer. One of the things we have been hoping to do for a long time now is make it possible for other Android applications to embed the Gecko rendering engine. Over the last few months we started a side project to make this possible. We call it GeckoView.

As mentioned in the project page, we don’t intend GeckoView to be a drop-in replacement for WebView. Internally, Gecko is very different from WebKit and trying to expose the same features using the same APIs just wouldn’t be scalable or maintainable. That said, we want it to feel Android-ish and you should be comfortable with using it in your applications.

We have started to build GeckoView as part of our nightly Firefox for Android builds. You can find the library ZIPs in our latest nightly FTP folder. We are in the process of improving the APIs used to embed GeckoView. The current API is very basic. Most of that work is happening in these bugs:

If you want to start playing around with GeckoView, you can try the demo application I have on Github. It links to some pre-built GeckoView libraries.

We’d love your feedback! We use the Firefox for Android mailing list to discuss status, issues and feedback.

Note: We’re having some Tech Talks at Mozilla’s London office on Monday (Oct 21). One of the topics is GeckoView. If you’re around or in town for Droidcon, please stop by.

October 17, 2013 09:33 PM

October 11, 2013

Joshua Cranmer (jcranmer)

Why email is hard, part 2: internationalization

This post is part 2 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure, as well as the issues I have with it. This post is discussing internationalization, specifically supporting non-ASCII characters in email.

Internationalization is not a simple task, even if the consideration is limited to "merely" the textual aspect [1]. Languages turn out to be incredibly diverse in their writing systems, so software that tries to support all writing systems equally well ends up running into several problems that admit no general solution. Unfortunately, I am ill-placed to be able to offer personal experience with internationalization concerns [2], so some of the information I give may well be wrong.

A word of caution: this post is rather long, even by my standards, since the problems of internationalization are legion. To help keep this post from being even longer, I'm going to assume passing familiarity with terms like ASCII, Unicode, and UTF-8.

The first issue I'll talk about is Unicode normalization, and it's an issue caused largely by Unicode itself. Unicode has two ways of making accented characters: precomposed characters (such as U+00F1, ñ) or a character followed by a combining character (U+006E, n, followed by U+0303, ◌̃). The display of both is the same: ñ versus ñ (read the HTML), and no one would disagree that they share the same meaning. To let software detect that they are the same, Unicode prescribes four algorithms to normalize them. These four algorithms are defined on two axes: whether to prefer composed characters (like U+00F1) or prefer decomposed characters (U+006E U+0303), and whether to normalize by canonical equivalence (noting that, for example, U+212A Kelvin sign is equivalent to the Latin majuscule K) or by compatibility (e.g., superscript 2 to a regular 2).
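
A small sketch of what this looks like in practice, using Python's unicodedata module (the example characters are the ones mentioned above):

```python
import unicodedata

composed = "\u00F1"        # ñ as a single precomposed character
decomposed = "n\u0303"     # n followed by a combining tilde

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# Compatibility normalization additionally folds things like superscript 2:
print(unicodedata.normalize("NFKC", "\u00B2"))   # '2'
print(unicodedata.normalize("NFC", "\u212A"))    # 'K' (Kelvin sign, canonical)
```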

Another issue is one that mostly affects display. Western European languages all use a left-to-right, top-to-bottom writing order. This isn't universal: Semitic languages like Hebrew or Arabic use right-to-left, top-to-bottom; Japanese and Chinese prefer a top-to-bottom, right-to-left order (although it is sometimes written left-to-right, top-to-bottom). It thus becomes an issue how text in languages with different writing orders should be stored, although I believe the practice of always storing text in "start-to-finish" order, and reversing it for display, is nearly universal.

Now, both of those issues mentioned so far are minor in the grand scheme of things, in that you can ignore them and they will still probably work properly almost all of the time. Most text that is exposed to the web is already normalized to the same format, and web browsers have gotten away with not normalizing CSS or HTML identifiers with only theoretical objections raised. All of the other issues I'm going to discuss are things that cause problems and illustrate why properly internationalizing email is hard.

Another historical mistake of Unicode is one that we will likely be stuck with for decades, and I need to go into some history first. The first Unicode standard dates from 1991, and its original goal then was to collect all of the characters needed for modern transmission, which was judged to need only a 16-bit set of characters. Unfortunately, the needs of the ideographic-centric Chinese, Japanese, and Korean writing systems, particularly rare family names, turn out to rather fill up that space. Thus, in 1996, Unicode was changed to permit more characters: 17 planes of 65,536 characters each, of which the original set was termed the "Basic Multilingual Plane" or BMP for short. Systems that chose to adopt Unicode in those intervening 5 years often adopted a 16-bit character model as their standard internal format, so as to keep the benefits of fixed-width character encodings. However, with the change to a larger format, their fixed-width character encoding is no longer fixed-width.

This issue plagues anybody who works with systems that considered internationalization in that unfortunate window, which notably includes prominent programming languages like C#, Java, and JavaScript. Many cross-platform C and C++ programs implicitly require UTF-16 due to its pervasive inclusion into the Windows operating system and common internationalization libraries [3]. Unsurprisingly, non-BMP characters tend to quickly run into all sorts of hangups by unaware code. For example, right now, it is possible to coax Thunderbird to render these characters unusable in, say, your subject string if the subject is just right, and I suspect similar bugs exist in a majority of email applications [4].
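
The mismatch is easy to demonstrate. The short sketch below (plain Python, whose strings are not limited to 16-bit units) shows how a single non-BMP character becomes two UTF-16 code units, which is exactly where naive splitting goes wrong:

```python
import struct

s = "\U0001F4A9"                       # a single non-BMP character
utf16 = s.encode("utf-16-le")
units = struct.unpack("<2H", utf16)

print(len(s), len(units))              # 1 character, but 2 UTF-16 code units
print([hex(u) for u in units])         # ['0xd83d', '0xdca9'] -- a surrogate pair
# Splitting the UTF-16 buffer between these two units leaves an unpaired
# surrogate, which is the kind of corruption described above.
```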

For all of the flaws of Unicode [5], there is a tacit agreement that UTF-8 should be the character set to use for anyone not burdened by legacy concerns. Unfortunately, email is burdened by legacy concerns, and the use of 8-bit characters in headers that are not UTF-8 is more prevalent than it ought to be, RFC 6532 notwithstanding. In any case, email explicitly provides for handling a wide variety of alternative character sets without saying which ones should be supported. The official list [6] contains about 200 of them (including the UNKNOWN-8BIT character set), but not all of them see widespread use. In practice, the ones that definitely need to be supported are the ISO 8859-* and ISO 2022-* charsets, the EUC-* charsets, Windows-* charsets, GB18030, GBK, Shift-JIS, KOI8-{R,U}, Big5, and of course UTF-8. There are two other major charsets that don't come up directly in email but are important for implementing the entire suite of protocols: UTF-7, used in IMAP (more on that later), and Punycode (more on that later, too).

The suite of character sets falls into three main categories. First is the set of fixed-width character sets, most notably ASCII and the ISO 8859 suite of charsets, as well as UCS-2 (2 bytes per character) and UTF-32 (4 bytes per character). Since the major East Asian languages are all ideographic, requiring a rather large number of characters to be encoded, single-byte fixed-width character sets are infeasible for them. Instead, many choose to do a variable-width encoding: Shift-JIS lets some characters (notably ASCII characters and half-width katakana) remain a single byte and uses two bytes to encode all of its other characters. UTF-8 can use between 1 byte (for ASCII characters) and 4 bytes (for non-BMP characters) for a single character. The final set of character sets, such as the ISO 2022 ones, use escape sequences to change the interpretation of subsequent characters. As a result, taking the substring of an encoded string can change its interpretation while remaining valid. This will be important later.
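
A quick sketch of the three categories using Python's codecs (the sample text is arbitrary):

```python
text = "ASCII and 日本語"

print(text.encode("utf-8"))        # variable width: 1 to 4 bytes per character
print(text.encode("shift_jis"))    # variable width: 1 or 2 bytes per character
print(text.encode("iso-2022-jp"))  # escape-based: ESC sequences switch modes
# In the ISO 2022 output, b'\x1b$B' switches into JIS X 0208 and b'\x1b(B'
# switches back to ASCII, so truncating the byte string can change how the
# remaining bytes are interpreted -- the substring problem noted above.
```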

Two more problems related to character sets are worth mentioning. The first is the byte-order mark, or BOM, which is used to distinguish whether UTF-16 is written on a little-endian or big-endian machine. It is also sometimes used in UTF-8 to indicate that the text is UTF-8 versus some unknown legacy encoding. It is also not supposed to appear in email, but I have done some experiments which suggest that people use software that adds it without realizing that this is happening. The second issue, unsurprisingly [7], is that for some character sets (Big5 in particular, I believe), not everyone agrees on how to interpret some of the characters.

The largest problem of internationalization that applies in a general sense is the problem of case insensitivity. The 26 basic Latin letters all map nicely to case, having a single uppercase and a single lowercase variant for each letter. This practice doesn't hold in general—languages like Japanese lack even the notion of case, although it does have two kana variants that hold semantic differences. Rather, there are three basic issues with case insensitivity which showcase enough of its problems to make you want to run away from it altogether [8].

The simplest issue is the Greek sigma. Greek has two lowercase variants of the sigma character: σ and ς (the "final sigma"), but a single uppercase variant, Σ. Thus mapping a string s to uppercase and back to lowercase is not equivalent to mapping s directly to lowercase in some cases. Related to this issue is the story of the German ß character. This character evolved as a ligature of a long and short 's', and its uppercase form is generally held to be SS. The existence of a capital form is in some dispute, and Unicode only recently added it (ẞ, if your software supports it). As a result, merely interconverting between uppercase and lowercase versions of a string does not necessarily lead to a simple fixed point. The third issue is the Turkish dotless i (ı), which is the lowercase variant of the ASCII uppercase I character to those who speak Turkish. So it turns out that case insensitivity isn't quite the same across all locales.
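
A few of these oddities are visible directly from a Python prompt (the Turkish behavior can only be hinted at, since these string methods are not locale-aware):

```python
print("σ" == "ς")                 # False: two distinct lowercase sigmas
print("Σ".lower())                # 'σ'
print("ß".upper())                # 'SS'
print("ß".upper().lower())        # 'ss', not 'ß': no simple fixed point
print("ı".upper(), "I".lower())   # 'I' and 'i' -- the Turkish mapping is lost
print("straße".casefold() == "STRASSE".casefold())   # True: ß folds to ss
```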

Again unsurprisingly in light of the issues, the general tendency towards case-folding or case-insensitive matching in internationalization-aware specifications is to ignore the issues entirely. For example, when I asked for clarity on the process of case-insensitive matching for IMAP folder names, the response I got was "don't do it." HTML and CSS moved to the cumbersomely-named variant known as "ASCII-subset case-insensitivity", where only the 26 basic Latin letters are mapped to their (English) variants in case. The solution for email is also a verbose variant of "unspecified," but that is only tradition for email (more on this later).

Now that you have a good idea of the general issues, it is time to delve into how the developers of email rose to the challenge of handling internationalization. It turns out that the developers of email have managed to craft one of the most perfect and exquisite examples I have seen of how to completely and utterly fail. The challenges of internationalized emails are so difficult that buggier implementations are probably more common than fully correct implementations, and any attempt to ignore the issue is completely and totally impossible. In fact, the faults of RFC 2047 are my personal least favorite part of email, and implementing it made me change the design of JSMime more than any other feature. It is probably the single hardest thing to implement correctly in an email client, and it is so broken that another specification was needed to be able to apply internationalization more widely (RFC 2231).

The basic problem RFC 2047 sets out to solve is how to reliably send non-ASCII characters across a medium where only 7-bit characters can be reliably sent. The solution that was set out in the original version, RFC 1342, is to encode specific strings in an "encoded-word" format: =?charset?encoding?encoded text?=. The encoding can either be a 'B' (for Base64) or a 'Q' (for quoted-printable). Except the quoted-printable encoding in this format isn't quite the same quoted-printable encoding used in bodies: the space character is encoded via a '_' character instead, as spaces aren't allowed in encoded-words. Naturally, the use of spaces in encoded-words is common enough to get at least one or two bugs filed a year about Thunderbird not supporting it, and I wonder if this subtle difference between two quoted-printable variants is what causes the prevalence of such emails.
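
For reference, here is what the encoded-word format looks like when decoded with Python's standard email.header module (the sample strings are my own):

```python
from email.header import decode_header, make_header

raw = "=?UTF-8?B?wqFIb2xhLCBtdW5kbyE=?="
print(decode_header(raw))                     # [(b'\xc2\xa1Hola, mundo!', 'utf-8')]
print(str(make_header(decode_header(raw))))   # '¡Hola, mundo!'

# The '_' convention: inside a Q-encoded word, '_' stands for a space.
print(str(make_header(decode_header("=?UTF-8?Q?hello_world?="))))   # 'hello world'
```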

One of my great hates with regard to email is the strict header line length limit. Since the encoded-word form can get naturally verbose, particularly when you consider languages like Chinese that are going to have little whitespace amenable for breaking lines, the ingenious solution is to have adjacent encoded-word tokens separated only by whitespace be treated as the same word. As RFC 6857 kindly summarizes, "whitespace behavior is somewhat unpredictable, in practice, when multiple encoded words are used." RFC 6857 also suggests that the requirement to limit encoded words to only 74 characters in length is also rather meaningless in practice.

A more serious problem arises when you consider the necessity of treating adjacent encoded-word tokens as a single unit. This one is so serious that it reaches the point where all of your options would break somebody. When implementing an RFC 2047 encoding algorithm, how do you write the code to break up a long span of text into multiple encoded words without ever violating the specification? The naive way of doing so is to encode the text once in one long string, and then break it into chunks which are then converted into the encoded-word form as necessary. This is, of course, wrong, as it breaks two strictures of RFC 2047. The first is that you cannot split the middle of multibyte characters. The second is that mode-switching character sets must return to ASCII by the end of a single encoded-word [9]. The smarter way of building encoded-words is to encode words by trying to figure out how much text can be encoded before needing to switch, and breaking the encoded-words when length quotas are exceeded. This is also wrong, since you could end up violating the return-to-ASCII rule if you don't double-check your converters. Also, if UTF-16 is used as the basis for the string before charset conversion, the encoder stands a good chance of splitting a surrogate pair, creating unpaired surrogates and a giant mess as a result.

For JSMime, the algorithm I chose to implement is specific to UTF-8, because I can use a property of the UTF-8 implementation to make encoding fast (every octet is looked at exactly three times: once to convert to UTF-8, once to count to know when to break, and once to encode into base64 or quoted-printable). The property of UTF-8 is that the second, third, and fourth octets of a multibyte character all start with the same two bits, and those bits never start the first octet of a character. Essentially, I convert the entire string to a binary buffer using UTF-8. I then pass through the buffer, keeping counters of the length that the buffer would be in base64 form and in quoted-printable form. When both counters are exceeded, I back up to the beginning of the character, and encode that entire buffer in a word and then move on. I made sure to test that I don't break surrogate characters by making liberal use of the non-BMP character U+1F4A9 [10] in my encoding tests.
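
The boundary property itself is simple enough to sketch. The helper below is my own illustration of the idea, not JSMime's actual code: continuation octets always look like 0b10xxxxxx, so you can back up to a character boundary without decoding anything.

```python
def backup_to_char_boundary(buf: bytes, pos: int) -> int:
    """Move pos backwards until it points at the first octet of a character."""
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:   # 0x80-0xBF: continuation octet
        pos -= 1
    return pos

data = "ab\U0001F4A9".encode("utf-8")    # 2 ASCII bytes + one 4-byte character
print(backup_to_char_boundary(data, 4))  # 2: backs out of the middle of U+1F4A9
```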

The sheer ease of writing a broken encoder for RFC 2047 means that broken encodings exist in the wild, so an RFC 2047 decoder needs to support some level of broken RFC 2047 encoding. Unfortunately, to "fix" different kinds of broken encodings requires different support for decoders. Treating adjacent encoded-words as part of the same buffer when decoding makes split multibyte characters work properly but breaks non-return-to-ASCII issues; if they are decoded separately the reverse is true. Recovering issues with isolated surrogates is at best time-consuming and difficult and at worst impossible.

Yet another problem with the way encoded-words are defined is that they are defined as specific tokens in the grammar of structured address fields. This means that you can't hide RFC 2047 encoding or decoding as a final processing step when reading or writing messages. Instead you have to do it during or after parsing (or during or before emission). So the parser as a result becomes fully intertwined with support for encoded-words. Converting a fully UTF-8 message into a 7-bit form is thus a non-trivial operation: there is a specification solely designed to discuss how to do such downgrading, RFC 6857. It requires deducing what structure a header has, parsing that header, and then reencoding the parsed header. This sort of complicated structure makes it much harder to write general-purpose email libraries: the process of emitting a message basically requires doing a generic UTF-8-to-7-bit conversion. Thus, what is supposed to be a mere implementation detail of how to send out a message ends up permeating the entire stack.

Unfortunately, the developers of RFC 2047 were a bit too clever for their own good. The specification limits the encoded-words to occurring only inside of phrases (basically, display names for addresses), unstructured text (like the subject), or comments (…). I presume this was done to avoid requiring parsers to handle internationalization in email addresses themselves or possibly even things like MIME boundary delimiters. However, this list leaves out one common source of internationalized text: filenames of attachments. This was ultimately patched by RFC 2231.

RFC 2231 is by no means a simple specification, since it attempts to solve three problems simultaneously. The first is the use of non-ASCII characters in parameter values. The second, as with RFC 2047, stems from the excessively low header line length limit: the need to wrap parameter values across multiple lines. As a result, the encoding is complicated (it takes more lines of code to parse RFC 2231's new features alone than it does to parse the basic format [11]), but it's not particularly difficult.

The third problem RFC 2231 attempts to solve is a rather different issue altogether: it tries to conclusively assign a language tag to the encoded text and also provides a "fix" for this to RFC 2047's encoded-words. The stated rationale is to be able to have screen readers read the text aloud properly, but the other (much more tangible) benefit is to ameliorate the issues of Unicode's Han unification by clearly identifying if the text is Chinese, Japanese, or Korean. While it sounds like a nice idea, it suffers from a major flaw: there is no way to use this data without converting internal data structures from using flat strings to richer representations. Another issue is that actually setting this value correctly (especially if your goal is supporting screen readers' pronunciations) is difficult if not impossible. Fortunately, this is an entirely optional feature; though I do see very little email that needs to be concerned about internationalization, I have yet to find an example of someone using this in the wild.

If you're the sort of person who finds properly writing internationalized text via RFC 2231 or RFC 2047 too hard (or you don't realize that you need to actually worry about this sort of stuff), and you don't want to use any of the several dozen MIME libraries to do the hard stuff for you, then you will become the bane of everyone who writes email clients, because you've just handed us email messages that have 8-bit text in the headers. At which point everything goes mad, because we have no clue what charset you just used. Well, RFC 6532 says that headers are supposed to be UTF-8, but with the specification being only 19 months old and part of a system which is still (to my knowledge) not supported by any major clients, this should be taken with a grain of salt. UTF-8 has the very nice property that text that is valid UTF-8 is highly unlikely to be any other charset, even if you start considering the various East Asian multibyte charsets. Thus you can try decoding under the assumption that it is UTF-8 and switch to a designated fallback charset if decoding fails. Of course, knowing which designated fallback to use is a different matter entirely.
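
That heuristic is short enough to sketch. The fallback charset below (windows-1252) is my own assumption for illustration; the specifications don't designate one.

```python
def decode_header_bytes(raw: bytes, fallback: str = "windows-1252") -> str:
    """Try UTF-8 first; fall back to a designated legacy charset on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")

print(decode_header_bytes(b"caf\xc3\xa9"))   # valid UTF-8 -> 'café'
print(decode_header_bytes(b"caf\xe9"))       # invalid UTF-8 -> fallback decoding
```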

Stepping outside email messages themselves, internationalization is still a concern. IMAP folder names are another well-known example. RFC 3501 specified that mailbox names should be in a modified version of UTF-7 in an awkward compromise. To my knowledge, this is the only remaining significant use of UTF-7, as many web browsers disabled support due to its use in security attacks. RFC 6855, another recent specification (6 months old as of this writing), finally allows UTF-8 mailbox names here, although it too is not yet in widespread usage.

You will note that email addresses are missing from the list so far. The topic of email addresses is itself worthy of lengthy discussion, but for the purposes of a discussion on internationalization, all you need to know is that, according to RFCs 821 and 822 and their cleaned-up successors, everything to the right of the '@' is a domain name and everything to the left is basically an opaque ASCII string [12]. It is here that internationalization really runs headlong into an immovable obstacle, for the email address has become the de facto unique identifier of the web, and everyone has their own funky ideas of what an email address looks like. As a result, the motto of "be liberal in what you accept" really breaks down with email addresses, and the amount of software that needs to change to accept internationalization extends far beyond the small segment interested only in the handling of email itself. Unfortunately, the relative newness of the latest specifications and corresponding lack of implementations means that I am less intimately familiar with this aspect of internationalization. Indeed, the impetus for this entire blogpost was a day-long struggle with trying to ascertain when two email addresses are the same if internationalized email addresses are involved.

The email address is split nicely by the '@' symbol, and internationalization of the two sides happens at two different times. Domains were internationalized first, by RFC 3490, a specification with the mouthful of a name "Internationalizing Domain Names in Applications" [13], or IDNA2003 for short. I mention the proper name of the specification here to make a point: the underlying protocol is completely unchanged, and all the work is intended to happen at roughly the level of getaddrinfo—the internal DNS resolver is supposed to be involved, but the underlying DNS protocol and tools are expected to remain blissfully unaware of the issues involved. That I mention the year of the specification should tell you that this is going to be a bumpy ride.

An internationalized domain name (IDN for short) is a domain name that has some non-ASCII characters in it. Domain names, according to DNS, are labels terminated by '.' characters, where each label may consist of up to 63 characters. The repertoire of characters is the ASCII alphanumerics and the '-' character, and labels are of course case-insensitive like almost everything else on the Internet. Encoding non-ASCII characters into this small subset while meeting these requirements is difficult for other contemporary schemes: UTF-7 uses Base64, which means 'A' and 'a' are not equivalent; percent-encoding eats up characters extremely quickly. So IDN uses a different specification for this purpose, called Punycode, which allows for a dense but utterly unreadable encoding. The basic algorithm of encoding an IDN is to take the input string, apply case-folding, normalize using NFKC, and then encode with Punycode.

Case folding, as I mentioned several paragraphs ago, turns out to have some issues. The ß and ς characters were the ones that caused the most complaints. You see, if you were to register, say, www.weiß.de, you would actually be registering www.weiss.de. As there is no indication of Punycode involved in the name, browsers would show the domain in the ASCII variant. One way of fixing this problem would be to work with browser vendors to institute a "preferred name" specification for websites (much like there exists one for the little icons next to page titles), so that the world could know that the proper capitalization is of course www.GoOgle.com instead of www.google.com. Instead, the German and Greek registrars pushed for a change to IDNA, which they achieved in 2010 with IDNA2008.

IDNA2008 is defined principally in RFCs 5890-5895 and UTS #46. The principal change is that the normalization step no longer exists in the protocol and is instead supposed to be done by applications, in a possibly locale-specific manner, before looking up the domain name. One reason for doing this was to eliminate the hard dependency on a specific, outdated version of Unicode [14]. It also helps fix things like the Turkish dotless I issue, in theory at least. However, this different algorithm causes some domains to be processed differently from IDNA2003. UTS #46 specifies a "compatibility mode" which changes the algorithm to match IDNA2003 better in the important cases (specifically, ß, ς, and ZWJ/ZWNJ), with a note expressing the hope that this will eventually become unnecessary. To handle the lack of normalization in the protocol, registrars are asked to automatically register all classes of equivalent domain names at the same time. I should note that most major browsers (and email clients, if they implement IDN at all) are still using IDNA2003: an easy test of this fact is to attempt to go to ☃.net, which is valid under IDNA2003 but not IDNA2008.
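
Python's built-in idna codec implements the 2003 flavour (nameprep included), which makes the behaviour described above easy to demonstrate (the domain names are just examples):

```python
print("bücher.de".encode("idna"))          # b'xn--bcher-kva.de'
print(b"xn--bcher-kva.de".decode("idna"))  # 'bücher.de'
print("weiß.de".encode("idna"))            # b'weiss.de' -- IDNA2003 folds ß to ss
```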

Unicode text processing is often vulnerable to an attack known as the "homograph attack." In most fonts, the Greek omicron and the Latin minuscule o are displayed in exactly the same way, so an attacker could pretend to be from, say, Google while instead sending you to Gοogle—I used Latin in the first word and Greek in the second. The standard solution is to display the Unicode form (and not the Punycode form) only where this is not an issue; Firefox and Opera display Unicode only for a whitelist of registrars with acceptable policies, Chrome and Internet Explorer only permit scripts that the user claims to read, and Safari only permits scripts that aren't susceptible to the homograph attack (i.e., not Cyrillic or Greek). (Note: I've summarized this information from Chromium's documentation; forward any complaints of out-of-date information to them.)
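Just to make the attack concrete, here is a crude Python sketch that exposes the trick by looking at each character's Unicode name; real browsers use the Unicode Script property plus the policies described above, so treat this as an illustration rather than a defense:

    import unicodedata

    def scripts(label):
        # Crude script detection: the first word of a letter's Unicode name
        # ("LATIN", "GREEK", "CYRILLIC", ...). Good enough to show the problem.
        return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

    print(scripts("google"))   # {'LATIN'}
    print(scripts("gοogle"))   # {'LATIN', 'GREEK'}; the second letter is a Greek omicron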

IDN satisfies the needs of internationalizing the second half of an email address, so a working group was commissioned to internationalize the first half. The result is EAI, which was first experimentally specified in RFCs 5335-5337; the standards themselves are found in RFCs 6530-6533 and 6855-6858. The primary difference between the first, experimental version and the second, to-be-implemented version is the removal of attempts to downgrade emails in the middle of transit. In the experimental version, provisions were made to specify, with every internationalized address, an alternate, fully ASCII address to which a downgraded message could be sent if SMTP servers couldn't support the new specifications. These provisions were removed after the experiment found that such automatic downgrading didn't work as well as hoped.

With automatic downgrading removed from the underlying protocol, the onus is on the people who generate the emails—mailing lists and email clients—to figure out who can and who can't receive these messages and to downgrade them as appropriate for the recipients. However, the design of SMTP makes it impossible to determine automatically whether a recipient can receive these new kinds of messages. Thus, the options are to send them and hope for the best, or to rely on the (usually clueless) user to report whether it worked. Clearly an unpalatable set of options, but one that can't be avoided given the protocol design.

The largest change of EAI is that the local parts of addresses are specified as a sequence of UTF-8 characters, omitting only the control characters [15]. The working group responsible for the specification adamantly refused to define a Unicode-to-ASCII conversion process, and thus a mechanism to make downgrading work smoothly, for several reasons. First, they didn't want to specify a prefix which could change the meaning of existing local-parts (the structure of local-parts is much less discoverable than the structure of domain names). Second, they felt that the lack of support for displaying the Unicode variants of Punycode meant that users would have a much worse experience. Finally, the transition period would hopefully be short (although messy), so designing the protocol around that short period would worsen it in the long term. Considering that, at the moment of writing, only one of the major SMTP implementations has even a bug filed to support it, I think the working group underestimates just how long transition periods can take.

As far as changes to the message format go, the move to UTF-8 is the only real change, especially considering how much effort is needed to opt in. Yes, headers are now supposed to be UTF-8, but, in practice, every production MIME parser needs to handle 8-bit characters in headers anyways. Yes, message/global can have MIME encoding applied to it (unlike message/rfc822), but, in practice, you already need to assume that people are going to MIME-encode message/rfc822 in violation of the specification. So, in practice, the changes needed in a parser are to add message/global as an alias for message/rfc822 [16] and possibly to tweak some charset-detection heuristics to prefer UTF-8. I would very much have liked the restriction on header line length removed, but, alas, the working group did not feel moved to make that change. Still, I look forward to the day when I never have to worry about encoding text into RFC 2047 encoded-words.
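For anyone who has not had the pleasure, this is what "encoding text into RFC 2047 encoded-words" means in practice. A quick sketch with Python's standard email.header module (the subject text is made up):

    from email.header import Header, decode_header

    # What a non-ASCII Subject has to look like on the wire today:
    print(Header("Prüfung", charset="utf-8").encode())
    # -> '=?utf-8?b?UHLDvGZ1bmc=?='

    # And the round trip back to (bytes, charset) pairs:
    print(decode_header("=?utf-8?b?UHLDvGZ1bmc=?="))
    # -> [(b'Pr\xc3\xbcfung', 'utf-8')]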

IMAP, POP, and SMTP are also all slightly modified to take account of the new specifications. Specifically, internationalized headers are supposed to be opt-in only—SMTP servers are supposed to reject these messages outright if they don't support the extension, and IMAP and POP servers are supposed to downgrade messages unless the client explicitly asks for them not to be. As there are no major server implementations yet, I don't know how well these requirements will be followed, especially given that most of the changes already need to be tolerated by clients in practice. The experimental version of internationalization specified a format which would have wreaked havoc on many current parsers, so I suspect some of the strict requirements may be a holdover from that version.

And thus ends my foray into email internationalization, a collection of bad solutions to hard problems. I have probably done a poor job of covering the complete set of inanities involved, but what I have covered are the ones that annoy me the most. This certainly isn't the last I'll talk about the impossibility of message parsing either, but it should be enough at least to convince you that you really don't want to write your own message parser.

[1] Date/time, numbers, and currency are the other major aspects of internationalization.
[2] I am a native English speaker who converses with other people almost completely in English. That said, I can comprehend French, although I am not familiar with the finer points that come with fluency, such as collation concerns.
[3] C and C++ have a built-in internationalization and localization API, derived from POSIX. However, this API is generally unsuited to the full needs of people who actually care about these topics, so it's not really worth mentioning.
[4] The basic algorithm to encode RFC 2047 strings for any charset is to try to shift characters into the output string until you hit the maximum word length. If the internal character set for Unicode conversion is UTF-16 instead of UTF-32 and the code is ignorant of surrogate concerns, then this algorithm can break surrogate pairs apart. This is exactly how the bug is triggered in Thunderbird.
[5] I'm not discussing Han unification, which is arguably the single most controversial aspect of Unicode.
[6] Official list here means the official set curated by IANA as valid for use in the charset="" parameter. The actual set of values likely to be acceptable to a majority of clients is rather different.
[7] If you've read this far and find internationalization interoperability problems surprising, you are either incredibly ignorant or incurably optimistic.
[8] I'm not discussing collation (sorting) or word-breaking issues as this post is long enough already. Nevertheless, these also help very much in making you want to run away from internationalization.
[9] While writing this post, I actually went to double-check whether Thunderbird correctly implements return-to-ASCII in its encoder, which I can only do by running tests, since I find its current encoder impenetrable. It turns out that it does, but it also looks like, if we switched conversion to ICU (as many bugs suggest), we might break this part of the specification, since I don't see the ICU converters switching to ASCII at the end of conversion.
[10] Chosen as a very adequate description of what I think of RFC 2047. Look it up if you can't guess it from context.
[11] As measured by implementation in JSMime, comments and whitespace included. This is biased by the fact that I created a unified lexer for the header parser, which rather simplifies the implementation of the actual parsers themselves.
[12] This is, of course, a gross oversimplification, so don't complain that I'm ignoring domain literals or the like. Email addresses will be covered later.
[13] A point of trivia: the 'I' in IDNA2003 is expanded as "Internationalizing" while the 'I' in IDNA2008 is for "Internationalized."
[14] For the technically-minded: IDNA2003 relied on a hard-coded list of banned codepoints in processing, while IDNA2008 derives its lists directly from Unicode codepoint categories, with a small set of hard-coded exceptions.
[15] Certain ASCII characters may require the local-part to be quoted, of course.
[16] Strictly speaking, message/rfc822 remains all-ASCII, and non-ASCII headers need message/global. Given the track record of message/news, I suspect that this distinction will, in practice, not remain for long.

October 11, 2013 04:07 AM

October 01, 2013

Benjamin Smedberg (bsmedberg)

Mozilla Summit: Listen Hard

Listen hard at the Mozilla Summit.

When you’re at a session, give the speaker your attention. If you are like me and get distracted easily by all the people, take notes using a real pen and paper. Practice active listening: don’t argue with the speaker in your head, or start phrasing the perfect rebuttal. If a speaker or topic is not interesting to you, leave and find a different session.

At meals, sit with at least some people you don’t know. Introduce yourself! Talk to people about themselves, about the project, about their personal history. If you are a shy person, ask somebody you already know to make introductions. If you are a connector who knows lots of people, one of your primary jobs at the summit should be making introductions.

In the evenings and downtime, spend time working through the things you heard. If a presentation gave you a new technique, spend time thinking about how you could use it, and what the potential downsides are. If you learned new information, go back through your old assumptions and priorities and question whether they are still correct. If you have questions, track down the speaker and ask them in person. Questions that come the next day are one of the most valuable forms of feedback for a speaker (note: try to avoid presentations on the last day of a conference).

Talk when you have something valuable to ask or say. If you are the expert on a topic, it is your duty to lead a conversation even if you are naturally a shy person. If you aren’t the expert, use discretion so you don’t disrupt a conversation.

If you disagree with somebody, say so! Usually it’s better to disagree in a private conversation, not in a public Q&A session. If you don’t know the history of a decision, ask! Be willing to change your mind, but also be willing to stay in disagreement. You can build trust and respect even in disagreement.

If somebody disagrees with you, try to avoid being defensive (it’s hard!). Keep sharing context and asking questions. If you’re not sure whether the people you’re talking to know the history of a decision, ask them! Don’t be afraid to repeat information over and over again if the people you’re talking to haven’t heard it before.

Don’t read your email. Unfortunately you’ll probably have to scan your email for summit-related announcements, but in general your email can wait.

I’ve been at two summits, a mozcamp, and numerous all-hands and workweeks. They are exhausting and draining events for introverted individuals such as myself. But they are also motivating, inspiring, and in general awesome. Put on a positive attitude and make the most of every part of the event.

More great summit tips from Laura Forrest.

October 01, 2013 01:33 PM

September 26, 2013

Mark Finkle (mfinkle)

We

If Mozilla had secret weapons, I think our Interns would be included on the list. These hard working troops descend upon us during their school breaks and end up working on some of the hardest problems Mozilla has to offer. Our primary Intern “season” is wrapping up and I wanted to touch upon some of the work completed or in-progress.

Firefox for Android

Firefox for Metro

One of the things I like about the way Mozilla utilizes interns is that it shows them exactly what happens in real software development. They learn that code reviews can take a lot of time. Your feature might not make the desired release, or even get backed out at the last minute. They learn that large software projects are painful and carry a lot of legacy baggage, and you need to deal with it. I think it’s also a great way to learn how to communicate in a team environment. They also get to ship features in Firefox, and who doesn’t love shipping stuff?

Interns of 2013, we salute you!

September 26, 2013 05:50 PM

September 25, 2013

Mark Finkle (mfinkle)

Firefox for Android: Team Meetup, Brainstorming and Hacking

Last week, the Firefox for Android team, and some friends, had a team meetup at the Mozilla Toronto office. As is typical for Mozilla, the team is quite distributed so getting together, face to face, is refreshing. The agenda for the week was fairly simple: Brainstorm new feature ideas, discuss ways to make our workflow better, and provide some time for fun hacking.

We spent most of our time brainstorming, first at a high level, then picking a few ideas and concepts to drill into more deeply. The high-level list ended up with over 150 ideas, ranging from blue-sky features and extensions of existing features to performance and UX improvements and the removal of technical debt.

We also took some time to examine our workflow. We found some rough edges we intend to smooth out. We also ended up with a better understanding of our current, somewhat organic, workflow. Look for more write-ups from the team on this as we pull the information together. One technical outcome of the discussions was a critical examination of our automated testing situation. We decided that we depend entirely too much on Robotium for testing our Java UI and non-UI code. Plans are underway to add some JUnit test support for the non-UI code.

The Android team is very committed to working with contributors and has been doing a great job attracting and mentoring code contributors. Last week they started discussing how to attract other types of contributors, focusing on bug triage as the next possible area. Desktop Firefox has had some great bug triage support from the community, so it seems like a natural candidate. Look for more information about that effort coming soon.

There was also some time for hacking on code. Some of the hacking was pure fun stuff. I saw a twitterbot and an IRCbot created. There was also a lot of discussion and hacking on add-on APIs that provide more integration hooks for add-on developers into the Java UI. Of course there is usually a fire that needs to be put out, and we had one this time as well. The front-end team quickly pulled together to implement a late-breaking design change to the new Home page. It’s been baking on Nightly for a few days now and will start getting uplifted to Aurora by the end of the week.

All in all, it was a great week. I’m looking forward to seeing what happens next!

September 25, 2013 03:29 PM

September 14, 2013

Joshua Cranmer (jcranmer)

Why email is hard, part 1: architecture

Which is harder, writing an email client or writing a web browser? Several years ago, I would have guessed the latter. Having worked on an email client for several years, I am now more inclined to guess that email is harder, although I never really worked on a web browser, so perhaps it's just bias. Nevertheless, HTML comes with a specification that tells you how to parse crap that pretends to be HTML; email messages come with no such specification, which forces people working with email to guess based on other implementations and bug reports. To vent some of my frustration with working with email, I've decided to post some of my thoughts on what email did wrong and why it is so hard to work with. Since there is so much to talk about, instead of devoting one post to it, I'll make it an ongoing series with occasional updates (i.e., updates will come out when I feel like it, so don't bother asking).

First off, what do I mean by an email client? The capabilities of, say, Outlook versus Gaia Email versus Thunderbird are all wildly different, and history has afforded many changes in support. I'll consider anything that someone might want to put in an email client as fodder for discussion in this series (so NNTP, RSS, LDAP, CalDAV, and maybe even IM stuff might find discussions later). What I won't consider are things likely to be found in a third-party library, so SSL, HTML, low-level networking, etc., are all out of scope, although I may mention them where relevant in later posts. If one is trying to build a client from scratch, the bare minimum one needs to understand first is the basic message formatting, MIME (which governs attachments), SMTP (email delivery), and either POP or IMAP (email receipt). Unfortunately, each of these requires cross-referencing a dozen RFCs individually when you start considering optional or not-really-optional features.

The email architecture we work with today doesn't have a unique name, although "Internet email" [1] or "SMTP-based email" are probably the most appropriate appellations. Since there is only one in use in modern times, there is no real need to refer to it by anything other than "email." The reason SMTP, rather than any other major protocol, is used to describe the architecture is that the heart of the system is motivated by the need to support SMTP, and that SMTP is how email is delivered across organizational boundaries, even if other protocols (such as LMTP) are used internally.

Some history of email, at least the history that led up to SMTP, is in order. In the days of mainframes, mail generally only meant communicating between different users on the same machine, and so a bevy of incompatible systems started to arise. These incompatible systems grew to support connections with other computers as computer networking became possible. The ARPANET project brought with it an attempt to standardize mail transfer on ARPANET, separated into two types of documents: those that standardized message formats, and those that standardized the message transfer. These would eventually culminate in RFC 822 and RFC 821, respectively. SMTP was designed in the context of ARPANET, and it was originally intended primarily to standardize the messages transferred only on this network. As a result, it was never intended to become the standard for modern email.

The main competitor to SMTP-based email that is worth discussing is X.400. X.400 was at one time expected to be the eventual global email interconnect protocol, and interoperability between SMTP and X.400 was a major focus in the 1980s and 1990s. SMTP has a glaring flaw, to those who work with it, in that it is not so much designed as evolved to meet new needs as they came up. In contrast, X.400 was designed to account for a lot of issues that SMTP hadn't dealt with yet, and included arguably better functionality than SMTP. However, it turned out to be a colossal failure, although theories differ as to why. The most convincing to me boils down to X.400 being developed at a time of great flux in computing (the shift from mainframes to networked PCs) combined with a development process that was ill-suited to reacting quickly to these changes.

I mentioned earlier that SMTP eventually culminates in RFC 821. This is a slight lie, for one of the key pieces of the Internet, and a core of the modern email architecture, didn't exist yet. That piece is DNS, which is the closest thing the Internet has to X.500 (a global, searchable directory of everything). Without DNS, figuring out how to route mail via SMTP is a bit of a challenge (which is why SMTP allowed explicit source routing, deprecated post-DNS in RFC 2821). The documents which lay out how to use DNS for mail routing are RFC 974, RFC 1035, and RFC 1123. So it's fair to say that RFC 1123 is really the point at which modern SMTP was developed.

But enough about history; on to the real topic of this post. The most important artifact of the SMTP-based architecture is that the protocols used to send email are different from the ones used to read it. This is both a good thing and a bad thing. On the one hand, it's easier to experiment with different ways of accessing mailboxes, or to support only limited functionality where that is desired. On the other, the need to agree on a standard format still keeps all the protocols more or less intertwined, and it makes some heavily desired features extremely difficult to implement. For example, there is still, thirty years later, no feasible way to send a mail and save it to a "Sent" folder on your IMAP mailbox without submitting it twice [2].
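To make the pain concrete, the usual client workaround today looks roughly like the following Python sketch, in which the message crosses the network twice (server names and credentials are made up); BURL, mentioned in the footnote, was supposed to remove the second upload but is barely deployed:

    import imaplib
    import smtplib
    import time
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "me@example.org"     # hypothetical account
    msg["To"] = "you@example.net"
    msg["Subject"] = "Hello"
    msg.set_content("This message travels over the network twice.")

    # Submission #1: hand the message to the submission server for delivery.
    with smtplib.SMTP("smtp.example.org", 587) as smtp:
        smtp.starttls()
        smtp.login("me@example.org", "hunter2")
        smtp.send_message(msg)

    # Submission #2: upload the very same bytes again, over a different
    # protocol, just so a copy lands in the IMAP "Sent" folder.
    with imaplib.IMAP4_SSL("imap.example.org") as imap:
        imap.login("me@example.org", "hunter2")
        imap.append("Sent", "\\Seen",
                    imaplib.Time2Internaldate(time.time()), msg.as_bytes())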

The greatest flaws in the modern architecture, I think, lie in a bevy of historical design mistakes which remain unmitigated to this day, particularly in the base message format and MIME. Changing these specifications is not out of the question, but the rate at which changes are adopted is agonizingly slow, to the point that change is effectively impossible unless it is absolutely necessary. Sending outright binary messages was proposed as experimental in 1995, proposed as a standard in 2000, and still remains relatively unsupported: the BINARYMIME SMTP keyword only exists on one of my 4 SMTP servers. Sending non-ASCII text is potentially possible, but it is still not used in major email clients to my knowledge (the top search results for "8BITMIME" are generally of the form "how do I turn this off?"). It will be interesting to see how email address internationalization is handled, since it's the first major overhaul to email since the introduction of MIME—the first major overhaul in 16 years. Intriguingly enough, the NNTP and Usenet communities have shown themselves to be more adept at change: sending 8-bit Usenet messages generally works, and yEnc would have been a worthwhile addition to MIME if its author had ever attempted to push it through. His decision not to (with the weak excuses he claimed) is emblematic of the resistance of the architecture to change, even in cases where such change would be pretty beneficial.
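If you're curious whether your own submission server is any better than mine, probing the advertised ESMTP extensions is easy with Python's smtplib (the server name is a placeholder):

    import smtplib

    # EHLO and see which of the extensions discussed above the server admits to.
    with smtplib.SMTP("smtp.example.org", 587) as smtp:
        smtp.ehlo()
        for ext in ("8bitmime", "binarymime", "smtputf8"):
            print(ext.upper(), "supported:", smtp.has_extn(ext))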

My biggest complaint with the email architecture isn't really a flaw in the strict sense of the term but rather a disagreement. The core motto of email could perhaps be summed up with "Be liberal in what you accept and conservative in what you send." Now, I come from a compilers background, and the basic standpoint in compilers is, if a user does something wrong, to scream at them for being a bloody idiot and to reject their code. Actually, there's a tendency to do that even if they do something technically correct but possibly unintentionally wrong. I understand why people dislike this kind of strict checking, but I personally consider it to be a feature, not a bug. My experience with attempting to implement MIME is that accepting what amounts to complete crap not only means that everyone has to worry about parsing the crap, but it actually ends up encouraging it. The attitude people get in bugs starts becoming "this is supported by <insert other client>, and your client is broken for not supporting it," even when it is pointed out that their message is in flagrant violation of the specification. As I understand it, HTML 5 has the luxury of specifying a massively complex parser that can, in theory, parse even /dev/urandom consistently across implementations, but there is no similar document for the modern email message. But we still have to deal with the utter crap people claim is a valid email message. Just this past week, upon sifting through my spam folder, I found a header which is best described as =?UTF-8?Q? ISO-8859-1, non-ASCII text ?= (spaces included). The only way people are going to realize that their tools are producing this kind of crap is if their tools stop working altogether.

These two issues come together most spectacularly when RFC 2047 is involved. This is worth a blog post by itself, but the very low technically-not-but-effectively-mandatory limit on the header length (to aid people who read email without clients) means that encoded words need to be split up to fit on header lines. If you're not careful, you can end up splitting multibyte characters between different encoded words. This unfortunately occurs in practice. Properly handling it in my new parser required completely reflowing the design of the innermost parsing function and greatly increasing implementation complexity. I would estimate that this single attempt to "gracefully" handle a wrong-but-of-obvious-intent scenario is worth 15% or so of the total complexity of decoding RFC 2047-encoded text.
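Here is the failure mode in miniature, sketched in Python with a hand-built pair of encoded-words (this is an illustration, not JSMime's actual code): the two UTF-8 bytes of 'é' end up in different encoded-words, so a tolerant decoder has to accumulate the raw byte runs across adjacent words before converting to text.

    import base64

    def encoded_word(payload: bytes) -> str:
        return "=?UTF-8?B?" + base64.b64encode(payload).decode("ascii") + "?="

    def payload(word: str) -> bytes:
        return base64.b64decode(word.split("?")[3])

    # A careless encoder splits 'café' so that 0xC3 and 0xA9 land in
    # different encoded-words.
    word1 = encoded_word("caf".encode("utf-8") + b"\xc3")
    word2 = encoded_word(b"\xa9")

    # Decoding each word on its own blows up on the dangling byte:
    #   payload(word1).decode("utf-8")   -> UnicodeDecodeError
    # Concatenating adjacent byte runs first, then decoding, works:
    print((payload(word1) + payload(word2)).decode("utf-8"))   # café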

There are other issues with modern email, of course, but all of the ones that I've collected so far are not flaws in the architecture as a whole but rather flaws of individual portions of the architecture, so I'll leave them for later posts.

[1] The capital 'I' in "Internet email" is important, as it's referring to the "Internet" in "Internet Standard" or "Internet Engineering Task Force." Thus, "Internet email" means "the email standards developed for the Internet/by the IETF" and not "email used on the internet."
[2] Yes, I know about BURL. It doesn't count. Look at who supports it: almost nobody.

September 14, 2013 10:50 PM