Planet Mozilla Education

July 18, 2014

Mark Surman (surman)

Quick thoughts from Kenya

Going anywhere in Africa always energizes me. It surprises me. Challenges my assumptions. Gives me new ideas. And makes me smile. The week I just spent in Nairobi did all these things.

Airtel top up agenda in Nairobi

The main goal of my trip was to talk to people about the local content and simple appmaking work Mozilla is doing. I spent an evening talking with Mozilla community members, a day and a bit with people at Equity Bank and a bunch of time with people from iHub. Here are three of the many thoughts I had while reflecting on the flight home:

Microbusiness is our biggest opportunity for AppMaker

I talked to ALOT of people about the idea of non-techie smartphone users being able to make their own apps.

My main question was: who would want to make their own app rather than just use Facebook? Most of the good answers had to with someone running a very small business. A person selling juice to office workers who wastes alot of travel time taking orders. An up and coming musician who wants a way to pre-sell tickets to loyal fans using mobile money. A chicken farmer outside Nairobi who is always on the phone with the hotels she sells to (pic below, met her and her husband while on a trip with Equity Bank folks). The common thread: simple to make and remix apps could be very useful to very small real world businesses that would benefit from better communications, record keeping and transaction processing via mobile phone.


Our main priority with AppMaker (or whatever we call it) right now is to get a first cut at on-device authoring out there. In the background, we also really need to be pushing on use cases like these — and the kind of app templates that would enable them. Some people at the iHub in Nairobi have offered to help with prototyping template apps specific to Kenya over the next few months, which will help with figuring this out.

Even online is offline in much of Africa

As I was reminded at MozFest East Africa, even online is offline in much of Africa (and many other parts of the world). In the city, the cost of data for high bandwidth applications like media streaming — or running a Webmaker workshop — is expensive. And, outside the city, huge areas have connections that are spotty or non-existent.


It was great to meet the BRCK people who are building products to address issues like this. Specifically: BRCK is a ruggedized wifi router with a SIM card, useful I/O ports and local storage. Brainstorming with Juliana and Erik from iHub, it quickly became clear that it could be useful for things like Webmaker workshops in places where connectivity is expensive, slow or even non-existent. If you popped a Raspberry Pi on the side, you might even be able create a working version of Webmaker tools like Thimble and Appmaker that people could use locally — with published web pages and apps trickling back or syncing once the BRCK had a connection. The Kenyan Mozillians I talked to were very excited about this idea. Worth exploring.

People buy brands

During a dinner with local Mozillians, a question was raised: ‘what will it take for Firefox OS to succeed in Kenya?’ A debate ensued. “Price,” said one person, “you can’t get a $30 smartphone like the one Mozilla is going to sell.” “Yes you can!”, said another. “But those are China phones,” said someone else. “People want real phones backed by a real brand. If people believe Firefox phones are authentic, they will buy them.”


Essentially, they were talking about the tension between brand / authenticity / price in commodity markets like smartphones. The contention was: young Kenyan’s are aspiring to move up in the world. An affordable phone backed by a global brand like Mozilla stands for this. Of course, we know this. But it’s a good reminder from the people who care most about Mozilla (our community, pic below of Mozillians from Kenya) that the Firefox brand really needs to shine through on our devices and in the product experience as we roll out phones in more parts of the world.

Mozillians from Nairobi

I’ve got alot more than this rumbling around in my head, of course. My week in Uganda and Kenya really has my mind spinning. In a good way. It’s all a good reminder that the diverse perspectives of our community and our partners are one of our greatest strengths. As an organization, we need to tap into that even more than we already do. I truly believe that the big brain that is the Mozilla Community will be a key factor in winning the next round in our efforts to stand up for the web.

Filed under: mozilla, webmakers

July 18, 2014 08:36 PM

July 16, 2014

Mark Surman (surman)

How do we get depth *and* scale?

We want millions of people learning about the web everyday with Mozilla. The ‘why’ is simple: web literacy is quickly becoming just as important as reading, writing and math. By 2024, there will be more than 5 billion people on the web. And, by then, the web will shape our everyday lives even more than it does today. Understanding how it works, how to build it and how to make it your own will be essential for nearly everyone.

Maker Party Uganda

The tougher question is ‘how’ — how do we teach the web with both the depth *and* scale that’s needed? Most people who tackle a big learning challenge pick one path of the other. For example, the educators in our Hive Learning Networks are focused on depth of learning. Everything the do is high touch, hands-on and focused on innovating so learning happens in a deep way. On the flip side, MOOCs have quickly shown what scale looks like, but they almost universally have high drop out rates and limited learning impact for all but the most motivated learners. We rarely see depth and scale go together. Yet, as the web grows, we need both. Urgently.

I’m actually quite hopeful. I’m hopeful because the Mozilla community is deeply focused on tackling this challenge head on, with people rolling up their sleeves to help people learn by making and organizing themselves in new ways that could massively grow the number of people teaching the web. We’re seeing the seeds of both depth and scale emerge.

This snapped into focus for me at MozFest East Africa in Kampala a few days ago. Borrowing from the MozFest London model, the event showcased a variety of open tech efforts by Mozilla and others: FirefoxOS app development; open data tools from a local org called Mountabatten; Mozilla localization; Firefox Desktop engineering; the work of the Ugandan National Information Technology Agency. It also included a huge Maker Party, with 200 young Ugandans showing up to learn and hack with Webmaker tools.

Maker Party Uganda

The Maker Party itself was impressive — pulled off well despite rain and limited connectivity. But what was more impressive was seeing how the Mozilla community is stepping up to plant the seeds of teaching the web at depth and scale, which I’d call out as:

Mentors: IMHO, a key to depth is humans connecting face to face to learn. We’ve set up a Webmaker Mentors program in the last year to encourage this kind of learning. The question has been: will people step up to do this kind of teaching and mentoring, and do it well? MozFest EA was promising start: 30 motivated mentors showed up prepared, enthusiastic and ready to help the 200 young people at the event learn the web.

Curriculum: one of the hard parts of scaling a volunteer-based mentor program is getting people to focus their teaching on the most important web literacy skills. We released a new collection of open source web literacy curriculum over the past couple of months designed to solve this problem. We weren’t sure how things would work out, I’d say MozFestEA is early evidence that curriculum can do a good job of helping people quickly understand what and how to teach. Here, each of the mentors was confidently and articulately teaching a piece of the web literacy framework using Webmaker tools.

Making as learning: another challenge is getting people to teach / learn deeply based on written curriculum. Mozilla focuses on ‘making by learning’ as a way past this — putting hands-on, project based learning at the heart of most of our Webmaker teaching kits. For example, the basic remix teaching kit gets learners quickly hacking and personalizing their favourite big brand web site, which almost always gets people excited and curious. More importantly: this ‘making as learning’ approach lets mentors adapt the experience to a learner’s interests and local context in real time. It was exciting to see the Ugandan mentors having students work on web pages focused on local school tasks and local music stars, which worked well in making the standard teaching kits come to life.

Clubs: mentors + curriculum + making can likely get us to our 2014 goal of 10,000 people around the world teaching web literacy with Mozilla. But the bigger question is how do we keep the depth while scaling to a much bigger level? One answer is to create more ’nodes’ in the Webmaker network and get them teaching all year round. At MozFest EA, there was a session on Webmaker Clubs — after school web literacy clubs run by students and teachers. This is an idea that floated up from the Mozilla community in Uganda and Canada. In Uganda, the clubs are starting to form. For me, this is exciting. Right now we have 30 contributors working on Webmaker in Uganda. If we opened up clubs in schools, we could imagine 100s or even 1000s. I think clubs like this is a key next step towards scale.

Community leadership: the thing that most impressed me at MozFestEA was the leadership from the community. San Emmanuel James and Lawrence Kisuuki have grown the Mozilla community in Uganda in a major way over the last couple of years. More importantly, they have invested in building more community leaders. As one example, they organized a Webmaker train the trainer event a few weeks before MozFestEA. The result was what I described above: confident mentors showing up ready to teach, including people other than San and Lawrence taking leadership within the Maker Party side of the event. I was impressed.This is key to both depth and scale: building more and better Mozilla community leaders around the world.

Of course, MozFestEA was just one event for one weekend. But, as I said, it gave me hope: it made be feel that the Mozilla community is taking the core building blocks of Webmaker shaping them into something that could have a big impact.


With Maker Party kicking off this week, I suspect we’ll see more of this in coming months. We’ll see more people rolling up their sleeves to help people learn by making. And more people organizing themselves in new ways that could massively grow the number of people teaching the web. If we can make happen this summer, much bigger things lay on the path ahead.

Filed under: education, mozilla, webmakers

July 16, 2014 08:20 PM

July 14, 2014

Mark Surman (surman)

The Instagram Effect: can we make app making easy?

Do you remember how hard digital photography used to be? I do. When my first son was born, I was still shooting film, scanning things in and manually creating web pages to show off a few choice pictures. By the time my second son was walking I had my first good digital camera. Things were better, but I still had to drag pictures onto a hard drive, bring them into Photoshop, painstakingly process them and then upload to Flickr. And then, seemingly overnight, we took a leap. Phones got good cameras. Photo processing right on the camera got dead simple. And Instagram happened. We rarely think about it, but: digital photography went from hard and expensive to cheap and ubiquitous in a very short period of time.

Mozilla on-device app making concept from MWC 2013 (Frog Design)

Mozilla on-device app making concept from MWC 2013 (Frog Design)

I want to make the same thing happen with mobile apps. Today: making a mobile app — or a complex interactive web page — is slow, hard and only for the brave and talented few. I want to make making a mobile app as easy as posting to Instagram.

At Mozilla, we’ve been talking about this for while now. At Mobile World Congress 2013 we floated the idea of making easy to make apps. And we’ve been prototyping a tool for making mobile apps in a desktop browser since last fall. We’ve built some momentum, but we have yet to solve two key problems: crafting a vision of app making that’s valuable to everyday people and making app making easy on a phone.

We came one step closer to solving these problems last week win London. In partnership with the GSMA, we organized a design workshop that asked: What if anyone could make a mobile app? What would this unlock for people? And, more interestingly, what kind of opportunity and imagination would is create in places where large numbers (billions) of people are coming online for the first time using affordable smartphones? These are the right questions to be asking if we want to create an Instagram Effect for apps.

Screen Shot 2014-07-14 at 6.08.08 PMScreen Shot 2014-07-14 at 6.09.00 PM Screen Shot 2014-07-14 at 6.08.47 PM

The London design workshop created some interesting case studies of why and how people would create and remix their own apps on their phones. A DJ in Rio who wanted to gain fans and distribute her music. A dabbawalla in Mumbai who wants to grow and manage the list of customers he delivers food to. A teacher in Durban who wants to use her Google doc full on student records to recruit parents to combat truancy. All of these case studies pointed to problems that non-technical people could more easily solve for themselves if they could easily make their own mobile apps.

Over the next few months, Mozilla will start building on-device authoring for mobile phones and interactive web pages. The case studies we developed in London — and others we’ll be pulling together over the coming weeks — will go a long way towards helping us figure out what features and app templates to build first. As we get to some first prototypes, we’re going get the Mozilla community around the world to test out our thinking via Maker Parties and other events.

At the same time, we’re going to be working on a broader piece of research on the role of locally generated content in creating opportunity for people in places whee smartphones are just starting to take at off. At the London workshop, we dug into this question with people from organizations like Equity Bank, Telefonica, USAID, EcoNet Wireless, Caribou Digital, Orange, Dalberg, Vodaphone. Working with GSMA, we plan to research this local content question and field test easy app making with partners like these over next six months. I’ll post more soon about this partnership.

Filed under: education, mozilla, webmakers

July 14, 2014 04:04 PM

July 07, 2014

Mark Surman (surman)

MoFo Update (and Board Slides)

A big priority for Mozilla in 2014 is growing our community: getting more people engaged in everything from bringing the web to mobile and teaching web literacy to millions of people around the world.  At our June Mozilla Foundation board meeting, I provided an update on the MoFo teams contribution to this effort during Q2 and on our plans for the next quarter. Here is a brief screen cast that summarizes the material fromt that meeting.

In addition to the screencast, I have posted the full board deck (40 slides) here. Much of the deck focuses on our progress towards the goal of 10k Webmaker contributors in 2014. If you want a quick overview of that piece of what we’re working on, here are some notes I wrote up to explain the Webmaker slides:

The slides also talk about our joint efforts with MoCo to grow the number of Mozilla contributors overall to 20,000 people in 2014. In addition to Webmaker, Mozilla’s Open News, Science Lab, Open Internet Policy and MozFest initiatives are all a part of growing our contributor community. There is also a financial summary. We are currently $12M towards our $17M revenue goal for the year.

For back ground and context, see Mozilla’s overall 2014 goals here and the quarterly goal tracker here. If you have questions or comments on any of this, please reach out to me directly or leave comments below.

Filed under: mozilla, statusupdate, webmakers

July 07, 2014 05:01 PM

Blake Winton (bwinton)

Figuring out where things are in an image.

People love heatmaps.

They’re a great way to show how much various UI elements are used in relation to each other, and are much easier to read at a glance than a table of click- counts would be. They can also reveal hidden patterns of usage based on the locations of elements, let us know if we’re focusing our efforts on the correct elements, and tell us how effective our communication about new features is. Because they’re so useful, one of the things I am doing in my new role is setting up the framework to provide our UX team with automatically updating heatmaps for both Desktop and Android Firefox.

Unfortunately, we can’t just wave our wands and have a heatmap magically appear. Creating them takes work, and one of the most tedious processes is figuring out where each element starts and stops. Even worse, we need to repeat the process for each platform we’re planning on displaying. This is one of the primary reasons we haven’t run a heatmap study since 2012.

In order to not spend all my time generating the heatmaps, I had to reduce the effort involved in producing these visualizations.

Being a programmer, my first inclination was to write a program to calculate them, and that sort of worked for the first version of the heatmap, but there were some difficulties. To collect locations for all the elements, we had to display all the elements.

Firefox in the process of being customized

Customize mode (as shown above) was an obvious choice since it shows everything you could click on almost by definition, but it led people to think that we weren’t showing which elements were being clicked the most, but instead which elements people customized the most. So that was out.

Next we tried putting everything in the toolbar, or the menu, but those were a little too cluttered even without leaving room for labels, and too wide (or too tall, in the case of the menu).

A shockingly busy toolbar

Similarly, I couldn’t fit everything into the menu panel either. The only solution was to resort to some Photoshop-trickery to fit all the buttons in, but that ended up breaking the script I was using to locate the various elements in the UI.

A surprisingly tall menu panel

Since I couldn’t automatically figure out where everything was, I figured we might as well use a nicely-laid out, partially generated image, and calculate the positions (mostly-)manually.

The current version of the heatmap (Note: This is not the real data.)

I had foreseen the need for different positions for the widgets when the project started, and so I put the widget locations in their own file from the start. This meant that I could update them without changing the code, which made it a little nicer to see what’s changed between versions, but still required me to reload the whole page every time I changed a position or size, which would just have taken way too long. I needed something that could give me much more immediate feedback.

Fortunately, I had recently finished watching a series of videos from Ian Johnson (@enjalot on twitter) where he used a tool he made called Tributary to do some rapid prototyping of data visualization code. It seemed like a good fit for the quick moving around of elements I was trying to do, and so I copied a bunch of the code and data in, and got to work moving things around.

I did encounter a few problems: Tributary wasn’t functional in Firefox Nightly (but I could use Chrome as a workaround) and occasionally sometimes trying to move the cursor would change the value slider instead. Even with these glitches it only took me an hour or two to get from set-up to having all the numbers for the final result! And the best part is that since it's all open source, you can take a look at the final result, or fork it yourself!

July 07, 2014 03:53 PM

June 11, 2014

Guillermo López (willyaranda)

La travesía por el desierto del balonmano español

Y lo que es peor, sin final del camino aparente…

La larga lista de desgracias del balonmano español continúa hoy con la desaparición del histórico BM. Valladolid, después del Portland San Antonio, del Ciudad Real – Atlético de Madrid o el Teka hace unos años, incluyendo el descenso a los infiernos del Bidasoa, un grande en la década de los 90.

La desaparición del BM. Valladolid, el primer equipo de élite que fui a ver, allá por mediados de los 2000, con partidos históricos (y desgraciados), como aquella semifinal contra el Flensburg (recordemos, campeón de la Final Four de este año, frente a equipazos como el Barça, Kiel o Veszprem) es un puntito más (o la puntilla) en la penosa gestión que se ha llevado en este país con el balonmano en el último lustro.

Una gestión realmente mala, viviendo en la opulencia, pensando que nada les podía bajar de sus salarios, de sus fichajes, de su egocentrismo, de su “estamos en ASOBAL”, de sus deudas, de jugadores a los que sabían que no podían pagar, esperando que todo se solucionara con un “patapúm parriba” y “que los demás se coman la mierda”.

Así que aquí estamos, 6 años después, en esta trágica situación. El Flensburg es campeón de Europa (merecido, ganando al Barça y luego al Kiel en la final) y el BM Valladolid, estandarte del balonmano, de la cantera, de Castilla y León como el Ademar (otro cuya vida pende de un hilo) muerto, kaput. Atrás quedan esos días donde un grupo de colegas de Aranda, entre los que me encontraba, jugaba contra ellos.

Mientras, los mejores jugadores españoles se van fuera. Como por ejemplo los ganadores del mundial de 2013, con una humillación impropia de dos grandísimos equipos, contra Dinamarca (un apabullante 35-19, en el que muchos disfrutamos como aquella final contra Croacia en Túnez en 2005), donde de todos los jugadores #hispanos participantes, sólo 5 quedan en la ASOBAL. Y por supuesto en el Barça (Víctor Tomás, Ariño, Sarmiento, Strbik, que se va este año, y Viran).

Y así llegamos a la ASOBAL de estos dos últimos años, con un nivel tan bajo que permite a equipos que deberían estar en una “tercera división” (Primera división, en la nomenclatura balonmanística) el competir en “la élite”, dando sorpresas a grandes equipos de siempre, en una liga extremadamente competida e impredecible (excepto por el primer puesto, el Barça, con tipos cobrando 3 veces más que todo el presupuesto anual de equipos pequeños), pero con una facilidad sobrecogedora para que los equipos (incluso históricos) mueran.

No podemos, ni debemos, hacer que el balonmano caiga, y eso obliga a que Federación, ASOBAL y muchos dirigentes de los equipos se sienten, y reflexionen a dónde quieren que vaya el deporte al que aman (si es que aún lo aman).

Larga vida al balonmano.


June 11, 2014 11:14 PM

June 06, 2014

Mark Finkle (mfinkle)

Firefox for Android: Casting videos and Roku support – Ready to test in Nightly

Firefox for Android Nightly builds now support casting HTML5 videos from a web page to a TV via a connected Roku streaming player. Using the system is simple, but it does require you to install a viewer application on your Roku device. Firefox support for the Roku viewer and the viewer itself are both currently pre-release. We’re excited to invite our Nightly channel users to help us test these new features, share feedback and file any bugs so we can continue to make improvements to performance and functionality.


To begin testing, first you’ll need to install the viewer application to your Roku. The viewer app, called Firefox for Roku Nightly, is currently a private channel. You can install it via this link: Firefox Nightly

Once installed, try loading this test page into your Firefox for Android Nightly browser: Casting Test

When Firefox has discovered your Roku, you should see the Media Control Bar with Cast and Play icons:


The Cast icon on the left of the video controls allows you to send the video to a device. You can also long-tap on the video to get the context menu, and cast from there too.

Hint: Make sure Firefox and the Roku are on the same Wifi network!

Once you have sent a video to a device, Firefox will display the Media Control Bar in the bottom of the application. This allows you to pause, play and close the video. You don’t need to stay on the original web page either. The Media Control Bar will stay visible as long as the video is playing, even as you change tabs or visit new web pages.


You’ll notice that Firefox displays an “active casting” indicator in the URL Bar when a video on the current web page is being cast to a device.

Limitations and Troubleshooting

Firefox currently limits casting HTML5 video in H264 format. This is one of the formats most easily handled by Roku streaming players. We are working on other formats too.

Some web sites hide or customize the HTML5 video controls and some override the long-tap menu too. This can make starting to cast difficult, but the simple fallback is to start playing the video in the web page. If the video is H264 and Firefox can find your Roku, a “ready to cast” indicator will appear in the URL Bar. Just tap on that to start casting the video to your Roku.

If Firefox does not display the casting icons, it might be having a problem discovering your Roku on the network. Make sure your Android device and the Roku are on the same Wifi network. You can load about:devices into Firefox to see what devices Firefox has discovered.

This is a pre-release of video casting support. We need your help to test the system. Please remember to share your feedback and file any bugs. Happy testing!

June 06, 2014 03:45 PM

June 03, 2014

Mark Surman (surman)

Things I’m thinking about

I’m in one of those ‘need to get back to blogging’ modes. Thinking about a lot of things. Feeling too busy to blog. Waiting until I have the perfect thing to say. Which is always a bad sign.

So, to get the juices flowing, I just decided to make a list of things I’m thinking about. Here it is:

1. Connecting open mobile <-> local content <-> web literacy — we we need to make progress on all three of these things at once if we want the web to be a serious player for the next few billion internet users. I’m working up a project on this topic with Ben, David and others.

2. Building a web literacy mentor community that scales — I’m excited about Maker Party, but also worried we’ll see post-campaign drop off again this year. We need a more systematic mentor program that grows, gets better and keeps people engaged 365/days a year. I’m helping Michelle and Brett think about this.

3. Figuring out the connection between an open internet and a fair internet. This a tricky. We assume an open internet will unlock opportunity for the billions of people coming online over the next few years. But it could just as easily lead to digital sweatshops. My new friends Chris and Brooke got me thinking about this in April. And I haven’t stopped thinking about it since.

4. Finding the right balance between clear goals, working across teams and distributed leadership. If I’m honest, we’ve struggled with these things at MoFo for the last 18 months or so. Our recent all hands in San Francisco felt like a breakthrough: focused, problem-solvey, fast moving. I’m thinking alot about how to keep this feeling. Working with Gunner and a bunch of other people on this.

5. Pushing on the Hive Lab concept. Some of the best Webmaker ideas — and much of our new Webmaker ‘textbook’ — come from the educators, designers and programmers we work with in Hives. However, we haven’t really figured out a way to systematically support and invest in this ‘lab’ side of Hive. I’ve been working with Claw and others to see how we can do more here.

6. Raising money. I’m always thinking about this, so it’s on my list. Right now, I’m thinking about major gifts, which is an area we’ve never cracked. IMHO, breaking through in this area is critical if we want to build Mozilla into a 100 year org that withstand the ups and downs of the market. I’m starting to talk to Geoff about this. Also, looking for outside people to help.

7. Linking Maker Party and net neutrality. Alot of the issues that Mozilla cares about are hard for people to get their heads around — net neutrality, DRM, etc. We should be able to use our web literacy work to help with this. I’ve been talking to Dave, Amira and others about building a ‘net neutrality teach-in’ campaign into Maker Party as an experiment in making this web literacy <-> big-hairy-internet-issue link.

8. Talking about LEGO some more. Specifically: how the LEGO Movie has a bunch of corny-but-useful metaphors for how screwed the Internet is right now. And how we can’t rely on a single here (e.g. Mozilla) to save the day. Spoiler: I’m going to use some LEGO Movie clips in my Knight Civic Media Conference talk later this month.

Random. I know. But these are places my brain is right now. A little scattered. But all feels juicy, good, important. Will write in more depth on some of these things soon.

Filed under: drumbeat, mozilla, webmakers

June 03, 2014 03:22 PM

May 31, 2014

Mark Finkle (mfinkle)

Firefox for Android: Your Feedback Matters!

Millions of people use Firefox for Android every day. It’s amazing to work on a product used by so many people. Unsurprisingly, some of those people send us feedback. We even have a simple system built into the application to make it easy to do. We have various systems to scan the feedback and look for trends. Sometimes, we even manually dig through the feedback for a given day. It takes time. There is a lot.

Your feedback is important and I thought I’d point out a few recent features and fixes that were directly influenced from feedback:

Help Menu
Some people have a hard time discovering features or were not aware Firefox supported some of the features they wanted. To make it easier to learn more about Firefox, we added a simple Help menu which directs you to SUMO, our online support system.

Managing Home Panels
Not everyone loves the Firefox Homepage (I do!), or more specifically, they don’t like some of the panels. We added a simple way for people to control the panels shown in Firefox’s Homepage. You can change the default panel. You can even hide all the panels. Use Settings > Customize > Home to get there.

Home panels

Improve Top Sites
The Top Sites panel in the Homepage is used by many people. At the same time, other people find that the thumbnails can reveal a bit too much of their browsing to others. We recently added support for respecting sites that might not want to be snapshot into thumbnails. In those cases, the thumbnail is replaced with a favicon and a favicon-influenced background color. The Facebook and Twitter thumbnails show the effect below:


We also added the ability to remove thumbnails using the long-tap menu.

Manage Search Engines
People also like to be able to manage their search engines. They like to switch the default. They like to hide some of the built-in engines. They like to add new engines. We have a simple system for managing search engines. Use Settings > Customize > Search to get there.


Clear History
We have a lot of feedback from people who want to clear their browsing history quickly and easily. We are not sure if the Settings > Privacy > Clear private data method is too hard to find or too time consuming to use, but it’s apparent people need other methods. We added a quick access method at the bottom of the History panel in the Homepage.


We are also working on a Clear data on exit approach too.

Quickly Switch to a Newly Opened Tab
When you long-tap on a link in a webpage, you get a menu that allows you to Open in New Tab or Open in New Private Tab. Both of those open the new tab in the background. Feedback indicates the some people really want to switch to the new tab. We already show an Android toast to let you know the tab was opened. Now we add a button to the toast allowing you to quickly switch to the tab too.


Undo Closing a Tab
Closing tabs can be awkward for people. Sometimes the [x] is too easy to hit by mistake or swiping to close is unexpected. In any case, we added the ability to undo closing a tab. Again, we use a button toast.


Offer to Setup Sync from Tabs Tray
We feel that syncing your desktop and mobile browsing data makes browsing on mobile devices much easier. Figuring out how to setup the Sync feature in Firefox might not be obvious. We added a simple banner to the Homepage to let you know the feature exists. We also added a setup entry point in the Sync area of the Tabs Tray.


We’ll continue to make changes based on your feedback, so keep sending it to us. Thanks for using Firefox for Android!

May 31, 2014 03:28 AM

May 27, 2014

Joshua Cranmer (jcranmer)

Why email is hard, part 6: today's email security

This post is part 6 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. Part 3 discusses MIME. Part 4 discusses email addresses. Part 5 discusses the more general problem of email headers. This part discusses how email security works in practice.

Email security is a rather wide-ranging topic, and one that I've wanted to cover for some time, well before several recent events that have made it come up in the wider public knowledge. There is no way I can hope to cover it in a single post (I think it would outpace even the length of my internationalization discussion), and there are definitely parts for which I am underqualified, as I am by no means an expert in cryptography. Instead, I will be discussing this over the course of several posts of which this is but the first; to ease up on the amount of background explanation, I will assume passing familiarity with cryptographic concepts like public keys, hash functions, as well as knowing what SSL and SSH are (though not necessarily how they work). If you don't have that knowledge, ask Wikipedia.

Before discussing how email security works, it is first necessary to ask what email security actually means. Unfortunately, the layman's interpretation is likely going to differ from the actual precise definition. Security is often treated by laymen as a boolean interpretation: something is either secure or insecure. The most prevalent model of security to people is SSL connections: these allow the establishment of a communication channel whose contents are secret to outside observers while also guaranteeing to the client the authenticity of the server. The server often then gets authenticity of the client via a more normal authentication scheme (i.e., the client sends a username and password). Thus there is, at the end, a channel that has both secrecy and authenticity [1]: channels with both of these are considered secure and channels without these are considered insecure [2].

In email, the situation becomes more difficult. Whereas an SSL connection is between a client and a server, the architecture of email is such that email providers must be considered as distinct entities from end users. In addition, messages can be sent from one person to multiple parties. Thus secure email is a more complex undertaking than just porting relevant details of SSL. There are two major cryptographic implementations of secure email [3]: S/MIME and PGP. In terms of implementation, they are basically the same [4], although PGP has an extra mode which wraps general ASCII (known as "ASCII-armor"), which I have been led to believe is less recommended these days. Since I know the S/MIME specifications better, I'll refer specifically to how S/MIME works.

S/MIME defines two main MIME types: multipart/signed, which contains the message text as a subpart followed by data indicating the cryptographic signature, and application/pkcs7-mime, which contains an encrypted MIME part. The important things to note about this delineation are that only the body data is encrypted [5], that it's theoretically possible to encrypt only part of a message's body, and that the signing and encryption constitute different steps. These factors combine to make for a potentially infuriating UI setup.

How does S/MIME tackle the challenges of encrypting email? First, rather than encrypting using recipients' public keys, the message is encrypted with a symmetric key. This symmetric key is then encrypted with each of the recipients' keys and then attached to the message. Second, by only signing or encrypting the body of the message, the transit headers are kept intact for the mail system to retain its ability to route, process, and deliver the message. The body is supposed to be prepared in the "safest" form before transit to avoid intermediate routers munging the contents. Finally, to actually ascertain what the recipients' public keys are, clients typically passively pull the information from signed emails. LDAP, unsurprisingly, contains an entry for a user's public key certificate, which could be useful in large enterprise deployments. There is also work ongoing right now to publish keys via DNS and DANE.

I mentioned before that S/MIME's use can present some interesting UI design decisions. I ended up actually testing some common email clients on how they handled S/MIME messages: Thunderbird, Apple Mail, Outlook [6], and Evolution. In my attempts to create a surreptitious signed part to confuse the UI, Outlook decided that the message had no body at all, and Thunderbird decided to ignore all indication of the existence of said part. Apple Mail managed to claim the message was signed in one of these scenarios, and Evolution took the cake by always agreeing that the message was signed [7]. It didn't even bother questioning the signature if the certificate's identity disagreed with the easily-spoofable From address. I was actually surprised by how well people did in my tests—I expected far more confusion among clients, particularly since the will to maintain S/MIME has clearly been relatively low, judging by poor support for "new" features such as triple-wrapping or header protection.

Another fault of S/MIME's design is that it makes the mistaken belief that composing a signing step and an encryption step is equivalent in strength to a simultaneous sign-and-encrypt. Another page describes this in far better detail than I have room to; note that this flaw is fixed via triple-wrapping (which has relatively poor support). This creates yet more UI burden into how to adequately describe in UI all the various minutiae in differing security guarantees. Considering that users already have a hard time even understanding that just because a message says it's from example@isp.invalid doesn't actually mean it's from example@isp.invalid, trying to develop UI that both adequately expresses the security issues and is understandable to end-users is an extreme challenge.

What we have in S/MIME (and PGP) is a system that allows for strong guarantees, if certain conditions are met, yet is also vulnerable to breaches of security if the message handling subsystems are poorly designed. Hopefully this is a sufficient guide to the technical impacts of secure email in the email world. My next post will discuss the most critical component of secure email: the trust model. After that, I will discuss why secure email has seen poor uptake and other relevant concerns on the future of email security.

[1] This is a bit of a lie: a channel that does secrecy and authentication at different times isn't as secure as one that does them at the same time.
[2] It is worth noting that authenticity is, in many respects, necessary to achieve secrecy.
[3] This, too, is a bit of a lie. More on this in a subsequent post.
[4] I'm very aware that S/MIME and PGP use radically different trust models. Trust models will be covered later.
[5] S/MIME 3.0 did add a provision stating that if the signed/encrypted part is a message/rfc822 part, the headers of that part should override the outer message's headers. However, I am not aware of a major email client that actually handles these kind of messages gracefully.
[6] Actually, I tested Windows Live Mail instead of Outlook, but given the presence of an official MIME-to-Microsoft's-internal-message-format document which seems to agree with what Windows Live Mail was doing, I figure their output would be identical.
[7] On a more careful examination after the fact, it appears that Evolution may have tried to indicate signedness on a part-by-part basis, but the UI was sufficiently confusing that ordinary users are going to be easily confused.

May 27, 2014 12:32 AM

May 24, 2014

Daniel Le Duc Khoi Nguyen (Libras2909)

New blog on GitHub!

I decided to start blogging again at my new blog (URL above). I will no longer maintain this blog. Thank you everyone who has stopped by/commented. Hope to see you all at my new blog!

May 24, 2014 09:06 AM

May 06, 2014

Guillermo López (willyaranda)

Sherpa Summit 2014

Unas fotos con clase.

Unas fotos con clase.

El martes 6 de mayo de 2014 estuve en el Sherpa Summit, un evento organizado por la gente detrás de la aplicación (un “asistente personal”), representando a Mozilla, junto con Osoitz y Patxi, de Librezale.

En el stand, accesible por todo el mundo desde las 9:30 de la mañana hasta las 18 de la tarde, pasaron decenas de personas preguntando por los terminales Firefox OS que tenemos disponibles en nuestra “exposición”: ZTE Open C, Alcatel OneTouch, LG Fireweb, Geeksphone Keon y Geeksphone Peak. Nos faltó una tablet para mostrar a todo el mundo otro tamaño de pantalla y una homescreen un poco diferente. Pegatinas, chapas y alguna que otra calcamonía fueron entregadas a los fieles devotos del panda (zorro) rojo que se acercaron.

Stand. Firefoxes ready to rock.

A las 10:15 de la mañana tuve una charla enmarcada en el track de Apps móviles, a pesar de que quedó bastante enfocada sobre HTML5, mostré cómo crear una pequeña aplicación para Firefox OS en menos de un minuto, seleccionando la web de RTVE (¡gracias a Salva!) para móviles, la cual es muy sencilla de guardar (Control-S, como index.html y crear un minimanifest, el resto “just works”) y tunear para que se vea bien.

Sherpa 2014, you

27% of HTML5 apps using Wrappers.

Finalmente, terminamos a las 6 de la tarde, dando las gracias a la organización por dejarnos venir un año más (¡junto con la posibilidad de dar una charla!) y a Francisco Picolini por su siempre ayuda en estos eventos, aunque esta vez se quedara en Madrid organizando cosas más importantes.

Laster arte!


You care about HTML5 apps.

May 06, 2014 06:25 PM

April 14, 2014

Mark Surman (surman)

Mozilla is all of us

Ten years ago, a scrappy group of ten Mozilla staff, and thousands of volunteer Mozillians, broke up Microsoft’s monopoly on accessing the web with the release of Firefox 1.0. No single mastermind can claim credit for this achievement. Instead, it was a wildly diverse and global community brought together through their shared commitment to a singular goal: to protect and build the open web. They achieved something that seemed impossible. That’s what Mozillians can do when we’re at our best.

Over the last few years, we’ve taken on another huge challenge: building a smartphone incorporating the technology and values of the open web. In a few short years, we’ve taken Boot to Gecko, an idea for an open source operating system for mobile, all the way to the release of Firefox OS phones in 15+ countries. It was thousands of Mozillians — coders, localizers, partners, evangelists and others — that made this journey possible. These Mozillians, and the many more who will join us, will play a key role in achieving the audacious goal of putting the full power and potential of the web into the hands of the next two billion people who come online.

Over the last few weeks, the media and critics have jumped to the conclusion that our CEO defines who Mozilla is. But, that’s not the reality.

The reality is this: Mozilla is all of us. We are not one or two leaders, and we never have been. Mozilla is a global community of people building tools for a free and open web that we can’t build anywhere else. We’re people solving the tough problems on the web that most need solving. Mozilla is all of us taking action every day, wherever we are. Building. Teaching. Empowering. We all define who Mozilla is together. It’s the things we choose to build and teach and do every day that add up to ‘Mozilla’.

While hard, the past few weeks have been a reminder of that.  The attention, boycotts, ire from across the political spectrum, and departure of an original founder like Brendan would have devastated most companies, leaving them wounded and floundering with their leadership gone. But, Mozilla is not like most companies. Instead, we’re a global community that rolls up our sleeves to work on a common cause, not a company with single leader. Mozilla is all of us. As Mozillians, we need to remember this. And live it.

That’s one of the reasons I’m happy Chris Beard agreed to step in as interim CEO at the Mozilla Corporation today. Certainly, he knows technology and products, having played a key role in everything from the early success of Firefox to unveiling Firefox OS at the Mobile World Congress. But, more importantly right now, Chris is one of the best leaders I know at gathering people around Mozilla in a way that lets them have impact.

Just one example of where Chris has done this: the famous Firefox 1.0 ad in the New York Times.

Firefox 1.0 New York Times Ad

The notable thing about this ad is not its size or reach, but that Mozilla neither placed nor even paid for it. The ad was a grassroots effort, dreamed up and paid for by roughly 10,000 people who’d been using Firefox in beta and wanted the world to know that there was a real choice in how people could access the web. Chris was running marketing for Mozilla at the time. As he saw community momentum growing around the idea, he jumped in to help, bringing in more resources to make sure the ad actually made it into the Times. He did what Mozilla leaders do at their best: empower Mozillians to take concrete action to move our cause forward.

Mozilla has a tremendous amount of momentum right now. We’ve just shipped Firefox OS in 15 countries and released a $25 open source smartphone that will bring the web to tens of millions of people for the first time. We’re about to unleash the next round of events for our grassroots Maker Party campaign, which will bring in thousands of new volunteers and teach people around the world about how the web works. And we’re becoming a bigger — and more necessary — voice for trust and for privacy on the web at time when online security is facing unprecedented threats. The things we are all working on together are exciting, and they’re important.

In all honesty, the past few weeks have taken their toll. But, as they say, never waste a good crisis. We’re already seizing the opportunity to become even better and stronger than we were a month ago.
This starts with reminding ourselves that Mozilla is at its best when we all see ourselves as leaders, when we all bring our passion and our talent full bore to building Mozilla every single day. Chris has a role in making this happen. So do people like Mitchell and me. The members of our boards play a role, too. But, it is only when all of us roll up our sleeves to lead, act and inspire that we unlock the full potential of Mozilla. That is what we need to do right now.


Filed under: mozilla, openweb, webmakers

April 14, 2014 07:33 PM

April 06, 2014

Boris Zbarsky (bz)

Speech and consequences

I've seen the phrase "freedom of speech does not mean freedom from consequences" a lot recently. This is clearly true. However, it's also clearly true that freedom of speech does in fact mean freedom from some consequences. As a simple example, the First Amendment to the US Constitution and its associated jurisprudence is all about delineating some consequences one must be free of for speech to be considered free.

The question then becomes this: which consequences should one be free of when speaking? I am not a lawyer, and this is not a legal analysis (though some of these consequences are pretty clearly illegal in their own right, though not readily actionable if performed anonymously), but rather a moral one. I would consider at least the following consequences that involve private action only as unacceptable restraints on freedom of speech:

  1. Physical violence against the speaker or their family, friends, or associates.
  2. Threats of such physical violence. This most definitely includes death threats.
  3. Destruction of or damage to the property of the speaker or their family, friends, or associates.
  4. Harassment (bullhorns in the night, incessant phone calls, etc) of the family, friends, or associates of the speaker. I don't feel as absolutely about the speaker him/herself, because the definition of "harassment" is rather vague. While the above examples with bullhorns and phones seem morally repugnant to me as applied to the speaker, there may be other things that I consider harassment but others consider OK as a response to a speaker. There's a lot more gray area here than in items 1-3.

This is not meant to be an exhaustive list; these are the things that came to mind off the top of my head.

It's clear to me that a large number of people out there disagree with me at least about item 2 and item 4 of the list above in practice. They may or may not perform such actions themselves, but they will certainly excuse such actions on the part of others by claiming that freedom of speech does not mean freedom from consequences. For these particular consequences, I do not accept that argument, and I sincerely hope the people involved are simply unaware of the actions they're excusing, instead of actively believing that the consequences listed above are compatible with the exercise of free speech.

April 06, 2014 09:39 PM

April 05, 2014

Joshua Cranmer (jcranmer)

Announcing jsmime 0.2

Previously, I've been developing JSMime as a subdirectory within comm-central. However, after discussions with other developers, I have moved the official repository of record for JSMime to its own repository, now found on GitHub. The repository has been cleaned up and the feature set for version 0.2 has been selected, so that the current tip on JSMime (also the initial version) is version 0.2. This contains the feature set I imported into Thunderbird's source code last night, which is to say support for parsing MIME messages into the MIME tree, as well as support for parsing and encoding email address headers.

Thunderbird doesn't actually use the new code quite yet (as my current tree is stuck on a mozilla-central build error, so I haven't had time to run those patches through a last minute sanity check before requesting review), but the intent is to replace the current C++ implementations of nsIMsgHeaderParser and nsIMimeConverter with JSMime instead. Once those are done, I will be moving forward with my structured header plans which more or less ought to make those interfaces obsolete.

Within JSMime itself, the pieces which I will be working on next will be rounding out the implementation of header parsing and encoding support (I have prototypes for Date headers and the infernal RFC 2231 encoding that Content-Disposition needs), as well as support for building MIME messages from their constituent parts (a feature which would be greatly appreciated in the depths of compose and import in Thunderbird). I also want to implement full IDN and EAI support, but that's hampered by the lack of a JS implementation I can use for IDN (yes, there's punycode.js, but that doesn't do StringPrep). The important task of converting the MIME tree to a list of body parts and attachments is something I do want to work on as well, but I've vacillated on the implementation here several times and I'm not sure I've found one I like yet.

JSMime, as its name implies, tries to work in as pure JS as possible, augmented with several web APIs as necessary (such as TextDecoder for charset decoding). I'm using ES6 as the base here, because it gives me several features I consider invaluable for implementing JavaScript: Promises, Map, generators, let. This means it can run on an unprivileged web page—I test JSMime using Firefox nightlies and the Firefox debugger where necessary. Unfortunately, it only really works in Firefox at the moment because V8 doesn't support many ES6 features yet (such as destructuring, which is annoying but simple enough to work around, or Map iteration, which is completely necessary for the code). I'm not opposed to changing it to make it work on Node.js or Chrome, but I don't realistically have the time to spend doing it myself; if someone else has the time, please feel free to contact me or send patches.

April 05, 2014 05:18 PM

April 04, 2014

Mark Surman (surman)

Mozilla is human

A few days ago I wrote: Mozilla is messy. For better and for worse, the week’s events showed how true that is.

Looking back at the past week, this also comes to mind: Mozilla is human. In all the best and worst ways. With all the struggle and all the inspiration. Mozilla is very very human.

On the inspiration part, I need to say: Brendan Eich is one of the most inspiring humans that I have ever met. He is a true hero for many of us. He invented a programming language that is the heart and soul of the most open communications system the world has ever known. He led a band of brilliant engineers and activists who freed the internet from the grip of Microsoft. And, one-on-one, in his odd and brilliant ways, he helped and advised so many of us as we put our own hearts and souls into building Mozilla and building the web. I was truly excited to see Brendan step into the role of CEO two weeks ago. And, today, I am equally sad.

It’s important to remember that all heroes are also human. They struggle. And they often have flaws. Brendan’s biggest flaw, IMHO, was his inability to connect and empathize with people. I’ve seen and felt that over the years, finding Bredan brilliant, but distant. And you certainly saw it this past week, as many calm and reasonable people said “Brendan, I want you to lead Mozilla. But I also want you to feel my pain.” Brendan didn’t need to change his mind on Proposition 8 to get out of the crisis of the past week. He simply needed to project and communicate empathy. His failure to do so proved to be his fatal flaw as CEO.

I would argue that Mozilla is filled with heroes. Thousands of them. All of them very human, just like Brendan. In the past week, I’ve spent every waking hour with these heroes. And I have watched them struggle. I’ve watched Mitchell struggle with how to protect the soul and spirit of a global community that is filled with passion, dreams, tensions and contradictions. I’ve watched the boards struggle with how to govern something that is at once a global social movement, a valuable consumer brand and a company based in the State of California. I have watched dozens and dozens of Mozillians reflect — and sometimes lash out — as they struggle to figure out what it means to be an individual contributor or leader inside this complex organism. And I myself have struggled with how to help Mozillians sort through all this complexity and messiness. Being human is messy. That is Mozilla.

As I look at the world’s reaction to all this, I want to clarify two things:

1. Brendan Eich was not fired. He struggled to connect and empathize with people who both respect him and felt hurt. He also got beat up. We all tried to protect him and help him get around these challenges until the very last hours. But, ultimately, I think Brendan found it impossible to lead under these circumstances. It was his choice to step down. And, frankly, I don’t blame him. I would have done the same.

2. This story is actually not about Brendan Eich. Of course, the critics and the media have made this a story about Brendan and his beliefs. But, as someone intimately involved in the evens of the past week, I would say in earnest: this is a story about Mozilla finding its soul and its spirit again. Over the past three years, we’ve become better at being A Company. I would argue we’ve also become worse at being Mozilla. We’ve become worse at caring for each other. Worse at holding the space for difference. Worse at working in the open. And worse at creating the space where we all can lead. These are the things that make Mozilla Mozilla. And they are the things we did not have enough of to properly find our way out of the crisis of the past two weeks.

Before getting into this kerfuffle, we were working on the right things. We were building a phone that will truly bring the web into the hands and the pockets of the world. We were teaching the world how the web works. And we were standing up against those who want to break the web or turn into a way to watch what each of us do every day. Those are still the things we need to be doing. And we need to start doing them again on Monday.

What we also need to do is start a process of rebirth and renewal. We need to find our soul and our spirit. The good news: Mozillians know how to do this. We know how to make a phoenix rise from the ashes. That is what we must now do.

Filed under: mozilla, webmakers

April 04, 2014 04:52 PM

April 03, 2014

Joshua Cranmer (jcranmer)

If you want fast code, don't use assembly

…unless you're an expert at assembly, that is. The title of this post was obviously meant to be an attention-grabber, but it is much truer than you might think: poorly-written assembly code will probably be slower than an optimizing compiler on well-written code (note that you may need to help the compiler along for things like vectorization). Now why is this?

Modern microarchitectures are incredibly complex. A modern x86 processor will be superscalar and use some form of compilation to microcode to do that. Desktop processors will undoubtedly have multiple instruction issues per cycle, forms of register renaming, branch predictors, etc. Minor changes—a misaligned instruction stream, a poor order of instructions, a bad instruction choice—could kill the ability to take advantages of these features. There are very few people who could accurately predict the performance of a given assembly stream (I myself wouldn't attempt it if the architecture can take advantage of ILP), and these people are disproportionately likely to be working on compiler optimizations. So unless you're knowledgeable enough about assembly to help work on a compiler, you probably shouldn't be hand-coding assembly to make code faster.

To give an example to elucidate this point (and the motivation for this blog post in the first place), I was given a link to an implementation of the N-queens problem in assembly. For various reasons, I decided to use this to start building a fine-grained performance measurement system. This system uses a high-resolution monotonic clock on Linux and runs the function 1000 times to warm up caches and counters and then runs the function 1000 more times, measuring each run independently and reporting the average runtime at the end. This is a single execution of the system; 20 executions of the system were used as the baseline for a t-test to determine statistical significance as well as visual estimation of normality of data. Since the runs observed about a constant 1-2 μs of noise, I ran all of my numbers on the 10-queens problem to better separate the data (total runtimes ended up being in the range of 200-300μs at this level). When I say that some versions are faster, the p-values for individual measurements are on the order of 10-20—meaning that there is a 1-in-100,000,000,000,000,000,000 chance that the observed speedups could be produced if the programs take the same amount of time to run.

The initial assembly version of the program took about 288μs to run. The first C++ version I coded, originating from the same genesis algorithm that the author of the assembly version used, ran in 275μs. A recursive program beat out a hand-written assembly block of code... and when I manually converted the recursive program into a single loop, the runtime improved to 249μs. It wasn't until I got rid of all of the assembly in the original code that I could get the program to beat the derecursified code (at 244μs)—so it's not the vectorization that's causing the code to be slow. Intrigued, I started to analyze why the original assembly was so slow.

It turns out that there are three main things that I think cause the slow speed of the original code. The first one is alignment of branches: the assembly code contains no instructions to align basic blocks on particular branches, whereas gcc happily emits these for some basic blocks. I mention this first as it is mere conjecture; I never made an attempt to measure the effects for myself. The other two causes are directly measured from observing runtime changes as I slowly replaced the assembly with code. When I replaced the use of push and pop instructions with a global static array, the runtime improved dramatically. This suggests that the alignment of the stack could be to blame (although the stack is still 8-byte aligned when I checked via gdb), which just goes to show you how much alignments really do matter in code.

The final, and by far most dramatic, effect I saw involves the use of three assembly instructions: bsf (find the index of the lowest bit that is set), btc (clear a specific bit index), and shl (left shift). When I replaced the use of these instructions with a more complicated expression int bit = x & -x and x = x - bit, the program's speed improved dramatically. And the rationale for why the speed improved won't be found in latency tables, although those will tell you that bsf is not a 1-cycle operation. Rather, it's in minutiae that's not immediately obvious.

The original program used the fact that bsf sets the zero flag if the input register is 0 as the condition to do the backtracking; the converted code just checked if the value was 0 (using a simple test instruction). The compare and the jump instructions are basically converted into a single instruction in the processor. In contrast, the bsf does not get to do this; combined with the lower latency of the instruction intrinsically, it means that empty loops take a lot longer to do nothing. The use of an 8-bit shift value is also interesting, as there is a rather severe penalty for using 8-bit registers in Intel processors as far as I can see.

Now, this isn't to say that the compiler will always produce the best code by itself. My final code wasn't above using x86 intrinsics for the vector instructions. Replacing the _mm_andnot_si128 intrinsic with an actual and-not on vectors caused gcc to use other, slower instructions instead of the vmovq to move the result out of the SSE registers for reasons I don't particularly want to track down. The use of the _mm_blend_epi16 and _mm_srli_si128 intrinsics can probably be replaced with __builtin_shuffle instead for more portability, but I was under the misapprehension that this was a clang-only intrinsic when I first played with the code so I never bothered to try that, and this code has passed out of my memory long enough that I don't want to try to mess with it now.

In short, compilers know things about optimizing for modern architectures that many general programmers don't. Compilers may have issues with autovectorization, but the existence of vector intrinsics allow you to force compilers to use vectorization while still giving them leeway to make decisions about instruction scheduling or code alignment which are easy to screw up in hand-written assembly. Also, compilers are liable to get better in the future, whereas hand-written assembly code is unlikely to get faster in the future. So only write assembly code if you really know what you're doing and you know you're better than the compiler.

April 03, 2014 04:52 PM

March 30, 2014

Mark Surman (surman)

Mozilla is messy

I was a anarchist, lefty, peace movementy punk teenager. I spent my 20s making documentaries with the environmental collective. And the feminist collective. And whoever else I could teach to use a video camera. During my 30s I co-founded Canada’s most popular  left wing news web site, I’ve spent all my life being active and public about the causes I believe in.

Needless to say, people have asked how I could work side by side with people like Brendan Eich who seem to have such different beliefs and politics.

For the record: I don’t like the fact that Brendan supported Proposition 8, and I stand strongly for gay marriage. And, while I don’t actually know what Brendan’s politics are, as he is normally quite private about them, I have always assumed they are very different from mine on a wide range of issues.

But Brendan and I very aligned on something very political: defending the free and open internet. This is one of the most important issues of  our day. And it is what Brendan and I — and countless other Mozillians around — the world are working on together.

We live in a great moment in history, where more people than ever before can express themselves wildly and creatively, thanks to the web. As a punk rock kid, it’s more than I could have ever dreamed of. But we also live in a world where Big Brother is among  us and around us. There are governments and companies using the web to  watch us and control us. That’s happening more and more. As billions more people roll onto the web in the next few years, we really are at a critical crossroads for humanity: we have to decide if the web is about freedom or control.

What’s amazing is Mozillians, including me, work side by side with people who have very different beliefs — even beliefs that upset them — because protecting the web matters so much right now. As we do, differing beliefs on issues other than the web almost never enter the fray. In Brendan’s case, I didn’t even know he had taken a position on Prop 8 until it became the subject of public attention. And, as someone with public views that are likely different than his, I’ve never experienced him as anything other than a supportive and hardworking colleague. This ability to set aside differing and diverse beliefs to focus on a common cause is something we as Mozilla stand for on principle. And, in a way that I have never seen in any other organization, this works at Mozilla. It makes us stronger.

But Mozilla is messy. Our ability to set aside differences does not mean that everything is simple or that we’re always civil. In fact, when the topic is the web or Mozilla itself, we quite often get into open, heated and, for the most part, thoughtful debate. Many people outside Mozilla may not understand this. But, again, it makes us stronger.

The community dialogue around Brendan’s appointment is an example of this. People have reflected on the tension feeling of being emotionally effected by Brendan’s donation while at the same time experiencing Mozilla as a supportive and safe space for all sorts of people to work and protect the web. People have said personal opinions and donations are not their business. [corrected link] People have said that a CEO has to be held to a different standard as they are a public figure. People have talked about the tension between inclusion and free speech. And people have talked about forgiveness, the benefit of the doubt and picking your battles. All of this in public on blogs and twitter. All of this, in my opinion, making us stronger as we work through the complex questions at hand.

Which brings me to a worry I have. And an ask.

I worry that Mozilla is in a tough spot right now. I worry that we do a bad job of explaining ourselves, that people are angry and don’t know who we are or where we stand. And, I worry that in the time it takes to work this through and explain ourselves the things I love about Mozilla will be deeply damaged. And I suspect others do to.

If you are a Mozillian, I ask that you help the people around you understand who we are. And, if you have supported Mozilla in the past are frustrated or angry with us, I ask you for kindness and patience.

What Mozilla is about is working through these things, even when they’re hard. Because the web need us to. It’s that important.

Filed under: mozilla, openweb, personal, webmakers

March 30, 2014 11:44 AM

March 24, 2014

Mark Surman (surman)

Excited about Eich

I’m excited that Brendan Eich is Mozilla’s CEOBrendan knows what’s important right now: building the values of the web into mobile and into the cloud at a massive scale. This vision is key to our success. But Brendan also offers something else: a real example of how we can each roll up our sleeves to tackle the hard, messy problems that we need to solve if we want to make this vision into a reality.

Brendan Graffiti

When I first came to Mozilla, I was a little starstruck. Here was an organization that had rallied a global community to build Firefox and beat Microsoft. An organization that had made open source — and many of the ideas behind it — mainstream. An organization filled with internet rock stars like Brendan. People who have changed the world, for real.

What I realized quickly: Mozillians are just everyday people. What makes them special is their particular ideas about how to make things happen in the world. Ideas like: build your vision and values into products that lots of people will want to use. And, find leverage, create standards and nurture relationships that help these products shape whole industries. And, do these things by empowering people and inviting them to help you out. Mozilla has won in the past because its people had been smart, tenacious and committed to this particular brand of poetry and pragmatism.

Working with Brendan over the years, I saw someone who everyday personified this Mozilla way of working. Think about Javascript as an example. Most of you probably know that Brendan invented Javascript. But what you might not know is how hard he worked worked internally at Netscape and Mozilla to put Javascript into products that millions of people would use. Or how he collaborated with competitors like Microsoft and Google. Or how he’s worked on standards. Or how much time he’s taken to speak to and inspire developers at events on every corner of the planet. Code matters in the history of Javascript. But so does leadership, business and alliance building. Brendan worked on all of these things over the years, turning Javascript from an idea into a programming language that is used widely and freely everywhere around the world to build the web.

In my opinion, this way of working is something we need more of at Mozilla right now. See the big picture. Roll up our sleeves. Pick the right battles. Make the right compromises. Figure out all the different kinds of things we need to do to win. And help each other out to get these things done.

I truly think Brendan can help us all do more of this right now. He’s already started with Firefox OS, not only leading us to build a compelling product in a few short years but also playing an active role in helping carriers and device manufacturers become partners contributing to Mozilla’s cause. Like always, Brendan has helped us build not just a product but also platforms and relationships that have the potential to shift the whole playing field. As we tackle everything from cloud services to building up new revenue lines to teaching 100s millions of people about the web, I believe Brendan can help each be smart, tenacious, practical and open as we work on solving our particular piece of the open web puzzle.

I’m feeling hopeful and inspired today. I’m looking forward to working closely with Brendan as Mozilla’s new CEO. And to rolling up my sleeves even further to build the web we’re dreaming of. I hope you want to do the same.

Filed under: mozilla, opensource, webmakers

March 24, 2014 06:00 PM

March 21, 2014

Benjamin Smedberg (bsmedberg)

Using Software Copyright To Benefit the Public

Imagine a world where copyright on a piece of software benefits the world even after it expires. A world where eventually all software becomes Free Software.

The purpose of copyright is “To promote the Progress of Science and useful Arts”. The law gives a person the right to profit from their creation for a while, after which everyone gets to profit from it freely. In general, this works for books, music, and other creative works. The current term of copyright is far too long, but at least once the term is up, the whole world gets to read and love Shakespeare or Walter de la Mare equally.

The same is not true of software. In order to be useful, software has to run. Imagine the great commercial software of the past decade: Excel, Photoshop, Pagemaker. Even after copyright expires on Microsoft Excel 95 (in 2090!), nobody will be able to run it! Hardware that can run Windows 95 will not be available, and our only hope of running the software is to emulate the machines and operating systems of a century ago. There will be no opportunity to fix or improve the software.

What should we reasonably require from commercial software producers in exchange for giving them copyright protection?

The code.

In order to get any copyright protection at all, publishers should be required to make the source code available. This can either happen immediately at release, or by putting the code into escrow until copyright expires. This needs to include everything required to build the program and make it run, but since the same copyright rules would apply to operating systems and compilers, it ought to all just work.

The copyright term for software also needs to be rethought. The goal when setting a copyright term should be to balance the competing desires of giving a software author time to make money by selling software, with the natural rights of people to share ideas and use and modify their own tools.

With a term of 14 years, the following software would be leaving copyright protection around now:

A short copyright term is an incentive to software developers to constantly improve their software, and make the new versions of their software more valuable than older versions which are entering the public domain. It also opens the possibility for other companies to support old software even after the original author has decided that it isn’t worthwhile.

The European Union is currently holding a public consultation to review their copyright laws, and I’ve encouraged Mozilla to propose source availability and a shorter copyright term for software in our official contribution/proposal to that process. Maybe eventually the U.S. Congress could be persuaded to make such significant changes to copyright law, although recent history and powerful money and lobbyists make that difficult to imagine.

Commercial copyrighted software has done great things, and there will continue to be an important place in the world for it. Instead of treating the four freedoms as ethical absolutes and treating non-Free software as a “social problem”, let’s use copyright law to, after a period of time, make all software Free Software.

March 21, 2014 05:43 PM

March 14, 2014

Joshua Cranmer (jcranmer)

Understanding email charsets

Several years ago, I embarked on a project to collect the headers of all the messages I could reach on NNTP, with the original intent of studying the progression of the most common news clients. More recently, I used this dataset to attempt to discover the prevalence of charsets in email messages. In doing so, I identified a critical problem with the dataset: since it only contains headers, there is very little scope for actually understanding the full, sad story of charsets. So I've decided to rectify this problem.

This time, I modified my data-collection scripts to make it much easier to mass-download NNTP messages. The first script effectively lists all the newsgroups, and then all the message IDs in those newsgroups, stuffing the results in a set to remove duplicates (cross-posts). The second script uses Python's nntplib package to attempt to download all of those messages. Of the 32,598,261 messages identified by the first set, I succeeded in obtaining 1,025,586 messages in full or in part. Some messages failed to download due to crashing nntplib (which appears to be unable to handle messages of unbounded length), and I suspect my newsserver connections may have just timed out in the middle of the download at times. Others failed due to expiring before I could download them. All in all, 19,288 messages were not downloaded.

Analysis of the contents of messages were hampered due to a strong desire to find techniques that could mangle messages as little as possible. Prior experience with Python's message-parsing libraries lend me to believe that they are rather poor at handling some of the crap that comes into existence, and the errors in nntplib suggest they haven't fixed them yet. The only message parsing framework I truly trust to give me the level of finess is the JSMime that I'm writing, but that happens to be in the wrong language for this project. After reading some blog posts of Jeffrey Stedfast, though, I decided I would give GMime a try instead of trying to rewrite ad-hoc MIME parser #N.

Ultimately, I wrote a program to investigate the following questions on how messages operate in practice:

While those were the questions I seeked the answers to originally, I did come up with others as I worked on my tool, some in part due to what information I was basically already collecting. The tool I wrote primarily uses GMime to convert the body parts to 8-bit text (no charset conversion), as well as parse the Content-Type headers, which are really annoying to do without writing a full parser. I used ICU to handle charset conversion and detection. RFC 2047 decoding is done largely by hand since I needed very specific information that I couldn't convince GMime to give me. All code that I used is available upon request; the exact dataset is harder to transport, given that it is some 5.6GiB of data.

Other than GMime being built on GObject and exposing a C API, I can't complain much, although I didn't try to use it to do magic. Then again, in my experience (and as this post will probably convince you as well), you really want your MIME library to do charset magic for you, so in doing well for my needs, it's actually not doing well for a larger audience. ICU's C API similarly makes me want to complain. However, I'm now very suspect of the quality of its charset detection code, which is the main reason I used it. Trying to figure out how to get it to handle the charset decoding errors also proved far more annoying than it really should.

Some final background regards the biases I expect to crop up in the dataset. As the approximately 1 million messages were drawn from the python set iterator, I suspect that there's no systematic bias towards or away from specific groups, excepting that the ~11K messages found in the eternal-september.* hierarchy are completely represented. The newsserver I used, Eternal September, has a respectably large set of newsgroups, although it is likely to be biased towards European languages and under-representing East Asians. The less well-connected South America, Africa, or central Asia are going to be almost completely unrepresented. The download process will be biased away towards particularly heinous messages (such as exceedingly long lines), since nntplib itself is failing.

This being news messages, I also expect that use of 8-bit will be far more common than would be the case in regular mail messages. On a related note, the use of 8-bit in headers would be commensurately elevated compared to normal email. What would be far less common is HTML. I also expect that undeclared charsets may be slightly higher.


Charset data is mostly collected on the basis of individual body parts within body messages; some messages have more than one. Interestingly enough, the 1,025,587 messages yielded 1,016,765 body parts with some text data, which indicates that either the messages on the server had only headers in the first place or the download process somehow managed to only grab the headers. There were also 393 messages that I identified having parts with different charsets, which only further illustrates how annoying charsets are in messages.

The aliases in charsets are mostly uninteresting in variance, except for the various labels used for US-ASCII (us - ascii, 646, and ANSI_X3.4-1968 are the less-well-known aliases), as well as the list of charsets whose names ICU was incapable of recognizing, given below. Unknown charsets are treated as equivalent to undeclared charsets in further processing, as there were too few to merit separate handling (45 in all).

For the next step, I used ICU to attempt to detect the actual charset of the body parts. ICU's charset detector doesn't support the full gamut of charsets, though, so charset names not claimed to be detected were instead processed by checking if they decoded without error. Before using this detection, I detect if the text is pure ASCII (excluding control characters, to enable charsets like ISO-2022-JP, and +, if the charset we're trying to check is UTF-7). ICU has a mode which ignores all text in things that look like HTML tags, and this mode is set for all HTML body parts.

I don't quite believe ICU's charset detection results, so I've collapsed the results into a simpler table to capture the most salient feature. The correct column indicates the cases where the detected result was the declared charset. The ASCII column captures the fraction which were pure ASCII. The UTF-8 column indicates if ICU reported that the text was UTF-8 (it always seems to try this first). The Wrong C1 column refers to an ISO-8859-1 text being detected as windows-1252 or vice versa, which is set by ICU if it sees or doesn't see an octet in the appropriate range. The other column refers to all other cases, including invalid cases for charsets not supported by ICU.

DeclaredCorrectASCIIUTF-8 Wrong C1OtherTotal

The most obvious thing shown in this table is that the most common charsets remain ISO-8859-1, Windows-1252, US-ASCII, UTF-8, and ISO-8859-15, which is to be expected, given an expected prior bias to European languages in newsgroups. The low prevalence of ISO-2022-JP is surprising to me: it means a lower incidence of Japanese than I would have expected. Either that, or Japanese have switched to UTF-8 en masse, which I consider very unlikely given that Japanese have tended to resist the trend towards UTF-8 the most.

Beyond that, this dataset has caused me to lose trust in the ICU charset detectors. KOI8-R is recorded as being 18% malformed text, with most of that ICU believing to be ISO-8859-1 instead. Judging from the results, it appears that ICU has a bias towards guessing ISO-8859-1, which means I don't believe the numbers in the Other column to be accurate at all. For some reason, I don't appear to have decoders for ISO-8859-16 or x-mac-croatian on my local machine, but running some tests by hand appear to indicate that they are valid and not incorrect.

Somewhere between 0.1% and 1.0% of all messages are subject to mojibake, depending on how much you trust the charset detector. The cases of UTF-8 being misdetected as non-UTF-8 could potentially be explained by having very few non-ASCII sequences (ICU requires four valid sequences before it confidently declares text UTF-8); someone who writes a post in English but has a non-ASCII signature (such as myself) could easily fall into this category. Despite this, however, it does suggest that there is enough mojibake around that users need to be able to override charset decisions.

The undeclared charsets are described, in descending order of popularity, by ISO-8859-1, Windows-1252, KOI8-R, ISO-8859-2, and UTF-8, describing 99% of all non-ASCII undeclared data. ISO-8859-1 and Windows-1252 are probably over-counted here, but the interesting tidbit is that KOI8-R is used half as much undeclared as it is declared, and I suspect it may be undercounted. The practice of using locale-default fallbacks that Thunderbird has been using appears to be the best way forward for now, although UTF-8 is growing enough in popularity that using a specialized detector that decodes as UTF-8 if possible may be worth investigating (3% of all non-ASCII, undeclared messages are UTF-8).


Unsuprisingly (considering I'm polling newsgroups), very few messages contained any HTML parts at all: there were only 1,032 parts in the total sample size, of which only 552 had non-ASCII characters and were therefore useful for the rest of this analysis. This means that I'm skeptical of generalizing the results of this to email in general, but I'll still summarize the findings.

HTML, unlike plain text, contains a mechanism to explicitly identify the charset of a message. The official algorithm for determining the charset of an HTML file can be described simply as "look for a <meta> tag in the first 1024 bytes. If it can be found, attempt to extract a charset using one of several different techniques depending on what's present or not." Since doing this fully properly is complicated in library-less C++ code, I opted to look first for a <meta[ \t\r\n\f] production, guess the extent of the tag, and try to find a charset= string somewhere in that tag. This appears to be an approach which is more reflective of how this parsing is actually done in email clients than the proper HTML algorithm. One difference is that my regular expressions also support the newer <meta charset="UTF-8"/> construct, although I don't appear to see any use of this.

I found only 332 parts where the HTML declared a charset. Only 22 parts had a case where both a MIME charset and an HTML charset and the two disagreed with each other. I neglected to count how many messages had HTML charsets but no MIME charsets, but random sampling appeared to indicate that this is very rare on the data set (the same order of magnitude or less as those where they disagreed).

As for the question of who wins: of the 552 non-ASCII HTML parts, only 71 messages did not have the MIME type be the valid charset. Then again, 71 messages did not have the HTML type be valid either, which strongly suggests that ICU was detecting the incorrect charset. Judging from manual inspection of such messages, it appears that the MIME charset ought to be preferred if it exists. There are also a large number of HTML charset specifications saying unicode, which ICU treats as UTF-16, which is most certainly wrong.


In the data set, 1,025,856 header blocks were processed for the following statistics. This is slightly more than the number of messages since the headers of contained message/rfc822 parts were also processed. The good news is that 97% (996,103) headers were completely ASCII. Of the remaining 29,753 headers, 3.6% (1,058) were UTF-8 and 43.6% (12,965) matched the declared charset of the first body part. This leaves 52.9% (15,730) that did not match that charset, however.

Now, NNTP messages can generally be expected to have a higher 8-bit header ratio, so this is probably exaggerating the setup in most email messages. That said, the high incidence is definitely an indicator that even non-EAI-aware clients and servers cannot blindly presume that headers are 7-bit, nor can EAI-aware clients and servers presume that 8-bit headers are UTF-8. The high incidence of mismatching the declared charset suggests that fallback-charset decoding of headers is a necessary step.

RFC 2047 encoded-words is also an interesting statistic to mine. I found 135,951 encoded-words in the data set, which is rather low, considering that messages can be reasonably expected to carry more than one encoded-word. This is likely an artifact of NNTP's tendency towards 8-bit instead of 7-bit communication and understates their presence in regular email.

Counting encoded-words can be difficult, since there is a mechanism to let them continue in multiple pieces. For the purposes of this count, a sequence of such words count as a single word, and I indicate the number of them that had more than one element in a sequence in the Continued column. The 2047 Violation column counts the number of sequences where decoding words individually does not yield the same result as decoding them as a whole, in violation of RFC 2047. The Only ASCII column counts those words containing nothing but ASCII symbols and where the encoding was thus (mostly) pointless. The Invalid column counts the number of sequences that had a decoder error.

CharsetCountContinued2047 ViolationOnly ASCIIInvalid

This table somewhat mirrors the distribution of regular charsets, with one major class of differences: charsets that represent non-Latin scripts (particularly Asian scripts) appear to be overdistributed compared to their corresponding use in body parts. The exception to this rule is GB2312 which is far lower than relative rankings would presume—I attribute this to people using GB2312 being more likely to use 8-bit headers instead of RFC 2047 encoding, although I don't have direct evidence.

Clearly continuations are common, which is to be relatively expected. The sad part is how few people bother to try to adhere to the specification here: out of 14,312 continuations in languages that could violate the specification, 23.5% of them violated the specification. The mode-shifting versions (ISO-2022-JP and EUC-KR) are basically all violated, which suggests that no one bothered to check if their encoder "returns to ASCII" at the end of the word (I know Thunderbird's does, but the other ones I checked don't appear to).

The number of invalid UTF-8 decoded words, 26.7%, seems impossibly high to me. A brief check of my code indicates that this is working incorrectly in the face of invalid continuations, which certainly exaggerates the effect but still leaves a value too high for my tastes. Of more note are the elevated counts for the East Asian charsets: Big5, GB2312, and ISO-2022-JP. I am not an expert in charsets, but I belive that Big5 and GB2312 in particular are a family of almost-but-not-quite-identical charsets and it may be that ICU is choosing the wrong candidate of each family for these instances.

There is a surprisingly large number of encoded words that encode only ASCII. When searching specifically for the ones that use the US-ASCII charset, I found that these can be divided into three categories. One set comes from a few people who apparently have an unsanitized whitespace (space and LF were the two I recall seeing) in the display name, producing encoded words like =?us-ascii?Q?=09Edward_Rosten?=. Blame 40tude Dialog here. Another set encodes some basic characters (most commonly = and ?, although a few other interpreted characters popped up). The final set of errors were double-encoded words, such as =?us-ascii?Q?=3D=3FUTF-8=3FQ=3Ff=3DC3=3DBCr=3F=3D?=, which appear to be all generated by an Emacs-based newsreader.

One interesting thing when sifting the results is finding the crap that people produce in their tools. By far the worst single instance of an RFC 2047 encoded-word that I found is this one: Subject: Re: [Kitchen Nightmares] Meow! Gordon Ramsay Is =?ISO-8859-1?B?UEgR lqZ VuIEhlYWQgVH rbGeOIFNob BJc RP2JzZXNzZW?= With My =?ISO-8859-1?B?SHVzYmFuZ JzX0JhbGxzL JfU2F5c19BbXiScw==?= Baking Company Owner (complete with embedded spaces), discovered by crashing my ad-hoc base64 decoder (due to the spaces). The interesting thing is that even after investigating the output encoding, it doesn't look like the text is actually correct ISO-8859-1... or any obvious charset for that matter.

I looked at the unknown charsets by hand. Most of them were actually empty charsets (looked like =??B?Sy4gSC4gdm9uIFLDvGRlbg==?=), and all but one of the outright empty ones were generated by KNode and really UTF-8. The other one was a Windows-1252 generated by a minor newsreader.

Another important aspect of headers is how to handle 8-bit headers. RFC 5322 blindly hopes that headers are pure ASCII, while RFC 6532 dictates that they are UTF-8. Indeed, 97% of headers are ASCII, leaving just 29,753 headers that are not. Of these, only 1,058 (3.6%) are UTF-8 per RFC 6532. Deducing which charset they are is difficult because the large amount of English text for header names and the important control values will greatly skew any charset detector, and there is too little text to give a charset detector confidence. The only metric I could easily apply was testing Thunderbird's heuristic as "the header blocks are the same charset as the message contents"—which only worked 45.2% of the time.


While developing an earlier version of my scanning program, I was intrigued to know how often various content transfer encodings were used. I found 1,028,971 parts in all (1,027,474 of which are text parts). The transfer encoding of binary did manage to sneak in, with 57 such parts. Using 8-bit text was very popular, at 381,223 samples, second only to 7-bit at 496,114 samples. Quoted-printable had 144,932 samples and base64 only 6,640 samples. Extremely interesting are the presence of 4 illegal transfer encodings in 5 messages, two of them obvious typos and the others appearing to be a client mangling header continuations into the transfer-encoding.


So, drawing from the body of this data, I would like to make the following conclusions as to using charsets in mail messages:

  1. Have a fallback charset. Undeclared charsets are extremely common, and I'm skeptical that charset detectors are going to get this stuff right, particularly since email can more naturally combine multiple languages than other bodies of text (think signatures). Thunderbird currently uses a locale-dependent fallback charset, which roughly mirrors what Firefox and I think most web browsers do.
  2. Let users override charsets when reading. On a similar token, mojibake text, while not particularly common, is common enough to make declared charsets sometimes unreliable. It's also possible that the fallback charset is wrong, so users may need to override the chosen charset.
  3. Testing is mandatory. In this set of messages, I found base64 encoded words with spaces in them, encoded words without charsets (even UNKNOWN-8BIT), and clearly invalid Content-Transfer-Encodings. Real email messages that are flagrantly in violation of basic spec requirements exist, so you should make sure that your email parser and client can handle the weirdest edge cases.
  4. Non-UTF-8, non-ASCII headers exist. EAI not withstanding, 8-bit headers are a reality. Combined with a predilection for saying ASCII when text is really ASCII, this means that there is often no good in-band information to tell you what charset is correct for headers, so you have to go back to a fallback charset.
  5. US-ASCII really means ASCII. Email clients appear to do a very good job of only emitting US-ASCII as a charset label if it's US-ASCII. The sample size is too small for me to grasp what charset 8-bit characters should imply in US-ASCII.
  6. Know your decoders. ISO-8859-1 actually means Windows-1252 in practice. Big5 and GB1232 are actually small families of charsets with slightly different meanings. ICU notably disagrees with some of these realities, so be sure to include in your tests various charset edge cases so you know that the decoders are correct.
  7. UTF-7 is still relevant. Of the charsets I found not mentioned in the WHATWG encoding spec, IBM437 and x-mac-croatian are in use only due to specific circumstances that limit their generalizable presence. IBM850 is too rare. UTF-7 is common enough that you need to actually worry about it, as abominable and evil a charset it is.
  8. HTML charsets may matter—but MIME matters more. I don't have enough data to say if charsets declared in HTML are needed to do proper decoding. I do have enough to say fairly conclusively that the MIME charset declaration is authoritative if HTML disagrees.
  9. Charsets are not languages. The entire reason x-mac-croatian is used at all can be traced to Thunderbird displaying the charset as "Croatian," despite it being pretty clearly not a preferred charset. Similarly most charsets are often enough ASCII that, say, an instance of GB2312 is a poor indicator of whether or not the message is in English. Anyone trying to filter based on charsets is doing a really, really stupid thing.
  10. RFCs reflect an ideal world, not reality. This is most notable in RFC 2047: the specification may state that encoded words are supposed to be independently decodable, but the evidence is pretty clear that more clients break this rule than uphold it.
  11. Limit the charsets you support. Just because your library lets you emit a hundred charsets doesn't mean that you should let someone try to do it. You should emit US-ASCII or UTF-8 unless you have a really compelling reason not to, and those compelling reasons don't require obscure charsets. Some particularly annoying charsets should never be written: EBCDIC is already basically dead on the web, and I'd like to see UTF-7 die as well.

When I have time, I'm planning on taking some of the more egregious or interesting messages in my dataset and packaging them into a database of emails to help create testsuites on handling messages properly.

March 14, 2014 04:17 AM

March 12, 2014

Mark Surman (surman)

Happy birthday world wide web. I love you. And want to keep you free.

As my business card says, I have an affection for the world wide the web. And, as the web turns 25 this week, I thought it only proper to say to the web ‘I love you’ and ‘I want to keep you free’.

Web heart pic

From its beginning, the web has been a force for innovation and education, reshaping the way we interact with the world around us. Interestingly, the original logo and tag line for the web was ‘let’s share what we know! — which is what billions of us have now done.

As we have gone online to connect and share, the web has revolutionized how we work, live and love: it has brought friends and families closer even when they are far away; it has decentralized once closed and top-down industries; it has empowered citizens to pursue democracy and freedom. It has become a central building block for all that we do.

Yet, on its 25 birthday, the web is at an inflection point.

Despite its positive impact, too many of us don’t understand its basic mechanics, let alone its culture or what it means to be a citizen of the web. Mobile, the platform through which the next billion users will join the web, is increasingly closed, not allowing the kind of innovation and sharing that has made the world wide web such a revolutionary force in in the first place. And, in many parts of the world, the situation is made worse by governments who censor the web or use the web surveil people at a massive scale, undermining the promise of the web as an open and trusted resource for all of humanity.

Out of crisis comes opportunity. As the Web turns 25, let us all say to the web: ‘I love you’ and ‘I want to keep you free’. Let’s take the time to reflect not just on the web we have, but on the web we want.

Mozilla believes the web needs to be both open and trusted. We believe that users should be able to control how their private information is used. And we believe that the web is not a one-way platform — it should give as all a chance makers, not just consumers. Making this web means we all need access to an open network, we all need software that is open and puts us in control and we all need to be literate in the technology and culture of the web. The Mozilla community around the world stands for all these things.

So on the web’s 25 birthday, we are joining with the Web at 25 campaign and the Web We Want campaign to enable and amplify the voice of the Internet community. We encourage you to visit to sign a birthday card for the web and visit our interactive quilt to share your vision for the type of web you want.

Happy Birthday to the web — and to all of us who are on and in it!


ps: Also, check out these birthday wishes for the web from Mozilla’s Mitchell Baker and Brendan Eich.

pps: Here’s the quilt: the web I want enables everyone around the world to be a maker. Add yourself.


Filed under: drumbeat, mozilla, openweb, webmakers

March 12, 2014 05:06 PM

March 10, 2014

Benjamin Smedberg (bsmedberg)

Use -debugexe to debug apps in Visual Studio

Many people don’t know about how awesome the windows debuggers are. I recently got a question from a volunteer mentee: he was experiencing a startup crash in Firefox and he wanted to know how to get the debugger attached to Firefox before the crash.

On other systems, I’d say to use mach debug, but that currently doesn’t do useful things on Windows. But it’s still pretty simple. You have two options:

Debug Using Your IDE

Both Visual Studio and Visual C++ Express have a command-line option for launching the IDE ready for debugging.

devenv.exe -debugexe obj-ff-debug/dist/bin/firefox.exe -profile /c/builds/test-profile -no-remote

The -debugexe flag informs the IDE to load your Firefox build with the command lines you specify. Firefox will launch with the “Go” command (F5).

For Visual C++ express edition, run WDExpress.exe instead of devenv.exe.

Debug Using Windbg

windbg is a the Windows command-line debugger. As with any command-line debugger it has an arcane debugging syntax, but it is very powerful.

Launching Firefox with windbg doesn’t require any flags at all:

windbg.exe obj-ff-debug/dist/bin/firefox.exe -profile /c/builds/test-profile -no-remote

Debugging Firefox Release Builds

You can also debug Firefox release builds on Windows! Mozilla runs a symbol server that allows you to automatically download the debugging symbols for recent prerelease builds (I think we keep 30 days of nightly/aurora symbols) and all release builds. See the Mozilla Developer Network article for detailed instructions.

Debugging official builds can be a bit confusing due to inlining, reordering, and other compiler optimizations. I often find myself looking at the disassembly view of a function rather than the source view in order to understand what exactly is going on. Also note that if you are planning on debugging a release build, you probably want to disable automatic crash reporting by setting MOZ_CRASHREPORTER_DISABLE=1 in your environment.

March 10, 2014 02:28 PM

February 27, 2014

Mark Surman (surman)

A $2.75M Challenge to Create a More Open Internet

When I first joined Mozilla I was blown away by its mix of poetry and pragmatism.

Mozilla started with a very basic question: How can we create a web that is more open? In answer to that question we witnessed the growth of a global community of Mozillians around the world who have an ingrained instinct to roll up their sleeves, learn from and teach each other, and build the solution to a problem.

Mozilla and Knight Challenge

The Mozilla community fundamentally understands that in order to affect change, people need to be makers, not just consumers.

After all, it’s this instinct to teach and build that enabled us to develop Firefox: 10 paid staff and a global army of volunteers contributed to the creation of a browser that beat out the biggest software company in the world. And we didn’t just build a better browser — we demonstrated the power of teaching. Millions of us installed Firefox on our friends’ and families’ computers—and explained what Mozilla stood for as we did this.

Fifteen years later, the world faces different problems.

In this new, post-Snowden era, it feels like everyone is talking about — and craving — a more open and trustworthy internet. The world has a better understanding of how people are being tracked online, and of the delicate balance between privacy and a world where we are all constantly connected. We live our lives online, and store everything in the cloud, yet our lives depend on technology and networks that too few users really understand.

And so, as Mozilla, we strive to answer the same fundamental questions that guided us at the beginning: how, today, can we strengthen an internet that is open and trustworthy? What is the next set of tools to enable people to protect their privacy? How best can we teach everyone basic web literacy skills like how to secure their information? Should we use our influence to ensure government and corporate policies help make the web more open and trustworthy? How do we do this work at scale, recognizing that billions of people will soon be online?

We’re trying to answer these questions in a number of ways, but we can’t do it alone. That’s why we are thrilled to join foces with the Knight Foundation and Ford Foundation on this year’s Knight News Challenge, which will fund $2.75M in grants to find and support great ideas.

In part, this new Knight Challenge aims to fund the technology, education programs and campaigns that will begin to satisfy the craving we have for a more open and trustworthy web. But, it also aims to spark a conversation around how to shape the next era of the web – an era where openness, security and innovation regain their status as elevated principles.

Mozillians bring our very practical ‘roll up our sleeves and build the future as we want to see it’ ethic and that is exactly what we need now. We’re the right ones to figure out how we can best leverage the mechanics, culture and citizenship of the web to keep it open.

So, let’s all commit to help out and participate in The Knight News Challenge. I’d love to see as many people as possible from the broader Mozilla community submit ideas for tech and propose new education programs.

Along with Knight and Ford, Mozilla is trying to build a wave of people creating the web we want. Let’s help start that wave.

This posting is also on the Knight Foundation blog.

Filed under: mozilla

February 27, 2014 09:22 PM

February 11, 2014

Benjamin Smedberg (bsmedberg)

Don’t Use Mozilla Persona to Secure High-Value Data

Mozilla Persona (formerly called Browser ID) is a login system that Mozilla has developed to make it better for users to sign in at sites without having to remember passwords. But I have seen a trend recently of people within Mozilla insisting that we should use Persona for all logins. This is a mistake: the security properties of Persona are simply not good enough to secure high-value data such as the Mozilla security bug database, user crash dumps, or other high-value information.

The chain of trust in Persona has several attack points:

The Public Key: HTTPS Fetch

When the user submits a login “assertion”, the website (Relying Party or RP) fetches the public key of the email provider (Identity Provider or IdP) using HTTPS. For instance, when I log in as, the site I’m logging into will fetch This relies on the public key and CA infrastructure of the internet. Attacking this part of the chain is hard because it’s the network connection between two servers. This doesn’t appear to be a significant risk factor to me except for perhaps some state actors.

The Public Key: Attacking the IdP HTTPS Server

Attacking the email provider’s web server, on the other hand, becomes a very high value proposition. If an attacker can replace the .well-known/browserid file on a major email provider (gmail, yahoo, etc) they have the ability to impersonate every user of that service. This puts a huge responsibility on email providers to monitor and secure their HTTPS site, which may not typically be part of their email system at all. It is likely that this kind of intrusion will cause signin problems across multiple users and will be detected, but there is no guarantee that individual users will be aware of the compromise of their accounts.

Signing: Accessing the IdP Signing System

Persona email providers can silently impersonate any of their users just by the nature of the protocol. This opens the door to silent identity attacks by anyone who can access the private key of the identity/email provider. This can either be subverting the signing server, or by using legal means such as subpoenas or national security letters. In these cases, the account compromise is almost completely undetectable by either the user or the RP.

What About Password-Reset Emails?

One common defense of Persona is that email providers already have access to users account via password-reset emails. This is partly true, but it ignores an essential property of these emails: when a password is reset, a user will be aware of the attack then next time they try to login. Being unable to login will likely trigger a cautious user to review the details of their account or ask for an audit. Attacks against the IdP, on the other hand, are silent and are not as likely to trigger alarm bells.

Who Should Use Persona?

Persona is a great system for the multitude of lower-value accounts people keep on the internet. Persona is the perfect solution for the Mozilla Status Board. I wish the UI were better and built into the browser: the current UI that requires JS, shim libraries, and popup windows; it is not a great experience. But the tradeoff for not having to store and handle passwords on the server is worth that small amount of pain.

For any site with high-value data, Persona is not a good choice. On, we disabled password reset emails for users with access to security bugs. This decision indicates that persona should also be considered an unacceptable security risk for these users. Persona as a protocol doesn’t have the right security properties.

It would be very interesting to combine Persona with some other authentication system such as client certificates or a two-factor system. This would allow most users to use the simple login system, while providing extra security properties when users start to access high-value resources.

In the meantime, Mozilla should be careful how it promotes and uses Persona; it’s not a universal solution and we should be careful not to bill it as one.

February 11, 2014 04:19 PM

February 01, 2014

Joshua Cranmer (jcranmer)

Why email is hard, part 5: mail headers

This post is part 5 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. Part 3 discusses MIME. Part 4 discusses email addresses. This post discusses the more general problem of email headers.

Back in my first post, Ludovic kindly posted, in a comment, a link to a talk of someone else's email rant. And the best place to start this post is with a quote from that talk: "If you want to see an email programmer's face turn red, ask him about CFWS." CFWS is an acronym that stands for "comments and folded whitespace," and I can attest that the mere mention of CFWS is enough for me to start ranting. Comments in email headers are spans of text wrapped in parentheses, and the folding of whitespace refers to the ability to continue headers on multiple lines by inserting a newline before (but not in lieu of) a space.

I'll start by pointing out that there is little advantage to adding in free-form data to headers which are not going to be manually read in the vast majority of cases. In practice, I have seen comments used for only three headers on a reliable basis. One of these is the Date header, where a human-readable name of the timezone is sometimes included. The other two are the Received and Authentication-Results headers, where some debugging aids are thrown in. There would be no great loss in omitting any of this information; if information is really important, appending an X- header with that information is still a viable option (that's where most spam filtration notes get added, for example).

For this feature of questionable utility in the first place, the impact it has on parsing message headers is enormous. RFC 822 is specified in a manner that is familiar to anyone who reads language specifications: there is a low-level lexical scanning phase which feeds tokens into a secondary parsing phase. Like programming languages, comments and white space are semantically meaningless [1]. Unlike programming languages, however, comments can be nested—and therefore lexing an email header is not regular [2]. The problems of folding (a necessary evil thanks to the line length limit I keep complaining about) pale in comparison to comments, but it's extra complexity that makes machine-readability more difficult.

Fortunately, RFC 2822 made a drastic change to the specification that greatly limited where CFWS could be inserted into headers. For example, in the Date header, comments are allowed only following the timezone offset (and whitespace in a few specific places); in addressing headers, CFWS is not allowed within the email address itself [3]. One unanticipated downside is that it makes reading the other RFCs that specify mail headers more difficult: any version that predates RFC 2822 uses the syntax assumptions of RFC 822 (in particular, CFWS may occur between any listed tokens), whereas RFC 2822 and its descendants all explicitly enumerate where CFWS may occur.

Beyond the issues with CFWS, though, syntax is still problematic. The separation of distinct lexing and parsing phases means that you almost see what may be a hint of uniformity which turns out to be an ephemeral illusion. For example, the header parameters define in RFC 2045 for Content-Type and Content-Disposition set a tradition of ;-separated param=value attributes, which has been picked up by, say, the DKIM-Signature or Authentication-Results headers. Except a close look indicates that Authenticatin-Results allows two param=value pairs between semicolons. Another side effect was pointed out in my second post: you can't turn a generic 8-bit header into a 7-bit compatible header, since you can't tell without knowing the syntax of the header which parts can be specified as 2047 encoded-words and which ones can't.

There's more to headers than their syntax, though. Email headers are structured as a somewhat-unordered list of headers; this genericity gives rise to a very large number of headers, and that's just the list of official headers. There are unofficial headers whose use is generally agreed upon, such as X-Face, X-No-Archive, or X-Priority; other unofficial headers are used for internal tracking such as Mailman's X-BeenThere or Mozilla's X-Mozilla-Status headers. Choosing how to semantically interpret these headers (or even which headers to interpret!) can therefore be extremely daunting.

Some of the headers are specified in ways that would seem surprising to most users. For example, the venerable From header can represent anywhere between 0 mailboxes [4] to an arbitrarily large number—but most clients assume that only one exists. It's also worth noting that the Sender header is (if present) a better indication of message origin as far as tracing is concerned [5], but its relative rarity likely results in filtering applications not taking it into account. The suite of Resent-* headers also experiences similar issues.

Another impact of email headers is the degree to which they can be trusted. RFC 5322 gives some nice-sounding platitudes to how headers are supposed to be defined, but many of those interpretations turn out to be difficult to verify in practice. For example, Message-IDs are supposed to be globally unique, but they turn out to be extremely lousy UUIDs for emails on a local system, even if you allow for minor differences like adding trace headers [6].

More serious are the spam, phishing, etc. messages that lie as much as possible so as to be seen by end-users. Assuming that a message is hostile, the only header that can be actually guaranteed to be correct is the first Received header, which is added by the final user's mailserver [7]. Every other header, including the Date and From headers most notably, can be a complete and total lie. There's no real way to authenticate the headers or hide them from snoopers—this has critical consequences for both spam detection and email security.

There's more I could say on this topic (especially CFWS), but I don't think it's worth dwelling on. This is more of a preparatory post for the next entry in the series than a full compilation of complaints. Speaking of my next post, I don't think I'll be able to keep up my entirely-unintentional rate of posting one entry this series a month. I've exhausted the topics in email that I am intimately familiar with and thus have to move on to the ones I'm only familiar with.

[1] Some people attempt to be to zealous in following RFCs and ignore the distinction between syntax and semantics, as I complained about in part 4 when discussing the syntax of email addresses.
[2] I mean this in the theoretical sense of the definition. The proof that balanced parentheses is not a regular language is a standard exercise in use of the pumping lemma.
[3] Unless domain literals are involved. But domain literals are their own special category.
[4] Strictly speaking, the 0 value is intended to be used only when the email has been downgraded and the email address cannot be downgraded. Whether or not these will actually occur in practice is an unresolved question.
[5] Semantically speaking, Sender is the person who typed the message up and actually sent it out. From is the person who dictated the message. If the two headers would be the same, then Sender is omitted.
[6] Take a message that's cross-posted to two mailing lists. Each mailing list will generate copies of the message which end up being submitted back into the mail system and will typically avoid touching the Message-ID.
[7] Well, this assumes you trust your email provider. However, your email provider can do far worse to your messages than lie about the Received header…

February 01, 2014 03:57 AM

January 24, 2014

Joshua Cranmer (jcranmer)

Charsets and NNTP

Recently, the question of charsets came up within the context of necessary decoder support for Thunderbird. After much hemming and hawing about how to find this out (which included a plea to the IMAP-protocol list for data), I remembered that I actually had this data. Long-time readers of this blog may recall that I did a study several years ago on the usage share of newsreaders. After that, I was motivated to take my data collection to the most extreme way possible. Instead of considering only the "official" Big-8 newsgroups, I looked at all of them on the news server I use (effectively, all but alt.binaries). Instead of relying on pulling the data from the server for the headers I needed, I grabbed all of them—the script literally runs HEAD and saves the results in a database. And instead of a month of results, I grabbed the results for the entire year of 2011. And then I sat on the data.

After recalling Henri Svinonen's pesterings about data, I decided to see the suitability of my dataset for this task. For data management reasons, I only grabbed the data from the second half of the year (about 10 million messages). I know from memory that the quality of Python's message parser (which was used to extract data in the first place) is surprisingly poor, which introduces bias of unknown consequence to my data. Since I only extracted headers, I can't identify charsets for anything which was sent as, say, multipart/alternative (which is more common than you'd think), which introduces further systematic bias. The end result is approximately 9.6M messages that I could extract charsets from and thence do further research.

Discussions revealed one particularly surprising tidbit of information. The most popular charset not accounted for by the Encoding specification was IBM437. Henri Sivonen speculated that the cause was some crufty old NNTP client on Windows using that encoding, so I endeavored to build a correlation database to check that assumption. Using the wonderful magic of d3, I produced a heatmap comparing distributions of charsets among various user agents. Details about the visualization may be found on that page, but it does refute Henri's claim when you dig into the data (it appears to be caused by specific BBS-to-news gateways, and is mostly localized in particular BBS newsgroups).

Also found on that page are some fun discoveries of just what kind of crap people try to pass off as valid headers. Some of those User-Agents are clearly spoofs (Outlook Express and family used the X-Newsreader header, not the User-Agent header). There also appears to be a fair amount of mojibake in headers (one of them appeared to be venerable double mojibake). The charsets also have some interesting labels to them: the "big5\n" and the "(null)" illustrate that some people don't double check their code very well, and not shown are the 5 examples of people who think charset names have spaces in them. A few people appear to have mixed up POSIX locales with charsets as well.

January 24, 2014 12:53 AM

January 20, 2014

Mark Finkle (mfinkle)

Firefox for Android: Page Load Performance

One of the common types of feedback we get about Firefox for Android is that it’s slow. Other browsers get the same feedback and it’s an ongoing struggle. I mean, is anything ever really fast enough?

We tend to separate performance into three categories: Startup, Page Load and UX Responsiveness. Lately, we have been focusing on the Page Load performance. We separate further into Objective (real timing) and Subjective (perceived timing). If something “feels” slow it can be just as bad as something that is measurably slow. We have a few testing frameworks that help us track objective and subjective performance. We also use Java, JavaScript and C++ profiling to look for slow code.

To start, we have been focusing on anything that is not directly part of the Gecko networking stack. This means we are looking at all the code that executes while Gecko is loading a page. In general, we want to reduce this code as much as possible. Some of the things we turned up include:

Some of these were small improvements, while others, like the proxy lookups, were significant for “desktop” pages. I’d like to expand on two of the improvements:

Predictive Networking Hints
Gecko networking has a feature called Speculative Connections, where it’s possible for the networking system to start opening TCP connections and even begin the SSL handshake, when needed. We use this feature when we have a pretty good idea that a connection might be opened. We now use the feature in three cases:

Animating the Page Load Spinner
Firefox for Android has used the animated spinner as a page load indicator for a long time. We use the Android animation framework to “spin” an image. Keeping the spinner moving smoothly is pretty important for perceived performance. A stuck spinner doesn’t look good. Profiling showed a lot of time was being taken to keep the animation moving, so we did a test and removed it. Our performance testing frameworks showed a variety of improvements in both objective and perceived tests.

We decided to move to a progressbar, but not a real progressbar widget. We wanted to control the rendering. We did not want the same animation rendering issues to happen again. We also use only a handful of “trigger” points, since listening to true network progress is also time consuming. The result is an objective page load improvement the ranges from ~5% on multi-core, faster devices to ~20% on single-core, slower devices.


The progressbar is currently allowed to “stall” during a page load, which can be disconcerting to some people. We will experiment with ways to improve this too.

Install Firefox for Android Nightly and let us know what you think of the page load improvements.

January 20, 2014 05:22 PM

January 14, 2014

Mark Surman (surman)

MoFo 2014 plans

I’m excited about 2014 at Mozilla. Building on last fall’s Mozilla Summit, it feels like people across the project are re-energized by Mitchell’s reminder that we are a global community with a common cause. Right now, this community is sharply focused on making sure the web wins on mobile and on teaching the world how the web works. I’m optimistic that we’re going to make some breakthroughs in these areas in the year ahead.

Last month, I sat down with our board to talk about where we want to focus the Mozilla Foundation’s education and community program efforts in 2014. We agreed that two things should be our main priorities this year: 1. getting more people to use our learning tools and 2. growing our community of contributors. I’ve posted the board slides (pdf) and a screencast for people who want a detailed overview of our plans. Here is the screencast:

If you are just looking for a quick overview, here are some of the main points from the slides:

In addition to these slides, you can also find detailed workplans for Webmaker, Open Badges, Open News and other MoFo initiatives on the Mozilla Wiki.

At the Mozilla Summit, we imagined a bold future 10 years from now: one where the values of the web are built into all aspects of our connected lives and where the broad majority of people are literate in the ways of the web. In this world, Mozilla is a strong global movement with over a million active contributors.

We move towards this world by building real things: a widely used mobile operating system based on the web; new ways to store and protect personal information online; content and tools for teaching web literacy. I’m excited working on the education and community sides of all this in 2014 — I think we can make some breakthroughs.

As always, I’d love to hear your thoughts on our plans and the year ahead, either as comments here or by email.

Filed under: drumbeat, learning, mozilla, statusupdate, webmakers

January 14, 2014 09:52 PM

December 04, 2013

Joshua Cranmer (jcranmer)

Why email is hard, part 4: Email addresses

This post is part 4 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discusses internationalization. Part 3 discusses MIME. This post discusses the problems with email addresses.

You might be surprised that I find email addresses difficult enough to warrant a post discussing only this single topic. However, this is a surprisingly complex topic, and one which is made much harder by the presence of a very large number of people purporting to know the answer who then proceed to do the wrong thing [0]. To understand why email addresses are complicated, and why people do the wrong thing, I pose the following challenge: write a regular expression that matches all valid email addresses and only valid email addresses. Go ahead, stop reading, and play with it for a few minutes, and then you can compare your answer with the correct answer.




Done yet? So, if you came up with a regular expression, you got the wrong answer. But that's because it's a trick question: I never defined what I meant by a valid email address. Still, if you're hoping for partial credit, you may able to get some by correctly matching one of the purported definitions I give below.

The most obvious definition meant by "valid email address" is text that matches the addr-spec production of RFC 822. No regular expression can match this definition, though—and I am aware of the enormous regular expression that is often purported to solve this problem. This is because comments can be nested, which means you would need to solve the "balanced parentheses" language, which is easily provable to be non-regular [2].

Matching the addr-spec production, though, is the wrong thing to do: the production dictates the possible syntax forms an address may have, when you arguably want a more semantic interpretation. As a case in point, the two email addresses example@test.invalid and example @ test . invalid are both meant to refer to the same thing. When you ignore the actual full grammar of an email address and instead read the prose, particularly of RFC 5322 instead of RFC 822, you'll realize that matching comments and whitespace are entirely the wrong thing to do in the email address.

Here, though, we run into another problem. Email addresses are split into local-parts and the domain, the text before and after the @ character; the format of the local-part is basically either a quoted string (to escape otherwise illegal characters in a local-part), or an unquoted "dot-atom" production. The quoting is meant to be semantically invisible: "example"@test.invalid is the same email address as example@test.invalid. Normally, I would say that the use of quoted strings is an artifact of the encoding form, but given the strong appetite for aggressively "correct" email validators that attempt to blindly match the specification, it seems to me that it is better to keep the local-parts quoted if they need to be quoted. The dot-atom production matches a sequence of atoms (spans of text excluding several special characters like [ or .) separated by . characters, with no intervening spaces or comments allowed anywhere.

RFC 5322 only specifies how to unfold the syntax into a semantic value, and it does not explain how to semantically interpret the values of an email address. For that, we must turn to SMTP's definition in RFC 5321, whose semantic definition clearly imparts requirements on the format of an email address not found in RFC 5322. On domains, RFC 5321 explains that the domain is either a standard domain name [3], or it is a domain literal which is either an IPv4 or an IPv6 address. Examples of the latter two forms are test@[] and test@[IPv6:::1]. But when it comes to the local-parts, RFC 5321 decides to just give up and admit no interpretation except at the final host, advising only that servers should avoid local-parts that need to be quoted. In the context of email specification, this kind of recommendation is effectively a requirement to not use such email addresses, and (by implication) most client code can avoid supporting these email addresses [4].

The prospect of internationalized domain names and email addresses throws a massive wrench into the state affairs, however. I've talked at length in part 2 about the problems here; the lack of a definitive decision on Unicode normalization means that the future here is extremely uncertain, although RFC 6530 does implicitly advise that servers should accept that some (but not all) clients are going to do NFC or NFKC normalization on email addresses.

At this point, it should be clear that asking for a regular expression to validate email addresses is really asking the wrong question. I did it at the beginning of this post because that is how the question tends to be phrased. The real question that people should be asking is "what characters are valid in an email address?" (and more specifically, the left-hand side of the email address, since the right-hand side is obviously a domain name). The answer is simple: among the ASCII printable characters (Unicode is more difficult), all the characters but those in the following string: " \"\\[]();,@". Indeed, viewing an email address like this is exactly how HTML 5 specifies it in its definition of a format for <input type="email">

Another, much easier, more obvious, and simpler way to validate an email address relies on zero regular expressions and zero references to specifications. Just send an email to the purported address and ask the user to click on a unique link to complete registration. After all, the most common reason to request an email address is to be able to send messages to that email address, so if mail cannot be sent to it, the email address should be considered invalid, even if it is syntactically valid.

Unfortunately, people persist in trying to write buggy email validators. Some are too simple and ignore valid characters (or valid top-level domain names!). Others are too focused on trying to match the RFC addr-spec syntax that, while they will happily accept most or all addr-spec forms, they also result in email addresses which are very likely to weak havoc if you pass to another system to send email; cause various forms of SQL injection, XSS injection, or even shell injection attacks; and which are likely to confuse tools as to what the email address actually is. This can be ameliorated with complicated normalization functions for email addresses, but none of the email validators I've looked at actually do this (which, again, goes to show that they're missing the point).

Which brings me to a second quiz question: are email addresses case-insensitive? If you answered no, well, you're wrong. If you answered yes, you're also wrong. The local-part, as RFC 5321 emphasizes, is not to be interpreted by anyone but the final destination MTA server. A consequence is that it does not specify if they are case-sensitive or case-insensitive, which means that general code should not assume that it is case-insensitive. Domains, of course, are case-insensitive, unless you're talking about internationalized domain names [5]. In practice, though, RFC 5321 admits that servers should make the names case-insensitive. For everyone else who uses email addresses, the effective result of this admission is that email addresses should be stored in their original case but matched case-insensitively (effectively, code should be case-preserving).

Hopefully this gives you a sense of why email addresses are frustrating and much more complicated then they first appear. There are historical artifacts of email addresses I've decided not to address (the roles of ! and % in addresses), but since they only matter to some SMTP implementations, I'll discuss them when I pick up SMTP in a later part (if I ever discuss them). I've avoided discussing some major issues with the specification here, because they are much better handled as part of the issues with email headers in general.

Oh, and if you were expecting regular expression answers to the challenge I gave at the beginning of the post, here are the answers I threw together for my various definitions of "valid email address." I didn't test or even try to compile any of these regular expressions (as you should have gathered, regular expressions are not what you should be using), so caveat emptor.

RFC 822 addr-spec
Impossible. Don't even try.
RFC 5322 non-obsolete addr-spec production
RFC 5322, unquoted email address
HTML 5's interpretation
Effective EAI-aware version
[^\x00-\x20\x80-\x9f]()\[\]:;@\\,]+@[^\x00-\x20\x80-\x9f()\[\]:;@\\,]+, with the caveats that a dot does not begin or end the local-part, nor do two dots appear subsequent, the local part is in NFC or NFKC form, and the domain is a valid domain name.

[1] If you're trying to find guides on valid email addresses, a useful way to eliminate incorrect answers are the following litmus tests. First, if the guide mentions an RFC, but does not mention RFC 5321 (or RFC 2821, in a pinch), you can generally ignore it. If the email address test (not) @ would be valid, then the author has clearly not carefully read and understood the specifications. If the guide mentions RFC 5321, RFC 5322, RFC 6530, and IDN, then the author clearly has taken the time to actually understand the subject matter and their opinion can be trusted.
[2] I'm using "regular" here in the sense of theoretical regular languages. Perl-compatible regular expressions can match non-regular languages (because of backreferences), but even backreferences can't solve the problem here. It appears that newer versions support a construct which can match balanced parentheses, but I'm going to discount that because by the time you're going to start using that feature, you have at least two problems.
[3] Specifically, if you want to get really technical, the domain name is going to be routed via MX records in DNS.
[4] RFC 5321 is the specification for SMTP, and, therefore, it is only truly binding for things that talk SMTP; likewise, RFC 5322 is only binding on people who speak email headers. When I say that systems can pretend that email addresses with domain literals or quoted local-parts don't exist, I'm excluding mail clients and mail servers. If you're writing a website and you need an email address, there is no need to support email addresses which don't exist on the open, public Internet.
[5] My usual approach to seeing internationalization at this point (if you haven't gathered from the lengthy second post of this series) is to assume that the specifications assume magic where case insensitivity is desired.

December 04, 2013 11:24 PM

November 20, 2013

Joshua Cranmer (jcranmer)

Why email is hard, part 3: MIME

This post is part 3 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure. Part 2 discuses internationalization. This post discusses MIME, the mechanism by which email evolves beyond plain text.

MIME, which stands for Multipurpose Internet Mail Extensions, is primarily dictated by a set of 5 RFCs: RFC 2045, RFC 2046, RFC 2047, RFC 2048, and RFC 2049, although RFC 2048 (which governs registration procedures for new MIME types) was updated with newer versions. RFC 2045 covers the format of related headers, as well as the format of the encodings used to convert 8-bit data into 7-bit for transmission. RFC 2046 describes the basic set of MIME types, most importantly the format of multipart/ types. RFC 2047 was discussed in my part 2 of this series, as it discusses encoding internationalized data in headers. RFC 2049 describes a set of guidelines for how to be conformant when processing MIME; as you might imagine, these are woefully inadequate for modern processing anyways. In practice, it is only the first three documents that matter for building an email client.

There are two main contributions of MIME, which actually makes it a bit hard to know what is meant when people refer to MIME in the abstract. The first contribution, which is of interest mostly to email, is the development of a tree-based representation of email which allows for the inclusion of non-textual parts to messages. This tree is ultimately how attachments and other features are incorporated. The other contribution is the development of a registry of MIME types for different types of file contents. MIME types have promulgated far beyond just the email infrastructure: if you want to describe what kind of file binary blob is, you can refer to it by either a magic header sequence, a file extension, or a MIME type. Searching for terms like MIME libraries will sometimes refer to libraries that actually handle the so-called MIME sniffing process (guessing a MIME type from a file extension or the contents of a file).

MIME types are decomposable into two parts, a media type and a subtype. The type text/plain has a media type of text and a subtype of plain, for example. IANA maintains an official repository of MIME types. There are very few media types, and I would argue that there ought to be fewer. In practice, degradation of unknown MIME types means that there are essentially three "fundamental" types: text/plain (which represents plain, unformatted text and to which unknown text/* types degrade), multipart/mixed (the "default" version of multipart messages; more on this later), and application/octet-stream (which represents unknown, arbitrary binary data). I can understand the separation of the message media type for things which generally follow the basic format of headers+body akin to message/rfc822, although the presence of types like message/partial that don't follow the headers+body format and the requirement to downgrade to application/octet-stream mars usability here. The distinction between image, audio, video and application is petty when you consider that in practice, the distinction isn't going to be able to make clients give better recommendations for how to handle these kinds of content (which really means deciding if it can be displayed inline or if it needs to be handed off to an external client).

Is there a better way to label content types than MIME types? Probably not. X.400 (remember that from my first post?) uses OIDs, in line with the rest of the OSI model, and my limited workings with other systems that use these OIDs is that they are obtuse, effectively opaque identifiers with no inherent semantic meaning. People use file extensions in practice to distinguish between different file types, but not all content types are stored in files (such as multipart/mixed), and the MIME types is a finer granularity to distinguish when needing to guess the type from the start of a file. My only complaints about MIME types are petty and marginal, not about the idea itself.

No, the part of MIME that I have serious complaints with is the MIME tree structure. This allows you to represent emails in arbitrarily complex structures… and onto which the standard view of email as a body with associated attachments is poorly mapped. The heart of this structure is the multipart media type, for which the most important subtypes are mixed, alternative, related, signed, and encrypted. The last two types are meant for cryptographic security definitions [1], and I won't cover them further here. All multipart types have a format where the body consists of parts (each with their own headers) separated by a boundary string. There is space before and after the last parts which consists of semantically-meaningless text sometimes containing a message like "This is a MIME message." meant to be displayed to the now practically-non-existent crowd of people who use clients that don't support MIME.

The simplest type is multipart/mixed, which means that there is no inherent structure to the parts. Attachments to a message use this type: the type of the message is set to multipart/mixed, a body is added as (typically) the first part, and attachments are added as parts with types like image/png (for PNG images). It is also not uncommon to see multipart/mixed types that have a multipart/mixed part within them: some mailing list software attaches footers to messages by wrapping the original message inside a single part of a multipart/mixed message and then appending a text/plain footer.

multipart/related is intended to refer to an HTML page [2] where all of its external resources are included as additional parts. Linking all of these parts together is done by use of a cid: URL scheme. Generating and displaying these messages requires tracking down all URL references in an HTML page, which of course means that email clients that want full support for this feature also need robust HTML (and CSS!) knowledge, and future-proofing is hard. Since the primary body of this type appears first in the tree, it also makes handling this datatype in a streaming manner difficult, since the values to which URLs will be rewritten are not known until after the entire body is parsed.

In contrast, multipart/alternative is used to satisfy the plain-text-or-HTML debate by allowing one to provide a message that is either plain text or HTML [3]. It is also the third-biggest failure of the entire email infrastructure, in my opinion. The natural expectation would be that the parts should be listed in decreasing order of preference, so that streaming clients can reject all the data after it finds the part it will display. Instead, the parts are listed in increasing order of preference, which was done in order to make the plain text part be first in the list, which helps increase readability of MIME messages for those reading email without MIME-aware clients. As a result, streaming clients are unable to progressively display the contents of multipart/alternative until the entire message has been read.

Although multipart/alternative states that all parts must contain the same contents (to varying degrees of degradation), you shouldn't be surprised to learn that this is not exactly the case. There was a period in time when spam filterers looked at only the text/plain side of things, so spammers took to putting "innocuous" messages in the text/plain half and displaying the real spam in the text/html half [4] (this technique appears to have died off a long time ago, though). In another interesting case, I received a bug report with a message containing an image/jpeg and a text/html part within a multipart/alternative [5].

To be fair, the current concept of emails as a body with a set of attachments did not exist when MIME was originally specified. The definition of multipart/parallel plays into this a lot (it means what you think it does: show all of the parts in parallel… somehow). Reading between the lines of the specification also indicates a desire to create interactive emails (via application/postscript, of course). Given that email clients have trouble even displaying HTML properly [6], and the fact that interactivity has the potential to be a walking security hole, it is not hard to see why this functionality fell by the wayside.

The final major challenge that MIME solved was how to fit arbitrary data into a 7-bit format safe for transit. The two encoding schemes they came up with were quoted-printable (which retains most printable characters, but emits non-printable characters in a =XX format, where the Xs are hex characters), and base64 which reencodes every 3 bytes into 4 ASCII characters. Non-encoded data is separated into three categories: 7-bit (which uses only ASCII characters except NUL and bare CR or LF characters), 8-bit (which uses any character but NUL, bare CR, and bare LF), and binary (where everything is possible). A further limitation is placed on all encodings but binary: every line is at most 998 bytes long, not including the terminating CRLF.

A side-effect of these requirements is that all attachments must be considered binary data, even if they are textual formats (like source code), as end-of-line autoconversion is now considered a major misfeature. To make matters even worse, body text for formats with text written in scripts that don't use spaces (such as Japanese or Chinese) can sometimes be prohibited from using 8-bit transfer format due to overly long lines: you can reach the end of a line in as few as 249 characters (UTF-8, non-BMP characters, although Chinese and Japanese typically take three bytes per character). So a single long paragraph can force a message to be entirely encoded in a format with 33% overhead. There have been suggestions for a binary-to-8-bit encoding in the past, but no standardization effort has been made for one [7].

The binary encoding has none of these problems, but no one claims to support it. However, I suspect that violating maximum line length, or adding 8-bit characters to a quoted-printable part, are likely to make it through the mail system, in part because not doing so either increases your security vulnerabilities or requires more implementation effort. Sending lone CR or LF characters is probably fine so long as one is careful to assume that they may be treated as line breaks. Sending a NUL character I suspect could cause some issues due to lack of testing (but it also leaves room for security vulnerabilities to ignore it). In other words, binary-encoded messages probably already work to a large degree in the mail system. Which makes it extremely tempting (even for me) to ignore the specification requirements when composing messages; small wonder then that blatant violations of specifications are common.

This concludes my discussion of MIME. There are certainly many more complaints I have, but this should be sufficient to lay out why building a generic MIME-aware library by itself is hard, and why you do not want to write such a parser yourself. Too bad Thunderbird has at least two different ad-hoc parsers (not libmime or JSMime) that I can think of off the top of my head, both of which are wrong.

[1] I will be covering this in a later post, but the way that signed and encrypted data is represented in MIME actually makes it really easy to introduce flaws in cryptographic code (which, the last time I surveyed major email clients with support for cryptographic code, was done by all of them).
[2] Other types are of course possible in theory, but HTML is all anyone cares about in practice.
[3] There is also text/enriched, which was developed as a stopgap while HTML 3.2 was being developed. Its use in practice is exceedingly slim.
[4] This is one of the reasons I'm minded to make "prefer plain text" do degradation of natural HTML display instead of showing the plain text parts. Not that cleanly degrading HTML is easy.
[5] In the interests of full disclosure, the image/jpeg was actually a PNG image and the HTML claimed to be 7-bit UTF-8 but was actually 8-bit, and it contained a Unicode homograph attack.
[6] Of the major clients, Outlook uses Word's HTML rendering engine, which I recall once reading as being roughly equivalent to IE 5.5 in capability. Webmail is forced to do their own sanitization and sandboxing, and the output leaves something to desire; Gmail is the worst offender here, stripping out all but inline style. Thunderbird and SeaMonkey are nearly alone in using a high-quality layout engine: you can even send a <video> in an email to Thunderbird and have it work properly. :-)
[7] There is yEnc. Its mere existence does contradict several claims (for example, that adding new transfer encodings is infeasible due to install base of software), but it was developed for a slightly different purpose. Some implementation details are hostile to MIME, and although it has been discussed to death on the relevant mailing list several times, no draft was ever made that would integrate it into MIME properly.

November 20, 2013 07:54 PM

October 24, 2013

Guillermo López (willyaranda)


Espabilad. No os queda otra. Os toca espabilar.

Espabilad porque si queréis hacer lo que queréis, y realmente os queréis dedicar a lo que lleváis varios años estudiando, tenéis que espabilar.

Cuando acabéis de estudiar la carrera, y tengáis un simple papel que sois Ingenieros en Informática, es sólo un papel. A partir de ahí os toca demostrarlo. No antes. Ahora. Ahora os toca enfrentaros para lo que os han preparado.

“¡A nosotros no nos han preparado!” Mentira. Sí os han preparado. Os han preparado a ser autosuficientes. Os han preparado a pensar, han preparado vuestra mente durante al menos 5 años para que penséis y actuéis como personas que tienen una carrera y que se les presupone que aprenden rápido. Se os ha enseñado a ser ingenieros.

Os han preparado. Pero quizás no os hayáis preparado. Es fácil echar las culpas a los demás cuando uno no ha puesto mucho de su parte. Y lo bueno es que eso depende de vosotros. Depende de lo que queráis hacer una vez tengáis el dichoso papel.

Espabilad. Invertid vuestro tiempo en estudiar más allá de poder sacar un 5.0 en un examen. Invertidlo en tecnologías, en conocimientos que no os van a dar en vuestra carrera. Invertid vuestro tiempo en ser los mejores en vuestro campo.

Espabilad, estudiantes, espabilad.

October 24, 2013 03:23 PM

October 17, 2013

Mark Finkle (mfinkle)

GeckoView: Embedding Gecko in your Android Application

Firefox for Android is a great browser, bringing a modern HTML rendering engine to Android 2.2 and newer. One of the things we have been hoping to do for a long time now is make it possible for other Android applications to embed the Gecko rendering engine. Over the last few months we started a side project to make this possible. We call it GeckoView.

As mentioned in the project page, we don’t intend GeckoView to be a drop-in replacement for WebView. Internally, Gecko is very different from Webkit and trying to expose the same features using the same APIs just wouldn’t be scalable or maintainable. That said, we want it to feel Android-ish and you should be comfortable with using it in your applications.

We have started to build GeckoView as part of our nightly Firefox for Android builds. You can find the library ZIPs in our latest nightly FTP folder. We are in the process of improving the APIs used to embed GeckoView. The current API is very basic. Most of that work is happening in these bugs:

If you want to start playing around with GeckoView, you can try the demo application I have on Github. It links to some pre-built GeckoView libraries.

We’d love your feedback! We use the Firefox for Android mailing list to discuss status, issues and feedback.

Note: We’re having some Tech Talks at Mozilla’s London office on Monday (Oct 21). One of the topics is GeckoView. If you’re around or in town for Droidcon, please stop by.

October 17, 2013 09:33 PM

October 11, 2013

Joshua Cranmer (jcranmer)

Why email is hard, part 2: internationalization

This post is part 2 of an intermittent series exploring the difficulties of writing an email client. Part 1 describes a brief history of the infrastructure, as well as the issues I have with it. This post is discussing internationalization, specifically supporting non-ASCII characters in email.

Internationalization is not a simple task, even if the consideration is limited to "merely" the textual aspect [1]. Languages turn out to be incredibly diverse in their writing systems, so software that tries to support all writing systems equally well ends up running into several problems that admit no general solution. Unfortunately, I am ill-placed to be able to offer personal experience with internationalization concerns [2], so some of the information I give may well be wrong.

A word of caution: this post is rather long, even by my standards, since the problems of internationalization are legion. To help keep this post from being even longer, I'm going to assume passing familiarity with terms like ASCII, Unicode, and UTF-8.

The first issue I'll talk about is Unicode normalization, and it's an issue caused largely by Unicode itself. Unicode has two ways of making accented characters: precomposed characters (such as U+00F1, ñ) or a character followed by a combining character (U+006E, n, followed by U+0303, ◌̃). The display of both is the same: ñ versus ñ (read the HTML), and no one would disagree that the share the meaning. To let software detect that they are the same, Unicode prescribes four algorithms to normalize them. These four algorithms are defined on two axes: whether to prefer composed characters (like U+00F1) or prefer decomposed characters (U+006E U+0303), and whether to normalize by canonical equivalence (noting that, for example, U+212A Kelvin sign is equivalent to the Latin majuscule K) or by compatibility (e.g., superscript 2 to a regular 2).

Another issue is one that mostly affects display. Western European languages all use a left-to-right, top-to-bottom writing order. This isn't universal: Semitic languages like Hebrew or Arabic use right-to-left, top-to-bottom; Japanese and Chinese prefer a top-to-bottom, right-to-left order (although it is sometimes written left-to-right, top-to-bottom). It thus becomes an issue as to the proper order to store these languages using different writing orders in the actual text, although I believe the practice of always storing text in "start-to-finish" order, and reversing it for display, is nearly universal.

Now, both of those issues mentioned so far are minor in the grand scheme of things, in that you can ignore them and they will still probably work properly almost all of the time. Most text that is exposed to the web is already normalized to the same format, and web browsers have gotten away with not normalizing CSS or HTML identifiers with only theoretical objections raised. All of the other issues I'm going to discuss are things that cause problems and illustrate why properly internationalizing email is hard.

Another historical mistake of Unicode is one that we will likely be stuck with for decades, and I need to go into some history first. The first Unicode standard dates from 1991, and its original goal then was to collect all of the characters needed for modern transmission, which was judged to need only a 16-bit set of characters. Unfortunately, the needs of ideographic-centric Chinese, Japanese, and Korean writing systems, particularly rare family names, turns out to rather fill up that space. Thus, in 1996, Unicode was changed to permit more characters: 17 planes of 65,536 characters each, of which the original set was termed the "Basic Multilingual Plane" or BMP for short. Systems that chose to adopt Unicode in those intervening 5 years often adopted a 16-bit character model as their standard internal format, so as to keep the benefits of fixed-width character encodings. However, with the change to a larger format, their fixed-width character encoding is no longer fixed-width.

This issue plagues anybody who works with systems that considered internationalization in that unfortunate window, which notably includes prominent programming languages like C#, Java, and JavaScript. Many cross-platform C and C++ programs implicitly require UTF-16 due to its pervasive inclusion into the Windows operating system and common internationalization libraries [3]. Unsurprisingly, non-BMP characters tend to quickly run into all sorts of hangups by unaware code. For example, right now, it is possible to coax Thunderbird to render these characters unusable in, say, your subject string if the subject is just right, and I suspect similar bugs exist in a majority of email applications [4].

For all of the flaws of Unicode [5], there is a tacit agreement that UTF-8 should be the character set to use for anyone not burdened by legacy concerns. Unfortunately, email is burdened by legacy concerns, and the use of 8-bit characters in headers that are not UTF-8 is more prevalent than it ought to be, RFC 6532 notwithstanding. In any case, email explicitly provides for handling a wide variety of alternative character sets without saying which ones should be supported. The official list [6] contains about 200 of them (including the UNKNOWN-8BIT character set), but not all of them see widespread use. In practice, the ones that definitely need to be supported are the ISO 8859-* and ISO 2022-* charsets, the EUC-* charsets, Windows-* charsets, GB18030, GBK, Shift-JIS, KOI8-{R,U}, Big5, and of course UTF-8. There are two other major charsets that don't come up directly in email but are important for implementing the entire suite of protocols: UTF-7, used in IMAP (more on that later), and Punycode (more on that later, too).

The suite of character sets falls into three main categories. First is the set of fixed-width character sets, most notably ASCII and the ISO 8859 suite of charsets, as well as UCS-2 (2 bytes per character) and UTF-32 (4 bytes per character). Since the major East Asian languages are all ideographic, which require a rather large number of characters to be encoded, fixed-width character sets are infeasible. Instead, many choose to do a variable-width encoding: Shift-JIS lets some characters (notably ASCII characters and half-width katakana) remain a single byte and uses two bytes to encode all of its other characters. UTF-8 can use between 1 byte (for ASCII characters) and 4 bytes (for non-BMP characters) for a single character. The final set of character sets, such as the ISO 2022 ones, use escape sequences to change the interpretation of subsequent characters. As a result, taking the substring of an encoding string can change its interpretation while remaining valid. This will be important later.

Two more problems related to character sets are worth mentioning. The first is the byte-order mark, or BOM, which is used to distinguish whether UTF-16 is written on a little-endian or big-endian machine. It is also sometimes used in UTF-8 to indicate that the text is UTF-8 versus some unknown legacy encoding. It is also not supposed to appear in email, but I have done some experiments which suggest that people use software that adds it without realizing that this is happening. The second issue, unsurprisingly [7], is that for some character sets (Big5 in particular, I believe), not everyone agrees on how to interpret some of the characters.

The largest problem of internationalization that applies in a general sense is the problem of case insensitivity. The 26 basic Latin letters all map nicely to case, having a single uppercase and a single lowercase variant for each letter. This practice doesn't hold in general—languages like Japanese lack even the notion of case, although it does have two kana variants that hold semantic differences. Rather, there are three basic issues with case insensitivity which showcase enough of its problems to make you want to run away from it altogether [8].

The simplest issue is the Greek sigma. Greek has two lowercase variants of the sigma character: σ and &varsigma (the "final sigma"), but a single uppercase variant, Σ. Thus mapping a string s to uppercase and back to lowercase is not equivalent to mapping s directly to lower-case in some cases. Related to this issue is the story of German ß character. This character evolved as a ligature of a long and short 's', and its uppercase form is generally held to be SS. The existence of a capital form is in some dispute, and Unicode only recently added it (ẞ, if your software supports it). As a result, merely interconverting between uppercase and lowercase versions of a string does not necessarily lead to a simple fixed point. The third issue is the Turkish dotless i (ı), which is the lowercase variant of the ASCII uppercase I character to those who speak Turkish. So it turns out that case insensitivity isn't quite the same across all locales.

Again unsurprisingly in light of the issues, the general tendency towards case-folding or case-insensitive matching in internationalized-aware specifications is to ignore the issues entirely. For example, asking for clarity on the process of case-insensitive matching for IMAP folder names, the response I got was "don't do it." HTML and CSS moved to the cumbersomely-named variant known as "ASCII-subset case-insensitivity", where only the 26 basic Latin letters are mapped to their (English) variants in case. The solution for email is also a verbose variant of "unspecified," but that is only tradition for email (more on this later).

Now that you have a good idea of the general issues, it is time to delve into how the developers of email rose to the challenge of handling internationalization. It turns out that the developers of email have managed to craft one of the most perfect and exquisite examples I have seen of how to completely and utterly fail. The challenges of internationalized emails are so difficult that buggier implementations are probably more common than fully correct implementations, and any attempt to ignore the issue is completely and totally impossible. In fact, the faults of RFC 2047 are my personal least favorite part of email, and implementing it made me change the design of JSMime more than any other feature. It is probably the single hardest thing to implement correctly in an email client, and it is so broken that another specification was needed to be able to apply internationalization more widely (RFC 2231).

The basic problem RFC 2047 sets out to solve is how to reliably send non-ASCII characters across a medium where only 7-bit characters can be reliably sent. The solution that was set out in the original version, RFC 1342, is to encode specific strings in an "encoded-word" format: =?charset?encoding?encoded text?=. The encoding can either be a 'B' (for Base64) or a 'Q' (for quoted-printable). Except the quoted-printable encoding in this format isn't quite the same quoted-printable encoding used in bodies: the space character is encoded via a '_' character instead, as spaces aren't allowed in encoded-words. Naturally, the use of spaces in encoded-words is common enough to get at least one or two bugs filed a year about Thunderbird not supporting it, and I wonder if this subtle difference between two quoted-printable variants is what causes the prevalence of such emails.

One of my great hates with regard to email is the strict header line length limit. Since the encoded-word form can get naturally verbose, particularly when you consider languages like Chinese that are going to have little whitespace amenable for breaking lines, the ingenious solution is to have adjacent encoded-word tokens separated only by whitespace be treated as the same word. As RFC 6857 kindly summarizes, "whitespace behavior is somewhat unpredictable, in practice, when multiple encoded words are used." RFC 6857 also suggests that the requirement to limit encoded words to only 74 characters in length is also rather meaningless in practice.

A more serious problem arises when you consider the necessity of treating adjacent encoded-word tokens as a single unit. This one is so serious that it reaches the point where all of your options would break somebody. When implementing an RFC 2047 encoding algorithm, how do you write the code to break up a long span of text into multiple encoded words without ever violating the specification? The naive way of doing so is to encode the text once in one long string, and then break it into checks which are then converted into the encoded-word form as necessary. This is, of course, wrong, as it breaks two strictures of RFC 2047. The first is that you cannot split the middle of multibyte characters. The second is that mode-switching character sets must return to ASCII by the end of a single encoded-word [9]. The smarter way of building encoded-words is to encode words by trying to figure out how much text can be encoded before needing to switch, and breaking the encoded-words when length quotas are exceeded. This is also wrong, since you could end up violating the return-to-ASCII rule if your don't double-check your converters. Also, if UTF-16 is used as the basis for the string before charset conversion, the encoder stands a good chance of splitting up creating unpaired surrogates and a giant mess as a result.

For JSMime, the algorithm I chose to implement is specific to UTF-8, because I can use a property of the UTF-8 implementation to make encoding fast (every octet is looked at exactly three times: once to convert to UTF-8, once to count to know when to break, and once to encode into base64 or quoted-printable). The property of UTF-8 is that the second, third, and fourth octets of a multibyte character all start with the same two bits, and those bits never start the first octet of a character. Essentially, I convert the entire string to a binary buffer using UTF-8. I then pass through the buffer, keeping counters of the length that the buffer would be in base64 form and in quoted-printable form. When both counters are exceeded, I back up to the beginning of the character, and encode that entire buffer in a word and then move on. I made sure to test that I don't break surrogate characters by making liberal use of the non-BMP character U+1F4A9 [10] in my encoding tests.

The sheer ease of writing a broken encoder for RFC 2047 means that broken encodings exist in the wild, so an RFC 2047 decoder needs to support some level of broken RFC 2047 encoding. Unfortunately, to "fix" different kinds of broken encodings requires different support for decoders. Treating adjacent encoded-words as part of the same buffer when decoding makes split multibyte characters work properly but breaks non-return-to-ASCII issues; if they are decoded separately the reverse is true. Recovering issues with isolated surrogates is at best time-consuming and difficult and at worst impossible.

Yet another problem with the way encoded-words are defined is that they are defined as specific tokens in the grammar of structured address fields. This means that you can't hide RFC 2047 encoding or decoding as a final processing step when reading or writing messages. Instead you have to do it during or after parsing (or during or before emission). So the parser as a result becomes fully intertwined with support for encoded-words. Converting a fully UTF-8 message into a 7-bit form is thus a non-trivial operation: there is a specification solely designed to discuss how to do such downgrading, RFC 6857. It requires deducing what structure a header has, parsing that harder, and then reencoding the parsed header. This sort of complicated structure makes it much harder to write general-purpose email libraries: the process of emitting a message basically requires doing a generic UTF-8-to-7-bit conversion. Thus, what is supposed to be a more implementation detail of how to send out a message ends up permeating the entire stack.

Unfortunately, the developers of RFC 2047 were a bit too clever for their own good. The specification limits the encoded-words to occurring only inside of phrases (basically, display names for addresses), unstructured text (like the subject), or comments (…). I presume this was done to avoid requiring parsers to handle internationalization in email addresses themselves or possibly even things like MIME boundary delimiters. However, this list leaves out one common source of internationalized text: filenames of attachments. This was ultimately patched by RFC 2231.

RFC 2231 is by no means a simple specification, since it attempts to solve three problems simultaneously. The first is the use of non-ASCII characters in parameter values. Like RFC 2047, the excessively low header line length limit causes the second problem, the need to wrap parameter values across multiple line lengths. As a result, the encoding is complicated (it takes more lines of code to parse RFC 2231's new features alone than it does to parse the basic format [11]), but it's not particularly difficult.

The third problem RFC 2231 attempts to solve is a rather different issue altogether: it tries to conclusively assign a language tag to the encoded text and also provides a "fix" for this to RFC 2047's encoded-words. The stated rationale is to be able to have screen readers read the text aloud properly, but the other (much more tangible) benefit is to ameliorate the issues of Unicode's Han unification by clearly identifying if the text is Chinese, Japanese, or Korean. While it sounds like a nice idea, it suffers from a major flaw: there is no way to use this data without converting internal data structures from using flat strings to richer representations. Another issue is that actually setting this value correctly (especially if your goal is supporting screen readers' pronunciations) is difficult if not impossible. Fortunately, this is an entirely optional feature; though I do see very little email that needs to be concerned about internationalization, I have yet to find an example of someone using this in the wild.

If you're the sort of person who finds properly writing internationalized text via RFC 2231 or RFC 2047 too hard (or you don't realize that you need to actually worry about this sort of stuff), and you don't want to use any of the several dozen MIME libraries to do the hard stuff for you, then you will become the bane of everyone who writes email clients, because you've just handed us email messages that have 8-bit text in the headers. At which point everything goes mad, because we have no clue what charset you just used. Well, RFC 6532 says that headers are supposed to be UTF-8, but with the specification being only 19 months old and part of a system which is still (to my knowledge) not supported by any major clients, this should be taken with a grain of salt. UTF-8 has the very nice property that text that is valid UTF-8 is highly unlikely to be any other charset, even if you start considering the various East Asian multibyte charsets. Thus you can try decoding under the assumption that is UTF-8 and switch to a designated fallback charset if decoding fails. Of course, knowing which designated fallback to use is a different matter entirely.

Stepping outside email messages themselves, internationalization is still a concern. IMAP folder names are another well-known example. RFC 3501 specified that mailbox names should be in a modified version of UTF-7 in an awkward compromise. To my knowledge, this is the only remaining significant use of UTF-7, as many web browsers disabled support due to its use in security attacks. RFC 6855, another recent specification (6 months old as of this writing), finally allows UTF-8 mailbox names here, although it too is not yet in widespread usage.

You will note missing from the list so far is email addresses. The topic of email addresses is itself worthy of lengthy discussion, but for the purposes of a discussion on internationalization, all you need to know is that, according to RFCs 821 and 822 and their cleaned-up successors, everything to the right of the '@' is a domain name and everything to the left is basically an opaque ASCII string [12]. It is here that internationalization really runs headlong into an immovable obstacle, for the email address has become the de facto unique identifier of the web, and everyone has their own funky ideas of what an email address looks like. As a result, the motto of "be liberal in what you accept" really breaks down with email addresses, and the amount of software that needs to change to accept internationalization extends far beyond the small segment interested only in the handling of email itself. Unfortunately, the relative newness of the latest specifications and corresponding lack of implementations means that I am less intimately familiar with this aspect of internationalization. Indeed, the impetus for this entire blogpost was a day-long struggle with trying to ascertain when two email addresses are the same if internationalized email address are involved.

The email address is split nicely by the '@' symbol, and internationalization of the two sides happens at two different times. Domains were internationalized first, by RFC 3490, a specification with the mouthful of a name "Internationalizing Domain Names in Applications" [13], or IDNA2003 for short. I mention the proper name of the specification here to make a point: the underlying protocol is completely unchanged, and all the work is intended to happen at roughly the level of getaddrinfo—the internal DNS resolver is supposed to be involved, but the underlying DNS protocol and tools are expected to remain blissfully unaware of the issues involved. That I mention the year of the specification should tell you that this is going to be a bumpy ride.

An internationalized domain name (IDN for short) is a domain name that has some non-ASCII characters in it. Domain names, according to DNS, are labels terminated by '.' characters, where each label may consist of up to 63 characters. The repertoire of characters are the ASCII alphanumerics and the '-' character, and labels are of course case-insensitive like almost everything else on the Internet. Encoding non-ASCII characters into this small subset while meeting these requirements is difficult for other contemporary schemes: UTF-7 uses Base64, which means 'A' and 'a' are not equivalent; percent-encoding eats up characters extremely quickly. So IDN use a different specification for this purpose, called Punycode, which allows for a dense but utterly unreadable encoding. The basic algorithm of encoding an IDN is to take the input string, apply case-folding, normalize using NFKC, and then encode with Punycode.

Case folding, as I mentioned several paragraphs ago, turns out to have some issues. The ß and &varsigma characters were the ones that caused the most complaints. You see, if you were to register, say, www.weiß.de, you would actually be registering As there is no indication of Punycode involved in the name, browsers would show the domain in the ASCII variant. One way of fixing this problem would be to work with browser vendors to institute a "preferred name" specification for websites (much like there exists one for the little icons next to page titles), so that the world could know that the proper capitalization is of course instead of Instead, the German and Greek registrars pushed for a change to IDNA, which they achieved in 2010 with IDNA2008.

IDNA2008 is defined principally in RFCs 5890-5895 and UTS #46. The principal change is that the normalization step no longer exists in the protocol and is instead supposed to be done by applications, in a possibly locale-specific manner, before looking up the domain name. One reason for doing this was to eliminate the hard dependency on a specific, outdated version of Unicode [14]. It also helps fix things like the Turkish dotless I issue, in theory at least. However, this different algorithm causes some domains to be processed differently from IDNA2003. UTS #46 specifies a "compatibility mode" which changes the algorithm to match IDNA2003 better in the important cases (specifically, ß, &varsigma, and ZWJ/ZWNJ), with a note expressing the hope that this will eventually become unnecessary. To handle the lack of normalization in the protocol, registrars are asked to automatically register all classes of equivalent domain names at the same time. I should note that most major browsers (and email clients, if they implement IDN at all) are still using IDNA2003: an easy test of this fact is to attempt to go to ☃.net, which is valid under IDNA2003 but not IDNA2008.

Unicode text processing is often vulnerable to an attack known as the "homograph attack." In most fonts, the Greek omicron and the Latin miniscule o will be displayed in exactly the same way, so an attacker could pretend to be from, say, Google while instead sending you to Gοogle—I used Latin in the first word and Greek in the second. The standard solution is to only display the Unicode form (and not the Punycode form) where this is not an issue; Firefox and Opera display Unicode only for a whitelist of registrars with acceptable polices, Chrome and Internet Explorer only permits scripts that the user claims to read, and Safari only permits scripts that don't permit the homograph attack (i.e., not Cyrillic or Greek). (Note: this information I've summarized from Chromium's documentation; forward any complaints of out-of-date information to them).

IDN satisfies the needs of internationalizing the second half of an email address, so a working group was commissioned to internationalize the first one. The result is EAI, which was first experimentally specified in RFCs 5335-5337, and the standards themselves are found in RFCs 6530-6533 and 6855-6858. The primary difference between the first, experimental version and the second, to-be-implemented version is the removal of attempts to downgrade emails in the middle of transit. In the experimental version, provisions were made to specify with every internalized address an alternate, fully ASCII address to which a downgraded message could be sent if SMTP servers couldn't support the new specifications. These were removed after the experiment found that such automatic downgrading didn't work as well as hoped.

With automatic downgrading removed from the underlying protocol, the onus is on people who generate the emails—mailing lists and email clients—to figure out who can and who can't receive messages and then downgrade messages as appropriate for the recipients of the message. However, the design of SMTP is such that it is impossible to automatically determine if the client can receive these new kinds of messages. Thus, the options are to send them and hope that it works or to rely on the (usually clueless) user to inform you if it works. Clearly an unpalatable set of options, but it is one that can't be avoided due to protocol design.

The largest change of EAI is that the local parts of addresses are specified as a sequence of UTF-8 characters, omitting only the control characters [15]. The working group responsible for the specification adamantly refused to define a Unicode-to-ASCII conversion process, and thus a mechanism to make downgrading work smoothly, for several reasons. First, they didn't want to specify a prefix which could change the meaning of existing local-parts (the structure of local-parts is much less discoverable than the structure of all domain names). Second, they felt that the lack of support for displaying the Unicode variants of Punycode meant that users would have a much worse experience. Finally, the transition period would be hopefully short (although messy), so designing a protocol that supports that short period would worsen it in the long term. Considering that, at the moment of writing, only one of the major SMTP implementations has even a bug filed to support it, I think the working group underestimates just how long transition periods can take.

As far as changes to the message format go, that change is the only real change, considering how much effort is needed to opt-in. Yes, headers are now supposed to be UTF-8, but, in practice, every production MIME parser needs to handle 8-bit characters in headers anyways. Yes, message/global can have MIME encoding applied to it (unlike message/rfc822), but, in practice, you already need to assume that people are going to MIME-encode message/rfc822 in violation of the specification. So, in practice, the changes needed to a parser are to add message/global as an alias to message/rfc822 [16] and possibly tweaking some charset detection heuristics to prefer UTF-8. I would very much have liked the restriction on header line length removed, but, alas, the working group did not feel moved to make those changes. Still, I look forward to the day when I never have to worry about encoding text into RFC 2047 encoded-words.

IMAP, POP, and SMTP are also all slightly modified to take account of the new specifications. Specifically, internationalized headers are supposed to be opt-in only—SMTP are supposed to reject sending to these messages if it doesn't support them in the first place, and IMAP and POP are supposed to downgrade messages when requested unless the client asks for them to not be. As there are no major server implementations yet, I don't know how well these requirements will be followed, especially given that most of the changes already need to be tolerated by clients in practice. The experimental version of internationalization specified a format which would have wreaked havoc to many current parsers, so I suspect some of the strict requirements may be a holdover from that version.

And thus ends my foray into email internationalization, a collection of bad solutions to hard problems. I have probably done a poor job of covering the complete set of inanities involved, but what I have covered are the ones that annoy me the most. This certainly isn't the last I'll talk about the impossibility of message parsing either, but it should be enough at least to convince you that you really don't want to write your own message parser.

[1] Date/time, numbers, and currency are the other major aspects of internalization.
[2] I am a native English speaker who converses with other people almost completely in English. That said, I can comprehend French, although I am not familiar with the finer points that come with fluency, such as collation concerns.
[3] C and C++ have a built-in internationalization and localization API, derived from POSIX. However, this API is generally unsuited to the full needs of people who actually care about these topics, so it's not really worth mentioning.
[4] The basic algorithm to encode RFC 2047 strings for any charset are to try to shift characters into the output string until you hit the maximum word length. If the internal character set for Unicode conversion is UTF-16 instead of UTF-32 and the code is ignorant of surrogate concerns, then this algorithm could break surrogates apart. This is exactly how the bug is triggered in Thunderbird.
[5] I'm not discussing Han unification, which is arguably the single most controversial aspect of Unicode.
[6] Official list here means the official set curated by IANA as valid for use in the charset="" parameter. The actual set of values likely to be acceptable to a majority of clients is rather different.
[7] If you've read this far and find internationalization inoperability surprising, you are either incredibly ignorant or incurably optimistic.
[8] I'm not discussing collation (sorting) or word-breaking issues as this post is long enough already. Nevertheless, these also help very much in making you want to run away from internationalization.
[9] I actually, when writing this post, went to double-check to see if Thunderbird correctly implements return-to-ASCII in its encoder, which I can only do by running tests, since I myself find its current encoder impenetrable. It turns out that it does, but it also looks like if we switched conversion to ICU (as many bugs suggest), we may break this part of the specification, since I don't see the ICU converters switching to ASCII at the end of conversion.
[10] Chosen as a very adequate description of what I think of RFC 2047. Look it up if you can't guess it from context.
[11] As measured by implementation in JSMime, comments and whitespace included. This is biased by the fact that I created a unified lexer for the header parser, which rather simplifies the implementation of the actual parsers themselves.
[12] This is, of course a gross oversimplification, so don't complain that I'm ignoring domain literals or the like. Email addresses will be covered later.
[13] A point of trivia: the 'I' in IDNA2003 is expanded as "Internationalizing" while the 'I' in IDNA2008 is for "Internationalized."
[14] For the technically-minded: IDNA2003 relied on a hard-coded list of banned codepoints in processing, while IDNA2008 derives its lists directly from Unicode codepoint categories, with a small set of hard-coded exceptions.
[15] Certain ASCII characters may require the local-part to be quoted, of course.
[16] Strictly speaking, message/rfc822 remains all-ASCII, and non-ASCII headers need message/global. Given the track record of message/news, I suspect that this distinction will, in practice, not remain for long.

October 11, 2013 04:07 AM

October 01, 2013

Benjamin Smedberg (bsmedberg)

Mozilla Summit: Listen Hard

Listen hard at the Mozilla Summit.

When you’re at a session, give the speaker your attention. If you are like me and get distracted easily by all the people, take notes using a real pen and paper. Practice active listening: don’t argue with the speaker in your head, or start phrasing the perfect rebuttal. If a speaker or topic is not interesting to you, leave and find a different session.

At meals, sit with at least some people you don’t know. Introduce yourself! Talk to people about themselves, about the project, about their personal history. If you are a shy person, ask somebody you already know to make introductions. If you are a connector who knows lots of people, one of your primary jobs at the summit should be making introductions.

In the evenings and downtime, spend time working through the things you heard. If a presentation gave you a new technique, spend time thinking about how you could use it, and what the potential downsides are. If you learned new information, go back through your old assumptions and priorities and question whether they are still correct. If you have questions, track down the speaker and ask them in person. Questions that come the next day are one of the most valuable forms of feedback for a speaker (note: try to avoid presentations on the last day of a conference).

Talk when you have something valuable to ask or say. If you are the expert on a topic, it is your duty to lead a conversation even if you are naturally a shy person. If you aren’t the expert, use discretion so you don’t disrupt a conversation.

If you disagree with somebody, say so! Usually it’s better to disagree in a private conversation, not in a public Q&A session. If you don’t know the history of a decision, ask! Be willing to change your mind, but also be willing to stay in disagreement. You can build trust and respect even in disagreement.

If somebody disagrees with you, try to avoid being defensive (it’s hard!). Keep sharing context and asking questions. If you’re not sure whether the people you’re talking to know the history of a decision, ask them! Don’t be afraid to repeat information over and over again if the people you’re talking to haven’t heard it before.

Don’t read your email. Unfortunately you’ll probably have to scan your email for summit-related announcements, but in general your email can wait.

I’ve been at two summits, a mozcamp, and numerous all-hands and workweeks. They are exhausting and draining events for introverted individuals such as myself. But they are also motivating, inspiring, and in general awesome. Put on a positive attitude and make the most of every part of the event.

More great summit tips from Laura Forrest.

October 01, 2013 01:34 PM

September 26, 2013

Mark Finkle (mfinkle)


If Mozilla had secret weapons, I think our Interns would be included on the list. These hard working troops descend upon us during their school breaks and end up working on some of the hardest problems Mozilla has to offer. Our primary Intern “season” is wrapping up and I wanted to touch upon some of the work completed or in-progress.

Firefox for Android

Firefox for Metro

One of the things I like about the way Mozilla utilizes interns is that it shows them exactly what happens in real software development. They learn that code reviews can take a lot of time. Your feature might not make the desired release, or even get backed out at the last minute. They learn that large software projects are painful and carry a lot of legacy baggage, and you need to deal with it. I think it’s also a great way to learn how to communicate in a team environment. They also get to ship features in Firefox, and who doesn’t love shipping stuff?

Interns of 2013, we salute you!

September 26, 2013 05:50 PM

September 25, 2013

Mark Finkle (mfinkle)

Firefox for Android: Team Meetup, Brainstorming and Hacking

Last week, the Firefox for Android team, and some friends, had a team meetup at the Mozilla Toronto office. As is typical for Mozilla, the team is quite distributed so getting together, face to face, is refreshing. The agenda for the week was fairly simple: Brainstorm new feature ideas, discuss ways to make our workflow better, and provide some time for fun hacking.

We spent most of our time brainstorming, first at a high level, then we picked a few ideas/concepts to drill into. The high level list ended up with over 150 ideas. These ranged from blue-sky features, building on existing features, performance and UX improvements, and removing technical debt. Some of the areas where we had deeper discussions included:

We also took some time to examine our workflow. We found some rough edges we intend to smooth out. We also ended up with a better understanding of our current, somewhat organic, workflow. Look for more write-ups from the team on this as we pull the information together. One technical outcome of the the discussions was a critical examination of our automated testing situation. We decided that we depend entirely too much on Robotium for testing our Java UI and non-UI code. Plans are underway to add some JUnit test support for the non-UI code.

The Android team is very committed to working with contributors and have been doing a great job attracting and mentoring code contributors. Last week they started discussing how to attract other types of contributors, focusing on bug triage as the next possible area. Desktop Firefox has had some great bug triage support from the community, so it seems like a natural candidate. Look for more information about that effort coming soon.

There was also some time for hacking on code. Some of the hacking was pure fun stuff. I saw a twitterbot and an IRCbot created. There was also a lot of discussion and hacking on add-on APIs that provide more integration hooks for add-on developers into the Java UI. Of course there is usually a fire that needs to be put out, and we had one this time as well. The front-end team quickly pulled together to implement a late-breaking design change to the new Home page. It’s been baking on Nightly for a few days now and will start getting uplifted to Aurora by the end of the week.

All in all, it was a great week. I’m looking forward to seeing what happens next!

September 25, 2013 03:29 PM

September 19, 2013

Mark Surman (surman)

Who wants to teach the web?

People who teach others about the web are key to the future of Webmaker — and maybe even the future of Mozilla. I’m not talking only about teachers in classrooms getting their kids into HTML. Although that’s part of it. I’m talking about anyone who a) is excited about the culture and technology of the web and b) wants to help others get more out of the web they create and communicate on everyday online. We’ve been calling these people ‘mentors’. But, more simply, they are people who love the web want to share their passion.

Maker Party Surabaya

In my recent post on Maker Party, I asked ‘how do we build a global community of mentors?’ One of the first steps is meeting these people, figuring out who they are and what they really want. We’ve been doing that all summer with Maker Party. And I did a bit personally as I traveled around over the earlier this month. Here are a few of the mentors I met.

Rafael, curious.


Rafael is an IT consultant who used to be a teacher. He knows the web and a little programming. He came to our Manila Maker party just to find out what was up. He ended up winning the ‘best make’ contest with a Thimble comic strip remix. At the end he said: I want to show this to some of the teacher friends. We pointed him to and told him local MozReps would be in touch. Rafael is the guy with a tshirt over his face.

Joe, learner turned mentor.


Joe is an high school student in the UK. He first got turned onto Webmaker at MozFest 2011. He liked the idea of teaching his less geeky friends about web programming, so he organized a Summer Code Party in 2012. This year he was helping as a Webmaker mentor at Campus Party in London. Joe is also active with DeCoded, an other London tech education group. Joe is the guy in the foreground with the white mentor shirt on.

Abdul, teacher.


Abdul is an IT teacher in a high school in Surabaya, Indonesia. He helped us organize a 100 person all day Maker Party in the school auditorium. He teaches HTML and PHP using notepad already, but wants a way to get kids more excited about those technologies. The two pane Thimble editor plus having his kids hack our animated GIF postcard template seemed like a good start. Now he wants to offer Webmaker activities regularly at his school, although would find it easier if there was content in Bahasa Indonesia.

Youth IT Clubs.


In Surabaya, I met with a bunch of high school IT clubs: after-school groups led by the the IT teacher. In the case of Abdul, he recruited his club to run our 100 person Webmaker event. And wants to help them learn to be leaders and teachers themselves by involving them in ongoing Webmaker programs. We already have a great example of working with youth in this way as part of our relationship with MOUSE via Hive NYC.

Lewie, youth mentor activator.


Lewie is in his early twenties. A few years ago, he didn’t know how to code. Now he teaches corporate execs about programming for Freeformers. He also helps find other young people who he can train up and do the same. This is part of Freeformers effort to get young talent creating more young talent, using a 1:1 business model where corporate training funds more training for young people from unlikely backgrounds. The Freeformers have been active users of and contributors to Webmaker. That’s Lewie on the right.

Michelle, partner in crime.


Michelle runs developer relations for one of the two big mobile operators in the Philippines. She is also a great friend of Mozilla’s. She regularly offers event space for things like Webmaker events. And, at the Maker Party, stepped up in real time to offer a small cash prize for the best make. It’s win / win for sure: her company is positioned as part of our effort to build young web talent at little cost. But, there is more there. Michelle is personally excited about what we’re doing. This offers a great deal of validation and motivation to both the mentors and the learners in the room. That’s Michelle on the right.

Kindred spirits and partners, more broadly.


A core idea behind Webmaker is being a big tent for anyone who wants to teach the Web. It’s about finding kindred spirits who want to teach alongside us. The three fellows above are from the local robot hacker community in Surabaya. They came to help with our Hive Pop Up. We worked with dozens and dozens of partners like this as part of Maker Party this summer including Code Club, National Writing Project, Technology Will Save Us, Young Rewired State and all of the members of our Hive Networks. I’m going to do a separate post on partners, but they are a key piece of building a mentor network in their own right.

Benny and Yoe One, Super Mentors.


Benny and Yoe One are dedicated Mozilla volunteers who live in Surabaya. They don’t just work on Webmaker. But they have been incredibly active. They organized the Maker Party and Hive Pop Up in Surabaya. And, more importantly, started to build relationships with dozens of schools and local government to create interest in what we’re doing. They are ‘Super Mentors’ in our parlance: people who have the skills to teach but also want to help us bring in and train more mentors. Obviously, these people are absolutely key to the success of our Webmaker effort. Benny is to the left and Yoe One is to the right of Abdul.

Faye, Webmaker country lead.


Faye is a university student in Manila and a Mozilla Rep. She is also the official Webmaker Country Lead. The MozReps in the Philippines have created lead positions like this for many Mozilla programs to make sure someone is a driver. Being Webmaker Lead means Faye not only organized the Maker Party I was at in Manila but is also thinking strategically about how to improve Webmaker and how to get it out of Manila into remote regions. She is like a Super Mentor with a more official role within the local reps community. We may want to consider having this kind of ‘lead’ role in other countries or other cities. That is Faye in the Firefox shirt on the right.

Bob, Jun and Viking, the elders.


In many countries around the world, Mozilla is lucky to have a community of elders. People who have been a part of the Mozilla community since very early on. A number of these people have been critical in getting Webmaker going in their countries, encouraging other community members like Benny, Faye and Yoe One to get involved. These people also could (and should) play a key role in defining where we go next with Webmaker and how it ties into the rest of Mozilla’s work. This is a picture of Viking. Bob and Jun are on the right in the picture below.

The posse.


Finally, a key part of the picture is what I just call ‘the posse’. These aren’t mentors per say, although they do often pitch in teaching at Maker Parties. They are active Mozilla community members working on a variety of things who are willing to help their peers who are running Webmaker activities. I found them in all three cities I visited. This is the awesome posse holding fort at the registration desk at our recent Manila Maker Party.

As you can see just from my handful of examples, these mentors (sic) are quite diverse. But they do have things in common. They are passionate about the web. They want to teach or share what they know about the web in an active way. They want to be part of what Mozilla’s doing, either on the face of it or because the big tent brings people to their own teaching programs. And, across the board, they are simply generous and enthusiastic people who want to make the world better for the people around them by sharing the web.

At this stage, these mentors are the most critical audience for Webmaker. This is in part because they are the ones who get it and like it: they are in a great position to help us test, iterate and build it out further with community contributions. But it’s also because they will bring in the next round of web makers. Each mentor who uses will bring 5 – 50 more users as a part of the teaching they are doing. Summed up: growing our mentor community will both make Webmaker better and grow our user base. IMHO, we should be putting most of our efforts right now on making Webmaker better for — and with — mentors.

Filed under: drumbeat, education, learning, mozilla, webmakers

September 19, 2013 04:49 PM

September 15, 2013

Mark Surman (surman)

A Webmaker tour in photos

As I mentioned in my last post, I attended some Maker Party events in different parts of the world over the past two weeks — London, Manila, Surabaya. Here are some photos and basic stats about the events.

1. London, Make the Web at Campus Party

Make the Web was a large space inside of Telefonica’s Campus Party in London, with hundreds of young people moving through to learn about the web.

Campus Party

It included workshops and demos from Telefonica Think Big, FreeFormers, Code Club, Firefox Student Ambassadors, Pop Up Talent, Webmaker, Code First Girls, Fluency and the new Mozilla Appmaker project.

Campus Party

Huge thanks to Helen Parker and the Telefonica Think Big team for being the leaders in putting this together.

2. Manila, Maker Party Makati

In Manila, the local Mozilla community organized a straight up Webmaker Maker Party on a Friday evening.

Maker Party Manila

Approximately 50 people gathered for a Mozilla talk, Thimble and pizza in the training room of Globe Labs.

Maker Party Manila

The participants ranged from high school students to IT consultants to a father in his 50s interested in teaching his kids. We met a number of new potential mentors there that we will follow up with.

Maker Party Manila

Thanks to Faye, Jun, Bob and the whole Mozilla Philippines community for organizing this.

3. Surabaya, SMAN11 School Maker Party

In Surabaya, Indonesia, the Mozilla community worked with a local school to run an all day Maker Party for 100 students.

Surabaya Maker Party

It started with a spectrogram exercise that asked people ‘should students be allowed to have their phones in the classroom’. This was followed by a Mozilla talk and working on the ‘postcard’ template in Thimble.

Surabaya Maker Party

Over 70 makes were published. More importantly, the teacher and school IT club expressed an interest in doing follow up events they would run themselves.

4. Surabaya, Hive Pop Up

On the following day, the local Mozilla team organized a Hive Pop Up with students and teachers from over 40 schools. The city government was also involved.

Hive Surabaya Pop Up

The event included hands on learning stations for robots, Firefox OS, animation, Webmaker, paper toy making and batik. As with any Hive Pop Up, this just gave students and teachers involved a taste of what is possible.

Hive Surabaya Pop Up

Many had an interest in going further with the idea of setting up a Hive community in Surabaya.

Hive Surabaya Pop Up

Both Surabaya events were organized by the amazing Benny, Yoe One, Viking and others from across the Mozilla Indonesia community.

I’m sharing this overview mostly as context for my upcoming posts on what we’re learning from the Maker Party Lab that we ran over the past four months. Also, to give you a snapshot of what Mozilla communities are doing around the world with Webmaker — you people are amazing!

Filed under: mozilla

September 15, 2013 03:08 PM

September 14, 2013

Joshua Cranmer (jcranmer)

Why email is hard, part 1: architecture

Which is harder, writing an email client or writing a web browser? Several years ago, I would have guessed the latter. Having worked on an email client for several years, I am now more inclined to guess that email is harder, although I never really worked on a web browser, so perhaps it's just bias. Nevertheless, HTML comes with a specification that tells you how to parse crap that pretends to be HTML; email messages come with no such specification, which forces people working with email to guess based on other implementations and bug reports. To vent some of my frustration with working with email, I've decided to post some of my thoughts on what email did wrong and why it is so hard to work with. Since there is so much to talk about, instead of devoting one post to it, I'll make it an ongoing series with occasional updates (i.e., updates will come out when I feel like it, so don't bother asking).

First off, what do I mean by an email client? The capabilities of, say, Outlook versus Gaia Email versus Thunderbird are all wildly different, and history has afforded many changes in support. I'll consider anything that someone might want to put in an email client as fodder for discussion in this series (so NNTP, RSS, LDAP, CalDAV, and maybe even IM stuff might find discussions later). What I won't consider are things likely to be found in a third-party library, so SSL, HTML, low-level networking, etc., are all out of scope, although I may mention them where relevant in later posts. If one is trying to build a client from scratch, the bare minimum one needs to understand first is the basic message formatting, MIME (which governs attachments), SMTP (email delivery), and either POP or IMAP (email receipt). Unfortunately, each of these requires cross-referencing a dozen RFCs individually when you start considering optional or not-really-optional features.

The current email architecture we work with today doesn't have a unique name, although "Internet email" [1] or "SMTP-based email" are probably the most appropriate appellations. Since there is only one in use in modern times, there is no real need to refer to it by anything other than "email." The reason for the use of SMTP in lieu of any other major protocol to describe the architecture is because the heart of the system is motivated by the need to support SMTP, and because SMTP is how email is delivered across organizational boundaries, even if other protocols (such as LMTP) are used internally.

Some history of email, at least that lead up to SMTP, is in order. In the days of mainframes, mail generally only meant communicating between different users on the same machine, and so a bevy of incompatible systems started to arise. These incompatible systems grew to support connections with other computers as networking computers became possible. The ARPANET project brought with it an attempt to standardize mail transfer on ARPANET, separated into two types of documents: those that standardized message formats, and those that standardized the message transfer. These would eventually culminate in RFC 822 and RFC 821, respectively. SMTP was designed in the context of ARPANET, and it was originally intended primarily to standardize the messages transferred only on this network. As a result, it was never intended to become the standard for modern email.

The main competitor to SMTP-based email that is worth discussing is X.400. X.400 was at one time expected to be the eventual global email interconnect protocol, and interoperability between SMTP and X.400 was a major focus in the 1980s and 1990s. SMTP has a glaring flaw, to those who work with it, in that it is not so much designed as evolved to meet new needs as they came up. In contrast, X.400 was designed to account for a lot of issues that SMTP hadn't dealt with yet, and included arguably better functionality than SMTP. However, it turned out to be a colossal failure, although theories differ as to why. The most convincing to me boils down to X.400 being developed at a time of great flux in computing (the shift from mainframes to networked PCs) combined with a development process that was ill-suited to reacting quickly to these changes.

I mentioned earlier that SMTP eventually culminates in RFC 821. This is a slight lie, for one of the key pieces of the Internet, and a core of the modern email architecture, didn't exist. That is DNS, which is the closest thing the Internet has to X.500 (a global, searchable directory of everything). Without DNS, figuring out how to route mail via SMTP is a bit of a challenge (hence why SMTP allowed explicit source routing, deprecated post-DNS in RFC 2821). The documents which lay out how to use DNS to route are RFC 974, RFC 1035, and RFC 1123. So it's fair to say that RFC 1123 is really the point at which modern SMTP was developed.

But enough about history, and on to the real topic of this post. The most important artifact of SMTP-based architecture is that different protocols are used to send email from the ones used to read email. This is both a good thing and a bad thing. On the one hand, it's easier to experiment with different ways of accessing mailboxes, or only supporting limited functionality where such is desired. On the other, the need to agree on a standard format still keeps all the protocols more or less intertwined, and it makes some heavily-desired features extremely difficult to implement. For example, there is still, thirty years later, no feasible way to send a mail and save it to a "Sent" folder on your IMAP mailbox without submitting it twice [2].

The greatest flaws in the modern architecture, I think, lie in particular in a bevy of historical design mistakes which remain unmitigated to this day, in particular in the base message format and MIME. Changing these specifications is not out of the question, but the rate at which the changes become adopted is agonizingly slow, to the point that changing is generally impossible unless necessary. Sending outright binary messages was proposed as experimental in 1995, proposed as a standard in 2000, and still remains relatively unsupported: the BINARYMIME SMTP keyword only exists on one of my 4 SMTP servers. Sending non-ASCII text is potentially possible, but it is still not used in major email clients to my knowledge (searching for "8BITMIME" leads to the top results generally being "how do I turn this off?"). It will be interesting to see how email address internationalization is handled, since it's the first major overhaul to email since the introduction of MIME—the first major overhaul in 16 years. Intriguingly enough, the NNTP and Usenet communities have shown themselves to be more adept to change: sending 8-bit Usenet messages generally works, and yEnc would have been a worthwhile addition to MIME if its author had ever attempted to push it through. His decision not to (with the weak excuses he claimed) is emblematic of the resistance of the architecture to change, even in cases where such change would be pretty beneficial.

My biggest complaint with the email architecture isn't actually really a flaw in the strict sense of the term but rather a disagreement. The core motto of email could perhaps be summed up with "Be liberal in what you accept and conservative in what you send." Now, I come from a compilers background, and the basic standpoint in compilers is, if a user does something wrong, to scream at them for being a bloody idiot and to reject their code. Actually, there's a tendency to do that even if they do something technically correct but possibly unintentionally wrong. I understand why people dislike this kind of strict checking, but I personally consider it to be a feature, not a bug. My experience with attempting to implement MIME is that accepting what amounts to complete crap not only means that everyone has to worry about parsing the crap, but it actually ends up encouraging it. The attitude people get in bugs starts becoming "this is supported by <insert other client>, and your client is broken for not supporting it," even when pointed out that their message is in flagrant violation of the specification. As I understand it, HTML 5 has the luxury of specifying a massively complex parser that makes /dev/urandom in theory reliably parsed across different implementations, but there is no such similar document for the modern email message. But we still have to deal with the utter crap people claim is a valid email message. Just this past week, upon sifting through my spam folder, I found a header which is best described as =?UTF-8?Q? ISO-8859-1, non-ASCII text ?= (spaces included). The only way people are going to realize that their tools are producing this kind of crap is if their tools stop working altogether.

These two issues come together most spectacularly when RFC 2047 is involved. This is worth a blog post by itself, but the very low technically-not-but-effectively-mandatory limit on the header length (to aide people who read email without clients) means that encoded words need to be split up to fit on header lines. If you're not careful, you can end up splitting multibyte characters between different encoded words. This unfortunately occurs in practice. Properly handling it in my new parser required completely reflowing the design of the innermost parsing function and greatly increasing implementation complexity. I would estimate that this single attempt to "gracefully" handle wrong-but-of-obvious-intent scenario is worth 15% or so of the total complexity of decoding RFC 2047-encoded text.

There are other issues with modern email, of course, but all of the ones that I've collected so far are not flaws in the architecture as a whole but rather flaws of individual portions of the architecture, so I'll leave them for later posts.

[1] The capital 'I' in "Internet email" is important, as it's referring to the "Internet" in "Internet Standard" or "Internet Engineering Task Force." Thus, "Internet email" means "the email standards developed for the Internet/by the IETF" and not "email used on the internet."
[2] Yes, I know about BURL. It doesn't count. Look at who supports it: almost nobody.

September 14, 2013 10:50 PM

September 13, 2013

Benjamin Smedberg (bsmedberg)

Click-To-Play Plugin Telemetry

Last week we finally turned on click-to-play plugins as the default state for all plugins except Flash in Nightly builds (which will be Firefox 26). This is a milestone in giving Firefox users control over plugins and helping protect them from being exploited via unused and unwanted plugins.

As part of this feature, we have started to measure how users interact with the click-to-play UI. Nightly users aren’t typical, so this data probably doesn’t mean much yet, but it’s nice to see it in action:


This data shows how many different kinds of plugins were present in the plugin notification UI when each user saw it. When designing the notification, we wanted to streamline the common case, which we believed was that normally there would be only one kind of plugin on a page. This telemetry data will help verify our assumption. The current Nightly data shows a single type of plugin is the most common case, but not by as much as I originally thought:

# of Plugins Notification Count

1 32994

2 5935

3 179

4 3

5 or more 0


This data shows what user action triggered showing the plugin notification.

User Action Notification Count

Click on in-content plugin UI 23706

Click on location bar icon 15405

I’m surprised that so many users are clicking on the location bar icon. That may just be inquisitive users checking what each button does, but I’ll be monitoring this as it goes up the trains to the more representative beta population. If this stays very high, then we may have a problem with distracting users with unnecessary UI.


This data shows what action users are choosing to take in the plugin notification. Note that when multiple plugins are shown in the same notification, there will be a separate action for each plugin:

User Action Notification Count

Allow Now 16705

Allow Always 9196

Block 2199

I’m a little surprised at the distribution of “Allow Now” and “Allow Always”. When designing this UI, we expected that most users would want the “Allow Always” option, and we wanted to highlight that. But again, Nightly users are atypical and may not be a good sample. I’ll be watching this data also in beta.

I’m a wary of drawing any significant conclusions from early data, but I’m happy that we appear to be collecting the correct data and with the new telemetry dashboard it’s not hard to get at simple measurements such as this. Kudos to Taras, Mark Reid, and Chris Lonnen for getting that runing and the small daily improvements that make all our lives better.

September 13, 2013 03:51 PM

September 11, 2013

Mark Surman (surman)

Mozilla Maker Party: a global lab

Over the last three months, Mozilla set up a global lab. It’s called Maker Party. And its goal is to do real world experiments that invite people to teach, play with and test the thinking behind Mozilla Webmaker. This lets us learn and improve as we go. In this post I outline the questions we’re asking. In follow-on posts, a bunch of us will look at what we’re learning.

Diagram of Maker Party as it relates to Webmaker

What are we trying to test? At the broadest level, we want to test the idea that we can teach the culture and technology of the web to large numbers of people by tapping into maker culture and people’s desire to create. More specifically, Maker Party is asking:

1. How broad is (web)making? What do people want to make and learn?

The Webmaker program is a big tent: we support people who are teaching the culture and technology of the web no matter what tools they are using. Maker Party events reflect this, with people teaching everything from HTML to robots to paper prototyping. On the other hand, is currently focused on Mozilla’s tools. Maker Party helps us ask: What do people most want to make and learn? What’s our relationship to the broader maker movement? Depending on the answers, do we expand the scope of How?

2. Does our ‘making as learning’ approach work? Does it draw people in?

We built with the theory that people will learn how the web works fastest and best if we invite them to make something that delights them and that they are passionate about. The starter makes, new UX and the increased focus on remix that started to appear in Webmaker in June are all based around this theory. Maker Party helps us test this theory, both by seeing which aspects of the tools / content / site people were most drawn to and by asking mentors ‘what do you think people are learning?’.

3 What value can we provide people who want to teach the web?

People who want to teach others about the web are our first target for Webmaker. Spanning everyone from English teachers to web developers to teens who want to show their friends something cool, these are our ‘lead users’ They’re willing to kick the tires to help us improve. And they help us grow by bringing in more users (the people they want to teach). They are key to our early success. The question for Maker Party: What motivates these people? What value can we provide them? What can our tools, content and community offer to them that they can’t find elsewhere?

4. Can we grow our reach by working with partners?

Partnership has always been a core part of our ‘big tent’ approach with Webmaker. For Maker Party, we signed up dozens of partners to help us in the lab (and to run great programs for young people. They include: National Writing Project; Code Club; New York Hall of Science; Black Girls Code; Girl Scouts of America; MIT; California Academy of Science; E-Skills; Pycon Canada. Maker Party helps us ask: What motivates these partners? What value add can we offer to the programs they are already running? Are they helping us grow our reach and impact? Are we helping them do the same?

5. How do we build a global community of mentors?

One of the key goals of Maker Party is to grow and strengthen a lasting community of Webmaker mentors. With this in mind, we designed multi-step process that included: 1) recruiting and teaching mentors (Teach the Web MOOC); 2) offering a Mozilla Mentor badge to create a sense of belonging; 3) supporting mentors as they ran Maker Parties; 4) celebrating the best mentors; and 5) creating an ongoing mentor program for people to join post campaign. The questions for the lab: What parts of this worked? Do people want to stay involved? What does a formal ‘program’ for mentors look like? What content, infrastructure and staff do we need?

With over 1,000 Maker Parties under our belt, it’s time to start answering these questions. We have a great deal of real world experience and feedback to throw against the questions above. We also have a slate of formal user testing feedback on that we’re rolling into the design. And we have a growing network of excellent mentors who can help us both reflect and design next steps.

For my part, I’m going to write up reflections on Maker Parties I attended in the UK, the Philippines and Indonesia over the past week. People from across the Webmaker team will also be doing their own posts. And we may do a survey of Maker Party organizers based on the questions above. This will feed into how we evolve both the Webmaker program overall and

If you’ve been involved, I encourage you to do your own reflections. Blog. Tweet using the #makerparty hashtag. Post in the Webmaker mailing list. We’ve all got a to playing in making Webmaker better.

Filed under: drumbeat, education, learning, mozilla, webmakers

September 11, 2013 12:30 AM

September 08, 2013

Mark Surman (surman)

Coming out of my (blogging) shell

I noticed with a titch of horror last week that my last post on this blog was in May. Well, maybe horror is wrong. Embarrassment.

Why embarrassed? Blogging is one of the things I do to work and live out in the open. It’s essential as a way to explain the things I’m working on at Mozilla so others can understand and contribute. Also, if you like collaborating with people, it’s just good form to be transparent and show what you are thinking as you move through life trying to get things done.

Mitigating the embarrassment factor: there was a method to my silent summer madness. I’ve been spending the last few months focused on supporting others on my leadership team and on framework ideas to help us all as we go to the Mozilla Summit. Still, I didn’t expect to go dark for so long.

As of today, I’m back at blogging. My first priority is to look back at the Mozilla Maker Party campaign that the Webmaker team has been running all summer. Will put up a first post on that tomorrow. I also want to start writing about the Summit and some thoughts on where Mozilla might head in the future. Lots of thinking and typing ahead.


Filed under: drumbeat, mozilla, webmakers

September 08, 2013 05:54 PM

September 03, 2013

Benjamin Smedberg (bsmedberg)

Introducing Jydoop: Fast and Sane Map-Reduce

Analyzing large data sets is hard, but it’s often way harder than it needs to be. Today, Taras Glek and I are unveiling a new data-analysis tool called jydoop. Jydoop is designed to allow engineers without any experience in HBase/Hadoop to write data analyses on their local machine and then deploy the analysis to our production Hadoop cluster. We want to enable every Mozilla engineer to use telemetry and crash-stats data as effectively as possible.


Jydoop started with three simple goals:

Enable fast prototyping/testing

Setting up hadoop/hbase is unreasonably difficult and time-consuming, and engineers should not need a hadoop/hbase setup in order to write an analysis. Because Jydoop analyses are written in Python, they can be developed and tested on a local machine without Hadoop or even Java.

Don’t hide map/reduce

In a query language like pig, it is hard to know exactly how your query will perform; which tasks will run on the map nodes, which will run on the reducers, and which must be run on the final reduced data. Jydoop doesn’t hide those details: each analysis has simple map/combine/reduce functions so that engineers know how a clustered query is going to behave.

Be fast

Performance of clustered queries should be as good or better than our existing solutions using pig or other existing tools.


Telemetry data takes the form of a single blob of JSON which is stored in an hbase table. A telemetry ping may be larger than 100kb, and we typically receive about 2 million telemetry pings per day. In order to allow engineers to test an analysis, we save off a small sample of reports (5000 or so) in a single file. Then analysis scripts can be written and tested against the sample.

The simplest job is one which simply acts as a filter. The following script is used to select all telemetry records which have an “androidANR” key and save them to a file:

import telemetryutils

# Ask telemetryutils to set up our query by date range using the correct hbase tables
setupjob = telemetryutils.setupjob

def map(key, value, context):
    if value.find("androidANR") == -1:

    context.write(key, value)

To test this script against sample data, run python scripts/ sampledata.txt.
To run this script against a single day of telemetry data, run make ARGS='scripts/ anrresults-20130308.txt 20130308 20130308' hadoop.

More complex analyses are also possible. The following script will calculate the bucketed distribution of the size of the telemetry ping:

import telemetryutils
import jydoop
import json

setupjob = telemetryutils.setupjob

kBucketSize = float(0x400)

def map(key, value, context):
    j = json.loads(value)
    channel = j['info'].get('appUpdateChannel', None)

    # if .info.appUpdateChannel is not present, we don't care about this record
    if channel is None:
        return # Don't care about this record

    bucket = int(round(len(value) / kBucketSize))
    context.write((channel, bucket), 1)

combine = jydoop.sumreducer
reduce = jydoop.sumreducer

I was then able to produce this chart with the result data and a little JS:


History and Alternatives

The native language for Hadoop map/reduce jobs is Java. Writing an analysis directly in Java requires careful construction of a class hierarchy and is extremely tedious. Even more annoying, it’s basically impossible to test an analysis without having access to a hadoop/hbase environment. Even if you use something like the cloudera test VMs, setting up hbase and mapreduce jobs on test data is onerous.

It is a venerable Hadoop tradition to build wrapper tools around mapreduce which run distributed queries. There are tools such as Hadoop Streaming which allow map/reduce to forward to arbitrary programs via pipes. There are at least five different tools which already combine Hadoop with python. Unfortunately, none of these tools met the performance and prototyping requirements for Mozilla’s needs.

At Mozilla, we currently mostly use pig, a query language and execution tool which transforms a series of query and filter statements into one or more map/reduce jobs. In theory, pig has an execution mode which can prototype a query on local data. In practice, this is also very difficult to use, and so most people prototype their pig scripts directly on the production cluster. It works after a while, but it’s not especially friendly.


They key difference between Jydoop and the existing Python alternatives is that we use Jython to run the python code directly within the map and reduce jobs, rather than shelling out and communicating via pipes. This allows for much tighter integration with hadoop and also allows us to control the execution environment.

For maximum performance, we use a custom key/value class which serializes and compares python objects. For performance and sanity, this class operates on a limited subset of Python types: analysis scripts are limited to using only None, integers, floats, strings, and tuples of these basic types as their key and values.

During the process of developing Jydoop, we also discovered that JSON parsing performance was a significant factor in the overall job speed. The performance of the `json` module which is built into jython 2.7 is horrible. We got better but not great performance using jyson as a drop-in JSON replacement. Eventually we wrote a complete json replacement which wraps the standard Jackson streaming API. With these improvements in place, tt appears that jydoop performance beats pig performance on equivalent tasks.


Jydoop is now available on github.

We hope that jydoop will make it possible for many more people to work with telemetry, crash, and healthreport data. If you have any issues or questions about using jydoop with Mozilla data, please feel free to ask questions in If you are using jydoop for other projects, feel free to file github issues, submit github PRs, or email me directly and let me know what you’re up to!

September 03, 2013 09:32 PM

July 27, 2013

Mark Finkle (mfinkle)

Patterns of Effective Teams

I have been lucky to build software products for a few different companies, each with a distinct culture. It’s help me form opinions about people, tools and processes that make teams effective at shipping software products.

Who makes up a software product team? Mileage may vary, but I like to include:

Lots of companies organize people into functional groups: All the developers in a group, all the testers in a group, all the designers in a group… and so on. This doesn’t make it easy to ship software. It can create walls and make it harder to communicate. You also lose the “team” feeling, as well as the focus and drive that comes from that.

Product-centric teams seem to be more effective at shipping. These multidisciplinary teams embed members from the various groups on the team, all working together to create and ship a software product.

Over the years, I’ve seen productive teams using a few basic concepts. Some are process related, some can be aided by tools, but most deal with relationships between people:

I like to keep things lightweight. This includes tools and processes. Focus more on your product and the work at hand. Processes and tools can be distractions. The best ones are those that stay out of your way.

Update: Taras reminded me indirectly about the importance of passion, so I added it to the list.

July 27, 2013 08:08 PM

June 28, 2013

Blake Winton (bwinton)

Using Persona in Angular apps.

In my previous blog post, I mentioned a tool I’m writing to make it easy for designers to link mockups to live bugs. But I didn’t mention that I had a reasonably-working version of the tool written in Backbone which I’ve decided to port to Angular. The reasons why are largely beside the point of this blog post, but I’ll try to sum them up by saying that I reached a point where Backbone seemed to be confusing me more than helping me, and Angular got a lot of good press at FluentConf this year.

So this morning’s task in the re-write was to re-hook up the Persona integration. I had read recently that when you had a lot of dom-manipulation functions, you should probably put that code in a directive, and since I hadn’t written an Angular directive yet, I figured this would be a great time to learn how. Writing the html was pretty easy, of course, and most of the code from the existing implementation (which was largely based on the code from the express-persona readme) could be ported over fairly quickly. The only tricky part I ran into was figuring out that I needed to include restrict: 'E' in the Directive Definition Object. After I was done, I noticed that there really wasn’t that much in the code that had anything to do with the tool I’m writing, and thus I pulled it out into a separate repo so that other people can use it.

And with that, I announce Angular-Tools, a repo containing one or more tools which you might find useful if you build Angular apps. As always, pull requests and bug reports welcome!

June 28, 2013 07:16 PM

June 18, 2013

Blake Winton (bwinton)

Drawing lines with CSS.

One of the things I’m working on as part of my job1 at Mozilla is a tool to make it easy for designers to create mockups that are linked to live bugs, similar to the ones at Are We Pretty Yet. Now, I’ve got the background showing up, and the bugs overlayed on top of it, but as it stands, I’m requiring the designers to draw the lines connecting the bugs to the various areas in the mockup right on the mockup itself! This is obviously a fairly terrible idea, since it makes it much harder than it should to move stuff around after the fact, and requires a ton of up-front planning when creating the initial image. But what are my other options?

I thought for a while about layering a canvas element over the mockup; it would let me draw whatever shapes I wanted to, but passing the click events through to the mockup seemed like it would be fairly annoying, and I don’t think the connecting lines should appear in front of the boxes showing the bug details, which adds another wrinkle. Then, over lunch, I started to wonder what it would look like if a 1px by 1px black square got stretched and rotated with CSS… So I took some time after lunch, and played around a little, and it seems like it just might work! Give it a try, let me know if you have any ideas to make it better, and feel free to take the idea anywhere you think it might be useful!

Update: In the comments, Andrew points out that I could use a 1px by 1px span instead, which would make it much easier to change the colour of the line, so I’ve linked to his jsfiddle instead. :)

  1. Sometimes I still can’t believe how lucky I am to get to do this stuff all day, and get paid for it!  

June 18, 2013 07:01 PM

June 14, 2013

Blake Winton (bwinton)

Drawing lines with CSS.

One of the things I’m working on as part of my job1 at Mozilla is a tool to make it easy for designers to create mockups that are linked to live bugs, similar to the ones at Are We Pretty Yet. Now, I’ve got the background showing up, and the bugs overlayed on top of it, but as it stands, I’m requiring the designers to draw the lines connecting the bugs to the various areas in the mockup right on the mockup itself! This is obviously a fairly terrible idea, since it makes it much harder than it should to move stuff around after the fact, and requires a ton of up-front planning when creating the initial image. But what are my other options?

I thought for a while about layering a canvas element over the mockup; it would let me draw whatever shapes I wanted to, but passing the click events through to the mockup seemed like it would be fairly annoying, and I don’t think the connecting lines should appear in front of the boxes showing the bug details, which adds another wrinkle. Then, over lunch, I started to wonder what it would look like if a 1px by 1px black square got stretched and rotated with CSS… So I took some time after lunch, and played around a little, and it seems like it just might work! Give it a try, let me know if you have any ideas to make it better, and feel free to take the idea anywhere you think it might be useful!

Update: In the comments, Andrew points out that I could use a 1px by 1px span instead, which would make it much easier to change the colour of the line, so I’ve linked to his jsfiddle instead. :)

  1. Sometimes I still can’t believe how lucky I am to get to do this stuff all day, and get paid for it!  

June 14, 2013 06:57 PM

June 08, 2013

Guillermo López (willyaranda)

Te miramos con otro PRISMa

PRISM es un programa de la NSA para saber qué es lo que hacen ciudadanos del mundo en servicios en internet, y entregar esos datos al gobierno americano. Vamos, espionaje en toda regla.

En el mundo de internet, y si conocéis a Mozilla y cuál es su manifiesto, veréis que una de las claves es:

La seguridad de los usuarios en Internet es fundamental y no debe ser tratada como algo opcional.

Ahora, si vemos a las empresas del ámbito de internet que proporcionan un navegador (usadas perfectamente para rastrear a los usuarios, qué hacen, dar IPs y demás)…

Empresas colaboradoras

Microsoft. Google. Apple.

E incluso redes sociales, como Facebook; correos electrónicos (Yahoo, Hotmail); mensajería con Skype. Todo, absolutamente todo.

Pero falta alguien. Falta Mozilla. Falta la fundación con una cuota de mercado del navegador del 30%, que antepone los intereses de los usuarios a las de las corporaciones, porque no responde a nadie excepto a ti. Un navegador diferente, una historia diferente, un fin diferente.

Mozilla logo

June 08, 2013 11:40 PM

May 20, 2013

Mark Surman (surman)

Mozilla and Badges: where next?

Open Badges started as a modest experiment: build open source badge issuing software for ourselves and others. As momentum around this experiment has grown, it feels like the opportunity is bigger: we could build openness and user empowerment into how learning — and professional identity — work all across the web. With Open Badges 1.0 out there in the world, now is the right time to ask: where next for Mozilla and badges?

Badges Evolution

When Mozilla and MacArthur Foundation first started work on Open Badges about 18 months ago, the plan was to build a badge interchange standard (like SMTP for skills) and a collection of open source software for issuing and sharing badges (Badge Backpack, Open Badger, etc.). We’ve built all these things. And we’ve put up a reference implementation that Mozilla and others are using. This was really the limit of our original plan: build some basic open tech for badges and put it out there in the world.

The thing is: there has been way more excitement and pick up of badges than we expected. Even though Open Badges only launched officially in March, there are already over 800 unique providers who have issued almost 100,000 badges. We are also starting to see the development of city-wide systems where learners can pick up hundreds of different badges from across dozens of learning orgs and combine them all into a single profile. Chicago is the first city to do this (June 1), but Philadelphia and San Francisco are not far behind. And, this is just the tip of the iceberg: orgs like the Clinton Global Initiative and the National Science Foundation are focusing on badges in a way that is likely to drive even more educators to pick up the Open Badges standard, making their badges interoperable with others.

Of course, the fact that educators and policy makers are interested in badges doesn’t represent a victory in itself. It just shows momentum and buzz. The real opportunity — and the real impact — comes when learners and employers get excited about badges. Mozilla never planned to build offerings for these audiences. Increasingly, it feels like we should.

Badges Audiences

In the Internet era, people learn things online and out of school all the time. Whether you want to make a web page, knit a sweater or get better at calculus, the internet makes it easy to learn on your own or with a group of friends outside of a school setting. However, there is no good way to get credentials or recognition for this kind of learning. And, even if there was, there is no trusted, verifiable way to plug that recognition into Facebook, and other places that make up your online identity. People have no good way to show ‘what they know’ online.

Similarly, employers are increasingly turning to the internet to find talent. They use sites like LinkedIn that let you search online resumes. Or, increasingly, to sites like Gild and TalentBin that use data mining to find potential hires. The problem: these services do not offer granular or variable skills profiles. And, with some of them, there are significant issues around privacy: people are being offered up as potential hires without even knowing that these sites are collecting data about them.

Mozilla could offer a distributed, open source and privacy-friendly solution to problems like these. We could help learners show their skills in all their online profiles and also help employers search for talent reliably. However, to do so, we’d have to build a Firefox-quality offering for learners and employers on top of Open Badges. While this hasn’t been our focus up til now, I’m thinking more and more that this is something we should consider.

Badge product

In some ways, there is a parallel to Gecko and Firefox. Gecko provides the underlying platform for shaping standards around our vision of the web. But we need a popular consumer offering like Firefox if we want this vision to actually become relevant in the market. Right now, with Open Badges, we’re mostly just playing at the underlying standards layer. If we really want to shape how learning and professional identity work on the web, we probably need to build our own offerings directly for the people who most want and need badges.

Now is the time to be looking at where the opportunity is in this space. Momentum and demand is amongst educators is growing. More and more start ups are appearing in the badges, portfolio and skills spaces. And likelihood that badges will be important for learners and employers is growing. We need to be asking ourselves: how can Mozilla — and its values — shape this space?

With this in mind, Erin Knight is leading an effort over the next few months to look at different badges product options. She’ll be providing updates on her blog. And I’ll be summarizing here as well. If you have ideas on where Mozilla should go on all of this, we’d love to have you involved as we think this through. Comments here on this post are a good place to start.

Filed under: badges, drumbeat, education, learning, mozilla, webmakers

May 20, 2013 11:22 AM

May 10, 2013

Frank Hecker (hecker)

RIP Dennis Lane

I knew Dennis Lane only slightly: I occasionally commented on his blog, he commented on mine once or twice, and I met and talked to him several times at Howard County blogger meetups and other events. I can’t speak to his life as a private person and how he came to a violent end, and even if I could I wouldn’t: I don’t blog about my own private and family life, and won’t do so about others. However I did want to say a few words to mark his death.

What I admired most about Dennis as a blogger was his ability to write frequently and seemingly extemporaneously on a variety of topics: local politics, real estate and business affairs, and just the normal goings-on of daily life. He mentioned recently that he hadn’t had time to blog as much as normal, but (as I commented on the post), even his infrequent blogging way outpaced my infrequent offerings. I also enjoyed reading his Business Monthly columns, whenever I picked up a copy at a local establishment; it was one of the few things I read on paper rather than online. Leaving aside all his other contributions to Howard County, he’ll be sorely missed in the local blogging community, of which he was a founder and guiding light.

A couple of other points: In 2012 four people were murdered in Howard County and I was acquainted with two of them, Mary-Marguerite Kohn and Brenda Brewington, who were shot at St Peters Episcopal Church in Ellicott City almost exactly a year ago. This year I think there have been two homicides in Howard County thus far, and I’m acquainted with one of the victims. Strange that in a placid suburban county of 300,000 violent death should so disproportionately strike people I know.

FInally, it’s fitting that I first learned of Dennis’s death while reading a blog post, in this case by TJ Mayotte. It’s a measure of how much I rely on local blogs and hyper-local sites like Ellicott City Patch and Explore Howard for my knowledge of Howard County affairs. I guess one way to honor Dennis and what he meant to our local online community is to get off my rear end and blog some more myself.

Filed under: mozilla

May 10, 2013 11:49 PM

May 08, 2013

Joshua Cranmer (jcranmer)

Understanding the comm-central build system

Among the build systems peer, I am very much a newcomer. Despite working with Thunderbird for over 5 years, I've only grown to understand the comm-central build system in gory detail in the past year. Most of my work before then was in trying to get random projects working; understanding it more thoroughly is a result of attempting to eliminate the need for comm-central to maintain its own build system. The goal of moving our build system description to a series of files has made this task much more urgent.

At a high level, the core of the comm-central build system is not a single build system but rather three copies of the same build system. In simple terms, there's a matrix on two accesses: which repository does the configuration of code (whose config.status invokes it), and which repository does the build (whose is used). Most code is configured and built by mozilla-central. That comm-central code which is linked into libxul is configured by mozilla-central but built by comm-central. tier_app is configured and built by comm-central. This matrix of possibilities causes interesting bugs—like the bustage caused by the XPIDLSRCS migration, or issues I'm facing working with xpcshell manifests—but it is probably possible to make all code get configured by mozilla-central and eliminate several issues for once and all.

With that in mind, here is a step-by-step look at how the amorphous monster that is the comm-central build system works:

python checkout

And comm-central starts with a step that is unknown in mozilla-central. Back when everyone was in CVS, the process of building started with "check out from the server, set up your mozconfig, and then run make -f checkout." The checkout would download exactly the directories needed to build the program you were trying to build. When mozilla-central moved to Mercurial, the only external projects in the tree that Firefox used were NSPR and NSS, both of which were set up to pull from a specific revision. The decision was made to import NSPR and NSS as snapshots on a regular basis, so there was no need for the everyday user to use this feature. Thunderbird, on the other hand, pulled in the LDAP code externally, as well as mozilla-central, while SeaMonkey also pulls in the DOM inspector, Venkman, and Chatzilla as extensions. Importing a snapshot was not a tenable option for mozilla-central, as it updates at an aggressive rate, so the functionality of checkout was ported to comm-central in a replacement python fashion.

./configure [comm-central]

The general purpose of configure is to discover the build system and enable or disable components based on options specified by the user. This results in a long list of variables which is read in by the build system. Recently, I changed the script to eliminate the need to port changes from mozilla-central. Instead, this script reads in a few variables and tweaks them slightly to produce a modified command line to call mozilla-central's configure...

./configure [mozilla-central]

... which does all the hard work. There are hooks in the configure script here to run a few extra commands for comm-central's need (primarily adding a few variables and configuring LDAP). This is done by running a bit of m4 over another file and invoking that as a shell script; the m4 is largely to make it look and feel "like" autoconf. At the end of the line, this dumps out all of the variables to a file called config.status; how these get configured in the script is not interesting.

./config.status [mozilla/comm-central]

But config.status is. At this point, we enter the mozbuild world and things become very insane; failure to appreciate what goes on here is a very good way to cause extended bustage for comm-central. The mozbuild code essentially starts at a directory and exhaustively walks it to build a map of all the code. One of the tasks of comm-central's configure is to alert mozbuild to the fact that some of our files use a different build system. We, however, also carefully hide some of our files from mozbuild, so we run another copy of config.status again to add in some more files (tier_app, in short). This results in our code having two different notions of how our build system is split, and was never designed that way. Originally, mozilla-central had no knowledge of the existence of comm-central, but some changes made in the Firefox 4 timeframe suddenly required Thunderbird and SeaMonkey to link all of the mailnews code into libxul, which forced this contorted process to come about.


Now that all of the Makefiles have bee generated, building can begin. The first directory entered is the top of comm-central, which proceeds to immediately make all of mozilla-central. How mozilla-central builds itself is perhaps an interesting discussion, but not for this article. The important part is that partway through building, mozilla-central will be told to make ../mailnews (or one of the other few directories). Under recursive make, the only way to "tell" which build system is being used is by the directory that the $(DEPTH) variable is pointing to, since $(DEPTH)/config/ and $(DEPTH)/config/rules/mk are the files included to get the necessary rules. Since mozbuild was informed very early on that comm-central is special, the variables it points to in comm-central are different from those in mozilla-central—and thus comm-central's files are all invariably built with the comm-central build system despite being built from mozilla-central.

However, this is not true of all files. Some of the files, like the chat directory are never mentioned to mozilla-central. Thus, after the comm-central top-level build completes building mozilla-central, it proceeds to do a full build under what it thinks is the complete build system. It is here that later hacks to get things like xpcshell tests working correctly are done. Glossed over in this discussion is the fun process of tiers and other dependency voodoo tricks for a recursive make.

The future

With all of the changes going on, this guide is going to become obsolete quickly. I'm experimenting with eliminating one of our three build system clones by making all comm-central code get configured by mozilla-central, so that mozbuild gets a truly global view of what's going on—which would help not break comm-central for things like eliminating the master xpcshell manifest, or compiling IDLs in parallel. The long-term goal, of course, is to eliminate the ersatz comm-central build system altogether, although the final setup of how that build system works out is still not fully clear, as I'm still in the phase of "get it working when I symlink everything everywhere."

May 08, 2013 01:46 AM

May 07, 2013

Mark Surman (surman)

Maker Party: let’s spark a movement

Plans are coming together for Mozilla’s Maker Party 2013. And I’m getting excited. Last year’s party had people making things on the web at 700 local events in 80 countries. This year it’ll be bigger. But, more important, I think this year will plant the foundations for something that lasts well beyond the campaign: a movement of people who want to teach 10s of millions of people how the web works.

Maker Party 2013

Mozilla has built this kind of movement before: when we first launched Firefox. Many people just downloaded Firefox 1.0 because it was great. But others became on-the-ground evangelists and promoters. They told their friends about Firefox. They installed it on other people’s computers. They showed them how to use bookmarks, and pop-up blockers and add-ons. And, over 10,000 of them of them put up money to tell the world about Firefox in a historic two-page Sunday New York Times ad.

Firefox New York Times Ad

In my view, the mentors and local champions who will step up to organize the Mozilla Maker Party are just like the early enthusiasts who helped Firefox get to 500 million users. It’s these people who will show the first million Webmakers what they can make. Who will start awarding badges that reward people for their skill and creativity on the web. And who will create excitement about all the tools and programs across the web the empower people to make and create. These mentors and local champions are the core leaders that Mozilla needs if we want to teach the world the web.

Building on last year’s successful Summer Code Party, Maker Party 2013 has a number of pieces designed specifically to help mentors and local champions succeed. Five that I’m really excited about are:

1. Teach the Web: a nine-week free and open online course for people who want to be Mozilla mentors and local champions. It’s highly collaborative, convening nearly 3,000 participants to share their teaching practice, learning materials and learn to hack the web on the way. The course started last week, but you can still sign up here

2. Super mentors: these are the passionate volunteers who really make the online course and marquee Maker Parties happen. They are experienced in teaching the web, running events and creating teaching materials. Starting with their work on Teach the Web, the Super Mentors are the leadership core of the larger Webmaker Mentor community. We already have over 100 super mentors. We hope to have many more by the time Maker Party 2013 is done.

3. A big tent with more than 40 partner organizations joining the Maker Party and carrying out making-and-learning activities across the globe. Like Mozilla, these organizations are part of a growing movement to teach the web and promote the maker spirit with hands-on learning. This network of partners is critical to growing this movement: there is no way any one organization can do this on its own. Mentors can bring their own organizations into this tent as a way to get publicity and recognition, or just as a way to be part of the party.

4. Hackable Activity Kits: simple ‘instructables’ that you can use show people how to make web pages, Popcorn videos, etc. The guides are hackable, forkable HTML pages so you can customize them. OpenMatt explains these kits well in this post.

5. An improved We’re launching some new features on June 15 to designed for making and learning on the web. Not only have these tools have been designed with mentors in mind, we’ll also be taking mentor feedback and improving them on a constant basis.

While the Maker Party campaign runs from June to September, Mozilla’s hope is to build a lasting network of people around the world who want to teach people how the web works. In September, we’ll be inviting mentors and local organizers to stay involved in Webmaker. This will include invites to MozFest 2013 in London this October, opportunities for continued online mentoring and local organizing and a chance to help shape where we take the Webmaker mentor program in 2014+. In many ways, Maker Party is a kick off for these lasting activities.

Maker Party Timeline

If you are someone who wants to teach the world how the web works — or even just show a few people how to get more creative online — you should get involved. You can start by joining the Teach the Web course or just signing up for Maker Party 2013 updates. Also, start thinking about what you might want to do in your town or city in the coming months. Getting people excited about the web is actually pretty easy. And fun.

Filed under: drumbeat, mozilla, webmakers

May 07, 2013 03:38 PM

May 03, 2013

Guillermo López (willyaranda)

Firefox Talk and Hackday in Málaga

Last 19th of March, I went to Málaga to talk about Mozilla, FirefoxOS and the web with my colleague from Mozilla Hispano David Bengoa (who is from Málaga, but it’s in UK with an Erasmus).


In the morning, we had the talk where we explained what is Mozilla, our values, what we have done in the past, and what are we doing right know, with emphasis in the history of Internet browsers and how we can achieve a 30% of marketshare thanks to the awesome work of thousands of contributors.

Also, we shown different projects Mozilla has, like FirefoxOS, all about Labs (with asm.js and new JavaScript features) and more stuff, like WebAPIs, presented by David.

887138_4207373477582_604970753_o 1

And, of course, we gave some swag!! :D


And in the evening we had a 2 hours hack with people that mostly did not know about HTML5 or JavaScript, so we did a quick and small demo on the capabilities of the web friends and show some demos about it.

It was fun!

May 03, 2013 11:49 AM

April 30, 2013

Guillermo López (willyaranda)

Test your Gaia localization


You need some tools on your machine (tested on MacOS and GNU/Linux, not yet on Windows – feedback appreciated).

Clone Gaia

  $ git clone
  $ cd gaia

Clone locales

You need to clone all locales that you want to have in the device (and set those in the languages.json from the next step).

  $ cd locales
  $ hg clone URL

where the URL depends on what are you trying on:

v1.0.1 is the first version (“es”, “pr-BR” and “en-US”) and you should work on v1-train or even master.

Set environment variables

You need to export the following variables. You can do that on your .bashrc file or even on the build time (before the command).

  export LOCALE_BASEDIR=/path/to/gaia/locales/
  export LOCALES_FILE=/path/to/gaia/shared/resources/languages.json

where languages.json must be edited to select what locales you want to test. For example:

  "ca" : "Catalan",
  "en-US" : "English (US)",
  "es" : "Español",
  "eu" : "Euskara",
  "gl" : "Galego"

Build and flash

Go to the gaia cloned folder, and just:

  $ make reset-gaia

if you want to test on your device (and you have it connected) or

  $ make profile

if you want to use it with your b2g-desktop build from here.

You can also put the environment variables before the launch of the make like:

  $ LOCALE_BASEDIR=/path/to/gaia/locales/ LOCALES_FILE=/path/to/gaia/shared/resources/languages-all.json make reset-gaia


Once finished, you will have a new Gaia build on your phone with the languages you just flahsed. Or you can use your b2g-18 binary to run it on desktop, with:

 # Profile Ready: please run [b2g|firefox] -profile /Users/willyaranda/projects/gaia-mozilla/profile
 $ /path/to/b2g-19/binary -profile /path/from/above/output

And that’s it. Enjoy your new gaia in your language!

April 30, 2013 03:02 PM

April 27, 2013

Guillermo López (willyaranda)

Dogfood your Geeksphone Keon or Peak

What a nice hardware for the first phones with FirefoxOS!

I just got a Geeksphone Keon (the little kid) and a Geeksphone Peak (the teenager) and there are some things I need to have on my phones to consider them real, like Twitter, Facebook, a Chat app and my Google Contacts.


If you like microblogging as I do, you need to install this app. Just go to the marketplace and install it. It lacks HTML5 offline cache to be really fast (it’s a really bad UX to open the app and wait for 10 seconds to start see something on the screen…), but it’s a first good step. Lacks desktop notifications…

Desktop notifications in HTML5

In JavaScript:

var not = navigator.mozNotification.createNotification('New mention from @willyaranda', 'This is shown on the Desktop!', null);;



Again, this is on the marketplace. It’s the mobile version of their portal, so the same as Works as expected. Lacks desktop notifications, that could be really useful (new message, new wall post…)


Chat app

We have some kind of apps for chatting, but nothing final. I am waiting for a Spanish company which will be releasing LOQUI, that will be the chat killer app for smartphones!

Google Contacts

If you have syncronization enabled on your Android and like Google services, you probably have all contacts in the cloud… And you can add them to your FirefoxOS phone!

So with this app, created by the awesome Francisco Jordano (@mepartocontigo) from O2, you will be able to download your Google Contacts and add (with picture!) to your FirefoxOS phone. What else do you need? :D

But the app is not yet on the Marketplace, but you can use it right now! What are the steps?

  1. Clone or download (and unzip) the project on Github
  2. Install the FirefoxOS Simulator on your Firefox from this page. Choose your correct platform!
  3. Once installed, go to Firefox -> Tools -> Web developer -> FirefoxOS Simulator
  4. On your FirefoxOS phone, go to Settings > Information > More information > Developer and enable “Remote debugging”
  5. Remote debugging

  6. Connect your phone, and the Simulator will show a “Device connected” at the left
  7. Click on the Push button.
  8. simulator

  9. The simulator will be open and the app installed. You can find on the homescreen. Open it. Follow the steps!
  10. importer

Protip: You can Control-R (Cmd+R on Mac) to refresh the app, that the simulator will package and push, and open again. So wonderful!

Happy dogfooding!

April 27, 2013 04:50 PM

April 25, 2013

Mark Surman (surman)

5 great things about Webmaker v2

A better picture of Webmaker v2 has snapped into focus over the past few weeks. The current plan builds on the ‘Webmaker as a popular way to make and learn on the web’ vision we set out in December. What’s clearer now is our focus on people who already take photos, blog and create online: we give them new ways to make, remix. and improve their craft. We also them access to mentors committed to helping others learn how the web works.


In this post, I wanted to pull out my top 5 list of things I’m excited about in Webmaker v2:

 1. Rebooting the brand to focus on makers of all ages

Cassie, Kate, Chris and others already working to reboot the Webmaker brand and UX to really emanate the maker spirit.


The idea is to appeal to teens and above, not kids. Also, to target people who already ‘make’ in some sense. You can see hints of this in their early mockups.

2. Building a gallery to show all the awesome makes

The biggest gap in Webmaker v1 was the lack of a gallery where you can see what people made. Fixing this is the top priority for Webmaker v2.


The site will lead with tiles of the best things people have made. More importantly, the site will be filled with all sorts of different galleries: makes that teach you how to make a similar thing; makes you made; makes on specific themes; makes that are actually curriculum materials.

3. Creating a Make API so anyone can make a gallery

In related news: we’ve started work on a ‘Make API’ that will let anyone pull a slice of Webmaker content to create their own gallery or service.


At the simplest level, this is a win as it gives us a common publishing model for both Thimble and Popcorn. But, in the long run, the Make API could be something more radical: it’s way for people to store, describe, slice, dice and share any blob of HTML from across the web. Ultimately, it could help people to take control of all the things they make online, no matter where they’ve made them.

 4. Deepening learning w/ challenges + badges

Webmaker v2 will include peer reviewed badges based where: 1) we describe a skill; 2) someone submits something they made that demonstrates that skill; 3) a peer or mentor reviews the submission and awards the badge (or not).


This is exciting because a) we can badge for skills defined in the Mozilla web literacy standard and b) people can submit ‘makes’ made with any tool (e.g. Scratch). This second piece is essential if we want to open things up widely on the making as learning front: people don’t just want to make things with Popcorn and Thimble.

5. Making it easy to make hacktivity kits using Thimble

For Webmaker to succeed, we need any mentor in our network to be able to write or remix Hackable Activity Kits.


Currently, that’s difficult as our learning materials are all hard coded web pages that need someone with commit privileges to check in. We’re going to change this by making it possible to create these kits directly in Thimble and then creating a special gallery for these pages. The result: a constantly updated community run gallery of learning materials.

These 5 things — and everything in the Webmaker v2 product vision — represent a big leap forward. When we started 2013, we had a fragmented offering with no single sign on, no gallery and no publishing model. We’re moving to a place where we not only have a unified offering but also something that is flexible in terms of how people publish and how they learn. New features and improvements will roll out weekly over the course of the summer, starting June 15. If you want to track progress on Webmaker v2, follow this scrumbug and this Tumblr blog.

Filed under: drumbeat, learning, mozilla, webmakers

April 25, 2013 01:52 PM

April 24, 2013

Benjamin Smedberg (bsmedberg)

Graph of the Day: Virtual and Physical Memory Starvation

Today’s graph is a scatter plot of out-of-memory crashes. It categorizes crashes according to the smallest block of available VM and the amount of available pagefile space.

There were roughly 1000 crashes due to bug 829954 between 10-April and 15-April 2013. Click on individual crash plots to see memory details and a link to the crash report.

Direct link to SVG file. Link to raw data.


After graphing these crashes, it seems clear that there are two distinct issues:

I was surprised to learn that not all of these crashes were caused by the VM leak.

The short-term solution for this issue remains the same: the Mozilla graphics engine should stop using the infallible/aborting allocator for graphics buffers. All large allocations (network and graphics buffers) should use the fallible allocator and take extra effort to be OOM-safe.

Long-term, we need Firefox to be aware of the OS memory situation and continue to work on memory-shrinking behavior when the system starts paging or running out of memory. This includes obvious behaviors like throwing away the in-memory network and graphics caches, but it may also require drastic measures such as throwing away the contents of inactive tabs and reloading them later.

Charting Technique

With this post, I am starting to use a different charting technique. Previously, I was using the Flot JS library to generate graphs. Flot makes pretty graphs (although it doesn’t support labeling Axes without a plugin!). It also features a wide range of plugins which add missing features. But often, it doesn’t do exactly what I want and I’ve had to dig deep into its guts to get things to look good. It is also cumbersome to include dynamically generated JS graphs in a blog post, and the prior graphs have been screenshots.

This time around, I generated the graph as an SVG image using the svgwrite python library. This allows me to put the full SVG graph directly into the blog, and it also allows me to dynamic features such as rollovers directly in these blog posts. Currently I’m setting up the axes and labels manually in python, but I expect that this will turn into a library pretty quickly. I experimented with svgplotlib but the installation requirements were too complex for my needs.

I’m not sure whether or not the embedded SVG will make it through feed aggregators./readers or not. Leave comments if you see weird results.

April 24, 2013 07:55 PM

April 22, 2013

Benjamin Smedberg (bsmedberg)

Graph of the Day: Empty Minidump Crashes Per User

Sometimes I make a graph to confirm a theory. Sometimes it doesn’t work. This is one of those days.

I created this graph in an attempt to analyze bug 837835. In that bug, we are investigating an increase in the number of crash reports we receive which have an empty (0-byte) minidump file. We’re pretty sure that this usually happens because of an out-of-memory condition (or an out of VM space condition).

Robert Kaiser reported in the bug that he suspected two date ranges of causing the number of empty dumps to increase. Those numbers were generated by counting crashes per build date. But they were very noisy, partly because they didn’t account for the differences in user population between nightly builds.

In this graph, I attempt to account for crashes per user. This was a slightly complicated task, because it assembles information from three separate inputs:

Unfortunately, each of these systems has their own way of representing build IDs, channel information, and operating systems:


Build ID





string “yyyymmddhhmmss”





integer yyyymmddhhmmss




“Firefox” (from product_versions)

integer yyyymmddhhmmss

“Nightly” when selecting from reports_clean.release_channel, but “nightly” when selecting from reports.release_channel.

“Windows NT”, but only when a valid minidump is found: when there is an empty minidump, os_name is actually “Unknown”

In this case, I’m only interested in the Windows data, and we can safely assuming that almost all of the empty minidump crashes occur on Windows. The script/SQL query to collect the data simply limits each data source separately and then combines them after they have been limited to windows nightly builds, users, and crashes.

Frequency of Empty Dump crashes on Windows Nightlies

This missing builds are the result of ftpscraper failure.

I’m not sure what to make of this data. It seems likely that we may have fixed part of the problem in the 2013-01-25-03-10-18 nightly. But I don’t see a distinct regression range within this time frame. Perhaps around 25-December? Of course, it could also be that the dataset is so noisy that we can’t draw any useful conclusions from it.

April 22, 2013 07:40 PM

Mark Surman (surman)

April Mozilla Board Slides: Q1 Review

At last week’s Mozilla Foundation board meeting, we looked at what we’ve done so far in 2013 and what we need to do next. Key messages from the discussion: We’re making good progress on Webmaker. We we shipped better Popcorn and Badges tools. We added a ‘teach’ section to We undertook experiments with new kinds of remixable content.

2013-04-21 20.54.38

However, we still need to roll all this into a Webmaker v2 that will excite and provide value to makers. Also, we need to recognize that we’re doing more than just Webmaker this year: Open Badges is growing even more rapidly than expected. I’ve posted slides from the board meeting here and summarized the content below

As a reminder: our overarching goal for 2013 is to turn Webmaker in a popular way to both make and learn on the web. We set these more specific goals:

While we didn’t explicitly make it a top level goal, it’s clear that ‘make Open Badges successful’ and ‘respond to the demand we’re seeing for badges’ have also become major priorities for 2013.

We’ve made a solid start on all these goals in Q1: building the foundations for Webmaker v2 and growing the Open badges project significantly. Some highlights re: things we shipped and balls we moved:

While this is solid progress, it’s important to recognize that we still need to roll all of this into a Webmaker v2 that will truly excite and provide value to makers, mentors and learners. Challenges we face include:

Another challenge is that the scope of our goals is shifting: Open Badges and Open News continue to grow as major initiatives above an beyond what we’re doing with Webmaker. We need to accept the fact that we’re still a multi product / project org and find a way to better support this growth.

The good news: we have a clear plans in place that aim directly at these challenges. The top three priorities as we move through Q2 are:

Work on all three of these priorities is well underway and we are making good progress. As we do this work, there a three questions we should be actively discussing:

We should all be keeping these things in mind as we build out Webmaker v2, Maker Party and Chicago Summer of Learning, especially the question: are we providing value?

It’s an interesting and intense time. Real traction on our big dreams is within sight: a Mozilla-backed movement where people champion creativity and making on the web; a new era of remixable, Legolike web content; a world of learning that works like the web. At the same time, we’re all heads down on the details of building tools, shipping web sites, making content, writing curriculum and recruiting partners. While it can be stressful, this its actually a very good, Mozilla-like place to be. Our hands are mucky shipping things while we are still aimed at and inspired by big dreams of making the web a better place.

Over the next few months, its going to be important to help each other keep this balance. Reach out to someone working on another part of the project to understand what they are working on. Pitch in as people test and irritate what they’re building. Offer advice to new community members as they show up for the first time (thats going to start happening slot). It may feel like we’re all working on different things: but everything we’re doing all points in the same direction of inspiring and empowering people using the web.

Filed under: drumbeat, mozilla, statusupdate, webmakers

April 22, 2013 01:12 PM

April 19, 2013

Benjamin Smedberg (bsmedberg)

Graph of the Day: Old Flash Versions and Blocklist Effectiveness

Today’s graph charts the percentage of Firefox users who have known-insecure versions of Flash. It also allows us to visually see the impact of various plugin blocks that have been staged over the past few months.

We are gradually rolling out blocks for more and more versions of Flash. In order to make sure that the blocklist was not causing significant user pain, we started out with the oldest versions of Flash that have the fewest users. We have since been expanding the block to include more recent versions of Flash that are still insecure. We hope to extend these blocks to all insecure versions of Flash in the next few months.

Flash Insecure Release Distribution

From the data, we see that users on very old versions of Flash (Flash 10.2 and earlier) are not changing their behavior because of the blocklist. This either means that the users never see Flash content, or that they always click through the warning. It is also possible that they attempted to upgrade but for some reason are unable.

Users with slightly newer versions seem more likely to upgrade. Over about a month, almost half of the users who had insecure versions of Flash 10.3-11.2 have upgraded.

Finally, it is interesting that these percentages drop down on the weekends. This indicates that work or school computers are more likely to have insecure versions of Flash than home computers. Because there are well-known exploits for all of these Flash versions, this represents a significant risk to organizations who are not keeping up with security updates!

View the chart in HTML version and the raw data. This data was brought to you by Telemetry, and so the standard cautions apply: telemetry is an opt-in sample on the beta/release channels, and may under-represent certain populations, especially enterprise deployments which may lock telemetry off by default. This data represents Windows users only, because we just recently started collecting Flash version information on Mac, and the Linux Flash player doesn’t expose its version at all.

Raw aggregates for Flash usage can be found in my dated directories on, for example yesterday’s aggregate counts. You are welcome to scrape this data if you want to play with it; I am also willing to provide interested researchers with additional data dumps on request.

April 19, 2013 03:04 PM

April 15, 2013

Benjamin Smedberg (bsmedberg)

Chart of the Day: Firefox Nightly Update Adoption Curves

In general, people who are running the Firefox Nightly and Aurora channel are offered a new build every day. But users don’t update immediately, because Firefox does not interrupt you with an update prompt upon receiving an update. Instead it waits and applies the update at the next Firefox restart, or prompts the user to update only after significant idle time.

This means that there is a noticeable “delay” between a nightly build and when people start reporting bugs or crashes against the build. It also means that the number of users using any particular nightly build can vary widely. The following charts demonstrate this variability and the update adoption curves:

Per-build usage and adoption curves, Firefox nightly builds on Windows, 1-March to 14-April 2013
Overlapped adoption curves, 1-March to 14-April 2013

Because of this variability, engineers and QA should use care when using data from nightly builds. Note the following conclusions and recommendations:

This data was collected from ADU data provided by metrics and mirrored in the crash-stats database. The script used to collect this data is available in socorro-toolbox.

April 15, 2013 07:09 PM

April 12, 2013

Benjamin Smedberg (bsmedberg)

Graph of the Day: Crash Report Metadata

When Firefox crashes, it submits a minidump of the crash; it also submits some key/value metadata. The metadata includes basic information about the product/version/buildid, but it also includes a bunch of other information that we use to group, correlate, and diagnose crashes. Using data collected with jydoop, I created a graph of how frequently various metadata keys were submitted on various platforms:

Frequency of Crash Report Metadata on 2-April-2013

The jydoop script to collect this data is simple:

import crashstatsutils
import json
import jydoop

setupjob = crashstatsutils.dosetupjob([('meta_data', 'json'), ('processed_data', 'json')])

def map(k, meta_data, processed_data, context):
    if processed_data is None:

        meta = json.loads(meta_data)
        processed = json.loads(processed_data)
        context.write('jsonerror', 1)

    product = meta['ProductName']

    os = processed.get('os_name', None)
    if os is None:

    context.write((product, os, '$total'), 1)
    for key in meta:
        context.write((product, os, key), 1)

combine = jydoop.sumreducer
reduce = jydoop.sumreducer

View the raw data (CSV).

If you are interested in learning what each piece of metadata means and is used for, I’ve started to document them on the Mozilla docs site.

This graph was generated using the flot JS library and the code is available on github.

April 12, 2013 05:31 PM

April 11, 2013

Benjamin Smedberg (bsmedberg)

Graph of the Day: Firefox Virtual Memory Plot

I spend a lot of time making sense out of data, and so I’m going to try a new “Graph of the Day” series on this blog.

Today’s plot was created from a crash report submitted by a Firefox user and filed in bugzilla. This user had been experiencing problems where Firefox would, after some time, start drawing black boxes instead of normal content and soon after would crash. Most of the time, his crash report would contain an empty (0-byte) minidump file. In our experience, 0-byte minidumps are usually caused by low-memory conditions causing crashes. But the statistics metadata reported along with the crash show that there was lots of available memory on the system.

This piqued my interest, and fortunately, at least one of the crash reports did contain a valid minidump. Not only did this point us to a Firefox bug where we are aborting when large allocations fail, but it also gave me information about the virtual memory space of the process when it crashed.

When creating a Windows minidump, Firefox calls the MinidumpWriteDump function with the MiniDumpWithFullMemoryInfo flag. This causes the minidump to contain a MINIDUMP_MEMORY_INFO_LIST block, which includes information about every single block of memory pages in the process, the allocation base/size, the free/reserved/committed state, whether the page is private (allocated) memory or some kind of mapped/shared memory, and whether the page is readable/writable/copy-on-write/executable.

(view the plot in a new window).

There are two interesting things that I learned while creating this plot and sharing it on

Virtual Memory Fragmentation

Some code is fragmenting the page space with one-page allocations. On Windows, a page is a 4k block, but page space is not allocated in one-page chunks. Instead, the minimum allocation block is 16 pages (64k). So if any code is calling VirtualAlloc with just 4k, it is wasting 16 pages of memory space. Note that this doesn’t waste memory, it only wastes VM space, so it won’t show up on any traditional metrics such as “private bytes”.

Leaking Memory Mappings

Something is leaking memory mappings. Looking at the high end of memory space (bottom of the graphical plot), hover over the large blocks of purple (committed) memory and note that there are many allocations that are roughly identical:

Given the other memory statistics from the crash report, it appears that these blocks are actually all mapping the same file or piece of shared memory. And it seems likely that there is a bug somewhere in code which is mapping the same memory repeatedly (with MapViewOfFile) and forgetting to call UnmapViewOfFile when it is done.


We’re still working on diagnosing this problem. The user who originally reported this issue notes that if he switches his laptop to use his integrated graphics card instead of his nvidia graphics, then the problem disappears. So we suspect something in the graphics subsystem, but we’re not sure whether the problem is in Firefox accelerated drawing code, the Windows D3D libraries, or in the nvidia driver itself. We are looking at the possibility of hooking allocations functions such as VirtualAlloc and MapViewOfFile in order to find the call stack at the point of allocation to help determine exactly what code is responsible. If you have any tips or want to follow along, see bug 859955.

April 11, 2013 04:43 PM

April 10, 2013

Joshua Cranmer (jcranmer)


When running final tests for my latest patch queue, I discovered that someone has apparently added a new color to the repertoire: a hot pink. So I now present to you a snapshot of TBPL that uses all the colors except gray (running), gray (pending), and black (I've never seen this one):

April 10, 2013 09:00 PM

April 07, 2013

Joshua Cranmer (jcranmer)

JSMime status update

This post is admittedly long overdue, but I kept wanting to delay this post until I actually had patches up for review. But I have the tendency to want to post nothing until I verify that the entire pipeline consisting of over 5,000 lines of changes is fully correct and documented. However, I'm now confident in all but roughly three of my major changes, so patch documentation and redistribution is (hopefully) all that remains before I start saturating all of the reviewers in Thunderbird with this code. An ironic thing to note is that these changes are actually largely a sidetrack from my original goal: I wanted to land my charset-conversion patch, but I first thought it would be helpful to test with nsIMimeConverter using the new method, which required me to implement header parsing, which is best tested with nsIMsgHeaderParser, which turns out to have needed very major changes.

As you might have gathered, I am getting ready to land a major set of changes. This set of changes is being tracked in bugs 790855, 842632, and 858337. These patches are implementing structured header parsing and emission, as well as RFC 2047 decoding and encoding. My goal still remains to land all of these changes by Thunderbird 24, reviewers permitting.

The first part of JSMime landed back in Thunderbird 21, so anyone using the betas is already using part of it. One of the small auxiliary interfaces (nsIMimeHeaders) was switched over to the JS implementation instead of libmime's implementation, as well as the ad-hoc ones used in our test suites. The currently pending changes would use JSMime for the other auxiliary interfaces, nsIMimeConverter (which does RFC 2047 processing) and nsIMsgHeaderParser (which does structured processing of the addressing headers). The changes to the latter are very much API-breaking, requiring me to audit and fix every single callsite in all of comm-central. On the plus side, thanks to my changes, I know I will incidentally be fixing several bugs such as quoting issues in the compose windows, a valgrind error in nsSmtpProtocol.cpp, or the space-in-2047-encoded-words issue.

It's not all the changes, although being able to outright remove 2000 lines of libmime is certainly a welcome change. The brunt of libmime remains the code that is related to the processing of email body parts into the final email display method, which is the next target of my patches and which I originally intended to fix before I got sidetracked. Getting sidetracked isn't altogether a bad thing, since, for the first time, it lets me identify things that can be done in parallel with this work.

A useful change I've identified that is even more invasive than everything else to date would be to alter our view of how message headers work. Right now, we tend to retrieve headers (from, say, nsIMsgDBHdr) as strings, where the consumer will use a standard API to reparse them before acting on their contents. A saner solution is to move the structured parsing into the retrieval APIs, by making an msgIStructuredHeaders interface, retrievable from nsIMsgDBHdr and nsIMsgCompFields from which you can manipulate headers in their structured form instead of their string from. It's even more useful on nsIMsgCompFields, where keeping things in structured form as long as possible is desirable (I particularly want to kill nsIMsgCompFields.splitRecipients as an API).

Another useful change is that our underlying parsing code can properly handle groups, which means we can start using groups to handle mailing lists in our code instead of…the mess we have now. The current implementation sticks mailing lists as individual addresses to be expanded by code in the middle of the compose sequence, which is fragile and suboptimal.

The last useful independent change I can think of is rewriting the addressing widget in the compose frontend to store things internally in a structured form instead of the MIME header kludge it currently uses; this kind of work could also be shared with the similar annoying mailing list editing UI.

As for myself, I will be working on the body part conversion process. I haven't yet finalized the API that extensions will get to use here, as I need to do a lot of playing around with the current implementation to see how flexible it is. The last two entry points into libmime, the stream converter and Gloda, will first be controlled by preference, so that I can land partially-working versions before I land everything that is necessary. My goal is to land a functionality-complete implementation by Thunderbird 31 (i.e., the next ESR branch after 24), so that I can remove the old implementation in Thunderbird 32, but that timescale may still be too aggressive.

April 07, 2013 09:47 PM

April 04, 2013

Mark Finkle (mfinkle)

Firefox for Android: Subscribing to Feeds

Firefox on desktop has nice support for previewing and subscribing to syndication feeds (RSS and Atom). There is even support for creating Live Bookmarks. Firefox for Android does not support anything related to syndication feeds… until now.

We just landed basic support for subscribing to feeds discovered on a web page. This is only initial support, so many things are not supported.

How it works:

The limitations:

In pictures:

Long-tap on the URLBar

Choose “Subscribe to Page”

Choose the feed

Choose the web service

April 04, 2013 06:56 PM

Following the Firefox for Android Team

On The Web

If you want to stay up to date on any new developments in Firefox for Android (Fennec), check out the new Tumblr and Twitter stream. Lots of updates on what’s landing in Fennec Nightly, tips and tricks and summaries of the Mobile Engineering Team meetings.

Mailing List

We have a new mailing list at The newsgroup at is being closed. Use the new mailing list to following along and give feedback on topics being discussed, or post your own ideas.


As always, you can jump on Mozilla IRC and talk directly to the Mobile team in the #mobile channel. This is great if you want to start working on the code, need help writing an add-on, want to help investigate a bug or just rant about some design decision.

April 04, 2013 01:17 PM

March 07, 2013

Benjamin Smedberg (bsmedberg)

Do You Love Your Debugger?

Short version: I’m hiring, apply here!

Do you love your debugger? Are you interested in a job making a better Firefox for millions of people? Making Firefox a rock-solid software program involves being a dedicated puzzle-solver. Figuring out what’s causing a crash, hang, or intermittent performance problem is a combination of data gathering, statistics, debugging, code reading, and proactive coding and debugging features.

Familiarity or even love of debugging is an important part of the job. The set of information in a crash minidump is limited, and frequently the stack trace is garbled. Understanding calling conventions and memory layout of compiled code is an important skill! Being able to read disassembly is essential. Familiarity with debugging across multiple platforms (Windows/Mac/Linux/Android) is going to be important.

The Mozilla crash and telemetry data is a treasure trove of information just waiting for analysis. The servers have some important builtin features for classification and querying, but the stability team frequently needs to slice the data in new ways. Familiarity with data processing using python, SQL, hbase, and elasticsearch is helpful.

Many of our stability issues are caused by third-party software (more than half of our crashes, according to recent statistics). This position will undoubtedly involve working with partners and engineers from other projects and companies to diagnose and fix problems.

Finally, we are always looking for ways to solve entire classes of stability issues. For example, we moved browser plugins into their own process so that even if a plugin crashes, it doesn’t affect the entire browser. We are continuing to reduce the impact of plugin and extension issues by having users opt into these addons. To be effective at these types of projects, an engineer will need to be a productive coder who is comfortable learning new pieces of code quickly, as well as writing in C++ as well as JavaScript.

If you’re interested, apply online, and feel free to contact me directly if you’ve got questions!

March 07, 2013 06:26 PM

February 16, 2013

Joshua Cranmer (jcranmer)

Why software monocultures are bad

If you do anything with web development, you are probably well aware that Opera recently announced that it was ditching its Presto layout engine and switching to Webkit. The reception of the blogosphere to this announcement has been decidedly mixed, but I am disheartened by the news for a very simple reason. The loss of one of the largest competitors in the mobile market risks entrenching a monoculture in web browsing.

Now, many people have attempted to argue against this risk by one of three arguments: that Webkit already is a monoculture on mobile browsing; that Webkit is open-source, so it can't be a "bad" monoculture; or that Webkit won't stagnant, so it can't be a "bad" monoculture. The first argument is rather specious, since it presumes that once a monoculture exists it is pointless to try to end it—walk through history, it's easier to cite examples that were broken than ones that weren't. The other two arguments are more dangerous, though, because they presume that a monoculture is bad only because of who is in charge of it, not because it is a monoculture.

The real reason why monocultures are bad are not because the people in control do bad things with it. It's because their implementations—particularly including their bugs—becomes the standards instead of the specifications themselves. And to be able to try to crack into that market, you have to be able to reverse engineer bugs. Reverse engineering bugs, even in open-source code, is far from trivial. Perhaps it's clearer to look at the problems of monoculture by case study.

In the web world, the most well-known monoculture is that of IE 6, which persisted as the sole major browser for about 4 years. One long-lasting ramification of IE is the necessity of all layout engines to support a document.all construct while pretending that they do not actually support it. This is a clear negative feature of monocultures: new things that get implemented become mandatory specifications, independent of the actual quality of their implementation. Now, some fanboys might proclaim that everything Microsoft does is inherently evil and that this is a bad example, but I will point out later known bad-behaviors of Webkit later.

What about open source monocultures? Perhaps the best example here is GCC, which was effectively the only important C compiler for Linux until about two years ago, when clang become self-hosting. This is probably the closest example I have to a mooted Webkit monoculture: a program that no one wants to write from scratch and that is maintained by necessity by a diverse group of contributors. So surely there are no aftereffects from compatibility problems for Clang, right? Well, to be able to compile code on Linux, Clang has to pretty much mimic GCC, down to command-line compatibility and implementing (almost) all of GCC's extensions to C. This also implies that you have to match various compiler intrinsics (such as those for atomic operations) exactly as GCC does: when GCC first implemented proper atomic operations for C++11, Clang was forced to change its syntax for intrinsic atomic operations to match as well.

The problem of implementations becoming the de facto standard becomes brutally clear when a radically different implementation is necessary and backwards compatibility cannot be sacrificed. IE 6's hasLayout bug is a good example here: Microsoft thought it easier to bundle an old version of the layout engine in their newest web browser to support old-compatibility webpages than to try to adaptively support it. It is much easier to justify sacking backwards compatibility in a vibrant polyculture: if a website works in only one layout engine when there are four major ones, then it is a good sign that the website is broken and needs to be fixed.

All of these may seem academic, theoretical objections, but I will point out that Webkit has already shown troubling signs that do not portend to it being a "good" monoculture. The decision to never retire old Webkit prefixes is nothing short of an arrogant practice, and clearly shows that backwards compatibility (even for nominally non-production features) will be difficult to sacrifice. The feature that became CSS gradients underwent cosmetic changes that made things easier for authors that wouldn't have happened in a monoculture layout engine world. Chrome explicitly went against the CSS specification (although they are trying to change it) in one feature with the rationale that is necessary for better scrolling performance in Webkit's architecture—which neither Gecko nor Presto seem to require. So a Webkit monoculture is not a good thing.

February 16, 2013 04:39 AM

February 15, 2013

Joshua Cranmer (jcranmer)

Updated DXR on

If you are a regular user of DXR, you may have noticed that the website today looks rather different from what you are used to. This is because it has finally been updated to a version dating back to mid-January, which means it also includes many of the changes developed over the course of the last summer, previously visible only on the development snapshot (which didn't update mozilla-central versions). In the coming months, I also plan to expand the repertoire of indexed repositories from one to two by including comm-central. Other changes that are currently being developed include a documentation output feature for DXR as well as an indexer that will grok our JS half of the codebase.

February 15, 2013 06:37 PM

February 13, 2013

Mark Surman (surman)

Tinkering together

‘Making is learning’ is a big theme for Mozilla this year. It’s at the heart of Mozilla Webmaker. More importantly, it’s the north star idea guiding the grassroots mentor community we’re building around the world. We want millions more people to get their hands dirty with the web. And we expect they’ll learn something as they do.


I realized today that we need to add two concepts into this theme: tinkering and social. This thought came from a good discussion on the Webmaker mailing list that starts with the question ‘is making learning?’ Rafi Santo both asked and began to answer this question:

The short answer: yes, but it’s complicated. The longer answer is that the best maker-driven learning is never just about the making. It’s about all the things that happen around the making. That initial spark of curiosity, the investigation and early tinkering, the planning and research that follow, the inspirations and appropriations from other projects, the prototypes, the failures, the feedback, and, perhaps most importantly, the iterations upon iterations towards a better make.

He then went on to say:

I’m willing to say that someone is always learning something when they’re making, but they learn best when it entails the sort of process, community and well configured structures of participation.

In part, the discussion around Rafi’s post is a debate about tag lines. Should we rally people under a ‘making is learning’ banner? Or should we be more subtle like ‘making as learning’ or ‘make to learn’? We’ll probably do the later.

However, there are also two important substantive points worth pulling out from the conversation: a) it’s the process of making that drives learning and b) the best learning happens when the making is social. Both of these points are critical to the success of Webmaker.

The process point may be obvious. It’s not just what I made, it’s the journey of the making. But it’s worth calling it out explicitly. Mozilla Rep Emma Irwin writes in response to Rafi’s post:

This spoke to my own learning in programming. I think I learned (and got confidence) more from debugging and being stuck than simply making. The sense of accomplishment of overcoming things that seemed really hard at first have motivated me more than anything. I think those experiences are why I am crazy enough to think I can ‘teach’ now.

Designing tinkering and iteration into Webmaker is critical. A first step is creating content built from the ground up for remix. And, then to support that with tools that let you tinker and play with that content, and share it again with your friends. The idea is to use remix as an onramp to tinkering with the web.

You see an early example in Jacob’s awesome Valentine’s video project on The thing about this video: it is designed to be forked. It wants you to add your own photos and change the text. It’s an invitation to tinker. It’s an early invitation, to be sure: we clearly have a lot to learn about how to do this well. But it’s clear to me that this kind of design for tinkering is ‘thing #1′ of key things Webmaker needs to pull in from this conversation.

Rafi’s other big point is about social: we learn best when we make together. Making together can mean a lot of things. At events. In school. With friends at home. In IRC. On Facebook. Etc. What all of these things have in common is that I can see what you are making and you can see me. We can critique each other. We can help each other. We can fail together. We can iterate together. And we can laugh together. Which makes learning funner, faster and deeper.

Making it easy to ‘make things together’ is ‘thing #2′ that Webmaker should pull from this conversation. Making it easy to riff on content on and in places like Facebook will be a part of this. But, as Rafi hints in his post, the most important factor here won’t be tools and web sites: it will be people. This is why the building a global mentor community is such a huge priority. Everyone needs a place where they can just show up to make and learn. A place filled with people. And a place you can find in 100s of cities around the world. Building on Hive and ReMo, I think Mozilla can create this place. It’s what we want our mentor community to be.

Anyways: thanks Rafi, Emma and others for getting this conversation started. It’s the kind of leadership this nascent Webmaker community needs. And it’s a great way to dig into what do we really want to build together with Webmaker.

Filed under: drumbeat, education, hive, learning, mozilla, webmakers

February 13, 2013 06:24 PM

February 03, 2013

Guillermo López (willyaranda)

Ellos son unos hijos de puta y nosotros unos gilipollas

Dejémoslo claro. Somos gilipollas. Y ellos unos hijos de puta. Una cosa no quita la otra, y la otra no quita a la una.

Llevamos años, ¡años! viendo la que se viene encima, sin hacer nada. Bueno, ha habido movimientos de gente que se unió para hacer algo más: protestar y expresar que lo que estaba pasando no era correcto. Los llamaban antisistema, perroflautas y demás lindezas. Por la parte que me toca, gracias, yo también os quiero.

Rajoy y Rubalcaba

Pero por la otra parte, seguimos siendo gilipollas. Seguimos viendo cómo día a día nos están robando, juegan con nuestras vidas como si de una partida de oca se tratara. Piden austeridad mientras cobran miles de euros en sobres, mientras llevan 20 años cobrando en negro, de forma sistemática, sin nada. Lo vemos, y lo único que decimos es: “uy, qué cabrones, con la que está cayendo…”. Y ya. Nos quedamos en casita o en el bar comentándolo, sin llegar a salir a la calle, sin protestar, sin que ellos noten que tienen una oposición social. Estamos muertos.

Celebración Mundial

Pero ellos son unos hijos de puta. Simple y llanamente unos hijos de puta. Están chupando todo lo que pueden de la sociedad hasta límites insospechados. Límites que se superan cada semana con una acusación más de corrupción, con un nuevo decretazo explicado sólo en las entrañas del legal BOE, con políticos que si ellos antes dijeron A, ahora es B, y los otros que antes dijeron B ahora es A, porque tienen “mayoría”, con “ruedas de prensa” realizadas por una televisión en frente a una sala llena de periodistas, la gran llamada “transparencia”…

Límites que nunca pensabas que se superarían. Que alguno tendría la ética de decir: “no, señores, hasta aquí hemos llegado”. Pero sólo les mueve su afán de codicia, de tener cada vez más, de vivir en una realidad diferente, una realidad que no se entera que el 99% de la gente está peor que el año pasado y que el anterior. Gente que no tiene el lujo de ponerse enferma porque temen que les echen de su ya precario trabajo. Gente que ve como sus hijos no pueden estudiar con garantías de calidad porque los profesores “sobran”. Gente que ve como sus hijos se tienen que marchar a otros países por “espíritu aventurero”, en vez de pensar por qué el 55% de los jóvenes menores de 30 años no tienen trabajo (y los que lo tienen a saber de qué forma).

Y damos asco. Nosotros y ellos. Porque sí, hay una gran diferencia, ellos son unos listos y nosotros unos tontos. Ellos saben cómo robar, y nosotros no. Ellos saben (bueno, pueden) hacer leyes para que ¡oh sorpresa! sus robos sean legales, y nosotros no. Ellos tienen una protección gigantesca de miles y miles de policías y nosotros… bueno, “la policía” “nos protege”. Ya, claro.

La sociedad está definitivamente enferma. Ellos no saben aceptar sus errores, dimitir y mejorar, sino sólo seguir y seguir chupando, seguir y seguir robando y seguir y seguir recortando derechos. Y nosotros no sabemos dar un puñetazo en la mesa y decir que todo lo sostenemos nosotros, y que sin nosotros, todo se va a la mierda.

Somos gilipollas, y ellos unos hijos de puta. Tenemos lo que merecemos.

February 03, 2013 12:15 PM

January 26, 2013

Siddharth Agarwal (sid0)

New version of ntfsutils

I’ve released a new version of the ntfsutils library for Python.

The small spin-off project from some work I did at Mozilla became far more popular than I expected, netting over 600 downloads last year. Pretty neat!

January 26, 2013 06:30 AM

January 23, 2013

Blake Winton (bwinton)

Saving and restoring your tabs when resetting your Firefox profile.

A while ago, I noticed that my Firefox was starting to run slower and slower. One of the common suggestions to fix that kind of problem is to reset your profile, and while I think that’s a good idea the biggest problem for me was that resetting my profile would not save my open tabs. Since I have 75 of them, trying to remember them all would be a bit of a problem. (Some people would suggest that having 75 open tabs is also a problem, but fie to them I say! Sure, I’m a little bit above the average, but I’ve hit a flow that works for me. :)

Fortunately, I’m a coder at heart, so I did what any coder would do, and wrote some code to save and restore them for me. A few days later, one of my co-workers wanted to do the same thing, so I ran similar code on his computer, and since at least the two of us found it useful, I thought it might be a good idea to post the code here so that other people might benefit from it, too.

First of all, we need a little bit of setup to prepare your computer to save the tabs.

  1. Go to Tools » Web Developer » Scratchpad to open up the Scratchpad. (This is where we’ll be doing most of our work.)

  2. Click Environment » Browser to make sure the Scratchpad is running in the browser’s context, so that it can access the list of tabs.

  3. Paste the following code into the Scratchpad:

    var x = gBrowser.tabs;
    var rv = 'var tabs = [\n';
    for (var i = 0; i < x.length; i++) {
      rv += '  "' + x[i].linkedBrowser.contentWindow.location + '",\n';
    rv += '];\n';
    rv += 'for (var i = 0; i < tabs.length; i++ ) {\n';
    rv += ' gBrowser.addTab(tabs[i]);\n';
    rv += '}\n\n';
  4. Click Execute » Display This will output stuff in comments (“/*…*/”) that looks like this:

    var tabs = [
      /* … */
    for (var i = 0; i < tabs.length; i++ ) {
  5. (This is the big step!) Save that code to an external file, and run the profile reset.

  6. Then, in your new profile, open the Scratchpad, and set the Environment to Browser again just like above, and paste in the code you saved.

  7. Finally, hit Execute » Run, and all your tabs should be restored! (If you’ve split them up into tab groups, the code I provided will save all your tabs, but will restore them all into one group, so you’ll need to re-group them after restoring. I thought about fixing the code to save a list of groups, but I didn’t really need to, so never got around to it. ;)

I hope this helps some of you, and please feel free to leave a comment letting me know whether you found it useful or not, or if you come up with any additions or changes to make it work better!

January 23, 2013 05:18 PM