Illustration of rows of social media profile pages

Eric Baumer: A Human Approach to Algorithm Design

Baumer’s work ranges from humanizing algorithmic designs to the socioeconomic inequities of Facebook and online data privacy.

Story by Stephen Gross

Illustrations by Adam Ferriss

Have you checked your Facebook, Twitter or Instagram account recently? Maybe you’ve used Google Maps to get you to your current destination. Speaking of Google, you’ve probably used a search engine in the last 24 hours. How about shopping at Target?

All of those tasks, somewhere along the way, require an algorithm—a set of rules to be followed in problem-solving operations.

Social media uses algorithms to sort and order posts on users’ timelines, select advertisements and help users find people they may know. Google Maps uses algorithms to find the shortest route between two locations. And Target feeds a customer’s purchase history into algorithms to decide which coupons to send: deals on baby items, for instance, if their recent purchases look like those of someone expecting a baby.

The inferences these computer algorithms make using data such as gender, age and political affiliation can be frightening, and they raise a plethora of privacy concerns.

Eric Baumer, the Frank Hook Assistant Professor of Computer Science and Engineering, is studying it all, from how people interact with these systems, to how nonprofits can better use algorithms, to the privacy concerns that accompany the technology. He has received a number of National Science Foundation (NSF) grants for his work on human interactions with algorithms, including an NSF Faculty Early Career Development (CAREER) award for his proposal to develop participatory methods for human-centered design of algorithmic systems.

Illustration of various social media pages

Baumer’s research into algorithms coincided with an interest in politics, which began when he was a graduate student in the mid-2000s, a time when blogging became popular. Many researchers studied bloggers and analyzed blog content, Baumer says, but he wondered if anyone was reading the blogs. He completed a “qualitative open-ended exploration of people reading blogs.” Through that, he discovered the sub-ecosphere of political blogs.

That led Baumer to complete studies on how people read and engage with political blogs, and then, for his dissertation, he studied computational techniques for attempting to identify conceptual metaphors in written text. Once Baumer started working as a postdoctoral researcher at Cornell University, he says, he became interested in the broader concept of framing political issues. Funded in part by an NSF grant, his work—some of which he has continued at Lehigh—examined computational support for frame reflection.

Baumer refers to the approach he’s taking at Lehigh as human-centered algorithm design. His work specifically centers on getting people whose expertise lies outside of computing involved in the design of algorithmic systems. In addition to the people who are using the systems, Baumer wants to involve in the process the individuals whose data is being analyzed.

Baumer says he already had developed many computational techniques and interactive visualizations heading into the project, but he doesn’t want to assume those are the tools necessary for the job.

“I don’t want to go in looking for nails because I have a hammer,” Baumer says.

For example, he is currently working with a Ph.D. student on conducting interviews at a journalism nonprofit and a legal nonprofit to better understand their existing practices of data analysis. Baumer says the journalism nonprofit’s approach is very tech-savvy, but the legal nonprofit isn’t as well-versed in the technology. Instead, the legal nonprofit is working more on analyzing legislation across different jurisdictions and trying to understand differences and similarities.

“What we’re looking at is trying to figure out what is it that they’re doing right now and what are the design opportunities,” Baumer says. “Rather than going in wielding the computational hammer and saying, ‘We have this tool, wouldn’t it be awesome if you used it?’”

The main goal, says Baumer, is to create relevant and impactful systems that people are going to actually use. He envisions the tools built as a result of his research impacting and influencing the readers of the journalism nonprofit, as well as the legal experts consulting the reports generated by the legal nonprofit.

“In terms of both, the contribution would be: Here are methods that people can use for doing this kind of work, and then here are concrete tools, artifacts and things in the world that have an impact on people’s lives,” Baumer says.

The Power of Words

Technically, Baumer’s work might be classified as artificial intelligence, but he doesn’t like that label because, he says, those within the field of computing would not consider what he’s doing to be artificial intelligence.

“Because of the techniques that I’m using, it’s much more natural language processing, which is machine learning techniques applied to textual data,” Baumer says. “AI is something much more around goal reasoning and planning.”

For this project he and his team have been developing a sketching exercise in order to get people to draw the current process they use for data analysis. They’re planning card-sorting exercises and others based on fictional vignettes, which will give the team a better understanding of the two nonprofits and their current practices, as well as future visions, in analyzing and using the data they have.

“It’s very much drawing on this tradition from participatory design and applying it to algorithmic systems,” Baumer says.

Finally, they’ll use computational constraints and design sketches to convey ideas from the card-sorting and other exercises.

Although Baumer has many ideas, he says it’s too early to commit to designing any specific tools until he has a better understanding of what both nonprofits do and what would be most impactful for each of them.

The tools could be built to assist the nonprofits themselves, such as aiding the journalism nonprofit in writing a story, but they could also grow out of a collaboration between Baumer and the nonprofit that lets readers use and digest the data in the way they want. One example of something that could be created is a project Baumer worked on previously at Cornell demonstrating the framing of political issues.

Baumer and his students developed a tool that analyzed approximately 40 to 50 different political blogs, in addition to about 10 major media outlets, creating a version of a word cloud. Users could open the tool and select a word to see how it was being described across the blogs and media outlets at the time. For instance, Syria in January 2013 was associated with words such as conflict, seas, kill, report, intervention, fight, battle, war and violence, among others. Words were also color-coded by outlet, so readers could see which outlets were using certain terms.
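
To give a sense of how such a word-association analysis might work, here is a minimal sketch in Python. It is illustrative only, not Baumer’s actual tool: the outlet names, sample sentences, stopword list and co-occurrence window are placeholder assumptions, and the real system ran on scraped blog and news content.

```python
from collections import Counter, defaultdict
import re

# Hypothetical input: (outlet, article text) pairs. The names and snippets
# here are placeholders, not the blogs or outlets Baumer actually analyzed.
documents = [
    ("outlet_a", "Reports of conflict and violence in Syria continue."),
    ("outlet_b", "Calls for intervention grow as the war in Syria escalates."),
]

STOPWORDS = {"the", "a", "an", "and", "of", "in", "as", "to", "for"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def cooccurring_words(query, docs, window=5):
    """Count words appearing within `window` tokens of the query term,
    grouped by outlet so each outlet's framing can be compared."""
    counts = defaultdict(Counter)
    for outlet, text in docs:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == query:
                neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                counts[outlet].update(w for w in neighbors if w not in STOPWORDS)
    return counts

for outlet, counter in cooccurring_words("syria", documents).items():
    print(outlet, counter.most_common(5))
```

Grouping the counts by outlet is what makes the color-coding Baumer describes possible, since each outlet’s most common neighbors for a term can be compared side by side.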

Every Friday, while the tool was live, Baumer ran a script to pull new content and run the analysis. On Monday, he posted the new analysis.

From there, Baumer says, they were able to use that data to compare different issues such as how the topic of contraception is discussed differently in coverage about abortion compared to coverage about health care. For health care, he says, some of the most common words were take, provide, get, pay, cover, afford and access. When looking at abortion, the words were ban, oppose, allow, support, legal, deny. For health care, Baumer says, the conversation was basically “does my health insurance cover contraception?” whereas in the context of abortion, it was “should it be allowed at all?”

“I think this is really interesting in terms of showing how these subtle linguistic differences frame the same issue in different ways across different contexts,” Baumer says.

In another timely example, with the 2020 U.S. presidential election quickly approaching, Baumer says he could envision building a tool that would create these word clouds with the Congressional Record or campaign speeches. Baumer previously created a tool that did just that with campaign speeches of the late Senator John McCain and former President Barack Obama when they ran against each other in the 2008 presidential election. Baumer was able to analyze their differences, as well as how they spoke about the same issue.

To take that a step further, Baumer wonders if you could use that approach to predict a Congress member’s campaign contribution profile, such as the different industries from which contributions originate, by the way he or she talks.

“You could segment it and say people whose campaign contributions come from this source, what are the ways that they are talking about a particular issue?” Baumer says. “And how does that differ from people whose campaign contributions come from a different source? That would be one way you could think about splitting it up.”

Blurry images of faces

Examining Facebook

In a separate NSF-funded project that Baumer began at Cornell and finished at Lehigh, he used Facebook to examine correspondences between linguistic patterns and non-linguistic attributes. His focus was to learn more about the use and non-use of Facebook.

“Clearly Facebook isn’t going to tell you” who has stopped using the site, Baumer says.

The study included individuals who pledged to leave Facebook and remain off the site for 99 days. The campaign, called “99 Days of Freedom,” was run by a creative agency in the Netherlands, which collected more than 5,000 survey responses from participants after 33, 66 and 99 days.

A main point Baumer raises is that the distinctions between use and non-use are not binary.

“There are lots of things in between and it’s a really messy middle,” he says.

Not everyone remained off the site for the full 99 days, but data was collected at each of the three checkpoints. Baumer and his co-authors used a computational technique called topic modeling to analyze the answers to the open-ended survey questions, including: How did you feel being off of Facebook? What was the best thing? What was the worst thing? How did your friends react? What do you miss the most?

The technique analyzed the relationship between the topics respondents discussed and whether or not they returned to the site before day 99—one-fifth went back to the social networking site earlier than they had intended.
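
As an illustration of the general approach, topic modeling paired with a simple predictive model, here is a minimal sketch in Python using scikit-learn. It is not the study’s actual pipeline: the example responses, the number of topics and the use of logistic regression are assumptions made for the sketch.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

# Placeholder data: one free-text survey answer per respondent, and a label
# indicating whether that respondent went back to Facebook before day 99.
responses = [
    "I missed photos of family and friends and a birthday announcement",
    "The first few days I kept checking for it every few minutes",
    "Being off the site felt like withdrawal the first few days",
    "I did not miss it at all and spent more time with friends",
]
returned_early = [0, 1, 1, 0]

# Topic modeling: represent each response as a mixture of latent topics.
vectorizer = CountVectorizer(stop_words="english")
word_counts = vectorizer.fit_transform(responses)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_proportions = lda.fit_transform(word_counts)

# Relate the topics present in a response to early return.
model = LogisticRegression().fit(topic_proportions, returned_early)
print(model.coef_)  # positive weights: topics associated with returning early
```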

For users whose responses included a topic described by words such as family, photo, friends, miss and birthday, Baumer says, there was a decreased chance of returning to Facebook early.

“The people who talked about missing out on things like photos from their families and friends,” Baumer says, “they didn’t go back to Facebook. Essentially they said, ‘Yeah, I missed out on so-and-so’s birthday announcement, but it wasn’t a big deal.’”

The meaning of a topic with other high-probability words—first, few, days, check, checking, minutes—may not be as clear to the average person. But Baumer says those people were referring to their break from Facebook in terms of withdrawal from an addiction.

“The survey asks, ‘How do you feel?’ and one person said, ‘Like I’m going through withdrawal from an addiction,’” Baumer says. “Another respondent said, ‘Every time I open an internet browser, my finger just goes to the ‘F’ key without even thinking about it.’ Those participants, the people who talked about that topic in their response, were much more likely to go back to the site before they intended to.”

Baumer says one of the ways his team interpreted the data is that social ties are not what draws people back to Facebook. People returned because of the habitual nature of their Facebook usage.

In another study, Baumer considered socioeconomic and demographic factors with regard to Facebook use. The data set, collected by Cornell’s Survey Research Institute in 2015 for the National Social Survey, comprises phone survey results of adults 18 years or older from 1,000 U.S. households. Baumer presented his findings, “Socioeconomic Inequalities in the Non-use of Facebook,” at the 2018 ACM Conference on Human Factors in Computing Systems in Montreal, Canada.

The survey indicated which respondents had deactivated their Facebook account, which had never had an account, which were current Facebook account holders and which had considered deactivating their account but hadn’t actually done so.

“The socioeconomic and demographic disparities, it’s not necessarily a focal point [of my research], but I think it’s a really important question, especially if we start asking things about whom in society do these technologies benefit?” Baumer says.

The analysis revealed that the lower a person’s income, the more likely it is that they never created a Facebook account. Baumer also discovered that those looking for employment within the previous four weeks were much more likely to either consider deactivating their Facebook account or actually deactivate it.

“It’s this really interesting tension, because there is other work that talks about how valuable social networks and the social capital that you get from things like Facebook are, in particular for finding a job,” Baumer says. “And so people who are best positioned to make use of that are deactivating it because they’re worried that it’s going to harm them.”

In other work under the same NSF grant regarding human-centered computing and Facebook, Baumer’s team compared three different types of non-use and use: users with active accounts, those who have deactivated their accounts and those who have considered deactivating but have not. In the 2019 paper “All Users are (Not) Created Equal: Predictors Vary for Different Forms of Facebook Non/use,” the team writes that “various predictors (Facebook intensity, Facebook addiction and privacy behaviors) have different associations with each form of non/use.” Baumer presented the findings with doctoral student Patrick Skeba at the ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW).

Baumer has written another paper, currently under review, that he says further dissects those categories of Facebook usage, deactivation and deletion in particular.

“Why are we asking about deactivations?” he asks. “Should we be asking about deletion? In some of the early studies that I did, when I asked people ‘Have you deactivated your account? Have you deleted your account?’ there are actually a lot of people who said, ‘Well, I thought I deleted it, but maybe I only deactivated it.’ People didn’t know the difference.”

Illustration of various social media profile pictures

People & Privacy

Throughout his research, including the study regarding those who pledged to give up Facebook for 99 days, Baumer found that many people had different privacy concerns. Some didn’t feel they had control over what their friends saw about them and that made them uncomfortable. Others didn’t like that Facebook was monetizing data about their social interactions, which Baumer says is fundamentally a different kind of privacy concern.

“There were several studies where people voiced different types of privacy concerns and there was this concern about ‘What’s being done with my data?’” Baumer says. “That was what got me into this strand of thinking about algorithmic aspects of data privacy.”

Those thoughts, and an NSF-funded collaborative grant, helped lead him to collaborate with Andrea Forte, associate professor of information science at Drexel University, on data privacy. Specifically, they are researching how people navigate a world in which data is collected at every turn and how systems can be better designed to support those behaviors.

One thing that interests Baumer about data privacy is the notion of obfuscation. His interest was fueled in part by Obfuscation: A User’s Guide for Privacy and Protest by Finn Brunton and Helen Nissenbaum, in which, Baumer says, the authors argue that in modern society you don’t have the option to opt out.

“Non-use isn’t viable,” Baumer says, summarizing the authors’ thoughts. “You have to have a driver’s license and you probably have to have a credit card. You can’t do everything with cash. You need whatever your country’s equivalent of a social security number is. So [Brunton and Nissenbaum] argue, for people who are opposed to these schemes of data collection and analysis, the only means of recourse is obfuscation—doing things that conceal your data or conceal details within your data.”

Baumer says Brunton and Nissenbaum compare this to the chaff used against early radar systems, as in World War II, when planes released shards of aluminum. One blip on radar—the plane—becomes dozens of blips, making it impossible to tell which one is the plane. The strategy—create noise so it’s hard to find the actual data—is used today in a tool called TrackMeNot. That tool periodically pings Google with random search queries, so someone looking at a user’s search history won’t be able to tell which queries the user actually made and which ones the tool generated.
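
The obfuscation idea itself is simple enough to sketch in a few lines of Python. The snippet below is a hypothetical illustration rather than TrackMeNot’s actual code: the decoy vocabulary, the timing and the decision to print rather than send the requests are all placeholder choices.

```python
import random
import time
import urllib.parse

# Placeholder vocabulary for decoy searches; a real tool draws from a much
# larger, evolving word list so the noise looks like ordinary browsing.
DECOY_TERMS = ["weather", "recipes", "bicycle", "gardening", "jazz", "history"]

def decoy_query():
    """Build a random two-word query from the placeholder vocabulary."""
    return " ".join(random.sample(DECOY_TERMS, 2))

def send_decoys(n=3, min_wait=1, max_wait=5):
    """Issue n decoy searches at random intervals to pad the search history."""
    for _ in range(n):
        url = "https://www.google.com/search?" + urllib.parse.urlencode({"q": decoy_query()})
        # A real tool would send this request; here we only print it.
        print("decoy request:", url)
        time.sleep(random.uniform(min_wait, max_wait))

send_decoys()
```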

This all led Baumer to ponder why people think such a tool does anything for them.

“What are the mental models for how these algorithmic inferential systems work, and how do those play into the strategies that people are using?” Baumer asks. “That’s a large part of what we’re trying to look at there.”

As with Baumer’s participatory design work, the initial plan for his research with Forte laid out three phases. The team would start by conducting interviews to try to understand people’s strategies and mental models, including any demographic differences between those who attempt to avoid algorithms and other internet users.

Next, they would try to develop some type of survey instrument to study the different strategies people use to avoid tracking technologies and determine what algorithm avoidance actually looks like. Users, based on their habits, would be put into a typology. Finally, the team would create a collection of experimental prototypes to explore the design space around algorithmic privacy and find out how designers can create systems that are useful based on how users approach data privacy.

Baumer says that as they proceed with the study, they’re finding there is so much to unpack in just studying the empirical phenomenon. He now believes that’s where they will focus the majority of their work.

Much like his work with the journalism and legal nonprofits, Baumer is thinking about how people who don’t have an expertise in computing believe these systems work.

“If a system is going to guess your age and gender, and target products or political advertisements or whatever based on those things, how do people think that those systems work and how are those folk theories [which use common instances to explain concepts] and mental models evidenced by the particular privacy-preserving strategies that people take?” Baumer asks. “On the one hand there’s an empirical question: What are the strategies that people are doing in response to this threat? And then there’s the more conceptual or theoretical question: What are the mental models and folk theories that give rise to those strategies?”

In terms of broader impact, Baumer believes there’s a real challenge in algorithmic privacy that goes beyond digital or privacy literacy—he says there is a phase shift in the way people need to think about privacy. Baumer says it’s not just how privacy is regulated and systems are designed, but also how people interact with systems. He also suggests privacy policies should disclose what is being inferred from people’s provided data rather than just what information is being collected.

Collective Privacy Concerns

In a world in which technology gets more intertwined in everything we do, it’s becoming increasingly difficult for individuals to manage their privacy.

And while Haiyan Jia says privacy at the individual level is what typically comes to mind first when people think about protecting data in the digital age, there are also social and collective aspects of privacy to consider.

Jia, an assistant professor of data journalism, uses as an illustration the scenario of posting online a picture of a group of people hanging out. While this situation brings up similar privacy concerns to that of an individual—a photo, even if deleted, could be saved and duplicated by others once it’s shared online—it becomes more complex because not everybody in the photo has a say in whether it’s shared.

“You might look great, but other people might be yawning in that picture, so they might hate for it to be published online,” Jia says. “The idea of privacy being individualistic starts to be limiting because sometimes it’s actually that the group’s privacy is a collective privacy.”

To study the collective privacy of online social activist groups, Jia and Eric Baumer teamed up with students through Lehigh’s Mountaintop Summer Experience, which invites faculty, students and external partners to come together and take new intellectual, creative and/or artistic pathways that lead to transformative new innovations. The project explores new possibilities for collaborative privacy management, with the aim of developing new understanding, making new discoveries and creating new tools.

Another example Jia provides is to look at the issue as though it’s a sports team in a huddle. In the huddle, players are exchanging information that pertains to each of them individually but also to the team as a whole. At the same time, they’re trying to keep the other team from learning that information.

“There’s this information that they want to retain,” Baumer says of Jia’s analogy. “What we’re trying to do is understand—[by] looking at a bunch of different examples—the strategies that people use to manage privacy collectively, or information about a group. How does the group regulate the flow of that information?”

In the Mountaintop project, students reached out to and conducted interview studies with a number of local activist organizations, such as feminist, political, charitable and student groups varying in age, gender and goals. They catalogued the types of information the groups share and how the groups decide which information is shared only within the group versus beyond it. They also asked what strategies the groups used to make sure information they wanted to keep private was not revealed.

So far, it’s been interesting to see how differently the groups view their privacy, Jia says. Some are more open with privacy in order to attract more members and make a larger impact; groups that want to be more influential have to sacrifice some of their privacy. The goal of other groups, Jia says, is to find people who share their social identity or to have a safe space to exchange ideas, and those groups adopt stricter rules to protect privacy.

“We find that to be really interesting—how groups function as the agent, to some extent, to allow individual preferences in terms of expressing their ideas and their needs,” Jia says. “But at the same time, individuals would collaborate together to maintain the group boundary by regulating how they would communicate about the group, within a group and beyond the group.”
