We are constantly working on improving our algorithm for discovering who your most important contacts are, and thus whom to prompt for feedback.
We used to have a very straightforward algorithm that simply counted the number of emails you sent to a specific person to estimate the strength of the connection. It rested on the assumption that sending an email is a stronger indicator of working together than receiving one; you could be receiving a lot of emails from office managers or event organizers while barely knowing them.
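To make the earlier approach concrete, here is a minimal sketch of counting sent emails per recipient. The function name `sent_email_strength` and the `(sender, recipients)` tuple layout are assumptions made for illustration, not our actual data model.

```python
from collections import Counter


def sent_email_strength(emails, me):
    """Count how many emails `me` sent to each recipient.

    `emails` is an iterable of (sender, recipients) pairs; this layout
    is a hypothetical simplification for the example.
    """
    counts = Counter()
    for sender, recipients in emails:
        if sender != me:
            continue  # received mail is ignored in this simple scheme
        for recipient in recipients:
            counts[recipient] += 1
    return counts


emails = [
    ("me@example.com", ["ann@example.com", "bob@example.com"]),
    ("ann@example.com", ["me@example.com"]),
    ("me@example.com", ["ann@example.com"]),
]
strength = sent_email_strength(emails, "me@example.com")
```

Sorting `strength.most_common()` would then give a ranked list of contacts under this scheme.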
However, especially now that you can provide data on your Slack communications, we need something more sophisticated. First of all, whether you talked to someone over Slack, sent or received an email, or had a meeting together (as indicated by your calendar), the interaction counts for much more if it was one-on-one, or perhaps within a small group, than if it involved a dozen people at once, which isn't very personal and probably doesn't indicate working together closely. Second, it makes sense to prioritize recent communications, under the assumption that people can give the most relevant feedback to those with whom they interacted most recently. This isn't to say, however, that old communications should be disregarded completely, as they may carry important information about overall trends rather than short-term fluctuations.
Finally, it is our understanding that the most natural pattern for real human communication between coworkers, as opposed to impersonal announcements and reports, is roughly the same number of messages in both directions. These are the communications we want to prioritize the most in our algorithm for computing the proximity of people in our social graph. However, since we want to be able to handle the case where a person is a recent hire and hasn't interacted with others much, we don't want to disregard any type of communication completely.
In our new algorithm we consider all these factors:
- how long ago the interaction occurred
- how many people were involved
- how balanced the communication between them is
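One way these three factors could be combined is sketched below. This is an illustration under stated assumptions, not the scoring we actually use: the exponential half-life, the inverse group-size weight, the balance floor, and every name (`Interaction`, `interaction_weight`, `balance_factor`, `proximity`) are hypothetical, and only interactions with a single identifiable sender are modeled.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 90.0  # assumed: an interaction loses half its weight every 90 days
BALANCE_FLOOR = 0.25   # assumed: one-way communication still keeps 25% of its weight


@dataclass(frozen=True)
class Interaction:
    sender: str
    participants: frozenset  # everyone involved, including the sender
    age_days: float


def interaction_weight(age_days, group_size, half_life=HALF_LIFE_DAYS):
    """Weight one interaction by recency and by how many people were involved."""
    recency = 0.5 ** (age_days / half_life)    # old interactions fade but never reach zero
    size = 1.0 / max(group_size - 1, 1)        # one-on-one counts fully, large groups are diluted
    return recency * size


def balance_factor(a_to_b, b_to_a, floor=BALANCE_FLOOR):
    """Return 1.0 for perfectly two-way traffic, `floor` for purely one-way traffic."""
    if a_to_b + b_to_a == 0:
        return 0.0
    ratio = min(a_to_b, b_to_a) / max(a_to_b, b_to_a)
    return floor + (1.0 - floor) * ratio


def proximity(interactions, a, b):
    """Score how closely `a` and `b` appear to work together."""
    a_to_b = b_to_a = 0.0
    for item in interactions:
        if a not in item.participants or b not in item.participants:
            continue
        weight = interaction_weight(item.age_days, len(item.participants))
        if item.sender == a:
            a_to_b += weight
        elif item.sender == b:
            b_to_a += weight
    return (a_to_b + b_to_a) * balance_factor(a_to_b, b_to_a)
```

The non-zero floor reflects the point made earlier that no type of communication should be disregarded completely: a new hire who has mostly received messages so far still gets a non-zero score with the people writing to them.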
We are also improving the algorithm for detecting groups, mailing lists, robots, aliases, and other non-human entities that communicate by email and that we don't want to see in the social graph. Sometimes they are marked as such by mail servers, but sometimes they aren't, so we implement heuristics to discover and omit them. First, we look at the structure of communication: does it look like whenever someone sends an email to this address, multiple people receive it? Then it's probably a list. Second, we look at the email address itself and check whether it looks like a personal address or is along the lines of "no-reply", "support", etc. We discovered thousands of such addresses and omitted them in order to focus on actual interpersonal communications.
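The two heuristics described above could be sketched roughly as follows. The keyword list, the `(addressed_to, received_by)` message layout, and the function names are all assumptions made for illustration; real heuristics would need broader patterns and more tolerance for noise.

```python
import re

# Hypothetical list of local parts that suggest a role or robot address.
ROLE_PATTERN = re.compile(
    r"(?:no-?reply|do-?not-?reply|support|info|admin|notifications?|billing|all)"
    r"(?:[+._-].*)?",
    re.IGNORECASE,
)


def looks_like_role_address(address):
    """Check whether the part before '@' matches a known non-personal name."""
    local_part = address.split("@", 1)[0]
    return ROLE_PATTERN.fullmatch(local_part) is not None


def looks_like_list(address, messages, min_receivers=2):
    """Check whether every message sent only to `address` reached several people.

    `messages` is a list of (addressed_to, received_by) pairs of sets:
    the addresses in the headers and the mailboxes the message was
    actually observed in. This layout is an assumption for the example.
    """
    fanouts = [
        len(received_by)
        for addressed_to, received_by in messages
        if addressed_to == {address}
    ]
    return bool(fanouts) and all(n >= min_receivers for n in fanouts)


def is_probably_non_human(address, messages):
    return looks_like_role_address(address) or looks_like_list(address, messages)
```

Addresses flagged by `is_probably_non_human` would then be dropped before the social graph is built.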
Maxim Kovalev, Data Science Summer 2015 Intern
PhD Student in Electrical & Computer Engineering at Carnegie Mellon